Manage prompts like code: our workflow
Versioning, reviews, tests — prompts are critical business logic.
Prompts aren't 'just strings'
A prompt decides how your agent sounds, what it's allowed to do, and when it escalates. That's business logic. Yet prompts often live copy-pasted in a Notion doc or hardcoded in the admin UI.
Our setup:
1. Prompts in Git
Each agent has its own directory with system.md, examples.json, and tools.yaml. Changes go through a pull request with review.
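To make this concrete, here's a minimal sketch of loading one agent's bundle at startup. PromptBundle, load_bundle, and the agents/support path are illustrative names, not part of the setup above, and the yaml import assumes PyYAML:

```python
import json
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML, assumed here as the tools.yaml parser


@dataclass
class PromptBundle:
    """One agent's versioned prompt assets, as checked into Git."""
    system: str      # system.md
    examples: list   # examples.json
    tools: dict      # tools.yaml


def load_bundle(agent_dir: str) -> PromptBundle:
    """Read system.md, examples.json, and tools.yaml from one agent's directory."""
    root = Path(agent_dir)
    return PromptBundle(
        system=(root / "system.md").read_text(encoding="utf-8"),
        examples=json.loads((root / "examples.json").read_text(encoding="utf-8")),
        tools=yaml.safe_load((root / "tools.yaml").read_text(encoding="utf-8")),
    )


if __name__ == "__main__":
    bundle = load_bundle("agents/support")  # hypothetical agent directory
    print(bundle.system[:80])
```

Because the bundle is just files in Git, every change is diffable and every deploy is pinned to a commit.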
2. Prompt linting
We automatically check length, variable coherence (every template variable such as {customer_name} is declared in the schema), and that no API keys or PII end up in the examples.
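The linter itself is a handful of checks. Here's a rough sketch; SCHEMA_VARS, the length limit, and the regexes are illustrative stand-ins, not our production rules:

```python
import re

# Hypothetical schema: the variables the calling code promises to supply.
SCHEMA_VARS = {"customer_name", "order_id"}

MAX_PROMPT_CHARS = 8000  # assumed limit, tune per model and context window
VAR_PATTERN = re.compile(r"\{(\w+)\}")
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}|api[_-]?key", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude PII heuristic


def lint_prompt(text: str) -> list[str]:
    """Return a list of lint errors; an empty list means the prompt passes."""
    errors = []
    if len(text) > MAX_PROMPT_CHARS:
        errors.append(f"prompt too long: {len(text)} > {MAX_PROMPT_CHARS} chars")
    unknown = set(VAR_PATTERN.findall(text)) - SCHEMA_VARS
    if unknown:
        errors.append(f"variables missing from schema: {sorted(unknown)}")
    if SECRET_PATTERN.search(text):
        errors.append("possible API key in prompt")
    if EMAIL_PATTERN.search(text):
        errors.append("possible PII (email address) in prompt")
    return errors


if __name__ == "__main__":
    print(lint_prompt("Hi {customer_name}, your ref is {ticket_id}."))
    # -> ["variables missing from schema: ['ticket_id']"]
```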
3. Prompt tests
For each prompt we keep 20–50 test conversations. They run on every PR and the replies are compared against gold-standard answers; an LLM-as-judge rates tone and correctness.
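A sketch of the runner, assuming cases stored as JSON with a messages list and a gold_answer per case; agent and judge stand in for your model call and the LLM-as-judge, which this post doesn't pin to a specific API:

```python
import json
from pathlib import Path
from typing import Callable


def run_prompt_tests(
    cases_path: str,
    agent: Callable[[list], str],        # conversation messages -> agent reply
    judge: Callable[[str, str], float],  # (reply, gold answer) -> score in [0, 1]
) -> float:
    """Replay recorded conversations and return the mean judge score."""
    cases = json.loads(Path(cases_path).read_text(encoding="utf-8"))
    scores = [judge(agent(case["messages"]), case["gold_answer"]) for case in cases]
    return sum(scores) / len(scores)


# In CI (e.g. a pytest test triggered on every PR) we'd gate on a threshold;
# the 0.85 bar is an assumed example:
# assert run_prompt_tests("tests/support_cases.json", agent, judge) >= 0.85
```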
4. Canary deployment
New prompt versions roll out to 10% of traffic first, and the quality score is measured live. If it drops by more than 5%, we roll back automatically.
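The routing and the rollback check come down to two small functions. The sticky hash split and the constants below are an illustrative sketch, not our exact gateway logic:

```python
import hashlib

CANARY_SHARE = 0.10   # new version gets 10% of traffic
ROLLBACK_DROP = 0.05  # >5% relative quality drop triggers rollback


def pick_version(conversation_id: str, stable: str, canary: str) -> str:
    """Sticky routing: a conversation never flips versions mid-dialogue."""
    bucket = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16) % 100
    return canary if bucket < CANARY_SHARE * 100 else stable


def should_roll_back(stable_score: float, canary_score: float) -> bool:
    """True when the canary's live quality score falls >5% below stable."""
    return canary_score < stable_score * (1 - ROLLBACK_DROP)
```

Hashing the conversation id instead of rolling a random number keeps assignment deterministic, which also makes per-version quality scores easy to attribute.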
5. Built-in A/B testing
We can run two prompt variants in parallel and compare business metrics (lead quality, escalation rate, NPS). The winner stays.
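Declaring a winner needs more than eyeballing two rates. A minimal sketch, assuming a deterministic 50/50 split and a two-proportion z-test on one conversion metric (both function names are ours, not from a specific tool):

```python
import hashlib
from math import sqrt


def assign_variant(user_id: str) -> str:
    """Deterministically split users 50/50 between prompt variants A and B."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "B" if digest % 2 else "A"


def conversion_z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-score, e.g. for lead conversion under A vs. B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# |z| > 1.96 means the difference is significant at the 95% level:
print(conversion_z_score(120, 1000, 150, 1000))  # ~1.96, borderline
```

Escalation rate and NPS get the same treatment; the point is that the comparison runs on live metrics, not vibes.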
Tools
We use PromptLayer, but a homegrown setup with Git + GitHub Actions works for most cases. If you want to build this yourself, ping us and we'll share the template.