Manage prompts like code: our workflow
Versioning, reviews, tests — prompts are critical business logic.
Prompts aren't 'just strings'
A prompt decides how your agent sounds, what it's allowed to do, and when it escalates. That's business logic. Yet prompts often live copy-pasted in a Notion doc or hardcoded in the admin UI.
Our setup:
1. Prompts in Git
Each agent has its own directory with system.md, examples.json, and tools.yaml. Changes go through a pull request with review.
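To make this concrete, here's a minimal sketch of loading one agent's bundle at startup. PromptBundle, load_bundle, and the agents/support path are illustrative names, not part of the setup above, and the yaml import assumes PyYAML:

```python
import json
from dataclasses import dataclass
from pathlib import Path

import yaml  # PyYAML, assumed here as the tools.yaml parser


@dataclass
class PromptBundle:
    """One agent's versioned prompt assets, as checked into Git."""
    system: str      # system.md
    examples: list   # examples.json
    tools: dict      # tools.yaml


def load_bundle(agent_dir: str) -> PromptBundle:
    """Read system.md, examples.json, and tools.yaml from one agent's directory."""
    root = Path(agent_dir)
    return PromptBundle(
        system=(root / "system.md").read_text(encoding="utf-8"),
        examples=json.loads((root / "examples.json").read_text(encoding="utf-8")),
        tools=yaml.safe_load((root / "tools.yaml").read_text(encoding="utf-8")),
    )


if __name__ == "__main__":
    bundle = load_bundle("agents/support")  # hypothetical agent directory
    print(bundle.system[:80])
```

Because the bundle is just files in Git, every change is diffable and every deploy is pinned to a commit.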
2. Prompt linting
We automatically check length, variable coherence (every template variable such as {customer_name} is declared in the schema), and that no API keys or PII end up in the examples.
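The linter itself is a handful of checks. Here's a rough sketch; SCHEMA_VARS, the length limit, and the regexes are illustrative stand-ins, not our production rules:

```python
import re

# Hypothetical schema: the variables the calling code promises to supply.
SCHEMA_VARS = {"customer_name", "order_id"}

MAX_PROMPT_CHARS = 8000  # assumed limit, tune per model and context window
VAR_PATTERN = re.compile(r"\{(\w+)\}")
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}|api[_-]?key", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")  # crude PII heuristic


def lint_prompt(text: str) -> list[str]:
    """Return a list of lint errors; an empty list means the prompt passes."""
    errors = []
    if len(text) > MAX_PROMPT_CHARS:
        errors.append(f"prompt too long: {len(text)} > {MAX_PROMPT_CHARS} chars")
    unknown = set(VAR_PATTERN.findall(text)) - SCHEMA_VARS
    if unknown:
        errors.append(f"variables missing from schema: {sorted(unknown)}")
    if SECRET_PATTERN.search(text):
        errors.append("possible API key in prompt")
    if EMAIL_PATTERN.search(text):
        errors.append("possible PII (email address) in prompt")
    return errors


if __name__ == "__main__":
    print(lint_prompt("Hi {customer_name}, your ref is {ticket_id}."))
    # -> ["variables missing from schema: ['ticket_id']"]
```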
3. Prompt tests
For each prompt we keep 20–50 test conversations. They run on every PR and the replies are compared against gold-standard answers; an LLM-as-judge rates tone and correctness.
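A sketch of the runner, assuming cases stored as JSON with a messages list and a gold_answer per case; agent and judge stand in for your model call and the LLM-as-judge, which this post doesn't pin to a specific API:

```python
import json
from pathlib import Path
from typing import Callable


def run_prompt_tests(
    cases_path: str,
    agent: Callable[[list], str],        # conversation messages -> agent reply
    judge: Callable[[str, str], float],  # (reply, gold answer) -> score in [0, 1]
) -> float:
    """Replay recorded conversations and return the mean judge score."""
    cases = json.loads(Path(cases_path).read_text(encoding="utf-8"))
    scores = [judge(agent(case["messages"]), case["gold_answer"]) for case in cases]
    return sum(scores) / len(scores)


# In CI (e.g. a pytest test triggered on every PR) we'd gate on a threshold;
# the 0.85 bar is an assumed example:
# assert run_prompt_tests("tests/support_cases.json", agent, judge) >= 0.85
```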
4. Canary deployment
New prompt versions roll out to 10% of traffic first, and the quality score is measured live. If it drops by more than 5%, we roll back automatically.
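The routing and the rollback check come down to two small functions. The sticky hash split and the constants below are an illustrative sketch, not our exact gateway logic:

```python
import hashlib

CANARY_SHARE = 0.10   # new version gets 10% of traffic
ROLLBACK_DROP = 0.05  # >5% relative quality drop triggers rollback


def pick_version(conversation_id: str, stable: str, canary: str) -> str:
    """Sticky routing: a conversation never flips versions mid-dialogue."""
    bucket = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16) % 100
    return canary if bucket < CANARY_SHARE * 100 else stable


def should_roll_back(stable_score: float, canary_score: float) -> bool:
    """True when the canary's live quality score falls >5% below stable."""
    return canary_score < stable_score * (1 - ROLLBACK_DROP)
```

Hashing the conversation id instead of rolling a random number keeps assignment deterministic, which also makes per-version quality scores easy to attribute.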
5. Built-in A/B testing
We can run two prompt variants in parallel and compare business metrics (lead quality, escalation rate, NPS). The winner stays.
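Declaring a winner needs more than eyeballing two rates. A minimal sketch, assuming a deterministic 50/50 split and a two-proportion z-test on one conversion metric (both function names are ours, not from a specific tool):

```python
import hashlib
from math import sqrt


def assign_variant(user_id: str) -> str:
    """Deterministically split users 50/50 between prompt variants A and B."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    return "B" if digest % 2 else "A"


def conversion_z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z-score, e.g. for lead conversion under A vs. B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


# |z| > 1.96 means the difference is significant at the 95% level:
print(conversion_z_score(120, 1000, 150, 1000))  # ~1.96, borderline
```

Escalation rate and NPS get the same treatment; the point is that the comparison runs on live metrics, not vibes.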
Tools
We use PromptLayer, but a homegrown setup with Git + GitHub Actions works for most cases. If you want to build this yourself, ping us and we'll share the template.