
What is RAG?

RAG: how AI agents answer using your data — without hallucinating it.

Retrieval-Augmented Generation in 8 minutes: from concept to flow to the pitfalls we've learned to avoid in two years of practice.

What RAG is

<p id="definition"><strong>Retrieval-Augmented Generation</strong> (RAG) is the default technique for getting language models to answer based on your own content. Instead of <em>retraining</em> the model (fine-tuning), we <em>give</em> it the relevant document chunks at every request.</p><p>The acronym breaks down as:</p><ul><li><strong>Retrieval</strong> — we fetch the most similar chunks from your knowledge base (FAQs, PDFs, Notion, Confluence, databases …).</li><li><strong>Augmented</strong> — those are passed to the LLM as context.</li><li><strong>Generation</strong> — the model writes the answer grounded in real sources.</li></ul><p>Result: current, precise, with source attribution — and no need for the model to guess or hallucinate. More background in the <a href="/glossary/rag">RAG glossary entry</a>.</p>

Why not just fine-tune?

Three hard advantages RAG has over retraining the model.

  • Current (live index)

    You change a FAQ and the bot knows it 30 seconds later. Fine-tuning would need a new training run.

  • Cheaper (100×)

    Re-indexing costs cents per document. Fine-tuning costs hundreds, depending on model and data, per iteration.

  • Transparent (source enforcement)

    Every answer comes with a source ID: you see exactly which paragraph was quoted. With fine-tuning, the model itself is a black box.

How a RAG system works, step by step.

  1. Index

     Your content is split into chunks. Each chunk gets an embedding, a vector representing its semantic meaning, and is stored in a vector DB (we use pgvector).

  2. Retrieve

     On a question, the query itself becomes an embedding. We fetch the k most similar chunks (typically 5–10) by cosine similarity. Optional: hybrid search with a BM25 keyword boost. Both steps are sketched in code after this list.

  3. Generate

     Question + top chunks go to the LLM. It answers grounded in those sources, with source IDs for each statement.
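
As a sketch of the index and retrieve steps, here is what they can look like in Python with psycopg and pgvector. The chunks table, its vector(1536) column, and the embed() placeholder are assumptions for illustration; swap in your own schema and embedding model:

    import psycopg  # assumes Postgres with the pgvector extension installed

    def embed(text: str) -> list[float]:
        """Placeholder: call your embedding model of choice here."""
        raise NotImplementedError

    # Step 1 (Index): one embedding per chunk. Assumed schema:
    #   CREATE TABLE chunks (id serial PRIMARY KEY, text text, embedding vector(1536));
    def index_chunks(conn: psycopg.Connection, texts: list[str]) -> None:
        with conn.cursor() as cur:
            for text in texts:
                cur.execute(
                    "INSERT INTO chunks (text, embedding) VALUES (%s, %s::vector)",
                    (text, str(embed(text))),  # pgvector parses '[0.1, 0.2, ...]'
                )
        conn.commit()

    # Step 2 (Retrieve): <=> is pgvector's cosine-distance operator, so the
    # smallest distances come first and we keep the k nearest chunks.
    def retrieve(conn: psycopg.Connection, question: str, k: int = 5) -> list[tuple]:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, text FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
                (str(embed(question)), k),
            )
            return cur.fetchall()

In production you would batch the inserts and put an index on the embedding column; ivfflat and hnsw are both stock pgvector options.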


What separates great RAG from mediocre.

Common pitfalls (and how we avoid them)

<p id="pitfalls">We had to learn these the hard way:</p><ul><li><strong>Embeddings without an update pipeline</strong>. Source changes, embedding stays stale → half-knowledge. Fix: webhook on every CMS change, automatic re-indexing in &lt;30 seconds.</li><li><strong>Chunks too small</strong>. 100-token chunks lose context; the model can't answer coherently. We use 500–800 tokens as default.</li><li><strong>No source enforcement</strong>. Model writes something not in the chunks — hallucination. We reject answers without a source ID and retry with a stricter prompt.</li><li><strong>No eval suite</strong>. Without weekly runs against 200+ real customer questions, we only notice regressions when the customer complains. Today: every deploy needs a green eval, otherwise rollback.</li></ul>

Want to see RAG running on your use case?

30-minute demo with your real content — free, no commitment.