What is RAG?
RAG: how AI agents answer using your data — without hallucinating it.
Retrieval-Augmented Generation in 8 minutes: from concept to flow to the pitfalls we've learned in two years of practice.
Why not just fine-tune?
Three hard advantages RAG has over retraining the model.
Current: live index
You change a FAQ; the bot knows it 30 seconds later. Fine-tuning would need a whole new training run.

Cheaper: roughly 100×
Re-indexing costs cents per document. Fine-tuning costs hundreds of dollars per iteration, depending on model and data volume.

Transparent: source enforcement
Every answer ships with a source ID, so you see exactly which paragraph was quoted. With fine-tuning, the model itself is the black box.
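The "live index" advantage comes down to a single upsert: re-embed the changed document and replace its vector, with no retraining. A minimal in-memory sketch; the `embed` function here is a toy trigram hash standing in for a real embedding model, and the store is a plain dict standing in for a vector DB:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy embedding: hash character trigrams into a fixed-size,
    # L2-normalized vector (stand-in for a real embedding model).
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

index: dict[str, list[float]] = {}

def upsert(doc_id: str, text: str) -> None:
    # Updating one changed document costs one embedding call;
    # the index is current immediately, no training run needed.
    index[doc_id] = embed(text)

upsert("faq-42", "Returns are accepted within 14 days.")
upsert("faq-42", "Returns are accepted within 30 days.")  # the edit goes live at once
```

In a real deployment the dict would be a pgvector table and `upsert` an `INSERT ... ON CONFLICT DO UPDATE`, but the cost structure is the same: one embedding call per changed chunk.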
How a RAG system works, step by step.
Step 1: Index
Your content is split into chunks. Each chunk gets an embedding, a vector representing its semantic meaning, and is stored in a vector DB (we use pgvector).

Step 2: Retrieve
On a question, the query itself becomes an embedding. We fetch the k most similar chunks (typically 5–10) by cosine similarity. Optional: hybrid search with a BM25 keyword boost.

Step 3: Generate
The question plus the top chunks go to the LLM, which answers grounded in those sources, with a source ID for each statement.
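The three steps above can be sketched end to end. Everything here is illustrative: `embed` is a toy trigram hash standing in for a real embedding model, the chunk store is an in-memory list rather than pgvector, and the final LLM call is replaced by printing the assembled prompt:

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy embedding via hashed character trigrams (stand-in for a real model).
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are pre-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Step 1: Index -- chunk the content, embed each chunk, store (id, vector, text).
chunks = {
    "doc1#0": "Returns are accepted within 30 days of purchase.",
    "doc1#1": "Shipping is free for orders over 50 euros.",
    "doc2#0": "Support is available Monday to Friday, 9am to 5pm.",
}
store = [(cid, embed(text), text) for cid, text in chunks.items()]

# Step 2: Retrieve -- embed the query, rank chunks by cosine similarity, take top k.
def retrieve(query: str, k: int = 2) -> list[tuple[str, list[float], str]]:
    q = embed(query)
    ranked = sorted(store, key=lambda row: cosine(q, row[1]), reverse=True)
    return ranked[:k]

# Step 3: Generate -- build a grounded prompt; each chunk keeps its source ID
# so the model can cite the paragraph it quotes.
def build_prompt(query: str) -> str:
    sources = "\n".join(f"[{cid}] {text}" for cid, _, text in retrieve(query))
    return (
        "Answer using only the sources below and cite their IDs.\n"
        f"Sources:\n{sources}\n\nQuestion: {query}"
    )

print(build_prompt("How long do I have to return an item?"))
```

A hybrid setup would add a BM25 keyword score to `retrieve` and combine both rankings; the prompt-assembly step stays the same.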
Common pitfalls (and how we avoid them)
What separates great RAG from mediocre.
Want to see RAG running on your use case?
30-minute demo with your real content — free, no commitment.