architecture

RAG

RAG (Retrieval-Augmented Generation) is a technique in which relevant documents are retrieved from your knowledge base before each answer and handed to the language model as the basis for its response, so the agent stays current and answers with sources.

Also known as: Retrieval-Augmented Generation

In detail

Out of the box, language models don't know anything about your company, products, or contracts, because they were trained on public data. RAG closes that gap:

1. We index your content (PDFs, help center, Notion, FAQs) as embeddings in a vector database.
2. For every question, we retrieve the document chunks most similar to it.
3. Those chunks are sent together with the question to the LLM, which grounds its answer in those specific sources.
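
To make the indexing and retrieval steps concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not our production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory replacement for a vector database, and the sample texts are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory index; a real setup would use a vector database."""

    def __init__(self) -> None:
        self.items = []  # list of (source, chunk, vector)

    def add(self, source: str, chunk: str) -> None:
        # Indexing: store each chunk next to its embedding.
        self.items.append((source, chunk, embed(chunk)))

    def search(self, question: str, k: int = 2):
        # Retrieval: rank all chunks by similarity to the question, keep the top k.
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:k]]


# Invented sample content, for illustration only.
store = ToyVectorStore()
store.add("help-center", "To reset your password, open Settings and choose Reset password.")
store.add("faq", "Invoices are sent by email at the start of each month.")
store.add("notion", "Office hours for the support team are listed on the team page.")

for source, chunk in store.search("How do I reset my password?", k=1):
    print(source, "->", chunk)
```

A real embedding model captures meaning rather than word overlap, so it would also match questions that share no words with the chunk. The ranking logic stays the same.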

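Compared with fine-tuning, RAG is more current, because new content only has to be re-indexed rather than trained into the model. It is also cheaper, more transparent (answers come with source attribution), and requires no training on customer content.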

Example

Question: 'How long is the warranty on model X-200?' RAG finds the matching section in the warranty PDF, and the LLM writes the answer with a link back to the original document.
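
The example can be sketched in Python as follows. This is a rough illustration under assumptions: `Chunk`, `answer_with_sources`, and `fake_llm` are hypothetical names, the chunk text is a placeholder rather than real warranty content, the URL is invented, and `fake_llm` merely stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    text: str   # retrieved passage
    title: str  # human-readable document name
    url: str    # link back to the original document


def answer_with_sources(question: str, chunks: list[Chunk], llm: Callable[[str], str]) -> str:
    """Send the question plus retrieved chunks to the LLM and append source links."""
    sources = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, 1))
    prompt = (
        "Answer the question using only the numbered sources.\n\n"
        f"{sources}\n\nQuestion: {question}"
    )
    draft = llm(prompt)
    links = "\n".join(f"[{i}] {c.title}: {c.url}" for i, c in enumerate(chunks, 1))
    return f"{draft}\n\nSources:\n{links}"


# Placeholder chunk and invented URL, for illustration only.
chunk = Chunk(
    text="<warranty section for model X-200 from the warranty PDF>",
    title="Warranty terms (PDF)",
    url="https://example.com/docs/warranty.pdf",
)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a fixed draft.
    return "According to source [1], ..."


print(answer_with_sources("How long is the warranty on model X-200?", [chunk], fake_llm))
```

Keeping the URL on each chunk from indexing onward is what makes the link back to the original document possible at answer time.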

Related terms