RAG
RAG (Retrieval-Augmented Generation) is a technique in which relevant documents are retrieved from your knowledge base before each answer and passed to the language model as the basis for its response, so the agent stays current and can answer with sources.
Also known as: Retrieval-Augmented Generation
In detail
Out of the box, a language model knows little or nothing about your company, products, or contracts, because it was trained mostly on public data. RAG closes that gap:
- We index your content (PDFs, help center, Notion, FAQs) as embeddings in a vector database.
- For every question, we retrieve the most similar document chunks.
- They're sent together with the question to the LLM, which answers grounded in those specific sources.
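The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production setup: `embed` is a toy word-count stand-in for a real embedding model, a plain list stands in for the vector database, the chunk texts are made up, and every function name here is hypothetical. The final call to the LLM is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: word counts instead of a dense vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    # Step 1: store each chunk next to its vector (a list stands in for the vector database).
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], question: str, k: int = 2) -> list[str]:
    # Step 2: rank chunks by similarity to the question and keep the top k.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    # Step 3: put the retrieved chunks and the question into one prompt for the LLM.
    sources = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


# Made-up chunk texts, for illustration only.
chunks = [
    "Warranty: the model X-200 is covered by the warranty terms in this section.",
    "Shipping: orders are dispatched within two business days.",
    "Returns: unused items can be sent back within 30 days.",
]
index = build_index(chunks)
question = "How long is the warranty on model X-200?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)
```

In a real system the word counts would be replaced by vectors from an embedding model and the sorted list by a nearest-neighbor query against the vector database; the retrieve-then-prompt flow stays the same.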
Compared with fine-tuning, RAG is more current (you update the index instead of retraining the model), usually cheaper, transparent (answers come with source attribution), and needs no training on customer content.
Example
Question: 'How long is the warranty on model X-200?' RAG finds the matching section in the warranty PDF, and the LLM writes the answer with a link back to the original document.
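One way the link back to the original document could work is to number the retrieved chunks in the prompt and map the markers in the model's answer back to their sources. The sketch below assumes that convention; the `Chunk` type, the function names, the file path, and the hard-coded model answer are all hypothetical placeholders rather than any particular product's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # where the chunk came from, e.g. a file path or URL


def build_cited_prompt(question: str, chunks: list[Chunk]) -> str:
    # Number each retrieved chunk so the model can cite it as [1], [2], ...
    context = "\n".join(f"[{i}] {c.text}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the numbered sources below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def cited_sources(answer: str, chunks: list[Chunk]) -> list[str]:
    # Map [n] markers in the answer back to the documents the chunks came from.
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    return [chunks[n - 1].source for n in numbers if 1 <= n <= len(chunks)]


# Placeholder retrieval result and model output, for illustration only.
retrieved = [Chunk("Warranty terms for model X-200 (placeholder text).", "docs/warranty.pdf")]
prompt = build_cited_prompt("How long is the warranty on model X-200?", retrieved)
model_answer = "The warranty period is given in the warranty terms [1]."
print(cited_sources(model_answer, retrieved))
```

Markers that point outside the retrieved set are ignored, so a citation the model invents does not turn into a broken link.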
Related terms
- Embedding: An embedding is a numeric representation of text (or an image) in a high-dimensional space where similar content sits close together. It is the foundation of semantic search and RAG.
- Multi-agent: Multi-agent refers to an architecture where several specialized AI agents collaborate: a router decides who takes over, and they hand off tasks cleanly.
- Tool use: Tool use (or function calling) is a language model's ability to call external functions with structured arguments, e.g. an API, a database query, or a skill.