Dictionary
RAG
Retrieval-Augmented Generation (RAG) is a technique for improving the accuracy and relevance of responses from large language models by supplying external context at query time. A standard LLM generates answers based solely on patterns learned during training, which means it has no knowledge of events after its training cutoff and cannot access proprietary or real-time information. RAG addresses this by retrieving relevant documents from an external knowledge base before generating a response.
The workflow has two main stages. First, a retrieval step searches a document store (typically a vector database) for content semantically similar to the incoming query. The retrieved text is then combined with the user query and passed to the language model as context. The model generates its response conditioned on both the original question and the retrieved material, which grounds the output in real, checkable sources rather than learned associations.
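The two stages can be sketched in a few lines. This is a toy illustration, not a production pipeline: word-overlap scoring stands in for embedding similarity in a vector database, the documents are invented, and the `generate` step is left as the point where a real LLM call would go. All function names here are illustrative assumptions, not from any specific library.

```python
import re

def tokenize(text):
    """Lowercase word tokens (toy stand-in for an embedding model)."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=2):
    """Stage 1: return the k documents sharing the most words with the query."""
    overlap = lambda doc: len(tokenize(query) & tokenize(doc))
    return sorted(docs, key=overlap, reverse=True)[:k]

def build_prompt(query, retrieved):
    """Stage 2: combine retrieved context with the user query for the model."""
    context = "\n".join(f"- {d}" for d in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Invented mini knowledge base for illustration.
docs = [
    "The refund window is 30 days from the purchase date.",
    "Support hours are 9am to 5pm on weekdays.",
    "Shipping to Canada takes 5 to 7 business days.",
]

query = "What is the refund window?"
prompt = build_prompt(query, retrieve(query, docs, k=1))
# `prompt` now grounds the question in the retrieved policy text; in a real
# system it would be sent to the language model to generate the answer.
```

The same shape holds at scale: only the scoring function (embeddings plus approximate nearest-neighbor search) and the document store change.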
RAG is now a standard pattern in enterprise AI applications. Common use cases include internal knowledge bases where employees can query company documents, customer support systems that retrieve product manuals, legal research tools that search case law, and any application where accuracy and source attribution matter more than creative generation.
The quality of a RAG system depends heavily on the retrieval step. If the wrong chunks are retrieved, the model will either produce incorrect answers or say it cannot help. Chunking strategy, embedding model quality, and relevance ranking all significantly affect output quality. Building a reliable RAG pipeline requires careful evaluation and iteration, not just wiring components together.
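As one concrete example of a chunking decision, here is a minimal sketch of fixed-size chunking with overlap, where each chunk shares its tail with the next chunk's head so that sentences split at a boundary still appear whole somewhere. The character-based windowing and the specific sizes are illustrative assumptions; real pipelines often chunk by tokens, sentences, or document structure instead.

```python
def chunk(text, size=40, overlap=10):
    """Split text into windows of `size` characters, each sharing
    `overlap` characters with its neighbor (toy chunking strategy)."""
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk("".join(str(i % 10) for i in range(100)))
# 100 characters with size=40, overlap=10 yields 3 overlapping chunks.
```

Evaluating a pipeline then means measuring, on real queries, whether the right chunk is actually retrieved, and adjusting `size` and `overlap` (among other knobs) accordingly.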