How retrieval-grounded systems answer from evidence, and how to tell when that architecture is actually the right fit.
The basic problem is access. A model can answer from its training data and from whatever you put in the prompt, but it cannot answer from your policies, PDFs, notes, or records unless the system has some way to reach those materials. A demo that looks impressive on questions the training data already covers says little about how the same model will handle your own files.
RAG (Retrieval-Augmented Generation) is one common way to provide that access. The system looks up relevant material from a bounded collection and answers with that evidence in view. When people say a system is grounded, they mean its answers are tied to retrievable sources rather than generated from the model's training data alone.
For this series, the practical question is whether the work really calls for a retrieval layer at all. A bounded collection, repeated access, and a need to tie answers back to inspectable documents can make the case quickly. But people often assume RAG requires specialized infrastructure before checking whether a simpler approach would do.
The main alternative to retrieval is retraining or fine-tuning a model (adjusting its internal weights by running it through new data), which is expensive, slow, and out of reach for most teams without dedicated ML infrastructure. Retrieval avoids that. When your policies change, you update the document collection. You do not have to retrain the model. The system can also point to specific sources, which means users can verify what it found. And because retrieval and generation are separate steps, you can swap one collection for another without changing the model.
Retrieval can work two ways. The standard pattern pre-indexes a collection and searches it at query time. The alternative, often more practical in agentic work, lets the model use tools to look things up on demand.
Most RAG systems follow the same basic sequence, even if the software stack and naming conventions differ. The retrieval step searches your documents for relevant passages, then the generation step produces an answer that draws on those passages.
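The retrieve-then-generate sequence can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real system: the function names (`tokenize`, `retrieve`, `build_prompt`), the word-overlap scoring, and the sample passages are all hypothetical, and the generation step is shown only as prompt assembly because no particular model API is assumed.

```python
import re

# Made-up sample passages, standing in for a real document collection.
PASSAGES = [
    "Overdue fines were eliminated for all patrons.",
    "The reading room opens at nine.",
    "Interlibrary loan requests take five days.",
]

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, passages, k=2):
    """Retrieval step: rank passages by shared words (a toy scoring rule)."""
    query_words = set(tokenize(query))
    scored = []
    for passage in passages:
        overlap = len(query_words & set(tokenize(passage)))
        if overlap > 0:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]

def build_prompt(query, evidence):
    """Generation step, input side: the text a model would be asked to answer from."""
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(evidence, start=1))
    return f"Answer using only these sources:\n{numbered}\n\nQuestion: {query}"

question = "Are there overdue fines?"
evidence = retrieve(question, PASSAGES, k=1)
prompt = build_prompt(question, evidence)
```

The point of keeping the two functions separate is that the collection passed to `retrieve` can change without touching anything on the generation side.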
You do not need to know the internals. You need to know where retrieval can fail. If the system fetches the wrong passages, the answer can still sound composed even though it is built on the wrong evidence.
For anyone with catalog or database experience, the tradeoff will feel familiar. Keyword search is good for exact terms. Vector search is good for concept-level matching. Most practical systems combine the two.
Keyword search is the most familiar approach. The system looks for exact matches to the terms in your query. A search for "dog" will find documents containing "dog" but may miss documents that only say "puppy" or "canine." Keyword search is fast and predictable, and the system's behavior is easy to understand. Its weakness is recall. If you and the document collection use different words for the same idea, the relevant passages do not show up in the results.
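The vocabulary-mismatch problem is easy to reproduce. The sketch below is a deliberately simple exact-term matcher, assuming whole-word matching with no stemming or synonyms; `keyword_search` and the sample documents are hypothetical and exist only for illustration.

```python
import re

# Made-up documents for illustration.
DOCUMENTS = [
    "The dog must be leashed in the courtyard.",
    "Puppy training classes start Monday.",
    "Canine health records are kept for one year.",
]

def keyword_search(query, documents):
    """Return documents that contain every query term as a whole word."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    hits = []
    for doc in documents:
        words = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if terms <= words:  # every query term appears in the document
            hits.append(doc)
    return hits

hits = keyword_search("dog", DOCUMENTS)
# hits contains only the first document; the "puppy" and "canine"
# documents are about the same idea but share no term with the query.
```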
Vector search tries to match meaning across differently worded passages. The core idea is that text gets converted into a vector (also called an embedding), a long list of numbers that functions like coordinates in a high-dimensional space. If two passages mean similar things, their vectors are close together in that space, even when the wording is different. (If you have worked with subject headings, you can think of it roughly as automated concept collocation, with the same benefits and some of the same hazards.) A machine learning model generates these vectors, and they are stored in a specialized vector database optimized for comparing them. When you search, your query is also converted to a vector, and the system finds passages whose coordinates are closest to yours.
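The "closeness" comparison can be sketched with cosine similarity, one common way to compare vectors. Everything concrete here is an assumption for illustration: the three-number vectors are hand-made rather than produced by an embedding model (real embeddings are far longer), and `cosine_similarity` and `nearest` are hypothetical helper names, not a vector database API.

```python
import math

# Hand-made toy vectors. A real system would get these from an embedding model.
EMBEDDINGS = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.3, 0.0],
    "invoice": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vector, embeddings, k=1):
    """Return the k stored names whose vectors are most similar to the query."""
    ranked = sorted(
        embeddings,
        key=lambda name: cosine_similarity(query_vector, embeddings[name]),
        reverse=True,
    )
    return ranked[:k]

# Search for "dog" among the other entries.
candidates = {name: vec for name, vec in EMBEDDINGS.items() if name != "dog"}
closest = nearest(EMBEDDINGS["dog"], candidates)  # ["puppy"]
```

With these toy numbers, "puppy" ranks well above "invoice" even though neither shares a character of wording with "dog", which is the behavior the paragraph above describes.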
The result is that a search for "dog" might also retrieve text about "puppies" or "canines," since the system is matching by meaning as well as wording. Vector search has its own problems. It can return passages that are thematically related without answering the question. And the reason a particular passage was retrieved can be harder to inspect than a keyword match. Many systems combine the two approaches, using keyword matching for precision and vector similarity for broader recall.
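One way to combine the two result lists is reciprocal rank fusion, which merges rankings by position rather than by raw score. The text above does not say which method any given system uses, so treat this as one plausible option; the function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both searches rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from the two searches, best match first.
keyword_ranking = ["a", "b"]
vector_ranking = ["b", "c", "a"]

merged = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
# merged == ["b", "a", "c"]
```

Document "b" wins because both searches rank it highly, while "c" trails because only the vector search found it.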
A library connects an AI to its policy documents, procedures, and FAQs.
The answer is tied to the library's actual policy document, so a staff member can check whether it reflects what was adopted in January.
A researcher connects an AI to their collection of PDFs: journal articles, reports, notes.
Every citation traces back to a document in the researcher's own collection. A general model response would give you nothing to verify against.
Everything above assumes a pre-indexed collection, which means upfront work: splitting documents, generating embeddings, choosing and configuring a vector database. For a small, stable collection this setup can take an afternoon. Larger or more complex collections can require dedicated engineering time, ongoing maintenance, and nontrivial API costs for embedding. That investment is sensible when the same collection will be searched repeatedly and the documents do not change often.
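The splitting step is the least mysterious part of that setup. Below is a minimal sketch that cuts text into fixed-size word windows; the overlap between neighboring chunks, the default sizes, and the name `split_into_chunks` are assumptions for illustration, and real pipelines often split on sentences or headings instead.

```python
def split_into_chunks(text, chunk_size=50, overlap=10):
    """Split text into chunks of chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence falling
    on a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("need chunk_size > overlap >= 0")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

sample = " ".join(str(n) for n in range(10))  # "0 1 2 ... 9"
chunks = split_into_chunks(sample, chunk_size=4, overlap=1)
# chunks == ["0 1 2 3", "3 4 5 6", "6 7 8 9"]
```

Each chunk would then be embedded and stored, which is where the per-document API cost mentioned above comes from.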
Systems can also use tools to find information on demand, without processing documents in advance. The grounding goal is the same. The system looks things up live instead of using a pre-built index. This works better when the sources are changing or are spread across different systems.
When Claude Code helped research this guide, it used live lookup with no vector database in the middle.
This is tool-based retrieval, sometimes called "agentic RAG." There is no indexing pipeline and no vector database. The model decides what to look up next.
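The control flow of tool-based retrieval can be sketched as a loop in which something chooses the next lookup. This is a toy under stated assumptions: `scripted_decide` is a hard-coded stand-in for the model's decision, the two tools (`search_files`, `read_file`) and the sample files are hypothetical, and this is not a description of how any particular product implements its tools.

```python
# Made-up files standing in for sources the system can reach live.
FILES = {
    "policy.txt": "Overdue fines were eliminated.",
    "hours.txt": "The reading room opens at nine.",
}

def search_files(term, files):
    """Tool: names of files whose text contains the term."""
    return [name for name, text in files.items() if term.lower() in text.lower()]

def read_file(name, files):
    """Tool: full text of one file."""
    return files[name]

def scripted_decide(question, history):
    """Stand-in for the model: pick the next action from what was seen so far."""
    if not history:
        return ("search", "fines")
    last_action, _, last_result = history[-1]
    if last_action == "search" and last_result:
        return ("read", last_result[0])
    if last_action == "read":
        return ("answer", last_result)
    return ("answer", "No supporting source found.")

def run_lookup_loop(question, files, decide, max_steps=5):
    """Call tools until the decider answers or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        action, argument = decide(question, history)
        if action == "search":
            result = search_files(argument, files)
        elif action == "read":
            result = read_file(argument, files)
        else:  # "answer": stop and return what was found
            return argument, history
        history.append((action, argument, result))
    return None, history

answer, history = run_lookup_loop("Are there fines?", FILES, scripted_decide)
```

Nothing is indexed ahead of time. The `history` list doubles as a record of which sources were consulted, which keeps the answer traceable.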
Traditional RAG works well for stable collections you want to search repeatedly: policies, manuals, archives. The upfront setup (indexing pipeline, embedding model, vector database) involves work, but once the index exists the ongoing cost per query is low. Tool-based retrieval tends to fit better when sources are changing, mixed, or distributed across systems, since it skips the indexing pipeline entirely.
RAG fits when your answer depends on a specific collection and readers need to see what the answer was based on. A library with stable policy documents and a researcher working from a personal corpus of PDFs share two conditions. The evidence is bounded, and readers need to trace answers back to inspectable sources. Without those conditions, the retrieval layer's overhead is hard to justify.
If the relevant material fits in a prompt, or if the task does not involve retrieval at all (writing, brainstorming, coding), you are adding infrastructure you do not need. For live or rapidly changing information, tool-based retrieval works better, since it can reach current sources without waiting on a re-indexing cycle.
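Whether material "fits in a prompt" can be checked with rough arithmetic before building anything. The sketch below assumes about four characters per token, a common rule of thumb for English text rather than an exact count, and the function names, the context window size, and the reserved budget are hypothetical values for illustration.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate using ceiling division (an approximation only)."""
    return -(-len(text) // chars_per_token)

def fits_in_prompt(documents, context_window_tokens, reserved_tokens=1000):
    """True if all documents fit, leaving room for the question and the answer."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= context_window_tokens - reserved_tokens

small_collection = ["a" * 4000]    # about 1,000 tokens
large_collection = ["a" * 40000]   # about 10,000 tokens

small_fits = fits_in_prompt(small_collection, context_window_tokens=8000)   # True
large_fits = fits_in_prompt(large_collection, context_window_tokens=8000)   # False
```

If the check passes comfortably, pasting the documents into the prompt is probably the simpler approach. For a real decision, count with the tokenizer of the model you plan to use.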
Even with retrieval, the model may misread retrieved material or fill gaps with unsupported claims. If the underlying documents disagree, it may blend them into something misleading. Retrieval quality matters. If the system retrieves the wrong passages, the answer is confidently built on the wrong evidence. The state of the collection (how it is organized, how current it is, how internally consistent) determines what the system can find.