RAG & Knowledge Systems¶
Retrieval-augmented generation (RAG) solves a fundamental problem with LLMs: their knowledge is frozen at training time, and their context windows — however large — cannot hold an entire enterprise knowledge base. RAG bridges that gap by fetching relevant information at inference time and grounding the model's response in retrieved facts.
This section covers everything from the basics of how RAG pipelines work, through embedding model selection, chunking strategy trade-offs, vector database operations, and graph-enhanced retrieval, to measuring whether any of it is actually working.
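The retrieve-then-ground loop described above can be sketched in a few lines. This is a toy illustration, not a production pipeline: the token-overlap scorer stands in for a real embedding model, and the final generation step is shown only as a formatted prompt (no LLM call). All names here are illustrative.

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query (toy stand-in for
    embedding-based similarity search)."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]


def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model by injecting retrieved chunks into the prompt."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"


knowledge_base = [
    "The refund window for annual plans is 30 days.",
    "Support hours are 9am to 5pm UTC on weekdays.",
    "Enterprise contracts include a dedicated account manager.",
]

query = "How long is the refund window?"
hits = retrieve(query, knowledge_base)
prompt = build_prompt(query, hits)
print(prompt)
```

In a real system the retriever would embed the query, search a vector index, and the prompt would be sent to an LLM; the shape of the loop stays the same.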
When to Use RAG¶
Not every use case needs RAG. The diagram below maps common scenarios to the right approach.
```mermaid
flowchart TD
    A["Start: What does your use case need?"]
    A --> B{"Static knowledge,\nno private data?"}
    B -- Yes --> C["Prompt Only\nSimple Q&A, general tasks"]
    B -- No --> D{"Private or\nfrequently updated data?"}
    D -- Yes --> E{"Relationships\nbetween entities matter?"}
    E -- No --> F["RAG\nDocument retrieval, knowledge bases,\ncustomer support, internal search"]
    E -- Yes --> G["GraphRAG\nOrg charts, legal networks,\ntechnical dependency graphs"]
    D -- No --> H{"Need domain-specific\nstyle or behavior?"}
    H -- Yes --> I["Fine-tuning\nSpecialized vocabulary, tone,\nor task format"]
    H -- No --> C
    F --> J{"High accuracy critical\nor both needed?"}
    J -- Yes --> K["Hybrid: RAG + Fine-tuning\nDomain model with live retrieval"]
    J -- No --> F
    style C fill:#0284c7,color:#fff
    style F fill:#0d9488,color:#fff
    style G fill:#16a34a,color:#fff
    style I fill:#d97706,color:#fff
    style K fill:#14b8a6,color:#fff
```
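The branching logic of the decision diagram can also be read as a plain function. The sketch below is a hypothetical encoding for illustration; the parameter names are invented, and real architecture decisions weigh more factors than four booleans.

```python
def choose_approach(static_public: bool,
                    dynamic_or_private: bool,
                    entity_relationships: bool,
                    needs_domain_style: bool,
                    accuracy_critical: bool = False) -> str:
    """Map the flowchart's questions to an approach label."""
    if static_public:
        # Static, public knowledge: plain prompting is enough.
        return "prompt-only"
    if dynamic_or_private:
        if entity_relationships:
            # Relationships between entities matter: graph-based retrieval.
            return "graphrag"
        # Plain document retrieval; add fine-tuning when accuracy is critical.
        return "rag+fine-tuning" if accuracy_critical else "rag"
    # No private data: fine-tune only for domain style or task format.
    return "fine-tuning" if needs_domain_style else "prompt-only"
```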
**Prerequisites**

This section assumes you are comfortable with:

- Large language model basics (tokens, context windows, prompt structure)
- Python or a similar scripting language for the code examples
- Basic familiarity with REST APIs and JSON

If you are new to LLMs, begin with Getting Started.
What's in This Section¶
- **RAG Fundamentals** (Difficulty: Intermediate): The evolution from Naive RAG through Advanced, Modular, and Agentic RAG. Covers failure modes and the techniques that address them.
- **Embeddings** (Difficulty: Intermediate): How embedding models convert text to vectors, a comparison of OpenAI, Cohere, and open-source options, and production considerations including storage cost and batching.
- **Chunking Strategies** (Difficulty: Intermediate): Eight chunking approaches from fixed-size to agentic, with trade-off tables, a decision flowchart, and chunk size guidelines by use case.
- **Vector Databases** (Difficulty: Intermediate): How vector indexes work (HNSW, IVF), a feature comparison of Pinecone, Weaviate, Qdrant, Azure AI Search, and pgvector, and guidance on hybrid search.
- **GraphRAG** (Difficulty: Advanced): Knowledge graph construction, entity and relationship extraction, community summaries, and when graph-based retrieval outperforms dense vector search.
- **RAG Evaluation** (Difficulty: Intermediate): Metrics that matter (faithfulness, answer relevance, context recall) and how to use RAGAS, TruLens, and DeepEval in a continuous evaluation pipeline.
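Several of the topics above rest on one primitive: comparing vectors by cosine similarity. A minimal sketch with hand-made 3-dimensional vectors follows; real embedding models emit hundreds or thousands of dimensions, and the numbers here are invented for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Invented toy "embeddings" for a query and two documents.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "refund policy": [0.8, 0.2, 0.1],
    "holiday schedule": [0.1, 0.9, 0.3],
}

# Nearest-neighbor search is just "highest cosine similarity wins".
best = max(doc_vecs, key=lambda d: cosine_similarity(query_vec, doc_vecs[d]))
print(best)
```

Vector databases do exactly this comparison, but over millions of vectors, using approximate indexes (HNSW, IVF) to avoid scoring every document.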
Next Steps¶
- RAG Fundamentals — start here if you are new to RAG pipelines
- Embeddings — understand the foundation of all vector-based retrieval
- Chunking Strategies — chunking decisions have an outsized impact on retrieval quality