Introducing the Graph RAG Project and GraphRetriever: Layering Connected Knowledge onto Your RAG Stack

Retrieval-augmented generation (RAG) has become one of the go-to techniques for building better AI applications with large language models (LLMs). By pulling information from external sources—such as your organization’s document store—at query time, RAG enables AI systems to discover information and answer questions that extend beyond the LLM’s training data.

But if you've worked with RAG pipelines, you know that basic retrieval has its limits. Vector search can match documents based on semantic similarity, but it often retrieves isolated chunks of information without any understanding of how those chunks connect to one another.

That’s where graph RAG comes in. This post shows how the Graph RAG Project and its GraphRetriever make it easy to connect documents and knowledge in an intuitive and lightweight way, expanding the capabilities of RAG systems without creating much additional complexity.

What is graph RAG?

The concept is simple: combine RAG techniques with graph-structured knowledge to help LLMs retrieve connected, meaningful information — not just isolated text chunks. Instead of relying purely on document similarity, graph-based retrieval enables AI systems to traverse relationships: between topics, entities, events, or ideas.

Graph RAG implementations have existed for some time, with varying levels of complexity and usability. While promising, many of these early approaches were either hard to implement (requiring heavy graph engineering) or too narrow (focused on specific datasets or knowledge types).

There wasn’t an easy, open-source way to apply graph-powered retrieval to general-purpose RAG pipelines—until now.

The Graph RAG Project and the GraphRetriever provide a simple way to make connections between documents in your vector store—all at query time, without the added complexity of building and storing a traditional knowledge graph.

For reference, the project builds on our previous iterations of graph RAG, primarily the GraphVectorStore in LangChain. We now advise developers to use the GraphRetriever instead, as it is a cleaner and more extensible implementation.

In addition, we recently published an article (with code!) showing how pairing graph RAG with the Unstructured platform makes it extra simple to transform unstructured documents into structured, graph-ready data that works seamlessly with the GraphRetriever.

GraphRetriever doesn’t need a stored graph or graph DB

A key feature of GraphRetriever is that it traverses connected documents using metadata alone. It does not require a pre-existing knowledge graph or a separate graph database.

Instead, GraphRetriever builds an in-memory, relevant subgraph at query time, based on simple rules that define how documents are related through their metadata fields. This keeps your architecture lightweight and flexible—you can start adding graph-style retrieval to your RAG system without having to pre-build a heavy, manually curated knowledge graph.

If your documents already have metadata like authors, topics, product categories, or hyperlinks, you can immediately take advantage of GraphRetriever's traversal capabilities.

Defining edges for traversal

Edges specify how content should be linked together for traversal. With the GraphRetriever, edges are defined based on the metadata fields in your documents. For example, a vector store containing movie reviews might include metadata like:

Movie name and unique ID
Reviewer name
Date of review
Movie rating

GraphRetriever lets you declare simple rules for which metadata fields create edges. For instance, two reviews written by the same reviewer, or two reviews of the same movie, can be considered "neighbors" in the graph. You can define multiple types of edges depending on your use case, allowing for flexible, domain-specific ways of connecting documents.
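
To make the idea of metadata-defined edges concrete, here is a minimal pure-Python sketch. It is not the GraphRetriever implementation: the toy documents, the `neighbors` helper, and the field names `reviewed_movie_id`, `movie_id`, and `reviewer` are all assumptions chosen for illustration. Each rule is a `(source_field, target_field)` pair, and a document links to any other document whose target field holds the same value as its own source field.

```python
from typing import Any

# Hypothetical toy documents: two reviews and the movies they discuss.
DOCS: list[dict[str, Any]] = [
    {"id": "review-1", "metadata": {"reviewed_movie_id": "m1", "reviewer": "alice"}},
    {"id": "review-2", "metadata": {"reviewed_movie_id": "m2", "reviewer": "alice"}},
    {"id": "movie-1", "metadata": {"movie_id": "m1"}},
    {"id": "movie-2", "metadata": {"movie_id": "m2"}},
]

# Illustrative edge rules as (source_field, target_field) pairs.
EDGES = [
    ("reviewed_movie_id", "movie_id"),  # a review links to the movie it reviews
    ("reviewer", "reviewer"),           # a review links to other reviews by the same reviewer
]


def neighbors(doc: dict[str, Any], docs: list[dict[str, Any]],
              edges: list[tuple[str, str]]) -> list[str]:
    """Return ids of documents reachable from `doc` in one hop."""
    found: list[str] = []
    for source_field, target_field in edges:
        value = doc["metadata"].get(source_field)
        if value is None:
            continue  # this rule does not apply to this document
        for other in docs:
            if other["id"] == doc["id"] or other["id"] in found:
                continue
            if other["metadata"].get(target_field) == value:
                found.append(other["id"])
    return found


print(neighbors(DOCS[0], DOCS, EDGES))  # neighbors of review-1
print(neighbors(DOCS[2], DOCS, EDGES))  # neighbors of movie-1
```

In this sketch the rules are directional: a review reaches its movie, but a movie has no outgoing rule that applies to it, so it has no neighbors. No graph is stored anywhere; the links are recomputed from metadata each time they are needed.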

For example, a demo notebook in the Graph RAG Project uses a dataset of movie reviews from Rotten Tomatoes. This dataset contains movie reviews as well as information about the movies that were reviewed. Our main goal in this simple demo is to be able to search movie reviews for certain types of comments, such as “What is a good family movie?”, and then immediately connect the resulting reviews to the movies they are discussing. It’s a simple use case, but very illustrative for graph RAG: first, there’s a semantic search, and second, there is a direct and deterministic connection to be made.

The code block below is from the demo notebook. In it, we configure the GraphRetriever:

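from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever

# `vectorstore` is assumed to be the vector store set up earlier in the notebook.
retriever = GraphRetriever(
    store=vectorstore,
    # Link each review to the movie whose `movie_id` matches its `reviewed_movie_id`.
    edges=[("reviewed_movie_id", "movie_id")],
    # Start from the top 10 semantic matches and follow edges one step out.
    strategy=Eager(start_k=10, adjacent_k=10, select_k=100, max_depth=1),
)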

See the Specifying Edges section of the project page for more information and examples.

Traversal and retrieval strategies

At a high level, GraphRetriever enhances traditional RAG by introducing two complementary retrieval steps:

Semantic search: First, the system uses vector embeddings to find documents that are semantically similar to the user’s query, just like standard RAG.
Graph traversal: Next, starting from the top semantic search results, GraphRetriever traverses the graph of connected documents based on the defined edges. It collects neighboring documents that are closely related through metadata-defined relationships.

The retrieved documents from both semantic search and graph traversal are combined to form a richer context window. This context is then passed to the LLM for answering the user's query.
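
The two steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how GraphRetriever is implemented: the two-dimensional "embeddings", the document ids, and the helpers `semantic_search`, `adjacent`, and `retrieve` are hypothetical, and the parameter names `start_k` and `max_depth` are borrowed from the strategy configuration only to make the correspondence easier to see.

```python
import math
from collections import deque

# Hypothetical corpus: tiny made-up embeddings plus linking metadata.
DOCS = {
    "review-1": {"embedding": [0.9, 0.1], "metadata": {"reviewed_movie_id": "m1"}},
    "review-2": {"embedding": [0.1, 0.9], "metadata": {"reviewed_movie_id": "m2"}},
    "movie-1": {"embedding": [0.2, 0.8], "metadata": {"movie_id": "m1"}},
    "movie-2": {"embedding": [0.6, 0.4], "metadata": {"movie_id": "m2"}},
}
EDGES = [("reviewed_movie_id", "movie_id")]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query: list[float], k: int) -> list[str]:
    """Step 1: rank all documents by cosine similarity to the query."""
    ranked = sorted(DOCS, key=lambda d: cosine(query, DOCS[d]["embedding"]), reverse=True)
    return ranked[:k]


def adjacent(doc_id: str) -> list[str]:
    """Documents linked to `doc_id` through the metadata edge rules."""
    out = []
    for source_field, target_field in EDGES:
        value = DOCS[doc_id]["metadata"].get(source_field)
        if value is None:
            continue
        out.extend(o for o, d in DOCS.items()
                   if o != doc_id and d["metadata"].get(target_field) == value)
    return out


def retrieve(query: list[float], start_k: int, max_depth: int) -> list[str]:
    """Step 2: breadth-first expansion from the semantic hits, up to max_depth hops."""
    start = semantic_search(query, start_k)
    depth = {doc_id: 0 for doc_id in start}  # insertion order doubles as result order
    queue = deque(start)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue
        for nxt in adjacent(current):
            if nxt not in depth:
                depth[nxt] = depth[current] + 1
                queue.append(nxt)
    return list(depth)


query = [1.0, 0.0]
print(semantic_search(query, k=2))            # similarity only
print(retrieve(query, start_k=1, max_depth=1))  # similarity, then one hop
```

With these made-up vectors, similarity alone returns `review-1` and then `movie-2`, which is the wrong movie. Starting from the single best hit and following one edge returns `review-1` together with `movie-1`, the movie that review actually discusses, even though its embedding is not close to the query.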

By blending semantic similarity and graph-connected reasoning, GraphRetriever helps your AI system:

Surface deeper, more relevant knowledge
Reduce hallucination risk
Provide more explainable and well-grounded responses

All of this happens without needing to rebuild your data infrastructure—just by making better use of your existing vector store and metadata.

Graph traversal itself is also highly configurable, with various traversal strategies and parameters. Learn more about the available graph traversal strategies and configurations in the documentation.

Getting started with GraphRetriever

The GraphRetriever and the Graph RAG Project make it easier than ever to bring connected knowledge into your AI systems. For stronger grounding of responses, better reasoning over complex information, or more trustworthy linking and citations, it’s worth exploring how lightweight graph traversal can level up your RAG pipelines.

You can learn more, see examples, and start building at the Graph RAG Project site.