Featured Example
Vercel AI SDK organizes 590 sections to guide AI through their entire platform
Build a powerful RAG Agent with Vercel AI SDK and Next.js. Unlock seamless chatbot integration and enhance your projects with advanced retrieval features!
- 35,264 lines (+2,396% vs avg)
- 590 sections (+2,358% vs avg)
- 742+ companies using llms.txt
- 1 file (llms.txt)
Key Insights
- Comprehensive structure: with 590 distinct sections, this file provides thorough coverage for AI systems.
- Comprehensive detail: 35,264 lines of documentation.
llms.txt Preview
First 100 lines of 35,264 total
---
title: RAG Agent
description: Learn how to build a RAG Agent with the AI SDK and Next.js
tags:
  [
    'rag',
    'chatbot',
    'next',
    'embeddings',
    'database',
    'retrieval',
    'memory',
    'agent',
  ]
---
# RAG Agent Guide
In this guide, you will learn how to build a retrieval-augmented generation (RAG) agent.
<video
  src="/images/rag-guide-demo.mp4"
  autoplay
  height={540}
  width={910}
  controls
  playsinline
/>
Before we dive in, let's look at what RAG is, and why we would want to use it.
### What is RAG?
RAG stands for retrieval augmented generation. In simple terms, RAG is the process of providing a Large Language Model (LLM) with specific information relevant to the prompt.
### Why is RAG important?
While LLMs are powerful, the information they can reason about is restricted to the data they were trained on. This becomes apparent when you ask an LLM for information outside of its training data, like proprietary data or common knowledge that emerged after the model's training cutoff. RAG solves this problem by fetching information relevant to the prompt and then passing it to the model as context.
To illustrate with a basic example, imagine asking the model for your favorite food:
```txt
**input**
What is my favorite food?
**generation**
I don't have access to personal information about individuals, including their
favorite foods.
```
Not surprisingly, the model doesn’t know. But imagine, alongside your prompt, the model received some extra context:
```txt
**input**
Respond to the user's prompt using only the provided context.
user prompt: 'What is my favorite food?'
context: user loves chicken nuggets
**generation**
Your favorite food is chicken nuggets!
```
Just like that, you have augmented the model's generation by providing relevant information alongside the query. Assuming the model has the appropriate information, it is now highly likely to return an accurate response to the user's query. But how does it retrieve the relevant information? The answer lies in a concept called embedding.
<Note>
  You could fetch any context for your RAG application (e.g. a Google search).
  Embeddings and Vector Databases are just a specific retrieval approach to
  achieve semantic search.
</Note>
### Embedding
[Embeddings](/docs/ai-sdk-core/embeddings) are a way to represent words, phrases, or images as vectors in a high-dimensional space. In this space, similar words are close to each other, and the distance between words can be used to measure their similarity.
In practice, this means that if you embedded the words `cat` and `dog`, you would expect them to be plotted close to each other in vector space. The process of calculating the similarity between two vectors is called ‘cosine similarity’ where a value of 1 would indicate high similarity and a value of -1 would indicate high opposition.
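To make this concrete, here is a minimal sketch using the AI SDK's `embedMany` and `cosineSimilarity` helpers (the OpenAI embedding model here is an assumption, not a requirement of the guide):

```ts
import { openai } from '@ai-sdk/openai';
import { cosineSimilarity, embedMany } from 'ai';

// Embed two words and compare them; semantically close words score near 1.
const { embeddings } = await embedMany({
  model: openai.embedding('text-embedding-3-small'),
  values: ['cat', 'dog'],
});

console.log(cosineSimilarity(embeddings[0], embeddings[1]));
```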
<Note>
  Don’t worry if this seems complicated. A high-level understanding is all you
  need to get started! For a more in-depth introduction to embeddings, check out
  [this guide](https://jalammar.github.io/illustrated-word2vec/).
</Note>
As mentioned above, embeddings are a way to represent the semantic meaning of **words and phrases**. The implication here is that the larger the input to your embedding, the lower the quality of the embedding will be. So how would you approach embedding content longer than a simple phrase?
### Chunking
Chunking refers to the process of breaking down a particular source material into smaller pieces. There are many different approaches to chunking and it’s worth experimenting as the most effective approach can differ by use case. A simple and common approach to chunking (and what you will be using in this guide) is separating written content by sentences.
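As a rough sketch, sentence-level chunking can be as simple as splitting on periods (the guide's own helper may differ):

```ts
// Naive sentence chunking: split on periods and drop empty fragments.
const generateChunks = (input: string): string[] =>
  input
    .trim()
    .split('.')
    .filter((chunk) => chunk.trim() !== '');
```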
Once your source material is appropriately chunked, you can embed each chunk and then store the embedding and the chunk together in a database. Embeddings can be stored in any database that supports vectors. For this tutorial, you will be using [Postgres](https://www.postgresql.org/) alongside the [pgvector](https://github.com/pgvector/pgvector) extension.
<MDXImage
  srcLight="/images/rag-guide-1.png"
  srcDark="/images/rag-guide-1-dark.png"
  width={800}
  height={800}
/>
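One way to sketch the storage step, assuming a hypothetical `embeddings` table with a pgvector column (the table name, column names, and vector dimensions are illustrative assumptions, not the guide's schema):

```ts
import { openai } from '@ai-sdk/openai';
import { embedMany } from 'ai';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Assumed one-time setup:
//   CREATE EXTENSION IF NOT EXISTS vector;
//   CREATE TABLE embeddings (id serial PRIMARY KEY, content text, embedding vector(1536));
async function storeChunks(chunks: string[]) {
  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks,
  });

  for (let i = 0; i < chunks.length; i++) {
    // pgvector accepts the '[0.1, 0.2, ...]' text format, which JSON.stringify produces.
    await pool.query(
      'INSERT INTO embeddings (content, embedding) VALUES ($1, $2::vector)',
      [chunks[i], JSON.stringify(embeddings[i])],
    );
  }
}
```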
### All Together Now
Combining all of this together, RAG is the process of enabling the model to respond with information outside of its training data by embedding the user's query, retrieving the relevant source material (chunks) with the highest semantic similarity, and then passing them alongside the initial query as context. Going back to the example where you ask the model for your favorite food, the prompt preparation process would look like this.
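As an illustrative approximation of that end-to-end flow (not the guide's own implementation; it reuses the hypothetical `embeddings` table and the models assumed above):

```ts
import { openai } from '@ai-sdk/openai';
import { embed, generateText } from 'ai';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function answerWithContext(question: string) {
  // 1. Embed the user's query.
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: question,
  });

  // 2. Retrieve the chunks with the highest semantic similarity
  //    (`<=>` is pgvector's cosine distance operator).
  const { rows } = await pool.query(
    'SELECT content FROM embeddings ORDER BY embedding <=> $1::vector LIMIT 4',
    [JSON.stringify(embedding)],
  );

  // 3. Pass the retrieved chunks alongside the original prompt as context.
  const { text } = await generateText({
    model: openai('gpt-4o'),
    system:
      "Respond to the user's prompt using only the provided context.\n" +
      `context: ${rows.map((row) => row.content).join('\n')}`,
    prompt: question,
  });

  return text;
}
```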