<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>devkuma – RAG</title>
    <link>https://www.devkuma.com/en/tags/rag/</link>
    <image>
      <url>https://www.devkuma.com/en/tags/rag/logo/180x180.jpg</url>
      <title>RAG</title>
      <link>https://www.devkuma.com/en/tags/rag/</link>
    </image>
    <description>Recent content in RAG on devkuma</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>kc@example.com (kc kim)</managingEditor>
    <webMaster>kc@example.com (kc kim)</webMaster>
    <copyright>The devkuma</copyright>
    
	  <atom:link href="https://www.devkuma.com/en/tags/rag/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>RAG (Retrieval-Augmented Generation)</title>
      <link>https://www.devkuma.com/en/docs/ai/rag/</link>
      <pubDate>Sat, 30 Aug 2025 13:09:00 +0900</pubDate>
      <author>kc@example.com (kc kim)</author>
      <guid>https://www.devkuma.com/en/docs/ai/rag/</guid>
      <description>
        
        
        &lt;h2 id=&#34;rag-retrieval-augmented-generation-concept&#34;&gt;RAG (Retrieval-Augmented Generation) Concept&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RAG = Retrieval + Generation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;An LLM (large language model) does not generate answers from its internal knowledge alone. Instead, it first retrieves relevant information from external sources such as documents, vector DBs, wikis, and company materials, then generates an answer grounded in those results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, it is not simply using &amp;ldquo;what the model knows,&amp;rdquo; but is like a smart assistant that &amp;ldquo;looks things up externally when needed and then answers.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;why-is-it-needed&#34;&gt;Why Is It Needed?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Overcoming the knowledge limits of LLMs
&lt;ul&gt;
&lt;li&gt;An LLM only knows what was in its training data; for example, a GPT model cannot answer questions about events after its training cutoff.&lt;/li&gt;
&lt;li&gt;With RAG, materials retrieved from a DB or the web can be used.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reducing hallucinations
&lt;ul&gt;
&lt;li&gt;LLMs sometimes make up things they do not know.&lt;/li&gt;
&lt;li&gt;Using external evidence can increase the reliability of answers.&lt;/li&gt;
&lt;li&gt;Instead of unsupported answers, responses can be based on actual documents or databases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Using customized knowledge
&lt;ul&gt;
&lt;li&gt;LLMs can use &lt;strong&gt;dedicated data&lt;/strong&gt; such as internal company documents, reports, customer FAQs, papers, and codebases.&lt;/li&gt;
&lt;li&gt;Internal confidential documents can be used without training the model on them.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-rag-works&#34;&gt;How RAG Works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Query input
&lt;ul&gt;
&lt;li&gt;The user enters a question.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Retrieval stage
&lt;ul&gt;
&lt;li&gt;The question is converted into an embedding vector, and the most similar documents are retrieved from a vector database by similarity search.&lt;/li&gt;
&lt;li&gt;Representative DBs: Pinecone, Weaviate, Milvus, FAISS, and others.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generation stage
&lt;ul&gt;
&lt;li&gt;The LLM generates an answer grounded in the retrieved documents and can return the supporting sources alongside it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://www.devkuma.com/docs/ai/rag.png&#34; alt=&#34;RAG&#34;&gt;&lt;/p&gt;
&lt;p&gt;In short, it has a &lt;strong&gt;&amp;ldquo;find -&amp;gt; refer -&amp;gt; answer&amp;rdquo;&lt;/strong&gt; structure.&lt;/p&gt;
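&lt;p&gt;Below is a minimal sketch of these three stages, assuming the &lt;code&gt;sentence-transformers&lt;/code&gt; and &lt;code&gt;faiss&lt;/code&gt; packages; the document texts are toy data, and the final LLM call is a hypothetical placeholder for whatever client you use:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    &#39;2023 revenue was 920 billion KRW, up 8 percent year over year.&#39;,
    &#39;The company was founded in 2001 and is headquartered in Seoul.&#39;,
]

embedder = SentenceTransformer(&#39;all-MiniLM-L6-v2&#39;)

# Index once: embed each document and store the vectors in FAISS.
doc_vectors = embedder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])  # inner product index
index.add(np.asarray(doc_vectors, dtype=&#39;float32&#39;))

def retrieve(question, k=1):
    # Embed the question the same way and return the k nearest documents.
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype=&#39;float32&#39;), k)
    return [documents[i] for i in ids[0]]

question = &#39;What was our revenue in 2023?&#39;
context = &#39;\n&#39;.join(retrieve(question))

# Generation: ground the LLM by placing the retrieved context in the prompt.
prompt = f&#39;Answer using only this context:\n{context}\n\nQuestion: {question}&#39;
# answer = llm.generate(prompt)  # hypothetical LLM client call
print(prompt)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Normalizing the embeddings makes the inner-product index behave as cosine similarity, a common default for text retrieval.&lt;/p&gt;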
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;Suppose a question comes in: &amp;ldquo;What was our company&amp;rsquo;s revenue in 2023?&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM alone: &amp;ldquo;Revenue in 2023 was 100 million dollars.&amp;rdquo; (No evidence, may be wrong)&lt;/li&gt;
&lt;li&gt;Using RAG: Search internal financial reports -&amp;gt; retrieve related data -&amp;gt; &amp;ldquo;Our company&amp;rsquo;s revenue in 2023 was 920 billion KRW, an 8% increase from the previous year.&amp;rdquo; (Evidence-based answer)&lt;/li&gt;
&lt;/ul&gt;
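&lt;p&gt;The difference comes down to what the model is shown. Here is a hedged sketch of the grounded prompt a RAG system might assemble for this question; the passage is a hypothetical retrieval result from the financial reports:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Hypothetical passage retrieved from the internal financial report.
passage = &#39;FY2023 revenue: 920 billion KRW (up 8 percent from FY2022).&#39;

question = &#39;What was our company revenue in 2023?&#39;

# The instruction pins the model to the passage instead of its
# parametric memory, which is what suppresses the made-up figure.
prompt = (
    &#39;Answer using only the context below. &#39;
    &#39;If the context does not contain the answer, say so.\n&#39;
    f&#39;Context: {passage}\n&#39;
    f&#39;Question: {question}&#39;
)
print(prompt)
&lt;/code&gt;&lt;/pre&gt;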
&lt;h2 id=&#34;understanding-through-an-analogy&#34;&gt;Understanding Through an Analogy&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;LLM alone&lt;/strong&gt;: A person with a good memory who may nonetheless be out of date on the latest information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using RAG&lt;/strong&gt;: A person with a good memory answers while referring to a &lt;strong&gt;dictionary or search engine&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;comparing-rag-and-fine-tuning&#34;&gt;Comparing RAG and Fine-Tuning&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tuning: Further trains the model itself, &amp;ldquo;internalizing&amp;rdquo; new knowledge&lt;/li&gt;
&lt;li&gt;RAG: Leaves the model as is and retrieves external materials for use&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Method&lt;/th&gt;
          &lt;th&gt;Advantages&lt;/th&gt;
          &lt;th&gt;Disadvantages&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Fine-tuning&lt;/td&gt;
          &lt;td&gt;Fast and natural responses&lt;/td&gt;
          &lt;td&gt;Retraining is required whenever data is updated&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;RAG&lt;/td&gt;
          &lt;td&gt;Can always reflect up-to-date and customized information; quick to build&lt;/td&gt;
          &lt;td&gt;Answer quality depends on retrieval quality&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In practice, RAG is often combined with some fine-tuning when needed.&lt;/p&gt;
&lt;h2 id=&#34;technology-stack-used-to-implement-rag&#34;&gt;Technology Stack Used to Implement RAG&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Embedding models: OpenAI Embeddings, Sentence-BERT, and others&lt;/li&gt;
&lt;li&gt;Vector DBs: Pinecone, Weaviate, Milvus, FAISS&lt;/li&gt;
&lt;li&gt;LLMs: GPT, Claude, LLaMA, Gemini, and others&lt;/li&gt;
&lt;li&gt;Frameworks: LangChain, LlamaIndex, Haystack&lt;/li&gt;
&lt;/ul&gt;
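&lt;p&gt;As one illustration of the embedding layer, here is a sketch assuming the OpenAI v1 Python SDK and an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in the environment; any embedding model from the list above could stand in:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    # One API call returns one embedding vector per input text.
    resp = client.embeddings.create(model=&#39;text-embedding-3-small&#39;, input=texts)
    return np.array([d.embedding for d in resp.data])

q_vec, d_vec = embed([
    &#39;What was revenue in 2023?&#39;,
    &#39;2023 revenue was 920 billion KRW.&#39;,
])

# Cosine similarity between question and document vectors drives retrieval.
cos = q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec))
print(float(cos))
&lt;/code&gt;&lt;/pre&gt;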
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;RAG is a method that pairs an LLM with a retrieval system, so that generated answers are reliable and reflect up-to-date information.&lt;/li&gt;
&lt;li&gt;In other words, it is a core technology for expanding knowledge and strengthening reliability.&lt;/li&gt;
&lt;/ul&gt;

      </description>
      
      <category>AI</category>
      
      <category>RAG</category>
      
    </item>
    
  </channel>
</rss>
