<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>devkuma – RAG</title>
    <link>https://www.devkuma.com/en/tags/rag/</link>
    <image>
      <url>https://www.devkuma.com/en/tags/rag/logo/180x180.jpg</url>
      <title>RAG</title>
      <link>https://www.devkuma.com/en/tags/rag/</link>
    </image>
    <description>Recent content in RAG on devkuma</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>kc@example.com (kc kim)</managingEditor>
    <webMaster>kc@example.com (kc kim)</webMaster>
    <copyright>The devkuma</copyright>
    
	  <atom:link href="https://www.devkuma.com/en/tags/rag/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>RAG (Retrieval-Augmented Generation)</title>
      <link>https://www.devkuma.com/en/docs/ai/rag/</link>
      <pubDate>Sat, 30 Aug 2025 13:09:00 +0900</pubDate>
      <author>kc@example.com (kc kim)</author>
      <guid>https://www.devkuma.com/en/docs/ai/rag/</guid>
      <description>
        
        
        &lt;h2 id=&#34;rag-retrieval-augmented-generation-concept&#34;&gt;RAG (Retrieval-Augmented Generation) Concept&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RAG = Retrieval + Generation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;An LLM (large language model) does not generate answers from its internal knowledge alone. Instead, it first retrieves relevant information from external sources such as documents, vector DBs, wikis, and company materials, then generates an answer grounded in those results.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, it is not simply using &amp;ldquo;what the model knows,&amp;rdquo; but is like a smart assistant that &amp;ldquo;looks things up externally when needed and then answers.&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;why-is-it-needed&#34;&gt;Why Is It Needed?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Overcoming the knowledge limits of LLMs
&lt;ul&gt;
&lt;li&gt;An LLM only knows what was in its training data; for example, a GPT model cannot answer questions about events after its training cutoff.&lt;/li&gt;
&lt;li&gt;With RAG, materials retrieved from a DB or the web can be used.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Reducing hallucinations
&lt;ul&gt;
&lt;li&gt;LLMs sometimes make up things they do not know.&lt;/li&gt;
&lt;li&gt;Using external evidence can increase the reliability of answers.&lt;/li&gt;
&lt;li&gt;Instead of unsupported answers, responses can be based on actual documents or databases.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Using customized knowledge
&lt;ul&gt;
&lt;li&gt;LLMs can use &lt;strong&gt;dedicated data&lt;/strong&gt; such as internal company documents, reports, customer FAQs, papers, and codebases.&lt;/li&gt;
&lt;li&gt;Internal confidential documents can be used without training the model on them.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;how-rag-works&#34;&gt;How RAG Works&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Query input
&lt;ul&gt;
&lt;li&gt;The user enters a question.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Retrieval stage
&lt;ul&gt;
&lt;li&gt;The question is converted into an embedding vector, and the most similar documents are retrieved from a vector database by similarity search.&lt;/li&gt;
&lt;li&gt;Representative DBs: Pinecone, Weaviate, Milvus, FAISS, and others.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Generation stage
&lt;ul&gt;
&lt;li&gt;The LLM generates an answer grounded in the retrieved documents and can return the supporting sources alongside it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&#34;https://www.devkuma.com/docs/ai/rag.png&#34; alt=&#34;RAG&#34;&gt;&lt;/p&gt;
&lt;p&gt;In short, it has a &lt;strong&gt;&amp;ldquo;find -&amp;gt; refer -&amp;gt; answer&amp;rdquo;&lt;/strong&gt; structure.&lt;/p&gt;
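&lt;p&gt;Below is a minimal sketch of these three stages, assuming the &lt;code&gt;sentence-transformers&lt;/code&gt; and &lt;code&gt;faiss&lt;/code&gt; packages; the document texts are toy data, and the final LLM call is a hypothetical placeholder for whatever client you use:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
import faiss
from sentence_transformers import SentenceTransformer

documents = [
    &#39;2023 revenue was 920 billion KRW, up 8 percent year over year.&#39;,
    &#39;The company was founded in 2001 and is headquartered in Seoul.&#39;,
]

embedder = SentenceTransformer(&#39;all-MiniLM-L6-v2&#39;)

# Index once: embed each document and store the vectors in FAISS.
doc_vectors = embedder.encode(documents, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])  # inner product index
index.add(np.asarray(doc_vectors, dtype=&#39;float32&#39;))

def retrieve(question, k=1):
    # Embed the question the same way and return the k nearest documents.
    q = embedder.encode([question], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype=&#39;float32&#39;), k)
    return [documents[i] for i in ids[0]]

question = &#39;What was our revenue in 2023?&#39;
context = &#39;\n&#39;.join(retrieve(question))

# Generation: ground the LLM by placing the retrieved context in the prompt.
prompt = f&#39;Answer using only this context:\n{context}\n\nQuestion: {question}&#39;
# answer = llm.generate(prompt)  # hypothetical LLM client call
print(prompt)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Normalizing the embeddings makes the inner-product index behave as cosine similarity, a common default for text retrieval.&lt;/p&gt;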
&lt;h2 id=&#34;example&#34;&gt;Example&lt;/h2&gt;
&lt;p&gt;Suppose a question comes in: &amp;ldquo;What was our company&amp;rsquo;s revenue in 2023?&amp;rdquo;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;LLM alone: &amp;ldquo;Revenue in 2023 was 100 million dollars.&amp;rdquo; (No evidence, may be wrong)&lt;/li&gt;
&lt;li&gt;Using RAG: Search internal financial reports -&amp;gt; retrieve related data -&amp;gt; &amp;ldquo;Our company&amp;rsquo;s revenue in 2023 was 920 billion KRW, an 8% increase from the previous year.&amp;rdquo; (Evidence-based answer)&lt;/li&gt;
&lt;/ul&gt;
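&lt;p&gt;The difference comes down to what the model is shown. Here is a hedged sketch of the grounded prompt a RAG system might assemble for this question; the passage is a hypothetical retrieval result from the financial reports:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Hypothetical passage retrieved from the internal financial report.
passage = &#39;FY2023 revenue: 920 billion KRW (up 8 percent from FY2022).&#39;

question = &#39;What was our company revenue in 2023?&#39;

# The instruction pins the model to the passage instead of its
# parametric memory, which is what suppresses the made-up figure.
prompt = (
    &#39;Answer using only the context below. &#39;
    &#39;If the context does not contain the answer, say so.\n&#39;
    f&#39;Context: {passage}\n&#39;
    f&#39;Question: {question}&#39;
)
print(prompt)
&lt;/code&gt;&lt;/pre&gt;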
&lt;h2 id=&#34;understanding-through-an-analogy&#34;&gt;Understanding Through an Analogy&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;LLM alone&lt;/strong&gt;: A person with a good memory who may nonetheless be out of date on the latest information.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Using RAG&lt;/strong&gt;: A person with a good memory answers while referring to a &lt;strong&gt;dictionary or search engine&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;comparing-rag-and-fine-tuning&#34;&gt;Comparing RAG and Fine-Tuning&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Fine-tuning: Further trains the model itself, &amp;ldquo;internalizing&amp;rdquo; new knowledge&lt;/li&gt;
&lt;li&gt;RAG: Leaves the model as is and retrieves external materials for use&lt;/li&gt;
&lt;/ul&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th&gt;Method&lt;/th&gt;
          &lt;th&gt;Advantages&lt;/th&gt;
          &lt;th&gt;Disadvantages&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td&gt;Fine-tuning&lt;/td&gt;
          &lt;td&gt;Fast and natural responses&lt;/td&gt;
          &lt;td&gt;Retraining is required whenever data is updated&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td&gt;RAG&lt;/td&gt;
          &lt;td&gt;Can always reflect up-to-date and customized information; quick to build&lt;/td&gt;
          &lt;td&gt;Answer quality depends on retrieval quality&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;In practice, RAG is often combined with some fine-tuning when needed.&lt;/p&gt;
&lt;h2 id=&#34;technology-stack-used-to-implement-rag&#34;&gt;Technology Stack Used to Implement RAG&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;Embedding models: OpenAI Embeddings, Sentence-BERT, and others&lt;/li&gt;
&lt;li&gt;Vector DBs: Pinecone, Weaviate, Milvus, FAISS&lt;/li&gt;
&lt;li&gt;LLMs: GPT, Claude, LLaMA, Gemini, and others&lt;/li&gt;
&lt;li&gt;Frameworks: LangChain, LlamaIndex, Haystack&lt;/li&gt;
&lt;/ul&gt;
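&lt;p&gt;As one illustration of the embedding layer, here is a sketch assuming the OpenAI v1 Python SDK and an &lt;code&gt;OPENAI_API_KEY&lt;/code&gt; in the environment; any embedding model from the list above could stand in:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    # One API call returns one embedding vector per input text.
    resp = client.embeddings.create(model=&#39;text-embedding-3-small&#39;, input=texts)
    return np.array([d.embedding for d in resp.data])

q_vec, d_vec = embed([
    &#39;What was revenue in 2023?&#39;,
    &#39;2023 revenue was 920 billion KRW.&#39;,
])

# Cosine similarity between question and document vectors drives retrieval.
cos = q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec))
print(float(cos))
&lt;/code&gt;&lt;/pre&gt;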
&lt;h2 id=&#34;summary&#34;&gt;Summary&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;RAG is a method that pairs an LLM with a retrieval system, so that generated answers are reliable and reflect up-to-date information.&lt;/li&gt;
&lt;li&gt;In other words, it is a core technology for expanding knowledge and strengthening reliability.&lt;/li&gt;
&lt;/ul&gt;

      </description>
      
      <category>AI</category>
      
      <category>RAG</category>
      
    </item>
    
  </channel>
</rss>
