<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>devkuma – Robots</title>
    <link>https://www.devkuma.com/en/tags/robots/</link>
    <image>
      <url>https://www.devkuma.com/en/tags/robots/logo/180x180.jpg</url>
      <title>Robots</title>
      <link>https://www.devkuma.com/en/tags/robots/</link>
    </image>
    <description>Recent content in Robots on devkuma</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en</language>
    <managingEditor>kc@example.com (kc kim)</managingEditor>
    <webMaster>kc@example.com (kc kim)</webMaster>
    <copyright>The devkuma</copyright>
    
	  <atom:link href="https://www.devkuma.com/en/tags/robots/index.xml" rel="self" type="application/rss+xml" />
    
    
      
        
      
    
    
    <item>
      <title>robots.txt</title>
      <link>https://www.devkuma.com/en/docs/robots/</link>
      <pubDate>Sat, 17 Apr 2021 08:32:00 +0900</pubDate>
      <author>kc@example.com (kc kim)</author>
      <guid>https://www.devkuma.com/en/docs/robots/</guid>
      <description>
        
        
        &lt;h2 id=&#34;how-search-engines-originally-work&#34;&gt;How Search Engines Work&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The Robots Exclusion Protocol, which &lt;code&gt;robots.txt&lt;/code&gt; implements, governs the crawling step below.&lt;/li&gt;
&lt;li&gt;A robot called a crawler travels around the Internet and collects site information.&lt;/li&gt;
&lt;li&gt;The indexer analyzes the information collected by the crawler.&lt;/li&gt;
&lt;li&gt;Based on the analyzed data, each search engine returns search results according to its algorithm.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;what-is-robotstxt&#34;&gt;What Is robots.txt?&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;robots.txt&lt;/code&gt; is a plain text file that tells crawlers which pages they may or may not crawl.&lt;/li&gt;
&lt;li&gt;It must be served from the root of the host, for example &lt;code&gt;https://www.example.com/robots.txt&lt;/code&gt;; crawlers do not look for it in subdirectories.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;robots.txt&lt;/code&gt; is advisory: well-behaved crawlers honor it, but nothing forces a crawler to comply.&lt;/li&gt;
&lt;/ul&gt;
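&lt;p&gt;As a minimal sketch of the location rule above, the Python snippet below derives the &lt;code&gt;robots.txt&lt;/code&gt; URL for any page on a host. The helper name &lt;code&gt;robots_txt_url&lt;/code&gt; is an assumption made for illustration, not part of any library.&lt;/p&gt;

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://www.devkuma.com/en/docs/robots/"))
# prints https://www.devkuma.com/robots.txt
```

&lt;p&gt;Whatever the depth of the page, the result always points at the root of the host, which is the only place a crawler looks.&lt;/p&gt;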
&lt;h2 id=&#34;robotstxt-format&#34;&gt;robots.txt Format&lt;/h2&gt;
&lt;ul&gt;
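&lt;li&gt;User-agent: name of the crawler the following rules apply to; &lt;code&gt;*&lt;/code&gt; matches every crawler&lt;/li&gt;
&lt;li&gt;Allow: paths the crawler may access; standardized in RFC 9309 and honored by major crawlers, not only Googlebot&lt;/li&gt;
&lt;li&gt;Disallow: paths the crawler must not access&lt;/li&gt;
&lt;li&gt;Crawl-delay: delay before the next visit, in seconds; a nonstandard directive that some crawlers, including Googlebot, ignore&lt;/li&gt;
&lt;li&gt;Sitemap: absolute URL of the sitemap&lt;/li&gt;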
&lt;/ul&gt;
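&lt;p&gt;The directives above can be checked with Python&#39;s standard &lt;code&gt;urllib.robotparser&lt;/code&gt; module. The following is a sketch under stated assumptions: the sample rules, the crawler name &lt;code&gt;OtherBot&lt;/code&gt;, and the &lt;code&gt;example.com&lt;/code&gt; sitemap URL are invented for illustration, and &lt;code&gt;site_maps()&lt;/code&gt; requires Python 3.8 or later.&lt;/p&gt;

```python
from urllib.robotparser import RobotFileParser

# Sample rules invented for this sketch; they are not taken from a real site.
SAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /my_page/
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts an iterable of lines, so no network access is needed.
parser.parse(SAMPLE.splitlines())

googlebot_ok = parser.can_fetch("Googlebot", "/my_page/index.html")
otherbot_blocked = parser.can_fetch("OtherBot", "/my_page/index.html")
otherbot_docs = parser.can_fetch("OtherBot", "/docs/")
delay = parser.crawl_delay("OtherBot")
sitemaps = parser.site_maps()

print(googlebot_ok, otherbot_blocked, otherbot_docs)  # True False True
print(delay)     # 10
print(sitemaps)  # ['https://example.com/sitemap.xml']
```

&lt;p&gt;Note that this parser applies the first matching rule in file order, whereas Google documents a most-specific-match rule, so the two can disagree when Allow and Disallow paths overlap. The sample avoids overlapping rules for that reason.&lt;/p&gt;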
&lt;h2 id=&#34;robotstxt-examples&#34;&gt;robots.txt Examples&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Allow all search bots to access all documents&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User-agent: *
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Allow: /
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;code&gt;*&lt;/code&gt; means all robots, and &lt;code&gt;/&lt;/code&gt; means all directories.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Block all search bots from all documents&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User-agent: *
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Disallow: /
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;&lt;strong&gt;Allow access to a specific directory&lt;/strong&gt;&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;User-agent: Googlebot
Allow: /foo/bar/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Block access to a specific directory&lt;/strong&gt;&lt;/p&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;User-agent: Googlebot
Disallow: /foo/bar/
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Allow only Googlebot and block all others&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;background-color:#f8f8f8;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;&#34;&gt;&lt;code class=&#34;language-text&#34; data-lang=&#34;text&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User-agent: Googlebot
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Allow: /
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;User-agent: *
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;Disallow: /
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;expose-only-part-of-a-homepage-directory-to-search-engines&#34;&gt;Expose Only Part of a Site&#39;s Directories to Search Engines&lt;/h3&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;User-agent: *
Disallow: /connection/
Disallow: /my_connection/
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id=&#34;block-part-of-a-homepage-directory-from-search-engines&#34;&gt;Block Part of a Site&#39;s Directories from Search Engines&lt;/h3&gt;
&lt;pre tabindex=&#34;0&#34;&gt;&lt;code&gt;User-agent: *
Disallow: /my_page/
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id=&#34;site-load-and-performance-perspective&#34;&gt;Site Load and Performance Perspective&lt;/h2&gt;
&lt;p&gt;If crawler traffic puts noticeable load on the site, use &lt;code&gt;robots.txt&lt;/code&gt; to exclude large volumes of unimportant content from crawling. This reduces site load and lets crawlers spend their visits on the content that matters.&lt;/p&gt;
&lt;p&gt;Separating important content from unimportant content therefore helps both SEO and site load.&lt;/p&gt;
&lt;p&gt;Unimportant content may include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pages that do not need to be indexed by search engines&lt;/li&gt;
&lt;li&gt;Low-value content pages&lt;/li&gt;
&lt;li&gt;Multiple pages with identical content&lt;/li&gt;
&lt;li&gt;Landing pages for ads placed on the site&lt;/li&gt;
&lt;li&gt;Pages intended only for a limited audience&lt;/li&gt;
&lt;li&gt;Management system (administration) files&lt;/li&gt;
&lt;/ul&gt;
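&lt;p&gt;Once the unimportant paths are known, the file itself is simple to generate. The sketch below assumes a hypothetical helper, &lt;code&gt;build_robots_txt&lt;/code&gt;, and hypothetical low-value paths (&lt;code&gt;/search/&lt;/code&gt;, &lt;code&gt;/print/&lt;/code&gt;); none of these names come from a real site or library.&lt;/p&gt;

```python
def build_robots_txt(blocked_paths, crawl_delay=None, sitemap_url=None):
    """Render one rule group that applies to all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    if crawl_delay is not None:
        # Nonstandard directive; some crawlers ignore it.
        lines.append(f"Crawl-delay: {crawl_delay}")
    if sitemap_url:
        # A blank line separates the rule group from the sitemap line.
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


# Hypothetical low-value paths, chosen only for illustration.
print(build_robots_txt(["/search/", "/print/"], crawl_delay=5))
```

&lt;p&gt;Generating the file from one list keeps the blocked paths in a single place, which makes it easier to review them before publishing.&lt;/p&gt;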
&lt;h2 id=&#34;security-perspective&#34;&gt;Security Perspective&lt;/h2&gt;
&lt;p&gt;Because &lt;code&gt;robots.txt&lt;/code&gt; is itself a public file, listing pages to crawl or not to crawl in it can reveal content intended only for a limited audience.&lt;/p&gt;
&lt;p&gt;Listing management system files or limited-disclosure pages in &lt;code&gt;robots.txt&lt;/code&gt; may keep them out of search engine results, but anyone who opens &lt;code&gt;robots.txt&lt;/code&gt; can still read their paths.&lt;/p&gt;
&lt;p&gt;In short, &lt;code&gt;robots.txt&lt;/code&gt; lowers the chance that a page appears in search results, but naming security-sensitive paths in it creates a security risk of its own.&lt;/p&gt;
&lt;p&gt;Therefore, security-sensitive management files and pages meant for a limited audience must be protected by reliable access controls such as login authentication or IP address restrictions.&lt;/p&gt;
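&lt;p&gt;To illustrate the exposure described above, the sketch below extracts every path named in a &lt;code&gt;robots.txt&lt;/code&gt; file, which is all an outsider needs to do to find them. The function name &lt;code&gt;listed_paths&lt;/code&gt; and the sample paths &lt;code&gt;/admin/&lt;/code&gt; and &lt;code&gt;/members-only/&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
def listed_paths(robots_txt):
    """Collect every path named in Allow or Disallow lines."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() in ("allow", "disallow") and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical file content, invented for this sketch.
SAMPLE = """\
User-agent: *
Disallow: /admin/   # hypothetical management path
Disallow: /members-only/
"""

print(listed_paths(SAMPLE))
# prints ['/admin/', '/members-only/']
```

&lt;p&gt;The output is effectively a list of places worth probing, which is why such paths need real access control rather than a Disallow line.&lt;/p&gt;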
&lt;h2 id=&#34;references&#34;&gt;References&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;https://developers.google.com/search/docs/advanced/robots/intro?hl=ko&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Google Search Central | Introduction to Robots.txt&lt;i class=&#34;fas fa-external-link-alt&#34;&gt;&lt;/i&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;https://namu.wiki/w/robots.txt&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;Namu Wiki: robots.txt&lt;i class=&#34;fas fa-external-link-alt&#34;&gt;&lt;/i&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

      </description>
      
      <category>web</category>
      
      <category>SEO</category>
      
      <category>robots</category>
      
    </item>
    
  </channel>
</rss>
