Sitemap

What Is a Sitemap?

A sitemap is a file that helps search engines such as Google and Naver discover and index a site's pages comprehensively. In essence, it is a file that lists URLs, and bots use it as a guide when crawling the site.

Content types and update frequency can be specified, but the most important decision is where sitemap.xml is located. A sitemap may only list URLs on the same host that sit at or below the directory containing sitemap.xml, so the installation location must be chosen carefully. In general, it is best to place it at the site root.
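The location rule can be checked mechanically. The following Python sketch assumes the rule as stated in the sitemaps.org protocol (same scheme and host, and a path at or below the sitemap's directory). The function name `in_sitemap_scope` and the `/docs/sitemap.xml` and `/blog/` URLs are hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url may be listed in the sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)

    # Scheme and host (including any port) must match exactly.
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False

    # Directory that contains the sitemap, always ending in "/".
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"

    # A URL with an empty path refers to the site root.
    page_path = page.path or "/"
    return page_path.startswith(base_dir)


# A sitemap at the root covers the whole site.
print(in_sitemap_scope("https://www.devkuma.com/sitemap.xml",
                       "https://www.devkuma.com/docs/java/static/"))  # True

# A sitemap under /docs/ (hypothetical) cannot list pages under /blog/.
print(in_sitemap_scope("https://www.devkuma.com/docs/sitemap.xml",
                       "https://www.devkuma.com/blog/"))  # False
```

This is why the root is the safest location: every path on the host starts with "/".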

Sitemap XML Format

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.devkuma.com/docs/java/static/</loc>
    <lastmod>2022-04-03T20:41:00+09:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
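
Tag | Required/Optional | Description
<urlset> | Required | Wraps the whole document and references the current protocol standard.
<url> | Required | Parent tag for each URL entry; the remaining tags are its children.
<loc> | Required | Page URL. It must begin with the protocol (such as https), end with a trailing slash if the web server requires one, and be shorter than 2,048 characters.
<lastmod> | Optional | Date the page was last modified, in W3C Datetime format (for example 2022-04-03 or 2022-04-03T20:41:00+09:00).
<changefreq> | Optional | How often the page is likely to change. Crawlers treat it as a hint, not a command.
<priority> | Optional | Priority of this URL relative to other URLs on the same site, from 0.0 to 1.0. The default is 0.5. Because the value is relative, setting a high priority on every URL has no effect.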
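
As an illustration of how these tags fit together, the following Python sketch builds a sitemap with the standard library's `xml.etree.ElementTree`. It is a minimal example, not a complete generator: the function name `build_urlset` and the dictionary-based entry format are assumptions made for this sketch, and it requires Python 3.8 or later for the `xml_declaration` argument.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build sitemap XML (bytes) from dicts with 'loc' and optional hint keys."""
    # Serialize the sitemap namespace as the default xmlns, without a prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")

    for entry in entries:
        loc = entry["loc"]  # <loc> is the only required child of <url>
        if len(loc) >= 2048:
            raise ValueError(f"loc must be shorter than 2,048 characters: {loc[:60]}")

        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc

        # Optional tags are written only when the caller supplies them.
        for tag in ("lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])

    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_urlset([
    {
        "loc": "https://www.devkuma.com/docs/java/static/",
        "lastmod": "2022-04-03T20:41:00+09:00",
        "changefreq": "monthly",
        "priority": 0.5,
    }
])
print(xml_bytes.decode("utf-8"))
```

Building the tree with a library rather than string concatenation means special characters in URLs, such as `&`, are escaped automatically.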

Page update frequency (changefreq) list:

  • always: contents are updated every time the page is accessed
  • hourly: at least once per hour
  • daily: at least once per day
  • weekly: at least once per week
  • monthly: at least once per month
  • yearly: at least once per year
  • never: for archived pages that are not expected to change; crawlers may still revisit them periodically
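
Because changefreq accepts only the seven values listed above and priority must fall between 0.0 and 1.0, both can be checked before a sitemap is written. The following is a small Python sketch of such a check; the function name `validate_hints` and its error messages are hypothetical.

```python
CHANGEFREQ_VALUES = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def validate_hints(changefreq=None, priority=None):
    """Return a list of problems found in the optional changefreq/priority values."""
    errors = []

    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        errors.append(f"invalid changefreq: {changefreq!r}")

    if priority is not None:
        try:
            value = float(priority)
        except (TypeError, ValueError):
            errors.append(f"priority is not a number: {priority!r}")
        else:
            # Valid priorities range from 0.0 to 1.0 inclusive.
            if not 0.0 <= value <= 1.0:
                errors.append(f"priority out of range 0.0-1.0: {priority!r}")

    return errors


print(validate_hints("monthly", 0.5))    # []
print(validate_hints("sometimes", 1.5))  # two error messages
```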

When Using Multiple Sitemap Files

A single sitemap file can list at most 50,000 URLs and must be no larger than 50 MB uncompressed. If a site exceeds either limit, multiple sitemaps are needed. In that case, create a sitemap index file that tells crawlers where each sitemap is located.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>https://www.devkuma.com/sitemap1.xml.gz</loc>
      <lastmod>2022-12-06T01:57:17+09:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>https://www.devkuma.com/sitemap2.xml.gz</loc>
      <lastmod>2021-01-01</lastmod>
   </sitemap>
</sitemapindex>

Tag | Required/Optional | Description
<loc> | Required | URL of the sitemap file
<lastmod> | Optional | Date the sitemap file was last modified
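
The splitting step can be sketched in Python as follows. This is an illustrative example under a few assumptions: the helper names `chunk_urls` and `build_sitemap_index` are hypothetical, the `/page/N` URLs are made up for the demonstration, and only the 50,000-URL limit is enforced (the size limit and the gzip step are left out). It requires Python 3.8 or later for the `xml_declaration` argument.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of urls, each holding at most `size` items."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_urls, lastmod=None):
    """Build sitemap index XML (bytes) that points at each sitemap URL."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod is not None:  # <lastmod> is optional
            ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,000 pages: three sitemap files are needed.
urls = [f"https://www.devkuma.com/page/{i}" for i in range(120_000)]
chunks = list(chunk_urls(urls))
print([len(chunk) for chunk in chunks])  # [50000, 50000, 20000]

index_xml = build_sitemap_index(
    [f"https://www.devkuma.com/sitemap{n}.xml.gz" for n in range(1, len(chunks) + 1)],
    lastmod="2022-12-06",
)
print(index_xml.decode("utf-8"))
```

Each chunk would then be written to its own sitemap file, and only the index file needs to be submitted to search engines.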
