A sitemap lists all the sub-pages of your website that you want Google to index. Because it is written in a standardized XML format, it is also called an XML sitemap. A sitemap helps Google crawl your website. It is usually located in the root directory of a domain and can be accessed there. If a sitemap is located in a subdirectory, it can only list URLs in that subdirectory; the same rule applies to subdomains.
A sitemap cannot list more than 50,000 URLs and cannot be larger than 50 MB (uncompressed). If your sitemap exceeds either of these limits, you need to split it into several sitemaps.
Related guidelines on SEO and sitemaps:
- What is an Image Sitemap?
- What is a News Sitemap?
- What is an HTML Sitemap?
- How to Submit a Sitemap to the Search Engine?
Origins of Sitemaps
Google introduced sitemaps in 2005 so that webmasters and web developers could list all necessary and important URLs for the search engine in one place. The first Sitemap version valid for Google was 0.84. In November 2006, Yahoo, Microsoft, and Google established the Common Mechanism for Website Submission. In April 2007, IBM and Ask.com also joined it. Google and Yahoo then started auto-discovering sitemap files via robots.txt files. You can see Google’s first announcement about Sitemaps below:
Also, you may see the Declaration of the Common Mechanism of Website Submission by Google, Yahoo, and Microsoft below:
Which elements are considered by Google?
The first two lines define the XML schema for our sitemap and specify that UTF-8 encoding is used.
- loc contains the URL that is listed in the sitemap.
- lastmod shows when the URL was last changed. In our example, that was January 1st, 2005. With this entry, the search engine can recognize when the post was updated and, above all, whether it is worth crawling the page again. The value uses the W3C Datetime format, with the year first and the day last (YYYY-MM-DD), optionally followed by a time. Google takes the lastmod element into account when crawling.
- changefreq is optional and describes how often the URL is likely to change. The accepted values are always, hourly, daily, weekly, monthly, yearly, and never. “always” describes pages that change every time they are accessed, while “never” should only be used for archived URLs. In our example, the URL will (most likely) be updated every month. changefreq is only a hint to Google, not a command: a URL marked hourly may be crawled less often, and a URL marked yearly may be crawled more often than once a year. Even URLs marked never are crawled from time to time. If you are not sure what to enter here, you can simply omit this element.
- Priority means how important the URL is within the entire domain. The values range from 0.0 to 1.0. The default is 0.5. This value is ignored by Google, so it can also be omitted.
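To make these four elements concrete, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree. The function names build_urlset and to_xml and the example URL are hypothetical; the allowed changefreq values and the 0.0 to 1.0 priority range are the ones listed above. It is an illustration, not a replacement for a sitemap plugin.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_urlset(entries):
    """Build a <urlset> element from dicts with a required 'loc' key and
    optional 'lastmod', 'changefreq', and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]  # the only required child
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in CHANGEFREQ_VALUES:
                raise ValueError(f"invalid changefreq: {entry['changefreq']}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return urlset


def to_xml(urlset):
    """Prepend the XML declaration; ElementTree escapes &, < and > in text itself."""
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


print(to_xml(build_urlset([
    {"loc": "https://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
])))
```

Since Google ignores priority, you could drop that key entirely; the sketch keeps it only to show all four elements.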
1. XML Declaration
<?xml version="1.0" encoding="UTF-8"?>
This line declares that the sitemap is an XML document encoded in the UTF-8 character set. It also specifies the XML version.
2. URL set
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
It is the container for all URLs in the sitemap. Its namespace also specifies which version of the Sitemap protocol is used. In this case, the version is 0.9, which is supported by Google, Microsoft, and Yahoo.
<url> <loc>https://www.holisticseo.digital/</loc> <lastmod>2020-07-21T16:12:20+03:00</lastmod> </url>
This example starts with the homepage, the most inclusive URL. However, Google doesn’t use the order of URLs in a sitemap as a crawl order: it collects the URLs and queues them for crawling according to its internal crawl algorithms. Every URL must be absolute, not relative. Every URL should also be the canonical version: canonicalized URLs, duplicate page URLs, and pages that don’t return a 200 status code shouldn’t be in the sitemap. Only URLs that you want indexed belong in the sitemap.
<loc> stands for location: it contains the URL of the page.
<lastmod> gives the date the URL was last modified, in the W3C Datetime format. If we updated the content on December 29, 2020, the value would become 2020-12-29.
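In Python, the standard datetime module already produces W3C Datetime strings through isoformat(). A small sketch, assuming you store modification times either as plain dates or as timezone-aware datetimes (the helper name w3c_datetime is hypothetical):

```python
from datetime import date, datetime, timedelta, timezone


def w3c_datetime(value):
    """Format a date or a timezone-aware datetime for <lastmod> (W3C Datetime)."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        if value.tzinfo is None:
            raise ValueError("use a timezone-aware datetime so the UTC offset is explicit")
        return value.isoformat(timespec="seconds")
    return value.isoformat()  # a plain date gives YYYY-MM-DD


print(w3c_datetime(date(2020, 12, 29)))  # 2020-12-29
print(w3c_datetime(datetime(2020, 7, 21, 16, 12, 20,
                            tzinfo=timezone(timedelta(hours=3)))))  # 2020-07-21T16:12:20+03:00
```

Rejecting naive datetimes is a design choice: a timestamp without an offset is ambiguous, and the W3C format requires a time zone designator whenever a time is given.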
<priority> indicates how important a URL is for crawling. In the past, Google paid attention to this property, but many SEOs used it to manipulate the search engine. Because of this, Google now relies on its own signals, such as internal link popularity, PageRank, traffic, or historical data, to understand the importance of a URL on a website.
<changefreq> specifies how often the URL changes so that the search engine can determine a crawl frequency. In the early days, Google could refresh its index only once a month. Later, it updated the whole index every 3-4 days. After the Google Caffeine update, it started refreshing the index for each web page separately. <changefreq> is a relic of the days when search engines were slow to crawl and index. SEOs also used this property to manipulate the search engine, so Google created its own metrics.
You can find John Mueller’s quote about <changefreq> below:
Priority and change frequency doesn’t really play that much of a role with Sitemaps anymore.
This is something where we’ve tried various things but essentially, if you have a sitemap file and you are using it to tell us about the pages that were changed or updated, it is much better to just specify the time stamp directly so that we can look into our internal systems and say we haven’t crawled since this date, therefore, we should crawl again.
And just crawling daily doesn’t make much sense if your content doesn’t change. So that is something where we see a lot of sites they give us this information in the sitemap, they said it changes daily or weekly, and we look in our database and it hasn’t changed in a month or years…
So what I’d really recommend is using the timestamp.
John Mueller
XML, RSS, text: which formats are still available for Sitemaps?
Google accepts various sitemap formats. The most common is the XML format described above. In addition, Google can also read other formats.
- RSS: If you have a blog with an RSS feed, you can also submit the URL of your feed. RSS 2.0 and Atom 1.0 feeds are accepted. With a media RSS feed, you can also provide Google with information about videos on your website. Keep in mind that an RSS feed usually contains only your most recent URLs.
- Text file: If your sitemap should only contain website URLs and no other information, you can also create a text file. There is only one URL in each line and the whole thing is saved in .txt format.
- Google Sites: If you have created your website with Google Sites, the sitemap is generated automatically. However, it is not submitted to Google automatically; you can submit it yourself. How to submit a sitemap is explained later in this article.
- XML: In my opinion, the best solution for your sitemap. The standardized format ensures that Google receives all the information it needs.
- HTML sitemap: HTML sitemaps show all URLs to users rather than to search engines. Since they are not directly related to SEO, I recommend reading our HTML Sitemap Guideline for details.
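For the text-file format, a minimal Python sketch, assuming you already have a list of canonical, absolute URLs. The function name text_sitemap is hypothetical; the 50,000-URL and 50 MB limits are the ones mentioned at the start of this article.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes


def text_sitemap(urls):
    """Return a text sitemap as UTF-8 bytes: one absolute URL per line."""
    lines = []
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"URL must be absolute: {url!r}")
        lines.append(url)
    if len(lines) > MAX_URLS:
        raise ValueError("more than 50,000 URLs: split into several sitemaps")
    data = ("\n".join(lines) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES:
        raise ValueError("larger than 50 MB: split into several sitemaps")
    return data


print(text_sitemap(["https://www.example.com/",
                    "https://www.example.com/blog/"]).decode("utf-8"))
```

The bytes can then be written to a file such as sitemap.txt in the root directory of your domain.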
A Sitemap is Good for Ranking, Right?
A sitemap is not a direct ranking factor, but it helps Google find your content more easily and recognize changes quickly. Especially with new websites, it is worth notifying Google as soon as possible that there are new URLs. This way, your pages can be indexed faster, and you can signal directly which pages should be included in the index.
Do I also need an HTML Sitemap?
In contrast to the XML sitemap, the HTML sitemap helps your users find their way around your website, like a kind of table of contents. The HTML sitemap does not replace the XML sitemap but can be seen as a supplement to it.
The HTML sitemap from Samsung shows the user which categories there are. It serves as a guide for the user.
As you can see, the HTML sitemap shown above provides information about the categories and structure for the user. Unlike the XML sitemap, the HTML sitemap is a separate subpage that is usually linked in the footer and is therefore visible to your users.
Why Do I Need an XML Sitemap?
As you already read above, a sitemap is not a ranking factor, and some websites do without one. Nevertheless, a sitemap brings some advantages:
- Google detects changes faster: If new URLs are added, you can tell Google through your sitemap, which helps Google crawl them.
- New websites are indexed faster: Your website is still fresh and Google must first know that it exists. Through your sitemap, you can actively tell Google that there is something new.
- Your pages are not linked to each other: If your content pages are not linked to each other, a sitemap helps ensure that Google finds them anyway. This way they are not overlooked when crawling. Of course, a sitemap does not replace a well-thought-out internal linking structure!
- Your website is extensive: If your website has a lot of URLs, a sitemap reduces the likelihood that something will not be crawled.
- Rich media and news: If you have rich media content or want to appear in Google News, Google can also take additional information from the sitemap into account. For Google News, you need a separate sitemap.
So you see, there are good reasons for a sitemap. Google itself says that a sitemap does not guarantee that everything will be crawled, but there are no disadvantages.
What if I Don’t Have a Sitemap?
Then create one! You can either create it easily in your content management system (CMS) or create it manually. I recommend the first option.
How Big Can a Sitemap Be?
A sitemap can contain at most 50,000 URLs and be at most 50 megabytes in size (uncompressed). If you have a larger website, you must split your sitemap. Depending on which CMS you use, this can happen automatically. The individual sitemaps are then listed in a sitemap index file. The whole thing looks like this:
You can see this sitemap when you go to www.seokratie.de/sitemap.xml.
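How such a split could work, as a rough Python sketch: it groups URLs into chunks that stay under the 50,000-URL limit and under an approximate 50 MB size budget. The size estimate assumes minimal entries containing only <loc> and URLs that are already XML-escaped; the names split_urls and entry_bytes are hypothetical.

```python
HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
          '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
FOOTER = "</urlset>\n"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, measured uncompressed


def entry_bytes(url):
    """Approximate size of one minimal <url> entry (URL assumed already escaped)."""
    return len(f"<url><loc>{url}</loc></url>\n".encode("utf-8"))


def split_urls(urls, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Group URLs into chunks that each fit into one sitemap file."""
    base = len((HEADER + FOOTER).encode("utf-8"))
    chunks, current, size = [], [], base
    for url in urls:
        needed = entry_bytes(url)
        # Start a new chunk if adding this URL would break either limit.
        if current and (len(current) >= max_urls or size + needed > max_bytes):
            chunks.append(current)
            current, size = [], base
        current.append(url)
        size += needed
    if current:
        chunks.append(current)
    return chunks


urls = [f"https://www.example.com/page-{i}" for i in range(120_000)]
print([len(chunk) for chunk in split_urls(urls)])  # [50000, 50000, 20000]
```

Each chunk would then be written to its own file (for example sitemap1.xml, sitemap2.xml) and listed in a sitemap index.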
What is the XML sitemap index?
If you exceed the limits for a sitemap, you need to divide it into several smaller sitemaps and list them in a sitemap index. The index is a single file that references all the sitemaps on your website.
<?xml version="1.0" encoding="UTF-8"?> <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> <sitemap> <loc>http://www.example.com/sitemap1.xml.gz</loc> <lastmod>2004-10-01T18:23:17+00:00</lastmod> </sitemap> <sitemap> <loc>http://www.example.com/sitemap2.xml.gz</loc> <lastmod>2005-01-01</lastmod> </sitemap> </sitemapindex>
This index references two XML sitemaps: sitemap1.xml.gz and sitemap2.xml.gz. Now we will explain this file in parts.
<?xml version="1.0" encoding="UTF-8"?>
Nothing new here: the index header, just like the XML sitemap header, defines the XML version and the character encoding.
Definition of Sitemapindex
Instead of urlset, the root element here is sitemapindex. It encloses all the sitemaps and, through its namespace, specifies which version of the Sitemap protocol is used. Just like urlset, it is closed at the bottom of the document with </sitemapindex>.
Definitions of Sitemaps
<sitemap> <loc>http://www.example.com/sitemap1.xml.gz</loc> <lastmod>2004-10-01T18:23:17+00:00</lastmod> </sitemap>
The most important elements of each entry are <loc> and <lastmod>. The <loc> tag gives search engine crawlers the location of the individual sitemap file, while the <lastmod> tag shows when that sitemap file was last modified.
As in a regular sitemap, lastmod uses the W3C Datetime format, either as a full timestamp (as for sitemap1.xml.gz) or as a date only (as for sitemap2.xml.gz).
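A minimal Python sketch that generates an index like the example above with xml.etree.ElementTree. The function name build_sitemap_index is hypothetical; the limit of 50,000 sitemaps per index file comes from the Sitemap protocol at sitemaps.org.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_SITEMAPS_PER_INDEX = 50_000


def build_sitemap_index(sitemaps):
    """sitemaps: iterable of (loc, lastmod) pairs; lastmod may be None."""
    sitemaps = list(sitemaps)
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a sitemap index may list at most 50,000 sitemaps")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        if lastmod is not None:  # <lastmod> is optional in the index
            ET.SubElement(entry, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(index, encoding="unicode")


print(build_sitemap_index([
    ("http://www.example.com/sitemap1.xml.gz", "2004-10-01T18:23:17+00:00"),
    ("http://www.example.com/sitemap2.xml.gz", "2005-01-01"),
]))
```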
What Criteria Does a Sitemap Have to Meet to Comply with Google’s Guidelines?
In order for your sitemap to be error-free and accepted by Google, some requirements must be met. I’ll tell you what those are.
- Your sitemap file must be UTF-8 encoded, and special characters must be replaced with their escape codes (for example, & becomes &amp;).
- Make sure your sitemap only contains URLs from the same domain. If you have multiple domains, each domain gets its own sitemap.
- Your sitemap may only have content that should be indexed and actually accessible. You can see possible errors in the sitemap in the Google Search Console.
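If you assemble a sitemap with string templates instead of an XML library, you have to do the escaping yourself. The Sitemap protocol lists five characters to escape: &, <, >, " and '. A minimal Python sketch using xml.sax.saxutils.escape from the standard library (the helper name escape_loc is hypothetical):

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the two quote characters are
# passed as extra entities because the Sitemap protocol lists them too.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_loc(url):
    """Entity-escape a URL before inserting it into a hand-built <loc> element."""
    return escape(url, QUOTE_ENTITIES)


print(escape_loc("https://www.example.com/search?q=shoes&color=red"))
# https://www.example.com/search?q=shoes&amp;color=red
```

XML libraries such as xml.etree.ElementTree escape element text automatically, so this step is only needed for hand-built files. Either way, save the finished file with UTF-8 encoding.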
Use consistent URLs
Google crawls your URLs exactly as you enter them in your sitemap. So be consistent and don’t mix different spellings. The sitemap of HolisticSEO.Digital, for example, looks like this:
The individual URLs in the sitemap of HolisticSEO.Digital.
We give all our URLs in this form: https://www.holisticseo.digital/. Avoid omitting the “www” in some URLs or using relative URLs.
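Because Google crawls the URLs exactly as listed, a quick check before submitting can catch mixed spellings. A minimal Python sketch using urllib.parse.urlsplit; the preferred scheme and host (https and www.example.com) and the function name inconsistent_urls are placeholders. The same check also enforces the rule that all URLs belong to one domain.

```python
from urllib.parse import urlsplit


def inconsistent_urls(urls, scheme="https", host="www.example.com"):
    """Return URLs whose scheme or host differ from the preferred form.

    Relative URLs have an empty scheme and host, so they are flagged too.
    """
    flagged = []
    for url in urls:
        parts = urlsplit(url)
        if parts.scheme != scheme or parts.netloc.lower() != host:
            flagged.append(url)
    return flagged


print(inconsistent_urls([
    "https://www.example.com/",
    "https://example.com/blog/",      # missing www
    "http://www.example.com/about/",  # wrong scheme
    "/contact/",                      # relative URL
]))
# ['https://example.com/blog/', 'http://www.example.com/about/', '/contact/']
```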
How to create an XML sitemap in the content management system
Most content management systems such as WordPress have a corresponding extension with which you can easily create a sitemap for your website. Let’s take a look at how to generate a sitemap in WordPress.
First, you need a plugin that helps you create it. When choosing the plugin, make sure that it is well written. You can judge this, for example, by its ratings and number of users. Well-written plugins take rel=canonical and noindex into account, while bad plugins simply include everything in the sitemap. We use the Yoast plugin in this example. It’s that easy:
- In the Yoast settings, go to “General” and then click on the “Features” tab at the top.
- Activate “XML sitemaps”. By clicking on the question mark you can display further information. If you have activated the function, Yoast automatically creates an XML sitemap for your page.
If the switch is set to “On”, Yoast automatically generates a sitemap.
By clicking on “View the XML sitemap,” your sitemap will open in a new tab. You will need the link later if you want to submit the sitemap to Google.
If you want to exclude posts from the search results, you can set this under “Search Appearance” and then “Content Types”. Your posts would then receive the robots meta tag noindex and would not be included in the sitemap. Since I very much want my posts to appear in the search results, the switch stays on “Yes”.
In the settings, you can have certain content types set to noindex.
Under Taxonomies, you can still decide whether categories should also be displayed in the search results. The following also applies here: If the switch is off, category pages are set to noindex and are therefore not listed in the sitemap.
The practical thing about having your sitemap created by the CMS: it is always up to date and less prone to errors. The larger your website gets, the harder it is to keep track of its content manually, especially when something changes. That’s why I recommend always having XML sitemaps created automatically.
How to manually create an XML sitemap for your website
Alternatively, you can create your sitemap manually. You should really only do this if you are not using a CMS. But remember: If you maintain your sitemap manually, you have to regenerate it every time any URL changes. That is why I recommend using a tool for this rather than writing your sitemap by hand.
For example, you can use XML-sitemaps.com. It even recognizes Noindex and Canonical elements and does not add the corresponding URLs to the sitemap. There is also a Pro version of this tool, which then automatically updates the sitemap when changes are made. If you only use the normal version, you have to generate a new sitemap every time your page changes. You can quickly lose track of things.
Luisa explains how to create your sitemap with Screaming Frog in her blog post.
It is best to always create your sitemap automatically, otherwise, errors will creep in quickly, which then lead to problems in indexing. You can find out which elements lead to errors in the next paragraph.
What Should Not Be in Your Sitemap?
Unfortunately, sitemaps again and again contain elements that do not belong in them. As I’ve already written, only pages that should be indexed and are actually accessible belong in your sitemap. If you have faulty pages or redirects in your sitemap, crawling problems follow. These elements do not belong in your sitemap:
- Duplicates of a URL: Only the correct version of each URL should be indexed. So there is no point in including seokratie.de/blog and seokratie.de/blog/ in the sitemap. Decide on one of the two versions.
- URLs with a canonical tag pointing elsewhere: If a page’s canonical tag points to another URL, this tells Google that the other URL should be indexed instead. If the page still appears in your sitemap, you send contradictory signals, because everything listed in a sitemap should be indexable. Keep such URLs out of your sitemap to avoid conflicts during crawling. A page whose canonical tag points to itself can stay.
- Session IDs: If session IDs are in the URL of a page, a unique link is generated each time the page is visited. Since the link changes with every page visit, it looks like duplicate content for the Googlebot.
- Pages with status code 404/410: These pages report an error and have no place in your sitemap. Either delete the relevant entries from the sitemap or make the links work again.
- Redirects: Only list URLs that resolve directly, not URLs that redirect elsewhere. Redirects mislead the Googlebot.
- Pages with noindex tag: As with canonical tags, you send conflicting signals when you include a noindex page in your sitemap. These pages have to stay out.
- Images: Your normal sitemap only lists URLs of content pages. If you have a lot of images that you want indexed, use an image sitemap. I’ll explain this below.
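The rules in this list can be expressed as a filter. A minimal Python sketch, assuming you have crawl data (status code, canonical target, noindex flag) for every page, for example from a crawler export; the Page record, the function names, and the session parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical query parameters that carry session IDs on this site.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


@dataclass
class Page:
    url: str
    status: int                      # HTTP status code from a crawl
    canonical: Optional[str] = None  # target of rel=canonical, if any
    noindex: bool = False            # robots meta tag contains noindex


def belongs_in_sitemap(page):
    if page.status != 200:  # drops 404/410 errors and 3xx redirects
        return False
    if page.noindex:
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False  # canonicalized to another URL
    params = parse_qs(urlsplit(page.url).query, keep_blank_values=True)
    if any(name.lower() in SESSION_PARAMS for name in params):
        return False
    return True


def sitemap_urls(pages):
    """Keep indexable pages and drop exact duplicates, preserving order."""
    seen, result = set(), []
    for page in pages:
        if belongs_in_sitemap(page) and page.url not in seen:
            seen.add(page.url)
            result.append(page.url)
    return result
```

Trailing-slash duplicates such as /blog and /blog/ are caught here only through their canonical tags, which is one more reason to set canonicals consistently.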
How do you submit your sitemap to Google?
Now you have successfully created a sitemap for your website, but how does Google know that you have one?
Reference your sitemap in the robots.txt
First, reference your sitemap in your robots.txt file. This file helps crawlers find their way around your website, and a reference to the sitemap tells them where your sitemap is located. A reference to the sitemap in robots.txt looks like this:
Sitemap: https://www.example.com/sitemap.xml
The Sitemap line must contain the full URL of your sitemap. It can be placed anywhere in the robots.txt file; many sites put it on the last line.
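To check that crawlers can discover the reference, Python’s standard urllib.robotparser module can read Sitemap lines (RobotFileParser.site_maps() exists since Python 3.8). A small sketch; the robots.txt content, the sitemap URL, and the helper name sitemaps_from_robots are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt; the Sitemap line is independent of User-agent groups.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def sitemaps_from_robots(robots_txt):
    """Return the sitemap URLs declared in robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


print(sitemaps_from_robots(ROBOTS_TXT))  # ['https://www.example.com/sitemap.xml']
```

To test a live file instead, RobotFileParser also offers set_url() and read(), which fetch robots.txt over the network.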
How to submit your sitemap to Google
To submit your sitemap to Google, your website must be connected to (verified in) Google Search Console. There you can submit your sitemap under the menu item Sitemaps.
In the sidebar of the Search Console, you will find the menu item “Sitemaps”.
Here you can also see whether you have already submitted a sitemap and whether it was successfully submitted or whether there were problems. You can also easily enter the URL of your sitemap so that it is submitted.
List of submitted sitemaps with the status “Successful”. You can also see the Index Coverage Report.
Does Your Sitemap Contain Errors?
In Search Console, you can find out whether your submitted sitemap has errors. This is shown in the sitemap report in the “Status” column. Our sitemap has the status “Successful”. If your sitemap is faulty, Search Console shows the status “Sitemap contains errors”. If Google cannot retrieve your sitemap, you will see the status “Could not be retrieved”. You can find a list of all possible errors, along with suggested solutions, further down the corresponding page in Google’s help documentation.
Check this report regularly to see whether your sitemaps are still error-free. Errors in the sitemap can lead to indexing problems and should therefore be corrected. Ideally, your sitemap is generated automatically, which already reduces the risk of errors.
Do you have to update your sitemap?
It is a good idea to inform Google as soon as there is new content on your website. If your sitemap is generated via the CMS, the sitemap is automatically updated when changes are made. Now you can see why it makes sense not to create the sitemap manually. Especially if new content is added frequently, a plugin does a lot of work for you, so you don’t have to worry about updating yourself. If, on the other hand, you create your sitemap manually, then you also have to update it with every change and this can quickly become confusing.
A sitemap that is updated automatically is called a dynamic sitemap, while a sitemap that has to be updated manually is called a static sitemap.
How to handle multiple language versions in the XML sitemap
If your website is available in several languages, you should tell Google. One way to do this is to add hreflang markup to the sitemap; the other two methods are link elements in the HTML head and HTTP headers. Luisa summarizes these methods, and what else you need to know about different language versions, in her article on hreflang.
To define the language versions in the XML sitemap, xhtml:link elements are added to each <url> element, next to its <loc> element: one for every language version, including the URL itself. This must be done for each URL of the website. It looks like this:
With hreflang you define alternative language versions in your sitemap.
As you can see, the whole thing gets very extensive very quickly. Therefore, be sure to check your sitemap for errors before submitting it to Google. You can find more information about the hreflang attribute directly on the Google support page.
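Because the markup grows so quickly, it is usually generated rather than written by hand. A minimal Python sketch, assuming you know the URL of every language version of a page; each <url> lists all versions, including itself, as xhtml:link elements. The function name, language codes, and URLs are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)     # default namespace for sitemap tags
ET.register_namespace("xhtml", XHTML_NS)  # xhtml: prefix for the link elements


def build_hreflang_urlset(versions):
    """versions: dict mapping an hreflang code to the URL of that language version."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in versions.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every <url> repeats the full set of alternates, including itself.
        for lang, alt_url in versions.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          rel="alternate", hreflang=lang, href=alt_url)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_hreflang_urlset({
    "en": "https://www.example.com/en/",
    "de": "https://www.example.com/de/",
}))
```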
What other types of sitemaps are there?
The sitemap I presented above lists the URLs of your website’s pages. If you have a lot of video content or images, or want to be listed on Google News, I recommend also creating a corresponding sitemap for this content.
A video sitemap helps Google find the videos on your website and recognize them as videos. You can generate a video sitemap with an appropriate extension in your CMS.
In the settings of Yoast, you will find the item “Video SEO” if you have installed the plugin.
If you also use Yoast to create the video sitemap, then first install the plugin “Yoast SEO: Video”. This adds another menu item under the settings for Yoast with the title “Video SEO”.
You don’t have to do anything to create a video sitemap, Yoast does it for you. You can of course specify other settings for your videos, but normally the default settings are sufficient.
As with a normal sitemap, you can have the video sitemap created automatically with Yoast.
It is also possible to include your images in a sitemap. There are special criteria for images (as well as for videos or Google News) that change again and again. An image sitemap is not necessary for ordinary websites, but if you run a large image portal, you probably also want to be found in image search. Image sitemaps used to support information such as captions, geographic location, title, or image license, but Google has since deprecated these tags.
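A minimal sketch of an image sitemap in Python, assuming you know which image URLs appear on each page. It uses Google’s image sitemap namespace (http://www.google.com/schemas/sitemap-image/1.1) and emits only image:image with its image:loc child; the page and image URLs and the function name are placeholders.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SITEMAP_NS)     # sitemap tags use the default namespace
ET.register_namespace("image", IMAGE_NS)  # image: prefix for the image tags


def build_image_sitemap(pages):
    """pages: dict mapping a page URL to the list of image URLs shown on it."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


print(build_image_sitemap({
    "https://www.example.com/gallery/": [
        "https://www.example.com/images/photo-1.jpg",
        "https://www.example.com/images/photo-2.jpg",
    ],
}))
```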
Google News sitemaps
If you run a news portal, you may also want to be listed on Google News. First, you need to be registered with Google News as a publisher so that your content can be displayed there. There are special requirements for a Google News sitemap, which are described in Google’s support documentation. The special thing about it: If your Google News sitemap is faulty, you are removed from Google News until the errors are fixed. So make sure that your Google News sitemap is always clean.
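For orientation, a rough Python sketch of Google News sitemap entries. It uses Google’s news namespace (http://www.google.com/schemas/sitemap-news/0.9) with the publication name, language, publication date, and title. Google’s documentation asks for recent articles only (published within the last two days), so the sketch filters older ones out. The publication name, URLs, and function name are hypothetical; check Google’s current requirements before relying on it.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)


def build_news_sitemap(articles, publication, language, now):
    """articles: list of (url, title, published) tuples with timezone-aware datetimes."""
    cutoff = now - timedelta(days=2)  # keep only articles from the last two days
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, title, published in articles:
        if published < cutoff:
            continue
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        news = ET.SubElement(entry, f"{{{NEWS_NS}}}news")
        pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
        ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = (
            published.isoformat(timespec="seconds"))
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = title
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


now = datetime(2023, 3, 23, 12, 0, tzinfo=timezone.utc)
print(build_news_sitemap(
    [
        ("https://www.example.com/news/fresh-story/", "A fresh story", now - timedelta(hours=1)),
        ("https://www.example.com/news/old-story/", "An old story", now - timedelta(days=5)),
    ],
    publication="Example News",
    language="en",
    now=now,
))
```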
Last Thoughts on Sitemaps as Holistic SEO
Sitemaps are among the most essential SEO elements in search engine history. They improve crawl efficiency and crawl budget usage while making the search engine’s job easier. A categorized sitemap hierarchy in a structured sitemap index file can also help Google and other search engines understand the site hierarchy and the relationships between site sections. Using a sitemap index with multiple sitemaps can increase crawl budget and efficiency. It also shows which sections of the website are more important for users if the sitemap index mirrors the internal link structure and site hierarchy. This is probably because, as a single sitemap grows and contains more URLs, the cost and time a search engine needs to process it also increase.
There are more magical sides to sitemaps. We will continue to improve our guidelines for a better SEO community.