Canonical Tags: Definition and Guideline for 2022

A canonical tag (also known as “rel = canonical”) is a way of telling search engines that a particular URL is the main copy of a page. Using the canonical tag prevents problems due to identical or “duplicate” content that appears on multiple URLs. In practical terms, the canonical tag tells search engines which version of a URL should appear in the search results.

Canonical tags are not commands for the search engine; they are hints that help it find the canonical URL of a web page. You can see this in the Coverage Report section of Google Search Console, which shows the cases where the search engine chose a different canonical URL than the one declared in the site owner's canonical tag.

GSC Coverage Report
Canonical Tag Usage can fix lots of unhealthy Coverage Errors.

As such, Canonical Tags are part of the technical SEO area.

Why is canonicalization important?

Duplicate content is a complicated topic, but if search engines crawl many URLs with identical (or very similar) content, this can cause SEO problems such as ranking signal dilution. First, if search engine crawlers such as Googlebot have to dig through too much duplicate content, they may miss some of your unique content. Second, large amounts of duplicate content can water down your ranking potential. Finally, even if your content ranks well, search engines may select the wrong URL as the “original”. Canonicalization helps you control your duplicate content without having to resort to “noindex”.

Also, in the context of choosing the wrong URL for indexing, you can read our URL Hijacking Article to learn more about Search Engine’s perception of indexing and the uncertainty principle.

The problem with URLs

You may be thinking, “Why would anyone duplicate a page?” and wrongly assume that canonicalization is not something to worry about. The problem is that, as humans, we tend to see a page as a concept, such as your homepage. For search engines, however, each unique URL is a separate page.

For example, Google might be able to reach your homepage in all of the following ways at the same time:

  • http://www.example.com
  • https://www.example.com
  • http://example.com
  • http://example.com/index.php
  • http://example.de/index.php?r…

For a human, all of these URLs represent a single page. For a crawler, however, each of these URLs is a unique page. Even in this single example, five copies of the homepage are in play, and this is just a small sample of the variations you might encounter. Modern content management systems (CMS) and dynamic websites exacerbate the problem. Many websites automatically add tags, allow multiple paths and URLs to the same content, and add URL parameters for searches, sorts, currency options, etc. You may have thousands of duplicate URLs on your site without even noticing. This is just one of many factors to consider when creating WordPress websites.
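The homepage variants listed above can be collapsed programmatically to compare what a crawler counts with what a human sees. The following is a minimal sketch and does not describe how any search engine works internally. The function name `normalize_homepage_url`, the list of default documents, and the choice of HTTPS plus the non-www host as the preferred form are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default documents treated as the same page as "/".
DEFAULT_DOCUMENTS = ("/index.php", "/index.html")


def normalize_homepage_url(url: str) -> str:
    """Map common URL variants to one assumed preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # assumption: the non-www host is preferred
    path = parts.path or "/"
    if path in DEFAULT_DOCUMENTS:
        path = "/"
    # assumption: HTTPS is preferred; query string and fragment are dropped
    return urlunsplit(("https", host, path, "", ""))


variants = [
    "http://www.example.com",
    "https://www.example.com",
    "http://example.com",
    "http://example.com/index.php",
]
unique_raw = set(variants)
unique_normalized = {normalize_homepage_url(u) for u in variants}
print(len(unique_raw), "crawlable URLs,", len(unique_normalized), "page")
```

Four different strings reduce to a single preferred URL here, which is the consolidation that a canonical tag asks the search engine to perform.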

Duplicate Content Reasons and Canonicalization

Canonical tags are used for URL consolidation and for preventing ranking signal dilution. Consolidating duplicate URLs matters to search engines because it saves crawling and indexing resources and lets them show the original version of a URL to users with the matching search intent. Duplicate URLs or duplicate content can arise for the reasons below.

  • Parameterized URLs for search parameters: example.com/?q=search-term
  • Session ID Parameters in the URLs: example.com/?sessionsid=3
  • Pages for different device types with different URLs: example.com/mobile and example.com/desktop
  • Having pages for different connection types with different URLs: example.com/mobile-3g and example.com/desktop-3g
  • Serving the same content with “www” and “non-www” versions: example.com and www.example.com
  • Serving the same content with HTTP and HTTPS at the same time: http://example.com and https://example.com
  • Serving the same content with and without trailing slash (slash at the end of the URL): example.com/ and example.com
  • Serving the content with capitalized and lowered letters in the URL such as example.com/pagE and example.com/page
  • Serving the same content on the different file extensions such as example.com/page.html and example.com/page.htm
  • Serving the same content’s printable and presentable versions on the different URLs such as example.com/page and example.com/print/page
  • Having the AMP versions of the same content on different URLs such as example.com and amp.example.com.
  • Serving the same content with syndication on the different URLs such as “example.com/original-content” and “syndicationsite.com/syndicated-content”.

In all of those duplicated content problems, using canonical tags will help search engines cluster the same content from the different URLs and index the correct URL while transferring the duplicated URLs’ ranking signals to the canonical version. Also, if there is a lot of syndicated content for an original content publisher, cross-domain canonical is one of the key applications here.

If people deliberately chose to syndicate their content, it makes it difficult to identify the originating source. That’s why we recommend using canonical or blocking. The publishers syndicating can require this.

Danny Sullivan, September 18, 2019

Canonical Tags: Best Practices

Duplicate content issues can be extremely tricky, so here are a few important things to keep in mind when using the Canonical Tag to avoid common mistakes:

1. Canonical tags can be self-referencing

Self-referencing canonical tags point to the URL on which they are found. A self-referencing canonical tag declares that the URL containing the tag is the original source of the content. For instance, if a web page lives at “https://example.com/example”, its canonical tag should point to that same URL. An example of a self-referencing canonical tag is below.

<link rel="canonical" href="https://example.com/example">

Google has also said that a self-referencing canonical tag makes it clear which URL is the original source of the content, so the search engine can rely on that web page with more confidence. A quote about self-referencing canonical tags from John Mueller is below.

I recommend [using a] self-referential canonical because it really makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed. Even if you have one page, sometimes there are different variations of the URL that can pull that page up. For example, with parameters in the end with upper lower case or www and non-www. All of these things can be cleaned up with a rel canonical tag.

John Mueller, Google Webmaster Trends Analyst

2. Proactively canonicalize your homepage

Given that homepage duplicates are very common and that people can link to your homepage in many ways (which you cannot control), it is usually a good idea to add a canonical tag to your homepage template to avoid unforeseen problems.

3. Spot check your dynamic canonical tags

Sometimes bad code causes a site to write a different canonical tag for each version of the URL, which defeats the entire purpose of the canonical tag. Make sure to spot-check your URLs, especially on e-commerce and CMS-driven sites.

4. Avoid mixed signals

Search engines may ignore or misinterpret a canonical tag if you send mixed signals. In other words, don’t canonicalize page A -> page B and then page B -> page A. Likewise, don’t canonicalize page A -> page B and then create a 301 redirect from page B -> page A. It’s also generally not a good idea to chain canonical tags (A -> B, B -> C, C -> D) if you can avoid it. Send clear signals, or you force search engines to make bad decisions.
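Loops and chains of this kind are easy to detect once the canonical declarations of a site have been collected. The sketch below assumes they are available as a plain dictionary that maps each page URL to the URL its canonical tag declares. The function name `follow_canonicals` and the `max_hops` limit are illustrative choices, not part of any tool.

```python
def follow_canonicals(start_url, canonical_of, max_hops=10):
    """Follow canonical declarations starting at start_url.

    canonical_of maps a page URL to the URL its canonical tag declares.
    Returns (final_url, hops, is_loop).
    """
    visited = [start_url]
    current = start_url
    for _ in range(max_hops):
        # A page without a declaration is treated as pointing to itself.
        target = canonical_of.get(current, current)
        if target == current:
            return current, len(visited) - 1, False
        if target in visited:
            # The declarations lead back to a page already seen.
            return target, len(visited), True
        visited.append(target)
        current = target
    return current, len(visited) - 1, False


chain = {"/a": "/b", "/b": "/c", "/c": "/d", "/d": "/d"}
loop = {"/a": "/b", "/b": "/a"}

print(follow_canonicals("/a", chain))  # ('/d', 3, False): a three-hop chain
print(follow_canonicals("/a", loop))   # ('/a', 2, True): A -> B -> A
```

Any result with more than one hop, or with the loop flag set, marks a page whose canonical tag should point directly at the final URL instead.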

5. Be careful when canonicalizing near-duplicates

When most people think of canonicalization, they think of exact duplicate content. It is possible to use the canonical tag on near-duplicates (pages with very similar content), but be careful. There is a lot of discussion on this topic, but in general, it is okay to use canonical tags for very similar pages, such as product pages that differ only by currency, location, or a small product attribute. Keep in mind that the non-canonical versions of such a page may not be eligible for ranking. If the pages are too different, search engines can ignore the tag entirely.

6. Canonicalize cross-domain duplicates

If you control both sites, you can use the canonical tag across domains. Let’s say you are a publisher that often publishes the same article on half a dozen sites. If you use the canonical tag, your ranking power will be concentrated on just one of them. Keep in mind that canonicalization excludes the non-canonical pages from ranking, so make sure that this usage suits your business case.

7. Use Only Canonical URL in Sitemaps

Only canonical URLs belong in a sitemap. Google will exclude a non-canonical URL that is listed there, and listing such URLs sends conflicting signals that may lead the search engine's algorithms to treat the website's other signals with less trust.
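A sitemap can be checked against the canonical declarations of its pages before it is submitted. The sketch below is one possible approach. It assumes a standard XML sitemap and a dictionary `canonical_of` that maps page URLs to their declared canonical URLs; the function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_sitemap_urls(sitemap_xml: str, canonical_of: dict) -> list:
    """Return sitemap URLs whose declared canonical is a different URL."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # URLs without a known declaration are treated as canonical.
        if canonical_of.get(url, url) != url:
            flagged.append(url)
    return flagged


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page</loc></url>
  <url><loc>https://example.com/print/page</loc></url>
</urlset>"""

canonical_of = {
    "https://example.com/page": "https://example.com/page",
    "https://example.com/print/page": "https://example.com/page",
}

print(non_canonical_sitemap_urls(sitemap, canonical_of))
```

Every URL the function returns is a candidate for removal from the sitemap.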

8. Use Internal Links for Showing Canonical URLs

Canonical URLs can also be signaled via internal links. If the canonical tag of every duplicate page points to page B, but most internal links for the same content group point to page C, this will create confusion. The canonical tag is just a hint, not a command. Because of such wrong and mixed signals, Google started to determine the canonical URL with its own algorithms, in line with the Uncertainty Principle. Always sending semantic and consistent signals in Search Engine Optimization builds trust in the web entity from the search engine's point of view.

9. Use only One Canonical Tag for Every Web Page

If there are multiple canonical tags on a web page, Google will ignore all of them. Multiple canonical tags can end up on a page because of faulty page templates or because source code was copied and pasted between pages. In that case, the benefit of the canonical tag for the target URL is lost.

10. Use Canonical Tags only in the Head Section of the HTML Document

Google will ignore canonical tags in the body section of the HTML document. To avoid any HTML parsing problems, canonical tags should be added to the head section of the document, as early as possible. Placing the canonical tag in the head section also helps the search engine while crawling, since it may use different HTML parsers or information extraction methods for different sections of the HTML document.
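The last two rules (exactly one canonical tag per page, placed in the head section) can be checked together. The sketch below uses Python's built-in `html.parser` module. The class and function names are hypothetical, and the parser assumes the document has explicit head tags, because it does not reproduce the error recovery of a real browser or crawler.

```python
from html.parser import HTMLParser


class CanonicalPlacementAudit(HTMLParser):
    """Counts canonical tags inside and outside the <head> element."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_head_count = 0
        self.outside_head_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            rel = (dict(attrs).get("rel") or "").lower().split()
            if "canonical" in rel:
                if self.in_head:
                    self.in_head_count += 1
                else:
                    self.outside_head_count += 1

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonical_placement(html: str) -> list:
    """Return a list of problems found with the page's canonical tags."""
    audit = CanonicalPlacementAudit()
    audit.feed(html)
    total = audit.in_head_count + audit.outside_head_count
    problems = []
    if total == 0:
        problems.append("missing canonical tag")
    if total > 1:
        problems.append("multiple canonical tags")
    if audit.outside_head_count:
        problems.append("canonical tag outside <head>")
    return problems


page = (
    '<html><head><link rel="canonical" href="https://example.com/a"></head>'
    '<body><link rel="canonical" href="https://example.com/b"></body></html>'
)
print(audit_canonical_placement(page))
```

An empty list means the page has a single canonical tag in its head section.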

11. Be Careful While Using Relative URLs in the Canonical Tags

Canonical tags can be used with relative URLs. A relative URL in the canonical tag does not carry the “HTTP” or “HTTPS” prefix, so only the path, folder, and file names should be in it. If a domain name without the “HTTP” or “HTTPS” prefix is written into the canonical tag, it is treated as a relative path, and the tag will most likely target a 404 page. For instance, consider a canonical tag like the one below.

<link rel="canonical" href="example.com/page-example"/>

This canonical URL will be read by Google as “http://example.com/example.com/page-example” since it doesn’t have the “HTTP” or “HTTPS” protocol as the prefix. The right relative URL usage in the canonical in this example would be as below.

<link rel="canonical" href="/page-example"/>

The canonical tag above will point to the URL “http://example.com/page-example” as it should be. The relative URL usage in the Canonical Tag is an important point to be paid attention to.
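The resolution described above can be reproduced with `urljoin` from Python's standard library, which applies the standard rules for resolving a relative reference against a base URL. This is only an illustration of those rules. The helper name `resolve_canonical_href` is hypothetical, and the page carrying the tag is assumed to live at “http://example.com/”.

```python
from urllib.parse import urljoin


def resolve_canonical_href(page_url: str, href: str) -> str:
    """Resolve a canonical href against the URL of the page that contains it."""
    return urljoin(page_url, href)


page_url = "http://example.com/"  # assumed location of the page with the tag

# Domain written without a protocol: treated as a relative path.
print(resolve_canonical_href(page_url, "example.com/page-example"))
# http://example.com/example.com/page-example

# Root-relative path: resolves as intended.
print(resolve_canonical_href(page_url, "/page-example"))
# http://example.com/page-example

# Absolute URL: used as written.
print(resolve_canonical_href(page_url, "https://example.com/page-example"))
# https://example.com/page-example
```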

Canonical Tags that Point to 404 Pages

A canonical tag refers to the page that holds the original content for a group of duplicate pages. If a canonical tag refers to a 404 page, search engines will ignore it, since it gives a clearly wrong signal for indexing. 404 pages are not indexable, so they can’t be used as canonical targets. 404 pages that are targeted by a canonical tag can be seen in Google Search Console’s Coverage Report, in the Excluded section.

To fix the “404 Pages with Canonical Tags problem”, a webmaster should point those canonical tags to the versions that return a 200 HTTP status code and hold the actual, original content.
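Finding the affected pages comes down to requesting every canonical target and recording its status code. The sketch below keeps the HTTP request out of the picture so that it runs offline. `get_status` is a hypothetical callback that returns the status code for a URL; in practice it could wrap a real HTTP request. The sample URLs and statuses are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple


def find_broken_canonicals(
    canonical_of: Dict[str, str],
    get_status: Callable[[str], int],
) -> List[Tuple[str, str, int]]:
    """Return (page, canonical_target, status) for targets that are not 200."""
    broken = []
    for page, target in canonical_of.items():
        status = get_status(target)
        if status != 200:
            broken.append((page, target, status))
    return broken


# Stand-in for real HTTP requests (illustrative data only).
known_statuses = {
    "https://example.com/page": 200,
    "https://example.com/old-page": 404,
}


def fake_status(url: str) -> int:
    return known_statuses.get(url, 404)


canonical_of = {
    "https://example.com/page?ref=1": "https://example.com/page",
    "https://example.com/print/old-page": "https://example.com/old-page",
}

for page, target, status in find_broken_canonicals(canonical_of, fake_status):
    print(page, "->", target, "returned", status)
```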

Canonical Tags and Hreflang Tags

Canonical tags and hreflang tags should be consistent. If a web page has a hreflang tag for a specific language and region, the canonical tag of that page should match the language and region specified in the hreflang tag. For instance, if a page carries the hreflang tag <link rel="alternate" hreflang="en-US" href="http://example.com/example-content-en-us-version">, then the canonical tag of “http://example.com/example-content-en-us-version” should point to that same URL. The hreflang tag is an important Technical SEO and International SEO element for ranking signal consolidation.
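This consistency rule can be checked by reading both kinds of link tag from the same page. The sketch below is a simplified check under stated assumptions: the class `HeadLinkCollector`, the function `canonical_matches_hreflang`, and the German example URL are hypothetical, and the caller has to pass in the language and region of the page being checked.

```python
from html.parser import HTMLParser


class HeadLinkCollector(HTMLParser):
    """Collects the canonical URL and the hreflang alternates of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}  # hreflang value -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel and self.canonical is None:
            self.canonical = attributes.get("href")
        elif "alternate" in rel and attributes.get("hreflang"):
            self.alternates[attributes["hreflang"].lower()] = attributes.get("href")


def canonical_matches_hreflang(html: str, page_language: str) -> bool:
    """True when the page's own hreflang entry equals its canonical URL."""
    collector = HeadLinkCollector()
    collector.feed(html)
    own_alternate = collector.alternates.get(page_language.lower())
    return own_alternate is not None and own_alternate == collector.canonical


consistent_page = """<head>
<link rel="canonical" href="http://example.com/example-content-en-us-version">
<link rel="alternate" hreflang="en-US" href="http://example.com/example-content-en-us-version">
<link rel="alternate" hreflang="de-DE" href="http://example.com/example-content-de-de-version">
</head>"""

print(canonical_matches_hreflang(consistent_page, "en-US"))
```

A result of False means that the canonical tag and the hreflang entry for the page's own language disagree, or that the entry is missing.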

Canonical Tags vs. 301 redirects

A common SEO question is whether a canonical tag passes link authority (PageRank) the way a 301 redirect does. In most cases, it seems to. Remember, however, that these two solutions produce very different results for crawlers and website visitors.

If you redirect page A–> page B, visitors will be automatically redirected to page B and will never see page A. If you set a canonical tag from page A–> page B, search engines know that page B is canonical, but users can visit both URLs. Make sure your solution matches the desired result.

How to check your canonical tags for SEO

When you check your canonical tags, there are a number of things that are worth checking for optimal SEO performance. Here is a quick checklist:

  • Does the page have a canonical tag?
  • Does the canonical tag point to the correct page?
  • Are the pages crawlable and indexable?
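The third check in this list, whether the canonical target is crawlable and indexable, can be sketched with Python's standard library. The function name `canonical_target_problems`, the sample robots.txt rules, and the default user agent are assumptions for illustration. The sketch only looks at robots.txt rules and the robots meta tag, and not at HTTP headers such as X-Robots-Tag.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Detects a noindex directive in <meta name="robots">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            if "noindex" in (attributes.get("content") or "").lower():
                self.noindex = True


def canonical_target_problems(target_url, target_html, robots_txt,
                              user_agent="Googlebot"):
    """Return the reasons why a canonical target sends mixed signals."""
    problems = []
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(user_agent, target_url):
        problems.append("blocked by robots.txt")
    meta = RobotsMetaParser()
    meta.feed(target_html)
    if meta.noindex:
        problems.append("set to noindex")
    return problems


robots_txt = "User-agent: *\nDisallow: /private/\n"
noindex_html = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(canonical_target_problems("https://example.com/private/page", "", robots_txt))
print(canonical_target_problems("https://example.com/page", noindex_html, robots_txt))
```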

A common mistake is to point the canonical tag to a URL that is either blocked by robots.txt or set to “noindex”. This can send mixed signals to search engines. Below are some common ways to inspect and review your canonical tags.

1. Show Source Code

In most browsers you can right-click the page and choose the option to view the page source, or simply open the source code via the address bar, like so:

view-source:https://example.com/canonical-tags

Examine the source code and look for the canonical tag in the <head> section of your page.

2. Verification in bulk with software solutions

Many SEO tools allow you to check canonical tags in bulk. For example, Screaming Frog checks for missing canonical tags and can do so for thousands of pages at once. You can also write a Python script to check canonical tag usage in bulk.
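A Python script of that kind could look like the following minimal sketch. It classifies each page as missing a canonical tag, self-referencing, or pointing elsewhere, and writes the result as CSV. Downloading the pages is left out so that the example runs offline: the `pages` dictionary (URL to HTML) stands in for whatever crawler or HTTP client supplies the markup, and all names are hypothetical. URLs are compared as exact strings, which is a simplification.

```python
import csv
import io
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every canonical link tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel:
            self.hrefs.append(attributes.get("href") or "")


def classify_page(url: str, html: str) -> str:
    """Classify a page by what its first canonical tag points to."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.hrefs:
        return "missing"
    if finder.hrefs[0] == url:
        return "self-referencing"
    return "points elsewhere"


def bulk_report(pages: dict) -> str:
    """Return a CSV report with one row per page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "canonical_status"])
    for url, html in pages.items():
        writer.writerow([url, classify_page(url, html)])
    return buffer.getvalue()


pages = {
    "https://example.com/a": '<head><link rel="canonical" href="https://example.com/a"></head>',
    "https://example.com/a?sort=price": '<head><link rel="canonical" href="https://example.com/a"></head>',
    "https://example.com/b": "<head><title>No canonical</title></head>",
}
print(bulk_report(pages))
```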

Last Thoughts on Canonical Tags and Holistic SEO

Canonical tags are a topic that many technophobic people are reluctant to deal with. Unfortunately, missing canonical tags are often the source of enormous amounts of duplicate content, which have an extremely negative impact on the ranking of individual subpages or even an entire website.

Video explanation by Matt Cutts / Google

What should be known regarding Canonical Tag Usage:

  • The URL named in the canonical tag must be accessible and must not lead to a 404 page. This happens, for example, when a “www.” is forgotten or the URL of the target page has changed.
  • The URL must match exactly; an additional or missing trailing slash or “/index.php” at the end can make the canonical tag incorrect.
  • Only one canonical tag may be used per web page. Otherwise, search engines like Google ignore the tag.
  • Absolute URLs (with http:// or https://) should always be used. The canonical tag also accepts relative URLs, but a value written without the protocol (example.com/article) is treated as a relative path and resolves to http://example.com/example.com/article.
  • Neither the linked page nor the URL carrying the canonical tag should have a “noindex” or “nofollow” meta tag or be blocked by a “disallow” rule in robots.txt.
  • Pages whose canonical link points to another URL are not considered for the search results. The exception is pages that refer to themselves with a canonical link, for example to guard against URL variants generated by session IDs.
  • For paginated pages, which are marked with rel="next" or rel="prev", using canonical tags does not make sense (since there is no actual duplicate content at this point).

To learn more about HTML Tags, you may read our guidelines.

In Holistic SEO, canonical tags have a fundamental place in both theory and practice. They show how a search engine thinks and why the canonical tag was created in the first place. Why did Google start to treat the canonical tag as a hint instead of a command? What else has changed in the same direction in the search engine ecosystem? For instance, pagination markup such as link rel="next" and link rel="prev" is no longer used by Google, and the nofollow attribute is also a hint instead of a command. What is the Uncertainty Principle? How can you use all of that information to send clearer and more understandable signals to the search engine via canonical tags and other elements of Holistic SEO?

As Holistic SEOs, it is important to think about all of those questions and their answers and to see beyond a simple Google blog post.

Our Canonical Tag Guideline still has many missing points. We will improve this guideline over time.

Koray Tuğberk GÜBÜR
