Canonical Tags Definition

Canonical Tags: Definition and Guideline for 2020

A canonical tag (also known as rel="canonical") is a way of telling search engines that a particular URL is the main copy of a page. Using the canonical tag prevents problems caused by identical or “duplicate” content that appears on multiple URLs. In practical terms, the canonical tag tells search engines which version of a URL should appear in the search results.

Canonical tags are not commands for the search engine. They are only hints that help it find the canonical URL of a web page. You can see this in the Coverage Report of Google Search Console, which shows the cases where the search engine has chosen a different URL as the canonical version than the one declared in the site's canonical tag.

GSC Coverage Report
Correct canonical tag usage can fix many of the issues shown in the Coverage Report.

As such, Canonical Tags are part of the technical SEO area.

Why is canonicalization important?

Duplicate content is a complicated topic, but if search engines crawl a lot of URLs with identical (or very similar) content, this can cause a number of SEO problems such as ranking signal dilution. First, if search engine crawlers such as Googlebot have to wade through too much duplicate content, they may miss some of your unique content. Second, large amounts of duplicate content can water down your ranking potential. Finally, even if your content ranks well, search engines may select the wrong URL as the “original”. Canonicalization helps you keep your duplicate content under control without having to resort to “noindex”.

For more on how a search engine can choose the wrong URL for indexing, see our URL Hijacking article, which covers the search engine's view of indexing and the uncertainty principle.

The problem with URLs

You may be thinking, “Why would anyone duplicate a page?” and wrongly assume that canonicalization is not something you need to worry about. The problem is that, as humans, we tend to see a page as a concept, such as your homepage. For search engines, however, each unique URL is a separate page.

For example, Google might be able to reach your homepage in all of the following ways at the same time:

  • http://www.example.com
  • https://www.example.com
  • http://example.com
  • http://example.com/index.php
  • http://example.de/index.php?r…

For a human, all of these URLs represent a single page. For a crawler, however, each of these URLs is a unique page. Even in this single example, we can see that five copies of the homepage are in play. In reality, however, this is just a small sample of the variations you might encounter.

Modern content management systems (CMS) and dynamic websites exacerbate the problem. Many websites automatically add tags, allow multiple paths (and URLs) to the same content, and add URL parameters for searches, sorts, currency options, etc. You may have thousands of duplicate URLs on your site without even noticing. This is just one of many factors that need to be considered when creating WordPress websites.

Duplicate Content Reasons and Canonicalization

Canonical tags are used to consolidate URLs and to prevent ranking signal dilution. Consolidating duplicate URLs matters to search engines because it saves them crawling and indexing resources and lets them show users the original version of each URL for the right search intent. Duplicate URLs or duplicate content can arise for the reasons below.

  • Parameterized URLs for search parameters: example.com/?q=search-term
  • Session ID Parameters in the URLs: example.com/?sessionsid=3
  • Pages for different device types with different URLs: example.com/mobile and example.com/desktop
  • Having pages for different connection types with different URLs: example.com/mobile-3g and example.com/desktop-3g
  • Serving the same content with “www” and “non-www” versions: example.com and www.example.com
  • Serving the same content with http and https at the same time: http://example.com and https://example.com
  • Serving the same content with and without trailing slash (slash at the end of the URL): example.com/ and example.com
  • Serving the same content with uppercase and lowercase letters in the URL such as example.com/pagE and example.com/page
  • Serving the same content on the different file extensions such as example.com/page.html and example.com/page.htm
  • Serving the same content’s printable and presentable versions on the different URLs such as example.com/page and example.com/print/page
  • Having the AMP versions of the same content on different URLs such as example.com and amp.example.com.
  • Serving the same content with syndication on the different URLs such as “example.com/original-content” and “syndicationsite.com/syndicated-content”.

In all of these duplicate content cases, canonical tags help search engines cluster the same content from different URLs and index the correct URL, while transferring the ranking signals of the duplicate URLs to the canonical version. For a publisher whose original content is widely syndicated, the cross-domain canonical is one of the key applications.
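To make the clustering idea concrete, here is a minimal Python sketch that collapses several of the URL variants listed above into one preferred form. It is only an illustration of the concept, not a description of how any search engine works. The preferences it encodes (HTTPS, non-www, lowercase paths, no trailing slash) and the IGNORED_PARAMS set are assumptions made for the example; your own site may prefer different ones.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change the page content.
IGNORED_PARAMS = {"sessionsid", "sessionid"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one preferred form."""
    parts = urlsplit(url)
    scheme = "https"                      # assumption: HTTPS is the preferred protocol
    host = parts.netloc.lower()
    if host.startswith("www."):           # assumption: non-www is the preferred host
        host = host[4:]
    # Assumption: paths are case-insensitive and served without a trailing slash.
    path = parts.path.lower().rstrip("/") or "/"
    # Drop parameters that only track the session.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((scheme, host, path, urlencode(kept), ""))
```

With these assumptions, "http://www.example.com/pagE/" and "https://example.com/page?sessionsid=3" both normalize to "https://example.com/page", which is the URL the canonical tag on each variant would be expected to declare.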

If people deliberately chose to syndicate their content, it makes it difficult to identify the originating source. That’s why we recommend the use of canonical or blocking. The publishers syndicating can require this.

Danny Sullivan, September 18, 2019

Canonical Tags: Best Practices

Duplicate content issues can be extremely tricky, so here are a few important things to keep in mind when using the Canonical Tag to avoid common mistakes:

1. Canonical tags can be self-referencing

Self-referencing canonical tags point to the URL on which they are found. A self-referencing canonical tag states that the original source of the content is the URL that contains the tag. For instance, for a web page at “https://example.com/example”, the canonical tag should contain that same URL. An example of a self-referencing canonical tag is below.

<link rel="canonical" href="https://example.com/example">

Google has also said that self-referencing canonical tags make it clear which URL is the original source of the content, so the search engine can rely on that URL with more confidence. A quote about self-referencing canonical tags from John Mueller is below.

I recommend [using a] self-referential canonical because it really makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed. Even if you have one page, sometimes there are different variations of the URL that can pull that page up. For example, with parameters in the end, perhaps with upper lower case or www and non-www. All of these things can be kind of cleaned up with a rel canonical tag.

John Mueller, Google Webmaster Trends Analyst
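A quick way to verify a self-referencing canonical is to parse the page and compare the declared URL with the page's own URL. The sketch below uses only Python's standard html.parser module; the class and function names are made up for this example, and the comparison is a plain string match, so it assumes the page URL is already written in its preferred form.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonicals


def is_self_referencing(page_url: str, html: str) -> bool:
    """True when the page declares exactly one canonical and it is the page itself."""
    return find_canonicals(html) == [page_url]
```

For the tag shown earlier, is_self_referencing("https://example.com/example", html) returns True, while the same HTML served under a parameterized URL would return False.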

2. Proactively canonicalize your homepage

Given that homepage duplicates are very common and that people can link to your homepage in many ways (which you cannot control), it is usually a good idea to add a canonical tag to your homepage template to avoid unforeseen problems.

3. Spot check your dynamic canonical tags

Sometimes bad code causes a site to write a different canonical tag for each version of the URL, which defeats the purpose of the canonical tag. Make sure to spot check your URLs, especially on e-commerce and CMS-driven sites.

4. Avoid mixed signals

Search engines may ignore or misinterpret a canonical tag if you send mixed signals. In other words, don’t canonicalize page A -> page B and then page B -> page A. Likewise, don’t canonicalize page A -> page B and then 301 redirect page B -> page A. It is also generally not a good idea to chain canonical tags (A -> B, B -> C, C -> D) if you can avoid it. Send clear signals, or you force search engines to make bad decisions.
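Loops and chains like the ones described above are easy to detect once you have a mapping from each URL to the canonical URL it declares. The following is a small sketch under that assumption; the canonical_of dictionary is hypothetical input that you would build from a crawl of your own site.

```python
def resolve_canonical(url, canonical_of):
    """Follow canonical declarations starting at `url`.

    Returns (final_url, hops, is_loop). A healthy setup has hops <= 1
    and is_loop == False.
    """
    seen = [url]
    while url in canonical_of and canonical_of[url] != url:
        url = canonical_of[url]
        if url in seen:               # we came back to a URL we already visited
            return url, len(seen), True
        seen.append(url)
    return url, len(seen) - 1, False
```

For the mapping {"A": "B", "B": "A"} the function reports a loop, and for {"A": "B", "B": "C", "C": "D"} it reports a chain of three hops ending at "D". Anything longer than one hop is worth flattening so that every duplicate points directly at the final URL.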

5. Be careful when canonicalizing near duplicates

When most people think of canonicalization, they think of exact duplicate content. It is possible to use the canonical tag on near duplicates (pages with very similar content), but be careful. There is a lot of discussion on this topic, but in general it is okay to use canonical tags for very similar pages, for example a product page that differs only by currency, location or a small product attribute. Keep in mind that the non-canonical versions of this page may not be eligible for ranking. If the pages are too different, search engines can ignore the tag entirely.

6. Canonicalize cross-domain duplicates

If you control both sites, you can use the canonical tag across domains. Let’s say you are a publisher that often publishes the same article on half a dozen sites. If you use the canonical tag, your ranking power will be concentrated on just one of them. Keep in mind that canonicalization excludes the non-canonical pages from ranking, so make sure that this usage suits your business case.

7. Use Only Canonical URLs in Sitemaps

Only canonical URLs should be listed in a sitemap. Google may exclude a non-canonical URL that appears there, and because the sitemap then contradicts the canonical tag, the site sends a mixed signal. Repeated inconsistencies like this can lead the search engine's algorithms to treat the site's other signals with more suspicion.
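One way to audit this is to compare every URL in the sitemap with the canonical URL that page declares. The sketch below parses a sitemap with Python's standard xml.etree.ElementTree module and flags entries whose declared canonical is a different URL. The canonical_of mapping is hypothetical input (for example, the result of a crawl), and the function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_sitemap_urls(sitemap_xml, canonical_of):
    """Return (sitemap_url, declared_canonical) pairs that disagree."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        # Pages missing from the mapping are treated as self-canonical.
        target = canonical_of.get(url, url)
        if target != url:
            flagged.append((url, target))
    return flagged
```

Every pair the function returns is a sitemap entry that should either be removed from the sitemap or have its canonical tag corrected.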

8. Use Internal Links for Showing Canonical URLs

A canonical URL can also be indicated via internal links. If the canonical tag of every duplicate page points to page B, but most of the internal links for the same content group point to page C, this creates confusion. The canonical tag is just a hint, not a command. Because of such wrong and mixed signals, Google started to determine the canonical URL with its own algorithms, according to the Uncertainty Principle. Always using semantic and consistent signals for search engine optimization builds trust in the web entity from the search engine's point of view.

9. Use only One Canonical Tag for Every Web Page

If there are multiple canonical tags on a web page, Google will ignore all of them. Multiple canonical tags can end up on a page because of faulty page templates or because source code was copied and pasted between pages. When that happens, the benefit of the canonical tag for the target URL is lost.

10. Use Canonical Tags only in the Head Section of the HTML Document

Google will ignore canonical tags in the body section of the HTML document. To avoid HTML parsing problems, the canonical tag should be placed in the head section and as early as possible. Placing it there also helps the search engine during crawling, since it may use different HTML parsers or information extraction methods for different sections of the HTML document.
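The last two rules (one canonical tag per page, and only inside the head section) can be checked together. The sketch below is a simplified illustration built on Python's html.parser: it assumes the document has explicit <head> and </head> tags, which a real browser or crawler does not require, and the names used are invented for this example.

```python
from html.parser import HTMLParser


class CanonicalLocator(HTMLParser):
    """Record canonical hrefs and whether each one sits inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_hrefs = []
        self.other_hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link":
            attr = dict(attrs)
            if "canonical" in (attr.get("rel") or "").lower().split():
                bucket = self.head_hrefs if self.in_head else self.other_hrefs
                bucket.append(attr.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_canonical_placement(html):
    """Return a list of problems; an empty list means the page passed."""
    locator = CanonicalLocator()
    locator.feed(html)
    total = len(locator.head_hrefs) + len(locator.other_hrefs)
    problems = []
    if total == 0:
        problems.append("no canonical tag")
    if total > 1:
        problems.append("multiple canonical tags")
    if locator.other_hrefs:
        problems.append("canonical tag outside <head>")
    return problems
```

A page with a single canonical tag inside the head section returns an empty list; anything else returns the names of the rules it breaks.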

11. Be Careful While Using Relative URLs in the Canonical Tags

Canonical tags can be used with relative URLs. A relative URL in the canonical tag must not carry the “http” or “https” prefix, so it should contain only the path, folder and file names. If the domain name is written into the relative URL without the “http” or “https” prefix, the canonical tag will most likely point to a 404 page. For instance, consider the canonical tag below.

<link rel="canonical" href="example.com/page-example"/>

Google will read this canonical URL as “http://example.com/example.com/page-example”, because without the “http” or “https” protocol prefix the value is treated as a path relative to the current page. The correct relative URL for this example is below.

<link rel="canonical" href="/page-example"/>

The canonical tag above points to “http://example.com/page-example”, as it should. Relative URLs in canonical tags therefore need careful attention.

Canonical Tags that Point to 404 Pages

A canonical tag points duplicate pages to the page that holds the original content. If a canonical tag refers to a 404 page, the search engine will ignore it, since it sends a clearly wrong signal for indexing. 404 pages are not indexable, so they can’t be used as canonical targets. 404 pages that are targeted by a canonical tag can be seen in Google Search Console’s Coverage Report, in the Excluded section.
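The resolution of relative URLs described in this section follows standard URL rules, so it can be reproduced with urljoin from Python's standard library. The helper name below is made up for the example, and the page is assumed to live at http://example.com/.

```python
from urllib.parse import urljoin


def resolve_href(page_url: str, href: str) -> str:
    """Resolve a canonical href against the URL of the page that contains it."""
    return urljoin(page_url, href)


page = "http://example.com/"

# Domain written without a protocol: treated as a relative path.
wrong = resolve_href(page, "example.com/page-example")

# Root-relative path: resolves as intended.
right = resolve_href(page, "/page-example")

# Absolute URL: used as-is, regardless of the page it sits on.
absolute = resolve_href(page, "https://example.com/page-example")
```

Here wrong is "http://example.com/example.com/page-example", right is "http://example.com/page-example", and absolute is returned unchanged, which is why absolute URLs are the safest choice.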

To fix the “404 pages with canonical tags” problem, a webmaster should change the canonical tags of those pages so that they point to the versions that return a 200 HTTP status code and hold the actual, original content.
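Finding such broken canonical targets can be automated by requesting each target URL and checking its status code. The sketch below uses Python's standard urllib; the canonical_of mapping (page URL to declared canonical URL) is hypothetical input, and the status lookup is passed in as a parameter so the logic can be tried without network access. Network failures other than HTTP errors are not handled here.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def http_status(url):
    """Return the HTTP status code of `url` using a HEAD request."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as err:          # 4xx and 5xx responses end up here
        return err.code


def broken_canonicals(canonical_of, get_status=http_status):
    """Return {page: (target, status)} for canonical targets that are not 200."""
    broken = {}
    for page, target in canonical_of.items():
        status = get_status(target)
        if status != 200:
            broken[page] = (target, status)
    return broken
```

Each entry in the result is a page whose canonical tag should be updated to point at a URL that returns a 200 status code.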

Canonical Tags and Hreflang Tags

Canonical Tags and Hreflang Tags should be consistent. If a web page has a hreflang tag for a specific region and language, the canonical tag of that web page should be consistent with the language and region specified in the hreflang tag. For instance, if there is a hreflang tag such as <link rel="alternate" hreflang="en-US" href="http://example.com/example-content-en-us-version">, then the canonical tag of “http://example.com/example-content-en-us-version” should point to that same URL. The hreflang tag is an important Technical SEO and International SEO element for ranking signal consolidation.
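This consistency can be checked per page by reading both tags and comparing them. The sketch below, which relies only on Python's html.parser, checks that the canonical URL declared on a page equals the hreflang URL declared for that page's own language and region. The names are invented for this example, and the caller is assumed to know which hreflang value the page is supposed to represent.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the canonical href and all hreflang alternates of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")
        elif "alternate" in rels and attr.get("hreflang"):
            self.alternates[attr["hreflang"].lower()] = attr.get("href")


def canonical_matches_hreflang(html: str, hreflang: str) -> bool:
    """True when the canonical equals the URL listed for `hreflang`."""
    collector = LinkCollector()
    collector.feed(html)
    expected = collector.alternates.get(hreflang.lower())
    return collector.canonical is not None and collector.canonical == expected
```

For the en-US page in the example above, canonical_matches_hreflang(html, "en-US") should return True; a False result means the two tags disagree about which URL represents that language version.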

Canonical Tags vs. 301 redirects

A common SEO question is whether a canonical tag passes link authority (PageRank) as well as a 301 redirect does. In most cases, it seems to. Remember, however, that these two solutions produce very different results for crawlers and website visitors.

If you redirect page A–> page B, visitors will be automatically redirected to page B and will never see page A. If you set a canonical tag from page A–> page B, search engines know that page B is canonical, but users can visit both URLs. Make sure your solution matches the desired result.

How to check your canonical tags for SEO

When you check your canonical tags, there are a number of things that are worth checking for optimal SEO performance. Here is a quick checklist:

  • Does the page have a canonical tag?
  • Does the canonical tag point to the correct page?
  • Are the pages crawlable and indexable?

A common mistake is to point the canonical tag to a URL that is either blocked by robots.txt or set to “noindex”. This can send mixed signals to search engines. Below are some common ways to inspect and review your canonical tags.
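Both mistakes can be tested for with Python's standard library. The sketch below checks whether a canonical target is disallowed by a robots.txt file (using urllib.robotparser) and whether the target page carries a robots meta tag with “noindex”. The function names are made up for this example, and the robots.txt content and HTML are assumed to have been downloaded already.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_blocked_by_robots(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """True when robots.txt disallows `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html: str) -> bool:
    """True when the page has a robots meta tag containing noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```

If either function returns True for the URL that a canonical tag points to, the canonical tag and the indexing rules contradict each other and one of them should be changed.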

1. Show Source Code

In most browsers you can right-click the page and choose “View page source”, or simply access the source code via the address bar, like so:

view-source:https://example.com/canonical-tags

Examine the source code and look for the canonical tag in the <head> section of your page.

2. Verification in bulk with software solutions

Many SEO tools allow you to check canonical tags in bulk. For example, Screaming Frog checks for missing canonical tags and can do so for thousands of pages at once. You can also write a Python script to check canonical tag usage in bulk.
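As a starting point for such a script, the sketch below classifies a batch of pages by their canonical tag. It is a minimal illustration, not a replacement for a crawler: it assumes the HTML of each page has already been downloaded into a dictionary keyed by URL, and the verdict labels are invented for this example.

```python
from html.parser import HTMLParser


class _CanonicalHrefs(HTMLParser):
    """Collect every canonical href found in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.hrefs.append(attr.get("href") or "")


def audit_pages(pages):
    """pages: {url: html}. Returns a list of (url, canonical, verdict) rows."""
    rows = []
    for url, html in pages.items():
        parser = _CanonicalHrefs()
        parser.feed(html)
        if not parser.hrefs:
            verdict = "missing"
        elif len(parser.hrefs) > 1:
            verdict = "multiple"
        elif parser.hrefs[0] == url:
            verdict = "self-referencing"
        else:
            verdict = "canonicalized"
        rows.append((url, ", ".join(parser.hrefs), verdict))
    return rows
```

The rows can be written to a CSV file or printed, and the “missing” and “multiple” verdicts are usually the first ones worth investigating.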

Last Thoughts on Canonical Tags and Holistic SEO

Canonical tags are a topic that many technophobic people are reluctant to deal with. Unfortunately, missing canonical tags are often the source of enormous amounts of duplicate content, which have an extremely negative impact on the ranking of individual subpages or even an entire website.

Video explanation by Matt Cutts / Google

What you should know about canonical tag usage:

  • The URL marked by the canonical tag must be accessible and must not refer to a 404 page. This happens, for example, when a “www.” is forgotten or the URL of the target page has changed.
  • The URL must match exactly; an additional or missing trailing slash or “/index.php” at the end can result in an incorrect canonical tag.
  • Only one canonical tag may be used per page at a time; otherwise, search engines like Google ignore the tag.
  • Absolute URLs (with http://) should always be used. The canonical tag also accepts relative URLs, but a value such as example.com/article is then resolved to http://example.com/example.com/article.
  • The target page and the URL carrying the canonical tag must not be set to “noindex” or “nofollow” and must not be disallowed in robots.txt.
  • Pages whose canonical link points to another URL are not considered for the search results. The exception is pages that point to themselves with a canonical link, for example to guard against URL variants generated by session IDs.
  • For paginated pages, which are marked with rel="next" or rel="prev", pointing the canonical tag to another page does not make sense, since the pages are not actual duplicates.

To learn more about HTML Tags, you may read our guideline.

In Holistic SEO, canonical tags have a fundamental place in both theory and practice. They show how a search engine thinks and why the canonical tag was created in the first place. Why did Google start to treat the canonical tag as a hint instead of a command? What else has changed in the same direction in the search engine ecosystem? For instance, pagination markup such as link rel="next" and link rel="prev" is no longer used by Google, and the nofollow attribute is now also a hint instead of a command. What is the Uncertainty Principle? How can you use all of this information to create clearer and more understandable signals for the search engine via canonical tags and other elements of Holistic SEO?

All of those questions and their answers are important for thinking and seeing beyond a simple Google blog post as Holistic SEOs.

Our Canonical Tag Guideline still has many missing points; we will improve it over time.
