Today, we’re answering one of the biggest questions in the SEO world. Canonical vs Noindex – which one do I need?
If you’ve been asking this, you’re not alone. It’s a question that comes up again and again.
So let’s unwrap the answer and settle it once and for all.
What is a Canonical Tag?
A rel canonical tag hints to search engines that the current page is a duplicate or variant of another page, and points them to the master version instead.
It’s a way to tell search engines that these pages are duplicates of page A, and that we want page A to be indexed and ranked. Here’s how Google says it selects a canonical:
“If Google finds multiple pages that seem to be the same or the primary content very similar, it chooses the page that, based on the factors (or signals) the indexing process collected, is objectively the most complete and useful for search users, and marks it as canonical.”
This is usually useful when one page has multiple versions due to slight variations. For example, a product page for a t-shirt may come in multiple colour variations, with a URL dedicated to each colour. These variant pages could be perceived as duplicates by search crawlers and may dilute page authority unless the canonical URL is specified clearly.
How To Specify A Canonical URL?
There are many ways to indicate a canonical URL, but Google suggests the three most effective techniques in their blog.
- Implementing the rel canonical tag in the <head> section of duplicate pages so that it points to the canonical URL.
- Applying 301 redirects from a duplicate to a canonical page.
- Including all canonical URLs in the sitemap.
The strongest among them are the canonical tag and redirects; however, stacking all three methods could improve the chances of the canonical URL being identified.
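To make the first and third techniques concrete, here is a minimal Python sketch that renders a canonical link element and a sitemap entry for a given URL. The function names and the example.com address are hypothetical and only there for illustration; your CMS or framework likely has its own way of emitting these.

```python
from html import escape
from xml.sax.saxutils import escape as xml_escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link> element to place in the <head> of each duplicate page."""
    # escape() keeps characters such as & and " from breaking the attribute value.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def sitemap_entry(canonical_url: str) -> str:
    """Return a minimal sitemap <url> entry for the canonical URL."""
    return f"<url><loc>{xml_escape(canonical_url)}</loc></url>"


if __name__ == "__main__":
    # Hypothetical product URL used purely as an example.
    print(canonical_link_tag("https://example.com/t-shirt"))
    print(sitemap_entry("https://example.com/t-shirt"))
```

Every colour variant of the t-shirt page would carry the same link element, while only the canonical URL would be listed in the sitemap.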
When To Use Canonical Tag?
A canonical tag is used when a site has two or more versions of the same URL. This can happen in multiple scenarios, such as having regional pages or device variants.
Here’s when applying a rel canonical tag could be effective:
- When a business operates in multiple locations and has URL versions dedicated to each region.
- When a website has both HTTP and HTTPS versions.
- When a website contains various URL parameters for sorting and filtering functions.
- When a website has separate mobile and desktop versions.
- In case of accidental variants, like the demo version of a website.
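For the HTTP/HTTPS and URL-parameter cases in the list above, the canonical URL can often be derived from the variant itself. The sketch below shows one possible approach using Python’s standard library. The list of parameters to drop is an assumption made for this example; which parameters are safe to ignore depends entirely on your own site.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that change presentation or tracking, not content.
NON_CANONICAL_PARAMS = {"sort", "order", "filter", "utm_source", "utm_medium"}


def canonical_url(url: str) -> str:
    """Map a variant URL to a canonical form: HTTPS, lowercase host, no sort/filter params."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
    ]
    # Force the https scheme and drop any #fragment.
    return urlunsplit(("https", parts.netloc.lower(), parts.path or "/", urlencode(kept), ""))


if __name__ == "__main__":
    print(canonical_url("http://Example.com/shirts?sort=price&page=2"))
```

The value this returns is what you would place in the rel canonical tag of each variant page.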
What is a Noindex Tag?
Noindex is a meta tag directive that signals to Google bots that this page is not to be indexed. Unlike a canonical tag, which works as a hint for crawlers, noindex is a directive that supporting search engines follow once they have crawled the page. Here’s what Google says about the noindex tag:
“When Googlebot crawls that page and extracts the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.”
That means that the page will not appear in search results, and so it won’t contribute to your search signals or authority.
The noindex tag works only when a page is otherwise accessible to search engines. It won’t work if it’s already blocked, say via the robots.txt file.
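Since a crawler has to fetch a page before it can see the noindex tag, it’s worth checking that robots.txt isn’t blocking the URL. Here is a rough sketch using Python’s built-in urllib.robotparser. The robots.txt content and URLs are made up for the example, and the standard-library parser may not match Googlebot’s own matching rules in every edge case.

```python
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical robots.txt that blocks everything under /private/.
    rules = "User-agent: *\nDisallow: /private/\n"
    for page in ("https://example.com/private/page.html", "https://example.com/public/page.html"):
        status = "crawlable" if is_crawlable(rules, page) else "blocked, so a noindex tag here would go unseen"
        print(page, "->", status)
```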
How to implement Noindex?
A noindex tag helps de-index a page for search crawlers that support the noindex rule. There are two ways to apply it.
- Apply the noindex meta tag to the <head> section of a page. You can customize this tag to apply to all search engines or just Google alone. The general noindex tag looks like this.
<meta name="robots" content="noindex">
Whereas a Google-specific noindex tag is only intended for a Googlebot and looks like this:
<meta name="googlebot" content="noindex">
- For non-HTML resources, such as PDFs or images, there is no <head> section to hold a <meta> tag. That’s why an HTTP response with an X-Robots-Tag header is used to instruct search engines not to index the resource. Here’s an example of how it’s applied:
HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
(…)
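To verify that either method is actually in place, you can inspect a response’s headers and HTML. The following is a simplified sketch, not a full implementation: the has_noindex helper is a hypothetical name, it takes the headers and HTML you have already fetched, and it ignores user-agent-prefixed header values such as "googlebot: noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(headers: dict, html: str) -> bool:
    """Return True if the X-Robots-Tag header or a robots meta tag contains noindex."""
    header_value = ""
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = [d.strip().lower() for d in header_value.split(",")]

    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


if __name__ == "__main__":
    print(has_noindex({}, '<head><meta name="robots" content="noindex"></head>'))
    print(has_noindex({"X-Robots-Tag": "noindex"}, ""))
```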
When to Use a Noindex Tag?
So, when exactly is a noindex tag justified? Since it prevents a page from getting indexed or displayed in search results, it should never be tossed around casually.
You can, however, apply it in the following situations.
- To de-index content in privacy policies, about pages, disclaimers, and other similar pages. Since the content on these pages does not contribute to your search performance, a noindex tag is appropriate.
- To hide backend or staging content.
- To keep pages reserved for members or exclusive customers out of search results. Keep in mind that noindex only hides a page from search; it doesn’t control who can open it.
- To keep content from being flagged as duplicate when pages are nearly identical.
Canonical vs Noindex: Where To Use What?
To break down when to use what, we’ll go over a real-life scenario.
Someone asked this very same question in a 2021 Google office-hours hangout. They said that they’d been sorting through their e-commerce website pages and found that many were thin pages. They proceeded to list them and sort them based on which ones needed to be indexed and which ones didn’t. This led to the question many come across: canonical vs noindex tag, which one is the most suitable for these pages?
John Mueller answered:
“So usually what I would look at there is what your really strong preference there is.”
He proceeded to explain that if the preference is for this content to completely disappear from search, a noindex would be the choice. Whereas, if you want all this content combined in one page while the individual ones are still accessible, then a canonical tag would be the perfect choice.
To put it simply, it’s a matter of certainty. With noindex, you can be certain the page will be dropped from search results. With rel canonical, it isn’t guaranteed, but the duplicate most likely won’t be shown.
Can You Combine The Two – Noindex & Canonical?
Before the rel=”canonical” link attribute was added, traditional SEO handled canonicalization using 301 redirects and noindex tags. But the rel canonical provides a direct solution for duplication issues. This way, you don’t need a 301 redirect to tell search engines which page should be prioritized and indexed.
Since its introduction, though, discussions around combining rel canonical and noindex have gained more traction. Webmasters started asking questions like:
- How do Google bots react if both tags are present on the same page?
- Would the canonical pass the noindex attribute to the canonicalized page?
John Mueller answered these questions, saying:
“Our algorithms, theoretically, could get confused.”
He went on to explain:
“Google in practice generally just assumes in these cases the canonical is a mistake and ignore it.”
In a separate thread, he elaborated on how the search bots react to this situation.
“One reason for this is that we sometimes find a non-canonical URL first. If this URL has a noindex robots meta tag, we might decide not to index anything until we crawl and index the canonical URL. Without the noindex robots meta tag (with the rel=canonical link element), we can start by indexing that URL and show it to users in search results. As soon as we crawl the canonical URL, we can change to the canonical URL instead. It’s also much safer because you don’t have to worry about serving different versions of the content depending on the exact URL :-).”
So, the bottom line is it’s best to avoid using the two together. Combined, they can create confusion and deliver mixed signals, because you’re telling Google ‘don’t index this’ and ‘treat this as a duplicate of X’ at the same time. In practice, it’s clearer to choose one based on your goal.
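If you want to audit your own pages for this combination, a small script can flag it. The sketch below is illustrative only: has_mixed_signals is a hypothetical helper that treats a page as conflicting when it carries a noindex robots meta tag and a canonical pointing to a different URL. It looks at the HTML alone and doesn’t consider the X-Robots-Tag header.

```python
from html.parser import HTMLParser


class SignalParser(HTMLParser):
    """Record whether a page has a noindex robots meta tag and where its canonical points."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            if "noindex" in attr.get("content", "").lower():
                self.noindex = True
        elif tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")


def has_mixed_signals(page_url: str, html: str) -> bool:
    """Return True if the page is noindexed and also canonicalized to another URL."""
    parser = SignalParser()
    parser.feed(html)
    return parser.noindex and parser.canonical is not None and parser.canonical != page_url


if __name__ == "__main__":
    # Hypothetical page that sends both signals at once.
    page = (
        '<head><meta name="robots" content="noindex">'
        '<link rel="canonical" href="https://example.com/a"></head>'
    )
    print(has_mixed_signals("https://example.com/a?color=red", page))
```

Any page this flags is a candidate for picking one signal: noindex if the page should disappear from search, or canonical if its signals should be consolidated into the main page.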