Canonicalization: The Gospel of HTTP 301

by Hamlet Batista | July 19, 2007 | 10 Comments

Usually I don’t cover basic material on this blog, but since a loyal reader, Paul Montwill, requested it, I’m happy to oblige. As I learned back in school, if one person asks a question, there are probably many others at the back of the class quietly wondering the same thing. So here is a brief explanation of web server redirects and how to use them to solve URL canonicalization issues.

And just what is that ecclesiastic-sounding word “canonicalization”? It was Matt Cutts, not the Pope, who made it famous: he used the term to describe how Google picks the best URL when several addresses lead to the same content. Here is the problem. Most of us have all of these URLs:

1) sitename.com/

2) sitename.com/index.html

3) www.sitename.com 

4) www.sitename.com/index.html

You know they are all the same page. I know they are all the same page. But computers, unfortunately, aren’t on the same page. They aren’t that smart and need to be told that each of these addresses represents the same page. One way is to pick one of them and use it consistently in all your links. The harder part is getting other website owners who link to you to do the same: some might use one variant, others another, and a few are bound to choose a third.

The best way to solve this is to pick one URL and have your web server automatically force all requests for other variations to go to the one you picked. We can use HTTP redirects to accomplish this.
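To make "pick one URL" concrete, here is a minimal Python sketch of the mapping a server-side rule has to perform, assuming the canonical form is the bare domain with no www and no trailing index.html. The function name canonical_url and the DEFAULT_DOCUMENT constant are hypothetical illustrations, not part of any Apache feature, and ports are dropped for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the file the server returns for a bare directory path
DEFAULT_DOCUMENT = "index.html"


def canonical_url(url: str) -> str:
    """Map any equivalent variant of a URL onto one canonical form.

    Hypothetical policy: http scheme, no leading "www.", and a trailing
    /index.html collapsed to its directory. Ports and fragments are dropped.
    """
    # urlsplit only finds the host name when a scheme is present
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path or "/"
    if path.endswith("/" + DEFAULT_DOCUMENT):
        # "/about/index.html" becomes "/about/"
        path = path[: -len(DEFAULT_DOCUMENT)]
    return urlunsplit(("http", host, path, parts.query, ""))


if __name__ == "__main__":
    for variant in ("sitename.com/", "sitename.com/index.html",
                    "www.sitename.com", "www.sitename.com/index.html"):
        print(variant, "->", canonical_url(variant))
```

All four addresses from the list map to http://sitename.com/. A web server can't call a function on someone else's link, so it does the equivalent job by answering every non-canonical variant with a redirect.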

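HTTP redirects are simply web server responses that carry a 30x status code plus a Location header pointing to the new address. Stripped to the essentials, this is what the web browser receives:

HTTP/1.1 30x Reason-Phrase
Location: http://anotherurl.com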

The number 30x stands for a status code from 300 to 307. The most commonly used are 301 (Moved Permanently) and 302 (Found, a temporary redirect). (For a full description of each status code, see section 10 of the HTTP/1.1 specification, RFC 2616.) We only need 301, the permanent redirect. It tells the crawler that the requested page has permanently moved to the address given in the response's Location header. For example, you may want http://sitename.com to be your canonical page (as I do for my blog). If a visitor types http://www.sitename.com, you want the web server to answer with a 301 and Location: http://sitename.com/ so that the crawler 'understands' that this is the proper, canonical page.
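For illustration, here is a small Python function that builds the raw bytes of such a response, so you can see exactly what a browser or crawler receives: a status line followed by a Location header. The function name is hypothetical, and a real server would add more headers (Date, Server and so on).

```python
def redirect_response(location: str, permanent: bool = True) -> bytes:
    """Build the raw HTTP/1.1 bytes of a redirect response (illustrative only)."""
    # 301 = permanent (the one we want for canonicalization), 302 = temporary
    status, reason = (301, "Moved Permanently") if permanent else (302, "Found")
    lines = [
        f"HTTP/1.1 {status} {reason}",
        f"Location: {location}",  # where the client should go next
        "Content-Length: 0",      # no body needed in this sketch
        "",                       # an empty line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Calling redirect_response("http://sitename.com/") yields the status line "HTTP/1.1 301 Moved Permanently" followed by "Location: http://sitename.com/", which is the whole message a crawler needs to update its index.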

How do we do that?

There are two ways to accomplish this with Apache: a basic one and an advanced one. Keep in mind, though, that the basic one does not help with www vs. non-www issues, because its directives match only the URL path, not the host name. It uses the mod_alias module and its Redirect, RedirectPermanent, or RedirectMatch directives.

In your .htaccess file, add one of these:

Redirect 301 /index.html http://sitename.com/

RedirectPermanent /index.html http://sitename.com/

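RedirectMatch 301 ^/(.*)\.html$ http://sitename.com/$1.html

The first two lines are equivalent: RedirectPermanent is shorthand for Redirect 301. The RedirectMatch line uses a regular expression and sends every .html URL to the same path on sitename.com, so it only makes sense in an .htaccess file served for a different host (an old domain, for example). On sitename.com itself, it would redirect each page to itself in an endless loop.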
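To see what a RedirectMatch pattern captures, here is a Python sketch that emulates it with the re module, assuming the anchored pattern ^/(.*)\.html$. It also checks whether the target equals the URL that was requested, which is how a rule like this can end up redirecting a page to itself. The helper names are hypothetical; Apache does its own matching and does not run this code.

```python
import re

# Emulates:  RedirectMatch 301 ^/(.*)\.html$ http://sitename.com/$1.html
PATTERN = re.compile(r"^/(.*)\.html$")
TARGET = r"http://sitename.com/\1.html"  # Python spells Apache's $1 as \1


def redirect_match(path):
    """Return the redirect target for a URL path, or None if the rule doesn't match."""
    match = PATTERN.match(path)
    return match.expand(TARGET) if match else None


def redirects_to_itself(host, path):
    """True if the rule would send this request straight back to the same URL."""
    target = redirect_match(path)
    return target is not None and target == f"http://{host}{path}"
```

On www.sitename.com, /blog/post.html is sent to http://sitename.com/blog/post.html, which is what you want. On sitename.com, redirects_to_itself reports True for the same path.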

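The more advanced approach, which I recommend and use myself, relies on the mod_rewrite module. Here is what my Apache configuration looks like. These lines belong in the main server (or virtual host) configuration rather than in .htaccess: RewriteLog is not allowed in .htaccess, and there mod_rewrite strips the leading slash before matching, so the pattern ^/(.*) would never match.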

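# URL Rewriting
RewriteEngine on
# RewriteLog and RewriteLogLevel exist only up to Apache 2.2; Apache 2.4
# replaces them with LogLevel (for example "LogLevel alert rewrite:trace3")
RewriteLog logs/rewrite.log
RewriteLogLevel 0
RewriteCond %{HTTP_HOST} ^www\.hamletbatista\.com [NC]
RewriteRule ^/(.*) http://hamletbatista.com/$1 [R=301,L]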
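Apache evaluates the RewriteCond first and applies the RewriteRule only if the Host header matches. The same logic, expressed as Python WSGI middleware, might look like the sketch below. It is an analogy to make the flow visible, not how mod_rewrite is implemented; the function and constant names are hypothetical, and swapping the pattern and the target gives the www-preferring variant. As with mod_rewrite's default behavior, the query string is carried over unchanged.

```python
import re

# Mirrors:  RewriteCond %{HTTP_HOST} ^www\.hamletbatista\.com [NC]
# [NC] (no case) becomes re.IGNORECASE
NON_CANONICAL_HOST = re.compile(r"^www\.hamletbatista\.com", re.IGNORECASE)
# Mirrors the substitution in:  RewriteRule ^/(.*) http://hamletbatista.com/$1 [R=301,L]
CANONICAL_ROOT = "http://hamletbatista.com/"


def canonical_host_middleware(app):
    """Wrap a WSGI app so requests for the www host get a 301 to the bare host."""
    def wrapped(environ, start_response):
        host = environ.get("HTTP_HOST", "")
        if NON_CANONICAL_HOST.match(host):
            path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
            # $1 is everything after the leading slash
            captured = path[1:] if path.startswith("/") else path
            location = CANONICAL_ROOT + captured
            query = environ.get("QUERY_STRING", "")
            if query:
                location += "?" + query
            # R=301: permanent redirect; L: stop here and never call the app
            start_response("301 Moved Permanently",
                           [("Location", location), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return wrapped
```

Requests for the bare host pass straight through to the wrapped application, just as Apache serves them normally when the RewriteCond does not match.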


As you have probably noticed, I prefer http://hamletbatista.com. If I wanted http://www.hamletbatista.com/ instead, I would rewrite it this way:

RewriteCond %{HTTP_HOST} ^hamletbatista\.com [NC]
RewriteRule ^/(.*) http://www.hamletbatista.com/$1 [R=301,L]

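If it were a regular website and not a blog, I'd add these lines too:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.html\ HTTP/
RewriteRule ^/index\.html$ http://hamletbatista.com/ [R=301,L]

The RewriteCond checks the original request line (THE_REQUEST), so only visitors who actually asked for /index.html are redirected. Without it, the internal lookup Apache makes when DirectoryIndex serves index.html for / can match the rule as well and send / into a redirect loop.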
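Once rules like these are in place, it is worth confirming that every variant reaches the canonical URL in a single 301 hop and that nothing loops. Here is a minimal Python sketch that follows redirects by hand with http.client, which never follows them on its own. The names http_head and trace_redirects are hypothetical; the fetch parameter exists so the logic can be exercised without a live server. Some servers answer HEAD differently from GET, so treat the result as a quick check rather than proof.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def http_head(url):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=http_head, max_hops=5):
    """Follow redirects one hop at a time and return the list of (url, status) pairs."""
    hops, seen = [], set()
    for _ in range(max_hops):
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)
        status, location = fetch(url)
        hops.append((url, status))
        if status not in (301, 302, 303, 307, 308) or not location:
            return hops
        url = urljoin(url, location)  # Location may be a relative path
    raise RuntimeError(f"more than {max_hops} redirects")
```

Against a correctly configured site, trace_redirects("http://www.sitename.com/") should return two hops: the 301 from the www host and a final 200 on the canonical URL. A RuntimeError mentioning a loop means a rule is sending a page back to itself.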

As always, when you begin playing with files like these, it’s a good idea to check the Apache documentation for more details. It may not be the Bible, but for canonicalization issues, it’s as good as gospel.

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.
