Engage the Cloaking Device: My presentation at SMX Advanced (slides and comments, too)

by Hamlet Batista | June 07, 2008 | 9 Comments

I promised everybody that I’d be posting my presentation slides from my talk at the SMX Advanced Bot Herding panel, so here they are!
First, let me say that I was very excited to be speaking at a major search marketing conference, and I can say with confidence that all the traveling was definitely worth it. My only regret is that I did not get to finish my presentation. This was the first time I had spoken publicly, and as an inexperienced speaker I was not even watching the timer. My apologies to all those in attendance. 🙂 Frankly, I do think speakers should be allowed a little more time at SMX Advanced, as you really do need time to lay the groundwork before delving deeply into these sorts of topics.
For those who didn't come, let me summarize the key takeaways from my speech and put them into context alongside Google's recent post on Webmaster Central:

Cloaking: Serving different content to users than to Googlebot. This is a violation of our webmaster guidelines. If the file that Googlebot sees is not identical to the file that a typical user sees, then you’re in a high-risk category. A program such as md5sum or diff can compute a hash to verify that two different files are identical.

Basically, Google says that geolocation and IP delivery (when used for geolocation purposes) are fine as long as you present the same content to Googlebot as you would present to a user coming from the same region. Altering the content the robot sees puts you in “a high-risk category.” Google is so strict that it suggests you run a checksum program to make sure you are delivering the same content. Obviously, it doesn't matter whether or not your intention is to improve the crawling and indexing of your site.
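To make the checksum test concrete, here is a minimal self-audit sketch (not part of my slides): fetch the same URL as a regular browser and as Googlebot, hash both responses, and compare, which is essentially what md5sum would report. The URL and user-agent strings below are only placeholders.

```python
import hashlib
import urllib.request

def fetch_body(url, user_agent):
    """Fetch a URL with the given User-Agent and return the raw response body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        return response.read()

# Placeholder URL -- point this at a page on your own site.
URL = "http://www.example.com/some-page.html"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

browser_hash = hashlib.md5(fetch_body(URL, BROWSER_UA)).hexdigest()
googlebot_hash = hashlib.md5(fetch_body(URL, GOOGLEBOT_UA)).hexdigest()

if browser_hash == googlebot_hash:
    print("Identical checksums: the bot and the user get the same file.")
else:
    print("Different checksums: you are in the 'high-risk category'.")
```

Keep in mind that dynamic bits such as timestamps or session tokens will also change the hash, so a mismatch is a prompt to diff the two files rather than proof of cloaking, and Google's own checks will obviously not announce themselves with the Googlebot user agent.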
Why would you want to cloak anyway?
Let’s talk about the key scenarios I discussed in my speech:

  • Content accessibility

– Search-unfriendly content management systems. According to Google, if you are using a proprietary CMS that does not let you make the URLs search-engine friendly, uses cookie-based session IDs, or cannot produce unique titles and descriptions, you need to replace it with a newer CMS. Using a reverse proxy that cloaks to fix those issues is a “bad idea.” Again: easy for Google, hard for the customer.

– Rich media sites. If you use sIFR (Scalable Inman Flash Replacement), SWFObject, JavaScript or CSS to render rich media content to the user while serving the same regular text markup to the search engine, then you are fine, because the checksums will be the same.

– Content behind forms. Google is experimenting with a bot that tries to pull content from behind basic forms by issuing HTTP GET requests with the values already listed in the HTML (a rough sketch of what such a bot might do appears after this list).

  • Membership sites

– Free and paid content. Google recommends we register our premium content with Google News’ First Click Free program. The idea is that you give searchers the first page of your content for free, and they register for the rest (sketched after this list). This is very practical for newspapers that have resorted to cloaking in the past. I do see a problem with this technique for sites like SEOmoz, where some of the premium pages are guides that cost money. If SEOmoz signed up for this service, I would be able to pull all the guides by guessing search terms that would bring them up in the results.

  • Site structure improvements

– Alternative to PageRank sculpting via “nofollow.” I explained a clever technique where you can cloak a different link path to robots than you present to regular users: the link path for users is focused on ease of navigation, while the link path for robots is focused on ease of crawling and deeper index penetration (also sketched after this list). This is very practical but not really mandatory.

  • Geolocation/IP delivery

– According to the post, we don’t need to worry about this. Some good news at last!

  • Multivariate testing

– This is a very interesting case, and I would have liked them to address it in the Webmaster Central post. Search engine robots don’t take part in these experiments because they don’t execute JavaScript, yet many users will see a different version of the page than the robot sees. Because the variations are applied client-side, the file served to the bot and the file served to the user produce the same checksum, even though the experiences differ. I’m sure some clever black hats are taking advantage of this to do “approved” cloaking. 🙂
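On the content-behind-forms experiment: as I read the announcement, the bot parses a simple GET form, takes the values already listed in the HTML (for example, the <select> options), and builds crawlable URLs from them. Here is a rough sketch of that idea using made-up markup; it reflects my reading, not Google's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Made-up page with a simple search form.
SAMPLE_HTML = """
<form action="/search" method="get">
  <select name="category">
    <option value="shoes">Shoes</option>
    <option value="hats">Hats</option>
  </select>
</form>
"""

class FormParser(HTMLParser):
    """Collect the GET form's action and the option values listed in the HTML."""
    def __init__(self):
        super().__init__()
        self.action = None
        self.field = None
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("method", "get").lower() == "get":
            self.action = attrs.get("action")
        elif tag == "select":
            self.field = attrs.get("name")
        elif tag == "option" and "value" in attrs:
            self.values.append(attrs["value"])

parser = FormParser()
parser.feed(SAMPLE_HTML)

base = "http://www.example.com/"
for value in parser.values:
    query = urlencode({parser.field: value})
    print(urljoin(base, parser.action) + "?" + query)
    # -> http://www.example.com/search?category=shoes, then hats
```

From what Google has said, the experiment sticks to plain GET requests and skips forms that ask for personal information, so there is nothing more exotic going on than URL construction like the above.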
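The First Click Free idea from the membership-site scenario comes down to one rule on the server: a visitor whose click came from a Google search result gets the full page once, and everyone else gets the teaser plus a registration form. Below is a minimal sketch of that rule with hypothetical inputs and a deliberately crude referrer check; the real program has its own requirements, so treat it purely as an illustration.

```python
def first_click_free(referrer, used_free_click, is_registered, full_article, teaser):
    """Pick which version of a premium page to serve (illustrative sketch only)."""
    came_from_google_serp = "google." in referrer  # crude referrer check, for illustration

    if is_registered:
        return full_article          # members always get the full page
    if came_from_google_serp and not used_free_click:
        return full_article          # the first click from the results page is free
    return teaser                    # otherwise: first page plus registration form


# A searcher lands on a premium guide straight from a Google result.
print(first_click_free(
    referrer="http://www.google.com/search?q=advanced+seo+guide",
    used_free_click=False,
    is_registered=False,
    full_article="<full guide>",
    teaser="<teaser + registration form>",
))  # -> <full guide>
```

It also makes the hole I mentioned obvious: anyone arriving from a Google result for any query that ranks the page gets the full content for free, which is exactly how the premium SEOmoz guides could be pulled.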
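Finally, for the site-structure scenario, the mechanics of a separate link path are nothing more than detecting the robot and rendering a different navigation block. The sketch below keys off the user agent with a made-up bot list and made-up URLs; a real setup would verify IPs (reverse DNS, for instance) since user agents are trivial to spoof. And remember, per Google's post, changing what the robot sees is exactly what puts you in the high-risk category, so take this as an illustration of the technique, not a recommendation.

```python
KNOWN_BOT_SIGNATURES = ("googlebot", "slurp", "msnbot")  # illustrative list

def navigation_links(user_agent):
    """Return the navigation block to render, depending on who is asking."""
    is_robot = any(sig in user_agent.lower() for sig in KNOWN_BOT_SIGNATURES)

    if is_robot:
        # Crawl-oriented path: flat links that push the spider deeper into the site.
        return ["/category/a/", "/category/b/", "/archive/2008/05/", "/archive/2008/06/"]
    # User-oriented path: a short menu focused on ease of navigation.
    return ["/", "/products/", "/blog/", "/contact/"]

print(navigation_links("Mozilla/5.0 (compatible; Googlebot/2.1)"))
print(navigation_links("Mozilla/5.0 (Windows; U; Windows NT 5.1)"))
```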

Google = Romulans
Just like the Romulans from Star Trek, Google doesn’t want cloaking technology in the hands of everyone. I didn’t get to talk about this in my presentation, but let me speculate as to why Google is drawing such a hard line on cloaking: Simply put, it is the easiest, cheapest and most scalable solution for them.
1. As a developer, I can tell you that running checksums on the content presented to Googlebot versus the content presented to their cloaking-detection bots is the easiest and most scalable way for them to detect cloaking.
2. As with the paid-links problem, it is easier to let us do all the work of labeling our sites so they can spot the bad guys without having to dedicate a huge amount of resources to the problem.
Enjoy the slides and feel free to ask any questions. If you were there at SMX Advanced and watched me present, please let me know your honest comments. Criticism can only help me improve. Let me know what you think of the slides, too. Originally, I had planned to use more graphics than text, but ultimately I thought that the advanced audience would appreciate the added information.

Hamlet Batista

Chief Executive Officer
