The Never Ending SERPs Hijacking Problem: Is there a definite solution?

by Hamlet Batista | July 03, 2007

In 2005 it was the infamous 302 (temporary redirect) page hijacking. That was supposedly fixed, according to Matt Cutts. Now there is an interesting new twist: hijackers have found another exploitable hole in Google, the use of cgi proxies to hijack search engine rankings.

The problem is basically the same: two URLs point to the same content. Google's duplicate content filters kick in and drop one of the URLs, normally the page with the lower PageRank. That is Google's core problem. It needs a better way to identify the original author of a page.

When someone blatantly copies your content and hosts it on their site, you can take the offending page down by sending a DMCA complaint to Google, et al. The problem with 302 redirects and cgi proxies is that there is no content being copied. They are simply tricking the search engine into believing there are multiple URLs hosting the same content.

What is a cgi proxy anyway? Glad you asked. I love explaining technical things 🙂

A cgi proxy is a type of proxy server that is accessible via URLs. is a well known example. Normal proxy servers are configured in your browser's advanced options. I explained proxy servers briefly in another post.

I am sure most hijackers are using a very well-known, public cgi proxy called CGIProxy, by James Marshall. I have used it in the past (not for hijacking) and I can say that it is very easy to set up and use. The SSL and connection code is a little complex if you are not familiar with socket programming and encryption, but the fact that it is written in Perl makes a lot of things easy.

The cgi proxy in the hijacking context works like this:

  1. Googlebot finds and fetches a proxied URL (http://cgiproxyserver/http/

  2. The CGI proxy script pulls the page from your site ( )

  3. The CGI script replaces all your internal links ( to http://cgiproxyserver/http/ ). This ensures that the search engine bot keeps requesting your pages through the cgi proxy.

  4. The page Google is fetching via the cgi proxy is identical to the page on your site.

  5. Duplicate content filters kick in and the rest is history
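To make step 3 concrete, here is a minimal sketch of the kind of link rewriting a proxy script performs. This is not CGIProxy's actual code (which is Perl and far more thorough). The function name `rewrite_links`, the host `www.example.com`, and the simplified proxy URL layout are all assumptions for illustration.

```python
import re


def rewrite_links(html: str, site_host: str, proxy_prefix: str) -> str:
    """Point every absolute href for site_host at the proxy instead.

    Hypothetical helper: a real proxy also rewrites relative links,
    forms, scripts, and so on. This only handles absolute href values.
    """
    # Match href="http://site_host or href='https://site_host
    pattern = re.compile(
        r'(href=["\'])https?://' + re.escape(site_host),
        re.IGNORECASE,
    )
    # Keep the href=" part, then insert the proxy prefix before the host.
    return pattern.sub(lambda m: m.group(1) + proxy_prefix + site_host, html)


page = '<a href="http://www.example.com/about.html">About</a>'
print(rewrite_links(page, "www.example.com", "http://cgiproxyserver/http/"))
# <a href="http://cgiproxyserver/http/www.example.com/about.html">About</a>
```

Because every link in the returned page now points back at the proxy, a crawler that follows them never leaves the proxied copy of your site.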

Should you care? If you don't mind losing your search engine rankings to a hijacker, there is nothing to worry about. If you do, please keep reading.

How can I prevent this from happening to me?

One solution that I've seen in the forums, and that Google recommends, is to verify that an IP address claiming to be Googlebot actually is Googlebot. Let me briefly explain the solution they propose, as well as its drawbacks.

The solution is what is known as a reverse-forward DNS check. Email servers have used this for a while to detect valid, non-spamming SMTP hosts.

The detection code does a reverse DNS lookup (IP(a) -> host(a)), followed by a forward DNS lookup (host(a) -> IP(b)). IP(a) and IP(b) must be the same, and the host name must include the robot's domain name ( This is similar to the double opt-in process used to validate email leads in email marketing.
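Here is a rough sketch of that check in Python, using only the standard `socket` module. The function name `is_verified_bot` and the suffix `.bot.example` are placeholders I made up; in practice you would pass whatever domain the search engine documents for its crawler. The two resolver parameters exist only so the logic can be tried without real DNS traffic, and this version handles IPv4 only.

```python
import socket


def is_verified_bot(ip, allowed_suffixes, reverse=None, forward=None):
    """Reverse-forward DNS check: IP(a) -> host(a) -> IP(b), with IP(a) == IP(b).

    allowed_suffixes is a tuple such as (".bot.example",) (placeholder value).
    reverse/forward default to real DNS lookups and can be swapped for fakes.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)          # reverse lookup: IP(a) -> host(a)
        addresses = forward(host)   # forward lookup: host(a) -> IP(b) list
    except OSError:
        # Covers socket.herror and socket.gaierror: no record, so not verified.
        return False
    host = host.lower().rstrip(".")
    if not any(host.endswith(suffix) for suffix in allowed_suffixes):
        return False                # host is outside the robot's domain
    return ip in addresses          # IP(a) must be among IP(b)


# Demo with fake resolvers (documentation-range IP, made-up host name).
print(is_verified_bot(
    "192.0.2.10",
    (".bot.example",),
    reverse=lambda ip: "crawl-1.bot.example",
    forward=lambda host: ["192.0.2.10"],
))
# True
```

Checking the suffix alone is not enough, because anyone who controls the reverse DNS for their own IP range can make it return any host name they like. The forward lookup is what ties the name back to the address.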

This is the best solution I could find in what I read on the forums. Unfortunately, there is a problem.

Doing this for every single hit, or even for every new IP address that reaches your site, is not a good idea. It would bring the server to its knees. The proposed solution therefore recommends that you identify the bots by user agent first and only then run the DNS check. Unfortunately, the user agent is very easy for hijackers to fake: the proxy can simply rewrite it before requesting your page.

# Perl: the proxy swaps the Googlebot user agent for a browser one before fetching your page
$USER_AGENT =~ s/Googlebot/Mozilla/ if $USER_AGENT =~ /Googlebot/;
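The gap is easy to see in a small sketch of the user-agent-gated check. The names `needs_dns_check` and `handle_request` are hypothetical, and `verify_ip` stands in for whatever reverse-forward DNS check you use.

```python
def needs_dns_check(user_agent: str) -> bool:
    """Only requests that claim to be Googlebot trigger the expensive check."""
    return "googlebot" in user_agent.lower()


def handle_request(user_agent: str, ip: str, verify_ip) -> str:
    """Return "serve" or "block" for one request (illustrative only)."""
    if needs_dns_check(user_agent):
        return "serve" if verify_ip(ip) else "block"
    # Anything that looks like a browser is served without verification.
    return "serve"


proxy_ip = "203.0.113.9"          # documentation-range IP, stands in for the proxy
never_verified = lambda ip: False  # the proxy's IP fails the DNS check

# Proxy forwards Googlebot's user agent unchanged: the check catches it.
print(handle_request("Googlebot/2.1", proxy_ip, never_verified))
# block

# Proxy rewrites the user agent first: the check never runs.
print(handle_request("Mozilla/5.0", proxy_ip, never_verified))
# serve
```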

The solution needs to be strengthened with IP address detection too. It is a little bit complicated but here is the main idea.

  1. Collect the IP addresses of all the hijacking proxy servers by setting up honeypots or similar traps. I will talk about this in more detail in a later post.

  2. Verify each IP against that database. For efficiency it is better to use a DNS server to maintain the IP database.

This is probably best done on a public server maintained by volunteers, similar to email anti-spam efforts.
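Email anti-spam blocklists already work this way: you reverse the octets of the IP address, append the blocklist's zone, and do an ordinary DNS lookup. If a record comes back, the IP is listed. Below is a minimal sketch of the same idea applied to proxy IPs. The zone `proxies.example.org` is made up (no such public list is implied), the function names are mine, and only IPv4 is handled.

```python
import ipaddress
import socket


def blocklist_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # validates the address
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """True if the blocklist zone has a record for this IP.

    resolve can be swapped for a fake so the logic runs without DNS traffic.
    """
    try:
        resolve(blocklist_query_name(ip, zone))
        return True
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError: not listed.
        return False


print(blocklist_query_name("203.0.113.7", "proxies.example.org"))
# 7.113.0.203.proxies.example.org
```

The nice part of keeping the database in DNS is that answers are cached by your resolver, so repeat visitors cost almost nothing to check.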

Happy anti-hijacking, as we wait for the ideal solution, which is for Google to fix it.






