The Never Ending SERPs Hijacking Problem: Is there a definite solution?

by Hamlet Batista | July 03, 2007 | 6 Comments

In 2005 it was the infamous 302 temporary-redirect page hijacking. That was supposedly fixed, according to Matt Cutts. Now there is an interesting new twist. Hijackers have found another exploitable hole in Google: the use of cgi proxies to hijack search engine rankings.

The problem is basically the same: two URLs pointing to the same content. Google's duplicate content filters kick in and drop one of the URLs, normally the page with the lower PageRank. That is Google's core problem. They need to find a better way to identify the original author of the page.

When someone blatantly copies your content and hosts it on their site, you can take the offending page down by sending a DMCA complaint to Google, et al. The problem with 302 redirects and cgi proxies is that there is no content being copied. They are simply tricking the search engine into believing there are multiple URLs hosting the same content.

What is a cgi proxy anyway? Glad you asked. I love explaining technical things 🙂

A cgi proxy is a type of proxy server that is accessible via URLs. Anonymizer.com is a well-known example. Normal proxy servers are configured in your browser's advanced options. I explained proxy servers briefly in another post.

I am sure most hijackers are using a very well-known and public cgi proxy called CGIProxy by James Marshall. I have used it in the past (not for hijacking) and I can say that it is very easy to set up and use. The SSL and connection code is a little bit complex if you are not familiar with socket programming and encryption, but the fact that it is written in Perl makes a lot of things easy.

The cgi proxy in the hijacking context works like this:

  1. Googlebot finds and fetches a proxied URL (http://cgiproxyserver/http/yoursite.com)

  2. The CGI proxy script pulls the page from your site (http://yoursite.com)

  3. The CGI script replaces all your internal links (http://yoursite.com/page1.htm becomes http://cgiproxyserver/http/yoursite.com/page1.htm). This is to make sure the search engine bot continues to request the pages from the CGI proxy.

  4. The page Google is fetching via the cgi proxy is identical to the page on your site.

  5. Duplicate content filters kick in and the rest is history
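To make step 3 concrete, here is a minimal Python sketch of the link rewriting. It is not CGIProxy's actual code (that is written in Perl and handles far more cases); the function name rewrite_links and the regular expression are my own assumptions, and the host names are the placeholders used in the steps above.

```python
import re

def rewrite_links(html: str, site: str, proxy_prefix: str) -> str:
    """Point every absolute link to `site` at the proxy instead (illustrative only)."""
    # Match http://site or https://site only when followed by a URL boundary,
    # so that a longer host such as yoursite.com.example is left alone.
    pattern = re.compile(r"https?://" + re.escape(site) + r"(?=[/\"'\s>]|$)")
    # A function replacement avoids any backslash processing in the prefix.
    return pattern.sub(lambda match: proxy_prefix + site, html)

page = '<a href="http://yoursite.com/page1.htm">Page 1</a>'
print(rewrite_links(page, "yoursite.com", "http://cgiproxyserver/http/"))
# prints: <a href="http://cgiproxyserver/http/yoursite.com/page1.htm">Page 1</a>
```

Because every internal link now points back at the proxy, a crawler that starts on one proxied page keeps discovering more proxied pages and never leaves the proxy.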

Should you care? If you don't mind losing your search engine rankings to a hijacker, there is nothing to worry about. If you do, please keep reading.

How can I prevent this from happening to me?

One solution that I've seen in the forums, and that Google recommends, is to verify that the IP claiming to be Googlebot actually is. Let me briefly explain the solution they propose, as well as its drawbacks.

The solution is what is known as a reverse-forward DNS check. Email servers have used this for a while to detect valid, non-spamming SMTP hosts.

The detection code does a reverse DNS lookup (IP(a) -> host(a)), followed by a forward DNS lookup (host(a) -> IP(b)). IP(a) and IP(b) must be the same, and the host name must end in the robot's domain name (google.com). This is similar to the double opt-in process used to validate email leads in email marketing.
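Here is a minimal Python sketch of that check, using only the standard socket module. The function name is_verified_bot and the reverse/forward parameters are my own assumptions; the parameters exist only so the lookups can be swapped out for testing. I also assume googlebot.com as an accepted domain next to google.com, since Googlebot's host names normally end in googlebot.com.

```python
import socket

def is_verified_bot(ip, domains=("googlebot.com", "google.com"),
                    reverse=socket.gethostbyaddr,
                    forward=socket.gethostbyname_ex):
    """Reverse-forward DNS check: IP(a) -> host(a) -> IP(b), with IP(a) == IP(b)."""
    try:
        # Reverse lookup: IP(a) -> host(a)
        host = reverse(ip)[0].rstrip(".").lower()
        # The host name must end in one of the robot's domains. A plain
        # substring test would also accept googlebot.com.evil.example.
        if not any(host == d or host.endswith("." + d) for d in domains):
            return False
        # Forward lookup: host(a) -> list of candidate IP(b) values
        addresses = forward(host)[2]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return False
    return ip in addresses
```

A call such as is_verified_bot(remote_ip) then returns True only when both lookups agree. Note that it makes two DNS queries per call, which is exactly the cost discussed next.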

This is the best solution I could take from what I read on the forums. Unfortunately there is a problem.

Doing this for every single hit or new IP address to your site is not a good idea. The server would be brought to its knees. The proposed solution recommends that you first identify the bots by user agent and only verify those. Unfortunately, that information is very easy for the hijackers to fake. The proxy only has to rewrite the user agent before forwarding the request:

$USER_AGENT =~ s/Googlebot/Mozilla/ if $USER_AGENT =~ /Googlebot/;

The solution needs to be strengthened with IP address detection too. It is a little bit complicated, but here is the main idea.

  1. Collect all the hijacking proxy servers' IPs by setting up honeypots or similar traps. I will talk about this in more detail in a later post.

  2. Verify each IP against that database. For efficiency it is better to use a DNS server to maintain the IP database.

This is probably better done on a public server run by volunteers, similar to email anti-spam efforts.
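As a rough idea of what a DNS-backed lookup for step 2 could look like, here is a Python sketch modeled on the way email DNS blocklists are usually queried: reverse the IP's octets, append the list's zone, and ask for an A record. The zone proxies.example.org is hypothetical (no such public list is implied), and the names blocklist_name and is_listed are my own.

```python
import socket

def blocklist_name(ip: str, zone: str) -> str:
    """Build the query name: reversed IPv4 octets followed by the list's zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and 0 <= int(o) <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {ip!r}")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip, zone="proxies.example.org", resolve=socket.gethostbyname):
    """Return True if the (hypothetical) proxy database has a record for `ip`."""
    try:
        resolve(blocklist_name(ip, zone))
        return True   # any A record means the IP is in the database
    except OSError:
        return False  # no record, or the lookup failed: treated as not listed

print(blocklist_name("192.0.2.10", "proxies.example.org"))
# prints: 10.2.0.192.proxies.example.org
```

The appeal of DNS here is that resolvers cache answers, so repeated hits from the same proxy cost almost nothing. This sketch treats a failed lookup as "not listed", which keeps the site reachable if the list is down but also lets hijackers through during an outage.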

Happy anti-hijacking, as we wait for the ideal solution, which is for Google to fix it.

 

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.
