A Never-ending Battle — Protecting your content from CGI hijackers

by Hamlet Batista | September 03, 2007

In computer security we have several ongoing battles: the virus/spyware writers vs. the antivirus vendors, the spammers vs. the anti-spam vendors, the hackers vs. the security experts. Add to that list the search engine marketers vs. the CGI hijackers.

Dan Thies, the undisputed keyword research master, used his influence in the search engine marketing industry to bring a problem we have blogged about in the past to a wider audience: CGI proxy hijacking. He mentioned a couple of solutions but, as I pointed out in my comment, both have weaknesses. I recommended a stronger countermeasure, similar to what the anti-spam industry uses at the moment. After reflecting on my proposed solutions and others’, however, it is clear to me that this is a never-ending battle. We can build defenses against current techniques, and attackers will adapt and make their attacks smarter.

Why? All the content and headers must pass through the proxy, and the proxy can alter them at will. A determined hijacker will be able to circumvent any defense. If we check HTTP_USER_AGENT, the proxy can send a fake one to avoid detection. If we alter the page to include a meta robots “noindex” tag, the proxy can remove it. The same applies if we send an X-Robots-Tag header. Every page passes through the proxy, and the proxy can alter it.
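To make the argument concrete, here is a minimal Python sketch of both sides. It is only an illustration under stated assumptions: the function names, the crawler token list, and the sample user agents are made up for this example and are not taken from Dan's post or from any real proxy script.

```python
import re

# Assumed, illustrative list of crawler tokens; not an authoritative one.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")
META_NOINDEX = '<meta name="robots" content="noindex">'


def looks_like_crawler(user_agent):
    """Naive check: trusts whatever the client sends as its user agent."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def defend(user_agent, headers, html):
    """Origin-side defense: ask anything that is not a crawler not to index."""
    if looks_like_crawler(user_agent):
        return dict(headers), html
    protected = dict(headers)
    protected["X-Robots-Tag"] = "noindex"
    return protected, html.replace("<head>", "<head>" + META_NOINDEX, 1)


def hijacking_proxy(headers, html):
    """What a modified proxy can do before re-serving the page."""
    cleaned = {k: v for k, v in headers.items() if k.lower() != "x-robots-tag"}
    meta_robots = re.compile(r'''<meta[^>]*name=["']robots["'][^>]*>''', re.IGNORECASE)
    return cleaned, meta_robots.sub("", html)


page = "<html><head><title>My post</title></head><body>Original content</body></html>"

# A proxy that fetches with its own user agent receives the noindex signals...
headers, html = defend("SomeProxy/1.0", {"Content-Type": "text/html"}, page)
# ...but a modified proxy simply strips them again.
stripped_headers, stripped_html = hijacking_proxy(headers, html)
# A proxy that fakes the user agent never receives them in the first place.
spoofed_headers, spoofed_html = defend("Googlebot/2.1", {"Content-Type": "text/html"}, page)
```

In this sketch both `stripped_html` and `spoofed_html` end up identical to the original page, which is the weakness described above: the defense only holds as long as the proxy passes everything through untouched.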

The solution I proposed takes more work for a hijacker to beat, but it can definitely be broken too. It requires us to alter the content, and a proxy that can identify the altered content can remove it, which would make collecting the IPs impossible. For example, to tell what has changed, the proxy’s code can compare the content to the copy cached by the search engine or, even better, serve the content directly from the search engine cache.
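The comparison step could look roughly like the following Python sketch. It assumes the defense injects a marker line into each response and that the hijacker already holds a cached copy of the page; the function names, the marker format, and the sample pages are hypothetical and only serve to illustrate the idea.

```python
import difflib


def injected_lines(cached_html, live_html):
    """Return lines present in the live response but absent from the cached copy."""
    diff = difflib.ndiff(cached_html.splitlines(), live_html.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]


def strip_injected(cached_html, live_html):
    """Drop the added lines so the re-served page matches the cached copy."""
    added = set(injected_lines(cached_html, live_html))
    return "\n".join(line for line in live_html.splitlines() if line not in added)


# Hypothetical cached copy and live response; the marker format is made up.
cached = "<html>\n<body>\n<p>Original article text.</p>\n</body>\n</html>"
live = (
    "<html>\n<body>\n<p>Original article text.</p>\n"
    "<!-- marker: 203.0.113.7 -->\n</body>\n</html>"
)

found = injected_lines(cached, live)
cleaned = strip_injected(cached, live)
```

A line-based diff is the simplest possible choice here. A marker woven into existing text would take a finer-grained comparison to find, which raises the cost for the hijacker but does not change the outcome.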

Dan is confident that most attacks will come not from modified proxies but from hijackers using other people's unmodified proxies, since hijackers would not install proxies themselves for fear of being identified. The problem is that serious hackers rarely use their own systems; they use compromised ones. They first break into servers where the administrator has not installed the latest security patches, or where web applications have exploitable holes, and nothing stops them from running a modified proxy there.

In principle, the same concept that applies to security in general applies here: we need to make the attack hard enough that the reward isn’t worth the effort involved. Sometimes this is easier said than done.

The bottom line is that we can go back and forth battling CGI hijackers, but it is ultimately Google that needs to fix this problem. They need to change the method they use to determine the original source of a piece of content. I proposed a solution to them in another post. I'd appreciate your feedback.

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.


