In computer security we have several ongoing battles: the virus/spyware writers vs. the antivirus vendors, the spammers vs. the anti-spam vendors, the hackers vs. the security experts. Add to that list the search engine marketers vs. the CGI hijackers.
Dan Thies, the undisputed keyword research master, used his influence in the search engine marketing industry to bring a problem we have blogged about in the past to a wider audience: CGI proxy hijacking. He mentioned a couple of solutions, but as I pointed out in my comment, both have weaknesses. I recommended a stronger countermeasure, similar to what the anti-spam industry uses at the moment. But after reflecting on my proposed solution and others’, it is clear to me that this is a never-ending battle. We can create defenses against current techniques, and attackers will adapt and make their attacks smarter.
Why? All the content and headers must pass through the proxy, and the proxy can alter them without difficulty. A determined hijacker will be able to circumvent any defense. If we check the HTTP_USER_AGENT, the proxy can send a fake one to avoid detection. If we alter the page to include a meta robots “noindex” tag, the proxy can remove it. The same can happen if we send an X-Robots-Tag header.
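To make the point concrete, here is a minimal Python sketch of the noindex defense and of how little a modified proxy needs to do to undo it. The function names, the `is_verified_crawler` flag, and the sample markup are assumptions made for illustration; they do not come from any real proxy script or from the solutions discussed above.

```python
import re

NOINDEX_META = '<meta name="robots" content="noindex">'

def origin_response(html, is_verified_crawler):
    """Origin-side defense (hypothetical): mark the page noindex for any
    client we have not verified as a real search engine crawler."""
    headers = {"Content-Type": "text/html"}
    if not is_verified_crawler:
        headers["X-Robots-Tag"] = "noindex"
        html = html.replace("<head>", "<head>" + NOINDEX_META, 1)
    return headers, html

# Simplistic pattern: only matches when name="robots" is the first attribute.
ROBOTS_META_RE = re.compile(r'<meta\s+name=["\']robots["\'][^>]*>', re.IGNORECASE)

def proxy_rewrite(headers, html):
    """What a modified proxy could do before relaying the response:
    drop the X-Robots-Tag header and strip the meta robots tag."""
    cleaned = {k: v for k, v in headers.items() if k.lower() != "x-robots-tag"}
    return cleaned, ROBOTS_META_RE.sub("", html)

page = "<html><head><title>My post</title></head><body>Hello</body></html>"
headers, marked = origin_response(page, is_verified_crawler=False)
relayed_headers, relayed = proxy_rewrite(headers, marked)
```

After `proxy_rewrite` runs, the relayed page is identical to the unprotected original, so the crawler visiting the proxy URL never sees either signal.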
The solution I proposed takes more work for the hijacker to beat, but it can definitely be broken too. It requires altering the content, and the proxy can identify the altered content and remove it, which would make collecting the IPs impossible. For example, to tell what has changed, the proxy's code can compare the content to the copy cached by the search engine or, even better, serve the content directly from the search engine cache.
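As a rough illustration of the comparison step, the sketch below uses Python's standard `difflib` to list the lines that appear in the live page but not in a cached copy. The marker format and the sample pages are hypothetical, and how a proxy would obtain the cached copy is left out.

```python
import difflib

def injected_lines(cached_html, live_html):
    """Return lines present in the live page but absent from the cached copy."""
    diff = difflib.ndiff(cached_html.splitlines(), live_html.splitlines())
    return [line[2:] for line in diff if line.startswith("+ ")]

# Hypothetical pages: the live version carries an extra tracking marker.
cached = "<p>Hello</p>\n<p>World</p>"
live = "<p>Hello</p>\n<!-- marker:abc123 -->\n<p>World</p>"

suspicious = injected_lines(cached, live)
```

Anything returned by `injected_lines` is a candidate for removal before the proxy relays the page, which is why altering the content only raises the cost of the attack.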
Dan is confident that most attacks will not come from modified proxies, but from hijackers using other people's unmodified proxies. To avoid being identified, they would not install the proxies themselves. The problem is that serious hackers rarely use their own systems; they use compromised ones. They first hack into servers where the administrator has not installed the latest security patches, or where there are web applications with exploitable holes.
In principle, we need to apply the same concept that is used in security in general: make the attack hard enough that the reward isn’t worth the effort involved. Sometimes this is easier said than done.
The bottom line is that we can go back and forth battling CGI hijackers, but it is ultimately Google that needs to fix this problem. They need to change the method they use to determine the original source of some content. I proposed a solution to them in another post. I'd appreciate your feedback.
September 3, 2007 at 10:32 am
No question about it. I remember your post, "Can a black-hat take down your site?" Ironically, just after that article, I was hit by a barrage of content thieves. The duplicate content problem is one of only two ways I know of in which you can adversely affect someone's website. I have been hearing a lot about this 'wiki-hijacking' recently in particular. There are serious problems with the duplicate content system Google utilises. I think they should just get rid of it for checking against domains, but keep it for checking content that is very similar on the same site, such as geographic spam pages :P
October 2, 2007 at 7:11 am
Dear Hamlet, As usual your posts are profound and suggest a myriad of solutions that Google should consider incorporating within their "organic" guidelines. I have a question that hopefully you have the answer to: Is there a way to program or block one's site from displaying your PPC advertising programs? When I was analyzing my traffic, I noticed that someone has been spying on my PPC keywords-- and as a result I have had to pause my campaigns.
October 2, 2007 at 8:12 am
Hi Holly, Thanks for your kind words. The problem is that they don't need access to your account to spy on your PPC keywords; there are several services out there that can tell the organic and PPC keywords of any website. There is not much you can do about that. Don't pause your campaigns because of this, though. Send me an email or give me a call if you want to discuss further. Best, Hamlet