Making the world (and your site) flat—via a Reverse Proxy

by Hamlet Batista | September 24, 2007 | 8 Comments

In order to protect some of the inventions in our software, I’ve been working with a law firm that specializes in IP protection. I’ve learned a lot from them, but I’ve learned far more from reviewing the patent applications they sent me back as possible ‘prior art.’ Let me share one of the most interesting ones I’ve seen so far, Patent Application 20070143283. Here is the abstract:

A system and method for optimizing the rankings of web pages of a commercial website within search engine keyword search results. A proxy website is created based on the content on the commercial website. When a search engine spider reaches the commercial website, the commercial website directs the search engine spider to the proxy website. The proxy website includes a series of proxy web pages that correspond to web pages on the commercial website along with modifications that enhance the rankings of the pages by the search engines. However, hyperlinks containing complex, dynamic URLs are replaced with spider-friendly versions. When a human visitor selects a proxy web page listing on the search engine results page, that visitor is directed to the proxy web page. The proxy server delivers the same content to the human visitor as to the search engine spider, only with simplified URLs for the latter.

Basically they use a reverse proxy (I wrote about this before) to replace dynamic URLs with search engine–friendly ones automatically. In addition to this, they make ‘enhancements’ to the proxy version of the pages so that they get high search engine rankings. They claim this is not cloaking:

[0014] The content contained on the proxy web pages is the same when the proxy web page is accessed either by the search engine spider or by the human visitor. The presentation of the same web page content to both the search engine spider and the human visitor allows the proxy website to stay within the ‘no cloaking’ guidelines set by most commonly used search engines.

If they rewrote only the dynamic URLs, I would agree that they present the same content to both users and search engines. I don’t think we, as users, care much about URLs unless we have to type them into the address bar by hand. However, I do think this is cloaking, because they say changes are made to the pages to optimize them for higher search engine rankings, and they present these optimized pages only to the search engine crawlers. From the patent application:

[0015] Since the proxy web pages are contained on a proxy website separate from the commercial website, additional content and HTML optimization can be added to the proxy web pages that are not included on the corresponding web pages on the commercial site, via a web-based interface. The addition of this content and HTML optimization on the proxy web pages can be utilized to enhance the ranking of the proxy web pages on the search engine results pages. The effect of the addition of these optimizations on ranking can be analyzed and the content can then be revised to further enhance the ranking of the proxy web page. By utilizing the proxy web pages rather than the web pages contained on the commercial website, the rankings and functionality of the proxy web pages can be enhanced without altering the commercial web pages.

That being said, I think this is a clever and practical technique. Rewriting dynamic URLs via a reverse proxy to make them search engine friendly is extremely useful, particularly for large e-commerce sites where the CMS or shopping cart software is not flexible enough.
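As a rough illustration of the URL-rewriting idea (a minimal sketch, not the patented system), the reverse proxy below fetches pages from a backend and rewrites dynamic product URLs in the outgoing HTML into spider-friendly paths, mapping the friendly paths back to the dynamic ones on incoming requests. The ORIGIN host and the product.php URL pattern are hypothetical examples.

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

ORIGIN = "http://internal-origin.example.com"  # hypothetical backend host

# e.g. /product.php?cat=shoes&id=42  ->  /shoes/42/
DYNAMIC_URL = re.compile(r'/product\.php\?cat=([\w-]+)&(?:amp;)?id=(\d+)')

class RewritingProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Map a friendly URL back to the dynamic one the backend expects
        match = re.match(r'^/([\w-]+)/(\d+)/$', self.path)
        backend_path = (f"/product.php?cat={match.group(1)}&id={match.group(2)}"
                        if match else self.path)

        with urllib.request.urlopen(ORIGIN + backend_path) as resp:
            body = resp.read().decode("utf-8", errors="replace")

        # Rewrite dynamic links in the outgoing HTML to the friendly form
        body = DYNAMIC_URL.sub(r'/\1/\2/', body)

        payload = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("", 8080), RewritingProxy).serve_forever()
```

Because the rewriting happens in the proxy layer, the CMS or shopping cart behind it never has to change.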

Here is another interesting use that came to mind and is not mentioned in the PA. (Maybe I should file a patent for this.) 😉

If only the world were flat

Picture a tiered site architecture with a home page and tiered internal pages. Tier 1 contains the pages that search engine robots can reach in one click from the home page; tier 2 contains pages reachable in two clicks, tier 3 in three clicks, and so on. Search engine spiders visit a limited number of pages per site and follow a limited number of clicks from the entrance page (usually the home page). The more clicks it takes to reach a page, the less likely that page is to be crawled or indexed. Ideally you would have a flat site architecture with all pages in tier 1. Unfortunately, while this is good for search engines, it is not very appealing to your site’s visitors. Imagine how crowded your home page would look with so many links!
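To make the notion of tiers concrete, here is a small illustration (mine, not from the patent application): a breadth-first walk from the home page assigns each page the minimum number of clicks needed to reach it. The link graph is a made-up example.

```python
from collections import deque

# Hypothetical site structure: page -> pages it links to
links = {
    "/": ["/category-a", "/category-b"],
    "/category-a": ["/product-1", "/product-2"],
    "/category-b": ["/product-3"],
    "/product-1": [],
    "/product-2": [],
    "/product-3": ["/product-deep"],
    "/product-deep": [],
}

def tag_tiers(start="/"):
    """Assign each page its tier: the minimum click count from the home page."""
    tier = {start: 0}  # the home page itself; pages it links to are tier 1
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in tier:  # first visit is the shortest click path
                tier[target] = tier[page] + 1
                queue.append(target)
    return tier

print(tag_tiers())
# {'/': 0, '/category-a': 1, '/category-b': 1, '/product-1': 2,
#  '/product-2': 2, '/product-3': 2, '/product-deep': 3}
```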

An automatic solution

In the initial step, a simple crawler script visits the whole site and tags each page with its corresponding tier: tier 1, tier 2, tier 3, and so on, recording that information in a database. When a search engine requests a tier 1 page via the reverse proxy, the proxy can inject the URLs of the pages in the next non-adjacent tier (tier 3; tier 2 pages are already reachable when the robot parses the tier 1 page), and so on down the hierarchy. This gives the search engine robot a flatter structure, allowing more pages to be indexed while saving bandwidth and CPU cycles for the search engines’ crawlers. Alternatively, when the robot requests a tier 1 page, the proxy can inject links to all internal pages beyond the next tier (tier 3, tier 4, etc.), which would make the site completely flat.
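Here is a hedged sketch of the injection step, reusing the tier-map idea from the crawler sketch above. The TIERS table, bot signatures, and HTML splicing are simplified stand-ins for the database lookup and robot detection a real deployment would need.

```python
# TIERS stands in for the database the crawler script would populate.
TIERS = {
    "/category-a": 1,
    "/product-1": 2,
    "/product-deep": 3,  # normally three clicks from the home page
}

BOT_SIGNATURES = ("Googlebot", "Slurp", "msnbot")  # sample spider user agents

def is_search_engine(user_agent: str) -> bool:
    return any(sig in user_agent for sig in BOT_SIGNATURES)

def inject_deeper_links(html: str, path: str, user_agent: str) -> str:
    """For robots only: a tier-n page gains links to tier n+2 pages,
    which would otherwise sit two extra clicks away."""
    if not is_search_engine(user_agent):
        return html  # human visitors receive the page unchanged
    tier = TIERS.get(path)
    if tier is None:
        return html
    deeper = [p for p, t in TIERS.items() if t == tier + 2]
    extra = "".join(f'<a href="{p}">{p}</a>\n' for p in deeper)
    return html.replace("</body>", extra + "</body>")

# A spider fetching the tier-1 category page also receives the tier-3 link
print(inject_deeper_links("<html><body>...</body></html>",
                          "/category-a", "Googlebot/2.1"))
```

Note that the user-agent check is precisely what makes this content differ for robots and humans, which is why the verdict below matters.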

This is definitely very useful, but as I explained above, it is cloaking. In my last post about cloaking, Jill Whalen and others expressed concern that Google’s view of this is still negative. My personal opinion is that Google needs to draw a line between legitimate uses of cloaking and cloaking that takes advantage of search engines. To stay on the safe side, it is not a bad idea to ask Google whether they are OK with this.

Update

After reading an insightful comment from Sam Daams and re-reading the PA, I have to admit my initial assessment that they are cloaking was wrong. They present the same ‘optimized’ content to the search engine spider and to the user coming from the search results, so I guess this is technically not cloaking. However, a user who goes directly to the web page will see the original, non-optimized version. When Google says don’t cloak to users, are they talking about search engine users or any regular visitor? Let me know your thoughts on this.


Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.
