Making the world (and your site) flat—via a Reverse Proxy

by Hamlet Batista | September 24, 2007 | 8 Comments

In order to protect some of the inventions in our software, I’ve been working with a law firm that specializes in IP protection. I’ve learned a lot from them, but I’ve learned far more from reviewing the patent applications they sent me back as possible ‘prior art.’ Let me share one of the most interesting ones I’ve seen so far, Patent Application 20070143283. Here is the abstract:

A system and method for optimizing the rankings of web pages of a commercial website within search engine keyword search results. A proxy website is created based on the content on the commercial website. When a search engine spider reaches the commercial website, the commercial website directs the search engine spider to the proxy website. The proxy website includes a series of proxy web pages that correspond to web pages on the commercial website along with modifications that enhance the rankings of the pages by the search engines. However, hyperlinks containing complex, dynamic URLs are replaced with spider-friendly versions. When a human visitor selects a proxy web page listing on the search engine results page, that visitor is directed to the proxy web page. The proxy server delivers the same content to the human visitor as to the search engine spider, only with simplified URLs for the latter.

Basically they use a reverse proxy (I wrote about this before) to replace dynamic URLs with search engine–friendly ones automatically. In addition to this, they make ‘enhancements’ to the proxy version of the pages so that they get high search engine rankings. They claim this is not cloaking:

[0014] The content contained on the proxy web pages is the same when the proxy web page is accessed either by the search engine spider or by the human visitor. The presentation of the same web page content to both the search engine spider and the human visitor allows the proxy website to stay within the ‘no cloaking’ guidelines set by most commonly used search engines.

If they rewrote only the dynamic URLs I would agree that they present the same content to both users and search engines. I don’t think as users we care much about the URLs, unless we have to type them manually into the address bar. However, I do think it is cloaking because they say changes are made to the pages in order to optimize them for higher search engine rankings—and they only present these optimized pages to the search engine crawlers. From the patent application:

[0015] Since the proxy web pages are contained on a proxy website separate from the commercial website, additional content and HTML optimization can be added to the proxy web pages that are not included on the corresponding web pages on the commercial site, via a web-based interface. The addition of this content and HTML optimization on the proxy web pages can be utilized to enhance the ranking of the proxy web pages on the search engine results pages. The effect of the addition of these optimizations on ranking can be analyzed and the content can then be revised to further enhance the ranking of the proxy web page. By utilizing the proxy web pages rather than the web pages contained on the commercial website, the rankings and functionality of the proxy web pages can be enhanced without altering the commercial web pages.

That being said, I think this is a very useful and clever technique. Rewriting dynamic URLs to make them search engine friendly via a reverse proxy is extremely useful, particularly for large e-commerce sites where the CMS or shopping cart software is not flexible enough.
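To make the URL rewriting idea concrete, here is a minimal sketch in Python of the link-rewriting step a reverse proxy could apply to each page before serving it. The URL scheme (`/product.php?cat=12&id=345` becoming `/product/cat-12/id-345`) and the function names `friendly_path` and `rewrite_links` are my own assumptions for illustration; the patent application does not specify a format.

```python
import html
import re
from urllib.parse import parse_qsl, quote, urlsplit, urlunsplit


def friendly_path(dynamic_url: str) -> str:
    """Turn a dynamic URL into a static-looking one (hypothetical scheme).

    /product.php?cat=12&id=345  ->  /product/cat-12/id-345
    """
    parts = urlsplit(dynamic_url)
    # Drop the script extension: /product.php -> /product
    base = re.sub(r"\.\w+$", "", parts.path).rstrip("/")
    # Sort parameters so the same page always gets the same friendly URL.
    segments = [
        f"{quote(key, safe='')}-{quote(value, safe='')}"
        for key, value in sorted(parse_qsl(parts.query))
    ]
    path = "/".join([base] + segments)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def rewrite_links(page: str) -> str:
    """Replace every href that carries a query string with its friendly form."""
    def replace(match: re.Match) -> str:
        # hrefs in HTML usually encode "&" as "&amp;", so decode first.
        url = html.unescape(match.group(1))
        return f'href="{friendly_path(url)}"'

    return re.sub(r'href="([^"]*\?[^"]*)"', replace, page)
```

A real proxy would also need the inverse mapping, so that a request for the friendly URL can be translated back into the dynamic URL the shopping cart understands. A lookup table filled in while rewriting is probably the simplest way to do that.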

Here is another interesting use that came to my mind and is not mentioned in the PA. (Maybe I should file a patent for this.) 😉

If only the world were flat

Picture a tiered site architecture where you have a home page and tiered internal pages. Tier 1 includes pages that the search engine robots can reach in one click from the home page; tier 2 pages are reachable in two clicks, tier 3 pages in three clicks, and so on. Search engine spiders visit a limited number of pages per site and follow a limited number of clicks from the entrance page (usually the home page). The more clicks it takes to arrive at a page, the less likely it is that the page will be crawled or indexed. Ideally you would like to have a flat site architecture where all the pages are in tier 1. Unfortunately, while this is good for search engines, it is not very appealing for your site’s visitors. Imagine how crowded your home page would look with so many links!
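The tier of a page is simply its click depth from the home page, which a breadth-first traversal of the site's link graph gives you directly. Here is a minimal sketch in Python; the `links` dictionary (page URL to the list of URLs it links to) and the function name `assign_tiers` are assumptions for illustration, and a real script would build that graph by crawling the site.

```python
from collections import deque


def assign_tiers(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Return the minimum number of clicks needed to reach each page.

    The home page is tier 0, pages it links to are tier 1, and so on.
    Pages that cannot be reached from the home page are left out.
    """
    tiers = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first visit is always via the shortest path in a BFS.
            if target not in tiers:
                tiers[target] = tiers[page] + 1
                queue.append(target)
    return tiers


# Hypothetical four-level site.
site = {
    "/": ["/shoes", "/hats"],
    "/shoes": ["/shoes/running"],
    "/shoes/running": ["/shoes/running/model-x", "/"],
    "/hats": [],
}
print(assign_tiers(site))
```

In this example `/shoes/running/model-x` ends up in tier 3, so a spider that gives up after two clicks would never see it.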

An automatic solution

In the initial step, a simple crawler script visits the whole site and tags each page with its corresponding tier: tier 1, tier 2, tier 3, etc. The script records this information in a database. When a search engine requests a tier 1 page via the reverse proxy, the proxy can inject links to the pages in the next tier that is not directly reachable. That is tier 3, because the robot already finds the tier 2 links when it parses the tier 1 page. The same applies to deeper tiers: a request for a tier 2 page gets links to tier 4, and so on. This provides a flatter structure for the search engine robot, allowing more pages to be indexed and saving bandwidth and CPU cycles for the search engines' crawlers. Alternatively, the proxy can inject links to all internal pages beyond the next tier (tier 3, tier 4, etc.) when the search engine robot requests a tier 1 page. This would make the site completely flat.
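Here is a minimal sketch in Python of the injection step, assuming the tiers have already been computed and loaded into a dictionary that maps each URL to its tier number. The names `links_to_inject` and `inject_links`, and the choice to append the links as a plain list just before `</body>`, are assumptions for illustration only.

```python
from html import escape


def links_to_inject(tiers: dict[str, int], requested: str,
                    flatten_all: bool = False) -> list[str]:
    """Pick the deeper pages to link from the requested page.

    By default, only pages two tiers below the requested page are returned
    (tier 3 for a tier 1 request). With flatten_all=True, every page two or
    more tiers below is returned, which makes the site completely flat.
    """
    skip_to = tiers[requested] + 2  # the next tier is already linked directly
    if flatten_all:
        return sorted(url for url, tier in tiers.items() if tier >= skip_to)
    return sorted(url for url, tier in tiers.items() if tier == skip_to)


def inject_links(page: str, urls: list[str]) -> str:
    """Append a list of links just before </body> (or at the end if absent)."""
    if not urls:
        return page
    items = "".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    block = f"<ul>{items}</ul>"
    position = page.lower().rfind("</body>")
    if position == -1:
        return page + block
    return page[:position] + block + page[position:]
```

For example, with `tiers = {"/": 0, "/a": 1, "/a/b": 2, "/a/b/c": 3, "/a/b/c/d": 4}`, a request for `/a` gets a link to `/a/b/c` by default, and links to both `/a/b/c` and `/a/b/c/d` with `flatten_all=True`.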

This is definitely very useful, but as I clearly explained above, this is cloaking. In my last post about cloaking Jill Whalen and others expressed concern that Google’s view of this is still negative. It is my personal opinion that Google needs to draw a line between the legitimate uses of cloaking and cloaking to take advantage of search engines. In order to stay on the safe side it is not a bad idea to ask Google if they are OK with this.

Update

After reading an insightful comment from Sam Daams and re-reading the PA, I have to admit I was wrong in my initial assessment that they are cloaking. They are presenting the same 'optimized' content to the search engine spider and to the user coming from the search results. I guess this is technically not cloaking. However, if a user goes directly to the web page, he or she will see the original, non-optimized version. When Google says don't cloak to users, are they talking about search engine users or any regular visitor? Let me know your thoughts on this.

 

