First, let me thank my beloved reader at SEO Blog.
Thanks to him I got a really nice bump in traffic and several new RSS subscribers.
It is really funny how people who don't know you start questioning your knowledge, calling you names, etc. I am glad that I don't take things personally. For me, it was a great opportunity to get my new blog some exposure.
I did not intentionally try to be controversial. I did run a backlink check on John's site and found the interesting results I reported. I am still more inclined to believe that my theory has more grounds than SEO Blog's. Please keep reading to learn why.
His theory is that John fixed the problem by making some substantial changes to his robots.txt file. I am really glad that he finally decided to dig for evidence. This is far more professional than calling people you don't know names.
I carefully checked both robots.txt files, and here is what John removed in the new version:
# Disallow all monthly archive pages
Disallow: /2005/12
Disallow: /2006/01
Disallow: /2006/02
Disallow: /2006/03
Disallow: /2006/04
Disallow: /2006/05
Disallow: /2006/06
Disallow: /2006/07
Disallow: /2006/08
Disallow: /2006/09
Disallow: /2006/10
Disallow: /2006/11
Disallow: /2006/12
Disallow: /2007/01
Disallow: /2007/02
Disallow: /2007/03
Disallow: /2007/04
Disallow: /2007/05

# The Googlebot is the main search bot for google
User-agent: Googlebot

# Disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# Disallow Google from parsing indididual post feeds and trackbacks..
Disallow: */feed/
Disallow: */trackback/

# Disallow all files with ? in url
Disallow: /*?*
Disallow: /*?

# Disallow all archived monthlies
Disallow: /2006/0*
Disallow: /2007/0*
Disallow: /2005/1*
Disallow: /2006/1*
Disallow: /2007/1*
In English, this means he is now letting Google crawl and index his archived articles, dynamic pages, and files ending with ".php", ".js", ".inc", ".css", etc. Note that in neither robots.txt file is John preventing the crawler from accessing his home page or his regular posts. WordPress uses PHP, but regular posts and the home page can be accessed without ".php" in the URL.
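To see why those removed rules only touched the archives, feeds, and file extensions but not the permalinks, you can simulate how Googlebot interprets them. Note that the `*` and `$` wildcards in these rules are a Googlebot extension, not part of the original robots.txt convention, so a sketch has to translate them by hand (the function and rule names below are mine, just for illustration):

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern with Googlebot's extensions
    ('*' matches any character sequence, '$' anchors the end of the URL)
    into a compiled regex matched from the start of the path."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.compile(regex)

def is_blocked(path, disallow_patterns):
    """A path is blocked if any Disallow pattern matches its beginning."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)

# A few of the rules John removed:
rules = ["/*.php$", "/*?", "/2006/0*", "*/feed/"]

print(is_blocked("/wp-login.php", rules))          # True: ends with .php
print(is_blocked("/2006/05/some-post/", rules))    # True: monthly archive
print(is_blocked("/my-pretty-permalink/", rules))  # False: a plain post URL
```

The last line is the point: a regular WordPress permalink never matched any of the old Disallow rules, so the posts themselves were always crawlable under both versions of the file.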
If this was the change that fixed the problem, it might be because hiding those internal pages from the spider had weakened his internal link structure. His claim is not without merit.
Now, here is one tiny little detail that my friend is missing. To prove his point, he used Google's cache to show the old version of the robots.txt file. If Google still has that version in its cache, what makes him think that Google is already using the new one? Google should be caching the new version, not the old one. That is why I am still not convinced that this is the reason for the fix.
John says he is not telling, because a reader said Google might change their algorithm and drop him again. How do the changes John made to his robots.txt file have anything to do with algorithm changes? I am just curious.
In reality, we can theorize all we want, but the only ones who can tell for sure are the guys at the Googleplex. John probably tried many different things, and one or several of them worked. He is probably not even sure which one did.
How did I learn SEO?
SEO Blog suggests I visit his forum to learn SEO. Here is the problem with that: I am a technical guy, and I cannot take gut feelings or opinions as truth. I do visit some forums and blogs every now and then, but my experience is that the signal-to-noise ratio is too low. I prefer to learn and get my insights from the source: search engine research papers, search engine representatives' blogs, or my own experiments.
I learned SEO back in 2002 when I read this paper. Back then, nobody was even talking about Google bombs, anchor text, etc. Read the paper; it is all there.