Additional Ways to Use Chrome Developer Tools for SEO

I recently read Aleyda’s excellent post about using Chrome Developer Tools for SEO, and as I’m also a big fan of DevTools, I am going to share my own use cases.

Misplaced SEO tags

If you are reviewing correct SEO tag implementations by only using View Source, or running an SEO spider, you might be overlooking an important and interesting issue.

I call this issue misplaced DOM tags, and here is one example http://www.homedepot.com/p/Kidde-Intelligent-Battery-Operated-Combination-Smoke-and-CO-Alarm-Voice-Warning-3-Pack-per-Case-KN-COSM-XTR-BA/202480909

If you check this page using View Source in Chrome or any other browser, you would see the canonical tag correctly placed inside the <HEAD> HTML element.

Similarly, if you check this page using your favorite SEO spider, you’d arrive at the same conclusion. The canonical tag is inside the <HEAD> HTML element, where it should be.

Now, let’s check again using the Chrome Developer Tools Elements tab.

Wait! What?? Surprisingly, the canonical tag appears inside the <BODY> HTML element. This is incorrect, and if this is what Googlebot sees, the canonical tag on this page is effectively useless. Then we go blaming the poor tag saying that it doesn’t work.

Is this a bug in Google Chrome Developer Tools? Let’s review the same page with Firefox and Safari Developer Tools.

You can see the same issue is visible in Firefox and Safari too, so we can safely conclude that it is not a problem with Developer Tools. It is very unlikely all of them would have the same bug. So why is this happening? Does The Home Depot need to fix this?

Let’s first look at how to fix this to understand why it happens.

We are going to save a local copy of this page using the popular command line tool curl. I will explain why it is better to use this tool than to save directly from Chrome.

Once we download the web page, open it in any of the browsers to confirm the problem is still visible in the DevTools. In my case, I didn’t see the issue in Chrome, but saw it in Safari. I’ll revisit why the discrepancy when we discuss why this happens.

Next, in order to correct the issue we will move the SEO meta tags so they are the first tags right after the opening <HEAD> HTML tag.

Now, let’s reload the page in Safari to see if the canonical still shows up inside the <BODY> HTML tag.

Bingo! We have the canonical correctly placed, and visible inside the HTML <HEAD>.

In order to understand why this addresses the issue, we need to understand a key difference between checking pages with View Source, and inside the Elements tab in the web browsers’ DevTools.

The Elements feature has a handy feature that allows you to expand and collapse parent and child elements in the DOM tree of the page. In order for this feature to work, the web browser needs to parse the page and build the tree that will represent the DOM. A common issue with HTML is that it often contains markup errors or invalid tags placed in the wrong places.

For example, if we check the page using https://validator.w3.org/nu/?doc=http%3A%2F%2Fwww.homedepot.com%2Fp%2FKidde-Intelligent-Battery-Operated-Combination-Smoke-and-CO-Alarm-Voice-Warning-3-Pack-per-Case-KN-COSM-XTR-BA%2F202480909

You can see this page “only” has 61 HTML coding errors and warnings.

Fortunately, web browsers expect errors and automatically compensate for them using a process called HTML linting or tidying. A popular tool that does this is https://infohound.net/tidy/ by Dave Raggett at W3C.

The tidying process works by adding missing closing tags, reordering tags, etc. This works flawlessly most of the time, but it can often fail and tags end up in the wrong places. This is precisely what is happening here.

Understanding this allowed me to come up with the lazy trick to move the SEO tags to the beginning of the head, because this essentially bypasses any problems introduced by other tags. 🙂

A more “professional” solution is to at least fix all the errors reported between the HTML <HEAD> tags.

Can we tell if this is affecting Googlebot or not?

It is fair to assume that as Google is now able to execute JavaScript, that Google’s indexing systems need to build DOM trees just like the main browsers do. So, I’d not ignore or overlook this issue.

A simple litmus test to see if the misplaced canonicals are being ignored is to check whether the target page is reporting duplicate titles and/or duplicate meta descriptions in Google Search Console, or not. If it is reporting duplicates, correct the issue as I explained here, use Fetch as Googlebot, and re-submit the page to the index. Then wait and see if the duplicates clear.

 

Following redirect chains

Another useful use case is reviewing automatic redirects from desktop to mobile optimized websites, or from http to https or viceversa directly in your browser.

In order to complete the next steps, you need to customize DevTools a little bit.

  1. Tick the checkbox that says “Preserve Log” in the Network tab so the log entries don’t get cleared up by the redirects
  2. Right-click on the headers of the Network tab, and select these additional headers: Scheme, Vary, and optionally Protocol to see if the resources are using the newer HTTP/2 protocol

In this example, we opened https://www.macys.com, and you can see we are 301 redirected to http://www1.macys.com, from secure to non-secure, and we can also see that the page provides a Vary header with the value User-Agent. Google recommends the use of this header with this value to tell Googlebot to try refetching the page but with a mobile user agent. We are going to do just that, but within Chrome using the mobile emulation feature.

Before we do that, it is a good idea to clear the site cookies because some sites set “desktop sticky” cookies that prevent the mobile emulation from working after you have opened the site as a desktop user.

Let’s clear the network activity log and get ready to refresh as a mobile user. Remember that we will open the desktop URL to see the redirection.

In this case you can see that Macys correctly 302 redirects to the mobile site at http://m.macys.com, which is consistent with Google’s recommendation.

 

Sneaky affiliate backlinks

As Aleyda mentioned in her post, we can use DevTools to find hidden text, and some really sneaky spam. Let me share with you a super clever link building trick I discovered a while ago while auditing the links of a client’s competitor. I used our free Chrome DevTools extension as it eliminates most of the manual checks. You can get it from here.  

To most of you, and to most Googlers, this looks like a regular backlink and it doesn’t raise any red flags. The anchor text is “here”, and it is directly in the editorial content like most editorial links. However, coming from an affiliate marketing background, I see the extra tracking parameters can be effectively used to track any sales that come from that link.

I’m not saying they are doing this, but it is relatively easy to convince many unsophisticated bloggers to write about your product, and place affiliate links like this back to your site to get compensated for sales they generated. Sales you would track directly in Google Analytics, and maybe even provide reporting by pulling stats via the GA API.

Now, the clever part is this one: they are likely setting up these tracking parameters in Google Search Console so Googlebot ignores them completely, and it is normal to expect utm_ parameters to be ignored. This trick effectively turns these affiliate links into SEO endorsement links. This is one of the stealthiest affiliate + SEO backlink tricks I’ve seen in many years reviewing backlink profiles!

 

Troubleshooting page speed issues

Let’s switch gears a bit, and discuss pagespeed from an implementation review perspective.

Let’s review another example to learn how well the website server software or CDN handles caching page resources. Caching page resources in the client browser or CDN layer offers an obvious way to improve page load time. However, web server software needs to be properly configured to handle this correctly.

If a page has been visited before, and the page resources are cached, Chrome sends conditional web requests to avoid refetching them each time.

You can see page resources already cached by looking for ones with the status code 304, which means that they haven’t changed on the server. The web server only sends headers in this case, saving valuable bandwidth and page load time.

The conditional requests are controlled by the IF-Modified-Since request header. When you tick the option in DevTools to disable the cache, Chrome doesn’t send this extra header, and you won’t see any 304 status code in the responses.

This is particularly handy to help troubleshoot page resource changes that users report are not visible.

Finally, it is generally hard to reproduce individual users’ performance problems because there are way too many factors that impact page load time outside of just the coding of the web page.

One way to easily reproduce performance problems is to have users preserve the network log and export the entries in the log as an HAR file. You can learn more about this in this video from Google Developers https://www.youtube.com/watch?v=FmsLJHikRf8.

Google provides a web tool you can use to review HAR files you receive from users here https://toolbox.googleapps.com/apps/har_analyzer/. Make sure to warn users about saving potentially sensitive information in this file.

Bonus: Find mixed http and https content quickly

Aleyda mentioned using DevTools to check for mixed http and https content raising warnings in your browser. Here is a shortcut to identify the problematic resources quickly.

You can type “mixed-content:displayed” in the filter to get the resources without https.

If you are not actively using DevTools in your SEO audits, these extra tips encourage you to get started. And, if you are, please feel free to share any cool tips you might have discovered yourself.

How to Get Googlebot to “Teach You” Advanced SEO

I recently worked on an enterprise-level client’s non-SEO related project where the goal was to confirm or deny that their new product:

1)  Was not doing anything that could be considered black hat.

2)  Was providing any SEO benefit for their clients.

The problems you face with projects like this is that Google doesn’t provide enough information, and you cannot post corner-case questions like this in public Webmaster forums. To do so would violate your NDA, and potentially reveal your client’s intellectual property. So, what option do you have left? Well, you set up a honeypot!

A honeypot is a term that comes from the information security industry. Honeypots are a set of files that, to an automated program, appear like regular files, but they allow for the monitoring and “capturing” of specific viruses, e-mail harvesters, etc. In our case, we set up a honeypot with the purpose of detecting and tracking search engine bot behavior in specific circumstances. We also wanted to track the outcome (positive, neutral or negative) in the search engine results pages (SERPs).

Let me walk you trough a few ways you can learn advanced SEO by using a honeypot. Read more

How to Think Like an SEO Expert

If you want to become an expert you need to start thinking like one. People perceive you as an authority in your field not because you claim you are, but by listening to what you say or reading what you write. From my personal experience, the key seems to be the originality, usefulness and depth of what you have to share. Recently I was very honored to contribute to a link-building project. I wanted to share with you my idea, but more than that, in this blog I like to take extra time to explain the original thought process that helped me come up with the idea in the first place.

The Challenge

Toolbar PageRank was a very important factor in measuring the quality of a link for a long while. But Google has played so much with it that it can hardly be considered reliable these days. I like to see problems like these as challenges and opportunities, so I decided to look hard for alternatives. I know there are several other methods (like using the Yahoo backlink count, number of indexed pages, etc.) but I did not feel these directly reflected how the link was important to Google, or to any other specific search engine. Each search engine has its own evaluation criteria when it comes to links, so using metrics from one to measure another is not a reliable gauge in my opinion.

I knew the answer was out there, and I knew just where to look. Read more

PageRank: Caught in the paid-link crossfire

Last week the blogosphere was abuzz when Google decided to ‘update’ the PageRank numbers they display on the toolbar. It seems Google has made real on its threat to demote sites engaged in buying and selling links for search rankings. The problem is that they caught some innocent ones in the crossfire. A couple of days later, they corrected their mistake, and those sites are now back to where they were supposed to be.

The incident reveals that there is a lot of misunderstanding about PageRank, both inside and outside the SEO community. For example, Forbes reporter Andy Greenberg writes:

On Thursday, Web site administrators for major sites including the Washingtonpost.com, Techcrunch, and Engadget (as well as Forbes.com) found that their “pagerank”–a number that typically reflects the ranking of a site in Google

He also quotes Barry Schwartz saying:

But Schwartz says he knows better. “Typically what Google shows in the toolbar is not what they use behind the scenes,” he says. “For about two and a half years now this number has had very little to do with search results.”

There are two mistakes in these assertions:

  • The toolbar PageRank does not reflect the ranking of a site in Google. It reflects Google’s perceived ‘importance’ of the site.

  • The toolbar PageRank is an approximation of the real PageRank Google uses behind the scenes. Google doesn’t update the toolbar PageRank as often as they update the real thing, but saying that it has little to do with search results is a little farfetched.

Several sites lost PageRank, but they did not experience a drop in search referrals. Link buyers and sellers use toolbar PageRank as a measure of the value of a site’s links. By reducing this perceived value, Google is clearly sending a message about paid links. The drop is clearly intended to discourage such deals.

Some ask why Google doesn’t simply remove the toolbar PageRank altogether so that buyers and sellers won’t have a currency to trade with. At first glance it seems like a good idea, but here is the catch—the toolbar PageRank is just a means of enticing users to activate the surveillance component that Google uses to study online behavior. Google probably has several reasons for doing so, but at minimum it helps measure the quality of search results and improve its algorithms. If Google were to remove the toolbar PageRank users would have no incentive to let Google ‘spy’ on their online activities. Read more

Like Flies to Project Honeypot: Revisiting the CGI proxy hijack problem

CGI proxy hijacking appears to be getting worse. I am pretty sure that Google is well aware of it by now, but it seems they have other things higher on their priority list. If you are not familiar with the problem, take a look at these for some background information:

  1. Dan Thies take and proposed solutions

  2. My take and proposed solutions

Basically negative SEOs are causing good pages to drop from the search engine results by pointing CGI proxy servers’ URLs to a victim’s domain, and then linking to those URLs so that search engine bots find them and the duplicate content filters drop one of the pages—inevitably the one with the lowest PageRank, the victim’s page.

As I mentioned in a previous post, it is very likely that this would be an ongoing battle, but that doesn’t mean we have to lay down and do nothing. Existing solutions require the injection of a meta robots noindex tag on all web pages if the visitor is not a search engine. In this way search engines won’t index the proxy-hijacked page. Unfortunately, the proxies are already altering the content before passing it to the search engine. I am going to present a solution I think can drastically reduce the effectiveness of such attacks. Read more

Google's Architectural Overview (Ilustrated) — Google's inner workings part 2

For this installment of my Google's inner workings series, I decided to revisit my previous explanation. However, this time I am including some nice illustrations so that both technical and non-technical readers can benefit from the information. At the end of the post, I will provide some practical tips to help you improve your site rankings based on the information given here.

To start the high level overview, let's see how the crawling (downloading of pages) was described originally.

Google uses a distributed crawling system. That is, several computers/servers with clever software that download pages from the entire web Read more

What is the practical benefit of learning Google's internals?

I forgot to start my Google inner workings series with WIIFM. My plan is to write one post each week.

Not matter how well I try to explain it, it is a complex subject. I should have started the first post explaining why you would want to learn that. There are a lot of easier things to read.With some people questioning the usefulness of SEO, this is a good time to make my views clear. Please note that I believe in a solid marketing mix that includes SEO, PPC, SMO, affiliate marketing, viral marketing, etc. Do not put all your eggs in one basket.

If you have been blogging for a while, you have probably noticed that you are getting hits from the search engines for words that you did not try to optimize. For example, the next day I started this blog, I received a comment from a reader that found my blog through a blog search! How was this possible?

Heather Paquinas May 26th, 2007 at 1:24 am

I found your blog in google blogsearch. Needless to say I subscribed right away after reading this. I always suspected what you said, especially after Mike Levin from hittail blogged about using hittail for ppc, but you really hit the nail on the head with this post.

This is possible because that is the job of the search engines! If every page you search had to be optimized, there wouldn't be billions of pages in Google index. It would take a lot of people to do the SEO work :-).

Why we need SEO then? Read more

Google's architectural overview — an introduction to Google's inner workings

Google keeps tweaking its search engine, and now it is more important than ever to better understand its inner workings.

Google lured Mr. Manber from Amazon last year. When he arrived and began to look inside the company’s black boxes, he says, that he was surprised that Google’s methods were so far ahead of those of academic researchers and corporate rivals.

While Google closely guards its secret sauce, for many obvious reasons, it is possible to build a pretty solid picture of Google's engine. In order to do this we are going to start by carefully dissecting Google's original engine: How Google was conceived back in 1998. Although a newborn baby, it had all the basic elements it needed to survive in the web world.
Read more

Determining searcher intent automatically

Here is an example of how useful it is to learn SEO from research papers.

If you’ve read some of my previous posts, you will know that I am a big fan of finding out what exactly search visitors want. I posted about classifying both visitors and landing pages, so that search visitors looking for information find information articles, searchers looking to take action land on transaction pages, etc.

I really like the research tools MSN Labs has. One of my favorites is this http://adlab.msn.com/OCI/OCI.aspx

You can use it to detect commercial intent. Try it. It is really nice.

I’ve been wanting to do something like that, but I didn’t have enough clues as to how to do it. Until now.

Search engines patent expert, Bill Slawsky, uncovered a gem. A research paper that details how a team of researchers achieved exactly this.

I still need to dig deep into the document and the reference material, but it is definitely an excellent find.

I will try to make a new tool for this. I will also try to make this and other scripts I write, more accessible to non-technical readers. I guess most readers don’t care much about the programming details. They just want to be able to use my tools easily 🙂

Log based link analysis for improved PageRank

While top website analytics packages offer pretty much anything you might needto find actionable data to improve your site, there are situations where we need to dig deeper to identify vital information.

One of such situations came to light in a post by randfish of Seomoz.org.He writes about the problem with most enterprise-size websites, they have many pages with no or very few incoming links and fewer pages that get a lot of incoming links.He later discusses some approaches to alleviate the problem, suggesting primary linking to link-poor pages from link-rich ones manually, or restructuring the website.I commented that this is a practical situation where one would want to use automation.

Log files are a goldmine of information about your website: links, clicks, search terms, errors, etcIn this case, they can be of great use to identify the pages that are getting a lot of links and the ones that are getting very few.We can later use this information to link from the rich to the poor by manual or automated means.

Here is a brief explanation on how this can be done.

Here is an actual log entry to my site tripscan.com in the extended log format: 64.246.161.30 – – [29/May/2007:13:12:26 -0400] “GET /favicon.ico HTTP/1.1″ 206 1406 “http://www.whois.sc/tripscan.com” “SurveyBot/2.3 (Whois Source)” “-”

First we need to parse the entries with a regex to extract the internal pages — between GET and HTTP — and the page that is linking after the server status code and the page size.In this case, after 206 and 1406.

We then create two maps: one for the internal pages — page and page id, and another for the external incoming links page and page id as well.After that we can create a matrix where we identify the linking relationships between the pages. For example: matrix[23][15] = 1, means there is a link from external page id 15 to internal page id 23.This matrix is commonly known in information retrieval as the adjacency matrix or hyper link matrix.We want an implementation that can be preferably operated from disk in order to be able to scale to millions of link relationships.

Later we can walk the matrix and create reports identifying the link-rich pages, the pages with many link relationships, and the link-poor pages with few link relationships. We can define the threshold at some point (i.e. pages with more or less than 10 incoming links.)