In my previous post, I explored how to assess the competitive level of your keywords and I shared my strategy for optimizing non-competitive keywords. As promised, here is my strategy for optimizing highly competitive ones.
As this is a rather dense topic I will split it in two. This post will explain how to use link analysis to understand your competitor’s rank, and the following post will explain how to leverage that information in your own link-building efforts.
Not all links are created equal
At the moment, we need lots of links to our sites. My strategy is to study the link structure of my chosen web authority carefully, as well as their incoming link text in order to build a similar relevance profile for my site. If I can get similar links and anchor texts, chances are that I will be ranking right next to my competitor.
Unfortunately, just getting links to your site is not enough; you need to look for the right links. No two links are weighted exactly the same. As I explained before, the more pages that match a targeted query, the more the search engine needs to know about those pages to rank them properly. It is very important to understand this concept: it is the single most important reason why on-page optimization is not enough to compete for very popular keywords.
Just like on-page metrics, there are several metrics search engines use to evaluate links. Before you set out to perform link analysis and build links, there are some basic principles you need to learn.
Anchor text vs domain authority
Although it is well known that having a lot of links with your targeted keywords in the anchor text will boost your site's rankings for those keywords (the text surrounding the anchor is also important), that is not the only way to rank for competitive keywords. Sometimes it is not even the most stable way, as it leaves you more susceptible to negative SEO attacks.
If your site gets enough links you can also rank for competitive terms with on-page factors only.
SebastianX has an excellent post in which he describes an experiment he carried out to rank the blog of Google's spam czar, Matt Cutts, to the term “buy cheap Viagra.”
In order to take advantage of domain authority ranking, black hat SEOs have traditionally rented pages hosted at high-authority sites. Simply having the pages on the same domain name gives them enough of a rankings boost to outrank almost anyone for the most competitive terms. One public example of this was when WordPress.org was caught in the act a few years ago.
Authority vs trust
Sometimes I read about authority and trust being referred to as synonymous concepts, but they are not. To better understand this, it’s important to remember that search engines see links as votes and that each vote carries weight. Some votes are worth a lot more than others and some votes do not carry any weight at all. If search engines don't trust your site (because they flagged it as SPAM), your votes will not carry weight or will not positively affect the rankings of the sites you link to. If search engines trust your site, your links help the sites it links to.
The authority of a site depends on the total weight of all its incoming links. Authoritative sites pass some of that authority on to the sites they link to. Domain age is also an important concept closely related to trust: search engines have learned to trust older domains more than newer ones.
Most user-generated content sites are authorities because they receive a good amount of incoming links, but their trust will depend primarily on how effective they are in policing their content and avoiding search engine SPAM. Squidoo is a good example as they have a lot of incoming links, but they were recently banned by Google for spamming.
There are several link analysis algorithms that measure the authority or importance of a website; PageRank and HITS are the most well known. To measure the trust of a site, TrustRank is the one we look to at the moment.
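To make the authority idea concrete, here is a minimal sketch of the PageRank power iteration on a made-up three-site link graph. The domain names are hypothetical and the 0.85 damping factor follows the original PageRank paper; real search engines run this at vastly larger scale with many refinements.

```python
# Minimal PageRank power iteration on a toy link graph.
# graph maps each page to the list of pages it links to.
def pagerank(graph, damping=0.85, iterations=50):
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in graph.items():
            if not outlinks:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = rank[page] / len(outlinks)  # vote split across links
                for target in outlinks:
                    new_rank[target] += damping * share
        rank = new_rank
    return rank

# Hypothetical sites: two pages link to "authority.example".
links = {
    "authority.example": ["blog.example"],
    "blog.example": ["authority.example", "news.example"],
    "news.example": ["authority.example"],
}
scores = pagerank(links)
```

The page with the most incoming votes ends up with the highest score, and the scores always sum to one, which is why a vote from a strong page is worth more than one from a weak page.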
Absolute vs relative/topical authority
Most top search engines are only able to measure authority in absolute terms. They do not think topically. One clear exception, however, is Ask.com. In fact, this is a fundamental difference between PageRank, the link analysis algorithm used by Google, and HITS, the algorithm that is believed to be used by Teoma and now Ask.com.
For SEO purposes, the effect of this is that most search engines do not measure authority topically, so it does not matter if a link is on a totally unrelated site. Of course, the ability to measure topical authority is seen as a strength and will probably be incorporated into other search engines eventually. Unfortunately, the feature comes with a very high computational cost; I think that is the reason why the original PageRank did not provide it. Hilltop is a younger link analysis algorithm that provides topical features but requires far less computation.
Authorities & hubs
This is an important concept introduced by Kleinberg in his paper on the HITS algorithm. The concept is simple: authorities are pages that many relevant pages link to, while hubs are like topical directories or resource pages that link out to a good number of relevant authorities. In practice, a page can be both an authority and a hub to varying degrees.
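Kleinberg's mutual reinforcement (good hubs link to good authorities, good authorities are linked from good hubs) can be sketched in a few lines. The graph below is illustrative, not real data:

```python
# Toy HITS iteration: authority score = sum of hub scores linking in,
# hub score = sum of authority scores linked out, normalized each round.
import math

def hits(graph, iterations=50):
    pages = set(graph) | {t for outs in graph.values() for t in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # authority update: weighted by the hub scores of linking pages
        auth = {p: sum(hub[q] for q, outs in graph.items() if p in outs)
                for p in pages}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # hub update: weighted by the authority scores of linked pages
        hub = {p: sum(auth[t] for t in graph.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: two resource pages both link to two content pages.
graph = {
    "resources-page.example": ["guide.example", "reference.example"],
    "links-page.example": ["guide.example", "reference.example"],
    "guide.example": [],
}
hubs, auths = hits(graph)
```

After iterating, the resource pages score high as hubs and the linked content pages score high as authorities, matching the directory-vs-destination distinction above.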
It has been documented, at least in Microsoft research papers, that a link analysis algorithm can also assign value to the placement of links in the linking page as a measure of quality. I could also intuitively promote the on-page metric keyword prominence to link prominence as an off-page metric. The higher the link is in the DOM tree, the more important it must be. Whether search engines use this is anybody’s guess. I'd still do it for click-through purposes, as links near the top usually get more clicks than the ones at the bottom.
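As a rough way to measure link prominence yourself, you can record how early each link appears in a page's source. This is a crude stand-in (real renderers would measure visual position, not source line), and the markup is a made-up example:

```python
# Sketch: rank links by how early they appear in a page's HTML source,
# as a crude proxy for "link prominence".
from html.parser import HTMLParser

class LinkPositions(HTMLParser):
    def __init__(self):
        super().__init__()
        self.positions = []  # (source line number, href)

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                line = self.getpos()[0]  # line where the tag starts
                self.positions.append((line, href))

html = ("<html><body>\n"
        "<a href='/top'>top</a>\n"
        "<p>filler</p>\n"
        "<a href='/bottom'>bottom</a>\n"
        "</body></html>")
parser = LinkPositions()
parser.feed(html)
```

Links earlier in `parser.positions` sit higher in the page, which is where the click-through benefit concentrates regardless of whether engines weight it.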
Outbound links count
The original PageRank algorithm, and probably the current one, did not assign as much authority to a linked page when the linking page had many links as when it had few. The reasoning is that a user will be more distracted and less likely to click on a specific link if the page has too many links as opposed to a few. The bottom line is that it's better to get links from pages with few links than from pages that are whole directories.
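The dilution is easy to see in numbers. In the PageRank formula, a page's passed authority is divided by its number of outbound links; the page score below is made up purely for illustration:

```python
# A page's authority is split across its outbound links, so a link from a
# page with 5 links passes 100x more than one from a page with 500 links.
# The 0.85 damping factor follows the original PageRank formula; the
# page_rank value is a made-up example.
def link_value(page_rank, outbound_links, damping=0.85):
    """Approximate authority passed through a single link."""
    return damping * page_rank / outbound_links

focused_page = link_value(page_rank=0.002, outbound_links=5)
directory_page = link_value(page_rank=0.002, outbound_links=500)
```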
Natural link profiles
One of the easiest ways for search engines to detect manipulation is by looking at unnatural link profiles, and artificial patterns are detectable by their filters. One example is most links pointing back to a site with the same anchor text; another is all the links to a site coming from sites in the same Class C IP block, registered by the same registrant, or behind private registrations. That is another reason why I like to study my competing authority's link structure. Mixing the anchor text is not enough; you need to mix it in a natural way (e.g. "click here," your site URL, your brand name, long-tail keywords, etc.).
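A quick sanity check on your own anchor text distribution takes only a few lines. The anchors and the 40% threshold below are illustrative choices for the sketch, not a documented search engine cutoff:

```python
# Sketch: flag a backlink profile whose anchor text is unnaturally
# concentrated on one phrase. Sample anchors are made up.
from collections import Counter

anchors = [
    "buy blue widgets", "buy blue widgets", "buy blue widgets",
    "buy blue widgets", "Example Corp", "www.example.com",
    "click here", "blue widget reviews",
]

counts = Counter(anchors)
total = len(anchors)
most_common_anchor, freq = counts.most_common(1)[0]
share = freq / total
suspicious = share > 0.40  # arbitrary threshold for this sketch
```

If one exact-match phrase dominates the profile, that is the kind of pattern worth diluting with branded and URL anchors.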
Deep linking is the art of getting links to your internal pages and not just to your home page. This is a good practice as your link profile will definitely look more natural.
Link acquisition rate
It is well known that search engines (at least Google) measure the rate at which a site gets links as a potential signal of manipulation. There are obvious exceptions to this rule, such as current news events, viral ideas spreading, etc. Search engines can detect these as they have access to such information. Should you fear getting too many links in too little time? My recommendation is not to be afraid of getting a lot of links to your site, but plan for a consistent rate of links. If you can get around 1,000 links every month, shoot for that, instead of 2,000 this month, 5 next, 100 the following, etc. If they are natural links coming from successful link baits, there is no need to worry.
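One simple way to quantify "consistent rate" is the coefficient of variation (standard deviation over mean) of your monthly link counts. The counts below are invented, and search engines do not publish the signal they actually use; this is only a sketch of the idea:

```python
# Sketch: compare month-to-month link acquisition for spikiness using
# the coefficient of variation. Monthly counts are made-up examples.
import statistics

steady = [950, 1000, 1020, 980, 1010]   # ~1,000 links/month
spiky = [2000, 5, 100, 3000, 20]        # erratic acquisition

def variability(monthly_counts):
    """Std dev relative to the mean: lower means a steadier rate."""
    mean = statistics.mean(monthly_counts)
    return statistics.pstdev(monthly_counts) / mean
```

A steady campaign scores far lower on this measure than an erratic one, which is the shape of profile the post recommends aiming for.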
As links are seen as votes, search engines don't want to count votes from the same person multiple times. To address this, they discount links that belong to the same entity. Sitewide links are an example: many links come from the same site, but they are counted as a single vote rather than many. If your blog is placed in the blogroll of another blog, that is technically a sitewide link.
Traditionally, links are flagged as affiliated when they are hosted at the same IP address or in the same IP address block (Class C). The research paper that introduces the Hilltop algorithm provides some clues about other interesting ways search engines probably detect affiliation. The paper talks about using second-level domain names in addition to Class C IP blocks; for example, sun.com and sun.co.uk are affiliated even though they are not hosted at the same ISP.
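Both groupings are easy to sketch. The domains and IP addresses below are made up, and the second-level extraction is deliberately naive (it ignores public-suffix rules such as .co.uk), but it shows how four raw links can collapse into fewer effective votes:

```python
# Sketch: collapse backlinks that look affiliated by Class C IP block
# and by second-level label, each group counting as one "vote".
backlinks = [
    ("sun.com", "192.0.2.10"),
    ("sun.co.uk", "198.51.100.7"),   # different host, same brand name
    ("blog.example", "192.0.2.99"),  # same /24 block as sun.com
    ("news.example", "203.0.113.5"),
]

def class_c(ip):
    """First three octets of an IPv4 address (the Class C block)."""
    return ".".join(ip.split(".")[:3])

def second_level(domain):
    """Leftmost label as a crude brand identifier (naive on purpose)."""
    return domain.split(".")[0]

raw_votes = len(backlinks)
unique_blocks = {class_c(ip) for _, ip in backlinks}
unique_brands = {second_level(domain) for domain, _ in backlinks}
```

Here four links reduce to three votes under either grouping: the two "sun" domains merge by name, and the two hosts sharing a /24 merge by IP block.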
It is also well known that Google became a registrar a few years ago, but it has yet to sell its first domain name. The move gave them access to domain ownership information. This means you can host all of your sites with different ISPs around the world, but the registration information will still flag them as affiliated, and their votes will only count as one.
Manual 'spam' filtering and quality rating
In some cases you build all your links perfectly, but after a while your rankings magically disappear. It is important to understand that most top search engineers see SEOs as spammers; they don't like any sort of manipulation. Whether a search engineer decides a well-known SEO expert doesn't deserve his rankings, or a search engine quality rater doesn't like the result page where your page was listed, the outcome is the same: your rankings will vanish. To avoid this, it is essential that your site, your titles, and your meta descriptions do not look spammy. They should look as natural and legitimate as possible.
When you acquire links it is important to understand that some site owners will place links to your site, but will flag them as “not trusted” for the search engines. There are many ways they can do this, but the most common ones are adding the “nofollow” attribute to the link or to the meta robots tag (or X-Robots-Tag) of the linking page. They can also deny robots access to the linking pages.
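You can check for these flags yourself with the standard library. The HTML below is a made-up example page; the checker looks for `rel="nofollow"` on individual links and `nofollow` in a meta robots tag:

```python
# Sketch: detect links flagged "nofollow" and a page-level meta robots
# nofollow, using only the standard library. The HTML is a made-up page.
from html.parser import HTMLParser

class NofollowChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.page_nofollow = False
        self.links = []  # (href, counts as a trusted vote?)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "nofollow" in (attrs.get("content") or "").lower():
                self.page_nofollow = True
        elif tag == "a" and "href" in attrs:
            rel = (attrs.get("rel") or "").lower()
            self.links.append((attrs["href"], "nofollow" not in rel))

html = """
<html><head><meta name="robots" content="index, follow"></head>
<body>
<a href="https://example.com/" rel="nofollow">untrusted</a>
<a href="https://example.org/">trusted</a>
</body></html>
"""
checker = NofollowChecker()
checker.feed(html)
```

This catches the two most common flags; it does not cover robots.txt blocking of the linking page, which you would need to check separately.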
I probably missed a few more principles, but these ones should give you a solid start.
Clearly, doing this type of extensive link analysis requires some serious tools; it is very difficult to do by hand. In a follow-up post I will share both free and paid tools you can use to gather this information for your link-building efforts.
On a related note, I am expanding the private beta of my SEO suite, which, as you probably guessed, can help you do this type of analysis. If you would like an invitation please leave a comment or send me an e-mail. The public beta launch will be next month at TechCrunch20 in San Francisco. We didn't make it to the final 20 (we were competing with about 700 companies around the globe), but we were in the top 100 semi-finalists and we are happy to present there in the DemoPit.