Log based link analysis for improved PageRank

by Hamlet Batista | May 29, 2007 | 17 Comments

While top website analytics packages offer pretty much anything you might need to find actionable data to improve your site, there are situations where we need to dig deeper to identify vital information.
One such situation came to light in a post by randfish of Seomoz.org. He writes about a problem with most enterprise-size websites: they have many pages with few or no incoming links, and a smaller number of pages that get a lot of incoming links. He later discusses some approaches to alleviate the problem, suggesting primarily that one link manually from link-rich pages to link-poor ones, or restructure the website. I commented that this is a practical situation where one would want to use automation.
Log files are a goldmine of information about your website: links, clicks, search terms, errors, etc. In this case, they can be of great use in identifying the pages that are getting a lot of links and the ones that are getting very few. We can later use this information to link from the rich to the poor by manual or automated means.
Here is a brief explanation of how this can be done.
Here is an actual log entry to my site tripscan.com in the extended log format:
64.246.161.30 - - [29/May/2007:13:12:26 -0400] "GET /favicon.ico HTTP/1.1" 206 1406 "http://www.whois.sc/tripscan.com" "SurveyBot/2.3 (Whois Source)" "-"
First, we parse each entry with a regex to extract two things: the internal page, which sits between GET and HTTP, and the linking page (the referrer), which follows the server status code and the page size. In this example, the linking page comes right after 206 and 1406.
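The parsing step could be sketched in Python roughly as follows. This is only an illustration: the pattern assumes every entry has the same field layout as the sample entry above, and the names `LOG_PATTERN` and `parse_entry` are made up for this example.

```python
import re

# Assumed layout (taken from the sample entry): client address, two identity
# fields, timestamp, request line, status code, size, referrer, user agent.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<page>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Return (internal_page, linking_page) for a log line, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # the line does not follow the assumed layout
    referrer = match.group("referrer")
    # A referrer of "-" (or empty) means no linking page was recorded.
    if referrer in ("", "-"):
        return None
    return match.group("page"), referrer


sample = (
    '64.246.161.30 - - [29/May/2007:13:12:26 -0400] '
    '"GET /favicon.ico HTTP/1.1" 206 1406 '
    '"http://www.whois.sc/tripscan.com" "SurveyBot/2.3 (Whois Source)" "-"'
)
print(parse_entry(sample))
# ('/favicon.ico', 'http://www.whois.sc/tripscan.com')
```

Lines that do not match, or that carry no referrer, are simply skipped here. A real log would probably also need filtering by status code and by whether the referrer is external, which this sketch leaves out.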
We then create two maps: one that assigns a page id to each internal page, and another that assigns a page id to each external linking page. After that, we can build a matrix that records the linking relationships between the pages. For example, matrix[23][15] = 1 means there is a link from external page id 15 to internal page id 23. In information retrieval, this matrix is commonly known as the adjacency matrix or hyperlink matrix. Preferably, the implementation should be able to operate from disk so that it can scale to millions of link relationships.
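A minimal in-memory sketch of the two maps and the matrix might look like this in Python. The class name `LinkGraph` and its methods are hypothetical, and so is every URL in the usage lines except the one from the sample log entry. Because most page pairs have no link, the sketch stores only the nonzero cells (a set of external ids per internal id) rather than a dense matrix. It does not address the on-disk requirement; the same structure would have to be moved to a disk-backed store for that.

```python
from collections import defaultdict


class LinkGraph:
    """Sparse stand-in for the adjacency matrix described above."""

    def __init__(self):
        self.internal_ids = {}  # internal page path -> page id
        self.external_ids = {}  # external linking URL -> page id
        # matrix[internal_id] is the set of external ids linking to it, so
        # "external_id in matrix[internal_id]" plays the role of
        # matrix[internal_id][external_id] == 1.
        self.matrix = defaultdict(set)

    @staticmethod
    def _get_id(mapping, key):
        # Ids are assigned in order of first appearance: 0, 1, 2, ...
        return mapping.setdefault(key, len(mapping))

    def add_link(self, internal_page, external_page):
        internal_id = self._get_id(self.internal_ids, internal_page)
        external_id = self._get_id(self.external_ids, external_page)
        self.matrix[internal_id].add(external_id)

    def has_link(self, internal_page, external_page):
        internal_id = self.internal_ids.get(internal_page)
        external_id = self.external_ids.get(external_page)
        if internal_id is None or external_id is None:
            return False
        return external_id in self.matrix.get(internal_id, set())


graph = LinkGraph()
graph.add_link("/favicon.ico", "http://www.whois.sc/tripscan.com")
graph.add_link("/index.html", "http://example.com/a")  # hypothetical
graph.add_link("/index.html", "http://example.com/b")  # hypothetical
print(graph.has_link("/index.html", "http://example.com/a"))   # True
print(graph.has_link("/favicon.ico", "http://example.com/a"))  # False
```

Repeated log entries for the same pair of pages collapse into a single cell, which matches the 0/1 matrix in the text. Counting repeated hits would need a counter instead of a set.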
Later, we can walk the matrix and create reports that identify the link-rich pages (those with many link relationships) and the link-poor pages (those with few). We can set the threshold wherever it makes sense, e.g. splitting pages by whether they have more or fewer than 10 incoming links.
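The reporting step could be sketched like this in Python. The function name `link_report`, the sample data, and the decision to count a page with exactly `threshold` links as link-rich are all assumptions made for the example.

```python
def link_report(incoming, threshold=10):
    """Split pages into link-rich and link-poor lists.

    `incoming` maps each internal page to the set of pages linking to it.
    A page with at least `threshold` incoming links is treated as rich
    (an assumed convention for the boundary case).
    """
    rich, poor = [], []
    for page, sources in incoming.items():
        count = len(sources)
        if count >= threshold:
            rich.append((page, count))
        else:
            poor.append((page, count))
    rich.sort(key=lambda item: item[1], reverse=True)  # most linked first
    poor.sort(key=lambda item: item[1])                # least linked first
    return rich, poor


# Hypothetical data: page -> set of linking pages.
incoming = {
    "/index.html": {f"http://example.com/{n}" for n in range(12)},
    "/about.html": {"http://example.com/0"},
    "/orphan.html": set(),
}
rich, poor = link_report(incoming, threshold=10)
print(rich)  # [('/index.html', 12)]
print(poor)  # [('/orphan.html', 0), ('/about.html', 1)]
```

One caveat: a page that never appears in the logs with a referrer will not be in the data at all, so pages with zero links would have to be added from another source, such as a list of the site's URLs, before running the report.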

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.
