A World of Bugs and Logs: Advanced Web Analytics

by Hamlet Batista | November 19, 2007

As search marketers we need to know if our efforts are paying off. How many visitors are we getting? What channels are they coming from? And more importantly, how many of those visitors are taking the action (conversions) that we want them to take?

For all this information, we rely on Web Analytics. There are two basic breeds of web analytics packages: web bugs that use page tagging through JavaScript, and web logs that analyze server data. Each type of package has its pros and cons, so search marketers need to utilize both to get the complete picture. Let me tell you how you can best combine the disparate world of bugs and logs.

Web bugs (page tagging)

Page tagging systems, such as Google Analytics, are excellent because you can see the data in near real time and you get a very accurate count of your website visitors. This is primarily because they use JavaScript: the code runs in the browser on every page view, so it bypasses caching proxies, and it has access to information that is available to the web browser but not to the web server, such as screen dimensions and installed browser plug-ins. You could say that setting cookies is another advantage, but it is also possible to record cookie information in log files.

There are, however, some serious limitations to this approach:

  1. Page tagging systems do not record visits from search engine robots, since robots do not execute JavaScript. This information is way too important to go without.

  2. Page tagging systems do not record page errors. There is a way to work around this, and I will explain it in a follow-up post.

  3. Some users disable JavaScript and/or cookies. If this occurs, the visitor’s information is lost; even measuring return or unique visits becomes inaccurate when visitors delete cookies on their computer. If the page takes too long to load and the user clicks away, the session may not get recorded either.

  4. You need to manually install the code on each page. This opens up the opportunity for errors: pages with no tags, pages with other JavaScript code errors, etc.
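One way to catch the "pages with no tags" problem is to scan your site's HTML files for the tracking snippet. The following is only a minimal sketch: it assumes the pages are available as local .html files and that the tag can be recognized by a fixed marker string. The marker value and the function names are illustrative placeholders, not part of any analytics product.

```python
from pathlib import Path

# Assumption: the tracking snippet can be recognized by a fixed marker string.
# "urchin.js" is only an illustrative placeholder; use whatever your tag contains.
TAG_MARKER = "urchin.js"


def has_tag(html: str, marker: str = TAG_MARKER) -> bool:
    """Return True if the page source contains the tracking marker."""
    return marker.lower() in html.lower()


def find_untagged_pages(root: str, marker: str = TAG_MARKER) -> list[str]:
    """Walk a directory tree and list the .html files missing the marker."""
    missing = []
    for path in sorted(Path(root).rglob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        if not has_tag(html, marker):
            missing.append(str(path))
    return missing
```

Calling find_untagged_pages("/path/to/site") returns the files that need attention. A check like this only confirms that the marker is present; it does not prove that the JavaScript actually runs without errors.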

Web logs (log analysis)

I am a big fan of log files and I personally use them extensively. Log files are advantageous for search marketers because they track search engine robot activity and help identify broken links and server errors. Of course, log analysis has its drawbacks too:

  1. The big one is that it fails to account for cached requests and visits from multiple users behind a proxy server (same IP address).

  2. Another problem is that you don’t get to track actions that happen in the web browser, such as JavaScript or Flash events.

  3. The reports are not in real time.

  4. It requires a level of comfort with technical things or access to competent technical staff.
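To give a feel for what log analysis involves, here is a rough sketch that counts search engine robot hits and error responses. It assumes Apache's combined log format, a short illustrative list of robot user-agent markers, and no escaped quotes inside the logged fields. The names LOG_LINE, ROBOT_MARKERS and summarize are made up for this example.

```python
import re
from collections import Counter

# Assumption: lines follow Apache's combined log format and no field
# contains an escaped quote (a simplification).
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative, incomplete list of robot user-agent substrings.
ROBOT_MARKERS = ("googlebot", "slurp", "msnbot")


def summarize(lines):
    """Count robot hits per marker and error responses per (status, path)."""
    robots = Counter()
    errors = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        for marker in ROBOT_MARKERS:
            if marker in agent:
                robots[marker] += 1
                break
        status = int(match.group("status"))
        if status >= 400:
            # The request field looks like: GET /path HTTP/1.1
            parts = match.group("request").split()
            path = parts[1] if len(parts) > 1 else match.group("request")
            errors[(status, path)] += 1
    return robots, errors


SAMPLE = [
    '66.249.66.1 - - [19/Nov/2007:10:00:00 -0500] "GET /missing.html HTTP/1.1" '
    '404 209 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [19/Nov/2007:10:00:05 -0500] "GET / HTTP/1.1" '
    '200 5120 "http://www.example.com/" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
]

robots, errors = summarize(SAMPLE)
```

With the two sample lines, robots records one Googlebot hit and errors records one 404 for /missing.html, which is exactly the kind of information a page tagging system never sees.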

A better solution: The hybrid approach

Fortunately, page tagging and log analysis can work in tandem, as the two systems complement each other well. Here are two strategies currently in use to create hybrid solutions: cookie-fortified logs and web server plug-ins.

  1. Cookie-fortified logs: The idea here is to collect the information from the JavaScript code (user session, screen resolution, etc.) and send it back to the server via cookies. For this to work, the web server must be configured to record the cookie information in the log files. In Apache it is as simple as including the Cookie variable in your LogFormat, for example \"%{Cookie}i\". The full line in your httpd.conf file should read like this:

    LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" special

     Once the logs include that information, the reporting engine can produce very accurate reports. Obviously, this method requires some technical know-how. Google’s hosted edition of Urchin takes this approach, but unfortunately Google decided to discontinue its development.

  2. Web server plug-ins: The idea here is to gather information from both the page tags and code running on the web server (via a plug-in). The advantage is that the collected data goes to a third-party server that does the number crunching and generates the reports. The data can be presented in ‘real time’ and you don’t need to deal with web server configuration, accessing logs via FTP, and so on. The only drawback is privacy. If you are wary of releasing competitive business information, analyzing logs may be your only option.

This was the approach taken in the first hybrid solution by Rufus Evison back in 1998. He spun the product off to create a company based upon the increased accuracy of hybrid methods.

Both approaches use the same concept: combine the information from the web browser (through page tagging) with information from the server (through logs or web server plug-in/API).
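For the cookie-fortified variant, the reporting side might read the extra cookie field along these lines. This is a sketch under stated assumptions, not how Urchin or any other product does it: the log lines follow the LogFormat shown above, the page tag sets a cookie called visitor_id (a hypothetical name), and no logged field contains an escaped quote.

```python
import re

# Assumption: lines follow the "special" LogFormat from the article, with the
# Cookie header logged as the last quoted field.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "[^"]*" "(?P<cookie>[^"]*)"'
)


def parse_cookies(header):
    """Split a Cookie header such as 'a=1; b=2' into a dict."""
    cookies = {}
    for pair in header.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:
            cookies[name] = value
    return cookies


def count_visitors(lines, cookie_name="visitor_id"):
    """Return (unique IP addresses, unique visitor cookies).

    cookie_name is hypothetical; use whatever your page tag sets.
    """
    ips, ids = set(), set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        ips.add(match.group("host"))
        visitor = parse_cookies(match.group("cookie")).get(cookie_name)
        if visitor:
            ids.add(visitor)
    return len(ips), len(ids)


# Two visitors behind the same proxy: one IP address, two visitor cookies.
SAMPLE = [
    '192.0.2.10 www.example.com - [19/Nov/2007:10:00:00 -0500] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0" "visitor_id=a1; screen=1024x768"',
    '192.0.2.10 www.example.com - [19/Nov/2007:10:00:09 -0500] "GET / HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0" "visitor_id=b2; screen=1280x1024"',
]

print(count_visitors(SAMPLE))  # (1, 2)
```

Counting by IP address alone would report one visitor here, while the cookie reveals two. That is the proxy problem from the log analysis drawbacks, solved with information that only the browser side can supply.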

Personally, I like using Google Analytics, and it is unfortunate that it does not provide a hybrid solution. At the moment I need to use the old Urchin for my profitable projects; information about robot activity on my sites is what I miss the most. I am currently researching a way to inject that information into GA via a server-side script. I will write a blog post when/if I am successful.

Hamlet Batista

Chief Executive Officer

Hamlet Batista is CEO and founder of RankSense, an agile SEO platform for online retailers and manufacturers. He holds US patents on innovative SEO technologies, started doing SEO as a successful affiliate marketer back in 2002, and believes great SEO results should not take 6 months.


