A World of Bugs and Logs: Advanced Web Analytics

As search marketers, we need to know whether our efforts are paying off. How many visitors are we getting? What channels are they coming from? And, more importantly, how many of those visitors are taking the actions we want them to take (conversions)?

For all this information, we rely on web analytics. There are two basic breeds of web analytics packages: web bugs, which use page tagging through JavaScript, and web logs, which analyze server data. Each type of package has its pros and cons, so search marketers need to use both to get the complete picture. Let me tell you how you can best combine the disparate worlds of bugs and logs.

Web bugs (page tagging)

Page tagging systems, such as Google Analytics, are excellent because you see the data in near real time and get a very accurate count of your website visitors. This is primarily because they use JavaScript: the tracking code runs in the browser, so it bypasses caching proxies and has access to information available to the web browser but not to the web server, such as screen dimensions, installed browser plug-ins, etc. You could say that setting cookies is another advantage, but it is also possible to record cookie information in log files.

There are, however, some serious limitations to this approach:

  1. Page tagging systems do not record visits from search engine robots, since robots do not execute JavaScript. This information is way too important to go without.

  2. Page tagging systems do not record page errors. There is a way to work around this, and I will explain it in a follow-up post.

  3. Some users disable JavaScript and/or cookies. If this occurs, the visitor’s information is lost; even measuring return or unique visits becomes inaccurate when visitors delete cookies on their computer. If the page takes too long to load and the user clicks away, the session may not get recorded either.

  4. You need to manually install the code on each page. This opens up the opportunity for errors: pages with no tags, pages with other JavaScript code errors, etc.

Web logs (log analysis)

I am a big fan of log files and I personally use them extensively. Log files are advantageous for search marketers because they track search engine robot activity and help identify broken links and server errors. Of course, log analysis has its drawbacks too:

  1. The big one is that logs miss cached requests (which never reach the server) and conflate visits from multiple users behind a proxy server (same IP address).

  2. Another problem is that you don’t get to track actions that happen in the web browser, such as JavaScript or Flash events.

  3. The reports are not in real time.

  4. It requires a level of comfort with technical things or access to competent technical staff.
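To make the upside concrete: here is a minimal sketch of the kind of log analysis that page tags cannot do. It scans an Apache combined-format log for search-engine robot requests and error responses. The regular expression and the list of bot user-agent substrings are my own illustrative assumptions, not an exhaustive set.

```python
import re

# Apache combined log format: host, ident, user, time, request, status, bytes, referer, user-agent
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# A few well-known robot user-agent substrings (illustrative, not exhaustive)
BOT_SIGNATURES = ("Googlebot", "bingbot", "Slurp", "Baiduspider")

def summarize(lines):
    """Count robot hits and 4xx/5xx error responses in an iterable of log lines."""
    bot_hits, errors = 0, 0
    for line in lines:
        m = LOG_PATTERN.match(line)
        if not m:
            continue  # skip lines that do not match the combined format
        if any(sig in m.group("agent") for sig in BOT_SIGNATURES):
            bot_hits += 1
        if m.group("status").startswith(("4", "5")):
            errors += 1
    return bot_hits, errors
```

For example, feeding it one Googlebot hit and one 404 returns `(1, 1)` — one robot visit and one broken link, neither of which a JavaScript tag would ever see.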

A better solution: The hybrid approach

Fortunately, page tagging and log analysis can work in tandem, as the two systems complement each other well. Here are two strategies currently in use to create hybrid solutions: cookie-fortified logs and web server plug-ins.

  1. Cookie-fortified logs: The idea here is to collect the information from the JavaScript code (e.g. user session, screen resolution, etc.) and send it back to the server via cookies. In order for this to work, the web server must be configured to record the cookie information in the log files. In Apache it is as simple as including the Cookie variable in your LogFormat, for example: \"%{Cookie}i\". The full line in your httpd.conf file should read like this:

    LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" special

    Once the logs include that information, the reporting engine can produce very accurate reports. Obviously, this method requires some technical know-how. Google’s Urchin hosted edition takes this approach, but unfortunately they decided to discontinue its development.
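To show what the reporting engine works with, here is a minimal sketch that pulls the JavaScript-set cookie values back out of a log line written with the LogFormat above. The `session` and `screen` cookie names are hypothetical examples of values a page tag might set; the regular expression is my own assumption about the resulting line layout.

```python
import re

# Matches the custom LogFormat above:
# %h %v %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" "%{Cookie}i"
LINE = re.compile(
    r'(?P<host>\S+) (?P<vhost>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" "(?P<cookie>[^"]*)"'
)

def cookies_from_line(line):
    """Return the cookie field of one log line as a dict, or {} if unparsable."""
    m = LINE.match(line)
    if not m:
        return {}
    pairs = (p.split("=", 1) for p in m.group("cookie").split("; ") if "=" in p)
    return dict(pairs)
```

So a line ending in `"session=abc123; screen=1024x768"` yields `{'session': 'abc123', 'screen': '1024x768'}`, browser-side data sitting right in the server log.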

  2. Web server plug-ins: The idea here is to gather information both from the page tags and from code running on the web server (via a plug-in). The advantage is that the data is collected by a third-party server that does the number crunching and generates the reports. The data can be presented in near real time, and you don't need to deal with web server configuration, accessing logs via FTP, and so on. The only drawback is privacy: if you are wary of releasing competitive business information to a third party, analyzing your own logs may be your only option.

This was the approach taken in the first hybrid solution by Rufus Evison back in 1998. He spun the product off to create a company based upon the increased accuracy of hybrid methods.


Both approaches use the same concept: combine the information from the web browser (through page tagging) with information from the server (through logs or web server plug-in/API).

Personally, I like using Google Analytics, and it is unfortunate that it does not provide a hybrid solution. At the moment I need to use the old Urchin for my profitable projects; information about robot activity on my sites is what I miss the most. I am currently researching a way to inject that information into GA via a server-side script. I will write a blog post when/if I am successful.
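For what it's worth, the direction I am exploring looks roughly like this: replay robot hits to Google Analytics' classic __utm.gif collection endpoint from a server-side script. This is only a sketch, not a working integration; the parameter set (utmwv, utmn, utmhn, utmp, utmac) is the minimum I believe the endpoint accepts, and the account id is a placeholder.

```python
import random
import urllib.parse

def robot_hit_url(hostname, page, account="UA-XXXXX-1"):
    """Build a __utm.gif request URL for a server-side pageview.

    Hypothetical sketch: the account id is a placeholder, and the
    parameter set is the minimal one the classic __utm.gif endpoint
    is believed to accept.
    """
    params = {
        "utmwv": "4.3",                              # tracker version
        "utmn": str(random.randint(0, 0x7FFFFFFF)),  # cache-busting id
        "utmhn": hostname,                           # host name of the tracked site
        "utmp": page,                                # page path for this hit
        "utmac": account,                            # GA account id (placeholder)
    }
    return "http://www.google-analytics.com/__utm.gif?" + urllib.parse.urlencode(params)
```

A server-side script would build such a URL for each robot request found in the logs and fetch it (e.g. with urllib.request) to record the hit. Whether GA attributes these hits sensibly is exactly what I still need to verify.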

7 replies
  1. Richard Chmura says:

    Hi Hamlet, GoStats is a good option (similar to Google Analytics). GoStats is fully real-time on some levels and nearly real-time on others.

    Question: why do you want to combine the server-side (bot tracking) with the web-bug (cookies) tracking? Would it be much easier to just have two systems – one for the bots and one for the 'real' people?

