Website stats – Caveat emptor?

“Knowing a great deal is not the same as being smart; intelligence is not information alone but also judgment, the manner in which information is collected and used.”
– Dr. Carl Sagan

Do you rely on website statistics as a basis for your online marketing decisions? If you do, the accuracy of those statistics should matter a great deal to you.

Yet, over the past few years I’ve seen repeated evidence that the statistics many of us rely on may be suspect at best. There has been a lot of chatter on the Internet regarding underreporting of website statistics by Google Analytics.

Over the past year, I have been tracking stats on 15 sites that I maintain for myself and for several clients. All 15 sites run Google Analytics and also have Urchin 5 installed to read the server logs.

What I’ve noticed is that Google Analytics very consistently underreports the raw numbers on all 15 sites. Compared with the server logs, the underreporting ranged from 15% to 20% once bots and other non-human visitors were removed from the mix.

In one particular case, a new web page showed 17 unique visitors in its first week according to our server logs, and those were independently verified as 17 individual human visitors. But Google Analytics showed only 1 visit to the page. That was an extreme case, but it was 100% verifiable.
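To put a number on a case like this one, the shortfall can be expressed as the share of log-verified visits that the analytics tool never recorded. The small sketch below is only illustrative arithmetic; the function name `underreporting_rate` is my own and not part of any analytics product.

```python
def underreporting_rate(log_visits: int, analytics_visits: int) -> float:
    """Fraction of log-verified visits that the analytics tool missed."""
    if log_visits <= 0:
        raise ValueError("log_visits must be positive")
    return (log_visits - analytics_visits) / log_visits


# The single-page case from the text: 17 verified visitors, 1 recorded visit.
rate = underreporting_rate(17, 1)
print(f"{rate:.1%}")  # 94.1%
```

By this measure, the extreme case works out to roughly 94%, far outside the 15% to 20% range seen across the 15 sites as a whole.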

That incident occurred in January of this year, so it is not an old case where the problem has since been corrected. It casts doubt on all website counts based solely on Google Analytics.

I’ve read many reports from website owners who claim underreporting rates by Google Analytics of 50% or more. In the past I always found that number to be unacceptably high and unrealistically far from the norm. Now, in light of the facts noted above, I am not so sure.

What could cause this sort of underreporting? While I can’t say with certainty, I suspect it has something to do with Google Analytics’ reliance on remote JavaScript as its method of gathering data. If a visitor has JavaScript turned off, or a network error interrupts the transmission of data from the browser to Google, no visit is registered for that page even though a visit has actually taken place.

That said, I still use Google Analytics for statistical samplings and ratios, such as pageviews per visit and bounce rate. Why? If the sampling is broad enough, such statistics can be considered accurate within acceptable statistical margins of error, even taking the underreporting into account.
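A quick simulation can illustrate why ratios may survive underreporting. This is a rough sketch under a strong assumption: whole visits go missing at random, independent of how the visitor behaves. The 20% drop rate echoes the upper end of the range I observed, while the 30% bounce probability and the 2 to 6 page range are made-up illustrative values, not measurements.

```python
import random


def simulate(n_visits=100_000, drop_rate=0.20, bounce_prob=0.30, seed=42):
    """Return (all_visits, recorded_visits) as lists of pageviews per visit.

    Assumption: a visit is lost entirely with probability drop_rate,
    regardless of how many pages the visitor viewed.
    """
    rng = random.Random(seed)
    all_visits = []
    for _ in range(n_visits):
        # A bounce is a single-page visit; other visits view 2 to 6 pages.
        pages = 1 if rng.random() < bounce_prob else rng.randint(2, 6)
        all_visits.append(pages)
    recorded = [pages for pages in all_visits if rng.random() >= drop_rate]
    return all_visits, recorded


def bounce_rate(visits):
    return sum(1 for pages in visits if pages == 1) / len(visits)


def pages_per_visit(visits):
    return sum(visits) / len(visits)


all_visits, recorded = simulate()
print("visits:         ", len(all_visits), "vs", len(recorded))
print("bounce rate:    ", round(bounce_rate(all_visits), 3),
      "vs", round(bounce_rate(recorded), 3))
print("pages per visit:", round(pages_per_visit(all_visits), 3),
      "vs", round(pages_per_visit(recorded), 3))
```

The raw visit count drops by about a fifth, while the two ratios stay close to their true values. If the lost visits were not random (for example, if visitors with JavaScript disabled behaved differently from everyone else), the ratios could be skewed as well.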

Alexa stats are another issue. We recently had a website with a bounce rate of under 20% according to both our web server logs and Google Analytics, yet Alexa reported a 79% bounce rate for it.

That’s a HUGE difference – what could cause that? Well, the reason is the biggest weakness in Alexa stats and a good reason to doubt their veracity at any level. Alexa relies on users with the Alexa toolbar installed to gather data.

Problem – who is the Alexa user base?

Webmasters, designers, marketers and other “web admin” types are heavy users. But the overwhelming majority of consumers and Internet users don’t even know what Alexa is, let alone have its toolbar installed. So Alexa stats are almost exclusively created by the people who use the data, not the people who should be included in the data.

Knowing this, it is clear that smaller, niche websites whose user base actually consists of Alexa toolbar users have a decided advantage in Alexa rankings.

If you are using Alexa data to make marketing decisions, be aware that you are basing those decisions on data mostly collected from sellers like yourself, not from the buyers you are trying to reach.

If there is a bottom line to this, it may be that the webserver’s logs are the most accurate form of website statistics. So internally, we use our Urchin 5 statistics for most purposes, because the method of collection is the most accurate.
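For readers curious what counting from server logs looks like in practice, here is a minimal sketch. It is not how Urchin 5 works internally; it assumes logs in the common Apache "combined" format, treats each IP address as one visitor (which undercounts people behind shared proxies), and uses a crude user-agent filter for bots. The sample log lines, the page name and the `BOT_MARKERS` list are all made up for illustration.

```python
import re

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

BOT_MARKERS = ("bot", "crawler", "spider")  # crude, illustrative filter only


def unique_human_visitors(lines, path):
    """Return the set of IPs that successfully fetched `path`, skipping bots."""
    visitors = set()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m["path"] != path or m["status"] != "200":
            continue
        agent = m["agent"].lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue
        visitors.add(m["ip"])
    return visitors


# Hypothetical log lines using reserved documentation IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2000:10:00:00 +0000] "GET /new-page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '192.0.2.10 - - [01/Jan/2000:10:05:00 +0000] "GET /new-page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1)"',
    '198.51.100.7 - - [01/Jan/2000:11:00:00 +0000] "GET /new-page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Macintosh)"',
    '203.0.113.5 - - [01/Jan/2000:12:00:00 +0000] "GET /new-page.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.9 - - [01/Jan/2000:13:00:00 +0000] "GET /other.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux)"',
]

print(len(unique_human_visitors(SAMPLE_LOG, "/new-page.html")))  # 2
```

The point of the sketch is that the server sees every request it answers, whether or not the visitor’s browser runs JavaScript, which is why the log-based count serves as the baseline here.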

So when you are looking at website statistics to make marketing or other decisions, please take the following into consideration. Whose statistics are being used? How were those statistics gathered? Are those statistics from the website’s own server logs, or from a third-party service that does samplings but can’t possibly have completely accurate information apart from the website’s own server logs? “Caveat Emptor” – let the buyer beware.

If you’d like another take on this subject from another source, please check out this link:
Another source for Google Analytics underreporting information.