Hacking Analytics, tracking iOS, Spiders & Bots

A few weeks ago I was having a conversation with a partner of ours, and the topic of tracking spider activity on fresh content came up.

While the company in question has good in-house dev resource, asking them to build something bespoke to track Googlebot was off the table. Let’s face it, most companies have dev queues, not a surplus of developers, and things like this are often de-prioritised out of existence!

This reminded me, though, of a post from a few years back that Ani Lopez (Analytics guru and all-round nice guy) tweeted to me, detailing a clever hack to track search engine spiders.

Analytics 101 – How it works

As you are probably aware, Google Analytics, much like any JavaScript stats package, relies on JS to function.

Without it, traffic is simply ignored, because the details are never forwarded to Google.

Additionally, it needs to be able to set cookies in the visitor’s browser, so it can track repeat visitors etc.

The problem

Unfortunately, plenty of devices that you would probably want to track simply don’t execute JS or store cookies. The best historical example is feature phones / WAP phones (everyone remembers them, right, from before we all had iPhones or Galaxys?).

Guess what, search engine spiders fall exactly into that category as well!

Google very kindly provides a workaround for WAP phone tracking: a server-side script responds to requests from devices that can’t be tracked in the traditional way, generates a pixel carrying the salient points of information we want, and passes them through to GA that way.
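To make the pixel idea a bit more concrete, here is a minimal Python sketch of the server building the pixel request itself, since the device won't run any JS to do it. Everything in it is an assumption for illustration: the `stats.example.com/pixel.gif` endpoint, the parameter names and the `build_pixel_url` helper are hypothetical, and they are not Google's real collection URL or parameters. With no cookie to lean on, it stands in for a repeat-visitor ID by hashing the account and user agent.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; NOT Google's real collection URL.
PIXEL_ENDPOINT = "https://stats.example.com/pixel.gif"


def build_pixel_url(account_id, host, path, user_agent, referrer="-"):
    """Build a tracking-pixel URL on the server for a client that runs no JS."""
    # No cookie is available, so derive a stable pseudo visitor ID instead.
    # (Assumption: hashing account + user agent is "good enough" for a sketch.)
    digest = hashlib.sha256(f"{account_id}|{user_agent}".encode("utf-8"))
    visitor_id = digest.hexdigest()[:16]

    # Hypothetical parameter names carrying the "salient points" of the hit.
    params = {
        "account": account_id,
        "host": host,
        "page": path,
        "ref": referrer,
        "visitor": visitor_id,
    }
    return f"{PIXEL_ENDPOINT}?{urlencode(params)}"
```

The server would then either request that URL itself or emit it as an image tag in the page, depending on how the real script you use is built.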

It’s designed to sit on mobile-only websites, like m.whatever.com or mobile.whatever.com, but it does open up some pretty interesting opportunities…

All you need to do is put a selection criterion in front of the GA code; in this case it’s the request’s user agent.
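As a rough Python sketch of that check (an illustration, not the code from the post mentioned above): look at the request's user agent and decide which tracking route to take. The `BOT_SIGNATURES` tuple is a tiny illustrative sample rather than a complete list, and `is_known_bot` and `choose_tracking` are hypothetical names.

```python
# Illustrative sample of substrings found in crawler user agents.
# Assumption: a real list would be much longer and kept up to date.
BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "baiduspider", "yandexbot")


def is_known_bot(user_agent):
    """Return True if the user agent contains any known bot signature."""
    ua = (user_agent or "").lower()
    return any(signature in ua for signature in BOT_SIGNATURES)


def choose_tracking(user_agent):
    """Pick the tracking route: server-side pixel for bots, normal JS otherwise."""
    if is_known_bot(user_agent):
        return "server_side_pixel"
    return "javascript_snippet"
```

Substring matching is crude but cheap, which matters because the check runs on every request before the page is served.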

Use Case 1 – Tracking Spiders & Bots

The original concept was to track search bots through analytics, something it did rather nicely. However, since Cardinal Path published the original post a little over two years ago, things have moved on a bit and the code had stopped working.

Yesterday I tracked down the original author, Adrian Vender, on Twitter and had a chat with him about it. (The power of social media, YAY!)

He has very kindly updated his original source code to work again in 2013, and you can now grab it directly from his personal blog here:

Tracking bots using analytics.

There are still a few things to iron out, but the code linked to from his post is QA-ready.

Specifically, ignore the instructions to update your Analytics profile number from UA to MO (that’s deprecated, and comes from the older non-functional version).

Also, the list of bots is a bit outdated, but I’ve built a more complete list that you can download here: bots.xlsx

Use Case 2 – Tracking Rogue iOS Mobile Devices

Another (untested) use for this approach might well be to recover the analytics for those rogue mobile devices that are currently not playing nicely with GA. After all, the only thing you need in order to record the traffic is an accurate list of the user agents those devices use.

It’s not something that I’ve had to deal with professionally yet, so I’ve not bothered trying to get it working, but if it’s something that has caused you an issue and you want to fork the code, please let me know and I’ll update this post and publish it here if you so desire.

Other Use Cases:

When you think about it, there are potentially thousands and thousands of devices these days that don’t necessarily accept or execute JavaScript, but that do have unique user agents. This could be anything from internet-enabled watches through to the new 2013/2014 in-dash internet access in many modern cars.

You can also use it to spot when you’re being spidered by people using Screaming Frog or Xenu’s Link Sleuth (for us old fogeys), and, by setting up some custom alerts, it could even warn you of a DDoS attack.
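The alerting idea boils down to counting hits per user agent over a short window and flagging anything unusually busy. Here is a minimal Python sketch of that logic, assuming you already have hits as (epoch seconds, user agent) pairs from somewhere. The `spikes_per_minute` helper, the threshold and the sample user agents are all hypothetical, and this is plain counting rather than anything GA's custom alerts actually run.

```python
from collections import Counter


def spikes_per_minute(hits, threshold):
    """Return (minute, user_agent, count) for every agent at or over threshold.

    hits: iterable of (epoch_seconds, user_agent) pairs.
    """
    # Bucket each hit into its minute, then count per (minute, agent) pair.
    counts = Counter((int(ts) // 60, ua) for ts, ua in hits)
    return sorted(
        (minute, ua, n) for (minute, ua), n in counts.items() if n >= threshold
    )


# Made-up sample data: one crawler hammering the site within a single minute.
sample_hits = [
    (0, "ExampleCrawler/1.0"),
    (10, "ExampleCrawler/1.0"),
    (59, "ExampleCrawler/1.0"),
    (60, "ExampleCrawler/1.0"),
    (61, "Mozilla/5.0"),
]
print(spikes_per_minute(sample_hits, threshold=3))
# [(0, 'ExampleCrawler/1.0', 3)]
```

A sensible threshold depends entirely on your normal traffic, so it is worth watching the numbers for a while before trusting any alert.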

Basically, most things that you can do with server logs you can now do within the nice, familiar front end of Google Analytics!

Semi-obvious caveat: server error pages (i.e. 500s) are still unlikely to track, but that’s a small loss compared to the huge potential gain!

What use cases can you think of? Leave a comment below; all ideas appreciated!

UPDATE:

Following his comments below, Yousaf Sekander has emailed me screenshots of his WordPress plugin, designed to monitor bot activity:

[Screenshot: SEO Crawlytics plugin]

You can download said plugin direct from WordPress.org here.

It’s a fantastic-looking plugin, and if you are a WP publisher it’s a nice, super-easy way to get bot statistics. The GA method above will reveal far more bots by default, so it’s worth looking at as well, but for ease of use, especially if you aren’t comfortable with code, go ahead and install his plugin!

Martin MacDonald
Previously: Head of SEO, Omnicom. Inbound Marketing Director, Expedia. Head of Content & SEO, Orbitz. Currently: Marketing Consultant to Fortune 500's and High Growth Startups locally in Silicon Valley. Retired BlackHat & Current Tech SEO Geek.