Referral spam in Google Analytics has become a considerable problem. Just like with cancer, there is no surefire way to get rid of it for good, but there are several ways of minimising its impact on your data.

The quick fix

Make sure you have the “Exclude all hits from known bots and spiders” box checked at the view level. To get there: Admin -> select a view -> View Settings -> check the box.

Make no mistake, this won’t solve your problems. It’s the bare minimum you have to do in this fight against spammy referrers. Time to bring out a slightly larger caliber, and that is…

.htaccess exclusion

For those who are skilled enough in working with the .htaccess file or really like to live dangerously, there is a way to block the spammy referrals before they even reach your GA. Himanshu Sharma pointed me in the right direction – to the master .htaccess spam list published by Perishable Press. This method isn’t foolproof either: it can only stop bots that actually request your pages, while spam sent straight to Google’s servers never touches your site. Combined with the upcoming two, though, it’s a pretty good way of dealing with the situation.
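
If you maintain your own list of offenders, here is a minimal Python sketch of how such .htaccess rules could be generated. It is not the Perishable Press list itself; the function name and the `.example` domains are made up for illustration, and it assumes Apache with mod_rewrite enabled.

```python
def build_htaccess_rules(spam_domains):
    """Return mod_rewrite lines that answer 403 Forbidden to any request
    whose Referer header points at one of the given domains."""
    # Normalise, de-duplicate and sort so the output is stable.
    domains = sorted({d.strip().lower() for d in spam_domains if d.strip()})
    if not domains:
        return ""

    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine On"]
    for index, domain in enumerate(domains):
        # Domain names only need their dots escaped to be regex-safe.
        escaped = domain.replace(".", "\\.")
        # Every condition except the last is chained with OR.
        flags = "[NC]" if index == len(domains) - 1 else "[NC,OR]"
        lines.append(
            "RewriteCond %{HTTP_REFERER} "
            f"^https?://([a-z0-9-]+\\.)*{escaped}(/|$) {flags}"
        )
    lines.append("RewriteRule .* - [F]")  # [F] = respond with 403 Forbidden
    lines.append("</IfModule>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical spam domains, for illustration only.
    print(build_htaccess_rules(["spam.example", "bad-site.example"]))
```

Paste the printed block into your .htaccess only after backing the file up; a typo in there can take the whole site down.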

Referrer exclusion filters

I could write a long post about how to set up referrer exclusion filters, but others (like Dennis Moons, Carlos Escalera or Ben Travis) have already done it. Let’s go for a simpler and more elegant solution instead.

Simo Ahava has recently built a tool that imports several filters to your GA. These filters are already set up to rid you of unwanted traffic. Install Simo’s Spam Filter; thanks to its interface, it’s intuitive and easy to install.

Feeling pretty good about the future, aren’t you? But what about all the spam-ridden historical data? It deserves some love, too.

Exclusion segment for historical data

Fear not, we’ve got your back. Install this referral exclusion segment into your Google Analytics, modify it according to your needs, and analyse away!

This segment doesn’t update automatically, but it provides you with an essential database of spammy sources, which should make your detective work easier. The sharp-eyed among you will have noticed that the sources are, much like Simo’s filters, taken from the Lone Goat’s list.
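
Since the segment has to be maintained by hand, a small script can help turn a list of spammy sources into regular expressions you can paste into a segment or filter condition. The sketch below is only an illustration: the function names and the `.example` domains are hypothetical, and the 255-character cap is an assumption about the pattern field's limit, so adjust it if your setup differs.

```python
import re

MAX_PATTERN_LENGTH = 255  # assumed cap on a single pattern field


def build_source_patterns(spam_domains, max_length=MAX_PATTERN_LENGTH):
    """Join domains into 'a\\.example|b\\.example' style patterns,
    starting a new pattern whenever the next domain would not fit."""
    escaped = sorted(
        {d.strip().lower().replace(".", "\\.") for d in spam_domains if d.strip()}
    )
    patterns, current = [], ""
    for item in escaped:
        if len(item) > max_length:
            raise ValueError(f"domain too long for one pattern: {item}")
        candidate = f"{current}|{item}" if current else item
        if len(candidate) > max_length:
            patterns.append(current)  # close the full pattern
            current = item            # and start the next one
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


def is_spam_source(source, patterns):
    """True if the source matches any pattern (case-insensitive)."""
    return any(re.search(p, source, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    # Hypothetical entries standing in for a real spam list.
    for pattern in build_source_patterns(["spam.example", "bad-site.example"]):
        print(pattern)
```

Each printed line can go into its own condition, so a long list ends up as a handful of patterns rather than hundreds of single-domain rules.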

Any long-term solutions?

Once you have installed this set of filters, secured your .htaccess and cleaned your data with the custom segment, you will be safe for a while, but not for long.

These exclusion methods are purely reactive and don’t address the real problem. According to Georgi Georgiev, that would require significant alteration of Google Analytics’ core functionality.

However, the implementation of Google Analytics – no matter if it’s the usual JS tracking or via the Measurement Protocol – ultimately relies on an unidentified client machine to send the HTTP request. Thus, any such request can be spoofed/forged and there is no workaround for this that doesn’t require altering the very core of the Google Analytics tracking functionality in a very, very significant way.
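
To see why, it helps to look at what a Measurement Protocol pageview hit consists of. The sketch below only builds the payload and sends nothing; the function name, the `UA-XXXXX-Y` property ID and the `.example` referrer are placeholders for illustration.

```python
from urllib.parse import urlencode


def build_pageview_payload(tracking_id, client_id, page, referrer):
    """Return the form-encoded body of a Measurement Protocol (v1) pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, visible in any page's source
        "cid": client_id,    # client ID, freely chosen by the sender
        "t": "pageview",     # hit type
        "dp": page,          # document path
        "dr": referrer,      # document referrer, also freely chosen
    }
    return urlencode(params)


if __name__ == "__main__":
    print(build_pageview_payload("UA-XXXXX-Y", "555", "/home", "http://spam.example/"))
```

Nothing in the payload proves who the sender is or that a real visit happened: every field, the referrer included, is simply whatever the sender typed in.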

The spammers are evolving very fast, and right now, they seem to be two steps ahead of the Google Analytics team. They even started pushing events into some GA accounts. Fortunately, the bright sparks at Google seem to be working on a countermeasure. I hope they get it right and deploy it quickly.

In the meantime, feel free to use everything at your disposal to make the GA data cleaner and easier to analyse. Good luck!