- On September 3, 2013
- By MartinMacdonald
We can’t stop Spam, we CAN save Polar Bears
I am bothered by comment spam as much as the next person, but in the pursuit of science (honest) I also like playing about with the latest spamming tools.
They all work roughly the same way, some better than others, but its all pretty standard…
The recipe is simple, you spin some comments, submit them to thousands (even millions) of sites in the hope that a few them stick.
The problem is though, 99.9% of them end up sat in spam queues, on a thousand million websites like this:
Spam is BAD for the environment!
Think about it!
- Spamming in the first place uses electricity,
- spidering and submitting spam relies on copper wiring or fibre optics that have manufacturing costs,
- it sits in MySql databases on servers that use precious metals,
- filling hard discs up uses yet more electricity that more often than not is produced by burning fossil fuels!
If we look at the global greenhouse gas emissions since 1990 you can see that:
Greenhouse Gases have a strong correlation to the amount of spam sent across the internet:
Now, I’m not saying its absolutely certain that its just spam thats caused this explosion in greenhouse gases, but its a risk I’m not willing to take for this guy:
In SEO we’ve often paid attention to just as tenuous correlative studies, so this shouldn’t be too huge a logic jump 😉
Lets Save the Polar Bears!
The best way that I can think to do this is by increasing the efficiency of comment spam.
Lets call it “my little way of giving back to the planet“.
The problem as I’ve already covered is the hundreds of millions of spam comments left on blogs that will never publish them. Lets start by looking at the near 1,300 comments currently in my moderation queue.
First) login to your wordpress blogs phpmyadmin and grab the table wp_comments:
and choose to download that as a CSV.
Second) Import that into excel and then filter the list to remove any comments you’ve manually cleared. If you just export it normally that should be column K, although that may be different so check first.
If all goes well you should now be looking at a list like this of lovely spam comments:
What we’re really interested in here is column E, which contains the landing page URL of the comment spam.
This is where those indiscriminate spammers are trying to send links to, so lets grab that list and drop it into a another sheet and using the rather excellent SEO Tools for Excel by Niels Bosma, alongside API access from Majestic SEO.
Third) Lets filter out the domains that are likely to be spammers, and set aside the others for a moment (we’ll come back to those in a later post).
Take the Column E from above, and just use the rather nifty function =UrlProperty(A1,”domain) with A1 being the cell reference to the top result in your list of spammy URL destination pages:
Now you’ll have a nice list of both the specific URL that’s being spammed, and the main domain name itself – the next thing that we need to do is grab the Majestic Citation Flow for the Fully Qualified Domain, and the total number of links pointing at the page in question.
The Citation Flow function is =MajesticSEOIndexItemInfo(A1,”CitationFlow”,”fresh”,TRUE) again, where A1 appears you want to put the first cell reference of your list of FQDs, and grab the entire list.
The External Backlinks per URL function is =MajesticSEOIndexItemInfo(A1,”ExtBackLinks”,”fresh”,TRUE) again, where A1 appears you want to put the first cell reference of your list of URLs, and grab the entire list.
Now you should end up with something like this:
Fourth) Now lets filter the list in place, to only domains with a Citation Flow of less than 80 (cF FQD), and sort by total number of backlinks to the URL.
All being well you’re going to see a nice ordered list of the best (well, spammiest) domains that have left comment spam on your blog!
Select the top 10 or 20 of those URLs, copy them into a new sheet, and filter in place to unique records only.
Now Copy the unique records list, and transpose them into another sheet. You should now be looking at a list of unique, really spammy URLs with one per column.
Fifth) In each column we now want to grab the top 1000 spammy links pointing at each, so we need to use a MajesticSEO dump formula: =Dump(MajesticSEOBackLinkData(A1,”SourceURL”,1000,”fresh”,FALSE))
Place the above function BELOW the first URL in your transposed list, and drag it right. You should now be looking at something like this:
Arrow 1) This is your row of spam heavy URLs from the previous step, transposed.
Arrow 2) This is your column of the top 1000 backlinks for each of the above URLs.
Lets get save the planet!
Now that you’ve got a list of thousands and thousands of pages that have already accepted spam comments, its relatively easy increasing your comment dropping success rate from around 0.01% well up into the 80-90% area.
You can build spam links with far more efficiency, you’re saving bandwidth, saving hard disc space, and saving the planet!!
This guy thanks you for saving the world, and I thank you for not submitting comment spam on my blog!
I’m not a blackhat, why do I care?
Well, now that you conceptually know how to mine your comments for URL’s you could just look at comments you’ve already accepted, and check the Citation Flow on those.
Thats a super quick way of finding domains where either the owner or a member of staff have already approached you (by leaving a legitimate comment) and you could reach out to them to see if you can get a link back on their site 😉