I am bothered by comment spam as much as the next person, but in the pursuit of science (honest) I also like playing about with the latest spamming tools.
They all work roughly the same way, some better than others, but it’s all pretty standard…
The recipe is simple: you spin some comments and submit them to thousands (even millions) of sites in the hope that a few of them stick.
The problem, though, is that 99.9% of them end up sat in spam queues on a thousand million websites, like this:
Spam is BAD for the environment!
Think about it!
- Spamming in the first place uses electricity,
- spidering and submitting spam relies on copper wiring or fibre optics that have manufacturing costs,
- it sits in MySQL databases on servers that use precious metals,
- filling hard discs up uses yet more electricity that more often than not is produced by burning fossil fuels!
If we look at the global greenhouse gas emissions since 1990 you can see that:
Greenhouse Gases have a strong correlation to the amount of spam sent across the internet:
Now, I’m not saying it’s absolutely certain that it’s just spam that’s caused this explosion in greenhouse gases, but it’s a risk I’m not willing to take for this guy:
In SEO we’ve often paid attention to just as tenuous correlative studies, so this shouldn’t be too huge a logic jump 😉
Let’s Save the Polar Bears!
The best way that I can think to do this is by increasing the efficiency of comment spam.
Let’s call it “my little way of giving back to the planet”.
The problem, as I’ve already covered, is the hundreds of millions of spam comments left on blogs that will never publish them. Let’s start by looking at the nearly 1,300 comments currently in my moderation queue.
First) Log in to your WordPress blog’s phpMyAdmin and grab the table wp_comments:
and choose to download that as a CSV.
Second) Import that into Excel and then filter the list to remove any comments you’ve manually cleared. If you just export it normally, the approval status should be in column K, although that may be different so check first.
If all goes well you should now be looking at a list like this of lovely spam comments:
What we’re really interested in here is column E, which contains the landing page URL of the comment spam.
This is where those indiscriminate spammers are trying to send links to, so let’s grab that list, drop it into another sheet, and use the rather excellent SEO Tools for Excel by Niels Bosma, alongside API access from Majestic SEO.
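If you’d rather skip the spreadsheet, the first two steps can be roughed out in a few lines of Python. This is only a sketch: it assumes the CSV was exported with the column names in the first row, that the columns use the standard wp_comments names (comment_author_url and comment_approved), and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample export; a real one would be read from the downloaded CSV file.
# Assumption: the export includes a header row with the wp_comments column names.
SAMPLE_EXPORT = """comment_ID,comment_author_url,comment_approved
1,http://example.com/cheap-pills,0
2,https://friendly-blog.example.org/about,1
3,http://example.net/casino?ref=9,spam
4,,0
"""

def unapproved_urls(csv_file):
    """Return the author URLs of comments that were never approved."""
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get("comment_author_url") or "").strip()
        # comment_approved is "1" for approved comments; anything else
        # (for example "0" or "spam") is treated as not approved here.
        if url and row.get("comment_approved") != "1":
            urls.append(url)
    return urls

spam_urls = unapproved_urls(io.StringIO(SAMPLE_EXPORT))
print(spam_urls)
```

To run it against a real export, swap the `io.StringIO(...)` for `open("wp_comments.csv", newline="", encoding="utf-8")`, where the file name is whatever you saved the download as.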
Third) Let’s filter out the domains that are likely to be spammers, and set aside the others for a moment (we’ll come back to those in a later post).
Take column E from above and use the rather nifty function =UrlProperty(A1,"domain"), with A1 being the cell reference to the top result in your list of spammy destination URLs:
Now you’ll have a nice list of both the specific URL that’s being spammed, and the main domain name itself – the next thing that we need to do is grab the Majestic Citation Flow for the Fully Qualified Domain, and the total number of links pointing at the page in question.
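For anyone working outside Excel, the domain-extraction step above can be approximated with Python’s standard library. Treat this as a rough stand-in rather than a copy of what UrlProperty does: it returns the hostname with any leading “www.” stripped, so other subdomains are left in place. The function name host_of and the example URLs are my own inventions.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lower-cased hostname of a URL, without a leading 'www.'."""
    # Spam URLs are often missing a scheme; urlsplit needs one to find the host.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

for url in ["http://www.example.com/cheap-pills", "example.net/casino?ref=9"]:
    print(url, "->", host_of(url))
```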
The Citation Flow function is =MajesticSEOIndexItemInfo(A1,"CitationFlow","fresh",TRUE). Again, where A1 appears, put the first cell reference of your list of FQDs, then apply the formula to the entire list.
The External Backlinks per URL function is =MajesticSEOIndexItemInfo(A1,"ExtBackLinks","fresh",TRUE). Again, where A1 appears, put the first cell reference of your list of URLs, then apply the formula to the entire list.
Now you should end up with something like this:
Fourth) Now let’s filter the list in place, to only domains with a Citation Flow of less than 80 (cF FQD), and sort by total number of backlinks to the URL.
All being well you’re going to see a nice ordered list of the best (well, spammiest) domains that have left comment spam on your blog!
Select the top 10 or 20 of those URLs, copy them into a new sheet, and filter in place to unique records only.
Now copy the unique records list, and transpose them into another sheet. You should now be looking at a list of unique, really spammy URLs with one per column.
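The filter, sort and de-duplicate routine from this step is simple enough to sketch in Python too. The rows below are entirely invented (they are not real Majestic numbers), and the function name spammiest is just an illustrative label. One small difference from the spreadsheet version: this sketch removes duplicates before taking the top N, so you always get N distinct URLs when enough are available.

```python
# Hypothetical rows: (url, citation_flow_of_domain, external_backlinks_to_url).
# The numbers are invented for illustration, not real Majestic data.
rows = [
    ("http://example.com/cheap-pills", 12, 4800),
    ("http://example.net/casino", 35, 15200),
    ("http://bigbrand.example/blog", 91, 990000),
    ("http://example.net/casino", 35, 15200),  # duplicate landing page
    ("http://example.org/loans", 8, 300),
]

def spammiest(rows, max_citation_flow=80, top_n=20):
    """Keep rows under the Citation Flow cut-off, rank by backlinks, dedupe."""
    kept = [r for r in rows if r[1] < max_citation_flow]
    kept.sort(key=lambda r: r[2], reverse=True)
    # dict.fromkeys drops repeats while preserving the sorted order.
    unique = list(dict.fromkeys(url for url, _, _ in kept))
    return unique[:top_n]

print(spammiest(rows, top_n=3))
```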
Fifth) In each column we now want to grab the top 1000 spammy links pointing at each, so we need to use a MajesticSEO dump formula: =Dump(MajesticSEOBackLinkData(A1,"SourceURL",1000,"fresh",FALSE))
Place the above function BELOW the first URL in your transposed list, and drag it right. You should now be looking at something like this:
Arrow 1) This is your row of spam-heavy URLs from the previous step, transposed.
Arrow 2) This is your column of the top 1000 backlinks for each of the above URLs.
Let’s save the planet!
Now that you’ve got a list of thousands and thousands of pages that have already accepted spam comments, it’s relatively easy to increase your comment dropping success rate from around 0.01% well up into the 80-90% area.
You can build spam links with far more efficiency: you’re saving bandwidth, saving hard disc space, and saving the planet!!
This guy thanks you for saving the world, and I thank you for not submitting comment spam on my blog!
I’m not a blackhat, why do I care?
Well, now that you conceptually know how to mine your comments for URLs, you could just look at comments you’ve already accepted, and check the Citation Flow on those.
That’s a super quick way of finding domains where either the owner or a member of staff has already approached you (by leaving a legitimate comment), and you could reach out to them to see if you can get a link back on their site 😉
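As a rough illustration of that whitehat version, here is a small Python sketch that tallies which hosts your approved commenters linked to. It assumes the same wp_comments export as before, with column names in the first row; the sample rows and the function name approved_hosts are made up. Checking Citation Flow for each host would still happen in Majestic or SEO Tools for Excel, so that part is not shown.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit

# Made-up sample export using the standard wp_comments column names.
SAMPLE_EXPORT = """comment_author,comment_author_url,comment_approved
Alice,https://friendly-blog.example.org/about,1
Bob,http://example.com/cheap-pills,0
Alice,https://friendly-blog.example.org/,1
Carol,https://agency.example.net/team,1
"""

def approved_hosts(csv_file):
    """Count how many approved comments linked to each host."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        host = urlsplit(row.get("comment_author_url") or "").hostname
        # Only comments you approved ("1") count as legitimate approaches.
        if host and row.get("comment_approved") == "1":
            counts[host] += 1
    return counts

for host, n in approved_hosts(io.StringIO(SAMPLE_EXPORT)).most_common():
    print(host, n)
```

Hosts that turn up more than once are probably the warmest leads, since somebody there has bothered to comment repeatedly.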
13 thoughts on “We can’t stop Spam, we CAN save Polar Bears”
Genius! I’m going to use this. You have also given me an idea to extract previous emails into an excel file, remove common email services such as hotmail, gmail and yahoo and use the technique to do the same with previous contacts. Worth a try I think!
I would love to see a peer reviewed article investigating whether comment spam or celebrity gossip traffic is responsible for more penguin deaths worldwide
To an uneducated person, you seem to be very angry about tinned meat.
True Story: I often get weird looks for taking photos of spam in the local supermarket 😉
As you’re in a sharing mood Martin, I thought I’d join in with a couple additional helpful tactics/tricks that I actively use myself to Spam (Yes, even I do the occasional comment blast!)
1) When spinning comments, make them niche relevant, take a look at my tutorial on Jacob Kings site:
2) Use XRumer, GSA & ScrapeBox to pull footprints, see SEOsUnite Tutorial –
3) Use different private proxies, if I’m doing specifically niche relevant and not spamming style stuff I’ll go with really expensive proxies from the likes of Elite Private Proxies and using Squid/Buy proxies for a couple 100 cheap proxies to spam with..
Hope these tips help peoples become better black hats! #BlackHatFoLyfe
I haven’t looked, but don’t both of those strategies kill Polar Bears?
I’m trying to save the world here, not pollute it with yet more comment spam :p
Nope! Niche comments mean higher approval rate due to them being relevant to the content, proxies mean more speed for less resources as they are strong and efficient proxies. The GSA Method means you can remove the need for searching in GSA which is pretty resource intensive.
I thought it was a given that everybody tried in some way to spam relevant sites. Like, since 2008.
Stop spamming comments and spam Twitter instead! 😉
Doesn’t this blog evoke even more comment spam? 🙂
@Jan – I don’t think so, as if you’re going to actually comment spam, you need to use a few other apps and techniques that I haven’t gone into here. What the post is about allows you to be far more targeted in your site list before you start spamming, and hopefully prevents me from having to delete SO MUCH DAMN COMMENT SPAM :p
Great article! I can certainly see your point, making an economy of both effort and time will certainly help. I don’t think that polar bears would notice 😀 , but steps taken towards more efficiency are steps taken forward.
Nowadays I’m getting so much spam that I’m bored with it; thanks for sharing this trick. Will surely try this, let’s start targeted spamming, LOL. I just hate spamming …