Anti-spam techniques: Weeding out spam profiles

This post is a bit of a personal anecdote, and leans a little to the technical side of community management, so bear with me.

My wife recently started a little discussion forum catering to a very specific niche of individuals sharing an interest of hers. The subject material here isn't important, but her experience is.

Being the so-called "techie husband" that I am, I told her it'd be a snap to set up something simple and run it. Of course, most of the community software I work with these days is either custom-built or large-scale, neither of which really suited her needs. So I decided to go with my old open source forum standby, phpBB.

I'm a glutton for punishment, it seems.

Unwittingly, I went ahead and installed it, with a little tweaking along the way. To deter spambots, I enabled reCAPTCHA for sign-up. I also helped her out with some theme customization and showed her how forum permissions worked. I set up the forum to require registration before members could read anything (her request). I then added and configured a handful of plugins here and there for things like board stats, image attachment management, and email notifications, and went on my merry way. Quick and dirty. No further maintenance required. Or so I thought.

This sh*t is old.

Old software (via ScottSimpson on Flickr)

Much to my surprise, aside from moving their not-so-active development to GitHub, not a whole lot has changed with phpBB since I last used it regularly in... oh, 2003 or so. Sure, there's a new major release, version 3... from 2007. And to my chagrin, many basic features we've come to expect from today's social platforms just aren't there. Others are available through sketchy, hard-to-install third-party mods, which in most cases are poorly maintained, if at all. Being used to the efficiency, elegance and ease-of-use of other well-maintained open source packages like WordPress, this was a rather disappointing discovery.

All in all, it's a dinosaur. But if it's good enough to do the job, so be it, right?

As most community managers discover, there are times when you can sit back and let the seed you've planted grow, basking in each new sign-up and the brisk activity of a shiny new toy. Enjoy them while you can. They don't last.

So a few days and then a week went by, several new members joined, and all was well. My wife was happy, her little forum chugging along and growing nicely, without any involvement on my part.

And then it happened. Spam. And lots of it.

Horses and donkeys and... oh my.

Or more precisely, a multitude of manually created spammer accounts posting spam. And not even the usual Viagra or Cialis ads, designer knock-offs, or weight-loss scams. Full-on porn. Horses, donkeys, men, women, you name it. Apparently Russian bestiality porn is all the rage with spammers these days. Either way, it's not pleasant to moderate, particularly when you're my wife, who's never operated a forum before in her life.

She took it in stride, deleting spam posts and banning offending users with gusto. We went as far as moderating all new registrations, effectively creating a whitelist-based site. Unfortunately, it wasn't enough.

You see, unlike blog post comment spam, which is (in most cases) relatively painless to deal with using easily configurable plugins and APIs like Akismet or TypePad AntiSpam, forum spam — particularly on older forum software and social networks — is a different creature altogether. Spammers infiltrating community sites and discussion groups are insidious, devious creatures. Getting past a plain old reCAPTCHA, let alone more complex anti-bot measures, is nothing when you have a team of human spammers working for pennies on your payroll. And unlike primarily automated, robot-driven blog spam, most of the spammers we see when managing online communities are real people, whose sole (paid!) purpose in life appears to be to inflict their products upon your unsuspecting audience.

No matter how hard it is to accomplish, they'll get in one way or another.

The anatomy of a spam profile

Twitter's sign-up form

Most social networks and community sites have member profiles that require a few mandatory fields upon initial registration. Twitter is a prime example of this, requiring only three fields: full name, email, and password.

You'll see some similarities among them all, including the basics of an email address, a handle or pseudonym of some sort, a password, and possibly a full name field. Other common elements include a profile image or photo, potentially a location, and a web site URL of some kind. A bio field (and more) is often found on community sites that offer detailed profile pages.

Behind the scenes, most forum and community applications track some additional, hidden information about each user, including your browser's IP address, hostname (if available), user-agent, date of account creation and last login, and potentially the referring URL you arrived from, among other things. For the non-technical among you, these little chunks of data are made available to web sites you open by your browser, and it's perfectly normal. Many of these pieces of data are tracked directly in the site's web server logs as well, but it's useful information to expose to your moderators, if you have control over it. Most of them are also easily spoofed, but thankfully that's much more common with automated robot spam than with humans.
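
If you're building your own software, capturing those hidden details at sign-up might look something like the sketch below. It assumes a WSGI-style request environment (a plain dictionary with keys like REMOTE_ADDR); the SignupMetadata and capture_signup_metadata names are made up for illustration and don't come from phpBB or any other package.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass
class SignupMetadata:
    """Hidden per-account details worth exposing to moderators (hypothetical type)."""
    ip_address: str
    user_agent: str
    referrer: Optional[str]
    # Account creation time, recorded in UTC.
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Left empty here; a reverse-DNS lookup could fill it in later.
    hostname: Optional[str] = None


def capture_signup_metadata(environ: Mapping[str, str]) -> SignupMetadata:
    """Build a record from a WSGI-style request environment."""
    return SignupMetadata(
        ip_address=environ.get("REMOTE_ADDR", ""),
        user_agent=environ.get("HTTP_USER_AGENT", ""),
        referrer=environ.get("HTTP_REFERER"),
    )
```

Remember that the user-agent and referrer come straight from the browser, so treat them as hints rather than proof.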

There are ways to automate spam and abuse detection and removal, and with larger sites it's almost mandatory. However, for the smaller community builders around, unless you're somewhat technically inclined, working with an in-house (or outsourced) development team, or using a very full-featured community site package like Ning, this may not be an easy or cost-effective option for you. It also backfires quite often, as has happened repeatedly with both Ning and huge social networks like Yelp and Foursquare. Your automated spam-management software can even lead to class-action lawsuits, which are no fun to defend against... even if you happen to win.

The tell-tale signs of a spammer

When managing or moderating small-to-medium sized community sites, there are a few tell-tale signs to look for when attempting to manually track down spammers.

In most cases — in my experience — human spammers will make an attempt to fill in a few fields beyond the basic system requirements. This may involve adding a URL to advertise, which falls into the category of comment spam and backlink/trackback attempts that you'll more commonly see on blogs. Often, they'll include a full name, gender, or profile image, in an attempt to blend in with the rest of your community... before they dump a deluge of advertisements for whatever product they're pushing on you.

Keeping an eye out for the following will help you and your moderators to nail down spammers quickly and quietly. Some of these will be obvious, but they're still worth mentioning.

The blatantly obvious

  • Empty profiles with links to external sites
    If a user creates a profile containing nothing but a link to an external site, you're pretty much set to ban.
  • Immediate spam comment posting
    The user creates the account and immediately starts posting comments containing links on other users' content or profiles. Insta-ban.
  • Stripped links in profile text
    Spammers aren't necessarily the sharpest tools in the shed. I've operated many community sites that strip (remove) links from member profiles, sometimes forever and sometimes until a certain threshold of usage has been met. If you find a whole ton of product names listed in a big block of profile text that probably should be external links, ban away.
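
For the technically inclined, the first two checks in this list are simple enough to sketch in code. This is a rough illustration only: the Profile and Comment types, the URL pattern, and the ten-minute window are all assumptions of mine, so adjust them to whatever your own software stores.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable

# Deliberately loose pattern: anything starting with http(s):// or www.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


@dataclass
class Profile:  # hypothetical profile record
    bio: str
    website: str
    created_at: datetime


@dataclass
class Comment:  # hypothetical comment record
    body: str
    posted_at: datetime


def is_link_only_profile(profile: Profile) -> bool:
    """True when the profile contains a link and no other text."""
    has_link = bool(profile.website.strip()) or bool(URL_PATTERN.search(profile.bio))
    remaining_text = URL_PATTERN.sub("", profile.bio).strip()
    return has_link and not remaining_text


def posted_links_immediately(
    profile: Profile,
    comments: Iterable[Comment],
    window: timedelta = timedelta(minutes=10),
) -> bool:
    """True when a comment containing a link was posted within `window` of sign-up."""
    return any(
        URL_PATTERN.search(c.body) and c.posted_at - profile.created_at <= window
        for c in comments
    )
```

Either check returning True is a reason to take a closer look at the account, not an automatic ban.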

Profile inconsistencies

  • Full name and email address mismatch.
    This is a pretty easy one to eyeball when editing a potential spammer's profile on your site, although it's also easy to make a mistake here. If a new account shows up as John Smith, yet has an email address like tina.schlemko.98@gmail.com, it's a pretty obvious sign that something fishy is going on.
  • Profile image and name/gender mismatch.
    This again leaves the potential for misinterpretation, but checking whether their photo even vaguely matches their gender and/or name can help in deciding whether a user is a spammer or not. Neither this nor the previous technique should be relied on entirely, but they're helpful if you're on the fence based on other behaviour.
  • Profile data that looks randomly generated.
    This is slightly more obvious the more time-pressed or less motivated the spammer is, but surprisingly they still do it quite regularly. Usernames, email addresses or other required fields full of randomly generated strings, or text that looks as if it wasn't written by a human, are a relatively obvious indicator of a lazy spammer. Again, take this with a grain of salt.
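
If you'd rather have software do the eyeballing, two of these checks can be roughed out as below. Treat it as a sketch: the function names are invented for this post, and the thresholds (three-letter name parts, 3.5 bits of entropy, a 25% vowel ratio) are guesses rather than tuned values. Like the manual versions, these will produce false positives.

```python
import math
import re
from collections import Counter


def name_matches_email(full_name: str, email: str) -> bool:
    """True when any part of the name (3+ letters) appears in the email's local part."""
    local_part = email.split("@", 1)[0].lower()
    name_parts = [p for p in re.findall(r"[a-z]+", full_name.lower()) if len(p) >= 3]
    return any(part in local_part for part in name_parts)


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def looks_random(text: str, min_length: int = 8, entropy_threshold: float = 3.5) -> bool:
    """Rough guess: long, high-entropy strings with few vowels look machine-made."""
    letters = [c for c in text.lower() if c.isalpha()]
    if len(text) < min_length or not letters:
        return False
    vowel_ratio = sum(c in "aeiou" for c in letters) / len(letters)
    return shannon_entropy(text.lower()) >= entropy_threshold and vowel_ratio < 0.25
```

With the example from the list above, name_matches_email("John Smith", "tina.schlemko.98@gmail.com") comes back False, which is the mismatch you'd be flagging for a second look.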

Upon closer inspection

  • Repetitive questionable behaviour
    This is a little tougher to spot unless you're either (a) watching your site like a hawk, or (b) using automated tools to help monitor odd activity and behaviour, but it's generally easy to confirm one way or another. Fake Twitter accounts are notorious for this; you'll notice the distinct lack of tweets and followers, yet they may be able to follow hundreds (or sometimes thousands) of accounts before Twitter bans them. They're harmless enough on Twitter aside from taking up space, but on a smaller community site they can be a real nuisance. Watch for patterns of repetitive behaviour, dozens of duplicate comments, or whatever type of input your site allows. Another approach to dealing with this is to implement throttling of content submissions, which can deter this sort of behaviour (although it may annoy your users in some cases).
  • Checking IPs, hostnames, and email addresses against known blacklists
    There are certain free third-party tools and sites that come in particularly handy here, but the best I've found when it comes to blacklists is StopForumSpam. While they do offer a (throttled, limited) API for submitting spam data (which is great), it may not scale to handle automated checks with larger, high-volume sites particularly well. However, it's an excellent tool for manually checking IP addresses, email addresses, and known spammer hostnames. Here's an example of the output based on a spammer recently nailed on one of my sites. It's very easy to use, and something you can get your moderators (or wife, in my case) using right away. Unfortunately, it won't necessarily protect you from IP spoofing by very persistent spammers, but it's a good start.
  • Previously flagged IP or email address
    This can sometimes be a red herring (particularly when using blanket IP address bans) due to how proxy servers and dial-up internet accounts work in certain areas of the world. In community software I've built, I keep my own blacklist of previously banned or flagged accounts. On attempted account creation, spammers who try to register and match the ban list are prevented from registering, or are booted to a separate area of the site directly. If you're building your own tools, this is something you'll want to address immediately. If you're using an off-the-shelf package, look for one that has this feature built-in (and configurable); failing that, look for plugins to help, if at all possible. The best way to prevent spam is to detect spammers before they get in the door. Just be careful not to inadvertently ban an entire school district or large organization.
  • API-based spam content detection
    There are a number of third-party services that expose external APIs (services that you can program against and utilize remotely in your code) for spam and abuse detection. Akismet, StopForumSpam, and TypePad AntiSpam, all mentioned above, offer external APIs that you can tie your application into to detect spam content automatically. There are other similar services available as well, like Defensio, LinkSleeve, Fassim and Project Honey Pot; some paid and some free. It's not the simplest approach, but it's likely the best, and it'll require less maintenance down the road. This is the route that I'm taking with my larger-scale community sites.
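
Two of the ideas in this list, keeping your own ban list and throttling submissions, are straightforward to build yourself. Here's a minimal sketch of both. The BanList and SubmissionThrottle classes are hypothetical (this isn't the code from my own sites), everything is held in memory, and the default limits are arbitrary.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Optional


class BanList:
    """Local record of previously banned emails and IP addresses or ranges."""

    def __init__(self) -> None:
        self.emails = set()
        self.networks = []

    def ban_email(self, email: str) -> None:
        self.emails.add(email.strip().lower())

    def ban_ip(self, ip_or_cidr: str) -> None:
        # A bare address becomes a single-host network (/32 or /128).
        self.networks.append(ipaddress.ip_network(ip_or_cidr, strict=False))

    def is_banned(self, email: str, ip: str) -> bool:
        if email.strip().lower() in self.emails:
            return True
        address = ipaddress.ip_address(ip)
        return any(address in network for network in self.networks)


class SubmissionThrottle:
    """Allow at most `limit` submissions per user within `window_seconds`."""

    def __init__(self, limit: int = 5, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        recent = self._history[user_id]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True
```

Call is_banned during registration and allow before accepting each post or comment. Banning a whole range such as 198.51.100.0/24 is exactly how you end up locking out that school district, so prefer single addresses where you can.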

Combining forces

At the end of the day, a combination of the above approaches is likely to be your best option for combating spam in your community. Since she was dealing with human spammers and not spambots, I ended up advising my wife to use StopForumSpam to help out with her spam issues. Thankfully, StopForumSpam offers a pretty comprehensive list of free third-party anti-spam plugins to help out with detection on common forum and community software packages. Unfortunately, these tools won't help you with a custom-built community, so you'll have to devise a method to protect yourself. Hopefully the tips above will give you a head start on joining the fight.
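
For a custom-built community, one place to start is querying StopForumSpam yourself at registration time. The sketch below reflects my understanding of their public lookup API (the endpoint, the ip and email parameters, the json flag, and the success and appears fields in the response), so check their current documentation before relying on it. The function names are my own.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; verify against StopForumSpam's API documentation.
API_URL = "https://api.stopforumspam.org/api"


def build_lookup_url(ip: str, email: str) -> str:
    """Build a lookup URL asking for a JSON response."""
    return f"{API_URL}?{urlencode({'ip': ip, 'email': email})}&json"


def parse_lookup(payload: str) -> dict:
    """Map each checked field to True if the service has seen it before."""
    data = json.loads(payload)
    if not data.get("success"):
        raise ValueError("StopForumSpam lookup failed")
    return {
        key: bool(data[key].get("appears"))
        for key in ("ip", "email")
        if isinstance(data.get(key), dict)
    }


def lookup(ip: str, email: str, timeout: float = 5.0) -> dict:
    """Perform the network request; keep the timeout short so sign-ups don't hang."""
    with urlopen(build_lookup_url(ip, email), timeout=timeout) as response:
        return parse_lookup(response.read().decode("utf-8"))
```

A hit on either field is a good reason to hold the account for moderation rather than reject it outright, since shared IP addresses can catch innocent users too.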

On that note, we'd love to hear your suggestions and tips for weeding out spammers in your communities. What methods do you use and recommend, and why? Feel free to post your thoughts in the comments below.

Photos courtesy of matsuyuki and ScottSimpson on Flickr.
