Dealing with spam in real-time search

For a while now, spam has been an increasingly trying problem on Twitter.

Until last Tuesday, my experience with spammers had been broadly restricted to blocking those unsolicited followers, with their rudey pics and uncomfortably-informal tweets.

But that evening, while listening to the West Ham vs Millwall game on the radio, I decided to have a quick scout to see who out there was sharing their experience of the bedlam at Upton Park. What I got was a dose of the bedlam that spam is bringing to real-time search results on a hot news topic.

Most of the spam I saw at different points on the night was quite easy to spot: trending words like upton or millwall were being dumped unceremoniously into text and links about Viagra, making free money and the like.

But spamming can be more sophisticated than this. Spammers lift what looks like perfectly valid text and topics while depositing links to dubious sites (and possibly malware) on the end, and can even retweet-spam your own content to their own ends.

Danny Sullivan, who has experienced these matters first-hand, suggests the following ways real-time search engines could help us avoid spam:

  1. Accounts less than a day old don’t get to show up in Twitter Search and/or show up for trending topics
  2. Figure a reputation score for accounts and only let reputable ones appear for trending topics
  3. Partner with a service for malware detection, so that any links Twitter puts out are analyzed to be safe
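To make those three suggestions a bit more concrete, here is a rough Python sketch of how they might be combined into one check. Everything in it is my own assumption for illustration: the `Tweet` fields, the follower-ratio reputation score, the 0.5 threshold and the blocklist lookup standing in for a real malware-detection service. None of it reflects how Twitter or any engine actually works.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Set
from urllib.parse import urlparse

MIN_ACCOUNT_AGE = timedelta(days=1)   # suggestion 1: accounts under a day old are excluded
MIN_REPUTATION = 0.5                  # suggestion 2: assumed threshold, purely illustrative


@dataclass
class Tweet:
    """Hypothetical record; field names are assumptions, not a real API."""
    text: str
    account_created: datetime
    followers: int
    following: int
    links: List[str] = field(default_factory=list)


def reputation(tweet: Tweet) -> float:
    """Toy score: share of an account's connections that are followers."""
    total = tweet.followers + tweet.following
    return tweet.followers / total if total else 0.0


def is_link_safe(url: str, blocklist: Set[str]) -> bool:
    """Stand-in for suggestion 3: a real system would ask a malware service."""
    host = urlparse(url).hostname or ""
    return host not in blocklist


def eligible_for_trending(tweet: Tweet, now: datetime, blocklist: Set[str]) -> bool:
    """Apply all three checks; a tweet must pass every one."""
    old_enough = now - tweet.account_created >= MIN_ACCOUNT_AGE
    reputable = reputation(tweet) >= MIN_REPUTATION
    links_ok = all(is_link_safe(u, blocklist) for u in tweet.links)
    return old_enough and reputable and links_ok


now = datetime(2009, 8, 26, 20, 0, tzinfo=timezone.utc)
fan = Tweet("Bedlam at Upton Park", now - timedelta(days=400), 900, 100)
print(eligible_for_trending(fan, now, {"bad.example"}))
```

A follower ratio is a crude proxy (plenty of genuine new users follow more people than follow them), so a real reputation score would need more signals than this.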

But while there are some services (not all of them publicly available yet) gearing up to deal with this problem, real-time engines already provide a range of ways to help users spot and deal with misleading content.

In terms of spotting spam, link previews and follower counts are included in some engines’ search results, to let you see what you’re clicking on, and see how trustworthy (or at least how active) your source is.

In terms of ridding your results of spam, there’s always Boolean. While it would be really useful to have an updated index of popular spam terms which could be filtered, you can always get your hands dirty and use the minus (-) operator, which works as AND NOT, to do your bidding. That is providing, of course, that Boolean is supported in your engine of choice.
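As a rough illustration of the get-your-hands-dirty approach, here is a small Python sketch that tacks a list of spam terms onto a search as exclusions, plus a fallback filter for engines without Boolean support. The function names and the spam term list are made up for the example, and the quoting of multi-word terms is an assumption about how a given engine parses phrases.

```python
from typing import Iterable, List


def build_query(topic: str, spam_terms: Iterable[str]) -> str:
    """Append each spam term to the topic as a minus (AND NOT) exclusion."""
    parts = [topic]
    for term in spam_terms:
        # Assumed convention: multi-word terms are quoted so they exclude as a phrase.
        parts.append(f'-"{term}"' if " " in term else f"-{term}")
    return " ".join(parts)


def filter_results(tweets: Iterable[str], spam_terms: Iterable[str]) -> List[str]:
    """Fallback for engines without Boolean: drop results containing any spam term."""
    terms = [t.lower() for t in spam_terms]
    return [tw for tw in tweets if not any(t in tw.lower() for t in terms)]


spam_terms = ["viagra", "free money"]  # hypothetical hand-kept list
print(build_query("upton park", spam_terms))
# prints: upton park -viagra -"free money"
```

The catch, of course, is that a hand-kept list like this goes stale as soon as the spammers change their wording, which is why a shared, updated index would be so handy.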

So here’s what I found in the search results of various engines:

 

Engine                 Link previews   Follower count   Boolean support
Collecta               Yes             No               Yes
Geochirp               No              No               No
Icerocket              No              Yes              Yes
PicFog                 N/A             No               No
Scoopler               Yes             No               Yes
Twitter search (adv)   Yes             No               Yes
Twitterfall            Yes             No               Yes

 

I’ve discounted Tweetmeme in this comparison because traditional MSM stories tend to dominate amongst the retweets, and I wanted a comparison in terms of breaking news, not ranked results (as is similarly found with OneRiot). Also, I haven’t included aggregators and metasearch engines like Addictomatic and Surchur because they don’t deal exclusively in real-time search.

Although advanced Twitter search doesn’t offer a follower count option, it does offer the option to ‘expand’ shortened URLs. It allows full (and intuitive) Boolean search, real-time geo-search, and the option to only return tweets with links in them, making it a far more robust alternative to Geochirp.

As such it is easily the most utilitarian of the engines out there, but hats off to Icerocket for being the only one to provide a follower count.

Twitterfall doesn’t tell you how many followers a Twitterer has, but it lets you preview the URL (rather than the page itself), which is pretty time-saving (by comparison with the more time-consuming offering in Scoopler). Twitterfall also supports Boolean. However, I personally still find the interface confusing (removing searches, pausing searches, scrolling through results etc).

Those engines which focus on a particular niche in tweeting (Geochirp in terms of geo-location, and PicFog in terms of multimedia) are the least reliable in terms of dealing with spam, as neither supports Boolean.

So in conclusion, I’d say if advanced Twitter search were updated to incorporate Icerocket’s follower count, it would be the hands-down winner.
