Don’t stand so close to me: proximity searching the web

I ran a training session in the BBC’s Sport division this morning, and was reminded of just how useful proximity searching can be.   Proximity searching lets you find two words (and sometimes phrases) within a certain distance of each other, say 3-10 words.  This can be very useful when trying to unearth relationships of any kind between people in the news.
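The idea behind a "near" query can be sketched in a few lines of Python. This is a minimal illustration, not how Factiva or any other engine actually implements it. It assumes that "distance" means the number of words strictly between the two terms, in either order, and that case and punctuation are ignored. The function names `tokenize`, `phrase_positions` and `near` are hypothetical.

```python
import re

def tokenize(text):
    # Lower-case word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", text.lower())

def phrase_positions(tokens, phrase):
    # Start indices at which the (tokenized) phrase occurs in tokens.
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]

def near(text, phrase_a, phrase_b, max_gap):
    """True if phrase_a and phrase_b occur, in either order, with at most
    max_gap other words between them (overlapping matches don't count)."""
    tokens = tokenize(text)
    len_a, len_b = len(tokenize(phrase_a)), len(tokenize(phrase_b))
    for i in phrase_positions(tokens, phrase_a):
        for j in phrase_positions(tokens, phrase_b):
            # Count the words between the end of the first phrase
            # and the start of the second, whichever comes first.
            gap = j - (i + len_a) if i < j else i - (j + len_b)
            if 0 <= gap <= max_gap:
                return True
    return False

text = ("regulars include football managers Sir Alex Ferguson, "
        "Sam Allardyce")
print(near(text, "sam allardyce", "alex ferguson", 3))
print(near(text, "regulars", "sam allardyce", 3))
```

Here the two names sit side by side, so the first query matches, while "regulars" is six words away from "Sam Allardyce" and falls outside a three-word window.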


In the example this morning, we were interested in finding out anything about the relationship between former Newcastle manager Sam Allardyce and Alex Ferguson.  A search in Factiva (subscription only) for:


“sam allardyce” near “alex ferguson”


…brought back this result.  The article in question is actually a court report about a lawyer who stole from a paralysed man.  But squirreled away right at the bottom of the article you will find that the accused “was a member of the prestgious (sic) Mere Golf and Country Club where regulars include football managers Sir Alex Ferguson, Sam Allardyce”.  Now this information might be useless in and of itself, but it could be useful for anyone trying to contextualise comments made by either manager about the other in the press (e.g. supportive remarks).  It highlights the interesting little nuggets of information you can tease out with proximity searching, which would be really time-consuming to find using a basic free-text search.


Proximity searching is little heard of outside the world of subscription search engines like Factiva and LexisNexis, but it is possible in a few places, so here are two examples.


You can start with Staggernation’s Google API proximity search, which allows you to search Google’s results for two words (single words only, I’m afraid) within up to three words of each other (in either order).  You can then sort the results by proximity or ranking using the dropdown, and add further search terms to your search in the Additional terms box.  Some decent results came back (though interestingly, not the one I found in Factiva), with a broad sweep of content from MSM outlets.  Unfortunately though, you can’t refine by domain, so there’s no way to weed out some of the wild speculation in forums.
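Sorting results by proximity can also be sketched. This is not Staggernation's code, which isn't described here; it is a hypothetical illustration of the same idea applied to a made-up list of result snippets. It matches two single words in either order within up to three words, puts the closest pairs first, and breaks ties by the original ranking. The names `min_gap` and `rank_by_proximity` and the snippets themselves are invented for the example.

```python
import re

def tokenize(text):
    # Lower-case word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9']+", text.lower())

def min_gap(text, word_a, word_b):
    """Smallest number of words separating word_a and word_b (either order),
    or None if either word is missing."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == word_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == word_b.lower()]
    gaps = [abs(i - j) - 1 for i in pos_a for j in pos_b if i != j]
    return min(gaps) if gaps else None

def rank_by_proximity(snippets, word_a, word_b, max_gap=3):
    """Keep snippets where the words fall within max_gap words of each other,
    closest first; ties keep their original (ranking) order."""
    scored = []
    for rank, snippet in enumerate(snippets):
        gap = min_gap(snippet, word_a, word_b)
        if gap is not None and gap <= max_gap:
            scored.append((gap, rank, snippet))
    return [snippet for gap, rank, snippet in sorted(scored)]

# Hypothetical snippets, listed in their original ranking order.
snippets = [
    "Ferguson backs Allardyce for England job",                  # gap 1
    "Allardyce tipped by Ferguson",                              # gap 2
    "Allardyce and Ferguson at the golf club",                   # gap 1
    "Allardyce says he never discusses tactics with Ferguson",   # gap 6
]

for s in rank_by_proximity(snippets, "allardyce", "ferguson"):
    print(s)
```

The fourth snippet is dropped because six words separate the names, and the two snippets with a one-word gap come before the one with a two-word gap.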


Exalead has a proximity operator available in its Advanced search (check the option for Adjacent words).  Unfortunately you can’t define how many words there might be between your terms, but the results are still good (and you can refine the search to bring back only UK sites – which is really useful).  These results were OK, and include an article from back when England were looking for a successor after Sven (Ferguson suggested Allardyce was the right choice).

