Wikistats: for tracking trends on Wikipedia

21 June 2012

Here is an excellent new tool to help you find out what topics and issues are trending on Wikipedia – something most national newsrooms pay close attention to.

Content Analysis 2.0: A Framework for Using Wordle

12 January 2011

I’m presenting at a conference on Friday (14th January 2011): ‘Exploring the language of the popular in Anglo-American Newspapers 1833-1988’.

This is an AHRC-funded research seminar, held at Sheffield University.



This paper explores the application of the interactive web tool Wordle in the framing of content analysis, finding that it offers new possibilities for scholars. But it is only useful in the field of news archives where publishers make output available in data-portable formats.

Traditional content analysis methods used to establish the frequency of terms in text can be constrained by human limitations – most notably the difficulties inherent in selecting terms to measure from a large collection of documents. The classification process requires agreed standards between researchers, in order to establish consistency (Weber, 1990), or intercoder reliability (Neuendorf, 2002). But the initial steps in framing research often rely upon assumptions which do not (and cannot) take into account the frequencies of all significant words across large-scale document collections.

This research proposes a means by which scholars might challenge their initial assumptions about texts, and use computational power to audit the full range selected. It is proposed that this may invigorate approaches to content and discourse analyses.

Wordle has been employed by newspapers in coverage of major news events, including analyses of major public speeches (Stodard, 2010; Rogers I and II, 2010), and political manifestos (Rogers III, 2010). In the literature, Wordle’s merits have been explored in terms of framing partisanship in political speech (Monroe et al, 2008). While this paper acknowledges the limitations of such software as a means to an end in content analysis (McNaught and Lam, 2010), the application of such technologies can nevertheless help inform the preliminary stages in content analysis.

Data scraping techniques both simple (and laborious) and complex (macros in Microsoft Word) are discussed. Large volumes of text (downloaded from Nexis) require parsing for metadata and stop-words, with the remaining text then usable in Wordle. This data is presented as a word cloud, with keywords ranging in scale as a function of frequency. This offers a more systematic means of auditing large data sets across a range of variables.
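The counting step behind a word cloud can be sketched in a few lines of Python. This is an illustration only, not the method used in the paper (which relied on Word macros and Wordle itself): the stop-word list, the function names `word_frequencies` and `scale_sizes`, the font-size range and the sample text are all hypothetical, and the input is assumed to have had its Nexis metadata stripped already.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "was", "with", "as", "by", "at"}


def word_frequencies(text, stop_words=STOP_WORDS):
    """Lower-case the text, split it into words, drop stop-words and count the rest."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def scale_sizes(freqs, min_size=10, max_size=60):
    """Map each word's count linearly onto a font size between min_size and max_size."""
    if not freqs:
        return {}
    low, top = min(freqs.values()), max(freqs.values())
    span = (top - low) or 1  # avoid dividing by zero when every count is equal
    return {w: min_size + (c - low) * (max_size - min_size) / span
            for w, c in freqs.items()}


# Invented sample text, standing in for cleaned newspaper copy.
sample = ("The budget cuts hit the north. Cuts to the budget anger "
          "voters in the north and the south.")
freqs = word_frequencies(sample)
sizes = scale_sizes(freqs)

for word, count in freqs.most_common(3):
    print(word, count, sizes[word])
```

Linear scaling is only one plausible choice; Wordle's own sizing rule is not described here, so the sizes above should not be read as a reproduction of its output.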

A plea for the application of data portability in the construction of online newspaper archives is put forth. Those archives which do not provide text-only download options (including Times Digital Archive, and Gale’s 19th Century British Library Newspapers in the UK, and New York Times Archive and Google News archive in the US) are explored in terms of their output formats. Optical Character Recognition software is acknowledged as a possible solution, but a hugely time-consuming one. This research demonstrates that without text-readable formats, content analysis of online news archives will remain limited in scope and potential.


McNaught, Carmel and Lam, Paul (2010) ‘Using Wordle as a Supplementary Research Tool’, The Qualitative Report Volume 15 Number 3 May 2010 630-643

Monroe, Burt, Colaresi, Michael and Quinn, Kevin (2008) ‘Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict’, Political Analysis 16 (4): 372-403.

Neuendorf, Kimberly A (2002) The Content Analysis Guidebook, London, Sage.

Rogers, Simon I (2010) ‘The text of the Queen’s speech as a wordle – and how it compares to 1997’, The Guardian, May 25th

Rogers, Simon II (2010) ‘David Cameron and Nick Clegg’s statements as a wordle’, The Guardian, May 12th

Rogers, Simon III (2010) ‘Conservative manifesto: how does it compare to Labour’s?’, The Guardian, April 13th.

Stodard, Katy (2010) ‘Obama’s state of the union speech: how did the words he used compare to other presidents? As wordles’, The Guardian, January 28th.

Weber, Robert (1990) Basic Content Analysis, 2nd ed., Newbury Park, CA: Sage.

#Wikileaks detractors: let’s have some consistency please

18 December 2010

#Radio4 #Today broadcast an ill-tempered ‘debate’ between John Pilger and Janet Daley earlier this morning.

Daley made one particular point which deserves further scrutiny. Apparently she’d like to see Julian Assange arrested for his role in handling the ‘illegally stolen’ diplomatic cables.

I do not remember her being quite so forthright with regard to the Telegraph editors who got their hands on the MPs’ expenses material early last year.

The question of ethics (and legality) in this story didn’t really take off at the time – so agog were the public (and no doubt prosecutors) at the waste that was going on in Westminster.

Back then those on the left and right united to defend this act of chequebook journalism, but today it seems the right are feeling much less solidarity.

It seems that when the invisible hand of American diplomacy is at stake, some find the truth a little too hard to handle.

Delicious to be scrapped? Some alternatives

17 December 2010

Rumours broke last night that Yahoo! are to mothball Delicious.

While this would be an inconvenience for those of us who use the tool to save our bookmarks, by binning the Delicious network many real and potential eureka moments in online search will be lost forever.

This is a massive loss for anyone who wants to make sense of the web, including journalists tracking stories, contributors and other reliable sources online.

If push comes to shove, I’m personally inclined towards either Pinboard (subscription for pro version required) or Xmarks, but there are plenty of other alternatives.

Here follows a short (and very much draft) extract of notes for a book I’m aiming to finish later next year, on online research for journalists.

It covers some alternatives to Delicious for newsgathering and research…

For some, the browsing and searching options in Delicious may seem a little over-restrictive. What of all those bookmarks that their owners haven’t tagged, described or even given a title? Moreover, how valuable can search in this field be, when you can’t search the full text of all links saved in this social bookmarking service?

Certainly Google doesn’t index Delicious bookmarks by default, so are there any alternatives?

As ever, of course there are – several services offer more sophisticated ways of searching your bookmarks, using a range of means.

Since March 2010, Google Bookmarks has been experimenting with public lists. Although nowhere near as thorough or populated as Delicious, Google Bookmarks (which requires a Google account to use) does provide search across the entire page of each bookmark, giving a researcher more control over the bookmarks which have been shared publicly, and lessening the impact of bad or inconsistent tagging.

Blinklist offers an alternative search option, albeit one with relatively little UK content, and which lacks much of the functionality of Delicious (the same could be said of Faves, which contains a good deal of content, but isn’t as robust as Delicious).

Likewise, social annotation tools like Diigo (which incorporates FURL – account required) may be useful here too. CiteULike and Connotea offer an academic take on social bookmarking, and can be a useful accompaniment to Google Scholar for digging out expertise, or esoteric research.

inSuggest offers a bookmark discovery service – just type in your (or any) username to receive suggested new reading.

However, this searcher didn’t have much luck – I struggled to make the Deligoo plugin for Internet Explorer work (and the Firefox extension is not compatible with version 3.6.6), while Delizzy wasn’t available at the time of writing.

In addition, I wasn’t able to sign up to Simpy, but that doesn’t stop it being a useful place to search other people’s bookmarks. It is possible to construct a Google Custom Search to house your bookmarks, or use sources like Lijit.

West meets East: A Journalistic Journey to Azerbaijan

12 December 2010

Yesterday morning I got back from Azerbaijan, where I’ve spent the past three days with former colleagues at the Centre for Investigative Journalism. This trip was organised by the Open Society Institute, who seek to promote and develop freedom of expression in Azerbaijan.

It was an enjoyable and insightful visit, where we learned about the state of journalism in four post-Soviet countries: our host nation, Georgia, Ukraine and Kazakhstan.

A folly in Baku old town, within the city walls, Azerbaijan

First impressions

As we passed through a very eastern passport control, a throng of local taxi drivers beckoned us with some familiar western names: BP and HALLIBURTON. The west has certainly taken its pound of flesh from Azerbaijan in recent years.

Baku, with its graffiti-free old town, and immaculately maintained public buildings, has been converting oil into prosperity since the 19th century.

This land was once dominated by wealthy, philanthropic oligarchs, whose portraits adorn the stunning restaurant we visited on the second night of our visit. Azerbaijan is caught between two leviathans, Russia and Iran, whose influence on free expression today passes invisibly across the borders.

The motorway which took us from the airport to the centre of Baku was an uncannily smooth ride. Our host, Director of Azerbaijan’s OSI program, Rovshan Bagirov, assured us that this doesn’t come cheap, but that it represents little strain on the country’s coffers. An Azerbaijani government economist has apparently calculated that they could afford to lay a couple of millimetres of gold for the entire length of this motorway, if they saw fit. Price is no object in Azerbaijan.


In and around Baku, every inch of visible public space is backlit, side-lit, or lit from above. At night, green lights are shone on the grass, to bring out its fresh nocturnal lustre. Trees are wreathed in pretty, glowing red berries. Beautiful white buildings, unmistakably Islamic, radiate with eastern opulence. But shining a light on the affairs of the country’s powerful elites is a different matter.

Azerbaijan is not an easy place in which to practise journalism. While there is no shortage of public data to interrogate, getting the message to the masses is not easy in a country where radio and TV are heavily censored and blocked, and where distribution of newspapers and pamphlets is tightly controlled.

Fountain in Baku city centre, Azerbaijan


The Georgian OSI contingent (OSGF), comprising Hatia Jinjikhadze, Marina Ghoghoberidze, Irakli Tsertsvadze and Irina Lashkhi, told us of the media in their country.

Investigative journalism here is driven by broadcasting – their leading TV series is similar in form and content to PBS Frontline. Georgia has a strong post-Soviet tradition of public interest journalism, but this has been curtailed in recent years (certainly since the Rose Revolution of late 2003).

Much work is yet to be done online, where public engagement, though currently small, is growing.

The OSGF Media Support Program exists to advocate transparency in the country. Their current initiatives are to support the development of independent media in the capital Tbilisi, as well as across the wider regions, and they are campaigning for a law to stop government interference in public interest journalism.

The Ukrainian OSI delegation gave us a tantalising glimpse of a holy grail in investigative journalism. The Ukrainian Pravda is an online-driven, non-paywalled media outlet which finds space for a rich mix of investigative journalism, while still managing to derive a healthy profit from online advertising.

But here there is no equivalent to the BBC with which to compete, and the Ukrainian media market is not exposed to the same levels of competition for its online advertising revenues from the likes of Craigslist or Gumtree.

In this rapidly developing and well-resourced journalistic environment, a subversive TV format has emerged, where public figures are pranked into accounting for their public finance decisions before a live TV audience.

We then heard from Dariya Tsyrenzhapova from OSI Kazakhstan, who described a country whose oil wealth mirrors that of Azerbaijan. She told us of Gennadiy Benditsky, whose investigative work into the embezzlement of public finances for newspaper The Vremya has inspired much debate about the leaking of information and public transparency in news media. Here restrictive laws incite self-censorship, and Kazakhstan’s libel laws have a chilling effect on free speech which UK journalists will be able to relate to.

On Thursday afternoon we heard (all too briefly) from Emin Milli (@eminmilli), an Azerbaijani youth activist who described himself as ‘a blogger without a blog’. He talked of the 17 months he spent in an Azerbaijani prison, as a consequence of the free speech activism he helped organise on Facebook. His dignity, and the good humour with which he shared these experiences, were humbling and genuinely inspiring.

I’d like to extend special thanks to Rovshan, whose hospitality and insights into life in Azerbaijan have left a lasting impression. I’d also like to thank Fidan Bagirova, who organised and moderated this event with aplomb.

The Caspian Sea, from Baku, Azerbaijan (slightly wonky)