
Content Analysis 2.0: A Framework for Using Wordle

12 January 2011

I’m presenting at a conference on Friday (14th January 2011): ‘Exploring the language of the popular in Anglo-American Newspapers 1833-1988’.

This is an AHRC-funded research seminar, held at Sheffield University.



This paper explores the application of the interactive web tool Wordle to the framing of content analysis, and finds that it offers new possibilities for scholars. In the field of news archives, however, it is only useful where publishers make their output available in data-portable formats.

Traditional content analysis methods for establishing the frequency of terms in text can be constrained by human limitations, most notably the difficulty of selecting which terms to measure across a large collection of documents. The classification process requires agreed standards between researchers in order to establish consistency (Weber, 1990), or intercoder reliability (Neuendorf, 2002). But the initial steps in framing research often rely upon assumptions which do not (and cannot) take into account the frequencies of all significant words across large-scale document collections.

This research proposes a means by which scholars might challenge their initial assumptions about texts, and use computational power to audit the full range of texts selected. It is proposed that this may invigorate approaches to content and discourse analyses.

Wordle has been employed by newspapers in coverage of major news events, including analyses of major public speeches (Stodard, 2010; Rogers I and II, 2010), and political manifestos (Rogers III, 2010). In the literature, Wordle’s merits have been explored in terms of framing partisanship in political speech (Monroe et al, 2008). While this paper acknowledges the limitations of such software as a means to an end in content analysis (McNaught and Lam, 2010), the application of such technologies can nevertheless help inform the preliminary stages in content analysis.

Data scraping techniques, both simple (and laborious) and complex (macros in Microsoft Word), are discussed. Large volumes of text (downloaded from Nexis) require parsing to remove metadata and stop-words, with the remaining text then usable in Wordle. This data is presented as a word cloud, with keywords scaled as a function of their frequency. This offers a more systematic means of auditing large data sets across a range of variables.
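To make the parsing step concrete, here is a minimal Python sketch of the same idea: drop metadata lines, drop stop-words, count what remains, and scale each word by its frequency. It is an illustration only, not the Word macro approach discussed in the paper. The metadata pattern (upper-case labels such as BYLINE: at the start of a line), the tiny stop-word list, the function names and the linear font-size scaling are all assumptions made for the example; Wordle's own scaling is not described here and may well differ.

```python
import re
from collections import Counter

# Assumption: metadata lines look like "LABEL: value" with an upper-case label.
# A real export may need a different pattern, and a body line that happens to
# start with an upper-case word and a colon would be dropped as well.
METADATA_LINE = re.compile(r"^[A-Z][A-Z\- ]*:")

# A deliberately tiny stop-word list; a real study would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "on", "for", "was"}


def strip_metadata(text):
    """Drop every line that matches the assumed metadata pattern."""
    kept = [line for line in text.splitlines()
            if not METADATA_LINE.match(line)]
    return "\n".join(kept)


def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lower-cased words, ignoring stop-words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


def scale_sizes(freqs, min_size=10, max_size=60):
    """Map each count linearly onto a font size (a simple stand-in for
    the way a word cloud scales keywords by frequency)."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts match
    return {w: min_size + (c - lo) * (max_size - min_size) / span
            for w, c in freqs.items()}


# Hypothetical two-article fragment, invented for the example.
sample = (
    "BYLINE: A Reporter\n"
    "LENGTH: 9 words\n"
    "The cables revealed the cables. The embassy denied it."
)

freqs = word_frequencies(strip_metadata(sample))
for word, size in sorted(scale_sizes(freqs).items()):
    print(word, freqs[word], size)
```

The cleaned text returned by strip_metadata could equally be pasted straight into Wordle; the counting and scaling functions simply show what the word cloud is summarising.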

A plea for the application of data portability in the construction of online newspaper archives is put forth. Those archives which do not provide text-only download options (including Times Digital Archive, and Gale’s 19th Century British Library Newspapers in the UK, and New York Times Archive and Google News archive in the US) are explored in terms of their output formats. Optical Character Recognition software is acknowledged as a possible solution, but a hugely time-consuming one. This research demonstrates that without text-readable formats, content analysis of online news archives will remain limited in scope and potential.


McNaught, Carmel and Lam, Paul (2010) ‘Using Wordle as a Supplementary Research Tool’, The Qualitative Report 15 (3): 630-643.

Monroe, Burt, Colaresi, Michael and Quinn, Kevin (2008) ‘Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict’, Political Analysis 16 (4): 372-403.

Neuendorf, Kimberly A (2002) The Content Analysis Guidebook, London, Sage.

Rogers, Simon I (2010) ‘The text of the Queen’s speech as a wordle – and how it compares to 1997’, The Guardian, May 25th

Rogers, Simon II (2010) ‘David Cameron and Nick Clegg’s statements as a wordle’, The Guardian, May 12th

Rogers, Simon III (2010) ‘Conservative manifesto: how does it compare to Labour’s?’, The Guardian, April 13th.

Stodard, Katy (2010) ‘Obama’s state of the union speech: how did the words he used compare to other presidents? As wordles’, The Guardian, January 28th.

Weber, Robert (1990) Basic Content Analysis, 2nd ed., Newbury Park, CA: Sage.


#Wikileaks detractors: let’s have some consistency please

18 December 2010

#Radio4 #Today broadcast an ill-tempered ‘debate’ between John Pilger and Janet Daley earlier this morning.

Daley made one particular point which deserves further scrutiny. Apparently she’d like to see Julian Assange arrested for his role in handling the ‘illegally stolen’ diplomatic cables.

I do not remember her being quite so forthright with regard to the Telegraph editors who got their hands on the MPs’ expenses material early last year.

The question of ethics (and legality) in this story didn’t really take off at the time, so agog were the public (and no doubt prosecutors) at the waste that was going on in Westminster.

Back then those on the left and right united to defend this act of chequebook journalism, but today it seems the right are feeling much less solidarity.

It seems that when the invisible hand of American diplomacy is at stake, some find the truth a little too hard to handle.

Introduction to Computer Assisted Reporting

1 December 2010

On Monday I introduced our MAs to Computer Assisted Reporting.

My job was made easier by Wikileaks’ latest release dominating Sunday’s (and Monday’s) papers. This story (indeed all of the major Wikileaks stories this year) is a testament to the power of Computer Assisted Reporting.

For many years we have lagged far behind the US (and to a lesser extent some continental European countries), but in Wikileaks, CAR in the UK has truly come of age.

However, it would be wrong to assume that CAR is only helpful when looking for needles in haystacks in big, international stories.

CAR is just as useful in a local context.

For that reason (and partly because our course is NCTJ-accredited), I’ve drawn my examples from local news issues: crime in London, and Hillingdon Council’s incomings and outgoings. The second example in particular is intended to be taught in conjunction with local Public Affairs.

The files are here:

CAR script_2010_Murray_Dick_2010

CAR_examples_with working_Murray_Dick

Here’s hoping for a revolution in data manipulation in the weeks and months ahead.