Content Analysis 2.0: A Framework for Using Wordle

12 January 2011

I’m presenting at a conference on Friday (14th January 2011): ‘Exploring the language of the popular in Anglo-American Newspapers 1833-1988’.

This is an AHRC-funded research seminar, held at Sheffield University.



This paper explores the use of the interactive web tool Wordle in framing content analysis, finding that it offers new possibilities for scholars. In the field of news archives, however, it is useful only where publishers make their output available in data-portable formats.

The traditional content analysis methods used to establish the frequency of terms in a text can be constrained by human limitations – most notably the difficulty of selecting terms to measure from a large collection of documents. The classification process requires agreed standards between researchers in order to establish consistency (Weber, 1990), or intercoder reliability (Neuendorf, 2002). But the initial steps in framing research often rely upon assumptions which do not (and cannot) take into account the frequencies of all significant words across large-scale document collections.

This research proposes a means by which scholars might challenge their initial assumptions about texts, and use computational power to audit the full range of texts selected. It is proposed that this may invigorate approaches to content and discourse analysis.

Wordle has been employed by newspapers in coverage of major news events, including analyses of major public speeches (Stodard, 2010; Rogers I and II, 2010) and political manifestos (Rogers III, 2010). In the literature, Wordle’s merits have been explored in terms of framing partisanship in political speech (Monroe et al., 2008). While this paper acknowledges the limitations of such software as a means to an end in content analysis (McNaught and Lam, 2010), these technologies can nevertheless help inform the preliminary stages of content analysis.

Data scraping techniques, both simple (and laborious) and complex (macros in Microsoft Word), are discussed. Large volumes of text (downloaded from Nexis) require parsing to remove metadata and stop-words, with the remaining text then usable in Wordle. This data is presented as a word cloud, with keywords scaled as a function of their frequency. This offers a more systematic means of auditing large data sets across a range of variables.
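The parsing steps described above might be sketched in Python as follows. This is only a minimal illustration, not the method used in the paper (which relied on manual cleaning and Word macros). The metadata labels, the short stop-word list and the linear font scaling are all assumptions made for the example; real Nexis downloads and Wordle's own scaling may well differ.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "was", "with", "as"}

# Hypothetical metadata labels; actual Nexis field names may differ.
METADATA_PREFIXES = ("BYLINE:", "SECTION:", "LENGTH:", "LOAD-DATE:")


def strip_metadata(raw_text):
    """Drop lines that begin with one of the assumed metadata labels."""
    kept = [line for line in raw_text.splitlines()
            if not line.strip().startswith(METADATA_PREFIXES)]
    return "\n".join(kept)


def word_frequencies(text):
    """Count every remaining word, ignoring case and stop-words."""
    words = (w.strip("'") for w in re.findall(r"[a-z']+", text.lower()))
    return Counter(w for w in words if len(w) > 1 and w not in STOP_WORDS)


def font_sizes(freqs, min_size=10, max_size=72):
    """Scale each word linearly against the most frequent word
    (linear scaling is an assumption, not Wordle's documented behaviour)."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {word: min_size + (max_size - min_size) * count / top
            for word, count in freqs.items()}


sample = ("BYLINE: A Reporter\n"
          "The budget and the budget cuts\n"
          "LENGTH: 500 words\n"
          "Cuts to the budget")

freqs = word_frequencies(strip_metadata(sample))
sizes = font_sizes(freqs)
print(freqs.most_common(5))
```

Because every surviving word is counted, rather than a list of terms chosen in advance, the resulting table can be used to check a researcher's initial assumptions about which terms matter.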

A plea for the application of data portability in the construction of online newspaper archives is put forth. Those archives which do not provide text-only download options (including Times Digital Archive and Gale’s 19th Century British Library Newspapers in the UK, and New York Times Archive and Google News Archive in the US) are explored in terms of their output formats. Optical Character Recognition software is acknowledged as a possible solution, but a hugely time-consuming one. This research demonstrates that without text-readable formats, content analysis of online news archives will remain limited in scope and potential.


McNaught, Carmel and Lam, Paul (2010) ‘Using Wordle as a Supplementary Research Tool’, The Qualitative Report 15 (3): 630-643.

Monroe, Burt, Colaresi, Michael and Quinn, Kevin (2008) ‘Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict’, Political Analysis 16 (4): 372-403.

Neuendorf, Kimberly A (2002) The Content Analysis Guidebook, London, Sage.

Rogers, Simon I (2010) ‘The text of the Queen’s speech as a wordle – and how it compares to 1997’, The Guardian, May 25th.

Rogers, Simon II (2010) ‘David Cameron and Nick Clegg’s statements as a wordle’, The Guardian, May 12th.

Rogers, Simon III (2010) ‘Conservative manifesto: how does it compare to Labour’s?’, The Guardian, April 13th.

Stodard, Katy (2010) ‘Obama’s state of the union speech: how did the words he used compare to other presidents? As wordles’, The Guardian, January 28th.

Weber, Robert (1990) Basic Content Analysis, 2nd ed., Newbury Park, CA, Sage.