Posts Tagged ‘guardian’

Content Analysis 2.0: A Framework for Using Wordle

12 January 2011

I’m presenting at a conference on Friday (14th January 2011): ‘Exploring the language of the popular in Anglo-American Newspapers 1833-1988’.

This is an AHRC funded research seminar, held at Sheffield University.



This paper explores the application of the interactive web tool Wordle in the framing of content analysis, finding that it offers new possibilities for scholars. But in the field of news archives it is useful only where publishers make output available in data-portable formats.

Traditional content analysis methods used to establish the frequency of terms in text can be constrained by human limitations – most notably the difficulty of selecting terms to measure from a large collection of documents. The classification process requires agreed standards between researchers, in order to establish consistency (Weber, 1990), or intercoder reliability (Neuendorf, 2002). But the initial steps in framing research often rely upon assumptions which do not (and cannot) take into account the frequencies of all significant words across large-scale document collections.

This research proposes a means by which scholars might challenge their initial assumptions about texts, and use computational power to audit the full range selected. It is proposed that this may invigorate approaches to content and discourse analyses.

Wordle has been employed by newspapers in coverage of major news events, including analyses of major public speeches (Stodard, 2010; Rogers I and II, 2010), and political manifestos (Rogers III, 2010). In the literature, Wordle’s merits have been explored in terms of framing partisanship in political speech (Monroe et al, 2008). While this paper acknowledges the limitations of such software as a means to an end in content analysis (McNaught and Lam, 2010), the application of such technologies can nevertheless help inform the preliminary stages in content analysis.

Data scraping techniques both simple (and laborious) and complex (macros in Microsoft Word) are discussed. Large volumes of text (downloaded from Nexis) require parsing for metadata and stop-words, with the remaining text then usable in Wordle. This data is presented as a word cloud, with keywords ranging in scale as a function of frequency. This offers a more systematic means of auditing large data sets across a range of variables.
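For readers who prefer a script to Word macros, the parsing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used for the paper: the metadata labels (BYLINE:, LENGTH: and so on), the tiny stop-word list and the function names are all hypothetical, and a real Nexis download may be laid out differently. The sketch drops metadata lines, removes stop-words, counts what remains, and scales each word between a minimum and maximum size in proportion to its frequency, which is roughly what a word cloud does.

```python
import re
from collections import Counter

# Hypothetical metadata labels; a real Nexis export may use different ones.
METADATA_PREFIXES = ("BYLINE:", "SECTION:", "LENGTH:", "LOAD-DATE:", "LANGUAGE:")

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "was"}


def strip_metadata(raw_text):
    """Drop every line that starts with one of the assumed metadata labels."""
    kept = [line for line in raw_text.splitlines()
            if not line.strip().upper().startswith(METADATA_PREFIXES)]
    return "\n".join(kept)


def word_frequencies(text):
    """Count lower-cased words, ignoring anything in STOP_WORDS."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def font_sizes(freqs, min_pt=10, max_pt=72):
    """Scale each word linearly up to max_pt, relative to the most frequent word."""
    if not freqs:
        return {}
    top = max(freqs.values())
    return {w: min_pt + (max_pt - min_pt) * n / top for w, n in freqs.items()}


sample = """BYLINE: A Reporter
LENGTH: 12 words
The budget and the election dominate the news. The budget is news."""

freqs = word_frequencies(strip_metadata(sample))
sizes = font_sizes(freqs)
for word, count in freqs.most_common():
    print(word, count, round(sizes[word]))
```

The cleaned text (or the word counts) could then be pasted into Wordle, or the counts inspected directly to see which terms dominate before any coding categories are fixed.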

A plea for the application of data portability in the construction of online newspaper archives is put forth. Those archives which do not provide text-only download options (including Times Digital Archive, and Gale’s 19th Century British Library Newspapers in the UK, and New York Times Archive and Google News archive in the US) are explored in terms of their output formats. Optical Character Recognition software is acknowledged as a possible solution, but a hugely time-consuming one. This research demonstrates that without text-readable formats, content analysis of online news archives will remain limited in scope and potential.


McNaught, Carmel and Lam, Paul (2010) ‘Using Wordle as a Supplementary Research Tool’, The Qualitative Report 15 (3): 630-643.

Monroe, Burt, Colaresi, Michael and Quinn, Kevin (2008) ‘Fightin’ Words: Lexical Feature Selection and Evaluation for Identifying the Content of Political Conflict’, Political Analysis 16 (4): 372-403.

Neuendorf, Kimberly A (2002) The Content Analysis Guidebook, London, Sage.

Rogers, Simon I (2010) ‘The text of the Queen’s speech as a wordle – and how it compares to 1997’, The Guardian, May 25th

Rogers, Simon II (2010) ‘David Cameron and Nick Clegg’s statements as a wordle’, The Guardian, May 12th

Rogers, Simon III (2010) ‘Conservative manifesto: how does it compare to Labour’s?’, The Guardian, April 13th.

Stodard, Katy (2010) ‘Obama’s state of the union speech: how did the words he used compare to other presidents? As wordles’, The Guardian, January 28th.

Weber, Robert (1990) Basic Content Analysis, 2nd ed., Newbury Park, CA: Sage.


Styling an online news niche?

8 September 2009

Aside from offering useful reminders on ‘i before e’ exceptions, and the proper placement of apostrophes, news style guides sometimes offer an insight into the values of news organisations, and the social mores of their audiences.

Take guidance on use of the words ‘gay’ and ‘homosexual’ in news, for example.

The BBC suggests its (radio) journalists accord with the following code:

…some people believe the word “homosexual” has negative overtones, even that it is demeaning. Most homosexual men and women prefer the words “gay” and “lesbian”. Either word is acceptable as an alternative to homosexual, but “gay” should be used only as an adjective. “Gay” as a noun – “gays gathered for a demonstration” – is not acceptable. If you wish to use homosexual, as adjective or noun, do so. It is also useful, as it applies to men and women.

The Times style guide says:

gay fully acceptable as a synonym for homosexual or lesbian.

And the Guardian style guide says:

gay: Use as an adjective rather than a noun: a gay man, gay people, gay men and lesbians not “gays and lesbians”

Meanwhile, in the US, news producers including AP, New York Times & Washington Post have all been working from the same principle for a long time now:

The Associated Press, The New York Times and The Washington Post all restrict usage of the term “homosexual” — a word whose clinical history and pejorative connotations are routinely exploited by anti-gay extremists to suggest that lesbians and gay men are somehow diseased or psychologically/emotionally disordered, and which, as The Washington Post notes, “can be seen as a slur.” AP and New York Times editors also have instituted rules against the use of inaccurate terminology such as “sexual preference” and “gay lifestyle.”

By contrast with this earnest and sensitive approach, some of those ‘culturally traditional’ sources in the US continue to persevere in their battle for control of language with such zeal that absurd consequences can abound – see The Dangers of Auto-Replace.

Getting back to Blighty, the Telegraph style book offers markedly different guidance to any of its UK competitors:

gay: permissible in headlines if essential but use homosexual in text.

There is no sense of equivalence here. Instead, this coded compromise hints at the pervasive nature of the permissive web surfer.

For ‘essential’ we surely can’t discount ‘essential to search traffic’. Words in <title> tags take precedence over terms used in body-text when it comes to search ranking, the significance of which becomes apparent when you bear in mind that last year 50% of traffic came via search.

Off-line, The Telegraph balances alienating the ‘pink pound’ against reaching out to the paper’s older, more socially conservative readers, who prefer the term ‘homosexual’ to ‘gay’.

Online there is not yet such a thing as a ‘pink pageview’ – search favours the majority term over the minority one (check Google Trends for the runaway winner here). This simple market truism might have financial, as well as political, consequences for any news outlet swimming against the tide.

But of course all of the above style advice is intended for the present – so what of the future?

Politics (and potential offence) aside, if the Telegraph’s core readership, who might comprise a future subscription-base, expect to read (and find) news containing those terms they prefer rather than those terms the rest of society uses, then this might present another facet to the development of niche online news.

Of course it could be argued that this approach would risk alienating younger readers whose preferred choice of terminology may render certain words and phrases obsolete.

But this assumes that a hardcore of younger people (and future subscribers) won’t align themselves to a political outlook which prefers usage of one term over the other, which given the nature of politics seems unlikely.

It could equally be argued that people don’t care enough about the political and social significance of these (or any other) terms for it to influence their decision to spend money on information provided elsewhere free-of-charge.

But on the other hand, web usability tells us that reader-experience is core to creating successful online copy and branding, and that developing trust (which might include the consistent use of preferred words and language) is key to success on the web.

Who’s really winning the online news war?

22 May 2008

Shane Richmond at the Torygraph must have pinched himself this morning.

Last week saw him manning the barricades and issuing a pre-emptive rebuttal to a story (still brewing at the time) about far-right infiltration at Mytelegraph, from the presses of his main online rival, The Grauniad.

What better way to stick two fingers up to your nearest competitor then, than to trumpet your newly won crown as the UK’s number one newspaper website, even if the margin of victory may be transient?