Purpose of CIJ Current Awareness Policy
At the CIJ, we need to keep up to speed on examples of excellence in investigative journalism, for a number of reasons. These include:
The need to reach out to investigative researchers (and other interested parties, like whistle blowers and journalism students) wherever they are, to offer our help and services,
The need to develop our current contacts,
The need to keep track of journalists who are new to the field, to supplement our speakers, and
The need to keep track of trends in investigative research, FOI, Computer Assisted Reporting (CAR), and new fields as they arise, which will help shape CIJ policy as it applies to our training and events.
Relying on our own reading in the field is fine, but there is a whole world of new and old media out there which we could do with keeping on top of, not to mention people we haven’t heard of yet. A comprehensive approach is needed to make sure we don’t fall behind in the field.
So we need to think about what subject matter we are concerned with, before putting in place a policy and a structure to capture this online content.
Finding suitable keywords for unearthing content in this field can be difficult. It’s a fuzzy area – how do you define a piece of investigative research? Some FOI requests render simple information with little effort, which is hardly synonymous with the hard and often thankless work associated with large-scale investigations. Likewise, some investigations (such as one into a local used car salesman) are of less relevance to us than others. Trying to define it isn’t easy. We therefore need to think in terms of the different entities involved.
Here’s what I came up with after a bit of thought:
- Terms to capture (very broadly) mentions of ‘investigation’, CAR, FOI, and Whistle blowing.
- The general subject matter of concern to investigative journalism stories.
- Names of those journalists and others who already produce quality investigations.
1) Keywords for broad subjects
I sat down with a pen and paper, and tried a couple of searches here and there. I then consulted an online thesaurus, and most importantly got our team involved in coming up with suitable keywords to cover our searches, which I then tested. The terms left once all this had been done were:
“an investigation by”
“computer assisted reporting” – also less dependent on the UK (the US is far ahead of the UK here). I ignore the abbreviation CAR, since no search engine is case sensitive and it would simply match “car”.
FOI or “freedom of information”
Whistle blower or “whistle blower” or “whistle-blower” or leak or informant or “a source at”
2) Subject matter of interest to Investigative Journalism
We need to think of the subject matter which concerns investigative journalism, and think about a hierarchical structure which will help us manage our current awareness, as it develops into an archive. This will also help with the organisation of content on our site, and internal content as we produce it.
We need to create a CIJ taxonomy, to structure the archive of content we manage before it becomes unmanageable. We therefore need to find a suitable taxonomy (possibly from the world of law), and tailor it accordingly.
Broadly speaking, much investigative research concerns the crimes and misdemeanours of the powerful. So the emphasis must be on those crimes most associated with this criminal field, terms such as (in no particular order):
There are also crimes specific to International Law which we need to consider, such as:
Crimes against humanity
The slave trade
These are, though, quite emotive terms, so for legal reasons we may need to hide them from public view.
We may want to turn some of the index terms into either free text or tag-based search terms, and either convert to RSS or incorporate into our manual news gathering.
Unfortunately, those taxonomies which would be easiest to use as models (such as the British Crime Survey) are heavily concerned with personal crimes, which are not often of direct concern to our work at the CIJ. This is an area I will take a closer look at soon, and it is important – we will be starting to index content early next week.
3) Established journalists and others in the field
Knowing who is already producing good quality investigations makes the process of finding their content easier.
The easiest (if perhaps most nepotistic) approach to this for us would be to create feeds running searches across many search engines and media for the names of CIJ advisors. We can then add others in the field – we can expand to search for mentions of past speakers, alumni, and other contacts in the field, as well as those we may not already be in direct contact with (such as finalists or nominees for awards, like The Paul Foot Award).
We don’t need to be too comprehensive here though – some of our advisors are not directly in the public eye, through choice. So feeds from Google Blogs should do it (bear in mind a good deal of investigations don’t make it to Google News). Those common(ish) names like Michael Ryan will need to be augmented with other terms, but could remain a problem (in which case they will need to be removed, and dealt with later).
A manual search in Amazon’s advanced search for journalistic titles, sorted by release date, should keep us up to date on new books in our field in advance of their publication.
We can also search for associations – where journalists we know have worked on large investigations with other journalists we don’t know – by setting up an RSS feed for a Google search like this:
“david leigh and” site:guardian.co.uk
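Once results come back from a search like this, the names that follow the phrase still have to be picked out. A minimal sketch of how that might be done in Python, assuming the result text is available as a plain string; the function name and the sample snippet (including the second journalist's name) are made up for illustration:

```python
import re


def find_co_authors(known_name, text):
    """Return capitalised names that directly follow '<known_name> and' in text."""
    # The known name is matched case-insensitively, but the name that follows
    # must be capitalised, so ordinary lowercase words are not picked up.
    pattern = (
        r"(?i:" + re.escape(known_name) + r"\s+and)\s+"
        r"([A-Z][\w'-]+(?:\s+[A-Z][\w'-]+)+)"
    )
    return re.findall(pattern, text)


# Invented snippet standing in for a search result; "Jane Placeholder" is not a real byline.
snippet = "An investigation by David Leigh and Jane Placeholder in London found that..."
print(find_co_authors("david leigh", snippet))
```

This only catches two-or-more-word capitalised names straight after the phrase, so it will miss some bylines and should be treated as a first pass before a manual check.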
RSS and automated news gathering: searches
I’m going to focus heavily on RSS and forgo conventional alerts. RSS is real time, and not dependent on a 24-hour turnaround like the alert. Because we live in a 24-hour news world, it doesn’t make sense to rely on alerts for current awareness in the first instance.
I’m going to start by setting up some RSS feeds for the searches outlined above, within a generic CIJ Google Reader account. This has the advantages of being platform independent (so I can access from home or work), while also allowing access to my other colleagues in the CIJ, should I be unavailable to pick up the news.
Yahoo News and Google News I anticipate to be the greatest potential problem in terms of overload, so I’ll start with UK-based content, and build up to world issues where possible.
I’m then going to subscribe to search feeds for blog posts in Technorati, and WordPress search (to begin with). The benefits of the ‘related tags’ option in WordPress are considerable – feeding by aboutness rather than free text is far more likely to render good results, although this will form part of our manual current awareness, more of which later.
I have since learned that the following basic URL string:
http://wordpress.com/tag/<insert tag name>
…allows you to effectively set up feeds (at least single keyword feeds) in WordPress. See the option on the bottom-left of the page.
This could make such RSS feeds suited to our overall subject matter terms (e.g. fraud), rather than our investigative research terms.
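As a rough sketch, the tag pages for a list of single-keyword subject terms could be generated like this. The function name is my own, and it deliberately refuses multi-word tags, since I have only seen the URL pattern work for single keywords:

```python
def wordpress_tag_url(tag):
    """Build the WordPress tag page URL for a single-keyword tag."""
    tag = tag.strip().lower()
    # Multi-word tags are rejected here: how WordPress encodes them is not
    # something this sketch assumes.
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError("only single-keyword tags are handled here")
    return "http://wordpress.com/tag/" + tag


# Example terms taken from this post; the real list would come from our taxonomy.
for term in ["fraud", "investigation"]:
    print(wordpress_tag_url(term))
```

The feed itself is still picked up from the option on the tag page, so this only saves typing out the page addresses.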
The option to search for a tag (using the command tag:investigation) is an advantage Delicious has over other aggregators. Google Blog search would be vastly improved if it incorporated this.
Having looked at IceRocket, I will only set up searches for single words, as I’ve noticed it doesn’t seem to support phrase searching in either free text or tag searches.
I have had to use advanced search in Twitter. Microblogging dictates that I modify some of my searches – after all, what are the odds that a user will relinquish 27 of their precious 140 characters in typing out “computer assisted reporting”? Something I’ve been thinking about re microblogging – are typos a more serious issue here? Microblogs are more likely to be written by people using mobile technologies, with their tiny keys. We might need to factor this in (i.e. search for mis-typed keywords, and abbreviations of some of our terms).
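To get a feel for what searching on mis-typed keywords and abbreviations might involve, here is a minimal sketch. It assumes that the most common slips are swapped neighbouring letters and dropped letters, which is my guess rather than anything measured, and both function names are made up:

```python
def typo_variants(word):
    """Return likely mis-typings of word: adjacent swaps and single dropped letters."""
    variants = set()
    # Swap each pair of neighbouring letters, e.g. "leak" -> "laek".
    for i in range(len(word) - 1):
        variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    # Drop each letter in turn, e.g. "leak" -> "lek".
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])
    variants.discard(word)  # a swap of identical letters gives the word itself
    return sorted(variants)


def initialism(phrase):
    """Abbreviate a phrase to its initials, e.g. for character-starved tweets."""
    return "".join(w[0] for w in phrase.split()).upper()


print(typo_variants("leak"))
print(initialism("freedom of information"))
print(len("computer assisted reporting"))  # the character cost mentioned above
```

Short variants like these will pull in a lot of noise, so they are probably only worth adding for longer keywords such as “informant”.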
This approach (especially the blog searching) is also a great way to find new contacts – I also intend to go down the path outlined in another blog post I did last year, on finding contributors via advanced blog search.
The keywords “data mining”, “computer assisted reporting”, FOI or “freedom of information”, and whistle blower or “whistle blower” or “whistle-blower” or leak or informant or “a source at” are all less dependent on the UK (the US is far ahead of us in many of these areas), so I will not limit them by country. I also left out the keyword CAR for obvious reasons.
But what about those sources who don’t offer RSS? This search in Newsnow would be a fuzzy but interesting way to keep on top of investigations in local papers. I can try turning it into an RSS feed by passing it through page-2-rss, which may or may not work on aggregator pages like this. If it doesn’t work, we’ll just have to keep tabs on it manually.
Delicious is a source I’m not sure is suited to RSS. I can run all the above searches in it, but can’t create a feed for them. In any case, Delicious doesn’t offer a date order ranking, so in a sense it’s worth pursuing manually.
Nonetheless, there’s no harm in setting up a few Delicious page-2-rss feeds for tag searches in Delicious.
We also need to consider conversations taking place across social media, from social networks to forums, listservs and Google groups. We need to search for them, and subscribe to their RSS, or sign up to their distribution lists.
Once the feeds have been iteratively improved, and assuming we can render this content consistently, we will combine all of the above into a Yahoo Pipe or Feedrinse module, which will allow us to delete duplicates, and turn our automated current awareness into a public feed, which we can then use as a widget on our site. I’ll be using this ReadWriteWeb article to help, if I can make it work well.
I’m not going to bother subscribing to straight Google, Yahoo or MSN search result RSS feeds – we want breaking news, not content that’s popular by PageRank weighting.
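To make the de-duplication step concrete, here is a small sketch of the same idea done by hand in Python, in case the hosted tools don’t work out. It assumes plain RSS 2.0 feeds and treats two items as duplicates when their links match after light tidying; the function names and the two sample feeds are invented:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def parse_rss_items(rss_xml):
    """Return (title, link) pairs from an RSS 2.0 document held in a string."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        items.append((title, link))
    return items


def link_key(link):
    """Reduce a link to a comparison key: lowercase host, no trailing slash."""
    parts = urlsplit(link)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def merge_feeds(feeds):
    """Combine several feeds, keeping only the first item seen for each link."""
    seen = set()
    merged = []
    for rss_xml in feeds:
        for title, link in parse_rss_items(rss_xml):
            key = link_key(link)
            if key in seen:
                continue
            seen.add(key)
            merged.append((title, link))
    return merged


# Two invented feeds that both carry the same story under slightly different links.
FEED_A = """<rss version="2.0"><channel><title>A</title>
<item><title>Story one</title><link>http://Example.org/story-one/</link></item>
<item><title>Story two</title><link>http://example.org/story-two</link></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>B</title>
<item><title>Story one (repost)</title><link>http://example.org/story-one</link></item>
</channel></rss>"""

for title, link in merge_feeds([FEED_A, FEED_B]):
    print(title, link)
```

Matching on links alone will miss the same story republished at a different address, so some duplicates would still need weeding out by eye.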
Once we’re up and running, I’ll organise all of the above into designated folders by keyword in our Google Reader account. I’ll then name these folders after the keywords used, so we can keep up to date in the early days of what’s yielding good, bad and indifferent results. We need to build scope for refining all of these searches as we develop our strategy and fields of interest.
RSS for bespoke sources
Aside from the generic search feeds, we would also benefit from keeping tabs on the output of organisations and individuals we know already produce quality investigative research.
Certainly a major trend in investigative research seems to be away from newspapers, and toward other organisations and groups in society.
We need to build these up on an ad hoc basis – but there are some obvious areas in which to start.
Brainstorming with my CIJ colleagues will provide a tailored list of organisations we should be monitoring, such as:
Specialist magazines, and
Video and audio
We also need to think beyond type-based media – about video (whether TV or film), and audio.
Searching for programmes/films featuring investigation (text-based) can be done in:
As far as direct video feeds are concerned, I’ll be looking at:
Seesmic will have to be done manually.
As far as audio feeds (or manual gathering) are concerned, I’ll be starting with the following (picked up from an article in Pandia):
One thing that concerns me from the outset about searching podcast-specific engines, is the volume of paranormal investigations, which I can’t seem to remove from some engines, and which are of course of no interest to us.
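Where an engine won’t let me exclude these results, one fallback would be to filter them out after the fact. A minimal sketch, assuming the results are available as a list of titles; the function name and sample titles are invented, and “paranormal” is the only exclusion term I am sure about:

```python
def drop_unwanted(titles, excluded=("paranormal",)):
    """Return the titles that contain none of the excluded terms (case-insensitive)."""
    kept = []
    for title in titles:
        lowered = title.lower()
        if any(term in lowered for term in excluded):
            continue
        kept.append(title)
    return kept


# Invented podcast titles for illustration.
sample = [
    "Paranormal Investigation Hour, episode 3",
    "Investigation into council contracts",
]
print(drop_unwanted(sample))
```

Further terms can be added to the excluded list as the noise becomes clearer, but this only works where the offending word actually appears in the title.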
Aside from automated searches, and in addition to all that I’ve mentioned above, where automated search won’t cut the mustard, I’m going to also implement a policy of manual searches – say once a day, across a range of engines and sources.
This will allow us to spread the net wider still. It will involve searches using the terms already outlined, but also advanced operator searches in Google, such as:
“seen by the guardian” site:guardian.co.uk (replace Guardian and site:guardian.co.uk with other appropriate sources)
“the guardian understands” site:guardian.co.uk (replace Guardian and site:guardian.co.uk with other appropriate sources)
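Swapping the source in and out of these searches by hand gets tedious, so here is a rough sketch of generating them from a list. The two phrase templates are the ones above; the helper names are my own, and any sources beyond the Guardian would need adding to the dictionary:

```python
from urllib.parse import quote_plus

# Phrase patterns from the searches above, with the source name left as a slot.
PHRASE_TEMPLATES = ['"seen by the {name}"', '"the {name} understands"']

# Source name -> domain. Only the Guardian comes from this post; add others as needed.
SOURCES = {"guardian": "guardian.co.uk"}


def build_queries(sources, templates=PHRASE_TEMPLATES):
    """Return one query string per (source, phrase template) pair."""
    queries = []
    for name, domain in sources.items():
        for template in templates:
            queries.append(template.format(name=name) + " site:" + domain)
    return queries


def google_search_url(query):
    """Return a Google search URL with the query safely encoded."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for query in build_queries(SOURCES):
    print(query)
    print(google_search_url(query))
```

Not every publication words its exclusives the same way, so the templates would need checking against each new source before relying on them.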
Some of the other aggregators I’ll be looking at every day to cover our manual newsgathering policy are:
Results from these searches, in addition to the hugely important spots from our resident experts, will (hopefully) be pulled together and made publicly available and searchable via our Delicious account, whose recent saves I’m hoping to turn into a widget for our site.
Current awareness in past events
One other thing to factor in to our current awareness is all that has gone before.
We need to seek out future mentions of big investigative strands from the media’s past, to keep on top of people revisiting certain investigations. To that end, I picked the brains of our in-house expert, and started putting together a list of those TV, print, radio and documentary film sources that no longer exist, to cover these bases too.
Following on from this, a related area which is of great interest to us at the CIJ is keeping on top of significant landmarks in the field of investigative journalism.
This is a hard area to develop from scratch, but a good starting point is to look at the BBC’s On This Day.
It’s not possible to search the content of this site (no search engine is offered). However, this is easily got round by doing a domain search in Google, where you can exploit its advanced search operators.
Any other business
Although I intend to make our feeds public, and share with our contacts in Google in time, I have nevertheless uploaded my draft collection of feeds as an XML template here. Any feedback or comments would be more than welcome.
Investigative journalism is an international business, and at some point we may want to start newsgathering in foreign languages. We will therefore harness the language skills of our staff and interns to pull this content into the fold.