Last week I posted something about what library catalogues can learn from commercial search engines, as the academic community struggles to get academic information out to a generation of students used to turning to Google for all their questions.
Yet today I read a fascinating article about the future of search, suggesting that in fact it’s the commercial search engines that fall down in areas where the public library sector does well.
Concerns are being voiced in the online search community that we are fast approaching keyword meltdown. As more and more content is added to the web, it is argued, keyword-based search engines like Google will struggle more and more to cope with the sheer volume of information out there, returning larger and larger haystacks of results when what we really asked for are needles. Page-ranking won’t save them, as highly cited results (which invariably float to the top of most search engine results) are not necessarily the needle we are looking for.
Several approaches to solving this problem have been mooted:
social search, tagging, guided search, natural-language search, statistical methods, open search, semantic search, and (way out there) artificial intelligence.
That’s a lot of alternatives, and they are being used to an extent already. But most have their own associated problems:
Social searching relies on contributors, and hence on a small percentage of the web population; it will never be as efficient or as far-reaching as an algorithm. As anyone who has used del.icio.us or StumbleUpon will tell you, this approach is still in its infancy as far as serious online research is concerned.
Tagging is unstructured and messy. It would be hard to get everyone to agree on tagging conventions, and to ensure that everyone means the same thing by a given tag. Not to mention all the factual inaccuracies which plague Flickr and other sites that rely on tagging. Of course some (Foucault) would say that the problems inherent to language make this impossible anyway…
Guided search is again quite limited, and requires some education.
Natural-language search is good for answering questions, but makes it hard to search conceptually.
Open search is a little like semantic search, in that it converts web copy into structured data.
Semantic search seems to be the most obvious way forward. The only problem with this approach (which essentially means turning the web into a massive database of structured information, with every page’s ‘aboutness’ captured in uniform fields or tags) is that it would require rewriting the web from scratch.
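To see why structured ‘aboutness’ matters, here’s a toy sketch in Python. The pages, URLs, and field names are all invented for illustration, and this is nothing like a real engine, but it shows how a keyword match returns the whole haystack while a query against a structured subject field returns only the needle:

```python
# Invented example pages: both mention "jaguar" in their text, but
# their structured 'subject' fields record what they are actually about.
pages = [
    {"url": "example.org/jaguar-cars",
     "text": "The Jaguar XK is a grand tourer built in Coventry...",
     "subject": ["automobiles", "Jaguar (car marque)"]},
    {"url": "example.org/big-cats",
     "text": "The jaguar is the largest cat in the Americas...",
     "subject": ["zoology", "Panthera onca"]},
]

def keyword_search(query):
    # Keyword matching over raw page text: both pages contain 'jaguar',
    # so both come back, relevant or not.
    return [p["url"] for p in pages if query.lower() in p["text"].lower()]

def semantic_search(subject):
    # Structured search: only pages whose 'aboutness' field carries the
    # subject match, so the car page is never returned for the animal.
    return [p["url"] for p in pages if subject in p["subject"]]

print(keyword_search("jaguar"))          # both pages
print(semantic_search("Panthera onca"))  # only the zoology page
```

The catch, as noted above, is that none of this works until every page actually carries those uniform subject fields.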
Of course if everything on the web had been organised and ordered like a public library catalogue, this problem wouldn’t be so serious. But that is, of course, pure pie in the sky.
One thing I didn’t see mentioned in the article was the potential for advanced operators and filters (though I suppose these could fall into the guided search category). Filtering your results by domain, by document type, or by language, and being able to search the titles or URLs of webpages, are all good ways of cutting through the chaff. Of course this requires some training for the user, and metadata conventions on the part of the publisher, but it’s the best way I know of finding what you’re after…
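The kind of filtering I mean can be sketched in a few lines of Python. The result records and field names here are invented for illustration; a real engine would apply the same idea at index time rather than over a list:

```python
# Hypothetical search results, each carrying publisher-supplied metadata.
results = [
    {"url": "https://www.example.ac.uk/thesis.pdf",
     "title": "Metadata in Libraries", "doctype": "pdf", "language": "en"},
    {"url": "https://blog.example.com/cats",
     "title": "My Cat Photos", "doctype": "html", "language": "en"},
]

def filter_results(results, domain=None, doctype=None, in_title=None):
    """Narrow a result set by domain, document type, or title keyword."""
    out = results
    if domain:
        out = [r for r in out if domain in r["url"]]
    if doctype:
        out = [r for r in out if r["doctype"] == doctype]
    if in_title:
        out = [r for r in out if in_title.lower() in r["title"].lower()]
    return out

# Restrict to academic-domain PDFs with 'metadata' in the title:
hits = filter_results(results, domain=".ac.uk", doctype="pdf",
                      in_title="metadata")
print([r["url"] for r in hits])
```

Each filter only cuts as finely as the metadata behind it, which is why publisher-side conventions matter as much as the user-side training.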