The other day I stumbled across a new green search engine – truevert.
It's a semantic search mashup made possible by Yahoo BOSS.
The thinking behind the technology is outlined by one of its creators here:
Our approach is to mimic the way that people learn language. When people learn a new word they learn its meaning by its relation to the other words in the sentence or paragraph. . . . Even if you learn it from a dictionary, its meaning is still from the context of other words. Similarly, when people understand a sentence, each word in the sentence helps to disambiguate the other words in the sentence. For example, consider the sentence, “the tree surgeon examined the young man’s palm.” By the time you get to the word “palm,” you have a pretty good idea what that word means.
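The idea in the quote, that neighbouring words disambiguate one another, can be illustrated with a toy word-overlap scorer in the spirit of the simplified Lesk algorithm. This is a swapped-in textbook technique, not truevert's own method (which the quote doesn't spell out). The two sense glosses, the stopword list and the `pick_sense` helper are all hypothetical, written just for this example.

```python
import re

# Hypothetical glosses for two senses of "palm", written for this example only.
SENSES = {
    "hand": "the inner surface of a person's hand between the wrist and the fingers",
    "tree": "a tropical tree with a tall unbranched trunk and a crown of large leaves",
}

# Function words (and the "s" left over from possessives) carry no sense information.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "between", "s", "his", "her", "on", "she", "he"}


def tokens(text: str) -> set[str]:
    """Lower-cased content words, with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def pick_sense(sentence: str) -> str:
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence)
    return max(SENSES, key=lambda sense: len(context & tokens(SENSES[sense])))


print(pick_sense("The tree surgeon examined the young man's palm."))  # tree
print(pick_sense("She read the lines on the palm of his hand."))      # hand
```

On the quoted sentence the scorer picks the tree sense on the strength of a single shared word, ‘tree’, and takes no account of ‘the young man’s’, which arguably points to a hand. Counting word overlaps is a long way from the kind of understanding the quote describes.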
But how we ‘learn’ language is not as cut and dried as this suggests – if only it were that simple.
The world of language acquisition is a fiercely contested one – with almost as many theories as there are letters in the alphabet. We may have tried and trusted methods for learning language, and long-established tools like dictionaries and thesauri to help us come to terms with it, but these are only part of the process of understanding.
And though it may seem petty (semantic, even) to contend that some innate propensity toward language might play a significant part in learning and understanding, failing to factor in this possibility seems misleading. It also hints at just how far semantic search has yet to travel down the road to artificial intelligence (if that destination can ever be reached at all).
But back to the engine. Of course, hard and fast conclusions about the quality of an engine can’t be drawn from one or two searches; nonetheless, I have a real-world scenario which hints at a critical issue for this and other semantic search engines.
Back in the real world, I’ve been looking for an ethical savings account (so a green angle would also make it relevant to an engine like this). That seemed a search need well suited to testing truevert.
So I ran a search for ‘ISA investment’.
The first result that came back was a blog article (not necessarily a reliable source – but that’s another argument) entitled: Lloyds TSB launches new fixed rate cash ISA.
Having read the article through, I concluded that it isn’t concerned with any ‘green’ savings products. So how did it come top of the list?
Could it be that this result crept in under the semantic technology used to filter the results?
Is it possible that the ISA, being a savings product specific to the UK (check out the contrast in results between searches for ISA in google.com and google.co.uk), triggered a ‘best of the rest’ filter when no genuinely relevant results were found?
Then I did a bit of scouring. What words in this blog post might have boosted it up the results? Well, if you search the page for the word ‘environment’, you’ll find this sentence right at the bottom:
At the moment just one in ten consumers has an Investment ISA but almost half admit, in the current economic environment, they would be more likely to open one if they were guaranteed a higher rate on their cash savings.
So was it the ‘economic environment’ cited in this blog post that determined its ranking in the results? Since the article isn’t about anything environmental in a green sense, I can but conjecture.
And so this leads me to ask: how can a system ever hope to cope with the ever-growing (and multi-faceted) varieties of language spoken throughout the English-speaking world (let alone variations in other languages elsewhere), given the endless figurative meanings people express in words at every turn?
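To make the conjecture concrete, here is a toy comparison between a context-blind term count and one that glances at the preceding word. Nothing here reflects how truevert actually ranks pages; the green vocabulary, the list of ‘non-green’ modifiers and both scoring functions are hypothetical.

```python
import re

# Hypothetical "green" vocabulary; not truevert's actual term list.
GREEN_TERMS = {"environment", "environmental", "ethical", "green", "sustainable"}

# Hypothetical modifiers that signal a non-ecological sense of "environment".
NON_GREEN_MODIFIERS = {"economic", "business", "working", "regulatory"}


def words(text: str) -> list[str]:
    """Lower-cased words in order of appearance."""
    return re.findall(r"[a-z]+", text.lower())


def naive_green_score(text: str) -> int:
    """Count every occurrence of a green term, ignoring context."""
    return sum(1 for w in words(text) if w in GREEN_TERMS)


def context_green_score(text: str) -> int:
    """Same count, but skip a green term preceded by a non-green modifier."""
    ws = words(text)
    score = 0
    for i, w in enumerate(ws):
        if w not in GREEN_TERMS:
            continue
        if i > 0 and ws[i - 1] in NON_GREEN_MODIFIERS:
            continue  # e.g. "economic environment"
        score += 1
    return score


sentence = (
    "At the moment just one in ten consumers has an Investment ISA but almost half "
    "admit, in the current economic environment, they would be more likely to open "
    "one if they were guaranteed a higher rate on their cash savings."
)
print(naive_green_score(sentence))    # 1
print(context_green_score(sentence))  # 0
```

A one-word look-back like this is brittle: it depends on a hand-maintained list of modifiers and misses any phrasing it wasn’t told about.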
Whenever I think about this subject, I’m always drawn back to an ongoing argument I have with my dad about precision in language.
Having been a civil engineer all his working life, he believes that accuracy and objectivity in language are central to the running of society. That is to say, if an architect wants to be sure that the measurements and directions he/she sets out for the construction of a design are followed, those directions must be written in language that leaves nothing open to misinterpretation. And the fact that we’ve reached our current level of technical sophistication in building is indicative of the fact that people can communicate objectively using language.
But I’ve always believed that, because language is culturally relative in both form and content, it is impossible to say for sure what someone means by any particular sentence, or fragment of a sentence.
If you want to communicate with someone, and eradicate all possible means of misunderstanding, you have to do it in numbers, rather than words.
If it were possible to construct language using words and grammar which wouldn’t give rise to any potential (even the most marginal) misunderstanding, there would be no such thing as The Law.
Language is you and me, and as Tom Leonard famously(!) put it:
all living language is sacred.