Monday, 12 May 2008

Search Engine Technology

While there has been a lot of speculation in the press recently about the Microsoft offer for Yahoo, Google have remained the unchallenged market leader for quite some time, with around 60% of the market (in English-speaking countries, anyway).

Over the past couple of weeks, I have been taking another look at search technology to see what's new and improved. I found several companies with products in private or public beta testing. There are three main areas of development: semantics, clustering and visualisation.
(To see the full presentation on search technology, click here.)

Changing the way that search engines display their results can have a dramatic effect on how quickly a user can sort through vast quantities of data to find the answer they require. Microsoft Tafiti combines Silverlight with Live Search to let users stack the results of different searches and drag and drop data, media and sites to a sidebar. Microsoft have also improved their integration of mapping data. Sites like Lygo return thumbnails of each website, making it easier to recognise sites at a glance.

Clustering has been around for a while, but recent developments in technology have made it far more effective. Sites such as Clusty provide a number of clustering options, ranging from the source of the data to the content. Clusty not only uses clustering to group results, it also applies the same complex linguistic technologies when performing a search, recognising which words or phrases have the same meaning and where the same word can have different meanings.
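To make the two kinds of clustering a little more concrete, here is a minimal Python sketch. It is not how Clusty works internally (I have no knowledge of that); the result list, the example.com addresses and the tiny synonym table are all invented for illustration. It simply groups a handful of results first by source and then by content, treating words with the same meaning as one topic.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical search results as (title, url) pairs, invented for illustration.
RESULTS = [
    ("Jaguar cars for sale", "http://cars.example.com/jaguar"),
    ("Jaguar automobile reviews", "http://cars.example.com/reviews"),
    ("Jaguar habitat and diet", "http://wildlife.example.org/jaguar"),
]

# Toy synonym table standing in for real linguistic analysis (an assumption):
# each word maps to the topic it is taken to mean.
SYNONYMS = {
    "cars": "car",
    "automobile": "car",
    "habitat": "animal",
    "diet": "animal",
}


def cluster_by_source(results):
    """Group result titles by the host name of the site they came from."""
    clusters = defaultdict(list)
    for title, url in results:
        clusters[urlparse(url).netloc].append(title)
    return dict(clusters)


def cluster_by_topic(results):
    """Group result titles by topic, so synonyms land in the same cluster."""
    clusters = defaultdict(list)
    for title, _url in results:
        topics = {SYNONYMS[w] for w in title.lower().split() if w in SYNONYMS}
        # Titles with no recognised words fall into a catch-all cluster.
        for topic in sorted(topics) or ["other"]:
            clusters[topic].append(title)
    return dict(clusters)


if __name__ == "__main__":
    print(cluster_by_source(RESULTS))
    print(cluster_by_topic(RESULTS))
```

Even in this toy version, the same word ("Jaguar") ends up in different clusters depending on the words around it, while "cars" and "automobile" are treated as the same thing.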


Quintura use clustering in a very different way, producing navigable tag clouds that can be surfed from term to term until you find the data you are looking for. For example, I ran a search on myself, found some race results for a fell race I ran, followed a link to a local running club, and from there reached details on the club members and finally their blogs and websites.
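The idea of surfing from term to term can be sketched in a few lines of Python. This is only my guess at the general principle, not Quintura's actual method: it assumes each page has a set of tags (the pages below are made up to mirror the example above) and links two tags whenever they appear on the same page.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical tagged pages, made up to mirror the trail described above.
PAGES = [
    {"my name", "fell race", "race results"},
    {"fell race", "running club"},
    {"running club", "club members"},
    {"club members", "blogs"},
]


def build_tag_graph(pages):
    """Count how often each pair of tags appears together on a page."""
    graph = defaultdict(Counter)
    for tags in pages:
        for a, b in combinations(sorted(tags), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related_terms(graph, term, limit=5):
    """Return the terms most often seen alongside `term`: the next 'cloud'."""
    return [t for t, _count in graph.get(term, Counter()).most_common(limit)]


if __name__ == "__main__":
    graph = build_tag_graph(PAGES)
    # Follow the trail one hop at a time, as you would by clicking on a tag.
    for term in ["my name", "fell race", "running club", "club members"]:
        print(term, "->", related_terms(graph, term))
```

Each click on a term shows its neighbours, which is how a search on a name can lead, a few hops later, to a club member's blog.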


The final area of development is semantic searching. The current holy grail of searching is the ability to get real answers to real questions. For example, you could ask "when was Elvis born" and you would probably get a fairly accurate answer. Where this seems to fall down is when the question is more subjective or where there are lots of matches. Asking "when was I born" would be much more complex, as would "when was John Smith born", as there are many, many possible correct answers. Askwiki and Wikia Search both have betas that work to some degree, and True Knowledge also have a very interesting public beta. Another key problem with this type of search is that it takes a huge amount of time and manpower to enter the required data.
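A toy example in Python shows why the Elvis question is easy and the other two are hard. This is a sketch under heavy assumptions and not how any of the engines above work: it understands only one question pattern, and its "knowledge base" is a hand-typed dictionary in which the John Smith entries are deliberate placeholders rather than real data.

```python
import re

# Hand-entered knowledge base (an assumption for this sketch). The John Smith
# entries are placeholders, not real people or dates.
KNOWLEDGE = {
    "elvis": [("Elvis Presley", "8 January 1935")],
    "john smith": [
        ("John Smith (placeholder person A)", "placeholder date A"),
        ("John Smith (placeholder person B)", "placeholder date B"),
    ],
}

# The only question form this sketch understands.
QUESTION = re.compile(r"^when was (?P<name>.+?) born\??$", re.IGNORECASE)


def answer(question):
    """Answer a 'when was <name> born' question from the knowledge base."""
    match = QUESTION.match(question.strip())
    if not match:
        return "Sorry, I only understand 'when was <name> born' questions."
    name = match.group("name").lower()
    candidates = KNOWLEDGE.get(name, [])
    if not candidates:
        # "when was I born" ends up here: nobody has entered that fact.
        return f"No birth date is recorded for '{name}'."
    if len(candidates) == 1:
        full_name, born = candidates[0]
        return f"{full_name} was born on {born}."
    # Several people share the name, so every one is a possible answer.
    options = "; ".join(f"{n}: {born}" for n, born in candidates)
    return f"'{name}' is ambiguous. Possible answers: {options}"


if __name__ == "__main__":
    for q in ["when was Elvis born", "when was I born", "When was John Smith born?"]:
        print(q, "->", answer(q))
```

The single-match case returns a clean answer, the ambiguous name returns every candidate, and the personal question fails because the fact was never entered, which is the data-entry problem described above.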

In many ways search engines are far superior to humans: they can reference and cross-reference billions and billions of bits of data almost instantly. But in other ways humans still have the advantage. We start collecting data from infancy, and we are far better at understanding more subtle references. For example, a human would recognise a photographic reference, or a really abstract reference to a movie or popular song lyric, even if it was slightly different to the original (for example, if you whistled a movie theme). A computer would find this far more difficult.

3 comments:

Kyrene said...

Great work.

Anonymous said...

Hi there,

I have a question for the webmaster/admin here at davidcoxon.blogspot.com.

May I use some of the information from this post right above if I provide a backlink back to this site?

Thanks,
Peter

David Coxon said...

Peter, that would be fine; you are more than welcome to use any information from this site if you wish. I have to say that it is quite old now, though, and search technology is constantly changing, so it may not be as useful as it once was.