Wednesday, February 07, 2007

Web Search - Everything old is new again

Given the flurry of new energy and funding going into companies attempting to improve web search, I thought I should spend some time trying to make sense of it all. From what I’ve seen, most companies follow one of two approaches to improving search: 1) human-edited solutions and 2) next-generation algorithms. (Caveat: I’m focusing on companies improving the quality of results for web search rather than the spate of companies improving UI or going after vertical search.)

THE FIRST GROUP includes companies such as Jimmy Wales’ Wikia, TextDigger, and Eurekster. They are often referred to as Social Search companies. The general idea is to allow users to determine which results are most relevant for a search query - sometimes through explicit voting, sometimes through more implicit behaviors such as tracking user clicks. If this sounds familiar, it should. Yahoo started out with an army of editors. So did Ask. Ever heard of DMOZ? It's still alive. The difference today is that these editors have been open-sourced: they are decentralized volunteers. Seeing what Wikipedia has accomplished, it is obvious that this strategy can lead to very powerful results. I have high hopes for this approach. Humans will always be better than machines at judgment when it comes to natural language processing. Language, after all, is a human construct best interpreted by the humans who created the code in the first place. The brick wall that the first round of human-edited search companies hit was cost. On today's web, if a company can effectively address the issue of spam, there is a real shot at building a great search engine that adapts as quickly as our discourse does, without bearing the burden of armies of editors.
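
To make the voting-and-clicks idea concrete, here is a minimal sketch in Python. It is not how Wikia, TextDigger, or Eurekster actually work; the class name, the weights, and the assumption that some existing engine supplies a baseline score per URL are all hypothetical. It also does nothing about spam, which is exactly the hard part mentioned above.

```python
from collections import defaultdict


class SocialRanker:
    """Toy re-ranker: blends a baseline score with user votes and clicks.

    Hypothetical illustration only; names and weights are assumptions.
    """

    def __init__(self, vote_weight=1.0, click_weight=0.2):
        self.vote_weight = vote_weight    # how much one explicit vote counts
        self.click_weight = click_weight  # how much one implicit click counts
        self.votes = defaultdict(int)     # (query, url) -> net explicit votes
        self.clicks = defaultdict(int)    # (query, url) -> click count

    def vote(self, query, url, delta=1):
        """Record an explicit up-vote (delta=1) or down-vote (delta=-1)."""
        self.votes[(query, url)] += delta

    def click(self, query, url):
        """Record an implicit signal: a user clicked this result."""
        self.clicks[(query, url)] += 1

    def rerank(self, query, baseline):
        """Return URLs sorted by baseline score plus social signals.

        baseline: dict mapping url -> score from some existing engine.
        """
        def score(url):
            key = (query, url)
            return (baseline[url]
                    + self.vote_weight * self.votes.get(key, 0)
                    + self.click_weight * self.clicks.get(key, 0))

        return sorted(baseline, key=score, reverse=True)


# Usage: one vote for the lowest-ranked page moves it to the top.
ranker = SocialRanker()
baseline = {"a.example": 1.0, "b.example": 0.9, "c.example": 0.5}
ranker.vote("web search", "c.example")
print(ranker.rerank("web search", baseline))
```

Signals are stored per query, so a vote on one query does not leak into the ranking for another. A real system would also need to discount or filter votes from bad actors.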

THE SECOND GROUP includes new companies such as Powerset and Hakia, and even Microsoft implies it has natural language processing in its Ms Dewey search engine, even if somewhat tongue in cheek. These companies all use some form of language processing to improve search, whether through advanced parsing of the search query, clustering of results, or relevance ranking of results. Sound familiar? AltaVista had, and still has, a great categorization tool that didn't help it avoid the spiral into irrelevance. Ask (Jeeves) touted the ability to input English language queries. Heard of MeaningMaster/Cognition? They're still around, but have yet to hit their stride despite years of effort. Danny Sullivan has a great history of natural language search processing here. Of course, the main challenge with natural language processing is the algorithms. Processing is getting cheaper and AI is getting better every year. Now may be the time for a quantum leap in algorithmic processing, but I haven't seen any evidence yet. Remember, a great search engine should use a large number of matching technologies. Imagine what Google search would be like if it exclusively used PageRank and ignored word counts, taxonomies, and metadata. The point is that natural language processing (NLP) is just one technique, and any search company that relies on NLP alone will really struggle to deliver the most relevant results. I have two other issues with NLP. First, the improvements to be had pertain to only a subset of searches. There are without doubt examples where NLP has the advantage (see Bambi Francisco's starry-eyed review of Powerset here), but I did a few tests of my own, and even a search like "What is Powerset doing?", which contains plenty of language ambiguity, yielded virtually the same results on Google as it did on Hakia. (The social search approach, btw, is valuable to a much wider set of search queries in my opinion.) Second, it takes me longer to enter a natural language search query than a keyword query, e.g. "Who are the presidents of the united states" versus "us presidents." Who has time for that?
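
The "many matching technologies" point can be illustrated with a small sketch. The signal names (`link`, `terms`, `nlp`), the scores, and the weights below are made up for illustration and do not describe any real engine; the only claim is that a ranking built on one signal can differ sharply from a blended one.

```python
def combined_score(signals, weights):
    """Weighted sum of per-document relevance signals (each assumed in [0, 1])."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


def rank(docs, weights):
    """Return document ids sorted from most to least relevant."""
    return sorted(docs, key=lambda d: combined_score(docs[d], weights),
                  reverse=True)


# Hypothetical documents: page_a is well linked and matches the keywords,
# page_b only looks good to the language-processing signal.
docs = {
    "page_a": {"link": 0.9, "terms": 0.8, "nlp": 0.3},
    "page_b": {"link": 0.1, "terms": 0.2, "nlp": 0.9},
}

nlp_only = {"nlp": 1.0}                             # relies on NLP alone
blended = {"link": 0.4, "terms": 0.4, "nlp": 0.2}   # mixes several signals

print(rank(docs, nlp_only))
print(rank(docs, blended))
```

With the NLP-only weights, `page_b` comes out on top; with the blended weights, `page_a` does. Which ordering is "right" depends on the query, which is why leaning on a single signal is risky.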

So which approach is better? It's likely that both natural language processing and social search will be integrated into every search engine within 5 years. Until then, users will gravitate towards solutions that provide obviously better search results within the first few queries. Given social search's applicability to a larger set of queries and the proven existence of large, highly active volunteer communities, I would put my money on social search to drive the next successful upstart search engine.

Footnote: According to Jim Armstrong, who really should have started a blog by now, there are bigger search problems to tackle: search UI, information discovery, and integrating data from multiple sources (desktop, enterprise server, and web) into a single search interface. I’ll leave it to Jim to create a blog entry to address this issue.


Calandra said...

Good words.