CTO Blog


Cuil bursts onto the semantic search scene

By guest blogger, John Furneaux

John Furneaux is a colleague at Capgemini, with whom I’ve recently started to converse on socio-technical thinking. John kindly offered to write a guest post on an interesting – and, more to the point, potentially useful – evolution in search. Hope you enjoy.

ps – these posts from 2007 might also provide interesting additional context… ERP where E is the World and Sandy.

Ten years ago an unnamed portal owner told two guys from an internet start-up called BackRub, who were trying to sell a superior search technology, that “as long as we're 80 percent as good as our competitors, that's good enough. Our users don't really care about search”. Ten years later, I think the Google duo have made their point.

So can anyone challenge Google with superior search results? There has certainly been no shortage of attempts. But with its prodigious computing resources, unmatched brand equity and dominance across multiple search verticals (maps, images, news etc.), Google will surely retain the same dominion over web search for the foreseeable future that Microsoft has for so long held over the desktop.

Enter semantic search: a search engine guided by crawlers which identify the meaning of the pages they retrieve, not solely their data content and keywords. ‘Meaning’ in this context is not about vast leaps in artificial intelligence, but the ability to glean structured facts from unstructured data, like the location of an earthquake strike from a news article.

The latest venture into this space is Cuil (pronounced “cool”), the launch-challenged, slightly bizarrely branded and even ridiculed brainchild of ex-Googlers, which sought and received unprecedented publicity in its first 48 hours live. So what does it offer beyond its rather macho claim of a larger index than any competitor?


Using an example close to home, compare the results page for the query ‘Capgemini’ from Cuil with that from Google. Three key differentiators are apparent: a matrix result layout, suggested associated search categories (e.g. ‘Companies of France’) and an image for each result (with varying degrees of relevance).

Does this represent a stronger proposition for users?

While the layout and aesthetics may prove popular according to taste, the other features are perhaps misguided. If I’d wanted to know about ‘Companies of France’ (one of the associated categories) I could have typed in ‘french companies’. If I’d wanted a picture related to Capgemini, I’d have clicked ‘Images’ for my search in Google, a split-second effort. So why is Cuil so exciting? Because its attempt to understand my query demonstrates again that the semantic web is far from stuck in the dreams of PhD researchers, and is already shaping real-life working services. Following in the footsteps of Hakia, it reminds us of the approaching new paradigm, a shift as starkly different from the status quo as was the move to Google’s search from Yahoo’s directory.

The first impact of this shift is for individual users requiring information. The key here is the question which precedes each internet search. Nobody wants to know “apple pie recipe”. That’s the query they construct to answer the question “What’s the recipe for apple pie?”. In this way, searching is currently a four-step process: take a question (“What is the tallest building in London?”), convert it to a query (“tallest building london”), choose a result (Wikipedia article on ‘Tall Buildings in London’) and scan for your answer (One Canada Square). A fully functional semantic engine, in comparison, will take its criteria from the question itself and use its structures of meaning to provide more relevant information.
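To make the second of those four steps concrete, here is a rough Python sketch of the question-to-query conversion that users currently do in their heads. The stop-word list and the function name `question_to_query` are my own assumptions for illustration; no real search engine is claimed to work this way.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"what", "is", "the", "in", "a", "an", "of", "for", "s"}

def question_to_query(question):
    """Reduce a natural-language question to a bare keyword query."""
    # Lower-case the question and split it into alphanumeric tokens.
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    # Drop the filler words and keep the remaining keywords in order.
    return " ".join(t for t in tokens if t not in STOP_WORDS)

print(question_to_query("What is the tallest building in London?"))
# prints: tallest building london
```

Note how much of the question is thrown away in the process; a semantic engine would aim to use it instead.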

This is the approach of Powerset, the search start-up recently acquired by Microsoft. Despite some promising progress, Powerset is perhaps not there yet, as it currently operates only over the Wikipedia domain, which is metadata rich in the first place. In this new world where ‘search’ engines actually answer questions and are therefore actually more like ‘retrieval’ engines, the Google approach of offering vast numbers of links to places which might hold your answer could start to look decidedly retro.

The second – perhaps more dramatic – shift will be for the enterprise. Once these metadata-rich services are constructed, they will be accessible through any number of widgets. Take an example task such as data cleansing: to spot erroneous surnames in data fields, a widget could identify those entries which return abnormally low matching results when compared with the surname list built semantically from sources across the web. Or perhaps it might use the aggregation of data items identified as song titles from social networks as well as blogs and websites to identify not just keyword trends (like Google Trends) but real insight into global trends in music by album, compilation, artist or genre. Or it might support product development by trawling blogs, forums and comparison websites, identifying the specific functionalities which users appreciate most by gadget type. In essence, disparate information on the Internet will actually be accessible as if internally held.
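The data cleansing example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `reference_counts` stands in for the surname list a semantic service might build from the web, and the counts, the `min_count` threshold and the function name are all invented for the example.

```python
from collections import Counter

def flag_suspect_surnames(entries, reference_counts, min_count=5):
    """Return the entries that match abnormally few reference surnames."""
    suspects = []
    for name in entries:
        # Normalise whitespace and case before looking the surname up.
        key = name.strip().lower()
        if reference_counts.get(key, 0) < min_count:
            suspects.append(name)
    return suspects

# Hypothetical reference data: surname -> number of sightings across the web.
reference_counts = Counter({"smith": 12000, "furneaux": 40, "patel": 9000})
records = ["Smith", "Furneaux", "Smtih", "Patel "]

print(flag_suspect_surnames(records, reference_counts))
# prints: ['Smtih']
```

A rare but genuine surname survives the check as long as the reference list has seen it often enough, which is the advantage of a list built from the web over a fixed dictionary.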

The bottom line for Cuil? It’s no Google-killer (although who really knows…), but the growing raft of search engine competitors built semantically from the ground up might start to make the search marketplace just that little more varied, and perhaps even a little more Cuil.

Comments

John,

Interesting post – so are you arguing that semantic search is dependent on successful natural language processing? Surely that’s a long way off?

Leon

An article on Slate.com discusses the launch--and the subsequent debacle--of Cuil. I featured this article on Light Reading in Talent as well. Here are some excerpts:

"Last Monday morning, the search engine Cuil launched with great fanfare. By Monday afternoon, it had completely tanked. Users who test-drove the would-be Google rival were quick to complain about mismatched articles and thumbnail photos; the poor breadth of results; obvious queries that turned up blank; and even, in a moment of true existential crisis, the site's inability to locate itself."

http://www.slate.com/id/2196492/

Great challenge Leon – thanks.

For me, full natural language processing (in a ‘Turing test’ sense) is not required for the next generation of semantic search engines. I think that simply the ability to associate data points in web content (a trivial example being a geographical location, number with units of centigrade and a date together on a page suggesting a weather report) can lead to sufficient meaning being drawn out of unstructured data to fundamentally alter what search engines offer us.
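As a rough Python sketch of that weather-report example: the check below fires only when a place name, a temperature and a date all appear in the same text. The tiny `PLACES` gazetteer, the two regular expressions and the function name are assumptions made for illustration, not a description of how any real engine parses pages.

```python
import re

# Hypothetical gazetteer; a real service would use a far larger place list.
PLACES = {"london", "paris", "tokyo"}

# A number followed by a centigrade unit, e.g. "24 degrees centigrade" or "24°C".
TEMPERATURE = re.compile(
    r"-?\d+(?:\.\d+)?\s*(?:°\s*C|degrees\s+(?:centigrade|celsius))",
    re.IGNORECASE,
)

# A simple day-month-year date, e.g. "4 August 2008".
DATE = re.compile(
    r"\b\d{1,2}\s+(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{4}\b",
    re.IGNORECASE,
)

def looks_like_weather_report(text):
    """True when a place, a temperature and a date co-occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    has_place = bool(words & PLACES)
    return has_place and bool(TEMPERATURE.search(text)) and bool(DATE.search(text))

print(looks_like_weather_report(
    "London, 4 August 2008: highs of 24 degrees centigrade expected."
))  # prints: True
print(looks_like_weather_report("London has 24 bus routes."))  # prints: False
```

No natural language understanding is involved here, only the association of three kinds of data point.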

My modular view of search separates the front end ‘convert a user question to a query' (which clearly does require natural language processing) from the back end ‘parse web content semantically’ (which I argue does not). In this way, we can be getting good semantic results long before the search engine would deal reliably with a natural language question.

@Jerry: Yes absolutely, Cuil had an extremely troubled start. To be fair, they did manage to garner an extraordinary 50 million queries on their first day, which according to Cuil is 'in the same ballpark as Microsoft's Live Search'.

The article I linked to above, which discussed the launch, noted the interesting point that Cuil's non-parallel approach to search meant that high volumes actually also degraded the quality of results.

Among the current spate of semantic search announcements (bubble?), one potentially interesting set of features might come from the recently announced Freebase Parallax.

http://mqlx.com/~david/parallax/

If their teaser is to be believed....

My experience with Powerset, Hakia and Cuil has been disappointing vis-a-vis Google. Cuil stands out only in presentation, but fails miserably in terms of finding the right search results. Some searches have drawn a complete blank.

In my opinion, unless an end-user is clearly known/identified by the semantic search engine, learning about a particular individual's interests and search behaviour will be impossible. The next best thing is to learn from the "wisdom of the crowd", and this is perhaps a much longer learning journey (it depends on the size of the total customer base and the number of similar search behaviours).

Hence, I see a mismatch between higher expectations and current capabilities as the immediate challenge/disadvantage for the first generation semantic search engines that might keep the end-users returning to Google for the time being.

Renjish
