CTO Blog

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

Cuil bursts onto the semantic search scene

By guest blogger, John Furneaux

John Furneaux is a colleague at Capgemini with whom I’ve recently started to converse on socio-technical thinking. John kindly offered to write a guest post on an interesting, and more to the point potentially useful, evolution in search. Hope you enjoy.

PS: these posts from 2007 might also provide interesting additional context: “ERP where E is the World” and “Sandy”.

Cuil bursts onto the semantic search scene - John Furneaux

Ten years ago an unnamed portal owner told two guys from an internet start-up called BackRub (who were trying to sell a superior search technology) that “as long as we’re 80 percent as good as our competitors, that’s good enough. Our users don’t really care about search”. Ten years later, I think the Google duo have made their point.

So can anyone challenge Google with superior search results? There has certainly been no shortage of attempts. But with its prodigious computing resources, unmatched brand equity and dominance across multiple search verticals (maps, images, news etc.), Google will surely retain the same dominion over web search for the foreseeable future that Microsoft has for so long held over the desktop.

Enter semantic search: a search engine guided by crawlers which identify the meaning of the pages they retrieve, not solely their data content and keywords. ‘Meaning’ in this context is not about vast leaps in artificial intelligence, but the ability to glean structured facts from unstructured data, such as the location of an earthquake from a news article.

The latest venture into this space is Cuil (pronounced “cool”), the launch-challenged, slightly bizarrely branded and even ridiculed brainchild of ex-Googlers, which sought and received unprecedented publicity in its first 48 hours live. So what does it offer beyond its rather macho claim of a larger index than any competitor?
Using an example close to home, compare the results page for the query ‘Capgemini’ from Cuil with that from Google. Three key differentiators are apparent: a matrix result layout, suggested associated search categories (e.g. ‘Companies of France’) and an image for each result (with varying degrees of relevance).

Does this represent a stronger proposition for users? While the layout and aesthetics may prove popular according to taste, the other features are perhaps misguided. If I’d wanted to know about ‘Companies of France’ (one of the associated categories) I could have typed in ‘french companies’. If I’d wanted a picture related to Capgemini, I’d have clicked ‘Images’ for my search in Google, a split-second effort.

So why is Cuil so exciting? Because its attempt to understand my query demonstrates again that the semantic web is far from stuck in the dreams of PhD researchers, and is already shaping real-life working services. Following in the footsteps of Hakia, it reminds us of the approaching new paradigm, a shift as starkly different from the status quo as was the move from Yahoo’s directory to Google’s search.

The first impact of this shift is on individual users who need information. The key here is the question which precedes each internet search. Nobody wants to know “apple pie recipe”. That’s the query they construct to answer the question “What’s the recipe for apple pie?”. Seen this way, searching is currently a four-step process: take a question (“What is the tallest building in London?”), convert it to a query (“tallest building london”), choose a result (the Wikipedia article on ‘Tall Buildings in London’) and scan it for your answer (One Canada Square). A fully functional semantic engine, in comparison, will take its criteria from the question itself and use its structures of meaning to provide more relevant information. This is the approach of Powerset, the search start-up recently acquired by Microsoft.
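To make the second of those four steps concrete, here is a minimal Python sketch of the question-to-query conversion that users currently perform in their heads. The stop-word list and the function name `question_to_query` are assumptions made up for illustration; nothing here reflects how Cuil, Google or Powerset actually process queries.

```python
import re

# Hypothetical stop-word list, hand-picked for this illustration only.
STOP_WORDS = {"what", "is", "the", "in", "for", "a", "an", "of", "s"}


def question_to_query(question: str) -> str:
    """Reduce a natural-language question to a bare keyword query."""
    # Lower-case the text and keep only runs of letters and digits,
    # which also splits "What's" into "what" and "s".
    words = re.findall(r"[a-z0-9]+", question.lower())
    # Drop the filler words and keep the rest in their original order.
    return " ".join(w for w in words if w not in STOP_WORDS)


print(question_to_query("What is the tallest building in London?"))
```

For the London example this yields the query quoted above, “tallest building london”. The conversion discards the shape of the question itself (that a single building is wanted, and that it should be the tallest), which is arguably the information a semantic engine would try to keep.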
Despite some promising progress, Powerset is perhaps not there yet, as it currently operates only over the Wikipedia domain, which is metadata-rich in the first place. In this new world, where ‘search’ engines actually answer questions and are therefore more like ‘retrieval’ engines, the Google approach of offering vast numbers of links to places which might hold your answer could start to look decidedly retro.

The second, and perhaps more dramatic, shift will be for the enterprise. Once these metadata-rich services are constructed, they will be accessible through any number of widgets. Take an example task such as data cleansing: to spot erroneous surnames in data fields, a widget could identify those entries which return abnormally low matching results when compared with a surname list built semantically from sources across the web. Or it might aggregate data items identified as song titles from social networks, blogs and websites to reveal not just keyword trends (like Google Trends) but real insight into global trends in music by album, compilation, artist or genre. Or it might support product development by trawling blogs, forums and comparison websites to identify the specific functionalities which users appreciate most, by gadget type. In essence, disparate information on the Internet will be accessible as if it were held internally.

The bottom line for Cuil? It’s no Google-killer (although who really knows…), but the growing raft of search engines built semantically from the ground up might start to make the search marketplace just that little more varied, and perhaps even a little more Cuil.

About the author

C. Bate
