In recent posts I have touched on the challenges of too much content and too little context, together with the further challenge of the sheer number and types of devices that will create and use data (http://www.capgemini.com/ctoblog/2008/03/communications_convergence_or.php). I was even pretty provocative about whether master data has become an IT department distraction that is slowing down the business requirement for ‘better’ and ‘faster’ decision making using content in a more personal manner (http://www.capgemini.com/ctoblog/2008/03/master_data_has_it_become_a_ba.php). So I think it’s about time I got a little more positive about at least one potential solution.

I can confess to being a cynic about some aspects of the Semantic Web, together with some others, and for a pretty good summary of where we were a year ago take a look here. I am, however, a fan of the Resource Description Framework, RDF, as a method of making something meaningful of the Semantic Web, though that may be a personal bias based more on what I think I understand of the approaches being taken than on fact. Anyway, back to why I think we are making some real progress with RDF. If you want a real in-depth explanation of RDF from the W3C then be my guest, but it is complex and not necessary to my comments here. A better update on the topic, together with some good case study examples, was published in Scientific American in December 2007.
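To make the idea a little more concrete, here is a minimal sketch of what RDF boils down to: simple three-part statements of subject, predicate and object that can be merged from any number of sources. I have used the Python rdflib library, and the namespace and property names are purely my own illustrative inventions, not a real vocabulary.

    # A tiny RDF graph built with rdflib; every fact is a subject-predicate-object triple.
    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/people/")   # illustrative namespace, not a real schema
    g = Graph()

    me = EX["andy"]
    g.add((me, EX["worksFor"], Literal("Capgemini")))
    g.add((me, EX["homeAddress"], Literal("1 Example Street, London")))
    g.add((me, EX["employedSince"], Literal("1996-01-01")))

    # Because every statement is just a triple, graphs from different
    # sources can be merged and queried together.
    for subject, predicate, obj in g:
        print(subject, predicate, obj)

The point is not the syntax, but that every source, structured or not, ends up expressed in the same uniform statement form.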
Okay, enough of the technology primer, and back to the real world and the tipping point that we are at, or at least approaching, with the various aspects of Web 1.0 and Web 2.0. I can make a halfway believable claim that formal Web 1.0 content published by commercial bodies, from governments to enterprises, is at least semi-structured data with some degree of acceptable provenance. Whereas I will suggest that Web 2.0 represents a huge, and growing, amount of unstructured data with little to no provenance, but a tendency to provide ‘infill’ or background data around any structured data topic. Finally, in our own enterprise we have structured data with absolute provenance, just not enough of it, or not up to date enough for what we need. The question is: can we combine the three forms in such a way that our internal structured data represents the framework against which the data from the Web can be added and tested for accuracy?
As an example: Capgemini knows my home address and period of employment as structured facts, but on the Web it may find that I seem to have other home addresses. If these can be checked against other Web data showing that I was working somewhere else at the time those addresses were given, then each would be an ‘acceptable’ fact. If I am shown as living at another address, given as my home address, whilst I am working for Capgemini, that would be deemed an ‘unacceptable’ fact flagged for checking.
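For the curious, the acceptability test in that example is essentially a date check against the structured record. Here is a hedged sketch in Python of how it might look; the function, field names and dates are my own illustrative assumptions, not how any particular product actually implements it.

    # Sketch of the acceptability test: a Web-sourced address claim is
    # 'acceptable' if it dates from outside the known employment period,
    # and flagged for checking if it conflicts with the structured record.
    from datetime import date

    def classify_address_claim(claim_date, claim_address,
                               employment_start, employment_end,
                               known_home_address):
        during_employment = employment_start <= claim_date <= employment_end
        if during_employment and claim_address != known_home_address:
            return "unacceptable - flag for checking"
        return "acceptable"

    # A claim dated before the employment period passes the test.
    print(classify_address_claim(date(1994, 6, 1), "2 Old Road, Leeds",
                                 date(1996, 1, 1), date(2008, 12, 31),
                                 "1 Example Street, London"))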
You see how it works? Start with the structured data and add the unstructured to it in a manner where statements are continually tested for acceptability. Welcome to what RDF can do. Ah ha, goes the cry, but will it scale? The surprising answer increasingly seems to be yes, and you can even find a listing of what has been achieved by different pioneers and vendors. Even more impressive is that some of this has translated into commercially available products, or services. Too good to be true? Well, perhaps for a year or two in the mainstream market, but it’s getting pretty close now, and for UK citizens it’s even possible to test out the claims.
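To show why that ‘start with the structure, add the unstructured’ idea maps so naturally onto RDF, here is one more sketch: an enterprise graph and a Web-derived graph merged by simple set union, then queried for conflicting claims with SPARQL via rdflib. The vocabulary and query are again my own assumptions, not how Garlik or any other vendor necessarily does it.

    from rdflib import Graph, Namespace, Literal

    EX = Namespace("http://example.org/people/")   # illustrative namespace

    enterprise = Graph()   # the structured data we hold with absolute provenance
    enterprise.add((EX["andy"], EX["homeAddress"], Literal("1 Example Street, London")))

    web = Graph()          # unstructured Web claims recast as triples
    web.add((EX["andy"], EX["homeAddress"], Literal("2 Old Road, Leeds")))

    merged = enterprise + web   # RDF graphs merge by simple union of triples

    # Flag any person who ends up with more than one home address claim.
    query = """
    SELECT ?person ?addr1 ?addr2 WHERE {
        ?person <http://example.org/people/homeAddress> ?addr1 .
        ?person <http://example.org/people/homeAddress> ?addr2 .
        FILTER (STR(?addr1) < STR(?addr2))
    }
    """
    for person, addr1, addr2 in merged.query(query):
        print(person, "has conflicting address claims:", addr1, "vs", addr2)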
Garlik, a UK start-up that attracted £9 million in funding, has chosen to use RDF to power a people information service, and its web site allows you to try out the RDF product in a simple way. It offers a free test of the accuracy of the information that makes up your digital identity, using the approach I laid out in the example above; simple and compelling, but also, given there are sixty million people in the UK, a pretty good example of scale.
So there is the ‘hands-on’ reason that I think RDF is going to start to deliver a way forward for us: merging unstructured data with our structured data in a controlled manner, with verification of accuracy, and as a result extending the amount of usable data. Go try it; after all, in this industry, using something is definitely the path to believing in the technology!