It was Bill Clinton’s 1992 US presidential campaign that gave us ‘it’s the economy, stupid’, coined by strategist James Carville to keep the campaign focused on the one issue, out of a long list, that would matter most to voters. If you substitute a user for the voter, then the answer to the same question, what is most important to them, might well be, ‘it’s the data, stupid!’ Yet bizarrely, data is the topic about which we still have the least clarity, particularly with respect to what data we will be dealing with, and how, in the more complex environments beyond IT applications and structured databases.
We have clarity, or at least emerging clarity, about computational power, network connectivity, and even integration, thanks to the cloud, the internet and the web respectively, but exactly what do we have on data? There are lots of promises around how semantics will change the game, and there is no doubt that semantic events are getting more interesting in establishing how to manage data created for, and consumed by, human eyeballs rather than applications and computers. But is this enough to address the changes we are experiencing around the explosion in use of the web, and the road into ‘everything and everyone connected in the cloud’ based on everything as a ‘service’?
Increasingly we are seeing a new phenomenon in which huge amounts of valuable data held by government, quasi-government and other bodies are being made available for download and use. A prime example is the recent move by the World Bank not only to make a large amount of its data available, but even to place it in a catalogue to help people make use of it. There are also examples of new collaboration between people and organisations that share data in ways that create wholly new data combinations (MashUps if you like, though increasingly we seem to hear the term ‘portal’ being applied to this). A personal favourite of mine is the housing affordability map, well worth a look if you haven’t seen it, and Citizen-Dan is worth a look as well. All of these are new information-rich seams that users will quickly find, and once they do, enterprises will start to tap into them too. This increases the ‘data problem’.
So what do I define the ‘data problem’ to be? The first part of the problem is that I don’t think we are paying enough attention to this topic. This is probably because our tacit assumption is that data remains firmly associated with applications, as is the case with our current models. These models were developed to address the big issue of the early nineties, when the spread of personal computing, or PCs, led to a loss of control of data or, more importantly, a loss of consistency in data: the ‘one version of the truth’ requirement. But the whole point of ‘services’ as opposed to ‘applications’ is to break monolithic models down into flexible, reusable elements that can be re-orchestrated, and that destroys the tied nature of data. To this we also have to add the questions of where we will be deriving data from (think of the examples above), what format it is in, when we should take ownership of it, and so on.
So my view is that we are facing four recognisably different data environments that will all co-exist in our enterprise.
- Data from the web, with the characteristics that it is designed for human consumption, is unstructured in today’s terminology, and most importantly is of unknown quality, i.e. we don’t know whether it is true or false.
- Data formed into a service object to be used in an orchestration with other service elements. This is really tricky and in my mind the big issue that has to be addressed. One of the few places I have seen this explored recently is at Searchdatamanagement.com.
- Traditional data coupled to applications and relational databases, which is not only structured but, most importantly, is known to be ‘true’ and is ‘owned’ by the enterprise in the eyes of the auditors.
- Lastly, archived data, where the challenge is that a metadata model is required to allow data to be rapidly recovered if needed for legal purposes etc.
It seems likely that a piece of data will pass sequentially through all four stages as it progresses from external capture, to internal orchestration, into a full process application transaction, before finally being archived. And on this path its name, its values, its status as true or unknown, and its relationships are all likely to change at each stage.
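To make that path a little more concrete, here is a minimal sketch in Python of one piece of data moving through the four environments. Everything in it is an assumption made for illustration: the `Stage` and `DataItem` names, the `advance` helper, the fields, and the house price values are all hypothetical, not taken from any real product or standard. It simply shows the name, value, true/unknown status and relationships changing from stage to stage.

```python
from dataclasses import dataclass, field, replace
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The four data environments, in the order a piece of data passes through them."""
    WEB = 1          # external, made for human eyes, quality unknown
    SERVICE = 2      # packaged as a service object for orchestration
    APPLICATION = 3  # structured, enterprise-owned, treated as 'true'
    ARCHIVE = 4      # retained with metadata for later recovery


@dataclass(frozen=True)
class DataItem:
    """Hypothetical record of one piece of data at one stage."""
    name: str
    value: object
    stage: Stage
    verified: Optional[bool] = None   # None means 'unknown', not 'false'
    related_to: tuple = ()
    metadata: dict = field(default_factory=dict)


def advance(item: DataItem, **changes) -> DataItem:
    """Return a copy of the item at the next stage, with any field changes applied."""
    if item.stage is Stage.ARCHIVE:
        raise ValueError("archived data has no further stage")
    return replace(item, stage=Stage(item.stage.value + 1), **changes)


# Illustrative values only: the same fact renamed, retyped and re-trusted along the way.
captured = DataItem(name="avg house price", value="about 250k", stage=Stage.WEB)
as_service = advance(captured, name="avgHousePrice", value=250000)
as_record = advance(as_service, name="AVG_HOUSE_PRICE", verified=True,
                    related_to=("REGION",))
archived = advance(as_record, metadata={"retention": "legal"})

for item in (captured, as_service, as_record, archived):
    print(item.stage.name, item.name, item.value, item.verified)
```

Even in a toy like this, the awkward governance questions show up straight away: who decides when `verified` flips from unknown to true, and who owns the renaming at each step?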
Now, have I done enough to highlight the challenge? I repeat, ‘it’s the data, stupid’, and recommend that you start asking some awkward questions and take a careful look at the topic with your internal data architects. Magic bullets? I don’t think there are any at present; if you know differently then do please post. Instead I think it’s a hand-cut governance issue at this stage of the game, but do at least have the awareness not to let the topic slip unnoticed into the enterprise, and then explode. Strangely enough, that’s what happened with PCs the last time we had a technology paradigm change!