It seems as though big data is the latest candidate for the hype cycle. Even though I responded to a number of articles and white papers a month or so back, it is on the agenda again. The triggers this time are a press interview in which I was asked to give my views, and the publication of a very good white paper by a colleague, Steve Jones. By very good I mean a very practical account of what to consider and what to focus on getting right.
But then you are not going to adopt big data for a while yet, are you? So no need to read this just yet! Wrong! The point about big data is that it’s an environmental shift taking place all around us, driven by changes in what people do online and the devices they use to do it. My point in the previous post ‘Big data or is the accumulation of small data the real issue?’ was that big data is made up of the huge amount of small data that we handle and accumulate as we increasingly use a rich variety of sources other than enterprise applications.
There are two lines of thought around at the moment, and it is necessary to separate them in order to clarify my title. The first simply sees technology change, in the form of more powerful systems and tools, as allowing a level of analysis of business information that was not previously possible. That’s certainly true, but this approach remains focused on structured data and relational databases; in other words, the trusted data that we have today, internally captured and processed and, most important of all, categorized so that we can make use of it.
Okay, so most of the comments on this topic do include ‘unstructured’ data in their headings, but in reality the overall approach is still defined by the need for some form of categorization before any analysis can make sense. My colleague Steve Jones has just written an excellent paper on this topic called 'Master Data Management Mastering the Information Ocean', which offers some really practical approaches to building a workable solution. But this is still big data = big analytics, empowered by cloud or grid technology that makes the necessary computational power available at a sensible price.
It’s the other side of big data that is the game changer: the sheer volume of interactions that users have with all kinds of data in an online world, and the data they accumulate as a result. This is the real unstructured material that is ‘happening’ in every enterprise today, creating not just a headache over the provision of storage, but a nightmare of untrusted data in circulation. Or could it just be the new face of Business Intelligence? It’s the ability to react to external market information in real time and optimize a response, instead of the current after-the-event big data analysis to see how well it all turned out.
As with many of these new approaches, an example makes it easier to understand. A salesperson sees, from a news item and a link to a competitor’s website, that the competitor will be holding a promotion in the local area for the next month. After collaborating with their manager, the salesperson changes the pricing in a key account where the competitor is active to ensure that they do not lose business. That’s what real-time decision support tied to social collaboration tools is all about: the decentralization of business models in order to optimize for local conditions.
However, if this activity were captured and used badly in a ‘big data analytics model’, it would be an example of untrusted data contaminating the outcome. Of course it would be possible to ‘recognize’ this data and categorize it into a model to see what would happen if this competitor behaved the same way at a country, or even global, level. However, by the time this sort of modeling had been carried out, in all probability the competitor, if it wanted to act on that scale, would already have done so! Again, BI after the event!
What this defines is the need for a new category of data, which I call ‘trusted in context’; i.e. in the context of the customer account in question, at that time, with that competitor, the data could be trusted enough to act upon. It’s happening right now in most enterprises, sometimes coupled with tools like Salesforce.com Chatter, which may or may not be an enterprise-provisioned capability. That’s the game change which big data brings: the data available to support local operational decision making within a limited context is as big as all the data reachable by a search engine on the Web.
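As a rough illustration only, the idea of ‘trusted in context’ could be pictured as a record that carries its own context and is treated as trustworthy only inside it. The sketch below is an assumption made for this example, not a description of any product or of a method from the post: the class name, field names, account, competitor, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContextualObservation:
    """An external, unverified observation plus the context it was seen in.

    Hypothetical structure: every field name here is an illustrative assumption.
    """
    claim: str          # what was observed
    source: str         # where it came from (not an enterprise system)
    account: str        # the customer account it applies to
    competitor: str     # the competitor it concerns
    valid_from: date    # start of the period in which it may be acted upon
    valid_until: date   # end of that period

    def trusted_for(self, account: str, competitor: str, on: date) -> bool:
        """Return True only for the same account, same competitor, and a date in the window."""
        return (
            account == self.account
            and competitor == self.competitor
            and self.valid_from <= on <= self.valid_until
        )


# Hypothetical example mirroring the salesperson scenario.
obs = ContextualObservation(
    claim="Competitor is running a local promotion for a month",
    source="news item linking to the competitor's website",
    account="Key Account A",
    competitor="Competitor X",
    valid_from=date(2012, 3, 1),
    valid_until=date(2012, 3, 31),
)

# Usable for a local pricing decision on that account during the promotion...
in_context = obs.trusted_for("Key Account A", "Competitor X", date(2012, 3, 15))
# ...but not for a different account, and not once the window has passed.
other_account = obs.trusted_for("Key Account B", "Competitor X", date(2012, 3, 15))
after_window = obs.trusted_for("Key Account A", "Competitor X", date(2012, 6, 1))
```

The design choice worth noting is that the record never becomes trusted in general: outside its account, competitor, and time window it is simply untrusted, which is the failure mode a centralized analytics model would otherwise miss.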
The question is, do you realize that this is happening, and what should you be doing to provide guidance and policies on how such data is used, stored and, most of all, controlled in terms of its acceptance as enterprise trusted data? Here is a link to a good site where a discussion under the title of ‘Unstructured Data: The Elephant in the Big Data Room’ is running.