Big data, or is the accumulation of small data the real issue?

No less an organization than McKinsey has decided to draw attention to ‘big data’ being the next big thing in a new report. As you might expect it’s well researched and well written, but it seems to start from the principle that big data is a big opportunity to use the ‘big resources’ of cloud-based computation to comb more data to find better answers. I am not going to argue that this isn’t true, or that there aren’t some sectors and organizations that really do find it a breakthrough, but it doesn’t fit very well with the usual issues I find myself discussing with CIOs.

The two key topics are the cost and manner of storage, and the challenge of governance around the increasing number of sources their colleagues are using from the Web (content) and Web 2.0 (people). If you break these down, it’s more about the issues that arise from small data: the sheer number of people, devices and sources all creating and consuming data. This creates some serious challenges around where the data is and how it is both de-duplicated and backed up, but in practice these are technology issues, and all of the storage vendors are keen to come in with their latest products.
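As a toy illustration of the de-duplication problem those vendors are addressing: the same small file tends to be saved and forwarded many times across devices, and the standard trick is to identify copies by a hash of their content rather than by name or location. This sketch assumes nothing about any particular product; the function name and file layout are illustrative only.

```python
import hashlib
from pathlib import Path


def dedupe_by_content(paths):
    """Group files by a SHA-256 hash of their bytes; keep one copy per hash.

    Returns (unique, duplicates): the first file seen for each distinct
    content, and every later file whose bytes match one already kept.
    """
    seen = {}          # digest -> first Path holding that content
    duplicates = []
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)   # identical bytes already stored once
        else:
            seen[digest] = path
    return list(seen.values()), duplicates
```

The point of hashing content rather than comparing names is exactly the ‘small data’ problem above: the same competitor price list forwarded by five colleagues is five files with five names but one payload.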

Try Googling, or Binging, ‘cost volume data storage’ and you will see what I mean. Now try ‘data governance’ and only one name consistently comes back: the Data Governance Institute, which has been around for some years publishing good work around its Data Governance Framework, a working checklist and approach to the topic. However, as I said at the beginning of this post, the real challenge is small data, or more particularly the constant acquisition of small pieces of data from external sources, saved onto hard discs and then passed onward to others in the enterprise.

This small data accumulation is not generated by our own systems and is generally not regulated, yet if you ask users it is a real breakthrough, because it provides them with the information they need. As such we should regard it as ‘untrusted’ and ensure it is isolated from our own corporate data, which is regarded as trusted. This is where it gets tough, as quite a bit of this data will ‘leak’ into the enterprise by various routes, and the rules then say the enterprise has taken ownership and is responsible for the accuracy of this data.
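One way to picture this trusted/untrusted separation is to tag every piece of incoming data with its provenance, and require an explicit, auditable step before an external item is treated as corporate data. The class and field names below are purely illustrative, not any product’s API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataItem:
    content: str
    source: str                      # e.g. "erp" (internal) or "competitor-website"
    trusted: bool = False            # externally sourced data starts out untrusted
    vouched_by: Optional[str] = None


def promote(item: DataItem, reviewer: str) -> DataItem:
    """The explicit step at which the enterprise takes ownership of the data."""
    item.trusted = True
    item.vouched_by = reviewer
    return item


corporate = DataItem("Q3 sales figures", source="erp", trusted=True)
external = DataItem("Competitor price list", source="competitor-website")
# `external` stays out of trusted stores until someone promotes it,
# which is the moment responsibility for its accuracy transfers inward.
```

The ‘leak’ described above is precisely what happens when this promotion step is skipped: the data arrives in trusted systems with no record of who vouched for it.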

It’s all to do with the increasing focus on the external use of technology in the front office and marketplace, as opposed to the traditional internal role of IT: transacting data created by the enterprise’s formal processes. We want and need to use the market’s sources of information to react tactically and successfully to events and opportunities, through decision support at a local level. Right now this is happening in many enterprises under the banner of ‘consumer-based IT’, and frankly it is, or should be, a worry to any CIO, but it’s not going to go away. In fact, it will continue to increase as one of the key changes in capability that consumer technology is bringing to enterprises’ ability to drive increased revenues.

My term for this phenomenon is ‘trusted in context’, where the context is the judgment or experience of the person using the information and/or the use they make of it. A salesperson who uses public information from a competitor’s Web site about a special offer to adjust the position they take in selling against that competitor in their account is using the information in context. The context is specific and limited, so the risk to the salesperson’s enterprise is also limited in its possible consequences. But apply that same information to the whole marketplace in a big data model without checking its provenance and accuracy, and it becomes a potentially serious distortion. However, this very simple definition is not enough at a time when the whole use of technology is changing month by month towards the extension of the Internet Web model and external interactions.

I have mentioned MIKE2.0 before, and it has really moved on. But what is MIKE2.0? Its Web site defines it as follows: “MIKE2.0, which stands for Method for an Integrated Knowledge Environment, is an open source methodology for Enterprise Information Management that provides a framework for information development. The MIKE2.0 Methodology is part of the overall Open Methodology Framework.” Most important of all, it’s a dynamic environment that is constantly building and changing its approaches as the market, technology and uses change.

So my recommendation is: by all means consider big data and read the various reports on the topic, though I suspect your colleagues in strategy and marketing will drive that side. Right now the major issue in most enterprises is actually small data, and the rise in the amount of it being used across the enterprise by an increasing number of people, stored on various devices. For that I recommend taking a more detailed look at MIKE2.0, starting with the five phases of its approach to the topic.
