An excellent article on Mashable by Brian Gentile of Jaspersoft, and an equally insightful follow-up by James Kobielus of IBM, conclude that:
Big Data is NOT….
- Only about massive volume. More importantly, it is about massive variety (multi-structured data) and velocity (value-driven change frequency). Together these are commonly known as the 3 V’s, and they deliver ‘whole population analytics’.
- Simply Hadoop. Rather, it is a hybrid capability incorporating additional features such as NoSQL and Massively Parallel Processing (MPP) platforms.
- Purely unstructured data. Rather, it is about our ability to harness multi-structured data across the numerous and changing formats and structures of corporate data.
- Only for social media feeds and sentiment analysis. It has a variety of use cases for both on-premise and off-premise data assets, enabling corporations to ‘continuously harvest’ information.
- No SQL whatsoever. NoSQL does NOT mean No SQL; it actually means Not Only SQL, delivered through the innovative use of a variety of low-latency data APIs proliferated across the emerging global cloud infrastructure.
In a nutshell, an excellent combined perspective that poses some interesting questions.
So, let's conclude speculatively that Big Data is a federated and massively distributed technology cocktail with a modern ‘cloud-like’ wrapper. Its purpose is to harness multi-form data effectively, at the speed of the business, across internal process touchpoints (operational and financial excellence) and external interaction touchpoints (customer and partner excellence).
Next, let's assume that Big Data capabilities can be serviced by a hybrid architecture utilising cloud and commodity information management H/W and S/W technologies (from integration and MDM through to MPPs and in-memory). That architecture harnesses structured corporate data repositories (the Inside-Out paradigm) and the increasingly prevalent unstructured transactional event information from social media, mobile devices and web interactions (the Outside-In paradigm).
The architects and technologists out there will mention that not all of these capabilities are fully mature, that Hadoop is actually pretty slow, and that the complex event and business rule processing required to harness data at this scale still requires some work and product/solution innovation, but let's leave it there for now…
Sounds reasonable to me.
So what is the real challenge of Big Data?
In my opinion, it is alignment, and harnessing the appropriate information context, on a corporate scale.
We should consider the following fundamental constraints in 2012 and build on them:
- Data Science and Data Visualisation disciplines are in their infancy. Until they fully mature, our ability to automate whole-population analytics with appropriate context and intelligence is hampered.
- Big Data Provisioning is fundamentally different: it needs a Marshalling process. This has been highlighted by Steve Jones and Manuel Sevilla in recent blogs and a corresponding white paper, and it represents a step-change in how multi-format data can be processed for corporate advantage going forward.
- Without a single unified view, Big Data intelligence is unattributable to business need. Many organisations are finding it difficult to get the most basic of unified views of customers and other key master data management assets. Without the most basic of customer profiles, how can transactional analysis of surrounding events and interactions be truly useful?
- The information needs of the few always get prioritised over the needs of the many. Big Data will become truly useful only when three things are true: key organisational information is managed corporately; the organisational processes that ‘touch’ the data are governed corporately within the auspices of a sustainable, outcome-based common information architecture; and, perhaps most importantly, information is a C-level priority (cue the role of the Chief Data Officer, or CDO), so that the immediate needs of BAU activities and ongoing transformation projects rarely supersede the corporate alignment of data.
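To make the unified-view constraint a little more concrete, here is a minimal sketch in Python of what "unattributable" means in practice. It is purely illustrative: the `CustomerProfile` and `Event` classes, the `attribute_events` function and the sample data are all hypothetical names invented for this example, and a real master data management platform would use far richer matching than a simple lookup on a channel handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerProfile:
    """A hypothetical unified (master) customer record."""
    customer_id: str
    name: str
    handles: frozenset  # assumed channel identifiers: email, social handle, device ID


@dataclass(frozen=True)
class Event:
    """A hypothetical 'Outside-In' transactional event."""
    source: str   # e.g. "twitter", "web", "mobile"
    handle: str   # the identifier the event arrived with
    payload: str


def build_handle_index(profiles):
    """Map every known channel handle (case-insensitive) to a customer ID."""
    index = {}
    for profile in profiles:
        for handle in profile.handles:
            index[handle.lower()] = profile.customer_id
    return index


def attribute_events(events, profiles):
    """Split events into those tied to a known customer and those that are not."""
    index = build_handle_index(profiles)
    attributed = {}
    unattributed = []
    for event in events:
        customer_id = index.get(event.handle.lower())
        if customer_id is None:
            unattributed.append(event)  # no unified view, so no business context
        else:
            attributed.setdefault(customer_id, []).append(event)
    return attributed, unattributed


# Invented sample data, for illustration only.
profiles = [
    CustomerProfile("C001", "Ann Example", frozenset({"ann@example.com", "@ann_ex"})),
]
events = [
    Event("twitter", "@Ann_Ex", "complaint about delivery"),
    Event("web", "ann@example.com", "viewed returns page"),
    Event("mobile", "device-9f3a", "app crash report"),
]

attributed, unattributed = attribute_events(events, profiles)
```

In this toy run, the two events carrying handles from Ann's profile can be analysed in the context of a known customer, while the mobile event has nowhere to land. Scale that up to whole-population volumes and the point of the bullet above becomes clear: without the master profile, the surrounding events are just noise.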
I believe that whilst the technology cocktail for Big Data is an Information Management evolution, the Information Management revolution will come when we corporately align information assets, with focused context, to the transactional event at hand, rather than relying on the organic and à la carte approach to information management seen in the corporate world of 2012.