So, we all remember our ‘happily ever afters’, don’t we?
I have the third instalment in the Information Management story; it’s tantamount to a scandal in my eyes! Enter the field of ‘Big Data’, where vendors slug it out over who can process the largest quantity of data in the shortest amount of time. As David realised at Goliath’s expense, agility is often more important than size…
Dealing with a large variety of inputs quickly to make effective decisions is fundamental, even in myths and legends…..
- Hercules not rampantly decapitating each of the Hydra’s heads because he was armed with a next-best-course-of-action algorithm suggesting that fire was the quickest path to victory. That might have saved some time!
- The Trojans running appropriate security and fraud analytics against Odysseus’s Trojan Horse prior to permitting its entry through the city gates. Odysseus, you’re not looking quite so smart now, eh?
- Icarus posting out his ideas for his wax-and-feathers escape plan to a social media site for sentiment analytics prior to his escape attempt from prison. What might have happened I wonder!
Maybe our heroes (and the Trojans, sorry Odysseus) in these particular stories would have had a slightly easier ride had they had rapid access to combined internal and external insight during their quests!
As Big Data practitioners we all aspire to provide unique solutions against the ever-increasing sea of data in which we are drowning daily. However, do not be fooled into thinking that the current disposition towards a purely volume-based infrastructure and software focus will achieve your organisational goals for Big Data…
Enter the arena of the ‘Big Data’ misnomer….
- It can be ‘plugged in’ to your current infrastructure seamlessly
- It is focused on driving value from social media
- It provides superfast response times for every query
- It relies on all your data being in ‘the cloud’
- It requires extensive capital investment in new hardware, software and infrastructure
Does any of this sound familiar? None of it tells the whole story; the real key to successfully delivering Big Data initiatives is as old as the Queen of Narnia herself.
Let me be the first to state that having a flexible infrastructure and innovative software capabilities (especially in terms of harnessing Hadoop, MapReduce, social media feeds, unstructured data and so on) is an important component of your Big Data strategy.
It is not, however, the ‘be all and end all’ and, furthermore, not where the majority of your time, effort and cost are likely to reside. Big Data requires a different approach and ethic to standard data architecture and delivery:
- it is not necessarily faster than in-house technology (nor should it be)
- it can often be achieved with low capital outlay, through innovative re-use of your in-house IT landscape
- it will present unique challenges in collecting external data events outside your organisation’s direct control
- and, finally, it does not necessarily imply a cloud-based data retention strategy to be effective.
As we originally discussed, ‘The Big Data Challenge’ is not simply about Hadoop, massive volumes, unstructured data, social media and NoSQL; rather, it’s about capturing the right events with the appropriate context at the right time, regardless of the data source on which the analysis is based…
Variety is the spice of life for Big Data
The largest challenge for a Big Data project is, in fact, the common foe of any Information Management endeavour: Variety. As we attempt to process greater volumes of data with the express intent of delivering insight at greater speed than ever before, increasingly varied data interfaces, with differing rules underpinning the context of their content, become ever more apparent.
Big Data initiatives fall apart when organisations fail to consider the diversity of data inputs required and the event-based business rules necessary to repeatedly deliver critical business insight on large amounts of data…
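To make the variety problem concrete, here is a minimal sketch (in Python, with entirely hypothetical feed names and fields) of what normalising a structured ‘inside-out’ row and an unstructured ‘outside-in’ message into one common event shape might look like; the real work, of course, lies in the business rules behind each mapping:

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

@dataclass
class BusinessEvent:
    """A single captured event carrying the context needed for later analysis."""
    source: str                         # originating feed, e.g. "crm" or "social"
    event_type: str                     # business meaning, e.g. "complaint"
    occurred_at: datetime
    payload: dict[str, Any] = field(default_factory=dict)

def from_structured(row: dict[str, Any]) -> BusinessEvent:
    """Map a structured (e.g. CRM database) row onto the common event shape."""
    return BusinessEvent(
        source="crm",
        event_type=row["type"],
        occurred_at=datetime.fromisoformat(row["timestamp"]),
        payload={"customer_id": row["customer_id"], "detail": row["detail"]},
    )

def from_unstructured(raw: str) -> BusinessEvent:
    """Map a raw social-media-style JSON message onto the same event shape."""
    msg = json.loads(raw)
    return BusinessEvent(
        source="social",
        event_type="mention",
        occurred_at=datetime.now(timezone.utc),
        payload={"text": msg.get("text", "")},
    )
```

Whatever the downstream platform, funnelling every feed through one agreed event model is what keeps variety from becoming the lowest common denominator.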
An Ode to Big Data for the Biggest and Fastest
I personally cannot continue to propagate the myth that Big Data is purely about processing large data volumes at speed and, as such, I present:
An ode to Big Data for Biggest and Fastest
There once was a CFO called Kim,
Who bought loads of fast servers at whim,
Whilst looking for action, to client dissatisfaction,
She realised handling variety was king!
Big Data is about the three V’s – Volume, Velocity and Variety – and the ability to handle a wide variety of formats seamlessly is the real roadblock to success.
The lesson to be learnt here is that the biggest and fastest Big Data infrastructure is only as efficient as its lowest common denominator: Variety. If the solution can handle structured and unstructured data from both an internal (inside-out) and external (outside-in) perspective, even if this reduces the speed at which insight can be achieved, it is likely to make for a better platform for your organisation.
So, how do I add a little ‘David’ to my Big Data ‘Goliath’?
So, if you have reached this stage, you are virtually converted!
My suggestion is to add a little finesse and focus to your Big Data giant platform and initiative.
- FOCUS on a single business outcome (e.g. brand sentiment, customer retention)
- DESIGN a simple BIG DATA MODEL + EVENT CAPTURE mechanism to fulfil the outcome
- DISCOVER and ALIGN your structured and unstructured data feeds against the requirement
- DETERMINE the immediacy of each data need against the most appropriate Big Data architectural component (Hadoop, Unstructured, Database, ETL, Analytics, Social Media etc)
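The first two steps above can be sketched in a few lines. What follows is an illustrative Python fragment only – the outcome name, event kinds and `EventCapture` class are all hypothetical – showing how small a ‘Big Data model + event capture’ mechanism can be when it is focused on a single business outcome:

```python
from dataclasses import dataclass

# Hypothetical single business outcome (step 1: FOCUS).
OUTCOME = "customer_retention"

@dataclass
class CapturedEvent:
    outcome: str    # the business outcome this event serves
    source: str     # internal ("inside-out") or external ("outside-in") feed
    kind: str       # e.g. "churn_signal", "complaint", "negative_mention"
    detail: str

class EventCapture:
    """Minimal in-memory capture mechanism (step 2: DESIGN).

    A production version would sit on a message queue or streaming platform,
    but the shape of the model stays deliberately small.
    """
    def __init__(self, outcome: str):
        self.outcome = outcome
        self.events: list[CapturedEvent] = []

    def capture(self, source: str, kind: str, detail: str) -> None:
        self.events.append(CapturedEvent(self.outcome, source, kind, detail))

    def signals(self, kind: str) -> list[CapturedEvent]:
        """Return every captured event of one kind for downstream analytics."""
        return [e for e in self.events if e.kind == kind]
```

Steps 3 and 4 then become a matter of mapping each discovered feed onto this small model rather than boiling the ocean of every data source at once.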
In my earlier blog entitled ‘Right Data rather than Big Data’, we alluded to the emergence of enterprise data discovery tools (InfoSphere Discovery and Global IDs as examples), which are one means to accelerate points 2 and 3 in my recommendations.
Finally, point 4 will be driven by the volume, velocity and variety of the task at hand. Where immediate responses to predominantly structured data are required, database or in-memory appliances will be the order of the day; if the volumes and the variety are enormous but the question can be answered periodically (weekly or monthly), batch ETL feeds into Hadoop for parallel processing, returning to your analytical engines for visualisation, may meet the requirement.
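The routing logic just described can be written down as a tiny decision function. This is a sketch only – the component names and the three coarse labels are illustrative, not a prescribed taxonomy – but it shows how the three V’s of each data need drive the architectural choice:

```python
def route_workload(velocity: str, variety: str, volume: str) -> str:
    """Illustrative routing of a data need to a Big Data architectural
    component, following the rules of thumb in the text above."""
    # Immediate answers over mostly structured data: keep it close and fast.
    if velocity == "immediate" and variety == "structured":
        return "in-memory appliance / database"
    # Huge, varied data answered periodically: batch it out to Hadoop.
    if volume == "enormous" and velocity == "periodic":
        return "batch ETL -> Hadoop (parallel) -> analytics & visualisation"
    # Everything else deserves a case-by-case architectural review.
    return "review case-by-case"
```

The point is not the specific components but that immediacy, shape and size of each feed – not raw platform horsepower – should drive the decision.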
Size isn’t everything, but it’s a start!
Don’t forget, taking a volume-and-velocity approach to Big Data through technological wizardry alone would be equivalent to opening a Pandora’s box inside your organisation. It should be more about a ‘cradle to grave’ approach to event capture against a common business outcome, where source system interface preparation is everything.
The key to a successful Big Data initiative lies in handling that elusive spice of life we call ‘variety’.