Information is knowledge, knowledge is power, power is energy over time, energy is mass times the speed of light squared… and if something has mass then it has gravity, therefore information has gravity…
The odd thing about this clearly false chain of reasoning is that data really does appear to have gravity in how it behaves in an organisation, and in some ways it can be compared to the way planets form. Data marts, ODS and ERP databases, and indeed pretty much any data store in an organisation, including weblogs, tend to exhibit a basic characteristic.
People will add information to the data store they use most often, even if it clearly isn't meant to go there, because they find that easier in the short term than putting it somewhere else. Once the information is there, however, it becomes part of the reason you can't migrate away from that store, as more and more people come to rely on it because of the growing information within it.
In other words, a data store, any data store, attracts new information in direct relation to how much information it already contains. The more that is already there, the more likely it is that people will tactically add new information to it. This includes duplicating information from other sources rather than accessing it from the original source or another existing data mart or store.
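This "the big get bigger" dynamic is essentially preferential attachment, and a toy simulation makes the effect concrete. The sketch below is purely illustrative (the function name, starting sizes and record counts are all invented for this example): each new record lands in a store with probability proportional to how much that store already holds.

```python
import random

def simulate_data_gravity(initial_sizes, new_records, seed=42):
    """Illustrative sketch of 'data gravity' as preferential attachment:
    each new record joins a store with probability proportional to the
    amount of data that store already holds."""
    rng = random.Random(seed)
    sizes = list(initial_sizes)
    for _ in range(new_records):
        total = sum(sizes)
        pick = rng.uniform(0, total)
        cumulative = 0
        for i, size in enumerate(sizes):
            cumulative += size
            if pick <= cumulative:
                sizes[i] += 1  # the record is added to this store
                break
    return sizes

# Three hypothetical stores start with 100, 10 and 1 units of data;
# 1000 new records arrive one at a time.
print(simulate_data_gravity([100, 10, 1], 1000))
```

Run it and the store that started largest ends up with the overwhelming majority of the new records, which is exactly why the "most used" store in an organisation keeps absorbing data it was never designed for.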
The problem is that tactical ODS solutions then see an exponential ballooning of information when architecturally they were never designed to cope with such volumes. Every addition adds cost, but it's added piece by small piece, so a death-by-a-thousand-cuts problem occurs. Thus your nice clean ODS on day 1 becomes your nightmare hybrid data warehouse, ODS and web site database by year 3. What this all means is that organisations need to do three things:
- Recognise that integration technologies today significantly reduce the need for ODS approaches
- Have a clear and strong governance approach which minimises or eliminates data duplication
- Deliberately create a data star
The first two we've talked about before at Capgemini, so it's the third that I'll talk about here. Big Data technologies such as Hadoop are designed to create the data star: a place where new information can be added reasonably cheaply, where the assumption is that big crunching will go on to turn it into useful information, and where the intention is to create a large gravitational pull for new information. With Big Data solutions the goal is to contain as much as possible, so increases in volume are not a problem; they are in fact a benefit.

By establishing a Big Data environment with the goal of it becoming your organisation's data star, you create a Data Gravity Well (I might be pushing the analogy rather too far here) which builds on how people already work with data sources: if they find a source useful, they keep adding information to it. With traditional approaches this leads to a source that becomes cumbersome and expensive, and quite probably moves from useful to issue. With Big Data approaches it just means that more relationships and more challenges can be addressed, and more pre-processing can occur before the data is turned into structured information within those more traditional BI solutions.
The way to fight the data gravity problem is to build a data star. Just make sure you've planned your infrastructure scaling correctly, though, or it could collapse and create a data black hole…