Today I did a Google search on “Big Data”. I got an overwhelming 1,640,000,000 results in less than one fifth of a second. This reminded me of the “Droste effect”: the effect of a picture appearing within itself, in a place where a similar picture would realistically be expected to appear [source: Wikipedia, found in 0.27 seconds]. Why? Because Google, a Big Data pioneer with its MapReduce framework, is like the mother of all Big Data, so searching Google for “Big Data” is Big Data looking at itself.
For some, Big Data is nothing more than hype (a search for “Big Data Hype” returns 24,000,000 results in 0.25 seconds). They argue that Big Data is just big business, and perhaps nothing more than old wine in a new bottle. However, there is no denying the increased need to analyze large volumes of both structured and unstructured data. So let’s take a closer look.
Big Data is often defined as the combination of Volume, Variety and Velocity.
Volume is often associated with Hadoop. Hadoop makes it possible to store petabytes of structured and unstructured data in the cloud on commodity hardware. Hadoop is modeled on Google’s MapReduce, mentioned above, and on the Google File System. Combined with traditional relational databases, which in data warehousing and Business Intelligence are optimized for the analysis of structured data, this gives rise to new Big Data architectures.
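To make the MapReduce idea a bit more tangible, here is a minimal single-machine sketch of the classic word count. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample sentences are my own illustrative assumptions. On a real cluster the framework would spread the map and reduce steps over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, for illustration only
counts = word_count(["big data is big", "data about data"])
print(counts)
```

The point of the split is that each map call only needs its own document and each reduce call only needs the values for its own key, which is what allows the work to be distributed.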
In a recent study on Big Data by Capgemini and The Economist Intelligence Unit (http://www.capgemini.com/services-and-solutions/technology/business-information-management/the-deciding-factor/), we found that 85% of the 600 surveyed business leaders said the issue is not volume but the ability to analyze and act on the data in real time. This brings us to another of the V’s: Velocity, or real-time insight. Here we find technology improvements such as in-memory technology (often associated with QlikView, Tableau, Spotfire and PowerPivot), but also data warehouse appliances: integrated sets of servers, storage, operating system(s), DBMS and software, pre-installed and pre-optimized for data warehousing, with Massively Parallel Processing capabilities and columnar storage.
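Why does columnar storage help with analysis? A rough sketch, assuming a tiny made-up sales table: in a row layout every record has to be touched to total one field, while in a column layout that field sits together in one place. The names (rows, to_columns) and the figures are hypothetical, and real appliances add compression and parallelism on top of this idea.

```python
# Row-oriented layout: one record per sale (hypothetical sample data)
rows = [
    {"store": "A", "product": "wine", "amount": 12},
    {"store": "B", "product": "wine", "amount": 8},
    {"store": "A", "product": "cheese", "amount": 20},
]


def to_columns(records):
    """Pivot a list of records into one list of values per field."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field now reads a single contiguous list
total = sum(columns["amount"])
print(total)
```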
In the same study 42% of the respondents said that unstructured content is too difficult to interpret. This brings us to the third V: Variety, where combining structured (often internal) data with unstructured (often external) data is of the essence. This is often done with text, audio and video mining. Just imagine the potential of combining facts from your Business Intelligence environment (like point of sale data) with opinions about your brand or products from social interactions on Facebook or Twitter. Some of the technology leaders in this area are Autonomy and Attensity.
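As a toy illustration of combining facts with opinions, the sketch below joins point of sale totals with a very naive keyword-based sentiment score per product. Everything in it is an assumption made for the example: the word lists, the sample posts and the function names. Real text mining products do far more than count keywords.

```python
# Hypothetical keyword lists; real text mining is far more sophisticated
POSITIVE = {"love", "great"}
NEGATIVE = {"hate", "bad"}


def sentiment(text):
    """Score a post as positive keyword hits minus negative keyword hits."""
    words = set(text.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)


def combine(sales, posts):
    """Join structured sales figures with sentiment summed per product."""
    scores = {}
    for product, text in posts:
        scores[product] = scores.get(product, 0) + sentiment(text)
    return {
        product: {"units_sold": units, "sentiment": scores.get(product, 0)}
        for product, units in sales.items()
    }


# Structured (internal) data: units sold per product, made up for the example
sales = {"wine": 120, "cheese": 45}

# Unstructured (external) data: social posts already tagged with a product
posts = [
    ("wine", "I love this wine"),
    ("wine", "great bottle"),
    ("cheese", "bad cheese, hate it"),
]

report = combine(sales, posts)
print(report)
```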
Perhaps Big Data is old wine in a new bottle. But the examples above make it clear to me that the wine has improved with age.