According to a study by IDC, in 2013 the digital universe held about 4.4 zettabytes of data, of which about 22% was tagged and useful for analysis and less than 5% was truly valuable. Even so, less than 5% of the useful data was actually analyzed. By 2020, the digital universe is expected to expand to 44 zettabytes, with about 37% of it tagged and useful for analysis and less than 10% truly valuable.[i]
Put simply, these numbers point to an ocean of untapped business opportunity.
Organizations do realize the opportunity that is out there, but they are intimidated by the enormity of the problem. They would need to find a way to identify the data they want in this digital universe, acquire it, organize it, analyze it, and finally convert the analysis into actionable insights. Organizations have traditionally worked this way, and the approach seems logical and applicable. However, it is easier said than done.
The challenges that the digital universe throws at us have not been encountered before. To understand the gravity of the problem, let’s explore it first. Suppose I were to state: "Somebody has said something about something somewhere, and it is going to impact someone." Can you make any sense of it? We don’t know who is communicating, what they are communicating about, or who or what will be impacted by this communication. Because human language keeps changing, deriving meaning and purpose is even harder. The Variety of forms in which this information exists, the Volume in which it is available, and the Velocity with which it is generated in the digital universe only compound the problem.
Traditionally, when analyzing large sets of structured data, we use dimensions such as Customer, Product, Time, and Location to add context. These data dimensions are central to the organization's business and are well defined, structured, and used across processes. They are used to make sense of the data and to slice and dice it by various criteria. Who are the most profitable customers? Which product sells best during the holiday season? What is the organization's risk exposure? These dimensions answer such questions and many more complex ones, which makes them key to providing context. Because they are also used across the organization for operational purposes, they are considered critical to its functioning and are referred to as Master Data and Reference Data.
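To make "slicing and dicing" concrete, here is a minimal Python sketch. The sales rows, the customer and product codes, and the assumption that the holiday season means November and December are all hypothetical, invented only for illustration; a real warehouse would typically keep each dimension in its own table and run such queries in SQL or an OLAP tool. Because every fact row carries its dimension values, answering a question amounts to filtering on one dimension (the slice) and summing a measure grouped by another (the dice).

```python
from collections import defaultdict

# Hypothetical sales facts: each row holds dimension values
# (customer, product, month, location) plus measures (units, profit).
SALES = [
    {"customer": "C001", "product": "P-TV",    "month": "2013-12", "location": "Boston",  "units": 5, "profit": 500.0},
    {"customer": "C002", "product": "P-Phone", "month": "2013-12", "location": "Chicago", "units": 9, "profit": 450.0},
    {"customer": "C001", "product": "P-Phone", "month": "2013-11", "location": "Boston",  "units": 2, "profit": 100.0},
    {"customer": "C003", "product": "P-TV",    "month": "2013-06", "location": "Denver",  "units": 7, "profit": 700.0},
]

def aggregate(rows, dimension, measure, where=lambda row: True):
    """Keep rows matching `where`, then sum `measure` grouped by `dimension`."""
    totals = defaultdict(float)
    for row in rows:
        if where(row):
            totals[row[dimension]] += row[measure]
    return dict(totals)

def top(totals):
    """Return the dimension member with the largest total."""
    return max(totals, key=totals.get)

# Assumption: "holiday season" means November and December.
HOLIDAY_MONTHS = {"2013-11", "2013-12"}

units_by_product = aggregate(SALES, "product", "units",
                             where=lambda row: row["month"] in HOLIDAY_MONTHS)
profit_by_customer = aggregate(SALES, "customer", "profit")

print(top(units_by_product))    # P-Phone
print(top(profit_by_customer))  # C003
```

The same two helpers answer different business questions simply by changing which dimension is filtered and which one is grouped, which is why well-defined dimensions matter so much.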
Master Data is the consistent and uniform set of identifiers, extended attributes, and relationships that describe the core entities of the enterprise and are used across multiple business processes and supporting systems. Examples include customer, product, employee, and supplier.
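The definition above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a specific MDM product's API: a hypothetical hub keeps one master record per core entity (identifier, extended attributes, relationships) and cross-references each source system's local key to it. The class names and the system names CRM, Billing, and ERP are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class MasterRecord:
    """One master record for a core enterprise entity (hypothetical schema)."""
    entity_type: str                                   # e.g. "customer", "product", "supplier"
    master_id: str                                     # enterprise-wide identifier
    attributes: dict = field(default_factory=dict)     # extended attributes
    relationships: dict = field(default_factory=dict)  # relationship name -> related master_id

class MasterDataHub:
    """Maps each source system's local key onto a single master record."""

    def __init__(self):
        self._records = {}    # master_id -> MasterRecord
        self._crosswalk = {}  # (system, local_id) -> master_id

    def register(self, record):
        self._records[record.master_id] = record

    def link(self, system, local_id, master_id):
        # Refuse to cross-reference a key to a master record that does not exist.
        if master_id not in self._records:
            raise KeyError(f"unknown master id: {master_id}")
        self._crosswalk[(system, local_id)] = master_id

    def resolve(self, system, local_id):
        """Return the master record behind a system-specific key."""
        return self._records[self._crosswalk[(system, local_id)]]

# Hypothetical usage: two systems know the same customer under different keys.
hub = MasterDataHub()
hub.register(MasterRecord("customer", "C001", {"name": "Jane Doe", "segment": "Retail"}))
hub.register(MasterRecord("supplier", "S010", {"name": "Acme Components"}))
hub.register(MasterRecord("product", "P-TV", {"name": "55-inch TV"}, {"supplied_by": "S010"}))
hub.link("CRM", "cust-17", "C001")
hub.link("Billing", "AC-9934", "C001")
hub.link("ERP", "item-55", "P-TV")

print(hub.resolve("Billing", "AC-9934").attributes["name"])  # Jane Doe
```

Whichever process asks, whether CRM or Billing, it ends up at the same record, which is what lets the customer serve as a shared dimension across the enterprise.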
In the digital universe, however, it is not just about structured data but also about semi-structured and unstructured data. Given the ever-expanding volume of data, the dimensions are not easily identifiable, so the data lacks context. This makes it difficult to derive true value from the information in the digital universe.
In the next part we will look at a potential solution to the problem.