Big data continues to gain significant press, both in the blogging community and in discussions by software vendors. When I discuss this topic with clients, I often approach it from the angle that bigger data volumes will result in bigger data problems. Although this seems like a logical premise, what it really means to an organization, and how to plan accordingly, is often overlooked.
From a data quality and information governance standpoint, the same types of problems around the integrity of the data, and the need to put policies, procedures and an organizational structure in place to address data governance, are still important. The difference is that the size and scope of data issues (and how to resolve them) will become magnified as the volumes of data increase. In addition, the type of data often varies in a big data scenario, with a significant amount of unstructured data available for an organization to consume. Information such as social data (Facebook, Twitter) can provide significant insights into what your customers and employees are saying about your company; however, if you can’t do anything with this data, does it really provide any value?
As you begin to consider how to incorporate these additional data sources into your organization, there are key considerations to discuss so that your organization can consume and leverage big data quickly enough to take advantage of potential trends in the marketplace.
- Information Governance Strategy – big data without a strategy to make sense of it is essentially useless data. Information is gained when you understand how these new data sources (such as Facebook posts) relate to your existing client, partner, or supplier information. If a customer is praising your company or making disparaging remarks on Facebook or Twitter, having the ability to react in a timely and appropriate manner (and having a policy/procedure regarding who is responsible for the action) is critical. Equally critical is an information governance strategy that identifies the key stakeholders across the organization and aligns how the different parts of the organization (Sales, Marketing, Finance, etc.) are using information and responding to customer needs. Without an organizational structure in place for data governance across the company, decisions will likely be made in silos, and key opportunities to gain (or retain) clients and market share could be missed.
- Data Quality Strategy – as new, and potentially unstructured, data feeds come into your organization, having a data quality strategy defined to cleanse, standardize and match the new data sources to your existing data should be considered a necessity. The underlying data quality policies (such as how a customer ID is defined) will not change in a big data scenario; however, the unstructured nature of the data and the various formats being brought into the organization will require appropriate technology to handle them. In addition, your existing data quality processes should be revisited to confirm whether the rules currently in place need to be altered. The new data sources should enrich your existing data and not result in new versions of the truth (duplicates). Ensuring your existing data quality strategy (or putting one in place) accounts for the nuances of the increased data volumes is critical if you are going to use the data effectively.
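To make the cleanse, standardize and match idea from the data quality strategy a little more concrete, here is a minimal Python sketch. It is only an illustration under some loud assumptions: the field names (`name`, `email`, `text`, `mentions`), the helper names, and the exact-match rule on standardized values are all hypothetical, and a real deployment would more likely rely on a dedicated data quality tool with fuzzy matching. The point is simply that an incoming social record should attach to an existing customer rather than create a duplicate.

```python
import re
import unicodedata


def standardize(value):
    """Cleanse a text value: strip accents, lowercase, collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9@. ]+", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_index(customers):
    """Index existing customers by standardized email and name (assumed fields)."""
    index = {}
    for customer in customers:
        for field in ("email", "name"):
            key = standardize(customer.get(field, ""))
            if key:
                # Keep the first customer seen for a key; later ones do not overwrite it.
                index.setdefault((field, key), customer)
    return index


def merge_feed(customers, feed):
    """Enrich matching customers with feed text; return the unmatched records."""
    index = build_index(customers)
    unmatched = []
    for record in feed:
        target = None
        for field in ("email", "name"):  # try email first, then name
            key = standardize(record.get(field, ""))
            if key and (field, key) in index:
                target = index[(field, key)]
                break
        if target is None:
            unmatched.append(record)  # held for review, not added as a new customer
        else:
            target.setdefault("mentions", []).append(record["text"])
    return unmatched


# Hypothetical sample data, for illustration only.
customers = [{"id": "C1", "name": "José  Álvarez", "email": "Jose@Example.com"}]
feed = [
    {"name": "jose alvarez", "text": "Great service!"},
    {"name": "Unknown Person", "text": "Where is my order?"},
]
unmatched = merge_feed(customers, feed)
```

In this sketch the first post is attached to the existing customer record, while the second is set aside as unmatched instead of silently becoming a new "version of the truth". Who reviews the unmatched records, and what counts as a match, are the kind of policy questions the governance strategy above would need to answer.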
For those interested in the big data topic, I would also recommend checking out this article from the October 2011 edition of the McKinsey Quarterly on the topic of “Seizing the potential of ‘big data’.”