Use of Six Sigma and lean manufacturing techniques to solve data quality issues

Manufacturing firms have used lean techniques such as 5S, JIT, Kanban, Kaizen, Poka-Yoke, TPM, and TQM for more than half a century to turn out high-quality products. Toyota, for example, has used quality as a strategic lever and marketing differentiator to become one of the largest automobile manufacturers in the world.
If we think of OLTP systems such as ERP and CRM as machines, transactions as the manufacturing process, and data as the raw material, then the finished product is the information stored in the production databases of those systems. The exchange of data or information over networks can be viewed as inventory or goods transfers and sales. The end consumers in this case are the OLTP business users and the organization’s end consumers.
So if data is the raw material, high-quality raw material coupled with a robust production process should yield a high-quality finished good (information) and ensure customer delight.
Lean manufacturing techniques revolutionized traditional assembly-line manufacturing by eliminating waste and building quality into the product. If they were applied to the data or information process chain, the results should be equally effective and far-reaching.
High-quality data is defined as data that is “fit” for use by data consumers, i.e., data that can be transformed into information products that fulfill users’ expectations. There are various dimensions of data quality, such as “completeness”, “consistency”, and “accuracy”. A data quality problem can then be defined as a difficulty in one or more of the quality dimensions that makes data unfit for use. The causes of poor data quality are often traced to the IT system itself (technology, architecture, application design) or to the organization’s operational practices (system implementation, operations).[1]

How to use FMEA to identify data quality hotspots:

FMEA (Failure Mode & Effects Analysis) is one of the tools used by Lean Six Sigma practitioners to identify potential problems and their impact on a process in a systematic and proactive manner. Let’s explore the usage of FMEA as a tool to identify the critical data elements and define data quality standards applicable to these elements.
An FMEA identifies the opportunities for failure, or “failure modes,” in each step of the process. Each failure mode gets a numeric score for three factors that quantify (a) the likelihood that the failure will occur, (b) the likelihood that the failure will not be detected, and (c) the amount of harm or damage the failure mode may cause to a user or equipment. Each of these three scores can range from 1 to 10. The product of these three scores is the Risk Priority Number (RPN) for that failure mode. The sum of the RPNs for all sub-tasks becomes the overall RPN of the process.
Let’s take the example of a master data maintenance screen in an SAP ECC system that has hundreds of data elements, entered manually as well as via a mass upload program.
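The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not a standard FMEA library: the `FailureMode` class, the `process_rpn` helper, and the two failure modes with their scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One way a process step can fail, scored 1-10 on the three FMEA factors."""
    name: str
    occurrence: int  # (a) likelihood that the failure occurs
    detection: int   # (b) likelihood that the failure is NOT detected
    severity: int    # (c) harm or damage caused by the failure

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 range used by the method.
        for factor in ("occurrence", "detection", "severity"):
            score = getattr(self, factor)
            if not 1 <= score <= 10:
                raise ValueError(f"{factor} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: the product of the three scores."""
        return self.occurrence * self.detection * self.severity


def process_rpn(failure_modes: list[FailureMode]) -> int:
    """Overall RPN of a process: the sum of the RPNs of its failure modes."""
    return sum(mode.rpn for mode in failure_modes)


# Hypothetical failure modes and scores for a manual data entry step.
modes = [
    FailureMode("mandatory field left blank", occurrence=3, detection=2, severity=4),
    FailureMode("wrong unit of measure entered", occurrence=2, detection=8, severity=3),
]
for mode in modes:
    print(f"{mode.name}: RPN = {mode.rpn}")
print(f"Overall process RPN = {process_rpn(modes)}")
```

With these made-up scores, the second failure mode ends up with the higher RPN (48 versus 24) even though it is less likely and less harmful, because it is much harder to detect.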

For one such field, a data quality error with a minor impact rating has two causes, each with a fairly low probability of occurrence. But because the current checks are ineffective for one of the causes, the detection score for that cause is high, which drives up its RPN and makes the field a critical data element.
All SAP fields for a master data domain can then be ranked in a Pareto analysis, using the RPN values calculated above, to identify the critical 20% of fields that can lead to 80% of data quality errors.
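A Pareto ranking of this kind could look roughly like the sketch below. It assumes one RPN value per field is already available. The function name `pareto_critical_fields`, the field names (which are not actual SAP technical names), and the RPN values are invented for illustration.

```python
def pareto_critical_fields(rpn_by_field: dict[str, int], cutoff: float = 0.8) -> list[str]:
    """Return the highest-RPN fields that together reach `cutoff` of the total RPN."""
    total = sum(rpn_by_field.values())
    if total <= 0:
        return []
    # Rank fields from highest to lowest RPN.
    ranked = sorted(rpn_by_field.items(), key=lambda item: item[1], reverse=True)
    critical, cumulative = [], 0
    for field, rpn in ranked:
        if cumulative >= cutoff * total:
            break  # the fields collected so far already cover the cutoff share
        critical.append(field)
        cumulative += rpn
    return critical


# Hypothetical field names and RPN values, for illustration only.
rpn_by_field = {
    "base_unit_of_measure": 330,
    "material_group": 240,
    "gross_weight": 60,
    "description": 40,
    "storage_conditions": 20,
    "old_material_number": 10,
    "lab_office": 6,
    "page_format": 4,
}
print(pareto_critical_fields(rpn_by_field))
```

In this made-up data set, two of the eight fields carry just over 80% of the total RPN, so they would be the first candidates for tighter data quality checks.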
These values can also be utilized to create data quality standards and goals applicable to the critical data elements. Reducing the overall RPN by a certain percentage for a master data entry process could be one such goal.
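A goal of this kind can be tracked with a simple calculation. The sketch below assumes the overall RPN is recomputed after improvement measures are in place. The function names, the baseline and current RPN values, and the 25% target are hypothetical.

```python
def rpn_reduction(baseline_rpn: int, current_rpn: int) -> float:
    """Fractional reduction in overall RPN relative to the baseline."""
    if baseline_rpn <= 0:
        raise ValueError("baseline_rpn must be positive")
    return (baseline_rpn - current_rpn) / baseline_rpn


def goal_met(baseline_rpn: int, current_rpn: int, target_reduction: float) -> bool:
    """True if the overall RPN has dropped by at least `target_reduction`."""
    return rpn_reduction(baseline_rpn, current_rpn) >= target_reduction


# Hypothetical values: overall RPN before and after improvement measures.
baseline, current = 800, 560
print(f"Reduction achieved: {rpn_reduction(baseline, current):.0%}")
print(f"25% goal met: {goal_met(baseline, current, target_reduction=0.25)}")
```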


[1]: Michael Röthlin, Management of Data Quality in Enterprise Resource Planning Systems