A recent Capgemini survey reveals that “organizations are increasingly prioritizing big data as an asset, and once realized the bottom line improvements can be very noticeable.” In addition, the survey notes that decisions are increasingly based on hard analytics, but with the massive amounts of data available, accessibility – or the time it takes to produce meaningful analytics – can substantially slow down the decision-making process as well as the business, ultimately producing a negative impact on the bottom line.
Six Sigma continuous improvement professionals have been reengineering processes and dealing with masses of data for years in their quest to identify the vital few factors and the variables that have the most impact on the performance of a value stream, or as it used to be called, “the end-to-end process”. This “Big Process” emerged with the advent of Six Sigma in the nineties. To be sure, the sheer volume of data has increased exponentially, and the means and solutions to facilitate analysis are improving; still, the relationship between data and process has been constant. If you want to substantially improve your processes, you need data to change your organization’s operational reality. And if data is to make its story known, it must be linked to operations – your business’ processes.
To make sense of the relationships between the data and an organization’s business processes takes creativity, statistical prowess, continuous improvement expertise, and process knowledge – the same factors necessary to validate a root cause. Omit any one of these elements and the enterprise-wide potential of Big Data and Big Process improvement is delayed, or produces only anemic results.
Clients are asking for insight that leads to better service, lower costs, and shorter lead times; continuous improvement professionals have been providing such insights using Six Sigma improvement techniques, such as massive data set analysis and dashboards, since the mid-to-late nineties. In a textbook transformation example, a team of Lean Six Sigma experts used Lean Management System principles and advanced Six Sigma tools such as Design of Experiments (DOE) to drive efficiency in a group of check processing centers. The results were impressive: work flow improved, costs were reduced, and staff attrition slowed. But a more sophisticated level of improvement was needed if each center was to become profitable. With staffing levels set to address peak volumes of check receipts, the safety margins built in to meet service level agreements were too high to achieve profitability. To align staffing with operational requirements, the team needed to know, in advance, how many checks each center would receive on any particular day.
There was no shortage of data: operational databases held millions of data points – literally 5-10 years of transactions recording the number and time of checks received in every center for every day of the year. We had used the data to understand how many checks could be processed at each process step, and had built a staffing model to position staff in the correct operational roles in the service center throughout the shift. But it was obvious that we were overstaffed on upwards of 50% of operational days. No one had ever used the data to systematically understand or predict the volume of transactions each center would need to process daily. Operationally, the only function of the data was to invoice the client for transactions and to determine whether service level agreements had been met. Building a model to predict daily check receipts was a great idea, and a huge step in operational performance management. The statistical question was: could the team tie data from the external marketplace to the organization’s service-center processes to predict volumes and optimize staffing levels? Some of the analytical challenges included:
- Millions of transactions that seriously challenged our statistical software and the computing capacity of our laptop computers.
- Data-collection inconsistencies: we needed to know how many checks were received daily, not how many were processed; this meant hundreds of hours spent cleaning data to make it usable.
- The sheer number of variables that could impact the number of checks that were issued by bank customers, which then would be received in the centers: we tested 150 variables!
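The received-versus-processed distinction above is the kind of cleaning problem that consumed most of the effort. As a minimal sketch – with an invented record layout and invented dates, not the team’s actual data – counting daily receipts means discarding records that only logged a processing date:

```python
from collections import Counter
from datetime import date

# Hypothetical raw records: (center_id, received_date, processed_date).
# Some records logged only the processed date -- the inconsistency to clean out.
raw = [
    ("NYC", date(2006, 3, 1), date(2006, 3, 1)),
    ("NYC", date(2006, 3, 1), date(2006, 3, 2)),  # received 3/1, processed a day later
    ("NYC", None,             date(2006, 3, 2)),  # received date never recorded
    ("NYC", date(2006, 3, 2), date(2006, 3, 2)),
]

# Keep only records with a usable received date, then count per (center, day).
daily_received = Counter(
    (center, received) for center, received, _ in raw if received is not None
)
print(daily_received[("NYC", date(2006, 3, 1))])  # 2 usable receipts on 3/1
```

Counting by processed date instead would shift volume to the wrong day and bias any staffing model built on it.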
The statistics were an interesting challenge; suffice it to say that it took 2,000 hours of work from six highly capable individuals to run the calculations, identify the variables, and build prediction models user-friendly enough to be operationally deployed. Most of our labor was spent cleaning the data, putting it into a paired format, and testing variables that turned out to have no statistically significant impact on delivery time and volumes. We eventually identified the variables that allowed us to reach R-squared values of 0.90 to 0.95 for each center. The statistics were easy to derive with the right software – we used Minitab 15 for the calculations – although some of the tests took 20 minutes to run on our laptops. It was difficult and time-consuming, but in the end we were able to determine how many checks would be received on each day of the week, in each month – and we could do it a year in advance. It turned out that factors such as social security payments, special government refunds, long weekends, holidays, and the day of the week or month all impacted the number of checks received. In all, six different models for six different processes were built for 15 locations in the United States and the United Kingdom. The transformation was completed for all of the locations within just six months.
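The modeling approach described above – regressing daily volume on calendar and market factors and judging fit by R-squared – can be sketched in a few lines. This is an illustrative toy with synthetic numbers (the day-of-week and holiday effects are invented), not the team’s actual Minitab models:

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 365
day_of_week = np.arange(n_days) % 7                  # 0 = Monday ... 6 = Sunday
holiday = (rng.random(n_days) < 0.03).astype(float)  # ~3% of days flagged as holidays

# Synthetic "true" volumes: a base load plus invented calendar effects and noise.
base = 10_000
dow_effect = np.array([1200, 800, 400, 300, 200, -3000, -4000])
volume = base + dow_effect[day_of_week] - 2500 * holiday + rng.normal(0, 300, n_days)

# Design matrix: intercept, day-of-week dummies (Monday as baseline), holiday flag.
X = np.column_stack(
    [np.ones(n_days)]
    + [(day_of_week == d).astype(float) for d in range(1, 7)]
    + [holiday]
)
beta, *_ = np.linalg.lstsq(X, volume, rcond=None)

# Fit quality: R-squared of the fitted model against observed daily volumes.
pred = X @ beta
ss_res = np.sum((volume - pred) ** 2)
ss_tot = np.sum((volume - volume.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(f"R^2 = {r2:.3f}")
```

Because the calendar drivers (day of week, holidays, payment cycles) are known a year ahead, a fitted model like this can be evaluated on future dates – which is what made predicting receipts a year in advance possible.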
After this initiative we conducted a “lessons learned” session with both the improvement and operations teams. There was an overwhelming realization that the prediction models were the differentiators: no single improvement had a greater impact on the profitability of each center than our models. It was also noted that only after we truly understood the processes did it become possible to build the models that allowed us to predict check receipts. We needed the process knowledge derived from several months of analytics and improvements to pair those concepts with big data. Had we begun with the objective of predicting check receipts to better manage staffing levels, the team believed it would certainly have taken much longer.
When people ask me about Big Data and Big Process, I tell them not to separate the two concepts. It takes time to improve operations; the more process knowledge your analytical team has, the faster you will see results. Don’t assume that the statisticians will figure out what is significant on their own – pair your statistical experts with your process experts to get big process improvement results.