On the 7th of May 2015, the UK Conservative Party won the British general election outright, securing a majority with 331 of 650 parliamentary seats to Labour's 232. None of the major pollsters predicted this outcome. Their research generally pointed to a hung parliament with a slight advantage to the Conservatives over Labour.
Unsurprisingly, this divergence stirred up a heated debate about how the pollsters could get it so wrong (here from the BBC) and how such mistakes can be avoided in the future (here from the British Polling Council).
Why is this relevant to big data projects?
Big Data is a key technology for democratising and accelerating access to multiform data and making better decisions (here for the Capgemini point of view).
The similarities between a poll and a big data project may not be apparent; however, a poll is actually a “small” big data project. It ingests multiform data (telephone and web surveys, political, socio-demographic and geographic information), distils it (produces a forecast within a confidence band) and enables decision making (e.g. how to tweak a political campaign).
In fact, there are some important lessons to be learned from the polls debacle...
Don’t fall in love with black boxes
Make sure you clearly understand which data are being used to inform decisions and how they are “distilled”. Excessive trust in advanced analytic algorithms and machine learning can bring sour surprises.
For example, "web polls" rely on a self-selected population of respondents. Not knowing exactly what bias this introduces may have significantly affected the accuracy of the forecasts, even when sophisticated methodologies were used to correct for it.
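To make the self-selection point concrete, here is a minimal sketch of a common correction, post-stratification weighting, and of how it can still miss. Every number and name below is a hypothetical assumption chosen for illustration, not real polling data, and this is not necessarily the method any pollster used. The sample over-represents one age group (which weighting fixes), but within each group the people who chose to respond also differ from those who did not (which weighting cannot see).

```python
# All figures are hypothetical, chosen only to illustrate the effect.

# Share of each age group in the real electorate and in the web sample.
population_share = {"under_40": 0.4, "over_40": 0.6}
sample_share = {"under_40": 0.7, "over_40": 0.3}

# Support for a party within each group: the (unknown) truth, and what
# the self-selected web respondents report.
true_support = {"under_40": 0.30, "over_40": 0.50}
sample_support = {"under_40": 0.25, "over_40": 0.40}


def weighted_mean(shares, support):
    """Average the per-group support, weighting each group by its share."""
    return sum(shares[group] * support[group] for group in shares)


# What the election will actually show.
true_value = weighted_mean(population_share, true_support)

# Naive estimate: take the web sample as it comes.
raw_estimate = weighted_mean(sample_share, sample_support)

# Post-stratified estimate: reweight groups to match the electorate.
post_stratified = weighted_mean(population_share, sample_support)

print(f"true support:        {true_value:.3f}")
print(f"raw web estimate:    {raw_estimate:.3f}")
print(f"after reweighting:   {post_stratified:.3f}")
```

Reweighting moves the estimate closer to the truth, yet a gap remains, because the correction only knows about the variables it was given. If you cannot say where the remaining gap could come from, you are looking at a black box.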
Be realistic in setting the scope and expectations of the insights that your projects can create, and make sure all stakeholders understand them. The possibilities of Big Data are enormous, but the constraints of actual systems and data are real.
The more realistic the expectations, the less likely it is that someone will end up eating their hat.
Fail fast – succeed faster
Sometimes, even the most meticulously prepared projects go wrong because of sudden changes or other unexpected events.
For example, there is wide consensus that a large portion of voters made up their minds at the last minute and that turnout was lower than expected. This uncertainty could not be built into the models and contributed to the inaccuracy.
What can you do to avoid these mistakes? Well, nothing…
Mistakes will happen. Be prepared to accept them, quickly understand the root cause of the issue and adapt your infrastructure to improve the output. Let the “Fail fast – succeed faster” mantra become part of your company’s culture.
And remember, your users are not expecting perfection – they are expecting you to get it right ASAP.
And if you don’t believe me – I can organize a poll to demonstrate it.