Predicting consumer behavior is harder than ever, but not impossible


Predicting ice cream sales is never easy

Weather – clearly a big factor in sales – is hard to predict more than a few days out, so planning demand over the year is a fine balancing act. Predicting what people want is trickier still. Inventing a new product because the nation has suddenly fallen in love with, say, watermelon, is a major challenge. It’s a race against time.

By the time you’ve spotted a new trend, launched a product, and distributed it, your competitor’s watermelon ice cream is already the hit of the summer.

Taste is far from the only concern. New product development needs to closely monitor people’s preferences, from presentation to packaging and price; from ingredients to ethics.

This is made more complicated because trends are not distributed evenly. Watermelon may be loved by young urban buyers but hated as a symbol of everything wrong with the world by older rural buyers.

Growing diversity is driven by more stratified social groupings in traditional markets, and by the growing middle class in new markets. Segmentation of market demand is more important than ever for understanding what products will succeed, and where.

The oft-quoted statistic that 95% of new products fail is probably an exaggeration, but it’s true that few products deliver hoped-for returns – even with all the data now available on consumer habits. If we can get better at predicting what people want, more products will succeed.

When it comes to predictive analysis in consumer behavior, data has the right answers. But also lots of wrong ones.

The answers are often assumed to lie in online data. There is some truth in this, but that doesn’t mean all answers in data are correct.

Many companies scrape personal information off the web and apply simple “sentiment analysis.” This seems intuitive – if lots of people are tweeting positively about watermelon, or calling out products with certain ingredients, then it makes sense that these are growing trends.

But caution is always needed with data. The internet is a very chaotic system. Social media is incredibly noisy. A hastily shared meme may not reflect someone’s deeper core beliefs. Vocal minority opinions may silence mainstream ones, placing outsized value on unpopular ideas. Products too close to people’s pet concerns may feel opportunistic or creepy. Small errors in data can lead to big errors in prediction.
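The vocal-minority effect can be illustrated with a minimal sketch. It assumes a made-up set of posts that have already been scored as positive (+1) or negative (-1); the user names, scores, and function names are all hypothetical. Averaging over posts lets one prolific fan dominate, while averaging each user first gives every person one voice.

```python
from collections import defaultdict

# Hypothetical (user_id, sentiment) pairs about a flavor: +1 positive, -1 negative.
# One enthusiastic user posts six times; four others post once each.
posts = [
    ("fan_1", 1), ("fan_1", 1), ("fan_1", 1),
    ("fan_1", 1), ("fan_1", 1), ("fan_1", 1),
    ("user_2", -1), ("user_3", -1), ("user_4", -1), ("user_5", 1),
]


def post_level_sentiment(posts):
    """Mean sentiment over all posts: prolific posters count many times."""
    return sum(score for _, score in posts) / len(posts)


def user_level_sentiment(posts):
    """Mean sentiment over users: average each user's posts first."""
    by_user = defaultdict(list)
    for user, score in posts:
        by_user[user].append(score)
    per_user = [sum(scores) / len(scores) for scores in by_user.values()]
    return sum(per_user) / len(per_user)


print(f"post-level: {post_level_sentiment(posts):+.2f}")
print(f"user-level: {user_level_sentiment(posts):+.2f}")
```

In this toy data the post-level average is positive while the user-level average is negative, so the apparent trend reverses once each person is counted once. Real social data would need far more care (bots, duplicate accounts, sampling bias), which this sketch does not attempt.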

The art of data science is not just spotting correlations but finding the meaningful signal in the noise.

How to do predictive analysis right

Big data sets rarely give all the answers. But, when combined with deeper data, such as focus groups, panel tests, and multiple other sources, key signals can be teased out.

To do this, different data sources need to be effectively combined and adjusted for bias or confounding factors (e.g., fitness apps may just tell you about people who love fitness, Twitter hashtags may just reflect the loudest voices). They can then be used to build and train sophisticated models.

Model assumptions themselves also need to be rigorously questioned. The model designer’s own echo chamber may lead to expectations that aren’t held in the wider world.

Combining complex datasets and building reliable predictive models for consumer behavior is hard. It goes beyond spotting trends on social media, or pumping data into off-the-shelf platforms that no one really understands. It needs experienced people versed in data, statistical methods, and scientific thinking who can design unbiased experiments that truly tease apart what is a meaningful trend and what is just noise.

What election models can teach us about predictions across diverse groups

A good example of using data to make accurate predictions in a diverse population comes from the exit poll in the 2017 UK election. Despite dramatic changes that redrew many political lines (Brexit and Corbynism), predicted seats for all parties were within four of the actual result. These were stunningly accurate predictions of an entire population’s behavior based on a comparatively small data sample.

The polling team interviewed about 200 voters at each of just 144 (of 40,000) polling stations. Voters were asked how they had just voted, and how they had voted in the previous election. This personalized approach to data collection gave confidence that answers could be trusted.

The samples were selected to be representative of the local population. Results were then fed into a model which estimated the change in vote in that area, by matching characteristics of vote-changers to the wider population. This was extrapolated across the whole country, on the assumption that similar demographics undergo similar changes in voting, to produce a prediction for every constituency, most of them correct. This approach is known as multilevel regression and post-stratification, or MRP.
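The extrapolation step can be sketched roughly as follows. This is only the post-stratification half of the idea, under simplifying assumptions: the change in vote for each demographic group is taken as already estimated (in a full MRP model it would come from a multilevel regression), and every group name, constituency, and number below is invented for illustration. It is not the exit poll team's actual model.

```python
# Hypothetical change in vote share (percentage points) for one party,
# estimated per demographic group from the sampled polling stations.
group_swing = {"young_urban": 6.0, "older_rural": -2.0, "other": 1.0}

# Hypothetical demographic mix of each constituency (shares sum to 1).
constituency_mix = {
    "Northtown": {"young_urban": 0.5, "older_rural": 0.1, "other": 0.4},
    "Southvale": {"young_urban": 0.1, "older_rural": 0.6, "other": 0.3},
}

# Hypothetical vote share (%) for the same party at the previous election.
previous_share = {"Northtown": 48.0, "Southvale": 51.0}


def predicted_swing(mix, swing_by_group):
    """Weight each group's swing by that group's share of the local population."""
    return sum(share * swing_by_group[group] for group, share in mix.items())


def predict_shares(mixes, swing_by_group, previous):
    """Apply the population-weighted swing to each constituency's previous share."""
    return {
        name: previous[name] + predicted_swing(mix, swing_by_group)
        for name, mix in mixes.items()
    }


for name, share in predict_shares(constituency_mix, group_swing, previous_share).items():
    print(f"{name}: {share:.1f}%")
```

The same pattern carries over to consumer data: estimate how preferences shift within each segment from a small, trusted sample, then weight those shifts by each region's known segment mix.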

Find a signal that tells us what consumers really want

Traditionally, consumer products companies have used simpler big data approaches to spot trends, rather than complex techniques that combine multiple data sources and sophisticated models to deliver nuanced predictions. These techniques are common in many other industries, such as pharma. Our pharma clients regularly combine diverse data sources, such as targeted clinical trial data and public NHS records, to understand how new products might fare in the real world – what they call “real world evidence.”

If consumer product companies seek to understand behaviors at a more granular level, they can learn from these approaches.

For example, we worked on a chemical formulation model that used transfer learning to combine a large historic data set of millions of chemicals with a small database of leads, in order to find likely candidate chemicals. This is a situation where the company needed to make very specific predictions rather than spot general trends. So, while the application may seem quite different from predicting consumer trends, similar data techniques and approaches can be applied to both problems.

We have begun to see appetite for more complex statistical and data science approaches to help companies make detailed product design predictions. For example, we combined loyalty card and point-of-sale data in the US to identify how spending in different demographics correlates with taste. This led us to identify several targeted segments, such as an interest in fresh fruit ice cream flavors on the West Coast, which fed into product development and marketing.

Complex consumer behavior needs more sophisticated data models

Such complex statistical approaches, which combine detailed data from controlled groups with bigger data (e.g., on demographics and past behavior), can be deployed to make more detailed predictions across multiple sub-populations, supporting ever greater product personalization.

This allows consumer goods organizations to understand not just the direction of trends, but how those trends vary by region, culture, income, and so on. And it’s not just about spotting the new.

These models can also help monitor gradual evolution, such as how tastes change over time, so that product formulations can be gradually modified. This allows consumer product companies to get under the skin of consumers.

The benefit of this insight is better, more tailored decisions on new products, reformulations, marketing, supply chains, and sales channels.

So, where should you start?

If this approach is new to you, running a proof-of-value exercise is a good starting point. This helps you identify which projects would be most useful and what data you will need for them (and how to get it if you don’t have it), and make sensible decisions about whether the benefit justifies the cost. If the numbers add up, the project should be developed in an agile way, with checks to ensure it delivers as hoped.

Capgemini can help with all this. We have 40 years’ experience dealing with complex data across a wide range of industries. We can help identify which sources of data are likely to be useful, combine disparate data sets to extract meaningful insights, and use proven governance frameworks to efficiently develop customized models to deliver nuanced predictions that inform big decisions.

About the author

Dr. Danica Greetham is part of Capgemini’s consulting team and is an expert in social networks and mathematical modelling of human behavior. Danica consults for major CPG firms and has previously led the Reading University Centre for the Mathematics of Human Behaviour.