Machine learning and Formula 1

The complexity of Formula 1 races is often overlooked, so just how reliable are the predictions created by machine learning techniques?

Recently, machine learning (ML) has grown in popularity and usage across a range of industries, including sport. For example, the Rugby 7s Video Detection program was developed to identify key events during matches and generate automated insights. The sport in which ML is growing fastest is Formula 1, the most data-rich sport in the world, with 120 sensors producing 3GB of data every second. Precision is key in F1, and with further data from simulations and year-round testing, teams have no shortage of data to analyse, especially compared with a sport like rugby.

There are three areas in which ML can be applied: race strategy, logistics and car design. Using various ML techniques, teams can make race predictions and extract vital information that feeds into their strategies, such as the probability of a driver being overtaken. Alongside live data, teams also have 65 years of historical data available to further improve the accuracy of these models, which raises the possibility of using AI to prioritise data for rapid access. Given the complexity of an F1 race, however, the reliability of these predictions must be questioned across different race elements.


Arguably, the most unpredictable element of a race is the weather. Even though teams have access to live weather data, it is still hard to predict exactly what will happen; rain was forecast for the 2020 Hungarian Grand Prix, yet it never arrived, which affected each driver’s tyre and pit-stop strategy.

The accuracy of predictions is also called into question by events that are hard to foresee, which include:

  • Crashes (especially if the safety car is deployed)
  • Penalties incurred during the race
  • Grid penalties
  • Mechanical/technical failures
  • Pit lane incidents

Deep learning can be used to predict when mechanical failures may occur, which addresses one of these issues, but how reliable is it? On average a pit stop costs 20-25 seconds, meaning a poorly timed or misjudged stop could cost the driver a podium and valuable championship points; predictions need to be as accurate as possible.
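To put that 20-25 second figure in context, the break-even arithmetic can be sketched roughly as follows. It assumes fresh tyres give a constant gain per lap, which is a simplification; the function name and the 1.5-second gain are illustrative assumptions, not figures from any team.

```python
import math

def laps_to_recover(pit_loss_s: float, gain_per_lap_s: float) -> int:
    """Laps needed on fresh tyres to win back the time lost in the pit lane.

    Assumes a constant per-lap gain (hypothetical; real tyre performance
    changes from lap to lap).
    """
    if gain_per_lap_s <= 0:
        raise ValueError("fresh tyres must be faster for a stop to pay off")
    return math.ceil(pit_loss_s / gain_per_lap_s)

# Mid-point of the 20-25 s pit-stop cost, with an assumed 1.5 s/lap gain.
print(laps_to_recover(22.5, 1.5))
```

With these assumed numbers a stop takes 15 laps to pay for itself, which is why a stop made too late in the race, or just before a safety car, can be so costly.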

Fan predictions

Some fans are making their own predictions using machine learning, while others are creating visual dashboards to see for themselves which factors are most likely to affect the results.

Top 5 factors to affect race results
Source: F1 Fan Voice

Now that the importance of qualifying position has been established, the probability of winning based on starting position must be examined, assuming all other things are equal – i.e. the driver qualifying first does not have any grid penalties.

Win probability based on starting position
Source: F1 Fan Voice
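A win probability of this kind can be estimated by simply counting how often the winner started from each grid slot. The sketch below is a minimal illustration, not the method used for the chart; the function name and the sample data are hypothetical.

```python
from collections import Counter

def win_probability_by_grid(winner_grid_positions):
    """Empirical share of wins taken from each starting position.

    `winner_grid_positions` holds one entry per race: the grid slot
    the eventual winner started from.
    """
    counts = Counter(winner_grid_positions)
    total = len(winner_grid_positions)
    return {pos: counts[pos] / total for pos in sorted(counts)}

# Hypothetical sample of ten races, not real results.
sample = [1, 1, 2, 1, 3, 1, 2, 10, 1, 2]
for position, probability in win_probability_by_grid(sample).items():
    print(f"P{position}: {probability:.0%}")
```

An estimate like this only reflects what has happened before, so it says nothing about grid penalties, weather or any of the other disruptions listed earlier.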

Importantly, even if a driver is at the front of the grid it does not necessarily mean that they are the favourite to win the race – how does this affect the predictability of each circuit?

Predictability of race circuits
Source: F1 Fan Voice

The Baku circuit in Azerbaijan is the least predictable, but given its limited history of races and results, this is somewhat unsurprising. The pole position driver has only won there once. In 2017 the winner actually started from 10th, which illustrates how unpredictable F1 races can be given the low probability of winning from outside the front row. Clearly, the circuit itself affects how reliably the outcome of a race can be predicted: not all circuits are as easy to predict as others.
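One simple way to compare circuits is the share of races at each one that were won from pole. This is only one possible measure of predictability, and the circuit names and results below are hypothetical placeholders rather than real data.

```python
from collections import defaultdict

def pole_win_rate(results):
    """Share of races at each circuit won from pole (grid position 1).

    `results` is a list of (circuit, winner_grid_position) pairs.
    """
    races = defaultdict(int)
    pole_wins = defaultdict(int)
    for circuit, grid in results:
        races[circuit] += 1
        if grid == 1:
            pole_wins[circuit] += 1
    return {circuit: pole_wins[circuit] / races[circuit] for circuit in races}

# Hypothetical results, not real race data.
results = [
    ("circuit_a", 1), ("circuit_a", 1), ("circuit_a", 2), ("circuit_a", 1),
    ("circuit_b", 10), ("circuit_b", 2), ("circuit_b", 1), ("circuit_b", 3),
]
rates = pole_win_rate(results)
# Lowest rate first: the circuits where pole is the weakest predictor.
for circuit, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{circuit}: {rate:.0%}")
```

With only a handful of races at a circuit, as at Baku, a rate like this rests on very little data and should be read with caution.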

Track changes

The irregularity of the F1 calendar is possibly why circuits vary in predictability: races are constantly being added to or removed from the calendar, meaning data for a specific Grand Prix may be unavailable or outdated. After 35 years the Dutch Grand Prix at Zandvoort is returning – the existing data will be extremely outdated, especially since the track is being rebuilt.

Additionally, the varying circuits used for each national Grand Prix make collecting data more difficult. Since Formula 1’s creation in 1950, over 70 tracks have hosted a Grand Prix; in fact, the Italian Grand Prix is one of very few races to have featured in every season, and even it left its usual home at Monza for one year (1980). The inconsistency in the track data also raises questions about its reliability, making it harder to predict outcomes for each race.

Changes to tracks between seasons don’t help either, as a track is effectively “new” each time it changes, especially if the change affects the track distance. To allow for this, teams would have to create a model of each track that incorporates the addition or removal of various features, along with an algorithm that predicts an average lap time; again, this would be especially demanding for the newly rebuilt Zandvoort.

Zandvoort 1980-89
Source: Wikipedia

Zandvoort 2020
Source: Wikipedia

Rule changes

Constant rule changes make past data hard to use, as earlier races were run under different conditions in many respects. This season saw some changes, and the upcoming season brings an entirely different car design and a spending cap. Clearly, data from old-spec cars cannot be used to predict results or race strategies for newer cars: since the new design means cars will be heavier and slower, it would be wrong to use the old design to predict the new cars’ performance. The problem can be broken down into individual aspects to which machine learning can be applied (like helping determine the aerodynamics when racing behind another driver), but it is impractical to predict the overall performance of a newly designed car using data from a different model.

One aspect to consider is tyre compounds, which determine grip and tyre life, influencing pit stop strategy; theoretically, harder compounds mean fewer stops. Yearly changes regarding tyres include:

  • Compounds available for each race
  • Tyre composition
  • Tyre pressure
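The link between compound hardness and the number of stops can be sketched with some simple arithmetic. The version below assumes every stint runs the same compound to its full life and ignores everything else (regulations, safety cars, traffic); the tyre lives are hypothetical values chosen purely for illustration.

```python
import math

def stops_needed(race_laps: int, tyre_life_laps: int) -> int:
    """Minimum pit stops if every stint runs one compound to its full life."""
    if tyre_life_laps <= 0:
        raise ValueError("tyre life must be positive")
    stints = math.ceil(race_laps / tyre_life_laps)
    return max(stints - 1, 0)

# Hypothetical tyre lives for a 60-lap race; real figures vary by track and season.
for compound, life in {"soft": 15, "medium": 25, "hard": 35}.items():
    print(compound, stops_needed(60, life))
```

Because the tyre lives feed directly into the result, any yearly change to the compounds, composition or pressures changes the answer, so figures learned from one season may not carry over to the next.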

Given this is just one area, it is extremely difficult to apply ML techniques when past data is ultimately a collection of somewhat dissimilar races with a multitude of ever-changing aspects. Other factors which need to be considered are:

  • Track temperature (night races mean lower temperatures)
  • Power units
  • Brakes
  • Suspension
  • Fuelling

Individual predictions as to how each component will perform with different track conditions, along with the probability of it failing, are possible, though combining all these predictions into one overall race strategy would be unreliable.
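One way to see why the combined prediction suffers is to multiply the individual accuracies together. The sketch below assumes the component predictions are independent, which is a strong simplification, and the 95% accuracy figure is a hypothetical value rather than a measured one.

```python
import math

def combined_reliability(component_accuracies):
    """Chance that every component prediction is right at once,
    assuming the predictions are independent (a strong simplification)."""
    return math.prod(component_accuracies)

# Eight hypothetical component models, each assumed to be 95% accurate.
overall = combined_reliability([0.95] * 8)
print(f"{overall:.1%}")
```

Under these assumptions, eight individually strong predictions are all correct together only about two times in three, and every extra component lowers that figure further.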

Technology is less restricted, allowing rapid development in the technology cars use. Most recently, Mercedes’ new DAS system was protested by other teams. The governing body, the FIA, deemed the system legal for 2020, but the 2021 rules have been altered to make it illegal. Previously, Mercedes’ use of a “party mode” also sparked controversy. That the regulations must be updated to keep up with the technology teams develop demonstrates the pace of development. It’s only a matter of time before the ML techniques used behind the scenes catch up in terms of value and can be reliably used to generate race predictions and strategies.

Overall, the vast amounts of data available to teams allow aspects of a race to be studied independently, but the complexity of the factors that make up a race ultimately means that, for now, the use of ML techniques to predict race strategy is flawed. However, despite these problems with the data, there is no doubt that ML and AI will eventually dominate the sport – the only question is how long it will take, and how reliable it will be.



Grace Wilding

Apprentice Software Engineer
