The World Cup: Lessons from Mathematical Models

Published on 6th December 2022

Key Highlights

1. As the tournament approaches the Quarter Finals, the model's mean marginal advantages become more concrete: an advantage of about 1.7% is likely to translate into a one-goal difference.

2. Quality coaching, with leadership in attitude development, and investment in attracting, skilling, and retaining key talent, especially top scorers and goalkeepers, are the bare minimum required to break African teams' tradition of exiting before the Quarter Finals.

3. The priority intervention areas for African teams are mentality, score drive, tactical inventiveness, and team coherence.

4. Setting out and mapping the critical spatio-temporal and geometric aspects of training grounds and practice sessions would benefit from applied geospatial technologies, which enhance precision and accuracy; hence the need to engage geospatial expertise.

5. Going forward, developing policy that effectively addresses the nine variables of winning in the tournament is the critical challenge that Sports Ministers in Africa must champion and steer with zeal and integrity.

A World Cup of Inordinate Surprises

The World Cup captures the imagination and dreams of nations. It brings us together in our diversity. Isn't the World Cup, therefore, a great opportunity to popularise and socialise mathematics, a subject that most learners dread and find boring within classroom walls? Are mathematics scholars modelling the World Cup to show the world that mathematics is the most powerful language for rational, data-based decision-making?

David Sumpter, a professor of mathematics and the author of Soccermatics, has already shown the way in applying mathematics to model football outcomes. His approach is advanced and targets postgraduate students. To demystify football modelling and present a simple but useful model, the present author tested a model during the 2018 World Cup, expanded it from six to nine variables, and is now applying the improved model to predict the outcomes of the more intriguing 2022 World Cup.

The 2022 World Cup has rightly earned the title of a tournament of inordinate surprises, and the model presented here confirms it. In the group stage, the model was grossly off in four of the twenty-four group matches it predicted. The four surprises, outliers par excellence, were:

Group D: Denmark against Australia – The model gave Denmark a 6.5% mean marginal advantage over Australia, but Australia won 1-0.

Group E: Spain against Japan – The model gave Spain a 5.5% mean marginal advantage over Japan, but Japan won 2-1.

Group G: Brazil against Cameroon – The model gave Brazil a 7.9% mean marginal advantage over Cameroon, but Cameroon won 1-0, becoming the first African country to defeat Brazil at the World Cup and ending Brazil's 17-match unbeaten run in group-stage games.

Group H: Portugal against South Korea – The model gave Portugal a 7.9% mean marginal advantage over South Korea, but South Korea won 2-1.

Soccer and Mathematics

The nexus between soccer and mathematics may look far-fetched, but nothing could be further from the truth. Mathematical lessons from soccer can inform decisions that increase the winning chances of African teams, which have yearned for decades for a place in the advanced stages of the World Cup. Kenya's Sports Minister, for example, has promised the country a place at the 2030 World Cup. There is a real and urgent need to involve mathematicians and scientists in recalibrating Africa's football to reach greater heights.

Soccer has been referred to as a game of chance. This description should excite any mathematician, because it makes soccer a practical, go-to arena for applying axiomatic probability. Prof. David Sumpter has gone further still, developing Soccermatics as a postgraduate course in applied mathematics for the football industry. Creative application of knowledge truly knows no boundaries.

Unlike Prof. David Sumpter's advanced approach, the model shared here is simplified to be readily digestible to high school and college students. The goal is to inspire learning by application while exploiting the infectious, pervasive, and global fascination of World Cup moments. The model draws on real-world data and has been used to predict the outcomes of games at both the 2018 World Cup and the ongoing 2022 World Cup. It predicted France's victory in the 2018 World Cup, giving France a 1.9% mean marginal advantage over Croatia in the final. The 2018 experience led to the identification of nine variables of key importance in football modelling. The insights and lessons drawn from it are key to socialising and popularising STEM education while nurturing scientific curiosity, innovation, and creativity among young learners.

The model's prediction for the anticipated tough duel between France and England in the 2022 Quarter Final, scheduled for 10th December, is an interesting case in point, summarised from the model as follows:

“A mathematician's model suggests that France can bank on a knife-edge 1.1% mean marginal advantage over England in the World Cup Quarter Final, likely a one-goal difference. If serendipity favours England at 70%, England can emerge victorious, even by a two-goal margin. If France snatches lady luck from England, England will be soaked in bitter tears as France widens the gap by at least three goals.”

Soccer involves balancing risks with rewards, the quality of scoring chances with their frequency, and defensive with offensive forays. With modern technology, we can measure team-specific events on the pitch and map their distribution as heat maps. We can also quantify geometric synergy; the launch angle of a shot and how it influences how far the ball travels, and hence where it can land in the net (for an idealised projectile, this range is proportional to the sine of twice the launch angle); and statistics on ball possession, pass accuracy, shot conversion, and historical precedent, an aspect of path dependence.
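The sine-of-double-the-angle relationship mentioned above comes from the textbook range formula for a projectile launched and landing at the same height, R = v² sin(2θ)/g, which ignores air drag, spin, and the height of the goal. The minimal Python sketch below assumes those simplifications; the kick speed and angles are illustrative values, not match data.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2

def projectile_range(speed_mps: float, angle_deg: float) -> float:
    """Horizontal range on level ground for an idealised projectile.

    Assumes no air drag and no spin: R = v^2 * sin(2 * theta) / g.
    """
    theta = math.radians(angle_deg)
    return speed_mps ** 2 * math.sin(2 * theta) / G

if __name__ == "__main__":
    # Illustrative kick speed of 25 m/s; not taken from any match data.
    for angle in (15, 30, 45, 60, 75):
        print(f"{angle:2d} deg -> {projectile_range(25.0, angle):5.1f} m")
```

Because sin(2θ) peaks at θ = 45° and is symmetric about it, complementary angles such as 30° and 60° give the same idealised range; real shots deviate from this once drag and spin matter.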

Goal expectation (xG), also known as expected goals, is calculated as a mathematical expectation: the sum of the scoring probabilities of a team's shots. If every shot had the same probability of scoring, this reduces to the product of that probability and the number of shots. Each shot is treated as a random event that either succeeds (a goal) or fails, and xG weights each success by its probability, which varies with where on the rectangular pitch the shot is taken. On this stage, a team's destiny is usually decided within two hours, ending in either tears of joy or tears of regret. Machine learning applied to the football statistics generated makes it even more interesting to simulate possible outcomes with impressive accuracy.
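Treating each shot as a success-or-failure event, the expectation described above is the sum of the individual scoring probabilities, and it equals "probability times number of shots" only when every shot is equally likely to score. The Python sketch below is a minimal illustration under that assumption; the per-shot probabilities are made up, and real xG models estimate them from features such as shot location and angle.

```python
import math
from typing import Sequence

def expected_goals(shot_probabilities: Sequence[float]) -> float:
    """xG as the expected number of goals from a set of shots.

    Each shot is a success/failure event with its own scoring
    probability, so the expectation is the sum of those probabilities.
    """
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return math.fsum(shot_probabilities)

def expected_goals_uniform(n_shots: int, p: float) -> float:
    """Special case where every shot has the same probability p: xG = n * p."""
    return expected_goals([p] * n_shots)

if __name__ == "__main__":
    # Hypothetical per-shot probabilities, for illustration only.
    shots = [0.05, 0.12, 0.76, 0.30, 0.08]
    print(f"xG = {expected_goals(shots):.2f}")            # xG = 1.31
    print(f"xG = {expected_goals_uniform(10, 0.1):.2f}")  # xG = 1.00
```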

Fundamentally, a model should be as simple as possible, yet as complex as necessary to capture all the key variables. Models are like working hypotheses, improved over time as learning from data progresses. After all, as the British statistician George Box rightly put it, all models are wrong, but some are useful. The usefulness of the World Cup model shared here should be judged against its purpose: predicting the most likely winner of a match.

Nine Crucial Soccer Variables

The modelling experience has identified nine variables that are key to delivering a win in this exciting game. Drawing on experience, key informant interviews, observation, expectation, and historical precedent, the modeller assigns each team a percentage score on each variable. The climate variable goes beyond temperature and humidity to include how well the players fit into the new environment, home advantage included.

Resistive nucleus includes the quality of a team’s defence and quality of the goalkeeper, both key to preventing an opponent’s goal.

Serendipity is the stroke of luck inherent in random chances that can favour a team, especially the presumed underdog. These lucky factors tilt the scale and should not be underestimated in this game of chance.

Mentality is key to performance, and not only in soccer. Some teams enter a match with a mindset of possibility and winning. If such a mentality does not degenerate into overconfidence, the rewards are evident. The remaining variables to watch and score for each team are: tactical inventiveness in exploiting subtle opportunities with a high probability of scoring (think of Messi's mellifluous dribble); honed skillset; tenacity gradient, in the sense that tenacity tends to wane as the game advances; score drive, as seen in gifted scorers through the share of attempts, penalty shots included, that are likely to be converted into goals; and team coherence, as evident in pass accuracy, ball possession, and geometric synergy on the field, among others.

Supported by statistics on goal expectation (xG), the weighted average of a team's scores across the variables tends to reveal the team that is likely to win, with or without serendipity. This is how the model predicts that France can bank on a knife-edge mean marginal advantage of 1.1% over England in the upcoming Quarter Final. With luck on England's side, the outcome could flip and reward England with up to a two-goal margin. If luck shifts to France, France could open up a gap of at least three goals.
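The article does not publish the model's weights or its exact formula for the mean marginal advantage. The Python sketch below therefore assumes one plausible reading: each team's nine percentage scores are combined into a weighted average, and the advantage is the difference between the two averages in percentage points. The variable labels, equal weights, and scores are hypothetical, chosen only to show how a single variable such as serendipity can flip a knife-edge prediction.

```python
from typing import Mapping

# The nine variables named in this article (labels are our own shorthand).
VARIABLES = (
    "climate", "resistive_nucleus", "serendipity", "mentality",
    "tactical_inventiveness", "honed_skillset", "tenacity_gradient",
    "score_drive", "team_coherence",
)

def weighted_score(scores: Mapping[str, float],
                   weights: Mapping[str, float]) -> float:
    """Weighted average of a team's percentage scores across the nine variables."""
    total_weight = sum(weights[v] for v in VARIABLES)
    return sum(scores[v] * weights[v] for v in VARIABLES) / total_weight

def mean_marginal_advantage(team_a: Mapping[str, float],
                            team_b: Mapping[str, float],
                            weights: Mapping[str, float]) -> float:
    """ASSUMED definition: difference of weighted averages, in percentage
    points. A positive value favours team_a."""
    return weighted_score(team_a, weights) - weighted_score(team_b, weights)

if __name__ == "__main__":
    # Hypothetical equal weights and scores, for illustration only.
    weights = dict.fromkeys(VARIABLES, 1.0)
    team_a = dict.fromkeys(VARIABLES, 70.0)
    team_a["serendipity"] = 50.0
    team_b = dict.fromkeys(VARIABLES, 68.0)
    team_b["serendipity"] = 50.0
    print(round(mean_marginal_advantage(team_a, team_b, weights), 2))   # 1.78

    # Raising team_b's serendipity score alone flips the sign.
    lucky_b = {**team_b, "serendipity": 70.0}
    print(round(mean_marginal_advantage(team_a, lucky_b, weights), 2))  # -0.44
```

Equal weights are the simplest choice; any weighting the modeller prefers can be passed in without changing the functions.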

Progressive Model Prediction Accuracy

The earlier stages of the tournament are more challenging to predict, but they help calibrate the model for validation in the later stages. As witnessed here, the model performed much better as the tournament advanced. Of the 24 group-stage matches used for calibration, the model predicted 20 satisfactorily, for an estimated accuracy of 80% on a rating scale from “1 = grossly off” to “5 = spot on”.

From the knock-out stage onwards, the model becomes more precise in predicting outcomes, thanks to a sieving and learning process that makes it easier to arrive at a more nuanced and accurate parameterisation of the key variables. At the time of writing, for example, six of the eight Round of 16 matches had been played, and the model's predictions were spot on (level 5) for all of them. All the predictions are available on the author's Facebook page, Nashon Adero.
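The accuracy figures above combine match counts (20 of 24 group matches, six of six knock-out matches played so far) with a 1-to-5 rating scale whose aggregation rule is not stated. The Python sketch below shows two simple ways such figures could be computed: a plain hit rate, which gives about 83% for 20 out of 24, and an assumed rating-based score (mean rating divided by 5). Both functions are illustrative assumptions, not the author's documented method.

```python
from statistics import mean
from typing import Sequence

def hit_rate(correct: int, total: int) -> float:
    """Percentage of matches whose outcome was predicted correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

def rating_accuracy(ratings: Sequence[int]) -> float:
    """ASSUMED scoring rule: mean rating on the 1-5 scale as a percentage of 5."""
    if not ratings or any(r not in range(1, 6) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return 100.0 * mean(ratings) / 5

if __name__ == "__main__":
    print(f"Group stage: {hit_rate(20, 24):.1f}%")        # Group stage: 83.3%
    print(f"Knock-out so far: {hit_rate(6, 6):.1f}%")     # Knock-out so far: 100.0%
    # Hypothetical ratings, for illustration only.
    print(f"Rating-based: {rating_accuracy([5, 4, 4, 1]):.1f}%")
```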

Model Insights and Lessons

The simplified nine-variable model is a practical alternative for soccer prediction. As the tournament approaches the Quarter Finals, the mean marginal advantages become more concrete: an advantage of about 1.7% is likely to translate into a one-goal difference. In the match between Brazil and South Korea, the model's 8% advantage for Brazil translated easily into four goals in the first half. In the match between Croatia and Japan, the model's 2.8% advantage for Croatia suggested a goal difference of just over one; the actual margin was two, in the penalty shoot-out that followed a 1-1 draw. In the match between England and Senegal, England's 4.1% mean marginal advantage suggested a goal difference of just over two; the actual margin was three.
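The worked cases above line up with a simple linear reading of the 1.7%-per-goal rule of thumb. The Python sketch below assumes that reading (goal margin ≈ advantage ÷ 1.7); the article does not state that the model is linear, so treat this as an approximation for checking the quoted figures, not the model itself.

```python
# Rule of thumb from this article: about 1.7 percentage points of
# mean marginal advantage per goal of difference (linearity is assumed).
PCT_PER_GOAL = 1.7

def implied_goal_margin(advantage_pct: float) -> float:
    """Linear reading of the rule of thumb: goals = advantage / 1.7."""
    return advantage_pct / PCT_PER_GOAL

if __name__ == "__main__":
    cases = {
        "France vs England (forecast)": 1.1,
        "Croatia vs Japan": 2.8,
        "England vs Senegal": 4.1,
        "Brazil vs South Korea": 8.0,
    }
    for match, advantage in cases.items():
        print(f"{match}: {advantage}% -> about "
              f"{implied_goal_margin(advantage):.1f} goals")
```

Under this assumption, 2.8% and 4.1% give roughly 1.6 and 2.4 goals, consistent with "just over one" and "just over two", while the 1.1% forecast for France against England gives about 0.6 of a goal, which the article reads as a likely one-goal difference.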

We can now reflect on the performance of the African teams in light of the model shared here. It is evident that these teams must work harder on the nine variables to match the rest of the world. The priority intervention areas are mentality, score drive, tactical inventiveness, and team coherence. Setting out and mapping the critical spatio-temporal and geometric aspects of training grounds and practice sessions would benefit from applied geospatial technologies, which enhance precision and accuracy; hence the need to engage geospatial expertise.

Quality coaching, with leadership in attitude development, and investment in attracting, skilling, and retaining key talent, especially top scorers and goalkeepers, are the bare minimum required to break African teams' tradition of exiting before the Quarter Finals. Going forward, developing policy that effectively addresses the nine variables is the critical challenge that Sports Ministers in Africa must champion and steer with zeal and integrity.

By Nashon Adero

The author, a lecturer at Taita Taveta University, Kenya, is a geospatial and systems modelling expert. He developed COVID-19 models for predicting rising case numbers in 2020 and 2021, which are now part of the book he co-edited in 2021, The Future of Africa in the Post-COVID-19 World. He is also a co-author of Project Design for Geomatics Engineers and Surveyors (2nd edition). Wilson Kibe, a young Kenyan graduate of Kenyatta University, collected and collated the key datasets that informed the 2022 World Cup prediction model.

