The following is an analysis from Goldman Sachs Research economists Jan Hatzius, Jari Stehn and Donnie Millar,
originally published June 3, 2016.
At a high level, our approach is as follows. First, we estimate a regression model to predict the number of goals scored by a particular team (“team i”) against a particular opponent (“team j”), using the entire history of mandatory international matches since 1958, when play for the first European Championship began (a total of 4,719 matches). Following the literature on predicting football matches, we assume that the number of goals scored by team i follows a so-called Poisson distribution whose mean is explained by the following statistical factors:
1. The difference in team performance as reflected in Elo ratings prior to the match. The Elo system was originally devised to rank chess players. It is a composite measure of national football team success that evolves depending on a team's results and the strength of its opponents.
2. The number of goals scored by team i in the last 10 competitive matches.
3. The number of goals conceded by team j in the last 2 competitive matches.
4. A home dummy.
5. A European Championship dummy to capture whether a team performs systematically better at European Championships than in other competitive matches.
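As a sketch, the goal equation above can be written as a log-linear Poisson mean, which is the standard link for Poisson regression. The coefficient values below are illustrative placeholders, not the paper's published estimates:

```python
import math

# Illustrative coefficients for the five factors above.
# These values are placeholders, NOT the published estimates.
BETA = {
    "intercept": 0.10,
    "elo_diff": 0.60,        # Elo rating difference (team i minus team j), rescaled
    "goals_scored": 0.10,    # goals scored by team i in recent competitive matches
    "goals_conceded": 0.10,  # goals conceded by team j in recent competitive matches
    "home": 0.30,            # 1 if team i plays at home
    "euro": 0.10,            # 1 if the match is a European Championship match
}

def expected_goals(features):
    """Poisson mean for goals scored by team i against team j (log link)."""
    linear = BETA["intercept"] + sum(BETA[k] * v for k, v in features.items())
    return math.exp(linear)

# Example: a home team with a modest (rescaled) Elo edge in a Euro match.
lam = expected_goals({"elo_diff": 0.4, "goals_scored": 1.5,
                      "goals_conceded": 1.2, "home": 1, "euro": 1})
```

The exponential link guarantees a positive expected goal count regardless of the covariate values, which is why it is the conventional choice in this literature.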
Second, we use these regression estimates and our assumed Poisson distribution in a Monte Carlo simulation with 100,000 draws to generate a distribution of outcomes for each of the 51 matches, from the opener between France and Romania on June 10 to the final on July 10. We use the rounded prediction of goals scored to determine the outcome of each match during the group stage and the unrounded prediction to pick the winner in the knockout stage.
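The simulation step for a single match can be sketched as follows. The Poisson means passed in are hypothetical, not model output; each draw produces a goal count for both teams, and the outcome frequencies across draws approximate the win/draw/loss probabilities:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def simulate_match(lam_i, lam_j, n_draws=100_000):
    """Draw goals for both teams from independent Poisson distributions
    and tabulate win/draw/loss frequencies for team i."""
    goals_i = rng.poisson(lam_i, n_draws)
    goals_j = rng.poisson(lam_j, n_draws)
    return {
        "win_i": float(np.mean(goals_i > goals_j)),
        "draw": float(np.mean(goals_i == goals_j)),
        "win_j": float(np.mean(goals_i < goals_j)),
    }

# Hypothetical Poisson means: a slight favourite (1.5 expected goals) vs. 1.1.
probs = simulate_match(1.5, 1.1)
```

Repeating this for every fixture, and feeding each simulated winner forward through the bracket, yields a full distribution of tournament outcomes.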
Third, we use the estimation results to generate both a set of probabilities that a particular team reaches a particular stage of the tournament, up to and including the championship, and a modal—that is, single most likely—forecast for the outcome of each match, which we then run forward through the tournament until the final.
Our probabilities are shown in Exhibit 1. The model says that France has a 23% probability of winning the trophy, followed by Germany at 20%, Spain at 14%, and England at 11%. Although Germany has the highest Elo rating, France is favored because of its home advantage.
Exhibits 2 and 3 provide a different perspective by showing the modal prediction for the entire tournament. There are some interesting contrasts with the probabilities in Exhibit 1. For example, Exhibit 1 says that Germany is more likely than Spain to win the tournament because it is more likely to succeed across the entire range of possible tournament configurations. But Exhibit 3 says that in the single most likely case, Spain beats England in Semifinal 1, France beats Germany in Semifinal 2, and France then wins the final—i.e., Germany finishes behind Spain.
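The divergence between a modal bracket and marginal probabilities is easy to reproduce in a toy calculation: a team whose chances concentrate in one bracket can top the single most likely path even when a rival accumulates more total probability across many paths. The path probabilities below are invented for illustration (they deliberately do not sum to one; the remaining paths are omitted):

```python
# Toy tournament: each entry is (champion, probability of one complete
# bracket that produces that champion). Numbers are invented for illustration.
paths = [
    ("A", 0.10),  # team A's single dominant path
    ("B", 0.06),  # team B's probability is spread across several paths
    ("B", 0.05),
    ("B", 0.04),
    ("C", 0.08),
]

# Marginal probability of winning: sum over every path per team.
marginal = {}
for team, p in paths:
    marginal[team] = marginal.get(team, 0.0) + p

modal_champion = max(paths, key=lambda tp: tp[1])[0]   # champion of the likeliest single path
marginal_favourite = max(marginal, key=marginal.get)   # highest total probability
```

Here team A heads the single most likely path, while team B is the favourite once probability is summed across all paths, mirroring the Spain/Germany contrast above.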
Which approach is better, the probabilistic one in Exhibit 1 or the modal one in Exhibits 2 and 3? A modal forecast does have the advantage of being more “crisp.” The sentence “Goldman Sachs says France will win” has a better ring to it than “Goldman Sachs says France has a 23% probability of winning, with Germany close behind.” Nevertheless, we think that a probabilistic approach is more useful—for predicting the outcome of football tournaments and, increasingly, for our day-to-day work on economic forecasting.
Exhibit 4 provides more insight into the results by breaking down the probabilities of winning for the top four teams in a “waterfall chart” format. It shows that the most important factor is the Elo score, followed by home advantage and the European Championship dummy. The chart illustrates that the front-runner position for France derives largely from its home advantage, as its Elo rating is well below Germany’s and also a bit below Spain’s. Meanwhile, Germany benefits from the European Championship dummy, which picks up its historically strong tournament performance.
It is difficult to assess how much faith one should have in these predictions. On the plus side, our approach carefully considers the stochastic nature of the tournament using statistical methods, and we do think that the Elo rating—the most important input into our analysis—is a compelling summary of a team’s track record. On the minus side, we ignore a number of potentially important factors that are difficult to summarize statistically, including the quality of individual players except insofar as it is reflected in the team’s recent track record. And there is no room for human judgment (which may not be such a bad thing, given that none of us is really a football expert, though some are enthusiastic Germany supporters).
One useful cross-check is to compare our results with bookmakers’ odds. Exhibit 5 plots our estimated championship probability against the average probability implied by the odds offered by five different bookmakers. The basic result is clear. Even though our model does not include bookmakers’ odds in any way, the probabilities are quite similar. A possible reason is that professional betting firms use many of the same inputs—such as Elo ratings—in their analysis and that they process the information in ways that are ultimately similar to ours.
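For reference, bookmakers' decimal odds can be converted into implied probabilities by taking reciprocals and normalizing away the bookmaker's margin (the "overround"). The odds below are made up for illustration, not actual quotes:

```python
# Hypothetical decimal odds for the tournament winner (not real quotes).
odds = {"France": 4.0, "Germany": 4.5, "Spain": 7.0, "England": 9.0, "Field": 2.5}

# Reciprocal odds overstate the true probabilities by the bookmaker's margin.
raw = {team: 1.0 / price for team, price in odds.items()}
overround = sum(raw.values())  # > 1.0; the excess is the margin
implied = {team: p / overround for team, p in raw.items()}
```

Averaging such normalized probabilities across several bookmakers gives the market benchmark against which model probabilities can be compared.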
Another useful check is to evaluate the performance of our model for the 2014 World Cup, which followed an essentially identical approach to the one presented here. It is safe to say that we had our hits and misses.
First, performance in the group stage was not great. The model identified only 9 of the 16 advancing teams and failed to predict the elimination of heavyweights Spain and Italy, although it correctly anticipated that England would fly home early.
Second, the model gave Brazil a 48% probability of winning the trophy, by far the highest of all the contestants. That failure illustrates a certain lack of imagination that is inherent in our approach. If the greatest football nation on earth—in terms of both past victories and its 2014 Elo rating—plays a World Cup at home, we are bound to project success. At least the probability was below 50%!
Third, the model did correctly identify three of the four semifinalists before the start of the tournament, namely Argentina, Brazil, and Germany, although it incorrectly picked Spain over the Netherlands.
Fourth, the fully updated version of the model—that is, the projection we sent out before each day of play on the basis of updated Elo ratings and other performance measures—was remarkably accurate during the knockout stage. It correctly predicted the winner of every match except the 7-1 semifinal between Germany and Brazil. But that was, by one estimate, the single most surprising result in World Cup history.
Ultimately, this last predictive failure might best capture the spirit of the exercise. As we said in our comment at the time: “Speaking as forecasters, we regret the miss. But, speaking as Germans, we would note that there are more important things than being right.”
May the best team win and let’s hope that watching Euro 2016 is as much fun as it was to write this article!
Sven Jari Stehn
1. The database is taken from http://www.eloratings.net/.
2. See, for example, A. Heuer, C. Müller, and O. Rubner, "Soccer: Is scoring goals a predictable Poissonian process?" Europhysics Letters, 2010.
3. As it happens, we recently shifted our approach to forecasting near-term Fed policy in a more probabilistic direction. See “Superforecasting Fed Policy,” US Daily, May 23, 2016.
4. This means, for example, that we cannot include factors such as the retirement or injury of a key player.
5. One result that looks surprising to our not-so-expert eyes is the low probability of success for Italy in Exhibit 1. The mechanical reason is that Italy's current Elo rating is not very high and Italy has historically not performed very well at European Championships.
6. See The World Cup and Economics, 2014.
7. See Nate Silver, “The Most Shocking Result in World Cup History,” http://fivethirtyeight.com/datalab/the-most-shocking-result-in-world-cup-history/, July 8, 2014.