Article

On the Élö–Runyan–Poisson–Pearson Method to Forecast Football Matches

by José Daniel López-Barrientos 1,*, Damián Alejandro Zayat-Niño 1, Eric Xavier Hernández-Prado 2 and Yolanda Estudillo-Bravo 2
1 Facultad de Ciencias Actuariales, Universidad Anáhuac México, Naucalpan de Juárez 52786, Mexico
2 Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Puebla 72000, Mexico
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4587; https://doi.org/10.3390/math10234587
Submission received: 14 November 2022 / Revised: 27 November 2022 / Accepted: 29 November 2022 / Published: 3 December 2022
(This article belongs to the Special Issue Probability, Statistics and Their Applications 2021)

Abstract

This is a work about football. In it, we start from two well-known approaches to forecasting the outcome of a football match (or even a full tournament) and take advantage of their strengths to develop a new method of prediction. We illustrate the Élö–Runyan rating system and the Poisson technique with the English Premier League and analyze their accuracy with respect to the actual results. In this first exercise, we obtained an accuracy of 84.37% for the former and 79.99% for the latter. Then, we present a criticism of these methods and use it to complement them, thus introducing the so-called Élö–Runyan–Poisson–Pearson method, which consists of adopting the distribution that best fits the historical distribution of goals to simulate the score of each match. Finally, we obtain a Monte Carlo-based forecast of the result. We test our mechanism by backcasting the 2018 World Cup in Russia, obtaining an accuracy of 87.09%, and we forecast the results of the 2022 World Cup in Qatar.

1. Introduction

This work presents a new methodology by which to forecast the final score of a football match and thus predict the result of a full tournament. It builds on the theory presented in, for instance, Ref. [1], Chapter 1 in [2], and Ref. [3].
We propose a variation of the methods presented in the aforementioned references in the sense that, instead of simulating the number of goals scored and received by each team as a Poisson random variable, we adopt the distribution that best fits the historical distribution of goals according to a basic hypothesis test (as in [4]). Moreover, we use a Bayesian approach by plugging in Élö’s probabilities instead of the a priori assumption that the number of goals for a given crew against another is a uniform average of their goals for and against, respectively.
We illustrate the classic Élö–Runyan method to backcast the final standings of the clubs of several seasons in a national football league, and then we exemplify Poisson’s approach in the final season of that same football league. Finally, we apply the enhanced method we found to a World Cup by making the assumption that the analyses remain valid despite the fact that in the latter case, there are nations instead of clubs. Based on the results previously published in [1], we strongly believe that the knowledge and approaches we found can be used for forecasting other leagues.
Now we summarize the hypotheses behind the model we designed.
Assumption 1.
(i) The goals scored and received by a given team are modelled by stationary continuous-time counting processes whose distributions can be obtained with historical data.
(ii) The number of goals scored by team A against team B depends on the offensive strength of team A, and on the defensive weakness of team B.
(iii) The Élö–Runyan index (see (1) below) can be used to measure the performance of a given team against another one.

1.1. State of the Art and Contribution

The recent literature on forecasting different sports results is vast, and includes, among others, [5,6,7,8,9,10]. Our analyses are based on two cornerstones. The first is Élö’s rating system (see [11] for a quick introduction to the topic) to obtain the probability that a team bests another in a given match. The other is the Poisson simulation of the number of goals scored by two opposing parties, which was thoroughly presented in [12] and can be traced back to [13].
Our work can be located in the gap between references such as [1,14,15]. Indeed, we estimate the probability that a team defeats another one with a Bayesian nonlinear model—Élö’s method—and we apply it by using the Markov chain Monte Carlo iterative simulation technique to attempt to forecast the future result of a full tournament.
We illustrate the Élö–Runyan method in England’s national league, obtaining an accuracy of 84.37%. Then, we backcast another season of the same league with Poisson’s method, obtaining an accuracy of 79.99%. Later, we test the fortified method we devised in the World Cup of Russia in 2018, obtaining an accuracy of 87.09%. Finally, we apply our algorithms to try to predict the outcome of the World Cup of Qatar 2022.

1.2. Computational Resources

To develop our work, we required a sufficiently large database to be able to analyze and compute the necessary estimates for the corresponding matches, seasons, and teams. Our datasets were obtained from football-data and refined with Flash score. The former holds information spanning over 25 years from various leagues and international competitions around the world, whereas we used the latter to sort the matches in the proper chronological order. We used Minitab 21.1.0 and R-4.2.2 for Windows to analyze the datasets and implement our algorithms. All the codes and databases can be found at https://github.com/DonDisparates/qatar2022.git (accessed on 13 November 2022).

1.3. Organization of the Research

The rest of the paper is divided as follows. In the next section, we present a preliminary description of the English Premier League, which we will use as a working example. In Section 3, we present the essentials of Élö’s method, and we illustrate its use by means of a backcast of the English Premier League with information from seasons 2005/2006–2020/2021. In Section 4, we introduce Poisson’s model and replicate the results of the same league for season 2021/2022. Next, in Section 5, we combine the former techniques, show the method’s usage with information from the four years that preceded Russia’s World Cup, and conclude with a forecast of the result of the World Cup Qatar 2022. We present our conclusions in Section 6.

2. The English Premier League: Preliminary Description

The Premier League is considered the best football tournament and the most economically attractive in the world. It accounts for nine of the 20 most valuable teams for the year 2021 according to Forbes magazine, something that attracts sponsors and television networks (cf. [16]).
Indeed, during the 2018–2019 season, the league paid between $119 and $186 million to each club in the First Division. On the other hand, when a team is relegated, the league gives it 55% of the revenue share in the first year, 45% in the second, and 20% in the third. Relegation is also punished by the fans, so much so that troupes like Middlesbrough FC, Hull City, and Sunderland lost 17%, 25%, and 33% of their ticket sales in the season following relegation, respectively. In addition, the sponsors also have a relegation clause, which implies a budget reduction if the faction falls to the Second Division. Finally, a Premier League squad is worth around $445 million, whereas in the Second Division the average is $49 million (see [17]).
The English league was established in 1888 and was made up of 12 founding teams. It was known as The Football League. Later, in 1892, it absorbed its rival, the Football Alliance, and it was divided into two divisions; the two best aggregations of the Football Alliance, along with all the members of the Football League were part of the First Division. The rest of the Alliance clubs formed the Second Division. A Third Division was created in 1920, and in 1958 the Fourth was born. In 1992, the league changed its name to what we know today as the Premier League. Its main objective was precisely to maximize monetary income, especially through television rights (see [18]).
For this part of the project, we built a database with information of all the matches played in the English domestic league from seasons 2005/2006 to 2020/2021. That is 16 complete seasons, 40 different teams—including the five squads that managed to proclaim themselves champions of the competition (Chelsea, Leicester, Liverpool, Manchester City, Manchester United)—and a grand total of 6,080 matches played in the Premier League. For each game, 105 different variables were registered, where the majority are the odds of different bookmakers. Afterward, we cleaned the database and kept only the following information.
  • Season: This identifies the official name of the tournament when the teams met.
  • Matchday: This corresponds to one of the 38 matchdays that are played throughout the calendar of the season.
  • Date: The day and month in which the match was played.
  • Host: This bit of information corresponds to the team that plays in its own stadium.
  • Guest: This stands for the name of the team that plays a game away from home.
  • Goals for host/guest: This corresponds to the goals that are scored by the local/visiting team in each match.
  • Points host/guest: This is the number of points obtained by the local/visiting team according to the outcome of the match.
  • Goals for host/guest: This corresponds to the goals that are scored by the home/away team in each match.
  • Goal difference (GD) host/guest: The difference between the goals scored and the goals received by the home/away team in the match.
Figure 1 illustrates the information we used for the first matchday in season 2005/2006.
Remark 1.
At first sight, it might seem unnecessary to keep track of the goal difference between the teams. However, Table 1 shows that the correlation of this variable and the position in the final leader board is very high: 92.6%.
Table 1 displays the results for season 2005/2006, and Figure 2 shows the corresponding regression curve.
In order to measure the reliability of the model we present in Section 3 below, we compare our classification with the actual results of each tournament. To this end, we divide the leader board into five groups and say that our model is successful if a given team finishes the competition in the correct group. The groups we consider are as follows.
  • Champion: This singleton is the first group.
  • UCL: The second group corresponds to the squads that manage to play the European clubs tournament widely known as the Union of European Football Associations (UEFA) Champions League. This group includes the four best-positioned teams after playing the 38 rounds of the aforementioned championship. That is, the champion and the following three clubs on the leader board.
  • UEL: The teams that finish the season in fifth and sixth places on the leader board get to play in a second European tournament known as the UEFA Europa League.
  • Middle table: This is the largest group. It includes those teams from the seventh to the 17th positions in the final leader board.
  • Relegation: This last group refers to the teams that lose the category at the end of each season.
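The grouping above can be sketched as a small lookup. This is only an illustration: the function name `groups_for` is hypothetical, and the assumption that the relegation group consists of the last three places (18th to 20th) is ours; it is suggested by the middle table ending at 17th place in a 20-team league, but the list above does not state it.

```python
def groups_for(position):
    """Return the evaluation groups for a final leader-board position (1..20).

    The champion belongs to two groups, because the first place also
    qualifies for the UEFA Champions League.
    Assumption: places 18-20 form the relegation group.
    """
    if not 1 <= position <= 20:
        raise ValueError("position must be between 1 and 20")
    groups = []
    if position == 1:
        groups.append("Champion")
    if position <= 4:
        groups.append("UCL")
    elif position <= 6:
        groups.append("UEL")
    elif position <= 17:
        groups.append("Middle table")
    else:
        groups.append("Relegation")
    return groups
```

Under this reading, a forecast counts as a success for a team when the predicted and the actual positions fall in the same group.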
Figure 3 illustrates the information we used for seasons 2005/2006-2011/2012, whereas Figure 4 depicts the time series of the points needed to belong to the first four groups. The comparison between the first column of Figure 3 with Figure 2 emphasizes the relevance of the goal difference with respect to the final standings at the end of the season.
Remark 2.
(i) The required number of points to obtain the championship shows a slight downward trend during the first 11 seasons. In fact, in the eleventh, the team that came out on top (Leicester City) beat initial odds of 5000 to one (cf. [19]). However, between the 2016/2017 and 2019/2020 seasons, the champion attained a historical record of points.
(ii) The score needed to be part of the UEFA Champions League group has shown a fluctuating behavior, with a downward trend in recent years.
(iii) The score needed to avoid relegation has remained almost constant during almost all seasons, except for some tournaments in which the points obtained were very low.
(iv) The score needed to belong to the UEL group kept an ascending trend during the first years and later varied noticeably, as happened with the UCL. This highlights how competitive the league is, and how much performance can change from one tournament to the next.

3. The Rating of Football Squads

There exist three main scoring systems in football. The Fédération Internationale de Football Association (FIFA) Ranking evaluates national teams with data from the last four years, updated monthly. As of 2015, the three main factors it took into account were the number of matches won, the average number of points won in matches in the last twelve months, and the average number of points won in matches prior to the last twelve months. The method was questioned due to the frequent inconsistencies between the position and the results between two opposing factions, as well as the disparities arbitrarily granted by the federation to each corresponding confederation. Unlike the FIFA ranking, the Soccer Power Index measures the performance of each player individually, which depends on the party he faces. This results in simultaneous equations that are solved by means of initial conditions and an iterative calculation supported by computational systems. It returns a classification of the squads from general, offensive, and defensive points of view. Its main problems are the high cost of obtaining the information, along with the fact that the data are not public.
Analogous to the FIFA ranking, the Élö rating system assigns a score to each team, so that it is possible to compare two squads when they match up against each other. The Élö formula is named after the scientist and chessplayer Árpád Élö, who devised a rating algorithm for chessplayers in 1970. This method assumes that along their career, a player’s performance will be normally distributed around an average level. This means that deviations around this level do occur. Applied to a single game, performance is an abstraction that cannot be measured objectively; it depends on the judgment, decisions, and actions of the player in the course of the game. However, it is possible to derive a measurable object concept, the performance ranking, over a sufficiently large number of matches, such as a tournament. This is because the performance of a player in various games does consist of the combination of the average ranking of the competition and the score obtained. The method proved to be so effective, that in 1978, it was implemented by the Fédération Internationale des Échecs (see [3]).
There is a wide variety of methods based on Élö’s effort to forecast the result of a football match (see [11,20] and Élö–Runyan’s classification of all the national squads with a membership to the FIFA). The Élö–Runyan index of a team $(t)$ at time $n$ can be computed with the regression curve of its current performance $E_n(t)$ with respect to its past performance, and that of the squad it faces, say $(t')$,
$$E_n(t) = E_{n-1}(t) + K \cdot G \left( W - \left[ 1 + 10^{-\frac{E_{n-1}(t) - E_{n-1}(t')}{400}} \right]^{-1} \right), \tag{1}$$
where $K$ is an arbitrary weighting constant whose value depends on the character of the match, and is given by
$$K := \begin{cases} 60 & \text{for World Cup finals;} \\ 50 & \text{for continental championship finals and major intercontinental tournaments;} \\ 40 & \text{for major tournaments and continental or World Cup qualifiers;} \\ 30 & \text{for all other tournaments;} \\ 20 & \text{for friendly matches.} \end{cases}$$
In addition, $G$ is an adjustment parameter given by
$$G := \begin{cases} 1 & \text{if the teams tie or there is a difference of one goal,} \\ 1.5 & \text{if there is a two-goal difference,} \\ \frac{11+d}{8} & \text{for a } d\text{-goal difference, with } d = 3, 4, \ldots \end{cases}$$
Finally, the symbol $W$ equals 1 for a win, 0.5 for a draw, and 0 for a loss. Squads with a higher Élö–Runyan rating have a higher probability of winning a game than a troupe with a lower Élö–Runyan rating. The probability that team $(t)$ defeats team $(t')$ is
$$p := \left[ 1 + 10^{-\frac{E_{n-1}(t) - E_{n-1}(t')}{400}} \right]^{-1}. \tag{2}$$
As a consequence, the probability that team $(t')$ defeats team $(t)$ is given by
$$1 - p = \left[ 1 + 10^{-\frac{E_{n-1}(t') - E_{n-1}(t)}{400}} \right]^{-1}.$$
After each game, the Élö–Runyan ratings of the squads (1) are updated according to Algorithm 1.
Algorithm 1: Élö–Runyan rating method (listing provided as an image in the original).
If a team with a higher Élö–Runyan rating wins, only a few points are transferred from the lower-rated cadre. However, if a lower-rated squad wins, then the points transferred from the higher-rated squad are far greater.
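The update rule (1), together with the probability (2) and the adjustment parameter G, can be sketched as follows. This is a minimal illustration rather than the listing of Algorithm 1: the function names are hypothetical, and the symmetric update of the opponent follows from applying (1) to the rival with 1 − W and 1 − p.

```python
def win_probability(rating_a, rating_b):
    """Probability (2) that the team rated rating_a defeats the team rated rating_b."""
    return 1.0 / (1.0 + 10.0 ** (-(rating_a - rating_b) / 400.0))


def goal_factor(goal_difference):
    """Adjustment parameter G as a function of the goal difference."""
    d = abs(goal_difference)
    if d <= 1:
        return 1.0
    if d == 2:
        return 1.5
    return (11 + d) / 8


def update_ratings(rating_a, rating_b, goals_a, goals_b, k=30.0):
    """Apply (1) to both teams after a match; k is the weighting constant K."""
    p = win_probability(rating_a, rating_b)
    if goals_a > goals_b:
        w = 1.0
    elif goals_a == goals_b:
        w = 0.5
    else:
        w = 0.0
    delta = k * goal_factor(goals_a - goals_b) * (w - p)
    # The points gained by one team are exactly the points lost by the other.
    return rating_a + delta, rating_b - delta
```

For two squads that both start at 1500 points, a 1–0 result with K = 30 moves 15 points from the loser to the winner, because p = 0.5 and G = 1.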

The English Premier League: Élö–Runyan Analysis

We consider the information we retrieved from seasons 2005/2006–2020/2021. We assign all teams a base score of 1500 Élö points (which is an average score and is widely used and accepted for football squads) and use (1) with K = 30 to obtain the classifications displayed in Figure 5 and Figure 6.
Figure 7 is the Élö–Runyan analog of Figure 4. A quick comparison yields the following.
  • The score needed to be champion has a negative trend in the last two seasons.
  • The Élö–Runyan score required to maintain the category fluctuates much more than the actual classification (whose score was much more constant).
Table 2 is a confusion matrix to compare our results with the ones presented in Section 2. The rows represent the instances in an actual class, while the columns stand for the instances in a predicted class.
Dividing the trace of the matrix by the sum of all its entries yields a quotient of 84.37%, an acceptable value.
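The accuracy computation just described (trace over total) can be written in a few lines. The matrix used below is a made-up toy example for illustration only; it is not Table 2.

```python
def accuracy(confusion):
    """Trace of a square confusion matrix divided by the sum of all its entries."""
    hits = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return hits / total


# Hypothetical 2x2 matrix: rows are actual classes, columns are predicted classes.
toy_matrix = [[5, 1],
              [2, 12]]
toy_accuracy = accuracy(toy_matrix)  # (5 + 12) / 20
```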
Table 3 and Table 4 show the different percentages of hits we attained for each season with respect to the real database. The last row stands for the weighted average of hits in each season.
Remark 3.
(i) There are several seasons with a 100% success rate, but the 2017/2018 season stands out because the difference between first and second place was 19 points, whereas the difference between the UEL group and the middle table was nine points.
(ii) The group with the most successes is the UCL, with a success rate of approximately 91%, and this is due to the advantage that these teams usually have over the rest; mainly in monetary terms.
(iii) The relegation group has a fairly high success rate. The main reason for this is that most of the relegated teams have only been in the first division for one year; that is, they have only played the season in which they are relegated, because the gap between the demands of playing in the Premier League and in the second division is quite big.
(iv) The UEL group has the lowest level of success; however, this percentage is reasonable because it only has two places, which are usually much more disputed.
(v) It should be noted that there are two groups in which there is an overlap, where the champion and the first classified in the UCL group coincide, so the Champions League group is made up of four teams and not three.
(vi) Lastly, the champion and midtable groups have confidence levels of 81.25% and 90.34%, respectively. Both percentages are reasonable: there are few seasons in which the champion is decided in the last matchdays, and the teams in the middle table group (not counting those that are fighting against relegation or for some European competition) are in the “quiet” zone of the table, which helps to yield a high level of reliability.

4. The Rise of Actual Champions

We devote this section to explaining the essentials of Poisson’s method to forecast the result of a football match. We base our presentation on [1,21] and Chapter 1 in [2] and use the information we retrieved on the Premier League for the purpose of illustration. The core of the technique is simple: let $N(t)$ be the random variable that represents the number of goals scored by squad $(t)$ during a football match against team $(t')$. Naturally, this random variable depends on the strength of the scoring party and the weakness of its rival.
The main assumption of this model is that $N(t)$ follows a Poisson distribution with parameter
$$\frac{\mathring{\lambda}(t) + \tilde{\lambda}(t')}{2}, \tag{3}$$
where $\mathring{\lambda}(t)$ is the mean of the goals scored per match by team $(t)$, and $\tilde{\lambda}(t')$ is the mean of the goals received by squad $(t')$. As is thoroughly covered in [1], this assumption is justified by the convergence in distribution of the Binomial distribution to Poisson’s law when the number of experiments increases ad infinitum as the probability of goal per experiment becomes smaller.
To reflect the effect of playing as hosts and guests, we define the random variables $N_h(t)$ and $N_g(t')$, respectively. Thus, for $N_h(t)$, the intensities $\mathring{\lambda}(t)$ and $\tilde{\lambda}(t')$ referred to by (3) are replaced by $\mathring{\lambda}_h(t)$ and $\tilde{\lambda}_g(t')$, which represent the mean of the goals per match scored by team $(t)$ when they play as hosts and the mean of the goals per match received by squad $(t')$ when they play as guests. Analogously, for $N_g(t')$, the intensities referred to by (3) are replaced by $\mathring{\lambda}_g(t')$ and $\tilde{\lambda}_h(t)$, which represent the mean of the goals per match scored by the troupe $(t')$ when they play as guests and the mean of the goals per match received by the crew $(t)$ when they play as hosts.
Next, it suffices to generate a realization of the random variables $N_h(t)$ and $N_g(t')$ to simulate the result of the pairing $(t)$ versus $(t')$. To this end, it is possible to use the cumulative product algorithm presented in, for instance, Section 4.2 in [22] and Algorithm 1 in [1].
The procedure we just described enables us to simulate one result of the direct match between $(t)$ and $(t')$. However, one simulation is by no means sufficient to forecast the result. We can use the weak law of large numbers (see Chapters 7.4 and 7.5 in [23], Chapter 8 in [24], Chapter 8.4 in [25], Chapter 8 in [26,27] and Theorem VI.1 in [1]) to simulate the result of the match between these teams a large number of times. Next, we compute the proportion of times that each outcome is observed. The weak law of large numbers allows us to interpret these numbers as the probabilities that $(t)$ wins, $(t')$ wins, and the match ends up tied. This technique is widely known as the Monte Carlo simulation technique (see, for instance, Exercise 5.6 in [28]). If we multiply each team’s winning probability by three, add the probability of a tie multiplied by one, and add the result to the points already earned by that squad, we obtain an expected distribution of the points disputed in that match.
The final step is to simulate each match in the competition by means of this procedure to produce the final standings of the tournament.
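The procedure of this section can be sketched for a single match as follows. This is a minimal illustration under stated assumptions: the function names, the argument names, and the default of 10,000 iterations per match are hypothetical choices, and the Poisson draw uses the cumulative product of uniform variates.

```python
import math
import random


def poisson_draw(lam, rng):
    """Cumulative-product method: count uniforms until their product drops to exp(-lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def forecast_match(host_scored, guest_conceded, guest_scored, host_conceded,
                   n_sims=10_000, seed=0):
    """Monte Carlo forecast of one match with the intensities of (3).

    Arguments are historical means of goals per match: scored by the host at
    home, received by the guest away, scored by the guest away, and received
    by the host at home.
    """
    rng = random.Random(seed)
    lam_host = (host_scored + guest_conceded) / 2
    lam_guest = (guest_scored + host_conceded) / 2
    host_wins = draws = guest_wins = 0
    for _ in range(n_sims):
        goals_host = poisson_draw(lam_host, rng)
        goals_guest = poisson_draw(lam_guest, rng)
        if goals_host > goals_guest:
            host_wins += 1
        elif goals_host == goals_guest:
            draws += 1
        else:
            guest_wins += 1
    p_host, p_draw, p_guest = host_wins / n_sims, draws / n_sims, guest_wins / n_sims
    return {
        "host_win": p_host,
        "draw": p_draw,
        "guest_win": p_guest,
        # Expected points: three per win and one per draw.
        "host_points": 3 * p_host + p_draw,
        "guest_points": 3 * p_guest + p_draw,
    }
```

Summing the expected points over all the matches of a season yields a simulated leader board.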

The English Premier League: Poisson Analysis

Table 5 displays the resulting leader board of the 2021/2022 season of the English Premier League when we use the procedure we just described with the information from that season. In this simulation, we used 10,000 iterations of the tournament.
The weighted average accuracy in this exercise is 79.99%. This is not a bad result. However, as we have previously stated, the model we use heavily relies on the assumption that the distribution of scored and received goals follows Poisson’s law. This means that the intensity described by (3) remains constant throughout the match. This condition is not uniformly met, which is reflected in the discrepancies between the simulated number of points and the actual number of points obtained by each team.
A plausible alternative is to keep the assumption that the distribution of goals is Poisson’s with mean $\Lambda$, and suppose further that such $\Lambda$ is a random variable itself with a suitable distribution, for instance, a Type III-Pearson’s distribution (i.e., a Gamma random variable). This approach is very standard for actuarial practitioners modeling the number of claims in insurance (see, for instance, Example 12.3.1 in [29] and [30] pp. 30–33). We borrow the next result from Chapter 3 in [30] and include it here for the sake of completeness of our presentation.
Theorem 1.
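Let $[N \mid \Lambda]$ follow Poisson’s law with mean $\Lambda$, and let $\Lambda$ be a random variable with a $\Gamma(\alpha, \beta)$-distribution, where $\alpha > 0$ is a shape parameter and $\beta > 0$ is a rate parameter. That is, $\Lambda$ has a density function given by
$$f_\Lambda(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda},$$
for $\lambda > 0$, where $\Gamma(\alpha) := \int_0^\infty t^{\alpha - 1} e^{-t} \, dt$. Then,
$$P(N = n) = \frac{\Gamma(n + \alpha)}{n! \, \Gamma(\alpha)} \left( \frac{\beta}{1 + \beta} \right)^{\alpha} \left( \frac{1}{1 + \beta} \right)^{n}, \tag{4}$$
for $n = 0, 1, 2, \ldots$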
The random variable $N$ referred to by (4) in the above result follows the so-called negative binomial distribution with mean and variance given by
$$m = \frac{\alpha}{\beta}, \tag{5}$$
$$s^2 = \frac{\alpha}{\beta} \left( 1 + \frac{1}{\beta} \right), \tag{6}$$
respectively. Note, in particular, that the variance is larger than the mean. This property holds in general for all mixtures of Poisson random variables. We will profit from it in the next section.
Proof of Theorem 1.
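The theorem of total probability yields
$$P(N = n) = \int_0^\infty P(N = n \mid \Lambda = \lambda) f_\Lambda(\lambda) \, d\lambda = \int_0^\infty \frac{\lambda^{n}}{n!} e^{-\lambda} \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha - 1} e^{-\beta \lambda} \, d\lambda = \frac{\beta^{\alpha}}{\Gamma(\alpha) \, n!} \int_0^\infty \lambda^{n + \alpha - 1} e^{-\lambda (1 + \beta)} \, d\lambda.$$
Writing $z := \lambda (1 + \beta)$ (and $dz = (1 + \beta) \, d\lambda$), we have
$$P(N = n) = \frac{\beta^{\alpha}}{n! \, \Gamma(\alpha)} \int_0^\infty \left( \frac{z}{1 + \beta} \right)^{n + \alpha - 1} e^{-z} \frac{1}{1 + \beta} \, dz = \frac{\beta^{\alpha}}{n! \, \Gamma(\alpha)} \left( \frac{1}{1 + \beta} \right)^{n + \alpha} \int_0^\infty z^{n + \alpha - 1} e^{-z} \, dz = \frac{\beta^{\alpha}}{n! \, \Gamma(\alpha)} \left( \frac{1}{1 + \beta} \right)^{n + \alpha} \Gamma(n + \alpha).$$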
Rearranging this expression yields (4). This completes the proof. □
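Theorem 1 can be checked numerically by drawing Λ from a Gamma law, drawing N from Poisson’s law with that mean, and comparing the observed frequencies with the mass function (4). The sketch below is only an illustration: the function names and the parameter values α = 2, β = 1.5 are arbitrary assumptions, and Python’s `random.gammavariate` takes a scale parameter, so the rate β enters as 1/β.

```python
import math
import random


def neg_binomial_pmf(n, alpha, beta):
    """Mass function (4), evaluated on the log scale for numerical stability."""
    log_p = (math.lgamma(n + alpha) - math.lgamma(n + 1) - math.lgamma(alpha)
             + alpha * math.log(beta / (1 + beta)) - n * math.log(1 + beta))
    return math.exp(log_p)


def draw_mixture(alpha, beta, rng):
    """Draw Lambda ~ Gamma(shape=alpha, rate=beta), then N ~ Poisson(Lambda)."""
    lam = rng.gammavariate(alpha, 1.0 / beta)  # second argument is the scale
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare(alpha=2.0, beta=1.5, n_sims=20_000, seed=0, n_values=6):
    """Rows of (n, empirical frequency, value of (4)) for n = 0, ..., n_values - 1."""
    rng = random.Random(seed)
    draws = [draw_mixture(alpha, beta, rng) for _ in range(n_sims)]
    return [(n, draws.count(n) / n_sims, neg_binomial_pmf(n, alpha, beta))
            for n in range(n_values)]
```

With a large number of draws, the two last columns returned by `compare` should agree up to sampling noise, and the sample variance should exceed the sample mean, as (5) and (6) predict.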
In the next section, we will combine the schemas we have presented so far. However, we will first test how well the Poisson random variable fits the number of goals scored and received by the opponents. When a test of hypothesis rejects it, we will replace it with the negative binomial (resp. binomial) distribution if the variance is larger (resp. smaller) than the mean. This approach will allow us to consider random changes in the intensities of the performance of the squads.

5. Test of the Élö–Runyan–Poisson–Pearson Model

The methods we presented in Section 3 and Section 4 can confidently be used for classifying football factions. Actually, the Élö–Runyan method has been extensively used for forecasting the outcome of any particular match between crews in the same league, whereas the Poisson method has been used to forecast the score of matches between teams that do not necessarily face each other with regularity. In this section, we propose a modification of the Poisson method so as to try to forecast the result of the upcoming World Cup in Qatar 2022. To this end, we develop the methodology in the following subsection, test it in the results of the last World Cup, and finalize by giving a forecast of Qatar’s outcome in the last subsection.

5.1. The Élö–Runyan–Poisson–Pearson Model

To begin with, it is mandatory to rely on a sufficiently large database with the results of each team. The idea is to keep track of the number of goals scored and received by the squads as hosts and guests for a reasonable period. Then, we compare the mean to the variance and run a goodness-of-fit test to decide which distribution should be used to simulate each of these quantities.
Next, it is necessary to estimate the parameters of the chosen distribution. Then, for each match, we proceed to generate the number of goals $M_h(t)$ scored by team $(t)$ when playing as hosts against team $(t')$ by means of the following tuned version of (3),
$$M_h(t) = \left\lfloor p \left( \mathring{N}_h(t) + \tilde{N}_g(t') \right) \right\rfloor, \tag{7}$$
where $\mathring{N}_h(t)$ stands for the number of goals scored by team $(t)$ as hosts (regardless of the performance of the team $(t')$), $\tilde{N}_g(t')$ represents the number of goals received by team $(t')$ when playing as guests (regardless of the performance of the team $(t)$), and $p$ is the Élö–Runyan probability that the squad $(t)$ bests squad $(t')$ according to (2).
Analogously, we generate the number of goals $M_g(t')$ scored by team $(t')$ when playing as guests against team $(t)$,
$$M_g(t') = \left\lfloor (1 - p) \left( \mathring{N}_g(t') + \tilde{N}_h(t) \right) \right\rfloor, \tag{8}$$
where $\mathring{N}_g(t')$ stands for the number of goals scored by team $(t')$ as guests (regardless of the performance of the team $(t)$) and $\tilde{N}_h(t)$ represents the number of goals received by team $(t)$ when playing as hosts (regardless of the performance of the team $(t')$). Iterating (7) and (8) and using the weak law of large numbers yields a forecast of the match between teams $(t)$ and $(t')$.
It should be noted as well that a fundamental difference between (7) and (8) and (3) is the fact that there are no a priori equal weighting factors for each of the parameters of the random variables to be generated. Indeed, in (3), it is just as likely to score a goal due to the party’s own strength as to the other squad’s weakness. In contrast, in (7) and (8), we substitute the original 1/2 factors by the Élö–Runyan probabilities given by (2). This measure enables us to account for the strength of the team which has a better Élö–Runyan rating.
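A single simulated score under (7) and (8) might look as follows. This sketch rests on two assumptions that should be read with care: we take (7) and (8) to be the floor of the Élö–Runyan-weighted sum of the two simulated goal counts, and we draw all four counts from Poisson’s law for simplicity (the model replaces it by a binomial or negative binomial law when the test rejects it). All function and argument names are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Cumulative-product draw from Poisson's law with mean lam."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_score(p, host_for, guest_against, guest_for, host_against, rng):
    """One realization of (M_h, M_g); p is the probability (2) that the host wins."""
    n_host_for = poisson_draw(host_for, rng)            # goals scored by hosts
    n_guest_against = poisson_draw(guest_against, rng)  # goals received by guests
    n_guest_for = poisson_draw(guest_for, rng)          # goals scored by guests
    n_host_against = poisson_draw(host_against, rng)    # goals received by hosts
    m_host = math.floor(p * (n_host_for + n_guest_against))
    m_guest = math.floor((1.0 - p) * (n_guest_for + n_host_against))
    return m_host, m_guest


def forecast(p, host_for, guest_against, guest_for, host_against,
             n_sims=10_000, seed=0):
    """Proportion of simulated matches won by the hosts, tied, and won by the guests."""
    rng = random.Random(seed)
    tally = {"host_win": 0, "draw": 0, "guest_win": 0}
    for _ in range(n_sims):
        h, g = simulate_score(p, host_for, guest_against, guest_for, host_against, rng)
        key = "host_win" if h > g else "draw" if h == g else "guest_win"
        tally[key] += 1
    return {key: value / n_sims for key, value in tally.items()}
```

The floor keeps both simulated scores in the integers, and the weights p and 1 − p tilt the score toward the squad with the better rating.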
We set the null hypothesis that the distribution of the goals is Poisson, and distinguish three cases.
(i) In the case that we fail to reject this hypothesis, we take advantage of the fact that the Poisson mass function can be recursively written as
$$P(N = n + 1) = \frac{\lambda}{n + 1} P(N = n), \tag{9}$$
for $\lambda > 0$ and $n = 0, 1, \ldots$, with the initial condition $P(N = 0) = e^{-\lambda}$, and mirror this fact by means of Algorithm 2 (which we borrowed from Chapter 4.2 in [22]) to simulate the number of goals.
(ii) If we reject the null hypothesis that the distribution of goals is Poisson, and the variance of the goals is lower than the mean, we use a binomial distribution with mean $\bar{x}$. In this case, we refer to Section II in [1] to interpret $\frac{\bar{x}}{90}$ as the probability that a goal is scored at any of the 90 min in a match and take 90 as the total number of trials a team has to score. We profit from the fact that the binomial mass with parameters 90 and $\frac{\bar{x}}{90}$ can be recursively written as
$$P(N = n + 1) = \frac{90 - n}{n + 1} \cdot \frac{\bar{x}}{90 - \bar{x}} P(N = n), \tag{10}$$
for $n = 0, 1, \ldots, 90$, with initial condition $P(N = 0) = \left( 1 - \frac{\bar{x}}{90} \right)^{90}$, and use Algorithm 3 (which we take from Chapter 4.3 in [22]) to simulate binomial random variables.
Algorithm 2: Generation of a Poisson realization (listing provided as an image in the original).
Algorithm 3: Generation of a binomial realization (listing provided as an image in the original).
(iii) If we reject the null hypothesis that the distribution of goals is Poisson, and the variance is larger than the mean, we take advantage of the fact that the negative binomial mass function can be recursively written as
$$P(N = n + 1) = \frac{n + \alpha}{(n + 1)(1 + \beta)} P(N = n), \tag{11}$$
for $n = 0, 1, \ldots$, with the initial condition $P(N = 0) = \left( \frac{\beta}{1 + \beta} \right)^{\alpha}$, use the method of moments (see, e.g., Chapter VII.2.1 in [31]) and (5) and (6) to obtain the estimates of the parameters
$$\hat{\alpha} = \frac{\bar{x}^2}{s^2 - \bar{x}}, \tag{12}$$
$$\hat{\beta} = \frac{\bar{x}}{s^2 - \bar{x}}, \tag{13}$$
and mirror these facts by means of Algorithm 4 to simulate negative binomial random variables.
Algorithm 4: Generation of a negative binomial realization (listing provided as an image in the original).
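The three cases above share one mechanism: start from P(N = 0) and walk up the recursion until the accumulated probability exceeds a uniform variate. The following sketch illustrates that inverse-transform idea with the recursions (9)–(11) and the moment estimates (12) and (13). It is not a transcription of Algorithms 2–4; the function names and the safety bound `n_max` are hypothetical, and the sample variance is computed with the n − 1 denominator.

```python
import math
import random
import statistics


def draw_from_recursion(p0, ratio, rng, n_max):
    """Inverse transform: ratio(n) must return P(N = n + 1) / P(N = n)."""
    u = rng.random()
    n, prob, cdf = 0, p0, p0
    while u >= cdf and n < n_max:  # n_max guards against rounding in the far tail
        prob *= ratio(n)
        n += 1
        cdf += prob
    return n


def draw_poisson(lam, rng):
    """Recursion (9) with initial condition exp(-lam)."""
    return draw_from_recursion(math.exp(-lam), lambda n: lam / (n + 1), rng, 1000)


def draw_binomial(mean, rng, trials=90):
    """Recursion (10): binomial law with parameters trials and mean / trials."""
    p0 = (1 - mean / trials) ** trials
    return draw_from_recursion(
        p0, lambda n: (trials - n) / (n + 1) * mean / (trials - mean), rng, trials)


def draw_neg_binomial(alpha, beta, rng):
    """Recursion (11) with initial condition (beta / (1 + beta)) ** alpha."""
    p0 = (beta / (1 + beta)) ** alpha
    return draw_from_recursion(
        p0, lambda n: (n + alpha) / ((n + 1) * (1 + beta)), rng, 1000)


def moment_estimates(goals):
    """Estimates (12) and (13); only defined when the variance exceeds the mean."""
    mean = statistics.fmean(goals)
    variance = statistics.variance(goals)
    if variance <= mean:
        raise ValueError("the sample is not overdispersed")
    return mean ** 2 / (variance - mean), mean / (variance - mean)
```

For an overdispersed sample of goals, `moment_estimates` returns a pair whose quotient equals the sample mean, in agreement with (5).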
The general Élö–Runyan–Poisson–Pearson model to forecast the score in a direct match between two football teams is as in Algorithm 5.
Algorithm 5: The Élö–Runyan–Poisson–Pearson model (listing provided as an image in the original).
We complete this subsection with a few comments on Algorithm 5.
Remark 4.
(a) A plausible interpretation of (3) is that the number of goals scored by a team is due to the offensive power of the team itself and to the defensive weakness of its rival. Algorithm 5 emphasizes this assumption by acknowledging the performance of a team: it updates the Élö–Runyan ratings after it has simulated a match. We develop the potential of this feature in Algorithms 6 and 7 below.
(b) The random variables (7) and (8) are floor functions of convex linear combinations of discrete-type random variables. This ensures that we maintain the simulated values in the realm of integral numbers for both squads.
(c) Because real data rarely yield a sample mean that is exactly equal to the sample variance, we use the goodness-of-fit test in all three cases. This allows us to treat these two statistics as statistically equivalent whenever the Poisson hypothesis is not rejected.
(d) For the case where the variance is lower than the mean, we follow the approach devised in [1] and Chapter 1 in [2] and choose 90 and $\bar{x}/90$ as the parameters of the corresponding binomial random variable. Another possibility would be to estimate the parameters by means of another method.
(e) The estimates (12) and (13) are the standard choice when negative binomial random variables are used in the insurance context. However, this is not the only alternative: it is also possible to use the method of maximum likelihood. In this case, we obtain the system of equations
$$\frac{\beta}{1+\beta} - \frac{T\alpha}{T\alpha + \sum_{i=1}^{T} y_i} = 0,$$
$$T \ln\frac{T\alpha}{T\alpha + \sum_{i=1}^{T} y_i} - T\,\frac{\int_0^\infty t^{\alpha-1} e^{-t} \ln t \, dt}{\Gamma(\alpha)} + \sum_{i=1}^{T} \frac{\int_0^\infty t^{\alpha+y_i-1} e^{-t} \ln t \, dt}{\Gamma(\alpha+y_i)} = 0,$$
where $y_1, y_2, \ldots, y_T$ is the sample of goals in each match. These expressions can be solved for $\alpha$ and $\beta$ via (for example) the Newton–Raphson method.
(f) Algorithms 2–4 profit from the widely known recursion formulas (9)–(11) and their corresponding initial conditions. However, it turns out that this is possible only if the mass function of $N$ admits constants $a$ and $b$ such that
$$P(N = n) = \left(a + \frac{b}{n}\right) P(N = n-1) \quad \text{for } n = 1, 2, \ldots$$
Surprisingly enough, there are no other random variables that meet this condition (see Theorem 17.3 in [32]). This fact explains our decision to stick to these particular random variables.
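As a short check of item (f), shifting the index in the Poisson, binomial and negative binomial recursions from $n+1$ to $n$ and matching them against $a + b/n$ gives the constants below. The symbols $m$ and $p$ for the binomial parameters are introduced here for brevity; in the model above they are $m = 90$ and $p = \bar{x}/90$.

```latex
\begin{aligned}
&\text{Poisson}(\lambda): && a = 0, && b = \lambda, \\
&\text{Binomial}(m, p): && a = -\frac{p}{1-p}, && b = \frac{(m+1)\,p}{1-p}, \\
&\text{Negative binomial}(\alpha, \beta): && a = \frac{1}{1+\beta}, && b = \frac{\alpha - 1}{1+\beta}.
\end{aligned}
```

For instance, the negative binomial ratio $\frac{n - 1 + \alpha}{n(1+\beta)}$ splits into $\frac{1}{1+\beta} + \frac{\alpha - 1}{n(1+\beta)}$, which is exactly the form $a + b/n$.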

5.2. Russia 2018

Every four years, the representatives of all six confederations that compose FIFA gather in the most important tournament for any sport on the face of the Earth: the FIFA World Cup. The participating confederations are
  • the Asian Football Confederation (AFC);
  • the UEFA;
  • the Confédération Africaine de Football (CAF);
  • the Confederation of North, Central American and Caribbean Association Football (CONCACAF);
  • the Confederação Sul-Americana de Futebol (CONMEBOL); and
  • the Oceania Football Confederation.
The current format of the FIFA World Cup comprises two phases. In the first round, the 32 invited teams are divided into eight groups, and in each group they play three matches among themselves in a round-robin mode. Afterward, the best two teams of each group move on to a knockout stage of four rounds.
We list below the names and the Élö–Runyan ratings (computed with Algorithm 1) of the teams that played in the Russia 2018 World Cup, according to the confederation to which each belongs. We obtained the Élö–Runyan ratings by analyzing the information of all the matches of the teams listed below in the period from 14 July 2014 until 13 June 2018 (that is, from right after Brazil's World Cup until the eve of Russia's World Cup). We applied Algorithm 1 with a uniform K = 20 for all the matches to account for the fact that, during the COVID-19 pandemic, access to some of the strongest teams around the world was restricted. The initial Élö–Runyan rating we used for all the teams was 1500.
  • UEFA: Belgium (1633.222), Croatia (1724.443), Denmark (1564.427), England (1604.612), France (1742.171), Germany (1426.583), Iceland (1550.110), Poland (1456.364), Portugal (1516.734), Russia (1577.244), Serbia (1485.873), Spain (1543.799), Sweden (1557.115), and Switzerland (1581.902).
  • CONMEBOL: Argentina (1558.131), Brazil (1635.239), Colombia (1626.491), Peru (1521.952), and Uruguay (1646.054).
  • CONCACAF: Costa Rica (1422.986), Mexico (1485.310), and Panama (1412.645).
  • CAF: Egypt (1377.147), Morocco (1374.678), Nigeria (1384.316), Senegal (1377.673), and Tunisia (1371.521).
  • AFC: Australia (1447.447), Iran (1458.788), Japan (1513.472), Saudi Arabia (1453.555), and South Korea (1443.991).
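To make the roles of K and of the initial rating of 1500 concrete, here is a minimal sketch of the classical Élö update in Python. It is not a transcription of Algorithm 1, which is not reproduced in this section and may weight results differently (for example, by goal difference or home advantage); the function names and the 400-point logistic scale are the textbook convention and are assumptions here.

```python
def expected_score(rating_a, rating_b):
    """Classical Elo expectation of team A against team B (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the updated pair of ratings after one match.

    score_a is 1 for a win of team A, 0.5 for a tie and 0 for a loss.
    The update is zero-sum: what A gains, B loses.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Two teams that start at the common initial rating of 1500; team A wins.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0, k=20.0)
```

Under this convention a larger K makes each result move the ratings more, which is the lever the text uses when it switches from K = 20 for the intercup matches to a larger value for World Cup matches.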
We devote this subsection to replicating the Russian edition of the World Cup so as to measure the accuracy of the method we proposed in Section 5.1. It is worth mentioning that, although in practice there is only one host team (the organizer), we have decided to keep the host–guest format because it accounts for the performance of the squads during the tournament.
The model we use for the first phase of the tournament is essentially the same one we used to produce Table 5. However, we applied Algorithm 5 (with K = 60, which reflects the fact that we intend to simulate a World Cup) with the corresponding ratings listed above to forecast one outcome of the first match, and then we used the Monte Carlo simulation technique to compute averages of the Élö–Runyan ratings of the host and guest teams, together with
  • the proportion of times that the host wins,
  • the proportion of times that the guest wins, and
  • the proportion of times when the match ends up tied.
Then, for each squad, we multiply its proportion of wins by three and the proportion of ties by one, and add the result to the points it has already earned. This procedure yields an expected distribution of the points disputed in each match. Algorithm 6 summarizes this. The start date of the World Cup is assumed to be n.
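The point-allocation step can be sketched as follows. This is a simplified Python illustration, not Algorithm 6 itself: the score generator is passed in as a function, the rating updates are left out, and `toy_score` is a purely illustrative placeholder rather than the model of Section 5.1.

```python
import random


def expected_points(simulate_score, trials=10_000, rng=None):
    """Estimate win/tie proportions by Monte Carlo and convert them to points."""
    rng = rng or random.Random()
    host_wins = guest_wins = ties = 0
    for _ in range(trials):
        host_goals, guest_goals = simulate_score(rng)
        if host_goals > guest_goals:
            host_wins += 1
        elif host_goals < guest_goals:
            guest_wins += 1
        else:
            ties += 1
    p_host, p_guest, p_tie = (c / trials for c in (host_wins, guest_wins, ties))
    return {
        "p_host": p_host,
        "p_guest": p_guest,
        "p_tie": p_tie,
        # Three points per win, one point per tie, weighted by the proportions.
        "host_points": 3 * p_host + p_tie,
        "guest_points": 3 * p_guest + p_tie,
    }


def toy_score(rng):
    """Placeholder score generator (uniform integer scores), for illustration only."""
    return rng.randint(0, 3), rng.randint(0, 2)


result = expected_points(toy_score, trials=10_000, rng=random.Random(2018))
```

Because a tie hands out two points in total and a decided match three, the two expected scores always add up to 3 minus the tie proportion, which explains the non-integer point totals reported for the simulated groups.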
As for the knockout round, we ran all T = 10,000 simulations of each match and chose the team with the largest probability of winning as the one who stayed in the competition.
Tables 6–13 display the initial arrangement of the teams in the tournament, along with the mean and variance of the goals they scored and received in the intercup period in all the matches they played. We chose the random variable used for simulating the goals scored and received by running a goodness-of-fit test with the null hypothesis that the distribution was Poisson. When we failed to reject this null hypothesis, we chose a Poisson random variable (see Algorithm 2). Otherwise, we selected a binomial random variable (see Algorithm 3) if the mean was greater than the variance, and a negative binomial distribution with parameters (12) and (13) (see Algorithm 4) in the remaining case.
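The selection rule just described can be written compactly. In the sketch below the outcome of the goodness-of-fit test is taken as an input flag, because the test statistic itself is not restated in this section; the function name, the returned dictionary keys, and the error raised when the variance equals the mean exactly (where the moment estimates are undefined) are assumptions made here.

```python
def choose_law(mean, var, poisson_rejected, trials=90):
    """Return the label used in the tables and the parameters of the chosen law."""
    if not poisson_rejected:
        return "Poi", {"lam": mean}
    if var < mean:
        # Underdispersion: binomial with 90 trials and success probability mean/90.
        return "Bin", {"trials": trials, "prob": mean / trials}
    if var == mean:
        raise ValueError("moment estimates are undefined when var equals mean")
    # Overdispersion: negative binomial with the method-of-moments estimates.
    return "NB", {"alpha": mean ** 2 / (var - mean), "beta": mean / (var - mean)}


# Entries taken from Tables 13 and 7; the rejection flags are implied by the laws shown there.
law_senegal, params_senegal = choose_law(1.36, 0.69, poisson_rejected=True)
law_iran, params_iran = choose_law(0.41, 0.60, poisson_rejected=True)
```

Note that many table entries with a variance well away from the mean are still labeled Poisson: the comparison of mean and variance only matters once the test has rejected the Poisson hypothesis.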
Algorithm 6: Results of the first phase for a World Cup group. [Listing provided as image i006 in the published article.]
We initialized Algorithm 6 with the Élö–Runyan ratings we listed above and set T = 10,000 to obtain the results of this phase. The results are as in Table 14. To measure the accuracy of our simulations, we have opted for a format which is analogous to that of Table 5.
The knockout stage of the FIFA World Cup is the second and final stage of the competition, following the 48 matches of the group stage. It is subdivided into four substages: round of 16, quarterfinals, semifinals, and a final match. The teams that survive the first phase line up according to Figure 8 and start the second phase.
To simulate each match in this phase, we produced Algorithm 7, which is a variant of Algorithm 6. The main difference is that Algorithm 7 does not assume the presence of four teams and, therefore, does not make the allocation of points from lines 1–2 and 15. Above all, it allows only one team to move on to the next phase. Also, observe that Algorithm 7 breaks ties via a coin flip (see lines 14–18), thus mirroring the popular saying: “Penalty shoot-outs are basically crap shoots”.
Algorithm 7: Result of a second-phase match in a World Cup group. [Listing provided as image i007 in the published article.]
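A knockout match can be sketched along the same lines. The Python fragment below reads the coin flip as applying to every simulated match that ends level, which is our interpretation of the description of Algorithm 7 and not a transcription of it; the function name and the `simulate_score` argument are likewise hypothetical, and the rating updates are omitted.

```python
import random


def knockout_winner(simulate_score, trials=10_000, rng=None):
    """Return ('host' or 'guest', proportion of simulated matches won by the host)."""
    rng = rng or random.Random()
    host_wins = 0
    for _ in range(trials):
        host_goals, guest_goals = simulate_score(rng)
        # A level score is settled by a fair coin flip (assumed reading).
        if host_goals > guest_goals or (host_goals == guest_goals and rng.random() < 0.5):
            host_wins += 1
    p_host = host_wins / trials
    # The team with the larger proportion of wins advances.
    return ("host" if p_host >= 0.5 else "guest"), p_host


winner, p_host = knockout_winner(lambda rng: (2, 1), trials=1_000)
```

Since every simulated match now has a winner, the two proportions add up to one and exactly one team advances.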
Remark 5.
One might argue that, in spite of the fact that Algorithms 6 and 7 rely on the update of the Élö–Runyan ratings (by calling Algorithm 5), they do not update the data of the scored and received goals so as to verify the distribution of these random variables in the next iterations of Algorithm 5. However, we assume that, because Algorithm 5 is producing simulations according to the original distributions of such random variables, they do not change as the tournament evolves. That is, they are realizations of a stationary stochastic process.
Applying Algorithm 7 with T = 10,000, the Élö–Runyan ratings for both squads and the historical data of the goals scored and received by each pair of teams, we predict the outcomes of the second phase in the Russia 2018 World Cup. Tables 15–18 establish the comparison.
According to the results displayed in Tables 14–18, Algorithms 6 and 7 have a combined accuracy of 87.09%.
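One tally that is consistent with the reported figure counts every slot in Tables 14–18 as a trial. This reading is ours, since the text does not spell out the computation; the dictionary labels below are only descriptive.

```python
import math

# (correct predictions, slots) read from the accuracy columns of Tables 14-18.
tally = {
    "group stage, two qualifiers per group (Table 14)": (16, 16),
    "quarterfinalists (Table 15)": (6, 8),
    "semifinalists (Table 16)": (2, 4),
    "finalists (Table 17)": (2, 2),
    "champion (Table 18)": (1, 1),
}

hits = sum(h for h, _ in tally.values())
slots = sum(s for _, s in tally.values())
accuracy = 100.0 * hits / slots                 # 27 out of 31 slots
truncated = math.floor(accuracy * 100) / 100    # truncate to two decimals
```

The ratio 27/31 is about 87.097%, which gives 87.09% when truncated to two decimals and 87.10% when rounded.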

5.3. Qatar 2022

We display the names and Élö–Runyan ratings of the 32 countries participating in the World Cup of Qatar 2022 according to their confederation of origin.
  • UEFA: Belgium (1662), Croatia (1571), Denmark (1659), England (1649), France (1659), Germany (1615), Netherlands (1711), Poland (1559), Serbia (1619), Spain (1645), Switzerland (1592), Portugal (1633), and Wales (1504).
  • AFC: Australia (1574), Iran (1621), Japan (1693), Qatar (1552), Saudi Arabia (1558), and South Korea (1652).
  • CAF: Cameroon (1552), Ghana (1512), Morocco (1675), Senegal (1636), and Tunisia (1584).
  • CONCACAF: Canada (1613), Costa Rica (1545), Mexico (1573), and the United States of America (1638).
  • CONMEBOL: Argentina (1732), Brazil (1760), Ecuador (1552), and Uruguay (1574).
Tables 19–26 display the initial arrangement of the teams in the tournament, along with the mean and variance of the goals they scored and received in the intercup period. This arrangement became public on 1 April 2022 at the FIFA Congress 2022, held in the city of Doha, Qatar. These tables also display the random variable used for simulating the goals scored and received, which we chose by comparing the mean to the variance and running a goodness-of-fit test with the null hypothesis that the distribution was Poisson. It is remarkable that most of the random variables were modeled with Poisson distributions and some of them with a negative binomial random variable, while only two were modeled with a binomial random variable (see Table 23).

5.3.1. First Phase

We applied Algorithm 6 with T = 10,000 and obtained the leader boards displayed in Figure 9.

5.3.2. Second Phase

We iteratively applied Algorithm 7 with T = 10,000, the Élö–Runyan ratings for both squads, and the stationary trajectories of the goals scored and received by each pair of teams. Figure 10 displays the resulting forecast of the second phase of the Qatar 2022 World Cup, starting from the Round of 16. The percentages below the names of the teams are the predicted proportions of times that each team beats its rival.
We forecast that Brazil will end up as world champion, because its Élö–Runyan rating will be 1840.616153, against Belgium’s 1741.27328. However, we should keep in mind that mathematics will not play football, men will.

6. Aftermatch

This work builds on a large body of references dealing with the Élö–Runyan rating system for football and with Poisson’s method to forecast a football match (see [1]). We illustrated each of these methods in the English Premier League in Sections 3 and 4, respectively. We have also presented Algorithm 5, a technique that extends the Poisson methodology to forecast the result of a football match under the assumption that the historical data of the goals scored and received by a team are stationary (see Assumption 1 and Remark 5).
Our main contribution is the incorporation of the Élö–Runyan rating system into Poisson’s method to account for the strength of a team in an ongoing tournament, while emphasizing the assumption that the goals scored by a team are due to both the offensive power of the team itself and the defensive weakness of its rival. Another novelty of Algorithm 5 is the inclusion of a goodness-of-fit test to decide whether the Poisson random variable is a suitable option to simulate the goals scored and received by each opponent. If it is not, we compare the mean to the variance of the historical data and use a binomial or a negative binomial random variable for the simulations. Because the mass functions of these random variables can be written recursively, they are the standard choice of actuaries modelling claims frequency in, for instance, car insurance. Moreover, we have taken the opportunity to apply Algorithm 5 to the upcoming Qatar 2022 World Cup before it officially starts (see Algorithms 6 and 7).
A plausible extension of this work is to loosen the assumption that, should the Poisson distribution fail to fit the historic data, then it is necessary to choose between the binomial or negative binomial random variables for the simulations. In this case, it would be possible to fit a mixture of Poisson random variables (when the variance is larger than the mean), or even an empirical distribution. Another area of opportunity lies in the use of stationary random processes (see Assumption 1(i)), for one could withdraw this hypothesis and reflect it in our algorithms.

Author Contributions

Conceptualization, methodology, original draft preparation, editing, funding acquisition, supervision, and project administration are due to J.D.L.-B. Software, data validation, formal analysis, data curation, and visualization are due to D.A.Z.-N., E.X.H.-P. and Y.E.-B. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Universidad Anáhuac México.

Data Availability Statement

All the codes and data can be found in: https://github.com/DonDisparates/qatar2022.git, accessed on 13 November 2022.

Acknowledgments

We sincerely thank the following people for their contribution: Ingrid Bárcenas-Hernández, José Antonio Contró-Sánchez, Aldo Martínez-Arias, José Manuel Mendoza-Madrid, Dafne Mendoza-Ramos, Michelle Mondragón-Sainz, and Sara Ortega-Ricarte. All of them worked very hard retrieving data from several websites so that the authors could use the information in this research paper. We also thank Priscilla Camargo–Bacha, Graphic Designer of the Faculty of Actuarial Sciences at Universidad Anáhuac México, for designing the beautiful Figures 9 and 10. Last, but not least, we sincerely thank all three anonymous expert reviewers appointed by Preda for their thorough revision of the early version of our work. We sincerely believe their comments enhanced the presentation and accuracy of this research paper.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. López-Barrientos, J.D.; Silva, E.; Lemus-Rodríguez, E. Captain Tsubasa II: El Surgimiento de Campeones Virtuales. In Investigación Estocástica y Estadística en la Educación; Velasco-Luna, F., Ed.; Benemérita Universidad Autónoma de Puebla: Puebla, Mexico, 2022; Chapter 8; pp. 120–139.
  2. Sumpter, D. Soccermatics: Mathematical Adventures in the Beautiful Game; Bloomsbury Sigma: London, UK, 2017.
  3. Élö, A.E. The Rating of Chessplayers. Past and Present; Arco Publishing Incorporated: New York, NY, USA, 1986.
  4. Dixon, M.; Coles, S. Modelling Association football scores and inefficiencies in the football betting market. J. R. Stat. Soc. Ser. C 1997, 46, 265–280.
  5. Chalikias, M.; Kossieri, E.; Lalou, P. Football matches: Decision-making in betting. Teach. Stat. 2020, 42, 4–9.
  6. Fedrizzi, G.; Canal, L.; Micciolo, R. UEFA EURO 2020: An exciting match between football and probability. Teach. Stat. 2022, 44, 119–122.
  7. Koopman, J.; Lit, R. A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League. J. R. Stat. Soc. Ser. A 2015, 178, 167–186.
  8. Saraiva, E.F.; Suzuki, A.K.; Filho, C.A.; Louzada, F. Predicting football scores via Poisson regression model: Applications to the National Football League. Commun. Stat. Appl. Methods 2016, 23, 297–319.
  9. Wheatcroft, E. Forecasting football matches by predicting match statistics. J. Sport. Anal. 2021, 7, 77–97.
  10. Ötting, M.; Groll, A. A regularized hidden Markov model for analyzing the ‘hot shoe’ in football. Stat. Model. 2021, 22, 6.
  11. Lyons, K. What Are the World Football Elo Ratings? The Conversation. 11 June 2014. Available online: https://theconversation.com/what-are-the-world-football-elo-ratings-27851 (accessed on 13 November 2022).
  12. Dixon, M.; Robinson, M. A birth process model for association football matches. Statistician 1998, 47, 523–538.
  13. Fisher, R. The significance of deviations from expectations in a Poisson series. Biometrics 1950, 6, 17–24.
  14. Rue, H.; Salvesen, Ø. Prediction and retrospective analysis of soccer matches in a league. J. R. Stat. Soc. Ser. D 2000, 49, 339–418.
  15. Stefani, R.; Pollard, R. Football Rating Systems for Top-Level Competition: A Critical Survey. J. Quant. Anal. Sport. 2007, 3, 3.
  16. Forbes Staff. Barcelona Supera al Real Madrid y se Convierte en el Club de Futbol Más Valioso. Forbes México. 12 April 2021. Available online: https://www.forbes.com.mx/barcelona-real-madrid-club-mas-valioso-del-mundo/ (accessed on 13 November 2022).
  17. Vázquez, F. Descenso en Premier League, Una Lucha por US250 Millones. El Economista. 21 June 2020. Available online: https://www.eleconomista.com.mx/deportes/Descenso-en-Premier-League-una-lucha-por-US250-millones-20200621-0049.html (accessed on 13 November 2022).
  18. Luca. Historia de la Premier League: Orígenes, Leyendas y Curiosidades. RetroFootball. 1 September 2020. Available online: https://www.retrofootball.es/retroblog/historia-de-la-premier-league-origenes-leyendas-y-curiosidades/ (accessed on 13 November 2022).
  19. Dias, S. Leicester City Defy 5000-1 Odds to Become Premier League Champions. Sports Lumo. 2 May 2016. Available online: https://sportslumo.com/football/may-2-2016-leicester-city-defy-5000-1-odds-to-become-premier-league-champions/#:~:text=Leicester%20City%20defy%205000%2D1 (accessed on 13 November 2022).
  20. Goddard, J. Regression models for forecasting goals and match results in association football. Int. J. Forecast. 2005, 21, 331–340.
  21. Chu-Chun-Lin, S. Rendez-vous of the Poisson and exponential distributions at the World Cup of Soccer. Teach. Stat. 1999, 21, 60–62.
  22. Ross, S.M. Simulation; Academic Press Inc.: Cambridge, MA, USA, 1997.
  23. Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford Science Publications: Oxford, UK, 1994.
  24. Grinstead, C.M.; Snell, J.L. Introduction to Probability; American Mathematical Society: Providence, RI, USA, 1997.
  25. Hoel, P.G.; Port, S.C.; Stone, C.J. Introduction to Stochastic Processes; Houghton Mifflin Company: Boston, MA, USA, 1972.
  26. Isaac, R. The Pleasures of Probability; Springer: New York, NY, USA, 1995.
  27. Lanier, D.; Trotoux, D. La loi des grands nombres, le théorème de De Moivre-Laplace. In Contribution à une Approche Historique de l’Enseignement des Mathématiques; Evelyne, B., Claude, M., François, C., Eds.; Presses Universitaires de Franche-Comté: Besançon, France, 1996; pp. 259–294.
  28. Durrett, R. Probability. Theory and Examples; Cambridge University Press: Cambridge, UK, 2010.
  29. Bowers, N.L., Jr.; Gerber, H.U.; Hickman, J.C.; Jones, D.A.; Nesbitt, C.J. Actuarial Mathematics; The Society of Actuaries: Schaumburg, IL, USA, 1997.
  30. Lemaire, J. Bonus-Malus Systems in Automobile Insurance; Springer Science+Business Media LLC.: New York, NY, USA, 1995.
  31. Mood, A.M.; Graybill, F.A.; Boes, D.C. Introduction to the Theory of Statistics; McGraw-Hill: New York, NY, USA, 1974.
  32. Promislow, D. Fundamentals of Actuarial Mathematics; John Wiley and Sons Ltd.: London, UK, 2011.
Figure 1. Example of data presentation.
Figure 2. Regression curve between the obtained points and the goal difference at the ending of season 2005/2006.
Figure 3. Leader boards of the first seven seasons in our study.
Figure 4. Evolution of the number of points needed to belong to each of the first four groups.
Figure 5. Final standings obtained by means of (1) for seasons 2005/2006–2012/2013.
Figure 6. Final standings obtained by means of (1) for seasons 2013/2014–2020/2021.
Figure 7. Evolution of the number of points needed to belong to each of the first four groups according to Algorithm 1.
Figure 8. Schematic of the final phase.
Figure 9. Final standings for the first phase.
Figure 10. The arrangement mirrors that of Figure 8.
Table 1. Analysis of variance.
| Source | | DF | Adj. Sum Sq. | Adj. Mean Sq. | F Value | p-Value |
|---|---|---|---|---|---|---|
| Regression | GD | 1 | 6118.31 | 6118.31 | 238.25 | 0.000 |
| Error | | 18 | 462.24 | 25.68 | | |
| | Lack of adj. | 16 | 385.74 | 24.11 | 0.63 | 0.765 |
| | Pure error | 2 | 76.50 | 38.25 | | |
| Total | | 19 | 6580.55 | | | |
Table 2. This confusion matrix shows the accuracy of our back-cast.
| | Champion | UCL | UEL | Middle Table | Relegation |
|---|---|---|---|---|---|
| Champion | 13 | 3 | 0 | 0 | 0 |
| UCL | 3 | 39 | 5 | 1 | 0 |
| UEL | 0 | 5 | 18 | 9 | 0 |
| Middle table | 0 | 1 | 9 | 159 | 7 |
| Relegation | 0 | 0 | 0 | 7 | 41 |
Table 3. Comparison of hits of Algorithm 1 with respect to the actual results of seasons 2005/2006–2012/2013.
| GROUP | 2005/2006 | 2006/2007 | 2007/2008 | 2008/2009 | 2009/2010 | 2010/2011 | 2011/2012 | 2012/2013 |
|---|---|---|---|---|---|---|---|---|
| Champion | 0% | 100% | 0% | 100% | 0% | 100% | 100% | 100% |
| UCL | 100% | 100% | 100% | 100% | 75% | 75% | 75% | 75% |
| UEL | 50% | 100% | 50% | 50% | 0% | 0% | 50% | 0% |
| Midtable | 81.82% | 90.91% | 90.91% | 72.73% | 90.91% | 90.91% | 90.91% | 90.91% |
| Relegation | 66.67% | 66.67% | 100% | 33.33% | 100% | 100% | 100% | 100% |
| TOTAL | 75.00% | 90.00% | 85.00% | 70.00% | 76.25% | 81.25% | 86.25% | 81.25% |
Table 4. Comparison of hits of Algorithm 1 with respect to the actual results of seasons 2013/2014–2020/2021.
| GROUP | 2013/2014 | 2014/2015 | 2015/2016 | 2016/2017 | 2017/2018 | 2018/2019 | 2019/2020 | 2020/2021 |
|---|---|---|---|---|---|---|---|---|
| Champion | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| UCL | 100% | 100% | 75% | 100% | 100% | 75% | 100% | 100% |
| UEL | 100% | 100% | 50% | 100% | 100% | 50% | 50% | 50% |
| Midtable | 100% | 90.91% | 81.82% | 100% | 100% | 90.91% | 90.91% | 90.91% |
| Relegation | 100% | 66.67% | 66.67% | 100% | 100% | 66.67% | 100% | 100% |
| TOTAL | 100.00% | 90.00% | 76.25% | 100.00% | 100.00% | 81.25% | 90.00% | 90.00% |
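Table 5. Leader board comparison.
| Group | Position | Real | Points | Poisson | Points | Accuracy |
|---|---|---|---|---|---|---|
| Champion | 1 | Manchester City | 93 | Manchester City | 76.59 | 100% |
| UCL | 2 | Liverpool | 92 | Liverpool | 75.46 | 100% |
| | 3 | Chelsea | 74 | Chelsea | 67.47 | |
| | 4 | Tottenham | 71 | Tottenham | 62.50 | |
| UEL | 5 | Arsenal | 69 | Arsenal | 57.06 | 50% |
| | 6 | Manchester United | 58 | West Ham | 55.46 | |
| Middle table | 7 | West Ham | 56 | Leicester City | 53.87 | 81.81% |
| | 8 | Leicester City | 52 | Crystal Palace | 53.85 | |
| | 9 | Brighton | 51 | Manchester United | 52.58 | |
| | 10 | Wolverhampton | 51 | Aston Villa | 51.64 | |
| | 11 | Newcastle | 49 | Brighton | 51.49 | |
| | 12 | Crystal Palace | 48 | Wolverhampton | 51.20 | |
| | 13 | Brentford | 46 | Brentford | 49.91 | |
| | 14 | Aston Villa | 45 | Newcastle | 46.28 | |
| | 15 | Southampton | 40 | Burnley | 45.28 | |
| | 16 | Everton | 39 | Everton | 44.96 | |
| | 17 | Leeds | 38 | Southampton | 44.80 | |
| Relegation | 18 | Burnley | 35 | Leeds | 40.31 | 66.67% |
| | 19 | Watford | 23 | Watford | 38.01 | |
| | 20 | Norwich City | 22 | Norwich City | 32.11 | |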
Table 6. Group A in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Saudi Arabia | 1.88 | 2.10 | Poi | 0.85 | 0.98 | Poi | 1.70 | 4.81 | Poi | 1.70 | 1.91 | Poi |
| Egypt | 1.30 | 0.99 | Poi | 0.43 | 0.33 | Poi | 0.76 | 0.53 | Poi | 1.24 | 0.77 | Poi |
| Russia | 1.52 | 2.01 | Poi | 1.32 | 1.34 | Poi | 2.00 | 3.79 | Poi | 0.80 | 0.38 | Poi |
| Uruguay | 2.33 | 1.89 | Poi | 0.78 | 1.28 | Poi | 1.09 | 0.71 | Poi | 1.00 | 0.96 | Poi |
Table 7. Group B in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spain | 3.00 | 3.47 | Poi | 0.27 | 0.20 | Poi | 1.67 | 3.82 | Poi | 0.87 | 0.78 | Poi |
| Portugal | 1.91 | 2.81 | Poi | 0.68 | 0.70 | Poi | 1.79 | 2.06 | Poi | 0.63 | 0.86 | Poi |
| Iran | 2.12 | 1.75 | Poi | 0.53 | 0.72 | Poi | 1.41 | 2.01 | Poi | 0.41 | 0.60 | NB |
| Morocco | 2.19 | 2.44 | Poi | 0.48 | 0.25 | Poi | 0.62 | 0.85 | Poi | 0.46 | 0.25 | Poi |
Table 8. Group C in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| France | 2.26 | 1.74 | Poi | 0.97 | 1.13 | Poi | 1.44 | 1.36 | Poi | 0.83 | 0.58 | Poi |
| Denmark | 1.70 | 2.04 | Poi | 0.74 | 0.54 | Poi | 1.25 | 2.11 | Poi | 1.20 | 0.75 | Poi |
| Australia | 2.65 | 3.03 | Poi | 0.90 | 1.19 | Poi | 1.23 | 1.18 | Poi | 1.15 | 0.98 | Poi |
| Peru | 1.74 | 1.06 | Poi | 0.78 | 0.95 | Poi | 1.21 | 1.38 | Poi | 1.18 | 1.08 | Poi |
Table 9. Group D in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Croatia | 2.72 | 5.31 | Poi | 0.39 | 0.24 | Poi | 1.17 | 1.81 | Poi | 1.17 | 1.72 | Poi |
| Argentina | 2.07 | 3.44 | Poi | 0.66 | 0.85 | Poi | 1.89 | 4.09 | Poi | 1.11 | 2.30 | NB |
| Nigeria | 1.58 | 1.66 | Poi | 0.83 | 0.72 | Poi | 1.11 | 1.36 | Poi | 0.84 | 0.98 | Poi |
| Iceland | 1.78 | 0.95 | Poi | 1.00 | 1.11 | Poi | 1.61 | 1.56 | Poi | 1.39 | 1.62 | Poi |
Table 10. Group E in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Brazil | 1.77 | 1.54 | Poi | 0.45 | 0.34 | Poi | 2.36 | 2.55 | Poi | 0.48 | 0.41 | Poi |
| Costa Rica | 1.48 | 1.43 | Poi | 0.56 | 0.47 | Poi | 1.10 | 1.42 | Poi | 1.57 | 1.65 | Poi |
| Serbia | 1.83 | 1.69 | Poi | 1.22 | 0.84 | Poi | 1.22 | 0.84 | Poi | 1.11 | 1.32 | Poi |
| Switzerland | 1.15 | 3.71 | Poi | 1.05 | 0.71 | Poi | 0.95 | 1.20 | Poi | 2.21 | 0.79 | Poi |
Table 11. Group F in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Germany | 2.28 | 2.51 | Poi | 0.94 | 1.06 | Poi | 2.06 | 5.27 | Poi | 0.94 | 0.72 | Poi |
| South Korea | 1.24 | 1.25 | Poi | 0.91 | 0.97 | Poi | 1.04 | 1.12 | Poi | 1.40 | 2.23 | NB |
| Sweden | 1.26 | 3.19 | Poi | 0.85 | 0.79 | Poi | 1.05 | 1.27 | Poi | 1.68 | 0.77 | Poi |
| Mexico | 1.60 | 1.47 | Poi | 0.69 | 1.52 | Poi | 1.57 | 1.24 | Poi | 1.48 | 1.52 | Poi |
Table 12. Group G in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Belgium | 3.33 | 5.67 | Poi | 0.67 | 0.67 | Poi | 2.47 | 2.65 | Poi | 1.07 | 1.53 | Poi |
| Panama | 1.38 | 1.01 | Poi | 0.73 | 0.50 | Poi | 1.00 | 1.63 | Poi | 1.44 | 2.10 | Poi |
| Tunisia | 1.75 | 0.75 | Poi | 1.00 | 0.70 | Poi | 1.56 | 2.36 | Poi | 0.83 | 0.58 | Poi |
| England | 2.19 | 1.15 | Poi | 0.50 | 0.50 | Poi | 1.80 | 2.83 | Poi | 0.53 | 0.65 | Poi |
Table 13. Group H in Russia 2018.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Senegal | 1.95 | 1.68 | Poi | 0.50 | 0.52 | Poi | 1.36 | 0.69 | Bin | 0.95 | 1.23 | Poi |
| Japan | 2.33 | 3.17 | Poi | 1.06 | 1.33 | Poi | 1.60 | 2.24 | Poi | 0.67 | 0.49 | Poi |
| Poland | 2.40 | 3.28 | Poi | 0.96 | 0.68 | Poi | 2.31 | 4.37 | Poi | 1.15 | 1.51 | Poi |
| Colombia | 1.35 | 1.23 | Poi | 0.85 | 0.73 | Poi | 1.42 | 2.24 | Poi | 0.85 | 0.98 | Poi |
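Table 14. Group winners comparison.
| Group | Position | Real | Points | Algorithm 6 | Estimated Points | Accuracy |
|---|---|---|---|---|---|---|
| Group A | 1 | Uruguay | 9 | Uruguay | 6.4842 | 100% |
| | 2 | Russia | 6 | Russia | 5.1548 | |
| Group B | 1 | Spain | 5 | Spain | 5.6867 | 100% |
| | 2 | Portugal | 5 | Portugal | 4.8296 | |
| Group C | 1 | France | 5 | France | 7.3876 | 100% |
| | 2 | Denmark | 5 | Denmark | 4.0612 | |
| Group D | 1 | Croatia | 9 | Croatia | 6.3811 | 100% |
| | 2 | Argentina | 4 | Argentina | 4.6111 | |
| Group E | 1 | Brazil | 7 | Brazil | 7.0674 | 100% |
| | 2 | Switzerland | 5 | Switzerland | 4.1768 | |
| Group F | 1 | Sweden | 6 | Sweden | 4.7170 | 100% |
| | 2 | Mexico | 6 | Mexico | 4.3551 | |
| Group G | 1 | Belgium | 9 | Belgium | 7.3368 | 100% |
| | 2 | England | 6 | England | 6.1921 | |
| Group H | 1 | Colombia | 6 | Colombia | 6.0252 | 100% |
| | 2 | Japan | 4 | Japan | 4.7146 | |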
Table 15. Quarterfinalists.
| Real | Algorithm 7 | Accuracy |
|---|---|---|
| Uruguay | Uruguay | 75% |
| France | France | |
| Brazil | Brazil | |
| Belgium | Belgium | |
| Russia | Spain | |
| Croatia | Croatia | |
| Sweden | Sweden | |
| England | Colombia | |
Table 16. Semifinalists.
| Real | Algorithm 7 | Accuracy |
|---|---|---|
| France | France | 50% |
| Belgium | Brazil | |
| Croatia | Croatia | |
| England | Colombia | |
Table 17. Finalists.
| Real | Algorithm 7 | Accuracy |
|---|---|---|
| France | France | 100% |
| Croatia | Croatia | |
Table 18. Champion.
| Real | Algorithm 7 | Accuracy |
|---|---|---|
| France | France | 100% |
Table 19. Group A in Qatar 2022.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ecuador | 1.81 | 1.87 | Poi | 1.81 | 1.85 | Poi | 0.90 | 1.75 | Poi | 1.24 | 1.49 | Poi |
| Senegal | 1.68 | 0.94 | Poi | 0.48 | 0.41 | Poi | 1.38 | 0.48 | Poi | 0.75 | 0.91 | Poi |
| Qatar | 1.82 | 2.35 | Poi | 1.13 | 1.29 | Poi | 1.47 | 2.52 | Poi | 1.10 | 1.89 | Poi |
| Netherlands | 2.00 | 2.46 | Poi | 1.00 | 0.77 | Poi | 1.15 | 2.75 | Poi | 2.10 | 1.00 | Poi |
Table 20. Group B in Qatar 2022.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| England | 1.65 | 3.00 | Poi | 1.41 | 4.12 | Poi | 1.02 | 4.14 | NB | 0.31 | 0.46 | Poi |
| Iran | 1.73 | 7.24 | NB | 1.20 | 3.96 | Poi | 0.97 | 3.83 | Poi | 0.27 | 0.34 | Poi |
| USA | 1.84 | 3.29 | Poi | 0.80 | 1.65 | Poi | 0.49 | 1.49 | Poi | 0.35 | 0.43 | Poi |
| Wales | 1.22 | 1.34 | Poi | 1.02 | 1.28 | Poi | 0.57 | 1.20 | Poi | 0.52 | 0.90 | Poi |
Table 21. Group C in Qatar 2022.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Poland | 1.71 | 2.14 | Poi | 1.10 | 0.86 | Poi | 1.57 | 2.33 | Poi | 1.43 | 2.07 | Poi |
| Mexico | 1.61 | 1.79 | NB | 0.71 | 0.92 | Poi | 0.92 | 0.99 | Poi | 1.35 | 1.69 | Poi |
| Saudi Arabia | 1.61 | 1.79 | Poi | 0.71 | 0.92 | Poi | 0.92 | 0.99 | Poi | 1.35 | 1.69 | Poi |
| Argentina | 1.88 | 1.92 | Poi | 0.59 | 0.74 | Poi | 1.92 | 1.99 | Poi | 0.96 | 2.04 | Poi |
Table 22. Group D in Qatar 2022.
| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Australia | 2.23 | 3.10 | Poi | 0.38 | 0.39 | Poi | 0.74 | 2.99 | Poi | 2.11 | 0.49 | Poi |
| France | 2.50 | 3.61 | Poi | 0.82 | 0.72 | Poi | 1.48 | 1.30 | Poi | 0.81 | 0.63 | Poi |
| Denmark | 2.35 | 3.92 | Poi | 0.38 | 0.39 | Poi | 2.08 | 2.16 | Poi | 1.08 | 1.49 | Poi |
| Tunisia | 1.70 | 2.14 | Poi | 0.57 | 0.45 | Poi | 1.00 | 1.52 | Poi | 0.90 | 1.23 | Poi |
Table 23. Group E in Qatar 2022.

| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spain | 2.68 | 4.01 | Poi | 0.52 | 0.63 | Poi | 1.80 | 1.76 | Poi | 0.96 | 0.68 | Bin |
| Costa Rica | 1.14 | 0.90 | Poi | 0.67 | 0.32 | Bin | 0.83 | 0.67 | Poi | 1.45 | 0.94 | Poi |
| Germany | 2.90 | 5.47 | Poi | 1.14 | 1.08 | Poi | 1.84 | 1.71 | Poi | 1.26 | 2.19 | Poi |
| Japan | 2.13 | 4.38 | Poi | 0.84 | 1.34 | NB | 2.40 | 10.24 | Poi | 0.33 | 0.36 | Poi |
Table 24. Group F in Qatar 2022.

| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Morocco | 1.83 | 1.63 | Poi | 0.51 | 0.65 | Poi | 1.84 | 2.24 | Poi | 0.68 | 0.95 | NB |
| Croatia | 1.81 | 0.85 | Poi | 1.23 | 1.25 | Poi | 1.45 | 2.75 | Poi | 1.40 | 2.44 | Poi |
| Belgium | 3.13 | 4.69 | Poi | 0.88 | 1.03 | Poi | 2.04 | 1.78 | Poi | 1.00 | 1.48 | Poi |
| Canada | 3.42 | 2.98 | Poi | 0.42 | 0.35 | Poi | 2.28 | 7.46 | NB | 0.92 | 0.43 | Poi |
Table 25. Group G in Qatar 2022.

| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cameroon | 1.30 | 1.69 | Poi | 0.43 | 0.33 | Poi | 0.96 | 1.12 | Poi | 1.00 | 1.48 | Poi |
| Brazil | 2.17 | 2.64 | Poi | 0.39 | 0.35 | Poi | 2.17 | 2.23 | Poi | 0.35 | 0.31 | Poi |
| Switzerland | 2.20 | 3.36 | Poi | 0.68 | 0.62 | Poi | 1.50 | 2.25 | Poi | 1.55 | 1.16 | Poi |
| Serbia | 2.43 | 2.24 | Poi | 1.19 | 0.82 | Poi | 1.28 | 1.08 | Poi | 1.00 | 2.20 | Poi |
Table 26. Group H in Qatar 2022.

| Team | Mean goals for as host | Var. goals for as host | Law | Mean goals against as host | Var. goals against as host | Law | Mean goals for as guest | Var. goals for as guest | Law | Mean goals against as guest | Var. goals against as guest | Law |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| South Korea | 1.88 | 2.41 | Poi | 0.64 | 1.08 | Poi | 1.87 | 2.92 | Poi | 0.67 | 1.02 | NB |
| Uruguay | 1.68 | 2.64 | Poi | 0.53 | 0.57 | Poi | 1.41 | 1.01 | Poi | 1.19 | 1.82 | Poi |
| Portugal | 2.32 | 3.56 | Poi | 0.44 | 0.32 | Poi | 2.05 | 2.13 | Poi | 1.00 | 1.09 | Poi |
| Ghana | 1.29 | 1.06 | Poi | 0.64 | 0.80 | Poi | 0.63 | 0.24 | Poi | 0.84 | 1.21 | Poi |
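
As an illustration of how one entry of Tables 19 to 26 could feed a simulation, the following minimal sketch turns a tabulated mean, variance and law into a sampler of goal counts. It is not the authors' implementation. The function names (`moment_params`, `sample_poisson`, `sample_goals`) are hypothetical. Fitting by the method of moments is an assumption, and so is rounding the binomial number of trials up to an integer. The labels `Poi`, `NB` and `Bin` are read here as Poisson, negative binomial and binomial.

```python
import math
import random


def moment_params(law, mean, var):
    """Match a table entry's mean and variance to the parameters of its law.

    Assumption: parameters are obtained by the method of moments.
    """
    if law == "Poi":
        # A Poisson law has a single parameter; the tabulated variance is unused.
        return {"lam": mean}
    if law == "NB":
        if var <= mean:
            raise ValueError("negative binomial needs var > mean")
        # Mean r(1-p)/p and variance r(1-p)/p^2 give p = mean/var.
        return {"r": mean * mean / (var - mean), "p": mean / var}
    if law == "Bin":
        if var >= mean:
            raise ValueError("binomial needs var < mean")
        # n must be an integer: round it up, then refit p so the mean stays exact.
        n = math.ceil(mean * mean / (mean - var))
        return {"n": n, "p": mean / n}
    raise ValueError(f"unknown law: {law!r}")


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the small means in the tables."""
    threshold = math.exp(-lam)
    goals, product = 0, rng.random()
    while product > threshold:
        goals += 1
        product *= rng.random()
    return goals


def sample_goals(law, mean, var, rng):
    """Draw one goal count from the law named in the table entry."""
    params = moment_params(law, mean, var)
    if law == "Poi":
        return sample_poisson(params["lam"], rng)
    if law == "NB":
        # Gamma-Poisson mixture: a Poisson whose rate is Gamma(r, scale=(1-p)/p).
        scale = (1.0 - params["p"]) / params["p"]
        return sample_poisson(rng.gammavariate(params["r"], scale), rng)
    # Binomial: count the successes in n Bernoulli(p) trials.
    return sum(rng.random() < params["p"] for _ in range(params["n"]))


if __name__ == "__main__":
    rng = random.Random(2022)
    # England, goals for as guest (Table 20): mean 1.02, variance 4.14, law NB.
    draws = [sample_goals("NB", 1.02, 4.14, rng) for _ in range(10_000)]
    print(sum(draws) / len(draws))
```

Repeating such draws for both sides of a fixture and counting wins, draws and losses would give a Monte Carlo estimate of the result. How the "for" and "against" entries of the two teams are combined is left open in this sketch.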
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

López-Barrientos, J.D.; Zayat-Niño, D.A.; Hernández-Prado, E.X.; Estudillo-Bravo, Y. On the Élö–Runyan–Poisson–Pearson Method to Forecast Football Matches. Mathematics 2022, 10, 4587. https://doi.org/10.3390/math10234587
