On the Élö–Runyan–Poisson–Pearson Method to Forecast Football Matches

López-Barrientos, José Daniel; Zayat-Niño, Damián Alejandro; Hernández-Prado, Eric Xavier; Estudillo-Bravo, Yolanda

doi:10.3390/math10234587

¹ Facultad de Ciencias Actuariales, Universidad Anáhuac México, Naucalpan de Juárez 52786, Mexico

² Facultad de Ciencias Físico-Matemáticas, Benemérita Universidad Autónoma de Puebla, Puebla 72000, Mexico

* Author to whom correspondence should be addressed.

Mathematics 2022, 10(23), 4587; https://doi.org/10.3390/math10234587

Submission received: 14 November 2022 / Revised: 27 November 2022 / Accepted: 29 November 2022 / Published: 3 December 2022

(This article belongs to the Special Issue Probability, Statistics and Their Applications 2021)

Abstract

This is a work about football. In it, we depart from two well-known approaches to forecast the outcome of a football match (or even a full tournament) and take advantage of their strengths to develop a new method of prediction. We illustrate the Élö–Runyan rating system and the Poisson technique in the English Premier League and we analyze their accuracies with respect to the actual results. We obtained an accuracy of 84.37% for the former, and 79.99% for the latter in this first exercise. Then, we present a criticism of these methods and use it to complement the aforementioned procedures, and hence, introduce the so-called Élö–Runyan–Poisson–Pearson method, which consists of adopting the distribution that best fits the historical distribution of goals to simulate the score of each match. Finally, we obtain a Monte Carlo-based forecast of the result. We test our mechanism to backcast the World Cup of Russia 2018, obtaining an accuracy of 87.09%; and forecast the results of the World Cup of Qatar 2022.

Keywords:

Élö–Runyan rating system; Poisson forecasting method; inverse transform method; recursive distributions; English Premier League; Russia 2018; Qatar 2022

MSC:

60H30; 62P25; 62F07

1. Introduction

This work presents a new methodology by which to forecast the final score of a football match and thus predict the result of a full tournament. It builds on the theory presented in, for instance, Ref. [1], Chapter 1 in [2], and Ref. [3].

We propose a variation of the methods presented in the aforementioned references in the sense that, instead of simulating the number of goals scored and received by each team as a Poisson random variable, we adopt the distribution that best fits the historic distribution of goals according to a basic test of hypotheses (as in [4]). Moreover, we use a Bayesian approach by plugging Élö’s probabilities instead of the a priori assumption that the number of goals for a given crew against another is a uniform average between their goals for and against, respectively.

We illustrate the classic Élö–Runyan method to backcast the final standings of the clubs of several seasons in a national football league, and then we exemplify Poisson’s approach in the final season of that same football league. Finally, we apply the enhanced method we found to a World Cup by making the assumption that the analyses remain valid despite the fact that in the latter case, there are nations instead of clubs. Based on the results previously published in [1], we strongly believe that the knowledge and approaches we found can be used for forecasting other leagues.

Now we summarize the hypotheses behind the model we designed.

Assumption 1.

(i) The goals scored and received by a given team are modelled by stationary continuous-time counting processes whose distributions can be obtained with historical data.

(ii) The number of goals scored by team A against team B depends on the offensive strength of team A, and on the defensive weakness of team B.

(iii) The Élö–Runyan index (see (1) below) can be used to measure the performance of a given team against another one.

1.1. State of the Art and Contribution

The recent literature on forecasting different sports results is vast, and includes, among others, [5,6,7,8,9,10]. Our analyses are based on two cornerstones. The first is Élö’s rating system (see [11] for a quick introduction to the topic) to obtain the probability that a team bests another in a given match. The other is the Poisson simulation of the number of goals scored by two opposing parties, which was thoroughly presented in [12] and can be traced back to [13].

Our work can be located in the gap between references such as [1,14,15]. Indeed, we estimate the probability that a team defeats another one with a Bayesian nonlinear model—Élö’s method—and we apply it by using the Markov chain Monte Carlo iterative simulation technique to attempt to forecast the future result of a full tournament.

We illustrate the Élö–Runyan method in England’s national league, obtaining an accuracy of 84.37%. Then, we backcast another season of the same league with an accuracy of 79.99%. Later, we test the fortified method we devised in the World Cup of Russia in 2018, obtaining an accuracy of 87.09%. Finally, we applied our algorithms to try to predict the outcome of the World Cup of Qatar 2022.

1.2. Computational Resources

To develop our work, we required a sufficiently large database to be able to analyze and compute the necessary estimates for the corresponding matches, seasons, and teams. Our datasets were obtained from football-data and refined with Flash score. In the former, one can find information from over 25 years from various leagues and international competitions from around the world, whereas we used the latter to sort the matches in the proper chronological order. We used Minitab 21.1.0 and R-4.2.2 for Windows to analyze the datasets and implement our algorithms. All the code and databases can be found at https://github.com/DonDisparates/qatar2022.git (accessed on 13 November 2022).

1.3. Organization of the Research

The rest of the paper is divided as follows. In the next section, we present a preliminary description of the English Premier League, which we will use as a working example. In Section 3 we present the essentials of Élö’s method, and we illustrate its use by means of a backcast of the English Premier League with information from seasons 2005/2006–2020/2021. In Section 4, we introduce Poisson’s model and replicate the results of the same league for season 2021/2022. Next, in Section 5, we combine the former techniques, show the method’s usage with information from the four years that preceded Russia’s World Cup, and conclude with a forecast of the result of the World Cup Qatar 2022. We present our conclusions in Section 6.

2. The English Premier League: Preliminary Description

The Premier League is considered the best football tournament and the most economically attractive in the world. It accounts for nine of the 20 most valuable teams for the year 2021 according to Forbes magazine, something that attracts sponsors and television networks (cf. [16]).

Indeed, during the 2018–2019 season, the league paid between $119 and $186 million to each club in the First Division. On the other hand, the league gives relegated teams 55% of the revenue share in the first year, 45% in their second, and 20% in the third. Relegation is also punished by the fans, so much so that troupes like Middlesbrough FC, Hull City, and Sunderland lost 17%, 25%, and 33% of the tickets in the season following relegation, respectively. In addition, the sponsors also have a relegation clause, which implies a budget reduction if the faction falls to the Second Division. Finally, a Premier League squad is worth around $445 million, whereas in the Second Division the average is $49 million (see [17]).

The English league was established in 1888 and was made up of 12 founding teams. It was known as The Football League. Later, in 1892, it absorbed its rival, the Football Alliance, and it was divided into two divisions; the two best aggregations of the Football Alliance, along with all the members of the Football League were part of the First Division. The rest of the Alliance clubs formed the Second Division. A Third Division was created in 1920, and in 1958 the Fourth was born. In 1992, the league changed its name to what we know today as the Premier League. Its main objective was precisely to maximize monetary income, especially through television rights (see [18]).

For this part of the project, we built a database with information of all the matches played in the English domestic league from seasons 2005/2006 to 2020/2021. That is 16 complete seasons, 40 different teams—including the five squads that managed to proclaim themselves champions of the competition (Chelsea, Leicester, Liverpool, Manchester City, Manchester United)—and a grand total of 6,080 matches played in the Premier League. For each game, 105 different variables were registered, where the majority are the odds of different bookmakers. Afterward, we cleaned the database and kept only the following information.

Season: This identifies the official name of the tournament when the teams met.
Matchday: This corresponds to one of the 38 matchdays that are played throughout the calendar of the season.
Date: The day and month in which the match was played.
Host: This bit of information corresponds to the team that plays in its own stadium.
Guest: This stands for the name of the team that plays a game away from home.
Goals for host/guest: This corresponds to the goals that are scored by the local/visiting team in each match.
Points home/guest: This is the number of points obtained by the local/visiting team according to the outcome of the match.
Goals for host/guest: This corresponds to the goals that are scored by the home/away team in each match.
Goal difference (GD) host/guest: This is the difference between the goals scored and the goals received by the home/away team.

Figure 1 illustrates the information we used for the first matchday in season 2005/2006.

Remark 1.

At first sight, it might seem unnecessary to keep track of the goal difference between the teams. However, Table 1 shows that the correlation of this variable and the position in the final leader board is very high: 92.6%.

Table 1 displays the results for season 2005/2006, and Figure 2 shows the corresponding regression curve.

In order to measure the reliability of the model we present in Section 3 below, we compare our classification with the actual results of each tournament. To this end, we divide the leader board into five groups and say that our model is successful if a given team finishes the competition in the correct group. The groups we consider are as follows.

Champion: This singleton is the first group.
UCL: The second group corresponds to the squads that manage to play the European clubs tournament widely known as the Union of European Football Associations (UEFA) Champions League. This group includes the four best-positioned teams after playing the 38 rounds of the aforementioned championship. That is, the champion and the following three clubs on the leader board.
UEL: The teams that finish the season in fifth and sixth places in the leader board get to play in a second European tournament known as the UEFA Europa League.
Middle table: This is the largest group. It includes those teams from the seventh to the 17th positions in the final leader board.
Relegation: This last group refers to the teams that lose the category at the end of each season.

Figure 3 illustrates the information we used for seasons 2005/2006–2011/2012, whereas Figure 4 depicts the time series of the points needed to belong to the first four groups. Comparing the first column of Figure 3 with Figure 2 emphasizes the relevance of the goal difference with respect to the final standings at the end of the season.

Remark 2.

(i) The required number of points to obtain the championship registers a slight downward trend during the first 11 seasons. Actually, in the eleventh, the team that came out on top, Leicester City, beat initial odds of 5000 to one (cf. [19]). However, between the 2016/2017 and 2019/2020 seasons, the champion attained a historical record of points.

(ii) The score needed to be part of the UEFA Champions League group has shown a fluctuating behavior, with a downward trend in recent years.

(iii) The score needed to avoid relegation has remained almost constant during almost all seasons, except for some tournaments in which the points obtained were very low.

(iv) The score needed to belong to the UEL group kept an ascending trend during the first years and later varied noticeably, as happened with the UCL. This highlights how competitive the league is, and how much performance can change from one tournament to the next.

3. The Rating of Football Squads

There exist three main scoring systems in football. The Fédération Internationale de Football Association (FIFA) Ranking evaluates national teams with data from the last four years, updated monthly. As of 2015, the three main points that it took into account were the number of matches won, the average number of points won in matches in the last twelve months, and the average number of points won in matches prior to the last twelve months. The method was questioned due to the frequent inconsistencies between the positions and the results of two opposing factions, as well as the disparities arbitrarily granted by the federation to each corresponding confederation. Unlike the FIFA ranking, the Soccer Power Index measures the performance of each player individually, which will depend on the party he faces. This results in simultaneous equations that will be solved by means of initial conditions and an iterative calculation supported by computational systems. It returns a classification of squads from general, offensive, and defensive points of view. Its main problems are the high cost of obtaining the information, along with the fact that the data is not public.

Analogous to the FIFA ranking, the Élö rating system assigns a score to each team, so that it is possible to compare two squads when they match up against each other. The Élö formula is named after the scientist and chessplayer Árpád Élö, who devised a rating algorithm for chessplayers in 1970. This method assumes that along their career, a player’s performance will be normally distributed around an average level. This means that deviations around this level do occur. Applied to a single game, performance is an abstraction that cannot be measured objectively; it depends on the judgment, decisions, and actions of the player in the course of the game. However, it is possible to derive a measurable object concept, the performance ranking, over a sufficiently large number of matches, such as a tournament. This is because the performance of a player in various games does consist of the combination of the average ranking of the competition and the score obtained. The method proved to be so effective, that in 1978, it was implemented by the Fédération Internationale des Échecs (see [3]).

There is a wide variety of methods based on Élö’s effort to forecast the result of a football match (see [11,20] and Élö-Runyan’s classification of all the National squads with a membership to the FIFA). The Élö–Runyan index of a team $(t)$ at time $n$ can be computed with the regression curve of its current performance $E_{n}^{(t)}$ with respect to its past performance, and that of the squad it faces, say $(-t)$,

$E_{n}^{(t)} = E_{n-1}^{(t)} + K \cdot G \left(W - \left(1 + 10^{\frac{E_{n-1}^{(-t)} - E_{n-1}^{(t)}}{400}}\right)^{-1}\right),$

(1)

where $K$ is an arbitrary weighting constant whose value depends on the character of the match, and is given by

$K := \begin{cases} 60 & \text{for World Cup finals;} \\ 50 & \text{for continental championship finals and major intercontinental tournaments;} \\ 40 & \text{for major tournaments and continental or World Cup qualifiers;} \\ 30 & \text{for all other tournaments;} \\ 20 & \text{for friendly matches.} \end{cases}$

In addition, $G$ is an adjustment parameter given by

$G := \begin{cases} 1 & \text{if the teams tie or there is a difference of one goal,} \\ 1.5 & \text{if there is a two-goal difference,} \\ \frac{11 + d}{8} & \text{for a } d\text{-goal difference, with } d = 3, 4, \dots \end{cases}$

Finally, the symbol $W$ equals 1 for a win, 0.5 for a draw, and 0 for a loss. Squads with a higher Élö–Runyan rating have a higher probability of winning a game than a troupe with a lower Élö–Runyan rating. The probability that team $(t)$ defeats team $(-t)$ is

$p := \left(1 + 10^{\frac{E_{n-1}^{(-t)} - E_{n-1}^{(t)}}{400}}\right)^{-1}.$

(2)

As a consequence, the probability that team $(-t)$ defeats team $(t)$ is given by

$1 - p = \left(1 + 10^{\frac{E_{n-1}^{(t)} - E_{n-1}^{(-t)}}{400}}\right)^{-1}.$

After each game, the Élö–Runyan ratings of the squads (1) are updated according to Algorithm 1.

Algorithm 1: Élö–Runyan rating method.

If a team with a higher Élö–Runyan rating wins, only a few points are transferred from the lower-rated cadre. However, if a lower-rated squad wins, then the points transferred from the higher-rated squad are far greater.
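The body of Algorithm 1 is not reproduced in this text, so the following is only a minimal sketch of the update (1), assuming the definitions of $K$, $G$, $W$, and $p$ given above. The function names (`goal_factor`, `win_probability`, `update_ratings`) and the example ratings are hypothetical and are not taken from the authors' repository.

```python
def goal_factor(goal_diff):
    """Adjustment parameter G as a function of the absolute goal difference."""
    d = abs(goal_diff)
    if d <= 1:
        return 1.0
    if d == 2:
        return 1.5
    return (11 + d) / 8


def win_probability(rating, opponent_rating):
    """Probability p in (2) that a team beats its opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def update_ratings(rating_a, rating_b, goals_a, goals_b, k=30.0):
    """Return both ratings after one match, following (1)."""
    g = goal_factor(goals_a - goals_b)
    if goals_a > goals_b:
        w_a = 1.0
    elif goals_a == goals_b:
        w_a = 0.5
    else:
        w_a = 0.0
    delta = k * g * (w_a - win_probability(rating_a, rating_b))
    # For team b, W and p are the complements of team a's values,
    # so the points gained by one side are exactly those lost by the other.
    return rating_a + delta, rating_b - delta


# Hypothetical ratings: an underdog win moves more points than a favourite win.
print(update_ratings(1400.0, 1600.0, 1, 0))
print(update_ratings(1600.0, 1400.0, 1, 0))
```

With these illustrative ratings, the underdog's 1–0 win transfers roughly three times as many points as the favourite's 1–0 win, which is the asymmetry described in the paragraph above.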

The English Premier League: Élö–Runyan Analysis

We consider the information we retrieved from seasons 2005/2006–2020/2021. We assign all teams a base score of 1500 Élö points (which is an average score and is widely used and accepted for football squads) and use (1) with $K = 30$ to obtain the classifications displayed in Figure 5 and Figure 6.

Figure 7 is the Élö–Runyan analog of Figure 4. A quick comparison yields the following.

The score needed to be champion has a negative trend in the last two seasons.
The Élö–Runyan score required to maintain the category fluctuates much more than the actual classification (which represented a much more constant score).

Table 2 is a confusion matrix to compare our results with the ones presented in Section 2. The rows represent the instances in an actual class, while the columns stand for the instances in a predicted class.

Divide the trace of the matrix by the sum of all its entries to obtain a quotient of 84.37%, an acceptable value.
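As a small illustration of this quotient, the sketch below computes the accuracy of a confusion matrix as its trace divided by the sum of its entries. The matrix used here is hypothetical and much smaller than Table 2; it only shows the arithmetic.

```python
def accuracy(confusion):
    """Trace of the confusion matrix divided by the sum of all its entries."""
    total = sum(sum(row) for row in confusion)
    hits = sum(confusion[i][i] for i in range(len(confusion)))
    return hits / total


# Hypothetical 3x3 counts (rows: actual class, columns: predicted class).
example = [
    [8, 2, 0],
    [1, 15, 4],
    [0, 3, 7],
]
print(f"{accuracy(example):.2%}")
```

For this toy matrix the trace is 30 and the entries add up to 40, so the quotient is 75%.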

Table 3 and Table 4 show the different percentages of hits we attained for each season with respect to the real database. The last row stands for the weighted average of hits in each season.

Remark 3.

(i) There are several seasons that had a 100% success rate, but the 2017/2018 season stands out because the difference between first and second place was 19 points, whereas the difference between the UEL group and the middle table was nine points.

(ii) The group with the most successes is the UCL, with a success rate of approximately 91%, and this is due to the advantage that these teams usually have over the rest; mainly in monetary terms.

(iii) The relegation group has a fairly high success rate. The main reason for this is that most of the relegated teams have only been in the first division for one year; that is, they have only played the season they are relegated, because the demand to play in the Premier League and in the second division is quite big.

(iv) The UEL group has the lowest level of success; however, this percentage is reasonable because it only has two places, which are usually much more disputed.

(v) It should be noted that there are two groups in which there is an overlap, where the champion and the first classified in the UCL group coincide, so the Champions League group is made up of four teams and not three.

(vi) Lastly, the champion and midtable groups have confidence levels of 81.25% and 90.34%, respectively. Both percentages are reasonable because there are few seasons in which the champion is defined in the last days. On the other hand, in the middle table group (without counting the teams that are fighting for relegation or for some European competition), they are in the “quiet” zone of the table, which helps to present a high level of reliability.

4. The Rise of Actual Champions

We devote this section to explaining the essentials of Poisson’s method to forecast the result of a football match. We base our presentation on [1,21] and Chapter 1 in [2] and use the information we retrieved on the Premier League for the purpose of illustration. The core of the technique is simple: let $N^{(t)}$ be the random variable that represents the number of goals scored by squad $(t)$ during a football match against team $(-t)$. Naturally, this random variable depends on the strength of the scoring party and the weakness of its rival.

The main assumption of this model is that $N^{(t)}$ follows a Poisson distribution with parameter

$\frac{\mathring{λ}^{(t)} + \tilde{λ}^{(-t)}}{2},$

(3)

where $\mathring{λ}^{(t)}$ is the mean of the goals scored per match by team $(t)$, and $\tilde{λ}^{(-t)}$ is the mean of the goals received by squad $(-t)$. As is thoroughly covered in [1], this assumption is justified by the convergence in distribution of the Binomial distribution to Poisson’s law when the number of experiments increases ad infinitum as the probability of goal per experiment becomes smaller.

To reflect the effect of playing as hosts and guests, we define the random variables $N_{h}^{(t)}$ and $N_{g}^{(t)}$, respectively, and thus, for $N_{h}^{(t)}$, the intensities $\mathring{λ}^{(t)}$ and $\tilde{λ}^{(-t)}$ referred to by (3) are replaced by $\mathring{λ}_{h}^{(t)}$ and $\tilde{λ}_{g}^{(-t)}$, which represent the mean of the goals per match scored by team $(t)$ when they play as hosts and the mean of the goals per match received by squad $(-t)$ when they play as guests. Analogously, for $N_{g}^{(t)}$, the intensities $\mathring{λ}^{(t)}$ and $\tilde{λ}^{(-t)}$ referred to by (3) are replaced by $\mathring{λ}_{g}^{(t)}$ and $\tilde{λ}_{h}^{(-t)}$, which represent the mean of the goals per match scored by the troupe $(t)$ when they play as guests and the mean of the goals per match received by the crew $(-t)$ when they play as hosts.

Next, it suffices to generate a realization of the random variables $N_{h}^{(t)}$ and $N_{g}^{(-t)}$ to simulate the result of the pairing $(t)$ versus $(-t)$. To this end, it is possible to use the cumulative product algorithm presented in, for instance, Section 4.2 in [22] and Algorithm 1 in [1].

The procedure we just described enables us to simulate one result of the direct match between $(t)$ and $(-t)$. However, one simulation is by no means sufficient to forecast the result. We can use the weak law of large numbers (see Chapters 7.4 and 7.5 in [23], Chapter 8 in [24], Chapter 8.4 in [25], Chapter 8 in [26,27] and Teorema VI.1 in [1]) to simulate the result of the match between these teams a large number of times. Next, we compute the proportion of times that each outcome is observed. The weak law of large numbers allows us to interpret these numbers as the probabilities that $(t)$ wins, $(-t)$ wins, and the match ends up tied. This technique is widely known as the Monte Carlo simulation technique (see, for instance, Exercise 5.6 in [28]). If we multiply the former two of these probabilities by three, the latter by one, and add the result to the points earned by each of the squads, we will obtain an expected distribution of the points disputed in that match.

The final step is to simulate each match in the competition by means of this procedure to produce the final standings of the tournament.
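The procedure of this section can be sketched as follows. This is an illustrative reading, not the authors' code: the Poisson generator multiplies uniform draws until the product falls below $e^{-λ}$ (a common form of the cumulative product idea mentioned above), the intensities follow (3) with the host/guest means, and every function name and numerical input is hypothetical.

```python
import math
import random


def poisson_cumulative_product(lam, rng):
    """Draw one Poisson(lam) value by multiplying uniforms until the
    running product falls below exp(-lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product >= threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_match(scored_h, received_g, scored_g, received_h,
                   n_sims=10_000, seed=0):
    """Monte Carlo estimate of (P(host wins), P(draw), P(guest wins)).

    scored_h / received_h: mean goals scored / received by the host at home.
    scored_g / received_g: mean goals scored / received by the guest away.
    """
    rng = random.Random(seed)
    lam_host = (scored_h + received_g) / 2   # parameter (3) for the host
    lam_guest = (scored_g + received_h) / 2  # parameter (3) for the guest
    wins = draws = losses = 0
    for _ in range(n_sims):
        goals_host = poisson_cumulative_product(lam_host, rng)
        goals_guest = poisson_cumulative_product(lam_guest, rng)
        if goals_host > goals_guest:
            wins += 1
        elif goals_host == goals_guest:
            draws += 1
        else:
            losses += 1
    return wins / n_sims, draws / n_sims, losses / n_sims


def expected_points(p_win, p_draw, p_loss):
    """Expected points for host and guest: three per win, one per draw."""
    return 3 * p_win + p_draw, 3 * p_loss + p_draw


# Hypothetical means, for illustration only.
probs = simulate_match(1.9, 1.4, 1.1, 0.8)
print(probs, expected_points(*probs))
```

Adding the expected points of every fixture to each team's tally gives a simulated leader board of the kind discussed in the next subsection.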

The English Premier League: Poisson Analysis

Table 5 displays the resulting leader board of the 2021/2022 season of the English Premier League when we use the procedure we just described with the information from that season. In this simulation, we used 10,000 iterations of the tournament.

The weighted average accuracy rate in this exercise is 79.99%. This is not a bad result; however, as we have previously stated, the model we use relies heavily on the assumption that the distribution of scored and received goals follows Poisson’s law. This means that the intensity described by (3) remains constant throughout the match. This condition is not uniformly met, which is reflected in the discrepancies between the simulated number of points and the actual number of points obtained by each team.

A plausible alternative is to keep the assumption that the distribution of goals is Poisson’s with mean $Λ$, and suppose further that such $Λ$ is a random variable itself with a suitable distribution, for instance, a Type III-Pearson’s distribution (i.e., a Gamma random variable). This approach is very standard for actuarial practitioners modeling the number of claims in insurance (see, for instance, Example 12.3.1 in [29] and [30] pp. 30–33). We borrow the next result from Chapter 3 in [30] and include it here for the sake of self-completeness of our presentation.

Theorem 1.

Let $[N \mid Λ]$ follow Poisson’s law with mean $Λ$, and $Λ$ be a random variable with a $Γ(α, β)$-distribution, where $α > 0$ is a shape parameter; and $β > 0$, a rate parameter. That is, $Λ$ has a density function given by

$f_{Λ}(λ) = \frac{β^{α}}{Γ(α)} λ^{α - 1} e^{-βλ},$

for $λ > 0$, where $Γ(α) := \int_{0}^{\infty} t^{α - 1} e^{-t} \, dt$. Then,

$P(N = n) = \frac{Γ(n + α)}{n! \, Γ(α)} \left(\frac{β}{1 + β}\right)^{α} \left(\frac{1}{1 + β}\right)^{n},$

(4)

for $n = 0, 1, 2, \dots$

The random variable $N$ referred to by (4) in the above result follows the so-called negative binomial distribution with mean and variance given by

$m = \frac{α}{β},$

(5)

$s^{2} = \frac{α}{β} \left(1 + \frac{1}{β}\right),$

(6)

respectively. Note, in particular, that the variance is larger than the mean. This property holds in general for all mixtures of Poisson random variables. We will profit from it in the next section.

Proof of Theorem 1.

The theorem of total probability yields

$\begin{array}{rcl} P(N = n) & = & \int_{0}^{\infty} P(N = n \mid Λ = λ) f_{Λ}(λ) \, dλ \\ & = & \int_{0}^{\infty} \frac{λ^{n}}{n!} e^{-λ} \frac{β^{α}}{Γ(α)} λ^{α - 1} e^{-βλ} \, dλ \\ & = & \frac{β^{α}}{Γ(α) \, n!} \int_{0}^{\infty} λ^{n + α - 1} e^{-λ(1 + β)} \, dλ. \end{array}$

Writing $z := λ(1 + β)$ (and $dz = (1 + β) \, dλ$), we have

$\begin{array}{rcl} P(N = n) & = & \frac{β^{α}}{n! \, Γ(α)} \int_{0}^{\infty} \left(\frac{z}{1 + β}\right)^{n + α - 1} e^{-z} \frac{1}{1 + β} \, dz \\ & = & \frac{β^{α}}{n! \, Γ(α)} \left(\frac{1}{1 + β}\right)^{n + α} \int_{0}^{\infty} z^{n + α - 1} e^{-z} \, dz \\ & = & \frac{β^{α}}{n! \, Γ(α)} \left(\frac{1}{1 + β}\right)^{n + α} Γ(n + α). \end{array}$

Rearranging this expression yields (4). This completes the proof. □
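A quick numerical sanity check of (4), (5), and (6) can be sketched as follows. The values $α = 3$ and $β = 2$ are arbitrary illustrative choices, the function name `neg_binomial_pmf` is hypothetical, and the series is truncated at 200 terms on the assumption that the remaining tail is negligible for these parameters.

```python
import math


def neg_binomial_pmf(n, alpha, beta):
    """P(N = n) from (4), evaluated on the log scale for numerical stability."""
    log_p = (math.lgamma(n + alpha) - math.lgamma(n + 1) - math.lgamma(alpha)
             + alpha * math.log(beta / (1 + beta))
             - n * math.log(1 + beta))
    return math.exp(log_p)


alpha, beta = 3.0, 2.0  # illustrative values only
probs = [neg_binomial_pmf(n, alpha, beta) for n in range(200)]
total = sum(probs)
mean = sum(n * p for n, p in enumerate(probs))
var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))

# Compare with (5) and (6): alpha / beta and (alpha / beta) * (1 + 1 / beta).
print(total, mean, var)
print(alpha / beta, (alpha / beta) * (1 + 1 / beta))
```

For these parameters, (5) and (6) give a mean of 1.5 and a variance of 2.25, and the truncated sums should agree with them up to rounding error.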

In the next section, we will combine the schemas we have presented so far. However, we will test how well the Poisson random variable fits the number of goals scored and received by the opponents, and we will replace it when a test of hypothesis rejects it with the negative binomial (resp. binomial) distribution when the variance is larger (resp. smaller) than the mean. This approach will allow us to consider random changes in the intensities of the performance of the squads.

5. Test of the Élö–Runyan–Poisson–Pearson Model

The methods we presented in Section 3 and Section 4 can confidently be used for classifying football factions. Actually, the Élö–Runyan method has been extensively used for forecasting the outcome of any particular match between crews in the same league, whereas the Poisson method has been used to forecast the score of matches between teams that do not necessarily face each other with regularity. In this section, we propose a modification of the Poisson method so as to try to forecast the result of the upcoming World Cup in Qatar 2022. To this end, we develop the methodology in the following subsection, test it in the results of the last World Cup, and finalize by giving a forecast of Qatar’s outcome in the last subsection.

5.1. The Élö–Runyan–Poisson–Pearson Model

To begin with, it is mandatory to rely on a sufficiently large database with the results of each team. The idea is to keep track of the number of goals scored and received by the squads as hosts and guests for a reasonable period. Then, we compare the mean to the variance and run a goodness-of-fit test to decide which distribution should be used to simulate each random variable.

Next, it is necessary to estimate the parameters of the chosen distribution. Then, for each match, we proceed to generate the number of goals $M_{h}^{(t)}$ scored by team $(t)$ when playing as hosts against team $(-t)$ by means of the following tuned version of (3),

$M_{h}^{(t)} = \left\lfloor p \left(\mathring{N}_{h}^{(t)} + \tilde{N}_{g}^{(-t)}\right) \right\rfloor,$

(7)

where $\mathring{N}_{h}^{(t)}$ stands for the number of goals scored by team $(t)$ as hosts (regardless of the performance of the team $(-t)$), $\tilde{N}_{g}^{(-t)}$ represents the number of goals received by team $(-t)$ when playing as guests (regardless of the performance of the team $(t)$), and $p$ is the Élö–Runyan probability that the squad $(t)$ bests squad $(-t)$ according to (2).

Analogously, we generate the number of goals $M_{g}^{(-t)}$ scored by team $(-t)$ when playing as guests against team $(t)$,

$M_{g}^{(-t)} = \left\lfloor (1 - p) \left(\mathring{N}_{g}^{(-t)} + \tilde{N}_{h}^{(t)}\right) \right\rfloor,$

(8)

where $\mathring{N}_{g}^{(-t)}$ stands for the number of goals scored by team $(-t)$ as guests (regardless of the performance of the team $(t)$) and $\tilde{N}_{h}^{(t)}$ represents the number of goals received by team $(t)$ when playing as hosts (regardless of the performance of the team $(-t)$). Iterating (7) and (8) and using the weak law of large numbers yields a forecast of the match between teams $(t)$ and $(-t)$.

It should be noted as well that a fundamental difference between (7) and (8) and (3) is the fact that there are no a priori equal weighting factors for each of the parameters of the random variables to be generated. Indeed, in (3), it is just as likely to score a goal due to the party’s own strength as due to the other squad’s weakness. In contrast, in (7) and (8), we substitute the original $1/2$ factors by the Élö–Runyan probabilities given by (2). This measure enables us to account for the strength of the team which has a better Élö–Runyan rating.

We set the null hypothesis that the distribution of the goals is Poisson, and distinguish three cases.

(i): In the case that we fail to reject this hypothesis, we take advantage of the fact that the Poisson mass function can be recursively written as

$P (N = n + 1) = \frac{λ}{n + 1} P (N = n),$

(9)

for $λ > 0$ , and $n = 0, 1, \dots$ , with the initial condition $P (N = 0) = e^{- λ}$ , and mirror this fact by means of Algorithm 2 (which we borrowed from Chapter 4.2 in [22]) to simulate the number of goals.
(ii): If we reject the null hypothesis that the distribution of goals is Poisson, and the variance of the goals is lower than the mean, we use a binomial distribution with mean $\bar{x}$ . In this case, we refer to Section II in [1] to interpret $\frac{\bar{x}}{90}$ as the probability that a goal is scored at any of the 90 min in a match and take 90 as the total number of trials a team will have to do it. We profit from the fact that the binomial mass with parameters 90 and $\frac{\bar{x}}{90}$ can be recursively written as

$P (N = n + 1) = \frac{90 - n}{n + 1} \frac{\bar{x}}{90 - \bar{x}} P (N = n),$

(10)

for $n = 0, 1, \dots, 90$ , with initial condition $P (N = 0) = {(1 - \frac{\bar{x}}{90})}^{90}$ , and use Algorithm 3 (which we take from Chapter 4.3 in [22]) to simulate binomial random variables.

Algorithm 2: Generation of a Poisson realization.

Algorithm 3: Generation of a binomial realization.

(iii): If we reject the null hypothesis that the distribution of goals is Poisson, and the variance is larger than the mean, we take advantage of the fact that the negative binomial random variable can be recursively written as

$P (N = n + 1) = \frac{n + α}{(n + 1) (1 + β)} P (N = n),$

(11)

for $n = 0, 1, \dots$ with the initial condition $P (N = 0) = {(\frac{β}{1 + β})}^{α}$ , use the method of moments (see, e.g., Chapter VII.2.1 in [31]) and (5) and (6) to obtain the estimates of the parameters

$\hat{α} = \frac{{\bar{x}}^{2}}{s^{2} - \bar{x}},$

(12)

$\hat{β} = \frac{\bar{x}}{s^{2} - \bar{x}},$

(13)

and mirror these facts by means of Algorithm 4 to simulate negative binomial random variables.

Algorithm 4: Generation of a negative binomial realization.
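Since the bodies of Algorithms 2–4 are not reproduced in this text, the following is only a minimal sketch of how the recursions (9)–(11) can drive an inverse-transform draw. The function names (`sample_by_recursion`, `poisson_draw`, `binomial_draw`, `negative_binomial_draw`) are hypothetical, and the code should not be read as the authors' implementation.

```python
import math
import random


def sample_by_recursion(p0, ratio, u, n_max=None):
    """Inverse-transform draw: smallest n whose cumulative mass exceeds u.

    p0    -- P(N = 0), the initial condition of the recursion
    ratio -- function n -> P(N = n + 1) / P(N = n)
    u     -- a Uniform(0, 1) draw
    n_max -- optional upper end of the support (used for the binomial)
    """
    n, p = 0, p0
    cdf = p0
    # Stop once the cumulative mass passes u; the extra guards avoid
    # running past the support or looping after p underflows to zero.
    while u >= cdf and p > 0.0 and (n_max is None or n < n_max):
        p *= ratio(n)  # advance the mass function by one step
        n += 1
        cdf += p
    return n


def poisson_draw(lam, u):
    # Recursion (9) with P(N = 0) = exp(-lam).
    return sample_by_recursion(math.exp(-lam), lambda n: lam / (n + 1), u)


def binomial_draw(mean, u, trials=90):
    # Recursion (10) with parameters `trials` and mean / trials.
    odds = mean / (trials - mean)
    p0 = (1 - mean / trials) ** trials
    return sample_by_recursion(
        p0, lambda n: (trials - n) / (n + 1) * odds, u, n_max=trials
    )


def negative_binomial_draw(alpha, beta, u):
    # Recursion (11) with P(N = 0) = (beta / (1 + beta)) ** alpha.
    p0 = (beta / (1 + beta)) ** alpha
    return sample_by_recursion(
        p0, lambda n: (n + alpha) / ((n + 1) * (1 + beta)), u
    )


rng = random.Random(2018)  # seed chosen arbitrarily for reproducibility
goals = poisson_draw(1.5, rng.random())
```

Passing the uniform draw `u` explicitly keeps the three generators deterministic given `u`, which makes them easy to check against the cumulative masses by hand.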

The general Élö–Runyan–Poisson–Pearson model to forecast the score in a direct match between two football teams is as in Algorithm 5.

Algorithm 5: The Élö–Runyan–Poisson–Pearson model.

We complete this subsection with a few comments on Algorithm 5.

Remark 4.

(a) A plausible interpretation of (3) is that the number of goals scored by a team is due to the offensive power of the team itself, and to the defensive weakness of its rival. Algorithm 5 emphasizes this assumption by acknowledging the performance of a team: it updates the Élö–Runyan ratings after it has simulated a match. We develop the potential of this feature later in Algorithms 6 and 7 below.

(b) The random variables (7) and (8) are floor functions of convex linear combinations of discrete-type random variables. This ensures that we maintain the simulated values in the realm of integral numbers for both squads.

(c) Because it is very unlikely for actual data to yield a sample mean exactly equal to its sample variance, we run the goodness-of-fit test in all three cases. Whenever the test fails to reject the Poisson hypothesis, we treat these two statistics as statistically equivalent.

(d) For the case where the variance is lower than the mean, we follow the approach devised in [1] and Chapter 1 in [2] and choose 90 and $\bar{x} / 90$ as the parameters of the corresponding binomial random variable. Another possibility would be to estimate the parameters by means of another method.

(e) The estimates (12) and (13) correspond to the standard when it comes to using negative binomial random variables in the insurance context. However, this is not the only alternative. It is also possible to use the method of maximum likelihood to this end. In this case, we obtain the system of equations

$\frac{β}{1 + β} - \frac{T α}{T α + \sum_{i = 1}^{T} y_{i}} = 0,$

$T \ln (\frac{T α}{T α + \sum_{i = 1}^{T} y_{i}}) - T (\frac{\int_{0}^{\infty} t^{α - 1} e^{- t} \ln t \, d t}{Γ (α)} - 1) + \sum_{i = 1}^{T} (\frac{\int_{0}^{\infty} t^{α + y_{i} - 1} e^{- t} \ln t \, d t}{Γ (α + y_{i})} - 1) = 0,$

where $y_{1}, y_{2}, \dots, y_{T}$ is the sample of goals in each match. These expressions are solvable for α and β via (for example) Newton–Raphson’s method.

(f) Algorithms 2–4 profit from the widely known recursion formulas (9)–(11) and their corresponding initial conditions. However, it turns out that this is possible only if the mass function of N is such that there exist constants a and b such that

$P (N = n) = (a + \frac{b}{n}) P (N = n - 1) \quad \text{for } n = 1, 2, \dots$

Surprisingly enough, there are no other random variables that meet this condition (see Theorem 17.3 in [32]). This fact explains our decision to stick to these particular random variables.
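As a quick consistency check on Remark 4(f), shifting the index in (9)–(11) from $n + 1$ to $n$ puts each recursion in the form $a + b / n$. The constants below are worked out here from (9)–(11), using the binomial parameters 90 and $\bar{x} / 90$ of case (ii); they are not quoted from [32].

```latex
\begin{align*}
\text{Poisson (9):} \quad
  & \frac{\lambda}{n} = a + \frac{b}{n}
  && \Rightarrow\; a = 0, \quad b = \lambda, \\
\text{Binomial (10):} \quad
  & \frac{91 - n}{n}\,\frac{\bar{x}}{90 - \bar{x}} = a + \frac{b}{n}
  && \Rightarrow\; a = -\frac{\bar{x}}{90 - \bar{x}}, \quad
     b = \frac{91\,\bar{x}}{90 - \bar{x}}, \\
\text{Negative binomial (11):} \quad
  & \frac{n - 1 + \alpha}{n\,(1 + \beta)} = a + \frac{b}{n}
  && \Rightarrow\; a = \frac{1}{1 + \beta}, \quad
     b = \frac{\alpha - 1}{1 + \beta}.
\end{align*}
```

The sign of $a$ separates the three cases: zero for the Poisson, negative for the binomial, and positive for the negative binomial.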

5.2. Russia 2018

Every four years, the representatives of all six confederations that compose FIFA gather in the most important tournament for any sport on the face of the Earth: the FIFA World Cup. The participating confederations are

the Asian Football Confederation (AFC);
the UEFA;
the Confédération Africaine de Football (CAF);
the Confederation of North, Central American and Caribbean Association Football (CONCACAF);
the Confederação Sul-Americana de Futebol (CONMEBOL); and
the Oceania Football Confederation.

The current format of the FIFA World Cup comprises two phases. In the first round, the 32 invited teams are divided into eight groups of four and, within each group, every team plays three matches in a round-robin mode. Afterward, the best two teams of each group move on to a knockout phase of four rounds.

We display the names and the Élö–Runyan rating (computed with Algorithm 1) of the teams that played in the Russia 2018 World Cup according to the confederation to which each belongs. We obtained the Élö–Runyan ratings by analyzing the information of all the matches of the teams listed below in the period from 14 July 2014 until 13 June 2018 (that is, from right after Brazil’s World Cup until right before the beginning of Russia’s World Cup). We applied Algorithm 1 with a uniform $K = 20$ for all the matches to account for the fact that, during the COVID-19 pandemic, access to some of the strongest teams around the world became restricted. The initial Élö–Runyan rating we used for all the teams was 1500.
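Equation (2) and Algorithm 1 are stated earlier in the paper and are not reproduced in this section. As a rough illustration only, the sketch below uses the logistic expected-result formula with scale 400 that is customary for Élö-type systems. That choice, and the helper names `expected_result` and `update_ratings`, are assumptions rather than the authors' exact definitions.

```python
def expected_result(rating_a, rating_b):
    """Customary Elo-type expected result of A against B (assumed form)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, result_a, k=20.0):
    """Return the updated ratings after one match.

    result_a -- 1.0 if A wins, 0.5 for a tie, 0.0 if A loses
    k        -- weighting factor (20 in the text for the rating period)
    """
    delta = k * (result_a - expected_result(rating_a, rating_b))
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two teams starting at the initial rating of 1500; the first one wins.
new_a, new_b = update_ratings(1500.0, 1500.0, 1.0)
```

With equal ratings the expected result is one half, so a win with $K = 20$ moves each rating by ten points in opposite directions.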

UEFA: Belgium (1633.222), Croatia (1724.443), Denmark (1564.427), England (1604.612), France (1742.171), Germany (1426.583), Iceland (1550.110), Poland (1456.364), Portugal (1516.734), Russia (1577.244), Serbia (1485.873), Spain (1543.799), Sweden (1557.115), and Switzerland (1581.902).
CONMEBOL: Argentina (1558.131), Brazil (1635.239), Colombia (1626.491), Peru (1521.952), and Uruguay (1646.054).
CONCACAF: Costa Rica (1422.986), Mexico (1485.310), and Panama (1412.645).
CAF: Egypt (1377.147), Morocco (1374.678), Nigeria (1384.316), Senegal (1377.673), and Tunisia (1371.521).
AFC: Australia (1447.447), Iran (1458.788), Japan (1513.472), Saudi Arabia (1453.555), and South Korea (1443.991).

We devote this subsection to replicating the Russian edition of the World Cup so as to measure the accuracy of the method we proposed in Section 5.1. It is worth mentioning that, although in practice there is only one host team (the organizer), we have decided to keep the host–guest format because it accounts for the performance of the squads during the tournament.

The model we use for the first phase of the tournament is essentially the same one we have used to produce Table 5. However, we applied Algorithm 5 (with $K = 60$, for this reflects the fact that we intend to simulate a World Cup) with the corresponding ratings listed above to forecast one outcome of the first match, and then we used the Monte Carlo simulation technique to compute averages of the Élö–Runyan ratings of teams $(t)$ and $(- t)$ and

the proportion of times that $(t)$ wins,
the proportion of times that $(- t)$ wins, and
the proportion of times when the match ends up tied.

Then, we multiply each of the former two of these proportions by three and the latter by one, and add the results to the points earned by the corresponding squads (three points for a win, and one point to each squad for a tie). This procedure yields an expected distribution of the points disputed in each match. Algorithm 6 summarizes this. The start date of the World Cup is assumed to be n.
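The point-allocation step just described can be sketched as follows. The score model is a placeholder: `simulate_score` is a hypothetical callable, and the demo uses two independent Poisson draws with made-up means instead of (7) and (8), which are defined outside this passage. The sketch also leaves out the rating updates performed by Algorithm 5.

```python
import math
import random


def poisson_draw(lam, rng):
    """Inverse-transform Poisson draw via P(n + 1) = lam / (n + 1) * P(n)."""
    u = rng.random()
    n, p = 0, math.exp(-lam)
    cdf = p
    while u >= cdf and p > 0.0:
        p *= lam / (n + 1)
        n += 1
        cdf += p
    return n


def expected_points(simulate_score, trials=10_000):
    """Estimate outcome proportions and expected points for (t) and (-t).

    simulate_score -- callable returning (goals_t, goals_minus_t)
    """
    wins_t = wins_minus_t = ties = 0
    for _ in range(trials):
        goals_t, goals_minus_t = simulate_score()
        if goals_t > goals_minus_t:
            wins_t += 1
        elif goals_t < goals_minus_t:
            wins_minus_t += 1
        else:
            ties += 1
    p_t = wins_t / trials
    p_minus_t = wins_minus_t / trials
    p_tie = ties / trials
    # Three points per win and one point to each side per tie.
    return {
        "p_t": p_t,
        "p_minus_t": p_minus_t,
        "p_tie": p_tie,
        "points_t": 3 * p_t + p_tie,
        "points_minus_t": 3 * p_minus_t + p_tie,
    }


rng = random.Random(2018)  # arbitrary seed
# Hypothetical goal means, for illustration only.
result = expected_points(
    lambda: (poisson_draw(1.8, rng), poisson_draw(1.1, rng))
)
```

Because a tie hands out two points in total instead of three, the two expected scores add up to three minus the tie proportion.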

As for the knockout round, we ran all T = 10,000 simulations of each match and chose the team with the largest probability of winning as the one who stayed in the competition.

Tables 6–13 display the initial arrangement of the teams in the tournament, along with the mean and variance of the goals they scored and received in the intercup period in all the matches they played. We decided on the random variable used for simulating the goals scored and received by running a goodness-of-fit test with the null hypothesis that the distribution was Poisson. Whenever we failed to reject this null hypothesis, we chose a Poisson random variable (see Algorithm 2). Otherwise, we selected a binomial random variable (see Algorithm 3) if the mean was greater than the variance, and a negative binomial distribution with parameters (12) and (13) (see Algorithm 4) in the remaining case.
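The selection rule in the paragraph above can be written compactly as below. This is a sketch under stated assumptions: the outcome of the goodness-of-fit test is passed in as a flag (`poisson_rejected`) because the text does not specify the test statistic or significance level, and the function name `choose_law` and the returned labels are hypothetical.

```python
def choose_law(mean, variance, poisson_rejected, trials=90):
    """Pick the law and its parameters from the sample mean and variance.

    poisson_rejected -- True if the goodness-of-fit test rejected Poisson
    trials           -- number of binomial trials (90, one per minute)
    """
    if not poisson_rejected:
        return "Poi", {"lam": mean}
    if variance < mean:
        # Binomial with parameters 90 and mean / 90, as in case (ii).
        return "Bin", {"trials": trials, "p": mean / trials}
    if variance > mean:
        # Method-of-moments estimates (12) and (13).
        alpha = mean ** 2 / (variance - mean)
        beta = mean / (variance - mean)
        return "NB", {"alpha": alpha, "beta": beta}
    raise ValueError("Poisson rejected but mean equals variance")


# Values taken from Table 13 (Senegal, goals for as guest).
law, params = choose_law(1.36, 0.69, poisson_rejected=True)
```

For the negative binomial branch, the ratio `alpha / beta` recovers the sample mean, which is a convenient sanity check on the estimates.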

Algorithm 6: Results of the first phase for a World Cup group.

We initialized Algorithm 6 with the Élö–Runyan ratings we listed above and set T = 10,000 to obtain the results of this phase. The results are as in Table 14. To measure the accuracy of our simulations, we have opted for a format which is analogous to that of Table 5.

The knockout stage of the FIFA World Cup is the second and final stage of the competition, following the 48 matches of the group stage. It is subdivided into four substages: round of 16, quarterfinals, semifinals, and a final match. The teams that survive the first phase line up according to Figure 8 and start the second phase.

To simulate each match in this phase, we produced Algorithm 7, which is a variant of Algorithm 6. Note that the main difference of the former with respect to the latter is that it does not assume the presence of four teams and, therefore, does not make the allocation of points from lines 1–2 and 15. Most of all, it does not allow more than one team into the next phase. Also, observe that Algorithm 7 breaks ties via a coin flip (see lines 14–18), hence mirroring the popular saying: “Penalty shoot-outs are basically crap shoots”.
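The body of Algorithm 7 is not reproduced here, so the following is one possible reading of its description: each simulated match that ends level is settled by a fair coin flip, and the team that advances in the larger share of simulations stays in the competition. The names `knockout_match` and `simulate_score` are hypothetical, and the rating updates are omitted.

```python
import random


def knockout_match(simulate_score, rng, trials=10_000):
    """Return (winner, share_t, share_minus_t) for a knockout tie.

    simulate_score -- callable returning (goals_t, goals_minus_t)
    rng            -- random.Random instance used for the coin flips
    """
    advances_t = 0
    for _ in range(trials):
        goals_t, goals_minus_t = simulate_score()
        if goals_t == goals_minus_t:
            # Level score: settle the match with a fair coin flip.
            t_goes_through = rng.random() < 0.5
        else:
            t_goes_through = goals_t > goals_minus_t
        advances_t += int(t_goes_through)
    share_t = advances_t / trials
    # An exact 50/50 split is resolved in favor of (t), arbitrarily.
    winner = "t" if share_t >= 0.5 else "-t"
    return winner, share_t, 1.0 - share_t


rng = random.Random(2022)  # arbitrary seed
# Degenerate score model for illustration: (t) always wins 2-1.
outcome = knockout_match(lambda: (2, 1), rng, trials=100)
```

Under this reading, the two shares add up to one, which matches percentages that can be printed side by side for both teams.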

Algorithm 7: Result of a second-phase match in a World Cup group.

Remark 5.

One might argue that, in spite of the fact that Algorithms 6 and 7 rely on the update of the Élö–Runyan ratings (by calling Algorithm 5), they do not update the data of the scored and received goals so as to verify the distribution of these random variables in the next iterations of Algorithm 5. However, we assume that, because Algorithm 5 is producing simulations according to the original distributions of such random variables, they do not change as the tournament evolves. That is, they are realizations of a stationary stochastic process.

Applying Algorithm 7 with T = 10,000, the Élö–Runyan ratings for both squads, and the historical data of the goals scored and received by each pair of teams, we predict the outcomes of the second phase in the Russia 2018 World Cup. Tables 15–18 establish the comparison.

According to the results displayed in Tables 14–18, Algorithms 6 and 7 have a combined accuracy of 87.09%.
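One tally that reproduces this figure counts the correctly predicted teams stage by stage in Tables 14–18. The text does not spell out the aggregation, so the pooling below is an assumption that happens to match the reported value.

```python
# (correct, total) per stage, read off Tables 14-18.
stages = {
    "group-stage qualifiers": (16, 16),
    "quarterfinalists": (6, 8),
    "semifinalists": (2, 4),
    "finalists": (2, 2),
    "champion": (1, 1),
}

correct = sum(hit for hit, _ in stages.values())
total = sum(size for _, size in stages.values())
accuracy = correct / total

print(f"{correct}/{total} = {accuracy:.4%}")  # prints "27/31 = 87.0968%"
```

Under this tally, the reported 87.09% corresponds to the ratio 27/31 with its decimals truncated rather than rounded.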

5.3. Qatar 2022

We display the names and Élö–Runyan ratings of the 32 countries participating in the World Cup of Qatar 2022 according to their confederation of origin.

UEFA: Belgium (1662), Croatia (1571), Denmark (1659), England (1649), France (1659), Germany (1615), Netherlands (1711), Poland (1559), Serbia (1619), Spain (1645), Switzerland (1592), Portugal (1633), and Wales (1504).
AFC: Australia (1574), Iran (1621), Japan (1693), Qatar (1552), Saudi Arabia (1558), and South Korea (1652).
CAF: Cameroon (1552), Ghana (1512), Morocco (1675), Senegal (1636), and Tunisia (1584).
CONCACAF: Canada (1613), Costa Rica (1545), Mexico (1573), and the United States of America (1638).
CONMEBOL: Argentina (1732), Brazil (1760), Ecuador (1552), and Uruguay (1574).

Tables 19–26 display the initial arrangement of the teams in the tournament, along with the mean and variance of the goals they scored and received in the intercup period. This arrangement became public on 1 April 2022 at the FIFA Congress 2022, held in the city of Doha, Qatar. These tables also display the random variable used for simulating the goals scored and received, which we chose by comparing the mean to the variance and running a goodness-of-fit test with the null hypothesis that the distribution was Poisson. It is remarkable that most of the random variables were modeled with Poisson distributions; some of them were modeled with a negative binomial random variable, while only two were modeled with a binomial random variable (see Table 23).

5.3.1. First Phase

We applied our simulation Algorithm 6 with T = 10,000 and obtained the leader boards displayed in Figure 9.

5.3.2. Second Phase

We iteratively applied Algorithm 7 with T = 10,000, the Élö–Runyan ratings for both squads, and the stationary trajectories of the goals scored and received by each pair of teams. Figure 10 forecasts the outcomes of the second phase in the Qatar 2022 World Cup starting from the Round of 16. The percentages below the names of the teams are the predicted proportions of times that each team beats its rival.

We forecast that Brazil will end up rising as world champion, because its Élö–Runyan rating will be 1840.616153, against Belgium’s 1741.27328. However, we should keep in mind that mathematics will not play football, men will.

6. Aftermatch

This work builds on many references dealing with the Élö–Runyan rating system for football, and with Poisson’s method to forecast a football match (see [1]). We illustrated each of these methods in the English Premier League in Section 3 and Section 4, respectively. We have also presented Algorithm 5, a technique that extends the Poisson methodology to forecast the result of a football match under the assumption that the historical data of the goals scored and received by a team are stationary (see Assumption 1 and Remark 5).

Our main contribution is the incorporation of the Élö–Runyan rating system into Poisson’s method to account for the strength of a team in an ongoing tournament while emphasizing the assumption that the goals scored by a team are due to both the offensive power of the team itself and the defensive weakness of its rival. Another novelty included in Algorithm 5 is the inclusion of a goodness-of-fit test to decide whether the Poisson random variable is a suitable option to simulate the goals scored and received by each opponent. If it is not, we compare the mean to the variance of the historic data, and use a binomial or a negative binomial for the simulations. Because the mass functions of these random variables can be written in a recursive way, they represent the standard used by actuaries modelling claims frequency in, for instance, car insurance. Moreover, we have taken the chance to apply Algorithm 5 to the upcoming World Cup Qatar 2022 before it officially starts (see Algorithms 6 and 7).

A plausible extension of this work is to loosen the assumption that, should the Poisson distribution fail to fit the historic data, then it is necessary to choose between the binomial or negative binomial random variables for the simulations. In this case, it would be possible to fit a mixture of Poisson random variables (when the variance is larger than the mean), or even an empirical distribution. Another area of opportunity lies in the use of stationary random processes (see Assumption 1(i)), for one could withdraw this hypothesis and reflect it in our algorithms.

Author Contributions

Conceptualization, methodology, original draft preparation, editing, funding acquisition, supervision, and project administration are due to J.D.L.-B. Software, data validation, formal analysis, data curation, and visualization are due to D.A.Z.-N., E.X.H.-P. and Y.E.-B. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by Universidad Anáhuac México.

Data Availability Statement

All the codes and data can be found in: https://github.com/DonDisparates/qatar2022.git, accessed on 13 November 2022.

Acknowledgments

We sincerely thank the following people for their contribution: Ingrid Bárcenas-Hernández, José Antonio Contró-Sánchez, Aldo Martínez-Arias, José Manuel Mendoza-Madrid, Dafne Mendoza-Ramos, Michelle Mondragón-Sainz, and Sara Ortega-Ricarte. All of them worked very hard retrieving data from several websites so that the authors could use the information in this research paper. We also thank Priscilla Camargo–Bacha, Graphic Designer of the Faculty of Actuarial Sciences at Universidad Anáhuac México, for designing the beautiful Figure 9 and Figure 10. Last, but not least, we sincerely thank all three anonymous expert reviewers appointed by Preda for their thorough revision of the early version of our work. We sincerely believe their comments enhanced the presentation and accuracy of this research paper.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. López-Barrientos, J.D.; Silva, E.; Lemus-Rodríguez, E. Captain Tsubasa II: El Surgimiento de Campeones Virtuales. In Investigación Estocástica y Estadística en la Educación; Velasco-Luna, F., Ed.; Benemérita Universidad Autónoma de Puebla: Puebla, Mexico, 2022; Chapter 8; pp. 120–139.
2. Sumpter, D. Soccermatics: Mathematical Adventures in the Beautiful Game; Bloomsbury Sigma: London, UK, 2017.
3. Élö, A.E. The Rating of Chessplayers. Past and Present; Arco Publishing Incorporated: New York, NY, USA, 1986.
4. Dixon, M.; Coles, S. Modelling Association football scores and inefficiencies in the football betting market. J. R. Stat. Soc. Ser. C 1997, 46, 265–280.
5. Chalikias, M.; Kossieri, E.; Lalou, P. Football matches: Decision-making in betting. Teach. Stat. 2020, 42, 4–9.
6. Fedrizzi, G.; Canal, L.; Micciolo, R. UEFA EURO 2020: An exciting match between football and probability. Teach. Stat. 2022, 44, 119–122.
7. Koopman, J.; Lit, R. A dynamic bivariate Poisson model for analysing and forecasting match results in the English Premier League. J. R. Stat. Soc. Ser. A 2015, 178, 167–186.
8. Saraiva, E.F.; Suzuki, A.K.; Filho, C.A.; Louzada, F. Predicting football scores via Poisson regression model: Applications to the National Football League. Commun. Stat. Appl. Methods 2016, 23, 297–319.
9. Wheatcroft, E. Forecasting football matches by predicting match statistics. J. Sport. Anal. 2021, 7, 77–97.
10. Ötting, M.; Groll, A. A regularized hidden Markov model for analyzing the ‘hot shoe’ in football. Stat. Model. 2021, 22, 6.
11. Lyons, K. What Are the World Football Elo Ratings? The Conversation. 11 June 2014. Available online: https://theconversation.com/what-are-the-world-football-elo-ratings-27851 (accessed on 13 November 2022).
12. Dixon, M.; Robinson, M. A birth process model for association football matches. Statistician 1998, 47, 523–538.
13. Fisher, R. The significance of deviations from expectations in a Poisson series. Biometrics 1950, 6, 17–24.
14. Rue, H.; Salvesen, Ø. Prediction and retrospective analysis of soccer matches in a league. J. R. Stat. Soc. Ser. D 2000, 49, 339–418.
15. Stefani, R.; Pollard, R. Football Rating Systems for Top-Level Competition: A Critical Survey. J. Quant. Anal. Sport. 2007, 3, 3.
16. Forbes Staff. Barcelona Supera al Real Madrid y se Convierte en el Club de Futbol Más Valioso. Forbes México. 12 April 2021. Available online: https://www.forbes.com.mx/barcelona-real-madrid-club-mas-valioso-del-mundo/ (accessed on 13 November 2022).
17. Vázquez, F. Descenso en Premier League, Una Lucha por US250 Millones. El Economista. 21 June 2020. Available online: https://www.eleconomista.com.mx/deportes/Descenso-en-Premier-League-una-lucha-por-US250-millones-20200621-0049.html (accessed on 13 November 2022).
18. Luca. Historia de la Premier League: Orígenes, Leyendas y Curiosidades. RetroFootball. 1 September 2020. Available online: https://www.retrofootball.es/retroblog/historia-de-la-premier-league-origenes-leyendas-y-curiosidades/ (accessed on 13 November 2022).
19. Dias, S. Leicester City Defy 5000-1 Odds to Become Premier League Champions. Sports Lumo. 2 May 2016. Available online: https://sportslumo.com/football/may-2-2016-leicester-city-defy-5000-1-odds-to-become-premier-league-champions/#:~:text=Leicester%20City%20defy%205000%2D1 (accessed on 13 November 2022).
20. Goddard, J. Regression models for forecasting goals and match results in association football. Int. J. Forecast. 2005, 21, 331–340.
21. Chu-Chun-Lin, S. Rendez-vous of the Poisson and exponential distributions at the World Cup of Soccer. Teach. Stat. 1999, 21, 60–62.
22. Ross, S.M. Simulation; Academic Press Inc.: Cambridge, MA, USA, 1997.
23. Grimmett, G.; Stirzaker, D. Probability and Random Processes; Oxford Science Publications: Oxford, UK, 1994.
24. Grinstead, C.M.; Snell, J.L. Introduction to Probability; American Mathematical Society: Providence, RI, USA, 1997.
25. Hoel, P.G.; Port, S.C.; Stone, C.J. Introduction to Stochastic Processes; Houghton Mifflin Company: Boston, MA, USA, 1972.
26. Isaac, R. The Pleasures of Probability; Springer: New York, NY, USA, 1995.
27. Lanier, D.; Trotoux, D. La loi des grands nombres, le théorème de De Moivre-Laplace. In Contribution à une Approche Historique de l’Enseignement des Mathématiques; Evelyne, B., Claude, M., François, C., Eds.; Presses Universitaires de Franche-Comté: Besançon, France, 1996; pp. 259–294.
28. Durrett, R. Probability. Theory and Examples; Cambridge University Press: Cambridge, UK, 2010.
29. Bowers, N.L., Jr.; Gerber, H.U.; Hickman, J.C.; Jones, D.A.; Nesbitt, C.J. Actuarial Mathematics; The Society of Actuaries: Schaumburg, IL, USA, 1997.
30. Lemaire, J. Bonus-Malus Systems in Automobile Insurance; Springer Science+Business Media LLC.: New York, NY, USA, 1995.
31. Mood, A.M.; Graybill, F.A.; Boes, D.C. Introduction to the Theory of Statistics; McGraw-Hill: New York, NY, USA, 1974.
32. Promislow, D. Fundamentals of Actuarial Mathematics; John Wiley and Sons Ltd.: London, UK, 2011.

Figure 1. Example of data presentation.

Figure 2. Regression curve between the obtained points and the goal difference at the ending of season 2005/2006.

Figure 3. Leader boards of the first seven seasons in our study.

Figure 4. Evolution of the number of points needed to belong to each of the first four groups.

Figure 5. Final standings obtained by means of (1) for seasons 2005/2006–2012/2013.

Figure 6. Final standings obtained by means of (1) for seasons 2013/2014–2020/2021.

Figure 7. Evolution of the number of points needed to belong to each of the first four groups according to Algorithm 1.

Figure 8. Schematic of the final phase.

Figure 9. Final standings for the first phase.

Figure 10. The arrangement mirrors that of Figure 8.

Table 1. Analysis of variance.

Source		DF	Adj. Sum Sq.	Adj. Mean Sq.	F Value	p-Value
Regression	GD	1	6118.31	6118.31	238.25	0.000
Error		18	462.24	25.68
	Lack of adj.	16	385.74	24.11	0.63	0.765
	Pure error	2	76.50	38.25
Total		19	6580.55

Table 2. This confusion matrix shows the accuracy of our back-cast.

	Champion	UCL	UEL	Middle Table	Relegation
Champion	13	3	0	0	0
UCL	3	39	5	1	0
UEL	0	5	18	9	0
Middle table	0	1	9	159	7
Relegation	0	0	0	7	41
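The overall hit rate implied by Table 2 can be read off as the share of counts on the main diagonal. The short computation below is a sketch that assumes the usual definition of accuracy for a confusion matrix; the text does not state which axis holds the actual groups, and the result does not depend on it.

```python
labels = ["Champion", "UCL", "UEL", "Middle table", "Relegation"]

# Counts copied from Table 2, row by row.
matrix = [
    [13, 3, 0, 0, 0],
    [3, 39, 5, 1, 0],
    [0, 5, 18, 9, 0],
    [0, 1, 9, 159, 7],
    [0, 0, 0, 7, 41],
]

hits = sum(matrix[i][i] for i in range(len(labels)))  # diagonal entries
total = sum(sum(row) for row in matrix)               # all classified teams
accuracy = hits / total

print(f"{hits}/{total} = {accuracy:.3%}")  # prints "270/320 = 84.375%"
```

The total of 320 equals 16 seasons of 20 teams each, and the ratio is consistent with the 84.37% quoted in the abstract for the Élö–Runyan exercise.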

Table 3. Comparison of hits of Algorithm 1 with respect to the actual results of seasons 2005/2006–2012/2013.

GROUP	2005/2006	2006/2007	2007/2008	2008/2009	2009/2010	2010/2011	2011/2012	2012/2013
Champion	0%	100%	0%	100%	0%	100%	100%	100%
UCL	100%	100%	100%	100%	75%	75%	75%	75%
UEL	50%	100%	50%	50%	0%	0%	50%	0%
Midtable	81.82%	90.91%	90.91%	72.73%	90.91%	90.91%	90.91%	90.91%
Relegation	66.67%	66.67%	100%	33.33%	100%	100%	100%	100%
TOTAL	75.00%	90.00%	85.00%	70.00%	76.25%	81.25%	86.25%	81.25%

Table 4. Comparison of hits of Algorithm 1 with respect to the actual results of seasons 2013/2014–2020/2021.

GROUP	2013/2014	2014/2015	2015/2016	2016/2017	2017/2018	2018/2019	2019/2020	2020/2021
Champion	100%	100%	100%	100%	100%	100%	100%	100%
UCL	100%	100%	75%	100%	100%	75%	100%	100%
UEL	100%	100%	50%	100%	100%	50%	50%	50%
Midtable	100%	90.91%	81.82%	100%	100%	90.91%	90.91%	90.91%
Relegation	100%	66.67%	66.67%	100%	100%	66.67%	100%	100%
TOTAL	100.00%	90.00%	76.25%	100.00%	100.00%	81.25%	90.00%	90.00%

Table 5. Leader board comparison.

	Position	Real	Points	Poisson	Points	Accuracy
Champion	1	Manchester City	93	Manchester City	76.59	100%
UCL	2	Liverpool	92	Liverpool	75.46	100%
	3	Chelsea	74	Chelsea	67.47
	4	Tottenham	71	Tottenham	62.50
UEL	5	Arsenal	69	Arsenal	57.06	50%
UEL	6	Manchester United	58	West Ham	55.46	50%
middle table	7	West Ham	56	Leicester City	53.87	81.81%
	8	Leicester City	52	Crystal Palace	53.85
	9	Brighton	51	Manchester United	52.58
	10	Wolverhampton	51	Aston Villa	51.64
	11	Newcastle	49	Brighton	51.49
	12	Crystal Palace	48	Wolverhampton	51.20
	13	Brentford	46	Brentford	49.91
	14	Aston Villa	45	Newcastle	46.28
	15	Southampton	40	Burnley	45.28
	16	Everton	39	Everton	44.96
	17	Leeds	38	Southampton	44.80
Relegation	18	Burnley	35	Leeds	40.31	66.67%
	19	Watford	23	Watford	38.01
	20	Norwich City	22	Norwich City	32.11

Table 6. Group A in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Saudi Arabia	1.88	2.10	Poi	0.85	0.98	Poi	1.70	4.81	Poi	1.70	1.91	Poi
Egypt	1.30	0.99	Poi	0.43	0.33	Poi	0.76	0.53	Poi	1.24	0.77	Poi
Russia	1.52	2.01	Poi	1.32	1.34	Poi	2.00	3.79	Poi	0.80	0.38	Poi
Uruguay	2.33	1.89	Poi	0.78	1.28	Poi	1.09	0.71	Poi	1.00	0.96	Poi

Table 7. Group B in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Spain	3.00	3.47	Poi	0.27	0.20	Poi	1.67	3.82	Poi	0.87	0.78	Poi
Portugal	1.91	2.81	Poi	0.68	0.70	Poi	1.79	2.06	Poi	0.63	0.86	Poi
Iran	2.12	1.75	Poi	0.53	0.72	Poi	1.41	2.01	Poi	0.41	0.60	NB
Morocco	2.19	2.44	Poi	0.48	0.25	Poi	0.62	0.85	Poi	0.46	0.25	Poi

Table 8. Group C in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
France	2.26	1.74	Poi	0.97	1.13	Poi	1.44	1.36	Poi	0.83	0.58	Poi
Denmark	1.70	2.04	Poi	0.74	0.54	Poi	1.25	2.11	Poi	1.20	0.75	Poi
Australia	2.65	3.03	Poi	0.90	1.19	Poi	1.23	1.18	Poi	1.15	0.98	Poi
Peru	1.74	1.06	Poi	0.78	0.95	Poi	1.21	1.38	Poi	1.18	1.08	Poi

Table 9. Group D in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Croatia	2.72	5.31	Poi	0.39	0.24	Poi	1.17	1.81	Poi	1.17	1.72	Poi
Argentina	2.07	3.44	Poi	0.66	0.85	Poi	1.89	4.09	Poi	1.11	2.30	NB
Nigeria	1.58	1.66	Poi	0.83	0.72	Poi	1.11	1.36	Poi	0.84	0.98	Poi
Iceland	1.78	0.95	Poi	1.00	1.11	Poi	1.61	1.56	Poi	1.39	1.62	Poi

Table 10. Group E in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Brazil	1.77	1.54	Poi	0.45	0.34	Poi	2.36	2.55	Poi	0.48	0.41	Poi
Costa Rica	1.48	1.43	Poi	0.56	0.47	Poi	1.10	1.42	Poi	1.57	1.65	Poi
Serbia	1.83	1.69	Poi	1.22	0.84	Poi	1.22	0.84	Poi	1.11	1.32	Poi
Switzerland	1.15	3.71	Poi	1.05	0.71	Poi	0.95	1.20	Poi	2.21	0.79	Poi

Table 11. Group F in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Germany	2.28	2.51	Poi	0.94	1.06	Poi	2.06	5.27	Poi	0.94	0.72	Poi
South Korea	1.24	1.25	Poi	0.91	0.97	Poi	1.04	1.12	Poi	1.40	2.23	NB
Sweden	1.26	3.19	Poi	0.85	0.79	Poi	1.05	1.27	Poi	1.68	0.77	Poi
Mexico	1.60	1.47	Poi	0.69	1.52	Poi	1.57	1.24	Poi	1.48	1.52	Poi

Table 12. Group G in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Belgium	3.33	5.67	Poi	0.67	0.67	Poi	2.47	2.65	Poi	1.07	1.53	Poi
Panama	1.38	1.01	Poi	0.73	0.50	Poi	1.00	1.63	Poi	1.44	2.10	Poi
Tunisia	1.75	0.75	Poi	1.00	0.70	Poi	1.56	2.36	Poi	0.83	0.58	Poi
England	2.19	1.15	Poi	0.50	0.50	Poi	1.80	2.83	Poi	0.53	0.65	Poi

Table 13. Group H in Russia 2018.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Senegal	1.95	1.68	Poi	0.50	0.52	Poi	1.36	0.69	Bin	0.95	1.23	Poi
Japan	2.33	3.17	Poi	1.06	1.33	Poi	1.60	2.24	Poi	0.67	0.49	Poi
Poland	2.40	3.28	Poi	0.96	0.68	Poi	2.31	4.37	Poi	1.15	1.51	Poi
Colombia	1.35	1.23	Poi	0.85	0.73	Poi	1.42	2.24	Poi	0.85	0.98	Poi

Table 14. Group winners comparison.

	Position	Real	Points	Algorithm 6	Estimated Points	Accuracy
Group A	1	Uruguay	9	Uruguay	6.4842	100%
Group A	2	Russia	6	Russia	5.1548	100%
Group B	1	Spain	5	Spain	5.6867	100%
Group B	2	Portugal	5	Portugal	4.8296	100%
Group C	1	France	5	France	7.3876	100%
Group C	2	Denmark	5	Denmark	4.0612	100%
Group D	1	Croatia	9	Croatia	6.3811	100%
Group D	2	Argentina	4	Argentina	4.6111	100%
Group E	1	Brazil	7	Brazil	7.0674	100%
Group E	2	Switzerland	5	Switzerland	4.1768	100%
Group F	1	Sweden	6	Sweden	4.7170	100%
Group F	2	Mexico	6	Mexico	4.3551	100%
Group G	1	Belgium	9	Belgium	7.3368	100%
Group G	2	England	6	England	6.1921	100%
Group H	1	Colombia	6	Colombia	6.0252	100%
Group H	2	Japan	4	Japan	4.7146	100%

Table 15. Quarterfinalists.

Real	Algorithm 7	Accuracy
Uruguay	Uruguay	75%
France	France
Brazil	Brazil
Belgium	Belgium
Russia	Spain
Croatia	Croatia
Sweden	Sweden
England	Colombia

Table 16. Semifinalists.

Real	Algorithm 7	Accuracy
France	France	50%
Belgium	Brazil
Croatia	Croatia
England	Colombia

Table 17. Finalists.

Real	Algorithm 7	Accuracy
France	France	100%
Croatia	Croatia	100%

Table 18. Champion.

Real	Algorithm 7	Accuracy
France	France	100%

Table 19. Group A in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Ecuador	1.81	1.87	Poi	1.81	1.85	Poi	0.90	1.75	Poi	1.24	1.49	Poi
Senegal	1.68	0.94	Poi	0.48	0.41	Poi	1.38	0.48	Poi	0.75	0.91	Poi
Qatar	1.82	2.35	Poi	1.13	1.29	Poi	1.47	2.52	Poi	1.10	1.89	Poi
Netherlands	2.00	2.46	Poi	1.00	0.77	Poi	1.15	2.75	Poi	2.10	1.00	Poi

Table 20. Group B in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
England	1.65	3.00	Poi	1.41	4.12	Poi	1.02	4.14	NB	0.31	0.46	Poi
Iran	1.73	7.24	NB	1.20	3.96	Poi	0.97	3.83	Poi	0.27	0.34	Poi
USA	1.84	3.29	Poi	0.80	1.65	Poi	0.49	1.49	Poi	0.35	0.43	Poi
Wales	1.22	1.34	Poi	1.02	1.28	Poi	0.57	1.20	Poi	0.52	0.90	Poi

Table 21. Group C in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Poland	1.71	2.14	Poi	1.10	0.86	Poi	1.57	2.33	Poi	1.43	2.07	Poi
Mexico	1.61	1.79	NB	0.71	0.92	Poi	0.92	0.99	Poi	1.35	1.69	Poi
Saudi Arabia	1.61	1.79	Poi	0.71	0.92	Poi	0.92	0.99	Poi	1.35	1.69	Poi
Argentina	1.88	1.92	Poi	0.59	0.74	Poi	1.92	1.99	Poi	0.96	2.04	Poi

Table 22. Group D in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Australia	2.23	3.10	Poi	0.38	0.39	Poi	0.74	2.99	Poi	2.11	0.49	Poi
France	2.50	3.61	Poi	0.82	0.72	Poi	1.48	1.30	Poi	0.81	0.63	Poi
Denmark	2.35	3.92	Poi	0.38	0.39	Poi	2.08	2.16	Poi	1.08	1.49	Poi
Tunisia	1.70	2.14	Poi	0.57	0.45	Poi	1.00	1.52	Poi	0.90	1.23	Poi

Table 23. Group E in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Spain	2.68	4.01	Poi	0.52	0.63	Poi	1.80	1.76	Poi	0.96	0.68	Bin
Costa Rica	1.14	0.90	Poi	0.67	0.32	Bin	0.83	0.67	Poi	1.45	0.94	Poi
Germany	2.90	5.47	Poi	1.14	1.08	Poi	1.84	1.71	Poi	1.26	2.19	Poi
Japan	2.13	4.38	Poi	0.84	1.34	NB	2.40	10.24	Poi	0.33	0.36	Poi

Table 24. Group F in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Morocco	1.83	1.63	Poi	0.51	0.65	Poi	1.84	2.24	Poi	0.68	0.95	NB
Croatia	1.81	0.85	Poi	1.23	1.25	Poi	1.45	2.75	Poi	1.40	2.44	Poi
Belgium	3.13	4.69	Poi	0.88	1.03	Poi	2.04	1.78	Poi	1.00	1.48	Poi
Canada	3.42	2.98	Poi	0.42	0.35	Poi	2.28	7.46	NB	0.92	0.43	Poi

Table 25. Group G in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
Cameroon	1.30	1.69	Poi	0.43	0.33	Poi	0.96	1.12	Poi	1.00	1.48	Poi
Brazil	2.17	2.64	Poi	0.39	0.35	Poi	2.17	2.23	Poi	0.35	0.31	Poi
Switzerland	2.20	3.36	Poi	0.68	0.62	Poi	1.50	2.25	Poi	1.55	1.16	Poi
Serbia	2.43	2.24	Poi	1.19	0.82	Poi	1.28	1.08	Poi	1.00	2.20	Poi

Table 26. Group H in Qatar 2022.

Team	Mean Goals for as Host	Var. Goals for as Host	Law	Mean Goals against as Host	Var. Goals against as Host	Law	Mean Goals for as Guest	Var. Goals for as Guest	Law	Mean Goals against as Guest	Var. Goals against as Guest	Law
South Korea	1.88	2.41	Poi	0.64	1.08	Poi	1.87	2.92	Poi	0.67	1.02	NB
Uruguay	1.68	2.64	Poi	0.53	0.57	Poi	1.41	1.01	Poi	1.19	1.82	Poi
Portugal	2.32	3.56	Poi	0.44	0.32	Poi	2.05	2.13	Poi	1.00	1.09	Poi
Ghana	1.29	1.06	Poi	0.64	0.80	Poi	0.63	0.24	Poi	0.84	1.21	Poi

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
