**1. Introduction**

Soccer is the most popular sport in Europe [1,2]. The game is played by two teams of 11 players, on a rectangular field with a goal placed at each end. The objective of the game is to score by getting a spherical ball into the opposing goal. Each team includes 10 field players, that can maneuver the ball using any part of the body except hands and arms, and one goalkeeper, who is allowed to touch the ball with the whole body, as long as they stay in their penalty area. Otherwise, the rules of the field players apply. The match has two periods of 45 minutes each. The winning team is the one that scores more goals by the end of the match.

In most European countries, soccer competitions are organized hierarchically in leagues composed by groups of teams. At the end of each season, a promotion and relegation system decides which teams move up and down into the hierarchy. In a given league and season each pair of teams plays to matches, so that the visited and visitor interchange place. All teams start with zero points and, at every round, one {victory, draw, defeat} is worth {3, 1, 0} points. By the end of the last round, the team that accumulated more points is the champion [3].

This paper studies the dynamical performance of soccer teams in a given league. The modeling perspective adopts the concepts of fractional calculus [4,5] and power law (PL) [6]. These tools have been used to model dynamical systems with memory in mechanics [7], electromagnetism [8], biology [9], sports [10], economy [11], and others [12]. The proposed approach embeds implicitly details such as the behavior of players and coaches, strategical and tactical maneuvers during the matches, errors of referees and a multitude of other effects. The scale of observation addresses the teams behavior in the perspective of their classification along the league. Data characterizing the year 2018–2019 and the four leagues, namely the Spanish 'La Liga', English 'Premiership', Italian 'Serie A' and French 'Ligue 1', are processed and discussed. The computational and mathematical modeling leads to the emergence of patterns that are analyzed and interpreted in the light of fractional dynamics.

In spite of its social and economical importance, the topic of soccer dynamics has been the subject of only a restricted number of studies. In fact, we can consider the study of the dynamic effects at different levels, namely about the evolution of each player along his career, of the evolution of the players within a given match, or the progress of the classification of a group of teams along a given league. Couceiro et al. [13] characterized the predictability and stability levels of players during a soccer match by means of fractional calculus and entropy measures. Silva et al. [14] investigated how different entropy measures can be applied to assess the performance variability and to uncover the interactions underlying the players and teams' performance. Machado and Lopes [15] adopted distinct dissimilarity measures and multidimensional scaling to study the behavior of teams competing within soccer national leagues and related the generated loci with the teams' performance over time. Neuman et al. [16] measured the organization associated with the behavior of a soccer team through the Tsallis entropy of ball passes between the players. They found that the teams' positions at the end of one season were correlated with the teams entropy. Moreover, the entropy score could be used for predicting of the teams' final positions. Lopes and Machado [17] studied the dynamics of national soccer leagues using information and fractional calculus tools. In their approach, an entire soccer league season was treated as a complex system with a state observable at the time of rounds, consisting of the goals scored by the teams. The system behavior was visualized in 3-D maps generated by multidimensional scaling and interpreted based on the emerging clusters.

Predicting the outcome of soccer matches has been investigated since at least the 1960s, due to its interests for the general public, clubs, advertising companies, media, professional odds setters, and researchers [18]. Various statistical techniques have been used, including Poisson models [19], Bayesian schemes [20], rating systems [21], and machine learning methods [22,23], such as kernel-based relational learning [24], among others [25,26]. Advanced machine learning techniques [27,28] may be of relevance and represent an alternative to statistical or analytical approaches. However, dynamical phenomena involving complex memory effects need to be analyzed under the light of modeling tools to better understand phenomena embedded in the data series. Therefore, a synergistic strategy encompassing studies of distinct nature seems the best to consider at the moment of writing this paper.

This paper addresses the dynamics exhibited by several leagues having in mind the evolution of the teams along one season. The adoption of heuristic models, fitting real-world data, and the entropy analysis of statistical information, give a new perspective to a complex system that has been scarcely studied in the scientific literature.

Bearing these ideas in mind, this paper is organized as follows. Section 2 models the behavior of the teams in four top European soccer leagues by means of different functions. Section 3 analyzes the leagues in the perspective of the entropy of the spatio-temporal patterns exhibited by distinct alternative models. Section 4 uses the models for predicting future results and assesses their accuracy. Finally, Section 5 outlines the conclusions.

#### **2. Modeling the Teams' Dynamics**

Let us consider *N* teams competing in a league for one season. Therefore, the league has *R* = 2(*N* − 1) rounds, and each team plays *N* − 1 matches at home and *N* − 1 matches away.

Let us denote by *xi*(*k*), *i* = 1, ... , *N*, 0 ≤ *k* ≤ *kr*, the teams' positions from the beginning of the season up to the round *kr* = 3, ... , *R*. The lower limit *kr* = 3 is adopted to yield data-series with a minimum number of points for processing. Therefore, the signals *xi*(*k*) evolve in discrete time and one-dimensional space, and can be seen as the output of a complex system.

*Symmetry* **2020**, *12*, 356

We use the nonlinear least-squares [29,30] to test the behavior of the series *xi*(*k*) for six fitting hypotheses, namely shifted power (SP), quadratic (Qu), Hill (Hi), vapor pressure (VP), power law (PL) and Hoerl (Ho) models, given by:

$$SP: \ \pounds\_i^{SP}(k) = a\_i(k\_r) \cdot |k - b\_i(k\_r)|^{c\_i(k\_r)},\tag{1a}$$

$$\mathcal{Q}u \,:\, \hat{\mathfrak{x}}\_{l}^{\mathcal{Q}u}(k) = a\_{l}(k\_{l}) + b\_{l}(k\_{r}) \cdot k + c\_{l}(k\_{r}) \cdot k^{2},\tag{1b}$$

$$H\mathbf{i}:\ \mathbf{x}\_i^{H\dot{\mathbf{i}}}(k) = \frac{a\_i(k\_r) \cdot k^{b\_i(k\_r)}}{c\_i(k\_r)^{b\_i(k\_r)} + k^{b\_i(k\_r)}},\tag{1c}$$

$$VP: \text{ : } \pounds\_i^{VP}(k) = \mathfrak{e}^{\frac{a\_i(k\_F) + b\_i(k\_F)}{c\_i(k\_F) \cdot \ln(k)}},\tag{1d}$$

$$PL: \ \pounds\_i^{PL}(k) = a\_i(k\_r) \cdot k^{b\_i(k\_r)},\tag{1e}$$

$$H o: \ \pounds\_i^{Ho}(k) = a\_i(k\_r) \cdot b\_i(k\_r)^k \cdot k^{c\_i(k\_r)},\tag{1f}$$

where *x*ˆ*i* denotes the approximated values, *k* represents time and {*ai*(*kr*), *bi*(*kr*), *ci*(*kr*)} ∈ R are the models' parameters. Naturally, for each model, the parameters vary with time, that is, they depend on *kr*.

We can adopt other fitting models, eventually with more parameters, that adjust better to some particular series *xi*(*k*). However, only simple analytical expressions, requiring a limited set of parameters, are considered [31], otherwise the interpretation of the parameters becomes unclear. Moreover, loosely speaking, with exception of Qu, these heuristic models reflect somehow fractional characteristics, embodied in their structures by the non-integer exponents.

For assessing the accuracy of the models (1a)–(1f) we adopt the time varying errors *Ei* and *E*† given by:

$$E\_i(k\_r) = \mathbf{x}\_i(k\_r) - \mathbf{\hat{x}}\_i(k\_r), \ i = 1, \cdots, N, \ k\_r = \mathbf{3}, \cdots, R,\tag{2a}$$

$$E^{\dagger}(k\_r) = \sqrt{\frac{1}{N} \sum\_{i=1}^{N} \left[ E\_i(k\_r) \right]^2}, \ k\_r = 3, \cdots, R,\tag{2b}$$

where *N* = 20 and *R* = 38 correspond to the number of teams and rounds in the 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1'. Therefore, *Ei* gives information for team *i*, while *E*† highlights the fitting error for all teams involved in the league.

Figure 1 illustrates the error *<sup>E</sup>*1(*kr*) for the 2018–2019 champions of 'La Liga' and "Premiership", namely the FC Barcelona and Manchester City, and the models (1a)–(1f). Figure 2 depicts *E*†(*kr*) for 'La Liga' and 'Premiership'. Obviously, this error increases with the number of data points. Moreover, we verify that, with the exception of the Hi (1c), all other models approximate well the data, demonstrating the adequacy of the fitting functions. The other teams and leagues yield charts of the same type and, therefore, are omitted herein. For a more assertive comparison, we calculate the mean and the standard deviation, *μ* = 136 <sup>∑</sup><sup>38</sup>*kr*=<sup>3</sup> *E*†(*kr*) and *σ* = 136 <sup>∑</sup><sup>38</sup>*kr*=<sup>3</sup>[*E*†(*kr*) − *μ*]2, of the *E*†(*kr*) series for the models (1a)–(1f). Table 1 summarizes the corresponding values, highlighting that the Ho (1f) is the best three-parameter model approximation, and that the PL (1e) represents a good alternative, when having in mind the advantage of having just two parameters.

**Figure 1.** The error *<sup>E</sup>*1(*kr*), *kr* = 3, ... , 38, for the models (1a)–(1f), during the season 2018–2019: (**a**) FC Barcelona; (**b**) Manchester City.

**Figure 2.** The error *<sup>E</sup>*†(*kr*), *kr* = 3, ... , 38, for the models (1a)–(1f), during the season 2018–2019: (**a**) 'La liga'; (**b**) 'Premiership'.


**Table 1.** The mean and standard deviation, *μ* and *σ*, of the *E*†(*kr*) series for the four leagues and the models (1a)–(1f), during the season 2018–2019.

Figure 3 depicts the parameters {*<sup>a</sup>*1(*kr*), *b*1(*kr*)} of the PL and {*<sup>a</sup>*1(*kr*), *b*1(*kr*), *<sup>c</sup>*1(*kr*)} for the Ho model for FC Barcelona and Manchester City, respectively. The point labels represent the value of *kr*. We verify that for the FC Barcelona we have two distinct periods in the locus. The first corresponds to

3 ≤ *kr* ≤ 8, where the parameters evolve influenced by a set of consecutive bad results between rounds five and eight. The second period corresponds to 8 ≤ *kr* ≤ 38, where the path changes direction driven by a consistent and positive team behavior towards the final victory at *kr* = 38. For the Manchester City the variation of the parameters is more complex. Initially, we observe a route for the period 3 ≤ *kr* ≤ 7. Then, the locus has a slight change, due to a draw achieved by the team at round 8, but recovers fast its initial trend during for 9 ≤ *kr* ≤ 15. Again the locus changes driven by the set of team negative results in rounds 16–19 and 24. From *kr* = 25 onward, the parameters evolve positively influenced by the consecutive team victories until the end of the season at *kr* = 38. For other teams we can draw similar conclusions, meaning that there exists a clear relationship between the models' parameters and the teams' performance along the season. Moreover, we verify that, in general, abrupt changes in the loci route correspond to inconsistent results at early rounds, that is, small values of *kr*. For larger values of *kr*, eventual inconsistencies on the teams' behavior do not translate in significant modifications of the parameter patterns, since the fitting becomes less sensitive to the number of fitting points.

**Figure 3.** Locus of the the models parameters for the 2018–2019 champions of 'La Liga' and 'Premiership': (**a**) PL model for FC Barcelona; (**b**) Ho model for FC Barcelona; (**c**) PL model for Manchester City; (**d**) Ho model for Manchester City. The point labels denote *kr* = 3, . . . , 38.

#### **3. Entropy of the Spatio-Temporal Patterns of the Models' Parameters**

In this section we characterize the soccer leagues by means of entropy. The entropy is a measure of regularity that has been successfully adopted in the study of complex systems [15,32].

#### *3.1. The Entropy of the PL Model*

By approximating the output signals *xi*(*k*) through PL functions (1e) we are modeling the complex system as a fractional integrator [33,34] of order *bi* ∈ R<sup>+</sup> for a constant, step-like, input signal.

If a team obtains {victory, draw, defeat} in all matches, then *xi*(*k*) is a straight line with *ai* = {3, 1, 0} and *bi* = 1. However, real-world teams have {victories, draws, defeats} and, thus, yield a fractal like response that follows a PL behavior. Therefore, fractional/unit values of *bi* reflect variable/constant time evolution, while values of *ai* close to {3, 1, 0} correspond to {victory, draw, defeat} results [10].

For each league we now compute the PL parameters {*ai*, *bi*} that fit the teams' positions *xi*(*k*), *i* = 1, ... , 20, 0 ≤ *k* ≤ *kr*, from the beginning of the season up to the round *kr* = 3, ... , 38. Therefore, for every *kr* we have an array of 20 × (*kr* − 2) points in a two-dimensional space. We then determine the bi-dimensional histograms by binning the data of each array into *M* × *M* = 100 × 100 bins {*<sup>α</sup>j*, *βk*}, *j*, *k* = 1, . . . , *M*. Finally, we calculate the Shannon entropy [35,36]:

$$S(k\_r) = -\sum\_{j=1}^{M} \sum\_{k=1}^{M} P\left(a\_{j\prime}\beta\_k\right) \log P\left(a\_{j\prime}\beta\_k\right),\tag{3}$$

where the probabilities *P <sup>α</sup>j*, *βk* are approximated by the data relative frequencies.

For example, Figure 4 depicts the histograms of the PL parameters from the beginning, *kr* = 3, up to the end of the 2018–2019 season, *kr* = 38, for 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1'. We verify that the parameters {*ai*, *bi*} exhibit less dispersion for the pair P1 = {'La Liga', 'Ligue 1'} than for the pair P2 = {'Premiership', 'Serie A'}.

**Figure 4.** Histograms of the PL parameters {*ai*, *bi*} from the beginning, *kr* = 3, up to the end of the 2018–2019 season, *kr* = 38, and the leagues: (**a**) 'La Liga'; (**b**) 'Premiership'; (**c**) 'Serie A'; (**d**) 'Ligue 1'.

Figure 5 illustrates the evolution on the entropy, *S*(*kr*) versus *kr* = 3, ... , 38, for the PL parameters and 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1'. Again, we verify that the pairs P1 and P2 reveal similar behavior. For the pair P1, the entropy increases faster with *kr* that for the pair P2. This means that, in the 2018–2019 season, the Spanish and French teams started more irregular than the English and Italian ones. Nevertheless, since for all leagues, *S*(*kr*) converges to a similar settling value, we conclude that by the end of the season the {victory, draw, defeat} global pattern exhibited by teams in different leagues is identical.

**Figure 5.** Evolution of the entropy *S*(*kr*) versus *kr* = 3, ... , 38, for the PL parameters and the 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1', during the season 2018–2019.

The symmetry between the pairs P1 and P2, both in the histograms of Figure 4 and the entropy evolution represented in Figure 5 opens, however, new questions. Is such duality of patterns just a casual result, or does it point towards some underlying effects ruling the dynamics of these complex systems? Additionally, can other patterns occur and what is their meaning? A future study addressing a larger number of leagues seems necessary in order to give a response to these type of questions.

#### *3.2. The Entropy of the Ho Model*

The Ho model (1f) combines both an exponential and a PL function. These functions characterize well integer- and fractional-order systems, respectively [37].

Similarly to the previous subsection, we compute the Ho parameters {*ai*, *bi*, *ci*} that fit the teams' positions *xi*(*k*), *i* = 1, ... , 20, 0 ≤ *k* ≤ *kr*, from the beginning of the season up to the round *kr* = 3, ... , 38. For every *kr* we now obtain an array of 20 × (*kr* − 2) points in a three-dimensional space. Therefore, we determine three-dimensional histograms by binning the data of each array into *M* × *M* × *M* = 100 × 100 × 100 bins {*<sup>α</sup>j*, *β<sup>k</sup>*, *<sup>γ</sup>l*}, *j*, *k*, *l* = 1, . . . , *M*. Finally, we calculate the Shannon entropy [35,36]:

$$S(k\_r) = -\sum\_{j=1}^{M} \sum\_{k=1}^{M} \sum\_{l=1}^{M} P\left(\alpha\_{j\prime}, \beta\_{k\prime}, \gamma\_l\right) \log P\left(\alpha\_{j\prime}, \beta\_{k\prime}, \gamma\_l\right),\tag{4}$$

where the probabilities *P <sup>α</sup>j*, *β<sup>k</sup>*, *<sup>γ</sup>l* are approximated by the data relative frequencies.

Figure 6 depicts the two-dimensional projections of the histogram of the Ho parameters, from the beginning, *kr* = 3, up to the end of the 2018–2019 season, *kr* = 38, for 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1'. Therefore, the charts represent the combination of the pairs of parameters {*ai*, *bi*}, {*ai*, *ci*} and {*bi*, *ci*}. As for the PL model, we verify that, in general, the parameters {*ai*, *bi*, *ci*} exhibit less dispersion for the leagues P1 = {'La Liga', 'Ligue 1'} than for the leagues P2 = {'Premiership', 'Serie A'}.

Figure 7 depicts the entropy, *S*(*kr*) versus *kr* = 3, ... , 38, for the Ho model parameters, and 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1'. We verify that *S*(*kr*) has now an identical behavior for all leagues, but we can still distinguish a slight difference between the pairs P1 and P2.

**Figure 6.** The two-dimensional projections of the histograms of the Ho parameters, from the beginning, *kr* = 3, up to the end of the 2018–2019 season, and the leagues: (**<sup>a</sup>**–**<sup>c</sup>**) 'La Liga'; (**d**–**f**) 'Premiership'; (**g**–**i**) 'Serie A'; (**j**–**l**) 'Ligue 1'. The charts represent the combination of the pairs of parameters {*ai*, *bi*}, {*ai*, *ci*} and {*bi*, *ci*}.

 **Figure 7.** Evolution of the entropy *S*(*kr*) versus *kr* = 3, ... , 38, for the Ho model parameters and the leagues 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1', during the season 2018–2019.

#### **4. Predicting the Teams' Results**

The models (1a)–(1f), introduced in section 2, are now tested in the prediction of the teams' results. In a first phase, we fit the models to the series *xi*(*k*), *i* = 1, ... , *N*, 0 ≤ *k* ≤ *kr*, and we calculate the approximation *<sup>x</sup>*<sup>ˆ</sup>*i*(*k*) for each *kr* = 3, ... , 37. In a second phase, we extrapolate the values of *<sup>x</sup>*<sup>ˆ</sup>*i*(*k* + <sup>1</sup>), that is, the teams positions for every round from four to 38. Finally, in a third phase, we assess the accuracy of the values *<sup>x</sup>*<sup>ˆ</sup>*i*(*k* + 1) by means of the prediction errors *E* and *E*†, where *kr* = 4, . . . , 38.

Figure 8 illustrates the error *<sup>E</sup>*1(*kr*), *kr* = 4, ... , 38, obtained with the models (1a)–(1f) for the champions of the 'La Liga', 'Premiership', 'Serie A' and 'Ligue 1' in season 2018–2019, namely for the FC Barcelona, Manchester City, Juventus and Paris Saint-Germain. Figure 9 depicts *<sup>E</sup>*†(*kr*), *kr* = 4, ... , 38, for 'La liga', 'Premiership', 'Serie A' and 'Ligue 1', obtained with the six models (1a)–(1f). We verify again that, with the exception of the Hi model (1c), all other models predict well the teams' results.

**Figure 8.** The error *<sup>E</sup>*1(*kr*), *kr* = 4, ... , 38, for the models (1a)–(1f), during the season 2018–2019: (**a**) FC Barcelona; (**b**) Manchester City; (**c**) Juventus; (**d**) Paris Saint-Germain.

For a numerical comparison of the models accuracy, we calculate the mean and the standard deviation, *μ* and *σ*, of the *E*†(*kr*) series and list their values in Table 2. We verify again that the Ho (1f) model leads to the best predictions. We can see that the errors are larger for the leagues in the set P1 than for the ones in the set P2, meaning that the prediction is more difficult for the leagues with higher values of entropy.


**Table 2.** The mean and standard deviation, *μ* and *σ*, of the *E*†(*kr*) for the six models (1a)–(1f) and season 2018–2019.

**Figure 9.** The error *<sup>E</sup>*†(*kr*), *kr* = 4, ... , 38, for the models (1a)–(1f), during the season 2018–2019: (**a**) 'La liga'; (**b**) 'Premiership'; (**c**) 'Serie A'; (**d**) 'Ligue 1'.
