## *2.2. Machine Learning Inference (MLI)*

The nature and objectives of problem (1) consist of (i) combining the predictions of a set of institutions, (ii) keeping the knowledge (the information) provided by each of them as constant (uniform) as possible and (iii) satisfying (or approaching as closely as possible) a set of restrictions. From this perspective, we draw inspiration from machine learning algorithms such as ridge, lasso or elastic net, where the goal is to minimize a distance while keeping the number of parameters of the model under control to avoid overfitting, all of which is governed by a parameter that rescales, or determines, the relative importance of each source error function. We propose a specification that combines both objectives: the relative distance expression and the constraint part related to the true predictions. The new specification also introduces parameters *δt* related to an arbitrary temporal structure (for example, each of these parameters may depend on the time distance of the restriction to the forecasted period or on the certainty available about the corresponding constraint value), together with a parameter *λ* that weights the restrictions imposed by the distance between the aggregation of the predictions and the true predictions. Since the problem is set in $\mathbb{R}^n$, a normed vector space, a whole family of norms may be considered.

Putting together both expressions, and being mindful that the weights {*ωi*} are the parameters of the problem, we formalize the minimization problem from a machine learning perspective as:

$$\min_{\omega_i} \sum_{i \in I} \frac{1}{|I|} \log(\omega_i |I|)^{-1} + \lambda \sum_t \delta_t \left\| \sum_{i \in I} \omega_i y_{i,t} - a_t \right\| \tag{2}$$

The connection of the proposed specification to the machine learning literature stems from the form of the objective function (Equation (2)) and its two summands. The first is the Kullback–Leibler divergence $\sum_{i \in I} \frac{1}{|I|} \log(\omega_i |I|)^{-1}$. The second resembles a flexible regularization term, $\lambda \sum_t \delta_t \left\| \sum_{i \in I} \omega_i y_{i,t} - a_t \right\|$.
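As an illustration only, a minimal sketch of the objective in Equation (2) could be coded as follows (Python with NumPy is assumed; the function names and the choice of the absolute value as the norm are ours, not part of the original formulation):

```python
import numpy as np

def kl_to_uniform(w):
    """First summand of Eq. (2): sum_i (1/|I|) * log(w_i * |I|)^(-1),
    the Kullback-Leibler term measuring divergence from the uniform weights."""
    n = len(w)
    return np.sum((1.0 / n) * np.log(1.0 / (w * n)))

def restriction_penalty(w, Y, a, delta):
    """Second summand of Eq. (2): delta_t-weighted norm (here the absolute value)
    of the gap between the aggregated predictions and the known values a_t."""
    gaps = Y.T @ w - a          # Y: (forecasters x periods); a, delta: (periods,)
    return np.sum(delta * np.abs(gaps))

def objective(w, Y, a, delta, lam):
    """Full objective of Eq. (2)."""
    return kl_to_uniform(w) + lam * restriction_penalty(w, Y, a, delta)
```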

Lambda (*λ*, hereafter) is a penalty parameter: the weights are chosen to minimize the Kullback–Leibler divergence to the uniform distribution while penalizing the magnitude of the deviation of the weighted prediction from the observed value. On the one hand, when *λ* equals 0 there is no past-prediction penalty and the result is equivalent to the classic model without temporal restrictions. On the other hand, as *λ* grows, the breach of the temporal restrictions gains weight and eventually dominates Equation (2). In this latter case, the problem may be thought of as a weighted regression problem, but with the coefficients restricted to being positive and adding up to one, and without the drawbacks of traditional procedures when the number of forecasters exceeds the number of temporal restrictions.
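Continuing the sketch above, one hypothetical way to see these two regimes is to minimize Equation (2) over the simplex of weights for several values of *λ*; the solver choice below (SLSQP) is only illustrative, and the absolute-value penalty is non-smooth, so this should be read as a rough numerical check rather than a recommended implementation:

```python
from scipy.optimize import minimize

def solve_weights(Y, a, delta, lam):
    """Minimize Eq. (2) subject to w_i >= 0 and sum_i w_i = 1."""
    n = Y.shape[0]
    w0 = np.full(n, 1.0 / n)                       # start from the uniform weights
    res = minimize(objective, w0, args=(Y, a, delta, lam),
                   bounds=[(1e-9, 1.0)] * n,       # keep weights strictly positive
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
                   method="SLSQP")
    return res.x

# Toy data: 4 forecasters, 2 past periods with known outcomes a_t.
rng = np.random.default_rng(0)
Y = rng.normal(2.0, 0.5, size=(4, 2))
a = np.array([2.1, 1.9])
delta = np.array([0.5, 1.0])

for lam in (0.0, 1.0, 100.0):
    print(lam, np.round(solve_weights(Y, a, delta, lam), 3))
# lam = 0 recovers the uniform weights; large lam prioritizes matching a_t.
```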

The delta parameters (*δ*, hereafter) refine the importance that *λ* assigns to the breach of the restrictions. In other words, *δ* weights the relative importance of the restrictions from one year to another.
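For instance, one hypothetical temporal structure in the spirit of Section 2.2 discounts restrictions that lie farther from the forecasted period; the geometric form below is purely illustrative (and reuses the NumPy import from the sketch above):

```python
def geometric_deltas(distances, rho=0.8):
    """Illustrative choice delta_t = rho ** distance_t: restrictions closer
    to the forecasted period receive a larger weight."""
    return np.array([rho ** d for d in distances])

delta = geometric_deltas(distances=[1, 2, 3])      # e.g. one, two and three years back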

## *2.3. From Maximum-Entropy Inference to Machine Learning Inference*

Problem (1) indeed shares the same essence as the minimization in problem (2): the first is a constrained optimization problem, while the second incorporates the restrictions into the objective function. Problem (1) is solved by the method of Lagrange multipliers. Specifically, the constrained problem is converted into a single function that combines the objective and the constraints, the latter multiplied by parameters that depend on the set of restrictions. The optimum is derived by solving the first-order conditions of the Lagrangian function. The Lagrangian for (1) is written as:

$$\mathcal{L} = \sum_{i \in I} \frac{1}{|I|} \log(\omega_i |I|)^{-1} + \sum_{t} \lambda_t \left(\sum_{i \in I} \omega_i y_{i,t} - a_t\right)$$
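For completeness, and as a sketch of the standard derivation rather than a result stated in the text, the first-order condition with respect to each weight (keeping only the temporal restrictions that appear in the Lagrangian above) yields a closed form that links each weight to the multipliers:

$$\frac{\partial \mathcal{L}}{\partial \omega_i} = -\frac{1}{|I|\,\omega_i} + \sum_t \lambda_t\, y_{i,t} = 0 \quad \Longrightarrow \quad \omega_i^* = \frac{1}{|I| \sum_t \lambda_t\, y_{i,t}}$$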

It should be noted that the solution $\{(\omega_i^*, \lambda_t^*)\}_{i \in I,\, t}$, if it exists, drives the second part of the Lagrangian to zero, since the restrictions must hold, and moreover minimizes the relative distance.

Let us now consider a family of problems denoted by P(*λ*). Fixing *λ*, we have the following minimization problem:

$$\min_{\omega_i} \sum_{i \in I} \frac{1}{|I|} \log(\omega_i |I|)^{-1} + \lambda \sum_t \delta_t \left\| \sum_{i \in I} \omega_i y_{i,t} - a_t \right\| \tag{3}$$

When the norm is the absolute distance and $\lambda_t = \lambda \delta_t$, problems (3) and (1) coincide. If a solution of the former problem (3) exists, then such a solution is a candidate for the latter problem (1) for the specific *λt*. In problem (3), the restrictions may fail to hold only if this distortion allows a reduction (when possible) of the relative entropy with respect to the uniform distribution. Therefore, under the assumption that a solution exists, both problems offer the same class of solutions.
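As a sketch of this identification, the penalty term of (3), written with the absolute value as the norm, can be reparametrized restriction by restriction so that it carries the same multiplier structure as the Lagrangian of (1) (up to the sign of each deviation, which the absolute value removes):

$$\lambda \sum_t \delta_t \left| \sum_{i \in I} \omega_i y_{i,t} - a_t \right| = \sum_t \lambda_t \left| \sum_{i \in I} \omega_i y_{i,t} - a_t \right|, \qquad \lambda_t = \lambda \delta_t.$$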

This consideration in the optimum allows us to address problem (3) from another perspective when problem (1) has no solution or its solution is too complex to find. The algorithms and structural forms borrowed from machine learning thus offer a way to approach the solution from a machine learning framework.
