**1. Introduction**

This paper applies a Machine Learning approach to the problem of producing a single aggregated prediction from a set of individual predictions. Departing from the well-known maximum-entropy inference methodology, we introduce a new factor that captures the distance between the true and the estimated aggregated predictions, which poses a new optimization problem. To tackle the issues raised by this additional factor, one can look at machine learning (ML) algorithms such as ridge regression, the lasso or elastic nets. The main contribution of this paper is thus a novel algorithm that combines classic maximum-entropy inference with machine learning and regularization principles, applying a penalty whenever the aggregated forecast fails to match the forecast target. Via a simulation exercise, we assess the performance of the algorithm and compare it against the naive approach in which aggregated predictions are built as averages of individual predictions. We also apply the algorithm to a dataset of predictions of Spanish gross domestic product to produce optimal weights, which are then used to produce predictions whose predictive ability is also evaluated.

Nowadays, an increasing number of sources and methods produce a wide variety of forecasts for any given economic variable. The traditional methods for combining forecasts are based on the relative past performance of the forecasters to be combined. However, the number of forecasters has grown considerably over recent years, and many of the newer ones have not had enough time to sufficiently demonstrate their predictive ability, an issue of particular relevance in Economics.

The convenience of combining individual results to obtain a single aggregated prediction is not only problematic in Economics. In physics, understood here as statistical mechanics, the seminal works of Jaynes ([1,2]) provide the connection with Information Theory, suggesting a constructive method for setting up probability distributions from partial knowledge. A further reason why an Information Theory approach may be an appropriate way of tackling the prediction-aggregation problem is informational. Rational expectations theory says that experts should eventually converge to the true prediction: after a long but successful learning process, experts should make similar predictions. A uniform distribution over the set of predictions should therefore be the ultimate combination of predictions, and such a distribution is precisely the one that maximizes entropy.
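This last claim can be checked numerically. The snippet below is a minimal sketch of our own (the `entropy` helper and the example distributions are illustrative, not from the paper) verifying that the uniform distribution over a set of experts attains the maximum Shannon entropy:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution p (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # by convention, 0 * log 0 = 0
    return -np.sum(p * np.log(p))

# Among all distributions over n experts, the uniform one maximizes entropy.
n = 4
uniform = np.full(n, 1.0 / n)
skewed = np.array([0.7, 0.1, 0.1, 0.1])

print(entropy(uniform))  # log(4) ≈ 1.3863, the maximum for n = 4
print(entropy(skewed))   # strictly smaller
```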

The machine learning literature on combining forecasts is vast and includes, among others, the approaches of bagging [3], boosting [4,5] and neural network blending [6]. In the field of economics, combining forecasts has a long tradition and is still an active area (see, e.g., Refs. [7–10]). Prediction combination in order to forecast gross production also represents an active subfield of research (e.g., Refs. [11–16]). The ASA/NBER business outlook surveys started producing composite economic forecasts in 1968, shortly after Ref. [17] commented on the advantages of averaging several forecasts of gross production (as pointed out in Ref. [18]).

In the classic theory, the combination of individual results into a single aggregated prediction consists of a vector of weights that calibrates the different degrees of expert ability. Several alternatives of varying sophistication can be considered for the combination of forecasts. For instance, Ref. [19] considers a minimization of the variance-covariance; Ref. [20] offers a method to compute the weights that minimize the error variance of the combination. Another approach, called the regression method by Ref. [21], interprets the coefficient vector of a linear projection as the corresponding weights. This line of research considers the same optimization problem under different restriction conditions. We present the benchmark model for the optimization problem of prediction aggregation from the perspective of Information Theory. This model employs the Kullback–Leibler distance criterion to determine the weights of the aggregation. The objective of the resulting problem is to combine the predictions while keeping the knowledge provided by each of them constant (uniform) and matching the true prediction.
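As an illustration of the minimum-error-variance weights mentioned above, the following sketch (our own, with simulated forecast errors; it assumes the forecast-error covariance matrix is invertible and uses the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1)) computes combination weights from a hypothetical error history:

```python
import numpy as np

def min_variance_weights(errors):
    """Weights minimizing the error variance of the combined forecast:
    with Sigma the forecast-error covariance matrix,
    w = Sigma^{-1} 1 / (1' Sigma^{-1} 1)."""
    sigma = np.cov(errors, rowvar=False)  # errors: T x K (periods x forecasters)
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)
    return w / w.sum()                    # weights sum to one

rng = np.random.default_rng(0)
# Hypothetical forecast errors of three institutions over 50 periods;
# the first institution is noisier and should receive less weight.
errors = rng.normal(0, [2.0, 1.0, 1.0], size=(50, 3))
w = min_variance_weights(errors)
print(w)
```

Note that the weights are inversely related to each institution's error variance, so the noisier first institution is down-weighted.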

Under this perspective, a second approach, based on Machine Learning techniques, presents a second optimization problem. We draw inspiration from some machine learning algorithms to suggest a specification that combines both objectives: the relative-distance expression and the constraints related to the true prediction. We propose a new specification that also introduces temporal parameters tied to an arbitrary temporal structure; these parameters weight each of the divergences between the aggregated prediction and the true prediction. The resulting optimization problem resembles that of regression with regularization [22], and we propose solving it using nested cross-validation [23].
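A stylized sketch of this kind of optimization problem (our own illustration under simplifying assumptions, not the paper's exact specification): simplex weights are pulled toward the uniform distribution by a Kullback–Leibler term, while hypothetical temporal parameters `lam` penalize each period's squared gap between the aggregated and the true prediction. We solve it here with exponentiated gradient descent, which keeps the weights on the probability simplex:

```python
import numpy as np

def entropy_penalized_weights(X, y, lam, eta=0.005, iters=5000):
    """Minimize KL(w || uniform) + sum_t lam_t * (x_t . w - y_t)^2
    over the probability simplex via exponentiated gradient descent."""
    T, K = X.shape
    uniform = np.full(K, 1.0 / K)
    w = uniform.copy()
    for _ in range(iters):
        grad = (np.log(w / uniform) + 1.0            # d/dw of KL(w || uniform)
                + 2.0 * X.T @ (lam * (X @ w - y)))   # d/dw of weighted sq. gaps
        w = w * np.exp(-eta * grad)                  # multiplicative update
        w /= w.sum()                                 # renormalize onto simplex
    return w

rng = np.random.default_rng(1)
X = rng.normal(2.0, 0.5, size=(20, 5))   # 20 periods, 5 forecasters
y = X[:, 0] + rng.normal(0, 0.05, 20)    # target tracks forecaster 0
w = entropy_penalized_weights(X, y, lam=np.full(20, 1.0))
print(w)
```

With a constant penalty, the weight mass shifts toward the forecaster that tracks the target while the KL term keeps the remaining weights away from zero; in the paper's setting, the temporal parameters would instead be tuned, e.g., by nested cross-validation.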

Empirical features of the proposed algorithm are illustrated using a dataset of predictions of Spanish gross domestic product (GDP). The dataset comes from Fundación de las Cajas de Ahorro (FUNCAS) and is rich enough, with a sufficient number of institutions making predictions, to allow the use of the proposed algorithm. Using this dataset, the proposed algorithm produces optimal weights, which are then used to produce predictions whose predictive ability is evaluated. Although the dataset does not allow us to disentangle clear differences between the proposed algorithm and a naive forecast, the algorithm is robust in the sense that selecting predictions made in either July or December leads to similar results and interpretations.

The differences between the proposed algorithm and the naive forecast are further explored in a simulation study. The study reveals that the proposed algorithm becomes more suitable than the simpler, naive overall average as the length of the target time series increases, as the number of forecasting institutions decreases, and as the institutions whose predictions are sharper than the rest become fewer in number and depart further from the rest.

The paper is organized as follows. In Section 2 we present the model. In Section 3 we introduce the Machine Learning algorithm applied to the maximum-entropy inference problem. In Section 4 the algorithm is applied to a dataset of predictions of Spanish gross domestic product, and in Section 5 it is assessed via a simulation exercise. Section 6 presents the concluding remarks.
