**3. Algorithm**

This section proposes an algorithm to deal with problem (3). The proposed algorithm finds a solution to problem (3) in the same way that well-understood regularized machine-learning algorithms do. The main steps are to split the data into training, validation and test sets and to choose the penalty coefficient, *λ*, via cross-validation on the validation sets; the parameters *δt* are exogenous. Following this, the algorithm's prediction error can be computed on the test sets and, once the *λ* and *δ* parameters are fixed, the weights {*ωi*} estimated using the whole data set. The estimated weights are finally used to combine the individual predictions.

For cross-validation, we follow the time-series machine-learning literature and propose the use of rolling-origin evaluation [24], also known as rolling-origin-recalibration evaluation [25]. These are forms of nested cross-validation, which should give an almost unbiased estimate of the error [23]. Once the institutions (forecasters) that can be used to properly define the training, validation and test sets have been selected, we can start to solve the optimization problem. Note that the institutions must be the same in the training, validation and test sets; if this condition is not fulfilled, the problem is not well defined. To address this issue, in our application (see Section 4) the dimensionality of the initial data bank was reduced from 21 to around 10 forecasters satisfying the condition that data exist for all three phases. This gives us three data samples with around 10 institutions for each phase.
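As an illustration, the following minimal Python sketch (the function and variable names are ours, not from [24] or [25]) generates rolling-origin train/validation pairs: the forecast origin rolls forward one year at a time, and each year is validated against a model fitted on all earlier years.

```python
# Minimal sketch of rolling-origin evaluation for yearly data.
# All names here are illustrative assumptions, not from the paper.
def rolling_origin_splits(n_years):
    """Yield (training_years, validation_year) pairs in which the
    forecast origin rolls forward one year at a time."""
    for n in range(2, n_years + 1):
        train = list(range(1, n))  # years 1, ..., n-1
        yield train, n             # year n is held out for validation

for train, valid in rolling_origin_splits(5):
    print(f"train on years {train}, validate on year {valid}")
```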

As a specification, we select one of the possible options: we consider the quadratic norm, as in ridge regression, in the objective function, and add a parameter *λ* and the *δ*'s that characterize the slackness of the process. For simplicity we set *δt* = 1, giving equal importance to all restrictions. Different values of *λ*, taken from a grid, are then tested to find the optimum, i.e. the value that minimizes the divergence while penalizing deviations of the combined prediction from the observed value.
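A hedged sketch of this specification follows (the function and variable names are our own): `forecasts` is a *T* × *I* array of institution forecasts, `realized` is the vector of observed values, and the objective combines the divergence of the weights from the uniform combination with a ridge-style penalty under *δt* = 1.

```python
import numpy as np

def objective(weights, forecasts, realized, lam):
    """Divergence of the weights from the uniform combination, plus a
    quadratic (ridge-style) penalty on the combined forecast errors.
    Assumes positive weights; delta_t = 1 for all restrictions."""
    n_inst = len(weights)
    # Divergence term: (1/|I|) * sum_i log(1 / (w_i * |I|)).
    divergence = -np.mean(np.log(weights * n_inst))
    # Quadratic penalty: squared deviation of the combined forecast.
    combined = forecasts @ weights
    return divergence + lam * np.sum((combined - realized) ** 2)
```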

The steps of the proposed algorithm are described in detail in Algorithm 1. The output of the algorithm is a prediction for period *T* + 1, denoted by *âT+1*. The requirements to apply this algorithm are: (i) the three dataset splits mentioned above (training, validation and test); (ii) a set of discrete values of *λ* between 0 and infinity; (iii) a set of discount values *δ* weighting the restrictions; and (iv) a prediction error function. The algorithm solves the optimization problem on the training subset for each of the different values of *λ* and *δ*. Once the optimization problem is solved, we get a set of prediction errors on the validation set, as many as there are values of *λ*. Cross-validation then selects the *λ* that minimizes this prediction error, which yields the best penalty in terms of prediction error. Once the best *λ* is obtained, we apply the algorithm to the test set and evaluate its performance, obtaining the *ωi* that minimize the objective function and a measure of the prediction error.

**Algorithm 1:** Machine-learning-based entropy

```
input:
  Forecast data made by institution i for year n, {y_{i,n}: i in 1:I, n in 1:(N+1)}, N ≥ 2
  Realized values, a_{1:N}
  Set of penalty coefficients, {λ_j: j in 1:J}
  Set of discount coefficients, {δ_{t,T}: t in 1:(N−1), T in 2:N}
  Forecast error function, f
output:
  Prediction â_{N+1}
pseudocode:
  for n in 2:N
    for j in 1:J
      Solve for weights using the training subset y_1, ..., y_{n−1}:
        ω_{i,n,j} = argmin_{ω_i} Σ_{i∈I} (1/|I|) log(1/(ω_i|I|))
                      + λ_j Σ_{t=1}^{n−1} δ_{t,n} ‖Σ_{i∈I} ω_i y_{i,t} − a_t‖
      Determine the forecast error using the validation set y_n:
        e_{n,j} = f(a_n, Σ_{i∈I} ω_{i,n,j} y_{i,n})
    end for
  end for
  j* = argmin_j (N−1)^{−1} Σ_{t=2}^{N} e_{t,j}
  λ* = λ_{j*}
  Solve for weights using λ* and the full data set:
    ω*_i = argmin_{ω_i} Σ_{i∈I} (1/|I|) log(1/(ω_i|I|))
             + λ* Σ_{t=1}^{N} δ_{t,N} ‖Σ_{i∈I} ω_i y_{i,t} − a_t‖
  â_{N+1} = Σ_{i∈I} ω*_i y_{i,N+1}
```
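A compact Python sketch of Algorithm 1 is given below, reusing the `objective` function sketched after the specification above. It assumes the weights lie on the unit simplex (positivity is required by the log term, and summing to one is a natural normalization for a forecast combination), takes *δt* = 1, and uses squared error as the forecast error function *f*; these choices, like all names below, are illustrative assumptions rather than the paper's exact implementation.

```python
import numpy as np
from scipy.optimize import minimize

def solve_weights(forecasts, realized, lam):
    """Minimize the penalized objective over the unit simplex
    (an assumption: positive weights summing to one)."""
    n_inst = forecasts.shape[1]
    w0 = np.full(n_inst, 1.0 / n_inst)        # start from equal weights
    res = minimize(
        objective, w0, args=(forecasts, realized, lam),
        method="SLSQP",
        bounds=[(1e-9, 1.0)] * n_inst,        # keep log(w) finite
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return res.x

def algorithm_1(forecasts, realized, lambdas):
    """forecasts: (N+1) x I array of institution forecasts;
    realized: length-N vector of observed values.
    Returns the combined prediction for year N + 1."""
    n_years = len(realized)                   # N
    errors = np.zeros((n_years - 1, len(lambdas)))
    for n in range(1, n_years):               # validation years 2, ..., N
        for j, lam in enumerate(lambdas):
            w = solve_weights(forecasts[:n], realized[:n], lam)
            # Squared error as the forecast error function f (assumption).
            errors[n - 1, j] = (forecasts[n] @ w - realized[n]) ** 2
    j_star = int(np.argmin(errors.mean(axis=0)))
    w_star = solve_weights(forecasts[:n_years], realized, lambdas[j_star])
    return forecasts[n_years] @ w_star        # prediction for year N + 1
```

A call such as `algorithm_1(forecasts, realized, lambdas=np.logspace(-3, 3, 25))` then returns the combined forecast for the next year, with the penalty selected on a logarithmic grid as the discussion above suggests.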