**4. Data Analysis**

A dataset of predictions of Spanish gross domestic product is used to illustrate empirical features of the proposed algorithm. The algorithm produces optimal weights *ω*<sup>∗</sup><sub>*i*</sub> (Table 1), which are used to produce predictions *â*<sub>*T*+1</sub> (Table 2), whose predictive ability can then be assessed. For this dataset, the predictive ability of the proposed algorithm is similar to that of alternative naive forecast algorithms, in agreement with the simulation exercise of Table 3.

The dataset used in this application comes from the Fundación de las Cajas de Ahorro (FUNCAS). The sample covers the economic predictions of different institutions from 2000 to 2018. The selected sample contains a total of 21 institutions: Analistas financieros, Asesor, Bankia, BBVA, Caixabank, Cámara de Comercio de España, CatalunyaCaixa, CEEM-URJC, Cemex, CEOE, CEPREDE-UAM, ESADE, Funcas, ICAE-UCM, IEE, Instituto de Macroeconomía y Finanzas (Universidad CJC), Instituto Flores de Lemus, Intermoney, Repsol, Santander, and Solchaga Recio & asociados. Each institution makes two predictions a year, in July and December, for both the current and the following year; therefore, each year is predicted by each institution up to four times. The FUNCAS prediction panels are very well known, with a prominent record in economic research and a thorough approach to collecting forecasts at the regional and national levels. In addition, FUNCAS provides this information for free (see www.funcas.es).

For this data analysis, a quadratic forecast error function *f*(*x*, *y*) = (*x* − *y*)<sup>2</sup> and the following algorithmic parameter values have been used: *λ* ∈ {1 × 10<sup>−4</sup>, 2 × 10<sup>−4</sup>, ... , 8 × 10<sup>15</sup>, 9 × 10<sup>15</sup>} and *δ*<sub>*t*,*T*</sub> = 1 for all *t*, *T*. The optimization problems have been solved using the free software R, version 3.6.1 [26], and the optimization algorithms available in the nloptr library, which serves as an interface to the NLOPT library [27]. NLOPT algorithms can be global or local and gradient-based or derivative-free, and include, for example, the augmented Lagrangian algorithm, which relies on subsidiary local optimization algorithms. All optimizations have been initialized at a uniform starting point.
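As a rough illustration of this kind of constrained forecast-combination problem, the sketch below minimizes a ridge-penalized quadratic forecast error of a weighted combination over the simplex of weights, starting from the uniform point. It uses Python with SciPy's SLSQP in place of R's nloptr, and the objective, the penalty, and all data are illustrative assumptions rather than the paper's exact criterion.

```python
import numpy as np
from scipy.optimize import minimize

def optimal_weights(F, y, lam=1e-4):
    """Combine agent forecasts F (T x n) of targets y (T,) by
    minimizing a ridge-penalized sum of quadratic forecast errors
    of the weighted combination, subject to weights that are
    nonnegative and sum to one. This objective is an illustrative
    assumption, not the paper's exact criterion."""
    T, n = F.shape

    def objective(w):
        err = F @ w - y                      # combined forecast errors
        return np.sum(err ** 2) + lam * np.sum(w ** 2)

    cons = ({'type': 'eq', 'fun': lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n
    w0 = np.full(n, 1.0 / n)                 # uniform starting point
    res = minimize(objective, w0, method='SLSQP',
                   bounds=bounds, constraints=cons)
    return res.x

# Hypothetical toy data: 3 agents, agent 0 unbiased, the others biased.
rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=30)
F = np.column_stack([y + rng.normal(0, 0.1, 30),
                     y + 0.5 + rng.normal(0, 0.1, 30),
                     y - 0.8 + rng.normal(0, 0.1, 30)])
w = optimal_weights(F, y)
```

The simplex constraint is what makes some weights negligible while a few institutions receive large weights, as in Table 1; the uniform starting point mirrors the initialization described above.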

To help illustrate the application of the algorithm, Tables 1 and 2 focus on the subset of the full dataset that only includes forecasts for each given year made in July of that same year. Alternative restrictions of the full dataset are possible: for example, forecasts for each year made in December of that same year, or forecasts for each year made in July of the previous year. Such alternative restrictions lead to key features of predictive ability and optimal weights similar to those described below.

Key features of the optimal weights *ω*<sup>∗</sup><sub>*i*</sub> output by the proposed algorithm and reported in Table 1 are the variation of the weights across years and across institutions; this variation can be substantial but also reveals some consistencies. The years for which Table 1 reports optimal weights are 2002 through 2018. For the first two prediction years, all weights are negligible except for one, with that single key institution representing about 10% of the number of institutions. For the remaining fifteen years, the weights spread out, with key institutions representing 20% to 60% of the total. Institutions range from those receiving large optimal weights (e.g., CatalunyaCaixa with 100% in 2002–2003, or IEE and ICO with about 75% in 2004 and 2006, respectively) to those receiving negligible weights. Some institutions are not considered in some years: of the initial 21 institutions in the full dataset, only 13 produced forecasts from 2000, of which only 9 were still producing forecasts by the end of the sample. Considering years and institutions jointly gives two institution groups: institutions with streaks of substantial weights (e.g., using 25% as a threshold: CatalunyaCaixa, IEE, and ICO) and the rest of the institutions.

Some key factors for assessing the predictive ability of the predictions *â*<sub>*T*+1</sub> made using the proposed algorithm and reported in Table 2 have been varied as parameters in the simulation study, which considers multiple combinations of different parameters (Table 3). A combination of parameters that resembles the features in the data could be: (i) 40% of key agents, given that about half of the estimated optimal weights in Table 1 are non-negligible, i.e., ≥ 4% (note, however, that the fraction of non-negligible weights grows substantially over time in the data, while it remains constant in the simulation study); (ii) 10 forecasting agents, given that the number of agents decreases from 13 in 2000 to 9 in 2018; and (iii) a sample size of *T* = 20 years, with the data covering nineteen years (2000–2018). According to the simulation study, such a combination of parameters has the potential to favor either the naive or the proposed algorithm, depending on the degree of variability between predictions. A variability of *SD* = 0.2 might be reasonable for the data, since predictions for a year are made in July of that same year. This amount of variability produced an average increase of 3.21% in the root mean square prediction error relative to the naive algorithm in the simulation study, which is consistent with the differences reported in Table 2 for the data.
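The RMSE comparisons underlying these tables reduce to a simple computation over the yearly forecast errors of each method. A minimal Python sketch follows, with hypothetical GDP growth figures rather than the paper's data:

```python
import numpy as np

def rmse(pred, actual):
    """Sample forecast root mean square error."""
    pred, actual = np.asarray(pred), np.asarray(actual)
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Hypothetical GDP growth outcomes and two competing forecast tracks
# (not the FUNCAS data): the average of all agents (naive) and a
# weighted combination (machine).
actual  = np.array([3.2, 1.4, -3.8, 0.2, 1.5])
naive   = np.array([3.0, 1.8, -2.9, 0.5, 1.3])
machine = np.array([3.1, 1.6, -3.2, 0.4, 1.4])

# Percentage change in RMSE of the weighted combination relative
# to the naive average (positive means the combination is worse).
increase = 100 * (rmse(machine, actual) / rmse(naive, actual) - 1)
```

A positive `increase`, such as the 3.21% average found in the simulation study, means the combination's RMSE exceeds the naive benchmark's for those inputs.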


**Table 1.** Optimal weights *ω*<sup>∗</sup><sub>*i*</sub> output by the proposed algorithm.


**Table 2.** Gross Domestic Product (GDP), forecasts *â*<sub>*T*+1</sub>, and corresponding sample forecast root mean square errors (RMSE) for the time period 2000–2018 using different methods: the arithmetic average of predictions made by all institutions (naive); the proposed algorithm (machine); and the arithmetic average of the subset of predictions used to make the predictions with the proposed algorithm (naive2).
