Article

An Inconvenient Truth about Forecast Combinations

by Pablo Pincheira-Brown 1,*, Andrea Bentancor 2 and Nicolás Hardy 3
1
School of Business, Universidad Adolfo Ibáñez, Diagonal Las Torres 2640, Peñalolén, Santiago 7940000, Chile
2
Facultad de Economía y Negocios, Universidad de Talca, Talca 8460000, Chile
3
Facultad de Administración y Economía, Universidad Diego Portales, Huechuraba 8170641, Chile
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(18), 3806; https://doi.org/10.3390/math11183806
Submission received: 14 June 2023 / Revised: 13 August 2023 / Accepted: 28 August 2023 / Published: 5 September 2023
(This article belongs to the Special Issue Time Series Forecasting for Economic and Financial Phenomena)

Abstract: It is well-known that the weighted averages of two competing forecasts may reduce mean squared prediction errors (MSPE) and may also introduce certain inefficiencies. In this paper, we take an in-depth view of one particular type of inefficiency stemming from simple combination schemes: Mincer and Zarnowitz inefficiency, or auto-inefficiency for short. Under mild assumptions, we show that linear convex forecast combinations are almost always auto-inefficient, and, therefore, greater reductions in MSPE are almost always possible. In particular, we show that the process of taking averages of forecasts may induce inefficiencies in the combination, even when individual forecasts are efficient. Furthermore, we show that the so-called “optimal weighted average” traditionally presented in the literature may indeed be inefficient as well. Finally, we illustrate our findings with simulations and an empirical application in the context of the combination of headline inflation forecasts for eight European economies. Overall, our results indicate that in situations in which a number of different forecasts are available, the combination of all of them should not be the last step taken in the search for forecast accuracy. Attempts to take advantage of potential inefficiencies stemming from the combination process should also be considered.

1. Introduction

Decision makers are quite often confronted with two or more forecasts for the same target variable. In this scenario, [1] identifies two different strategies: the search for the best possible single forecasting method and the search for the best possible combination of the available forecasts. The latter strategy has received a lot of attention in the literature since the seminal work of [2]. In their article, the authors combine two sets of forecasts coming from airline passenger data. They conclude that a composite forecast may display a lower mean squared prediction error (MSPE) than either of the original individual projections.
Even when the superior predictive ability of one forecast over another is suspected, one could test for forecast encompassing. Refs. [3,4] claim that the superior accuracy of one forecast over another does not necessarily mean that the inferior forecast is useless. It could be the case that the superior forecast could benefit from using some of the information contained in the outperformed forecast. Granger and Newbold consider the possibility that an average of the superior and inferior forecasts may yield a new and more accurate forecast. When this is not possible, it is said that the superior forecast is not only more accurate than the outperformed forecast but also encompasses it (see [5,6]).
Since 1969, a number of different papers have been written on topics directly or indirectly related to the combination of forecasts. A sample of the influential work from the next two decades includes [7,8,9,10]. More recent papers are also published on the topic including, for instance, [11,12,13,14,15,16,17,18,19,20,21,22].
Despite the huge variety of combination methods available in the literature, two particular families of combination strategies have attracted special attention. Using the terms in [10], these two families are known as the variance–covariance method of [2] and the regression method, introduced by [8]. Broadly speaking, the first approach generates the combined forecast as a weighted average of the pool of single individual forecasts. Notice that this weighted average does not need to be convex. The latter approach is one in which the combining weights are obtained as the coefficient estimates of a regression between the target variable and the set of available individual forecasts. Figure 1 depicts the general situation that faces a decision maker confronted with two or more forecasts for the same target variable.
In general terms, the combination of forecasts is reported as a successful strategy to improve forecast accuracy. Ref. [1] shows an interesting table in which the simple average of several methods outperforms each of the individual forecasts available for US inflation. This is just one example of a pattern that the literature has repeatedly documented. More empirical examples of the good behavior of combination schemes are found, for instance, in [7,17,23,24,25], just to name a few.
Different theoretical approaches aim at explaining the success of combination strategies. Given a set of forecasts and a loss function, the optimal combination could be found as the solution of an optimization problem looking for weights to minimize the expected loss. In many applications, such an optimization problem is well-defined and leads to non-trivial optimal weights, ensuring reductions in the loss function or, in other words, ensuring combination gains. Ref. [26] provides an interesting summary of the different environments in which combination gains are possible.
Despite these theoretical efforts, some questions are still unresolved. For instance, part of the relevant literature investigates what is known as the “Combination Puzzle”, which refers to the fact that simple combination schemes, like an arithmetic average, frequently outperform more complex and theoretically optimal combination rules. See, for instance, [27], for a discussion about this puzzle. More generally, as suggested by [15], the best way to build combination weights is still a matter of open debate.
While it is clear that combination gains may exist in a number of applications, part of the literature analyzes the efficiency of some combination strategies. For instance, ref. [10] indicates that the regression approach is a combination scheme that leaves room for improvement due to the autocorrelation in the residuals that is inherent to this combination method. Refs. [10,26] also mention that the Bates and Granger approach is potentially inefficient due to the introduction of the constraint of the coefficients summing to unity. The extent to which these inefficiencies are indeed relevant requires a case-by-case analysis, yet it is our reading of the literature that the potential inefficiencies of forecast combinations are mostly overshadowed by the enthusiastic literature that mainly emphasizes the direct benefits of forecast combinations in boosting forecast accuracy.
The main objective of this paper is to show that linear convex forecast combinations are almost always auto-inefficient, and, therefore, greater reductions in MSPE are almost always possible. By auto-inefficiency, we refer to the notion of efficiency analyzed by [28]. Aside from showing that auto-inefficiency almost always emerges in forecast combinations, we prove two additional results: First, we show that the process of taking averages of forecasts may induce inefficiencies in the combination, even when the individual forecasts are efficient. Second, we show that the so-called “optimal weighted average”, which is traditionally presented in the literature, may indeed be inefficient as well.
Currently, the state of the art in forecast combinations is to explore new weighting schemes to combine forecasts, and no attempt to take advantage of potential inefficiencies stemming from the combination process is carried out. Our contribution to the existing literature is to show that auto-inefficiencies are commonplace in forecast combinations, which opens an interesting avenue for future research: to find the best way to remove these inefficiencies to further improve forecast accuracy.
We focus on linear convex combinations because they are used by practitioners in many empirical applications. In particular, Consensus Economics reports individual forecasts and their simple averages. Furthermore, many simple linear convex combinations are considered to be very accurate, which is consistent with the aforementioned combination puzzle. In addition, these combination strategies allow for an interpretation of the combination as a consensus forecast. Finally, many of these linear convex combinations do not require previous knowledge of the target variable to construct the combined forecast, which is a clear advantage of simple methods compared to the [8] approach.
The rest of the paper is organized as follows. In Section 2, we set the econometric environment. Section 3 contains the main theoretical results and simulated examples. Section 4 illustrates our findings with an empirical application. Section 5 concludes and presents possible extensions for further research. Appendix A provides some additional theoretical and empirical results.

2. Econometric Environment

Let us consider $Y_t$ to be a stationary and ergodic time series process. We assume that at time $t$ we want to forecast the random variable $Y_{t+h}$, which is equivalent to saying that we look for an $h$-step-ahead forecast of our target variable. We use the following notation: $Y_t(h)$ represents a generic forecast for the target variable $Y_{t+h}$, built with the information available at time $t$. We assume that we have two forecasts available for the same target variable $Y_{t+h}$. We denote these forecasts as $Y_{1,t}(h)$ and $Y_{2,t}(h)$.
We further assume that the vector process
$$\begin{pmatrix} Y_{1,t}(h) \\ Y_{2,t}(h) \\ Y_{t+h} \end{pmatrix}_{3\times 1}$$
is weakly stationary and ergodic and has a positive definite variance–covariance matrix V .

2.1. The Combined Forecast

Consider the following combination of forecasts
$$Y_t^c(h) = \lambda Y_{1,t}(h) + (1-\lambda)Y_{2,t}(h) = \lambda\big[Y_{1,t}(h) - Y_{2,t}(h)\big] + Y_{2,t}(h), \qquad \lambda \in [0,1]$$
where $Y_t^c(h)$ denotes the combined forecast. The corresponding forecast errors are
$$u_t^c(h) = \lambda u_{1,t}(h) + (1-\lambda)u_{2,t}(h) = \lambda\big[u_{1,t}(h) - u_{2,t}(h)\big] + u_{2,t}(h)$$
where $u_{1,t}(h)$ and $u_{2,t}(h)$ represent the errors associated with forecasts $Y_{1,t}(h)$ and $Y_{2,t}(h)$, respectively:
$$u_{1,t}(h) = Y_{t+h} - Y_{1,t}(h), \qquad u_{2,t}(h) = Y_{t+h} - Y_{2,t}(h)$$
We assume, without loss of generality, that the mean squared prediction errors (MSPE) of forecast 2 are at least as good as those of forecast 1, which is to say that
$$MSPE_2(h) \equiv E\big[u_{2,t}^2(h)\big] \le MSPE_1(h) \equiv E\big[u_{1,t}^2(h)\big]$$
When the combined forecast displays a lower MSPE than both $Y_{1,t}(h)$ and $Y_{2,t}(h)$, we follow [26] and say that combination gains (CG) do exist.
Proposition 1 shows the conditions for this to happen.

2.2. Combination Gains

Proposition 1.
If
$$E\big\{\big[u_{1,t}(h) - u_{2,t}(h)\big]\,u_{2,t}(h)\big\} < 0$$
then combination gains are possible for λ ∈ (0, 1).
Proof. 
Notice that
$$E\big[u_t^c(h)^2\big] = E\Big\{\big[\lambda\big(u_{1,t}(h)-u_{2,t}(h)\big) + u_{2,t}(h)\big]^2\Big\} = \lambda^2 E\big[\big(u_{1,t}(h)-u_{2,t}(h)\big)^2\big] + E\big[u_{2,t}(h)^2\big] + 2\lambda E\big\{\big[u_{1,t}(h)-u_{2,t}(h)\big]\,u_{2,t}(h)\big\}$$
Therefore,
$$E\big[u_t^c(h)^2\big] - E\big[u_{2,t}(h)^2\big] = \lambda^2 E\big[\big(u_{1,t}(h)-u_{2,t}(h)\big)^2\big] + 2\lambda E\big\{\big[u_{1,t}(h)-u_{2,t}(h)\big]\,u_{2,t}(h)\big\} \qquad (6)$$
As shown later, a direct implication of a positive definite variance–covariance matrix V is that
$$E\big[\big(u_{1,t}(h)-u_{2,t}(h)\big)^2\big] = E\big[\big(Y_{1,t}(h)-Y_{2,t}(h)\big)^2\big] > 0$$
With this, we can see that (6) is a strictly convex quadratic form with, at most, two different real roots. One of them is zero, which rules out the possibility of two complex roots. The other real root is
$$\lambda_{2u} = -\,\frac{2\,E\big\{\big[u_{1,t}(h)-u_{2,t}(h)\big]\,u_{2,t}(h)\big\}}{E\big[\big(u_{1,t}(h)-u_{2,t}(h)\big)^2\big]} > 0$$
Univariate convex quadratic forms with a zero root must fall into one of the three cases depicted in Scheme 1. Given that $\lambda_{2u} > 0$, we are in a situation like that depicted with the blue line. Therefore, combination gains are achieved in the interval
$$(0, \lambda_{2u})$$
which, in particular, ensures combination gains in, at least, a subset of the open set $(0, 1)$. □
Well acquainted with this analysis, [12] proposes testing the following null hypothesis of no encompassing
$$H_0:\; E\big\{\big[u_{1,t}(h)-u_{2,t}(h)\big]\,u_{2,t}(h)\big\} = 0$$
Under this null hypothesis, a combination of forecasts is not improving the accuracy because
$$E\big[u_t^c(h)^2\big] \ge E\big[u_{2,t}(h)^2\big] \qquad \forall\,\lambda \in \mathbb{R}$$
and no combination gains are possible.
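To make this condition operational, a simple check (our own illustrative sketch, not a procedure taken from the paper) is to test whether the sample mean of $d_t = [u_{1,t}(h) - u_{2,t}(h)]\,u_{2,t}(h)$ is negative, using a HAC (Newey–West) standard error; all names below are placeholders.

```python
import numpy as np

def hac_variance(d, lags):
    """Newey-West (Bartlett kernel) long-run variance of the series d."""
    d = d - d.mean()
    T = len(d)
    v = np.dot(d, d) / T
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)              # Bartlett weight
        v += 2.0 * w * np.dot(d[k:], d[:-k]) / T
    return v

def encompassing_tstat(u1, u2, lags=4):
    """t-statistic for H0: E[(u1 - u2) * u2] = 0 against the one-sided
    alternative E[(u1 - u2) * u2] < 0 (i.e., combination gains are possible)."""
    d = (u1 - u2) * u2
    se = np.sqrt(hac_variance(d, lags) / len(d))
    return d.mean() / se

# toy usage with simulated forecast errors (illustrative only)
rng = np.random.default_rng(0)
u2 = rng.normal(size=500)                # errors of the more accurate forecast
u1 = 0.5 * u2 + rng.normal(size=500)     # errors of the less accurate forecast
print(encompassing_tstat(u1, u2))        # clearly negative => gains are possible
```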
We are also interested in analyzing a particular property of forecasts that we call auto-efficiency.

2.3. Auto-Efficiency

Definition 1.
Consider a target variable $Y_{t+h}$ and a forecast $Y_t^f(h)$. We say that the forecast $Y_t^f(h)$ is auto-efficient as long as
$$\mathrm{Cov}\big(Y_{t+h} - Y_t^f(h),\, Y_t^f(h)\big) = 0$$
which, in the case of $Y_t^f(h)$ being unbiased, could be expressed as
$$E\big[\big(Y_{t+h} - Y_t^f(h)\big)\,Y_t^f(h)\big] = 0$$
This last expression indicates that forecast errors are orthogonal to the forecast itself. If this condition does not hold true, we say that the forecast $Y_t^f(h)$ is auto-inefficient. Definitions along these lines are originally attributed to the early work of [28] and are referred to as Mincer–Zarnowitz efficiency.
Notice that auto-efficiency is one of the conditions satisfied by optimal forecasts under quadratic loss. Furthermore, the notion of auto-efficiency is relevant for at least two reasons:
(1)
When forecasts are unbiased, auto-inefficiency compromises the inverse relationship between the MSPE and the explained variance of the forecast.
(2)
Violations of auto-efficiency, in theory, allow for a simple modification of the forecast $Y_t^f(h)$ to produce a new revised forecast with a lower MSPE than $Y_t^f(h)$.
Let us explain the previous two points in more detail. The first point above relies on a remark made by [29]. Notice that
$$Y_{t+h} = Y_t^f(h) + u_t^f(h)$$
where $u_t^f(h)$ is the forecast error.
Therefore,
$$V(Y_{t+h}) = V\big(Y_t^f(h)\big) + 2\,\mathrm{Cov}\big(u_t^f(h), Y_t^f(h)\big) + V\big(u_t^f(h)\big)$$
When forecasts are unbiased, we have that
$$E(Y_{t+h}) = E\big[Y_t^f(h)\big]$$
So
$$E\big(Y_{t+h} - Y_t^f(h)\big) = E\big(u_t^f(h)\big) = 0$$
$$\mathrm{Cov}\big(Y_t^f(h), u_t^f(h)\big) = E\big[Y_t^f(h)\,u_t^f(h)\big]$$
$$V\big(u_t^f(h)\big) = E\big[u_t^f(h)^2\big]$$
Therefore,
$$V(Y_{t+h}) = V\big(Y_t^f(h)\big) + 2\,E\big[Y_t^f(h)\,u_t^f(h)\big] + E\big[u_t^f(h)^2\big]$$
So, if auto-efficiency holds,
$$V(Y_{t+h}) = V\big(Y_t^f(h)\big) + MSPE(h)$$
$$MSPE(h) \equiv E\big[u_t^f(h)^2\big]$$
and there is an inverse relationship between $MSPE(h)$ and the “explained” variance of the model, $V\big(Y_t^f(h)\big)$. Nevertheless, if auto-efficiency does not hold, we can have reductions in $MSPE(h)$ that are associated with reductions in the “explained” variance of the model as well, which is counterintuitive.
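As a minimal numerical sketch of this point (ours, not the authors'), one can simulate an unbiased but auto-inefficient forecast and check that the general decomposition $V(Y_{t+h}) = V(Y_t^f(h)) + 2\,\mathrm{Cov}(u_t^f(h), Y_t^f(h)) + V(u_t^f(h))$ always holds, while the simpler identity $V(Y_{t+h}) = V(Y_t^f(h)) + MSPE(h)$ breaks down once the covariance term is non-zero:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200_000

# unbiased but auto-inefficient forecast: the error is correlated with the forecast
f = rng.normal(size=T)                  # forecast, mean zero
u = -0.4 * f + rng.normal(size=T)       # forecast error, Cov(f, u) = -0.4 != 0
y = f + u                               # target variable

cov_fu = np.mean((f - f.mean()) * (u - u.mean()))
lhs = y.var()
rhs_general = f.var() + 2.0 * cov_fu + u.var()    # always holds
rhs_efficient = f.var() + np.mean(u ** 2)         # valid only under auto-efficiency

print(lhs, rhs_general)     # identical (up to floating point)
print(lhs, rhs_efficient)   # differ, because 2*Cov(f, u) is far from zero
```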
Point (2) indicates that, when auto-inefficiency holds true, the forecast itself contains information that could be used to predict its own forecast errors. We could then build a linear model for the forecast errors as follows:
$$u_t^f(h) \equiv Y_{t+h} - Y_t^f(h) = \alpha + \beta Y_t^f(h) + u_t^*(h)$$
$$E\big(u_t^*(h)\big) = E\big(Y_t^f(h)\,u_t^*(h)\big) = 0$$
which defines the following coefficients:
$$\beta \equiv \frac{\mathrm{Cov}\big(Y_t^f(h), u_t^f(h)\big)}{V\big(Y_t^f(h)\big)}\,; \qquad \alpha \equiv E\big[u_t^f(h)\big] - \beta\,E\big[Y_t^f(h)\big]$$
We could build a revised forecast $Y_t^{fr}(h)$ as follows:
$$Y_t^{fr}(h) = \alpha + (1+\beta)\,Y_t^f(h)$$
$$Y_t^{fr}(h) = E\big[u_t^f(h)\big] - \beta\,E\big[Y_t^f(h)\big] + (1+\beta)\,Y_t^f(h)$$
$$Y_t^{fr}(h) = Y_t^f(h) + E\big[u_t^f(h)\big] + \beta\big(Y_t^f(h) - E\big[Y_t^f(h)\big]\big)$$
When forecasts are unbiased, we have $E\big[u_t^f(h)\big] = 0$ and, consequently,
$$Y_t^{fr}(h) = Y_t^f(h) + \beta\big(Y_t^f(h) - E\big[Y_t^f(h)\big]\big)$$
Therefore, $\beta$ provides information regarding the need for shrinkage or an upscale adjustment in the term $Y_t^f(h) - E\big[Y_t^f(h)\big]$.
The new forecast error would be
$$Y_{t+h} - Y_t^{fr}(h) = Y_{t+h} - \alpha - (1+\beta)Y_t^f(h) = Y_{t+h} - Y_t^f(h) - \alpha - \beta Y_t^f(h) = u_t^f(h) - \big(\alpha + \beta Y_t^f(h)\big) = u_t^*(h)$$
Interestingly,
$$E\big[u_t^*(h)^2\big] < E\big[u_t^f(h)^2\big]$$
This is so as long as $\alpha + \beta Y_t^f(h)$ is not identically zero. In fact,
$$E\big[u_t^*(h)^2\big] = E\Big[\big(u_t^f(h) - (\alpha+\beta Y_t^f(h))\big)^2\Big] = E\big[u_t^f(h)^2\big] + E\big[(\alpha+\beta Y_t^f(h))^2\big] - 2E\big[u_t^f(h)\,(\alpha+\beta Y_t^f(h))\big]$$
$$= E\big[u_t^f(h)^2\big] + E\big[(\alpha+\beta Y_t^f(h))^2\big] - 2E\big[\big(\alpha+\beta Y_t^f(h) + u_t^*(h)\big)\,(\alpha+\beta Y_t^f(h))\big]$$
$$= E\big[u_t^f(h)^2\big] + E\big[(\alpha+\beta Y_t^f(h))^2\big] - 2E\big[(\alpha+\beta Y_t^f(h))^2\big] - 2E\big[u_t^*(h)\,(\alpha+\beta Y_t^f(h))\big]$$
$$= E\big[u_t^f(h)^2\big] - E\big[(\alpha+\beta Y_t^f(h))^2\big]$$
where the last equality uses the orthogonality conditions $E\big(u_t^*(h)\big) = E\big(Y_t^f(h)\,u_t^*(h)\big) = 0$.
Notice that even if the original forecast is unbiased, we may have $\alpha \ne 0$. This is so because
$$\alpha \equiv E\big[u_t^f(h)\big] - \beta\,E\big[Y_t^f(h)\big]$$
So, in the unbiased case,
$$\alpha \equiv -\beta\,E\big[Y_t^f(h)\big]$$
And $\alpha$ may still be different from zero, due to the auto-inefficiency of $Y_t^f(h)$, as long as
$$E(Y_{t+h}) \ne 0$$
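In practice, $\alpha$ and $\beta$ can be estimated with a simple OLS regression of the forecast errors on the forecast itself, as in a Mincer–Zarnowitz regression. The sketch below (our own illustration, with made-up data) builds the revised forecast and verifies that its MSPE falls by roughly $E[(\alpha + \beta Y_t^f(h))^2]$:

```python
import numpy as np

def revise_forecast(y, f):
    """Regress the forecast error on the forecast itself by OLS, then return
    the revised forecast f_r = alpha_hat + (1 + beta_hat) * f."""
    u = y - f                                    # forecast errors
    X = np.column_stack([np.ones_like(f), f])    # regressors: [1, f]
    alpha_hat, beta_hat = np.linalg.lstsq(X, u, rcond=None)[0]
    return alpha_hat + (1.0 + beta_hat) * f, alpha_hat, beta_hat

# illustrative data: an unbiased but auto-inefficient forecast of y
rng = np.random.default_rng(2)
f = 2.0 + rng.normal(size=5000)
y = 2.0 + 0.6 * (f - 2.0) + rng.normal(scale=0.8, size=5000)

f_r, a_hat, b_hat = revise_forecast(y, f)
print("MSPE original:", np.mean((y - f) ** 2))
print("MSPE revised :", np.mean((y - f_r) ** 2))    # smaller, by about E[(a + b*f)^2]
print("estimated beta (shrinkage needed):", b_hat)  # negative here, close to -0.4
```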
In the next subsection, we describe the assumptions that we need to obtain our main results.

2.4. Assumptions

We are interested in an environment characterized by the following assumptions.
1. One target variable $Y_{t+h}$.
2. Two forecasts $Y_{1,t}(h)$ and $Y_{2,t}(h)$, such that
$$MSPE_2(h) \equiv E\big[u_{2,t}(h)^2\big] \le MSPE_1(h) \equiv E\big[u_{1,t}(h)^2\big]$$
$$E\big[u_{1,t}(h)\big] = E\big[u_{2,t}(h)\big] = 0$$
3. Combination gains do exist in a region of the open set (0, 1). In other words,
$$E\big\{\big[u_{1,t}(h)-u_{2,t}(h)\big]\,u_{2,t}(h)\big\} < 0$$
We also make use of the following assumption.
4. The vector
$$W_t = \begin{pmatrix} Y_{1,t}(h) \\ Y_{2,t}(h) \\ Y_{t+h} \end{pmatrix}_{3\times 1}$$
is weakly stationary and ergodic, with a positive definite variance–covariance matrix $V$. Notice that we have explicitly removed the horizon “h” from $W_t$. This is just to emphasize that the relevant sub-index is $t$, as we are considering $h$ as a fixed horizon relative to $t$.
From now on, we use a simpler notation. Let $\tilde{Y} = Y_{t+h}$. This is our target variable to forecast. We also drop the “$t$” subscript from $Y_{1,t}(h)$, $Y_{2,t}(h)$, and $Y_t^c(h)$ to further simplify the notation. We do the same with the corresponding forecast errors, so they look as follows:
$$u_1(h) = \tilde{Y} - Y_1(h), \qquad u_2(h) = \tilde{Y} - Y_2(h)$$
$$u^c(h) = \lambda u_1(h) + (1-\lambda)u_2(h) = \lambda\big[u_1(h) - u_2(h)\big] + u_2(h)$$
Notice that a direct implication of a positive definite variance–covariance matrix V is that
$$E\big[\big(Y_1(h) - Y_2(h)\big)^2\big] > 0 \qquad (25)$$
Otherwise, we would have
$$0 = E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = V\big(Y_1(h)-Y_2(h)\big) + \big(E\big[Y_1(h)-Y_2(h)\big]\big)^2$$
and, due to the assumption of unbiased forecasts, we would have
$$V\big(Y_1(h)-Y_2(h)\big) = 0$$
This means that
$$V\big(Y_1(h)\big) + V\big(Y_2(h)\big) = 2\,\mathrm{Cov}\big(Y_1(h), Y_2(h)\big)$$
Let us consider $V_{12}$ to be the variance–covariance matrix of the sub-vector
$$\begin{pmatrix} Y_1(h) \\ Y_2(h) \end{pmatrix}_{2\times 1}$$
Then, it has to be the case that $V_{12}$ is a positive definite matrix as well. Nevertheless,
$$\begin{pmatrix} 1 & -1 \end{pmatrix} V_{12} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = V\big(Y_1(h)\big) - 2\,\mathrm{Cov}\big(Y_1(h), Y_2(h)\big) + V\big(Y_2(h)\big) = 0$$
which contradicts the fact that $V_{12}$ is positive definite.
Remark 1.
Expression (25) is important. It says that $Y_1(h)$ and $Y_2(h)$ are not exactly the same probabilistic forecasts. They are distinct in a meaningful sense, so we combine forecasts embedding different information.
We have displayed the basic econometric framework and simplified notation that we are using in this paper. In the next section, we show the main results of the article.

3. Main Theoretical Results

One of the main points of this paper is to show that traditional weighted averages of forecasts are almost always auto-inefficient. In fact, the next proposition shows that the majority of forecast combinations with λ ∈ (0, 1) are auto-inefficient. Before that, though, notice that, with straightforward algebra, the auto-efficiency measure of the combined forecast, $E[Y^c(h)\,u^c(h)]$, could be expressed in the following way:
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big] \qquad (27)$$
It would also be useful to express (27) as follows:
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda^2 E\big[Y_1(h)\,u_1(h)\big] + (1-\lambda)^2 E\big[Y_2(h)\,u_2(h)\big] + \lambda(1-\lambda)E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big] \qquad (28)$$
The proofs of these expressions are in Appendix A.

3.1. Auto-Inefficiency of Forecast Combinations

Proposition 2.
Let $\tilde{Y}$ denote a target variable and $Y_1(h)$, $Y_2(h)$ denote two forecasts for $\tilde{Y}$, such that Assumptions 1–4 hold true. Then, there are at most two different combinations $\lambda_1, \lambda_2 \in (0,1)$ for which the combined forecast is auto-efficient.
Proof. 
Let us consider the expected value of the combined forecast times its forecast error:
$$E\big[Y^c(h)\,u^c(h)\big] = E\Big\{\big[\lambda\big(Y_1(h)-Y_2(h)\big) + Y_2(h)\big]\big[\lambda\big(u_1(h)-u_2(h)\big) + u_2(h)\big]\Big\}$$
This expression defines the quadratic form in (27):
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
Let us recall that we already showed that the following expression (25) holds true:
$$E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] > 0$$
This means that $E\big[Y^c(h)\,u^c(h)\big]$ is a strictly concave quadratic form in $\lambda$ and, of course, is different from the zero function. As a consequence, $E\big[Y^c(h)\,u^c(h)\big]$ has, at most, two real roots, which may or may not lie within the (0, 1) interval, so it may even be the case that every single combination in (0, 1) is auto-inefficient. In any case, at most two combinations are auto-efficient. □
Corollary 1.
Under Assumptions 1–4, if we further assume that both individual forecasts are auto-efficient, then any weighted average of the forecasts with λ ∈ (0, 1) displays a positive auto-inefficiency.
Proof. 
We already saw that
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big]$$
Notice that
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-u_1(h)\big)\big]$$
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = E\big[Y_1(h)\,u_2(h) - Y_1(h)\,u_1(h) - Y_2(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
where the last equality uses the assumed auto-efficiency of the individual forecasts.
Using expression (28),
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda^2 E\big[Y_1(h)\,u_1(h)\big] + (1-\lambda)^2 E\big[Y_2(h)\,u_2(h)\big] + \lambda(1-\lambda)E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
Given that the two individual forecasts are auto-efficient,
$$E\big[Y_1(h)\,u_1(h)\big] = E\big[Y_2(h)\,u_2(h)\big] = 0$$
so we have
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda(1-\lambda)\,E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big] > 0 \qquad \forall\,\lambda \in (0,1)$$
 □
Proposition 2 shows that most of the possible forecast combinations are auto-inefficient. The previous corollary shows a particular case ensuring that all possible combinations within (0, 1) are auto-inefficient, even if both individual forecasts are auto-efficient. This is important because disregarding this result may lead to incorrect conclusions when analyzing tests based upon aggregated information from surveys. The finding that a combination of forecasts is inefficient does not imply that individual forecasts share this property.
The next proposition provides more general conditions under which every single combination in (0, 1) is auto-inefficient.
Proposition 3.
For $\tilde{Y}$, $Y_1(h)$, and $Y_2(h)$, as in Proposition 2, let us assume that
$$E\big[Y_1(h)\,u_1(h)\big] \ge 0 \qquad (31)$$
$$E\big[Y_2(h)\,u_2(h)\big] \ge 0 \qquad (32)$$
Then, for every single combination λ ( 0 , 1 ) , the combined forecast displays auto-inefficiency.
Proof. 
Using
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-u_1(h)\big)\big]$$
$$0 < E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] = E\big[Y_1(h)\,u_2(h) - Y_1(h)\,u_1(h) - Y_2(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
we obtain
$$0 < E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big] - E\big[Y_1(h)\,u_1(h) + Y_2(h)\,u_2(h)\big]$$
Therefore,
$$E\big[Y_1(h)\,u_1(h) + Y_2(h)\,u_2(h)\big] < E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
But, because we are assuming that (31) and (32) hold true, the left-hand side of the previous inequality is greater than or equal to zero:
$$0 \le E\big[Y_1(h)\,u_1(h) + Y_2(h)\,u_2(h)\big] < E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
Therefore,
$$0 < E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
Again using expression (28),
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda^2 E\big[Y_1(h)\,u_1(h)\big] + (1-\lambda)^2 E\big[Y_2(h)\,u_2(h)\big] + \lambda(1-\lambda)E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big]$$
we conclude that
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda^2 E\big[Y_1(h)\,u_1(h)\big] + (1-\lambda)^2 E\big[Y_2(h)\,u_2(h)\big] + \lambda(1-\lambda)E\big[Y_1(h)\,u_2(h) + Y_2(h)\,u_1(h)\big] > 0 \qquad \forall\,\lambda \in (0,1)$$
 □
It is important to remark that conditions (31) and (32) may be tested using simple tests or a strategy based on the “reality check” of [30], which is also used in [31].
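As a sketch of such a simple test (our own illustration, not the authors' exact procedure), $H_0: E[Y_i(h)\,u_i(h)] = 0$ can be checked with a HAC t-test on the sample mean of the product $Y_{i,t}(h)\,u_{i,t}(h)$, in the same spirit as the Newey–West statistic sketched in Section 2:

```python
import numpy as np

def hac_tstat(x, lags=4):
    """HAC (Newey-West) t-statistic for H0: E[x_t] = 0."""
    T = len(x)
    xc = x - x.mean()
    v = np.dot(xc, xc) / T
    for k in range(1, lags + 1):
        w = 1.0 - k / (lags + 1.0)               # Bartlett weight
        v += 2.0 * w * np.dot(xc[k:], xc[:-k]) / T
    return x.mean() / np.sqrt(v / T)

def auto_efficiency_tstat(y, forecast, lags=4):
    """t-statistic for H0: E[forecast * (y - forecast)] = 0,
    i.e., Mincer-Zarnowitz auto-efficiency of the forecast."""
    return hac_tstat(forecast * (y - forecast), lags)
```

Sample moments that are non-negative for both individual forecasts (or not significantly below zero) would place us in the setting of Proposition 3.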
The next proposition explores whether the optimal forecast combination satisfies some special condition of auto-efficiency.
Proposition 4.
For $\tilde{Y}$, $Y_1(h)$, and $Y_2(h)$, as in Proposition 3, we can find a unique $\lambda^* \in (0,1)$ such that
$$\lambda^* = \underset{\lambda \in (0,1)}{\arg\min}\; E\Big[\big(\lambda u_1(h) + (1-\lambda)u_2(h)\big)^2\Big]$$
For this $\lambda^*$, we also have
$$E\big[Y^c(h)\,u^c(h)\big]\big|_{\lambda^*} = \lambda^* E\big[Y_2(h)\,u_1(h)\big] + (1-\lambda^*)E\big[Y_2(h)\,u_2(h)\big]$$
Proof. 
$$E\Big[\big(\lambda u_1(h) + (1-\lambda)u_2(h)\big)^2\Big] = \lambda^2 E\big[\big(u_1(h)-u_2(h)\big)^2\big] + E\big[u_2(h)^2\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big]$$
This is a strictly convex quadratic function that admits a unique global minimum defined by the following first-order condition:
$$2\lambda E\big[\big(u_1(h)-u_2(h)\big)^2\big] + 2E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] = 0$$
This equation is solved by
$$\lambda^* = -\,\frac{E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big]}{E\big[\big(u_1(h)-u_2(h)\big)^2\big]}$$
Now, let us recall that by Assumption 2,
$$0 \le E\big[u_1^2(h) - u_2^2(h)\big] = E\big[\big(u_1(h)-u_2(h)\big)\big(u_1(h)+u_2(h)\big)\big]$$
Therefore,
$$0 \le E\big[u_1^2(h) - u_2^2(h)\big] = E\big[\big(u_1(h)-u_2(h)\big)\big(u_1(h)-u_2(h)+2u_2(h)\big)\big] = E\big[\big(u_1(h)-u_2(h)\big)^2\big] + 2E\big[u_2(h)\big(u_1(h)-u_2(h)\big)\big]$$
So, dividing by $E\big[\big(u_1(h)-u_2(h)\big)^2\big] > 0$, we have
$$0 \le 1 + \frac{2E\big[u_2(h)\big(u_1(h)-u_2(h)\big)\big]}{E\big[\big(u_1(h)-u_2(h)\big)^2\big]}$$
Or
$$-\,\frac{2E\big[u_2(h)\big(u_1(h)-u_2(h)\big)\big]}{E\big[\big(u_1(h)-u_2(h)\big)^2\big]} \le 1$$
By the assumption of combination gains, we have
$$E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] < 0$$
Therefore,
$$0 < -\,\frac{2E\big[u_2(h)\big(u_1(h)-u_2(h)\big)\big]}{E\big[\big(u_1(h)-u_2(h)\big)^2\big]} \le 1$$
Or
$$0 < \lambda^* = -\,\frac{E\big[u_2(h)\big(u_1(h)-u_2(h)\big)\big]}{E\big[\big(u_1(h)-u_2(h)\big)^2\big]} \le \frac{1}{2} < 1$$
This is an interesting result. When combination gains do exist, the optimal combination is always achieved within the (0, 1) interval, and, moreover, the optimal combination assigns a higher weight to the most accurate individual forecast.
Now, let us find an expression for $E\big[Y^c(h)\,u^c(h)\big]\big|_{\lambda^*}$:
$$E\big[Y^c(h)\,u^c(h)\big]\big|_{\lambda^*} = -\lambda^{*2} E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda^* E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
$$= -\lambda^{*2} E\big[\big(u_2(h)-u_1(h)\big)^2\big] - \lambda^* E\big[\big(u_1(h)-u_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
$$= \lambda^*\Big(-\lambda^* E\big[\big(u_1(h)-u_2(h)\big)^2\big] - E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] + E\big[\big(u_1(h)-u_2(h)\big)Y_2(h)\big]\Big) + E\big[Y_2(h)\,u_2(h)\big]$$
$$= \lambda^*\Big(E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] - E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] + E\big[\big(u_1(h)-u_2(h)\big)Y_2(h)\big]\Big) + E\big[Y_2(h)\,u_2(h)\big]$$
$$= \lambda^* E\big[\big(u_1(h)-u_2(h)\big)Y_2(h)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
$$= \lambda^* E\big[Y_2(h)\,u_1(h)\big] + (1-\lambda^*)E\big[Y_2(h)\,u_2(h)\big]$$
This last expression indicates that the optimal combination may or may not be auto-efficient. Notice that if
$E\big[Y_2(h)\,u_1(h)\big]$ and $E\big[Y_2(h)\,u_2(h)\big]$
share the same sign, then there is no way for the optimal combination to display auto-efficiency, so it is auto-inefficient as well. □

3.2. Size of the Auto-Inefficiency in the Forecast Combinations

Let us consider the following notation:
$$S \equiv -\,E\big[\big(Y_1(h)-Y_2(h)\big)\,\tilde{Y}\big] = E\big[\big(u_1(h)-u_2(h)\big)\,\tilde{Y}\big]$$
Let us also recall that
$$E\big[u^c(h)^2\big] = \lambda^2 E\big[\big(u_1(h)-u_2(h)\big)^2\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] + E\big[u_2(h)^2\big]$$
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
Notice that
$$E\big[u^c(h)^2\big] + E\big[Y^c(h)\,u^c(h)\big] = E\big[u_2(h)^2\big] + E\big[Y_2(h)\,u_2(h)\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big]$$
$$= E\big[u_2(h)^2\big] + E\big[Y_2(h)\,u_2(h)\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)u_2(h)\big] - \lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)Y_2(h)\big]$$
$$= E\big[u_2(h)^2\big] + E\big[Y_2(h)\,u_2(h)\big] + 2\lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big] - \lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big]$$
$$= E\big[u_2(h)^2\big] + E\big[Y_2(h)\,u_2(h)\big] + \lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big]$$
Therefore, we obtain
$$E\big[u^c(h)^2\big] + E\big[Y^c(h)\,u^c(h)\big] = E\big[u_2(h)^2\big] + E\big[Y_2(h)\,u_2(h)\big] + \lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big]$$
or
$$E\big[u^c(h)^2\big] - E\big[u_2(h)^2\big] = -\Big(E\big[Y^c(h)\,u^c(h)\big] - E\big[Y_2(h)\,u_2(h)\big]\Big) + \lambda E\big[\big(u_1(h)-u_2(h)\big)\tilde{Y}\big]$$
or
$$\Big(E\big[u^c(h)^2\big] - E\big[u_2(h)^2\big]\Big) + \Big(E\big[Y^c(h)\,u^c(h)\big] - E\big[Y_2(h)\,u_2(h)\big]\Big) = \lambda S$$
which could be written as
$$CG(\lambda) + AEG(\lambda) = \lambda S$$
where $CG(\lambda) \equiv E\big[u^c(h)^2\big] - E\big[u_2(h)^2\big]$ and $AEG(\lambda) \equiv E\big[Y^c(h)\,u^c(h)\big] - E\big[Y_2(h)\,u_2(h)\big]$, so that AEG stands for auto-efficiency gains (or losses) relative to the auto-efficiency status of forecast 2. The first term on the left-hand side is a strictly convex quadratic form with a zero root. The second term on the left-hand side is a strictly concave quadratic form with a zero root. If the $S$ term is zero ($S = 0$), then the two quadratic forms would be the same but with opposite signs. This implies that movements along the quadratic forms are totally compensated for. When $S \ne 0$, this is not the case, and movements along the quadratic forms are only partially compensated for. The size of this $S$ term is indeed key for combination gains to be totally or just partially compensated for by changes in auto-efficiency.
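The decomposition above is easy to verify numerically. The following sketch (ours, not the authors') draws a large sample from a trivariate normal distribution with the covariance matrix $\Omega_1$ used in the next subsection and checks that $CG(\lambda) + AEG(\lambda) = \lambda S$ holds for the sample analogues of these moments:

```python
import numpy as np

rng = np.random.default_rng(3)
omega1 = np.array([[1.60, 0.60, 0.75],
                   [0.60, 0.70, 0.25],
                   [0.75, 0.25, 0.90]])
# columns: target Y_{t+1}, forecast 1 (X_t), forecast 2 (Z_t)
W = rng.multivariate_normal(np.zeros(3), omega1, size=100_000)
y, f1, f2 = W[:, 0], W[:, 1], W[:, 2]
u1, u2 = y - f1, y - f2

S = np.mean((u1 - u2) * y)                        # sample analogue of S
for lam in (0.2, 0.45, 0.8):
    fc = lam * f1 + (1 - lam) * f2                # combined forecast
    uc = y - fc                                   # combined forecast error
    cg = np.mean(uc ** 2) - np.mean(u2 ** 2)      # CG(lambda)
    aeg = np.mean(fc * uc) - np.mean(f2 * u2)     # AEG(lambda)
    print(lam, cg + aeg, lam * S)                 # last two columns coincide (up to floating point)
```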

3.3. Simulated Examples

In this subsection, we illustrate with simple simulated examples the main results shown in the previous sections. In all these cases, we impose the existence of combination gains and explore whether auto-efficiency is satisfied for one-step-ahead forecasts. Let us consider the three-dimensional vector
$$W_t = \begin{pmatrix} Y_{t+1} \\ X_t \\ Z_t \end{pmatrix}_{3\times 1} \sim N(0, \Omega)$$
We consider the following three cases for the matrix $\Omega$:
$$\Omega_1 = \begin{pmatrix} 1.60 & 0.60 & 0.75 \\ 0.60 & 0.70 & 0.25 \\ 0.75 & 0.25 & 0.90 \end{pmatrix}; \qquad \Omega_2 = \begin{pmatrix} 1.60 & 0.60 & 0.75 \\ 0.60 & 0.60 & 0.25 \\ 0.75 & 0.25 & 0.75 \end{pmatrix}; \qquad \Omega_3 = \begin{pmatrix} 2.50 & 1.125 & 1.25 \\ 1.125 & 2.00 & 0.25 \\ 1.25 & 0.25 & 2.25 \end{pmatrix}$$
All these matrices are symmetric and positive definite, as their eigenvalues are (2.33, 0.33, 0.54), (2.28, 0.25, 0.42), and (4.13, 0.76, 1.87), respectively. We use two different forecasts for $Y_{t+1}$:
$$Y_{1,t}^f \equiv X_t$$
$$Y_{2,t}^f \equiv Z_t$$
The forecast errors are given by
$$u_{1,t} = Y_{t+1} - X_t$$
$$u_{2,t} = Y_{t+1} - Z_t$$
Let us analyze the case in which we have $\Omega_1$. The respective MSPE and mean squared forecasts (MSF) are
$$MSPE\big(Y_{1,t}^f\big) = E\big[(Y_{t+1}-X_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[X_t^2\big] - 2E\big[Y_{t+1}X_t\big] = 1.6 + 0.7 - 1.2 = 1.1$$
$$MSPE\big(Y_{2,t}^f\big) = E\big[(Y_{t+1}-Z_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[Z_t^2\big] - 2E\big[Y_{t+1}Z_t\big] = 1.6 + 0.9 - 1.5 = 1$$
$$MSF\big(Y_{1,t}^f\big) = E\big[X_t^2\big] = 0.7$$
$$MSF\big(Y_{2,t}^f\big) = E\big[Z_t^2\big] = 0.9$$
So, clearly, forecast 2 is more accurate than forecast 1 in terms of MSPE and displays a higher MSF. Nevertheless, forecast 1 is not encompassed by forecast 2, so combination gains are possible. This is so because
$$E\big[(u_{1,t}-u_{2,t})\,u_{2,t}\big] = E\big[(Y_{t+1}-X_t-Y_{t+1}+Z_t)(Y_{t+1}-Z_t)\big] = E\big[(Z_t-X_t)(Y_{t+1}-Z_t)\big] = E\big[Z_tY_{t+1}\big] - E\big[Z_t^2\big] - E\big[X_tY_{t+1}\big] + E\big[Z_tX_t\big] = 0.75 - 0.90 - 0.6 + 0.25 = -0.5 < 0$$
Scheme 2 shows that most combinations in (0, 1) display reductions in MSPE compared to the best-performing individual forecast. At the same time, most of these combinations are auto-inefficient. In particular, both the individual forecasts and the optimal combination, achieved at $\lambda^* = 0.45$, are inefficient.
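The MSPE and auto-inefficiency curves behind Scheme 2 can be traced directly from the population moments in $\Omega_1$; the sketch below (our own illustration) does exactly that and also recovers $\lambda^* \approx 0.45$:

```python
import numpy as np

# population second moments; order: Y_{t+1}, X_t (forecast 1), Z_t (forecast 2)
omega1 = np.array([[1.60, 0.60, 0.75],
                   [0.60, 0.70, 0.25],
                   [0.75, 0.25, 0.90]])

def curves(omega, lam):
    """Population MSPE and auto-inefficiency E[Y^c u^c] of the combination
    lam*forecast1 + (1 - lam)*forecast2 for a zero-mean vector (Y, f1, f2)."""
    EYY, EY1, EY2 = omega[0, 0], omega[0, 1], omega[0, 2]
    E11, E12, E22 = omega[1, 1], omega[1, 2], omega[2, 2]
    Efcfc = lam**2 * E11 + 2 * lam * (1 - lam) * E12 + (1 - lam)**2 * E22
    EfcY = lam * EY1 + (1 - lam) * EY2
    mspe = EYY - 2 * EfcY + Efcfc          # E[(Y - fc)^2]
    inefficiency = EfcY - Efcfc            # E[fc * (Y - fc)]
    return mspe, inefficiency

for lam in np.linspace(0.0, 1.0, 6):
    print(round(lam, 2), curves(omega1, lam))

# optimal weight lambda* = -E[(u1 - u2) u2] / E[(u1 - u2)^2]
num = -(omega1[0, 2] - omega1[2, 2] - omega1[0, 1] + omega1[1, 2])
den = omega1[1, 1] + omega1[2, 2] - 2 * omega1[1, 2]
print("lambda* =", num / den)              # about 0.45
```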
Let us analyze the case in which we have $\Omega_2$. The respective MSPE and mean squared forecasts (MSF) are
$$MSPE\big(Y_{1,t}^f\big) = E\big[(Y_{t+1}-X_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[X_t^2\big] - 2E\big[Y_{t+1}X_t\big] = 1.6 + 0.6 - 1.2 = 1$$
$$MSPE\big(Y_{2,t}^f\big) = E\big[(Y_{t+1}-Z_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[Z_t^2\big] - 2E\big[Y_{t+1}Z_t\big] = 1.6 + 0.75 - 1.5 = 0.85$$
$$MSF\big(Y_{1,t}^f\big) = E\big[X_t^2\big] = 0.6$$
$$MSF\big(Y_{2,t}^f\big) = E\big[Z_t^2\big] = 0.75$$
So, again, forecast 2 is more accurate than forecast 1 in terms of MSPE and displays a higher MSF. As in the previous case, forecast 1 is not encompassed by forecast 2, so combination gains are possible. This is so because
$$E\big[(u_{1,t}-u_{2,t})\,u_{2,t}\big] = E\big[(Y_{t+1}-X_t-Y_{t+1}+Z_t)(Y_{t+1}-Z_t)\big] = E\big[(Z_t-X_t)(Y_{t+1}-Z_t)\big] = E\big[Z_tY_{t+1}\big] - E\big[Z_t^2\big] - E\big[X_tY_{t+1}\big] + E\big[Z_tX_t\big] = 0.75 - 0.75 - 0.6 + 0.25 = -0.35 < 0$$
In contrast to the $\Omega_1$ case (see Scheme 2), we now have that both individual forecasts are auto-efficient:
$$E\big[u_{1,t}X_t\big] = E\big[(Y_{t+1}-X_t)X_t\big] = E\big[Y_{t+1}X_t\big] - E\big[X_t^2\big] = 0.6 - 0.6 = 0$$
$$E\big[u_{2,t}Z_t\big] = E\big[(Y_{t+1}-Z_t)Z_t\big] = E\big[Y_{t+1}Z_t\big] - E\big[Z_t^2\big] = 0.75 - 0.75 = 0$$
Yet, we clearly see from Scheme 3 that all the linear convex combinations of both individual forecasts are inefficient. In particular, the optimal combination in the (0, 1) interval is also inefficient.
For $\Omega_3$, we have a slightly different situation, in which both individual forecasts are inefficient:
$$E\big[u_{1,t}X_t\big] = E\big[(Y_{t+1}-X_t)X_t\big] = E\big[Y_{t+1}X_t\big] - E\big[X_t^2\big] = 1.125 - 2 = -0.875$$
$$E\big[u_{2,t}Z_t\big] = E\big[(Y_{t+1}-Z_t)Z_t\big] = E\big[Y_{t+1}Z_t\big] - E\big[Z_t^2\big] = 1.250 - 2.250 = -1$$
Both forecasts display the same MSPE (2.25), combination gains are possible, and the MSPE is minimized at $\lambda^* = 0.5$:
$$MSPE\big(Y_{1,t}^f\big) = E\big[(Y_{t+1}-X_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[X_t^2\big] - 2E\big[Y_{t+1}X_t\big] = 2.5 + 2 - 2.25 = 2.25$$
$$MSPE\big(Y_{2,t}^f\big) = E\big[(Y_{t+1}-Z_t)^2\big] = E\big[Y_{t+1}^2\big] + E\big[Z_t^2\big] - 2E\big[Y_{t+1}Z_t\big] = 2.5 + 2.25 - 2.5 = 2.25$$
$$\lambda^* = -\,\frac{E\big[(u_{1,t}-u_{2,t})u_{2,t}\big]}{E\big[(u_{1,t}-u_{2,t})^2\big]} = \frac{E\big[(X_t-Z_t)Y_{t+1}\big] - E\big[(X_t-Z_t)Z_t\big]}{E\big[(X_t-Z_t)^2\big]}$$
$$\lambda^* = \frac{1.125 - 1.25 - (0.25 - 2.25)}{2 + 2.25 - 0.5} = \frac{1.875}{3.75} = 0.5$$
The most important feature of the $\Omega_3$ case is that all the linear convex combinations of both individual forecasts are inefficient, with the exception of the two real roots of expression (27), which, in this case, are 0.5 and 0.533. Yet the optimal combination in terms of MSPE is exactly one of these roots: 0.5.
$$E\big[Y_t^c\,u_t^c\big] = \lambda^2 E\big[Y_{1,t}^f u_{1,t}\big] + (1-\lambda)^2 E\big[Y_{2,t}^f u_{2,t}\big] + \lambda(1-\lambda)E\big[Y_{1,t}^f u_{2,t} + Y_{2,t}^f u_{1,t}\big]$$
$$E\big[Y_t^c\,u_t^c\big]\big|_{\lambda=0.5} = 0.25\,(-0.875) + 0.25\,(-1) + 0.25\,(0.875 + 1) = 0$$
So, here, we have a case with almost all the combinations being inefficient, with only two exceptions, one of them being the optimal combination. Scheme 4 shows what is occurring in this case.
Scheme 2 and Scheme 4 correspond to virtuous examples. In Scheme 2, we can see that for low values of λ we achieve both reductions in the MSPE and reductions in auto-inefficiency (in absolute value) with respect to the more accurate individual forecast. Scheme 3 is not virtuous, in the sense that reductions in the MSPE are associated with increases in the auto-inefficiency of the combined forecast.

4. Empirical Illustration

In this section, we illustrate the relevance of our results with two different exercises aimed at forecasting headline inflation in eight different European economies: Austria, Belgium, the Czech Republic, Denmark, Estonia, Germany, Greece, and the United Kingdom. Our first approach is inspired by the previous work in [32], which compares the predictive ability of two different forecasting models: a model built with a local core inflation measure and a model built with an international inflation measure. As reported in the vast literature, the “food” and “energy” components in CPI tend to be highly volatile; hence, removing these components may improve the predictive ability of a model. See [33,34] for supporters of the “core predicts headline” argument and [35] for a comprehensive discussion. The idea that international inflation is closely related to domestic inflation comes from a seminal paper [36], which shows that there is an important pass-through from some measures of international inflation to local inflation. See [37,38] for a comprehensive analysis of this predictive ability in OECD economies.
Our second exercise is similar to the first one, but, now, we compare the predictive ability of a model built with an international inflation measure and a model built with a measure of output gap. In what follows, our target variable is the year-on-year CPI headline inflation for our sample of countries. We download our data from the OECD database (https://stats.oecd.org/, 25 July 2023) at the monthly frequency, from January 2000 to December 2022. For the construction of our measure of the output gap, we use the “Production and Sales” seasonally adjusted series available at the monthly frequency in the OECD database.
Let $\pi_t$ be the year-on-year headline inflation rate for a given country at time $t$, $\pi_t^{core}$ be the corresponding core inflation, and $\pi_t^{Int}$ be the arithmetic average of headline inflation across all OECD economies (as a proxy for an international factor). Then, in our first exercise, the two competing forecasting models are
$$\pi_{t+1} = c + \theta\pi_t + \gamma\pi_t^{core} + e_{1,t+1} \qquad (\text{Model 1})$$
$$\pi_{t+1} = \alpha + \rho\pi_t + \beta\pi_t^{Int} + e_{2,t+1} \qquad (\text{Model 2})$$
where $e_{1,t+1}$ and $e_{2,t+1}$ are simply error terms. In this exercise, we only focus on one-step-ahead forecasts. To conduct our out-of-sample analysis, we consider an initial estimation window of R = 50 observations to estimate the parameters of Models 1 and 2. Given that the total number of observations for each country is T = 276, we are left with P = 226 observations to evaluate our forecasts (T = R + P). To update the estimates of our parameters, we consider a rolling window of size R = 50. To keep the notation simple, let $Int\_Forecast_t$ be the forecast constructed with Model 2 at time $t$ for $\pi_{t+1}$ and $Core\_Forecast_t$ be its counterpart constructed with Model 1.
Additionally, we consider the optimal combination between both forecasts. The optimal weight assigned to each forecast is determined by expression (35), and the combined forecast is constructed as follows:
$$\text{Combined Forecast}_t = \lambda^*\, Core\_Forecast_t + (1-\lambda^*)\, Int\_Forecast_t$$
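A sketch of this rolling-window exercise is given below (our own illustration; the data-loading step and variable names are placeholders, and the window bookkeeping is only one of several reasonable choices). Models 1 and 2 are estimated by OLS on each window, and the weight is the sample analogue of the optimal $\lambda^*$ from Proposition 4:

```python
import numpy as np

def ols_one_step_forecast(pi_window, x_window, pi_now, x_now):
    """Fit pi_{t+1} = c + theta*pi_t + gamma*x_t by OLS on a window of data,
    then forecast one step ahead from the most recent observation."""
    X = np.column_stack([np.ones(len(pi_window) - 1), pi_window[:-1], x_window[:-1]])
    coef = np.linalg.lstsq(X, pi_window[1:], rcond=None)[0]
    return coef @ np.array([1.0, pi_now, x_now])

def rolling_forecasts(pi, x, R=50):
    """One-step-ahead rolling-window forecasts of pi from a single predictor x."""
    forecasts, targets = [], []
    for t in range(R, len(pi) - 1):
        w = slice(t - R, t + 1)                  # last R+1 observations up to time t
        forecasts.append(ols_one_step_forecast(pi[w], x[w], pi[t], x[t]))
        targets.append(pi[t + 1])
    return np.array(forecasts), np.array(targets)

def optimal_combination(y, f1, f2):
    """Combine two forecasts with the sample analogue of the optimal weight."""
    u1, u2 = y - f1, y - f2
    lam = -np.mean((u1 - u2) * u2) / np.mean((u1 - u2) ** 2)
    return lam, lam * f1 + (1 - lam) * f2

# usage sketch (pi, core, intl are placeholder numpy arrays of length T):
# f_core, y_out = rolling_forecasts(pi, core)
# f_intl, _     = rolling_forecasts(pi, intl)
# lam_star, combined = optimal_combination(y_out, f_core, f_intl)
```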
Table 1 reports (i) the optimal weight $\lambda^*$ assigned to $Core\_Forecast_t$ in order to minimize the MSPE; (ii) the MSPE of both forecasts ($Int\_Forecast_t$ and $Core\_Forecast_t$) and the MSPE of the optimal combination; (iii) the inefficiency of the optimal combination forecast, reported as two times the covariance between the forecast and its respective forecasting error; and (iv) the ratio between the MSPE of the optimal forecast combination and the MSPE of Model 2 (notice that Model 2 produces forecasts with a lower MSPE than Model 1 in all eight cases, with no exceptions whatsoever; for this reason, the ratio is computed exclusively relative to Model 2. Of course, the combination gains are higher if compared with Model 1).
Two features of Table 1 are worth mentioning. First, the optimal combination forecast is statistically inefficient in all eight cases, at the 10% significance level, with no exceptions whatsoever. Moreover, the null hypothesis of efficiency is rejected at the tighter 1% significance level in five out of eight cases. Second, the optimal combination always improves the accuracy of the forecasts, yet the size of the improvement is heterogeneous. For instance, the ratio $MSPE_{\text{optimal combination}}/MSPE_{\text{Model 2}}$ is 0.793 for the Czech Republic, indicating impressive gains from considering the combined forecast. In contrast, the average MSPE ratio for Austria, Denmark, and Estonia is just 0.98, suggesting very limited gains for those economies.
Scheme 5 shows the MSPE and auto-inefficiency curves for Belgium and the United Kingdom. This illustrates, with real data, one of the main points of our paper: the optimal combination in terms of the MSPE turns out to be highly inefficient; in fact, in both figures, when approaching the optimal $(1-\lambda^*)$ from the left, we observe that the combined MSPE decreases (left axis), while the inefficiency of the combined forecast simultaneously increases in absolute terms (right axis).
For our second exercise, we combine forecasts coming from a model built with an international inflation measure (our previous Model 2) with forecasts coming from the following third model
$$\pi_{t+1} = a + b\pi_t + \phi\,(\ln Y_t - \ln Y_t^*) + e_{3,t+1} \qquad (\text{Model 3})$$
where $Y_t$ is output and $Y_t^*$ is potential output (or the output trend); hence, $\ln Y_t - \ln Y_t^*$ is just an estimate of the output gap. We construct $Y_t^*$ with the Hodrick–Prescott filter with a smoothing parameter of 14,400, as traditionally suggested for monthly data. Notice that Model 3 is similar to Model 1 but uses the output gap as a predictor instead of core inflation. Table 2 is akin to Table 1, but it compares forecasts coming from Model 3 with forecasts from our previous Model 2.
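The output gap in Model 3 can be constructed with standard tools; the sketch below (our illustration, with a synthetic placeholder series in place of the OECD “Production and Sales” data) uses the HP filter from statsmodels with λ = 14,400 for monthly data, or 129,600 for the Ravn–Uhlig variant discussed below:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.filters.hp_filter import hpfilter

# placeholder for the monthly, seasonally adjusted "Production and Sales" series
rng = np.random.default_rng(4)
production = pd.Series(np.exp(np.cumsum(rng.normal(0.002, 0.01, size=276))))

log_y = np.log(production)
cycle, trend = hpfilter(log_y, lamb=14_400)   # use lamb=129_600 for the Ravn-Uhlig variant
output_gap = log_y - trend                    # equals the 'cycle' component returned above
```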
Just as in our first exercise, we now consider the optimal combination of both forecasts, where the optimal weight is determined using expression (35):
$$\text{Combined Forecast}_t = \lambda^*\, Output\_Gap\_Forecast_t + (1-\lambda^*)\, Int\_Forecast_t$$
Interestingly, Table 2 shows mixed results between both individual forecasts in terms of accuracy: in five out of eight cases, Model 3 exhibits a lower MSPE than Model 2. This result is in sharp contrast with Table 1, in which Model 2 displays the lowest MSPE in all cases, with no exceptions whatsoever. For this reason, instead of reporting $MSPE_{\text{optimal combination}}/MSPE_{\text{Model 2}}$, in Table 2 we report $MSPE_{\text{optimal combination}}/MSPE_{\text{Best model}}$, where $MSPE_{\text{Best model}}$ is simply the MSPE of the best individual model (either Model 3 or Model 2).
Notably, the main messages of Table 1 remain the same here: (i) the forecast built as the optimal combination is statistically inefficient in most cases (seven out of eight), at least at the 5% significance level, and (ii) combinations always improve the MSPE of the forecasts, although these combination gains tend to be smaller in this second exercise, with reductions in MSPE relative to the best individual model ranging from 0.01% to 2.68%.
We acknowledge that there is some debate about how to estimate the potential output $Y_t^*$. For instance, [39] criticizes the use of the Hodrick–Prescott (HP) filter and proposes a regression-based strategy. In contrast, [40] compares the HP filter with Hamilton's approach, providing support for the former relative to the latter. See [41,42] for more discussion on this topic. An interesting approach is mentioned in [43], which proposes a more aggressive smoothing parameter for the HP filter. Table 3 is akin to Table 2, but this time using the filter suggested by Ravn and Uhlig for monthly data (a smoothing parameter of 129,600). While the optimal combinations here are different compared to those in Table 2, the main message of our paper remains the same, as seven out of eight of these optimal forecast combinations are auto-inefficient as well.
Finally, in Appendix A.2, we show the results of our first forecasting exercise when applied to an extended set of an additional 19 OECD economies. These results corroborate those presented in this section, as, in the great majority of the cases, combination gains are also associated with auto-inefficiency.

5. Summary and Conclusions

It is well-known that the weighted averages of two competing forecasts may reduce the mean squared prediction errors (MSPE). It is less known, however, that forecast combinations may be inefficient. In this paper, we take an in-depth view of one particular type of problem stemming from simple combination schemes. This problem is called forecast auto-inefficiency and refers to the notion of inefficiency analyzed by [28].
Under mild assumptions, we show that linear convex forecast combinations are almost always auto-inefficient, and, therefore, there is almost always room for accuracy improvement. We also identify testable conditions under which every linear convex combination of two forecasts is auto-inefficient. In particular, we show that the process of taking averages of forecasts may induce inefficiencies in the combination, even when individual forecasts are auto-efficient. The extent to which these inefficiencies are indeed relevant requires a case-by-case analysis. Nonetheless, it is striking that, in many applications in which a number of different forecasts are available, the combination of all of them seems to be the last step in the search for forecast accuracy, and no attempt to take advantage of the potential inefficiencies stemming from the combination process is pursued. Articles exploring the benefits of forecast combinations in terms of accuracy but totally disregarding their potential inefficiencies are widespread across time and topics. Some interesting and recent examples are [44], when forecasting stock market volatility; [45], when forecasting stock market returns; [46], when forecasting oil prices; and [47], when forecasting several macroeconomic variables in eight European economies.
We illustrate our findings with an empirical application in which two different forecasts for headline CPI inflation are combined. We mainly focus on the cases of eight European economies, but, in the Appendix A, we generalize our empirical results to a broader set by adding 19 OECD economies. We show that gains from combination are achieved in all of our exercises. More interestingly, we also show that, in 89% of all our different cases, the optimal combination displays a statistically significant Mincer–Zarnowitz inefficiency, which is consistent with our simple theoretical derivations.
Regarding limitations and avenues for future research, there are three things worth mentioning. First, it would be interesting to extend our derivations and main results for the general case of n > 2 competing forecasts. Second, an obvious extension of our work is to explore the implications of relaxing some of our assumptions over the forecasts and the target variable (e.g., unbiased forecasts, weak stationarity, and ergodicity). Finally, we remain silent on how to exploit these inefficiencies in a real-time exercise. Given the results reported in our paper, an exciting avenue for future research is to study how to improve the accuracy of forecast combinations by taking advantage of the already reported inefficiencies in real-time exercises, when only a low number of forecast errors might be available.

Author Contributions

Conceptualization, P.P.-B. and A.B.; Methodology, P.P.-B. and N.H.; Software, N.H.; Formal analysis, P.P.-B. and N.H.; Data curation, N.H.; Writing—original draft, P.P.-B. and A.B.; Writing—review & editing, P.P.-B. and A.B.; Visualization, N.H.; Supervision, P.P.-B.; Project administration, P.P.-B. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by FONDECYT grant number 11080109, and the APC was partially funded by Universidad Adolfo Ibáñez.

Data Availability Statement

Data can be found here https://stats.oecd.org.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Expressions (27) and (28) in Section 3

Proof. 
Let us derive expression (27):
$$E\big[Y^c(h)\,u^c(h)\big] = E\Big[\big(\lambda Y_1(h) + (1-\lambda)Y_2(h)\big)\big(\lambda u_1(h) + (1-\lambda)u_2(h)\big)\Big]$$
$$E\big[Y^c(h)\,u^c(h)\big] = E\Big\{\big[\lambda\big(Y_1(h)-Y_2(h)\big) + Y_2(h)\big]\big[\lambda\big(u_1(h)-u_2(h)\big) + u_2(h)\big]\Big\} = \lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_1(h)-u_2(h)\big)\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)u_2(h)\big] + \lambda E\big[\big(u_1(h)-u_2(h)\big)Y_2(h)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
Notice that
$$u_1(h) - u_2(h) = \big(\tilde{Y} - Y_1(h)\big) - \big(\tilde{Y} - Y_2(h)\big) = -\big(Y_1(h)-Y_2(h)\big)$$
Therefore,
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)u_2(h)\big] - \lambda E\big[\big(Y_1(h)-Y_2(h)\big)Y_2(h)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
So, finally, we have
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)^2\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-Y_2(h)\big)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
which is the desired result.
Let us derive the second expression. From (27), we obtain
$$E\big[Y^c(h)\,u^c(h)\big] = -\lambda^2 E\big[\big(Y_1(h)-Y_2(h)\big)\big(u_2(h)-u_1(h)\big)\big] + \lambda E\big[\big(Y_1(h)-Y_2(h)\big)u_2(h)\big] - \lambda E\big[\big(u_2(h)-u_1(h)\big)Y_2(h)\big] + E\big[Y_2(h)\,u_2(h)\big]$$
$$= -\lambda^2 E\big[Y_1(h)u_2(h) - Y_1(h)u_1(h) - Y_2(h)u_2(h) + Y_2(h)u_1(h)\big] + \lambda E\big[Y_1(h)u_2(h)\big] - \lambda E\big[Y_2(h)u_2(h)\big] - \lambda E\big[Y_2(h)u_2(h)\big] + \lambda E\big[Y_2(h)u_1(h)\big] + E\big[Y_2(h)u_2(h)\big]$$
$$= -\lambda^2 E\big[Y_1(h)u_2(h) - Y_1(h)u_1(h) - Y_2(h)u_2(h) + Y_2(h)u_1(h)\big] + \lambda E\big[Y_1(h)u_2(h) + Y_2(h)u_1(h)\big] + (1-2\lambda)E\big[Y_2(h)u_2(h)\big]$$
Therefore,
$$E\big[Y^c(h)\,u^c(h)\big] = \lambda^2 E\big[Y_1(h)u_1(h)\big] + (\lambda^2 - 2\lambda + 1)E\big[Y_2(h)u_2(h)\big] + (\lambda - \lambda^2)E\big[Y_1(h)u_2(h) + Y_2(h)u_1(h)\big] = \lambda^2 E\big[Y_1(h)u_1(h)\big] + (1-\lambda)^2 E\big[Y_2(h)u_2(h)\big] + \lambda(1-\lambda)E\big[Y_1(h)u_2(h) + Y_2(h)u_1(h)\big]$$
which is expression (28), the desired result. □

Appendix A.2. Empirical Illustration with an Extended Dataset

Table A1. MSPE and inefficiency of the combined forecast when predicting headline inflation with a model with core and a model with international inflation. Extended results for 27 OECD economies.

| | Austria | Belgium | Canada | Colombia | Czech Republic | Denmark | Estonia |
|---|---|---|---|---|---|---|---|
| λ* | 0.255 | 0.281 | 0.490 | 0.059 | 0.443 | 0.164 | 0.330 |
| MSPE (International) | 0.094 | 0.229 | 0.213 | 0.102 | 0.364 | 0.106 | 0.872 |
| MSPE (Core) | 0.111 | 0.276 | 0.214 | 0.271 | 0.408 | 0.139 | 0.934 |
| Ratio MSPEopt/MSPEint | 0.976 | 0.963 | 0.928 | 0.994 | 0.793 | 0.987 | 0.977 |
| MSPE Optimal Combination | 0.092 | 0.220 | 0.198 | 0.101 | 0.288 | 0.104 | 0.852 |
| Inefficiencies (Opt. Combination) | −0.145 ** | −0.633 *** | −0.134 | −0.384 *** | −0.476 * | −0.364 *** | −4.300 *** |

| | Finland | France | Germany | Greece | Ireland | Israel | Korea |
|---|---|---|---|---|---|---|---|
| λ* | 0.141 | 0.134 | 0.321 | 0.307 | 0.195 | 0.170 | 0.405 |
| MSPE (International) | 0.128 | 0.084 | 0.173 | 0.384 | 0.248 | 0.209 | 0.161 |
| MSPE (Core) | 0.172 | 0.204 | 0.200 | 0.533 | 0.519 | 0.725 | 0.172 |
| Ratio MSPEopt/MSPEint | 0.991 | 0.965 | 0.954 | 0.906 | 0.932 | 0.892 | 0.943 |
| MSPE Optimal Combination | 0.127 | 0.081 | 0.165 | 0.348 | 0.231 | 0.187 | 0.152 |
| Inefficiencies (Opt. Combination) | −0.293 *** | −0.157 *** | −0.164 ** | −1.286 *** | −0.931 *** | −0.246 ** | −0.102 |

| | Latvia | Luxembourg | Mexico | Netherlands | Norway | Poland | Slovak Republic |
|---|---|---|---|---|---|---|---|
| λ* | 0.062 | 0.368 | 0.075 | 0.086 | 0.236 | 0.221 | 0.193 |
| MSPE (International) | 0.517 | 0.257 | 0.138 | 0.312 | 0.275 | 0.233 | 0.205 |
| MSPE (Core) | 1.908 | 0.373 | 0.611 | 0.430 | 0.320 | 0.266 | 0.625 |
| Ratio MSPEopt/MSPEint | 0.988 | 0.768 | 0.977 | 0.997 | 0.983 | 0.988 | 0.876 |
| MSPE Optimal Combination | 0.511 | 0.197 | 0.135 | 0.311 | 0.270 | 0.230 | 0.180 |
| Inefficiencies (Opt. Combination) | −1.923 ** | −0.237 ** | −0.063 | −0.708 ** | −0.247 ** | −1.021 *** | −0.400 ** |

| | Slovenia | Spain | Sweden | Switzerland | United Kingdom | United States |
|---|---|---|---|---|---|---|
| λ* | 0.210 | 0.210 | 0.085 | 0.069 | 0.310 | 0.318 |
| MSPE (International) | 0.371 | 0.308 | 0.170 | 0.089 | 0.085 | 0.246 |
| MSPE (Core) | 0.475 | 0.431 | 0.298 | 0.266 | 0.098 | 0.278 |
| Ratio MSPEopt/MSPEint | 0.979 | 0.970 | 0.993 | 0.989 | 0.960 | 0.963 |
| MSPE Optimal Combination | 0.363 | 0.299 | 0.169 | 0.088 | 0.081 | 0.237 |
| Inefficiencies (Opt. Combination) | −0.703 *** | −0.601 *** | 0.004 | −0.150 *** | −0.338 *** | −0.292 ** |

* p < 10%, ** p < 5%, *** p < 1%.

References

  1. Elliott, G.; Timmermann, A. Economic Forecasting. J. Econ. Lit. 2008, 46, 3–56. [Google Scholar] [CrossRef]
  2. Bates, J.; Granger, C. The combination of forecasts. Oper. Res. Q. 1969, 20, 451–468. [Google Scholar] [CrossRef]
  3. Granger, C.; Newbold, P. Some Comments on the Evaluation of Economic Forecasts. Appl. Econ. 1973, 5, 35–47. [Google Scholar] [CrossRef]
  4. Granger, C.; Newbold, P. Forecasting Economic Time Series, 2nd ed.; Academic Press: Orlando, FL, USA, 1986. [Google Scholar]
  5. Chong, Y.; Hendry, D. Econometric Evaluation of Linear Macroeconomic Models. Rev. Econ. Stud. 1986, 53, 671–690. [Google Scholar] [CrossRef]
  6. Clements, M.; Hendry, D. On the Limitations of Comparing Mean Square Forecast Errors. J. Forecast. 1993, 12, 617–637. [Google Scholar] [CrossRef]
  7. Newbold, P.; Granger, C. Experience With Forecasting Univariate Time Series and the Combination of Forecasts. J. R. Stat. Soc. Ser. A 1974, 137, 131–165. [Google Scholar] [CrossRef]
  8. Granger, C.; Ramanathan, R. Improved methods of combining forecasts. J. Forecast. 1984, 3, 197–204. [Google Scholar] [CrossRef]
  9. Clemen, R. Linear constraints and the efficiency of combined forecasts. J. Forecast. 1986, 5, 31–38. [Google Scholar] [CrossRef]
  10. Diebold, F. Serial correlation and the combination of forecasts. J. Bus. Econ. Stat. 1988, 6, 105–111. [Google Scholar]
  11. Batchelor, R.; Dua, P. Forecaster diversity and the benefits of combining forecasts. Manag. Sci. 1995, 41, 68–75. [Google Scholar] [CrossRef]
  12. Harvey, D.; Leybourne, S.; Newbold, P. Tests for Forecast Encompassing. J. Bus. Econ. Stat. 1998, 16, 254–259. [Google Scholar]
  13. Stock, J.; Watson, M. Combination forecasts of output growth in a seven-country data set. J. Forecast. 2004, 23, 405–430. [Google Scholar] [CrossRef]
  14. Aiolfi, M.; Timmermann, A. Persistence in Forecasting Performance and Conditional Combination Strategies. J. Econom. 2006, 135, 31–53. [Google Scholar] [CrossRef]
  15. Hansen, B. Least Squares Forecast Averaging. J. Econom. 2008, 146, 342–350. [Google Scholar] [CrossRef]
  16. Capistrán, C.; Timmermann, A. Forecast Combination with Entry and Exit of Experts. J. Bus. Econ. Stat. 2009, 27, 429–440. [Google Scholar] [CrossRef]
  17. Clements, M.; Harvey, D. Combining Probability Forecasts. Int. J. Forecast. 2011, 27, 208–223. [Google Scholar] [CrossRef]
  18. Poncela, P.; Rodriguez, J.; Sánchez-Mangas, R.; Senra, E. Forecast combinations through dimension reduction techniques. Int. J. Forecast. 2011, 27, 224–237. [Google Scholar] [CrossRef]
  19. Kolassa, S. Combining exponential smoothing forecasts using Akaike weights. Int. J. Forecast. 2011, 27, 238–251. [Google Scholar] [CrossRef]
  20. Costantini, M.; Kunst, R. Combining forecasts based on multiple encompassing tests in a macroeconomic core system. J. Forecast. 2011, 30, 579–596. [Google Scholar] [CrossRef]
  21. Cheng, X.; Hansen, B. Forecasting with factor-augmented regression: A frequentist model averaging approach. J. Econom. 2015, 186, 280–293. [Google Scholar] [CrossRef]
  22. Wang, X.; Hyndman, R.; Li, F.; Kang, Y. Forecast combinations: An over 50-year review. Int. J. Forecast. 2022, in press. [Google Scholar] [CrossRef]
  23. Wright, J. Bayesian Model Averaging and exchange rate forecasts. J. Econom. 2008, 146, 329–341. [Google Scholar] [CrossRef]
  24. Chowell, G.; Luo, R.; Sun, K.; Roosa, K.; Tariq, A.; Viboud, C. Real-time forecasting of epidemic trajectories using computational dynamic ensembles. Epidemics 2020, 30, 100379. [Google Scholar] [CrossRef]
  25. Atiya, A. Why does forecast combination work so well? Int. J. Forecast. 2020, 36, 197–200. [Google Scholar] [CrossRef]
  26. Timmermann, A. Forecast Combinations. In Handbook of Economic Forecasting; Elliott, G., Granger, C., Timmermann, A., Eds.; Elsevier: Amsterdam, The Netherlands, 2006; pp. 135–194. [Google Scholar]
  27. Aiolfi, M.; Capistrán, C.; Timmermann, A. Forecast Combinations. In The Oxford Handbook of Economic Forecasting; Clements, M., Hendry, D., Eds.; OUP: New York, NY, USA, 2011. [Google Scholar]
  28. Mincer, J.; Zarnowitz, V. The Evaluation of Economic Forecasts. In Economic Forecasts and Expectations; Mincer, J., Ed.; National Bureau of Economic Research: New York, NY, USA, 1969. [Google Scholar]
  29. Patton, A.; Timmermann, A. Forecast Rationality Tests Based on Multi-Horizon Bounds. J. Bus. Econ. Stat. 2012, 30, 1–17. [Google Scholar] [CrossRef]
  30. White, H. A Reality Check for Data Snooping. Econometrica 2000, 68, 1097–1126. [Google Scholar] [CrossRef]
  31. Pincheira, P. Shrinkage Based Tests of Predictability. J. Forecast. 2013, 32, 307–332. [Google Scholar]
  32. Pincheira, P.; Hardy, N. Correlation Based Tests of Predictability; MPRA Paper 112014; University Library of Munich: München, Germany, 2022. [Google Scholar]
  33. Bermingham, C. How useful is core inflation for forecasting Headline inflation? Econ. Soc. Rev. 2007, 38, 355–377. [Google Scholar]
  34. Song, L. Do underlying measures of inflation outperform headline rates? Evidence from Australian data. Appl. Econ. 2005, 37, 339–345. [Google Scholar] [CrossRef]
  35. Pincheira, P.; Selaive, J.; Nolazco, J.L. Forecasting inflation in Latin America with core measures. Int. J. Forecast. 2019, 35, 1060–1071. [Google Scholar] [CrossRef]
  36. Ciccarelli, M.; Mojon, B. Global Inflation. Rev. Econ. Stat. 2010, 92, 524–535. [Google Scholar] [CrossRef]
  37. Pincheira, P. A power booster factor for out-of-sample tests of predictability. Economía 2022, 45, 150–183. [Google Scholar] [CrossRef]
  38. Medel, C.; Pedersen, M.; Pincheira, P. The Elusive Predictive Ability of Global Inflation. Int. Financ. 2016, 19, 120–146. [Google Scholar] [CrossRef]
  39. Hamilton, J.D. Why you should never use the Hodrick-Prescott filter. Rev. Econ. Stat. 2018, 100, 831–843. [Google Scholar] [CrossRef]
  40. Dritsaki, M.; Dritsaki, C. Comparison of HP Filter and the Hamilton’s Regression. Mathematics 2022, 10, 1237. [Google Scholar] [CrossRef]
  41. Jönsson, K. Cyclical dynamics and trend/cycle definitions: Comparing the HP and Hamilton filters. J. Bus. Cycle Res. 2020, 16, 151–162. [Google Scholar] [CrossRef]
  42. Jönsson, K. Real-time US GDP gap properties using Hamilton’s regression-based filter. Empir. Econ. 2020, 59, 307–314. [Google Scholar] [CrossRef]
  43. Ravn, M.O.; Uhlig, H. On adjusting the Hodrick-Prescott filter for the frequency of observations. Rev. Econ. Stat. 2002, 84, 371–376. [Google Scholar] [CrossRef]
  44. Niu, Z.; Wang, C.; Zhang, H. Forecasting stock market volatility with various geopolitical risks categories: New evidence from machine learning models. Int. Rev. Financ. Anal. 2023, 89, 102738. [Google Scholar] [CrossRef]
  45. Lv, W.; Qi, J. Stock market return predictability: A combination forecast perspective. Int. Rev. Financ. Anal. 2022, 84, 102376. [Google Scholar] [CrossRef]
  46. Safari, A.; Davallou, M. Oil price forecasting using a hybrid model. Energy 2018, 148, 49–58. [Google Scholar] [CrossRef]
47. Capek, J.; Cuaresma, J.C.; Hauzenberger, N. Macroeconomic forecasting in the euro area using predictive combinations of DSGE models. Int. J. Forecast. 2022, in press. [Google Scholar] [CrossRef]
Figure 1. Decision tree when two or more forecasts are available for the same target variable. Notes to Figure 1: When two or more forecasts are available for the same target variable, the decision maker has two options: choose one individual forecast from the available pool and act on it, or build a composite forecast using all the individual forecasts as ingredients. When building a composite forecast, the literature offers two major approaches. The first is to construct a weighted average of the forecasts, which includes the simple arithmetic average as a particular case; this strategy is known as the [2] approach. The second is to construct a new forecast as a linear combination of the available forecasts with unconstrained weights, which may therefore be negative or greater than one; this strategy is known as the [8] approach. For example, with two available forecasts, $f_1$ and $f_2$, the [2] approach proposes a third forecast of the form $\alpha f_1 + (1-\alpha) f_2$, whereas the [8] approach proposes a third forecast of the form $c + \beta f_1 + \gamma f_2$, where $\alpha$, $\beta$, and $\gamma$ are real numbers. The parameters $c$, $\beta$, and $\gamma$ are frequently estimated by ordinary least squares in a regression of the target variable on a constant and the two individual forecasts, $f_1$ and $f_2$.
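To make the two approaches in Figure 1 concrete, the following is a minimal sketch, assuming two hypothetical forecast series f1 and f2 and a target y, of how a [2]-style convex weight and [8]-style unconstrained weights could be estimated by least squares. The variable names, the simulated data, and the use of numpy are illustrative assumptions, not the authors' code.

```python
import numpy as np

# Illustrative data: y is the target, f1 and f2 are two competing forecasts.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
f1 = y + rng.normal(scale=0.8, size=200)   # hypothetical forecast 1
f2 = y + rng.normal(scale=1.0, size=200)   # hypothetical forecast 2

# [2]-style convex combination: pick alpha in [0, 1] minimizing the sample
# MSPE of alpha*f1 + (1 - alpha)*f2 over a grid.
alphas = np.linspace(0.0, 1.0, 1001)
mspe = [np.mean((y - (a * f1 + (1 - a) * f2)) ** 2) for a in alphas]
alpha_star = alphas[int(np.argmin(mspe))]

# [8]-style combination: unconstrained weights from an OLS regression of y
# on a constant, f1, and f2.
X = np.column_stack([np.ones_like(y), f1, f2])
c, beta, gamma = np.linalg.lstsq(X, y, rcond=None)[0]

print(f"convex weight alpha* = {alpha_star:.3f}")
print(f"unconstrained weights: c = {c:.3f}, beta = {beta:.3f}, gamma = {gamma:.3f}")
```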
Scheme 1. Univariate convex quadratic forms with a zero root.
Scheme 2. Auto-inefficiency of most forecast combinations.
Scheme 3. Auto-inefficiency of forecast combinations when individual forecasts are auto-efficient.
Scheme 4. The optimal combination may display auto-efficiency.
Scheme 5. MSPE of the forecast combination (black solid line; left axis) and auto-inefficiency of the forecast combination (red dotted line; right axis). Notes: The black solid line represents the MSPE of the forecast combination when the weight associated with model 2, (1 − λ*), takes the value given on the x-axis. The red dotted line represents the auto-inefficiency of the forecast combination, measured as the covariance between the forecast error and the forecast of the combination.
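To make the two curves in Scheme 5 explicit, the display below is a minimal sketch, in self-contained notation, of the standard expressions behind them: the MSPE of a convex combination as a function of the weight, its minimizer, and the auto-inefficiency of the combination measured, as in the notes above, as the covariance between the combined forecast error and the combined forecast. The symbols $e_1$, $e_2$ (individual forecast errors) and $f_c$ (combined forecast) are introduced here for illustration only.

```latex
% Quantities plotted in Scheme 5 (sketch): e_i = y - f_i is the error of
% forecast i, and f_c(\lambda) = \lambda f_1 + (1-\lambda) f_2 the combination.
\begin{align*}
e_c(\lambda) &= y - f_c(\lambda) = \lambda e_1 + (1-\lambda) e_2, \\
\mathrm{MSPE}(\lambda) &= \mathrm{E}\!\left[e_c(\lambda)^2\right]
  = \lambda^2 \mathrm{E}[e_1^2] + (1-\lambda)^2 \mathrm{E}[e_2^2]
    + 2\lambda(1-\lambda)\,\mathrm{E}[e_1 e_2], \\
\lambda^{*} &= \frac{\mathrm{E}[e_2^2] - \mathrm{E}[e_1 e_2]}
                    {\mathrm{E}[e_1^2] + \mathrm{E}[e_2^2] - 2\,\mathrm{E}[e_1 e_2]}
  \qquad \text{(weight minimizing the MSPE of the combination)}, \\
\mathrm{Ineff}(\lambda) &= \mathrm{Cov}\!\left(e_c(\lambda),\, f_c(\lambda)\right)
  \qquad \text{(auto-inefficiency of the combination)}.
\end{align*}
```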
Table 1. MSPE and inefficiency of the combined forecast when predicting headline inflation with a model with core and a model with international inflation.

|  | Austria | Belgium | Czech Republic | Denmark | Estonia | Germany | Greece | United Kingdom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| λ* | 0.255 | 0.281 | 0.443 | 0.164 | 0.330 | 0.321 | 0.307 | 0.310 |
| MSPE (International) | 0.094 | 0.229 | 0.364 | 0.106 | 0.872 | 0.173 | 0.384 | 0.085 |
| MSPE (Core) | 0.111 | 0.276 | 0.408 | 0.139 | 0.934 | 0.200 | 0.533 | 0.098 |
| MSPE Optimal Combination | 0.092 | 0.220 | 0.288 | 0.104 | 0.852 | 0.165 | 0.348 | 0.081 |
| Ratio MSPEopt/MSPEint | 0.976 | 0.963 | 0.793 | 0.987 | 0.977 | 0.954 | 0.906 | 0.960 |
| Inefficiency (Optimal Combination) | −0.145 ** | −0.633 *** | −0.476 * | −0.364 *** | −4.300 *** | −0.164 ** | −1.286 *** | −0.338 *** |
Notes: λ* is the weight that minimizes the MSPE of the combination $Y_t^c = \lambda^{*}\,\pi_t^{\mathrm{core\ forecast}} + (1-\lambda^{*})\,\pi_t^{\mathrm{Int\ forecast}}$. Ratio MSPEopt/MSPEint is the MSPE ratio between the optimal combination and the best individual forecast. * p < 10%, ** p < 5%, *** p < 1%.
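As a complement to the notes above, the sketch below illustrates, for a single country, how the quantities reported in Tables 1–3 could be computed from a realized series and two individual forecast series. The function name combination_report, the closed-form expression for λ*, and the plain (non-HAC) t-test on the demeaned cross-products are illustrative assumptions rather than the authors' exact procedure, which may rely on a different estimation sample and a robust inference method.

```python
import numpy as np
from scipy import stats

def combination_report(y, f_core, f_int):
    """Return lambda*, MSPEs, the MSPE ratio, and the covariance-based auto-inefficiency.

    y      : realized headline inflation (1-D array)
    f_core : forecast from the core-inflation model (illustrative name)
    f_int  : forecast from the international-inflation model (illustrative name)
    """
    e_core, e_int = y - f_core, y - f_int

    # MSPE-minimizing weight for lam*f_core + (1 - lam)*f_int (closed form).
    num = np.mean(e_int**2) - np.mean(e_core * e_int)
    den = np.mean(e_core**2) + np.mean(e_int**2) - 2 * np.mean(e_core * e_int)
    lam = num / den

    f_comb = lam * f_core + (1 - lam) * f_int
    e_comb = y - f_comb

    mspe_core, mspe_int = np.mean(e_core**2), np.mean(e_int**2)
    mspe_comb = np.mean(e_comb**2)

    # Auto-inefficiency: covariance between the combined error and the combined forecast.
    ineff = np.cov(e_comb, f_comb, ddof=1)[0, 1]

    # Simple (non-HAC) t-test that this covariance is zero, via demeaned cross-products.
    z = (e_comb - e_comb.mean()) * (f_comb - f_comb.mean())
    t_stat, p_val = stats.ttest_1samp(z, popmean=0.0)

    return {
        "lambda*": lam,
        "MSPE (Core)": mspe_core,
        "MSPE (International)": mspe_int,
        "MSPE Optimal Combination": mspe_comb,
        "Ratio MSPEopt/MSPE(best)": mspe_comb / min(mspe_core, mspe_int),
        "Inefficiency (Optimal Combination)": ineff,
        "p-value (inefficiency = 0)": p_val,
    }
```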
Table 2. MSPE and inefficiency of the combined forecast when predicting headline inflation with a model with output gap and a model with international inflation.

|  | Austria | Belgium | Czech Republic | Denmark | Estonia | Germany | Greece | United Kingdom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| λ* | 0.199 | 0.977 | 0.673 | 0.146 | 0.726 | 0.648 | 0.642 | 0.241 |
| MSPE (International) | 0.094 | 0.229 | 0.364 | 0.106 | 0.872 | 0.173 | 0.384 | 0.085 |
| MSPE (Output Gap) | 0.114 | 0.205 | 0.353 | 0.109 | 0.790 | 0.163 | 0.363 | 0.098 |
| MSPE Optimal Combination | 0.093 | 0.205 | 0.350 | 0.105 | 0.777 | 0.159 | 0.353 | 0.083 |
| Ratio MSPEopt/MSPE (Best Model) | 0.986 | 1.000 | 0.990 | 0.999 | 0.983 | 0.974 | 0.973 | 0.983 |
| Inefficiencies (Opt. Combination) | −0.160 ** | −0.441 *** | −1.815 *** | −0.366 *** | −2.893 ** | −0.059 | −1.193 ** | −0.349 *** |
** p < 5%, *** p < 1%.
Table 3. MSPE and inefficiency of the combined forecast when predicting headline inflation with a model with a different measure of output gap and a model with international inflation.

|  | Austria | Belgium | Czech Republic | Denmark | Estonia | Germany | Greece | United Kingdom |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| λ* | 0.139 | 0.752 | 0.621 | 0.130 | 0.853 | 0.643 | 0.585 | 0.093 |
| MSPE (International) | 0.094 | 0.229 | 0.364 | 0.106 | 0.872 | 0.173 | 0.384 | 0.085 |
| MSPE (Output Gap) | 0.118 | 0.214 | 0.355 | 0.111 | 0.769 | 0.163 | 0.371 | 0.109 |
| MSPE Optimal Combination | 0.093 | 0.213 | 0.350 | 0.105 | 0.766 | 0.159 | 0.356 | 0.084 |
| Ratio MSPEopt/MSPE (Best Model) | 0.993 | 0.992 | 0.986 | 0.999 | 0.996 | 0.973 | 0.962 | 0.997 |
| Inefficiencies (Opt. Combination) | −0.158 ** | −0.551 *** | −1.849 *** | −0.366 *** | −3.001 ** | −0.075 | −1.249 ** | −0.375 *** |
** p < 5%, *** p < 1%.