#### *5.1. Optimal Infeasible Indicator-Based F-Test*

The optimal infeasible F-test with a known location shift in the marginal process is computable in simulations. Table 7 reports the rejections of invariance, which are always high for large shifts but fall as departures from weak exogeneity decrease. Empirical rejection frequencies approximate the maximum achievable power for this type of test. The correct step indicator is almost always significant in the conditional model for location shifts larger than 2.5√*σ*<sub>22</sub>, even for relatively small values of (*ρ* − *β*).

**Table 7.** Power of the optimal infeasible F-test for a failure of invariance using a known step indicator for *α*<sup>2</sup> = 0.01 at *T*<sup>1</sup> = 80, *T* = 100, *M* = 1000.


#### *5.2. Potency of the SIS-Based Test*

Table 8 records the Stage 1 gauge and potency at different levels of location shift (*d*) and departures from weak (and hence super) exogeneity via (*ρ* − *β*). The procedure is slightly over-gauged at Stage 1 for small shifts, when its potency is also low, and both gauge and potency are correctly unaffected by the magnitude of (*ρ* − *β*), whereas Stage 1 potency rises rapidly with *d*.
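Gauge and potency as used in Table 8 are retention rates averaged over the *M* replications: the gauge is the average retention frequency of irrelevant indicators, the potency that of relevant ones. A minimal sketch of that computation (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def gauge_and_potency(retained, relevant):
    """Gauge: average retention rate of irrelevant indicators;
    potency: average retention rate of relevant ones, across M replications.

    retained: (M, n) boolean array, True where indicator j was kept in replication m.
    relevant: (n,) boolean array, True for indicators that enter the DGP.
    """
    retained = np.asarray(retained, dtype=bool)
    relevant = np.asarray(relevant, dtype=bool)
    gauge = retained[:, ~relevant].mean() if (~relevant).any() else 0.0
    potency = retained[:, relevant].mean() if relevant.any() else 0.0
    return gauge, potency

# Toy illustration: 2 replications, 4 candidate indicators, only the last relevant.
kept = np.array([[False, True, False, True],
                 [False, False, False, True]])
truth = np.array([False, False, False, True])
g, p = gauge_and_potency(kept, truth)
# g = 1/6 (one of six irrelevant slots retained), p = 1.0
```

A well-calibrated selection procedure has a gauge close to the nominal significance level *α*<sup>1</sup>, which is what Table 8 checks.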

**Table 8.** Stage 1 gauge and potency at *α*<sup>1</sup> = 0.01 for *T*<sup>1</sup> = 80, *T* = 100, *M* = 1000 and *β* = 2.


Table 9 records Stage 2 potency for the three values of *α*<sup>1</sup>. It shows that for a failure of invariance, even when *ρ* − *β* = 0.25, test potency can increase at tighter Stage 1 significance levels, probably by reducing the retention rate of irrelevant step indicators. Comparing the central panel with the matching experiments in Table 7, there is remarkably little loss of rejection frequency from selecting indicators by SIS at Stage 1, rather than knowing them, except at the smallest values of *d*.

**Table 9.** Stage 2 potency for a failure of invariance at *α*<sup>2</sup> = 0.01, *T*<sup>1</sup> = 80, *T* = 100, and *M* = 1000.


#### **6. Application to the Small Artificial-Data Policy Model**

To simulate a case of invariance failure from a policy change, which could be checked by FInv(*τ*=**0**) *in-sample*, followed by forecast failure, we splice the two scenarios from Figure 2 sequentially, taking the 100 observations in panels (III+IV) followed by those used for panels (I+II), creating a sample of *T* = 200.

Next, we estimate (5) with SIS, retaining the policy variable, and test the significance of the selected step indicators in (6). At Stage 1, using *α*<sup>1</sup> = 0.001 because the model is mis-specified, estimation uses *T* = 189 observations, keeping the last 11 for the forecast period, and two indicators are selected. Testing these in (6) yields FInv(*τ*=**0**)(2, 186) = 13.68∗∗, strongly rejecting invariance of the parameters of the model for *yt* to shifts in the model of *xt*.
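Stage 2 amounts to an F-test of the joint significance of the SIS-selected step indicators when they are added to the conditional model. A minimal sketch on synthetic data (all names and numbers are illustrative; under the null of invariance the statistic follows an F(*q*, *T* − *k*) distribution):

```python
import numpy as np

def f_test_added_regressors(y, X, S):
    """F-test for joint significance of the step indicators S added to the
    conditional model y = X b + S g + e (Stage 2 of the invariance test)."""
    T = len(y)
    XS = np.column_stack([X, S])
    # Residual sums of squares for the restricted (X only) and unrestricted models.
    rss_r = np.sum((y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2)
    rss_u = np.sum((y - XS @ np.linalg.lstsq(XS, y, rcond=None)[0]) ** 2)
    q = S.shape[1]            # number of step indicators under test
    df = T - XS.shape[1]      # residual degrees of freedom
    F = ((rss_r - rss_u) / q) / (rss_u / df)
    return F, (q, df)

# Toy conditional model (intercept and one regressor) with two candidate
# step indicators carried over from a marginal-model SIS stage.
rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=T)
X = np.column_stack([np.ones(T), x])
S = np.column_stack([(np.arange(T) >= 80).astype(float),
                     (np.arange(T) >= 90).astype(float)])
F, (q, df) = f_test_added_regressors(y, X, S)
```

Here invariance holds by construction, so F should be unremarkable; a value like the 13.68∗∗ in the text would reject it decisively.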

Figure 3 reports the outcome graphically, where the ellipses show the period with the earlier break in the DGP without a location shift but with the policy change. Panel (I) shows the time series for *xt* with the fitted and forecast values, denoted *<sup>x</sup>*3*t*, from estimating the agency's model with SIS which delivered the indicators for testing invariance. Panel (II) shows the outcome for *yt* after adding the selected indicators denoted SIS*se* from the marginal model for *xt*, which had earlier rejected invariance. Panel (III) reports the outcome for a setting where the invariance failure led to an improved model, which here coincides with the in-sample DGP. This greatly reduces the later forecast errors and forediction failures. Finally, Panel (IV) augments the estimated in-sample DGP equation (with all its regressors retained) by selecting using SIS at *α* = 0.01. This further reduces forecast failure, although constancy can still be rejected from the unanticipated location shift. If the invariance rejection had led to the development of an improved model, better forecasts, and hopefully improved foredictions and policy decisions, would have resulted. When the policy model is not known publicly (as with MPC decisions), the agency alone can conduct these tests. However, an approximate test based on applying SIS to an adequate sequence of published forecast errors could highlight potential problems.

**Figure 3.** (**I**) Forecast failure for *xt* by *<sup>x</sup>*3*<sup>t</sup>* even with SIS; (**II**) Forecast failure in *<sup>y</sup>*3*<sup>t</sup>* even augmented by the SIS indicators selected from the margin model for *xt*; (**III**) Smaller forecast failure from *<sup>y</sup>t* based on the in-sample DGP; (**IV**) Least forecast failure from *<sup>y</sup>t* based on the in-sample DGP with SIS.

#### *6.1. Multiplicative Indicator Saturation*

In the preceding example, the invariance failure was detectable by SIS because the policy change created a location shift by increasing *zt* by *δ*. A zero-mean shift in a policy-relevant derivative would not be detected by SIS, but could be by multiplicative indicator saturation (MIS), proposed in Ericsson (2012). MIS interacts step indicators with variables as in *dj*,*<sup>t</sup>* = *zt*1{*j*≤*t*}, so *dj*,*<sup>t</sup>* = *zt* when *j* ≤ *t* and is zero otherwise. Kitov and Tabor (2015) have investigated its performance in detecting changes in parameters in zero-mean settings by extensive simulations. Despite the very high dimensionality of the resulting parameter space, they find MIS has a gauge close to the nominal significance level for suitably tight *α*, and has potency to detect such parameter changes. As policy failure will occur after a policy-relevant parameter shifts, advance warning thereof would be invaluable. Even though the above illustration detected a failure of invariance, it did not necessarily entail that policy-relevant parameters had changed. We now apply MIS to the *T* = 200 artificial-data example in Section 6 to ascertain whether such a change could be detected, focusing on potential shifts in the coefficient of *zt*−<sup>1</sup> in (6).
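The saturating set of MIS regressors *dj*,*<sup>t</sup>* = *zt*1{*j*≤*t*} can be built directly from the definition above (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def mis_regressors(z):
    """Multiplicative indicator saturation: d_{j,t} = z_t * 1{j <= t},
    i.e. the variable z interacted with every possible step indicator.
    Returns a (T, T) array whose row j is the series d_{j, .}."""
    T = len(z)
    # Element (j, t) is 1 when j <= t, so each row is a step that switches on at j.
    steps = (np.arange(T)[None, :] >= np.arange(T)[:, None]).astype(float)
    return steps * np.asarray(z)[None, :]

z = np.array([1.0, 2.0, 3.0])
D = mis_regressors(z)
# D[0] = [1., 2., 3.],  D[1] = [0., 2., 3.],  D[2] = [0., 0., 3.]
```

With *T* candidate interactions per variable, the candidate set exceeds the sample size, which is why selection must be run at a tight *α*, as in the application below.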

Selecting at *α* = 0.005, as there are more than 200 candidate variables, yielded:

$$y\_t = -1.12\, z\_{t-1}\mathbf{1}\_{\{t \le 90\}} + 1.17\, z\_{t-1}\mathbf{1}\_{\{t \le 100\}} + 6.01 - 0.857\, z\_{t-1} \tag{46}$$

with *σ* = 0.60. Thus, the in-sample shift of −1 in *δλ*1*θ*<sup>1</sup> is found over *t* = 90, ... , 100, warning of a lack of invariance in the key policy parameter from the earlier policy change, although that break is barely visible in the data, as shown by the ellipse in Figure 3 (II). To understand how MIS is able to detect the parameter change, consider knowing where the shift occurred and splitting your data at that point. Then you would be startled if fitting your correctly specified model separately to the different subsamples did not deliver the appropriate estimates of their DGP parameters. Choosing the split by MIS will add variability, but the correct indicator, or one close to it, should accomplish the same task.
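The split-sample intuition can be sketched directly: with the break point known, fitting the slope separately to each subsample recovers the two regimes, and MIS in effect searches over break points to do the same job. A minimal illustration (data and coefficient values are invented, loosely mirroring the magnitudes in (46)):

```python
import numpy as np

rng = np.random.default_rng(1)
T, brk = 200, 100
z = rng.normal(size=T)
# Slope shifts at the (here known) break point, as a policy change might cause.
beta = np.where(np.arange(T) < brk, -1.12, -0.857)
y = beta * z + rng.normal(scale=0.1, size=T)

def slope(y, z):
    """OLS slope through the origin for a single zero-mean regressor."""
    return float(z @ y / (z @ z))

b_pre, b_post = slope(y[:brk], z[:brk]), slope(y[brk:], z[brk:])
# b_pre is close to -1.12 and b_post to -0.857: each subsample recovers
# its own DGP parameter, which is what MIS approximates without knowing brk.
```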

#### **7. Forecast Error Taxonomy and Associated Tests**

Table 10 relates the taxonomy in Table 1 to the sources of the forecast errors from (16) to illustrate which indicator-saturation test could be used, where an ordering such as (SIS, IIS) indicates their likely relative potency.


**Table 10.** The taxonomy of systematic forecast failures with associated tests.

When the source of forecast failure is the equilibrium mean or forecast origin mis-estimation, then SIS is most likely to detect the systematically signed forecast errors, whereas for other unobserved terms IIS is generally best equipped to detect these changes. When the slope parameter is the source of failure for *δ* ≠ 0, then SIS is generally best, whereas when *δ* = 0, IIS might help. In practice, policy invalidity and forediction failure are probably associated with *i(c)* and *ii(c)*, where both SIS and IIS tests for super exogeneity are valid. In this setting, policy failure can also be triggered through *ii(a)* and *ii(b)*, which makes an SIS test for super exogeneity again attractive. Absent a policy intervention, zero-mean changes result in *ii(c)*, so may best be detected using multiplicative indicator saturation.

#### **8. How to Improve Future Forecasts and Foredictions**

Scenarios above treated direct forecasts and those derived from foredictions as being essentially the same. When a major forecast error occurs, the agency can use a robust forecasting device, such as an intercept correction (IC) or differencing the forecasts, to set them 'back on track' for the next period. Although sometimes deemed 'ad hoc', Clements and Hendry (1998) show the formal basis for their success in improving forecasts. However, the foredictions that led to the wrong policy implementation cannot be fixed so easily, even if the agency's next narrative alters its story. In our example, further increases in *δ* will induce greater forecast failure if the policy model is unchanged: viable *policy* requires invariance of the model to the policy change.

Nevertheless, there is a partial 'fix' to the forecast failure and policy invalidity. If the lack of invariance is itself invariant, so policy shifts change the model's parameters in the same way each time as in (14), the shift associated with a past policy change can be added as an IC to a forecast based on a later policy shift. We denote such an IC by SIS*se IC*, which has the advantage that it can be implemented before experiencing forecast failure. This is shown in Figure 4 (I) and (II), focusing on the last 50 periods, where the policy change coincides with the location shift at observation 191. The first panel records the forecasts for a model of *yt* which includes the SIS*se* indicator for the period of the first forecast failure, and also includes SIS*se IC* as an imposed IC from *T* = 191. Had a location shift not also occurred, SIS*se IC* would have corrected the forecast for the lack of invariance, and could have been included in the policy analysis and the associated foredictions.
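A conventional one-observation IC can be sketched as shifting all subsequent forecasts by the forecast error observed at the origin, which is what puts the model 'back on track' (a minimal illustration; names and numbers are invented):

```python
import numpy as np

def ic_forecast(y_obs, y_fcast_origin, model_forecasts):
    """Intercept-corrected forecasts: add the forecast error observed at
    the origin to each later model forecast, absorbing a location shift."""
    ic = y_obs - y_fcast_origin          # one-observation estimate of the shift
    return np.asarray(model_forecasts) + ic

# The model under-predicts by 2 at the origin after a location shift of that size:
corrected = ic_forecast(y_obs=10.0, y_fcast_origin=8.0,
                        model_forecasts=[8.1, 8.3, 8.2])
# corrected -> [10.1, 10.3, 10.2]
```

Estimating the IC from a single forecast error makes it noisy, which is why averaging over two periods, as the footnote to this section notes, can work better.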

**Figure 4.** (**I**) Forecasts for *yt* by *<sup>y</sup>*3<sup>∗</sup> *<sup>t</sup>* , just using SIS*se* for the first policy-induced shift and SIS*se IC* at observation 191; (**II**) Forecasts for *yt* by *<sup>y</sup>*3<sup>∗</sup> *<sup>t</sup>* also with SIS in-sample; (**III**) Forecasts from *T* = 192 for *yt* by *<sup>y</sup>*3*i*,*<sup>t</sup>* with a 1-observation IC but without SIS; (**IV**) Forecasts for *yt* by *<sup>y</sup>*3*i*,*<sup>t</sup>* also with SIS in-sample.

Figure 4 also shows how effective a conventional IC is in the present context after the shift has occurred, using a forecast denoted by *<sup>y</sup>*3*i*,*t*. The IC is a step indicator with a value of unity from observation *t* = 191 onwards when the forecast origin is *T* = 192, so the forecast error one observation later is used to estimate the location shift. Compared to the massive forecast failure seen for the models of *yt* in Figure 3 (I) & (II), neither of the sets of forecast errors in Figure 4 (III) & (IV) fails a constancy test (FChow(10, 188) = 0.34 and FChow(10, 185) = 0.48). The IC alone corrects most of the forecast failure, but as (IV) shows, SIS improves the in-sample tracking by correcting the earlier location-shift-induced failure and improves the accuracy of the resulting forecasts.6

In real-time forecasting, these two steps could be combined, using SIS*se IC* as the policy is implemented, followed by an IC one-period later when the location shift materialises, although a further policy change is more than likely in that event. Here, the mis-specified econometric models of the relationships between the variables are unchanged, but their forecasts are very different: successful forecasts do not imply correct models.

#### **9. Conclusions**

We considered two potential implications of forecast failure in a policy context, namely forediction failure and policy invalidity. Although an empirical forecasting model cannot necessarily be rejected following forecast failure, when the forecasts derived from the narratives of a policy agency are very close to the model's forecasts, as Ericsson (2016) showed was true for the FOMC minutes, then forecast failure entails forediction failure. Consequently, the associated narrative and any policy decisions based thereon also both fail. A taxonomy of the sources of forecast errors showed what could be inferred from forecast failure, and was illustrated by a small artificial-data policy model.

A test for invariance and the validity of policy analysis was proposed by selecting shifts in all marginal processes using step-indicator saturation and checking their significance in the conditional model. The test was able to detect failures of invariance when weak exogeneity failed and the marginal processes changed from a location shift. Compared to the nearest corresponding experiment in Hendry and Santos (2010), the potency of FInv is considerably higher for SIS at *α*<sup>2</sup> = 0.01 than IIS at *α*<sup>2</sup> = 0.025 (both at *α*<sup>1</sup> = 0.025) as shown in Figure 5.

**Figure 5.** Comparison of the potency of SIS with IIS.

A test rejection outcome by FInv indicates a dependence between the conditional model parameters and those of the marginals, warning about potential mistakes from using the conditional model to predict the outcomes of policy changes that alter the marginal processes by location shifts, which is a common policy scenario. Combining these two features of forecast failure with non-invariance allows forediction failure and policy invalidity to be established when they occur. Conversely, learning that the policy model is not invariant to policy changes could lead to improved models, and we also showed a 'fix' that could help mitigate forecast failure and policy invalidity.

While all the derivations and Monte Carlo experiments here have been for 1-step forecasts from static regression equations, a single location shift and a single policy change, the general nature of the test makes it applicable when there are multiple breaks in several marginal processes, perhaps at different times. Generalizations to dynamic equations, to conditional systems, and to other non-stationary settings, probably leading to more approximate null rejection frequencies, are the focus of our present research.

<sup>6</sup> As noted above, the lagged impact of the policy change causes *<sup>x</sup>*<sup>191</sup> to overshoot, so *<sup>x</sup>*3*i*,*<sup>t</sup>* is somewhat above *xt* over the forecast horizon, albeit a dramatic improvement over Figure 3 (I): using 2 periods to estimate the IC solves that.

**Acknowledgments:** This research was supported in part by grants from the Robertson Foundation (award 9907422), Institute for New Economic Thinking (grant 20029822) and Statistics Norway (through Research Council of Norway Grant 236935). We are indebted to Jurgen A. Doornik, Neil R. Ericsson, Bent Nielsen, Ragnar Nymoen, Felix Pretis, the guest editors Rocco Mosconi and Paolo Paruolo, and two anonymous referees for helpful comments on an earlier version. It is a great pleasure to participate in a Special Issue of *Econometrics* in honour of Søren Johansen and Katarina Juselius who have both made major contributions to understanding the theory and practice of analysing non-stationary time series, and have been invaluable long-time co-authors. All calculations and graphs used *OxMetrics* Doornik (2013) and *PcGive* Doornik and Hendry (2013), which implements *Autometrics*.

**Author Contributions:** All authors contributed equally to this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


Cartwright, Nancy. 1989. *Nature's Capacities and their Measurement*. Oxford: Clarendon Press.


Stenner, Alfred J. 1964. On Predicting our Future. *Journal of Philosophy* 16: 415–28.

Zhang, Kun, Jiji Zhang, and Bernhard Schölkopf. 2015. Distinguishing Cause from Effect Based on Exogeneity. Available online: http://arxiv.org/abs/1504.05651 (accessed on 10 October 2015).

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
