1. Introduction
Minimum trace reconciliation, developed by [1], is an innovation in the literature on forecast reconciliation. The tool enables a systematic approach to forecasting with linear constraints, which encompasses a wide range of applications, including electricity demand forecasting [2] and macroframework forecasting [3,4].
The proof of [1], however, has a gap and is not easy to extend to more general situations. Their proof attempts to solve a minimization problem by replacing the objective function with its lower bound. Although they find the solution that minimizes the lower bound, this minimizer is not shown to coincide with the solution to the original problem, which creates a gap in the proof.
This paper provides an alternative proof and argues that it is not only simpler but can also be extended to incorporate more general results in the literature. The proof is more direct in the sense that it solves the first-order condition in the space of non-square matrices. An almost identical proof can be used to prove Theorem 3.3 of [5], which shows that minimum trace reconciliation and minimum weighted trace reconciliation lead to an identical formula. By selecting a special weight in the weighted trace reconciliation problem, we can also see why the lower bound minimization in [1] reaches the same formula. Thus, the alternative proof not only has pedagogical value but also connects results in the literature from a unified perspective.
The paper is organized into six sections. In Section 2, we provide the setup of the problem. In Section 3, we briefly illustrate the proof of [1]. In Section 4, we provide an alternative proof of the result in [1]. Section 5 extends the proof to incorporate [5] and discusses the insights. In Section 6, we conclude.
2. Setup
The setup and notation follow [1]. Let $y_t$ and $b_t$ be $n$- and $m$-dimensional vectors of random variables, where $m < n$. The two vectors are constrained linearly by
$$ y_t = S b_t, $$
where $S$ is an $n \times m$ matrix, and its last $m$ rows are the identity matrix $I_m$; thus, $S = \begin{pmatrix} A \\ I_m \end{pmatrix}$ is of full column rank for any matrix $A$. Intuitively, $b_t$ represents the most disaggregated level, and $y_t$ includes $b_t$ itself and aggregates of its subcomponents as specified by $A$, although mathematically, $A$ can include negative elements. In any case, the realization of $y_t$ is linearly dependent and belongs to the column space of $S$, as $y_t = S b_t$.
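As a minimal numerical sketch of this setup (the two-level hierarchy and numbers below are illustrative, not from the paper), the constraint $y_t = S b_t$ with $S = \begin{pmatrix} A \\ I_m \end{pmatrix}$ can be written as:

```python
import numpy as np

# Illustrative two-level hierarchy (not from the paper): bottom level b = (b1, b2)'
# and one aggregate, so y = (b1 + b2, b1, b2)'.
A = np.array([[1.0, 1.0]])           # aggregation row(s)
S = np.vstack([A, np.eye(2)])        # last m rows of S are the identity I_m

b = np.array([3.0, 4.0])             # a realization of the bottom level
y = S @ b                            # the linear constraint y = S b

assert np.linalg.matrix_rank(S) == 2           # S has full column rank
assert np.allclose(y, [7.0, 3.0, 4.0])         # first entry aggregates the rest
```

Any realization of $y_t$ built this way automatically lies in the column space of $S$.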
Suppose that an $h$-step ahead forecast based on the information up to time $T$, denoted by $\hat{y}_{T+h}$ and called the "base" forecast, is given. The base forecast $\hat{y}_{T+h}$ is assumed to be an unbiased estimator of $E_T[y_{T+h}]$, where $E_T[\cdot]$ is the expectation conditional on the information up to time $T$. But an issue is that $\hat{y}_{T+h}$ may not belong to the column space of $S$, which motivates forecast reconciliation.
A reconciled forecast $\tilde{y}_{T+h}$, given an $m \times n$ matrix $G$, is a linear transformation of $\hat{y}_{T+h}$ such that
$$ \tilde{y}_{T+h} = S G \hat{y}_{T+h}. $$
The role of $G$ is to map the base forecast $\hat{y}_{T+h}$ into the most disaggregated level. The reconciled forecast $\tilde{y}_{T+h}$ is assumed to be unbiased and, thus, satisfies
$$ S G S = S \iff G S = I_m. $$
Note that the necessity of the last equivalence follows from multiplying $(S'S)^{-1}S'$ from the left of both sides: $S'S$ is a full-rank square matrix as $S$ is a full-column-rank matrix, so $S'S$ is invertible. The sufficiency follows from multiplying $S$ from the left of both sides.
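This equivalence between $SGS = S$ and $GS = I_m$ can be probed numerically; a minimal sketch (the dimensions and the feasible-$G$ construction are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): n = 3, m = 2.
S = np.vstack([np.array([[1.0, 1.0]]), np.eye(2)])

# Build a matrix satisfying G S = I_m by correcting an arbitrary G0 on the
# column space of S (an illustrative construction):
G0 = rng.normal(size=(2, 3))
G = G0 + (np.eye(2) - G0 @ S) @ np.linalg.inv(S.T @ S) @ S.T

assert np.allclose(G @ S, np.eye(2))     # G S = I_m ...
assert np.allclose(S @ G @ S, S)         # ... implies S G S = S (sufficiency)

# Necessity: from S G S = S, multiplying (S'S)^{-1} S' from the left recovers G S = I_m.
left_inv = np.linalg.inv(S.T @ S) @ S.T
assert np.allclose(left_inv @ S, np.eye(2))
assert np.allclose(left_inv @ (S @ G @ S), G @ S)
```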
The forecast error of the reconciled forecast can be expressed as
$$ \operatorname{Var}_T\!\left(y_{T+h} - \tilde{y}_{T+h}\right) = S G W_h G' S', $$
where $W_h = \operatorname{Var}_T\!\left(y_{T+h} - \hat{y}_{T+h}\right)$ is the covariance matrix of the $h$-step ahead base forecast error and is assumed to be invertible (i.e., excluding the case of zero forecast error and the case of a degenerate matrix $A$ for aggregation). The equality follows because
$$ y_{T+h} - \tilde{y}_{T+h} = S G \left(y_{T+h} - \hat{y}_{T+h}\right), $$
noting that $S G y_{T+h} = S G S b_{T+h} = S b_{T+h} = y_{T+h}$ holds. Ref. [1] attempted to prove that the matrix $G$ that minimizes the trace of the covariance matrix subject to the unbiasedness constraint is
$$ G = \left(S' W_h^{-1} S\right)^{-1} S' W_h^{-1}. $$
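The claimed minimizer $G = (S' W_h^{-1} S)^{-1} S' W_h^{-1}$ can be checked numerically; a minimal sketch, assuming an illustrative hierarchy and a made-up positive-definite $W_h$ (feasible competitors are parameterized as $G^* + D$ with $DS = 0$):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative instance (not from the paper): n = 3, m = 2.
S = np.vstack([np.array([[1.0, 1.0]]), np.eye(2)])
M = rng.normal(size=(3, 3))
W_h = M @ M.T + 3.0 * np.eye(3)          # an assumed positive-definite W_h

Wi = np.linalg.inv(W_h)
G_star = np.linalg.inv(S.T @ Wi @ S) @ S.T @ Wi    # the claimed minimizer

def trace_obj(G):
    return np.trace(S @ G @ W_h @ G.T @ S.T)

assert np.allclose(G_star @ S, np.eye(2))          # unbiasedness constraint holds

# Every feasible matrix is G* + D with D S = 0; no such perturbation
# should lower the objective (the problem is convex in G).
P_perp = np.eye(3) - S @ np.linalg.inv(S.T @ S) @ S.T
for _ in range(100):
    D = rng.normal(size=(2, 3)) @ P_perp           # enforce D S = 0
    assert trace_obj(G_star + D) >= trace_obj(G_star) - 1e-9
```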
3. Gap in Proof of [1]
The proof of [1] can be divided into two steps. First, they show in their online Appendix A2 that the objective function can be bounded from below. Second, they solve the minimization problem in which the objective function is replaced by this lower bound. The proof ends here, and thus, one still needs to show that the minimizers of the two problems () and () coincide. This presents a gap in the proof.
This gap is non-trivial because minimizing a function is not generally the same as minimizing its lower bound. That is, a function $f$ being bounded from below by another function $g$ does not guarantee that their minimizers coincide. For a counterexample where the minimizers do not coincide, consider, for instance, $f(x) = (x-1)^2 + 1$, which is bounded by $g(x) = x^2/2$ from below, as follows:
$$ f(x) - g(x) = \tfrac{1}{2}\left(x - 2\right)^2 \ge 0, $$
but the minimizers do not coincide, as follows:
$$ \operatorname*{arg\,min}_x f(x) = 1 \neq 0 = \operatorname*{arg\,min}_x g(x). $$
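A concrete instance of this phenomenon (the particular functions below are an illustrative choice) can be verified directly:

```python
# f is bounded from below by g, yet the two minimizers differ.
f = lambda x: (x - 1.0) ** 2 + 1.0       # minimized at x = 1
g = lambda x: 0.5 * x ** 2               # minimized at x = 0

xs = [i / 100.0 for i in range(-500, 501)]
# f(x) - g(x) = 0.5 * (x - 2)^2 >= 0, so f >= g everywhere on the grid:
assert all(f(x) >= g(x) for x in xs)
assert min(xs, key=f) == 1.0             # argmin of f
assert min(xs, key=g) == 0.0             # argmin of g
```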
In the case of () and (), however, the minimizers do coincide, as explained in Proposition 2 of Section 5.
5. An Extension of the Alternative Proof
The proof can be applied to the environment of weighted trace minimization as in Theorem 3.3 of [5]. To motivate the extension, suppose we have the base forecast of the variables in the GDP expenditure approach and want to reconcile it to satisfy
$$ Y = C + I + G + NX, $$
where $Y$ is GDP, $C$ is consumption, $I$ is investment, $G$ is government expenditure, and $NX$ is net export. The minimum trace reconciliation minimizes the variance of forecast error with equal weights, as follows:
$$ \min_{G}\ \operatorname{tr}\left(S G W_h G' S'\right), $$
subject to the constraint (). Since the forecast of GDP often attracts more attention than the others, a natural question is whether it is possible to improve the forecast of some variables at the expense of other variables by adjusting the weights.
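The GDP identity is exactly a constraint of the form $y_t = S b_t$; a minimal numerical sketch (the forecast numbers and covariance below are made up for illustration) shows that reconciliation restores the identity that the base forecast violates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative GDP example: y = (Y, C, I, G, NX)' with Y = C + I + G + NX;
# the bottom level is b = (C, I, G, NX)'.
S = np.vstack([np.ones((1, 4)), np.eye(4)])        # 5 x 4 summing matrix

y_hat = np.array([103.0, 60.0, 20.0, 18.0, 4.0])   # base forecast: components sum to 102, not 103

M = rng.normal(size=(5, 5))
W_h = M @ M.T + 5.0 * np.eye(5)                    # an assumed base-error covariance
Wi = np.linalg.inv(W_h)
G_mat = np.linalg.inv(S.T @ Wi @ S) @ S.T @ Wi     # minimum trace reconciliation matrix

y_tilde = S @ G_mat @ y_hat                        # reconciled forecast

assert not np.isclose(y_hat[0], y_hat[1:].sum())   # base forecast violates the identity
assert np.isclose(y_tilde[0], y_tilde[1:].sum())   # reconciled forecast satisfies it exactly
```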
Such a specification can be expressed as a weighted trace, as follows:
$$ \min_{G}\ \operatorname{tr}\left(W S G W_h G' S'\right), $$
where $W$ is an $n \times n$ matrix with its $(i,j)$ element equal to $w_{ij}$. When the weight matrix $W$ is a diagonal matrix, the objective function is a weighted sum of the variances of forecast errors. Note that the constraint in () is the same as that in (), so the same unbiasedness assumption is still imposed. The same unbiasedness assumption is also reflected in the objective function as in ().
As [5] showed and [1] proved in their unpublished manuscript, the optimal matrix $G$ is independent of $W$ as long as $W$ is symmetric and invertible. Therefore, in practice, one does not need to exercise judgment or estimate how much weight to put on which variable.
Proposition 1. For any symmetric and invertible matrix $W$, the solution to (25) is
$$ G = \left(S' W_h^{-1} S\right)^{-1} S' W_h^{-1}. $$
Proof. The proof is almost identical to that in Section 4. Let the Lagrangian be
$$ \mathcal{L}(G, \Lambda) = \operatorname{tr}\left(W S G W_h G' S'\right) - 2 \operatorname{tr}\left(\Lambda' \left(G S - I_m\right)\right). $$
Following the same logic as Section 4, the first-order condition leads to
$$ \operatorname{tr}\left(\left(S' W S G W_h - \Lambda S'\right) X'\right) = 0. $$
Since this has to hold for all $X$,
$$ S' W S G W_h = \Lambda S', \quad \text{or, transposing,} \quad W_h G' S' W S = S \Lambda'. $$
Multiplying $S' W_h^{-1}$ on both sides from the left and using $G S = I_m$ gives the following:
$$ S' W S = S' W_h^{-1} S \Lambda', \quad \text{i.e.,} \quad \Lambda' = \left(S' W_h^{-1} S\right)^{-1} S' W S, $$
so that $W_h G' S' W S = S \left(S' W_h^{-1} S\right)^{-1} S' W S$. The formula follows because $S' W S$ is a full-rank square matrix and, thus, invertible.
□
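The weight-irrelevance result can be probed numerically; a minimal sketch, using made-up positive-definite weights (a subset of the symmetric invertible weights covered by the proposition) and feasible perturbations $G^* + D$ with $DS = 0$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative check (n = 3, m = 2; matrices made up) that the minimizer of the
# weighted trace does not depend on the weight W.
S = np.vstack([np.array([[1.0, 1.0]]), np.eye(2)])
M = rng.normal(size=(3, 3))
W_h = M @ M.T + 3.0 * np.eye(3)
Wi = np.linalg.inv(W_h)
G_star = np.linalg.inv(S.T @ Wi @ S) @ S.T @ Wi    # the formula, independent of W

def weighted_obj(G, W):
    return np.trace(W @ S @ G @ W_h @ G.T @ S.T)

P_perp = np.eye(3) - S @ np.linalg.inv(S.T @ S) @ S.T
for _ in range(20):
    B = rng.normal(size=(3, 3))
    W = B @ B.T + 3.0 * np.eye(3)                  # a random positive-definite weight
    for _ in range(20):
        D = rng.normal(size=(2, 3)) @ P_perp       # feasible direction: D S = 0
        assert weighted_obj(G_star + D, W) >= weighted_obj(G_star, W) - 1e-9
```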
Intuitively, the fact that the weight matrix does not matter can be interpreted as saying that there is no trade-off between variables: the choice matrix $G$ has enough degrees of freedom in mixing the base forecasts that the variance of each variable's forecast error can be minimized variable by variable, without affecting the variances of the other variables' forecast errors.
Mathematically, the proof shares an almost identical structure with that in Section 4, which is the special case of $W = I_n$. Since a symmetric invertible matrix can be factorized as $W = P' P$ from Takagi's factorization, the objective function can be written as
$$ \operatorname{tr}\left(P S G W_h G' S' P'\right) $$
for any full-rank square matrix $P$. In fact, since the proof only requires $S' W S$ to be invertible, one can extend $P$ to be a non-square matrix and show that the objective function of () is a special case of that of ().
Proposition 2. There exists a weight $W$ such that the objective function of (11) equals that of (25).
Proof. Let $P = \left(S' S\right)^{-1} S'$ and $W = P' P$. Then $P S = I_m$, and the objective function of (25) collapses to that of (11):
$$ \operatorname{tr}\left(W S G W_h G' S'\right) = \operatorname{tr}\left(P S G W_h G' S' P'\right) = \operatorname{tr}\left(G W_h G'\right). $$
□
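The collapse of the weighted objective can be checked numerically; a minimal sketch, where the factor $P = (S'S)^{-1}S'$ (chosen so that $PS = I_m$) is an assumption of this sketch rather than a quotation from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative check: with P = (S'S)^{-1} S' (so P S = I_m) and W = P'P, the
# weighted trace equals the trace of the bottom-level error covariance G W_h G'.
S = np.vstack([np.array([[1.0, 1.0]]), np.eye(2)])
P = np.linalg.inv(S.T @ S) @ S.T
W = P.T @ P                                  # singular as a weight, but S'WS = I_m

G = rng.normal(size=(2, 3))
M = rng.normal(size=(3, 3))
W_h = M @ M.T + 3.0 * np.eye(3)

lhs = np.trace(W @ S @ G @ W_h @ G.T @ S.T)  # weighted trace objective
rhs = np.trace(G @ W_h @ G.T)                # summed bottom-level error variance

assert np.allclose(P @ S, np.eye(2))
assert np.allclose(S.T @ W @ S, np.eye(2))
assert np.allclose(lhs, rhs)
```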
Note that since $S' W S = I_m$ is invertible, the proof of Proposition 1 can be applied to show (). This is one way to see why the proof of [1] reaches the same formula. One insight from the right side of () is that it represents the summed variance of the forecast errors of the most disaggregated variables. Thus, minimizing the summed variance of all variables is equivalent to minimizing the summed variance of the most disaggregated variables.
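This equivalence can also be checked numerically; a minimal sketch (illustrative matrices, feasible competitors $G^* + D$ with $DS = 0$) showing that minimizing only the bottom-level error variance is attained by the same formula:

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative check (n = 3, m = 2): minimizing only the bottom-level error
# variance tr(G W_h G') subject to G S = I_m is attained by the same
# G* = (S' W_h^{-1} S)^{-1} S' W_h^{-1}.
S = np.vstack([np.array([[1.0, 1.0]]), np.eye(2)])
M = rng.normal(size=(3, 3))
W_h = M @ M.T + 3.0 * np.eye(3)
Wi = np.linalg.inv(W_h)
G_star = np.linalg.inv(S.T @ Wi @ S) @ S.T @ Wi

def bottom_obj(G):
    return np.trace(G @ W_h @ G.T)

P_perp = np.eye(3) - S @ np.linalg.inv(S.T @ S) @ S.T
for _ in range(100):
    D = rng.normal(size=(2, 3)) @ P_perp     # feasible direction: D S = 0
    assert bottom_obj(G_star + D) >= bottom_obj(G_star) - 1e-9
```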
In summary, the extension to allow a general weight highlights two observations. First, the irrelevance of weight implies that the objective function, being the trace of the forecast error covariance matrix, is not essential, although () is called minimum trace reconciliation in the literature. What is essential is the unbiasedness assumption, and thus, it could alternatively be called an optimal unbiased reconciliation. Second, the irrelevance of weight suggests that () reconciles the base forecast as if the forecast error variance of each variable can be minimized independently, but at the same time, () can be obtained by minimizing the variance of only the bottom-level variables. The extension suggests that these two apparently contradictory interpretations can coexist.