Methods of Constructing Asymptotically Efficient Estimates for Parameters of Stationary Time Series

1. Introduction
In applications of mathematical statistics to modern problems of data analysis in natural science and technology, it is often impossible to use the classical observation models in the form of a sequence of independent identically distributed random variables (the i.i.d. model). As a rule, the i.i.d. model does not provide sufficient accuracy of statistical inferences about the unknown parameters of the investigated physical processes distorted by noise, when both the process and the noise are stationary random processes.
Thus, it is important to generalize the classical results of the statistical theory of parameter estimation, developed for the i.i.d. model, in order to apply them to actual practical problems in the analysis of real physical processes.
In modern systems for analyzing physical wave fields, a large number of parameters are simultaneously measured, and many sensors are used to improve the accuracy of the analysis. That is, multidimensional time series are subjected to statistical processing, and vector parameters are estimated as a result of this processing.
For many statistical models of multivariate time series, it is impossible to synthesize statistically efficient estimates $\hat u_n$ of vector parameters $u \in \mathbb{R}^q$, that is, estimates for which the covariance matrices are minimal for any finite size $n$ of observations and are equal to the inverse Fisher information matrix:
$$\operatorname{cov}_u\{\hat u_n\} = I^{-1}_n(u), \qquad (1)$$
where $I_n(u) = \mathbf{E}_u\{\nabla_u \ln p_n(X^n; u)\,\nabla^T_u \ln p_n(X^n; u)\}$; $p_n(x^n; u)$ is the probability density of the observations $X^n = (x_1, \dots, x_n)$.
At the same time, asymptotically efficient (AE) estimates $u^*_n$ can be constructed for a wide class of multivariate time series with interdependent elements possessing a strong mixing property [1]. For AE-estimates, equality (1) is attained asymptotically as $n \to \infty$:
$$\lim_{n\to\infty}\operatorname{cov}_u\{\sqrt n\,(u^*_n - u)\} = I^{-1}(u),$$
where $I(u) = \lim_{n\to\infty} n^{-1} I_n(u)$. They can be found in the class $\mathcal{K}$ of regular estimates $\tilde u_n$ for which the random quantities $\sqrt n\,(\tilde u_n - u)$, $n = 1, 2, \dots$, have limit distributions with finite second moments. This statement is one of the results of the extensive asymptotic theory of statistical inference for random time series, which is most fully presented in [2]. Fundamental results in this theory were obtained in the known publications [3,4,5,6]. In these books, sufficient conditions were established under which AE-estimates exist for many probabilistic models of random time series and continuous processes.
The main condition under which the AE-estimates can be constructed is the local asymptotic normality (LAN) of the likelihood ratio of observations $X^n$ [3]. It means that the likelihood ratio of the observations admits the following asymptotic expansion:
$$\ln\frac{p_n(X^n; u + h/\sqrt n)}{p_n(X^n; u)} = h^T\,\Delta_n(X^n; u) - \frac{1}{2}\,h^T I(u)\,h + r_n(X^n; u, h), \qquad (2)$$
where $h \in \mathbb{R}^q$; $\Delta_n(X^n; u)$, $n = 1, 2, \dots$, is a family of statistics for which the probability distributions tend as $n \to \infty$ to the $q$-dimensional Gaussian distribution with the parameters $(0, I(u))$ uniformly in $u$; $r_n(X^n; u, h) \to 0$ ($n \to \infty$) in $P^n_u$-probability uniformly in $u$ and in $|h| \le H$, where $H > 0$ is any number.
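As a numerical sanity check of expansion (2), the following sketch (ours, not from the paper; the i.i.d. Gaussian location model is chosen only because its LAN remainder term vanishes identically) evaluates both sides of the expansion:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative model (an assumption, not from the paper): i.i.d. N(u, 1).
# For it the LAN expansion (2) holds exactly, with
# Delta_n(u) = n^{-1/2} sum_i (x_i - u) and Fisher information I(u) = 1.
u, h, n = 1.5, 0.7, 1000
x = rng.normal(u, 1.0, n)

def loglik(v):
    return -0.5 * np.sum((x - v) ** 2)   # log-density up to an additive constant

lr = loglik(u + h / np.sqrt(n)) - loglik(u)   # log likelihood ratio
delta = np.sum(x - u) / np.sqrt(n)            # Delta_n(X^n; u)
expansion = h * delta - 0.5 * h ** 2 * 1.0    # h * Delta_n - h^2 I(u) / 2

print(lr, expansion)  # identical here: the remainder r_n vanishes for this model
```

For richer time-series models the remainder $r_n$ is nonzero and only tends to zero in probability, which is exactly what the LAN condition requires.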
Many publications, for example, [
7,
8,
9,
10,
11,
12,
13,
14], have been devoted to proving the LAN property for various probabilistic models of time series other than the i.i.d. model. The results of research in this direction, obtained up to the end of the twentieth century, are summarized in the monograph [
2]. It was shown that the LAN property is inherent in a wide class of multidimensional time series and continuous random processes.
The formulation of the LAN condition (2) largely determined the further development and practical applications of the asymptotic estimation theory. In the well-known monograph [
6], it is shown that under the LAN condition, the maximum likelihood estimate belongs to the class $\mathcal{K}$ of regular statistical estimates and is an AE-estimate.
At the same time, using the decomposition (2) of the likelihood function of observations, new AE-estimates were constructed, which differ from the traditional maximum likelihood estimates and are computationally simpler. An elegant and, in many cases, the most computationally simple method for constructing AE-estimates was proposed in [3,4]. It is based on R. Fisher’s [15] idea of “improving” the quality of some “simple” estimate to the quality of an AE-estimate. In the mentioned publications, L. Le Cam showed that the AE-estimate can be obtained using the equation:
$$u^*_n = \bar u_n + \frac{1}{\sqrt n}\,I^{-1}(\bar u_n)\,\Delta_n(X^n; \bar u_n), \qquad (3)$$
where $\bar u_n$ is an arbitrary $\sqrt n$-consistent estimate of the parameter $u$, that is, an estimate for which the quantities $\sqrt n\,(\bar u_n - u)$, $n = 1, 2, \dots$, have the property: for any $\varepsilon > 0$ there is $L(\varepsilon) > 0$ such that $\sup_n P^n_u\{\sqrt n\,|\bar u_n - u| > L(\varepsilon)\} < \varepsilon$.
Note that Equation (3) defines a whole class of AE-estimates, the quality of which is asymptotically equivalent to the quality of the ML-estimate, since the statistic $\Delta_n(X^n; u)$ in the LAN expansion (2) and the $\sqrt n$-consistent estimate $\bar u_n$ are not unique functions. For this reason, in many practically important cases, formula (3) allows one to obtain AE-estimates, which are computationally much simpler than ML-estimates.
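Le Cam's one-step construction (3) is easy to illustrate numerically. The sketch below is our illustration, not from the paper: the Gaussian AR(1) model, the closed-form score and Fisher information, and the crude half-sample autocorrelation pilot are all assumptions chosen for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (an assumption): Gaussian AR(1),
# x_t = u * x_{t-1} + e_t, e_t ~ N(0, 1), |u| < 1.
u_true, n = 0.6, 20000
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0]
for t in range(1, n):
    x[t] = u_true * x[t - 1] + e[t]

def score(u, x):
    # Conditional log-likelihood score, normalized by sqrt(n):
    # Delta_n(u) = n^{-1/2} * sum_t (x_t - u x_{t-1}) x_{t-1}
    m = len(x)
    return np.sum((x[1:] - u * x[:-1]) * x[:-1]) / np.sqrt(m)

def fisher(u):
    # Per-sample Fisher information of the stationary AR(1): E[x^2] = 1/(1-u^2)
    return 1.0 / (1.0 - u ** 2)

# A crude sqrt(n)-consistent pilot estimate: lag-1 autocorrelation on a half-sample
u_bar = np.corrcoef(x[: n // 2 - 1], x[1 : n // 2])[0, 1]

# Le Cam's one-step improvement, Equation (3):
# u* = u_bar + n^{-1/2} I^{-1}(u_bar) Delta_n(u_bar)
u_star = u_bar + score(u_bar, x) / (fisher(u_bar) * np.sqrt(n))
print(u_bar, u_star)  # both close to 0.6; u_star is the improved estimate
```

The point of (3) is that a single correction step applied to any $\sqrt n$-consistent pilot already attains the asymptotic quality of the ML-estimate.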
2. Construction of M-Estimates for Parameters of Stationary Time Series with Suitable Asymptotic Properties
The AE-estimates have some disadvantages from the point of view of practical applications. First, they can be synthesized only if the probability density of the observations is fully known. In practice, some important details of this density are often not fully defined; only a certain class is known to which this density belongs. Second, the quality of AE-estimates is often sensitive to deviations of the actual density from the assumed one for which they were synthesized. Even a small deviation from the expected density can lead to a significant loss in the accuracy of the AE-estimate.
In the publications [16,17], methods were developed for constructing estimates that are robust to changes in the distribution of observations, and in many applications, such robust estimates are preferable to AE-estimates. A robust estimate $\hat u_n$ is constructed by finding the global maximum of a certain objective function $Q_n(X^n; u)$ (a criterion of estimation quality), which differs from the likelihood function:
$$\hat u_n = \arg\max_{u \in U} Q_n(X^n; u). \qquad (4)$$
In addition to robust estimates, estimates synthesized using Equation (4) arise in other problems of mathematical statistics. Examples include Bayesian estimation problems, estimation problems with interfering (nuisance) parameters, and problems arising in the analysis of natural and economic dynamical systems.
The estimates obtained as the maxima of some objective functions $Q_n(X^n; u)$ were called “M-estimates”. Apart from the books [16,17], they were considered in many other publications, for example, in [18,19]. In most of these publications, the M-estimates were constructed and analyzed for the i.i.d. model of random observations.
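A concrete instance of an M-estimate in the i.i.d. setting can be sketched as follows (our illustration, not from the paper; the contaminated-normal data and Huber's $\rho$ with tuning constant $k = 1.345$ are assumptions chosen for concreteness):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data (an assumption): location u = 2 with 5% heavy-tailed outliers.
x = np.concatenate([rng.normal(2.0, 1.0, 950),
                    rng.normal(2.0, 10.0, 50)])

k = 1.345  # standard Huber tuning constant

def psi(t):
    # psi = rho', the bounded influence function of Huber's objective
    return np.clip(t, -k, k)

def m_estimate(x, tol=1e-10):
    # The maximizer of Q_n(u) = -sum_i rho(x_i - u) solves
    # sum_i psi(x_i - u) = 0; the sum is decreasing in u, so
    # bisection finds the root.
    lo, hi = x.min(), x.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.sum(psi(x - mid)) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u_hat = m_estimate(x)
print(u_hat)  # close to 2.0 despite the outliers
```

The bounded $\psi$ is what makes the estimate insensitive to the outlying observations, in contrast to the sample mean.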
The authors are not aware of publications in which the asymptotic properties of M-estimates were studied with a sufficient level of mathematical rigor for multidimensional stationary random time series that have a strong mixing property. The authors are also unaware of publications devoted to the construction of computationally simple estimates that are asymptotically equivalent in quality to M-estimates.
In this paper, we consider an approach to solving these problems from the standpoint of the asymptotic theory of statistical inference [2], which is based on Le Cam’s concept of local asymptotic normality.
We suppose that the random objective function $Q_n(X^n; u)$ is twice differentiable in $P^n_u$-probability with respect to the components of the vector $u = (u_1, \dots, u_q)$; that is, there exist the following family of vector statistics $\nabla Q_n(X^n; u)$ and matrix function $W_n(X^n; u)$:
$$\nabla Q_n(X^n; u) = \Bigl(\frac{\partial Q_n}{\partial u_1}, \dots, \frac{\partial Q_n}{\partial u_q}\Bigr)^T, \qquad W_n(X^n; u) = -\frac{1}{n}\,\Bigl\|\frac{\partial^2 Q_n}{\partial u_i\,\partial u_j}\Bigr\|^q_{i,j=1}. \qquad (5)$$
In this case, the M-estimate (4) is one of the roots of the following equation system with respect to the parameter $u$:
$$\nabla Q_n(X^n; u) = 0. \qquad (6)$$
In this paper, we show how to find the estimate $u^*_n$, which is a root of the equation system (6), and, at the same time, it is a $\sqrt n$-consistent estimate of the parameter $u$. It is proved in Theorem 1 that under certain restrictions, such an estimate $u^*_n$ can be found using the algorithm
$$u^*_n = \bar u_n + \frac{1}{\sqrt n}\,W^{-1}(\bar u_n)\,\Delta_n(X^n; \bar u_n), \qquad (7)$$
where $\Delta_n(X^n; u) = \frac{1}{\sqrt n}\,\nabla Q_n(X^n; u)$; $W(u)$ is the limit in $P^n_u$-probability of the matrix functions $W_n(X^n; u)$ from (5); $\bar u_n$ is any $\sqrt n$-consistent estimate of the parameter $u$.
Conditions are formulated in Theorem 1 on the family of statistics $\Delta_n(X^n; u)$ and the sequence of the matrix functions $W_n(X^n; u)$ that are sufficient for the asymptotic normality of the estimate (7): the distributions of $\sqrt n\,(u^*_n - u)$ tend to $N(0, S(u))$, where the asymptotic covariance matrix $S(u)$ is equal to
$$S(u) = W^{-1}(u)\,B(u)\,W^{-1}(u). \qquad (8)$$
The corollary of Theorem 1 describes a method for constructing another estimate that has the same asymptotic distribution as the estimate (7) but does not require an auxiliary $\sqrt n$-consistent estimate $\bar u_n$.
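The one-step algorithm (7) and the sandwich matrix (8) can be sketched numerically for a scalar case (our illustration, not from the paper; the Huber location model, the contamination level, and the median pilot are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data (an assumption): true location u = 0, 5% outliers.
x = np.concatenate([rng.normal(0.0, 1.0, 4750),
                    rng.normal(0.0, 10.0, 250)])
n, k = len(x), 1.345

psi = lambda t: np.clip(t, -k, k)            # psi = rho' (bounded influence)
dpsi = lambda t: (np.abs(t) <= k) * 1.0      # psi'

u_bar = np.median(x)                         # sqrt(n)-consistent pilot estimate

# Delta_n(u) = n^{-1/2} sum_i psi(x_i - u),  W_n(u) = n^{-1} sum_i psi'(x_i - u)
delta = np.sum(psi(x - u_bar)) / np.sqrt(n)
W = np.mean(dpsi(x - u_bar))

# One-step estimate, Equation (7): u* = u_bar + n^{-1/2} W^{-1} Delta_n(u_bar)
u_star = u_bar + delta / (W * np.sqrt(n))

# Sandwich variance, Equation (8): S = W^{-1} B W^{-1}, with B the mean of psi^2
B = np.mean(psi(x - u_star) ** 2)
S = B / W ** 2
print(u_star, np.sqrt(S / n))  # the estimate and its asymptotic standard error
```

In the vector case $W$ and $B$ become the $q \times q$ matrices of (5) and condition B1, and the scalar division is replaced by matrix inversion.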
Note that the statements of Theorem 1 and the corollary were formulated earlier in [
20]. In our paper, the above statements are proved under more general assumptions, and simpler proofs are given.
Theorem 1. A. There exists a $\sqrt n$-consistent estimate $\bar u_n$ of the parameter $u$.
B. Let the family of statistics $\Delta_n(X^n; u)$, $n = 1, 2, \dots$, and the sequence of positive definite symmetric $q \times q$-matrix functions $W_n(X^n; u)$ satisfy the following constraints:
B1. For each value of the parameter $u$, the sequence of statistics $\Delta_n(X^n; u)$ is asymptotically normal with zero mean and the covariance matrix $B(u)$:
$$\lim_{n\to\infty}\mathcal{L}\{\Delta_n(X^n; u)\} = N(0, B(u)),$$
where $B(u) = \lim_{n\to\infty}\operatorname{cov}_u\{\Delta_n(X^n; u)\}$.
B2. For each value of the parameter $u$, the following asymptotic expansion of the statistic $\Delta_n(X^n; u + h/\sqrt n)$ holds:
$$\Delta_n(X^n; u + h/\sqrt n) = \Delta_n(X^n; u) - W(u)\,h + \gamma_n(X^n; u, h),$$
where $\gamma_n(X^n; u, h) \to 0$ in $P^n_u$-probability as $n \to \infty$ for any $|h| \le H$, $H > 0$; $W(u)$ is a continuous function of $u$. Then the following statement is true:
(T1) For any $\sqrt n$-consistent estimate $\bar u_n$ of the parameter $u$, the statistic
$$u^*_n = \bar u_n + \frac{1}{\sqrt n}\,W^{-1}(\bar u_n)\,\Delta_n(X^n; \bar u_n)$$
is the $\sqrt n$-consistent and asymptotically normal estimate of the parameter $u$ with the moments $(0, S(u))$, where $S(u)$ = $W^{-1}(u)\,B(u)\,W^{-1}(u)$.
Corollary 1. (a) Let, for any $n$, a statistic $\tilde u_n$ be the root of the equation $\Delta_n(X^n; u) = 0$ with respect to the parameter $u$ with probability equal to 1.
(b) Let the statistic $\tilde u_n$ also be a $\sqrt n$-consistent estimate of the parameter $u$. Then the statistic $\sqrt n\,(\tilde u_n - u)$ is asymptotically normal with the moments $(0, S(u))$.
Remark 1. (a) The statement similar to Statement (T1) of Theorem 1 was proved in [3,4] in the case when the objective function $Q_n(X^n; u)$ is the likelihood function of observations having the LAN property (2). In this case, $\Delta_n(X^n; u)$ is the statistic from the expansion (2), the matrix function $W(u) = I(u)$ and $B(u) = I(u)$, where $I(u)$ is the Fisher matrix. It follows from Theorem 1 that in this case
$$S(u) = I^{-1}(u)\,I(u)\,I^{-1}(u) = I^{-1}(u).$$
Consequently, the statistic $\sqrt n\,(u^*_n - u)$ is asymptotically normal with the parameters $(0, I^{-1}(u))$, and hence, $u^*_n$ is the asymptotically efficient estimate of the parameter $u$. (b) It follows from the corollary of Theorem 1 that a statistic $\tilde u_n$, which has the property $\Delta_n(X^n; \tilde u_n) = 0$ with probability equal to one, and at the same time is a $\sqrt n$-consistent estimate of the parameter $u$, is asymptotically normal with the moments $(0, I^{-1}(u))$. Consequently, the statistic $\tilde u_n$ is the asymptotically efficient estimate of the parameter $u$.
Thus, Theorem 1 is, in some sense, an extension of Le Cam’s results to the case of an arbitrary objective function whose gradient satisfies conditions B1, B2 of Theorem 1.
3. Proof of Theorem 1
In the course of proving Theorem 1, we will omit, if it is obvious, the dependence of functional quantities on the observations and sometimes denote their dependence on the parameter u by a subscript.
In these notations, the definition (7) of the estimate $u^*_n$ can be written as
$$u^*_n = \bar u_n + \frac{1}{\sqrt n}\,W^{-1}_{\bar u_n}\,\Delta_n(\bar u_n).$$
Then we can write the following chain of equalities:
$$\sqrt n\,(u^*_n - u) = \sqrt n\,(\bar u_n - u) + W^{-1}_{\bar u_n}\,\Delta_n(\bar u_n) = \bar h_n + W^{-1}_{\bar u_n}\,\Delta_n(u + \bar h_n/\sqrt n), \qquad (9)$$
where $\bar h_n$ = $\sqrt n\,(\bar u_n - u)$. It follows from (9):
$$W_{\bar u_n}\bigl(\sqrt n\,(u^*_n - u) - \bar h_n\bigr) = \Delta_n(u + \bar h_n/\sqrt n). \qquad (10)$$
By denoting $\bar\Delta_n$ = $\Delta_n(u + \bar h_n/\sqrt n)$, we obtain from (10):
$$\bar\Delta_n = W_{\bar u_n}\bigl(\sqrt n\,(u^*_n - u) - \bar h_n\bigr), \qquad (11)$$
where the random quantities $\bar h_n$, $n = 1, 2, \dots$, have the property: for any $\varepsilon > 0$ there is $L(\varepsilon) > 0$ such that $\sup_n P^n_u\{|\bar h_n| > L(\varepsilon)\} < \varepsilon$.
At the same time, from condition B2 of Theorem 1, we obtain:
$$\bar\Delta_n = \Delta_n(u) - W_u\,\bar h_n + \bar\gamma_n, \qquad (12)$$
where $\bar\gamma_n = \gamma_n(u, \bar h_n)$.
Comparing Equations (11) and (12) allows us to prove the following Lemma.
Lemma 1. Under the conditions of Theorem 1, the following convergences take place for any $u$:
(a) $\bar\gamma_n \to 0$ in $P^n_u$-probability as $n \to \infty$; (b) $\sqrt n\,(u^*_n - u) - W^{-1}_u\,\Delta_n(u) \to 0$ in $P^n_u$-probability as $n \to \infty$.
The following statement will be needed below.
Lemma 2. Let some random variables $\zeta_n$ and $\eta_n$ have the properties:
(a) the distributions of $\eta_n$ converge weakly as $n \to \infty$ to a limit distribution $F$; (b) $\zeta_n - \eta_n \to 0$ in probability as $n \to \infty$.
Then the distributions of $\zeta_n$ converge weakly to $F$.
The proof of Lemma 2 is quite simple, and we omit it.
Taking into account Equations (9)–(12) and the statements of Lemmas 1 and 2, we can write the following equalities:
$$\lim_{n\to\infty}\mathcal{L}\{\sqrt n\,(u^*_n - u)\} = \lim_{n\to\infty}\mathcal{L}\{W^{-1}_u\,\Delta_n(u)\},$$
where the existence of the limits follows from conditions B1, B2 of Theorem 1. According to condition B1 of Theorem 1, we have:
$$\lim_{n\to\infty}\mathcal{L}\{\Delta_n(u)\} = N(0, B(u)),$$
where $B(u) = \lim_{n\to\infty}\operatorname{cov}_u\{\Delta_n(u)\}$. Therefore:
$$\lim_{n\to\infty}\mathcal{L}\{\sqrt n\,(u^*_n - u)\} = N(0, S(u)),$$
where $S(u) = W^{-1}_u\,B(u)\,W^{-1}_u$. □
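The conclusion of Theorem 1 can also be checked by simulation. The following sketch (ours, not part of the proof; the scalar Huber location model, the tuning constant, and the median pilot are assumptions) repeats the one-step construction (7) and compares the empirical variance of $\sqrt n\,(u^*_n - u)$ with the sandwich variance $S = W^{-1} B\, W^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of Theorem 1: sqrt(n)(u*_n - u) ~ N(0, S) approximately,
# with S = W^{-1} B W^{-1}, here for i.i.d. N(0, 1) data and Huber's psi.
n, reps, k = 500, 2000, 1.345
psi = lambda t: np.clip(t, -k, k)

z = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)             # true parameter u = 0
    u_bar = np.median(x)                    # sqrt(n)-consistent pilot estimate
    W_n = np.mean(np.abs(x - u_bar) <= k)   # empirical W: mean of psi'
    u_star = u_bar + np.sum(psi(x - u_bar)) / (W_n * n)   # Equation (7)
    z[r] = np.sqrt(n) * u_star

# Plug-in sandwich variance S0 = B / W^2 from a large reference sample
e = rng.normal(0.0, 1.0, 200000)
W0 = np.mean(np.abs(e) <= k)
B0 = np.mean(psi(e) ** 2)
S0 = B0 / W0 ** 2

print(np.var(z), S0)  # empirical and theoretical variances should be close
```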
4. Proof of Lemma 1
(a) For any $\varepsilon > 0$, $q > 0$ and $L > 0$, we can write the following equation:
$$P^n_u\{|\bar\gamma_n| > q\} = P^n_u\{|\bar\gamma_n| > q,\ |\bar h_n| \le L\} + P^n_u\{|\bar\gamma_n| > q,\ |\bar h_n| > L\}. \qquad (13)$$
Let $P^n_u\{A \mid B\}$ denote the conditional probability of the event $A$ under the condition of the event $B$. Then (13) can be rewritten as:
$$P^n_u\{|\bar\gamma_n| > q\} = P^n_u\{|\bar\gamma_n| > q \mid |\bar h_n| \le L\}\,P^n_u\{|\bar h_n| \le L\} + P^n_u\{|\bar\gamma_n| > q \mid |\bar h_n| > L\}\,P^n_u\{|\bar h_n| > L\}. \qquad (14)$$
According to (11), there is $L(\varepsilon)$ such that $P^n_u\{|\bar h_n| > L(\varepsilon)\} < \varepsilon/2$ for any $n$. It follows then from (14) that for any $n$ and $q > 0$:
$$P^n_u\{|\bar\gamma_n| > q\} \le P^n_u\{|\bar\gamma_n| > q \mid |\bar h_n| \le L(\varepsilon)\} + \varepsilon/2. \qquad (15)$$
According to (12), $\bar\gamma_n = \gamma_n(u, \bar h_n)$, and by condition B2 of Theorem 1, $\gamma_n(u, h) \to 0$ in $P^n_u$-probability uniformly in $|h| \le L(\varepsilon)$; therefore, for any $q > 0$:
$$\lim_{n\to\infty} P^n_u\{|\bar\gamma_n| > q \mid |\bar h_n| \le L(\varepsilon)\} = 0. \qquad (16)$$
It follows from (15), (16) that $\limsup_{n\to\infty} P^n_u\{|\bar\gamma_n| > q\} \le \varepsilon/2$ for any $\varepsilon > 0$.
(b) Since, by (11) and (12),
$$\sqrt n\,(u^*_n - u) - W^{-1}_u\,\Delta_n(u) = (W^{-1}_{\bar u_n} - W^{-1}_u)\,\Delta_n(u) + (E_q - W^{-1}_{\bar u_n} W_u)\,\bar h_n + W^{-1}_{\bar u_n}\,\bar\gamma_n,$$
where $E_q$ is the $q \times q$ identity matrix, to prove statement (b) of Lemma 1, it suffices to check that each term on the right-hand side tends to zero in $P^n_u$-probability. The quantity $\Delta_n(u)$ is bounded in probability by condition B1 of Theorem 1, and $\bar h_n$ is bounded in probability by (11). Since $W_u$ satisfies condition B2 of Theorem 1, it is a continuous function of $u$, and hence, by the consistency of $\bar u_n$, for any $\varepsilon > 0$ there exists $n(\varepsilon)$ such that for all $n > n(\varepsilon)$ the following inequality holds: $P^n_u\{\|W^{-1}_{\bar u_n} - W^{-1}_u\| > \varepsilon\} < \varepsilon$. So, we can conclude that the first two terms tend to zero in $P^n_u$-probability. Since $\bar\gamma_n$ satisfies statement (a) of Lemma 1, the third term also tends to zero in $P^n_u$-probability, which proves statement (b). □