Article

Wavelet Thresholding Risk Estimate for the Model with Random Samples and Correlated Noise

Oleg Shestakov 1,2
1 Faculty of Computational Mathematics and Cybernetics, M. V. Lomonosov Moscow State University, Moscow 119991, Russia
2 Institute of Informatics Problems, Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Moscow 119333, Russia
Mathematics 2020, 8(3), 377; https://doi.org/10.3390/math8030377
Submission received: 13 February 2020 / Revised: 2 March 2020 / Accepted: 5 March 2020 / Published: 8 March 2020
(This article belongs to the Special Issue Stability Problems for Stochastic Models: Theory and Applications)

Abstract

Signal de-noising methods based on threshold processing of wavelet decomposition coefficients have become popular due to their simplicity, speed, and ability to adapt to signal functions with spatially inhomogeneous smoothness. The analysis of the errors of these methods is an important practical task, since it makes it possible to evaluate the quality of both the methods and the equipment used for processing. Sometimes the nature of the signal is such that its samples are recorded at random times. If the sample points form a variational series based on a sample from the uniform distribution on the data registration interval, then the use of the standard threshold processing procedure is adequate. This paper considers a model of a signal that is registered at random times and contains noise with long-term dependence. The asymptotic normality and strong consistency of the mean-square thresholding risk estimator are proved. The obtained results make it possible to construct asymptotic confidence intervals for the threshold processing errors using only the observed data.

1. Introduction

In digital signal processing tasks, it is often assumed that the recorded signal samples are independent. However, many physical processes exhibit long-term dependence, in which correlations between observations decay rather slowly. For example, long-term dependence is often observed in geophysical processes, where it takes the form of long periods of large or small observation values. Interference in communication channels exhibits similar behavior. Wavelet methods are widely used in the analysis and processing of signals recorded during the study of such processes.
The wavelet decomposition of a function f(x) is the series
$$ f(x) = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle \, \psi_{j,k}(x), $$
where $\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k)$ and $\psi(x)$ is a wavelet function. The indices j and k are called the scale and the shift, respectively. This decomposition provides a time-scale representation of the signal function that allows one to localise its features. Many wavelet functions with various properties exist.
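For readers who want to experiment with the decomposition above, the following sketch evaluates $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ for the Haar wavelet (chosen here purely for simplicity; the paper itself relies on Meyer wavelets) and checks orthonormality numerically. All function names are ours.

```python
import numpy as np

def haar_psi(x):
    """Haar mother wavelet: 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x < 0.5), 1.0,
                    np.where((x >= 0.5) & (x < 1.0), -1.0, 0.0))

def psi_jk(x, j, k):
    """Dilated and shifted wavelet psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_psi(2.0 ** j * np.asarray(x) - k)

# Numerical check of orthonormality on a fine grid over [0, 1)
x = np.linspace(0.0, 1.0, 2 ** 16, endpoint=False)
dx = x[1] - x[0]
ip_same = np.sum(psi_jk(x, 2, 1) * psi_jk(x, 2, 1)) * dx  # should be ~1
ip_diff = np.sum(psi_jk(x, 2, 1) * psi_jk(x, 3, 5)) * dx  # should be ~0
```

The supports of $\psi_{2,1}$ and $\psi_{3,5}$ are disjoint, so the second inner product vanishes exactly; for overlapping pairs the cancellation is produced by the sign pattern of the wavelet.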
In practice, a discrete wavelet transform is used: the sampled signal vector is multiplied by an orthogonal matrix defined by the wavelet function $\psi(x)$ (implemented in practice with a fast cascade algorithm [1]). This transform is applied to the data, and threshold processing of the resulting wavelet coefficients is performed [1]. For a model of signal samples on an equispaced grid, these methods were thoroughly studied by D. Donoho, I. Johnstone, B. Silverman and others [2,3,4,5,6,7,8,9,10]. Statistical properties of the mean-square risk estimator have also been studied: it has been shown that under certain conditions it is strongly consistent and asymptotically normal [11,12,13].
In some experiments it is not possible to record signal samples at regular intervals [14]. Sometimes samples are registered at random times. It was shown by T. Cai and L. Brown [15] that, if the sample points form a variational series based on a sample from the uniform distribution on the data registration interval, then the rate of the mean-square thresholding risk remains, up to a logarithmic factor, equal to the optimal rate in the class of Lipschitz-regular functions. A special case of the uniform distribution appears, for example, when considering a Poisson process, since the conditional distribution of its points on a given time interval, given their number, is uniform. Such models can arise, for example, in astronomy when considering stellar intensity. In this paper, it is proven that under some regularity conditions, the statistical properties of the risk estimator also remain the same for both equispaced and random sample grids.

2. Long-Term Dependence

Let the signal function f(x) be defined on the segment [0, 1] and be uniformly Lipschitz regular with some exponent γ > 0 and Lipschitz constant L > 0: $f \in \mathrm{Lip}(\gamma, L)$. Assume that the samples of f(x) contain additive correlated noise and are recorded at random times that are independent and uniformly distributed on [0, 1]. Namely, consider the following data model:
$$ Y_i = f(x_{(i)}) + e_i, \quad i = 1, \ldots, N \quad (N = 2^J), \qquad (1) $$
where $0 \le x_{(1)} < \cdots < x_{(N)} \le 1$ is the variational series (the set of order statistics) based on a sample from the uniform distribution on the segment [0, 1], and $\{e_i, i \in \mathbb{Z}\}$ is a stationary Gaussian process with the covariance sequence $r_k = \mathrm{cov}(e_i, e_{i+k})$. We assume that the $e_i$ have zero mean and unit variance. We also assume that the noise autocovariance function decreases at the rate $r_k \sim k^{-\alpha}$, where $0 < \alpha < 1$. This corresponds to long-term dependence between the observations [7].
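Model (1) is straightforward to simulate. The sketch below is illustrative only: it picks the particular covariance sequence $r_k = (1+k)^{-\alpha}$, which decays at the required rate and is positive definite (it is convex and decreasing, so Pólya's criterion applies), and samples the noise through a Cholesky factor of the resulting Toeplitz covariance matrix. The test signal and all names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def lrd_gaussian_noise(n, alpha, rng):
    """Stationary Gaussian noise with covariance r_k = (1 + k)^(-alpha).

    This illustrative covariance obeys r_k ~ k^(-alpha); being convex and
    decreasing, it is positive definite, so the Cholesky factor exists.
    Dense Cholesky sampling is fine for moderate n."""
    k = np.arange(n)
    r = (1.0 + k) ** (-alpha)
    cov = r[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    L = np.linalg.cholesky(cov)
    return L @ rng.standard_normal(n)

# Model (1): values of f at the order statistics of a uniform sample
J = 10
N = 2 ** J
f = lambda x: np.sin(2.0 * np.pi * x)  # any Lipschitz test signal
x_sorted = np.sort(rng.uniform(0.0, 1.0, N))
Y = f(x_sorted) + lrd_gaussian_noise(N, alpha=0.7, rng=rng)
```

Note that $r_0 = 1$, matching the unit-variance assumption of the model.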
The observations consist of the pairs $(x_{(1)}, Y_1), \ldots, (x_{(N)}, Y_N)$, where the distances between the sample points are, in general, not equal. It is known that $\mathbb{E}\, x_{(i)} = i/(N+1)$ (see Lemma 2 in [15]). Along with (1), consider a sample with equal distances between the sample points:
$$ \left( \frac{1}{N+1}, Z_1 \right), \ldots, \left( \frac{N}{N+1}, Z_N \right), \qquad (2) $$
where
$$ Z_i = f\!\left( \frac{i}{N+1} \right) + e_i, \quad i = 1, \ldots, N. $$
For the sample (2), threshold processing methods have been developed that effectively suppress the noise and provide an “almost” optimal rate of the mean-square risk [7,8]. The discrete wavelet transform with Meyer wavelets is applied to the sample (2) to obtain the set of empirical wavelet coefficients [1]
$$ W_{j,k} = \mu_{j,k} + 2^{\frac{(1-\alpha)(J-j)}{2}} \xi_{j,k}, \quad j = 0, \ldots, J-1, \; k = 0, \ldots, 2^j - 1, $$
where $\mu_{j,k}$ are the discrete wavelet coefficients of the sample
$$ f\!\left( \frac{1}{N+1} \right), \ldots, f\!\left( \frac{N}{N+1} \right), $$
and the noise coefficients $\xi_{j,k}$ have the standard normal distribution but are not independent. The variances of $W_{j,k}$ have the form $\sigma_j^2 = 2^{(1-\alpha)(J-j)}$ [12]. To suppress the noise, the coefficients $W_{j,k}$ are processed with the hard thresholding function $\rho_H(y, T) = y\,\mathbf{1}(|y| > T)$ or the soft thresholding function $\rho_S(y, T) = \mathrm{sgn}(y)\,(|y| - T)_+$ with some threshold T, and the estimates $\hat{W}_{j,k}$ are obtained. After that, the inverse wavelet transform is performed. The idea of threshold processing is that the wavelet transform provides a “sparse” representation of the useful signal function; i.e., the signal is represented by a relatively small number of coefficients that are large in absolute value. To provide a sparse representation of a function that is uniformly Lipschitz regular with exponent γ, the wavelet function used in the discrete wavelet transform must have M continuous derivatives (M ≥ γ) and M vanishing moments, and it must decrease fast enough at infinity. It is further assumed that the wavelet transform is performed with the Meyer wavelets [1], which satisfy all the necessary conditions.
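The two thresholding rules $\rho_H$ and $\rho_S$ defined above take only a few lines; this is a minimal sketch with our own function names.

```python
import numpy as np

def hard_threshold(y, T):
    """rho_H(y, T) = y * 1(|y| > T): keep large coefficients, zero the rest."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) > T, y, 0.0)

def soft_threshold(y, T):
    """rho_S(y, T) = sgn(y) * (|y| - T)_+: shrink all coefficients toward zero."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - T, 0.0)

# hard_threshold(np.array([0.5, -3.0, 2.0]), 1.0)  ->  [ 0., -3.,  2.]
# soft_threshold(np.array([0.5, -3.0, 2.0]), 1.0)  ->  [ 0., -2.,  1.]
```

Hard thresholding keeps surviving coefficients unchanged, while soft thresholding additionally shrinks them by T, which trades bias for a smoother estimate.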
If we apply the discrete wavelet transform to the sample (1), we obtain the set of empirical wavelet coefficients
$$ V_{j,k} = \nu_{j,k} + \xi_{j,k}, \quad j = 0, \ldots, J-1, \; k = 0, \ldots, 2^j - 1. $$
Here $\nu_{j,k}$ are the coefficients of the discrete wavelet transform of the sample
$$ f(x_{(1)}), \ldots, f(x_{(N)}). $$
In general, $V_{j,k}$ are not equal to $W_{j,k}$, and $\nu_{j,k}$ are not equal to $\mu_{j,k}$. However, one can apply the same thresholding procedure to the coefficients $V_{j,k}$ as to the coefficients $W_{j,k}$ and obtain the estimators $\hat{V}_{j,k}$. The following sections discuss the properties of the resulting estimators.

3. Mean-Square Thresholding Risk

The mean-square thresholding risk for the sample with the random grid is defined as
$$ R_\nu(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \mathbb{E}\,\bigl( \hat{V}_{j,k} - \mu_{j,k} \bigr)^2. \qquad (3) $$
We also define the mean-square risk for the equispaced sample as
$$ R_\mu(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \mathbb{E}\,\bigl( \hat{W}_{j,k} - \mu_{j,k} \bigr)^2. $$
Threshold selection is one of the main problems in threshold processing. For the class $\mathrm{Lip}(\gamma, L)$, the threshold $T_\gamma = \sigma_j \sqrt{\frac{4\alpha\gamma}{2\gamma+\alpha} \ln 2^j}$ (calculated for each level j) is close to optimal [16]. Using the results of [7] (Theorem 3), we can estimate the rate of $R_\mu(f, T_\gamma)$.
Theorem 1.
Let $\alpha > 1/2$ and $f \in \mathrm{Lip}(\gamma, L)$ on the segment [0, 1] with $\gamma > (4\alpha - 2)^{-1}$. Then for the threshold $T_\gamma$ we have
$$ R_\mu(f, T_\gamma) \le C \, 2^{\frac{2\gamma + \alpha - 2\alpha\gamma}{2\gamma + \alpha} J} \, J^{\frac{2\gamma + 2\alpha}{2\gamma + \alpha}}, $$
where C is a positive constant.
Additionally, repeating the arguments of Theorem 1 in [15], it can be shown that a similar statement is valid for $R_\nu(f, T_\gamma)$ when $\gamma > \max\{(4\alpha - 2)^{-1}, 1/2\}$. Thus, the replacement of equally spaced samples by random ones does not affect the upper estimate for the rate of the mean-square risk.
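The level-dependent quantities in this section are easy to compute directly from their formulas; a small sketch (function names ours, formulas as in the text):

```python
import numpy as np

def sigma_j(j, J, alpha):
    """Noise level at scale j: sigma_j = 2^{(1 - alpha)(J - j)/2}."""
    return 2.0 ** ((1.0 - alpha) * (J - j) / 2.0)

def threshold_gamma(j, J, alpha, gamma):
    """Near-optimal level-dependent threshold for the class Lip(gamma, L):
    T_gamma = sigma_j * sqrt(4*alpha*gamma / (2*gamma + alpha) * ln 2^j)."""
    return sigma_j(j, J, alpha) * np.sqrt(
        4.0 * alpha * gamma / (2.0 * gamma + alpha) * np.log(2.0 ** j))
```

At the finest level j = J the noise level reduces to 1, matching the unit variance of the $e_i$, while coarser levels carry more of the long-range-dependent noise energy.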

4. Properties of the Mean-Square Risk Estimate

Since expression (3) explicitly depends on the unknown values $\mu_{j,k}$, it cannot be calculated in practice. However, it is possible to construct an estimate of it based only on the observable data. This estimate is given by
$$ \hat{R}_\nu(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} F[V_{j,k}, T], \qquad (4) $$
where $F[V_{j,k}, T] = (V_{j,k}^2 - \sigma^2)\,\mathbf{1}(|V_{j,k}| \le T) + \sigma^2\,\mathbf{1}(|V_{j,k}| > T)$ for hard threshold processing and $F[V_{j,k}, T] = (V_{j,k}^2 - \sigma^2)\,\mathbf{1}(|V_{j,k}| \le T) + (\sigma^2 + T^2)\,\mathbf{1}(|V_{j,k}| > T)$ for soft threshold processing [1,3].
Estimator (4) makes it possible to assess the estimation error for the function f, since it can be calculated using only the observable values $V_{j,k}$. The following statement establishes its asymptotic normality, which, in particular, allows one to construct asymptotic confidence intervals for the mean-square risk (3).
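A per-level building block of the estimate (4) can be sketched as follows. Here `sigma2` stands for the noise variance of the coefficients at the given level ($\sigma_j^2$ in Section 2) and `T` for the threshold used at that level; the function name and interface are our own.

```python
import numpy as np

def risk_estimate_level(V, sigma2, T, mode="hard"):
    """Sum of F[V_{j,k}, T] over one decomposition level.

    For |V| <= T the summand is V^2 - sigma2; for |V| > T it is sigma2
    (hard thresholding) or sigma2 + T^2 (soft thresholding)."""
    V = np.asarray(V, dtype=float)
    small = np.abs(V) <= T
    tail = sigma2 if mode == "hard" else sigma2 + T ** 2
    return np.sum(np.where(small, V ** 2 - sigma2, tail))
```

The total estimate $\hat{R}_\nu$ is obtained by summing this quantity over the levels j = 0, …, J − 1 with the corresponding variances and thresholds.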
Theorem 2.
Let $f \in \mathrm{Lip}(\gamma, L)$ on the segment [0, 1] with $\gamma > \max\{(4\alpha - 2)^{-1}, 1/2\}$ and $\alpha > 1/2$, and let the Meyer wavelet satisfy the conditions listed above. Then for the hard and soft threshold processing we have
$$ \mathbb{P}\left( \frac{\hat{R}_\nu(f, T_\gamma) - R_\nu(f, T_\gamma)}{D_J} < x \right) \to \Phi(x) \quad \text{as } J \to \infty, $$
where $\Phi(x)$ is the distribution function of the standard normal law, $D_J^2 = C_\alpha 2^J$, and the constant $C_\alpha$ depends only on α and the wavelet type.
Remark 1.
In practice, one needs to know the constant C α . Unlike the case of independent observations, this constant depends on the chosen wavelet. The method of calculation of C α is discussed in [12].
Proof. 
Let us prove the theorem for the hard threshold processing method. In the case of soft threshold processing, the proof is similar.
Along with R ^ ν ( f , T γ ) , consider
$$ \hat{R}_\mu(f, T_\gamma) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} F[W_{j,k}, T_\gamma] $$
and write the difference $\hat{R}_\nu(f, T_\gamma) - R_\nu(f, T_\gamma)$ in the form
$$ \hat{R}_\nu(f, T_\gamma) - R_\nu(f, T_\gamma) = \hat{R}_\mu(f, T_\gamma) - R_\mu(f, T_\gamma) + \tilde{R}, $$
where
$$ \tilde{R} = \hat{R}_\nu(f, T_\gamma) - \hat{R}_\mu(f, T_\gamma) - \bigl( R_\nu(f, T_\gamma) - R_\mu(f, T_\gamma) \bigr). $$
In [12], with the use of the results of [17,18,19], it is shown that
$$ \mathbb{P}\left( \frac{\hat{R}_\mu(f, T_\gamma) - R_\mu(f, T_\gamma)}{D_J} < x \right) \to \Phi(x) \quad \text{as } J \to \infty. $$
Therefore, to prove the theorem, it suffices to show that
$$ \frac{\tilde{R}}{2^{J/2}} \xrightarrow{\;\mathbb{P}\;} 0 \quad \text{as } J \to \infty. $$
Under the conditions γ > max { ( 4 α 2 ) 1 , 1 / 2 } and α > 1 / 2 , by virtue of Theorem 1 and a similar statement for R ν ( f , T ) , we obtain that
$$ \frac{R_\nu(f, T_\gamma) - R_\mu(f, T_\gamma)}{2^{J/2}} \to 0 \quad \text{as } J \to \infty. $$
Set
$$ j_0 = \frac{\alpha}{2\gamma + \alpha} J + \frac{1}{2\gamma + \alpha} \log_2 J. $$
Let us represent $\hat{R}_\nu(f, T_\gamma) - \hat{R}_\mu(f, T_\gamma)$ as
$$ \hat{R}_\nu(f, T_\gamma) - \hat{R}_\mu(f, T_\gamma) = S_1 + S_2, $$
where
$$ S_1 = \sum_{j=0}^{j_0 - 1} \sum_{k=0}^{2^j - 1} \bigl( F[V_{j,k}, T_\gamma] - F[W_{j,k}, T_\gamma] \bigr), \quad S_2 = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( F[V_{j,k}, T_\gamma] - F[W_{j,k}, T_\gamma] \bigr). $$
Since for some constant $\tilde{C} > 0$ we have
$$ \bigl| F[V_{j,k}, T_\gamma] \bigr| \le \tilde{C} T_\gamma^2, \quad \bigl| F[W_{j,k}, T_\gamma] \bigr| \le \tilde{C} T_\gamma^2 \quad \text{a.s.}, \qquad (5) $$
it follows that
$$ \frac{S_1}{2^{J/2}} \xrightarrow{\;\mathbb{P}\;} 0 \quad \text{as } J \to \infty. $$
Next,
$$ S_2 = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( F[V_{j,k}, T_\gamma] - F[W_{j,k}, T_\gamma] \bigr) = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( V_{j,k}^2 - W_{j,k}^2 \bigr) $$
$$ {} + \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( W_{j,k}^2 - 2\sigma^2 \bigr)\,\mathbf{1}\bigl( |V_{j,k}| \le T_\gamma,\, |W_{j,k}| > T_\gamma \bigr) $$
$$ {} + \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( 2\sigma^2 - V_{j,k}^2 \bigr)\,\mathbf{1}\bigl( |V_{j,k}| > T_\gamma,\, |W_{j,k}| \le T_\gamma \bigr) $$
$$ {} + \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( W_{j,k}^2 - V_{j,k}^2 \bigr)\,\mathbf{1}\bigl( |V_{j,k}| > T_\gamma,\, |W_{j,k}| > T_\gamma \bigr). \qquad (6) $$
Consider the sum $\sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( V_{j,k}^2 - W_{j,k}^2 \bigr)$:
$$ \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( V_{j,k}^2 - W_{j,k}^2 \bigr) = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k}^2 - \mu_{j,k}^2 \bigr) + 2 \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \xi_{j,k} \bigl( \nu_{j,k} - \mu_{j,k} \bigr). $$
Using the results of [12,15,20], it can be shown that the conditional distribution of this sum for fixed $x_i$ is normal with the mean
$$ \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k}^2 - \mu_{j,k}^2 \bigr) $$
and a variance that is less than
$$ \tilde{C}_\alpha \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k} - \mu_{j,k} \bigr)^2, $$
where $\tilde{C}_\alpha$ is a positive constant.
Since $f \in \mathrm{Lip}(\gamma, L)$, repeating the arguments of [20], it can be shown that
$$ \frac{1}{2^{J/2}}\, \mathbb{E}_x \left| \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k}^2 - \mu_{j,k}^2 \bigr) \right| \to 0, \quad \frac{1}{2^{J/2}}\, \mathbb{E}_x \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k} - \mu_{j,k} \bigr)^2 \to 0. $$
Hence, applying the Markov inequality, we obtain
$$ \frac{1}{2^{J/2}} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k}^2 - \mu_{j,k}^2 \bigr) \xrightarrow{\;\mathbb{P}\;} 0, \quad \frac{1}{2^{J/2}} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( \nu_{j,k} - \mu_{j,k} \bigr)^2 \xrightarrow{\;\mathbb{P}\;} 0 $$
as $J \to \infty$. Thus,
$$ \frac{1}{2^{J/2}} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \bigl( V_{j,k}^2 - W_{j,k}^2 \bigr) \xrightarrow{\;\mathbb{P}\;} 0 \quad \text{as } J \to \infty. \qquad (7) $$
The remaining sums in (6) contain indicators where either | V j , k | > T γ or | W j , k | > T γ . Repeating the reasoning from [12] and using (7), it can be shown that, when divided by 2 J / 2 , they also converge to zero in probability. The theorem is proven. □
Theorem 2 provides the possibility to construct asymptotic confidence intervals for the mean-square thresholding risk on the basis of its estimate.
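Concretely, Theorem 2 suggests intervals of the form $\hat{R}_\nu(f, T_\gamma) \pm z \cdot D_J$, where z is a standard normal quantile. The sketch below assumes the constant $C_\alpha$ has already been computed for the chosen wavelet (see Remark 1); the function name is ours.

```python
from statistics import NormalDist

def risk_confidence_interval(risk_est, J, C_alpha, level=0.95):
    """Asymptotic confidence interval for the risk based on Theorem 2.

    Uses risk_est +/- z * D_J with D_J^2 = C_alpha * 2^J; C_alpha must be
    supplied by the user, since it depends on alpha and the wavelet type."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    d_j = (C_alpha * 2.0 ** J) ** 0.5
    return risk_est - z * d_j, risk_est + z * d_j
```

For a 95% interval, z ≈ 1.96, so the interval half-width is about $1.96\sqrt{C_\alpha 2^J}$.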
In addition to the asymptotic normality, the estimator (4) also possesses the property of strong consistency.
Theorem 3.
Suppose that the conditions of Theorem 2 are satisfied. Then for hard or soft threshold processing, for any $\lambda > 1/2$ we have
$$ \frac{\hat{R}_\nu(f, T_\gamma) - R_\nu(f, T_\gamma)}{2^{\lambda J}} \to 0 \quad \text{a.s. as } J \to \infty. $$
Since (5) holds, for fixed $x_i$ the conditional version of the Bosq inequality [21] (Theorem 1.3) applies to (4), and the proof of this statement almost completely repeats the proof of the corresponding property of the risk estimator in [13].

5. Discussion

As already mentioned, Theorem 2 makes it possible to construct asymptotic confidence intervals for the mean-square thresholding risk. For practical purposes, it is desirable to have guaranteed confidence intervals, which could be constructed from estimates of the convergence rate in Theorem 2. Such estimates would depend on the Lipschitz parameters and the parameter α. Guaranteed confidence intervals would also help to explain how the results of Theorems 2 and 3 affect the error estimation for a finite signal size. We therefore leave the estimation of the convergence rate and explicit numerical simulation for future work.
The obtained results apply to Meyer wavelets. Their advantage is that they possess infinitely many vanishing moments, which simplifies the proof of asymptotic normality in [12]. In view of the results of [8], it is clear that similar conclusions could be obtained for other wavelets that have a sufficiently large number of vanishing moments (e.g., various Daubechies families).
It follows from Theorems 2 and 3 that the statistical properties of the mean-square risk estimator in the model with the uniform random design remain the same as in the model with equispaced samples. Note that such invariance cannot be taken for granted. Random times of sample registration can also result in a random sample size; this situation was considered in [22]. In that case, the properties of the model can differ significantly from those of the fixed-sample-size model. For example, the limit distribution of the mean-square risk estimator can be a scale mixture of normal laws, which can have significantly heavier tails than the normal distribution. In particular, this distribution may belong to the class of stable laws, and it is well known that the variances of all stable laws, except the normal one, are infinite (the properties of stable distributions are discussed in detail in the monograph of V. M. Zolotarev [23]; see also [24]).

Funding

This research was funded by the Russian Science Foundation, project number 18-11-00155.

Conflicts of Interest

The author declares no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Mallat, S. A Wavelet Tour of Signal Processing; Academic Press: New York, NY, USA, 1999.
2. Donoho, D.; Johnstone, I.M. Ideal Spatial Adaptation via Wavelet Shrinkage. Biometrika 1994, 81, 425–455.
3. Donoho, D.; Johnstone, I.M. Adapting to Unknown Smoothness via Wavelet Shrinkage. J. Am. Stat. Assoc. 1995, 90, 1200–1224.
4. Donoho, D.; Johnstone, I.M.; Kerkyacharian, G.; Picard, D. Wavelet shrinkage: Asymptopia? J. R. Stat. Soc. Ser. B 1995, 57, 301–369.
5. Marron, J.S.; Adak, S.; Johnstone, I.M.; Neumann, M.H.; Patil, P. Exact risk analysis of wavelet regression. J. Comput. Graph. Stat. 1998, 7, 278–309.
6. Antoniadis, A.; Fan, J. Regularization of Wavelet Approximations. J. Am. Stat. Assoc. 2001, 96, 939–967.
7. Johnstone, I.M.; Silverman, B.W. Wavelet threshold estimates for data with correlated noise. J. R. Stat. Soc. Ser. B 1997, 59, 319–351.
8. Johnstone, I.M. Wavelet shrinkage for correlated data and inverse problems: Adaptivity results. Stat. Sin. 1999, 9, 51–83.
9. Kudryavtsev, A.A.; Shestakov, O.V. Asymptotic behavior of the threshold minimizing the average probability of error in calculation of wavelet coefficients. Dokl. Math. 2016, 93, 295–299.
10. Kudryavtsev, A.A.; Shestakov, O.V. Asymptotically optimal wavelet thresholding in models with non-Gaussian noise distributions. Dokl. Math. 2016, 94, 615–619.
11. Shestakov, O.V. Asymptotic normality of adaptive wavelet thresholding risk estimation. Dokl. Math. 2012, 86, 556–558.
12. Eroshenko, A.A. Statistical Properties of Signal and Image Estimates, Using Threshold Processing of Coefficients in Wavelet Decompositions. Ph.D. Thesis, M. V. Lomonosov Moscow State University, Moscow, Russia, 2015.
13. Shestakov, O.V. Almost everywhere convergence of a wavelet thresholding risk estimate in a model with correlated noise. Moscow Univ. Comput. Math. Cybern. 2016, 40, 114–117.
14. Cai, T.; Brown, L. Wavelet Shrinkage for Nonequispaced Samples. Ann. Stat. 1998, 26, 1783–1799.
15. Cai, T.; Brown, L. Wavelet Estimation for Samples with Random Uniform Design. Stat. Probab. Lett. 1999, 42, 313–321.
16. Jansen, M. Noise Reduction by Wavelet Thresholding; Lecture Notes in Statistics; Springer: New York, NY, USA, 2001; Volume 161.
17. Taqqu, M.S. Weak Convergence to Fractional Brownian Motion and to the Rosenblatt Process. Z. Wahrscheinlichkeitsth. Verw. Geb. 1975, 31, 287–302.
18. Bradley, R.C. Basic Properties of Strong Mixing Conditions. A Survey and Some Open Questions. Probab. Surv. 2005, 2, 107–144.
19. Peligrad, M. On the Asymptotic Normality of Sequences of Weak Dependent Random Variables. J. Theor. Probab. 1996, 9, 703–715.
20. Shestakov, O.V. Properties of wavelet estimates of signals recorded at random time points. Inform. Appl. 2019, 13, 16–21.
21. Bosq, D. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction; Springer: New York, NY, USA, 1996.
22. Shestakov, O.V. Convergence of the Distribution of the Threshold Processing Risk Estimate to a Mixture of Normal Laws at a Random Sample Size. Syst. Means Inform. 2019, 29, 31–38.
23. Zolotarev, V.M. One-Dimensional Stable Distributions; AMS: Providence, RI, USA, 1986.
24. Gnedenko, B.V.; Korolev, V.Y. Random Summation: Limit Theorems and Applications; CRC Press: Boca Raton, FL, USA, 1996.

Shestakov, O. Wavelet Thresholding Risk Estimate for the Model with Random Samples and Correlated Noise. Mathematics 2020, 8, 377. https://doi.org/10.3390/math8030377
