## **1. Introduction**

In digital signal processing tasks, it is often assumed that the recorded signal samples are independent. However, there are many physical processes that demonstrate long-term dependence where correlations between observations decrease rather slowly. For example, long-term dependence is often observed in geophysical processes where it takes the form of long periods of large or small values of observations. Interferences in communication channels demonstrate similar phenomena. Wavelet methods are widely used in the analysis and processing of signals recorded during the study of such processes.

The wavelet decomposition of a function *f*(*x*) is a series

$$f(x) = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle \psi_{j,k}(x),$$

where $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$, and $\psi(x)$ is a wavelet function. The indices $j$ and $k$ are called the scale and the shift, respectively. This decomposition provides a time/scale representation of the signal, which allows one to localise its features. There exist many wavelet functions with various properties.

In practice, a discrete wavelet transform is used: the sampled signal vector is multiplied by an orthogonal matrix defined by the wavelet function $\psi(x)$ (implemented in practice with a fast cascade algorithm [1]). This transform is applied to the data, and threshold processing of the resulting wavelet coefficients is performed [1]. For a model of signal samples on an equispaced grid, these methods were thoroughly studied by D. Donoho, I. Johnstone, B. Silverman, and others [2–10]. Statistical properties of the mean-square risk estimator have also been studied: it has been shown that under certain conditions it is strongly consistent and asymptotically normal [11–13].
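For illustration, one level of the fast cascade algorithm can be sketched in a few lines. The sketch below uses the Haar filters purely for brevity (the paper itself works with Meyer wavelets); the function names are illustrative.

```python
import numpy as np

def haar_dwt_level(signal):
    """One level of the cascade algorithm with Haar filters:
    filter the signal and downsample by 2."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2.0)   # low-pass (scaling) part
    detail = (s[0::2] - s[1::2]) / np.sqrt(2.0)   # high-pass (wavelet) part
    return approx, detail

def haar_dwt(signal):
    """Full decomposition of a dyadic-length signal: iterate the cascade
    on the approximation part, collecting detail coefficients."""
    coeffs = []
    approx = np.asarray(signal, dtype=float)
    while len(approx) > 1:
        approx, detail = haar_dwt_level(approx)
        coeffs.append(detail)
    coeffs.append(approx)  # coarsest approximation coefficient
    return coeffs
```

Since the transform is orthogonal, the total energy of the coefficients equals the energy of the signal, which is a convenient sanity check.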

In some experiments, it is not possible to record signal samples at regular intervals [14]; sometimes the samples are registered at random times. T. Cai and L. Brown [15] showed that if the sample points form a variational series based on a sample from the uniform distribution on the data registration interval, then the rate of the mean-square thresholding risk remains, up to a logarithmic factor, equal to the optimal rate in the class of Lipschitz-regular functions. The uniform distribution appears as a special case, for example, when considering a Poisson process, since the conditional distribution of its points on a given time interval, given the number of points, is uniform. Such models arise, for example, in astronomy when studying stellar intensity. In this paper, it is proven that under some regularity conditions the statistical properties of the risk estimator also remain the same for both equispaced and random sample grids.

## **2. Long-Term Dependence**

Let the signal function $f(x)$ be defined on the segment $[0, 1]$ and be uniformly Lipschitz regular with some exponent $\gamma > 0$ and Lipschitz constant $L > 0$: $f \in \mathrm{Lip}(\gamma, L)$. Assume that the samples of $f(x)$ contain additive correlated noise and are recorded at random times that are independent and uniformly distributed on $[0, 1]$. Namely, consider the following data model:

$$Y_i = f(x_{(i)}) + \varepsilon_i, \quad i = 1, \dots, N \ (N = 2^J), \tag{1}$$

where $0 \le x_{(1)} < \dots < x_{(N)} \le 1$ is the variational series based on a sample from the uniform distribution on the segment $[0, 1]$, and $\{\varepsilon_i, i \in \mathbb{Z}\}$ is a stationary Gaussian process with the covariance sequence $r_k = \mathrm{cov}(\varepsilon_i, \varepsilon_{i+k})$. We assume that the $\varepsilon_i$ have zero mean and unit variance. We also assume that the noise autocovariance function decreases at the rate $r_k \sim k^{-\alpha}$, where $0 < \alpha < 1$. This corresponds to long-term dependence between observations [7].
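For numerical experiments, noise of this kind can be simulated, for instance, as fractional Gaussian noise with Hurst index $H = 1 - \alpha/2$, whose autocovariance decays like $k^{-\alpha}$. The sketch below (function names are illustrative, and the fGn model is one possible choice, not the paper's prescription) draws a path via a Cholesky factor of the Toeplitz covariance matrix.

```python
import numpy as np

def fgn_autocov(k, alpha):
    """Autocovariance of fractional Gaussian noise with Hurst index
    H = 1 - alpha/2; r_0 = 1 and r_k ~ k**(-alpha) for large k."""
    H = 1.0 - alpha / 2.0
    k = np.abs(np.asarray(k, dtype=float))
    return 0.5 * ((k + 1) ** (2 * H) - 2 * k ** (2 * H) + np.abs(k - 1) ** (2 * H))

def long_memory_noise(n, alpha, rng):
    """One path of stationary Gaussian noise with long-term dependence,
    generated through the Cholesky factor of the covariance matrix."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    cov = fgn_autocov(lags, alpha)
    return np.linalg.cholesky(cov) @ rng.standard_normal(n)
```

The Cholesky approach is exact but costs $O(n^3)$; for long signals a circulant-embedding method would be the usual substitute.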

The observations consist of the pairs $(x_{(1)}, Y_1), \dots, (x_{(N)}, Y_N)$, where the distances between the sample points are, in general, not equal. It is known that $\mathbb{E}x_{(i)} = i/(N+1)$ (see Lemma 2 in [15]). Along with (1), consider a sample with equal distances between sample points

$$\left(\frac{1}{N+1}, Z_1\right), \dots, \left(\frac{N}{N+1}, Z_N\right), \tag{2}$$

where

$$Z_i = f\left(\frac{i}{N+1}\right) + \varepsilon_i, \quad i = 1, \dots, N.$$
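The identity $\mathbb{E}x_{(i)} = i/(N+1)$ quoted above is easy to verify numerically; a small Monte Carlo sketch (the sample sizes are illustrative):

```python
import numpy as np

# Monte Carlo illustration of E x_(i) = i/(N+1) for uniform order
# statistics (Lemma 2 in [15]).
rng = np.random.default_rng(0)
N, trials = 7, 200_000
samples = np.sort(rng.random((trials, N)), axis=1)  # variational series per row
empirical_means = samples.mean(axis=0)
theoretical = np.arange(1, N + 1) / (N + 1)
print(np.max(np.abs(empirical_means - theoretical)))  # close to zero
```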

For the sample (2), threshold processing methods have been developed that effectively suppress the noise and provide an "almost" optimal rate of the mean-square risk [7,8]. The discrete wavelet transform with Meyer wavelets is applied to the sample (2) to obtain a set of empirical wavelet coefficients [1]

$$W_{j,k} = \mu_{j,k} + 2^{\frac{(1-\alpha)(J-j)}{2}} \xi_{j,k}, \quad j = 0, \dots, J-1, \ k = 0, \dots, 2^j - 1,$$

where $\mu_{j,k}$ are the discrete wavelet coefficients of the sample

$$f\left(\frac{1}{N+1}\right), \dots, f\left(\frac{N}{N+1}\right),$$

and the noise coefficients $\xi_{j,k}$ have the standard normal distribution but are not independent. The variances of $W_{j,k}$ have the form $\sigma_j^2 = 2^{(1-\alpha)(J-j)}$ [12]. To suppress the noise, the coefficients $W_{j,k}$ are processed with the hard thresholding function $\rho_H(y, T) = y\,\mathbf{1}(|y| > T)$ or the soft thresholding function $\rho_S(y, T) = \mathrm{sgn}(y)(|y| - T)_+$ with some threshold $T$, and the estimates $\widehat{W}_{j,k}$ are obtained. After that, the inverse wavelet transform is performed. The idea of threshold processing is that the wavelet transform provides a "sparse" representation of the useful signal; i.e., the signal is represented by a relatively small number of coefficients that are large in absolute value. To provide a "sparse" representation of a function that is uniformly Lipschitz regular with an exponent $\gamma$, the wavelet function participating in the discrete wavelet transform must have $M$ continuous derivatives ($M \ge \gamma$) and $M$ vanishing moments. It must also decrease fast enough at infinity. It is further assumed that the wavelet transform is performed with the Meyer wavelets [1], which satisfy all the necessary conditions.
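Both thresholding rules are one-liners when vectorised over an array of coefficients; a minimal sketch (function names are illustrative):

```python
import numpy as np

def hard_threshold(y, t):
    """rho_H(y, T) = y * 1(|y| > T): keep large coefficients, zero the rest."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) > t, y, 0.0)

def soft_threshold(y, t):
    """rho_S(y, T) = sgn(y) * (|y| - T)_+ : shrink coefficients toward zero."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)
```

Hard thresholding preserves the surviving coefficients exactly, while soft thresholding additionally shrinks them by $T$, trading bias for a smaller variance.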

If we apply the discrete wavelet transform to the sample (1), we obtain the set of empirical wavelet coefficients

$$V_{j,k} = \nu_{j,k} + 2^{\frac{(1-\alpha)(J-j)}{2}} \xi_{j,k}, \quad j = 0, \dots, J-1, \ k = 0, \dots, 2^j - 1.$$

Here $\nu_{j,k}$ are the coefficients of the discrete wavelet transform of the sample

$$f\left(x\_{(1)}\right), \ldots, f\left(x\_{(N)}\right).$$

In general, $V_{j,k}$ are not equal to $W_{j,k}$, and $\nu_{j,k}$ are not equal to $\mu_{j,k}$. However, one can apply the same thresholding procedure to the coefficients $V_{j,k}$ as to the coefficients $W_{j,k}$ and obtain the estimates $\widehat{V}_{j,k}$. The following sections discuss the properties of the resulting estimators.

## **3. Mean-Square Thresholding Risk**

The mean-square thresholding risk for a sample with random grid is defined as

$$R_{\nu}(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \mathbb{E}(\widehat{V}_{j,k} - \mu_{j,k})^2. \tag{3}$$

We also define the mean-square risk for the equispaced sample as

$$R_{\mu}(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} \mathbb{E}(\widehat{W}_{j,k} - \mu_{j,k})^2.$$

The threshold selection is one of the main problems in threshold processing. For the class $\mathrm{Lip}(\gamma, L)$, the threshold $T_\gamma = \sigma_j \sqrt{\frac{4\alpha\gamma}{2\gamma+\alpha} \ln 2^j}$ (calculated for each $j$) is close to optimal [16]. Using the results of [7] (Theorem 3), we can estimate the rate of $R_\mu(f, T_\gamma)$.

**Theorem 1.** *Let $\alpha > 1/2$ and $f \in \mathrm{Lip}(\gamma, L)$ on the segment $[0, 1]$ with $\gamma > (4\alpha - 2)^{-1}$. Then for the threshold $T_\gamma$ we have*

$$R_{\mu}(f, T_{\gamma}) \le C\, 2^{J\frac{2\gamma + \alpha - 2\alpha\gamma}{2\gamma + \alpha}} J^{\frac{2\gamma + 2\alpha}{2\gamma + \alpha}},$$

*where C is a positive constant.*

Additionally, repeating the arguments of Theorem 1 in [15], it can be shown that a similar statement holds for $R_\nu(f, T_\gamma)$ when $\gamma > \max\{(4\alpha - 2)^{-1}, 1/2\}$. Thus, replacing equispaced samples by random ones does not affect the upper estimate for the rate of the mean-square risk.

## **4. Properties of the Mean-Square Risk Estimate**

Since expression (3) explicitly depends on the unknown values $\mu_{j,k}$, it cannot be calculated in practice. However, it is possible to construct an estimate of it based only on observable data. This estimate is defined by the expression

$$\widehat{R}_{\nu}(f, T) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} F[V_{j,k}, T], \tag{4}$$

where $F[V_{j,k}, T] = (V_{j,k}^2 - \sigma_j^2)\,\mathbf{1}(|V_{j,k}| \le T) + \sigma_j^2\,\mathbf{1}(|V_{j,k}| > T)$ for the hard threshold processing and $F[V_{j,k}, T] = (V_{j,k}^2 - \sigma_j^2)\,\mathbf{1}(|V_{j,k}| \le T) + (\sigma_j^2 + T^2)\,\mathbf{1}(|V_{j,k}| > T)$ for the soft threshold processing [1,3].

Estimator (4) makes it possible to assess the estimation error for the function $f$, since it can be calculated using only the observable values $V_{j,k}$. The following statement establishes its asymptotic normality, which, in particular, allows constructing asymptotic confidence intervals for the mean-square risk (3).
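A direct transcription of (4) for the hard-thresholding form of $F$ might look as follows. The list-of-levels layout, the single threshold value, and the function name are simplifying assumptions for illustration (the text uses a level-dependent threshold $T_\gamma$).

```python
import numpy as np

def risk_estimate_hard(coeffs, alpha, t):
    """Risk estimate (4) with the hard-thresholding form of F.
    `coeffs[j]` holds the 2**j empirical coefficients V_{j,k}; J = len(coeffs).
    sigma_j**2 = 2**((1 - alpha) * (J - j)) is the level-j noise variance."""
    J = len(coeffs)
    total = 0.0
    for j, v in enumerate(coeffs):
        v = np.asarray(v, dtype=float)
        sigma2 = 2.0 ** ((1.0 - alpha) * (J - j))
        small = np.abs(v) <= t
        # F = (V^2 - sigma_j^2) on |V| <= T, and sigma_j^2 on |V| > T
        total += np.sum(np.where(small, v**2 - sigma2, sigma2))
    return total
```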

**Theorem 2.** *Let $f \in \mathrm{Lip}(\gamma, L)$ on the segment $[0, 1]$ with $\gamma > \max\{(4\alpha - 2)^{-1}, 1/2\}$, $\alpha > 1/2$, and let the Meyer wavelet satisfy the conditions listed above. Then for the hard and soft threshold processing we have*

$$\mathbb{P}\left(\frac{\widehat{R}_{\nu}(f, T_{\gamma}) - R_{\nu}(f, T_{\gamma})}{D_J} < x\right) \to \Phi(x) \text{ when } J \to \infty,$$

*where $\Phi(x)$ is the distribution function of the standard normal law, $D_J^2 = C_\alpha 2^J$, and the constant $C_\alpha$ depends only on $\alpha$ and the wavelet type.*

**Remark 1.** *In practice, one needs to know the constant $C_\alpha$. Unlike the case of independent observations, this constant depends on the chosen wavelet. The method of calculating $C_\alpha$ is discussed in [12].*

**Proof.** Let us prove the theorem for the hard threshold processing method. In the case of soft threshold processing, the proof is similar.

Along with $\widehat{R}_\nu(f, T_\gamma)$, consider

$$\widehat{R}_{\mu}(f, T_{\gamma}) = \sum_{j=0}^{J-1} \sum_{k=0}^{2^j - 1} F[W_{j,k}, T_{\gamma}]$$

and write the difference $\widehat{R}_\nu(f, T_\gamma) - R_\nu(f, T_\gamma)$ in the form

$$\widehat{R}_{\nu}(f, T_{\gamma}) - R_{\nu}(f, T_{\gamma}) = \widehat{R}_{\mu}(f, T_{\gamma}) - R_{\mu}(f, T_{\gamma}) + \widetilde{R},$$

where

$$\widetilde{R} = \widehat{R}_{\nu}(f, T_{\gamma}) - \widehat{R}_{\mu}(f, T_{\gamma}) - \left(R_{\nu}(f, T_{\gamma}) - R_{\mu}(f, T_{\gamma})\right).$$

In [12], using the results of [17–19], it is shown that

$$\mathbb{P}\left(\frac{\widehat{R}_{\mu}(f, T_{\gamma}) - R_{\mu}(f, T_{\gamma})}{D_J} < x\right) \to \Phi(x) \text{ when } J \to \infty.$$

Therefore, to prove the theorem, it suffices to show that

$$\frac{\widetilde{R}}{2^{J/2}} \xrightarrow{\mathbb{P}} 0 \text{ when } J \to \infty.$$

Under the conditions $\gamma > \max\{(4\alpha - 2)^{-1}, 1/2\}$ and $\alpha > 1/2$, by virtue of Theorem 1 and the similar statement for $R_\nu(f, T_\gamma)$, we obtain that

$$\frac{R_{\nu}(f, T_{\gamma}) - R_{\mu}(f, T_{\gamma})}{2^{J/2}} \to 0 \text{ when } J \to \infty.$$

Set

$$j\_0 \approx \frac{\alpha}{2\gamma + \alpha} J + \frac{1}{2\gamma + \alpha} \log\_2 J.$$

Let us represent $\widehat{R}_\nu(f, T_\gamma) - \widehat{R}_\mu(f, T_\gamma)$ as

$$\widehat{R}_{\nu}(f, T_{\gamma}) - \widehat{R}_{\mu}(f, T_{\gamma}) = S_1 + S_2,$$

where

$$S_1 = \sum_{j=0}^{j_0-1} \sum_{k=0}^{2^j-1} \left( F[V_{j,k}, T_\gamma] - F[W_{j,k}, T_\gamma] \right),$$

$$S_2 = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} \left( F[V_{j,k}, T_\gamma] - F[W_{j,k}, T_\gamma] \right).$$

Since for some constant $\tilde{C} > 0$ we have

$$\left| F[V_{j,k}, T_{\gamma}] \right| \le \tilde{C} T_{\gamma}^2, \quad \left| F[W_{j,k}, T_{\gamma}] \right| \le \tilde{C} T_{\gamma}^2 \text{ a.s.}, \tag{5}$$

and $S_1$ contains fewer than $2^{j_0}$ summands, it follows that

$$\frac{S_1}{2^{J/2}} \xrightarrow{\mathbb{P}} 0 \text{ when } J \to \infty.$$

Next

$$\begin{split} S_{2} &= \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} \left( F[V_{j,k}, T_{\gamma}] - F[W_{j,k}, T_{\gamma}] \right) = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} (V_{j,k}^2 - W_{j,k}^2) + \\ &+ \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} (W_{j,k}^2 - 2\sigma_j^2)\, \mathbf{1}(|V_{j,k}| \le T_{\gamma}, |W_{j,k}| > T_{\gamma}) + \\ &+ \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} (2\sigma_j^2 - V_{j,k}^2)\, \mathbf{1}(|V_{j,k}| > T_{\gamma}, |W_{j,k}| \le T_{\gamma}) + \\ &+ \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} (W_{j,k}^2 - V_{j,k}^2)\, \mathbf{1}(|V_{j,k}| > T_{\gamma}, |W_{j,k}| > T_{\gamma}). \end{split} \tag{6}$$

Consider the sum $\sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j-1} (V_{j,k}^2 - W_{j,k}^2)$:

$$\sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} (V_{j,k}^2 - W_{j,k}^2) = \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} (\nu_{j,k}^2 - \mu_{j,k}^2) + 2 \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} 2^{\frac{(1-\alpha)(J-j)}{2}} \xi_{j,k} (\nu_{j,k} - \mu_{j,k}).$$

Using the results of [12,15,20], it can be shown that the conditional distribution of this sum, given the sample points $x_{(i)}$, is normal with mean

$$\sum\_{j=j\_0}^{J-1} \sum\_{k=0}^{2^j - 1} (\nu\_{j,k}^2 - \mu\_{j,k}^2)$$

and variance less than

$$\tilde{C}_{\alpha} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} \left( \nu_{j,k} - \mu_{j,k} \right)^2,$$

where $\tilde{C}_\alpha$ is a positive constant.

Since $f \in \mathrm{Lip}(\gamma, L)$, repeating the arguments of [20], it can be shown that

$$\frac{1}{2^{J/2}}\mathbb{E}\_{\mathbf{x}}\left|\sum\_{j=j\_0}^{J-1}\sum\_{k=0}^{2^j-1} (\nu\_{j,k}^2 - \mu\_{j,k}^2)\right| \to 0,$$

$$\frac{1}{2^{J/2}}\mathbb{E}_{\mathbf{x}}\sum_{j=j_0}^{J-1}\sum_{k=0}^{2^j-1}(\nu_{j,k}-\mu_{j,k})^2 \to 0.\tag{7}$$

Hence, applying the Markov inequality, we obtain

$$\begin{aligned} \frac{1}{2^{J/2}} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} (\nu_{j,k}^2 - \mu_{j,k}^2) & \xrightarrow{\mathbb{P}} 0, \\\\ \frac{1}{2^{J/2}} \sum_{j=j_0}^{J-1} \sum_{k=0}^{2^j - 1} (\nu_{j,k} - \mu_{j,k})^2 & \xrightarrow{\mathbb{P}} 0 \end{aligned}$$

when *J* → ∞. Thus,

$$\frac{\sum\_{j=j\_0}^{J-1} \sum\_{k=0}^{2^j - 1} (V\_{j,k}^2 - W\_{j,k}^2)}{2^{J/2}} \xrightarrow{\mathbb{P}} 0 \text{ when } J \to \infty.$$

The remaining sums in (6) contain indicators of the events $|V_{j,k}| > T_\gamma$ or $|W_{j,k}| > T_\gamma$. Repeating the reasoning of [12] and using (7), it can be shown that, when divided by $2^{J/2}$, they also converge to zero in probability. The theorem is proven.

Theorem 2 makes it possible to construct asymptotic confidence intervals for the mean-square thresholding risk on the basis of its estimate.
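Such an interval can be sketched directly from Theorem 2: $\widehat{R}_\nu \pm z\, D_J$ with $D_J = \sqrt{C_\alpha 2^J}$ and $z$ the appropriate normal quantile. In the sketch below, `c_alpha` must be supplied as discussed in Remark 1; the function name and any example values are placeholders.

```python
import math
from statistics import NormalDist

def risk_confidence_interval(risk_est, J, c_alpha, coverage=0.95):
    """Asymptotic confidence interval for the mean-square risk from
    Theorem 2: (R_hat - z * D_J, R_hat + z * D_J), D_J = sqrt(C_alpha * 2**J).
    c_alpha depends on alpha and the wavelet (Remark 1)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # two-sided normal quantile
    d_j = math.sqrt(c_alpha * 2.0 ** J)
    return risk_est - z * d_j, risk_est + z * d_j
```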

In addition to the asymptotic normality, the estimator (4) also possesses the property of strong consistency.

**Theorem 3.** *Suppose that the conditions of Theorem 2 are satisfied. Then, for hard or soft threshold processing and any $\lambda > 1/2$, we have*

$$\frac{\widehat{R}_{\nu}(f, T_{\gamma}) - R_{\nu}(f, T_{\gamma})}{2^{\lambda J}} \to 0 \text{ a.s. when } J \to \infty.$$

Since (5) holds, for fixed $x_{(i)}$ the conditional version of the Bosq inequality [21] (Theorem 1.3) applies to (4), and the proof of this statement almost completely repeats the proof of the corresponding risk estimator property in [13].
