**Limit Theorems of Probability Theory**

Editors

**Alexander Tikhomirov Vladimir Ulyanov**

Basel • Beijing • Wuhan • Barcelona • Belgrade • Novi Sad • Cluj • Manchester

*Editors* Alexander Tikhomirov Institute of Physics and Mathematics Komi Science Center of Ural Division of the Russian Academy of Sciences Syktyvkar, Russia

Vladimir Ulyanov Faculty of Computational Mathematics and Cybernetics Lomonosov Moscow State University Moscow, Russia

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Mathematics* (ISSN 2227-7390) (available at: https://www.mdpi.com/si/mathematics/Limit Theo Probab Theory).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

Lastname, A.A.; Lastname, B.B. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-9192-6 (Hbk) ISBN 978-3-0365-9193-3 (PDF) doi.org/10.3390/books978-3-0365-9193-3**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license. The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons Attribution-NonCommercial-NoDerivs (CC BY-NC-ND) license.



## **About the Editors**

#### **Alexander Tikhomirov**

Alexander Tikhomirov has been a professor at the Institute of Physics and Mathematics, Komi Science Center of Ural Division of the Russian Academy of Sciences, Russia, since 2008. Professor Tikhomirov received his Ph.D. degree in mathematics from St. Petersburg State University, Russia, in 1977 and his Habilitation (Doctor of Sciences) from the Steklov Mathematical Institute of the Russian Academy of Sciences, Russia, in 1996. From 1997 to 2020, he worked at the Faculty of Mathematics at Syktyvkar State University, where he became a full professor in 1998. His research focuses on random matrices, strong mixing conditions, limit theorems, and circular law.

#### **Vladimir Ulyanov**

Vladimir Ulyanov is currently a professor at the Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Russia, and a professor at the Faculty of Social Sciences, National Research University—Higher School of Economics, Russia. He received his Ph.D. degree from Lomonosov Moscow State University in 1978 and his Habilitation (Doctor of Sciences) from the Steklov Mathematical Institute of the Russian Academy of Sciences in 1994. He was awarded the State Prize of the USSR for Young Scientists in 1987. He worked as an Alexander von Humboldt Research Fellow in Germany from 1991 to 1993 and as a JSPS Research Fellow in Japan in 1999 and 2004. He has worked as a visiting professor/researcher at Bielefeld University, Germany; the University of Leiden; the University of Paris V; the University of Hong Kong; the Institute of Statistical Mathematics in Tokyo; the National University of Singapore; the University of Melbourne; Shandong University, China; and others. He is currently a Member of the Bernoulli Society. His research lies in limit theorems of probability theory, vector-valued random variables, weak limit theorems, Gaussian processes, approximation in statistics, and transforms of probability distributions.

## *Editorial* **On the Special Issue "Limit Theorems of Probability Theory"**

**Alexander N. Tikhomirov \* and Vladimir V. Ulyanov \***


M. Loève wrote that "the fundamental limit theorems of Probability theory may be classified into two groups. One group deals with the problem of limit laws of sequences of sums of random variables, the other deals with the problem of limits of random variables, in the sense of almost sure convergence, of such sequences. These problems will be labeled, respectively, the Central Limit Problem (CLP) and the Strong Central Limit Problem (SCLP). Like all mathematical problems, the CLP and SCLP are not static; as answers to old queries are discovered they experience the usual development and new problems arise".

The papers in this Special Issue present new directions and new advances for limit theorems in probability theory and its applications. The list of topics is extensive, and it includes classical models of sums of both independent and various types of dependent random variables, probabilities of large deviations, functional limit theorems, and limit theorems for random processes, in high-dimensional spaces, for spectra of random matrices and random graphs, and more.

In [1], Xia Wang and Miaomiao Zhang obtain a large deviation principle for the maximum of the absolute value of partial sums of independent, identically distributed, centered random variables. It is assumed that the "positive" and "negative" tail probabilities of the summands have the same exponential decrease.

Estimating the expected value of a random variable via data-driven methods is one of the most fundamental problems in statistics. In [2], Rundong Luo, Yiming Chen, and Shuai Song present an extension of Olivier Catoni's classical M-estimators of the empirical mean, which focuses on heavy-tailed data, by imposing more precise inequalities on exponential moments of Catoni's estimator. The authors show that their estimators behave better than Catoni's estimators, both in theory and in practice. The results are illustrated on simulated and real data.
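Catoni's construction is not spelled out in this summary; as a rough, hedged illustration, the following sketch (our own, not taken from [2]) implements a Catoni-type M-estimator of the mean using Catoni's influence function and a heuristic choice of the scale parameter.

```python
import numpy as np

def psi(x):
    # Catoni's narrowest influence function:
    # psi(x) = log(1 + x + x^2/2) for x >= 0, and -psi(-x) for x < 0
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x * x)

def catoni_mean(sample, alpha=None, tol=1e-9):
    """Solve sum_i psi(alpha*(x_i - mu)) = 0 for mu by bisection.

    `alpha` is a scale parameter; the heuristic default below is an
    assumption for illustration (Catoni ties it to the variance and
    the desired confidence level)."""
    x = np.asarray(sample, dtype=float)
    if alpha is None:
        alpha = 1.0 / (np.std(x) * np.sqrt(len(x)) + 1e-12)
    # f(mu) is decreasing in mu, non-negative at min(x), non-positive at max(x)
    f = lambda mu: np.sum(psi(alpha * (x - mu)))
    lo, hi = x.min(), x.max()
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For light-tailed data the estimator essentially reproduces the sample mean; its advantage over the empirical mean appears for heavy-tailed samples.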

Paper [3], by Friedrich Götze and Andrei Yu. Zaitsev, studies the connection of the Littlewood–Offord problem to estimates of the concentration functions of some symmetric, infinitely divisible distributions. It is shown that the concentration function of a weighted sum of independent, identically distributed random variables can be estimated in terms of the concentration function of a symmetric, infinitely divisible distribution whose spectral measure is concentrated on the set of plus–minus weights.

There has been a renewed interest in exponential concentration inequalities for stochastic processes in probability and statistics over the last three decades. De la Peña established a good exponential inequality for a discrete-time locally square integrable martingale. In [4], Naiqi Liu, Vladimir V. Ulyanov, and Hanchao Wang obtain de la Peña-type inequalities for a stochastic integral of multivariate point processes. The proof is primarily based on the Doléans-Dade exponential formula and the optional stopping theorem. As an application, they obtain an exponential inequality for the block counting process of the Λ-coalescent.

**Citation:** Tikhomirov, A.N.; Ulyanov, V.V. On the Special Issue "Limit Theorems of Probability Theory". *Mathematics* **2023**, *11*, 3665. https:// doi.org/10.3390/math11173665

Received: 8 August 2023 Accepted: 16 August 2023 Published: 25 August 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).


In [5], Alexander N. Tikhomirov and Dmitry A. Timushev prove the local Marchenko–Pastur law for sparse sample covariance matrices corresponding to rectangular observation matrices and sparse probability. New bounds on the distance between the Laplace transforms of the empirical spectral distribution function of the sparse sample covariance matrices and the Marchenko–Pastur distribution function are obtained in the complex domain. It is assumed that the sparsity probability and the moments of the matrix elements satisfy certain conditions.

In [6], Mihailo Jovanović, Vladica Stojanović, Kristijan Kuk, Brankica Popović, and Petar Čisar describe one of the non-linear (and non-stationary) stochastic models, the Gaussian, or Generalized, Split-BREAK (GSB) process, which is used in the analysis of time series with pronounced and accentuated fluctuations. First, the stochastic structure of the GSB process and its important distributional and asymptotic properties are given. To that end, a method based on characteristic functions (CFs) is used. Various procedures for the estimation of model parameters, asymptotic properties, and numerical simulations of the obtained estimators are also investigated. Finally, as an illustration of the practical application of the GSB process, an analysis of the dynamics and stochastic distribution of the infected and immunized populations in relation to COVID-19 in the Republic of Serbia is presented.

The Poisson Stochastic Index process (PSI-process) is a special kind of random process in which the discrete time of a random sequence is replaced by the continuous time of a "counting" process of Poisson type. In [7], Yuri Yakubovich, Oleg Rusakov, and Alexander Gushchin establish a functional limit theorem for normalized cumulative sums of PSI-processes in the Skorokhod space. This theorem can be used in different ways. The PSI-processes are very simple, and some results can be obtained directly for their sums and imply the corresponding facts for the limiting stationary Gaussian process. On the other hand, the theory of stationary Gaussian processes has been deeply developed in the last few decades, and some results of this theory can have consequences for the pre-limiting processes, which model a number of real-life phenomena.

In [8], Igor Borisov and Maman Jetpisbaev consider a class of additive functionals of a finite or countable collection of the group frequencies of an empirical point process corresponding to an at most countable partition of the sample space. Under broad conditions, it is shown that the asymptotic behavior of the distributions of such functionals is similar to the behavior of the distributions of the same functionals of the accompanying Poisson point process. However, the Poisson versions of the additive functionals under consideration, unlike the original ones, have the structure of sums (finite or infinite) of independent random variables, which makes it possible to reduce the asymptotic analysis of the distributions of additive functionals of an empirical point process to classical problems of the theory of summation of independent random variables.

In [9], Shuya Kanagawa investigates asymptotic expansions for *U*-statistics and *V*-statistics with degenerate kernels and finds order estimates for the remainder terms. This implies the corresponding results for the Cramér–von Mises statistic of a uniform distribution on (0,1). The scheme of the proof is based on three steps. The first is the almost sure convergence of a Fourier series expansion of the kernel function; the key condition for the convergence is the nuclearity of a linear operator defined by the kernel function. The second is a representation of *U*-statistics or *V*-statistics by single sums of Hilbert space valued random variables. The third is the application of asymptotic expansions for single sums of Hilbert space valued random variables.

In [10], Alexander Bulinski and Nikolay Slepov study the convergence rate in the famous Rényi theorem by means of a refinement of the Stein method. Namely, it is demonstrated that the new estimate of the convergence rate of normalized geometric sums to exponential laws, involving the ideal probability metric of the second order, is sharp. Some recent results concerning the convergence rates in the Kolmogorov and Kantorovich metrics are extended as well. In contrast to many previous works, there are no assumptions that the summands of geometric sums are positive and have the same distribution. For the first time, an analogue of the Rényi theorem is established for the model of exchangeable random variables. Furthermore, within this model, a sharp estimate of the convergence rate to a specified mixture of distributions is provided. The convergence rate of the appropriately normalized random sums of random summands to the generalized gamma distribution is estimated. Here, the number of summands follows the generalized negative binomial law. Sharp estimates of the proximity of the distributions of random sums of random summands to the limit law are established both for independent summands and for the model of exchangeable ones. The inverse of the equilibrium transformation of probability measures is introduced and, in this way, a new approximation of Pareto distributions by exponential laws is proposed. Integral probability metrics and techniques of integration with respect to signed measures are essentially employed.

In [11], Yasunori Fujikoshi and Tetsuro Sakurai consider the high-dimensional consistency of KOO methods for selecting response variables in multivariate linear regression with certain covariance structures. The method, named the knock-one-out (KOO) method, decides "selection" or "no selection" for each variable by comparing the model with that variable removed against the full model. The covariance structure is assumed to be one of three: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. A sufficient condition for the model selection consistency of a KOO method is obtained under a high-dimensional asymptotic framework in which the sample size, the number of response variables, and the number of explanatory variables are all large.

In [12], Alexander N. Tikhomirov considers the limit of the empirical spectral distribution of Laplace matrices of generalized random graphs. Applying the Stieltjes transform method, the author proves, under general conditions, that the limit spectral distribution of the Laplace matrices is the free convolution of the semicircular law and the normal law.

In [13], Gerd Christoph and Vladimir V. Ulyanov complete their studies on the formal construction of asymptotic approximations for statistics based on a random number of observations. Second-order Chebyshev–Edgeworth expansions are obtained for asymptotically normally or chi-squared distributed statistics from samples with negative binomial or Pareto-like distributed random sample sizes. The results can be applied to a wide spectrum of asymptotically normally or chi-squared distributed statistics. Random, non-random, and mixed scaling factors for each of the studied statistics produce three different limit distributions. In addition to the expected normal or chi-squared distributions, Student's t-, Laplace, Fisher, gamma, and weighted sums of generalized gamma distributions also occur.

When random variables have densities, the Kolmogorov and total variation distances between their laws admit upper bounds expressed via the *L*1-norm of the difference of the densities. In [14], Yoon-Tae Kim and Hyun-Suk Park derive upper bounds in terms of densities for several probabilistic distances (e.g., the Kolmogorov distance, total variation distance, Wasserstein distance, and Fortet–Mourier distance) between the laws of *F* and *G* in the case where the random variable *F* follows an invariant measure that admits a density and *G* is a random variable that is differentiable in the sense of Malliavin calculus and also admits a density.

In [15], Manuel L. Esquível and Nadezhda P. Krasii describe the structure of random matrices by deterministic matrices forming the skeletons of the random matrices. The authors propose to use an algorithm of matrix substitutions with entries in the finite field of integers modulo some prime number, akin to the algorithm of one-dimensional automatic sequences. A random matrix has the structure of a given skeleton if, to each number of an entry of the skeleton in the finite field, there corresponds a random variable having, at least as its expected value, the corresponding value of the number in the finite field. Affine matrix substitutions are introduced, and fixed-point theorems that allow for the consideration of steady states of the structure, which are essential for efficient observation, are proven. For some more restricted classes of structured random matrices, the parameter estimation of the entries is addressed, as well as the convergence in law, and also some aspects of the spectral analysis of the random operators associated with the random matrix. Finally, aiming at possible applications, it is shown that there is a procedure to associate a canonical random surface to every random structured matrix of a certain class.

In summary, this Special Issue proposes and develops new mathematical methods and approaches, new algorithms and research frameworks, and their applications to solve various nontrivial practical problems. We strongly believe that the selected topics and results will be attractive and useful to the international scientific community, and will contribute to further research in the field of limit theorems in probability theory.

**Acknowledgments:** The research activity of the Guest Editors was conducted within the framework of the HSE University Basic Research Programs and within the program of the Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Large Deviations for the Maximum of the Absolute Value of Partial Sums of Random Variable Sequences**

**Xia Wang \* and Miaomiao Zhang**

Faculty of Science, College of Statistics and Data Science, Beijing University of Technology, Beijing 100124, China; zhangmiaomiaoqd@163.com

**\*** Correspondence: wangxia@bjut.edu.cn

**Abstract:** Let $\{\xi_i : i \ge 1\}$ be a sequence of independent, identically distributed (i.i.d. for short) centered random variables. Let $S_n = \xi_1 + \cdots + \xi_n$ denote the partial sums of $\{\xi_i\}$. We show that the sequence $\{\frac{1}{n}\max_{1\le k\le n}|S_k| : n \ge 1\}$ satisfies the large deviation principle (LDP, for short) with a good rate function under the assumption that $P(\xi_1 \ge x)$ and $P(\xi_1 \le -x)$ have the same exponential decrease.

**Keywords:** large deviation principle; principle of the largest term; maximum of the absolute value of partial sums

#### **1. Introduction**

Throughout this paper, on a probability space $(\Omega, \mathcal{F}, P)$, let $\{\xi_i : i \ge 1\}$ be a sequence of independent, identically distributed (i.i.d.) centered real-valued random variables. Denote by $S_n := \sum_{i=1}^{n}\xi_i$ the partial sums of the sequence $\{\xi_i : i \ge 1\}$.

The seminal paper of Cramér [1] motivates our work. Cramér showed that $\{\frac{1}{n}S_n : n \ge 1\}$ satisfies the large deviation principle (LDP) with rate function $\Lambda^*(x)$ (see Theorem 1) under the famous Cramér condition on exponential moments, i.e., there exists $\delta > 0$ such that $Ee^{\lambda|\xi_1|} < \infty$ for all $|\lambda| < \delta$. Cramér's theorem has the following form for any measurable set $B \subset \mathbb{R}$:

$$-\inf_{x\in B^{\circ}}\Lambda^{*}(x) \le \liminf_{n\to\infty} \frac{1}{n}\log P\Big(\frac{S_n}{n}\in B\Big) \tag{1}$$

$$\le \limsup_{n\to\infty} \frac{1}{n}\log P\Big(\frac{S_n}{n}\in B\Big) \le -\inf_{x\in \bar{B}}\Lambda^{*}(x), \tag{2}$$

where $B^{\circ}$ denotes the interior of $B$ and $\bar{B}$ denotes its closure. We call inequality (1) the large deviations lower bound and inequality (2) the large deviations upper bound. If both hold, then the sequence $\{\frac{1}{n}S_n : n \ge 1\}$ satisfies the LDP with rate function $\Lambda^*(x)$. In other words, the theory of LDP deals with large fluctuations, whose probabilities decay exponentially.
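The exponential decay in (1) and (2) can be observed concretely. The sketch below (illustrative, not from the paper) takes standard normal summands, for which $\Lambda^*(x) = x^2/2$ and the tail of $S_n$ is available in closed form, and checks that $\frac{1}{n}\log P(S_n/n \ge x)$ approaches $-\Lambda^*(x)$:

```python
import math

def log_tail_rate(x, n):
    """(1/n) * log P(S_n/n >= x) for i.i.d. standard normal summands,
    using the exact tail P(S_n >= n*x) = P(N(0,1) >= x*sqrt(n))."""
    p = 0.5 * math.erfc(x * math.sqrt(n) / math.sqrt(2.0))
    return math.log(p) / n

x = 0.5
rate = -x * x / 2.0  # -Lambda*(x) for the standard normal
for n in (10, 100, 1000):
    print(n, log_tail_rate(x, n), rate)
```

The scaled log-probabilities increase toward $-x^2/2$ as $n$ grows, in line with Cramér's theorem; the choice of the normal law here is ours, made only because its tail is exactly computable.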

The tail probability $P(S_n \ge nx)$ for sums of independent random variables has been studied in detail in many papers. Nagaev [2] proved that $\{\frac{1}{n}S_n : n \ge 1\}$ for i.i.d. random variables satisfies the LDP under the assumption that $P(\xi_1 \ge x)$ decreases like a power function. Later, Nagaev [3] obtained bounds for probabilities of partial sums of independent random variables, weakening the requirement to the hypothesis that generalized and ordinary moments are finite. Under the Cramér condition, Kiesel and Stadtmüller [4] extended Cramér's theorem to weighted sums of i.i.d. random variables. Moreover, Gantert, Ramanan and Rembart [5] studied the LDP for weighted sums of i.i.d. random variables with stretched exponential tails.

**Citation:** Wang, X.; Zhang, M. Large Deviations for the Maximum of the Absolute Value of Partial Sums of Random Variables Sequence. *Mathematics* **2022**, *10*, 758. https:// doi.org/10.3390/math10050758

Academic Editors: Alexander Tikhomirov, Vladimir Ulyanov and Mark Kelbert

Received: 14 January 2022 Accepted: 25 February 2022 Published: 27 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

The tail probability $P(\max_{1\le k\le n} S_k \ge nx)$ has also been researched in depth. Under the Cramér condition, Borovkov and Korshunov [6] treated time-homogeneous Markov chains and Shklyaev [7] treated i.i.d. random variables; both obtained an LDP. Soon after, Kozlov [8] obtained LDP results by applying a direct probabilistic approach to $P(\max_{1\le k\le n} S_k \ge nx)$ for i.i.d. non-degenerate random variables obeying the Cramér condition. Recently, Fan, Grama and Liu [9] established the LDP for the sequence $\{\frac{1}{n}\max_{1\le k\le n} S_k : n \ge 1\}$ of martingale differences under a finite subexponential moment condition.

Feller [10] mentioned the importance of estimating the tail probability $P(\max_{1\le k\le n} |S_k| \ge nx)$, which has attracted broad attention in recent decades. Recently, Li [11] established an upper bound for the probability $P(\max_{1\le k\le n} |S_k| \ge nx)$ for martingale differences bounded in $L_p$. For strictly stationary and negatively associated random variables, Xing and Yang [12] obtained exponential inequalities for the maximum of the absolute value of partial sums via classical techniques based on blocking and truncation. Moreover, an upper bound for the tail probability $P(\max_{1\le k\le n} |S_k| \ge nx)$ for martingale differences was obtained by Fan, Grama and Liu [13] when conditional subexponential moments are bounded.

The above results provide only the large deviations upper bound for $\{\frac{1}{n}\max_{1\le k\le n} |S_k| : n \ge 1\}$. To fill this gap, we show that the sequence $\{\frac{1}{n}\max_{1\le k\le n} |S_k| : n \ge 1\}$ of i.i.d. random variables satisfies the LDP under the assumption that $P(\xi_1 \ge x)$ and $P(\xi_1 \le -x)$ have the same exponential decrease (see Corollary 1); i.e., we obtain both the large deviations lower bound and the large deviations upper bound.

This article is organized as follows. We first introduce the necessary definitions and theorems in Section 2. The main theorems and corollaries are presented in Section 3. In Section 4, we provide the lemmas needed to prove our main results, together with the proofs.

#### **2. Preliminaries**

Before we present our results and proofs, we introduce some definitions and theorems, which can be found in [14,15].

**Definition 1.** *(1) A function $I : \mathbb{R} \to [0, \infty]$ is called a rate function if it is non-negative and lower semicontinuous, i.e., the level sets $\{x : I(x) \le \alpha\}$ are closed for all $\alpha \in [0, \infty)$. (2) A rate function $I$ is said to be good if, in addition, its level sets are compact.*

**Definition 2.** *We say that a sequence of random variables $\{\xi_n : n \ge 1\}$ satisfies the LDP in $\mathbb{R}$ with rate function $I$ if $I$ is a rate function and, for any measurable set $B \in \mathcal{B}(\mathbb{R})$, the following holds:*

$$\begin{aligned} -\inf_{x\in B^{\circ}} I(x) &\le \liminf_{n\to\infty} \frac{1}{n}\log P(\xi_n \in B) \\ &\le \limsup_{n\to\infty} \frac{1}{n}\log P(\xi_n \in B) \le -\inf_{x\in \bar{B}} I(x), \end{aligned}$$

*where $B^{\circ}$ denotes the interior of $B$, and $\bar{B}$ denotes its closure.*

**Theorem 1.** *(Cramér's theorem) Let $\{\xi_i : i \ge 1\}$ be a sequence of i.i.d. real-valued random variables on $(\Omega, \mathcal{F}, P)$. Let $S_n = \sum_{i=1}^{n}\xi_i$ be the partial sums, let $\Lambda(\theta)$ be the log moment generating function of $\xi_1$, i.e., $\Lambda(\theta) = \log Ee^{\theta\xi_1}$, and let $\Lambda^*(x)$ be the convex conjugate of $\Lambda$, i.e., $\Lambda^*(x) = \sup_{\theta\in\mathbb{R}}\{\theta x - \Lambda(\theta)\}$. If $\Lambda$ is finite in a neighborhood of zero, then $\{\frac{S_n}{n} : n \ge 1\}$ satisfies the LDP in $\mathbb{R}$ with rate function $\Lambda^*(x)$; i.e., for any measurable set $B \subset \mathbb{R}$:*

$$\begin{aligned} -\inf_{x \in B^{\circ}} \Lambda^{*}(x) &\le \liminf_{n\to\infty} \frac{1}{n}\log P\Big(\frac{S_n}{n}\in B\Big) \\ &\le \limsup_{n\to\infty} \frac{1}{n}\log P\Big(\frac{S_n}{n}\in B\Big) \le -\inf_{x\in \bar{B}} \Lambda^{*}(x). \end{aligned}$$

**Theorem 2.** *(Principle of the largest term) Let an and bn be sequences in* R+*. Then, the following is the case:*

$$\limsup_{n\to\infty} \frac{1}{n}\log(a_n + b_n) \le \limsup_{n\to\infty} \frac{1}{n}\log a_n \vee \limsup_{n\to\infty} \frac{1}{n}\log b_n,$$

*and the following is the case.*

$$\liminf_{n\to\infty} \frac{1}{n}\log(a_n + b_n) \ge \liminf_{n\to\infty} \frac{1}{n}\log a_n \vee \liminf_{n\to\infty} \frac{1}{n}\log b_n.$$
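On the exponential scale, a sum $a_n + b_n$ behaves like its largest term. A tiny numeric illustration (our own, not from the paper) with $a_n = e^{-2n}$ and $b_n = e^{-3n}$, where both bounds collapse to the limit $\max(-2, -3) = -2$:

```python
import math

def scaled_log_sum(n, ra=-2.0, rb=-3.0):
    """(1/n) * log(a_n + b_n) with a_n = exp(ra*n), b_n = exp(rb*n)."""
    # log-sum-exp trick for numerical stability at large n
    m = max(ra * n, rb * n)
    return (m + math.log(math.exp(ra * n - m) + math.exp(rb * n - m))) / n

for n in (1, 10, 100, 1000):
    print(n, scaled_log_sum(n))  # approaches max(ra, rb) = -2
```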

#### **3. Main Results**

Let $\{\xi_i : i \ge 1\}$ be a sequence of i.i.d. centered random variables and denote $S_n := \sum_{i=1}^{n}\xi_i$. We shall investigate the LDP for the sequence $\{\frac{1}{n}\max_{1\le k\le n}|S_k| : n \ge 1\}$. The main results of this paper are as follows.

**Theorem 3.** *Let $\{\xi_i : i \ge 1\}$ be a sequence of i.i.d. random variables. If $E\xi_1 = 0$, $E\xi_1^2 < \infty$ and, for some constants $\alpha \in (0,1)$, $0 < C_1 \le C_2$, the following holds:*

$$-C_2 \le \liminf_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le \limsup_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le -C_1,$$

*then for all x > 0, we have the following.*

$$\begin{aligned} -C_2 x^{\alpha} &\le \liminf_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \\ &\le \limsup_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \le -C_1 x^{\alpha}. \end{aligned}$$

**Theorem 4.** *Let $\{\xi_i : i \ge 1\}$ be a sequence of i.i.d. random variables. If $E\xi_1 = 0$ and, for some constants $\alpha \in (0,1)$, $0 < C_1 \le C_2$, $0 < C_3 \le C_4$, the following holds:*

$$-C_2 \le \liminf_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le \limsup_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le -C_1,$$

$$-C_4 \le \liminf_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \le -x) \le \limsup_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \le -x) \le -C_3,$$

*then for all x > 0, we have the following.*

$$\begin{aligned} -(C_2 \wedge C_4)x^{\alpha} &\le \liminf_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}|S_k| \ge nx) \\ &\le \limsup_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}|S_k| \ge nx) \le -(C_1 \wedge C_3)x^{\alpha}. \end{aligned}$$

**Corollary 1.** *Let $\{\xi_i : i \ge 1\}$ be a sequence of i.i.d. random variables. If $E\xi_1 = 0$ and, for some constants $\alpha \in (0,1)$, $C > 0$, we have the following:*

$$\lim_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) = -C,$$

$$\lim_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \le -x) = -C,$$

*then for all x > 0, the following is obtained.*

$$\lim_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}|S_k| \ge nx) = -Cx^{\alpha}.$$

*Then,* $\{\frac{1}{n}\max_{1\le k\le n}|S_k| : n \ge 1\}$ *satisfies the LDP with the good rate function $I(x) = Cx^{\alpha}$.*
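Corollary 1 lends itself to a quick Monte Carlo illustration. The sketch below (our own; not part of the paper) uses symmetric Weibull-type summands with $P(\xi_1 \ge t) = \frac{1}{2}e^{-t^{\alpha}}$, so that $C = 1$; since the normalization is $n^{\alpha}$, convergence is slow, and at small $n$ the estimated rate is only indicative.

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_rate(n, x, alpha, n_sim=100_000):
    """Monte Carlo estimate of (1/n^alpha) * log P(max_{k<=n} |S_k| >= n*x)
    for symmetric Weibull(alpha) summands, P(xi >= t) = 0.5*exp(-t**alpha),
    for which the constant of Corollary 1 is C = 1."""
    w = rng.weibull(alpha, size=(n_sim, n))          # P(W >= t) = exp(-t**alpha)
    signs = rng.choice([-1.0, 1.0], size=(n_sim, n)) # symmetrize, so E xi = 0
    s = np.cumsum(signs * w, axis=1)                 # partial sums S_1, ..., S_n
    p = (np.abs(s).max(axis=1) >= n * x).mean()
    return np.log(p) / n**alpha

alpha, x = 0.5, 3.0
print(estimate_rate(25, x, alpha))
# Corollary 1 predicts convergence to -C*x**alpha = -sqrt(3) as n grows
```

At moderate $n$ the estimate is a rough negative rate of the right order; matching the limiting constant closely would require much larger $n$ and rare-event simulation techniques.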

#### **4. Proofs of Main Results**

To prove our main results, we need the following lemmas, whose proofs we also provide.

**Lemma 1.** *For a random variable $\xi_1$ with $E\xi_1 = 0$, we assume $E(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}) < \infty$ for some constant $\alpha \in (0,1)$. Set $\eta_1 = \xi_1\mathbf{1}_{\{\xi_1 \le y\}}$ for $y > 0$. Then, the following is the case.*

$$Ee^{y^{\alpha-1}\eta_1} \le 1 + \frac{y^{2\alpha-2}}{2}E(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}).$$

**Proof of Lemma 1.** By Taylor's expansion, we can obtain

$$e^{y^{\alpha-1}\eta_1} \le 1 + y^{\alpha-1}\eta_1 + \frac{y^{2\alpha-2}\eta_1^2}{2}e^{y^{\alpha-1}\eta_1^+}.$$

The following is the case:

$$\begin{aligned} \eta_1^+ &= \xi_1\mathbf{1}_{\{0\le\xi_1\le y\}} \\ &\le y^{1-\alpha}\xi_1^{\alpha}\mathbf{1}_{\{0\le\xi_1\le y\}} \\ &\le y^{1-\alpha}(\xi_1^+)^{\alpha}, \end{aligned}$$

and $\eta_1^2 \le \xi_1^2$. Then, we obtain the following.

$$\begin{aligned} Ee^{y^{\alpha-1}\eta_1} &\le 1 + y^{\alpha-1}E\eta_1 + \frac{y^{2\alpha-2}}{2}E(\eta_1^2 e^{y^{\alpha-1}\eta_1^+}) \\ &\le 1 + y^{\alpha-1}E\xi_1 + \frac{y^{2\alpha-2}}{2}E(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}) \\ &= 1 + \frac{y^{2\alpha-2}}{2}E(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}). \end{aligned}$$

Thus, we complete the proof of Lemma 1.
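As a numeric sanity check of Lemma 1 (illustrative only, not part of the proof), one can compare both sides of the bound by Monte Carlo for a concrete choice of $\xi_1$; here $\xi_1$ is taken standard normal, which is centered and satisfies the moment condition:

```python
import numpy as np

rng = np.random.default_rng(42)

def check_lemma1(alpha=0.5, y=4.0, n_sim=10**6):
    """Monte Carlo comparison of both sides of Lemma 1's bound
    for standard normal xi_1 (an illustrative choice)."""
    xi = rng.standard_normal(n_sim)
    eta = np.where(xi <= y, xi, 0.0)               # eta_1 = xi_1 * 1{xi_1 <= y}
    lhs = np.exp(y**(alpha - 1.0) * eta).mean()    # E exp(y^(alpha-1) * eta_1)
    xplus = np.maximum(xi, 0.0)                    # xi_1^+
    rhs = 1.0 + 0.5 * y**(2.0 * alpha - 2.0) * (xi**2 * np.exp(xplus**alpha)).mean()
    return lhs, rhs

lhs, rhs = check_lemma1()
print(lhs, rhs)  # the lemma guarantees lhs <= rhs
```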

**Lemma 2.** *Assume $\{\xi_i : i \ge 1\}$ is a sequence of i.i.d. random variables. If $E\xi_1 = 0$ and, for some constants $\alpha \in (0,1)$, $C > 0$, we have $E(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}) < \infty$ and*

$$\limsup_{x\to\infty} \frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le -C,$$

*then for all x > 0, the following is the case.*

$$\limsup_{n\to\infty} \frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \le -Cx^{\alpha}.$$

**Proof of Lemma 2.** Set $\eta_i = \xi_i\mathbf{1}_{\{\xi_i \le y\}}$ for $y > 0$. Then, the following is the case.

$$\begin{split} P(\max\_{1\le k\le n} S\_k \ge x) &\le P(\max\_{1\le k\le n} \sum\_{i=1}^k \eta\_i \ge x) + P(\max\_{1\le k\le n} \sum\_{i=1}^k \xi\_i \mathbf{1}\_{\{\xi\_i > y\}} > 0) \\ &= P(\sum\_{i=1}^k \eta\_i \ge x, \exists k \in [1, n]) + P(\max\_{1\le i\le n} \xi\_i > y) \\ &:= P\_1 + P\_2. \end{split} \tag{3}$$

For all $x > 0$, define the stopping time

$$T(x) = \min\Big\{k \in [1, n] : \sum_{i=1}^{k} \eta_i \ge x\Big\}, \quad \text{with } \min \emptyset = 0.$$

We easily obtain

$$\mathbf{1}\_{\{\sum\_{i=1}^{k} \eta\_{i} \ge x, \ \exists k \in [1,n] \}} = \sum\_{k=1}^{n} \mathbf{1}\_{\{T(x) = k\}}.$$

In order to obtain an upper bound for $P_1$, we consider the martingale $Z(\lambda) = \{(Z_k(\lambda), \mathcal{F}_k) : k \ge 0\}$, where $\mathcal{F}_k = \sigma(\xi_1, \xi_2, \cdots, \xi_k)$, $k \ge 0$, and the following is the case.

$$Z\_k(\lambda) = \prod\_{i=1}^k \frac{\exp\{\lambda \eta\_i\}}{E \exp\{\lambda \eta\_i\}}, \qquad Z\_0(\lambda) = 1.$$

For the stopped process, let

$$Z_{T(x)\wedge k}(\lambda) = \prod_{i=1}^{T(x)\wedge k} \frac{\exp\{\lambda \eta_i\}}{E \exp\{\lambda \eta_i\}}.$$

By the optional stopping property, $\{(Z_{T(x)\wedge k}(\lambda), \mathcal{F}_k) : k \ge 0\}$ is also a martingale. Because $E(Z_{T(x)\wedge n}(\lambda)) = E(Z_0(\lambda)) = 1$, we may define the probability measure $dP_{\lambda} := Z_{T(x)\wedge n}\,dP$ and denote the expectation with respect to $P_{\lambda}$ by $E_{\lambda}$.

$$\begin{aligned} P_1 &= E_{\lambda}\Big[Z_{T(x)\wedge n}(\lambda)^{-1}\mathbf{1}_{\{\sum_{i=1}^{k}\eta_i \ge x,\ \exists k\in[1,n]\}}\Big] = E_{\lambda}\Big[\Big(\prod_{i=1}^{T(x)\wedge n}\frac{\exp\{\lambda\eta_i\}}{E\exp\{\lambda\eta_i\}}\Big)^{-1}\sum_{k=1}^{n}\mathbf{1}_{\{T(x)=k\}}\Big] \\ &= \sum_{k=1}^{n} E_{\lambda}\Big[\Big(\prod_{i=1}^{k}\frac{\exp\{\lambda\eta_i\}}{E\exp\{\lambda\eta_i\}}\Big)^{-1}\mathbf{1}_{\{T(x)=k\}}\Big] = \sum_{k=1}^{n} E_{\lambda}\Big[\exp\Big\{-\sum_{i=1}^{k}\log\frac{\exp\{\lambda\eta_i\}}{E\exp\{\lambda\eta_i\}}\Big\}\mathbf{1}_{\{T(x)=k\}}\Big] \\ &= \sum_{k=1}^{n} E_{\lambda}\Big[\exp\Big\{-\lambda\sum_{i=1}^{k}\eta_i + \sum_{i=1}^{k}\log E e^{\lambda\eta_i}\Big\}\mathbf{1}_{\{T(x)=k\}}\Big] = \sum_{k=1}^{n} E_{\lambda}\Big[\exp\Big\{-\lambda\sum_{i=1}^{k}\eta_i + k\log E e^{\lambda\eta_1}\Big\}\mathbf{1}_{\{T(x)=k\}}\Big]. \end{aligned} \tag{4}$$

Under the conditions of Lemma 2, we take $\lambda = y^{\alpha-1}$; by Lemma 1 and the inequality $\log(1+t) \le t$ for all $t \ge 0$, we obtain the following.

$$\begin{aligned} \log E e^{y^{\alpha-1}\eta_1} &\le \log \Big(1 + \frac{y^{2\alpha-2}}{2} E\big(\xi_1^2 \exp\{(\xi_1^+)^{\alpha}\}\big)\Big) \\ &\le \frac{y^{2\alpha-2}}{2} E\big(\xi_1^2 \exp\{(\xi_1^+)^{\alpha}\}\big). \end{aligned} \tag{5}$$

On the set $\{T(x) = k\}$, we have $\sum_{i=1}^{k} \eta_i \ge x$. Combining this fact with (4) and (5), we obtain, for all $x > 0$, the following.

$$\begin{aligned} P_1 &\le \sum_{k=1}^{n} E_{\lambda}\Big(\exp\Big\{-\lambda x + n\frac{y^{2\alpha-2}}{2}E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)\Big\}\mathbf{1}_{\{T(x)=k\}}\Big) \\ &\le \exp\Big\{-y^{\alpha-1}x + n\frac{y^{2\alpha-2}}{2}E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)\Big\} E_{\lambda}\Big(\sum_{k=1}^{n}\mathbf{1}_{\{T(x)=k\}}\Big) \\ &\le \exp\Big\{-y^{\alpha-1}x + n\frac{y^{2\alpha-2}}{2}E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)\Big\}. \end{aligned} \tag{6}$$

Next, using the Markov inequality, we obtain the following.

$$\begin{aligned} P_2 &= P\Big(\bigcup_{i=1}^{n}\{\xi_i > y\}\Big) \le nP(\xi_1 > y) \\ &\le nP\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\} > y^2\exp\{y^{\alpha}\}\big) \\ &\le \frac{n}{y^2}\exp\{-y^{\alpha}\}E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big). \end{aligned} \tag{7}$$

Let *y* = *x*. Combining (3), (6) and (7) together, we obtain the following.

$$\begin{aligned} &P(\max_{1\le k\le n} S_k \ge x) \\ &\le \exp\Big\{-x^{\alpha} + \frac{nE\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{2x^{2-2\alpha}}\Big\} + \frac{nE\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{x^2}e^{-x^{\alpha}} \\ &= e^{-x^{\alpha}}\Big(\exp\Big\{\frac{nE\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{2x^{2-2\alpha}}\Big\} + \frac{nE\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{x^2}\Big). \end{aligned}$$

Now, replacing $x$ by $nx$ in the above inequality, we obtain the following.

$$P(\max_{1 \le k \le n} S_k \ge nx) \le e^{-n^{\alpha}x^{\alpha}}\Big(\exp\Big\{\frac{E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{2n^{1-2\alpha}x^{2-2\alpha}}\Big\} + \frac{E\big(\xi_1^2\exp\{(\xi_1^+)^{\alpha}\}\big)}{nx^2}\Big).$$

Taking logarithms and the limsup on both sides and using the principle of the largest term, we obtain the LDP upper bound.

$$\limsup_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} S_k \ge nx) \le -x^{\alpha}.$$

This completes the proof of Lemma 2.
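The finite-$n$ inequality displayed just before the substitution $x \to nx$ can itself be checked by simulation. The sketch below is a hedged illustration only: the increment distribution ($\xi_i = W_i/16 - 1/8$ with $W_i \sim \mathrm{Weibull}(1/2)$, which is centered and satisfies the moment condition) and all constants are our own choices, not taken from the paper.

```python
import numpy as np

# Monte Carlo check of the finite-n bound (alpha = 1/2, y = x):
#   P(max_k S_k >= x) <= e^{-x^a} * (exp{n*M / (2 x^{2-2a})} + n*M / x^2),
# where M = E(xi^2 exp{(xi^+)^a}).
rng = np.random.default_rng(7)
a, n, x, trials = 0.5, 20, 4.0, 200_000

# W/16 with W ~ Weibull(shape 1/2) has mean 1/8 and tail exp(-4*sqrt(t)),
# so the centered xi below has a finite moment M.
xi = rng.weibull(a, size=(trials, n)) / 16 - 0.125
p_hat = np.mean(np.cumsum(xi, axis=1).max(axis=1) >= x)   # left-hand side

w = rng.weibull(a, size=1_000_000) / 16 - 0.125
M = np.mean(w ** 2 * np.exp(np.maximum(w, 0.0) ** a))
rhs = np.exp(-x ** a) * (np.exp(n * M / (2 * x ** (2 - 2 * a))) + n * M / x ** 2)
```

For these constants the bound is non-vacuous (the right-hand side is below 1), and the simulated frequency falls below it.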

In the following, we prove Theorem 3.

**Proof of Theorem 3.** (i) Firstly, we prove the upper bound. Let $\varepsilon \in (0, 1)$ be fixed and set $\xi'_1 = C_1^{1/\alpha}\big(\frac{C_1 - \varepsilon}{C_1}\big)^{\beta}\xi_1$ with $\beta > \frac{1}{\alpha}$. From the condition given in Theorem 3,

$$\limsup_{x\to\infty} \frac{1}{x^{\alpha}} \log P(\xi_1 \ge x) \le -C_1,$$

we obtain that for every $\varepsilon > 0$ there exists $x_0$ such that, for $x > x_0$,

$$\frac{\log P(\xi_1 \ge x)}{x^{\alpha}} \le -C_1 + \varepsilon;$$

that is,

$$P(\xi_1 \ge x) \le \exp\{-(C_1 - \varepsilon)x^{\alpha}\}.$$

Thus, for every $\varepsilon > 0$ there exists $x_0$ such that, for $x > x_0$,

$$P(\xi'_1 \ge x) \le \exp\{-(C_1 - \varepsilon)^{1-\alpha\beta}C_1^{\alpha\beta-1}x^{\alpha}\}.$$

Then, the following is the case:

$$\begin{aligned} E\{(\xi_1'^{+})^2 \exp\{(\xi_1'^{+})^{\alpha}\}\} &= \int_0^{\infty} P(\xi'_1 \ge x)\,(2x + \alpha x^{\alpha+1})e^{x^{\alpha}}\,dx \\ &\le 2\int_0^{\infty} x e^{-\theta x^{\alpha}}\,dx + \alpha\int_0^{\infty} x^{\alpha+1}e^{-\theta x^{\alpha}}\,dx \\ &< \infty, \end{aligned}$$

where $\theta = (C_1 - \varepsilon)^{1-\alpha\beta}C_1^{\alpha\beta-1} - 1 > 0$. Because $E\xi_1^2 < \infty$, one easily obtains $E(\xi'_1)^2 = C_1^{2/\alpha}\big(\frac{C_1 - \varepsilon}{C_1}\big)^{2\beta}E\xi_1^2 < \infty$. Then, we obtain the following.

$$\begin{aligned} E(\xi'_1)^2 \exp\{(\xi_1'^{+})^{\alpha}\} &= E\big((\xi'_1)^2 \exp\{(\xi_1'^{+})^{\alpha}\}\mathbf{1}_{\{\xi'_1 \ge 0\}} + (\xi'_1)^2 \exp\{(\xi_1'^{+})^{\alpha}\}\mathbf{1}_{\{\xi'_1 < 0\}}\big) \\ &= E\big((\xi'_1)^2 \exp\{(\xi_1'^{+})^{\alpha}\}\mathbf{1}_{\{\xi'_1 \ge 0\}} + (\xi'_1)^2\mathbf{1}_{\{\xi'_1 < 0\}}\big) \\ &\le E(\xi_1'^{+})^2 \exp\{(\xi_1'^{+})^{\alpha}\} + E(\xi'_1)^2 \\ &< \infty. \end{aligned}$$

Thus, the sequence $\{\xi'_i\}$ satisfies the conditions of Lemma 2, and we denote $S'_k = \sum_{i=1}^{k}\xi'_i$. Then, we obtain, for all $x > 0$, the following.

$$\begin{aligned} &\limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S'_k \ge nx) \\ &= \limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P\Big(C_1^{\frac{1}{\alpha}}\Big(1-\frac{\varepsilon}{C_1}\Big)^{\beta}\max_{1\le k\le n} S_k \ge nx\Big) \\ &\le -x^{\alpha}. \end{aligned}$$

Thus, we obtain the following.

$$\limsup_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} S_k \ge nx) \le -C_1\Big(1 - \frac{\varepsilon}{C_1}\Big)^{\alpha\beta} x^{\alpha}.$$

Letting *ε* → 0, we obtain the following.

$$\limsup_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} S_k \ge nx) \le -C_1 x^{\alpha}. \tag{8}$$

(ii) Next, we prove the lower bound. Because $\{\xi_i : i \ge 1\}$ is an i.i.d. sequence, the following is the case.

$$\begin{aligned} P(\max_{1\le k\le n} S_k \ge nx) &\ge P(S_n \ge nx) = P\Big(\xi_1 + \sum_{i=2}^{n}\xi_i \ge n(\varepsilon + x) - n\varepsilon\Big) \\ &\ge P\Big(\Big\{\sum_{i=2}^{n}\xi_i \ge -n\varepsilon\Big\} \cap \{\xi_1 \ge n(\varepsilon + x)\}\Big) \\ &= P\Big(\sum_{i=2}^{n}\xi_i \ge -n\varepsilon\Big)\,P(\xi_1 \ge n(\varepsilon + x)). \end{aligned} \tag{9}$$

By using the weak law of large numbers and the following fact:

$$\{\sum\_{i=2}^n \xi\_i \ge -(n-1)\varepsilon\} \subseteq \{\sum\_{i=2}^n \xi\_i \ge -n\varepsilon\},$$

we know the following.

$$\lim\_{n \to \infty} P(\sum\_{i=2}^{n} \xi\_i \ge -n\varepsilon) = 1. \tag{10}$$

Then, by the condition in Theorem 3, $\liminf_{x\to\infty}\frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \ge -C_2$, we obtain that for every $\varepsilon > 0$ there exists $x_0$ such that, for all $x > x_0$,

$$\frac{\log P(\xi_1 \ge x)}{x^{\alpha}} \ge -C_2 - \varepsilon.$$

Then, the following is the case.

$$P(\xi_1 \ge x) \ge \exp\{-(C_2 + \varepsilon)x^{\alpha}\}.$$

Thus, we obtain the following.

$$P(\xi_1 \ge n(x + \varepsilon)) \ge \exp\{-(C_2 + \varepsilon)[(x + \varepsilon)n]^{\alpha}\}. \tag{11}$$

Combining (9), (10) and (11) together, we easily obtain the following.

$$\begin{aligned} &\liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \\ &\ge \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P\Big(\sum_{i=2}^{n}\xi_i \ge -n\varepsilon\Big) + \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\xi_1 \ge n(x + \varepsilon)) \\ &\ge -(C_2 + \varepsilon)(x + \varepsilon)^{\alpha}. \end{aligned}$$

Letting *ε* → 0, we obtain the following.

$$\liminf_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} S_k \ge nx) \ge -C_2 x^{\alpha}. \tag{12}$$

Finally, by (8) and (12), we obtain, for all $x > 0$, the following.

$$\begin{aligned} -C_2 x^{\alpha} &\le \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \\ &\le \limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \le -C_1 x^{\alpha}. \end{aligned}$$

Thus, we complete the proof of Theorem 3.

In the following, we prove Theorem 4.

**Proof of Theorem 4.** By the condition $\limsup_{x\to\infty}\frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \le -C_1$, we obtain that for every $\varepsilon > 0$ there exists $x_0$ such that, for all $x > x_0$, the following holds.

$$P(\xi_1 \ge x) \le \exp\{-(C_1 - \varepsilon)x^{\alpha}\}.$$

By the condition $\limsup_{x\to\infty}\frac{1}{x^{\alpha}}\log P(\xi_1 \le -x) \le -C_3$, we obtain that for every $\varepsilon > 0$ there exists $x_0$ such that, for all $x > x_0$, $P(\xi_1 \le -x) \le \exp\{-(C_3 - \varepsilon)x^{\alpha}\}$.

Thus, we obtain the following.

$$\begin{aligned} E\xi_1^2 &= \int_0^{\infty} 2xP(|\xi_1| \ge x)\,dx \\ &= \int_0^{x_0} 2xP(|\xi_1| \ge x)\,dx + \int_{x_0}^{\infty} 2xP(|\xi_1| \ge x)\,dx \\ &\le x_0^2 + \int_{x_0}^{\infty} 2xP(\xi_1 \ge x)\,dx + \int_{x_0}^{\infty} 2xP(\xi_1 \le -x)\,dx \\ &\le x_0^2 + \int_{x_0}^{\infty} 2x\exp\{-(C_1 - \varepsilon)x^{\alpha}\}\,dx + \int_{x_0}^{\infty} 2x\exp\{-(C_3 - \varepsilon)x^{\alpha}\}\,dx \\ &< \infty. \end{aligned}$$

Thus, by Theorem 3, we obtain the following:

$$\limsup_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} S_k \ge nx) \le -C_1 x^{\alpha},$$

$$\limsup_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} (-S_k) \ge nx) \le -C_3 x^{\alpha},$$

and we know the following.

$$P(\max\_{1 \le k \le n} |S\_k| \ge n\mathbf{x}) \le P(\max\_{1 \le k \le n} S\_k \ge n\mathbf{x}) + P(\max\_{1 \le k \le n} (-S\_k) \ge n\mathbf{x}).$$

Thus we obtain the following inequality by the principle of the largest term.

$$\begin{aligned} &\limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}|S_k| \ge nx) \\ &\le \limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log\Big(P(\max_{1\le k\le n} S_k \ge nx) + P(\max_{1\le k\le n}(-S_k) \ge nx)\Big) \\ &\le \limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \vee \limsup_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}(-S_k) \ge nx) \\ &\le (-C_1 x^{\alpha}) \vee (-C_3 x^{\alpha}) \\ &= -(C_1 \wedge C_3)x^{\alpha}. \end{aligned} \tag{13}$$

By the conditions given in Theorem 4, $\liminf_{x\to\infty}\frac{1}{x^{\alpha}}\log P(\xi_1 \ge x) \ge -C_2$ and $\liminf_{x\to\infty}\frac{1}{x^{\alpha}}\log P(\xi_1 \le -x) \ge -C_4$, we obtain the following.

$$\liminf\_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max\_{1 \le k \le n} S\_k \ge nx) \ge -C\_2 x^{\alpha},$$

$$\liminf_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max_{1 \le k \le n} (-S_k) \ge nx) \ge -C_4 x^{\alpha}.$$

Since the following is the case:

$$P(\max_{1 \le k \le n} |S_k| \ge nx) \ge P(\max_{1 \le k \le n} S_k \ge nx) \vee P(\max_{1 \le k \le n} (-S_k) \ge nx),$$

then we obtain the following.

$$\begin{aligned} &\liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}|S_k| \ge nx) \\ &\ge \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log\Big(P(\max_{1\le k\le n} S_k \ge nx) \vee P(\max_{1\le k\le n}(-S_k) \ge nx)\Big) \\ &\ge \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n} S_k \ge nx) \vee \liminf_{n\to\infty}\frac{1}{n^{\alpha}}\log P(\max_{1\le k\le n}(-S_k) \ge nx) \\ &\ge (-C_2 x^{\alpha}) \vee (-C_4 x^{\alpha}) \\ &= -(C_2 \wedge C_4)x^{\alpha}. \end{aligned} \tag{14}$$

Combining (13) and (14), we complete the proof of Theorem 4.

**Proof of Corollary 1.** Taking $C_1 = C_2 = C_3 = C_4 = C$ in Theorem 4, we easily obtain the following for all $x > 0$.

$$\lim\_{n \to \infty} \frac{1}{n^{\alpha}} \log P(\max\_{1 \le k \le n} |S\_k| \ge n\mathbf{x}) = -\mathbb{C}\mathbf{x}^{\alpha}.$$

Because the upper bound and the lower bound coincide, $\{\frac{1}{n}\max_{1\le k\le n}|S_k| : n \ge 1\}$ satisfies the LDP with good rate function $I(x) = Cx^{\alpha}$.

#### **5. Conclusions**

We obtained the LDP for the maximum of the absolute value of partial sums of i.i.d. centered random variables under the assumption that $P(\xi_1 \ge x)$ and $P(\xi_1 \le -x)$ have the same exponential decay. In further research, we will consider the LDP for the maximum of the absolute value of partial sums of other types of dependent random variables, such as martingale differences and acceptable random variables.

**Author Contributions:** X.W. is mainly responsible for providing funding acquisition and scientific research. M.Z. is mainly responsible for writing the original draft and scientific research. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is supported by the National Natural Science Foundation of China (No. 62072044) and Beijing Natural Science Foundation (No. 1202001).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank the referees for their very helpful comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **On the M-Estimator under Third Moment Condition**

**Rundong Luo 1, Yiming Chen 2,\* and Shuai Song <sup>3</sup>**


**Abstract:** Estimating the expected value of a random variable by data-driven methods is one of the most fundamental problems in statistics. In this study, we present an extension of Olivier Catoni's classical M-estimators of the empirical mean, which focuses on heavy-tailed data by imposing more precise inequalities on the exponential moments of Catoni's estimator. We show that our estimator behaves better than Catoni's both in theory and in practice. Its performance is illustrated on simulated and real data.

**Keywords:** M-estimator; Catoni's estimator; empirical mean

**MSC:** 60E15; 62F35

#### **1. Introduction**

In this study, we focused on estimating the mean $m = \mathbb{E}X$ of a real random variable $X$, supposing that $X_1, \ldots, X_n$ are independent and identically distributed draws from $X$. It is well known that the empirical mean $\bar{m}_n = n^{-1}\sum_{i=1}^{n} X_i$ is the most popular estimator of $m$, and its theoretical properties have been thoroughly studied [1].

However, recent works have concentrated more on the performance of the estimator when the distribution is heavy-tailed (the second or fourth moment of the distribution does not exist), which is becoming more and more common in many research fields (see, e.g., Embrechts, Klüppelberg, and Mikosch [2]). When the data have a heavy tail, traditional methods such as the empirical mean perform poorly, and appropriate robust estimators are required; this drives related research on M-estimators (generalizations of the maximum likelihood estimator) for the correction of outliers (Huber [3]).

There has been renewed interest in the area of robust statistics over the last several decades. Nemirovsky and Yudin [4], Hsu and Sabato [5], and Jerrum et al. [6] proposed various forms of median-of-means (MOM) estimators to handle data in different situations. These divide the data into several groups of equal size, calculate the empirical mean within each group, and finally take the median of these empirical means as the MOM estimator, which reduces the impact of heavy-tailed data. Tukey and McLaughlin [7] and Huber and Ronchetti [8] tried to improve the performance of the empirical mean by using a truncation of $X$ (they call it the truncated mean), which removes the part of the sample containing the $\gamma n$ maximum and minimum values, depending on a parameter $\gamma \in (0, 1)$, and then averages the remaining values to improve robustness. Catoni [9] and Audibert and Catoni [10] studied the properties of M-estimation for regression problems. Relevant works on robust techniques in various fields are summarized in Bartlett and Mendelson [11], Maronna [12], and Bubeck and Lugosi [13].

Recently, Catoni [14] modified the empirical mean into a new robust estimator. It is easy to observe that the empirical mean is the solution of the following estimating equation

$$\sum\_{i=1}^{n} (X\_i - \mu) = 0.\tag{1}$$

**Citation:** Luo, R.; Chen, Y.; Song, S. On the M-Estimator under Third Moment Condition. *Mathematics* **2022**, *10*, 1713. https://doi.org/10.3390/ math10101713

Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

Received: 7 April 2022 Accepted: 7 May 2022 Published: 17 May 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

We now change the form of Equation (1) to the following.

$$\sum\_{i=1}^{n} \phi[\alpha(X\_i - \mu)] = 0.\tag{2}$$

The solution of (2) is called Catoni's mean estimator, where $\phi : \mathbb{R} \to \mathbb{R}$ is a nondecreasing differentiable truncation function such that, for any $x \in \mathbb{R}$, $-\log\big(1 - x + x^2/2\big) \le \phi(x) \le \log\big(1 + x + x^2/2\big)$, and $\alpha$ is a parameter that ensures the existence of the estimator. We denote Catoni's mean estimator by $\widehat{m}_{n,\alpha}$. The main purpose of the truncation function is to make $\phi(x)$ grow more slowly than $x$, so that the effect of outliers due to heavy tails in $X$ is diminished. Although $\phi(x)$ is not the derivative of some explicit error function, it can still be considered an influence function in robust theory.
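To make the construction concrete, here is a minimal numerical sketch of Catoni's mean estimator: we take the narrowest influence function allowed by the two-sided logarithmic bound on $\phi$ and solve (2) by bisection. The heavy-tailed test distribution and the $x$-independent choice $\alpha = \sqrt{2/(nv)}$ (from Theorem 1 below, with the true variance assumed known) are illustrative assumptions.

```python
import numpy as np

def phi(x):
    # Narrowest nondecreasing truncation function satisfying
    # -log(1 - x + x^2/2) <= phi(x) <= log(1 + x + x^2/2).
    return np.where(x >= 0, np.log1p(x + x * x / 2), -np.log1p(-x + x * x / 2))

def catoni_mean(sample, alpha):
    # Root of R(mu) = sum_i phi(alpha * (X_i - mu)), found by bisection;
    # R is non-increasing in mu and changes sign on [min(sample), max(sample)].
    lo, hi = float(sample.min()), float(sample.max())
    for _ in range(100):
        mid = (lo + hi) / 2
        if phi(alpha * (sample - mid)).sum() > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Demo: Student t with 3 degrees of freedom shifted to mean 1 (variance v = 3).
rng = np.random.default_rng(0)
data = rng.standard_t(df=3, size=5_000) + 1.0
est = catoni_mean(data, alpha=np.sqrt(2.0 / (len(data) * 3.0)))
```

With 5000 heavy-tailed observations, `est` lands close to the true mean 1.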

Under the mild assumption that the variance $v = \mathbb{E}(X - m)^2$ of the distribution exists, and choosing the parameter $\alpha$ to optimize the bounds, Catoni [14] obtained the following performance of $\widehat{m}_{n,\alpha}$.

**Theorem 1.** *Let $X_1, \ldots, X_n$ be independent, identically distributed random variables drawn from $X$. We assume that the mean $m$ and variance $v$ of $X$ exist. For any $x \in \mathbb{R}_+$ and positive integer $n$ such that $n > 2x$, Catoni's mean estimator $\widehat{m}_{n,\alpha}$ with parameter*

$$\alpha = \sqrt{\frac{2x}{nv\left(1 + \frac{2x}{n - 2x}\right)}}$$

*satisfies*

$$\mathbb{P}\left\{|\widehat{m}_{n,\alpha} - m| \ge \sqrt{\frac{2vx}{n - 2x}}\right\} \le 2e^{-x}. \tag{3}$$

*Moreover, if we choose $\alpha$ independent of $x$ as $\alpha = \sqrt{\frac{2}{nv}}$ and assume $n > 2(1 + x)$, then*

$$\mathbb{P}\left\{|\widehat{m}_{n,\alpha} - m| \ge \frac{1+x}{1 - (1+x)/n} \sqrt{\frac{v}{2n}}\right\} \le 2e^{-x}. \tag{4}$$

The method of Catoni [14] has been widely adopted as a robust estimator by Brownlees, Joly, and Lugosi [15], Minsker [16], and Wang et al. [17]. We point out here that the parameter $\alpha$ solves the equation obtained by setting the derivative of Catoni's estimator's deviation with respect to $\alpha$ equal to 0. When $v = 0$, Catoni's estimator's deviation is 0, and no specific $\alpha$ is needed. This also holds for Theorem 2.

The main contribution of this article is to improve Catoni's estimator under a third moment condition; we name the result the third-moment Catoni estimator. Starting from the adjustment of the truncation function, denoted by $\psi(x)$ in our work, as Figure 1 shows, the influence function with the third moment stays closer to the true value than Catoni's original one. We obtain a more precise upper bound on the exponential moment, which leads to a better error bound.

Simultaneously, our estimator performs better for samples drawn from the t-distribution, which is common in many fields of research (see Jones and Faddy [18]). As a special case of a heavy-tailed distribution, the t-distribution has a finite third moment, which satisfies our assumptions on the distribution. We present the superiority of our estimator in a Monte Carlo simulation. We also show the performance of the proposed estimator under a skewed normal distribution to evaluate its adaptability to other distributions.

**Figure 1.** Different choices of the influence function.

The rest of the article is organized as follows. In Section 2, we introduce the main result of the third-moment Catoni's estimator. A Monte Carlo simulation is provided in Section 3 to compare the performance of the proposed estimator with Catoni's estimator for t-distribution. Section 4 examines the performance of the proposed estimator on real data.

#### **2. Main Result**

Let $(X_i)_{i=1}^{n}$ denote an i.i.d. sample drawn from the distribution of $X$. Let $m$, $v$, and $s$ be the mean, variance, and third central moment of $X$, respectively; that is, $\mathbb{E}(X) = m$, $\mathbb{E}(X - m)^2 = v$, and $\mathbb{E}(X - m)^3 = s$.

The influence function $\psi(x)$ here is chosen to be wider than Catoni's original function in order to obtain a more accurate exponential moment. In this study, we assume that

$$\psi(x) = \begin{cases} \log\left(1 + x + x^2/2 + x^3/6\right), & x \ge 0, \\ -\log\left(1 - x + x^2/2 - x^3/6\right), & x < 0. \end{cases} \tag{5}$$

Our mean estimator $\widehat{m}_{n,\alpha}$ is the unique solution of $R_{n,\alpha}(\mu) = 0$, where

$$R_{n,\alpha}(\mu) = \sum_{i=1}^{n} \psi(\alpha(X_i - \mu)). \tag{6}$$
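The estimator defined by (5) and (6) can be sketched numerically as follows. This is a hedged illustration: the test distribution, the bisection routine, and the magnitude of the (negative) parameter $\alpha$ are our own choices; only the influence function $\psi$ comes from (5).

```python
import numpy as np

def psi(x):
    # Influence function (5): odd and nondecreasing, with cubic terms.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = np.log(1 + x[pos] + x[pos] ** 2 / 2 + x[pos] ** 3 / 6)
    out[~pos] = -np.log(1 - x[~pos] + x[~pos] ** 2 / 2 - x[~pos] ** 3 / 6)
    return out

def third_moment_catoni(sample, alpha):
    # Root of R(mu) = sum_i psi(alpha * (X_i - mu)); R is monotone in mu
    # (direction given by the sign of alpha), so bisection applies.
    R = lambda mu: psi(alpha * (sample - mu)).sum()
    lo, hi = float(sample.min()), float(sample.max())
    sign = 1.0 if R(hi) >= R(lo) else -1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if sign * R(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Demo: Student t with 5 degrees of freedom (finite third moment, s = 0)
# shifted to mean 1; alpha < 0 as in Theorem 2, with magnitude chosen ad hoc.
rng = np.random.default_rng(1)
data = rng.standard_t(df=5, size=5_000) + 1.0
est = third_moment_catoni(data, alpha=-np.sqrt(2.0 / (5_000 * 5 / 3)))
```

Because $\psi$ is odd, the root recovers the center of a symmetric sample, here the true mean 1.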

Next, we present our main result, which bounds $\widehat{m}_{n,\alpha} - m$ for an appropriate choice of the negative parameter $\alpha$:

**Theorem 2.** *Let $X_1, \ldots, X_n$ be independent, identically distributed random variables with finite mean $m$, variance $v$, and third central moment $s$. For any $x > 0$, the error bound between the estimator and the mean satisfies*

$$\mathbb{P}\left\{|\widehat{m}_{n,\alpha} - m| \ge 2\left(\sqrt[3]{\frac{q}{2} + \sqrt{\Delta}} + \sqrt[3]{\frac{q}{2} - \sqrt{\Delta}}\right)\right\} \le 2e^{-x}, \tag{7}$$

*where*

$$
\Delta = \left(\frac{q}{2}\right)^2 + \left(\frac{p}{3}\right)^3, \qquad
p = \frac{3 + 3v\alpha^2}{\alpha^2}, \qquad
q = \frac{n\alpha^3 s + 6x - 4n}{n\alpha^3}.
$$

Under some technical assumptions that will be mentioned in the following corollary, we have the following upper bound on the probability of the exponential tail:

**Corollary 1.** *Let $X_1, \ldots, X_n$ be independent, identically distributed random variables with finite mean $m$, variance $v$, and third central moment $s$. For any $x > 0$, assume that $n > \frac{3}{2}(1 + x)$ and $-\sqrt{\frac{4n^3v^3}{729}} \le s \le \sqrt{\frac{4n^3v^3}{729}}$; then*

$$\mathbb{P}\left\{|\widehat{m}_{n,\alpha} - m| \ge (1+x)\sqrt{\frac{v}{n}}\right\} \le 2e^{-x}. \tag{8}$$

**Remark 1.** *It is obvious that, under the assumption that $n$ is a positive integer with $n > \frac{3}{2}(1 + x)$ and $-\sqrt{\frac{4n^3v^3}{729}} \le s \le \sqrt{\frac{4n^3v^3}{729}}$,*

$$\frac{1+x}{1-(1+x)/n}\sqrt{\frac{v}{2n}} \ge (1+x)\sqrt{\frac{v}{n}}.$$

*By assuming that α* < 0*, we obtained a better estimator bias than (4) in Catoni's result.*

**Remark 2.** *When the sample is small, our result is still valid for small $s$. Consider the following example. Let $X_1, \ldots, X_n$ be independent, identically distributed random variables drawn from $X$. Assume that the mean is $m = 0.01$, the variance is $v = 1$, $x = 1$, and $n = 4$; then whenever $-\sqrt{\frac{4n^3v^3}{729}} \le s \le \sqrt{\frac{4n^3v^3}{729}}$, such as $s = 0.2$, the assumption is satisfied and we have*

$$\mathbb{P}(|\widehat{m}_{n,\alpha} - m| \ge 1) \le \frac{2}{e}.$$

For the convenience of the proof, we first present the following lemma (Cardano's formula); refer to Høyrup [19] for more details.

**Lemma 1.** *For any general cubic equation of the form x*<sup>3</sup> + *px* + *q* = 0*, one of the roots over the field of real numbers has the form:*

$$\mathbf{x} = \sqrt[3]{-\frac{q}{2} + \sqrt{\Delta}} + \sqrt[3]{-\frac{q}{2} - \sqrt{\Delta}},\tag{9}$$

*where the discriminant $\Delta$ of the root is given below; when $\Delta > 0$, the cubic equation has one real root, and when $\Delta \le 0$, it has three real roots.*

$$
\Delta = \frac{q^2}{4} + \frac{p^3}{27}.
$$
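A direct transcription of Lemma 1 may be helpful: the sketch below computes the real root of a depressed cubic via (9); the example polynomial is our own.

```python
import numpy as np

def cardano_real_root(p, q):
    # Real root of x^3 + p*x + q = 0 from formula (9); this is the
    # one-real-root branch, which assumes Delta = q^2/4 + p^3/27 > 0.
    delta = q ** 2 / 4 + p ** 3 / 27
    assert delta > 0, "formula (9) as used here requires Delta > 0"
    return np.cbrt(-q / 2 + np.sqrt(delta)) + np.cbrt(-q / 2 - np.sqrt(delta))

# Example: x^3 + 3x - 4 = (x - 1)(x^2 + x + 4), so the real root is x = 1.
root = cardano_real_root(3.0, -4.0)
```

In the proof of Theorem 2 below, $p = (3 + 3v\alpha^2)/\alpha^2 > 0$ forces $\Delta > 0$, which is exactly this branch.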

**Proof of Theorem 2.** By the definition (5) of $\psi(x)$, we have the following exponential moment inequality for $R_{n,\alpha}(\mu)$, for all $\mu \in \mathbb{R}$:

$$\begin{aligned} \mathbb{E}\left[e^{R_{n,\alpha}(\mu)}\right] &\le \left(\mathbb{E}\left[1 + \alpha(X-\mu) + \frac{\alpha^2(X-\mu)^2}{2} + \frac{\alpha^3(X-\mu)^3}{6}\right]\right)^{n} \\ &= \left(1 + \mathbb{E}[\alpha(X-\mu)] + \mathbb{E}\left[\frac{\alpha^2(X-\mu)^2}{2}\right] + \mathbb{E}\left[\frac{\alpha^3(X-\mu)^3}{6}\right]\right)^{n}. \end{aligned} \tag{10}$$

With a brief calculation, we have $\mathbb{E}[(X-\mu)^2] = v + (m-\mu)^2$ and $\mathbb{E}[(X-\mu)^3] = (m-\mu)^3 + 3v(m-\mu) + s$; so, the quantity in (10) can be bounded by the following term:

$$\exp\left(n\alpha(m-\mu) + \frac{n\alpha^2\left(v + (m-\mu)^2\right)}{2} + \frac{n\alpha^3}{6}\left((m-\mu)^3 + 3v(m-\mu) + s\right)\right).$$

Similarly,

$$\begin{aligned} \mathbb{E}\left[e^{-R_{n,\alpha}(\mu)}\right] &\le \left(\mathbb{E}\left[1 - \alpha(X-\mu) + \frac{\alpha^2(X-\mu)^2}{2} - \frac{\alpha^3(X-\mu)^3}{6}\right]\right)^{n} \\ &= \left(1 - \mathbb{E}[\alpha(X-\mu)] + \mathbb{E}\left[\frac{\alpha^2(X-\mu)^2}{2}\right] - \mathbb{E}\left[\frac{\alpha^3(X-\mu)^3}{6}\right]\right)^{n} \\ &= \left(1 - \alpha(m-\mu) + \frac{\alpha^2\left(v + (m-\mu)^2\right)}{2} - \frac{\alpha^3}{6}\left((m-\mu)^3 + 3v(m-\mu) + s\right)\right)^{n} \\ &\le \exp\left(-n\alpha(m-\mu) + \frac{n\alpha^2\left(v + (m-\mu)^2\right)}{2} - \frac{n\alpha^3}{6}\left((m-\mu)^3 + 3v(m-\mu) + s\right)\right). \end{aligned} \tag{11}$$

Let

$$A_1 = n\alpha(m-\mu) + \frac{n\alpha^2\left(v + (m-\mu)^2\right)}{2} + \frac{n\alpha^3}{6}\left((m-\mu)^3 + 3v(m-\mu) + s\right),$$

$$A_2 = -n\alpha(m-\mu) + \frac{n\alpha^2\left(v + (m-\mu)^2\right)}{2} - \frac{n\alpha^3}{6}\left((m-\mu)^3 + 3v(m-\mu) + s\right),$$

whenever $X$ has a finite third moment $s$. From the Markov inequality, we obtain, for any $\mu \in \mathbb{R}$ and $x \in \mathbb{R}_+$,

$$\begin{aligned} \mathbb{P}\{R_{n,\alpha}(\mu) \ge A_1 + x\} &= \mathbb{P}\{\exp(R_{n,\alpha}(\mu)) \ge \exp(A_1 + x)\} \\ &\le \mathbb{E}\left[e^{R_{n,\alpha}(\mu)}\right] / \exp(A_1 + x) \\ &\le e^{-x}. \end{aligned} \tag{12}$$

In the same way, we have

$$\mathbb{P}\{-R\_{n,a}(\mu) \ge A\_2 + x\} \le e^{-x}.\tag{13}$$

The situation is illustrated in Figure 2.

**Figure 2.** Representation of $R_{n,\alpha}(\mu)$ and the cubic equations $C_+(\mu)$ and $C_-(\mu)$.

We can control the estimator *<sup>m</sup>n*,*<sup>α</sup>* by the roots of the cubic equation as follows:

$$\begin{aligned} \mathbb{C}\_{+}(\mu) &= A\_{1} + \mathbb{x} = 0, \\ \mathbb{C}\_{-}(\mu) &= -A\_{2} - \mathbb{x} = 0. \end{aligned} \tag{14}$$

Equation (14) above can be regarded as a pair of cubic equations in $m - \mu$. To solve (14), we first convert each into a standard-form depressed cubic by letting $y_j = m - \mu - \frac{1}{\alpha}$ $(j = 1, 2)$, and then we obtain the following equations:

$$\begin{aligned} y_1^3 + \frac{3 + 3v\alpha^2}{\alpha^2} y_1 + \frac{n\alpha^3 s + 6x - 4n}{n\alpha^3} &= 0, \\ y_2^3 + \frac{3 + 3v\alpha^2}{\alpha^2} y_2 - \frac{n\alpha^3 s + 6x - 4n}{n\alpha^3} &= 0. \end{aligned} \tag{15}$$

For any $\alpha \in \mathbb{R}_-$, according to Lemma 1, since $(3 + 3v\alpha^2)/\alpha^2$ is always positive, $\Delta$ is always greater than 0. In this case, each equation has one real root and two imaginary roots, which means we can control $\widehat{m}_{n,\alpha}$ by the roots of (14) as follows:

$$\begin{aligned} \mu\_+ &= m - \frac{1}{\alpha} + \sqrt[3]{\frac{q}{2} - \sqrt{\Delta}} + \sqrt[3]{\frac{q}{2} + \sqrt{\Delta}}, \\ \mu\_- &= m - \frac{1}{\alpha} - \sqrt[3]{\frac{q}{2} + \sqrt{\Delta}} - \sqrt[3]{\frac{q}{2} - \sqrt{\Delta}}, \end{aligned}$$

where $\Delta$, $p$, and $q$ are the same as above. From the formulas above, we easily obtain $R_{n,\alpha}(\mu_+) \le 0$, implying that $\widehat{m}_{n,\alpha} < \mu_+$ with probability at least $1 - e^{-x}$, since $R_{n,\alpha}(\mu)$ is a non-increasing function. Similarly, $\widehat{m}_{n,\alpha} > \mu_-$ with probability at least $1 - e^{-x}$. Then, by choosing the parameter $\alpha$, we can derive the performance of the estimator $\widehat{m}_{n,\alpha}$ for the bias of the mean $m$. That is, with probability at least $1 - 2e^{-x}$, we have

$$
\mu_- < \widehat{m}_{n,\alpha} < \mu_+.
$$

The proof of Theorem 2 is completed.

**Proof of Corollary 1.** In fact, the deviation on the right-hand side of (7) can be bounded as follows, without restricting the sign of $s$:

$$\begin{aligned} |\widehat{m}_{n,\alpha} - m| &\le 2\left(\left|\sqrt[3]{\frac{q}{2} + \sqrt{\Delta}} + \sqrt[3]{\frac{q}{2} - \sqrt{\Delta}}\right|\right) \\ &< 4\left|\sqrt[3]{\frac{n\alpha^3 s + 6x - 4n}{2n\alpha^3}}\right| \\ &= 4\left|\sqrt[3]{-\frac{2}{\alpha^3} + \frac{3x}{n\alpha^3} + \frac{s}{2}}\right|; \end{aligned} \tag{16}$$

Under the assumption $n > \frac{3}{2}(1 + x)$, which is weaker than Catoni's, (16) can be bounded by

$$\begin{aligned} 4 \left| \sqrt[3]{\frac{2}{\alpha^3} - \frac{2}{\alpha^3} + \frac{s}{2}} \right| \\ = 4 \left| \sqrt[3]{\frac{s}{2}} \right|. \end{aligned} \tag{17}$$

Moreover, assuming that $-\frac{4n^3v^3}{729} \le s \le \frac{4n^3v^3}{729}$, we can obtain that (17) is bounded by $(1 + x)\sqrt{\frac{v}{n}}$; then, (8) holds.

#### **3. Simulation**

In this section, we examined the performance of the estimator under the t-distribution via a Monte Carlo simulation, focusing on its performance in $L^1$ regression. Our data were simulated from a linear model with t-distributed errors and fitted by the proposed estimator, and the regression loss was measured by minimization of the $L^1$ norm.

The details of the simulation are as follows: we considered $n$ independent, identically distributed random pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, where the $X_i$ take values in $\mathbb{R}^3$ and the $Y_i$ in $\mathbb{R}$; the explanatory variables $X_i$ are drawn from a multivariate normal distribution with zero mean and covariance equal to the three-dimensional identity matrix. The response variable $Y_i$ is generated as follows:

$$Y_i = X_i^T \theta + \varepsilon_i, \tag{18}$$

where the parameter vector $\theta$ is set to $(0.25, -0.25, 0.50)$, and $\varepsilon_i$ is an error term with zero mean and unit variance drawn from a Student t-distribution. Our main goal was to estimate the parameter $\theta$ by minimizing the $L^1$ risk

$$\mathbb{E}\left|Y - X^T \theta\right|,$$

and then we defined the $L^1$ estimator $\widehat{\theta}_1$, the classical Catoni mean estimator $\widehat{\theta}_2$, and the third-moment Catoni estimator $\widehat{\theta}_3$ as follows:

$$\begin{aligned} \widehat{\theta}_{1} &= \arg\min_{\theta} \widehat{R}_{1}(\theta) = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \left| Y_{i} - X_{i}^{T}\theta \right|, \\ \widehat{\theta}_{2} &= \arg\min_{\theta} \widehat{R}_{2}(\theta), \quad \text{where } \widehat{R}_{2}(\theta) \text{ solves } \frac{1}{n\alpha} \sum_{i=1}^{n} \phi\left(\alpha\left(\left| Y_{i} - X_{i}^{T}\theta \right| - \mu\right)\right) = 0 \text{ in } \mu, \\ \widehat{\theta}_{3} &= \arg\min_{\theta} \widehat{R}_{3}(\theta), \quad \text{where } \widehat{R}_{3}(\theta) \text{ solves } \frac{1}{n\alpha} \sum_{i=1}^{n} \psi\left(\alpha\left(\left| Y_{i} - X_{i}^{T}\theta \right| - \mu\right)\right) = 0 \text{ in } \mu, \end{aligned} \tag{19}$$
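The inner step of (19) can be sketched in code. This is a minimal illustration under our own assumptions: we substitute the classical Catoni influence function $\phi(x) = \mathrm{sign}(x)\log(1 + |x| + x^2/2)$ for the paper's third-moment $\psi$ (which is defined earlier in the paper and not reproduced here), and the function names are ours:

```python
import numpy as np

def catoni_phi(x):
    # One admissible Catoni influence function (our choice for illustration;
    # the paper's third-moment psi is not reproduced here):
    #   phi(x) = sign(x) * log(1 + |x| + x^2 / 2)
    return np.sign(x) * np.log1p(np.abs(x) + 0.5 * x**2)

def catoni_root(values, alpha=1.0, tol=1e-10):
    """Solve (1/(n*alpha)) * sum_i phi(alpha * (v_i - mu)) = 0 for mu.

    For alpha > 0 the left-hand side is strictly decreasing in mu,
    so a simple bisection finds the unique root."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min() - 1.0, values.max() + 1.0
    f = lambda mu: np.sum(catoni_phi(alpha * (values - mu)))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Simulated data following model (18): X_i ~ N(0, I_3), t-distributed errors.
rng = np.random.default_rng(0)
n = 500
theta = np.array([0.25, -0.25, 0.50])
X = rng.standard_normal((n, 3))
Y = X @ theta + rng.standard_t(df=3, size=n)

# Inner step of (19): for a candidate theta, the robust location of the
# absolute residuals is the root mu of the Catoni equation.
resid = np.abs(Y - X @ theta)
print(round(catoni_root(resid, alpha=1.0), 3))
```

Minimizing `catoni_root` of the absolute residuals over $\theta$ (with any generic optimizer) then yields estimators of the $\widehat{\theta}_2$ type.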

where, for each $\theta$, $\widehat{R}_2(\theta)$ and $\widehat{R}_3(\theta)$ are the roots $\mu$ of the respective equations; $\phi(x)$ is the widest choice defined in Catoni's result, with parameter $\alpha = 1$, the same as in Brownlees's work; and $\psi(x)$ is set as above, with parameter $\alpha = -1$. The measures of the performance of the estimators are as follows:

$$\begin{aligned} R\left(\widehat{\theta}\_{1}\right) - R(\theta) &= \mathbb{E}\left|Y - X^{T}\widehat{\theta}\_{1}\right| - \mathbb{E}\left|Y - X^{T}\theta\right|, \\ R\left(\widehat{\theta}\_{2}\right) - R(\theta) &= \mathbb{E}\left|Y - X^{T}\widehat{\theta}\_{2}\right| - \mathbb{E}\left|Y - X^{T}\theta\right|, \\ R\left(\widehat{\theta}\_{3}\right) - R(\theta) &= \mathbb{E}\left|Y - X^{T}\widehat{\theta}\_{3}\right| - \mathbb{E}\left|Y - X^{T}\theta\right|. \end{aligned} \tag{20}$$

The simulation experiments were repeated with different sample sizes, ranging from 50 to 1000, and with degrees of freedom of the t-distribution ranging from 1 to 7. Each sample-size setting was replicated 1000 times, and for each replication, we evaluated the performance of the regression on a fresh sample $(X_1', Y_1'), (X_2', Y_2'), \ldots, (X_m', Y_m')$, i.i.d. with the sample $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$. We used the following quantity, called the excess risk, to evaluate the performance of the regression:

$$\begin{split} \tilde{\boldsymbol{R}} \left( \widehat{\boldsymbol{\theta}}\_{1} \right) &= \frac{1}{m} \sum\_{i=1}^{m} \left| \boldsymbol{Y}\_{i}^{\prime} - \boldsymbol{Z}\_{i}^{T} \widehat{\boldsymbol{\theta}}\_{1} \right|^{2}, \\ \tilde{\boldsymbol{R}} \left( \widehat{\boldsymbol{\theta}}\_{2} \right) &= \frac{1}{m} \sum\_{i=1}^{m} \left| \boldsymbol{Y}\_{i}^{\prime} - \boldsymbol{Z}\_{i}^{T} \widehat{\boldsymbol{\theta}}\_{2} \right|^{2}, \\ \tilde{\boldsymbol{R}} \left( \widehat{\boldsymbol{\theta}}\_{3} \right) &= \frac{1}{m} \sum\_{i=1}^{m} \left| \boldsymbol{Y}\_{i}^{\prime} - \boldsymbol{Z}\_{i}^{T} \widehat{\boldsymbol{\theta}}\_{3} \right|^{2}. \end{split} \tag{21}$$
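The evaluation step in (21) can be sketched as a small Monte Carlo routine. This is our own illustrative code (the function name and settings are hypothetical); the held-out sample plays the role of the $(Z_i, Y_i')$ pairs above:

```python
import numpy as np

def evaluation_risk(theta_hat, theta_true, m=10_000, seed=1, df=3):
    """Monte Carlo version of (21): average squared residual of theta_hat
    on an independent sample drawn from the same model (18)."""
    rng = np.random.default_rng(seed)
    Xp = rng.standard_normal((m, 3))
    Yp = Xp @ theta_true + rng.standard_t(df, size=m)
    return np.mean((Yp - Xp @ theta_hat) ** 2)

theta = np.array([0.25, -0.25, 0.50])
r_true = evaluation_risk(theta, theta)        # risk of the true parameter
r_off = evaluation_risk(theta + 0.3, theta)   # risk of a perturbed estimate
print(r_true < r_off)
```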

Figure 3 displays the excess risk of the three estimators when $n = 500$ and the degrees of freedom of the t-distribution ranged from 1 to 7.

**Figure 3.** Excess risk varies with degrees of freedom.

The results of the Monte Carlo simulation, including the performance of the estimator for different $n$, are presented in Table 1. We also compared the performance of the proposed estimator with other estimators under various risks in Table 2, where the sample size is $n = 500$ and the degrees of freedom are $d = 1$. Here, $L^1$ represents the general $L^1$ regression; $C$ and $C_3$ denote the original Catoni estimator and our third-moment Catoni estimator, respectively; and ER, RB, and SMSE represent the excess risk, the relative bias $\left(|\widehat{\theta}_2 - \theta_2|/\theta_2\right.$, with $\left.\widehat{\theta} = \frac{1}{1000}\sum_{j=1}^{1000}\widehat{\theta}(j)\right)$, and the square root of the mean square error $\left(\sqrt{MSE} = \sqrt{\frac{1}{1000}\sum_{j=1}^{1000}\big[\widehat{\theta}_2(j) - \theta_2\big]^2}\right)$.
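The RB and SMSE summaries above can be computed as follows; this is a minimal sketch with synthetic replication values (the simulated estimates below are hypothetical, not the paper's):

```python
import numpy as np

# Illustrative computation of the RB and sqrt(MSE) summaries over 1000
# replications; theta_hat[j] stands for the estimate of the second
# coordinate theta_2 = -0.25 in replication j (synthetic values).
rng = np.random.default_rng(2)
theta2 = -0.25
theta_hat = theta2 + 0.05 * rng.standard_normal(1000)

rb = abs(theta_hat.mean() - theta2) / abs(theta2)   # relative bias (RB)
rmse = np.sqrt(np.mean((theta_hat - theta2) ** 2))  # sqrt of MSE (SMSE)
print(rb, rmse)
```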

We can see from the tables that when the distribution has a heavy tail, our estimator performs better than the other two estimators in most cases, and the excess risk of the estimator decreases as the sample size increases. At the same time, as the degrees of freedom of the t-distribution rise, the tail of the t-distribution becomes thinner and closer to that of the normal distribution, and the excess-risk performance of all procedures improves significantly; additionally, the proposed estimator also performs well under different risks.

**Table 1.** The excess risk of the *L*1, Catoni, and third-moment Catoni regression estimator for different degrees of freedom and sample size *n*.



**Table 2.** Comparisons of the performance between the proposed estimator and other estimators with various risks.

We also examined the performance of the third-moment Catoni estimator on a skew normal distribution in Table 3; the model still follows (18), where $\varepsilon_i$ follows a skew normal distribution with shape parameter $\alpha = 1, 3, 5$ and the other settings are unchanged. We can conclude from the table that the bias of the improved estimator is still smaller than that of the original one. However, the deviation of the estimator did not change significantly as the shape parameter $\alpha$ varied. We suppose that this results from the tail behavior of the skew normal distribution: the existence of its fourth moment conflicts with the usual assumption that the fourth moment of a heavy-tailed distribution does not exist. At the same time, neither Catoni's estimator nor our estimator performed better than the estimator obtained by $L^1$ regression.

**Table 3.** The excess risk of the *L*1, Catoni, and third-moment Catoni regression estimator on a skewed normal distribution.


#### **4. Empirical Analysis**

In this section, we used the proposed procedure to study the dataset "tumor cell resistance to death", an artificial dataset consisting of two different types of tumor cells, A and B; the experiment records their resistance to different doses of experimental drugs. The explanatory variable $X_i$ is the dose of the drug, and the response variable $Y_i$ is a score representing the resistance to death, ranging from 0 to 4. These data are available in the R lqr package; Galarza et al. [20] studied these data by the quantile regression method.

In Figures 4–7, we display the QQplot and the log-QQplot of the scores for cell A and cell B. It can be seen that the distributions of the scores for both cells lack normality, whereas the log-scores are approximately normal; besides, the boxplot and the bee colony diagram in Figures 8 and 9 show that both cell A and cell B have heavy tails, which leads us to focus on the following regression model:

$$\log(Y_i) = \beta_0 + \beta_1 X_i,$$

where $Y_i$ and $X_i$ are defined as before. Our focus was on estimating the parameters $\beta_0$ and $\beta_1$ via the solution of the following equation:

$$\hat{r}\_{\beta}(u) = \frac{1}{n\alpha} \sum\_{i=1}^{n} \psi \left( \alpha \left( \left| \log(\boldsymbol{Y}\_i) - \beta\_0 - \boldsymbol{X}\_i^\top \beta\_1 \right| - u \right) \right) = 0.$$

**Figure 4.** QQplot for cell A.

**Figure 5.** QQplot for cell B.

**Figure 6.** log-QQplot for cell A.

**Figure 7.** log-QQplot for cell B.

Let $\widehat{R}_C(\beta)$ denote the solution of $\widehat{r}_{\beta}(u) = 0$; then, as in (19), the Catoni regression estimator of $\beta_0$ and $\beta_1$ is defined as the minimizer of $\widehat{R}_C(\beta)$ over $\beta$.

**Figure 8.** Boxplot about the log-scores for the two types of cells.

**Figure 9.** The bee colony diagram about the log-scores for the two types of cells.

Moreover, we compared the proposed estimator with the classical OLS estimator in Figures 10 and 11. The residual plots are shown in Figures 12–15, from which we can conclude that the residuals of the third-moment Catoni regression are distributed more uniformly; furthermore, the mean squared errors of the third-moment Catoni regression and OLS regression were 0.1120 and 0.1255 for cell A and 0.2268 and 0.2335 for cell B, respectively, which indicates that the proposed method yields a better fit.

**Figure 10.** Regression for cell A.

**Figure 11.** Regression for cell B.

**Figure 12.** OLS regression residual plot for cell A.

**Figure 13.** Third-moment Catoni regression residual plot for cell A.

**Figure 14.** OLS regression residual plot for cell B.

**Figure 15.** Third-moment Catoni regression residual plot for cell B.

#### **5. Discussion**

Estimating the mean of random variables is a classical issue in statistics [1] and has been well studied; however, with the discovery of heavy-tailed distributions in many research fields, their presence poses an important challenge. When the data have heavy tails, traditional estimators such as the empirical mean usually perform poorly. Therefore, finding an appropriate robust procedure is a well-known problem that has aroused great interest. A new estimator based on reconstructing the structure of the empirical mean was proposed by Catoni, and it has excellent theoretical properties regarding the bias.

The Catoni estimator relies on the existence of the variance $v$ of the random variable. Therefore, it is an interesting question whether, under a different moment assumption, the estimator can perform better. In this study, we assumed that the third moment $s$ of the data exists, and a more accurate upper bound on the exponential moment was obtained, which motivates an estimator with a smaller bias. To a certain extent, this assumption reduces the robustness to outliers, but the effect is minimal for heavy-tailed distributions (for which the fourth moment does not exist).

In future work, we have the following goals. First, we believe that our method can be applied as an improved mean estimator in any relevant model, as long as the third moment of the distribution exists, since it has good theoretical properties and wide applicability. Second, it would be interesting to compare the bias bound of the proposed estimator with the minimax bound. Finally, the estimation of the variance in regression models is very important in statistical inference. The deviation of the estimator from the true value given in our main theoretical results can be regarded as a confidence interval based on a known variance; therefore, the proposed estimator is not directly suitable for variance estimation, but how a proper variance estimator affects the bias of our estimator is an interesting question. We will consider variance estimation under heavy-tailed distributions in later work.

**Author Contributions:** Conceptualization, Y.C.; methodology, Y.C. and R.L.; investigation, Y.C. and R.L.; software, S.S.; writing, Y.C. and R.L. Y.C. designed the framework of this study and substantively revised it; R.L. and Y.C. performed the methodology and simulation; S.S. implemented the empirical analysis. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Natural Science Foundation of China (No. 72073082).

**Data Availability Statement:** The dataset for the empirical analysis can be derived from the following resource available in CRAN, https://cran.r-project.org/web/packages/lqr/index.html, accessed on 12 February 2022.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A New Bound in the Littlewood–Offord Problem**

**Friedrich Götze <sup>1</sup> and Andrei Yu. Zaitsev 2,3,\***


**Abstract:** The paper studies a connection of the Littlewood–Offord problem with estimating the concentration functions of some symmetric infinitely divisible distributions. It is shown that the concentration function of a weighted sum of independent identically distributed random variables can be estimated in terms of the concentration function of a symmetric infinitely divisible distribution whose spectral measure is concentrated on the set of plus-minus weights.

**Keywords:** concentration functions; inequalities; the Littlewood–Offord problem; sums of independent random variables

**MSC:** 60F05; 60E15; 60G50

The aim of the present work is to provide a supplement to the paper of Eliseeva and Zaitsev [1]. We studied a connection of the Littlewood–Offord problem with estimating the concentration functions of some symmetric infinitely divisible distributions. In the study, we repeat the arguments of [1], adding, at the last step, an application of Jensen's inequality.

Let $X, X_1, \ldots, X_n$ be independent identically distributed (i.i.d.) random variables. The concentration function of a $d$-dimensional random vector $Y$ with distribution $F = \mathcal{L}(Y)$ is defined by the equality

$$Q(F, \lambda) = \sup\_{x \in \mathbb{R}^d} \mathbf{P}(Y \in x + \lambda B), \quad 0 \le \lambda \le \infty,$$

where $B = \{x \in \mathbf{R}^d : \|x\| \le 1/2\}$. Of course, $Q(F, \infty) = 1$. Let $a = (a_1, \ldots, a_n)$, where $a_k = (a_{k1}, \ldots, a_{kd}) \in \mathbf{R}^d$, $k = 1, \ldots, n$. In this paper, we studied the behavior of the concentration functions of the weighted sums $S_a = \sum_{k=1}^{n} X_k a_k$ with respect to the properties of the vectors $a_k$. Interest in this subject has increased considerably in connection with the study of eigenvalues of random matrices (see, for instance, Friedland and Sodin [2], Rudelson and Vershynin [3,4], Tao and Vu [5,6], Nguyen and Vu [7], Vershynin [8], Tikhomirov [9], Livshyts, Tikhomirov and Vershynin [10], Campos et al. [11]). For a detailed history of the problem, we refer to a review of Nguyen and Vu [12]. The authors of the above articles (see also Halász [13]) called this question the Littlewood–Offord problem, since this problem was first considered in 1943 by Littlewood and Offord [14] in connection with the study of random polynomials. They considered a special case, where the coefficients $a_k \in \mathbf{R}$ are one-dimensional and $X$ takes the values ±1 with probabilities 1/2.
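The classical special case admits a concrete illustration (our toy example, not from the paper): with one-dimensional weights $a_k = 1$ and $X = \pm 1$ with probabilities $1/2$, the concentration function $Q(F_a, 0)$ is the largest point mass of $S_a$, namely the central binomial probability, which decays like $n^{-1/2}$:

```python
from math import comb

def max_point_mass(n):
    # Largest atom of S_a = X_1 + ... + X_n with a_k = 1 and X_k = +/-1:
    # the central binomial probability C(n, n//2) / 2^n, of order n**-0.5.
    return comb(n, n // 2) / 2**n

for n in (10, 100, 1000):
    print(n, max_point_mass(n))
```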

The recent achievements in estimating the probabilities of singularity of random matrices [9–11] were based on the Rudelson and Vershynin [3,4,8] method of the *least common denominator*. Note that the results of [2,4,8] (concerning the Littlewood–Offord problem) were improved and refined in [15–17].

**Citation:** Götze, F.; Zaitsev, A.Y. A New Bound in the Littlewood–Offord Problem. *Mathematics* **2022**, *10*, 1740. https://doi.org/10.3390/ math10101740

Academic Editor: Christophe Chesneau

Received: 11 April 2022 Accepted: 10 May 2022 Published: 19 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Now, we introduce some notation. In the sequel, let $F_a$ denote the distribution of the sum $S_a$, let $E_y$ be the probability measure concentrated at a point $y$, and let $G$ be the distribution of the symmetrized random variable $\widetilde{X} = X_1 - X_2$. For $\delta \ge 0$, we denote

$$p(\delta) = G\{z : |z| > \delta\}.\tag{1}$$

The symbol *c* will be used for absolute positive constants which may be different, even in the same formulas.

Writing $A \ll B$ means that $|A| \le cB$. Furthermore, we will write $A \asymp B$ if $A \ll B$ and $B \ll A$. We will write $A \ll_d B$ if $|A| \le c(d)B$, where $c(d) > 0$ depends on $d$ only. Similarly, $A \asymp_d B$ if $A \ll_d B$ and $B \ll_d A$. The scalar product in $\mathbf{R}^d$ will be denoted $\langle \cdot, \cdot \rangle$. Later, $\lfloor x \rfloor$ is the largest integer $k$ such that $k < x$. For $x = (x_1, \ldots, x_n) \in \mathbf{R}^n$, we will use the norms $\|x\|^2 = x_1^2 + \cdots + x_n^2$ and $|x| = \max_j |x_j|$. We denote by $\widehat{F}(t)$, $t \in \mathbf{R}^d$, the characteristic function of a $d$-dimensional distribution $F$.

Products and powers of measures will be understood in the convolution sense. For an infinitely divisible distribution $F$ and $\lambda \ge 0$, we denote by $F^{\lambda}$ the infinitely divisible distribution with characteristic function $\widehat{F}^{\lambda}(t)$.

The elementary properties of concentration functions are well studied (see, for instance, refs [18–20]). It is known that

$$Q(F,\mu) \ll\_d (1 + \lfloor \mu/\lambda \rfloor)^d Q(F,\lambda) \tag{2}$$

for any *μ*, *λ* > 0. Hence,

$$Q(F, c\lambda) \asymp\_d Q(F, \lambda). \tag{3}$$

Let us formulate a generalization of the classical Esséen inequality [21] to the multivariate case ([22], see also [19]):

**Lemma 1.** *Let τ* > 0 *and let F be a d-dimensional probability distribution. Then,*

$$Q(F, \tau) \ll_d \tau^d \int_{|t| \le 1/\tau} |\widehat{F}(t)| \, dt. \tag{4}$$

In the general case, *Q*(*F*, *τ*) cannot be estimated from below by the right hand side of inequality (4). However, if we assume additionally that the distribution *F* is symmetric and its characteristic function is non-negative for all *t* ∈ **R**, then we have the lower bound:

$$Q(F, \tau) \gg\_d \tau^d \int\_{|t| \le 1/\tau} \widehat{F}(t) \, dt,\tag{5}$$

and, therefore,

$$Q(F, \tau) \asymp_d \tau^d \int_{|t| \le 1/\tau} \widehat{F}(t) \, dt,\tag{6}$$

(see [23] or [18], Lemma 1.5 of Chapter II, for $d = 1$). In the multivariate case, relations (5) and (6) may be found in Zaitsev [24]. The use of relation (6) allows us to simplify the arguments of Friedland and Sodin [2], Rudelson and Vershynin [4] and Vershynin [8], which were applied to the Littlewood–Offord problem (see [15–17]).

The main result of this paper is a general inequality which reduces the estimation of concentration functions in the Littlewood–Offord problem to the estimation of concentration functions of some infinitely divisible distributions. This result is formulated in Theorem 1.

For *z* ∈ **R**, introduce the distribution *Hz* with the characteristic function

$$\hat{H}\_z(t) = \exp\left(-\frac{1}{2}\sum\_{k=1}^n \left(1 - \cos(\langle t, a\_k \rangle z)\right)\right). \tag{7}$$

It depends on the vector *a*. It is clear that *Hz* is a symmetric infinitely divisible distribution. Therefore, its characteristic function is positive for all *t* ∈ **R***d*.

$$\text{Recall that } G = \mathcal{L}(X\_1 - X\_2) \text{ and } F\_a = \mathcal{L}(S\_a) \text{, where } S\_a = \sum\_{k=1}^n X\_k a\_k.$$

**Theorem 1.** *Let V be an arbitrary one-dimensional Borel measure, such that λ* = *V*{**R**} > 0*, and V* ≤ *G, that is, V*{*B*} ≤ *G*{*B*}*, for any Borel set B. Then, for any τ* > 0*, we have*

$$Q(F_a, \tau) \ll_d \int_{z \in \mathbf{R}} Q(H_1^{\lambda}, \tau|z|^{-1}) \, W\{dz\},\tag{8}$$

*where W* = *λ*−1*V.*

**Corollary 1.** *For any ε*, *τ* > 0*, we have*

$$Q(F_a, \tau) \ll_d Q\left(H_1^{p(\tau/\varepsilon)}, \varepsilon\right),\tag{9}$$

*where p*(·) *is defined in* (1)*.*

In order to verify Corollary 1, we note that the distribution $G = \mathcal{L}(\widetilde{X})$ may be represented as the mixture

$$G = p_0 G_0 + p_1 G_1, \quad \text{where} \quad p_j = \mathbf{P}\{\widetilde{X} \in A_j\}, \quad j = 0, 1,$$

$A_0 = \{x : |x| \le \tau/\varepsilon\}$, $A_1 = \{x : |x| > \tau/\varepsilon\}$, and $G_j$ are the probability measures defined for $p_j > 0$ by the formula $G_j\{B\} = G\{B \cap A_j\}/p_j$, for any Borel set $B$. In fact, $G_j$ is the conditional distribution of $\widetilde{X}$, given that $\widetilde{X} \in A_j$. If $p_j = 0$, then we can take $G_j$ to be an arbitrary probability measure.

The conditions of Theorem 1 are satisfied for $V = p_1 G_1$, with $\lambda = p_1 = p(\tau/\varepsilon)$ and $W = G_1$. Inequalities (2) and (6) imply that

$$\begin{split} Q(F_a, \tau) &\ll_d \int_{z \in A_1} Q\left(H_1^{\lambda}, \tau|z|^{-1}\right) W\{dz\} \\ &\le \sup_{|z| \ge \tau/\varepsilon} Q\left(H_1^{p(\tau/\varepsilon)}, \tau/|z|\right) = Q\left(H_1^{p(\tau/\varepsilon)}, \varepsilon\right), \end{split} \tag{10}$$

proving (9).

Applying Theorem 1 with *V* of the form

$$V\{dz\} = \left(1 + \lfloor \tau(\varepsilon|z|)^{-1} \rfloor\right)^{-d} G\{dz\},\tag{11}$$

and using inequality (2), we obtain the following.

**Corollary 2.** *For any ε*, *τ* > 0*, we have*

$$Q(F_a, \tau) \ll_d \lambda^{-1} Q(H_1^{\lambda}, \varepsilon),\tag{12}$$

*where*

$$\lambda = \lambda(G, \tau/\varepsilon) = V\{\mathbf{R}\} = \int\_{z \in \mathbf{R}} \left( 1 + \lfloor \tau(\varepsilon|z|)^{-1} \rfloor \right)^{-d} G\{dz\}.\tag{13}$$

It is clear that $\lfloor \tau(\varepsilon|z|)^{-1} \rfloor = 0$ if $|z| > \tau/\varepsilon$. Therefore, $\lambda = \lambda(G, \tau/\varepsilon) \ge p(\tau/\varepsilon)$ and, hence, $Q(H_1^{\lambda}, \varepsilon) \le Q\left(H_1^{p(\tau/\varepsilon)}, \varepsilon\right)$. Thus, if $\lambda \asymp_d 1$, then inequality (12) of Corollary 2 is stronger than inequality (9) of Corollary 1.

The proof of Theorem 1 is based on elementary properties of concentration functions. We repeat the arguments of [1], adding, at the last step, an application of Jensen's inequality. In [1], inequality (2) was used instead. The main result of [1] does not imply Corollary 2. Note that $H_1^{\lambda}$ is an infinitely divisible distribution with the Lévy spectral measure $M^{\lambda} = \frac{\lambda}{4}\, M^{*}$, where $M^{*} = \sum_{k=1}^{n} \left(E_{a_k} + E_{-a_k}\right)$. It is clear that the assertions of Theorem 1 and Corollaries 1 and 2 may be treated as statements about the measure $M^{*}$.

Corollary 1 was already proved earlier in [1,25], see also [26] for the case *τ* = 0. It was used essentially in [25,27] to show that Arak's inequalities for concentration functions may be used for investigations of the Littlewood–Offord problem. Arak has shown that if the concentration function of infinitely divisible distribution is relatively large, then the spectral measure of this distribution is concentrated in a neighborhood of a set with simple arithmetical structure. Together with Corollary 1, this means that a large value of *Q*(*Fa*, *τ*) implies a simple arithmetical structure of the set {±*ak*}*<sup>n</sup> <sup>k</sup>*=1. This statement is similar to the so-called "inverse principle" in the Littlewood–Offord problem (see [5,7,12]).

Note that using the results of Arak [23,28] (see also [18]), one could derive from Corollary 1 inequalities similar to the bounds for concentration functions in the Littlewood–Offord problem obtained in a paper of Nguyen and Vu [7] (see also [12]). A detailed discussion of this fact is presented in [25,27]. We noticed that Corollary 2 may be stronger than Corollary 1. Therefore, the results of [25,27] could be improved (in the sense of the dependence of constants on the distribution of $X_1$) by replacing an application of Corollary 1 with an application of Corollary 2. The authors are going to devote a separate publication to this topic.

**Proof of Theorem 1.** Let us show that, for an arbitrary probability distribution $W$ and $\lambda, T > 0$,

$$\begin{split} \log \int\_{|t| \le T} \exp \left( -\frac{1}{2} \sum\_{k=1}^{n} \int\_{z \in \mathbb{R}} \left( 1 - \cos(\langle t, a\_k \rangle z) \right) \lambda \, \mathcal{W} \{ dz \} \right) dt \\ \le \int\_{z \in \mathbb{R}} \left( \log \int\_{|t| \le T} \exp \left( -\frac{\lambda}{2} \sum\_{k=1}^{n} \left( 1 - \cos(\langle t, a\_k \rangle z) \right) \right) dt \right) \mathcal{W} \{ dz \} \\ = \int\_{z \in \mathbb{R}} \left( \log \int\_{|t| \le T} \hat{H}\_z^{\lambda}(t) \, dt \right) \mathcal{W} \{ dz \}. \end{split} \tag{14}$$

It suffices to prove (14) for discrete distributions $W = \sum_{j=1}^{\infty} p_j E_{z_j}$, where $0 \le p_j \le 1$, $z_j \in \mathbf{R}$, $\sum_{j=1}^{\infty} p_j = 1$. Applying in this case the generalized Hölder inequality, we have

$$\begin{split} \int\limits\_{|t| \le T} \exp\left(-\frac{1}{2} \sum\_{k=1}^{n} \int\limits\_{z \in \mathbb{R}} \left(1 - \cos(\langle t, a\_k \rangle z)\right) \lambda \, W\{dz\}\right) dt \\ &= \int\limits\_{|t| \le T} \exp\left(-\frac{\lambda}{2} \sum\_{j=1}^{\infty} p\_j \sum\_{k=1}^{n} \left(1 - \cos(\langle t, a\_k \rangle z\_j)\right)\right) dt \\ &\le \prod\_{j=1}^{\infty} \left(\int\limits\_{|t| \le T} \exp\left(-\frac{\lambda}{2} \sum\_{k=1}^{n} \left(1 - \cos(\langle t, a\_k \rangle z\_j)\right)\right) dt\right)^{p\_j}. \end{split} \tag{15}$$

Taking logarithms of the left- and right-hand sides of (15), we get (14). In the general case, we can approximate $W$ by discrete distributions in the sense of weak convergence and pass to the limit. Note also that the integrals $\int_{|t| \le T} (\cdot) \, dt$ in (14) may be replaced by integrals $\int (\cdot) \, \mu\{dt\}$ with respect to an arbitrary Borel measure $\mu$.

Since, for the characteristic function $\widehat{U}(t)$ of a random vector $Y$, we have

$$|\widehat{U}(t)|^2 = \mathbf{E} \exp\left(i \langle t, \widetilde{Y} \rangle\right) = \mathbf{E} \cos\left(\langle t, \widetilde{Y} \rangle\right),$$

where $\widetilde{Y}$ is the corresponding symmetrized random vector, it follows that

$$|\widehat{U}(t)| \le \exp\left(-\frac{1}{2}\left(1 - |\widehat{U}(t)|^2\right)\right) = \exp\left(-\frac{1}{2}\,\mathbf{E}\left(1 - \cos(\langle t, \widetilde{Y}\rangle)\right)\right).\tag{16}$$

According to Lemma 1 and the relations $V = \lambda\, W \le G$, (14) and (16), applying Jensen's inequality in the form $\exp(\mathbf{E} f(\xi)) \le \mathbf{E} \exp(f(\xi))$, valid for any measurable function $f$ and any random variable $\xi$, we have

$$\begin{split} Q(F_a, \tau) &\ll_d \tau^d \int_{\tau|t| \le 1} |\widehat{F}_a(t)| \, dt \\ &\le \tau^d \int_{\tau|t| \le 1} \exp\left(-\frac{1}{2} \sum_{k=1}^{n} \mathbf{E}\left(1 - \cos(\langle t, a_k \rangle \widetilde{X})\right)\right) dt \\ &= \tau^d \int_{\tau|t| \le 1} \exp\left(-\frac{1}{2} \sum_{k=1}^{n} \int_{z \in \mathbf{R}} \left(1 - \cos(\langle t, a_k \rangle z)\right) G\{dz\}\right) dt \\ &\le \tau^d \int_{\tau|t| \le 1} \exp\left(-\frac{1}{2} \sum_{k=1}^{n} \int_{z \in \mathbf{R}} \left(1 - \cos(\langle t, a_k \rangle z)\right) \lambda \, W\{dz\}\right) dt \\ &\le \exp\left(\int_{z \in \mathbf{R}} \log\left(\tau^d \int_{\tau|t| \le 1} \widehat{H}_z^{\lambda}(t) \, dt\right) W\{dz\}\right) \\ &\le \int_{z \in \mathbf{R}} \left(\tau^d \int_{\tau|t| \le 1} \widehat{H}_z^{\lambda}(t) \, dt\right) W\{dz\}. \end{split} \tag{17}$$

Using (6), we have

$$\tau^d \int_{\tau|t| \le 1} \widehat{H}_z^{\lambda}(t) \, dt \asymp_d Q(H_z^{\lambda}, \tau) = Q(H_1^{\lambda}, \tau |z|^{-1}). \tag{18}$$

Substituting this formula into (17), we obtain (8). In (18), we have used that $H_z^{\lambda} = \mathcal{L}(z\eta)$, where $\eta$ is a random vector with $\mathcal{L}(\eta) = H_1^{\lambda}$.

**Author Contributions:** Investigation, F.G.; Writing—original draft, A.Y.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors were supported by SFB 1283/2 2021—317210226 and by the RFBR-DFG grant 20-51-12004. The second author was supported by grant RFBR 19-01-00356.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We are grateful to the anonymous referees for valuable remarks.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **On De la Peña Type Inequalities for Point Processes**

**Naiqi Liu 1, Vladimir V. Ulyanov 2,3 and Hanchao Wang 3,\***


**Abstract:** There has been a renewed interest in exponential concentration inequalities for stochastic processes in probability and statistics over the last three decades. De la Peña established a nice exponential inequality for a discrete-time locally square integrable martingale. In this paper, we obtain de la Peña-type inequalities for stochastic integrals of multivariate point processes. The proof is primarily based on the Doléans–Dade exponential formula and the optional stopping theorem. As an application, we obtain an exponential inequality for the block counting process of the Λ-coalescent.

**Keywords:** de la Peña's inequalities; purely discontinuous local martingales; stochastic integral of multivariate point processes; Doléans–Dade exponential

**MSC:** 60E15; 60G55

#### **1. Introduction**

Let $S = (S_n)_{n \ge 0}$ be a locally square integrable martingale on $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 1}, \mathbb{P})$. The predictable quadratic variation of $S = (S_n)_{n \ge 0}$ is given by

$$\langle S, S\rangle_n = \sum_{i=1}^{n} \mathbb{E}\left[(S_i - S_{i-1})^2 \mid \mathcal{F}_{i-1}\right].$$

Many authors have studied upper bounds for

$$\mathbb{P}(S_n \ge x, \ \langle S, S\rangle_n \le y).$$

The celebrated Freedman inequality is as follows.

**Theorem 1** (Freedman [1])**.** *Let $S = (S_n)_{n \ge 0}$ be a locally square integrable martingale on $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 1}, \mathbb{P})$. If $|S_k - S_{k-1}| \le c$ for each $1 \le k \le n$, then*

$$\mathbb{P}\left(S_n \ge x, \ \langle S, S\rangle_n \le y\right) \le \exp\left\{-\frac{x^2}{2(y+cx)}\right\}.$$

This result can be regarded as an extension of Hoeffding [2]. Fan, Grama and Liu [3,4] and Rio [5] obtained a series of remarkable extensions of the Freedman inequality [1]. See also Bercu et al. [6] for a recent review of this field.
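Freedman's bound can be sanity-checked numerically; the following is a toy sketch under our own assumptions (i.i.d. ±1 increments, so $c = 1$ and $\langle S, S\rangle_n = n$ deterministically, and the bound reduces to $\mathbb{P}(S_n \ge x) \le \exp\{-x^2/(2(n + x))\}$):

```python
import numpy as np

# Toy numerical check of Freedman's inequality (our example, not from the
# paper): i.i.d. +/-1 increments give c = 1 and <S, S>_n = n, so the bound
# reads P(S_n >= x) <= exp(-x^2 / (2 * (n + x))).
rng = np.random.default_rng(3)
n, x, reps = 100, 20, 100_000
S_n = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)
empirical = np.mean(S_n >= x)
bound = np.exp(-x**2 / (2 * (n + x)))
print(empirical, bound)
```

The empirical frequency is well below the bound, as expected, since Freedman's inequality is not tight in this bounded i.i.d. case.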

De la Peña [7] established a nice exponential inequality for discrete-time locally square integrable martingales.

**Citation:** Liu, N.; Ulyanov, V.V.; Wang, H. On De la Peña Type Inequalities for Point Processes. *Mathematics* **2022**, *10*, 2114. https://doi.org/10.3390/ math10122114

Academic Editor: Christophe Chesneau

Received: 21 April 2022 Accepted: 16 June 2022 Published: 17 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Theorem 2** (De la Peña [7])**.** *Let S* = (*Sn*)*n*≥<sup>0</sup> *be a locally square integrable and conditionally symmetric martingale on* (Ω, F,(F*n*)*n*≥1, P)*. Then,*

$$\mathbb{P}\Big(S_n \ge x,\ \sum_{i=1}^n (S_i - S_{i-1})^2 \le y\Big) \le \exp\Big\{-\frac{x^2}{2y}\Big\}.$$

This result is quite different from the classical Freedman inequality. The challenge in obtaining Theorem 2 is to find an approach based on the use of the exponential Markov inequality; de la Peña constructed a supermartingale to prove Theorem 2. Furthermore, Bercu and Touati [8] established the following result for self-normalized martingales, which is similar to Theorem 2.

**Theorem 3** (Bercu and Touati [8])**.** *Let S* = (*Sn*)*n*≥<sup>0</sup> *be a locally square integrable martingale on* (Ω, F,(F*n*)*n*≥1, P)*. Then, for all x*, *y* > 0*, a* ≥ 0 *and b* > 0*,*

$$\mathbb{P}\left(\frac{|S_n|}{a+b\langle S, S\rangle_n} \ge x,\ \langle S, S\rangle_n \ge \sum_{i=1}^n (S_i - S_{i-1})^2 + y\right) \le 2\exp\Big\{-x^2\Big(ab+\frac{b^2y}{2}\Big)\Big\}.$$

It is natural to ask what happens for continuous-time processes in the above settings. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ be a stochastic basis and let $M = (M_t)_{t\ge 0}$ be a locally square integrable martingale. The predictable quadratic variation of $M$, $\langle M, M\rangle$, is the increasing predictable process such that $(M_t^2 - \langle M, M\rangle_t)_{t\ge 0}$ is a local martingale. However, there is no direct analogue for $M$ of the sum $\sum_{i=1}^n (S_i - S_{i-1})^2$ appearing in Theorems 2 and 3. Since $M = (M_t)_{t\ge 0}$ may have jumps, one can replace $\sum_{i=1}^n (S_i - S_{i-1})^2$ by $\sum_{s\le t} |\Delta M_s|^2$. It is therefore an interesting problem to establish de la Peña type inequalities for continuous-time locally square integrable martingales with jumps. Several authors have obtained concentration inequalities for continuous-time stochastic processes: Bernstein's inequality for local martingales with jumps was given by van de Geer [9]; Khoshnevisan [10] found concentration inequalities for continuous martingales; and Dzhaparidze and van Zanten [11] extended Khoshnevisan's results to martingales with jumps.

This paper focuses on de la Peña type inequalities for stochastic integrals of multivariate point processes, which are an essential example of purely discontinuous local martingales. Some useful facts and results essential for the proofs are collected in Section 2. Section 3 presents our main results and their proofs, while Section 4 derives an exponential inequality for the block counting process in the Λ-coalescent as an application. Throughout, $c, C, K, \dots$ denote positive constants that may differ at each occurrence.

#### **2. Preliminaries**

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ be a stochastic basis. A stochastic process $M = (M_t)_{t\ge 0}$ is called a purely discontinuous local martingale if $M_0 = 0$ and $M$ is orthogonal to all continuous local martingales. The reader is referred to the classic book [12] by Jacod and Shiryaev for more information. We restrict ourselves to the integer-valued random measure $\mu$ on $\mathbb{R}_+ \times \mathbb{R}$ induced by a multivariate point process. In particular, let $(T_k, Z_k)$, $k \ge 1$, be a multivariate point process, and define

$$\mu(dt, dx) = \sum_{k \ge 1} \mathbf{1}_{\{T_k < \infty\}}\, \varepsilon_{(T_k, Z_k)}(dt, dx), \tag{1}$$

where $\varepsilon_{(T_k, Z_k)}$ is the Dirac measure at the point $(T_k, Z_k)$. Then $\mu(\omega; [0,t] \times \mathbb{R}) < \infty$ for all $(\omega, t) \in \Omega \times \mathbb{R}_+$. Let $\tilde{\Omega} = \Omega \times \mathbb{R}_+ \times \mathbb{R}$ and $\tilde{\mathcal{P}} = \mathcal{P} \otimes \mathcal{B}$, where $\mathcal{B}$ is the Borel $\sigma$-field on $\mathbb{R}$ and $\mathcal{P}$ is the $\sigma$-field generated by all left-continuous adapted processes on $\Omega \times \mathbb{R}_+$. A predictable function is a $\tilde{\mathcal{P}}$-measurable function on $\tilde{\Omega}$. Let $\nu$ be the unique predictable compensator of $\mu$ (up to a $\mathbb{P}$-null set); namely, $\nu$ is a predictable random measure such that for any predictable function $W$, $W * \mu - W * \nu$ is a local martingale, where $W * \mu$ is defined by

$$W * \mu_t = \begin{cases} \int_0^t \int_{\mathbb{R}} W(s, x)\, \mu(ds, dx), & \text{if } \int_0^t \int_{\mathbb{R}} |W(s, x)|\, \mu(ds, dx) < \infty, \\ +\infty, & \text{otherwise.} \end{cases}$$

Note that $\nu$ admits the disintegration

$$\nu(dt, dx) = dA\_t K(\omega, t; dx),\tag{2}$$

where $K(\cdot, \cdot)$ is a transition kernel from $(\Omega \times \mathbb{R}_+, \mathcal{P})$ into $(\mathbb{R}, \mathcal{B})$, and $A = (A_t)_{t\ge 0}$ is an increasing càdlàg predictable process. For $\mu$ in (1), defined through a multivariate point process, $\nu$ admits

$$\nu(dt, dx) = \sum_{n \ge 1} \frac{1}{G_n([t, \infty) \times \mathbb{R})}\, \mathbf{1}_{\{t \le T_{n+1}\}}\, G_n(dt, dx),$$

where *Gn*(*ω*, *ds*, *dx*) is a regular version of the conditional distribution of (*Tn*+1, *Zn*+1) with respect to *σ*{*T*1, *Z*1, ··· , *Tn*, *Zn*}. In particular, if *Fn*(*dt*) = *Gn*(*dt* × R), the point process *<sup>N</sup>* = <sup>∑</sup>*n*≥<sup>1</sup> **<sup>1</sup>**[*Tn*,∞) has the compensator *At* = *<sup>ν</sup>*([0, *<sup>t</sup>*] × R), which satisfies

$$A_t = \sum_{n \ge 1} \int_0^{T_{n+1} \wedge t} \frac{1}{F_n([s, \infty))}\, F_n(ds).$$
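For example, for a homogeneous Poisson process with rate $\kappa$ (so that, with the convention $T_0 = 0$, each $F_n$ is the exponential law shifted by $T_n$ and $F_n([s,\infty)) = e^{-\kappa(s-T_n)}$ for $s > T_n$), the hazard integrals telescope to the familiar compensator:

$$A_t = \sum_{n \ge 0} \int_{T_n \wedge t}^{T_{n+1} \wedge t} \frac{\kappa e^{-\kappa(s - T_n)}}{e^{-\kappa(s - T_n)}}\, ds = \kappa \sum_{n \ge 0} \big((T_{n+1} \wedge t) - (T_n \wedge t)\big) = \kappa t.$$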

Now, we define stochastic integrals with respect to multivariate point processes. For a stopping time $T$, $[T] = \{(\omega, t) : T(\omega) = t\}$ is the graph of $T$. For $\mu$ in (1), define $D = \bigcup_{n=1}^{\infty} [T_n]$. For any measurable function $W$ on $\tilde{\Omega}$, we define $a_t = \nu(\{t\} \times \mathbb{R})$ and

$$\hat{W}_t = \begin{cases} \int_{\mathbb{R}} W(t, x)\, \nu(\{t\} \times dx), & \text{if } \int_{\mathbb{R}} |W(t, x)|\, \nu(\{t\} \times dx) < \infty, \\ +\infty, & \text{otherwise.} \end{cases}$$

We denote by $G_{loc}(\mu)$ the set of all $\tilde{\mathcal{P}}$-measurable real-valued functions $W$ such that $\big[\sum_{s\le t}(\widetilde{W}_s)^2\big]^{1/2}$ is a locally integrable increasing process, where $\widetilde{W}_t = W\mathbf{1}_D(\omega, t) - \hat{W}_t$.

**Definition 1.** *If $W \in G_{loc}(\mu)$, the stochastic integral of $W$ with respect to $\mu - \nu$ is defined as the purely discontinuous local martingale whose jump process is indistinguishable from $\widetilde{W}$.*

We denote the stochastic integral of $W$ with respect to $\mu - \nu$ by $W * (\mu - \nu)$. For a given predictable function $W$, $W * (\mu - \nu)$ is thus a purely discontinuous local martingale defined through its jump process. It is easy to prove that $W * (\mu - \nu) = W * \mu - W * \nu$. Denote $M = W * (\mu - \nu)$.

Itô's formula for a purely discontinuous local martingale is essential for our proofs. Now, we present Itô's formula for *M*.

**Lemma 1** (Itô's formula, Jacod and Shiryaev [12])**.** *Let μ be a multivariate point process, ν be the predictable compensator of μ, W be a given predictable function on* Ω˜ *, and W* ∈ *Gloc*(*μ*)*. Let f be a differentiable function, for M* = *W* ∗ (*μ* − *ν*) *and t* > 0*,*

$$f(M\_t) = f(M\_0) + \int\_0^t f'(M\_{s-})dM\_s + \sum\_{s \le t} [f(M\_s) - f(M\_{s-}) - f'(M\_{s-})\Delta M\_s].$$

Under some conditions, Wang, Lin and Su [13] obtained

$$\mathbb{P}\left(M_t \ge x,\ \langle M, M\rangle_t \le v^2 \text{ for some } t > 0\right) \le \exp\left\{-\frac{x^2}{2(v^2+cx)}\right\},\tag{3}$$

where $\langle M, M\rangle$ is the predictable quadratic variation process of $M = W * (\mu - \nu)$,

$$\langle M, M\rangle_t = (W-\hat{W})^2 * \nu_t + \sum_{s \le t} (1-a_s)\hat{W}_s^2.$$

When $M$ is a purely discontinuous local martingale, $\sum_{s\le \cdot} |\Delta M_s|^2 - \langle M, M\rangle$ is a local martingale. An interesting problem arises when the predictable quadratic variation $\langle M, M\rangle$ in (3) is replaced by the quadratic variation $\sum_{s\le \cdot} |\Delta M_s|^2$. In this paper, we estimate the upper bounds of two types of tail probabilities:

$$\mathbb{P}\left(M_t \ge x,\ \sum_{s\le t} |\Delta M_s|^2 \le v^2 \text{ for some } t > 0\right) \tag{4}$$

and

$$\mathbb{P}\left(M_t \ge \Big(\alpha + \beta \sum_{s \le t} |\Delta M_s|^2\Big)x,\ \sum_{s \le t} |\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0\right). \tag{5}$$

It is important to note that the continuity of $A$ implies the quasi-left continuity of $M$. However, the quasi-left continuity of $M$ can easily be destroyed by changing the filtration on the underlying space. For example, let $N$ be a homogeneous Poisson process with respect to $\mathbb{F}$, and let $(T_n)_{n\ge 0}$ be the sequence of jump times of $N$. The process $N$ is not quasi-left continuous in the filtration $\mathbb{G}$ obtained by enlarging $\mathbb{F}$ initially with the $\sigma$-field $\mathcal{R} = \sigma(T_1)$ ($\sigma_n = (1 - 2^{-n})T_1$ is a sequence of $\mathbb{G}$-stopping times announcing $T_1$). The main purpose of this paper is to estimate (4) and (5) when $M$ is not quasi-left continuous.

#### **3. The Main Results and Their Proofs**

Now, we present our first main result.

**Theorem 4.** *Let μ be a multivariate point process, ν be the predictable compensator of μ, at* = *ν*({*t*} × R)*, W be a given predictable function on* Ω˜ *, and W* ∈ *Gloc*(*μ*)*. M* = *W* ∗ (*μ* − *ν*)*. Assume* Δ*M* ≥ −1*. Then, for x* > 0*, v* > 0*,*

$$\mathbb{P}\left(M\_t \ge x, \sum\_{s\le t} |\triangle M\_s|^2 \le v^2 \text{ for some } t > 0\right) \le \left(\frac{v^2 + x}{v^2}\right)^{v^2} e^{-x}.$$

**Proof of Theorem 4.** For simplicity of notation, put

$$\begin{aligned} S(\lambda)_t &= \int_0^t \int_{\mathbb{R}} \left( e^{\lambda(W - \hat{W}) + (\lambda + \log(1-\lambda))(W - \hat{W})^2} - 1 - \lambda(W - \hat{W}) \right) \nu(ds, dx) \\ &\quad + \sum_{s \le t} (1 - a_s) \left( e^{-\lambda \hat{W}_s + (\lambda + \log(1-\lambda))\hat{W}_s^2} - 1 + \lambda \hat{W}_s \right), \end{aligned}$$

where *λ* ∈ [0, 1).

Furthermore,

$$\begin{aligned} \Delta S(\lambda)_t &= \int_{\mathbb{R}} \left( e^{\lambda(W - \hat{W}) + (\lambda + \log(1-\lambda))(W - \hat{W})^2} - 1 - \lambda(W - \hat{W}) \right) \nu(\{t\}, dx) \\ &\quad + (1 - a_t) \left( e^{-\lambda \hat{W}_t + (\lambda + \log(1-\lambda))\hat{W}_t^2} - 1 + \lambda \hat{W}_t \right) \\ &= e^{-\lambda \hat{W}_t + (\lambda + \log(1-\lambda))\hat{W}_t^2} \left( \int_{\mathbb{R}} e^{\lambda W + (\lambda + \log(1-\lambda))(W^2 - 2W\hat{W})}\, \nu(\{t\}, dx) + 1 - a_t \right) \\ &\quad + (1 - a_t)(-1 + \lambda \hat{W}_t) - \int_{\mathbb{R}} \left( 1 + \lambda(W - \hat{W}) \right) \nu(\{t\}, dx). \end{aligned}$$

In addition, it is easy to see by noting *at* ≤ 1,

$$\int_{\mathbb{R}} e^{\lambda W + (\lambda + \log(1-\lambda))(W^2 - 2W\hat{W})}\, \nu(\{t\}, dx) + 1 - a_t \ge 0,$$

and

$$(1 - a_t)\lambda \hat{W}_t = \lambda \int_{\mathbb{R}} (W - \hat{W})\, \nu(\{t\}, dx).$$

In combination, we have for all *t* > 0

$$\Delta S(\lambda)_t > -1,$$

where *λ* ∈ [0, 1). For any semimartingale *S*(*λ*)*t*, the Doléans–Dade exponential is

$$\mathcal{E}(S(\lambda))_t = e^{S(\lambda)_t - S(\lambda)_0 - \frac{1}{2}\langle S(\lambda)^c, S(\lambda)^c\rangle_t} \prod_{s \le t} (1 + \Delta S(\lambda)_s)\, e^{-\Delta S(\lambda)_s}.$$
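When the semimartingale has no continuous martingale part and is of finite variation, so that $X_t - X_0 = \sum_{s\le t}\Delta X_s$, the $\langle S(\lambda)^c, S(\lambda)^c\rangle$ term vanishes and the exponential reduces to $\prod_{s\le t}(1 + \Delta X_s)$. A minimal numeric sketch of this reduction (the jump sizes are arbitrary illustrative values):

```python
import math

# Doléans–Dade exponential of a pure-jump finite-variation process X:
# X_t = sum of its jumps, X_0 = 0, no continuous part, so
#   E(X)_t = e^{X_t} * prod_s (1 + dX_s) e^{-dX_s} = prod_s (1 + dX_s).
jumps = [0.3, -0.2, 0.5, -0.4]          # illustrative jump sizes, each > -1
x_t = sum(jumps)                        # X_t - X_0 for a pure-jump path
dd_formula = math.exp(x_t) * math.prod((1 + dx) * math.exp(-dx) for dx in jumps)
pure_jump = math.prod(1 + dx for dx in jumps)
assert abs(dd_formula - pure_jump) < 1e-12
print(dd_formula)
```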

We shall first show that the process $\big(e^{X_t}/\mathcal{E}(S(\lambda))_t\big)_{t\ge 0}$ is a local martingale, where $X_t = \lambda M_t + (\lambda + \log(1-\lambda))\sum_{s\le t}(\Delta M_s)^2$; denote also $Y_t = \sum_{s\le t}(\Delta M_s)^2$.

The Itô formula yields

$$\begin{aligned} e^{X_t} &= 1 + e^{X_-} \cdot X_t + \sum_{s \le t} \left(e^{X_s} - e^{X_{s-}} - e^{X_{s-}}\Delta X_s\right) \\ &= 1 + \lambda e^{X_-} \cdot M_t + (\lambda + \log(1-\lambda))\, e^{X_-} \cdot Y_t + \sum_{s \le t} \left(e^{X_s} - e^{X_{s-}} - e^{X_{s-}}\Delta X_s\right) \\ &= 1 + \lambda e^{X_-} \cdot M_t + \sum_{s \le t} \left(e^{X_s} - e^{X_{s-}} - \lambda e^{X_{s-}}\Delta M_s\right). \end{aligned}$$

For *X*, the jump part of *X* is

$$\Delta X = \left[\lambda(W - \hat{W}) + (\lambda + \log(1-\lambda))(W - \hat{W})^2\right]\mathbf{1}_D + \left[-\lambda \hat{W} + (\lambda + \log(1-\lambda))\hat{W}^2\right]\mathbf{1}_{D^c},$$

where *D* is the thin set, which is exhausted by {*Tn*}*n*≥1. Thus,

$$\sum_{s \le t} \left(e^{\Delta X_s} - 1 - \lambda \Delta M_s\right) - S(\lambda)_t =: Z_t \tag{6}$$

is a local martingale. Denoting $\Xi(\lambda)_t = \sum_{s\le t}(e^{\Delta X_s} - 1 - \lambda \Delta M_s)$, we have

$$\sum_{s\le t} \left(e^{X_s} - e^{X_{s-}} - \lambda e^{X_{s-}}\Delta M_s\right) - e^{X_-} \cdot S(\lambda)_t = e^{X_-} \cdot \Xi(\lambda)_t - e^{X_-} \cdot S(\lambda)_t = e^{X_-} \cdot Z_t.$$

Thus,

$$\begin{aligned} e^{X_t} - e^{X_-} \cdot S(\lambda)_t &= 1 + \lambda e^{X_-} \cdot M_t + \sum_{s \le t} \left(e^{X_s} - e^{X_{s-}} - \lambda e^{X_{s-}}\Delta M_s\right) - e^{X_-} \cdot S(\lambda)_t \\ &= 1 + \lambda e^{X_-} \cdot M_t + e^{X_-} \cdot Z_t =: N_{1,t}. \end{aligned}$$

$N_1$ is a local martingale. Following arguments similar to those of Wang, Lin and Su [13], $\big(e^{X_t}/\mathcal{E}(S(\lambda))_t\big)_{t\ge 0}$ is a local martingale. In fact, set $H = e^X$, $G = \mathcal{E}(S(\lambda))$, $A = S(\lambda)$ and $f(h, g) = \frac{h}{g}$. The Itô formula yields

$$\begin{aligned} f(H, G)_t &= 1 + \frac{1}{G_-} \cdot H_t - \frac{H_-}{G_-^2} \cdot G_t \\ &\quad + \sum_{s \le t} \left( \Delta f(H, G)_s - \frac{\Delta H_s}{G_{s-}} + \frac{f(H, G)_{s-}}{G_{s-}}\,\Delta G_s \right). \end{aligned}$$

Since E(*S*(*λ*)) = 1 + E(*S*(*λ*))<sup>−</sup> · *S*(*λ*), we have

$$\begin{aligned} \frac{1}{G_-} \cdot H - \frac{H_-}{G_-^2} \cdot G &= \frac{1}{G_-} \cdot H - \frac{H_-}{G_-} \cdot S(\lambda) = \frac{1}{G_-} \cdot \left(e^{X} - e^{X_-} \cdot S(\lambda)\right) \\ &= \frac{1}{G_-} \cdot N_1. \end{aligned}$$

Noting that Δ*G* = *G*−Δ*A*, Δ*N*<sup>1</sup> = Δ*H* − *H*−Δ*A*, we have

$$\Delta f(H, G)_s - \frac{\Delta H_s}{G_{s-}} + \frac{f(H, G)_{s-}}{G_{s-}}\,\Delta G_s = -\frac{\Delta N_{1,s}\, \Delta A_s}{G_{s-}\left(1 + \Delta A_s\right)},$$

where $A$ is a predictable process and $N_1$ is a local martingale. By the properties of the Stieltjes integral, we have

$$\sum_{s\le \cdot} \left( \Delta f(H, G)_s - \frac{\Delta H_s}{G_{s-}} + \frac{f(H, G)_{s-}}{G_{s-}}\,\Delta G_s \right) = -\frac{\Delta A}{G_-(1+\Delta A)} \cdot N_1. \tag{7}$$

Thus,

$$e^{X}/\mathcal{E}(S(\lambda)) = 1 + \frac{1}{G_-} \cdot N_1 - \frac{\Delta A}{G_-(1 + \Delta A)} \cdot N_1$$

is a local martingale.

Let

$$B_1 = \Big\{ M_t \ge x,\ \sum_{s \le t} |\Delta M_s|^2 \le v^2 \text{ for some } t > 0 \Big\}$$

and

$$\tau_1 = \inf\Big\{ t > 0 : M_t \ge x,\ \sum_{s \le t} |\Delta M_s|^2 \le v^2 \Big\}.$$

Note by (4.12) in [4], for *λ* ∈ [0, 1) and *x* ≥ −1,

$$\exp\left\{\lambda x + x^2(\lambda + \log(1 - \lambda))\right\} \le 1 + \lambda x.$$

This implies

$$\int_0^t \int_{-1}^{\infty} \exp\{\lambda x + (\lambda + \log(1 - \lambda))x^2\}\, \nu^M(ds, dx) \le \int_0^t \int_{-1}^{\infty} (1 + \lambda x)\, \nu^M(ds, dx), \tag{8}$$

because $\Delta M_t \ge -1$ for any $t > 0$, where $\nu^M$ is the predictable compensator of the jump measure of $M$. Inequality (8) implies $S(\lambda) \le 0$. Since $e^x \ge x + 1$ and $e^{S(\lambda)_t} \ge \mathcal{E}(S(\lambda))_t$,

$$\mathbb{E}\Big[\frac{e^{X_T}}{e^{S(\lambda)_T}}\Big] \le \mathbb{E}\Big[\frac{e^{X_T}}{\mathcal{E}(S(\lambda))_T}\Big] = 1 \tag{9}$$

for any stopping time *T*. Thus, *U* = (*Ut*)*t*≥<sup>0</sup> is a supermartingale, where

$$\mathcal{U}\_t = \frac{\exp\left\{\lambda M\_t + (\lambda + \log(1 - \lambda)) \sum\_{s \le t} (\triangle M\_s)^2\right\}}{\exp\{S(\lambda)\_t\}}.$$

Thus, on *B*<sup>1</sup>

$$U_{\tau_1} \ge \exp\{\lambda x + (\lambda + \log(1 - \lambda))v^2\}.$$

We have

$$\mathbb{P}(B_1) \le \inf_{\lambda \in [0,1)} \exp\{-\lambda x - (\lambda + \log(1 - \lambda))v^2\} = \left(\frac{v^2 + x}{v^2}\right)^{v^2} e^{-x}. \tag{10}$$
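The elementary inequality (4.12) of [4] used above, $\exp\{\lambda x + x^2(\lambda + \log(1-\lambda))\} \le 1 + \lambda x$ for $x \ge -1$ and $\lambda \in [0, 1)$, can also be probed numerically; the grid below is an arbitrary illustrative choice of ours:

```python
import math

# Grid check of exp{l*x + x^2*(l + log(1-l))} <= 1 + l*x
# for x >= -1 and l in [0,1)  (inequality (4.12) of Fan, Grama and Liu).
worst = 0.0                                  # largest observed LHS - RHS
for i in range(1, 100):                      # l = 0.01 .. 0.99
    l = i / 100.0
    psi = l + math.log(1.0 - l)              # psi(l) <= 0 on [0,1)
    for j in range(0, 601):                  # x = -1.0 .. 5.0, step 0.01
        x = -1.0 + j / 100.0
        lhs = math.exp(l * x + x * x * psi)
        rhs = 1.0 + l * x
        worst = max(worst, lhs - rhs)
print(worst)
```

Equality holds at $x = -1$, where both sides equal $1 - \lambda$, so the printed maximum stays at zero up to rounding.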

Put

$$\begin{aligned} L(\lambda)_t &= \int_0^t \int_{\mathbb{R}} \left( e^{\lambda(W - \hat{W}) - f(\lambda)(W - \hat{W})^2} - 1 - \lambda(W - \hat{W}) \right) \nu(ds, dx) \\ &\quad + \sum_{s \le t} (1 - a_s) \left( e^{-\lambda \hat{W}_s - f(\lambda)\hat{W}_s^2} - 1 + \lambda \hat{W}_s \right), \end{aligned}$$

where *f*(*λ*) ≥ 0 for *λ* ≥ 0. We have the following proposition from the proof of Theorem 4.

**Proposition 1.** *Let $\mu$ be a multivariate point process, $\nu$ the predictable compensator of $\mu$, $a_t = \nu(\{t\} \times \mathbb{R})$, and $W$ a given predictable function on $\tilde{\Omega}$. Let $M = W * (\mu - \nu)$. Denote $\tilde{X}_t = \lambda M_t - f(\lambda)\sum_{s\le t}(\Delta M_s)^2$ for $\lambda \ge 0$. Then $e^{\tilde{X}}/\mathcal{E}(L(\lambda))$ is a local martingale.*

In Theorem 4, the condition $\Delta M \ge -1$ plays an important role. In the following theorem, we present another result, which is the analogue of Theorem 2 in the continuous-time case.

**Theorem 5.** *Let $\mu$ be a multivariate point process, $\nu$ the predictable compensator of $\mu$, $a_t = \nu(\{t\} \times \mathbb{R})$, and $W$ a given predictable function on $\tilde{\Omega}$ with $W \in G_{loc}(\mu)$. Set $M = W * (\mu - \nu)$. In addition, define*

$$\begin{aligned} \tilde{S}(\lambda)_t &:= \int_0^t \int_{\mathbb{R}} \left( e^{\lambda(W - \hat{W}) - \frac{\lambda^2}{2}(W - \hat{W})^2} - 1 - \lambda(W - \hat{W}) \right) \nu(ds, dx) \\ &\quad + \sum_{s \le t} (1 - a_s) \left( e^{-\lambda \hat{W}_s - \frac{\lambda^2}{2}\hat{W}_s^2} - 1 + \lambda \hat{W}_s \right), \end{aligned}$$

*and assume that for any $t > 0$ and $\lambda > 0$, $\tilde{S}(\lambda)_t \le 0$. Then, for $x > 0$, $v > 0$,*

$$\mathbb{P}\left(M_t \ge x,\ \sum_{s \le t} |\Delta M_s|^2 \le v^2 \text{ for some } t > 0\right) \le \exp\left\{-\frac{x^2}{2v^2}\right\}.$$

**Proof of Theorem 5.** Define

$$V_t = \frac{\exp\{\lambda M_t - \frac{\lambda^2}{2} \sum_{s \le t} |\Delta M_s|^2\}}{\mathcal{E}(\tilde{S}(\lambda))_t}.$$

By Proposition 1 (with $f(\lambda) = \lambda^2/2$), $V$ is a local martingale. Note that $\tilde{S}(\lambda)_t \le 0$ for any $t > 0$ and $\lambda > 0$. We have

$$\mathbb{E}\Big[\frac{\exp\{\lambda M_T - \frac{\lambda^2}{2} \sum_{s \le T} |\Delta M_s|^2\}}{e^{\tilde{S}(\lambda)_T}}\Big] \le \mathbb{E}[V_T] = 1 \tag{11}$$

for any stopping time *T*.

Recall that

$$B_1 = \Big\{ M_t \ge x,\ \sum_{s \le t} |\Delta M_s|^2 \le v^2 \text{ for some } t > 0 \Big\}$$

and

$$\tau_1 = \inf\Big\{ t > 0 : M_t \ge x,\ \sum_{s \le t} |\Delta M_s|^2 \le v^2 \Big\}.$$

We have

$$\mathbb{P}(B_1) \le \inf_{\lambda \ge 0} \exp\Big\{-\lambda x + \frac{\lambda^2}{2} v^2\Big\} = \exp\Big\{-\frac{x^2}{2v^2}\Big\}. \tag{12}$$
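It is instructive to compare the bounds of Theorems 4 and 5: since $\log(1+u) \ge u - u^2/2$ for $u \ge 0$, one always has $\exp\{-x^2/(2v^2)\} \le ((v^2+x)/v^2)^{v^2}e^{-x}$, so the Gaussian-type bound of Theorem 5 is never worse than the Poisson-type bound of Theorem 4. A quick numerical comparison (the sample values are arbitrary choices of ours):

```python
import math

def poisson_bound(x, v2):
    """Theorem 4 bound: ((v^2 + x)/v^2)^{v^2} * e^{-x}."""
    return ((v2 + x) / v2) ** v2 * math.exp(-x)

def gaussian_bound(x, v2):
    """Theorem 5 bound: exp{-x^2 / (2 v^2)}."""
    return math.exp(-x * x / (2.0 * v2))

for x, v2 in [(1.0, 1.0), (3.0, 2.0), (10.0, 4.0)]:
    g, p = gaussian_bound(x, v2), poisson_bound(x, v2)
    assert g <= p          # the Gaussian-type bound is always sharper
    print(x, v2, g, p)
```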

**Remark 1.** *For an integrable random variable $\xi$ and a number $a > 0$, define*

$$T\_a(\xi) = \min(|\xi|, a) \operatorname{sign}(\xi).$$

*If $\mathbb{E}[\xi] = 0$ and $\mathbb{E}[T_a(\xi)] \le 0$ for all $a > 0$, then $\xi$ is called heavy on left. Bercu and Touati [14] extended Theorem 2 to this general case. Let $S = (S_n)_{n\ge 0}$ be a locally square integrable martingale on $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n\ge 1}, \mathbb{P})$. If*

$$\mathbb{E}\left[T\_a(S\_n - S\_{n-1})\,|\,\mathcal{F}\_{n-1}\right] \le 0\tag{13}$$

*for all a* > 0 *and n* > 0*, Bercu and Touati [14] obtained*

$$\mathbb{P}\Big(S_n \ge x,\ \sum_{i=1}^n (S_i - S_{i-1})^2 \le y\Big) \le \exp\Big\{-\frac{x^2}{2y}\Big\}.$$

*In fact, our condition $\tilde{S}(\lambda)_t \le 0$ is an analogue of (13) in the continuous-time case. Let $N = (N_t)_{t\ge 0}$ be a homogeneous Poisson process with parameter $\kappa$, and let $(\eta_k)_{k\ge 1}$ be a sequence of i.i.d. random variables with common distribution function $F(x)$. Assume $N$ is independent of $(\eta_k)_{k\ge 1}$. Define*

$$Y_t = \sum_{k=1}^{N_t} \eta_k, \quad t \ge 0. \tag{14}$$

*This is a so-called compound Poisson process. The jump measure of Y is given by*

$$\mu^Y(dt, dx) = \sum_{k \ge 1} \mathbf{1}_{\{T_k < \infty\}}\, \varepsilon_{(T_k, \eta_k)}(dt, dx), \tag{15}$$

*and the predictable compensator ν<sup>Y</sup> is*

$$\nu^Y(dt, dx) = \kappa\, dt\, F(dx). \tag{16}$$

*Thus,* (*Yt* − *x* ∗ *ν<sup>Y</sup> <sup>t</sup>* )*t*≥<sup>0</sup> *is a purely discontinuous local martingale. For* (*Yt* − *x* ∗ *ν<sup>Y</sup> <sup>t</sup>* )*t*≥0*,*

$$\tilde{S}(\lambda)_t = \kappa \int_0^t \int_{\mathbb{R}} \left(e^{\lambda x - \frac{\lambda^2}{2} x^2} - 1 - \lambda x\right) F(dx)\, ds.$$

*If $\mathbb{E}[\eta_k] = 0$ for any $k \ge 1$, the condition $\tilde{S}(\lambda)_t \le 0$ reduces to*

$$\int\_{\mathbb{R}} e^{\left[\lambda x - \frac{\lambda^2}{2} \mathbf{x}^2\right]} F(d\mathbf{x}) \le 1. \tag{17}$$

*Bercu and Touati [14] found that if $\eta_k$ is heavy on left, then (17) holds. Thus, our condition is an analogue of (13) in the continuous-time case.*
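Condition (17) is straightforward to check numerically for a concrete jump law. For instance, for the symmetric Rademacher law $F = \frac{1}{2}\delta_{-1} + \frac{1}{2}\delta_{+1}$ (our illustrative choice, not from the paper), the left-hand side of (17) equals $e^{-\lambda^2/2}\cosh\lambda \le 1$:

```python
import math

# Check condition (17): int exp{l*x - (l^2/2)*x^2} F(dx) <= 1
# for the symmetric two-point law F = (1/2) d_{-1} + (1/2) d_{+1};
# the integral equals exp(-l^2/2) * cosh(l).
for i in range(1, 51):
    l = i / 10.0                                    # l = 0.1 .. 5.0
    integral = 0.5 * (math.exp(l - l * l / 2) + math.exp(-l - l * l / 2))
    assert integral <= 1.0 + 1e-12                  # cosh(l) <= exp(l^2/2)
print("condition (17) holds on the grid")
```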

In [7,15], a series of exponential inequalities was obtained for events involving ratios in the context of continuous martingales, which in turn extended the results in [10]. Su and Wang [16] studied a similar problem for purely discontinuous local martingales in the quasi-left continuous case. Below, we obtain a similar inequality for stochastic integrals of a multivariate point process.

**Theorem 6.** *Let $\mu$ be a multivariate point process, $\nu$ the predictable compensator of $\mu$, $a_t = \nu(\{t\} \times \mathbb{R})$, and $W$ a given predictable function on $\tilde{\Omega}$ with $W \in G_{loc}(\mu)$. Denote $M = W * (\mu - \nu)$. Then, for all $x \ge 0$, $\beta > 0$, $v > 0$, $\alpha \in \mathbb{R}$,*

$$\mathbb{P}\left(M_t \ge \Big(\alpha + \beta \sum_{s\le t} |\Delta M_s|^2\Big)x,\ \sum_{s\le t} |\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0\right) \le \exp\Big\{-\frac{x^2}{2}\Big(\alpha\beta + \frac{\beta^2 v^2}{2}\Big)\Big\}.$$

**Proof of Theorem 6.** Recall that *V* = (*Vt*)*t*≥<sup>0</sup> is a local martingale, where

$$V_t = \frac{\exp\{\lambda M_t - \frac{\lambda^2}{2} \sum_{s \le t} |\Delta M_s|^2\}}{\mathcal{E}(\tilde{S}(\lambda))_t}.$$

For any stopping time *T*,

$$\mathbb{E}\Big[\frac{\exp\{\lambda M_T - \frac{\lambda^2}{2} \sum_{s \le T} |\Delta M_s|^2\}}{\exp\{\tilde{S}(\lambda)_T\}}\Big] \le \mathbb{E}[V_T] = 1. \tag{18}$$

By Markov's inequality, we obtain that for all *λ* > 0,

$$\begin{aligned} &\mathbb{P}\left(M_t \ge \Big(\alpha + \beta\sum_{s\le t}|\Delta M_s|^2\Big)x \text{ and } \sum_{s\le t}|\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0\right) \\ &\le \mathbb{E}\Big[\exp\Big\{\frac{\lambda}{4}M_{\tau_2} - \Big(\frac{\alpha\lambda x}{4} + \frac{\beta\lambda x}{4}\sum_{s\le\tau_2}|\Delta M_s|^2\Big)\Big\}\mathbf{1}_{B_2}\Big] \\ &= \exp\Big\{-\frac{\alpha\lambda x}{4}\Big\}\mathbb{E}\Big[\exp\Big\{\frac{\lambda}{4}M_{\tau_2} - \frac{\lambda^2}{8}\Big(\sum_{s\le\tau_2}|\Delta M_s|^2 + \langle M, M\rangle_{\tau_2}\Big) \\ &\qquad + \Big(\frac{\lambda^2}{8} - \frac{\beta\lambda x}{4}\Big)\sum_{s\le\tau_2}|\Delta M_s|^2 + \frac{\lambda^2}{8}\langle M, M\rangle_{\tau_2}\Big\}\mathbf{1}_{B_2}\Big] \\ &\le \exp\Big\{-\frac{\alpha\lambda x}{4}\Big\}\sqrt{\mathbb{E}\Big[\exp\Big\{\frac{\lambda}{2}M_{\tau_2} - \frac{\lambda^2}{4}\Big(\sum_{s\le\tau_2}|\Delta M_s|^2 + \langle M, M\rangle_{\tau_2}\Big)\Big\}\mathbf{1}_{B_2}\Big]} \\ &\qquad \times\sqrt{\mathbb{E}\Big[\exp\Big\{\Big(\frac{\lambda^2}{4} - \frac{\beta\lambda x}{2}\Big)\sum_{s\le\tau_2}|\Delta M_s|^2 + \frac{\lambda^2}{4}\langle M, M\rangle_{\tau_2}\Big\}\mathbf{1}_{B_2}\Big]}, \end{aligned}$$

where

$$B_2 = \Big\{ M_t \ge \Big(\alpha + \beta \sum_{s \le t} |\Delta M_s|^2\Big)x,\ \sum_{s \le t} |\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0 \Big\},$$

$$\tau_2 = \inf\Big\{ t > 0 : M_t \ge \Big(\alpha + \beta \sum_{s \le t} |\Delta M_s|^2\Big)x,\ \sum_{s \le t} |\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \Big\}.$$

In fact,

$$\begin{aligned} &\mathbb{E}\Big[\exp\Big\{\frac{\lambda}{2}M_{\tau_2} - \frac{\lambda^2}{4}\Big(\sum_{s\le\tau_2}|\Delta M_s|^2 + \langle M, M\rangle_{\tau_2}\Big)\Big\}\mathbf{1}_{B_2}\Big] \\ &\le \sqrt{\mathbb{E}\Big[\frac{\exp\{\lambda M_{\tau_2} - \frac{\lambda^2}{2}\sum_{s\le\tau_2}|\Delta M_s|^2\}}{\exp\{\tilde{S}(\lambda)_{\tau_2}\}}\mathbf{1}_{B_2}\Big]}\sqrt{\mathbb{E}\Big[\exp\Big\{\tilde{S}(\lambda)_{\tau_2} - \frac{\lambda^2}{2}\langle M, M\rangle_{\tau_2}\Big\}\Big]}. \end{aligned}$$

By (18)

$$\mathbb{E}\Big[\frac{\exp\{\lambda M_{\tau_2} - \frac{\lambda^2}{2} \sum_{s \le \tau_2} |\Delta M_s|^2\}}{\exp\{\tilde{S}(\lambda)_{\tau_2}\}}\mathbf{1}_{B_2}\Big] \le 1.$$

Furthermore,

$$\mathbb{E}\Big[\exp\Big\{\tilde{S}(\lambda)_{\tau_2} - \frac{\lambda^2}{2}\langle M, M\rangle_{\tau_2}\Big\}\Big] \le 1$$

by

$$\left| \exp\Big\{x - \frac{1}{2}x^2\Big\} - 1 - x \right| \le \frac{1}{2}x^2, \quad x \in \mathbb{R}.$$

Taking *λ* = *βx*, we get

$$\mathbb{P}(B_2) \le \exp\Big\{-\frac{x^2}{4}\Big(\alpha\beta + \frac{\beta^2 v^2}{2}\Big)\Big\} \times \sqrt{\mathbb{P}(B_2)}.$$

Thus

$$\mathbb{P}\left(M_t \ge \Big(\alpha + \beta\sum_{s\le t}|\Delta M_s|^2\Big)x,\ \sum_{s\le t}|\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0\right) \le \exp\Big\{-\frac{x^2}{2}\Big(\alpha\beta + \frac{\beta^2 v^2}{2}\Big)\Big\}.$$

From the proof of Theorem 6, we can obtain the following results.

**Theorem 7.** *Let μ be a multivariate point process, ν be the predictable compensator of μ, at* = *ν*({*t*} × R)*, W be a given predictable function on* Ω˜ *, and W* ∈ *Gloc*(*μ*)*. Denote M* = *W* ∗ (*μ* − *ν*)*. In addition, define*

$$\begin{aligned} \tilde{S}(\lambda)_t &:= \int_0^t \int_{\mathbb{R}} \left( e^{\lambda(W - \hat{W}) - \frac{\lambda^2}{2}(W - \hat{W})^2} - 1 - \lambda(W - \hat{W}) \right) \nu(ds, dx) \\ &\quad + \sum_{s \le t} (1 - a_s) \left( e^{-\lambda \hat{W}_s - \frac{\lambda^2}{2}\hat{W}_s^2} - 1 + \lambda \hat{W}_s \right), \end{aligned}$$

*and assume that for any $t > 0$ and $\lambda > 0$, $\tilde{S}(\lambda)_t \le 0$. Then, for all $x \ge 0$, $\beta > 0$, $v > 0$, $\alpha \in \mathbb{R}$,*

$$\mathbb{P}\left(M_t \ge \Big(\alpha + \beta \sum_{s\le t} |\Delta M_s|^2\Big)x,\ \sum_{s\le t} |\Delta M_s|^2 \ge v^2 \text{ for some } t > 0\right) \le \exp\Big\{-\frac{x^2}{4}\Big(\alpha\beta + \frac{\beta^2 v^2}{2}\Big)\Big\}.$$

#### **4. Application**

In this section, we derive exponential inequalities for the block counting process in the Λ-coalescent. The Λ-coalescent was introduced independently by Pitman [17] and Sagitov [18]. The notation and details of the Λ-coalescent below follow Limic and Talarczyk [19].

Let $\Lambda$ be a probability measure on $[0, 1]$, and let $\Pi = (\Pi_t)_{t\ge 0}$ be a Markov jump process taking values in the set of partitions of $\{1, 2, \dots\}$. For any $n \ge 1$, the restriction $\Pi^n$ of $\Pi$ to $\{1, 2, \dots, n\}$ is a continuous-time Markov chain with the following transitions: when $\Pi^n$ has $b$ blocks, any given $k$-tuple of blocks coalesces at rate

$$
\lambda\_{b,k} = \int\_0^1 r^{k-2} (1-r)^{b-k} \Lambda(dr)
$$

where $2 \le b \le n$. Let $N_t$ be the number of blocks of $\Pi_t$ at time $t$. In fact, $N = (N_t)_{t\ge 0}$ is a point process. Limic and Talarczyk [19] presented an integral equation for $N$. Define

$$\pi(dt, dy, d\mathbf{x}) = \sum_{k \ge 1} \varepsilon_{(T_k, Y_k, \mathbf{X}^k)}(dt, dy, d\mathbf{x}),$$

where $\{\mathbf{X}^k\}$ is an array of i.i.d. random variables $(X^k_j)_{j,k\in\mathbb{N}}$, with each $X^k_j$ uniformly distributed on $[0, 1]$. The multivariate point process $\pi$ has the compensator $dt\,\frac{\Lambda(dy)}{y^2}\,d\mathbf{x}$.

Limic and Talarczyk [19] found that

$$N_t = N_r - \int_r^t \int_0^1 \int_{[0,1]^{\mathbb{N}}} f(N_{s-}, y, \mathbf{x})\, \pi(ds, dy, d\mathbf{x})$$

for all 0 < *r* < *t*, where

$$f(k, y, \mathbf{x}) = \sum_{j=1}^{k} \mathbf{1}_{\{x_j \le y\}} - 1 + \mathbf{1}_{\cap_{j=1}^k \{x_j > y\}}.$$

Define

$$\Psi(k) = \int\_0^1 \int\_{[0,1]^\mathbb{N}} f(k, y, \mathbf{x}) \frac{\Lambda(dy)}{y^2} d\mathbf{x},$$

$$t = \int_{v_t}^{\infty} \frac{1}{\Psi(q)}\, dq,$$

and

$$M_t = \int_0^t \int_0^1 \int_{[0,1]^{\mathbb{N}}} \frac{f(N_{s-}, y, \mathbf{x})}{v_s} \Big(\pi(ds, dy, d\mathbf{x}) - ds\, \frac{\Lambda(dy)}{y^2}\, d\mathbf{x}\Big).$$

$M = (M_t)_{t\ge 0}$ plays an important role in the study of the Λ-coalescent. Limic and Talarczyk [19] showed that $M$ is a square integrable martingale. It is not difficult to see that $\Delta M \ge 0$,

$$\sum_{s \le t} |\Delta M_s|^2 = \int_0^t \int_0^1 \int_{[0,1]^{\mathbb{N}}} \frac{f^2(N_{s-}, y, \mathbf{x})}{v_s^2}\, \pi(ds, dy, d\mathbf{x}),$$

and

$$\langle M, M\rangle_t = \int_0^t \int_0^1 \int_{[0,1]^{\mathbb{N}}} \frac{f^2(N_{s-}, y, \mathbf{x})}{v_s^2}\, ds\, \frac{\Lambda(dy)}{y^2}\, d\mathbf{x}.$$

We have the following result.

**Theorem 8.** *Let $M$ be defined as above. Then*

$$\mathbb{P}\left(M_t \ge x,\ \sum_{s\le t} |\Delta M_s|^2 \le v^2 \text{ for some } t > 0\right) \le \left(\frac{v^2 + x}{v^2}\right)^{v^2} e^{-x}$$

*and*

$$\mathbb{P}\left(M_t \ge \Big(\alpha + \beta\sum_{s\le t}|\Delta M_s|^2\Big)x,\ \sum_{s\le t}|\Delta M_s|^2 \ge \langle M, M\rangle_t + v^2 \text{ for some } t > 0\right) \le \exp\Big\{-\frac{x^2}{2}\Big(\alpha\beta + \frac{\beta^2 v^2}{2}\Big)\Big\},$$

*where x* ≥ 0, *β* > 0, *v* > 0*, α* ∈ R*.*
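The first bound of Theorem 8 is a Freedman-type function of the deviation level and the jump budget. As a quick numerical sanity check, the sketch below (our own illustration; the function name and argument names are ours) evaluates the right-hand side:

```python
import math

def freedman_bound(x: float, v2: float) -> float:
    """Right-hand side of the first inequality of Theorem 8:
    ((v^2 + x) / v^2)^(v^2) * exp(-x), with v2 standing for v^2."""
    return ((v2 + x) / v2) ** v2 * math.exp(-x)

# The bound equals 1 at x = 0 and strictly decreases in x,
# so it is informative for every deviation level x > 0.
print(freedman_bound(0.0, 1.0))
print(freedman_bound(5.0, 1.0))
```

Since log of the bound is *v*<sup>2</sup> log(1 + *x*/*v*<sup>2</sup>) − *x* ≤ 0, the bound never exceeds 1, as the evaluation confirms.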

**Author Contributions:** Conceptualization, N.L.; methodology, V.V.U. and H.W.; investigation, N.L. and H.W.; writing, V.V.U. and H.W. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by National Key R&D Program of China (No.2018YFA0703900), Shandong Provincial Natural Science Foundation (No. ZR2019ZD41).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Local Laws for Sparse Sample Covariance Matrices**

**Alexander N. Tikhomirov \* and Dmitry A. Timushev**

Institute of Physics and Mathematics, Komi Science Center of Ural Branch of RAS, 167982 Syktyvkar, Russia; timushev@ipm.komisc.ru

**\*** Correspondence: tikhomirov@ipm.komisc.ru

**Abstract:** We proved the local Marchenko–Pastur law for sparse sample covariance matrices that corresponded to rectangular observation matrices of order *n* × *m* with *n*/*m* → *y* (where *y* > 0) and sparsity *np<sub>n</sub>* > log<sup>*β*</sup> *n* (where *β* > 0). The bounds of the distance between the empirical spectral distribution function of the sparse sample covariance matrices and the Marchenko–Pastur law distribution function were obtained in the complex domain *z* ∈ D with Im *z* > *v*<sub>0</sub> > 0; they were of order log<sup>4</sup> *n*/*n* and the domain bounds did not depend on *p<sub>n</sub>* while *np<sub>n</sub>* > log<sup>*β*</sup> *n*.

**Keywords:** sparse sample covariance matrices; local Marchenko–Pastur law; Stieltjes transformation

**MSC:** 60F99; 60B20

#### **1. Introduction**

The random matrix theory (RMT) dates back to the work of Wishart in multivariate statistics [1], which was devoted to the joint distribution of the entries of sample covariance matrices. The next RMT milestone was the work of Wigner [2] in the middle of the last century, in which the modelling of the Hamiltonian of excited heavy nuclei using a large dimensional random matrix was proposed, thereby replacing the study of the energy levels of nuclei with the study of the distribution of the eigenvalues of a random matrix. Wigner studied the eigenvalues of random Hermitian matrices with centred, independent and identically distributed elements (such matrices were later named Wigner matrices) and proved that the density of the empirical spectral distribution function of the eigenvalues of such matrices converges to the semicircle law as the matrix dimensions increase. Later, this convergence was named Wigner's semicircle law and Wigner's results were generalised in various aspects.
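Wigner's semicircle law is easy to reproduce numerically. The sketch below (our own illustration, not part of the original paper; sizes and the random seed are arbitrary choices) checks the empirical second spectral moment against the semicircle value of 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Symmetric matrix with centred unit-variance entries, scaled so that
# the spectrum converges to the semicircle law on [-2, 2].
A = rng.standard_normal((n, n))
W = (A + A.T) / np.sqrt(2 * n)

eigs = np.linalg.eigvalsh(W)

# The second moment of the semicircle law on [-2, 2] equals 1;
# the empirical second moment of the spectrum should be close to it.
m2 = np.mean(eigs ** 2)
print(m2)
```

A histogram of `eigs` would visibly match the semicircle density; the moment check above is the simplest quantitative version of that comparison.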

The breakthrough work of Marchenko and Pastur [3] gave impetus to new progress in the study of sample covariance matrices. Under quite general conditions, they found an explicit form of the limiting density of the expected empirical spectral distribution function of sample covariance matrices. Later, this convergence was named the Marchenko– Pastur law.

Sample covariance matrices are of great practical importance for the problems of multivariate statistical analysis, particularly for the method of principal component analysis (PCA). In recent years, many studies have appeared that have connected RMT with other rapidly developing areas, such as the theory of wireless communication and deep learning. For example, the spectral density of sample covariance matrices is used in calculations that relate to multiple input multiple output (MIMO) channel capacity [4]. An important object of study for neural networks is the loss surface. The geometry and critical points of this surface can be predicted using the Hessian of the loss function. A number of works that have been devoted to deep networks have suggested the application of various RMT models for Hessian approximation, thereby allowing the use of RMT results to reach specific conclusions about the nature of the critical points of the surface.

Another area of application for sample covariance matrices is graph theory. The biadjacency matrix of a bipartite graph is rectangular and generally asymmetric, so the study of its singular values leads to sample covariance matrices. An example of these graphs is the bipartite random graph, the vertices of which can be divided into two groups in which the vertices are not connected to each other.

**Citation:** Tikhomirov, A.N.; Timushev, D.A. Local Laws for Sparse Sample Covariance Matrices. *Mathematics* **2022**, *10*, 2326. https://doi.org/10.3390/math10132326

Academic Editor: Ninoslav Truhar

Received: 11 May 2022; Accepted: 29 June 2022; Published: 3 July 2022

If we assume that the probability *pn* of having graph edges tends to zero as the number of vertices *n* increases to infinity, we arrive at the concept of sparse random matrices. The behaviour of the eigenvalues and eigenvectors of a sparse random matrix significantly depends on its sparsity and results that are obtained for non-sparse matrices cannot be applied. Sparse sample covariance matrices have applications in random graph models [5] and deep learning problems [6] as well.

Sparse Wigner matrices have been considered in a number of papers (see [7–10]), in which many results have been obtained. With the symmetrisation of sample covariance matrices, it is possible to apply these results when the observation matrices are square. However, when the sample size is greater than the observation dimensions, the spectral limit distribution has a singularity at zero, which requires a different approach. The spectral limit distribution of sparse sample covariance matrices with a sparsity of *np<sub>n</sub>* ∼ *n*<sup>*ε*</sup> (where *ε* > 0 was arbitrarily small) was studied in [11,12]. In particular, a local law was proven under the assumption that the matrix elements satisfied the moment conditions E |*X<sub>jk</sub>*|<sup>*q*</sup> ≤ (*Cq*)<sup>*cq*</sup>. In this paper, we considered a case with a sparsity of *np<sub>n</sub>* ∼ log<sup>*α*</sup> *n* for *α* > 1 and assumed that the matrix element moments satisfied the conditions E |*X<sub>jk</sub>*|<sup>4+*δ*</sup> ≤ *C* < ∞ and |*X<sub>jk</sub>*| ≤ *c*<sub>1</sub>(*np<sub>n</sub>*)<sup>1/2−κ</sup> for κ > 0.

#### **2. Main Results**

We let *m* = *m*(*n*), where *m* ≥ *n*. We considered the independent and identically distributed zero mean random variables *X<sub>jk</sub>*, 1 ≤ *j* ≤ *n* and 1 ≤ *k* ≤ *m* with E *X<sub>jk</sub>* = 0 and E *X<sub>jk</sub>*<sup>2</sup> = 1 and an independent set of the independent Bernoulli random variables *ξ<sub>jk</sub>*, 1 ≤ *j* ≤ *n* and 1 ≤ *k* ≤ *m* with E *ξ<sub>jk</sub>* = *p<sub>n</sub>*. In addition, we supposed that *np<sub>n</sub>* → ∞ as *n* → ∞. In what follows, we omitted the index *n* from *p<sub>n</sub>* when this would not cause confusion.

We considered a sequence of random matrices:

$$\mathbf{X} = \frac{1}{\sqrt{mp_n}} (\xi_{jk} X_{jk})_{1 \le j \le n,\, 1 \le k \le m}. \tag{1}$$

We denoted by *s*<sub>1</sub> ≥ ··· ≥ *s<sub>n</sub>* the singular values of **X** and defined the symmetrised empirical spectral distribution function (ESD) of the sample covariance matrix **W** = **XX**<sup>∗</sup> as:

$$F\_n(\boldsymbol{x}) = \frac{1}{2n} \sum\_{j=1}^n \left( \mathbb{I}\{s\_j \le \boldsymbol{x}\} + \mathbb{I}\{-s\_j \le \boldsymbol{x}\} \right),$$

where I{*A*} stands for the event *A* indicator.
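The model (1) can be sketched directly in code. The snippet below (our own illustration; the sizes *n*, *m* and the sparsity *p* are arbitrary choices) builds a sparse matrix according to (1) and checks that the largest singular value lands near the right edge *b* = 1 + √*y* of the Marchenko–Pastur support:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, p = 300, 600, 0.2   # illustrative sizes and sparsity, our choice

# Matrix (1): i.i.d. centred unit-variance entries, an independent
# Bernoulli(p) mask, and normalisation by sqrt(m * p).
xi = rng.random((n, m)) < p
X = xi * rng.standard_normal((n, m)) / np.sqrt(m * p)

s = np.linalg.svd(X, compute_uv=False)   # s_1 >= ... >= s_n
y = n / m
b = 1 + np.sqrt(y)   # right edge of the Marchenko-Pastur support

print(s[0], b)
```

With *np* = 60 the sparsity is mild, so the largest singular value is already close to *b* at these sizes.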

We let *y* := *y*(*n*, *m*) = *n*/*m* and *G<sub>y</sub>*(*x*) be the symmetrised Marchenko–Pastur distribution function with the density:

$$g\_y(\mathbf{x}) = \frac{1}{2\pi y |\mathbf{x}|} \sqrt{(\mathbf{x}^2 - a^2)(b^2 - \mathbf{x}^2)} \mathbb{I}\{a^2 \le \mathbf{x}^2 \le b^2\},$$

where *a* = 1 − √*y* and *b* = 1 + √*y*. We assumed that *y* ≤ *y*<sub>0</sub> < 1 for *n*, *m* ≥ 1. When the Stieltjes transformation of the distribution function *G<sub>y</sub>*(*x*) was denoted by *S<sub>y</sub>*(*z*) and the Stieltjes transformation of the distribution function *F<sub>n</sub>*(*x*) was denoted by *s<sub>n</sub>*(*z*), we obtained:

$$\begin{aligned} S\_{\mathcal{Y}}(z) &= \frac{-z + \frac{1-y}{z} + \sqrt{(z - \frac{1-y}{z})^2 - 4y}}{2y}, \\\ s\_n(z) &= \frac{1}{2n} \left[ \sum\_{j=1}^n \frac{1}{s\_j - z} + \sum\_{j=1}^n \frac{1}{-s\_j - z} \right] = \frac{1}{n} \sum\_{j=1}^n \frac{z}{s\_j^2 - z^2}. \end{aligned}$$
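The two transformations can be compared numerically. The sketch below (our own illustration; it takes the non-sparse case *p* = 1 for simplicity, and the branch-selection logic is ours) implements *S<sub>y</sub>*(*z*) from the formula above and checks it against *s<sub>n</sub>*(*z*) computed from a simulated matrix:

```python
import numpy as np

def S_y(z: complex, y: float) -> complex:
    """Stieltjes transform of the symmetrised Marchenko-Pastur law;
    the square-root branch is fixed by requiring Im S_y(z) > 0."""
    w = z - (1 - y) / z
    r = np.sqrt(w * w - 4 * y + 0j)
    s = (-w + r) / (2 * y)
    if s.imag < 0:
        s = (-w - r) / (2 * y)
    return s

rng = np.random.default_rng(2)
n, m = 1000, 2000   # our choice of sizes
X = rng.standard_normal((n, m)) / np.sqrt(m)
s_vals = np.linalg.svd(X, compute_uv=False)

z = 2j   # a point well inside the upper half-plane
# Empirical Stieltjes transform: s_n(z) = (1/n) sum_j z / (s_j^2 - z^2)
s_n = np.mean(z / (s_vals ** 2 - z ** 2))
print(abs(s_n - S_y(z, n / m)))
```

For *z* this far from the real axis the difference is already small at *n* = 1000, consistent with the convergence *s<sub>n</sub>*(*z*) → *S<sub>y</sub>*(*z*).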

We also put:

$$b(z) = z - \frac{1-y}{z} + 2yS\_{\mathcal{Y}}(z) = -\frac{1}{S\_{\mathcal{Y}}(z)} + yS\_{\mathcal{Y}}(z). \tag{2}$$

In this paper, we proved the so called *local Marchenko–Pastur law* for sparse covariance matrices. We let:

$$
\Lambda_n := \Lambda_{n,y}(z) = s_n(z) - S_y(z).
$$

For a constant *δ* > 0, we defined the value κ = κ(*δ*) := *δ*/(2(4 + *δ*)). We assumed that the sparsity probability *p<sub>n</sub>* and the moments of the matrix elements *X<sub>jk</sub>* satisfied the following conditions:

- (*C*0): E *X<sub>jk</sub>* = 0, E *X<sub>jk</sub>*<sup>2</sup> = 1 and *μ*<sub>4+*δ*</sub> := E |*X<sub>jk</sub>*|<sup>4+*δ*</sup> ≤ *C* < ∞;
- (*C*1): *np<sub>n</sub>* ≥ *c*<sub>0</sub> log<sup>*β*</sup> *n* for some *β* > 1 and a constant *c*<sub>0</sub> > 0;
- (*C*2): |*X<sub>jk</sub>*| ≤ *c*<sub>1</sub>(*np<sub>n</sub>*)<sup>1/2−κ</sup> for a constant *c*<sub>1</sub> > 0.

We introduced the quantity *v*<sub>0</sub> = *v*<sub>0</sub>(*a*<sub>0</sub>) := *a*<sub>0</sub>*n*<sup>−1</sup> log<sup>4</sup> *n* with a positive constant *a*<sub>0</sub>. We then introduced the region:

$$\mathcal{D}(a\_0) := \{ z = u + iv : (1 - \sqrt{y} - v)\_+ \le |u| \le 1 + \sqrt{y} + v, V \ge v \ge v\_0 \}.$$

For constants *u*<sub>0</sub> > 0 and *V*, we defined the region:

$$\mathcal{D}(a_0, a_1) = \{ z = u + iv : |u| \le u_0,\ V \ge v \ge v_0,\ |b(z)| \ge a_1 \Gamma_n \}.$$

Next, we introduced some notations. We let:

$$
\Gamma\_n = 2\mathcal{C}\_0 \log n \left( \frac{1}{n\upsilon} + \min \left\{ \frac{1}{n p |b(z)|}, \frac{1}{\sqrt{n p}} \right\} \right).
$$

We introduced the quantity:

$$d(z) = \frac{\operatorname{Im} b(z)}{|b(z)|}$$

and put:

$$d\_n(z) := \frac{1}{n\upsilon} \left( d(z) + \frac{\log n}{n\upsilon |b(z)|} \right) + \frac{1}{n\upsilon |b(z)|}. \tag{3}$$

We stated the improved bounds for Λ*n*(*z*) and put:

$$\begin{split} \mathcal{T}_{n} := {}& \mathbb{I}\{ |b(z)| \ge \Gamma_{n} \} \left( d_{n}(z) + d_{n}^{\frac{3}{4}}(z) \frac{1}{(nv)^{\frac{1}{2}}} + d_{n}^{\frac{1}{2}}(z) \frac{1}{(nv)^{\frac{1}{2}}} \right) \\ & + \mathbb{I}\{ |b(z)| \le \Gamma_{n} \} \left( \left( \frac{\Gamma_{n}}{nv} \right)^{\frac{1}{2}} + \Gamma_{n}^{\frac{1}{2}} \left( \frac{\Gamma_{n}^{\frac{1}{2}}}{\sqrt{nv}} + \frac{1}{\sqrt{np}} \right) \right). \end{split}$$

**Theorem 1.** *Assume that the conditions* (*C*0)*–*(*C*2) *are satisfied. Then, for any Q* ≥ 1*, positive constants C* = *C*(*Q*, *δ*, *μ*<sub>4+*δ*</sub>, *c*<sub>0</sub>, *c*<sub>1</sub>)*, K* = *K*(*Q*, *δ*, *μ*<sub>4+*δ*</sub>, *c*<sub>0</sub>, *c*<sub>1</sub>) *and a*<sub>0</sub> = *a*<sub>0</sub>(*Q*, *δ*, *μ*<sub>4+*δ*</sub>, *c*<sub>0</sub>, *c*<sub>1</sub>) *exist, such that for z* ∈ D(*a*<sub>0</sub>)*:*

$$\Pr\left\{ |\Lambda_n| \ge K\mathcal{T}_n \right\} \le Cn^{-Q}.$$

We also proved the following result.

**Theorem 2.** *Under the conditions of Theorem 1 and for Q* ≥ 1*, the positive constants C* = *C*(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1)*, K* = *K*(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1)*, a*<sup>0</sup> = *a*0(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1) *and a*<sup>1</sup> = *a*1(*Q*, *δ*, *μ*4+*δ*, *<sup>c</sup>*0, *<sup>c</sup>*1) *exist, such that for z* <sup>∈</sup> <sup>D</sup>(*a*0, *<sup>a</sup>*1)*:*

$$\Pr\left\{ \left| \operatorname{Im} \Lambda\_n \right| \geq K \mathcal{T}\_n \right\} \leq \mathsf{C} n^{-Q}.$$

#### *2.1. Organisation*

The paper is organised as follows. In Section 3, we state Theorems 3–5 and several corollaries. In Section 4, the delocalisation is considered. In Section 5, we prove the corollaries that were stated in Section 3. Section 6 is devoted to the proof of Theorems 3–5. In Section 7, we state and prove some auxiliary results.

#### *2.2. Notation*

We use *C* for large universal constants, which may be different from line to line. *S<sub>y</sub>*(*z*) and *s<sub>n</sub>*(*z*) denote the Stieltjes transformations of the symmetrised Marchenko–Pastur distribution and the spectral distribution function, respectively. *R*(*z*) denotes the resolvent matrix. We let T = {1, ... , *n*}, J ⊂ T, T<sup>(1)</sup> = {1, ... , *m*} and K ⊂ T<sup>(1)</sup>. We consider the *σ*-algebras M<sup>(J,K)</sup>, which are generated by the elements of **X** with the exception of the rows from J and the columns from K. We write M<sub>*j*</sub><sup>(J,K)</sup> instead of M<sup>(J∪{*j*},K)</sup> and M<sub>*l*+*n*</sub><sup>(J,K)</sup> instead of M<sup>(J,K∪{*l*})</sup> for brevity. The symbol **X**<sup>(J,K)</sup> denotes the matrix **X** from which the rows with numbers in J and the columns with numbers in K were deleted. In a similar way, we denote all objects in terms of **X**<sup>(J,K)</sup>: the resolvent matrix is **R**<sup>(J,K)</sup>, the ESD Stieltjes transformation is *s<sub>n</sub>*<sup>(J,K)</sup>, the corresponding quantity is Λ<sub>*n*</sub><sup>(J,K)</sup>, etc. The symbol E<sub>*j*</sub> denotes the conditional expectation with respect to the *σ*-algebra M<sub>*j*</sub> and E<sub>*l*+*n*</sub> denotes the conditional expectation with respect to the *σ*-algebra M<sub>*l*+*n*</sub>. We let J<sup>*c*</sup> = T \ J and K<sup>*c*</sup> = T<sup>(1)</sup> \ K.

#### **3. Main Equation and Its Error Term Estimation**

Note that *Fn*(*x*) is the ESD of the block matrix:

$$\mathbf{V} = \begin{bmatrix} \mathbf{O}_n & \mathbf{X} \\ \mathbf{X}^* & \mathbf{O}_m \end{bmatrix},$$

where **O***<sup>k</sup>* is a *k* × *k* matrix with zero elements.

We let **R** = **R**(*z*) be the resolvent matrix of **V**:

$$\mathbf{R} = (\mathbf{V} - z\mathbf{I})^{-1}.$$

By applying the Schur complement, we obtained:

$$\mathbf{R} = \begin{bmatrix} z(\mathbf{X}\mathbf{X}^\* - z^2\mathbf{I})^{-1} & (\mathbf{X}\mathbf{X}^\* - z^2\mathbf{I})^{-1}\mathbf{X} \\ \mathbf{X}^\*(\mathbf{X}\mathbf{X}^\* - z^2\mathbf{I})^{-1} & z(\mathbf{X}^\*\mathbf{X} - z^2\mathbf{I})^{-1} \end{bmatrix}.$$

This implied:

$$s\_n(z) = \frac{1}{n} \sum\_{j=1}^n R\_{jj} = \frac{1}{n} \sum\_{l=1}^m R\_{l+n,l+n} + \frac{m-n}{nz}.$$
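The identity above can be verified directly for a small matrix. The snippet below (our own illustration, with arbitrary small sizes) builds **V**, inverts it and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 5   # small sizes, our choice
X = rng.standard_normal((n, m)) / np.sqrt(m)

# Block matrix V from the beginning of this section and its resolvent
# R = (V - zI)^{-1}.
V = np.zeros((n + m, n + m))
V[:n, n:] = X
V[n:, :n] = X.T

z = 0.7 + 0.3j
R = np.linalg.inv(V - z * np.eye(n + m))

# (1/n) * sum of the first n diagonal entries versus
# (1/n) * sum of the last m diagonal entries plus (m - n)/(n z).
lhs = np.trace(R[:n, :n]) / n
rhs = np.trace(R[n:, n:]) / n + (m - n) / (n * z)
print(abs(lhs - rhs))
```

The difference is zero up to rounding, reflecting the fact that the lower block contributes *m* − *n* extra resolvent terms equal to −1/*z*.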

For the diagonal elements of **R**, we could write:

$$R\_{jj}^{(\mathbf{J},\mathbf{K})} = S\_{\mathcal{Y}}(z) \left( 1 - \varepsilon\_{j}^{(\mathbf{J},\mathbf{K})} R\_{jj}^{(\mathbf{J},\mathbf{K})} + y \Lambda\_{n}^{(\mathbf{J},\mathbf{K})} R\_{jj}^{(\mathbf{J},\mathbf{K})} \right), \tag{4}$$

for *j* ∈ J*<sup>c</sup>* and:

$$R\_{l+n,l+n}^{(\mathbb{J},\mathbb{K})} = -\frac{1}{z + y\mathbb{S}\_{\mathcal{Y}}(z)} \left( 1 - \varepsilon\_{l+n}^{(\mathbb{J},\mathbb{K})} R\_{l+n,l+n}^{(\mathbb{J},\mathbb{K})} + y\Lambda\_n^{(\mathbb{J},\mathbb{K})} R\_{l+n,l+n}^{(\mathbb{J},\mathbb{K})} \right),\tag{5}$$

for *l* ∈ K<sup>*c*</sup>. The correction terms *ε<sub>j</sub>*<sup>(J,K)</sup> for *j* ∈ J<sup>*c*</sup> and *ε<sub>l+n</sub>*<sup>(J,K)</sup> for *l* ∈ K<sup>*c*</sup> were defined as:

$$\begin{split} \varepsilon_{j}^{(\mathbb{J},\mathbb{K})} &= \varepsilon_{j1}^{(\mathbb{J},\mathbb{K})} + \varepsilon_{j2}^{(\mathbb{J},\mathbb{K})} + \varepsilon_{j3}^{(\mathbb{J},\mathbb{K})}, \\ \varepsilon_{j1}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{m} \sum_{l=1}^{m} R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})} - \frac{1}{m} \sum_{l=1}^{m} R_{l+n,l+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}, \\ \varepsilon_{j2}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{mp} \sum_{l=1}^{m} (X_{jl}^{2} \xi_{jl} - p) R_{l+n,l+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}, \\ \varepsilon_{j3}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{mp} \sum_{1 \le l \ne k \le m} X_{jl} X_{jk} \xi_{jl} \xi_{jk} R_{l+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}, \end{split}$$

and

$$\begin{split} \varepsilon_{l+n}^{(\mathbb{J},\mathbb{K})} &= \varepsilon_{l+n,1}^{(\mathbb{J},\mathbb{K})} + \varepsilon_{l+n,2}^{(\mathbb{J},\mathbb{K})} + \varepsilon_{l+n,3}^{(\mathbb{J},\mathbb{K})}, \\ \varepsilon_{l+n,1}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{m} \sum_{j=1}^{n} R_{jj}^{(\mathbb{J},\mathbb{K})} - \frac{1}{m} \sum_{j=1}^{n} R_{jj}^{(\mathbb{J},\mathbb{K} \cup \{l\})}, \\ \varepsilon_{l+n,2}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{mp} \sum_{j=1}^{n} (X_{jl}^{2} \xi_{jl} - p) R_{jj}^{(\mathbb{J},\mathbb{K} \cup \{l\})}, \\ \varepsilon_{l+n,3}^{(\mathbb{J},\mathbb{K})} &= \frac{1}{mp} \sum_{1 \le j \ne k \le n} X_{jl} X_{kl} \xi_{jl} \xi_{kl} R_{jk}^{(\mathbb{J},\mathbb{K} \cup \{l\})}. \end{split}$$

By summing Equation (4) (J = ∅ and K = ∅), we obtained the self-consistent equation:

$$s_n(z) = S_y(z)\big(1 + T_n - y\Lambda_n s_n(z)\big),$$

with the error term:

$$T_n = \frac{1}{n} \sum_{j=1}^n \varepsilon_j R_{jj}.$$

We let *s*<sub>0</sub> > 1 and a positive constant *V*, depending on *δ*, be fixed. The exact values of these constants were defined below. For 0 < *v* ≤ *V*, we defined *k<sub>v</sub>* as:

$$k\_v = k\_v(V) := \min\{l \ge 0 : s\_0^l v \ge V\}.$$

Remembering that:

$$
\Lambda\_n = \Lambda\_n(z) := s\_n(z) - S\_{\underline{y}}(z),
$$

and:

$$
\Gamma_n = 2C_0 \log n \left( \frac{1}{nv} + \min \left\{ \frac{1}{np|b(z)|}, \frac{1}{\sqrt{np}} \right\} \right).
$$

We defined:

$$a_n(z) = a_n(u, v) = \begin{cases} \operatorname{Im} b(z) + \Gamma_n, & \text{if } |b(z)| \ge \Gamma_n, \\ \Gamma_n, & \text{if } |b(z)| < \Gamma_n. \end{cases}$$

The function *b*(*z*) was defined in (2). For a given *γ* > 0, we considered the event:

$$\mathcal{Q}_{\gamma}(v) := \left\{ |\Lambda_n(u + iv)| \le \gamma a_n(u, v) \text{ for all } u \right\}$$

and the event:

$$\widehat{\mathcal{Q}}_{\gamma}(v) = \bigcap_{l=0}^{k_v} \mathcal{Q}_{\gamma}(s_0^l v).$$

For any *γ* value, a constant *V* = *V*(*γ*) existed, such that:

$$\Pr\{\widehat{\mathcal{Q}}_{\gamma}(V)\} = 1.\tag{6}$$

It could be *V* = √(2/*γ*), for example. In what follows, we assumed that *γ* and *V* were chosen so that (6) was satisfied and we wrote:

$$
\mathcal{Q} := \widehat{\mathcal{Q}}_{\gamma}(V).
$$

We defined:

$$\beta_n(z) := \frac{a_n(z)}{nv} + \frac{|A_0(z)|^2}{nv},$$

where

$$A\_0(z) = y S\_y(z) - \frac{1 - y}{z}.$$

In this section, we demonstrate the following results.

**Theorem 3.** *Under the condition* (*C*0)*, the positive constants C* = *C*(*δ*, *μ*4+*δ*, *c*0)*, a*<sup>0</sup> = *a*0(*δ*, *μ*4+*δ*, *c*0) *and a*<sup>1</sup> <sup>=</sup> *<sup>a</sup>*1(*δ*, *<sup>μ</sup>*4+*δ*, *<sup>c</sup>*0) *exist, such that for z* <sup>=</sup> *<sup>u</sup>* <sup>+</sup> *iv* <sup>∈</sup> <sup>D</sup>*:*

$$\mathbb{E}\, |T_n|^q\, \mathbb{I}\{\mathcal{Q}\} \le C\big(F_1 + \dots + F_6\big),$$

*where*

$$F_1 = \frac{a_n^q(z)}{n^q v^q}, \qquad F_2 = |S_y(z)|^{2q}\beta_n^q(z)\,\mathbb{I}\{|b(z)| \ge \Gamma_n\} + |S_y(z)|^{2q}\beta_n^{\frac{q}{2}}(z)\,\Gamma_n^{\frac{q}{2}},$$

$$F_3 = |S_y(z)|^{2q}\beta_n^{\frac{q}{2}}(z)\,\Gamma_n^{q}\,\mathbb{I}\{|b(z)| \le \Gamma_n\}\,\mathbb{I}\{z \in \mathcal{D}\} + \frac{|S_y(z)|^{3q}\beta_n^{\frac{q}{2}}(z)\,a_n^{\frac{q}{2}}(z)}{(nv)^q}\left(|S_y(z)|^{q}|A_0(z)|^{\frac{q}{2}}\beta_n^{\frac{q}{2}}(z) + \frac{|A_0(z)|^{\frac{q}{2}}}{(np)^{\frac{q}{2}}} + \frac{1}{(nv)^{\frac{q}{2}}}\right),$$

$$F_4 = \frac{|S_y(z)|^{2q}\beta_n^{\frac{q}{2}}(z)\,a_n^{\frac{q}{2}}(z)}{(nv)^q}\left(|S_y(z)|^{q}|A_0(z)|^{\frac{q}{2}}\beta_n^{\frac{q}{2}}(z) + \frac{|A_0(z)|^{\frac{q}{2}}}{(np)^{\frac{q}{2}}} + \frac{1}{(nv)^{\frac{q}{2}}}\right),$$

$$F_5 = \frac{q^{\frac{q}{2}}|S_y(z)|^{\frac{3q}{2}}\beta_n^{\frac{q}{2}}(z)\,|A_0(z)|^{\frac{q}{4}}a_n^{\frac{q}{4}}(z)}{(nv)^{\frac{q}{2}}\big(a_n(z) + |b(z)|\big)^{\frac{q}{2}}} + C^q q^{\frac{q}{2}}\left(\frac{a_n(z)|S_y(z)|}{nv}\right)^{\frac{q}{4}}\frac{\big(|S_y(z)|^2\beta_n(z)\big)^{\frac{q}{4}}}{\big(a_n(z) + |b(z)|\big)^{\frac{q}{2}}}\,\frac{|S_y(z)|^{\frac{q}{4}}}{(np)^{\frac{q}{4}}}\,\frac{1}{(nv)^{\frac{q}{4}}} + C^q q^{q}\left(\frac{|S_y(z)|^2 a_n(z)}{nv}\right)^{\frac{q}{4}}\frac{\big(|S_y(z)|^2\beta_n(z)\big)^{\frac{q}{4}}}{\big(a_n(z) + |b(z)|\big)^{\frac{q}{2}}}\,\frac{1}{(nv)^{\frac{q}{2}}},$$

$$\begin{split} F_6 = \frac{C^q q^{2(q-1)}}{\big(a_n(z) + |b(z)|\big)^{q-1}}\,|S_y(z)|\,\beta_n^{\frac{1}{2}}(z)\Bigg[ & q^{q-1}\left(\frac{|S_y(z)|a_n(z)}{nv}\right)^{q-1}\frac{1}{(np)^{2\varkappa(q-1)}} + q^{q}\left(\frac{|S_y(z)|a_n(z)}{nv}\right)^{q-1}|S_y(z)|^{q-1}\beta_n^{\frac{q-1}{2}}(z) \\ & + q^{\frac{3(q-1)}{2}}\left(\frac{|S_y(z)|^2 a_n(z)}{nv}\right)^{\frac{q-1}{2}}\frac{1}{(nv)^{q-1}} + \frac{q^{2(q-1)}}{(np)^{2(q-1)\varkappa}(nv)^{q-1}} \\ & + q^{2(q-1)}\frac{|S_y(z)|^{\frac{q-1}{2}}}{(nv)^{q-1}}\left(\frac{a_n(z)|S_y(z)|}{nv}\right)^{\frac{q-1}{2}} + q^{\frac{5(q-1)}{2}}\frac{1}{n^{q-1}v^{q-1}}\left(\frac{|S_y(z)|a_n(z)}{nv}\right)^{\frac{q-1}{2}} + \frac{q^{3(q-1)}}{(np)^{2(q-1)\varkappa}(nv)^{q-1}} \Bigg]. \end{split}$$

**Remark 1.** *Theorem 3 was auxiliary. T<sub>n</sub> was the perturbation of the main equation in the Stieltjes transformation of the limit distribution. The size of T<sub>n</sub> was responsible for the stability of the solution of the perturbed equation. We were interested in the estimates of T<sub>n</sub> that were uniform in the domain* D *and had an order of* log *n*/(*nv*) *(such estimates were needed for the proof of the delocalisation of Theorem 6). It was important to know to what extent the estimates depended on both np<sub>n</sub> and nv. The estimates behaved differently in the bulk and at the ends of the support of the limit distribution (the introduced functions a<sub>n</sub>*(*z*) *and b*(*z*) *were responsible for the behaviour of the estimates, depending on the real part of the argument: in the bulk or at the ends of the support of the limit distribution). For the* Λ*<sub>n</sub> estimation, there were two regimes: for* |*b*(*z*)| ≥ Γ*<sub>n</sub>, we used the inequality (10) and for* |*b*(*z*)| ≤ Γ*<sub>n</sub>, we used the inequality (18).*

**Corollary 1.** *Under conditions of Theorem 3, the following inequalities hold:*

$$\begin{split} \mathbb{E}\,|T_n|^q\,\mathbb{I}\{|b(z)| \ge \Gamma_n\}\,\mathbb{I}\{\mathcal{Q}\} \le C^q |b(z)|^q \Bigg[ & \left(\frac{q^2}{(np)^{2\varkappa}}\right)^{q-1} d_n^{\frac{2q-1}{2}}(z) \\ & + d_n^{\frac{3q}{4}}(z) \left(\frac{q^2}{nv}\right)^{\frac{q}{4}} + d_n^{\frac{q}{2}}(z) \left(\frac{q^2}{nv}\right)^{\frac{q}{2}} + q^{q-1} d_n^{\frac{3q-2}{2}}(z) \\ & + q^{2(q-1)} d_n^q(z) \frac{1}{(nv)^{q-1}} + q^{3(q-1)} d_n^{\frac{1}{q}}(z) \frac{1}{(nv)^{q-1} (np)^{2\varkappa(q-1)}} \Bigg] \quad (7) \end{split}$$

*and*

$$\mathbb{E}\,|T_n|^q\,\mathbb{I}\{\Gamma_n \ge |b(z)|\}\,\mathbb{I}\{\mathcal{Q}\} \le C^q \left(\frac{\Gamma_n}{nv} + \frac{1}{np}\right)^{\frac{q}{2}} \Gamma_n^{\frac{q}{2}}.\tag{8}$$

**Corollary 2.** *Under the conditions of Theorem 3 and in the domain:*

$$\mathcal{D} = \{ z = u + iv : 1 - \sqrt{y} - v \le |u| \le 1 + \sqrt{y} + v, \; V \ge v \ge v\_0 \},$$

*for any Q* > 1*, a constant C exists that depends on Q, such that:*

$$\Pr\left\{|\Lambda_n| > \frac{1}{2}\Gamma_n;\ \mathcal{Q}\right\} \le Cn^{-Q}.$$

*Moreover, for z* = *u* + *iv satisfying v* ≥ *v*<sub>0</sub> *and* |*z*| ≥ *C* max{√(log *n*)/√(*np*), log<sup>4</sup> *n*/(*np*)<sup>2κ</sup>} *and for Q* > 1*, a constant C exists that depends on Q, such that:*

$$\Pr\left\{|\operatorname{Im}\Lambda\_n| > \frac{1}{2}\Gamma\_n; \mathcal{Q}\right\} \le \operatorname{Cn}^{-Q}.$$

**Corollary 3.** *Under the conditions of Theorem 3, for Q* ≥ 1*, a constant C that depends on Q exists, such that:*

$$\Pr\{\mathcal{Q}\} \ge 1 - Cn^{-Q}.$$

**Theorem 4.** *Under the conditions of Theorem 1, for Q* ≥ 1*, the positive constants C* = *C*(*Q*, *δ*, *μ*<sub>4+*δ*</sub>, *c*<sub>0</sub>, *c*<sub>1</sub>) *and a*<sub>0</sub> = *a*<sub>0</sub>(*Q*, *δ*, *μ*<sub>4+*δ*</sub>, *c*<sub>0</sub>, *c*<sub>1</sub>) *exist, such that for z* = *u* + *iv* ∈ D(*a*<sub>0</sub>)*:*

$$\Pr\left\{|\Lambda_n| \ge \frac{1}{2}\Gamma_n\right\} \le Cn^{-Q}.$$

*Moreover, for Q* ≥ 1*, the positive constants C* = *C*(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1)*, C*<sup>0</sup> = *C*0(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1) *and a*<sup>0</sup> = *a*0(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1) *exist, such that for z* = *u* + *iv satisfying v* ≥ *v*<sup>0</sup> *and* |*z*| ≥ Γ*n:*

$$\Pr\left\{|\operatorname{Im}\Lambda\_n| > \frac{1}{2}\Gamma\_n\right\} \le \operatorname{Cn}^{-Q},\tag{9}$$

*where*

$$
\Gamma\_n = \mathbb{C}\_0 \log n \left( \frac{1}{n\upsilon} + \min \left\{ \frac{1}{n p |b(z)|}, \frac{1}{\sqrt{n p}} \right\} \right).
$$

To prove the main result, we needed to estimate the entries of the resolvent matrix.

**Theorem 5.** *Under the condition* (*C*0) *and for* 0 < *γ* < *γ*<sup>0</sup> *and u*<sup>0</sup> > 0*, the constants H* = *H*(*δ*, *μ*4+*δ*, *c*0, *γ*, *u*0)*, C* = *C*(*δ*, *μ*4+*δ*, *c*0, *γ*, *u*0)*, c* = *c*(*δ*, *μ*4+*δ*, *c*0, *γ*, *u*0)*, a*<sup>0</sup> = *a*0(*δ*, *μ*4+*δ*, *c*0, *γ*, *<sup>u</sup>*0) *and <sup>a</sup>*<sup>1</sup> <sup>=</sup> *<sup>a</sup>*1(*δ*, *<sup>μ</sup>*4+*δ*, *<sup>c</sup>*0, *<sup>γ</sup>*, *<sup>u</sup>*0) *exist, such that for* <sup>1</sup> <sup>≤</sup> *<sup>j</sup>* <sup>≤</sup> *<sup>n</sup>*, 1 <sup>≤</sup> *<sup>k</sup>* <sup>≤</sup> *<sup>m</sup> and <sup>z</sup>* <sup>=</sup> *<sup>u</sup>* <sup>+</sup> *iv* <sup>∈</sup> <sup>D</sup>*, we have:*

$$\Pr\{|R_{jk}| > H|S_y(z)|;\ \widehat{\mathcal{Q}}_\gamma(v)\} \le Cn^{-c\log n},$$

$$\Pr\{\max\{|R_{j,k+n}|, |R_{j+n,k}|\} > H|S_y(z)|;\ \widehat{\mathcal{Q}}_\gamma(v)\} \le Cn^{-c\log n},$$

$$\Pr\{|R_{j+n,k+n}| > H|A_0(z)|;\ \widehat{\mathcal{Q}}_\gamma(v)\} \le Cn^{-c\log n},$$

*where*

$$A\_0(z) = y \mathcal{S}\_{\mathcal{Y}}(z) - \frac{1 - y}{z}.$$

**Corollary 4.** *Under the conditions of Theorem 5, for v* ≥ *v*<sup>0</sup> *and q* ≤ *c* log *n, a constant H exists, such that for j*, *k* ∈ T ∪ (T(1) + *n*)*:*

$$\mathbb{E}|\mathcal{R}\_{jk}|^q \mathbb{I}\{\hat{\mathcal{Q}}\_{\gamma}\} \le H^q |S\_{\mathcal{Y}}(z)|^q.$$

#### **4. Delocalisation**

In this section, we demonstrate some applications of the main result. We let **L** = (*Ljk*)*<sup>n</sup> <sup>j</sup>*,*k*=<sup>1</sup> and **<sup>K</sup>** = (*Kjk*)*<sup>m</sup> <sup>j</sup>*,*k*=<sup>1</sup> be orthogonal matrices from the SVD of matrix **X** s.t.:

$$\mathbf{X} = \mathbf{L}\mathbf{\bar{D}}\mathbf{K}^\*,$$

where $\bar{\mathbf{D}} = \begin{bmatrix} \mathbf{D} & \mathbf{O}_{n,m-n} \end{bmatrix}$ and **D** = diag{*s*<sub>1</sub>, ... , *s<sub>n</sub>*}. Here and in what follows, **O**<sub>*k*,*n*</sub> denotes a *k* × *n* matrix with zero entries. The eigenvalues of matrix **V** are denoted by *λ<sub>j</sub>* (*λ<sub>j</sub>* = *s<sub>j</sub>* for *j* = 1, ... , *n*, *λ<sub>j</sub>* = −*s<sub>j</sub>* for *j* = *n* + 1, ... , 2*n* and *λ<sub>j</sub>* = 0 for *j* = 2*n* + 1, ... , *n* + *m*). We let **u**<sub>*j*</sub> = (*u<sub>j,1</sub>*, ... , *u<sub>j,n+m</sub>*) be the eigenvector of matrix **V** corresponding to eigenvalue *λ<sub>j</sub>*, where *j* = 1, ... , *n* + *m*.
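The stated structure of the spectrum of **V** is easy to confirm numerically. The sketch below (our own illustration, with arbitrary small sizes) compares the eigenvalues of **V** with ±*s<sub>j</sub>* padded by *m* − *n* zeros:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 7   # small sizes, our choice
X = rng.standard_normal((n, m)) / np.sqrt(m)

# Block matrix V with X and X^T off the diagonal.
V = np.zeros((n + m, n + m))
V[:n, n:] = X
V[n:, :n] = X.T

# Expected spectrum: +s_j, -s_j for each singular value, plus m - n zeros.
s = np.linalg.svd(X, compute_uv=False)
expected = np.sort(np.concatenate([s, -s, np.zeros(m - n)]))
eigs = np.sort(np.linalg.eigvalsh(V))

print(np.max(np.abs(eigs - expected)))
```

The maximal discrepancy is at the level of machine precision, confirming the eigenvalue description above.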

We proved the following result.

**Theorem 6.** *Under the conditions* (*C*0)*–*(*C*2)*, for Q* ≥ 1*, the positive constants C*<sup>1</sup> = *C*1(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1) *and C*<sup>2</sup> = *C*2(*Q*, *δ*, *μ*4+*δ*, *c*0, *c*1) *exist, such that:*

$$\Pr\left\{\max_{1\le j,k\le n}|L_{jk}|^2 \ge C_1 \frac{\log^4 n}{n} \right\} \le C_2 n^{-Q}.$$

*Moreover, we have:*

$$\Pr\left\{\max_{1\le j\le n,\,1\le k\le m}|K_{jk}|^2 \ge C_1 \frac{\log^4 n}{n} \right\} \le C_2 n^{-Q}.$$

**Proof.** First, we noted that according to [13], based on [14] and Theorem 1, constants *c̄*<sub>1</sub>, *c̄*<sub>2</sub>, *C* > 0 exist, such that:

$$\Pr\{\overline{c}\_1 \le s\_n \le s\_1 \le \overline{c}\_2\} \ge 1 - \text{Cn}^{-Q}.$$

Furthermore, by Lemma 11, we obtained:

$$R_{jj} = \frac{1}{2}\sum_{k=1}^{n} |L_{jk}|^2 \left(\frac{1}{s_k - z} - \frac{1}{s_k + z}\right) = \int_{-\infty}^{\infty} \frac{1}{x - z}\, dF_{nj}(x),$$

where

$$F_{nj}(x) = \frac{1}{2} \sum_{k=1}^{n} |L_{jk}|^2 \big(\mathbb{I}\{s_k \le x\} + \mathbb{I}\{-s_k \le x\}\big).$$

We noted that:

$$\max_{1 \le k \le n} |L_{jk}|^2 \le 2 \sup_{u : |u| \ge \bar{c}_1/2} \big(F_{nj}(u + \lambda) - F_{nj}(u)\big)$$

and

$$\begin{aligned} F_{nj}(x + \lambda) - F_{nj}(x) &= \int_{x}^{x + \lambda} dF_{nj}(u) \\ &\le 2\int_{-\infty}^{\infty} \frac{\lambda^2}{(x + \lambda - u)^2 + \lambda^2}\, dF_{nj}(u) \le 2\lambda \operatorname{Im} R_{jj}(x + \lambda + i\lambda). \end{aligned}$$

These implied that:

$$\sup_{x:|x|\ge\frac{\bar{c}_1}{2}} |F_{nj}(x+\lambda) - F_{nj}(x)| \le 2\lambda \sup_{|x|>\frac{\bar{c}_1}{4}} \operatorname{Im} R_{jj}(x+i\lambda).$$
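The smoothing step above, bounding an increment of *F<sub>nj</sub>* by the imaginary part of its Stieltjes transform, can be checked on toy data. The snippet below (our own illustration; the weights and atoms are synthetic, not from the paper) verifies the inequality *F<sub>nj</sub>*(*x* + *λ*) − *F<sub>nj</sub>*(*x*) ≤ 2*λ* Im *R<sub>jj</sub>*(*x* + *λ* + *iλ*):

```python
import numpy as np

rng = np.random.default_rng(5)

# A synthetic symmetrised point distribution F_nj: weights w_k at the
# atoms +-s_k, with the atoms bounded away from zero.
k = 50
w = rng.random(k)
w /= w.sum()
s = 0.3 + rng.random(k)

def F(x):
    return 0.5 * (w @ (s <= x) + w @ (-s <= x))

def im_R(z):
    # Imaginary part of the Stieltjes transform of F_nj.
    return (0.5 * (w @ (1.0 / (s - z) - 1.0 / (s + z)))).imag

lam = 0.05
for x in np.linspace(0.2, 1.5, 20):
    assert F(x + lam) - F(x) <= 2 * lam * im_R(x + lam + 1j * lam) + 1e-12
print("increment bound verified")
```

The inequality holds because every atom in (*x*, *x* + *λ*] contributes at least 1/(2*λ*) to the Poisson kernel at *x* + *λ* + *iλ*.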

We chose *λ* ∼ *n*<sup>−1</sup> log<sup>4</sup> *n*. Then, by Corollary 4, we obtained:

$$\Pr\left\{\sup_{x:|x|>\frac{\bar{c}_1}{2}}|F_{nj}(x+\lambda)-F_{nj}(x)| \le \frac{C\log^4 n}{n}\right\} \ge 1-Cn^{-Q}.$$

We obtained the bounds for *Kjk* in a similar way. Thus, the theorem was proven.

#### **5. Proof of the Corollaries**

*5.1. The Proof of Corollary 4*

**Proof.** We could write:

$$\mathbb{E}\, |R_{jk}|^q \mathbb{I}\{\mathcal{Q}\} \le \mathbb{E}\, |R_{jk}|^q \mathbb{I}\{\mathcal{Q}\}\mathbb{I}\{\mathcal{A}(v)\} + \mathbb{E}\, |R_{jk}|^q \mathbb{I}\{\mathcal{Q}\}\mathbb{I}\{\mathcal{A}^c(v)\}.$$

Combining this inequality with |*Rjk*| ≤ *v*<sup>−</sup>1, we found that:

$$\mathbb{E}\, |R_{jk}|^q \mathbb{I}\{\mathcal{Q}\} \le C^q + v_0^{-q}\, \mathbb{E}\, \mathbb{I}\{\mathcal{Q}\}\mathbb{I}\{\mathcal{A}^c(v)\}.$$

By applying Theorem 5, we obtained what was required. Thus, the corollary was proven.

#### *5.2. The Proof of Corollary 2*

**Proof.** We considered the domain D. We noted that for *z* ∈ D, we obtained:

$$|z|^2 \ge (1 - \sqrt{y} - v)^2 + v^2 \ge \frac{1}{2}(1 - \sqrt{y})^2 \quad \text{and} \quad |A_0(z)| \le C,$$

and

$$|b(z)| \le |z| + \frac{1-y}{|z|} + 2\sqrt{y} \le C.$$

First, we considered the case |*b*(*z*)| ≥ Γ*n*. This inequality implied that:

$$|b(z)| \ge \frac{\sqrt{2C_0 \log n}}{\sqrt{np}} \ge \frac{1}{\sqrt{np}}.$$

From there, it followed that:

$$\min\left\{\frac{1}{np|b(z)|}, \frac{1}{\sqrt{np}}\right\} = \frac{1}{np|b(z)|}.$$

Furthermore, for the case |*b*(*z*)| ≥ Γ*n*, we obtained |*bn*(*z*)|I{Q} ≥ (1 − *γ*)|*b*(*z*)|I{Q}. We used the inequality:

$$|\Lambda\_n|\mathbb{I}\{\mathcal{Q}\} \le \frac{C|T\_n|}{|b(z)|}. \tag{10}$$

By Chebyshev's inequality, we obtained:

$$\Pr\{|\Lambda\_{\mathfrak{n}}| \ge \frac{1}{2}\Gamma\_{\mathfrak{n}}; \mathcal{Q}\} \le \frac{2^q \mathbb{E}|T\_n|^q \mathbb{I}\{\mathcal{Q}\}}{\Gamma\_n^q |b(z)|^q}.$$

By applying Corollary 1, we obtained:

$$\Pr\left\{ |\Lambda_n| \ge \frac{1}{2}\Gamma_n;\ \mathcal{Q} \right\} \le \frac{2^q \mathcal{H}_n^q}{\Gamma_n^q},$$

where

$$\begin{split} \mathcal{H}_{n}^{q} := C^{q}\Big[\Big(\frac{q^{\frac{1}{2}}}{(np)^{2\varkappa}}\Big)^{q-1} d_{n}^{\frac{2q-1}{2}}(z) + d_{n}^{\frac{3q}{4}}(z)\Big(\frac{q^{2}}{nv}\Big)^{\frac{q}{4}} + d_{n}^{\frac{q}{2}}(z)\Big(\frac{q}{nv}\Big)^{\frac{q}{2}} \\ + q^{q-1} d_{n}^{\frac{3q-2}{2}}(z) + q^{2(q-1)} d_{n}^{q}(z)\frac{1}{(nv)^{q-1}} + q^{3(q-1)} d_{n}^{q}(z)\frac{1}{(nv)^{q-1}(np)^{2\varkappa(q-1)}}\Big]. \end{split} \tag{11}$$

First, we noted that for *q* = *K* log *n*:

$$\frac{d_n(z)}{\Gamma_n} \le \frac{C}{\log n}. \tag{12}$$

Moreover, for *q* = *C* log *n*:

$$\frac{q^2}{nv\Gamma_n} \le C \log n.\tag{13}$$

From there, it followed that:

$$C^q d_n^{\frac{3q}{4}}(z)\left(\frac{q^2}{nv}\right)^{\frac{q}{4}} \le \left(\frac{C}{\log n}\right)^{\frac{q}{2}}.\tag{14}$$

Furthermore:

$$C^{q}\left(\frac{d_{n}(z)}{\Gamma_{n}}\right)^{\frac{q}{2}}\left(\frac{q}{nv\Gamma_{n}}\right)^{\frac{q}{2}} \le \left(\frac{C}{\log n}\right)^{\frac{q}{2}}.\tag{15}$$

Using these estimations, we could show that:

$$\frac{2^q \mathcal{H}_n^q}{\Gamma_n^q} \le \left(\frac{C}{\log n}\right)^{\frac{q}{2}}.\tag{16}$$

By choosing *q* = *K* log *n* and *K* > *C*(*Q*), we obtained:

$$\Pr\Big\{|\Lambda_n| \ge \frac{1}{2}\Gamma_n; \mathcal{Q}\Big\} \le Cn^{-Q}.$$

Then, we considered the case |*b*(*z*)| ≤ Γ*n*. In this case:

$$\Gamma_n^{\frac{1}{2}}\Big(\frac{\Gamma_n}{nv} + \frac{1}{np}\Big)^{\frac{1}{2}}\Big/\Gamma_n \le \Big(\frac{1}{nv} + \frac{1}{np\Gamma_n}\Big)^{\frac{1}{2}} \le \frac{C}{\log n}.\tag{17}$$
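The first relation in (17) is pure algebra ($\Gamma^{1/2}X^{1/2}/\Gamma = (X/\Gamma)^{1/2}$); a numerical check with illustrative values (the names `gamma`, `n`, `v`, `p` stand for $\Gamma_n$, $n$, $v$, $p$):

```python
import math

# Check: Gamma^(1/2) * (Gamma/(n*v) + 1/(n*p))^(1/2) / Gamma
#        == (1/(n*v) + 1/(n*p*Gamma))^(1/2)
n, v, p, gamma = 5_000, 0.01, 0.1, 0.02  # illustrative values only
lhs = math.sqrt(gamma) * math.sqrt(gamma / (n * v) + 1.0 / (n * p)) / gamma
rhs = math.sqrt(1.0 / (n * v) + 1.0 / (n * p * gamma))
assert abs(lhs - rhs) < 1e-12
```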

By applying the inequality $|\Lambda_n(z)| \le C\sqrt{|T_n|}$ and Corollary 1, we obtained:

$$\Pr\Big\{|\Lambda_n| \ge \frac{1}{2}\Gamma_n; \mathcal{Q}\Big\} \le \frac{2^q\big(\frac{\Gamma_n}{nv} + \frac{1}{np}\big)^{\frac{q}{2}}}{\Gamma_n^{\frac{q}{2}}} \le C^q\Big(\frac{1}{nv} + \frac{1}{np\Gamma_n}\Big)^{\frac{q}{2}}.$$

It was then simple to show that:

$$\Pr\Big\{|\Lambda_n| \ge \frac{1}{2}\Gamma_n; \mathcal{Q}\Big\} \le Cn^{-Q}.$$

Thus, the first inequality was proven. The proof of the second inequality was similar to that of the first. We had to use the inequality:

$$|\operatorname{Im}\Lambda_n| \le C\sqrt{|T_n|},\tag{18}$$

which was valid on the whole real line, instead of $|\Lambda_n| \le C\sqrt{|T_n|}$, which held in the domain $\mathcal{D}$. Moreover, we noted that for any $z$ value, we obtained:

$$|S_y(z)||A_0(z)| \le C.$$

Thus, the corollary was proven.

*5.3. Proof of Corollary 3*

**Proof.** According to Theorem 4:

$$\Pr\Big\{|\Lambda_n(z)| \le \frac{1}{2}\Gamma_n(z); \mathcal{Q}\Big\} \ge 1 - Cn^{-Q}.$$

We noted that for *v* = *V*:

$$\Pr\{\mathcal{Q}(z)\} = 1.$$

Furthermore:

$$\left|\frac{d\Lambda_n(z)}{dz}\right| \le \frac{2}{v^2}.$$

We split the interval $[v_0, V]$ into subintervals by points $v_0 < v_1 < \cdots < v_M = V$, such that for $k = 1, \ldots, M$:

$$|\Lambda_n(u + iv_k) - \Lambda_n(u + iv_{k-1})| \le \frac{1}{2}\Gamma_n(z).$$
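Such a grid with polynomially many points exists; a sketch using only the derivative bound above (the constants here are ours):

$$|\Lambda_n(u + iv_k) - \Lambda_n(u + iv_{k-1})| \le \frac{2}{v_0^2}(v_k - v_{k-1}),$$

so steps of length $v_k - v_{k-1} \le \frac{1}{4}\Gamma_n(z)v_0^2$ suffice, and one may take $M \le 4V/(\Gamma_n(z)v_0^2)$, which is polynomial in $n$ for $v_0 = n^{-1}\log^4 n$.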

We noted that the event $\mathcal{Q}_k = \{|\Lambda_n(u + iv_k)| \le \frac{1}{2}\Gamma_n(u + iv_k)\}$ implied the event $\mathcal{Q}_{k+1} = \{|\Lambda_n(u + iv_{k+1})| \le \Gamma_n\}$. From there, for $v_k \le v \le v_{k+1}$, $k = 0, \ldots, M - 1$, we obtained:

$$\Pr\{\mathcal{Q}(u+iv)\} \ge 1 - \Pr\{\mathcal{Q}^c(u+iv_{k-1})\} - \Pr\{\mathcal{Q}_{k-1}^c; \mathcal{Q}(u+iv_{k-1})\} \ge 1 - Cn^{-Q}.$$

#### **6. Proof of the Theorems**

*6.1. Proof of Theorem 1*

**Proof.** We obtained:

$$\Pr\{|\Lambda_n(z)| \ge \mathcal{T}_n\} \le \Pr\{|\Lambda_n(z)| \ge \mathcal{T}_n; \mathcal{Q}\} + \Pr\{\mathcal{Q}^c\}.$$

The second term on the RHS of the last inequality was bounded by Corollary 3. For $z$ such that $|b(z)| \ge C\Gamma_n(z)$, we used the inequality:

$$|\Lambda_n(z)| \le \frac{|T_n|}{|b_n(z)|},$$

the inequality:

$$|b_n(z)| \ge (1 - \gamma)|b(z)|$$

and the Markov inequality. We could write:

$$\Pr\{|\Lambda_n(z)| \ge \mathcal{T}_n\} \le \frac{\mathbb{E}\{|T_n|^q; \mathcal{Q}\}}{\mathcal{T}_n^q |b(z)|^q} + Cn^{-c\log\log n}.$$

We recalled that in the case |*b*(*z*)| ≥ Γ*n*:

$$\mathcal{T}_n := K\Big(\widehat{d}_n(z) + \widehat{d}_n^{\frac{3}{4}}(z)\frac{1}{(nv)^{\frac{1}{4}}} + \widehat{d}_n^{\frac{1}{2}}(z)\frac{1}{(nv)^{\frac{1}{2}}}\Big).$$

In the case |*b*(*z*)| ≥ Γ*<sup>n</sup>* and using Corollary 1, we obtained:

$$\Pr\{|\Lambda_n(z)| \ge K\mathcal{T}_n\} \le \left(\frac{\mathcal{H}_n}{K\mathcal{T}_n}\right)^q + Cn^{-c\log\log n}.$$

First, we considered the case $|b(z)| \ge \Gamma_n$. By our definition of $r_n(z)$, we obtained:

$$\Pr\{|\Lambda_n(z)| \ge \mathcal{T}_n\} \le \left(C\frac{1}{K\log^{\frac{1}{2}}n}\right)^{q} + Cn^{-c\log\log n}.\tag{19}$$

This inequality completed the proof for |*b*(*z*)| ≥ Γ*n*.

We then considered $|b(z)| \le \Gamma_n$. We used the inequality $|\Lambda_n(z)| \le \sqrt{|T_n|}$ and Corollary 1 to obtain:

$$\Pr\{|\Lambda_n(z)| \ge \mathcal{T}_n\} \le \left(\frac{C}{K}\right)^q. \tag{20}$$

By choosing a sufficiently large *K* value, we obtained the proof. Thus, the theorem was proven.

#### *6.2. Proof of Theorem 2*

**Proof.** The proof of Theorem 2 was similar to the proof of Theorem 1. We only noted that inequality:

$$|\operatorname{Im} \Lambda_n(u + iv)| \le \sqrt{|T_n|}$$

held for all $u \in \mathbb{R}$.

#### *6.3. The Proof of Theorem 5*

**Proof.** Using the definition of the Stieltjes transformation, we obtained:

$$s_n(z) = \frac{1}{2n}\left(\sum_{j=1}^n \frac{1}{s_j - z} + \sum_{j=1}^n \frac{1}{-s_j - z}\right) = \frac{1}{n}\sum_{j=1}^n \frac{z}{s_j^2 - z^2},$$
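The second equality is the elementary identity $\frac{1}{2}\big(\frac{1}{s-z} + \frac{1}{-s-z}\big) = \frac{z}{s^2 - z^2}$; a quick numerical check (random points off the real axis, values illustrative):

```python
import random

# Check: (1/2) * (1/(s - z) + 1/(-s - z)) == z / (s**2 - z**2)
random.seed(0)
for _ in range(100):
    s = random.uniform(0.0, 3.0)
    z = complex(random.uniform(-2.0, 2.0), random.uniform(0.1, 1.0))
    lhs = 0.5 * (1.0 / (s - z) + 1.0 / (-s - z))
    rhs = z / (s * s - z * z)
    assert abs(lhs - rhs) < 1e-9
```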

and

$$S_y(z) = \frac{-(z^2 - ab) + \sqrt{(z^2 - a^2)(z^2 - b^2)}}{2yz}.$$

It is also well known that for $z = u + iv$:

$$|S_y(z)| \le \frac{1}{\sqrt{y}}$$

and

$$A_0(z) := -\frac{1}{yS_y(z) + z} = yS_y(z) - \frac{1-y}{z}.$$

We considered the following event for $1 \le j \le n$, $1 \le k \le m$ and $C > 0$:

$$\mathcal{A}_{jk}(v, \mathbb{J}, \mathbb{K}; C) = \{|R_{jk}^{(\mathbb{J},\mathbb{K})}(u + iv)| \le C\}.$$

We set:

$$\begin{split} \mathcal{A}^{(1)}(v, \mathbb{J}, \mathbb{K}) &= \cap_{j=1}^{n}\cap_{k=1}^{m}\mathcal{A}_{j,k}(v, \mathbb{J}, \mathbb{K}; C|S_{y}(z)|), \\ \mathcal{A}^{(2)}(v, \mathbb{J}, \mathbb{K}) &= \cap_{j=1}^{m}\cap_{k=1}^{n}\mathcal{A}_{j+n,k}(v, \mathbb{J}, \mathbb{K}; C|S_{y}(z)|), \\ \mathcal{A}^{(3)}(v, \mathbb{J}, \mathbb{K}) &= \cap_{j=1}^{n}\cap_{k=1}^{m}\mathcal{A}_{j,k+n}(v, \mathbb{J}, \mathbb{K}; C|S_{y}(z)|), \\ \mathcal{A}^{(4)}(v, \mathbb{J}, \mathbb{K}) &= \cap_{j=1}^{m}\cap_{k=1}^{m}\mathcal{A}_{j+n,k+n}(v, \mathbb{J}, \mathbb{K}; C|A_{0}(z)|). \end{split}$$

For $j \in \mathbb{J}^c$, $k \in \mathbb{K}^c$ and any $u$, we obtained:

$$|R_{jk}^{(\mathbb{J},\mathbb{K})}(z)| \le \frac{1}{v}.$$

We recalled:

$$a := a_n(u, v) = \begin{cases} \operatorname{Im} b(z) + \Gamma_n, & \text{if } |b(z)| \ge \Gamma_n, \\ \Gamma_n, & \text{if } |b(z)| \le \Gamma_n. \end{cases}$$

Then:

$$\Gamma_n = \Gamma_n(z) = 2C_0 \log n\left(\frac{1}{nv} + \min\left\{\frac{1}{np|b(z)|}, \frac{1}{\sqrt{np}}\right\}\right).$$

We introduced the events:

$$\hat{\mathcal{Q}}_{\gamma}^{(\mathbb{J},\mathbb{K})}(v) := \bigcap_{l=0}^{k_v}\left\{|\Lambda_n^{(\mathbb{J},\mathbb{K})}(u + is_0^l v)| \le \gamma a_n(u, s_0^l v) + \frac{|\mathbb{J}| + |\mathbb{K}|}{n s_0^l v}\right\}.$$

It was easy to see that:

$$\hat{\mathcal{Q}}_{\gamma}(v) \subset \hat{\mathcal{Q}}_{\gamma}^{(\mathbb{J},\mathbb{K})}(v).$$

In what follows, we used $\mathcal{Q} := \hat{\mathcal{Q}}_{\gamma}(v)$.

Equations (4) and (5) and Lemma 10 yielded that for *γ* ≤ *γ*<sup>0</sup> and for J, K that satisfied (|J| + |K|)/*nv* ≤ 1/4, the following inequalities held:

$$2|R_{jj}^{(\mathbb{J},\mathbb{K})}|\mathbb{I}\{\mathcal{Q}\} \le 2|S_y(z)||\varepsilon_j^{(\mathbb{J},\mathbb{K})}||R_{jj}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} + 2|S_y(z)|\tag{21}$$

and, for $|A_0(z)|(|\mathbb{J}| + |\mathbb{K}|)/nv \le 1/4$:

$$|R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} \le 2|A_0(z)||\varepsilon_{l+n}^{(\mathbb{J},\mathbb{K})}||R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} + 2|A_0(z)|.\tag{22}$$

We noted that for $|z| \ge \frac{C_1\log n}{nv}$ and $|\mathbb{J}| \le C_2\log n$, under appropriate $C_1$ and $C_2$, we obtained $|A_0(z)|(|\mathbb{J}| + |\mathbb{K}|)/nv \le 1/4$.

We considered the off-diagonal elements of the resolvent matrix. It could be shown that for $j \ne k \in \mathbb{J}^c$:

$$R_{jk}^{(\mathbb{J},\mathbb{K})} = R_{jj}^{(\mathbb{J},\mathbb{K})}\left(-\frac{1}{\sqrt{mp}}\sum_{l=1}^{m} X_{jl}\xi_{jl}R_{l+n,k}^{(\mathbb{J}\cup\{j\},\mathbb{K})}\right) = R_{jj}^{(\mathbb{J},\mathbb{K})}\zeta_{jk}^{(\mathbb{J},\mathbb{K})},\tag{23}$$

for $l \ne k \in \mathbb{K}^c$:

$$R_{l+n,k+n}^{(\mathbb{J},\mathbb{K})} = R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})}\left(-\frac{1}{\sqrt{mp}}\sum_{r=1}^{n} X_{rl}\xi_{rl}R_{k+n,r}^{(\mathbb{J},\mathbb{K}\cup\{l+n\})}\right) = R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})}\zeta_{l+n,k+n}^{(\mathbb{J},\mathbb{K})},\tag{24}$$

and

$$R_{j,k+n}^{(\mathbb{J},\mathbb{K})} = R_{jj}^{(\mathbb{J},\mathbb{K})}\left(-\frac{1}{\sqrt{mp}}\sum_{r=1}^{m} X_{jr}\xi_{jr}R_{r+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}\right) = R_{jj}^{(\mathbb{J},\mathbb{K})}\zeta_{j,k+n}^{(\mathbb{J},\mathbb{K})},$$

$$R_{k+n,j}^{(\mathbb{J},\mathbb{K})} = R_{jj}^{(\mathbb{J},\mathbb{K})}\left(-\frac{1}{\sqrt{mp}}\sum_{r=1}^{m} X_{jr}\xi_{jr}R_{r+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}\right) = R_{jj}^{(\mathbb{J},\mathbb{K})}\zeta_{k+n,j}^{(\mathbb{J},\mathbb{K})},\tag{25}$$

where

$$\begin{split} \zeta_{jk}^{(\mathbb{J},\mathbb{K})} &= -\frac{1}{\sqrt{mp}}\sum_{l=1}^{m} X_{jl}\xi_{jl}R_{l+n,k}^{(\mathbb{J}\cup\{j\},\mathbb{K})}, \quad \zeta_{j+n,k+n}^{(\mathbb{J},\mathbb{K})} = -\frac{1}{\sqrt{mp}}\sum_{r=1}^{n} X_{rj}\xi_{rj}R_{r,k+n}^{(\mathbb{J},\mathbb{K}\cup\{j+n\})}, \\ \zeta_{j+n,k}^{(\mathbb{J},\mathbb{K})} &= -\frac{1}{\sqrt{mp}}\sum_{l=1}^{m} X_{kl}\xi_{kl}R_{l+n,j+n}^{(\mathbb{J}\cup\{k\},\mathbb{K})}, \quad \zeta_{j,k+n}^{(\mathbb{J},\mathbb{K})} = -\frac{1}{\sqrt{mp}}\sum_{l=1}^{n} X_{lk}\xi_{lk}R_{l+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}. \end{split} \tag{26}$$

Inequalities (21) and (22) implied that:

$$\Pr\{|R_{jj}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} \leq \Pr\left\{|\varepsilon_j|\mathbb{I}\{\mathcal{Q}\} > \frac{1}{4}\right\}\tag{27}$$

for $1 \le j \le n$ and $C > \sqrt[4]{y}$, and that:

$$\Pr\{|R_{l+n,l+n}|\mathbb{I}\{\mathcal{Q}\} > C|A_0(z)|\} \le \Pr\left\{|\varepsilon_{l+n}|\mathbb{I}\{\mathcal{Q}\} > \frac{1}{4|A_0(z)|}\right\}\tag{28}$$

for $1 \le l \le m$ and $C > 2$. Equations (23)–(25) produced:

$$\Pr\{|R_{jk}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} \le \Pr\{|R_{jj}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} + \Pr\{|\zeta_{jk}|\mathbb{I}\{\mathcal{Q}\} > 1\}$$

for $1 \le j \ne k \le n$ and:

$$\begin{split} \Pr\{|R_{l+n,k+n}|\mathbb{I}\{\mathcal{Q}\} > C|A_0(z)|\} &\le \Pr\{|R_{l+n,l+n}|\mathbb{I}\{\mathcal{Q}\} > C|A_0(z)|\} \\ &\quad + \Pr\{|\zeta_{l+n,k+n}|\mathbb{I}\{\mathcal{Q}\} > 1\} \end{split}$$

for $1 \le l \ne k \le m$. Similarly, we obtained:

$$\Pr\{|R_{l,k+n}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} \le \Pr\{|R_{l,l}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} + \Pr\{|\zeta_{l,k+n}|\mathbb{I}\{\mathcal{Q}\} > 1\}$$

and

$$\Pr\{|R_{l+n,k}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} \le \Pr\{|R_{k,k}|\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} + \Pr\{|\zeta_{l+n,k}|\mathbb{I}\{\mathcal{Q}\} > 1\}.$$

We noted that for |*z*| ≤ *B*, we obtained:

$$\frac{1}{|A_0(z)|} \le B + \sqrt{y}.$$
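The next step relies on Rosenthal's inequality; for reference, its standard form for independent, mean-zero random variables $\xi_1, \ldots, \xi_m$ reads (with an absolute constant $C$; this is the textbook statement, not a quotation from the paper):

$$\mathbb{E}\Big|\sum_{l=1}^m \xi_l\Big|^q \le C^q\Big(q^{\frac{q}{2}}\Big(\sum_{l=1}^m \mathbb{E}|\xi_l|^2\Big)^{\frac{q}{2}} + q^q\sum_{l=1}^m \mathbb{E}|\xi_l|^q\Big).$$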

Using Rosenthal's inequality, we found that:

$$\mathbb{E}_j|\zeta_{jk}|^q \leq C^q\Big(q^{\frac{q}{2}}(nv)^{-\frac{q}{2}}(\operatorname{Im} R_{kk}^{(j)})^{\frac{q}{2}} + q^q(np)^{-q\varkappa-1}\frac{1}{n}\sum_{l=1}^m |R_{k,l+n}^{(j)}|^q\Big)$$

for $1 \le j \ne k \le n$ and that:

$$\begin{split} \mathbb{E}_{j+n}|\zeta_{j+n,k+n}|^{q} &\leq C^{q}\Big(q^{\frac{q}{2}}(nv)^{-\frac{q}{2}}(\operatorname{Im} R_{k+n,k+n}^{(j+n)})^{\frac{q}{2}} + q^{q}(np)^{-q\varkappa-1}\frac{1}{n}\sum_{r=1}^{n}|R_{k+n,r}^{(j+n)}|^{q}\Big), \\ \mathbb{E}_{j}|\zeta_{j,k+n}|^{q} &\leq C^{q}\Big(q^{\frac{q}{2}}(nv)^{-\frac{q}{2}}(\operatorname{Im} R_{k+n,k+n}^{(j+n)})^{\frac{q}{2}} + q^{q}(np)^{-q\varkappa-1}\frac{1}{n}\sum_{r=1}^{n}|R_{k+n,r+n}^{(j+n)}|^{q}\Big), \\ \mathbb{E}_{j+n}|\zeta_{j+n,k}|^{q} &\leq C^{q}\Big(q^{\frac{q}{2}}(nv)^{-\frac{q}{2}}(\operatorname{Im} R_{k+n,k+n}^{(j+n)})^{\frac{q}{2}} + q^{q}(np)^{-q\varkappa-1}\frac{1}{n}\sum_{r=1}^{n}|R_{k+n,r+n}^{(j+n)}|^{q}\Big) \end{split}$$

for $1 \le j \ne k \le m$. We noted that:

$$\begin{split} \Pr\Big\{|\varepsilon_{j}^{(\mathbb{J},\mathbb{K})}| > \frac{1}{4}; \mathcal{Q}\Big\} &\leq \Pr\{\mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} + \Pr\Big\{|\varepsilon_{j}^{(\mathbb{J},\mathbb{K})}| > \frac{1}{4}; \mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K}); \mathcal{Q}\Big\}, \\ \Pr\Big\{|\varepsilon_{j+n}^{(\mathbb{J},\mathbb{K})}| > \frac{1}{4|A_{0}(z)|}; \mathcal{Q}\Big\} &\leq \Pr\{\mathcal{A}^{(1)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} \\ &\quad + \Pr\{|\varepsilon_{j+n}^{(\mathbb{J},\mathbb{K})}| > 1/(4|A_{0}(z)|); \mathcal{A}^{(1)}(sv,\mathbb{J},\mathbb{K}); \mathcal{Q}\}, \end{split}$$

$$\begin{split} \Pr\{|\zeta_{jk}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{Q}\} &\leq \Pr\{\mathcal{A}^{(2)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} + \Pr\{|\zeta_{jk}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{A}^{(2)}(sv,\mathbb{J},\mathbb{K}); \mathcal{Q}\}, \\ \Pr\{|\zeta_{j+n,k+n}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{Q}\} &\leq \Pr\{\mathcal{A}^{(3)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} + \Pr\{|\zeta_{j+n,k+n}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{A}^{(3)}(sv,\mathbb{J},\mathbb{K}); \mathcal{Q}\}, \\ \Pr\{|\zeta_{j+n,k}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{Q}\} &\leq \Pr\{\mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} + \Pr\{|\zeta_{j+n,k}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{Q}; \mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K})\}, \\ \Pr\{|\zeta_{k,j+n}^{(\mathbb{J},\mathbb{K})}| > 1; \mathcal{Q}\} &\leq \Pr\{\mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K})^{c}; \mathcal{Q}\} + \Pr\{|\zeta_{k,j+n}^{(\mathbb{J},\mathbb{K})}(v)| > 1; \mathcal{Q}; \mathcal{A}^{(4)}(sv,\mathbb{J},\mathbb{K})\}. \end{split}$$

Using Chebyshev's inequality, we obtained:

$$\Pr\{|\varepsilon_{j}^{(\mathbb{J},\mathbb{K})}| > 1/4; \mathcal{Q}; \mathcal{A}^{(4)}\} \leq C^{q}\,\mathbb{E}\big(\mathbb{E}_{j}|\varepsilon_{j}|^{q}\big)\mathbb{I}\{\mathcal{Q}^{(\mathbb{J},\mathbb{K})}\}\mathbb{I}\{\mathcal{A}^{(4)}\}.$$

By applying the triangle inequality to the results of Lemmas 1–3 (the bounds for the entries of the resolvent matrix), we arrived at the inequality:

$$\begin{split} \mathbb{E}_{j}|\varepsilon_{j}|^{q}\,\mathbb{I}\{\mathcal{A}^{(4)}(sv, \mathbb{J}, \mathbb{K})\} &\leq C^{q}\left[\frac{1}{(nv)^{q}} + \left(\frac{qs|A_{0}(z)|^{2}}{np}\right)^{\frac{q}{2}} + \frac{1}{np}\left(\frac{qs|A_{0}(z)|}{(np)^{2\varkappa}}\right)^{q}\right. \\ &\quad + \left(\frac{q^{2}s(a_{n}(z) + |A_{0}(z)|)}{nv}\right)^{\frac{q}{2}} + \frac{1}{np}\left(\frac{qs|A_{0}(z)|}{nv}\right)^{\frac{q}{2}}\left(\frac{q^{2}}{np}\right)^{\frac{q}{2}} \\ &\quad \left. + \left(\frac{q^{2}s|A_{0}(z)|}{(np)^{2\varkappa}}\right)^{q}\frac{1}{(np)^{2\varkappa}}\right]. \end{split}$$

When we set $q \sim \log^2 n$, $nv > C\log^4 n$ and $np > C(\log n)^{\frac{2}{\varkappa}}$ and took into account that $\varkappa < 1/2$ and $|A_0(z)| \le C/|z|$, we obtained:

$$\mathbb{E}_j|\varepsilon_j|^q\,\mathbb{I}\{\mathcal{A}^{(4)}(sv, \mathbb{J}\cup\{j\}, \mathbb{K})\} \leq Cn^{-c\log n}.$$

Moreover, the constant $c$ could be made arbitrarily large. We could obtain similar estimates for the quantities $\varepsilon_{l+n}$, $\zeta_{jk}$, $\zeta_{j+n,k}$, $\zeta_{j,k+n}$ and $\zeta_{j+n,k+n}$. Inequalities (27) and (28) implied:

$$\begin{split} \Pr\{|R_{jj}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} &\le \Pr\{\mathcal{A}^{(4)}(sv, \mathbb{J}\cup\{j\}, \mathbb{K})^c\} + Cn^{-c\log n}, \\ \Pr\{|R_{l+n,l+n}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} > C|A_0(z)|\} &\le \Pr\{\mathcal{A}^{(1)}(sv, \mathbb{J}, \mathbb{K}\cup\{l\})^c\} + Cn^{-c\log n}, \\ \Pr\{|R_{jk}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} &\le \Pr\{\mathcal{A}^{(2)}(sv, \mathbb{J}, \mathbb{K}\cup\{l\})^c\} + Cn^{-c\log n}, \\ \Pr\{|R_{j+n,k}|\,\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} &\le \Pr\{\mathcal{A}^{(4)}(sv, \mathbb{J}, \mathbb{K}\cup\{j\})^c\} + Cn^{-c\log n}, \\ \Pr\{|R_{k+n,j}|\,\mathbb{I}\{\mathcal{Q}\} > C|S_y(z)|\} &\le \Pr\{\mathcal{A}^{(4)}(sv, \mathbb{J}, \mathbb{K}\cup\{j\})^c\} + Cn^{-c\log n}, \\ \Pr\{|R_{k+n,j+n}|\,\mathbb{I}\{\mathcal{Q}\} > C|A_0(z)|\} &\le \Pr\{\mathcal{A}^{(3)}(sv, \mathbb{J}, \mathbb{K}\cup\{j\})^c\} + Cn^{-c\log n}. \end{split}$$

The last inequalities produced:

$$\begin{split} \max_{j,k \in \mathbb{J}^c \cup \mathbb{K}^c} \Pr\{|R_{j,k}^{(\mathbb{J},\mathbb{K})}|\,\mathbb{I}\{\mathcal{Q}\} > C\} \le Cn^{-c\log n} \\ + \max_{j \in \mathbb{J}^c, k \in \mathbb{K}^c}\max\big\{\Pr\{\mathcal{A}^c(sv, \mathbb{J}\cup\{j\}, \mathbb{K}; C|A_0(z)|)\}, \Pr\{\mathcal{A}^c(s_0v, \mathbb{J}, \mathbb{K}\cup\{k\}; C|A_0(z)|)\}\big\}. \end{split}$$

We noted that $k_v \le C\log n$ for $v \ge v_0 = n^{-1}\log^4 n$. So, by choosing $c$ large enough, we obtained:

$$\Pr\{\mathcal{A}^c(v) \cap \mathcal{Q}\} \le Cn^{-c\log n}.$$

This completed the proof of the theorem.

#### *6.4. The Proof of Theorem 3*

**Proof.** First, we noted that for *z* ∈ D, a constant *C* = *C*(*y*, *V*) exists, such that:

$$|b(z)| \le \mathcal{C}.$$

Without loss of generality, we could assume that $\Gamma_n^{-1} \ge |b(z)|$. We recalled that:

$$a := a_n(z) := a_n(u, v) = \begin{cases} \operatorname{Im} b(z) + \Gamma_n, & \text{if } |b(z)| \ge \Gamma_n, \\ \Gamma_n, & \text{if } |b(z)| \le \Gamma_n. \end{cases}$$

Then:

$$\Gamma_n = 2C_0 \log n\left(\frac{1}{nv} + \min\left\{\frac{1}{np|b(z)|}, \frac{1}{\sqrt{np}}\right\}\right).$$

We considered the following smoothing $h_\gamma(x, v)$ of the indicator function:

$$h_{\gamma}(x, v) = \begin{cases} 1, & \text{for } |x| \le \gamma a, \\ 1 - \frac{|x| - \gamma a}{\gamma a}, & \text{for } \gamma a \le |x| \le 2\gamma a, \\ 0, & \text{for } |x| > 2\gamma a. \end{cases}$$
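A direct implementation of this piecewise function, checking the sandwich property used below, $\mathbb{I}\{|x| \le \gamma a\} \le h_\gamma \le \mathbb{I}\{|x| \le 2\gamma a\}$ (a sketch with illustrative values of $\gamma$ and $a$):

```python
# h(x, gamma, a) implements the smoothing h_gamma above (v enters only through a).
def h(x, gamma, a):
    t = abs(x)
    if t <= gamma * a:
        return 1.0
    if t <= 2 * gamma * a:
        return 1.0 - (t - gamma * a) / (gamma * a)
    return 0.0

gamma, a = 0.5, 2.0
for i in range(-400, 401):
    x = i / 100.0
    val = h(x, gamma, a)
    assert 0.0 <= val <= 1.0              # h takes values in [0, 1]
    if abs(x) <= gamma * a:
        assert val == 1.0                 # dominates the lower indicator
    if val > 0.0:
        assert abs(x) <= 2 * gamma * a    # dominated by the upper indicator
```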

We noted that:

$$\mathbb{I}_{\hat{\mathcal{Q}}_{\gamma}(v)} \le h_{\gamma}(|\Lambda_n(u + iv)|, v) \le \mathbb{I}_{\hat{\mathcal{Q}}_{2\gamma}(v)},$$

where, as before:

$$\hat{\mathcal{Q}}_{\gamma}(v) = \bigcap_{l=0}^{k_v}\{|\Lambda_n(u + is_0^{l}v)| \le \gamma a_n(u, s_0^{l}v)\}.$$

We estimated the value:

$$D_n := \mathbb{E}|T_n|^q h_{\gamma}^q(|\Lambda_n|, v).$$

It was easy to see that:

$$\mathbb{E}|T_n|^q\,\mathbb{I}\{\mathcal{Q}\} \le D_n.$$

To estimate $D_n$, we used the approach developed in [15], which goes back to Stein's method. We set:

$$\varphi(z) := z|z|^{q-2}.$$

Then, we could write:

$$\widehat{T}_n := T_n h_{\gamma}(|\Lambda_n|, v), \qquad D_n = \mathbb{E}\,\widehat{T}_n\varphi(\widehat{T}_n).$$
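For real arguments, $t\,\varphi(t) = |t|^q$, which is how $\widehat{T}_n\varphi(\widehat{T}_n)$ recovers a $q$-th absolute power; a minimal check (the value of $q$ is illustrative):

```python
q = 6  # illustrative exponent

def phi(t):
    # phi(t) = t * |t|^(q-2), as defined above (real arguments only here)
    return t * abs(t) ** (q - 2)

for t in (-2.5, -1.0, -0.3, 0.4, 1.7, 3.0):
    # t * phi(t) = t^2 * |t|^(q-2) = |t|^q
    assert abs(t * phi(t) - abs(t) ** q) < 1e-9
```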

The equality:

$$T_n = 1 + \left(z - \frac{1-y}{z}\right)s_n(z) + ys_n^2(z) = b(z)\Lambda_n(z) + y\Lambda_n^2(z)$$

implied that a constant *C* exists that depends on *γ* in the definition of Q, such that:

$$|T_n|\mathbb{I}\{\mathcal{Q}\} \le (|b(z)||\Lambda_n(z)| + y|\Lambda_n(z)|^2)\,\mathbb{I}\{\mathcal{Q}\} \le C(a_n^2(z) + |b(z)|a_n(z))\,\mathbb{I}\{\mathcal{Q}\} \le C.$$
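The middle equality in the display defining $T_n$ can be checked by substituting $s_n(z) = S_y(z) + \Lambda_n(z)$ and using the relation defining $A_0(z)$ above, which is equivalent to $1 + (z - \frac{1-y}{z})S_y(z) + yS_y^2(z) = 0$; here we write $b(z) = z - \frac{1-y}{z} + 2yS_y(z)$, which is our reading of the definition of $b$ used throughout:

$$\begin{split} T_n &= 1 + \Big(z - \frac{1-y}{z}\Big)(S_y + \Lambda_n) + y(S_y + \Lambda_n)^2 \\ &= \Big[1 + \Big(z - \frac{1-y}{z}\Big)S_y + yS_y^2\Big] + \Big(z - \frac{1-y}{z} + 2yS_y\Big)\Lambda_n + y\Lambda_n^2 = b(z)\Lambda_n + y\Lambda_n^2. \end{split}$$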

We considered:

$$\mathcal{B} := \mathcal{A}^{(1)} \cap \mathcal{A}^{(2)} \cap \mathcal{A}^{(3)} \cap \mathcal{A}^{(4)}.$$

Then:

$$D_n \le \mathbb{E}|T_n|^q\,\mathbb{I}\{\mathcal{Q}\}\,\mathbb{I}\{\mathcal{B}\} + Cn^{-c\log n}.$$

By the definition of *Tn*, we could rewrite the last inequality as:

$$D_n = \frac{1}{n}\sum_{j=1}^n \mathbb{E}\,\varepsilon_j R_{jj} h_{\gamma}(|\Lambda_n|, v)\,\varphi(\widehat{T}_n)\,\mathbb{I}\{\mathcal{B}\} + Cn^{-c\log n}.$$

We set:

$$D_n = D_n^{(1)} + D_n^{(2)} + Cn^{-c\log n},\tag{29}$$

where

$$\begin{aligned} D_n^{(1)} &:= \frac{1}{n}\sum_{j=1}^n \mathbb{E}\,\varepsilon_{j1} R_{jj} h_{\gamma}(|\Lambda_n|, v)\,\varphi(\widehat{T}_n)\,\mathbb{I}\{\mathcal{B}\}, \\ D_n^{(2)} &:= \frac{1}{n}\sum_{j=1}^n \mathbb{E}\,\widehat{\varepsilon}_j R_{jj} h_{\gamma}(|\Lambda_n|, v)\,\varphi(\widehat{T}_n)\,\mathbb{I}\{\mathcal{B}\}, \\ \widehat{\varepsilon}_j &:= \varepsilon_{j2} + \varepsilon_{j3}. \end{aligned}$$

We obtained:

$$\frac{1}{n}\sum_{j=1}^{n}\varepsilon_{j1}R_{jj} = \frac{1}{2n}s'_n(z) + \frac{s_n(z)}{2nz}$$

and this yielded:

$$\left|\frac{1}{n}\sum_{j=1}^{n}\varepsilon_{j1}R_{jj}\right| \le \frac{C}{nv}\operatorname{Im} s_n(z) + \frac{C}{n} + \frac{C|\Lambda_n|}{n|z|}.\tag{30}$$

Then, we used:

$$\frac{|S_y(z)|}{|z|} \le \frac{1}{1-y}(y|S_y(z)|^2 + |z||S_y(z)| + 1) \le C.$$

Inequality (30) implied that for *z* ∈ D:

$$|D_n^{(1)}| \le J_1 D_n^{\frac{q-1}{q}},\tag{31}$$

where

$$J_1 = C\frac{a_n(z)}{mv}.$$

Further, we considered:

$$\widehat{T}_n^{(j)} = \mathbb{E}_j\widehat{T}_n, \quad T_n^{(j)} = \mathbb{E}_j T_n, \quad \Lambda_n^{(j)} = \mathbb{E}_j\Lambda_n.$$

We noted that by the Jensen inequality, for *q* ≥ 1:

$$\mathbb{E}\big|\widehat{T}_n^{(j)}\big|^q \leq \mathbb{E}\big|\widehat{T}_n\big|^q.$$

We represented $D_n^{(2)}$ in the form:

$$D_n^{(2)} = D_n^{(21)} + \cdots + D_n^{(24)},\tag{32}$$

where

$$\begin{split} D_{n}^{(21)} &:= \frac{S_{y}(z)}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_j h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}, \\ D_{n}^{(22)} &:= \frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_j(R_{jj} - S_{y}(z))h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}, \\ D_{n}^{(23)} &:= \frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_j R_{jj}(h_{\gamma}(|\Lambda_{n}|, v) - h_{\gamma}(|\Lambda_{n}^{(j)}|, v))\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}, \\ D_{n}^{(24)} &:= \frac{1}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_j R_{jj} h_{\gamma}(|\Lambda_{n}|, v)(\varphi(\widehat{T}_{n}) - \varphi(\widehat{T}_{n}^{(j)}))\,\mathbb{I}\{\mathcal{B}\}. \end{split}$$

Since $\mathbb{E}_j\varepsilon_j = 0$, we found:

$$D_n^{(21)} = \frac{S_y(z)}{n}\sum_{j=1}^n \mathbb{E}\,\widehat{\varepsilon}_j h_{\gamma}(|\Lambda_n^{(j)}|, v)\,\varphi(\widehat{T}_n^{(j)})\,\mathbb{I}\{\mathcal{B}^c\}.$$

From there, it was easy to obtain:

$$|D_n^{(21)}| \le Cn^{-c\log n}.\tag{33}$$

6.4.1. Estimation of $D_n^{(22)}$

Using the representation of *Rjj*, we could write:

$$D_n^{(22)} = \tilde{D}_n^{(22)} + \hat{D}_n^{(22)} + \mathring{D}_n^{(22)},$$

where

$$\begin{split} \tilde{D}_{n}^{(22)} &:= \frac{S_{y}(z)}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_{j}^{2}R_{jj}h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}, \\ \hat{D}_{n}^{(22)} &:= \frac{yS_{y}(z)}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_{j}\Lambda_{n}R_{jj}h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}, \\ \mathring{D}_{n}^{(22)} &:= \frac{yS_{y}(z)}{n}\sum_{j=1}^{n}\mathbb{E}\,\widehat{\varepsilon}_{j}\varepsilon_{j1}R_{jj}h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\varphi(\widehat{T}_{n}^{(j)})\,\mathbb{I}\{\mathcal{B}\}. \end{split}$$

By Hölder's inequality:

$$\mathbb{E}|\hat{D}_{n}^{(22)}| \leq \frac{C|S_{y}(z)|}{n}\sum_{j=1}^{n}\mathbb{E}^{\frac{1}{q}}\Big[\mathbb{E}_{j}|\widehat{\varepsilon}_{j}||\Lambda_{n}||R_{jj}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big]^{q}D_{n}^{\frac{q-1}{q}}.\tag{34}$$

Further:

$$\mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}||R_{jj}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big] \leq C|S_{y}(z)|\,\mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big].$$

We obtained:

$$\begin{split} |\Lambda_{n}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\} &\leq |\Lambda_{n}|h_{\gamma}(|\Lambda_{n}|, v)\,\mathbb{I}\{\mathcal{B}\} \\ &\quad + |\Lambda_{n}|\,|h_{\gamma}(|\Lambda_{n}|, v) - h_{\gamma}(|\Lambda_{n}^{(j)}|, v)|\,\mathbb{I}\{\mathcal{B}\}. \end{split}$$

In the case $|b_n(z)| \ge \sqrt{|T_n|}$, we obtained:

$$|\Lambda_n| \le \frac{|T_n|}{|b_n(z)|} \le \sqrt{|T_n|}.$$

This implied that:

$$|\Lambda_n|h_{\gamma}(|\Lambda_n|, v)\,\mathbb{I}\{\mathcal{B}\}\,\mathbb{I}\{\sqrt{|T_n|} \le |b_n(z)|\} \le C\sqrt{|T_n|}\,h_{\gamma}(|\Lambda_n|, v).$$

Furthermore, in the case $|b_n(z)| \le \sqrt{|T_n|}$ and $|b(z)| \ge \Gamma_n$, we obtained:

$$|b_n(z)|\mathbb{I}\{\mathcal{Q}\} \ge (1 - 2\gamma)|b(z)|\mathbb{I}\{\mathcal{Q}\} \ge c|b(z)|\mathbb{I}\{\mathcal{Q}\}.$$

This implied that:

$$|\Lambda_n|\mathbb{I}\{\mathcal{Q}\} \le C(\operatorname{Im} b(z) + \Gamma_n)\mathbb{I}\{\mathcal{Q}\} \le C\sqrt{|T_n|}.$$

For |*b*(*z*)| ≤ Γ*n*, we could write:

$$\begin{split} \mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}||R_{jj}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big] &\leq C|S_{y}(z)|\,\mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}|\mathbb{I}\{|\Lambda_{n}^{(j)}| \le C\Gamma_{n}\}\,\mathbb{I}\{\mathcal{B}\}\Big] \\ &\leq C|S_{y}(z)|\Gamma_{n}\,\mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}|\mathbb{I}\{|\Lambda_{n}^{(j)}| \le C\Gamma_{n}\}\,\mathbb{I}\{\mathcal{B}\}\Big]. \end{split}$$

Using this, we concluded that:

$$\begin{split} \mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big] &\leq \mathbb{E}_{j}^{\frac{1}{2}}|\widehat{\varepsilon}_{j}|^{2}\mathbb{I}\{|\Lambda_{n}^{(j)}| \le Ca_{n}(z)\}\mathbb{I}\{\mathcal{B}\} \\ &\quad \times \Big(\mathbb{I}\{|b(z)| \ge \Gamma_{n}\}\,\mathbb{E}_{j}^{\frac{1}{2}}|\widehat{T}_{n}| + \Gamma_{n}\mathbb{I}\{|b(z)| \le \Gamma_{n}\}\mathbb{I}\{z \notin \mathcal{D}\}\Big). \end{split}$$

By applying Lemmas 2 and 3, we obtained:

$$\mathbb{E}_{j}\Big[|\widehat{\varepsilon}_{j}||\Lambda_{n}|h_{\gamma}(|\Lambda_{n}^{(j)}|, v)\,\mathbb{I}\{\mathcal{B}\}\Big] \leq C\beta_{n}^{\frac{1}{2}}(z)\Big(\mathbb{E}_{j}^{\frac{1}{2}}|\widehat{T}_{n}| + \Gamma_{n}\mathbb{I}\{|b(z)| \le \Gamma_{n}\}\,\mathbb{I}\{z \notin \mathcal{D}\}\Big).\tag{35}$$

By combining inequalities (34) and (35), the bound $|S_y(z)||A_0(z)| \le C$ and Young's inequality, we obtained:

$$|\widehat{D}_n^{(22)}| \le H_1 D_n^{\frac{2q-1}{2q}} + H_2 D_n^{\frac{q-1}{q}},\tag{36}$$

where

$$\begin{aligned} H_1 &= C|S_y(z)|^2\beta_n^{\frac{1}{2}}(z)\,\mathbb{I}\{|b(z)| \ge \Gamma_n\}, \\ H_2 &= |S_y(z)|^2\Gamma_n\beta_n^{\frac{1}{2}}(z)\,\mathbb{I}\{|b(z)| \le \Gamma_n\}\,\mathbb{I}\{z \notin \mathcal{D}\}. \end{aligned}$$

Hölder's inequality and (35) produced:

$$|\tilde{D}_n^{(22)}| \le C|S_y(z)|^2\beta_n(z)D_n^{\frac{q-1}{q}}.\tag{37}$$

6.4.2. Estimation of $D_n^{(23)}$

We noted that:

$$\begin{split} &\left| h\_{\gamma}(|\Lambda\_{n}|, v) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, v)\right| |R\_{jj}|\,\mathbb{I}\{\mathcal{B}\} \\ &\qquad \leq \frac{C}{a\_{n}(z)}|\Lambda\_{n} - \Lambda\_{n}^{(j)}|\,\mathbb{I}\{\max\{|\Lambda\_{n}|, |\Lambda\_{n}^{(j)}|\} \leq 2\gamma a\_{n}(z)\}\,\mathbb{I}\{\mathcal{B}\}. \end{split}$$

Using Hölder's inequality and Cauchy's inequality, we obtained:

$$D\_n^{(23)} \leq \frac{C|S\_{y}(z)|}{a\_n(z)} \frac{1}{n} \sum\_{j=1}^n \mathbb{E}^{\frac{1}{q}} \left\{ \left[ \mathbb{E}\_{j} |\hat{\varepsilon}\_{j}|^2 \mathbb{I}\{\mathcal{Q}\} \mathbb{I}\{\mathcal{B}\} \right]^{\frac{q}{2}} \left[ \mathbb{E}\_{j} |\Lambda\_n - \Lambda\_n^{(j)}|^2 \mathbb{I}\{\mathcal{Q}\} \mathbb{I}\{\mathcal{B}\} \right]^{\frac{q}{2}} \right\} D\_n^{\frac{q-1}{q}}.$$

By applying Lemmas 2, 3 and 5, we obtained:

$$D\_n^{(23)} \le C|S\_{\mathcal{Y}}(z)|a\_n^{-1}(z)\beta\_n^{\frac{1}{2}}(z)\frac{1}{n}\sum\_{j=1}^n \mathbb{E}^{\frac{1}{q}}\left[\mathbb{E}\_j\left|\Lambda\_n - \Lambda\_n^{(j)}\right|^2\mathbb{I}\{\mathcal{Q}\}\mathbb{I}(\mathcal{B})\right]^{\frac{q}{2}}D\_n^{\frac{q-1}{q}}.$$

6.4.3. Estimation of $D\_n^{(24)}$

Using Taylor's formula, we obtained:

$$D\_n^{(24)} = \frac{1}{n} \sum\_{j=1}^n \mathbb{E}\, \widehat{\varepsilon}\_j R\_{jj} h\_\gamma(|\Lambda\_n|, v) (\widehat{T}\_n - \widehat{T}\_n^{(j)}) \, \varrho'(\widehat{T}\_n^{(j)} + \tau(\widehat{T}\_n - \widehat{T}\_n^{(j)})) \, \mathbb{I}\{\mathcal{B}\},$$

where $\tau$ is uniformly distributed on the interval $[0, 1]$ and independent of the other random variables. Since $\mathbb{I}\{\mathcal{B}\} = 1$ implies $|R\_{jj}| \le C|S\_y(z)|$, we found that:

$$\mathbb{E}\left|D\_{n}^{(24)}\right| \leq \frac{C|S\_{y}(z)|}{n} \sum\_{j=1}^{n} \mathbb{E}\, |\widehat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_{n}|, v) |\widehat{T}\_{n} - \widehat{T}\_{n}^{(j)}| \, |\varrho^{\prime}(\widehat{T}\_{n}^{(j)} + \tau(\widehat{T}\_{n} - \widehat{T}\_{n}^{(j)}))| \, \mathbb{I}\{\mathcal{B}\}.$$

Taking into account the inequality:

$$|\varrho'(\widehat{T}\_n^{(j)} + \tau(\widehat{T}\_n - \widehat{T}\_n^{(j)}))| \le Cq \left[ |\widehat{T}\_n^{(j)}|^{q-2} + q^{q-2} |\widehat{T}\_n - \widehat{T}\_n^{(j)}|^{q-2} \right],$$

we obtained:

$$\begin{split} |D\_{n}^{(24)}| &\leq \frac{Cq|S\_{y}(z)|}{n} \sum\_{j=1}^{n} \mathbb{E}\, |\widehat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_{n}|, v) |\widehat{T}\_{n} - \widehat{T}\_{n}^{(j)}| \, |\widehat{T}\_{n}^{(j)}|^{q-2} \, \mathbb{I}\{\mathcal{B}\} \\ &\quad + \frac{Cq^{q-1}|S\_{y}(z)|}{n} \sum\_{j=1}^{n} \mathbb{E}\, |\widehat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_{n}|, v) |\widehat{T}\_{n} - \widehat{T}\_{n}^{(j)}|^{q-1} \, \mathbb{I}\{\mathcal{B}\} =: \widehat{D}\_{n}^{(24)} + \widetilde{D}\_{n}^{(24)}. \end{split}$$

By applying Hölder's inequality, we obtained:

$$\widehat{D}\_n^{(24)} \le \frac{Cq|S\_y(z)|}{n} \sum\_{j=1}^n \mathbb{E}^{\frac{2}{q}} \left[ \mathbb{E}\_{j} \{ |\hat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_n|, v) |\widehat{T}\_n - \widehat{T}\_n^{(j)}| \, \mathbb{I}\{ \mathcal{B} \} \} \right]^{\frac{q}{2}} \mathbb{E}^{\frac{q-2}{q}} |\widehat{T}\_n^{(j)}|^q.$$

Jensen's inequality produced:

$$\widehat{D}\_n^{(24)} \le \frac{Cq|S\_{y}(z)|}{n} \sum\_{j=1}^n \mathbb{E}^{\frac{2}{q}} \left[ \mathbb{E}\_{j} \{ |\hat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_n|, v) | \widehat{T}\_n - \widehat{T}\_n^{(j)}| \, \mathbb{I}\{ \mathcal{B} \} \} \right]^{\frac{q}{2}} D\_n^{\frac{q-2}{q}}.$$

To estimate $\widehat{D}\_n^{(24)}$, we had to obtain bounds for:

$$V\_{j}^{\frac{q}{2}} := \mathbb{E}\left[\, \mathbb{E}\_{j}\left\{ |\widehat{\varepsilon}\_{j}| h\_{\gamma}(|\Lambda\_{n}|, v) |\widehat{T}\_{n} - \widehat{T}\_{n}^{(j)}| \, \mathbb{I}\{\mathcal{B}\} \right\} \right]^{\frac{q}{2}}.$$

Using Cauchy's inequality, we obtained:

$$V\_j^{\frac{q}{2}} \le \mathbb{E}(V\_j^{(1)})^{\frac{q}{4}} (V\_j^{(2)})^{\frac{q}{4}} \le \mathbb{E}^{\frac{1}{2}} (V\_j^{(1)})^{\frac{q}{2}} \mathbb{E}^{\frac{1}{2}} (V\_j^{(2)})^{\frac{q}{2}} \tag{38}$$

where

$$\begin{aligned} V\_j^{(1)} &:= \mathbb{E}\_{j} |\hat{\varepsilon}\_{j}|^2 \mathbb{I}\{ \widehat{\mathcal{Q}}\_{2\gamma}(v) \} \, \mathbb{I}\{ \mathcal{B} \}, \\ V\_{j}^{(2)} &:= \mathbb{E}\_{j} |\widehat{T}\_n - \widehat{T}\_n^{(j)}|^2 h\_{\gamma}^2(|\Lambda\_n|, v) \, \mathbb{I}\{ \mathcal{B} \}. \end{aligned}$$
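The splitting in (38) is the Cauchy–Schwarz inequality applied twice. A minimal numerical sketch of the underlying bound $\mathbb{E}[XY] \le \mathbb{E}^{1/2}X^2\, \mathbb{E}^{1/2}Y^2$ (the sample data are hypothetical):

```python
import math, random

random.seed(2)
# Cauchy-Schwarz as used in (38): (1/N) sum x_i y_i <= sqrt((1/N) sum x_i^2) *
# sqrt((1/N) sum y_i^2); x and y stand in for the two factors split into
# V_j^(1) and V_j^(2).
N = 50_000
xs = [random.expovariate(1.0) for _ in range(N)]
ys = [random.expovariate(0.5) for _ in range(N)]
lhs = sum(x * y for x, y in zip(xs, ys)) / N
rhs = math.sqrt(sum(x * x for x in xs) / N) * math.sqrt(sum(y * y for y in ys) / N)
assert lhs <= rhs
print("Cauchy-Schwarz split verified on the sample")
```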

6.4.4. Estimation of $V\_j^{(1)}$

Lemma 2 produced:

$$\mathbb{E}\_j \left| \varepsilon\_{j2} \right|^2 \mathbb{I}\{ \widehat{\mathcal{Q}}\_{2\gamma}(v) \} \, \mathbb{I}\{ \mathcal{B} \} \le \frac{C |A\_0(z)|^2}{np}$$

and, in turn, Lemma 3 produced:

$$\mathbb{E}\_j\left|\varepsilon\_{j3}\right|^2 \mathbb{I}\{\widehat{\mathcal{Q}}\_{2\gamma}(v)\}\,\mathbb{I}\{\mathcal{B}\} \le \frac{C}{nv} a\_{n}(z).$$

By summing the obtained estimates, we arrived at the following inequality:

$$V\_j^{(1)} \le \frac{Ca\_n(z)}{nv} + \frac{C|A\_0(z)|^2}{np} = C\beta\_n(z). \tag{39}$$

6.4.5. Estimation of $V\_j^{(2)}$

We considered $\widehat{T}\_n - \widehat{T}\_n^{(j)}$. Since $\widehat{T}\_n = T\_n h\_{\gamma}(|\Lambda\_n|, v)$ and $\widehat{T}\_n^{(j)} = \mathbb{E}\_j \widehat{T}\_n$, we obtained:

$$\begin{split} \widehat{T}\_{n} - \widehat{T}\_{n}^{(j)} &= (T\_{n} - T\_{n}^{(j)}) h\_{\gamma}(|\Lambda\_{n}|, v) \\ &\quad + T\_{n}^{(j)} \Big( \big[ h\_{\gamma}(|\Lambda\_{n}|, v) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, v) \big] - \mathbb{E}\_{j} \big[ h\_{\gamma}(|\Lambda\_{n}|, v) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, v) \big] \Big). \end{split}$$

Further, we noted that:

$$T\_n = \Lambda\_n b\_n = \Lambda\_n b(z) + y \Lambda\_n^2.$$

$$T\_n^{(j)} = \Lambda\_n^{(j)} b(z) + y \mathbb{E}\_j \Lambda\_n^2.$$

Then:

$$\begin{split} T\_n - T\_n^{(j)} &= (\Lambda\_n - \Lambda\_n^{(j)}) (b(z) + 2y \Lambda\_n^{(j)}) \\ &+ y (\Lambda\_n - \Lambda\_n^{(j)})^2 - y \operatorname{\mathbb{E}}\_j (\Lambda\_n - \Lambda\_n^{(j)})^2. \end{split} \tag{40}$$

We obtained:

$$\begin{split} \hat{T}\_{n} - \hat{T}\_{n}^{(j)} &= (b(z) + 2y\Lambda\_{n}^{(j)}) \left[ (\Lambda\_{n} - \Lambda\_{n}^{(j)}) h\_{\gamma}(|\Lambda\_{n}|, \upsilon) \right] \\ &+ y \left[ (\Lambda\_{n} - \Lambda\_{n}^{(j)})^{2} - \mathbb{E}\_{\bar{f}}(\Lambda\_{n} - \Lambda\_{n}^{(j)})^{2} \right] h\_{\gamma}(|\Lambda\_{n}|, \upsilon) \\ &+ T\_{n}^{(j)} \left[ \left( h\_{\gamma}(|\Lambda\_{n}|, \upsilon) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, \upsilon) - \mathbb{E}\_{\bar{f}}(h\_{\gamma}(|\Lambda\_{n}|, \upsilon) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, \upsilon)) \right) \right]. \end{split} \tag{41}$$
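Identity (40) is elementary algebra once $\Lambda\_n^{(j)} = \mathbb{E}\_j \Lambda\_n$; it can be checked numerically by modeling $\mathbb{E}\_j$ as an average over a discrete distribution (all numeric values below are hypothetical):

```python
# Numerical check of identity (40): T_n - T_n^{(j)} =
#   (Lam - Lam_j)(b + 2y*Lam_j) + y(Lam - Lam_j)^2 - y*E_j(Lam - Lam_j)^2,
# where T_n = Lam*b + y*Lam^2 and T_n^{(j)} = Lam_j*b + y*E_j Lam^2.
b, y = complex(0.7, -0.3), 0.5             # hypothetical b(z) and y

# Model the conditional expectation E_j as an average over two points.
vals = [complex(0.2, 0.1), complex(-0.1, 0.05)]
probs = [0.4, 0.6]
E = lambda f: sum(p * f(v) for p, v in zip(probs, vals))

Lam_j = E(lambda L: L)                     # Lambda_n^{(j)} = E_j Lambda_n
T_j = Lam_j * b + y * E(lambda L: L**2)    # T_n^{(j)}

for L in vals:
    lhs = (L * b + y * L**2) - T_j         # T_n - T_n^{(j)}
    d = L - Lam_j
    rhs = d * (b + 2 * y * Lam_j) + y * d**2 - y * E(lambda M: (M - Lam_j)**2)
    assert abs(lhs - rhs) < 1e-12
print("identity (40) verified")
```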

Then, we returned to the estimation of *V*(2) *<sup>j</sup>* . Equality (41) implied:

$$\begin{split} V\_j^{(2)} &\le 4|b(z)|^2\, \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 h\_\gamma^4(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\} + 8y^2\, \mathbb{E}\_j |\Lambda\_n^{(j)}|^2 |\Lambda\_n - \Lambda\_n^{(j)}|^2 h\_\gamma^4(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\} \\ &\quad + 4y^2\, \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^4 h\_\gamma^4(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\} + 4y^2 \Big|\mathbb{E}\_j (\Lambda\_n - \Lambda\_n^{(j)})^2 h\_\gamma(|\Lambda\_n|, v)\Big|^2\, \mathbb{E}\_j h\_\gamma^2(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\} \\ &\quad + 4|T\_n^{(j)}|^2\, \mathbb{E}\_j \Big|h\_\gamma(|\Lambda\_n|, v) - h\_\gamma(|\Lambda\_n^{(j)}|, v)\Big|^2 h\_\gamma^2(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\} \\ &\quad + 4|T\_n^{(j)}|^2\, \Big|\mathbb{E}\_j\big(h\_\gamma(|\Lambda\_n|, v) - h\_\gamma(|\Lambda\_n^{(j)}|, v)\big)\Big|^2\, \mathbb{E}\_j h\_\gamma^2(|\Lambda\_n|, v)\,\mathbb{I}\{\mathcal{B}\}. \end{split}$$

We could rewrite this as:

$$V\_j^{(2)} \le A\_1 + A\_2 + A\_3 + A\_4,$$

where

$$\begin{split} A\_{1} &= C |b(z)|^{2} \mathbb{E}\_{j} |\Lambda\_{n} - \Lambda\_{n}^{(j)}|^{2} h\_{\gamma}^{4}(|\Lambda\_{n}|, v) \, \mathbb{I}\{\mathcal{B}\}, \\ A\_{2} &= C \mathbb{E}\_{j} |\Lambda\_{n}^{(j)}|^{2} |\Lambda\_{n} - \Lambda\_{n}^{(j)}|^{2} h\_{\gamma}^{4}(|\Lambda\_{n}|, v) \, \mathbb{I}\{\mathcal{B}\}, \\ A\_{3} &= C \mathbb{E}\_{j} |\Lambda\_{n} - \Lambda\_{n}^{(j)}|^{4} h\_{\gamma}^{2}(|\Lambda\_{n}|, v) \left( h\_{\gamma}^{2}(|\Lambda\_{n}|, v) + \mathbb{E}\_{j} h\_{\gamma}^{2}(|\Lambda\_{n}|, v) \right) \mathbb{I}\{\mathcal{B}\}, \\ A\_{4} &= C |T\_{n}^{(j)}|^{2} \mathbb{E}\_{j} \Big| h\_{\gamma}(|\Lambda\_{n}|, v) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, v) \Big|^{2} \Big( h\_{\gamma}^{2}(|\Lambda\_{n}|, v) + \mathbb{E}\_{j} h\_{\gamma}^{2}(|\Lambda\_{n}|, v) \Big) \mathbb{I}\{\mathcal{B}\}. \end{split}$$

First, we found that:

$$A\_1 \le C|b(z)|^2 \mathbb{E}\_j \left| \Lambda\_n - \Lambda\_n^{(j)} \right|^2 h\_\gamma^4(|\Lambda\_n|, \upsilon) \mathbb{I}\{\mathcal{B}\} .$$

and

$$A\_2 \le C a\_n^2(z) \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 h\_\gamma^4(|\Lambda\_n|, v) \, \mathbb{I}\{\mathcal{B}\}.$$

We noted that:

$$A\_3 \le \frac{C}{n^2 v^2} \mathbb{E}\_{j} |\Lambda\_n - \Lambda\_n^{(j)}|^2 h\_{\gamma}^2 (|\Lambda\_n|, v) \, \mathbb{I}\{\mathcal{B}\}.$$

It was straightforward to see that:

$$\mathbb{E}\left||T\_n^{(j)}|^2(h\_\gamma^2(|\Lambda\_n(z)|,v) + \mathbb{E}\_j h\_\gamma^2(|\Lambda\_n(z)|,v))\right| \le C(|b(z)|^2 a\_n^2(z) + a\_n^4(z) + \frac{1}{n^4 v^4}).$$

This bound implied that:

$$A\_4 \le C\Big(|b(z)|^2 a\_n^2(z) + a\_n^4(z) + \frac{1}{n^4 v^4}\Big) \mathbb{E}\_{j} \Big| h\_{\gamma}(|\Lambda\_n|, v) - h\_{\gamma}(|\Lambda\_n^{(j)}|, v) \Big|^2 \mathbb{I}\{\mathcal{B}\}.$$

Further, since:

$$\left| h\_{\gamma}(|\Lambda\_{n}|, v) - h\_{\gamma}(|\Lambda\_{n}^{(j)}|, v) \right| \leq \frac{C}{\gamma a\_{n}(z)} |\Lambda\_{n} - \Lambda\_{n}^{(j)}| \, \mathbb{I}\{ \max\{ |\Lambda\_{n}|, |\Lambda\_{n}^{(j)}| \} \leq (1 + \gamma) a\_{n}(z) \},$$

we could write:

$$A\_4 \le \mathbb{C}(|b(z)|^2 + a\_n^2(z)) \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 \mathbb{I}\{\max\{ |\Lambda\_n|, |\Lambda\_n^{(j)}| \} \le \mathbb{C}a\_n(z) \} \mathbb{I}\{\mathcal{B}\} \,.$$

By combining the estimates that were obtained for *A*1,..., *A*4, we concluded that:

$$V\_{j}^{(2)} \le C(a\_n^2(z) + |b(z)|^2) \,\mathbb{E}\_{j} |\Lambda\_n - \Lambda\_n^{(j)}|^2 \mathbb{I}\{\max\{|\Lambda\_n|, |\Lambda\_n^{(j)}|\} \le Ca\_n(z)\}\,\mathbb{I}\{\mathcal{B}\}.$$

Inequalities (38) and (39) implied the bounds:

$$\begin{split} V\_j^{\frac{q}{2}} &\le C^q \beta\_n^{\frac{q}{4}}(z) (a\_n^2(z) + |b(z)|^2)^{\frac{q}{4}} \\ &\quad \times \mathbb{E}^{\frac{1}{2}} \left( \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 \mathbb{I}\{ \max\{ |\Lambda\_n|, |\Lambda\_n^{(j)}| \} \le C a\_n(z) \} \, \mathbb{I}\{\mathcal{B}\} \right)^{\frac{q}{2}}. \end{split} \tag{42}$$

We noted that:

$$\widehat{D}\_n^{(24)} \le Cq |S\_{y}(z)| \left( \frac{1}{n} \sum\_{j=1}^n V\_j \right) D\_n^{\frac{q-2}{q}}.$$

Then, Inequality (42) yielded:

$$\begin{split} \widehat{D}^{(24)}\_{n} &\leq Cq|S\_{y}(z)|\beta^{\frac{1}{2}}\_{n}(z)(a\_{n}^{2}(z)+|b(z)|^{2})^{\frac{1}{2}} \\ &\quad \times \frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}^{\frac{2}{q}}\left(\mathbb{E}\_{j} |\Lambda\_{n}-\Lambda\_{n}^{(j)}|^{2}\mathbb{I}\{\max\{|\Lambda\_{n}|, |\Lambda\_{n}^{(j)}|\}\leq Ca\_{n}(z)\}\,\mathbb{I}\{\mathcal{B}\}\right)^{\frac{q}{2}}D\_{n}^{\frac{q-2}{q}}. \end{split}$$

We rewrote this as:

$$
\hat{D}\_n^{(24)} \le L\_1 D\_n^{\frac{q-2}{q}} \,\,\,\,\tag{43}
$$

where

$$\begin{split} L\_1 &= Cq|S\_{y}(z)|\beta\_n^{\frac{1}{2}}(z)(a\_n^2(z) + |b(z)|^2)^{\frac{1}{2}} \\ &\quad \times \frac{1}{n} \sum\_{j=1}^n \mathbb{E}^{\frac{2}{q}} \left( \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 \mathbb{I}\{ \max\{ |\Lambda\_n|, |\Lambda\_n^{(j)}| \} \le Ca\_n(z) \} \, \mathbb{I}\{ \mathcal{B} \} \right)^{\frac{q}{2}}. \end{split}$$

6.4.6. Estimation of $\widetilde{D}\_n^{(24)}$

We recalled that:

$$\widetilde{D}\_n^{(24)} = \frac{C^q q^{q-1}}{n} |S\_{y}(z)| \sum\_{j=1}^n \mathbb{E} \left| \hat{\varepsilon}\_j \right| |\widehat{T}\_n - \widehat{T}\_n^{(j)}|^{q-1} h\_{\gamma}(|\Lambda\_n|, v) \, \mathbb{I}\{\mathcal{B}\}.$$

Using Inequalities (40) and (41) and $a\_n(z) \ge \frac{C}{nv}$, we obtained:

$$|T\_n - T\_n^{(j)}| \le \Big( |b(z)| + |a\_n(z)| + \frac{C}{a\_n(z)} |T\_n^{(j)}| \Big) |\Lambda\_n - \Lambda\_n^{(j)}| \, \mathbb{I}\{ \max\{ |\Lambda\_n|, |\Lambda\_n^{(j)}| \} \le C a\_n(z) \}.$$

By applying:

$$|T\_n^{(j)}| \mathbb{1} \{ |\Lambda\_n^{(j)}(z)| \le \mathsf{C} a\_n(z) \} \le \mathsf{C} (a\_n^2(z) + |b(z)| a\_n(z)),$$

we obtained:

$$|T\_n - T\_n^{(j)}| \le C(|b(z)| + a\_n(z)) |\Lambda\_n - \Lambda\_n^{(j)}| \, \mathbb{I}\{ \max\{ |\Lambda\_n|, |\Lambda\_n^{(j)}| \} \le C a\_n(z) \}.$$

The last inequality produced:

$$\begin{split} \widetilde{D}^{(24)}\_{n} &\leq \frac{C^{q}q^{q-1}(a\_{n}(z)+|b(z)|)^{q-1}}{n}|S\_{y}(z)|\sum\_{j=1}^{n}\mathbb{E}^{\frac{1}{q}}\left(\mathbb{E}\_{j}|\hat{\varepsilon}\_{j}|^{2}h\_{\gamma}(|\Lambda\_{n}|, v)\,\mathbb{I}\{\mathcal{B}\}\right)^{\frac{q}{2}} \\ &\qquad \times\mathbb{E}^{\frac{q-1}{q}}\left(\mathbb{E}\_{j}|\Lambda\_{n}-\Lambda\_{n}^{(j)}|^{2q}\,\mathbb{I}\{\mathcal{B}\}\right)^{\frac{1}{2}} \\ &\leq C^{q}q^{q}|S\_{y}(z)|\beta\_{n}^{\frac{1}{2}}(z)(a\_{n}(z)+|b(z)|)^{q-1}\frac{1}{n}\sum\_{j=1}^{n}\left(\mathbb{E}\left(\mathbb{E}\_{j}|\Lambda\_{n}-\Lambda\_{n}^{(j)}|^{2q}\,\mathbb{I}\{\mathcal{Q}\}\,\mathbb{I}\{\mathcal{B}\}\right)\right)^{\frac{q-1}{2q}}. \end{split}$$

We put:

$$\mathcal{R}\_n(q) := \frac{1}{n} \sum\_{j=1}^n \mathbb{E}\left( \mathbb{E}\_j |\Lambda\_n - \Lambda\_n^{(j)}|^2 \, \mathbb{I}\{\mathcal{B}\} \, \mathbb{I}\{\mathcal{Q}\} \right)^{\frac{q}{2}}$$

and

$$\mathcal{U}\_{n}(q) := \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} \left| \Lambda\_{n} - \Lambda\_{n}^{(j)} \right|^{2q} \mathbb{I}\{\mathcal{B}\} \, \mathbb{I}\{\mathcal{Q}\}.$$

By applying Lemma 5, we obtained:

$$\mathcal{R}\_n(q) \le C^q \frac{|S\_y(z)|^q a\_n^{\frac{q}{2}}(z)}{(nv)^q} \left( |S\_y(z)|^q |A\_0(z)|^{\frac{q}{2}} \beta\_n^{\frac{q}{2}}(z) + \frac{|A\_0(z)|^{\frac{q}{2}}}{(nv)^{\frac{q}{2}}} + \frac{1}{(nv)^{\frac{q}{2}}} \right).$$

Finally, using Lemma 6, we obtained:

$$\begin{split} \mathcal{U}\_n^{\frac{q-1}{2q}}(q) &\leq C^q q^{q-1} \left(\frac{a\_n(z)}{nv}\right)^{q-1} |S\_y(z)|^{2(q-1)} \left(\frac{|A\_0(z)|}{(np)^{2\varkappa}}\right)^{q-1} + C^q \left(\frac{a\_n(z)}{nv}\right)^{q-1} |S\_y(z)|^{2(q-1)} \beta\_n^{\frac{q-1}{2}}(z) \\ &\quad + C^{q-1} q^{\frac{q-1}{2}} \left(\frac{|S\_y(z)| a\_n(z)}{nv}\right)^{\frac{q-1}{2}} \left(\frac{|S\_y(z)||A\_0(z)|}{nv\,np}\right)^{\frac{q-1}{2}} + C^{q-1} q^{q-1} \left(\frac{|S\_y(z)|}{nv}\right)^{q-1} \left(\frac{|A\_0(z)|}{(np)^{2\varkappa}}\right)^{q-1} \\ &\quad + C^q q^{q-1} \left(\frac{|S\_y(z)|}{nv}\right)^{q-1} \left(\frac{a\_n(z)}{nv}\right)^{\frac{q-1}{2}} + C^{q-1} q^{\frac{3(q-1)}{2}} \left(\frac{a\_n(z)|S\_y(z)|}{nv}\right)^{\frac{q-1}{2}} \left(\frac{|A\_0(z)||S\_y(z)|}{(np)^{2\varkappa}}\right)^{\frac{q-1}{2}} \frac{1}{(nv)^{q-1}} \\ &\quad + C^{q-1} q^{2(q-1)} \frac{|A\_0(z)|^{q-1} |S\_y(z)|^{q-1}}{(nv)^{q-1}(np)^{2\varkappa(q-1)}}. \end{split}$$

Using:

$$|S\_y(z)| |A\_0(z)| \le 1 + 2\sqrt{y},$$

we could write:

$$\begin{split} \mathcal{U}\_{n}^{\frac{q-1}{2q}}(q) &\leq C^{q-1} q^{q-1} \left(\frac{|S\_{y}(z)|a\_{n}(z)}{nv}\right)^{q-1} \frac{1}{(np)^{2\varkappa(q-1)}} \\ &\quad + C^{q-1} \left(\frac{|S\_{y}(z)|a\_{n}(z)}{nv}\right)^{q-1} |S\_{y}(z)|^{q-1} \beta\_{n}^{\frac{q-1}{2}}(z) \\ &\quad + C^{q-1} q^{\frac{q-1}{2}} \left(\frac{|S\_{y}(z)|^{\frac{1}{2}} a\_{n}^{\frac{1}{2}}(z)}{nv}\right)^{q-1} \left(\frac{1}{np}\right)^{\frac{q-1}{2}} + C^{q} q^{q-1} \left(\frac{1}{nv}\right)^{q-1} \left(\frac{1}{np}\right)^{2\varkappa(q-1)} \\ &\quad + C^{q-1} q^{q-1} \frac{|S\_{y}(z)|^{\frac{q-1}{2}}}{(nv)^{q-1}} \left(\frac{a\_{n}(z)|S\_{y}(z)|}{nv}\right)^{\frac{q-1}{2}} \\ &\quad + C^{q} q^{\frac{3(q-1)}{2}} \frac{1}{(nv)^{q-1}} \left(\frac{|S\_{y}(z)|a\_{n}(z)}{nv}\right)^{\frac{q-1}{2}} \frac{1}{(np)^{\varkappa(q-1)}} \\ &\quad + C^{q-1} q^{2(q-1)} \frac{1}{(nv)^{q-1}(np)^{2\varkappa(q-1)}}. \end{split}$$

By combining Inequalities (29), (31), (32), (33), (36), (37) and (43) and applying Young's inequality, we obtained the proof.

#### *6.5. The Proof of Theorem 4*

**Proof.** We considered the case *z* ∈ D, where

$$\mathcal{D} = \{ z = u + iv : (1 - \sqrt{y} - v)\_+ \le |u| \le 1 + \sqrt{y} + v, \; V \ge v \ge v\_0 = n^{-1} \log^4 n\}.$$

For such $z$, we obtained:

$$2V + (1 + \sqrt{y}) \ge |z| \ge \frac{1}{\sqrt{2}}(1 - \sqrt{y}).$$

This implied that there exists a constant $C\_1$, depending on $V$ and $y$, such that:

$$|b(z)| \le C\_1.$$

First, we considered the case $|b(z)| \ge \Gamma\_n$. Without loss of generality, we assumed that $C\_0 \ge C\_1$, where $C\_0$ is the constant in the definition of $a\_n(z)$. This meant that $a\_n(z) = \operatorname{Im} b(z) + C\_0 \Gamma\_n$. Furthermore:

$$|b\_n(z)|\mathbb{I}\{\mathcal{Q}\} \ge (1 - 2\gamma)|b(z)|\mathbb{I}\{\mathcal{Q}\}$$

and

$$\left|\Lambda\_{n}(z)\right| \mathbb{I}\{\mathcal{Q}\} \leq C \frac{\left|T\_{n}\right|}{\left|b(z)\right|}.$$

Using Theorem 3, we obtained:

$$\mathbb{E}\left|\Lambda\_{\mathfrak{n}}(z)\right|^{q}\mathbb{I}\{\mathcal{Q}\} \leq C^{q} \frac{q^{q}\left(F\_{1} + \dots + F\_{6}\right)}{|b(z)|^{q}}.$$

We let:

$$d(z) = \frac{\operatorname{Im} b(z) \vee \frac{1}{nv}}{|b(z)|}.$$

*The analysis of $F\_i/|b(z)|^q$ for $i = 1, \dots, 6$.*

• The bound of $F\_1/|b(z)|^q$. By the definition of $a\_n(z)$ and $F\_1$, we obtained:

$$F\_1 / |b(z)|^q \le \mathcal{C}^q \left(\frac{d(z)}{n\upsilon} + \frac{1}{n p |b(z)|}\right)^q.$$

• The bound of $F\_2/|b(z)|^q$. By the definition of $F\_2$, we obtained:

$$F\_2/|b(z)|^q \le C^q |S\_y(z)|^{2q} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^q.$$

For this, we used $|S\_y(z)||A\_0(z)| = |1 + zS\_y(z)| \le C$.

• The bound of $F\_3/|b(z)|^q$. By the definition of $F\_3$, we obtained:

$$F\_{3}/|b(z)|^{q} \leq \left(\frac{|S\_{y}(z)|^{\frac{3q}{2}}a\_{n}^{\frac{q}{2}}(z)}{(nv)^{q}} + \frac{|S\_{y}(z)|^{\frac{q}{2}}}{(nv)^{\frac{q}{2}}(np)^{\frac{q}{2}}} + \frac{|S\_{y}(z)|^{q}}{(nv)^{q}}\right)\left(\frac{1}{np|b(z)|} + \frac{d(z)}{nv}\right)^{q}.$$

• The bound of $F\_4/|b(z)|^q$. Simple calculations showed that:

$$F\_4(z)/|b(z)|^q \le \left(\frac{|S\_y(z)|^{\frac{3q}{2}}}{(nv)^q a\_n^{\frac{q}{2}}(z)} + \frac{|S\_y(z)|^{\frac{q}{2}}}{a\_n^{\frac{q}{2}}(z)(nv)^{\frac{q}{2}}} + \frac{|S\_y(z)|^q}{(a\_n(z) nv)^{\frac{q}{2}}}\right) \left(\frac{1}{nv|b(z)|} + \frac{d(z)}{nv}\right)^q.$$

• The bound of $F\_5/|b(z)|^q$. We noted that:

$$(a\_n(z) + |b(z)|) / |b(z)| \le C.$$

From there and from the definition of *F*5, it followed that:

$$\begin{split} F\_5(z) / |b(z)|^q &\leq C^q q^{\frac{q}{2}} \left( \left( \frac{d(z)}{n\upsilon} + \frac{1}{(np)|b(z)|} \right)^{\frac{3q}{4}} \left( \frac{1}{n\upsilon} \right)^{\frac{q}{4}} \\ &+ \left( \frac{d(z)}{n\upsilon} + \frac{1}{(np)|b(z)|} \right)^{\frac{q}{2}} \left( \frac{|S\_y(z)|}{n\upsilon} \right)^{\frac{q}{2}} \right). \end{split}$$

• The bound of $F\_6/|b(z)|^q$. Simple calculations showed that:

$$\begin{split} F\_6/|b(z)|^q &\le \frac{C^q q^{2(q-1)}}{(np)^{2\varkappa(q-1)}} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^{q-1} + C^q q^{2(q-1)} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^{\frac{3(q-1)}{2}} \\ &\quad + \frac{C^q q^{\frac{5(q-1)}{2}}}{(np)^{\frac{q-1}{2}}} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^{\frac{q-1}{2}} \frac{1}{(nv)^{\frac{q-1}{2}}} + \frac{C^q q^{3q}}{(np)^{2\varkappa(q-1)}} \frac{1}{(nv)^{q-1}} \\ &\quad + q^q \frac{|S\_y(z)|^{\frac{q-1}{2}}}{(nv)^{q-1}} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^{\frac{q-1}{2}} \\ &\quad + \frac{C^q q^{3q}}{(nv)^{\frac{q-1}{2}}} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|} \left(\frac{d(z)}{nv} + \frac{1}{np|b(z)|}\right)^{\frac{q-1}{2}} \frac{1}{(np)^{\varkappa(q-1)}} + \frac{C^q q^{4(q-1)}}{(np)^{2\varkappa(q-1)}} \frac{1}{(nv)^{q-1}} \frac{\beta\_n^{\frac{1}{2}}(z)}{|b(z)|}. \end{split}$$

We defined:

$$d\_n(z) := \frac{d(z)}{nv} + \frac{1}{(np)|b(z)|}.$$
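The quantity $d\_n(z)$ and the elementary bound $d\_n(z)|b(z)| \ge \frac{1}{np}$ used in this step follow directly from the definition; a minimal sketch with hypothetical parameter values:

```python
# d(z) = (Im b(z) v 1/(nv)) / |b(z)| and d_n(z) = d(z)/(nv) + 1/(np|b(z)|);
# the lower bound d_n(z)|b(z)| >= 1/(np) is immediate from the second term.
# All numeric values here are hypothetical.
n, p, v = 10 ** 5, 0.05, 1e-3
b = complex(0.8, 0.4)                      # stands in for b(z)
d = max(b.imag, 1.0 / (n * v)) / abs(b)    # d(z)
d_n = d / (n * v) + 1.0 / (n * p * abs(b)) # d_n(z)
assert d_n * abs(b) >= 1.0 / (n * p)
print(f"d_n(z) = {d_n:.3e}")
```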

By combining all of these estimations and using:

$$d\_{n}(z)|b(z)| \ge \frac{1}{np},$$

we obtained:

$$\mathbb{I}\{\Gamma\_n \le |b(z)|\} \, \mathbb{E} |\Lambda\_n|^q \mathbb{I}\{\mathcal{Q}\} \le C^q q^q (q^{\frac{q}{2}} (nv)^{-\frac{q}{2}} d\_n^{\frac{q}{2}}(z) + d\_n^q(z)).$$

For $z \in \mathcal{D}$ (such that $\Gamma\_n \le |b(z)|$), we could write:

$$\mathbb{E}|\Lambda\_n(z)|^q \mathbb{I}\{\mathcal{Q}\} \le \mathbb{C}^q q^q (q^{\frac{q}{2}}(n\upsilon)^{-\frac{q}{2}}d\_n^{\frac{q}{2}}(z) + d\_n^q(z)) \le \delta^q \Gamma\_n^q.$$

Then, we considered |*b*(*z*)| ≤ Γ*n*. In this case, we used the inequality:

$$|\Lambda\_n| \le \sqrt{|T\_n|}.$$

In what follows, we assumed that $q \sim \log n$.

*The bound of $\mathbb{E}|T\_n|^q$ for $|b(z)| \le \Gamma\_n$.*

• By the definition of $a\_n(z)$, we obtained:

$$\frac{a\_n(z)}{nv} \le \frac{C\Gamma\_n}{nv}.$$

We could obtain from this that, for sufficiently small *δ* > 0 values:

$$F\_1 \le C^q \Gamma\_n^q / (nv)^q \le \delta^q \Gamma\_n^{2q}.$$

• We noted that Γ*<sup>n</sup>* ≥ Im *b*(*z*) ≥ Im *A*0(*z*). This immediately implied that:

$$C^q q^q F\_2 \le \delta^q \Gamma\_n^{2q}.$$

• We noted that, for $\operatorname{Im} b(z) \le |b(z)| \le \Gamma\_n$, we obtained:

$$\min\{\frac{1}{np|b(z)|}, \frac{1}{\sqrt{np}}\} = \frac{1}{\sqrt{np}}$$

and

$$\frac{1}{np} \le \delta \Gamma\_n^2 / \log^2 n.$$

From there, it followed that:

$$C^{q}q^{q} F\_3 \le \delta^{q}\Gamma\_{n}^{2q}.$$

• Simple calculations showed that:

$$C^q q^q F\_4 \le \delta^q \Gamma\_n^{2q}.$$

• Simple calculation showed that:

$$C^q q^q F\_5 \le C^q \Gamma\_n^{4q} \le \delta^q \Gamma\_n^{2q}.$$

• It was straightforward to check that:

$$C^q q^q F\_6 \le C^q \Gamma\_n^{3q} \le \delta^q \Gamma\_n^{2q}.$$

By applying the Markov inequality for $\Gamma\_n \le \operatorname{Im} b(z) \le C$, we obtained:

$$\Pr\{ |\Lambda\_n| > \mathcal{K}d\_n(z)\log n; \mathcal{Q} \} \le Cn^{-q}.$$
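The Markov step turns a $q$-th moment bound into a superpolynomially small tail once $q \sim \log n$; a minimal numerical sketch with hypothetical constants:

```python
import math

# Markov's inequality: P(|X| > t) <= E|X|^q / t^q.  With a moment bound
# E|X|^q <= (C*q*d)^q (d plays the role of d_n(z)) and threshold
# t = K*d*log(n), choosing q = log(n) makes the tail (C/K)^q = n^{-c}.
# All constants below are hypothetical.
n, C, K, d = 10_000, 1.0, 10.0, 1e-3
q = math.log(n)
moment_bound = (C * q * d) ** q            # assumed bound on E|X|^q
t = K * d * math.log(n)                    # threshold K * d * log n
markov = moment_bound / t ** q             # Markov tail bound = (C/K)^q
assert markov < n ** -2                    # superpolynomially small
print(f"tail bound = {markov:.2e}")
```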

On the other hand, when Im *b*(*z*) ≤ Γ*n*, we used the inequality:

$$|\Lambda\_{n}| \le C|T\_{n}|^{\frac{1}{2}}.$$

By applying the Markov inequality, we obtained:

$$\Pr\{ |\Lambda\_{n}(z)| \ge 2\delta\Gamma\_{n}; \mathcal{Q}\} \le Cn^{-Q}.$$

This implied that:

$$\Pr\{ |\Lambda\_n(v)| \ge \frac{1}{2}\Gamma\_n ; \mathcal{Q} \} \le C n^{-Q}.$$

We noted that $\mathcal{Q} = \mathcal{Q}(v)$ and that, for $V \ge v \ge v\_0$:

$$a\_n(z) \ge \frac{\mathbb{C}\log^2 n}{n}.$$

On the other hand:

$$\sup\_{u} |\Lambda\_n(v) - \Lambda\_n(v')| \le \frac{|v - v'|}{v\_0^2} \le n^2 |v - v'| = n^2 \Delta v.$$

We chose Δ*v*, such that:

$$\sup\_{u} |\Lambda\_{n}(v) - \Lambda\_{n}(v')| \le \frac{1}{2} \Gamma\_{n}.$$

It was enough to put $\Delta v := n^{-4}$. We let $K := \frac{V - v\_0}{\Delta v}$. For $\nu = 0, \dots, K - 1$, we defined $v\_\nu = v\_0 + \nu \Delta v$,

and $v\_K = V$. We noted that $v\_0 < v\_1 < \cdots < v\_K = V$ and that:

$$\sup\_{u} |\Lambda\_{n}(v\_{\nu+1}) - \Lambda\_{n}(v\_{\nu})| \leq \frac{1}{2}\Gamma\_{n}.$$
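The discretization step can be sketched numerically: with $v\_0 = n^{-1}\log^4 n$ and $\Delta v = n^{-4}$, the Lipschitz bound $\sup\_u|\Lambda\_n(v) - \Lambda\_n(v')| \le |v - v'|/v\_0^2$ keeps each grid increment negligible (the values of $n$ and $V$ are hypothetical):

```python
import math

# Union-bound grid: v_0 = log^4(n)/n, Delta_v = n^{-4}, v_nu = v_0 + nu*Delta_v,
# ending at v_K = V.  The Lipschitz bound |Lambda_n(v) - Lambda_n(v')| <=
# |v - v'| / v_0^2 keeps the per-step oscillation below n^{-2}.
n, V = 10 ** 6, 2.0
v0 = math.log(n) ** 4 / n
dv = float(n) ** -4
K = math.ceil((V - v0) / dv)
increment = dv / v0 ** 2                   # worst-case oscillation per grid step
assert 0 < v0 < V
assert increment <= n ** -2
print(f"K = {K:.3e} grid points, per-step oscillation <= {increment:.2e}")
```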

We started with $v\_K = V$. We noted that:

$$\Pr\{\mathcal{Q}(V)\} = 1.$$

This implied that:

$$\Pr\{ |\Lambda\_n(v\_K)| \ge \frac{1}{2}\Gamma\_n \} \le Cn^{-Q}.$$

From there, it followed that:

$$\Pr\{\overline{\mathcal{Q}}(v\_{K-1})\} \le Cn^{-Q}.$$

By repeating this procedure and using the union bound, we obtained the proof. Thus, Theorem 4 was proven.

#### **7. Auxiliary Lemmas**

**Lemma 1.** *Under the conditions of Theorem 5, for $j \in \mathbb{J}^c$ and $l \in \mathbb{K}^c$, we have:*

$$\max\left\{|\varepsilon\_{j1}^{(\mathbb{J},\mathbb{K})}|,|\varepsilon\_{l+n,1}^{(\mathbb{J},\mathbb{K})}|\right\} \le \frac{C}{nv}.$$

**Proof.** For simplicity, we only considered the case J = ∅ and K = ∅. We noted that:

$$\begin{split} \varepsilon\_{j1} &= \frac{1}{2m} \Big( \left( \operatorname{Tr} \mathbf{R} - \frac{m-n}{z} \right) - \left( \operatorname{Tr} \mathbf{R}^{(j)} - \frac{m-n-1}{z} \right) \Big) \\ &= \frac{1}{2m} \Big( \operatorname{Tr} \mathbf{R} - \operatorname{Tr} \mathbf{R}^{(j)} \Big) - \frac{1}{2mz} . \end{split}$$

By applying Schur's formula, we obtained:

$$|\varepsilon\_{j1}| \le \frac{C}{nv}.$$

The second inequality was proven in a similar way.

**Lemma 2.** *Under the conditions of Theorem 5, for all j* ∈ J*c, the following inequalities are valid:*

$$\mathbb{E}\_j \left| \varepsilon\_{j2}^{(\mathbb{J}, \mathbb{K})} \right|^2 \le \frac{\mu\_4}{np} \frac{1}{n} \sum\_{l=1}^m \left| R\_{l+n, l+n}^{(\mathbb{J} \cup \{j\}, \mathbb{K})} \right|^2$$

*and*

$$\mathbb{E}\_{l+n} \left| \varepsilon\_{l+n,2}^{(\mathbb{J},\mathbb{K})} \right|^2 \leq \frac{\mu\_4}{np} \frac{1}{n} \sum\_{j=1}^n \left| R\_{jj}^{(\mathbb{J},\mathbb{K} \cup \{l\})} \right|^2.$$

*In addition, for q* > 2*, we have:*

$$\mathbb{E}\_{j}|\varepsilon\_{j2}^{(\mathbb{J},\mathbb{K})}|^{q} \leq C^{q} \left(\frac{q^{\frac{q}{2}}}{(np)^{\frac{q}{2}}} \left(\frac{1}{n} \sum\_{l=1}^{m} \left|R\_{l+n,l+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}\right|^{2}\right)^{\frac{q}{2}} + \frac{q^{q}}{(np)^{2q\varkappa+1}} \frac{1}{n} \sum\_{l=1}^{m} \left|R\_{l+n,l+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}\right|^{q}\right)$$

*and for l* ∈ K*c, we have:*

$$\mathbb{E}\_{l+n} |\boldsymbol{\varepsilon}\_{l+n,2}^{(\mathbf{J},\mathbf{K})}|^{q} \leq \mathbb{C}^{q} \left( \frac{q^{\frac{q}{2}}}{(np)^{\frac{q}{2}}} \left( \frac{1}{n} \sum\_{j=1}^{n} \left| \boldsymbol{R}\_{jj}^{(\mathbf{J},\mathbf{K}\cup\{l\})} \right|^{2} \right)^{\frac{q}{2}} + \frac{q^{q}}{(np)^{2q\varkappa+1}} \frac{1}{n} \sum\_{j=1}^{n} \left| \boldsymbol{R}\_{jj}^{(\mathbf{J},\mathbf{K}\cup\{l\})} \right|^{q} \right).$$

**Proof.** For simplicity, we only considered the case J = ∅ and K = ∅. The first two inequalities were obvious. We only considered *q* > 2. By applying Rosenthal's inequality, for *q* > 2, we obtained:

$$\begin{split} \mathbb{E}\_{j} \left| \varepsilon\_{j2} \right|^{q} &= \frac{1}{(mp)^{q}} \mathbb{E}\_{j} \left| \sum\_{l=1}^{m} (X\_{jl}^{2} \tilde{\xi}\_{jl} - p) R\_{l+n,l+n}^{(j)} \right|^{q} \\ &\leq \frac{C^{q}}{(mp)^{q}} \left[ q^{\frac{q}{2}} \left( \sum\_{l=1}^{m} \mathbb{E}\_{j} \left| X\_{jl}^{2} \tilde{\xi}\_{jl} - p \right|^{2} | R\_{l+n,l+n}^{(j)}|^{2} \right)^{\frac{q}{2}} + q^{q} \sum\_{l=1}^{m} \mathbb{E}\_{j} \left| X\_{jl}^{2} \tilde{\xi}\_{jl} - p \right|^{q} | R\_{l+n,l+n}^{(j)} |^{q} \right] \\ &\leq \frac{C^{q}}{(mp)^{\frac{q}{2}}} \left[ (q\mu\_{4})^{\frac{q}{2}} \left( \frac{1}{m} \sum\_{l=1}^{m} | R\_{l+n,l+n}^{(j)}|^{2} \right)^{\frac{q}{2}} + \frac{mq^{q}}{(mp)^{\frac{q}{2}}} \overline{\mu}\_{2q} \frac{1}{m} \sum\_{l=1}^{m} | R\_{l+n,l+n}^{(j)} |^{q} \right]. \end{split} \tag{44}$$

We recalled that:

$$
\overline{\mu}\_r = \mathbb{E} \left| X\_{jk} \overline{\xi}\_{jk} \right|^r
$$

and under the conditions of the theorem:

$$
\overline{\mu}\_{2q} \le C^q p(np)^{q-2q\varkappa-2} \mu\_{4+\delta}.
$$

By substituting the last inequality into Inequality (44), we obtained:

$$\mathbb{E}\_{j}|\varepsilon\_{j2}|^{q} \leq C^{q} \Big[ \frac{q^{\frac{q}{2}}}{(mp)^{\frac{q}{2}}} \Big( \frac{1}{m} \sum\_{l=1}^{m} |R\_{l+n,l+n}^{(j)}|^{2} \Big)^{\frac{q}{2}} + \frac{q^{q}}{(mp)^{2q\varkappa+1}} \frac{1}{m} \sum\_{l=1}^{m} |R\_{l+n,l+n}^{(j)}|^{q} \Big].$$

The second inequality could be proven similarly.
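The Rosenthal inequality used in the proof above can be illustrated by Monte Carlo for centered sparse summands of the type $X\_{jl}^2 \tilde{\xi}\_{jl} - p$; the constant $C$ and all parameters below are hypothetical:

```python
import random

random.seed(4)
# Rosenthal's inequality for centered independent summands, modeling
# Y_l = X_{jl}^2 * xi_{jl} - p with X ~ N(0,1) and xi ~ Bernoulli(p):
#   E|sum_l Y_l|^q <= C^q ( q^{q/2} (sum_l E Y_l^2)^{q/2} + q^q sum_l E|Y_l|^q ).
m, p, q, C = 200, 0.1, 4, 2.0

def Y():
    return random.gauss(0.0, 1.0) ** 2 * (random.random() < p) - p

# Monte Carlo moments of a single summand.
ys = [Y() for _ in range(100_000)]
EY2 = sum(y * y for y in ys) / len(ys)
EYq = sum(abs(y) ** q for y in ys) / len(ys)

# Monte Carlo q-th moment of the centered sum versus the Rosenthal bound.
trials = 2000
lhs = sum(abs(sum(Y() for _ in range(m))) ** q for _ in range(trials)) / trials
rhs = C ** q * (q ** (q / 2) * (m * EY2) ** (q / 2) + q ** q * m * EYq)
assert lhs <= rhs
print(f"E|S|^q estimate {lhs:.1f} <= Rosenthal bound {rhs:.1f}")
```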

**Lemma 3.** *Under the conditions of the theorem, for all $j \in \mathbb{J}^c$, the following inequalities are valid:*

$$\mathbb{E}\_j |\varepsilon\_{j3}^{(\mathbb{J},\mathbb{K})}|^2 \le \frac{C \sum\_{l,k=1}^m |\mathcal{R}\_{l+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}(z)|^2}{n^2}$$

*and*

$$\mathbb{E}\_{l+n} \left| \varepsilon\_{l+n,3}^{(\mathbb{J},\mathbb{K})} \right|^2 \leq \frac{C \sum\_{i,k=1}^n |\mathcal{R}\_{i,k}^{(\mathbb{J},\mathbb{K} \cup \{l\})}(z)|^2}{n^2}.$$

*In addition, for q* > 2*, we have:*

$$\begin{split} \mathbb{E}\_{j}|\varepsilon\_{j3}^{(\mathbb{J},\mathbb{K})}|^{q} &\leq C^{q} \Big( q^{q}(nv)^{-\frac{q}{2}} \Big( \operatorname{Im} s\_{n}^{(j)}(z) - \operatorname{Im} \Big\{ \frac{1-y}{z} \Big\} \Big)^{\frac{q}{2}} \\ &\quad + q^{\frac{3q}{2}}(nv)^{-\frac{q}{2}}(np)^{-q\varkappa-1} \frac{1}{n} \sum\_{l=1}^{m} (\operatorname{Im} R\_{l+n,l+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})})^{\frac{q}{2}} \\ &\quad + q^{2q}(np)^{-2q\varkappa} \frac{1}{n^{2}} \sum\_{l=1}^{m} \sum\_{k=1}^{m} |R\_{l+n,k+n}^{(\mathbb{J}\cup\{j\},\mathbb{K})}|^{q} \Big) \end{split}$$

*and for $l \in \mathbb{K}^c$, we have:*

$$\begin{split} \mathbb{E}\_{l+n}|\boldsymbol{\varepsilon}\_{l+n,3}^{(\mathbb{J},\mathbb{K})}|^{q} &\leq \mathsf{C}^{q} \Big( q^{q}(n\upsilon)^{-\frac{q}{2}} \Big( \mathrm{Im}\,\boldsymbol{s}\_{n}^{(l)}(\boldsymbol{z}) \Big)^{\frac{q}{2}} \\ &\quad + q^{\frac{3q}{2}}(n\upsilon)^{-\frac{q}{2}}(n\boldsymbol{p})^{-q\varkappa-1} \frac{1}{n} \sum\_{j=1}^{n} (\mathrm{Im}\,\boldsymbol{R}\_{jj}^{(\mathbb{J},\mathbb{K}\cup\{l+n\})})^{q} \\ &\quad + q^{2q}(n\boldsymbol{p})^{-2q\varkappa}n^{-2} \sum\_{j=1}^{n} \sum\_{k=1}^{n} |\boldsymbol{R}\_{kj}^{(\mathbb{J},\mathbb{K}\cup\{l+n\})}|^{q} \Big). \end{split}$$

**Proof.** It sufficed to apply the inequality from Corollary 1 of [16].

We recalled the notation:

$$
\beta\_n(z) = \frac{a\_n(z)}{n\upsilon} + \frac{|A\_0(z)|^2}{n\upsilon}.
$$

**Lemma 4.** *Under the conditions of the theorem, the following bounds are valid:*

$$\mathbb{E}_{j} \left| R_{jj} - \mathbb{E}_{j} R_{jj} \right|^{2} \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \leq C |S_{y}(z)|^{4} \beta_{n}(z) \tag{45}$$

*and*

$$\mathbb{E}_{j}|R_{jj} - \mathbb{E}_{j}R_{jj}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \le C^{q}|S_{y}(z)|^{2q} q^{q} \left( q^{q} \left( \frac{|A_{0}(z)|}{(np)^{2\varkappa}} \right)^{q} + \beta_{n}^{\frac{q}{2}}(z) \right). \tag{46}$$

**Proof.** We considered the equality:

$$R_{jj} = -\frac{1}{z - \frac{1-y}{z} + y s_n^{(j)}(z)} \left(1 + \widehat{\varepsilon}_{j} R_{jj}\right).$$

It implied that:

$$R_{jj} - \mathbb{E}_{j} R_{jj} = -\frac{1}{z - \frac{1-y}{z} + y s_n^{(j)}(z)} \left(\widehat{\varepsilon}_{j} R_{jj} - \mathbb{E}_{j} \widehat{\varepsilon}_{j} R_{jj}\right). \tag{47}$$

Further, we noted that, for a sufficiently small *γ*, there existed a constant *H* such that:

$$\left| \frac{1}{z - \frac{1 - y}{z} + y s_n^{(j)}(z)} \right| \mathbb{I}\{\mathcal{Q}\} \le H |S_{y}(z)|\, \mathbb{I}\{\mathcal{Q}\}.$$

Hence:

$$\begin{split} \mathbb{E}_{j} \left| R_{jj} - \mathbb{E}_{j} R_{jj} \right|^{2} \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} &\leq H^{2} |S_{y}(z)|^{2} \Big( \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\quad + \mathbb{E}_{j}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2} \Big). \end{split}$$

It was easy to see that:

$$\begin{split} \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} &\leq C |S_{y}(z)|^{2}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\leq C |S_{y}(z)|^{2} \beta_{n}(z). \end{split}$$

We introduced the event:

$$\mathcal{Q}^{(j)} = \left\{ |\Lambda\_n^{(j)}| \le 2\gamma a\_n(z) + \frac{1}{n\upsilon} \right\}.$$

It was obvious that:

$$\mathbb{I}\{\mathcal{Q}\} \le \mathbb{I}\{\mathcal{Q}\} \mathbb{I}\{\mathcal{Q}^{(j)}\}.$$

Consequently:

$$\mathbb{E}_{j}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2} \leq \mathbb{E}_{j}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2}\, \mathbb{I}\{\mathcal{Q}^{(j)}\}.$$

Further, we considered $\widetilde{\mathcal{Q}} = \{|\Lambda_n| \le 2\gamma a_n(z)\}$. We obtained:

$$\mathbb{1}\{\mathcal{Q}^{(j)}\} \le \mathbb{1}\{\tilde{\mathcal{Q}}\}.$$

Then, it followed that:

$$\mathbb{E}_{j}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2} \leq \mathbb{E}_{j}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{2} |R_{jj}|^{2}\, \mathbb{I}\{\widetilde{\mathcal{Q}}\}.$$

Next, the following inequality held:

$$\mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^2 |R_{jj}|^2\, \mathbb{I}\{\widetilde{\mathcal{Q}}\} \le \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^2 |R_{jj}|^2\, \mathbb{I}\{\widetilde{\mathcal{Q}}\}\, \mathbb{I}\{\widetilde{\mathcal{B}}\} + \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^2 |R_{jj}|^2\, \mathbb{I}\{\widetilde{\mathcal{Q}}\}\, \mathbb{I}\{\widetilde{\mathcal{B}}^c\}.\tag{48}$$

Under condition (*C*0) and the inequality $|R_{jj}| \le v_0^{-1}$, we obtained the bound:

$$\mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^2 |R_{jj}|^2\, \mathbb{I}\{\widetilde{\mathcal{Q}}\}\, \mathbb{I}\{\widetilde{\mathcal{B}}^c\} \leq C n^{-c \log n}.$$

By applying Lemmas 2 and 3, for the first term on the right side of (48), we obtained:

$$\mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^2 |R_{jj}|^2\, \mathbb{I}\{\widetilde{\mathcal{Q}}\}\, \mathbb{I}\{\widetilde{\mathcal{B}}\} \le C |S_{y}(z)|^2 \beta_n(z).$$

This completed the proof of Inequality (45). Furthermore, by using representation (47), we obtained:

$$\begin{split} \mathbb{E}_{j} |R_{jj} - \mathbb{E}_{j} R_{jj}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} &\leq C^{q} |S_{y}(z)|^{q}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{q} |R_{jj}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\leq C^{q} |S_{y}(z)|^{2q}\, \mathbb{E}_{j} |\widehat{\varepsilon}_{j}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\}. \end{split}$$

By applying Lemmas 2 and 3, we obtained:

$$\begin{split} \mathbb{E}_{j} |R_{jj} - \mathbb{E}_{j} R_{jj}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} &\leq C^{q} |S_{y}(z)|^{2q} \Big( \Big( \frac{q|A_{0}(z)|^{2}}{np} \Big)^{\frac{q}{2}} + \Big( \frac{q|A_{0}(z)|}{(np)^{2\varkappa}} \Big)^{q} \\ &\quad + \Big( \frac{q^{2} a_{n}(z)}{nv} \Big)^{\frac{q}{2}} + \Big( \frac{q^{3}|A_{0}(z)|}{nv(np)^{2\varkappa}} \Big)^{\frac{q}{2}} + \Big( \frac{q^{2}|A_{0}(z)|}{(np)^{2\varkappa}} \Big)^{q} \Big). \end{split}$$

By applying Young's inequality, we obtained the required bound. Thus, the lemma was proven.

**Lemma 5.** *Under the conditions of the theorem, we have:*

$$\begin{split} \mathbb{E}_{j} |\Lambda_{n} - \Lambda_{n}^{(j)}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \leq{} & C \frac{|S_{y}(z)|^{4} |A_{0}(z)| a_{n}(z)}{(nv)^{2}} \beta_{n}(z) + C \frac{|S_{y}(z)|^{2} |A_{0}(z)| a_{n}(z)}{(nv)^{2} np} \\ &+ C \frac{|S_{y}(z)|^{2} a_{n}(z)}{(nv)^{3}}. \end{split}$$

**Proof.** We set $\Lambda_n^{(j)} = s_n^{(j)}(z) - S_y(z)$. Using the Schur complement formula:

$$
\Lambda_{n} - \widehat{\Lambda}_{n}^{(j)} = \frac{1}{n} \Big(1 + \frac{1}{np} \sum_{l,k=1}^{m} X_{jl} X_{jk} \xi_{jl} \xi_{jk} \big[(\mathbf{R}^{(j)})^2\big]_{k+n,l+n}\Big) R_{jj}.
$$

Since $\Lambda_n^{(j)}$ was measurable with respect to $\mathfrak{M}^{(j)}$, we could write:

$$
\Lambda\_n - \Lambda\_n^{(j)} = (\Lambda\_n - \widehat{\Lambda}\_n^{(j)}) - \mathbb{E}\_j\{\Lambda\_n - \widehat{\Lambda}\_n^{(j)}\}.
$$

We introduced the notation:

$$\begin{aligned} \eta_{j1} &= \frac{1}{np} \sum_{l=1}^{m} (X_{jl}^2 \xi_{jl} - p) \big[(\mathbf{R}^{(j)})^2\big]_{l+n, l+n}, \\ \eta_{j2} &= \frac{1}{np} \sum_{l=1}^{m} \sum_{k=1, k \neq l}^{m} X_{jl} X_{jk} \xi_{jl} \xi_{jk} \big[(\mathbf{R}^{(j)})^2\big]_{k+n, l+n}. \end{aligned}$$

In this notation:

$$\begin{split} \Lambda_{n} - \Lambda_{n}^{(j)} &= \frac{1}{n} \Big( 1 + \frac{1}{n} \sum_{l=1}^{m} \big[(\mathbf{R}^{(j)})^2\big]_{l+n,l+n} \Big) (R_{jj} - \mathbb{E}_{j} R_{jj}) \\ &\quad + \frac{1}{n} (\eta_{j1} + \eta_{j2}) R_{jj} - \frac{1}{n} \mathbb{E}_{j} (\eta_{j1} + \eta_{j2}) R_{jj}. \end{split}$$

We noted that:

$$\mathbb{E}_{j} |\eta_{j1}|^2\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \le \frac{C}{n^2 p} \sum_{l=1}^m \Big| \big[(\mathbf{R}^{(j)})^2\big]_{l+n, l+n} \Big|^2\, \mathbb{I}\{\mathcal{Q}^{(j)}\}\, \mathbb{I}\{\mathcal{B}^{(j)}\}.$$

Since:

$$\Big| \big[(\mathbf{R}^{(j)})^2\big]_{l+n,l+n} \Big| \le \sum_{k=1}^{m} \big| [\mathbf{R}^{(j)}]_{l+n,k+n} \big|^2 \le \frac{C}{v} \operatorname{Im} R^{(j)}_{l+n,l+n},$$

Theorem 5 produced:

$$\mathbb{E}_{j}|\eta_{j1}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \le \frac{C}{npv^{2}} \frac{1}{n} \sum_{l=1}^{m} \big(\operatorname{Im} R^{(j)}_{l+n,l+n}\big)^{2}\, \mathbb{I}\{\mathcal{Q}^{(j)}\}\, \mathbb{I}\{\mathcal{B}^{(j)}\} \le \frac{C|A_{0}(z)| a_{n}(z)}{npv^{2}}.$$

Similarly, for the second moment of *ηj*2, we obtained the following estimate:

$$\begin{split} \mathbb{E}_{j} |\eta_{j2}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} &\leq \frac{C}{n^{2}} \sum_{l,k=1}^{m} \Big| \big[(\mathbf{R}^{(j)})^2\big]_{l+n,k+n} \Big|^{2}\, \mathbb{I}\{\mathcal{Q}^{(j)}\}\, \mathbb{I}\{\mathcal{B}^{(j)}\} \\ &\leq \frac{C}{n^{2}} \operatorname{Tr} |\mathbf{R}^{(j)}|^{4}\, \mathbb{I}\{\mathcal{Q}^{(j)}\}\, \mathbb{I}\{\mathcal{B}^{(j)}\} \leq \frac{C a_{n}(z)}{n v^{3}}. \end{split}$$

From the above estimates and Lemma 4, we concluded that:

$$\begin{split} &\mathbb{E}_{j} |\Lambda_{n} - \Lambda_{n}^{(j)}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\quad \leq C \frac{|A_{0}(z)| a_{n}(z)}{(nv)^{2}} \Big( \frac{|S_{y}(z)|^{2}}{np} + \mathbb{E}_{j} |R_{jj} - \mathbb{E}_{j} R_{jj}|^{2}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \Big) + \frac{C |S_{y}(z)|^{2}}{(nv)^{2}} \frac{a_{n}(z)}{nv}. \end{split}$$

Thus, the lemma was proven.

**Lemma 6.** *Under the conditions of the theorem, for* 2 ≤ *q* ≤ *c* log *n, we have:*

$$\begin{split} &\mathbb{E}_{j}|\Lambda_{n}-\Lambda_{n}^{(j)}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\quad \leq C^{q}|S_{y}(z)|^{2q}\frac{a_{n}^{q}(z)}{(nv)^{q}}\Big(q^{q}\Big(\frac{|A_{0}(z)|}{(np)^{2\varkappa}}\Big)^{q}+\beta_{n}^{\frac{q}{2}}(z)\Big)+\frac{C^{q}q^{\frac{q}{2}}|S_{y}(z)|^{q}}{(nv)^{q}(np)^{\frac{q}{2}}}|A_{0}(z)|^{\frac{q}{2}}a_{n}^{\frac{q}{2}}(z) \\ &\quad\quad +\frac{C^{q}q^{q}|S_{y}(z)|^{q}}{(nv)^{q}(np)^{2q\varkappa+1}}|A_{0}(z)|^{q}+\frac{C^{q}q^{q}|S_{y}(z)|^{q}}{(nv)^{\frac{3q}{2}}}a_{n}^{\frac{q}{2}}(z)+\frac{C^{q}q^{\frac{3q}{2}}|S_{y}(z)|^{q}}{(nv)^{\frac{3q}{2}}(np)^{q\varkappa+1}}|A_{0}(z)|^{\frac{q}{2}}a_{n}^{\frac{q}{2}}(z) \\ &\quad\quad +\frac{C^{q}|S_{y}(z)|^{q}q^{2q}}{(np)^{2q\varkappa+2}n^{q}v^{q}}|A_{0}(z)|^{q}. \end{split}$$

**Proof.** We used the representation:

$$\begin{split} \Lambda_{n} - \Lambda_{n}^{(j)} &= \frac{1}{n} \Big( 1 + \frac{1}{n} \sum_{l=1}^{m} \big[(\mathbf{R}^{(j)})^2\big]_{l+n,l+n} \Big) (R_{jj} - \mathbb{E}_{j} R_{jj}) \\ &\quad + \frac{1}{n} (\eta_{j1} + \eta_{j2}) R_{jj} - \frac{1}{n} \mathbb{E}_{j} (\eta_{j1} + \eta_{j2}) R_{jj}. \end{split}$$

We noted that by using Rosenthal's inequality:

$$\mathbb{E}_{j} |\eta_{j1}|^q\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \le \frac{C^q q^{\frac{q}{2}} |A_0(z)|^{\frac{q}{2}} a_n^{\frac{q}{2}}(z)}{v^q (np)^{\frac{q}{2}}} + \frac{C^q q^q |A_0(z)|^q}{v^q (np)^{2q\varkappa+1}}.$$

Similarly, for the *q*-th moment of *ηj*2, we obtained the following estimate:

$$\mathbb{E}_{j} |\eta_{j2}|^q\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \le \frac{C^q q^q}{n^{\frac{q}{2}} v^{\frac{3q}{2}}} a_n^{\frac{q}{2}}(z) + \frac{C^q q^{\frac{3q}{2}}}{n^{\frac{q}{2}} v^{\frac{3q}{2}} (np)^{q\varkappa+1}} |A_0(z)|^{\frac{q}{2}} a_n^{\frac{q}{2}}(z) + \frac{C^q q^{2q} |A_0(z)|^q}{(np)^{2q\varkappa+2} v^q}.$$

From the estimates above and Lemma 4, we concluded that:

$$\begin{split} &\mathbb{E}_{j} |\Lambda_{n} - \Lambda_{n}^{(j)}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} \\ &\quad \leq C^{q} \frac{a_{n}^{q}(z)}{(nv)^{q}}\, \mathbb{E}_{j} |R_{jj} - \mathbb{E}_{j} R_{jj}|^{q}\, \mathbb{I}\{\mathcal{Q}\}\, \mathbb{I}\{\mathcal{B}\} + \frac{C^{q} q^{\frac{q}{2}} a_{n}^{\frac{q}{2}}(z) |A_{0}(z)|^{\frac{q}{2}} |S_{y}(z)|^{q}}{(nv)^{q} (np)^{\frac{q}{2}}} \\ &\quad\quad + \frac{C^{q} q^{q} |S_{y}(z)|^{q} |A_{0}(z)|^{q}}{(nv)^{q} (np)^{2q\varkappa+1}} + \frac{C^{q} q^{q} a_{n}^{\frac{q}{2}}(z) |S_{y}(z)|^{q}}{(nv)^{\frac{3q}{2}}} + \frac{C^{q} q^{\frac{3q}{2}} |S_{y}(z)|^{q} |A_{0}(z)|^{\frac{q}{2}} a_{n}^{\frac{q}{2}}(z)}{(nv)^{\frac{3q}{2}} (np)^{q\varkappa+1}} \\ &\quad\quad + \frac{C^{q} |S_{y}(z)|^{q} q^{2q} |A_{0}(z)|^{q}}{(np)^{2q\varkappa+2} n^{q} v^{q}}. \end{split}$$

To finish the proof, we applied Inequalities (45) and (46). Thus, the lemma was proven.

**Lemma 7.** *For $1 - \sqrt{y} - v \le |u| \le 1 + \sqrt{y} + v$, the following inequality holds:*

$$|b(z)| \le Ca\_n(z).$$

**Proof.** We noted that:

$$b(z) = z - \frac{1-y}{z} + 2yS\_y(z) = \sqrt{(z - \frac{1-y}{z})^2 - 4y}$$

and

$$a\_n(z) = \operatorname{Im}\{\sqrt{(z-\frac{1-y}{z})^2 - 4y}\} + \frac{1}{nv} + \frac{1}{np}.$$

It was easy to show that for $1 - \sqrt{y} \le |u| \le 1 + \sqrt{y}$:

$$\operatorname{Re}\{(z-\frac{1-y}{z})^2-4y\} \le 0.$$

Indeed:

$$\operatorname{Re}\Big\{\Big(z-\frac{1-y}{z}\Big)^2-4y\Big\} \le u^2 + \frac{(1-y)^2}{u^2} - 2(1+y).$$

The last expression was not positive for $1 - \sqrt{y} \le |u| \le 1 + \sqrt{y}$. From the nonpositivity of the real part, it followed that:

$$\operatorname{Im}\Big\{\sqrt{\Big(z-\frac{1-y}{z}\Big)^2-4y}\Big\} \ge \frac{1}{\sqrt{2}}\left|\sqrt{\Big(z-\frac{1-y}{z}\Big)^2-4y}\right|.$$

This implied the required bound. Thus, the lemma was proven.

**Lemma 8.** *There is an absolute constant C* > 0*, such that for z* = *u* + *iv:*

$$|\Lambda_n| \le C \min \Big\{ \frac{|T_n|}{|b(z)|}, \sqrt{|T_n|} \Big\},\tag{49}$$

*and that for $z = u + iv$ satisfying $1 - \sqrt{y} - v \le |u| \le 1 + \sqrt{y} + v$ and $v > 0$, the following inequality is valid:*

$$|\operatorname{Im}\Lambda_n| \le C \min\Big\{\frac{|T_n|}{|b(z)|}, \sqrt{|T_n|}\Big\}.\tag{50}$$

**Proof.** We changed the variables by setting:

$$w = \frac{1}{\sqrt{y}}\Big(z - \frac{1 - y}{z}\Big), \qquad z = \frac{w\sqrt{y} + \sqrt{yw^2 + 4(1 - y)}}{2}$$

and

$$
\widetilde{S}(w) = \sqrt{y}\, S_{y}(z), \qquad \widetilde{s}_n(w) = \sqrt{y}\, s_n(z).
$$

In this notation, we could rewrite the main equation in the form:

$$1 + w\widetilde{s}_n(w) + \widetilde{s}_n^2(w) = T_n.$$

It was easy to see that:

$$
\Lambda_n = \frac{1}{\sqrt{y}} \big(\widetilde{s}_n(w) - \widetilde{S}(w)\big).
$$

Then, it sufficed to repeat the proof of Lemma B.1 from [17]. We noted that this lemma implied that Inequality (50) held for all *w* with Im *w* > 0 (and, therefore, for all *z*) and that Inequality (49) held for *w* satisfying |Re *w*| ≤ 2 + Im *w*. From this, we concluded that Inequality (49) held for $z = u + iv$ such that $1 - \sqrt{y} - cv \le |u| \le 1 + \sqrt{y} + cv$ for a sufficiently small constant *c* > 0.

Thus, the lemma was proven.
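The change of variables used in the proof can be sanity-checked numerically: any root $z$ of $z^2 - \sqrt{y}\,wz - (1-y) = 0$ satisfies $z - \frac{1-y}{z} = \sqrt{y}\,w$ identically. A minimal sketch (the value $y = 0.5$ and the test point $w$ are arbitrary choices, not taken from the text):

```python
import numpy as np

y = 0.5
w = 0.3 + 1.1j  # arbitrary test point with Im w > 0

# z is a root of z^2 - sqrt(y) w z - (1 - y) = 0
z = (w * np.sqrt(y) + np.sqrt(y * w**2 + 4 * (1 - y))) / 2

# the substitution inverts exactly: w = (z - (1-y)/z)/sqrt(y)
assert np.isclose((z - (1 - y) / z) / np.sqrt(y), w)
```

Either root of the quadratic works here, since $z^2 - (1-y) = \sqrt{y}\,wz$ for both.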

**Lemma 9.** *For z* = *u* + *iv, we have:*

$$|A_0(z)| = \frac{1}{|z + yS_y(z)|} \le 1 + |b(z)|$$

*and*

$$\operatorname{Im} A_0(z) \le \operatorname{Im} b(z),$$

*where*

$$b(z) = z - \frac{1-y}{z} + 2yS\_y(z).$$

**Proof.** First, we noted that:

$$\frac{1}{z + yS_y(z)} = -\Big(yS_y(z) - \frac{1-y}{z}\Big).$$

Using this, we could write:

$$b(z) = A\_0(z) - \frac{1}{A\_0(z)}.\tag{51}$$

From there, it followed that:

$$A\_0(z) = \frac{b(z) \pm \sqrt{b^2(z) + 4}}{2}.$$

This implied that:

$$|A\_0(z)| \le 1 + |b(z)|.$$

Equality (51) yielded:

$$\operatorname{Im} A\_0(z) = \frac{|A\_0(z)|^2}{1 + |A\_0(z)|^2} \operatorname{Im} b(z) \le \operatorname{Im} b(z).$$

Thus, the lemma was proven.
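Equality (51) and the bound $|A_0(z)| \le 1 + |b(z)|$ can be checked numerically for both roots of $A^2 - bA - 1 = 0$; the sketch below uses randomly drawn complex values as a stand-in for $b(z)$:

```python
import numpy as np

rng = np.random.default_rng(0)
b = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)

# the two roots of A^2 - b A - 1 = 0, i.e., of Equality (51)
root = np.sqrt(b**2 + 4)
for A0 in ((b + root) / 2, (b - root) / 2):
    assert np.allclose(A0 - 1 / A0, b)                  # Equality (51)
    assert np.all(np.abs(A0) <= 1 + np.abs(b) + 1e-9)   # the stated bound
```

The bound holds for both roots because $|\sqrt{b^2+4}| \le \sqrt{|b|^2+4} \le |b| + 2$.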

**Lemma 10.** *A positive absolute constant B exists, such that:*

$$a_n(z)|A_0(z)| \le B$$

*and*

$$|S_y(z)|\, |A_0(z)| \le C.$$

**Proof.** First, we considered the case $|b(z)| \ge C$. Then, for $|z| \ge C\Gamma_n$:

$$a_n(z)|A_0(z)| \le \Gamma_n(|b(z)| + 1) \le \frac{C\Gamma_n}{|z|} \le C.$$

In the case $\Gamma_n \le |b(z)| \le C$, we obtained:

$$a_n(z)|A_0(z)| \le |b(z)|(|b(z)|+1) \le C(C+1).$$

We then considered the case $|b(z)| \le \Gamma_n$:

$$a_n(z)|A_0(z)| \le \Big(y|S_y(z)| + \frac{1-y}{|z|}\Big)\Gamma_n \le \sqrt{y}\,\Gamma_n + 1 - y \le 1.$$

To prove the second inequality, we considered the equality:

$$|S_{y}(z) A_0(z)| = \Big|yS_{y}^2(z) - \frac{1-y}{z}S_{y}(z)\Big| = |-1 - zS_{y}(z)| \le C.$$

Thus, the lemma was proven.
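The identity used for the second inequality can be checked numerically, assuming (consistently with the formula for $b(z)$ in Lemma 7) that $S_y(z)$ solves the quadratic equation $yS_y^2 + \big(z - \frac{1-y}{z}\big)S_y + 1 = 0$; the test point $z$ is an arbitrary choice:

```python
import numpy as np

y, z = 0.5, 0.3 + 0.7j
w = z - (1 - y) / z

# S solves y S^2 + w S + 1 = 0; pick the root with Im S > 0
S = (-w + np.sqrt(w**2 - 4 * y)) / (2 * y)
if S.imag <= 0:
    S = (-w - np.sqrt(w**2 - 4 * y)) / (2 * y)

# the identity from the proof: y S^2 - (1-y) S / z = -1 - z S
assert np.isclose(y * S**2 - (1 - y) / z * S, -1 - z * S)
```

Rearranging the asserted identity gives back the quadratic, so it holds for either root.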

We let **X** be a rectangular *n* × *m* matrix with *m* ≥ *n* and let $s_1 \ge \cdots \ge s_n$ be the singular values of **X**. We denoted by $\mathbf{D}_n = (d_{jk})$ the diagonal *n* × *n* matrix with $d_{jj} = s_j$. We let $\mathbf{O}_{n,k}$ be the *n* × *k* matrix with zero entries and put $\mathbf{O}_n = \mathbf{O}_{n,n}$ and $\widetilde{\mathbf{D}} = [\mathbf{D}_n \;\; \mathbf{O}_{n,m-n}]$. We let **L** and **K** be orthogonal (unitary) matrices such that the singular value decomposition held:

$$\mathbf{X} = \mathbf{L}\widetilde{\mathbf{D}}\mathbf{K}^*.$$

Furthermore, we let $\mathbf{I}_n$ be the identity matrix of order *n* and $\mathbf{E}_n = [\mathbf{I}_n \;\; \mathbf{O}_{n,m-n}]$. We introduced the matrices $\mathbf{L}_n = \mathbf{L}\mathbf{E}_n$ and $\mathbf{K}_n = \mathbf{K}\mathbf{E}_n^T$. We noted that $\mathbf{L}_n^* = \mathbf{E}_n^T\mathbf{L}^*$ and $\mathbf{K}_n^* = \mathbf{E}_n\mathbf{K}^*$. We introduced the matrix $\mathbf{V} = \begin{bmatrix}\mathbf{O}_n & \mathbf{X}\\ \mathbf{X}^* & \mathbf{O}_{m,m}\end{bmatrix}$ and considered the matrix $\mathbf{Z} = \frac{1}{\sqrt{2}}\begin{bmatrix}\mathbf{L} & \mathbf{L}_n\\ \mathbf{K}_n & -\mathbf{K}\end{bmatrix}$. We then obtained the following:

**Lemma 11.**

$$\mathbf{Z}^* \mathbf{V} \mathbf{Z} = \begin{bmatrix} \mathbf{D}_{n} & \mathbf{O}_{n} & \mathbf{O}_{n,m-n} \\ \mathbf{O}_{n} & -\mathbf{D}_{n} & \mathbf{O}_{n,m-n} \\ \mathbf{O}_{m-n,n} & \mathbf{O}_{m-n,n} & \mathbf{O}_{m-n} \end{bmatrix} =: \widehat{\mathbf{D}}.$$

**Proof.** The proof followed direct calculations. It was straightforward to see that:

$$\mathbf{Z}^* \mathbf{V} = \frac{1}{\sqrt{2}} \begin{bmatrix} \mathbf{K}_n^* \mathbf{X}^* & \mathbf{L}^* \mathbf{X} \\ -\mathbf{K}^* \mathbf{X}^* & \mathbf{L}_n^* \mathbf{X} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} \mathbf{E}_n \widetilde{\mathbf{D}}^T \mathbf{L}^* & \widetilde{\mathbf{D}} \mathbf{K}^* \\ -\widetilde{\mathbf{D}}^T \mathbf{L}^* & \mathbf{E}_n^T \widetilde{\mathbf{D}} \mathbf{K}^* \end{bmatrix}.$$

Furthermore:

$$\mathbf{Z}^* \mathbf{V} \mathbf{Z} = \frac{1}{2} \begin{bmatrix} \mathbf{E}_n \widetilde{\mathbf{D}}^T + \widetilde{\mathbf{D}} \mathbf{E}_n^T & \mathbf{E}_n \widetilde{\mathbf{D}}^T \mathbf{E}_n - \widetilde{\mathbf{D}} \\ -\widetilde{\mathbf{D}}^T + \mathbf{E}_n^T \widetilde{\mathbf{D}} \mathbf{E}_n^T & -\widetilde{\mathbf{D}}^T \mathbf{E}_n - \mathbf{E}_n^T \widetilde{\mathbf{D}} \end{bmatrix} = \widehat{\mathbf{D}}.$$
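Lemma 11 can be confirmed numerically for a small random matrix; the sketch below builds $\mathbf{L}$, $\mathbf{K}$, and $\widetilde{\mathbf{D}}$ from a full SVD (real case, so $\mathbf{Z}^* = \mathbf{Z}^T$) and checks $\mathbf{Z}^*\mathbf{V}\mathbf{Z} = \widehat{\mathbf{D}}$; the dimensions are arbitrary choices with $m \ge n$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.standard_normal((n, m))

# Full SVD: X = L @ Dt @ K.T, with Dt the n x m matrix [D_n  O_{n,m-n}]
L, s, Kh = np.linalg.svd(X, full_matrices=True)
K = Kh.T
Dt = np.zeros((n, m))
Dt[:, :n] = np.diag(s)

En = np.hstack([np.eye(n), np.zeros((n, m - n))])  # E_n = [I_n  O_{n,m-n}]
Ln, Kn = L @ En, K @ En.T

# Symmetrized matrix V and the transformation Z from Lemma 11
V = np.block([[np.zeros((n, n)), X], [X.T, np.zeros((m, m))]])
Z = np.block([[L, Ln], [Kn, -K]]) / np.sqrt(2)

# Expected block-diagonal result D-hat
Dhat = np.zeros((n + m, n + m))
Dhat[:n, :n] = np.diag(s)
Dhat[n:2*n, n:2*n] = -np.diag(s)

assert np.allclose(Z.T @ V @ Z, Dhat)
```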

#### **8. Conclusions**

In this work, we obtained results under the assumption that conditions (*C*0)–(*C*2) were fulfilled. Condition (*C*2) was of a technical nature. In our investigation of the asymptotic behaviour of the Stieltjes transform on a ray, this restriction could be eliminated; however, this was a technically cumbersome task that required separate consideration.

**Author Contributions:** Writing—original draft, A.N.T. and D.A.T. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors wish to thank F. Götze for several fruitful discussions on this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Mihailo Jovanović 1,†, Vladica Stojanović 2,\*,†, Kristijan Kuk 2,†, Brankica Popović 2,† and Petar Čisar 2,†**


**Abstract:** This paper describes one of the non-linear (and non-stationary) stochastic models, the GSB (Gaussian, or Generalized, Split-BREAK) process, which is used in the analysis of time series with pronounced and accentuated fluctuations. In the beginning, the stochastic structure of the GSB process and its important distributional and asymptotic properties are given. To that end, a method based on characteristic functions (CFs) was used. Various procedures for the estimation of model parameters, asymptotic properties, and numerical simulations of the obtained estimators are also investigated. Finally, as an illustration of the practical application of the GSB process, an analysis is presented of the dynamics and stochastic distribution of the infected and immunized population in relation to the disease COVID-19 in the territory of the Republic of Serbia.

**Keywords:** stochastic processes; emphatic fluctuations; non-stationarity; asymptotic normality; Gaussian distribution; estimation; COVID-19

**MSC:** 60E10; 60F05; 62M10

#### **1. Introduction**

Stochastic models which are used in the analysis of time series with pronounced and permanent fluctuations are of particular importance in contemporary research. For this purpose, we start from the basic results of Engle and Smith [1], who first introduced the so-called STOchastic Permanent BREAKing process, popularly called the *STOPBREAK process*. Many authors have since considered the STOPBREAK notion, primarily in the field of econometrics. Some of its modifications were considered, among others, in [2–5], while its application was presented, for instance, in [6–8].

The original modification of the STOPBREAK process, named *the Split-BREAK model*, was introduced in [9]. After that, the general form of this process, named the *Gaussian (or Generalized) Split-BREAK (GSB) process*, was proposed in [10–12]. This stochastic model can also be viewed as a generalization of STOPBREAK, as well as of the well-known linear Auto-Regressive Moving Average (ARMA) model. In that way, the GSB process has already been applied in analyzing non-linear time series with pronounced and permanent fluctuations. Let us point out that the mentioned works mainly considered the stochastic properties of the stationary components of the GSB process. The main goal of this paper is a more detailed investigation of the non-stationary components (time series) of the GSB model. These series naturally have a more complex stochastic structure, but they are of particular interest in contemporary research [13–18]. To this end, the asymptotic properties of distributions of the GSB series will also be of specific interest.

In addition to the theoretical aspects, the application of the GSB process in describing the dynamics and finding an adequate stochastic distribution of the infected and immunized population with respect to COVID-19 on the territory of the Republic of Serbia was also

**Citation:** Jovanović, M.; Stojanović, V.; Kuk, K.; Popović, B.; Čisar, P. Asymptotic Properties and Application of GSB Process: A Case Study of the COVID-19 Dynamics in Serbia. *Mathematics* **2022**, *10*, 3849. https://doi.org/10.3390/math10203849

Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

Received: 22 September 2022 Accepted: 14 October 2022 Published: 17 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

considered. We point out that many authors who deal with this, still current, issue have contributed various theoretical models that investigate it from several aspects. For instance, rigorous mathematical models, usually based on analyzing and solving systems of coupled partial equations, have been proposed, among others, in [19–21]. On the other hand, the works in [22–25] combine deterministic and stochastic approaches, such as multiple and logistic regression, multifactor correlation, and the least squares estimation method, to predict the various effects caused by the COVID-19 pandemic. A particularly interesting approach is given in [26,27] where, to predict the COVID-19 dynamics more accurately, machine learning techniques and the construction of a complete information system are used. Finally, to the best of our knowledge, most stochastic approaches to date in the analysis of infection, immunization, and other indicators related to the disease COVID-19 were based on the use of the gamma distribution [21,28], as well as of a log-normal distribution [29]. This is precisely one of the reasons why we believe that the different approach given here, primarily in the stochastic modeling and investigation of this problem, is warranted. At the same time, let us emphasize that our main goal is to model the temporal dynamics of the COVID-19 disease, based on a formal study of the stochastic structure of the GSB model. In this sense, some other indicators and features of this disease, which can also affect its dynamics (see, for instance, [30–32]), can to a certain degree be a limitation of this approach.

In the next section, starting from previous works [9–12], some definitions and basic stochastic properties of the GSB process are discussed. Section 3 contains the main and novel results related to this process's detailed stochastic structure and asymptotic properties, where the method of characteristic functions (CFs) was used as the basic tool. Section 4 presents the procedure for estimating the unknown parameters of the GSB process and an investigation of the asymptotic properties of the obtained estimators. Numerical Monte Carlo simulations of the obtained estimators are considered in Section 5. In addition, the application of the GSB process in describing the dynamics and distribution of the size of infected and immunized populations on the territory of the Republic of Serbia is given here. Finally, concluding remarks are highlighted in Section 6.

#### **2. Definition and Main Properties of the GSB Process**

The basic series of GSB processes is defined by the following equality:

$$y\_t = m\_t + \varepsilon\_t.\tag{1}$$

Here, *t* = 0, 1, ... , *T* are the known time points, (*mt*) is the series of the so-called *martingale means*, and (*εt*) are *the innovations*, i.e., a series of independent identically distributed (IID) Gaussian $\mathcal{N}(0, \sigma^2)$ random variables (RVs). Moreover, it is considered that (*εt*) is defined on the same probability space (Ω, F, *P*), expanded by some filtration $\mathbb{F} = (\mathcal{F}_t)$, i.e., a nondecreasing family of *σ*-algebras on Ω. In a practical sense, the filtration (F*t*) represents the set of "information" available at time *t*. Therefore, it is assumed that, for each *t* = 0, 1, ... , *T*, the RVs *ε<sub>t</sub>* are F*t*-adapted. Accordingly, the conditional expectation, as well as the variance of the RVs *εt*, are, respectively,

$$E(\varepsilon_t|\mathcal{F}_{t-1}) = 0, \quad V(\varepsilon_t|\mathcal{F}_{t-1}) = E\big(\varepsilon_t^2 \big| \mathcal{F}_{t-1}\big) = \sigma^2.$$

On the other hand, for martingale means (*mt*), we assume that they are defined by the following recurrence relation:

$$m\_t = m\_{t-1} + q\_{t-1} \varepsilon\_{t-1} = m\_0 + \sum\_{j=0}^{t-1} q\_j \varepsilon\_j. \tag{2}$$

Here, we can effectively assume that $m_0 \overset{as}{=} \mu$ (*const*.) and $\varepsilon_{-1} \overset{as}{=} \varepsilon_0 \overset{as}{=} 0$. Meanwhile, *qt* is the so-called *noise indicator*, i.e., the RV that depends on the innovations (*εt*) in the following way:

$$q\_t = I\left(\varepsilon\_{t-1}^2 > c\right) = \begin{cases} 1, & \varepsilon\_{t-1}^2 > c, \\ 0, & \varepsilon\_{t-1}^2 \le c. \end{cases}$$

The value *c* > 0 represents *the critical value of the reaction*, i.e., the threshold of significance for the previous realizations of the innovations (*εt*) that allows their values to be included in Equation (2). In other words, the value *qt*−1 = 0 indicates that there is no change in the martingale mean value *mt* compared to the previous value *mt*−1. Consequently, the value *yt* is obtained with a "small" fluctuation, which depends only on *εt*. By contrast, in the case of *qt* = 1, an emphatic (permanent) fluctuation of *yt* is registered. Thus, the level of previous realizations of the series (*εt*) affects the degree of variation in the series (*yt*), that is, it indicates the intensity of fluctuations in the GSB process. Furthermore, according to the previous equalities, it follows that:
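A minimal simulation sketch of Equations (1) and (2) is given below; the function name and the default parameter values are our own choices, not from the text:

```python
import numpy as np

def simulate_gsb(T, sigma=1.0, c=1.0, mu=0.0, seed=0):
    """Simulate y_t = m_t + eps_t with m_t = m_{t-1} + q_{t-1} eps_{t-1}
    and noise indicator q_t = I(eps_{t-1}^2 > c), under the conventions
    m_0 = mu and eps_{-1} = eps_0 = 0."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T + 1)
    eps[0] = 0.0                                  # eps_0 = 0 a.s.
    m = np.empty(T + 1)
    m[0] = mu                                     # m_0 = mu
    for t in range(1, T + 1):
        # q_{t-1} = I(eps_{t-2}^2 > c); with eps_{-1} = 0 this gives q_0 = 0
        q_prev = 1.0 if t >= 2 and eps[t - 2] ** 2 > c else 0.0
        m[t] = m[t - 1] + q_prev * eps[t - 1]     # Equation (2)
    return m + eps, m, eps                        # y, m, eps

y, m, eps = simulate_gsb(200)
```

Large realizations of the innovations switch the indicator on and shift the martingale mean permanently, which produces the pronounced fluctuations described above.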

$$E(y_t | \mathcal{F}_{t-1}) = m_t + E(\varepsilon_t | \mathcal{F}_{t-1}) = m_t,$$

from which we conclude that the series realizations (*yt*) are "close" to the martingale means (*mt*). Moreover, it is valid to put:

$$\begin{array}{rcl} E(y_t) &=& E[E(y_t|\mathcal{F}_{t-1})] = E(m_t) = E(m_{t-1}) + E(q_{t-1}\varepsilon_{t-1}) \\ &=& E(m_{t-1}) = \dots = E(m_0) = \mu, \end{array}$$

i.e., the mean values of the series $(y_t)$ and $(m_t)$ are equal and constant. Note that the previous equalities reveal much about the stochastic nature of the GSB process, that is, about the additive decomposition (1). Since the sequence $(m_t)$ is measurable with respect to the $\sigma$-field $\mathcal{F}_{t-1}$, it represents the component of *predictability and stability* of the GSB process. In contrast, the innovation series $(\varepsilon_t)$ is *the deviation factor (white noise)* of the basic GSB series $(y_t)$ in relation to the martingale means $(m_t)$.
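The decomposition (1)-(2) is straightforward to simulate. The sketch below (our code, not the authors'; the function name `simulate_gsb` and the defaults $\mu = 0$, $c = \sigma = 1$ mirroring Figure 1 are assumptions) generates the innovations, the noise indicator, the martingale means, and the basic series:

```python
import numpy as np

def simulate_gsb(T, mu=0.0, c=1.0, sigma=1.0, seed=0):
    """Simulate Equations (1)-(2): y_t = m_t + e_t, m_t = m_{t-1} + q_{t-1} e_{t-1},
    q_t = I(e_{t-1}^2 > c), with the conventions m_0 = mu and e_{-1} = e_0 = 0 a.s."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T + 1)      # innovations e_0, ..., e_T
    eps[0] = 0.0                                  # convention e_0 = 0 a.s.
    q = np.zeros(T + 1)
    q[1:] = (eps[:-1] ** 2 > c).astype(float)     # noise indicator; q_0 = 0 since e_{-1} = 0
    m = mu + np.concatenate(([0.0], np.cumsum((q * eps)[:-1])))  # m_t = m_0 + sum_{j<t} q_j e_j
    y = m + eps                                   # Equation (1)
    return y, m, eps, q

y, m, eps, q = simulate_gsb(500, seed=1)
print(y[:5])
```

In such a trajectory, $m_t$ stays flat while the previous innovation is "small" and jumps after an emphatic fluctuation, which is exactly the two-regime behavior described above.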

Further, we determine the conditional variance of the series (*yt*) from the equation:

$$V(y_t | \mathcal{F}_{t-1}) = E\left(y_t^2 \middle| \mathcal{F}_{t-1}\right) - m_t^2 = 2m_t E(\varepsilon_t|\mathcal{F}_{t-1}) + E\left(\varepsilon_t^2 \middle| \mathcal{F}_{t-1}\right) = \sigma^2,$$

and from here, one obtains:

$$V(y\_t) = E(y\_t^2) - \mu^2 = E(m\_t^2) + 2E(m\_t \varepsilon\_t) + E(\varepsilon\_t^2) - \mu^2 = V(m\_t) + \sigma^2.$$

For each *t* = 1, . . . , *T*, it also holds that:

$$\begin{array}{rcl} V(m\_t) &= E\left(m\_t^2\right) - \mu^2 \\ &= E\left(m\_{t-1}^2\right) + 2E\left(m\_{t-1}q\_{t-1}\varepsilon\_{t-1}\right) + E\left(q\_{t-1}^2\varepsilon\_{t-1}^2\right) - \mu^2 \\ &= V(m\_{t-1}) + a\_c\sigma^2, \end{array}$$

where $a_c = E(q_t) = E\left(q_t^2\right) = P\left\{\varepsilon_t^2 > c\right\}$. It follows that the variance of the martingale means $(m_t)$, under the assumption $m_0 \equiv \mu$ (*const*.), can be expressed as:

$$V(m_t) = ta_c \sigma^2, \ t \ge 0.$$

From here, the variance of the basic series (*yt*) can be obtained as follows:

$$V(y\_t) = V(m\_t) + \sigma^2 = (ta\_c + 1)\sigma^2, \ t \ge 0.$$

According to the previous equalities, the variances of the series (*yt*) and (*mt*) have nonconstant values that depend on the point in time (*t*) in which they are observed.
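These variance formulas can be checked by Monte Carlo. In the sketch below (our code; parameter values are assumptions), a pre-sample innovation $\varepsilon_{-1}$ is drawn instead of imposing $\varepsilon_{-1} = \varepsilon_0 = 0$, so that $V(m_t) = ta_c\sigma^2$ holds exactly for every $t$; for Gaussian innovations, $a_c = P\{\varepsilon_t^2 > c\} = 1 - \operatorname{erf}\!\left(\sqrt{c/2}/\sigma\right)$:

```python
import math
import numpy as np

# Monte Carlo check of V(m_t) = t*a_c*sigma^2 and V(y_t) = (t*a_c + 1)*sigma^2.
mu, c, sigma, t, R = 0.0, 1.0, 1.0, 50, 40_000
rng = np.random.default_rng(0)

eps = rng.normal(0.0, sigma, size=(R, t + 2))    # e_{-1}, e_0, ..., e_t per replication
q = (eps[:, :-1] ** 2 > c).astype(float)         # q_j = I(e_{j-1}^2 > c), j = 0..t
m_t = mu + (q[:, :t] * eps[:, 1:t + 1]).sum(axis=1)   # m_t = m_0 + sum_{j<t} q_j e_j
y_t = m_t + eps[:, -1]                           # y_t = m_t + e_t

a_c = 1.0 - math.erf(math.sqrt(c / 2) / sigma)   # P{e_t^2 > c} for Gaussian e_t
print(m_t.var(), t * a_c * sigma**2)             # the two values should nearly coincide
print(y_t.var(), (t * a_c + 1) * sigma**2)
```

The printed sample variances grow linearly in $t$, illustrating the non-stationarity discussed above.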

Correlation functions of the series (*yt*) and (*mt*) can be obtained in a similar way. Note that for every *s* > *t* ≥ 0, it holds that:

$$\begin{array}{rcl} \text{Cov}(m\_t, m\_s) &= \text{E}(m\_t m\_s) - \mu^2 = \text{E}(m\_t m\_{s-1}) + \text{E}(m\_t q\_{s-1} \varepsilon\_{s-1}) - \mu^2 \\ &= \text{Cov}(m\_t, m\_{s-1}), \end{array}$$

and it is easy to see that the covariance of the series (*mt*) satisfies:

$$Cov(m_t, m_s) = V(m_t), \ s > t \ge 0.$$

From here, the correlation function of the martingale means is obtained:

$$\bar{K}(s,t) = \frac{Cov(m\_t, m\_s)}{\sqrt{V(m\_t)} \cdot \sqrt{V(m\_s)}} = \begin{cases} \frac{\min(s,t)}{\sqrt{s \cdot t}}, & s \neq t \\ 1, & s = t. \end{cases}$$

Similarly, according to equalities:

$$\begin{aligned} Cov(y_t, y_s) &= E(y_t y_s) - \mu^2 = E(y_t m_s) + E(y_t \varepsilon_s) - \mu^2 \\ &= E(m_t m_s) + E(\varepsilon_t m_s) - \mu^2 = Cov(m_t, m_s) + a_c \sigma^2 \\ &= V(m_t) + a_c \sigma^2 = V(y_t), \ s > t \ge 0, \end{aligned}$$

the correlation function for $(y_t)$ can be obtained as follows:

$$K(s, t) = \begin{cases} \frac{a_c \min(s,t) + 1}{\sqrt{(a_c s + 1)(a_c t + 1)}}, & s \neq t \\ 1, & s = t. \end{cases}$$

Therefore, both correlation functions depend on the time arguments *t*,*s* and indicate the non-stationarity of the series (*yt*) and (*mt*). This fact requires some more complex techniques to examine their properties. Moreover, note that when *s* > *t* ≥ 0,

$$\begin{array}{l} \lim\limits_{s \to t} \bar{K}(s, t) = \lim\limits_{s \to t} \dfrac{\min(s, t)}{\sqrt{s \cdot t}} = \dfrac{t}{\sqrt{t^2}} = 1, \\ \lim\limits_{s \to t} K(s, t) = \lim\limits_{s \to t} \dfrac{a_c \min(s, t) + 1}{\sqrt{(a_c s + 1)(a_c t + 1)}} = \dfrac{a_c t + 1}{\sqrt{(a_c t + 1)^2}} = 1. \end{array}$$

Thus, the correlation functions of both series (*yt*) and (*mt*) satisfy the *L*2-continuity condition.

At the end of this section, we define *a series of increments of the GSB process* by the following equality:

$$X\_t = y\_t - y\_{t-1}, \ t = 1, \ldots, T. \tag{3}$$

Almost all authors who have studied STOPBREAK processes highlight the importance of this sequence. This series, as can be easily seen from Equations (1) and (2), can be given in the following form:

$$X\_t = \varepsilon\_t - \theta\_{t-1}\varepsilon\_{t-1} \tag{4}$$

where $\theta_t = 1 - q_t = I\left(\varepsilon_{t-1}^2 \le c\right)$. The series $(X_t)$ is named *a Splitting Moving Average process* (*of order* 1), shortened to *Split-MA (1) process*, because it operates in two regimes. Fluctuations of the innovations $(\varepsilon_t)$ that were emphasized at the previous time moment $(t-1)$ imply $\theta_{t-1} = 0$, so the equality $X_t = \varepsilon_t$ holds. On the other hand, fluctuations that do not exceed the critical value $c$ give a representation of $(X_t)$ in the form of a standard, linear MA (1) process. In this way, $(X_t)$ has properties similar to those of MA (1) models, which can be exploited in its analysis. Thus, under the earlier assumptions, the mean value and variance of this series, obtained by simple computation, are:

$$E(X_t) = 0, \ V(X_t) = E\left(X_t^2\right) = \sigma^2 (b_c + 1),$$

where $b_c = 1 - a_c = P\left\{\varepsilon_{t-1}^2 \le c\right\}$. Moreover, the covariance of this sequence is:

$$Cov(X_t, X_s) = \begin{cases} (b_c + 1)\sigma^2, & s = t \\ -b_c\sigma^2, & |s - t| = 1 \\ 0, & \text{otherwise} \end{cases}$$

and obviously has an identical structure to the standard MA (1) series. Based on the obtained covariance, we can easily see that the series (*Xt*) is stationary and that its correlation function can be written in the form:

$$\rho_X(h) := \frac{Cov(X_t, X_{t+h})}{V(X_t)} = \begin{cases} 1, & h = 0 \\ -b_c/(b_c + 1), & h = \pm 1 \\ 0, & \text{otherwise.} \end{cases}$$
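This MA (1)-type correlation structure is easy to verify empirically. A minimal sketch (our code and parameter choices, with $c = \sigma = 1$) simulates a long Split-MA (1) path and compares the sample lag-1 autocorrelation to $-b_c/(b_c+1)$:

```python
import math
import numpy as np

c, sigma, T = 1.0, 1.0, 200_000
rng = np.random.default_rng(0)
eps = rng.normal(0.0, sigma, size=T + 2)        # e_{-1}, e_0, ..., e_T
theta = (eps[:-1] ** 2 <= c).astype(float)      # theta_t = I(e_{t-1}^2 <= c), t = 0..T
X = eps[2:] - theta[:-1] * eps[1:-1]            # X_t = e_t - theta_{t-1} e_{t-1}, t = 1..T

b_c = math.erf(math.sqrt(c / 2) / sigma)        # P{e_{t-1}^2 <= c} for Gaussian innovations
rho1_hat = np.mean(X[1:] * X[:-1]) / np.mean(X ** 2)
print(rho1_hat, -b_c / (b_c + 1))               # the two values should nearly coincide
```

Lags beyond $\pm 1$ give sample correlations near zero, in agreement with the cut-off property above.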

Finally, according to Equations (3) and (4), it follows that:

$$y\_t - y\_{t-1} = \varepsilon\_t - \theta\_{t-1}\varepsilon\_{t-1}, \ t = 1, \dots, T, \dots$$

which can be viewed as a non-linear *Integrated Auto-Regressive Moving Average (ARIMA) model* with "temporary" components (*θt*−1*εt*−1). These imply the specific structure of the series (*Xt*), as well as other components of the GSB process.

In the following section, as we have already pointed out, we also discuss the application of the GSB model in describing the dynamics of infection and immunization of the population on the territory of the Republic of Serbia. As will be seen, this kind of dynamics has pronounced fluctuations that can be described by the non-stationary components of the GSB process, primarily by its main time series (*yt*). In that case, due to its stationarity, the Split-MA (1) process plays an important role. As an illustration, Figure 1 shows the realizations of all the above-mentioned series obtained by the Monte Carlo simulation of the GSB model.

**Figure 1.** Dynamics of the basic series of the GSB model. (Parameter values are: *μ* = 0 and *c* = *σ* = 1).

#### **3. Stochastic Distribution and Asymptotic Properties of the GSB Process**

In this section, some stochastic properties of the GSB process, regarding the distribution and asymptotic behavior of its basic stochastic components, are discussed in more detail. As explained in the previous section, the GSB model, given by Equations (1)–(4), contains four stochastic components: the basic series $(y_t)$, the innovations $(\varepsilon_t)$, the martingale means $(m_t)$, and the series of increments $(X_t)$. At the same time, the series $(\varepsilon_t)$ and $(X_t)$ represent the stationary components of the GSB process, where $(X_t)$ is "close" to the linear MA model. In general form, the stochastic structure of the series $(X_t)$ is described in [12], where the method of characteristic functions (CFs) was used. Following this approach, the basic stochastic properties of the series $(X_t)$ can be expressed by the following statement.

**Theorem 1.** *Let* $(X_t)$ *be the Split-MA (1) process defined by Equation (4). For arbitrary* $x \in \mathbb{R}$ *and* $t = 0, 1, \dots, T$*, the cumulative distribution function (CDF) of this stochastic process is given by:*

$$F_X(x) := P\{X_t < x\} = (1 - b_c)F_\varepsilon(x) + b_c F_{\sqrt{2}\varepsilon}(x), \tag{5a}$$

*where* $F_\varepsilon(x)$ *and* $F_{\sqrt{2}\varepsilon}(x)$ *are the CDFs of the RVs* $\varepsilon_t : \mathcal{N}\left(0, \sigma^2\right)$ *and* $\sqrt{2}\varepsilon_t : \mathcal{N}\left(0, 2\sigma^2\right)$*, respectively.*

**Proof.** For arbitrary $t = 0, 1, \dots, T$, let us define the series of RVs $\eta_t = \theta_t\varepsilon_t$. Since $\theta_t$ and $\varepsilon_t$ are mutually independent RVs, it follows that

$$\begin{array}{l} E(\eta\_t) = E(\theta\_t) E(\varepsilon\_t) = 0, \\ V(\eta\_t) = E(\theta\_t^2) E(\varepsilon\_t^2) = b\_c \sigma^2. \end{array}$$

Moreover, it is easily shown that $Cov(\eta_t, \eta_{t+h}) = 0$ holds for every $h \neq 0$, i.e., $(\eta_t)$ is a series of uncorrelated RVs. By applying conditional probabilities, the CDF of these RVs can be obtained as follows:

$$\begin{aligned} F_\eta(x) &:= P\{\eta_t < x\} \\ &= P\{\eta_t < x | \theta_t = 1\} \cdot P\{\theta_t = 1\} + P\{\eta_t < x | \theta_t = 0\} \cdot P\{\theta_t = 0\} \\ &= P\{\varepsilon_t < x\} \cdot P\{\theta_t = 1\} + I(x > 0) \cdot P\{\theta_t = 0\} \\ &= b_c F_\varepsilon(x) + (1 - b_c) F_0(x), \end{aligned}$$

where $F_0(x) = I(x > 0)$ is the CDF of the RV $I_0 \overset{as}{=} 0$. Based on that, for the CF of the RVs $\eta_t$, one obtains:

$$\varphi_\eta(u) := \int_{-\infty}^{+\infty} e^{iux} F_\eta(dx) = \int_{-\infty}^{+\infty} e^{iux} [b_c F_\varepsilon + (1 - b_c) F_0](dx) = b_c \varphi_\varepsilon(u) + (1 - b_c) \varphi_0(u).$$

Here, $\varphi_\varepsilon(u) = e^{-\frac{\sigma^2 u^2}{2}}$ and $\varphi_0(u) \equiv 1$ are the CFs of the RVs $\varepsilon_t$ and $I_0$, respectively. By substituting these CFs into the previous equality, we have:

$$\varphi_\eta(u) = 1 + b_c \left( e^{-\frac{\sigma^2 u^2}{2}} - 1 \right),$$

whence, by applying Equation (4), it follows that the CF of RVs *Xt* is:

$$\begin{aligned} \varphi_X(u) &= \varphi_\varepsilon(u) \cdot \varphi_\eta(u) = e^{-\frac{\sigma^2 u^2}{2}} \left[ 1 + b_c \left( e^{-\frac{\sigma^2 u^2}{2}} - 1 \right) \right] \\ &= (1 - b_c) e^{-\frac{\sigma^2 u^2}{2}} + b_c e^{-\sigma^2 u^2}. \end{aligned}$$

According to the last equality and Lévy's correspondence theorem (see, e.g., [33] (p. 181)), Equation (5a) immediately follows, that is, the statement of the theorem is proved.

**Remark 1.** As shown in [12], the CDF of RVs *Xt* can also be given in the following form:

$$F_X(x) := P\{X_t < x\} = [(1 - b_c)F_0(x) + b_c F_\varepsilon(x)] \otimes F_\varepsilon(x), \tag{5b}$$

where "⊗" denotes the convolution of two (arbitrary) CDFs *F*(*x*), *G*(*x*):

$$(F \otimes G)(x) := \int_{-\infty}^{+\infty} F(x - y)G(dy).$$

The equivalence of Equations (5a) and (5b) is obtained directly from the fact that the CDF $F_0(x)$ is neutral for the convolution operator, i.e.,

$$(F \otimes F_0)(x) = (F_0 \otimes F)(x) = \int_{-\infty}^{+\infty} I(x > y) F(dy) = F(x).$$

Finally, note that by differentiating Equation (5a), one obtains the probability density function (PDF) of the series $(X_t)$:

$$f_X(x) = \frac{1 - b_c}{\sigma \sqrt{2\pi}} e^{-\frac{x^2}{2\sigma^2}} + \frac{b_c}{2\sigma\sqrt{\pi}} e^{-\frac{x^2}{4\sigma^2}}.$$
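As a quick numeric sanity check of this mixture density (our code; $c = \sigma = 1$ is an assumed parameter choice), one can verify that it integrates to one and that its second moment reproduces $V(X_t) = (b_c + 1)\sigma^2$ from Section 2:

```python
import math
import numpy as np

c, sigma = 1.0, 1.0
b_c = math.erf(math.sqrt(c / 2) / sigma)        # P{e_t^2 <= c}

x = np.linspace(-12.0, 12.0, 20_001)
f = ((1 - b_c) / (sigma * math.sqrt(2 * math.pi)) * np.exp(-x ** 2 / (2 * sigma ** 2))
     + b_c / (2 * sigma * math.sqrt(math.pi)) * np.exp(-x ** 2 / (4 * sigma ** 2)))

dx = x[1] - x[0]
total = f.sum() * dx                            # ~ 1 (f is a proper density)
second_moment = (x ** 2 * f).sum() * dx         # ~ (b_c + 1) * sigma^2 = V(X_t)
print(total, second_moment)
```

Both checks tie the mixture weights $(1 - b_c, b_c)$ back to the two regimes of the Split-MA (1) process.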

By a similar procedure as in the previous theorem and using the convolutions of CDFs, we describe the stochastic distribution of other components of the GSB process, i.e., the series (*mt*) and (*yt*). As already shown in the previous section, these series represent non-stationary stochastic processes with a constant mean *μ* = *E*(*mt*) = *E*(*yt*). Accordingly, the following statement is valid.

**Theorem 2.** *Let* $(y_t)$ *and* $(m_t)$ *be the time series defined by Equations (1) and (2), respectively, where* $m_0 \overset{as}{=} \mu$ (*const*.)*. For arbitrary* $x \in \mathbb{R}$ *and* $t = 0, 1, \dots, T$*, the CDFs of these series are as follows:*

$$F_m(x, t) := P\{m_t < x\} = \underset{j=1}{\overset{t}{\otimes}} \left[ (1 - b_c) F_\varepsilon(x) + b_c F_0(x) \right] \otimes F_\mu(x). \tag{6}$$

$$F_y(x, t) := P\{y_t < x\} = \underset{j=1}{\overset{t}{\otimes}} \left[ (1 - b_c) F_\varepsilon(x) + b_c F_0(x) \right] \otimes F_\mu(x) \otimes F_\varepsilon(x). \tag{7}$$

*Here,* $F_0(x)$ *and* $F_\varepsilon(x)$ *are the CDFs of the previously defined RVs* $I_0$ *and* $\varepsilon_t$*, respectively, and* $F_\mu(x) = F_m(x, 0)$ *is the CDF of the RV* $m_0 \overset{as}{=} \mu$*. In addition, when* $T = +\infty$*, the following convergences (in distribution) are valid:*

$$\frac{1}{\sqrt{t}}m_t \stackrel{d}{\rightarrow} \mathcal{N}\left(0, a_c \sigma^2\right), \quad \frac{1}{\sqrt{t}}y_t \stackrel{d}{\rightarrow} \mathcal{N}\left(0, a_c \sigma^2\right), \quad t \rightarrow +\infty. \tag{8}$$

**Proof.** For arbitrary $t = 0, 1, \dots, T$, let us introduce the series of RVs $\xi_t = q_t\varepsilon_t$. In the same way as in the proof of the previous theorem, it is shown that $(\xi_t)$ is a series of mutually uncorrelated RVs, with $E(\xi_t) = 0$ and $V(\xi_t) = a_c\sigma^2$, where $a_c = E(q_t) = P\left\{\varepsilon_t^2 > c\right\} = 1 - b_c$. By reapplying conditional probabilities, the CDF of $\xi_t$ is obtained as follows:

$$\begin{aligned} F_\xi(x) &:= P\{\xi_t < x\} \\ &= P\{\xi_t < x | q_t = 1\} \cdot P\{q_t = 1\} + P\{\xi_t < x | q_t = 0\} \cdot P\{q_t = 0\} \\ &= P\{\varepsilon_t < x\} \cdot P\{q_t = 1\} + I(x > 0) \cdot P\{q_t = 0\} \\ &= a_c F_\varepsilon(x) + (1 - a_c) F_0(x). \end{aligned}$$

According to this, their corresponding CF is obtained:

$$\begin{split} \varphi_\xi(u) &= \int_{-\infty}^{+\infty} e^{iux} F_\xi(dx) = \int_{-\infty}^{+\infty} e^{iux} [a_c F_\varepsilon + (1 - a_c) F_0](dx) \\ &= a_c \varphi_\varepsilon(u) + (1 - a_c) \varphi_0(u) = 1 + a_c \left( e^{-\frac{\sigma^2 u^2}{2}} - 1 \right) \\ &= (1 - b_c) e^{-\frac{\sigma^2 u^2}{2}} + b_c. \end{split}$$

Applying Equation (2), we find that the CFs of the RVs (*mt*) are as follows:

$$\varphi_m(u, t) = \varphi_\mu(u) \prod_{j=0}^{t-1} \varphi_\xi(u) = e^{iu\mu} \left[ (1 - b_c) e^{-\frac{\sigma^2 u^2}{2}} + b_c \right]^t, \tag{9}$$

where $\varphi_\mu(u) = e^{iu\mu}$ is the CF of the RV $m_0 \overset{as}{=} \mu$. Then, Equation (6) immediately follows from Equation (9) and Lévy's correspondence theorem [33] (p. 181).

Similarly, by applying the previous Equations (1) and (9), the CFs of the RVs (*yt*) are obtained:

$$\varphi_y(u, t) = \varphi_m(u, t) \cdot \varphi_\varepsilon(u) = e^{iu\mu - \frac{\sigma^2 u^2}{2}} \left[ (1 - b_c) e^{-\frac{\sigma^2 u^2}{2}} + b_c \right]^t. \tag{10}$$

From here, by reapplying the theorem of Lévy, Equation (7) immediately follows.

To prove the second part of the theorem, i.e., Equation (8), note first that the CFs of the RVs $m_t/\sqrt{t}$ and $y_t/\sqrt{t}$, when $t = 1, 2, \dots$, according to Equations (9) and (10), can be written as follows:

$$\begin{split} \varphi_m\left( \frac{u}{\sqrt{t}}, t \right) &= e^{iu\mu/\sqrt{t}} \left[ 1 + a_c \left( e^{-\frac{\sigma^2 u^2}{2t}} - 1 \right) \right]^t = e^{iu\mu/\sqrt{t}} \left[ 1 - \frac{a_c \sigma^2 u^2}{2t} + o\left( \frac{u^2}{t} \right) \right]^t, \\ \varphi_y\left( \frac{u}{\sqrt{t}}, t \right) &= e^{iu\mu/\sqrt{t} - \frac{\sigma^2 u^2}{2t}} \left[ 1 + a_c \left( e^{-\frac{\sigma^2 u^2}{2t}} - 1 \right) \right]^t = e^{iu\mu/\sqrt{t} - \frac{\sigma^2 u^2}{2t}} \left[ 1 - \frac{a_c \sigma^2 u^2}{2t} + o\left( \frac{u^2}{t} \right) \right]^t. \end{split}$$

Here, $o(z)$ denotes an infinitesimal of higher order than $z$ when $z \to 0$. Hence, for a fixed but arbitrary $u \in \mathbb{R}$, we have:

$$\varphi_m\left(\frac{u}{\sqrt{t}}, t\right) \to e^{-\frac{a_c \sigma^2 u^2}{2}}, \quad \varphi_y\left(\frac{u}{\sqrt{t}}, t\right) \to e^{-\frac{a_c \sigma^2 u^2}{2}}, \quad t \to +\infty,$$

and the convergences thus obtained confirm the asymptotic relations in Equation (8).

**Remark 2.** Note again that the proofs of the previous two theorems are based on determining the CFs of the corresponding time series of the GSB process. In this sense, the CFs of the uncorrelated series of RVs $(\xi_t)$ and $(\eta_t)$ play a fundamental role. The series $(\xi_t)$ and $(\eta_t)$ can be viewed as "new" innovations with "optional" non-zero values, which essentially describe the stochastic structure of the GSB process. Nevertheless, as the relation $\eta_t + \xi_t \overset{as}{=} \varepsilon_t$ holds for each $t = 0, 1, \dots, T$, it is sufficient to consider only one of these two series of uncorrelated RVs (which is what was done in the statement of Theorem 2). Moreover, it can easily be shown that the CDFs:

$$\begin{array}{c} F_\xi(x) = (1 - b_c) F_\varepsilon(x) + b_c F_0(x), \\ F_\eta(x) = b_c F_\varepsilon(x) + (1 - b_c) F_0(x) \end{array}$$

are continuous almost everywhere, with the only point of discontinuity at $x = 0$, where they have "jumps" of size $b_c$ and $1 - b_c$, respectively (see [34,35] for more detail). Therefore, the CDFs of the series $(\xi_t)$ and $(\eta_t)$ are mixtures of Gaussian and discrete-type distributions, usually named *Contaminated Gaussian Distributions (CGDs)*. This is another important fact that prevents the application of some standard procedures in investigating the properties of the non-stationary series $(y_t)$ and $(m_t)$.
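The contaminated Gaussian structure is directly visible in simulation: the atom of the law of $\xi_t = q_t\varepsilon_t$ at zero has mass $1 - a_c = b_c$, while the total variance is $a_c\sigma^2$. A minimal sketch (our code and parameter choices):

```python
import math
import numpy as np

c, sigma, T = 1.0, 1.0, 200_000
rng = np.random.default_rng(0)
eps = rng.normal(0.0, sigma, size=T + 1)        # e_{-1}, e_0, ..., e_{T-1}
q = (eps[:-1] ** 2 > c).astype(float)           # q_t = I(e_{t-1}^2 > c)
xi = q * eps[1:]                                # xi_t = q_t e_t

b_c = math.erf(math.sqrt(c / 2) / sigma)
print(np.mean(xi == 0.0), b_c)                  # atom at zero has mass b_c = 1 - a_c
print(np.var(xi), (1 - b_c) * sigma ** 2)       # V(xi_t) = a_c sigma^2
```

The same experiment with $\eta_t = \theta_t\varepsilon_t$ simply swaps the roles of $a_c$ and $b_c$.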

On the other hand, Equation (8) shows that even the non-stationary time series $(m_t)$ and $(y_t)$ generate series $m_t/\sqrt{t}$ and $y_t/\sqrt{t}$ that converge toward a normal distribution when $t \to +\infty$. Moreover, based on the properties of the non-stationary components of the GSB process described in Section 2, the time series $m_t/\sqrt{t}$ has a constant variance $a_c\sigma^2$. These facts will be important in the practical application of the GSB process and can be readily observed from the convergence of the corresponding CFs $\varphi_m\left(u/\sqrt{t}, t\right)$ and $\varphi_y\left(u/\sqrt{t}, t\right)$. As an illustration, Figure 2 shows the convergence of the moduli of these CFs for different time indices $(t)$.

**Figure 2.** Graphs of the convergence of the moduli of the characteristic functions $\varphi_m\left(u/\sqrt{t}, t\right)$ and $\varphi_y\left(u/\sqrt{t}, t\right)$, when $t = 1, 2, \dots, 500$. (Parameter values are: $\mu = c = \sigma = 1$).

At the end of this section, we additionally describe some more asymptotic properties of series obtained by transformations of non-stationary time series (*mt*) and (*yt*). They also refer to the possibility of finding their asymptotically normal (AN) distributions, which can be shown by the following statement:

**Theorem 3.** *For arbitrary α* ≥ 1 *and time series* (*yt*) *and* (*mt*)*, given by Equations (1) and (2), respectively, let us define the so-called α-mean series:*

$$\overline{M}_{t;\alpha} = \frac{1}{t^{\alpha}} \sum_{j=1}^{t} m_j, \quad \overline{Y}_{t;\alpha} = \frac{1}{t^{\alpha}} \sum_{j=1}^{t} y_j.$$

*Then the following statements hold:*

*(i)*. *When* $1 \le \alpha \le 3/2$*, the time series* $\overline{M}_{t;\alpha}$ *and* $\overline{Y}_{t;\alpha}$ *have an asymptotically normal distribution, i.e., the following relations, when* $t \to +\infty$*, are valid:*

$$\overline{M}_{t;\alpha} \sim \mathcal{N}\left(\mu t^{1-\alpha}, \frac{a_c\sigma^2 t^{3-2\alpha}}{3}\right), \quad \overline{Y}_{t;\alpha} \sim \mathcal{N}\left(\mu t^{1-\alpha}, \frac{a_c\sigma^2 t^{3-2\alpha}}{3}\right). \tag{11}$$

*(ii)*. *When* $\alpha > 3/2$*, the time series* $\overline{M}_{t;\alpha}$ *and* $\overline{Y}_{t;\alpha}$ *asymptotically vanish, i.e.,*

$$\overline{M}_{t;\alpha} \xrightarrow{d} I_0, \quad \overline{Y}_{t;\alpha} \xrightarrow{d} I_0, \quad t \to +\infty. \tag{12}$$

**Proof.** We first show the statement of the theorem for the time series $\overline{M}_{t;\alpha}$. Based on the definition of the time series $(m_t)$, i.e., Equation (2), one obtains:

$$\begin{aligned} \overline{M}_{t;\alpha} &= \frac{1}{t^{\alpha}} \sum_{j=1}^{t} m_j = \frac{1}{t^{\alpha}} \sum_{j=1}^{t} \left( m_0 + \sum_{k=0}^{j-1} q_k \varepsilon_k \right) \\ &= \frac{1}{t^{\alpha}} \left[ t m_0 + \sum_{j=0}^{t-1} (t-j) q_j \varepsilon_j \right] = t^{1-\alpha} m_0 + \sum_{k=1}^{t} \frac{k}{t^{\alpha}}\, \xi_{t-k}. \end{aligned}$$

Thus, the series *Mt*;*<sup>α</sup>* is represented as a sum of uncorrelated RVs *ξt*−*k*, *k* = 1, ... , *t*. By applying the well-known properties of the CFs, as well as the expressions for the CF of the series (*ξt*), the CFs of *Mt*;*<sup>α</sup>* are as follows:

$$\varphi_{\overline{M};\alpha}(u,t) = \varphi_m\left(\frac{u}{t^{\alpha-1}}, 0\right)\prod_{k=1}^t \varphi_\xi\left(\frac{ku}{t^{\alpha}}\right) = e^{iu\mu t^{1-\alpha}}\prod_{k=1}^t \left[1 + a_c\left(e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}} - 1\right)\right].$$

Taking the logarithm of the function $\varphi_{\overline{M};\alpha}(u, t)$ gives the function:

$$\psi_M(u, t, \alpha) := \ln \varphi_{\overline{M};\alpha}(u, t) = iu\mu t^{1-\alpha} + \sum_{k=1}^t f_k(u, t, \alpha),$$

where $f_k(u, t, \alpha) := \ln\left[1 + a_c\left(\exp\left(-k^2\sigma^2 u^2 t^{-2\alpha}/2\right) - 1\right)\right]$. After some computation, we find that, when $0 < a_c < 1$,

$$\begin{aligned} \left. \frac{\partial f_k(u,t,\alpha)}{\partial u} \right|_{u=0} &= \left. \frac{-\frac{a_c k^2\sigma^2 u}{t^{2\alpha}}\, e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}}}{1 + a_c\left(e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}} - 1\right)} \right|_{u=0} = 0, \\ \left. \frac{\partial^2 f_k(u,t,\alpha)}{\partial u^2} \right|_{u=0} &= \left. \frac{-\frac{a_c k^2\sigma^2}{t^{2\alpha}}\, e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}} \left( (1-a_c)\left(1 - \frac{k^2\sigma^2 u^2}{t^{2\alpha}}\right) + a_c e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}} \right)}{\left( 1 + a_c\left(e^{-\frac{k^2\sigma^2 u^2}{2t^{2\alpha}}} - 1\right) \right)^2} \right|_{u=0} = -\frac{a_c k^2\sigma^2}{t^{2\alpha}}. \end{aligned}$$

Thus, the functions *fk*(*u*, *t*, *α*) have local maxima at the point *u* = 0. Using a similar procedure as in [34], that is, by Laplace approximation of functions *fk*(*u*, *t*, *α*) at *u* = 0, one obtains:

$$\begin{split} \psi_M(u,t,\alpha) &= iu\mu t^{1-\alpha} + \sum_{k=1}^{t} \left[ \frac{\partial^2 f_k(0,t,\alpha)}{\partial u^2} \cdot \frac{u^2}{2} + o_k(u^2) \right] \\ &= iu\mu t^{1-\alpha} + \sum_{k=1}^{t} \left[ -\frac{a_c k^2\sigma^2 u^2}{2t^{2\alpha}} + o_k\left(t^{-2\alpha}u^2\right) \right] \\ &= iu\mu t^{1-\alpha} - \frac{a_c\sigma^2 u^2}{12t^{2\alpha}}\, t(t+1)(2t+1) + o\left(t^{3-2\alpha}u^2\right). \end{split}$$

Then, by taking the asymptotic value in the last expression, when *t* → +∞, it follows:

$$\Psi\_M(u,t,a) \sim \begin{cases} \
i u \mu t^{1-\alpha} - a\_c \sigma^2 t^{3-2\alpha}/6, & 1 \le \alpha \le 3/2 \\\ 0, & \alpha > 3/2. \end{cases}$$

Substituting this expression into the CFs $\varphi_{\overline{M};\alpha}(u, t)$, it is easy to conclude that the first part of the theorem, for the series $\overline{M}_{t;\alpha}$, is valid.

The proof for the series $\overline{Y}_{t;\alpha}$ is carried out analogously. Using Equation (1), as well as the previously proven facts, we have that

$$\begin{split} \overline{Y}_{t;\alpha} &= \frac{1}{t^{\alpha}} \sum_{j=1}^{t} \left( m_j + \varepsilon_j \right) = \overline{M}_{t;\alpha} + \sum_{j=1}^{t} \frac{\varepsilon_j}{t^{\alpha}} = t^{1-\alpha} m_0 + \sum_{k=1}^{t} \frac{k}{t^{\alpha}}\, \xi_{t-k} + \sum_{k=0}^{t-1} \frac{\varepsilon_{t-k}}{t^{\alpha}} \\ &= t^{1-\alpha} m_0 + \frac{\varepsilon_t}{t^{\alpha}} + \sum_{k=1}^{t} \left( 1 + kq_{t-k} \right) \frac{\varepsilon_{t-k}}{t^{\alpha}}. \end{split}$$

Since RVs *εt*−*k*, *k* = 0, 1, ... , *t*, are mutually independent, after some computation, we obtain the CFs of series *Yt*;*α* as follows:

$$\begin{split} \varphi_{\overline{Y};\alpha}(u,t) &= \varphi_m\left(\frac{u}{t^{\alpha-1}}, 0\right)\varphi_\varepsilon\left(\frac{u}{t^{\alpha}}\right)\prod_{k=1}^{t}\left[(1-a_c)\,\varphi_\varepsilon\left(\frac{u}{t^{\alpha}}\right) + a_c\,\varphi_\varepsilon\left(\frac{(k+1)u}{t^{\alpha}}\right)\right] \\ &= e^{iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2}{2t^{2\alpha}}}\prod_{k=1}^{t}\left[e^{-\frac{\sigma^2 u^2}{2t^{2\alpha}}} + a_c\left(e^{-\frac{(k+1)^2\sigma^2 u^2}{2t^{2\alpha}}} - e^{-\frac{\sigma^2 u^2}{2t^{2\alpha}}}\right)\right] \\ &= e^{iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2(t+1)}{2t^{2\alpha}}}\prod_{k=1}^{t}\left[1 + a_c\left(e^{-\frac{(k^2+2k)\sigma^2 u^2}{2t^{2\alpha}}} - 1\right)\right]. \end{split}$$

From here, using the same procedure as in the previous part of the proof, i.e., by taking the logarithm of the function $\varphi_{\overline{Y};\alpha}(u, t)$ and expanding $\psi_Y(u, t, \alpha) := \ln \varphi_{\overline{Y};\alpha}(u, t)$ at the point $u = 0$, we have:

$$\begin{split} \psi_Y(u,t,\alpha) &= iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2(t+1)}{2t^{2\alpha}} + \sum_{k=1}^{t} \ln\left[1 + a_c\left(e^{-\frac{(k^2+2k)\sigma^2 u^2}{2t^{2\alpha}}} - 1\right)\right] \\ &= iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2(t+1)}{2t^{2\alpha}} - \sum_{k=1}^{t} \left[\frac{a_c(k^2+2k)\sigma^2 u^2}{2t^{2\alpha}} + o_k\left(t^{-2\alpha}u^2\right)\right] \\ &= iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2}{2}\left(t^{1-2\alpha} + t^{-2\alpha}\right) - \frac{a_c\sigma^2 u^2}{12t^{2\alpha}}\, t(t+1)(2t+7) + o\left(t^{3-2\alpha}u^2\right). \end{split}$$

Finally, taking the asymptotic values, when *t* → +∞, one obtains:

$$\psi_Y(u,t,\alpha) \sim \begin{cases} iu\mu t^{1-\alpha} - \frac{\sigma^2 u^2}{2}\left(t^{1-2\alpha} + t^{-2\alpha} + \frac{a_c t^{3-2\alpha}}{3}\right), & 1 \le \alpha \le 3/2 \\ 0, & \alpha > 3/2. \end{cases}$$

Substituting this expression into the CFs $\varphi_{\overline{Y};\alpha}(u, t)$, the entire statement of the theorem is proved.

**Remark 3.** In the previous theorem, the case α = 3/2 is particularly interesting because Equation (11) then gives the following convergences:

$$\frac{1}{t^{3/2}}\sum_{j=1}^{t} m_j \stackrel{d}{\rightarrow} \mathcal{N}\left(0, \frac{a_c \sigma^2}{3}\right), \quad \frac{1}{t^{3/2}}\sum_{j=1}^{t} y_j \stackrel{d}{\rightarrow} \mathcal{N}\left(0, \frac{a_c \sigma^2}{3}\right), \quad t \rightarrow +\infty. \tag{13}$$

We will call these convergences, in the usual way, *central limit theorems (CLTs) for the GSB process*. As will be seen below, they will be helpful for estimating the unknown parameters of the GSB process, primarily the conditional variance *σ*2.
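The CLT (13) can be illustrated by Monte Carlo: for a moderately large $t$, the replicated values of $t^{-3/2}\sum_{j=1}^{t} m_j$ should have mean near zero and variance near $a_c\sigma^2/3$. A sketch (our code; $\mu = 0$ and $c = \sigma = 1$ are assumed, and a pre-sample innovation $\varepsilon_{-1}$ is drawn so the variance formula is exact):

```python
import math
import numpy as np

mu, c, sigma, t, R = 0.0, 1.0, 1.0, 400, 20_000
rng = np.random.default_rng(0)
eps = rng.normal(0.0, sigma, size=(R, t + 1))   # e_{-1}, ..., e_{t-1} per path
q = (eps[:, :-1] ** 2 > c).astype(float)        # q_j = I(e_{j-1}^2 > c)
m = mu + np.cumsum(q * eps[:, 1:], axis=1)      # columns m_1, ..., m_t

S = m.sum(axis=1) / t ** 1.5                    # t^(-3/2) * sum_{j=1}^t m_j
a_c = 1.0 - math.erf(math.sqrt(c / 2) / sigma)
print(S.mean(), S.var(), a_c * sigma ** 2 / 3)  # mean ~ 0, variance ~ a_c*sigma^2/3
```

A histogram of `S` is visually close to the limiting Gaussian already for $t$ of a few hundred.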

#### **4. Parameter Estimation Procedures**

Now, let us consider the problem of estimating the (unknown) parameters of the GSB process: the critical value ($c$), the mean value ($\mu$), and the conditional variance ($\sigma^2$). To estimate the first parameter, $c$, the series of increments $(X_t)$ will be used as the (only) observable and stationary component of the GSB model. Recall that we named this series the Split-MA (1) process because it is close to standard, linear MA models. Although some of the estimation procedures presented here are similar to standard estimation methods for MA models (see, for instance, [36]), the specificity of the Split-MA (1) model requires additional testing and analysis, primarily of the quality of the obtained estimates. To that end, the consistency and asymptotic normality of the estimators are examined. After that, several new approaches are considered, based on the observation of the non-stationary time series $(y_t)$. The main goal of these procedures is to obtain estimates of the parameters $\mu$ and $\sigma^2$.

#### *4.1. Estimates of Critical Value (c)*

Let (*Xt*) be the Split-MA (1) process defined by Equation (4). As we have already shown, the first correlation coefficient of this series is:

$$\rho_X(1) = -\frac{b_c}{1 + b_c}, \ 0 < b_c < 1.$$

From here, by solving for $b_c$, we get the estimated value of this parameter:

$$\hat{b}_c = -\frac{\hat{\rho}_X(1)}{1 + \hat{\rho}_X(1)}, \ 0 < \hat{b}_c < 1, \tag{14}$$

where:

$$\hat{\rho}_X(1) = \left(\sum_{t=1}^T X_t X_{t-1}\right) \left(\sum_{t=1}^T X_t^2\right)^{-1}$$

is the estimated value of the first correlation. Based on the estimate $\hat{b}_c$, the corresponding estimate $\hat{c}$ of the critical value $c$ can be determined as a solution to the equation:

$$P\left\{\varepsilon_t^2 \le c\right\} = \hat{b}_c.$$

According to Equation (14), it is easy to see that $\hat{b}_c$ and $\hat{c}$ are appropriate estimates if the following inequalities hold:

$$0 < \hat{b}_c < 1 \quad \Longleftrightarrow \quad -0.5 < \hat{\rho}_X(1) < 0.$$

In [9], it was shown that the estimators thus obtained are strongly consistent if the innovations $(\varepsilon_t)$ have a continuous distribution. Moreover, the estimates $\hat{b}_c$ and $\hat{c}$ will also be asymptotically normal (AN) if the RVs $(\varepsilon_t)$ have a symmetric distribution. Note that both conditions are fulfilled in the case of Gaussian innovations $\varepsilon_t : \mathcal{N}\left(0, \sigma^2\right)$, when the RVs $(\varepsilon_t/\sigma)^2$ have a $\chi_1^2$ distribution. Thus, the estimate of the critical value $c$ is simply found from the equality:

$$\hat{c} = \hat{\sigma}^2 \cdot F_{\chi_1^2}^{-1}\left(\hat{b}_c\right). \tag{15}$$

Here, $\hat{\sigma}^2$ is the estimated variance of the innovations $(\varepsilon_t)$, which will be described later.
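A sketch of the moment estimator in Equations (14) and (15) on simulated data follows (our code, not the authors'; for simplicity, $\sigma^2$ is taken as known rather than estimated, and the $\chi_1^2$ CDF $F_{\chi_1^2}(x) = \operatorname{erf}(\sqrt{x/2})$ is inverted by bisection, a choice of ours):

```python
import math
import numpy as np

def chi2_1_ppf(p, lo=0.0, hi=50.0):
    """Quantile of the chi-square law with 1 d.o.f.: invert F(x) = erf(sqrt(x/2)) by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if math.erf(math.sqrt(mid / 2)) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c_true, sigma, T = 1.0, 1.0, 200_000
rng = np.random.default_rng(0)
eps = rng.normal(0.0, sigma, size=T + 2)
theta = (eps[:-1] ** 2 <= c_true).astype(float)   # theta_t = I(e_{t-1}^2 <= c)
X = eps[2:] - theta[:-1] * eps[1:-1]              # observed Split-MA(1) increments

rho1 = np.sum(X[1:] * X[:-1]) / np.sum(X ** 2)    # estimated first correlation
b_hat = -rho1 / (1 + rho1)                        # Equation (14)
c_hat = sigma ** 2 * chi2_1_ppf(b_hat)            # Equation (15), with sigma^2 known
print(b_hat, c_hat)                               # should be near b_c and c_true
```

For a long series, the printed estimates recover $b_c$ and the true critical value closely, consistent with the consistency results cited above.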

However, it can be shown that, as for linear MA series, $\hat{b}_c$ is not the most efficient estimate of $b_c$ (the asymptotic efficiency of the estimate $\hat{b}_c$ is analyzed at the end of this subsection). To obtain more efficient estimates of the given parameters, we modify the well-known Gauss–Newton method for estimating the parameters of nonlinear functions (see, for instance, [36]). First, notice that Equation (4) can be written in the form:

$$
\varepsilon\_t = X\_t + \theta\_{t-1} \varepsilon\_{t-1}, \ t = 1, \dots, T
$$

or, in functional form,

$$
\varepsilon\_t(X,\theta) = X\_t + \theta\_{t-1}\varepsilon\_{t-1}(X,\theta). \tag{16}
$$

On the other hand, if we define a series of RVs as

$$W\_t(X, \theta) = \theta\_t W\_{t-1}(X, \theta) + \varepsilon\_{t-1}(X, \theta),\tag{17}$$

then it is easy to see that the RVs $W\_t(X, \theta)$ are $\mathcal{F}\_{t-1}$-adapted for each $t = 1, \dots, T$, and thus independent of $\varepsilon\_t$ and $\theta\_{t+1}$. According to the aforementioned properties of the RVs $(\theta\_t)$ and $(\varepsilon\_t)$, it follows that $(W\_t(X, \theta))$ is a stationary and ergodic series of RVs (see [37] for more detail) with $E(W\_t(X, \theta)) = 0$ and correlation function $\rho\_W(h) = b\_c^{|h|}$, $h = 0, \pm 1, \dots$ To this series, using the procedure described in [38], we add the so-called residual series:

$$R\_t(X, \theta) = W\_t(X, \theta) - b\_c W\_{t-1}(X, \theta). \tag{18}$$

The RVs $R\_t(X, \theta)$ are also $\mathcal{F}\_{t-1}$-adapted and mutually uncorrelated, which can easily be shown. Namely, by applying Equations (16)–(18), for any integer $h > 0$, one obtains:

$$\begin{aligned} \mathrm{Cov}(R\_t(X,\theta), R\_{t+h}(X,\theta)) &= E(R\_t(X,\theta)R\_{t+h}(X,\theta)) \\ &= E[R\_t(X,\theta)(W\_{t+h}(X,\theta) - b\_c W\_{t+h-1}(X,\theta))] \\ &= E(R\_t(X,\theta)W\_{t+h}(X,\theta)) - b\_c E(R\_t(X,\theta)W\_{t+h-1}(X,\theta)) \\ &= E[R\_t(X,\theta)\theta\_{t+h}W\_{t+h-1}(X,\theta)] - b\_c E(R\_t(X,\theta)W\_{t+h-1}(X,\theta)) = 0. \end{aligned}$$

Thus, Equation (18) defines the series $(W\_t(X, \theta))$ as a linear autoregressive (AR) process with innovations $(R\_t(X, \theta))$. From here, we obtain another estimate of the unknown parameter $b\_c \in (0, 1)$ by the following algorithmic procedure:

(1) Compute the initial estimates $\widetilde{b}\_c$ and $\widetilde{c}$, according to Equations (14) and (15).

(2) For each $t = 1, \dots, T$, compute recursively:


$$\begin{aligned} \widetilde{\theta}\_t &:= I\left(\varepsilon\_{t-1}^2\left(X, \widetilde{\theta}\right) \le \widetilde{c}\right) \\ \varepsilon\_t\left(X, \widetilde{\theta}\right) &:= X\_t + \widetilde{\theta}\_{t-1}\varepsilon\_{t-1}\left(X, \widetilde{\theta}\right) \\ W\_t\left(X, \widetilde{\theta}\right) &:= \widetilde{\theta}\_t W\_{t-1}\left(X, \widetilde{\theta}\right) + \varepsilon\_{t-1}\left(X, \widetilde{\theta}\right) \\ R\_t\left(X, \widetilde{\theta}\right) &:= W\_t\left(X, \widetilde{\theta}\right) - \widetilde{b}\_c W\_{t-1}\left(X, \widetilde{\theta}\right), \end{aligned}$$

where $\widetilde{\theta}\_0 = 1$ and $\varepsilon\_0\left(X, \widetilde{\theta}\right) = \varepsilon\_{-1}\left(X, \widetilde{\theta}\right) = W\_0\left(X, \widetilde{\theta}\right) = 0$.

(3) Using the standard regression procedure, i.e., the correlation function $\rho\_W(h)$ at $h = 1$, obtain an estimate of $b\_c$ in the form:

$$\hat{b}\_c = \left(\sum\_{t=0}^{T-1} W\_t\left(X, \widetilde{\theta}\right) W\_{t+1}\left(X, \widetilde{\theta}\right)\right)\left(\sum\_{t=1}^{T} W\_t^2\left(X, \widetilde{\theta}\right)\right)^{-1}.$$

(4) As in the first step, based on the estimate $\hat{b}\_c$, the critical value $\hat{c}$ can be estimated as a solution of the equation (with respect to $c$):

$$P\left\{\varepsilon\_t^2 \le c\right\} = \hat{b}\_c.$$
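The steps above can be sketched numerically. The following Python fragment is our minimal illustration (not the authors' code): it simulates a Split-MA(1) series with Gaussian innovations and a known $b\_c$, and then runs the initial moment estimate from Equations (14) and (15), followed by the modified Gauss–Newton steps (2)–(3); for simplicity, the true $\sigma^2$ is used in Equation (15). All function names are ours.

```python
import numpy as np
from scipy.stats import chi2

def bc_initial(X):
    """Step (1): moment estimate of b_c from the lag-1 sample correlation, Eq. (14)."""
    rho1 = np.sum(X[1:] * X[:-1]) / np.sum(X ** 2)
    return -rho1 / (1.0 + rho1)

def bc_gauss_newton(X, c_tilde):
    """Steps (2)-(3): build (theta, eps, W) recursively and re-estimate b_c
    from the lag-1 regression of the series W."""
    T = len(X)
    theta = np.zeros(T + 1)
    theta[0] = 1.0                 # theta_0 = 1
    eps = np.zeros(T + 1)          # eps_0 = eps_{-1} = 0
    W = np.zeros(T + 1)            # W_0 = 0
    for t in range(1, T + 1):
        theta[t] = float(eps[t - 1] ** 2 <= c_tilde)
        eps[t] = X[t - 1] + theta[t - 1] * eps[t - 1]
        W[t] = theta[t] * W[t - 1] + eps[t - 1]
    return np.sum(W[:-1] * W[1:]) / np.sum(W[1:] ** 2)

# --- illustration on a simulated Split-MA(1) series (our assumption) ---
rng = np.random.default_rng(1)
sigma, b_c, T = 1.0, 0.5, 5000
c = sigma ** 2 * chi2.ppf(b_c, df=1)         # P{eps_t^2 <= c} = b_c
eps = rng.normal(0.0, sigma, T + 2)          # eps_{-1}, eps_0, ..., eps_T
theta = (eps[:-2] ** 2 <= c).astype(float)   # theta_{t-1} = I(eps_{t-2}^2 <= c)
X = eps[2:] - theta * eps[1:-1]              # X_t = eps_t - theta_{t-1} eps_{t-1}

b_tilde = bc_initial(X)
c_tilde = sigma ** 2 * chi2.ppf(b_tilde, df=1)   # Eq. (15), true sigma^2 assumed known
b_hat = bc_gauss_newton(X, c_tilde)
print(b_tilde, b_hat)
```

On long simulated paths, both estimates settle near the true $b\_c$, with the Gauss–Newton estimate typically showing less scatter, in line with the efficiency comparison discussed below.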

We emphasize that in [9], the strict consistency and AN of the estimates $\widetilde{b}\_c$ and $\widetilde{c}$, as well as of $\hat{b}\_c$ and $\hat{c}$, were proved. At the same time, the distribution of the innovations $(\varepsilon\_t)$ was not explicitly used there. In the case of the GSB process, where the innovations are Gaussian distributed, we can express these results as follows:

**Theorem 4.** *Estimates $\widetilde{b}\_c$ and $\hat{b}\_c$ are strictly consistent for the parameter $b\_c$, i.e., it is valid that:*

$$
\widetilde{b}\_c \xrightarrow{a.s.} b\_c, \quad \hat{b}\_c \xrightarrow{a.s.} b\_c, \quad T \to +\infty.
$$

*Moreover, the estimates $\widetilde{b}\_c$ and $\hat{b}\_c$ are asymptotically normal for $b\_c$, i.e.,*

$$
\sqrt{T}\left(\widetilde{b}\_c - b\_c\right) \xrightarrow{d} \mathcal{N}\left(0, \widetilde{V}(b\_c)\right), \quad \sqrt{T}\left(\hat{b}\_c - b\_c\right) \xrightarrow{d} \mathcal{N}\left(0, \hat{V}(b\_c)\right), \quad T \to +\infty,
$$

*where* $\widetilde{V}(b\_c) = (b\_c + 1)^2\left(2b\_c^2 + 4b\_c + 1\right)$ *and* $\hat{V}(b\_c) = (1 - b\_c)\left(3b\_c^2 + 3b\_c + 1\right)$.

**Remark 4.** Based on the previous theorem, the consistency and AN of the estimates $\widetilde{c}$ and $\hat{c}$, as continuous functions of $\widetilde{b}\_c$ and $\hat{b}\_c$, are also valid (see, for instance, [9] or [39] p. 24). Additionally, for any $b\_c \in (0, 1)$, the inequality $\hat{V}(b\_c) \le \widetilde{V}(b\_c)$ holds, with equality only for $b\_c = 0$, as can be seen in Figure 3. This means that the asymptotic variance $\hat{V}(b\_c)$, as a measure of the "scattering" of $\hat{b}\_c$ around the true value $b\_c$, is (significantly) smaller than $\widetilde{V}(b\_c)$. So, $\hat{b}\_c$ is a more efficient estimate than $\widetilde{b}\_c$, which justifies its introduction.

**Figure 3.** Graphs of the asymptotic variances of the estimates $\widetilde{b}\_c$ (dashed line) and $\hat{b}\_c$ (solid line), depending on $b\_c \in (0, 1)$.

#### *4.2. Estimates of Mean* (*μ*)

As an estimator of the parameter $\mu = E(y\_t)$, the sample mean of the series $(y\_t)$ is usually used:

$$
\overline{\mu} := \overline{y}\_T = \frac{1}{T} \sum\_{t=1}^T y\_t. \tag{19}
$$

This estimator is obviously unbiased, $E(\overline{\mu}) = E(\overline{y}\_T) = \mu$, but its variance is not bounded. Namely, using the previously defined $\alpha$-mean series $\overline{Y}\_{T;\alpha}$ with $\alpha = 1$, we can represent the estimator $\overline{\mu}$ as a sum of uncorrelated RVs:

$$
\overline{\mu} = m\_0 + \frac{1}{T}\left[\sum\_{k=1}^{T}(1 + kq\_{T-k})\varepsilon\_{T-k} + \varepsilon\_T\right].
$$

Thus, for the variance of $\overline{\mu}$, we get:

$$\begin{aligned} \overline{V} := V(\overline{\mu}) &= \frac{1}{T^2}\left[\sum\_{k=1}^T V((1 + kq\_{T-k})\varepsilon\_{T-k}) + V(\varepsilon\_T)\right] \\ &= \frac{\sigma^2}{T^2}\left[\sum\_{k=1}^T E(1 + kq\_{T-k})^2 + 1\right] \\ &= \frac{\sigma^2}{T^2}\left[\sum\_{k=1}^T\left(1 + a\_c k(k+2)\right) + 1\right] \\ &= \frac{\sigma^2}{T^2}\left[T + 1 + a\_c\frac{T(T+1)(2T+7)}{6}\right] \\ &= \frac{\sigma^2(T+1)}{T^2}\left(1 + a\_c\frac{T(2T+7)}{6}\right) \\ &= \frac{a\_c\sigma^2 T}{3} + \mathcal{O}(1) \to +\infty, \ T \to +\infty. \end{aligned}$$

Note that, as expected, the variance $\overline{V} = V(\overline{\mu})$ is asymptotically identical to that in Theorem 3, i.e., as in Equation (11) with $\alpha = 1$. Moreover, $\overline{V} \to 0$ when $a\_c = 0$, that is, in the case of extremely large values of the parameter $c$. However, in practical applications, this condition is usually not met.

An alternative way to obtain an estimate of $\mu$ is to take the sample mean of the mean series $(\overline{y}\_t)$, $t = 1, \dots, T$, i.e.,

$$\hat{\mu} := \frac{1}{T}\sum\_{t=1}^{T}\overline{y}\_t = \frac{1}{T}\sum\_{t=1}^{T}\omega\_t y\_t. \tag{20}$$

Here, $\omega\_t := H(T) - H(t-1)$, where $H(t) := \sum\_{j=1}^{t} j^{-1}$, $t = 1, \dots, T$, are the harmonic numbers,

with the assumption $H(0) = 0$. Obviously, $\hat{\mu}$ is also an unbiased estimate of the parameter $\mu$, but with weights that are more pronounced at the "older" time points $(t)$ at which realizations of the series $(y\_t)$ are observed. This is consistent with the fact that the covariances of the RVs $y\_t$ depend on these "older" time indices. Moreover, as shown in Section 2, at these time points, the covariances of the RVs $y\_t$ are equal to their variances. For these reasons, it is expected that the estimate $\hat{\mu}$ will be more efficient than $\overline{\mu}$. Indeed, using a procedure similar to the previous one, we first represent the estimate $\hat{\mu}$ as a sum of uncorrelated RVs:

$$\begin{aligned} \hat{\mu} &= \frac{1}{T} \sum\_{t=1}^{T} \omega\_t \left( m\_0 + \sum\_{j=0}^{t-1} q\_j \varepsilon\_j \right) + \frac{1}{T} \sum\_{t=1}^{T} \omega\_t \varepsilon\_t \\ &= \frac{1}{T} \left[ m\_0 \sum\_{t=1}^{T} \omega\_t + \sum\_{j=0}^{T-1} \left( q\_j \varepsilon\_j \sum\_{t=j+1}^{T} \omega\_t \right) + \sum\_{t=1}^{T} \omega\_t \varepsilon\_t \right]. \end{aligned}$$

Since, for each $j = 1, \dots, T$, the identity below holds:

$$\sum\_{t=j}^{T}\omega\_t = \sum\_{t=j}^{T}(H(T) - H(t-1)) = \sum\_{t=j}^{T}\sum\_{k=t}^{T}\frac{1}{k} = T - (j-1)\left(\omega\_j + 1\right),$$

it follows that $\hat{\mu}$ can also be written as:

$$\begin{aligned} \hat{\mu} &= \frac{1}{T}\left[T(m\_0 + q\_0\varepsilon\_0) + \sum\_{j=1}^{T-1}\left(T - j(\omega\_{j+1} + 1)\right)q\_j\varepsilon\_j\right] + \frac{1}{T}\sum\_{t=1}^{T}\omega\_t\varepsilon\_t \\ &= m\_0 + q\_0\varepsilon\_0 + \frac{1}{T}\sum\_{j=1}^{T-1}\left(c\_j q\_j + \omega\_j\right)\varepsilon\_j + \frac{\varepsilon\_T}{T^2}, \end{aligned}$$

where $c\_j = T - j\left(\omega\_{j+1} + 1\right)$. Thus, after some computation, one obtains the variance of $\hat{\mu}$:

$$\begin{aligned} \hat{V} := V(\hat{\mu}) &= \frac{1}{T^2}\left[\sum\_{j=1}^{T-1} E\left(c\_j q\_j + \omega\_j\right)^2 E\left(\varepsilon\_j^2\right) + \frac{E\left(\varepsilon\_T^2\right)}{T^2}\right] \\ &= \frac{\sigma^2}{T^2}\left[\sum\_{j=1}^{T-1}\left(a\_c c\_j\left(c\_j + 2\omega\_j\right) + \omega\_j^2\right) + \frac{1}{T^2}\right] \\ &= \frac{\sigma^2\left(a\_c(T-1) - 2\right)H(T-1)H(T)}{T} + \mathcal{O}\left(H^{-2}(T)\right) \\ &= a\_c\sigma^2 H^2(T) + \mathcal{O}\left(H^{-2}(T)\right) \to +\infty, \ T \to +\infty. \end{aligned}$$

Notice that the variance $\hat{V} := V(\hat{\mu})$ is also unbounded, but of a lower asymptotic order than $\overline{V} = V(\overline{\mu})$, since:

$$\lim\_{T \to +\infty} \frac{V(\hat{\mu})}{V(\overline{\mu})} = 3\lim\_{T \to +\infty} \frac{H^2(T)}{T} = 0.$$

This means that the estimate $\hat{\mu}$ is (asymptotically) more efficient than $\overline{\mu}$, which can be seen in Figure 4. It shows 3D plots of both variances, $\overline{V}$ and $\hat{V}$, viewed as functions of the two variables $a\_c \in (0, 1)$ and $T > 0$.

**Figure 4.** Variances, shown as 3D plots, of the estimate $\overline{\mu}$ (**a**) and the estimate $\hat{\mu}$ (**b**), depending on $a\_c \in (0, 1)$ and $T > 0$. (The variance of the innovations is $\sigma^2 = 1$.)
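The efficiency gain can also be checked by simulation. The sketch below is our illustration, not the authors' code; it uses the boundary case $c = 0$ (i.e., $a\_c = 1$), in which the base series reduces to a random walk (cf. Section 5), and compares the Monte Carlo variances of $\overline{\mu}$ and $\hat{\mu}$:

```python
import numpy as np

def harmonic_weights(T):
    """omega_t = H(T) - H(t-1), t = 1..T, with H(0) = 0, cf. Eq. (20)."""
    H = np.cumsum(1.0 / np.arange(1, T + 1))   # H(1), ..., H(T)
    H_prev = np.concatenate(([0.0], H[:-1]))   # H(t-1)
    return H[-1] - H_prev                      # note: these weights sum to T

def mu_bar(y):
    """Sample mean, Eq. (19)."""
    return y.mean()

def mu_hat(y):
    """Harmonic-weighted mean, Eq. (20)."""
    return np.sum(harmonic_weights(len(y)) * y) / len(y)

rng = np.random.default_rng(0)
mu_true, sigma, T, N = 2.0, 1.0, 200, 2000
bars = np.empty(N)
hats = np.empty(N)
for i in range(N):
    # boundary case c = 0: y_t = y_{t-1} + eps_t, a random-walk GSB series
    y = mu_true + np.cumsum(rng.normal(0.0, sigma, T))
    bars[i] = mu_bar(y)
    hats[i] = mu_hat(y)
print(bars.var(), hats.var())
```

Both averages stay near $\mu$, while the harmonic weighting, which favors the "older" time points, visibly reduces the sampling variance in this setting.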

#### *4.3. Estimates of Variance* ($\sigma^2$)

Let us consider determining the estimates of the third unknown parameter, $\sigma^2$, which represents the variance of the innovations $(\varepsilon\_t)$, that is, the conditional variance of the base series $(y\_t)$. Precisely these facts enable different estimation procedures for the parameter $\sigma^2$. First, notice that based on the previously obtained estimates $\widetilde{b}\_c$ and $\hat{b}\_c$, i.e., the modeled innovation values $(\varepsilon\_t)$ given by Equation (16), the variance $\sigma^2$ can easily be estimated. The usual estimation procedure is based on the sample variance:

$$\widetilde{\sigma}^2 = \frac{1}{T}\sum\_{t=1}^T \varepsilon\_t^2\left(X, \widetilde{\theta}\right) \quad \text{or} \quad \hat{\sigma}^2 = \frac{1}{T}\sum\_{t=1}^T \varepsilon\_t^2\left(X, \hat{\theta}\right). \tag{21}$$

Here, $\varepsilon\_t\left(X, \widetilde{\theta}\right)$ and $\varepsilon\_t\left(X, \hat{\theta}\right)$ are the modeled innovation values obtained from the estimates $\widetilde{b}\_c$ and $\hat{b}\_c$, respectively. Notice that in the case of Gaussian innovations $(\varepsilon\_t)$, the estimates given by Equation (21) are identical to the maximum likelihood estimators. Indeed, the log-likelihood function then reads as follows:

$$L(y\_1, \dots, y\_T; \sigma^2) = -\frac{T}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum\_{t=1}^T (y\_t - m\_t)^2,$$

and by solving the equation $\partial L(y\_1, \dots, y\_T; \sigma^2)/\partial\sigma^2 = 0$, the estimate of $\sigma^2$ is obtained as in Equation (21), that is, as the sample variance of the series $(\varepsilon\_t)$. Thus, the consistency and AN of both estimates $\widetilde{\sigma}^2$ and $\hat{\sigma}^2$ can readily be shown. We note that, due to their equivalence, only the estimate $\hat{\sigma}^2$ will be considered further (see the theorem below).
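For completeness, the maximization step can be written out explicitly. Differentiating the log-likelihood above with respect to $\sigma^2$ and equating the derivative to zero gives:

$$\frac{\partial L}{\partial \sigma^2} = -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum\_{t=1}^{T}(y\_t - m\_t)^2 = 0 \quad \Longrightarrow \quad \hat{\sigma}^2 = \frac{1}{T}\sum\_{t=1}^{T}(y\_t - m\_t)^2,$$

which is exactly the sample variance of the residuals $\varepsilon\_t = y\_t - m\_t$, in accordance with Equation (21).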

On the other hand, note that the previous estimation procedure is based on unobservable, modeled values of the innovations $(\varepsilon\_t)$. Another approach to estimating the variance $\sigma^2$ is based on the so-called two-stage procedure, using the previously estimated parameter $\hat{b}\_c$. By applying the equality $V(X\_t) = E\left(X\_t^2\right) = \sigma^2(b\_c + 1)$, as well as the sample variance of the series $(X\_t)$, we can obtain the estimate:

$$\hat{\sigma}\_X^2 = \frac{1}{T\left(\hat{b}\_c + 1\right)}\sum\_{t=1}^T X\_t^2. \tag{22}$$

Then, it follows:

**Theorem 5.** *Estimates $\hat{\sigma}^2$ and $\hat{\sigma}\_X^2$ are strictly consistent for the parameter $\sigma^2$, i.e., it is valid that:*

$$
\hat{\sigma}^2 \xrightarrow{a.s.} \sigma^2, \quad \hat{\sigma}\_X^2 \xrightarrow{a.s.} \sigma^2, \quad T \to +\infty.
$$

*Moreover, the estimates $\hat{\sigma}^2$ and $\hat{\sigma}\_X^2$ are asymptotically normal for $\sigma^2$, i.e.,*

$$\sqrt{T}\left(\hat{\sigma}^2 - \sigma^2\right) \xrightarrow{d} \mathcal{N}(0, V\_1), \quad \sqrt{T}\left(\hat{\sigma}\_X^2 - \sigma^2\right) \xrightarrow{d} \mathcal{N}(0, V\_2), \ T \to +\infty,\tag{23}$$

*where $V\_1 = 2\sigma^4$ and $V\_2 = \sigma^4\left(2 + 11b\_c - b\_c^2\right)\left(1 + 2b\_c - 3b\_c^3\right)^{-1}$.*

**Proof.** Since $\left(\varepsilon\_t^2\right)$ is an IID series of RVs, the stationarity and ergodicity of this series are apparent. Applying the strong law of large numbers (SLLN), it follows that:

$$
\hat{\sigma}^2 = \frac{1}{T}\sum\_{t=1}^T \varepsilon\_t^2\left(X, \hat{\theta}\right) \xrightarrow{a.s.} \sigma^2.
$$

Furthermore, it can easily be shown that $V\left(\hat{\sigma}^2\right) = 2\sigma^4/T$ is the variance of the estimate $\hat{\sigma}^2$. Thus, applying the central limit theorem (CLT), the first convergence in Equation (23) is obtained.

To prove the properties of the estimate $\hat{\sigma}\_X^2$, we note that $\left(X\_t^2\right)$ is also a stationary and ergodic series of RVs. If the SLLN is now applied to the following statistic:

$$\overline{X\_t^2} := \frac{1}{T}\sum\_{t=1}^T X\_t^2, \tag{24}$$

then one obtains:

$$\frac{1}{T}\sum\_{t=1}^{T}X\_t^2 \xrightarrow{a.s.} \sigma^2(b\_c + 1).$$

At the same time, according to Theorem 4, $\hat{b}\_c$ is a strongly consistent estimator of $b\_c$, i.e., $\hat{b}\_c + 1 \xrightarrow{a.s.} b\_c + 1$ when $T \to +\infty$. Thus, the last two convergences give:

$$
\hat{\sigma}\_X^2 = \frac{\overline{X\_t^2}}{\hat{b}\_c + 1} \xrightarrow{a.s.} \sigma^2, \ T \to +\infty.
$$

To prove the AN of the estimate $\hat{\sigma}\_X^2$, note first that the sequence $\left(X\_t^2\right)$ is 1-dependent, in the sense of Definition 6.3.1 in [36] (p. 245). According to the Cauchy–Schwarz and Minkowski inequalities, applied to Equation (4), i.e., to the sixth moment of the sum $X\_t = \varepsilon\_t + (-\theta\_{t-1}\varepsilon\_{t-1})$, it follows that:

$$\begin{aligned} E|X\_t|^6 &\le \left[\left(E|\varepsilon\_t|^6\right)^{1/6} + \left(b\_c E|\varepsilon\_{t-1}|^6\right)^{1/6}\right]^6 \\ &\le 15\sigma^6\left(1 + b\_c^{1/6}\right)^6 < +\infty. \end{aligned}$$

Then, the Hoeffding–Robbins theorem [40] can be applied, from which it follows that:

$$\sqrt{T}\,\overline{X\_t^2} = T^{-1/2}\sum\_{t=1}^T X\_t^2 \xrightarrow{d} \mathcal{N}\left(\sigma^2(b\_c + 1), V\_0\right),\tag{25}$$

where:

$$\begin{aligned} V\_0 &= V\left(X\_t^2\right) + 2\,\mathrm{Cov}\left(X\_t^2, X\_{t+1}^2\right) = E\left(X\_t^4\right) + 2E\left(X\_t^2 X\_{t+1}^2\right) - 3\sigma^4\left(1 + b\_c\right)^2 \\ &= 3\sigma^4(1 + 3b\_c) + 2\sigma^4\left(1 + 4b\_c + b\_c^2\right) - 3\sigma^4(1 + b\_c)^2 \\ &= \sigma^4\left(2 + 11b\_c - b\_c^2\right). \end{aligned}$$

By applying the almost sure convergence of the estimate $\hat{b}\_c$ and the previously obtained convergence in Equation (25), we have:

$$\sqrt{T}\,\hat{\sigma}\_X^2 = \frac{\sqrt{T}\,\overline{X\_t^2}}{\hat{b}\_c + 1} \xrightarrow{d} \mathcal{N}\left(\sigma^2, V\_2\right), \ T \to +\infty,$$

where $V\_2 = V\_0/\hat{V}(b\_c)$. Thus, according to Theorem 4, the second convergence in Equation (23) is obtained.

**Remark 5.** As in Theorem 4, by comparing the asymptotic variances $V\_1$ and $V\_2$ of the estimates $\hat{\sigma}^2$ and $\hat{\sigma}\_X^2$, respectively, it is easy to see that the inequality $V\_1 \le V\_2$ holds. At the same time, the equality $V\_1 = V\_2 = 2\sigma^4$ is valid only when $b\_c = 0$ (Figure 5a), so the estimator $\hat{\sigma}^2$ is more efficient than $\hat{\sigma}\_X^2$.

**Figure 5.** (**a**) Graphs of the asymptotic variances of the estimates $\hat{\sigma}^2$ (dashed line) and $\hat{\sigma}\_X^2$ (solid line), depending on $b\_c \in (0, 1)$. (**b**) 3D plot of the variance of the statistic $\overline{X\_t^2}$, depending on $b\_c \in (0, 1)$ and $T > 0$. (The variance of the innovations is $\sigma^2 = 1$.)

However, according to the proof of the previous theorem, it can easily be seen that the variance of the statistic $\overline{X\_t^2}$, given by Equation (24), satisfies (Figure 5b):

$$V\left(\overline{X\_t^2}\right) = \frac{\sigma^4\left(2 + 11b\_c - b\_c^2\right)}{T} \to 0, \ T \to +\infty.$$

Thus, $\overline{X\_t^2}$ can be used as an estimator of the "hybrid" parameter $\sigma^2(b\_c + 1)$, which is of interest for practical research, that is, for the application of the GSB model discussed below.
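A quick Monte Carlo check of Equation (22) and of the "hybrid" interpretation of $\overline{X\_t^2}$ can be carried out as follows (our sketch, not the authors' code; the Split-MA(1) simulation and the use of the true $b\_c$ in place of $\hat{b}\_c$ are simplifying assumptions):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
sigma, b_c, T = 2.0, 0.5, 20000
c = sigma ** 2 * chi2.ppf(b_c, df=1)        # P{eps_t^2 <= c} = b_c for Gaussian eps_t

# Split-MA(1) increments: X_t = eps_t - theta_{t-1} eps_{t-1},
# with theta_{t-1} = I(eps_{t-2}^2 <= c)
eps = rng.normal(0.0, sigma, T + 2)          # eps_{-1}, eps_0, ..., eps_T
theta = (eps[:-2] ** 2 <= c).astype(float)
X = eps[2:] - theta * eps[1:-1]

X2_bar = np.mean(X ** 2)                     # estimates sigma^2 (b_c + 1), Eq. (24)
sigma2_X = X2_bar / (b_c + 1.0)              # two-stage estimate, Eq. (22)
print(X2_bar, sigma2_X)
```

With these parameter values, the sample mean of $X\_t^2$ settles near $\sigma^2(b\_c + 1)$, and the rescaled value near $\sigma^2$, as Theorem 5 predicts.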

Finally, another approach to estimating the variance $\sigma^2$ is based on observations of the non-stationary series $(y\_t)$. Applying Theorem 3, i.e., the previously proven convergence in Equation (13), we have:

$$\overline{Y}\_{T;3/2} := \frac{1}{T^{3/2}}\sum\_{t=1}^{T} y\_t \xrightarrow{d} \mathcal{N}\left(0, \frac{a\_c\sigma^2}{3}\right), \quad T \to +\infty.$$

If we now consider the statistic:

$$S\_T^2 := \overline{Y}\_{T;3/2}^2 = \frac{1}{T^3}\left(\sum\_{t=1}^T y\_t\right)^2 = \frac{1}{T^3}\sum\_{j=1}^T\sum\_{k=1}^T y\_j y\_k, \tag{26}$$

after some computation, one obtains:

$$\begin{aligned} E\left(S\_T^2\right) &= \frac{1}{T^3}\sum\_{j=1}^{T}\sum\_{k=1}^{T} E\left(y\_j y\_k\right) = \frac{1}{T^3}\sum\_{j=1}^{T}\sum\_{k=1}^{T}\left[\mathrm{Cov}\left(y\_j, y\_k\right) + \mu^2\right] \\ &= \frac{1}{T^3}\sum\_{j=1}^{T}\sum\_{k=1}^{T}\left[\sigma^2(\min\{j,k\}a\_c + 1) + \mu^2\right] \\ &= \frac{\sigma^2}{T^3}\left[a\_c\sum\_{j=1}^{T}\left(j + 2\sum\_{k=1}^{j-1} k\right) + T^2\right] + \frac{\mu^2}{T} = \frac{\sigma^2}{T^3}\left(a\_c\sum\_{j=1}^{T} j^2 + T^2\right) + \frac{\mu^2}{T} \\ &= \frac{\sigma^2 a\_c}{6T^2}(T+1)(2T+1) + \frac{\sigma^2 + \mu^2}{T} \to \frac{a\_c\sigma^2}{3}, \ T \to +\infty. \end{aligned}$$

Thus, $S\_T^2$ is an asymptotically unbiased estimator of $a\_c\sigma^2/3$ and, using the estimate $\hat{a}\_c = 1 - \hat{b}\_c$, an estimator of the parameter $\sigma^2$ can be taken as:

$$\hat{\sigma}\_Y^2 := \frac{3}{\hat{a}\_c} S\_T^2 = \frac{3}{\hat{a}\_c T^3}\sum\_{j=1}^T\sum\_{k=1}^T y\_j y\_k. \tag{27}$$
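The asymptotic unbiasedness of $S\_T^2$ is easy to illustrate numerically. In the sketch below (our illustration), we take the boundary case $c = 0$ (so $a\_c = 1$), with $m\_0 = \mu = 0$ and Gaussian innovations, in which $y\_t$ is a pure random walk; the mean of $S\_T^2$ over many replications should then settle near $a\_c\sigma^2/3$:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma, T, N = 1.0, 400, 3000
s2 = np.empty(N)
for i in range(N):
    # boundary case a_c = 1, mu = 0: y_t is a pure Gaussian random walk
    y = np.cumsum(rng.normal(0.0, sigma, T))
    s2[i] = y.sum() ** 2 / T ** 3        # S_T^2, Eq. (26)
print(s2.mean())                         # close to sigma^2 / 3 here
```

Note that a single realization of $S\_T^2$ is very noisy (its limit is a scaled $\chi\_1^2$ variable), which foreshadows the weak performance of $\hat{\sigma}\_Y^2$ observed in the simulations below.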

#### **5. Numerical Simulation and Application of the GSB Process**

As already mentioned in the introductory section, two important aspects related to the practical implementation of the GSB process are explored here. First, numerical Monte Carlo simulations of the previously obtained GSB estimators are analyzed. Then, based on actual data, the GSB process is applied to analyze the dynamics and distribution of the infected and immunized population with respect to the COVID-19 disease in the territory of the Republic of Serbia.

#### *5.1. Numerical Simulations of GSB Estimators*

We first describe a pseudo-algorithm for estimating the parameters of the GSB model based on $N = 1000$ independent Monte Carlo replications of the GSB series. To that end, we assume that all series have length $T = 500$, which is close to the length of the actual series considered below. The primary aim is to examine the convergence, i.e., the quality of the previously proposed estimators on a sample of the given length. Therefore, the corresponding estimation errors are also investigated for this purpose. Using the previously presented theoretical facts, the pseudo-algorithm for estimating the parameters of the GSB process can be formulated as follows:


1. Based on the realized values of the increments $(X\_t)$ and Equation (14), compute the estimate $\widetilde{b}\_c$.
2. According to Equation (24), compute the statistic $\overline{X\_t^2}$ and then the estimate of the variance:

$$
\hat{\sigma}\_X^2 = \frac{\overline{X\_t^2}}{\widetilde{b}\_c + 1}.
$$

3. According to Equation (15) and the previously obtained estimates $\widetilde{b}\_c$ and $\hat{\sigma}\_X^2$, compute the estimate of the critical value $\widetilde{c} = \hat{\sigma}\_X^2 \cdot F\_{\chi\_1^2}^{-1}\left(\widetilde{b}\_c\right)$.

4. Using the estimate $\widetilde{c}$, for each $t = 1, \dots, T$, generate the (modeled) values of the series $(\varepsilon\_t)$ and $(m\_t)$ by applying the iterative procedure:

$$\begin{cases} \varepsilon\_t = y\_t - m\_t, \\ m\_t = m\_{t-1} + \varepsilon\_{t-1} I\left\{\varepsilon\_{t-2}^2 \ge \widetilde{c}\right\}, \end{cases} \tag{28}$$

where $\varepsilon\_0 = \varepsilon\_{-1} = 0$ and $m\_0 = y\_0 = \hat{\mu}$ is given by Equation (20).
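The recursion (28) is a single pass over the observed series. A minimal Python sketch (our illustration; the function name and array layout are ours):

```python
import numpy as np

def gsb_filter(y, c_hat, m0):
    """Recover modeled innovations (eps_t) and martingale means (m_t)
    from observations y = (y_1, ..., y_T) via the recursion (28)."""
    T = len(y)
    eps = np.zeros(T + 2)      # eps[0] = eps_{-1} = 0, eps[1] = eps_0 = 0
    m = np.zeros(T + 1)
    m[0] = m0                  # m_0 = mu_hat
    for t in range(1, T + 1):
        m[t] = m[t - 1] + eps[t] * float(eps[t - 1] ** 2 >= c_hat)
        eps[t + 1] = y[t - 1] - m[t]     # eps_t = y_t - m_t
    return eps[2:], m[1:]                # (eps_1..eps_T), (m_1..m_T)

# sanity check: for c_hat = 0 the indicator is always 1, the series is a
# random walk, and the filter returns eps_t = y_t - y_{t-1}, m_t = y_{t-1}
rng = np.random.default_rng(5)
e = rng.normal(size=300)
y = np.cumsum(e)                         # y_t with y_0 = 0
eps_mod, m_mod = gsb_filter(y, 0.0, 0.0)
```

In the boundary case shown in the sanity check, the filter reproduces the innovations exactly, which mirrors the random-walk discussion in Section 5.2.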


We point out that, in the above pseudo-algorithm, the second stage can be replaced by the following alternative step:

2'. Compute the statistic $S\_T^2$, given by Equation (26), which estimates the "hybrid" parameter $a\_c\sigma^2/3$. Then, according to Equation (27), the variance $\sigma^2$ can be estimated as:

$$
\hat{\sigma}\_Y^2 := \frac{3}{\hat{a}\_c} S\_T^2,
$$

where $\hat{a}\_c = 1 - \widetilde{b}\_c$.


By applying this pseudo-algorithm, the obtained values of the estimated parameters are summarized in Table 1, which also shows their average values (Mean), minimums (Min.), and maximums (Max.), along with the appropriate mean squared errors of estimation (MSEE), given in parentheses. Furthermore, testing results concerning the AN of the thus obtained estimates are also presented in Table 1. To that end, the Anderson–Darling and Cramér–von Mises normality tests were used. Their test statistics (denoted AD and W, respectively), as well as the corresponding *p*-values, were calculated using procedures from the R package "nortest" [41].

According to the obtained values, it is evident that most estimators have the AN property. This applies even to the estimates of the mean value, $\overline{\mu}$ and $\hat{\mu}$, which are obtained from realizations of the non-stationary GSB series $(y\_t)$. As already explained, this is related to Theorems 2 and 3, which respectively describe the AN properties of the series $y\_t/\sqrt{t}$ and of the so-called $\alpha$-means series. Notice that the asymptotic variance of these estimators is not bounded; hence, there is a large range of their observed values. On the other hand, the AN property is not particularly pronounced in the case where the critical value $c$ is estimated. This is because both estimates, $\widetilde{c}$ and $\hat{c}$, are obtained by a three-step procedure: estimates of the parameters $b\_c$ and $\sigma^2$ must first be determined, and only then of $c$. In the case of the variance estimators $\widetilde{\sigma}^2$ and $\hat{\sigma}^2$, obtained from the modeled innovations $(\varepsilon\_t)$, it is easy to see that they have the highest and almost the same efficiency. Furthermore, the values of the estimator $\hat{\sigma}\_X^2$ are only slightly "weaker" than those of $\widetilde{\sigma}^2$ and $\hat{\sigma}^2$. This is expected since, according to Theorem 5, the AN property holds for all these variance estimators. However, the estimate $\hat{\sigma}\_Y^2$ is by far the weakest variance estimate and can be omitted from further analysis. Moreover, based on the previously obtained theoretical results, also confirmed through simulations, the most robust estimates of the unknown parameters $c, \mu, \sigma^2$ are $\hat{c}, \hat{\mu}, \hat{\sigma}^2$, respectively. For these reasons, these estimators will be used for the GSB modeling of actual data on COVID-19, which is discussed below.


**Table 1.** Summary statistics of estimated parameters of the GSB process, obtained by a Monte Carlo study, along with realized statistics of normality tests.

\* *p* < 0.05, \*\* *p* < 0.01.

#### *5.2. Application of the GSB Process: A Case Study of COVID-19 Dynamics*

In this section, we give, as an illustration, a practical application of the GSB process in the stochastic modeling of actual data. In other words, as mentioned in the introductory section, we show that it can be an adequate stochastic model for describing the dynamics of the infected and vaccinated population in relation to the SARS-CoV-2 virus on the territory of the Republic of Serbia. To that end, we observe realizations of two time series, $(U\_t)$ and $(V\_t)$, which represent, on a daily basis, the total number of infected persons and the number of persons vaccinated with the first dose of the vaccine, respectively, starting from 24 December 2020 (the start date of vaccination in Serbia) and ending with 6 June 2022. The dynamics of both time series, of length $T = 529$, are shown in Figure 6.

**Figure 6.** Dynamics of the total infected (**a**) and vaccinated (**b**) population in relation to the SARS-CoV-2 virus on the territory of the Republic of Serbia.

The main statistical indicators of these series (also labeled Series A and Series B, respectively) are shown in Table 2. Based on the thus obtained values, it can be concluded that these are time series with distinct, pronounced fluctuations. For instance, the average number of infected people is (approximately) 3650 per day, ranging from 60 to 19,901 infected people. Similarly, the average number of vaccinated persons is 6348 per day, but the number of vaccinated persons varies from only 4 to as many as 68,678 persons per day. Therefore, we further consider the possibility that the GSB process can be used here as an appropriate stochastic model. For this purpose, as basic series, we observe the realizations of the so-called *log-volumes*, i.e., the logarithmic values of the series $(U\_t)$ and $(V\_t)$:

$$y\_t^{(1)} := \ln(\mathcal{U}\_t), \ y\_t^{(2)} := \ln(V\_t), \ t = 0, \ 1, \ldots, T. \tag{29}$$

Notice that the main goal of this transformation is to obtain more evenly distributed values of both series; since the logarithmic function is increasing, the pronounced fluctuations are nevertheless preserved. Additionally, the inequalities $U\_t, V\_t \ge 1$ imply the non-negativity of both log-volume series, $y\_t^{(1)}, y\_t^{(2)} \ge 0$.


**Table 2.** Basic statistical indicators of observed actual series.

Further, using the log-volumes as basic series, along with Equation (3), the series of increments $X\_t^{(1)}$, $X\_t^{(2)}$ are determined entirely. Based on them, the estimates of the GSB process parameters can be obtained by applying the pseudo-algorithm presented above. We emphasize that here the estimation procedure is repeated twice, i.e., for both series (A and B). Thus, the modeled values of the martingale means and innovation series, generated from the log-volumes in Equation (29), are as follows:

$$\begin{cases} \varepsilon\_t^{(j)} = y\_t^{(j)} - m\_t^{(j)}, \\ m\_t^{(j)} = m\_{t-1}^{(j)} + \varepsilon\_{t-1}^{(j)} I\left\{\left(\varepsilon\_{t-2}^{(j)}\right)^2 \ge \hat{c}\right\}, \end{cases} \tag{30}$$

where $j = 1, 2$. As initial values of the iterative procedure (30), as before, we have taken $\varepsilon\_0^{(j)} = \varepsilon\_{-1}^{(j)} = 0$, as well as $m\_0^{(j)} = y\_0^{(j)} = \hat{\mu}$. Table 3 contains the basic statistical indicators of the actual series, i.e., the log-volumes $(y\_t^{(j)})$ and increments $(X\_t^{(j)})$, as well as of the modeled series, i.e., the martingale means $(m\_t^{(j)})$ and innovations $(\varepsilon\_t^{(j)})$.

**Table 3.** Basic statistical indicators of actual and modeled series.


By analyzing the thus obtained values, an interesting connection can be observed, which can be explained by the previous theoretical results. Firstly, the average values of the log-volumes are "close" to the averages of the martingale means, which is in accordance with the equality $E(y\_t) = E(m\_t)$. Moreover, for Series A, almost equal values of the other statistical indicators (standard deviations, for instance) are noticeable. This can also be seen by comparing the corresponding statistical indicators of the increments $X\_t^{(1)}$ and innovations $\varepsilon\_t^{(1)}$, which is explained below. Table 4 shows the above-mentioned estimators, obtained according to the previously described procedures. In addition, some other estimates are shown, such as the sample linear correlation $\hat{\rho}\_X(1)$ and the estimates of the value $b\_c$. Accordingly, note that the condition $-0.5 < \hat{\rho}\_X(1) < 0$ is fulfilled for both series. Moreover, notice, for instance, that the estimated values of $\sigma^2$ in the case of Series B are "close" to unity, so it can be assumed that the innovations $(\varepsilon\_t)$ in this case have a standard $\mathcal{N}(0, 1)$ distribution.

**Table 4.** Estimated values of GSB process parameters.


As we have already pointed out, the most robust estimators of the GSB process are $\hat{c}, \hat{\mu}, \hat{\sigma}^2$, and based on them, the modeled values of the series $(m\_t^{(j)})$ and $(\varepsilon\_t^{(j)})$ were obtained. Let us recall that these series, respectively, represent the stability and the impact of fluctuations in the dynamics of the total number of infected and vaccinated people. The agreement between the modeled series and the actual data can be seen in Figure 7a, where, along with the empirical values of the log-volumes $(y\_t^{(j)})$, the modeled values of the martingale means $(m\_t^{(j)})$ are given. On the other hand, the agreement of the series of increments, i.e., the Split-MA(1) process $(X\_t^{(j)})$, with the innovations $(\varepsilon\_t^{(j)})$ is shown in Figure 7b.

It should also be noted that the high agreement between the actual and modeled series is particularly noticeable in the case of Series A. This can be explained theoretically, in the way it was done in Section 2. If, at some points in time, the innovations $(\varepsilon\_t^{(1)})$ have a pronounced fluctuation, they become equal to the increments $(X\_t^{(1)})$ at the next moment. The agreement between the realizations of these two series will be all the better if, in addition to large and pronounced fluctuations of $(\varepsilon\_t^{(1)})$, the critical value $c$ is relatively small. Note that this is precisely the case with Series A, where "small" estimated values of the parameter $c$ indicate the possibility that the true value of this parameter is $c = 0$ (or, equivalently, $b\_c = 0$). If the sample size is large enough, this assumption can be formally tested via the null hypothesis $H\_0 : c = 0$ or, equivalently, $H\_0 : b\_c = 0$. According to Theorem 4, the testing procedures can be based on the normal distribution, that is, on some standard, well-known statistical tests.

Note that in that case, the series of increments $(X\_t^{(1)})$ coincides with the innovations $(\varepsilon\_t^{(1)})$. This implies that $(y\_t^{(1)})$ is a series with independent increments, i.e.,

$$X\_t^{(1)} = y\_t^{(1)} - y\_{t-1}^{(1)} = \varepsilon\_t^{(1)} \Longleftrightarrow y\_t^{(1)} = y\_{t-1}^{(1)} + \varepsilon\_t^{(1)}.\tag{31}$$

According to Equation (1), it follows that $y\_{t-1}^{(1)} = m\_t^{(1)}$, so all "information from the past" is contained in the previous realization of the series $(y\_t^{(1)})$. In this way, the entire statistical analysis of this series, i.e., of the dynamics of the infected population, gains simplicity; namely, series A then has (only) two stochastic components, $(y\_t^{(1)})$ and $(\varepsilon\_t^{(1)})$, i.e., it represents a random walk.

Finally, using the inverses of the transformations given in Equation (29), the PDFs of the actual series $(U\_t)$ and $(V\_t)$ are readily obtained:

$$f\_{U}(x,t) = \frac{1}{x}\, f\_{y}^{(1)}(\ln x, t), \qquad f\_{V}(x,t) = \frac{1}{x}\, f\_{y}^{(2)}(\ln x, t). \tag{32}$$

Here, $f\_y^{(j)}(\ln x, t)$, $j = 1, 2$, are the PDFs of the log-volumes $(y\_t^{(j)})$, obtained simply by differentiating the CDFs given by Equation (9). Still, these series are non-stationary, so their PDFs also depend on time, and some numerical procedures are needed to calculate them. For this purpose, the R package "distr" [42] has been used, and the results of the applied procedure are shown in Figure 8.

Figure 8 shows the empirical distributions, i.e., histograms of the number of infected and vaccinated persons per day, with their fitted PDFs, obtained using Equations (32). Due to the non-stationarity of the time series $(U\_t)$ and $(V\_t)$, and in order to compare the theoretical PDFs, the fitting was also performed for the PDFs $f\_U(x,t)$ and $f\_V(x,t)$ at times $t = 50, 100, \ldots, 500 < T = 529$ (shown with dashed lines in Figure 8). In the case of the infected population (series A), according to Equation (31) and the condition $c \approx 0$, it follows that the RVs $y\_t^{(1)}$ have (approximately) a normal $\mathcal{N}\big(\mu, (t+1)\sigma^2\big)$ distribution. Thus, the RVs $U\_t$ have (approximately) a log-normal distribution, shown with the solid line in Figure 8a. Note that this result is close to that obtained in [29]. Nevertheless, the distribution of the vaccinated population (series B), shown with the solid line in Figure 8b, has a more pronounced "peak" close to the origin. It can also be explained by the previous theoretical results, primarily Theorem 2, i.e., Equation (8), which concerns the asymptotic behavior of the main GSB series $(y\_t)$.
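Under the hypothesis $c \approx 0$, the random-walk representation (31) makes the normal approximation for $y\_t^{(1)}$ easy to check by simulation. The following is a minimal sketch (ours, not the authors' code), assuming i.i.d. Gaussian innovations and a hypothetical starting value $y\_0 \sim \mathcal{N}(\mu, \sigma^2)$; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, t, n_paths = 2.0, 0.1, 200, 50_000   # illustrative values only

# random-walk log-volumes: y_t = y_{t-1} + eps_t, with y_0 ~ N(mu, sigma^2)
y0 = rng.normal(mu, sigma, n_paths)
steps = rng.normal(0.0, sigma, (n_paths, t))
y_t = y0 + steps.sum(axis=1)       # should be approximately N(mu, (t+1)*sigma^2)
U_t = np.exp(y_t)                  # the volume itself is then approximately log-normal

var_emp, var_theory = y_t.var(), (t + 1) * sigma**2
print(var_emp, var_theory)
```

The empirical variance of $y\_t$ matches $(t+1)\sigma^2$, and a histogram of $U\_t$ reproduces the log-normal shape of the solid line in Figure 8a.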

**Figure 7.** Graphs of empirical and modeled data: (**a**) log-volumes (solid lines) and martingale means (dashed lines); (**b**) Split-MA(1) process (solid lines) and innovations series (dashed lines). The upper panels represent the dynamics of the COVID-19 infection (Series A), and the lower panels represent the dynamics of the vaccinated population (Series B).

**Figure 8.** Empirical distributions of actual data (histograms) and their fitted PDFs (lines), obtained by the proposed estimation procedure: (**a**) distribution of the infected population (Series A); (**b**) distribution of the vaccinated population (Series B).

#### **6. Conclusions**

The stochastic analysis of the GSB process presented in this paper confirms its suitability for modeling actual time series with pronounced fluctuations. The applied methods of dynamic and statistical analysis, based on this process, aim to capture the long-term tendency of the SARS-CoV-2 virus behavior, as well as of the immunization process. Along with other contemporary research, we hope this work can help the further development of successful methods for overcoming the pandemic. To this end, notice that new strains of the SARS-CoV-2 virus, which are very common, can affect the overall symptoms as well as the disease dynamics of COVID-19 (cf. [43–45]). They may therefore change the dynamics of both time series investigated here, which may be a new goal and motivation for future research.

Finally, let us emphasize that one of the main stochastic advantages of the GSB model is that it allows the simultaneous use of both stationary and non-stationary components. Thereby, the asymptotic behavior of the GSB time series, as well as of the corresponding estimators, is of particular importance. It should also be noted that the proposed parameter estimation procedure can be implemented algorithmically in a relatively simple way. Additionally, some other estimation methods, such as the Empirical Characteristic Function (ECF) method described in [12], can be used. As shown in [11,12], the GSB model can also be applied to other types of real data with pronounced and persistent fluctuations.

**Author Contributions:** Conceptualization, M.J.; data curation, M.J.; formal analysis, V.S.; methodology, K.K.; project administration, B.P.; software, K.K. and B.P.; supervision, V.S.; validation, P.Č.; visualization, P.Č.; writing—original draft, M.J., V.S. and K.K.; writing—review and editing, B.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Ministry of Education, Science and Technological Development of the Republic of Serbia (grant number III 47016).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank the Electronic Government of the Republic of Serbia and the Institute for Public Health "Milan Jovanović-Batut" for providing the datasets used in this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Yuri Yakubovich 1,\*, Oleg Rusakov 1 and Alexander Gushchin 2,3**


**Abstract:** We consider a sequence of i.i.d. random variables $(\xi) = (\xi\_i)\_{i = 0, 1, 2, \ldots}$, $\mathbb{E}\xi\_0 = 0$, $\mathbb{E}\xi\_0^2 = 1$, and subordinate it by a doubly stochastic Poisson process $\Pi(\lambda t)$, where $\lambda \ge 0$ is a random variable and $\Pi$ is a standard Poisson process. The subordinated continuous-time process $\psi(t) = \xi\_{\Pi(\lambda t)}$ is known as the PSI-process. The elements of the triplet $(\Pi, \lambda, (\xi))$ are supposed to be independent. For sums of $n$ independent copies of such processes, normalized by $\sqrt{n}$, we establish a functional limit theorem in the Skorokhod space $D[0,T]$, for any $T > 0$, under the assumption $\mathbb{E}|\xi\_0|^{2h} < \infty$ for some $h > 1/\gamma^2$. Here, $\gamma \in (0, 1]$ reflects the tail behavior of the distribution of $\lambda$; in particular, $\gamma \equiv 1$ when $\mathbb{E}\lambda < \infty$. The limit process is a stationary Gaussian process with the covariance function $\mathbb{E}\,\mathrm{e}^{-\lambda u}$, $u \ge 0$. As a sample application, we construct a martingale from the PSI-process and establish a convergence of normalized cumulative sums of such i.i.d. martingales.

**Keywords:** functional limit theorem; Poisson stochastic index process; pseudo-Poisson process; random intensity

**MSC:** 60F17; 60G10; 60G44

#### **1. Introduction**

The Poisson Stochastic Index process (PSI-process) represents a special kind of a random process when the discrete time of a random sequence is replaced by the continuous time of a "counting" process of a Poisson type.

Throughout this paper, we consider the triplet {Π, *λ*,(*ξ*)} of jointly independent components defined on a probability space {Ω, F, P}. Here, Π is a standard Poisson process on R<sup>+</sup> := {*t* ∈ R : *t* ≥ 0}, *λ* is an almost surely (a.s.) non-negative random variable, which plays the role of a random intensity, and (*ξ*) denotes a random sequence *ξ*0, *ξ*1, ... of independent and identically distributed (i.i.d.) random variables. Let us define a PSI-process in the following way:

$$\psi(t;\lambda) \equiv \psi(t) := \xi\_{\Pi(\lambda t)}, \qquad t \in \mathbb{R}\_{+}\,. \tag{1}$$

The mechanism of PSI-processes is reduced to sequential replacements of terms of the "driven" sequence (*ξ*) at arrival times of the "driving" doubly stochastic Poisson process Π(*λt*).
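The replacement mechanism just described is straightforward to simulate. Below is a minimal sketch (our illustration, not code from the paper) that samples one càdlàg path of $\psi(t) = \xi\_{\Pi(\lambda t)}$ for a fixed intensity; the function names are ours, and standard normal $\xi\_i$ are an arbitrary choice of driven sequence.

```python
import numpy as np

def psi_path(lam, T, rng):
    """Jump times of Pi(lam*t) on [0, T] and the xi-values held between them."""
    # arrival times via i.i.d. Exp(lam) inter-arrival gaps; the buffer size is
    # chosen so that the gaps almost surely run past T
    gaps = rng.exponential(1.0 / lam, size=max(int(3 * lam * T) + 20, 20))
    arrivals = np.cumsum(gaps)
    arrivals = arrivals[arrivals <= T]
    xi = rng.standard_normal(len(arrivals) + 1)    # xi_0, xi_1, ..., i.i.d. N(0,1)
    return arrivals, xi

def psi_at(t, arrivals, xi):
    # Pi(lam*t) = number of arrivals in [0, t]; psi(t) = xi_{Pi(lam*t)}
    return xi[np.searchsorted(arrivals, t, side="right")]

rng = np.random.default_rng(1)
arrivals, xi = psi_path(2.0, 5.0, rng)
print(len(arrivals), psi_at(0.0, arrivals, xi))
```

Between consecutive arrivals the path is constant, and at each arrival the current $\xi$-value is replaced by the next one, exactly as in the verbal description above.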

Let us introduce a "natural" filtration F ≡ (F*t*)*t*∈R<sup>+</sup> , generated by the PSI-process

$$\mathcal{F}\_{t} := \sigma\{\, \Pi(\lambda s),\ s \le t;\ \xi\_0, \dots, \xi\_k,\ k \le \Pi(\lambda t)\} \subset \mathcal{F}. \tag{2}$$

Note that if the distribution of $\xi\_0$ has no atoms, then the natural filtration $\mathbb{F}$ coincides with the filtration generated by a compound Poisson type process with the random intensity $\lambda$: $Y(t) := \sum\_{k=0}^{\Pi(\lambda t)} \xi\_k$, starting at the random point $\xi\_0$. (In the case when $\xi\_0$ has an atom at 0, some jumps of $\Pi(\lambda t)$ may be "missed" in $Y$; the process $Y$ is then known as *a stuttering compound Poisson* process. A similar phenomenon happens with a PSI-process when $\xi\_0$ has any atom, not necessarily at 0. For details, we refer to [1].)

**Citation:** Yakubovich, Y.; Rusakov, O.; Gushchin, A. Functional Limit Theorem for the Sums of PSI-Processes with Random Intensities. *Mathematics* **2022**, *10*, 3955. https://doi.org/10.3390/math10213955

Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

Received: 22 September 2022; Accepted: 21 October 2022; Published: 25 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

PSI-processes admit many interpretations. For instance, in insurance models and their applications: while a compound Poisson process *Y*(*t*) monitors the cumulative value of claims up to the current time *t*, the corresponding PSI-process *ψ*(*t*) monitors the last claim.

Another interpretation arises in models of information channels. Here, (*ξ*) plays the role of random loads on an information channel. The driving doubly stochastic Poisson process Π(*λt*) affects (*ξ*) in the following manner: at the arrival points of the driving process Π(*λt*), the current term of (*ξ*) is replaced with the next term.

In view of these interpretations, as well as from the point of view of classical probability theory, it makes sense to consider sums of independent PSI-processes. In this paper, we confine ourselves to the case when all terms in these sums are identically distributed PSI-processes and when the terms of the driven sequences have a finite second moment. Without loss of generality, we assume that $\mathbb{E}\xi\_0 = 0$ and $\mathbb{E}\xi\_0^2 = 1$. Let $\psi^{(k)}$, $k = 1, 2, \ldots$, denote independent copies of $\psi$. Note that the Poisson processes in the definition (1) are also independent in different copies, as well as the time-change factors $\lambda\_k \stackrel{d}{=} \lambda$, for any $k \in \mathbb{N}$. Introduce

$$\zeta\_n(t) := \frac{1}{\sqrt{n}} \sum\_{k=1}^n \psi^{(k)}(t; \lambda\_k), \qquad n \in \mathbb{N}, \quad t \ge 0,\tag{3}$$

the normalized cumulative sum. Note that $\zeta\_n$ is a stationary process for any $n$.

When one of the processes $\psi^{(1)}, \ldots, \psi^{(n)}$ changes its value, all the values of the other processes remain the same a.s. Hence, the change mechanism behind the sums of type (3) can be described as a projection of some information from the past to the future and a replacement of the other information with new independent values. This can be contrasted with autoregression schemes, which are based on contractions of information. This mechanism of projection survives after a passage to the limit as $n \to \infty$. Hence, if the limit exists in some sense, it has to be described by the so-called "trawl" or "upstairs represented" processes introduced by O. E. Barndorff-Nielsen [2,3] and by R. Wolpert and M. Taqqu [4], respectively. A relationship of PSI-processes with trawl processes is discussed briefly in [5].

Our main result is a functional limit theorem for the normalized cumulative sums (3) (Theorem 1): the random processes $\zeta\_n$ converge weakly, as $n \to \infty$, in the Skorokhod space of càdlàg functions defined on a compact interval $[0, T]$, $T > 0$. The limit process $\zeta$ is Gaussian, centered, and stationary, and its covariance function is $L\_\lambda(|t-s|)$, $s, t \in \mathbb{R}\_+$, where $L\_\lambda$ denotes the Laplace transform of the random intensity $\lambda$. In the simpler case of non-random intensity $\lambda$, the analogous functional limit theorem was established by the second author in [6]. In that case, the limit is necessarily an Ornstein–Uhlenbeck process. Introducing a random intensity significantly widens the class of possible limit processes but makes the proof of the corresponding functional limit theorem more involved. Our method of proof is essentially based on a detailed analysis of a modulus of continuity for the PSI-process.

In our research, we came upon the following interesting phenomenon, which occurs if $\mathbb{E}\lambda = +\infty$: the fatter the tail of $\lambda$ is, the more moments of $\xi\_0$ are needed for the relative compactness of the family $(\zeta\_n)\_{n\in\mathbb{N}}$. When $\mathbb{E}\lambda < \infty$, our method of proof requires just the condition $\mathbb{E}|\xi\_0|^{2+\varepsilon} < \infty$, for some $\varepsilon > 0$.

As an example of a functional of the PSI-process, we construct a martingale adapted to the natural filtration (F*t*) generated by the PSI-process defined in (2). Consider a pathwise integrated PSI-process

$$\Psi(t) := \int\_0^t \psi(s)ds\tag{4}$$

and define a so-called *M*-process associated with the PSI-process as

$$M(t; \lambda) \equiv M(t) := \lambda \Psi(t) + \psi(t) - \xi\_0, \qquad t \ge 0. \tag{5}$$

Suppose that *λ* is a positive constant and E*ξ*<sup>0</sup> = 0. Then, *M*(*t*) is an (F*t*)-martingale, starting at the origin. The proof presented in Section 3 is reduced to a direct calculation and exploits the fact that the pair (Ψ, *ψ*) is an R2-valued Markov process (moreover, a strong Markov process with respect to (F*t*)).
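For a fixed $\lambda$, the martingale property of (5) can be sanity-checked numerically: $M(t)$ starts at the origin, so $\mathbb{E}M(t) = 0$ for every $t$. A minimal Monte Carlo sketch (ours, with illustrative parameters and standard normal $\xi\_i$):

```python
import numpy as np

rng = np.random.default_rng(2)
lam, horizons, n = 1.5, (0.5, 1.0, 2.0), 40_000   # illustrative values

def sample_M(lam, T, rng):
    """One sample of M(T) = lam * Psi(T) + psi(T) - xi_0 for constant lam."""
    gaps = rng.exponential(1.0 / lam, 200)          # ample buffer of jump gaps
    jumps = np.cumsum(gaps)
    jumps = jumps[jumps <= T]
    knots = np.concatenate(([0.0], jumps, [T]))
    xi = rng.standard_normal(len(jumps) + 1)        # values held between jumps
    Psi = np.sum(xi * np.diff(knots))               # Psi(T) = integral of psi over [0, T]
    return lam * Psi + xi[-1] - xi[0]               # xi[-1] = psi(T), xi[0] = xi_0

means = []
for T in horizons:
    means.append(np.mean([sample_M(lam, T, rng) for _ in range(n)]))
print(means)
```

The sample means stay within Monte Carlo error of zero for every horizon, consistent with the martingale property.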

This example shows that the PSI-process *ψ*(*t*) is the stationary solution of the Langevin equation driven by the martingale *M*(*t*):

$$\mathrm{d}\psi(t) = -\lambda\psi(t)\,\mathrm{d}t + \mathrm{d}M(t). \tag{6}$$

As one of the consequences of our main result, we obtain as a limit the classical martingale $\sqrt{2\lambda}\,W(t)$, $t \ge 0$, which replaces $M(t)$ in (6). Here and below, $W(t)$ is a standard Brownian motion.

Note that if $\lambda$ is a non-degenerate random variable, then $M(t; \lambda)$ is not measurable with respect to $\mathcal{F}\_t$, and hence, it is not an $(\mathcal{F}\_t)$-martingale. However, if we supplement $\mathcal{F}\_0$ with $\sigma(\lambda)$ to generate an initially enlarged filtration $(\mathcal{F}\_t^\lambda)$, then the $M$-process becomes a local martingale with respect to the new adjusted filtration. If $\mathbb{E}\lambda < \infty$, then it is a martingale (see Proposition 2).

Suppose now, as usual, that $\mathbb{E}\xi\_0^2 = 1$. A direct application of Theorem VIII.3.46 of [7] (p. 481) allows us to obtain a functional limit theorem for the martingale $M(t)$, i.e., for

$$\overline{M}\_n(t) := \frac{1}{\sqrt{n}} \sum\_{i=1}^{n} M^{(i)}(t), \tag{7}$$

where $M^{(i)}(t)$, $i = 1, 2, \ldots$, are independent copies of $M(t)$. Here, the convergence takes place in the Skorokhod space, and the limit process is $\sqrt{2\mathbb{E}\lambda}\,W(t)$, $t \ge 0$.

The rest of the paper is organized as follows. In Section 2, we introduce some notation and formulate our main result, Theorem 1. In Section 3, the *M*-process described above is studied in some detail, as an example of an application of Theorem 1. In Section 4, we construct an example of a PSI-process whose normalized cumulative sums do not converge in the Skorokhod space, in order to show that some conditions are indeed necessary in a functional limit theorem. Section 5 collects some auxiliary facts about PSI-processes and their modulus of continuity. In Section 6, we study sums of PSI-processes and prove our main result. We finish the article with some conclusions in Section 7.

#### **2. Main Results**

Let (*ξ*)=(*ξ*0, *ξ*1, ...) be a sequence of random variables. Consider an independent of (*ξ*) standard Poisson process Π(*t*), *t* ≥ 0. Then, one can subordinate the sequence by the Poisson process to obtain a continuous time process

$$\psi(t) = \xi\_{\Pi(t)}, \qquad t \ge 0.$$

Consider also a non-negative random variable *λ*, which is independent of (*ξ*) and Π. The time-changed Poisson process Π(*λt*) is a Poisson process with random intensity, also known as (a specific case of) a Cox process or a doubly stochastic Poisson process. We consider the PSI-process with the random time-change

$$\psi(t;\lambda) = \xi\_{\Pi(\lambda t)}, \qquad t \ge 0. \tag{8}$$

We call *ψ*(*t*; *λ*) the *Poisson stochastic index process*, or PSI-process for short.

It turns out that if the random variables *ξi*, *i* = 0, 1, ..., are uncorrelated and have zero expectations and unit variances, then the covariance function of *ψ*(*t*; *λ*) is equal to the Laplace transform of *λ*:

$$L\_{\lambda}(u) = \mathbb{E} \, \mathbf{e}^{-\lambda u}, \qquad u \ge 0. \tag{9}$$

**Lemma 1.** *Let* $(\xi) = (\xi\_0, \xi\_1, \ldots)$ *be a sequence of uncorrelated random variables with* $\mathbb{E}\xi\_i \equiv 0$ *and* $\mathbb{E}\xi\_i^2 \equiv 1$*. Let* $\lambda$ *be a non-negative random variable and* $\Pi(t)$ *be a standard Poisson process. Suppose that* $(\xi)$*,* $\lambda$*, and* $\Pi$ *are mutually independent. Then, for any* $s, t \ge 0$,

$$\text{Cov}\left(\psi(s;\lambda),\psi(t;\lambda)\right) = L\_{\lambda}(|t-s|).$$

*In particular, ψ is a wide sense stationary process.*

**Proof.** First, note that $\mathbb{E}\,\psi(s;\lambda) = 0$ since every $\mathbb{E}\xi\_i = 0$. Hence, $\mathrm{Cov}(\psi(s;\lambda), \psi(t;\lambda)) = \mathbb{E}\,\psi(s;\lambda)\psi(t;\lambda)$. Suppose without loss of generality that $0 \le s \le t$. Given $\lambda$, one has

$$\begin{split} \mathbb{E}(\psi(s;\lambda)\psi(t;\lambda)\,|\,\lambda) &= \mathbb{E}\left(\xi\_{\Pi(\lambda s)}\xi\_{\Pi(\lambda t)}\,\middle|\,\lambda\right) \\ &= \mathbb{E}\left(\mathbf{1}\{\Pi(\lambda s) = \Pi(\lambda t)\}\,\middle|\,\lambda\right) \\ &= \mathbb{E}\left(\mathbf{1}\{\Pi(\lambda(t-s)) = 0\}\,\middle|\,\lambda\right) \\ &= \mathrm{e}^{-\lambda(t-s)}. \end{split}$$

Here and below, $\mathbf{1}\{A\}$ denotes the indicator of an event $A$. We used the assumption that $\mathbb{E}\,\xi\_i\xi\_j = \delta\_{ij}$, the Kronecker delta, and also the stationarity of the increments of the Poisson process. Taking the expectation with respect to $\lambda$ yields the result. $\square$
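Lemma 1 is easy to verify by Monte Carlo for a concrete intensity law. The sketch below (ours, not from the paper) takes $\lambda \sim \mathrm{Exp}(1)$, for which $L\_\lambda(u) = 1/(1+u)$, and exploits the fact used in the proof: $\psi(s)$ and $\psi(t)$ read the same $\xi$-value exactly when $\Pi(\lambda s) = \Pi(\lambda t)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, s, t = 400_000, 0.7, 1.9

lam = rng.exponential(1.0, n)            # lambda ~ Exp(1), so L(u) = 1/(1+u)
incr = rng.poisson(lam * (t - s))        # Pi(lam*t) - Pi(lam*s), given lambda

# psi(s) and psi(t) coincide iff no arrival occurred in (s, t]; otherwise the
# two values are independent N(0,1) draws
xi_s = rng.standard_normal(n)
xi_t = np.where(incr == 0, xi_s, rng.standard_normal(n))

cov_emp = np.mean(xi_s * xi_t)           # estimates Cov(psi(s), psi(t))
cov_theory = 1.0 / (1.0 + (t - s))       # L(|t - s|)
print(cov_emp, cov_theory)
```

The empirical covariance agrees with $L\_\lambda(|t-s|)$ up to Monte Carlo error, as the lemma predicts.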

**Remark 1.** *Unlike [8], we allow* $\lambda$ *to have an atom at* 0*, which implies that* $\lim\_{u\to\infty} L\_\lambda(u) = \mathbb{P}(\lambda = 0) > 0$*.*

**Corollary 1.** *Let the triplet* (Π, *λ*,(*ξ*)) *satisfy the assumptions of Lemma 1. Then, the processes* (*ζn*) *defined in* (3) *as normalized cumulative sums of independent copies of ψ*(*t*; *λ*) *converge in the sense of finite dimensional distributions (f.d.d.), as n* → ∞*, to a stationary centered Gaussian process ζ*(*t*) *with the covariance function* Cov(*ζ*(*s*), *ζ*(*t*)) = *Lλ*(|*t* − *s*|)*, s*, *t* ∈ R+*.*

**Proof.** This is an immediate consequence of the central limit theorem (CLT) for random vectors. Indeed, for any fixed time moments $0 \le t\_1 < \ldots < t\_d$, the random vectors $(\psi^{(k)}(t\_1; \lambda\_k), \ldots, \psi^{(k)}(t\_d; \lambda\_k))$ are i.i.d. for different $k$ and have zero mean and the covariance matrix

$$B = \left( L\_{\lambda}(|t\_i - t\_j|) \right)\_{i,j=1}^d. \qquad \square$$

Lemma 1 emphasizes a special role played by the Laplace transform *L<sup>λ</sup>* in the study of PSI-processes with random intensities. We will need asymptotics of the Laplace transform *L<sup>λ</sup>* in the right neighborhood of 0.

**Assumption 1.** *For some γ* ∈ (0, 1] *and any ε* > 0*, the Laplace transform* (9) *of λ satisfies*

$$1 - L\_{\lambda}(s) = o(s^{\gamma - \varepsilon}), \qquad s \downarrow 0. \tag{10}$$

It is well known that (10) holds with $\gamma = 1$ if $\mathbb{E}\lambda < \infty$, or with $\gamma \in (0, 1]$ if the tail $\mathbb{P}(\lambda > x)$ of $\lambda$ is regularly varying with index $-\gamma$ as $x \to \infty$; see, e.g., [9] (Theorem 8.1.6).
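For a heavy-tailed intensity, Assumption 1 can be checked numerically. The sketch below (ours) takes $\lambda = U^{-2}$ with $U$ uniform on $(0,1)$, so that $\mathbb{P}(\lambda > x) = x^{-1/2}$ and $\gamma = 1/2$; it computes $1 - L\_\lambda(s) = \int\_0^1 (1 - \mathrm{e}^{-s/u^2})\,\mathrm{d}u$ by quadrature and watches $(1 - L\_\lambda(s))/s^{\gamma - \varepsilon}$ decrease toward $0$ for $\varepsilon = 0.1$.

```python
import numpy as np

u = np.linspace(1e-8, 1.0, 2_000_001)    # quadrature grid on (0, 1]

def one_minus_laplace(s):
    # 1 - L(s) = E(1 - exp(-lambda*s)) with lambda = u^{-2}, u ~ Uniform(0,1),
    # evaluated by the trapezoidal rule
    vals = 1.0 - np.exp(-s / u**2)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(u)))

# ratio (1 - L(s)) / s^{gamma - eps} with gamma = 1/2, eps = 0.1
ratios = [one_minus_laplace(s) / s**0.4 for s in (1e-1, 1e-2, 1e-3, 1e-4)]
print(ratios)
```

Consistent with the Tauberian theorem cited above, $1 - L\_\lambda(s) \sim \Gamma(1/2)\sqrt{s} = \sqrt{\pi s}$ here, so the printed ratios decay like $s^{0.1}$.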

Below, we shall always suppose that the terms of the sequence $(\xi)$ are i.i.d., hence uncorrelated, and satisfy the assumptions of Lemma 1. By Corollary 1, the random processes $(\zeta\_n)$ have a limit $\zeta$ as $n \to \infty$, but only in the rather weak f.d.d. sense. The aim of this paper is to establish a stronger result, a functional limit theorem for $(\zeta\_n)$ in an appropriate functional space. If Assumption 1 holds, then the covariance function of the limit process $\zeta(t)$ behaves in a controllable way at 0, and $\zeta(t)$ has a version with almost surely continuous paths because $\gamma > 0$ in (10); see, e.g., [10] (§9.2). Our main result is that, under the additional moment assumption $\mathbb{E}|\xi\_0|^{2h} < \infty$ for some $h > 1/\gamma^2$ (where $\gamma$ is the exponent in (10)), the convergence indeed takes place in the Skorokhod space $D[0, T]$, for any $T > 0$.

**Theorem 1.** *Consider a triplet* $(\Pi, \lambda, (\xi))$ *that consists of a standard Poisson process* $\Pi$*, a non-negative random variable* $\lambda$ *satisfying Assumption 1, and a sequence* $(\xi) = (\xi\_0, \xi\_1, \ldots)$ *of i.i.d. random variables such that* $\mathbb{E}\xi\_0 = 0$ *and* $\mathbb{E}\xi\_0^2 = 1$*. The elements of the triplet are supposed to be independent and to satisfy the condition*

$$\mathbb{E}|\xi\_0|^{2h} < \infty \qquad \text{for some} \quad h > \frac{1}{\gamma^2}. \tag{11}$$

*Let* $(\Pi\_k, \lambda\_k, (\xi^{(k)}))$*, $k = 1, 2, \ldots$, be a sequence of independent copies of the triplet* $(\Pi, \lambda, (\xi))$*, let* $\psi^{(k)} \equiv \psi^{(k)}(t; \lambda\_k)$ *be the PSI-process* (1) *constructed from the $k$-th triplet, and let* $\zeta\_n$ *be defined by* (3)*. Then, for any* $T > 0$*, the sequence of stochastic processes* $(\zeta\_n(t))$ *converges in the Skorokhod space* $D[0, T]$*, as* $n \to \infty$*, to a zero-mean stationary Gaussian process* $\zeta(t)$ *with the covariance function* $\mathbb{E}\,\zeta(s)\zeta(t) = L\_\lambda(|s-t|)$*, $s, t \in [0, T]$.*

**Remark 2.** *Nowadays, it is common to consider weak convergence in the space* $D[0, \infty)$*. Due to specific features of our model (stationarity of* $\zeta\_n$ *for every* $n$*, continuity of* $\zeta$*), this would imply weak convergence in* $D[0, T]$ *for all* $T > 0$*. Since we essentially use results from Billingsley's book [11], which deals with* $D[0, T]$*, we prefer to formulate our results in* $D[0, T]$*,* $T > 0$*, as in Theorem 1.*

We prove Theorem 1 in Section 6 and now proceed with studying some of its consequences.

#### **3. Example: A PSI-Martingale**

Recall the definition (2) of the natural filtration F given in the Introduction. Note that since PSI-processes (with non-random *λ*) belong to the so-called class of "pseudo-Poisson processes" [12] (Ch. X), they have the Markov property with the following transition probabilities: for *x* ∈ R and *t*, *u* ∈ R+,

$$\begin{split} \mathbb{P}(\psi(t+u) \le x \mid \psi(t) = x\_0) &= \mathbb{P}(\Pi(\lambda u) > 0)\,\mathbb{P}(\xi\_0 \le x) + \mathbb{P}(\Pi(\lambda u) = 0)\,\mathbf{1}\{x\_0 \le x\} \\ &= \left(1 - \mathrm{e}^{-\lambda u}\right)\mathbb{P}(\xi\_0 \le x) + \mathrm{e}^{-\lambda u}\,\mathbf{1}\{x\_0 \le x\}. \end{split}$$

Denote by $\Psi(t) = \int\_0^t \psi(s)\,\mathrm{d}s$ the pathwise integrated PSI-process. Note that the pair $(\Psi, \psi)$ is an $\mathbb{R}^2$-valued Markov process, although $\Psi$ itself is not Markovian.

**Proposition 1** (The PSI-martingale)**.** *Assume that* $\xi\_0, \xi\_1, \ldots$ *are i.i.d. and* $\mathbb{E}\xi\_0 = 0$*. Then, for a non-random* $\lambda > 0$*, the stochastic process* $M(t)$ *defined in* (5) *is an* $\mathbb{F}$*-martingale starting at the origin, for* $t \in \mathbb{R}\_+$*.*

**Proof.** Let us introduce a slightly modified *M*-process

$$\mathcal{M}(t) := \lambda \Psi(t) + \psi(t) = M(t) + \xi\_0.$$

First, we show that it is an F-martingale starting at the random point *ξ*0. Since the pair (Ψ(*t*), *ψ*(*t*)) is a Markov process adapted to the filtration (F*t*), and M(*t*) is determined by (Ψ(*t*), *ψ*(*t*)), we have

$$\mathbb{E}(\mathcal{M}(t+u)\,|\,\mathcal{F}\_{t}) = \mathbb{E}(\mathcal{M}(t+u)\,|\,\Psi(t),\psi(t)), \qquad \forall\, u, t \ge 0. \tag{12}$$

Let $0 < T\_1 < T\_2 < \cdots$ be the jump times of the driving Poisson process $\Pi(\lambda t)$. Denote the random period $\theta(t) = \min\{T\_k : T\_k > t\} - t$; that is, the time for which the Poisson process $\Pi(\lambda s)$ does not change after time $t$. For each fixed $t$, the period $\theta(t)$ has the exponential distribution with intensity $\lambda$. Using this notation, we can calculate

$$\mathbb{E}(\psi(t+u)\,|\,\Psi(t),\psi(t)) = \psi(t)\,\mathbb{E}\,\mathbf{1}\{\theta(t) > u\} = \psi(t)\,\mathrm{e}^{-\lambda u},\tag{13}$$

$$\mathbb{E}(\Psi(t+u)\,|\,\Psi(t),\psi(t)) = \Psi(t) + \psi(t)\,\mathbb{E}\min\{\theta(t),u\} = \Psi(t) + \psi(t)\,\frac{1-\mathrm{e}^{-\lambda u}}{\lambda}.\tag{14}$$

Multiplying (14) by *λ* and adding (13), we obtain E(M(*t* + *u*)|Ψ(*t*), *ψ*(*t*)) = M(*t*), which proves the assertion about M(*t*) due to (12).

Now, the claim of Proposition 1 easily follows from *σ*(Ψ(*t*), *ψ*(*t*)) ⊂ F*<sup>t</sup>* and E(*ξ*0|F*t*) = *ξ*0.

As it has been mentioned in the Introduction, for a random non-degenerate *λ*, the process *M*(*t*) is not F*t*-measurable, and the filtration F should be augmented by *σ*(*λ*):

$$\mathcal{F}\_t^\lambda := \sigma\{\, \Pi(\lambda s),\ s \le t;\ \xi\_0, \dots, \xi\_k,\ k \le \Pi(\lambda t);\ \lambda\,\}; \qquad \mathbb{F}^\lambda := (\mathcal{F}\_t^\lambda)\_{t \in \mathbb{R}\_+}\,. \tag{15}$$

The following analog of Proposition 1 holds, but the proof is trickier.

**Proposition 2** (The PSI-martingale with random intensity)**.** *Assume that* $(\xi) = (\xi\_0, \xi\_1, \ldots)$ *is a sequence of i.i.d. random variables with* $\mathbb{E}\xi\_0 = 0$*,* $\Pi = \Pi(t)$ *is a standard Poisson process, the random variable* $\lambda$ *is a.s. positive, and* $\lambda$*,* $(\xi)$*, and* $\Pi$ *are independent. Then, the stochastic process* $M(t; \lambda)$*, $t \ge 0$, defined in* (5) *is a local martingale with respect to* $\mathbb{F}^\lambda$*. If* $\mathbb{E}\lambda < \infty$*, then* $M(t)$ *is a martingale.*

**Proof.** Let $0 < \tau\_1 < \tau\_2 < \ldots$ be the jump times of the Poisson process $\Pi(t)$ and $T\_k := \tau\_k/\lambda$ the corresponding jump times of the process $\Pi(\lambda t)$. Recall that the filtrations $\mathbb{F} = (\mathcal{F}\_t)\_{t \ge 0}$ and $\mathbb{F}^\lambda = (\mathcal{F}\_t^\lambda)\_{t \ge 0}$ are defined in (2) and (15), respectively. It is easy to check that a set $A \in \mathcal{F}$ belongs to $\mathcal{F}\_t$ (resp. to $\mathcal{F}\_t^\lambda$), $t \ge 0$, if and only if $A \cap \{T\_k \le t < T\_{k+1}\} = A \cap \{\Pi(\lambda t) = k\} \in \mathcal{G}\_k$ (resp. $A \cap \{T\_k \le t < T\_{k+1}\} \in \mathcal{G}\_k^\lambda$) for every $k = 0, 1, \ldots$. Here,

$$\mathcal{G}\_k := \sigma\{T\_1, \dots, T\_k;\ \xi\_0, \dots, \xi\_k\} = \sigma\{\tau\_1, \dots, \tau\_k;\ \xi\_0, \dots, \xi\_k\},$$

the latter equality holding if $\lambda = \mathrm{const}$, and

$$\mathcal{G}\_k^\lambda := \sigma\{T\_1, \dots, T\_k;\ \xi\_0, \dots, \xi\_k;\ \lambda\,\} = \sigma\{\tau\_1, \dots, \tau\_k;\ \xi\_0, \dots, \xi\_k;\ \lambda\,\}.$$

In particular, the filtrations $(\mathcal{F}\_t)\_{t \ge 0}$ and $(\mathcal{F}\_t^\lambda)\_{t \ge 0}$ are right-continuous. First, we calculate the $\mathbb{F}^\lambda$-compensator of the locally integrable process

$$\Pi(\lambda t) = \sum\_{n=1}^{\infty} \mathbf{1}\{t \ge T\_n\}.$$

Since, for $\lambda = \mathrm{const}$, $\Pi(\lambda t)$ is a Poisson process with intensity $\lambda$, its $\mathbb{F}$-compensator is $\lambda t$. This means that $\Pi(\lambda t) - \lambda t$ is an $\mathbb{F}$-martingale. Denoting $N(t) := \Pi(t) - t$, this can be written as

$$\mathbb{E}\left\{ \left( N(\lambda t) - N(\lambda s) \right) \mathbf{1}\{\Pi(\lambda s) = k\}\, f(\tau\_1, \dots, \tau\_k;\ \xi\_0, \dots, \xi\_k) \right\} = 0$$

for every $0 \le s < t$, $k = 0, 1, \ldots$, and any bounded Borel function $f$ from $\mathbb{R}^{2k+1}$ to $\mathbb{R}$. Consider now the case of a random $\lambda$. Note that $\mathbb{E}\,\Pi(\lambda t)\mathbf{1}\{\lambda \le k\} \le kt < \infty$ for any $t$ and $k \ge 1$. This allows us to take a conditional expectation given $\lambda$ in the expression below, where $f$ is as above and $g$ is a bounded measurable function from $\mathbb{R}$ to $\mathbb{R}$:

$$\begin{split} & \mathbb{E}\left\{ \left( N(\lambda t) - N(\lambda s) \right) \mathbf{1}\{\Pi(\lambda s) = k\}\, f(\tau\_1, \dots, \tau\_k;\ \xi\_0, \dots, \xi\_k)\, g(\lambda)\, \mathbf{1}\{\lambda \le k\} \right\} \\ &= \mathbb{E}\, \mathbb{E}\left\{ \left( N(\lambda t) - N(\lambda s) \right) \mathbf{1}\{\Pi(\lambda s) = k\}\, f(\tau\_1, \dots, \tau\_k;\ \xi\_0, \dots, \xi\_k)\, g(\lambda)\, \mathbf{1}\{\lambda \le k\} \, \middle| \, \lambda \right\} = 0. \end{split}$$

This means

$$0 = \mathbb{E}\left\{ \left( N(\lambda t) - N(\lambda s) \right) \mathbf{1} \{ \lambda \le k \} \Big| \mathcal{F}\_s^{\lambda} \right\} = \mathbb{E}\left( N(\lambda t \wedge \sigma\_k) - N(\lambda s \wedge \sigma\_k) \big| \mathcal{F}\_s^{\lambda} \right),$$

where *σ<sup>k</sup>* = 0 if *λ* > *k* and *σ<sup>k</sup>* = +∞ otherwise. We conclude that *N*(*λt*) is an F*λ*-local martingale, and *λt* is the F*λ*-compensator of Π(*λt*).

The same proof shows that

$$K(\lambda t) := \sum\_{n=1}^{\infty} \xi\_n \mathbf{1}\{t \ge T\_n\}$$

is an $\mathbb{F}^\lambda$-local martingale. Indeed, it is a compound Poisson process with zero mean; hence, it is itself an $\mathbb{F}$-martingale for a deterministic $\lambda$. To ensure that the corresponding expectation is finite, we note that $\mathbb{E}\,|K(\lambda t)|\,\mathbf{1}\{\lambda \le k\} \le \sum\_{n=1}^{\infty} \mathbb{E}|\xi\_n|\,\mathbb{P}(t \ge T\_n,\ \lambda \le k) \le \mathbb{E}|\xi\_0|\; \mathbb{E}\,\Pi(\lambda t)\,\mathbf{1}\{\lambda \le k\} < \infty$.

The final step of the proof is to determine the F*λ*-compensator of the process

$$J(\lambda t) := \sum\_{n=1}^{\infty} \xi\_{n-1} \mathbf{1} \{ t \ge T\_n \} $$

We can represent *J*(*λt*) as the pathwise Lebesgue–Stieltjes integral of a predictable process

$$H(\lambda t) := \sum\_{n=1}^{\infty} \xi\_{n-1} \mathbf{1} \{ T\_{n-1} < t \le T\_n \} $$

with respect to Π(*λt*). Note that the integral process

$$\int\_{(0,t]} H(\lambda s)\,\mathrm{d}\Pi(\lambda s)$$

is a process with F*λ*-locally integrable variation because its variation up to *σ<sup>k</sup>* is estimated from above similarly to *K*(*λt*). This allows us to conclude that the F*λ*-compensator of *J*(*λt*) is the Lebesgue–Stieltjes integral process of *H*(*λt*) with respect to the F*λ*-compensator of Π(*λt*), see, e.g., Theorem 2.21 (2) in [13], i.e., the F*λ*-compensator of *J*(*λt*) equals

$$\int\_{(0,t]} H(\lambda s)\, \lambda\, \mathrm{d}s = \lambda \Psi(t).$$

Summarizing, we obtain that the F*λ*-compensator of

$$
\psi(t) - \xi\_0 = \sum\_{n=1}^{\infty} (\xi\_n - \xi\_{n-1}) \mathbf{1} \{ t \ge T\_n \} = K(\lambda t) - J(\lambda t),
$$

that is, $-\lambda\Psi(t)$. Consequently, $M(t) = \lambda\Psi(t) + \psi(t) - \xi\_0$ is an $\mathbb{F}^\lambda$-local martingale.

Finally, the quadratic variation of *M* is

$$[M, M]\_t = \sum\_{k=1}^{\infty} \left(\xi\_k - \xi\_{k-1}\right)^2 \mathbf{1}\left\{T\_k \le t\right\}.\tag{16}$$

Hence, if E*λ* < ∞,

$$\begin{split} \mathbb{E}\,[M,M]\_t^{1/2} &\le \mathbb{E} \sum\_{k=1}^{\infty} |\xi\_k - \xi\_{k-1}|\, \mathbf{1}\{T\_k \le t\} \\ &\le 2\,\mathbb{E}|\xi\_0|\; \mathbb{E} \sum\_{k=1}^{\infty} \mathbf{1}\{T\_k \le t\} = 2\,\mathbb{E}|\xi\_0|\; \mathbb{E}\,\Pi(\lambda t) = 2t\, \mathbb{E}|\xi\_0|\, \mathbb{E}\lambda. \end{split}$$

Therefore, *M*(*t*) is a martingale according to Davis' inequality (see [14] (Ch. 9)).

If we assume also that $\mathbb{E}\xi\_0^2 = 1$, then the F*λ*-martingale *M*(*t*) satisfies $\mathbb{E} M(t)^2 < \infty$ for all *t* ∈ R+. Its quadratic variation is calculated in (16). The variance of *M*(*t*) can then be calculated as follows:

$$\text{Var}\,M(t) = \mathbb{E}\left[M, M\right]\_t = \mathbb{E}\sum\_{k=1}^{\infty} \left(\xi\_k - \xi\_{k-1}\right)^2 \mathbf{1}\{T\_k \le t\} = \mathbb{E}(\xi\_1 - \xi\_0)^2\, \mathbb{E}\,\Pi(\lambda t) = 2t\,\mathbb{E}\lambda.$$

If E*λ* < ∞ (in particular, if *λ* is not random), then the variance of *M*(*t*) is finite for any *t* ∈ R+. Hence, direct application of Theorem VIII.3.46 [7] (p. 481) allows us to obtain a functional limit theorem for properly normalized sums of independent copies *M*(*i*)(*t*), *i* = 1, 2, . . . , of the martingale *M*(*t*), i.e., for the processes

$$\overline{M}\_n(t) := \frac{1}{\sqrt{n}} \sum\_{i=1}^n M^{(i)}(t), \qquad n = 1, 2, \dots, \quad t \ge 0.$$

Here, the convergence takes place in the Skorokhod space, and the limit process is $\sqrt{2\mathbb{E}\lambda}\, W(t)$, where *W*(*t*), *t* ≥ 0, is a standard Brownian motion.

Assume now that *λ* > 0 is non-random. It is easy to see that the mapping $(\psi(t))\_{t\in[0,T]} \mapsto (M(t))\_{t\in[0,T]}$ is continuous in the Skorokhod space *D*[0, *T*], for any *T* > 0. Hence, as a corollary of Theorem 1, we recover the above result that the convergence $\overline{M}\_n \to \sqrt{2\lambda}\,W$ takes place in the Skorokhod space, under the condition that $\mathbb{E}|\xi\_0|^{2+\varepsilon} < \infty$ for some *ε* > 0.
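For illustration (not part of the original argument), the identity $\text{Var}\,M(t) = 2t\,\mathbb{E}\lambda$ can be checked by a short Monte Carlo simulation; the choices *λ* ~ Exp(1) and standard normal *ξ*'s below are ours and serve only as an example of a law with $\mathbb{E}\lambda < \infty$ and $\mathbb{E}\xi\_0 = 0$, $\mathbb{E}\xi\_0^2 = 1$:

```python
import math
import random

random.seed(1)

def poisson(mu):
    # Knuth's multiplicative method; adequate for small mu
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

t = 1.0
trials = 200_000
acc = 0.0
for _ in range(trials):
    lam = random.expovariate(1.0)          # illustrative choice: lambda ~ Exp(1), E lambda = 1
    n_jumps = poisson(lam * t)             # Pi(lambda t): number of jumps on [0, t]
    xi = [random.gauss(0.0, 1.0) for _ in range(n_jumps + 1)]  # xi_0, ..., xi_N
    # quadratic variation [M, M]_t as in (16)
    acc += sum((xi[k] - xi[k - 1]) ** 2 for k in range(1, n_jumps + 1))
quad_var = acc / trials                    # estimates E[M, M]_t = Var M(t) = 2 t E lambda
```

With these parameters, the estimate stays within a few hundredths of the exact value $2t\,\mathbb{E}\lambda = 2$.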

#### **4. Counterexample: Diverging Sums**

For *<sup>β</sup>* > 1, denote *μβ* = *<sup>β</sup> <sup>β</sup>*−<sup>1</sup> and consider a function

$$f\_{\beta}(x) = \begin{cases} \beta(x + \mu\_{\beta})^{-\beta - 1}, & x \ge -1/(\beta - 1), \\ 0, & x < -1/(\beta - 1), \end{cases}$$

of *x* ∈ R. This is a probability density. Let *ξ* be a random variable with this density; then, by the choice of *μβ*, the mean $\mathbb{E}\xi = 0$ for any *β* > 1, and $\operatorname{Var}\xi = \frac{\beta}{(\beta-2)(\beta-1)^2} < \infty$ for any *β* > 2. Moreover, all absolute moments of non-negative order less than *β* exist, while $\mathbb{E}|\xi|^{\beta} = \infty$. The tail distribution function is $\mathbb{P}(\xi > x) = (x + \mu\_\beta)^{-\beta}$ for *x* ≥ −1/(*β* − 1). Let (*ξ*)=(*ξ*0, *ξ*1, . . . ) be a sequence of i.i.d. random variables distributed as *ξ*.
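As a quick numerical check (our illustration, with *β* = 5), the tail formula yields the inverse-transform sampler $\xi = U^{-1/\beta} - \mu\_\beta$ with *U* uniform on (0, 1], and the sample moments match $\mathbb{E}\xi = 0$ and $\operatorname{Var}\xi = \beta/((\beta-2)(\beta-1)^2)$:

```python
import random

random.seed(2)

beta = 5.0                                 # illustrative choice, beta > 2
mu_beta = beta / (beta - 1.0)

def sample_xi():
    # inverse-transform sampling from the tail P(xi > x) = (x + mu_beta)^(-beta)
    u = 1.0 - random.random()              # uniform on (0, 1], avoids u = 0
    return u ** (-1.0 / beta) - mu_beta

n = 400_000
xs = [sample_xi() for _ in range(n)]
mean = sum(xs) / n
var = sum(x * x for x in xs) / n - mean * mean

target_var = beta / ((beta - 2.0) * (beta - 1.0) ** 2)   # = 5/48 for beta = 5
```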

For *α* > 0, let *λ* be independent of (*ξ*) and have the tail distribution function $\mathbb{P}(\lambda > x) = (x+1)^{-\alpha}$ for *x* ≥ 0. The Laplace transform of *λ* can be expressed in terms of the (upper) incomplete Gamma function

$$
\Gamma(a, x) = \int\_{x}^{\infty} \mathrm{e}^{-y} y^{a-1} \,\mathrm{d}y.
$$

By a simple change of variables, we obtain

$$L\_{\lambda}(s) = \mathbb{E}\,\mathrm{e}^{-s\lambda} = \alpha \mathrm{e}^{s} s^{\alpha}\Gamma(-\alpha,s), \quad s > 0. \tag{17}$$

The asymptotics of *Lλ*(*s*) as *s* ↓ 0 can be read, say, from Theorem 8.1.6 [9] (p. 333): as *s* ↓ 0,

$$1 - L\_{\lambda}(s) \sim \begin{cases} \Gamma(1 - \alpha)s^{\alpha}, & \alpha \in (0, 1), \\ s \log \frac{1}{s}, & \alpha = 1, \\ \frac{s}{\alpha - 1}, & \alpha > 1. \end{cases}$$

Hence, *λ* satisfies Assumption 1 with *γ* = min{*α*, 1}.
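The first asymptotic regime can be illustrated numerically (our sketch, with *α* = 1/2 and *s* = 10<sup>−4</sup>); a Monte Carlo estimate of 1 − *Lλ*(*s*) is compared with $\Gamma(1-\alpha)s^{\alpha}$:

```python
import math
import random

random.seed(3)

alpha = 0.5                     # illustrative tail index, alpha in (0, 1)
s = 1e-4                        # small argument of the Laplace transform

n = 1_000_000
acc = 0.0
for _ in range(n):
    u = 1.0 - random.random()   # uniform on (0, 1]
    lam = u ** (-1.0 / alpha) - 1.0         # inverse transform: P(lambda > x) = (x+1)^(-alpha)
    acc += 1.0 - math.exp(-s * lam)
one_minus_L = acc / n

predicted = math.gamma(1.0 - alpha) * s ** alpha   # leading term Gamma(1-alpha) s^alpha
ratio = one_minus_L / predicted                    # approaches 1 as s decreases
```

The ratio is close to, but slightly below, 1 at this *s*, reflecting the next-order correction of order *s*<sup>1−*α*</sup>.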

Let Π(*t*) be a standard Poisson process, independent of both (*ξ*) and *λ*. Define a PSI-process *ψ*(*t*; *λ*) with the random intensity *λ* as in (1).

Consider independent copies *ψ*(*k*)(*t*; *λk*), *k* = 1, 2, ... , where *λ<sup>k</sup>* are independent copies of *λ*, and let (*ζn*(*t*)) be their normalized cumulative sums, as in (3). The CLT for vectors implies that, for *β* > 2 and *α* > 0, in terms of finite-dimensional distributions, the processes (*ζn*) converge, as *n* → ∞, to a stationary centered Gaussian process with the covariance function *β*(*β* − 2)−1(*β* − 1)−2*Lλ*(*u*), *u* ≥ 0. We claim that, nevertheless, for certain parameters *α* > 0 and *β* > 2, the functional limit theorem cannot hold true for these (*ζn*). The proof is based on the following technical result.

**Proposition 3.** *One can find n*<sup>0</sup> *such that for any n* ≥ *n*0*, with probability not less than* 1/16*, one of the PSI-processes ψ*(1)(*t*; *λ*1), . . . , *ψ*(*n*)(*t*; *λn*) *has a jump of size at least n*1/(*αβ*)*, for t* ∈ [0, 1]*.*

**Proof.** Define for *n* = 1, 2,. . .

$$\mu\_n := \max\{\lambda\_1, \dots, \lambda\_n\}.$$

The cumulative distribution function of *μ<sup>n</sup>* is

$$F\_n(x) := \mathbb{P}(\mu\_n \le x) = \left(1 - (x + 1)^{-\alpha}\right)^n, \qquad x \ge 0.$$

Notice that $\lim\_{n\to\infty} F\_n(n^{1/\alpha}) = \mathrm{e}^{-1}$. Hence, for large enough *n*, there exists κ ∈ {1, . . . , *n*} such that $\lambda\_\varkappa \ge n^{1/\alpha}$ with probability not less than 1/2. Since $\Pi\_\varkappa$ is independent of *λ*κ and the Poisson distribution is asymptotically symmetric around its mean as the parameter becomes large, we may claim that $\mathbb{P}(\Pi\_\varkappa(\lambda\_\varkappa) > n^{1/\alpha} \mid \lambda\_\varkappa \ge n^{1/\alpha}) > 1/3$. Hence, with probability not less than 1/6, among the PSI-processes *ψ*(1), . . . , *ψ*(*n*), at least one process *ψ*(κ) engages more than $n^{1/\alpha}$ random variables $(\xi^{(\varkappa)}\_i)$ on the time interval [0, 1]; that is, $\Pi\_\varkappa(\lambda\_\varkappa) \ge m := \lfloor n^{1/\alpha}\rfloor + 1$. Here and below, for *x* ∈ R, we denote by $\lfloor x\rfloor = \max\{n \in \mathbb{Z} : n \le x\}$ the floor function.
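The limit $F\_n(n^{1/\alpha}) \to \mathrm{e}^{-1}$ is easy to confirm numerically (illustrative *α* = 1/2, values of *n* chosen by us):

```python
import math

alpha = 0.5                      # illustrative value
values = []
for n in (10 ** 3, 10 ** 5, 10 ** 7):
    x = n ** (1.0 / alpha)                            # threshold n^(1/alpha)
    values.append((1.0 - (x + 1.0) ** (-alpha)) ** n)  # F_n(n^(1/alpha))
limit = math.exp(-1.0)           # values approach e^{-1} ≈ 0.3679
```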

Consider now $\eta\_{\varkappa,m} := \max\{\xi^{(\varkappa)}\_1, \dots, \xi^{(\varkappa)}\_m\}$. For any fixed *n*, the variables $\xi^{(\varkappa)}\_1, \dots, \xi^{(\varkappa)}\_m$ are i.i.d., and $\eta\_{\varkappa,m}$ has the cumulative distribution function

$$G\_m(x) := \mathbb{P}(\eta\_{\varkappa, m} \le x) = \left(1 - (x + \mu\_{\beta})^{-\beta}\right)^m, \qquad x > -1/(\beta - 1),$$

and $\eta\_{\varkappa,m} > m^{1/\beta}$ with probability not less than 1/2 for all *m* large enough, because $G\_m(m^{1/\beta}) = \big(1 - (m^{1/\beta} + \mu\_\beta)^{-\beta}\big)^m \to \mathrm{e}^{-1}$ as *m* → ∞. This maximum is attained at some $\xi^{(\varkappa)}\_j$, and with probability 3/4 at least one of $\xi^{(\varkappa)}\_{j-1}$ and $\xi^{(\varkappa)}\_{j+1}$ is less than $2^{1/\beta} - \mu\_\beta < 0$. (We neglect the situation when the maximum is attained for *j* = 1 or *j* = *m*, which happens with probability 2/*m*; see, e.g., [15].) It means that, for large *m*, *ψ*(κ)(*t*; *λ*κ) has at least one jump greater than $m^{1/\beta}$, with probability at least 3/8.

Combining the above estimates and using the independence between Π(*λ*κ*t*) and the corresponding driving sequence $(\xi^{(\varkappa)})$, we see that, with probability not less than 1/16, the process *ψ*(κ)(*t*; *λ*κ), *t* ∈ [0, 1], has a jump of size at least $m^{1/\beta} \ge n^{1/(\alpha\beta)}$, for all *n* ≥ *n*<sup>0</sup> = *n*0(*α*, *β*).
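Two ingredients of this proof can be verified directly (our illustration, with *β* = 3): a fixed neighbour falls below $2^{1/\beta} - \mu\_\beta$ with probability exactly 1/2, and $G\_m(m^{1/\beta}) \to \mathrm{e}^{-1}$:

```python
import math

beta = 3.0                                   # illustrative value, beta > 1
mu_beta = beta / (beta - 1.0)

# P(xi < 2^(1/beta) - mu_beta) = F(x0) = 1 - (2^(1/beta))^(-beta) = 1/2 exactly:
x0 = 2.0 ** (1.0 / beta) - mu_beta
p_below = 1.0 - (x0 + mu_beta) ** (-beta)

# G_m(m^(1/beta)) tends to e^{-1}:
m = 10 ** 9
G = (1.0 - (m ** (1.0 / beta) + mu_beta) ** (-beta)) ** m
```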

Since all these PSI-processes jump at different moments of time a.s., the jump of any process is not compensated by the other PSI-processes and makes a contribution to *ζn*. If $\alpha\beta \le 2$, then after the scaling by $\sqrt{n}$ in (3), the size of the jump that exists according to Proposition 3 exceeds $n^{1/(\alpha\beta)-1/2} \ge 1$, so it does not vanish as *n* → ∞. Hence, the limit in the Skorokhod space *D*[0, 1], if it exists, should have jumps with positive probability. However, it is well known that the stationary Gaussian process with the covariance function $\mathrm{const}\cdot L\_\lambda(u)$, *u* ≥ 0, where *Lλ*(*u*) is given by (17), has a continuous modification a.s. This contradiction shows that the convergence *ζ<sup>n</sup>* → *ζ* cannot take place in *D*[0, 1] as *n* → ∞.

**Remark 3.** *The considered counterexample suggests that the correct condition for the functional limit theorem could be* $\mathbb{E}|\xi\_0|^{2h} < \infty$ *for some h* > 1/*γ. Theorem 1 is proved under the more restrictive condition h* > 1/*γ*<sup>2</sup>*. In the case* E*λ* < ∞*, Assumption 1 holds with γ* = 1*, so both inequalities become h* > 1*. In the more interesting case* E*λ* = ∞*, we conjecture that the less restrictive inequality h* > 1/*γ should be enough. The only place in our proof where we need h* > 1/*γ*<sup>2</sup> *is Lemma 4, which is proved with a straightforward and rather rough approach. A more sophisticated technique is needed to show that the same or a similar result holds if h* > 1/*γ.*

#### **5. Modulus of Continuity for PSI-Processes with Random Intensity**

We need to bound the probability of large changes of the PSI-process with random intensity. The following result builds a base for such bounds.

**Proposition 4.** *Consider a PSI-process ψ defined by* (1)*. Then, for any fixed δ* > 0*,*

$$\mathbb{P}\left(\sup\_{0\le t\le \delta}|\psi(t;\lambda)-\psi(0;\lambda)|\ge r\right) = \int\_{-\infty}^{\infty} \left[1 - L\_{\lambda}\left(\delta(1 - F(\mathbf{x} + r) + F(\mathbf{x} - r))\right)\right] dF(\mathbf{x}) \tag{18}$$

*at least for all r* > 0 *such that F*(*x*) *and F*(*x* + *r*) *have no common discontinuity points.*

**Proof.** Suppose first that *λ* is fixed. If there are no jumps of Π(*λt*) on [0, *δ*], then *ψ*(*t*; *λ*) = *ψ*(0; *λ*) = *ξ*<sup>0</sup> for all *t* ∈ [0, *δ*]. If Π(*λt*) has *k* > 0 jumps on [0, *δ*], then

$$\sup\_{0 \le t \le \delta} |\psi(t; \lambda) - \psi(0; \lambda)| = \max\{ |\xi\_1 - \xi\_0|, \dots, |\xi\_k - \xi\_0| \}.$$

Since (*ξi*) are i.i.d., conditioning on the value of *ξ*<sup>0</sup> = *x*, we obtain

$$\mathbb{P}\left(\max\{|\xi\_1 - \xi\_0|, \dots, |\xi\_k - \xi\_0|\} < r\right) = \int\_{-\infty}^{\infty} \mathbb{P}\left(|\xi\_1 - x| < r\right)^k dF(x)$$

and if *F*(*x*) and *F*(*x* + *r*) have no common discontinuities as functions of *x*, it implies

$$\mathbb{P}\left(\max\{|\xi\_1 - \xi\_0|, \dots, |\xi\_k - \xi\_0|\} \ge r\right) = 1 - \int\_{-\infty}^{\infty} \left(F(x + r) - F(x - r)\right)^k dF(x).$$

For a fixed *λ*, the process Π(*λt*) has *k* jumps on [0, *δ*] with probability $\frac{(\lambda\delta)^k}{k!}\mathrm{e}^{-\lambda\delta}$, so by the law of total probability,

$$\begin{split} \mathbb{P}\Big(\sup\_{0\le s\le\delta} \left|\psi(s;\lambda)-\psi(0;\lambda)\right|\ge r\; \Big|\ \lambda\Big) \\ =\sum\_{k=1}^{\infty} \Big(1-\int\_{-\infty}^{\infty} \left(F(\mathbf{x}+r)-F(\mathbf{x}-r)\right)^{k} dF(\mathbf{x})\Big) \frac{(\lambda\delta)^{k}}{k!} \mathbf{e}^{-\lambda\delta} \\ = 1-\mathbf{e}^{-\lambda\delta}-\mathbf{e}^{-\lambda\delta} \int\_{-\infty}^{\infty} \left(\exp\left(\lambda\delta\left(F(\mathbf{x}+r)-F(\mathbf{x}-r)\right)\right)-1\right) dF(\mathbf{x}) \\ = \int\_{-\infty}^{\infty} \Big(1-\exp\left(-\lambda\delta\left(1-F(\mathbf{x}+r)+F(\mathbf{x}-r)\right)\right)\Big) dF(\mathbf{x}), \end{split}$$

where changing the order of summation and integration is justified by Fubini's theorem, and the last line follows by simple manipulations using $\int\_{-\infty}^{\infty} dF(x) = 1$. The claim (18) follows by taking the expectation with respect to *λ*; again, the order of integration can be changed by Fubini's theorem.
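The identity (18) can be checked by simulation for a concrete choice of the ingredients (ours: *λ* ~ Exp(1), so $L\_\lambda(s) = 1/(1+s)$, and standard normal *ξ*'s):

```python
import math
import random

random.seed(4)
delta, r = 0.5, 1.0                 # illustrative parameters
trials = 300_000

def poisson(mu):
    # Knuth's multiplicative method; adequate for small mu
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Left-hand side of (18): direct simulation of sup |psi(t) - psi(0)| over [0, delta]
hits = 0
for _ in range(trials):
    lam = random.expovariate(1.0)            # lambda ~ Exp(1), L_lambda(s) = 1/(1+s)
    k = poisson(lam * delta)                 # number of jumps on [0, delta]
    if k > 0:
        xi0 = random.gauss(0.0, 1.0)
        if max(abs(random.gauss(0.0, 1.0) - xi0) for _ in range(k)) >= r:
            hits += 1
lhs = hits / trials

# Right-hand side of (18): average of 1 - L_lambda(delta(1 - F(x+r) + F(x-r))) over x ~ F
acc = 0.0
for _ in range(trials):
    x = random.gauss(0.0, 1.0)
    s = delta * (1.0 - Phi(x + r) + Phi(x - r))
    acc += 1.0 - 1.0 / (1.0 + s)
rhs = acc / trials
```

Both Monte Carlo estimates agree to within sampling error, as (18) predicts.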

The equality (18) easily implies a bound for the probability in the left-hand part of (18) in terms of the so-called concentration function of a random variable *ξ*, defined as

$$Q\_{\xi}(r) = \sup\_{x \in \mathbb{R}} \mathbb{P}(x \le \xi \le x + r).$$

Indeed, $1 - F(x + r) + F(x - r) \ge 1 - Q\_{\xi\_0}(2r)$ for every *x*, and $1 - L\_\lambda$ is non-decreasing, so a straightforward calculation shows that (18) implies the lower bound

$$\mathbb{P}\left(\sup\_{0\le t\le \delta}|\psi(t;\lambda)-\psi(0;\lambda)|\ge r\right)\ge 1-L\_{\lambda}\left(\delta(1-Q\_{\xi\_{0}}(2r))\right).\tag{19}$$

However, we need a more explicit upper bound. To obtain such a bound, we analyze the behavior of the Laplace transform *Lλ*(*s*) for small *s*. This behavior is postulated in Assumption 1, but for applications, it is convenient to have an explicit inequality. It can always be achieved by slightly reducing the power of *s*.

**Lemma 2.** *If λ satisfies Assumption 1, then for any ε* ∈ (0, *γ*)*, there exists a constant C* > 0 *such that*

$$0 \le 1 - L\_{\lambda}(s) \le Cs^{\gamma - \varepsilon}, \qquad s \ge 0. \tag{20}$$

**Proof.** Since $(1 - L\_\lambda(s))s^{-\gamma+\varepsilon} \to 0$ as *s* ↓ 0, according to (10), the inequality (20) holds with *C* = 1 when *s* ∈ [0,*s*0) for some sufficiently small *s*<sup>0</sup> = *s*0(*γ*, *ε*, *Lλ*). The inequality for *s* ≥ *s*<sup>0</sup> can then be ensured by increasing *C* if necessary.

A combination of the above statements gives an estimate for the probability of big changes of the PSI-process with random intensity on a small interval, provided that we can bound the tail probability for an individual random variable *ξ*0, say under some moment assumptions.

**Proposition 5.** *Suppose that the PSI-process ψ*(*t*; *λ*) *with the random intensity λ defined by* (1) *satisfies the assumptions of Proposition 4, that λ satisfies Assumption 1, and that* $\mathbb{E}|\xi\_0|^{2h} < \infty$ *for some h* > 0*. Then, for any ε* ∈ (0, *γ*)*, there exists a constant C* > 0 *such that for all r* > 0 *and δ* ∈ [0, 1]*,*

$$\mathbb{P}\left(\sup\_{0\le t\le \delta}|\psi(t;\lambda)-\psi(0;\lambda)|\ge r\right)\le C\delta^{\gamma-\varepsilon}r^{-2h(\gamma-\varepsilon)}.\tag{21}$$

**Proof.** Denote for short $m\_{2h} := \mathbb{E}|\xi\_0|^{2h}$, which is finite by assumption. Take *r* > 0; then, for any |*x*| < *r*/2,

$$1 - F(x + r) + F(x - r) = \mathbb{P}(\xi\_0 \le x - r \text{ or } \xi\_0 > x + r) \le \mathbb{P}(|\xi\_0| \ge r/2) \le \frac{2^{2h} m\_{2h}}{r^{2h}}$$

by Markov's inequality. Thus, since *L<sup>λ</sup>* is non-increasing,

$$\int\_{-r/2}^{r/2} \left[1 - L\_{\lambda} \left(\delta (1 - F(\mathbf{x} + r) + F(\mathbf{x} - r))\right)\right] dF(\mathbf{x}) \le 1 - L\_{\lambda} \left(4^h m\_{2h} \delta r^{-2h}\right). \tag{22}$$

On the other hand, $1 - L\_\lambda\big(\delta(1 - F(x + r) + F(x - r))\big) \le 1 - L\_\lambda(\delta)$ for any *x* ∈ R and *r* ≥ 0. Hence, for any *ε* > 0, again by the Markov inequality applied to $|\xi\_0|^{2h(\gamma-\varepsilon)}$, one has

$$\begin{split} \left(\int\_{-\infty}^{-r/2} + \int\_{r/2}^{\infty} \right) \left[1 - L\_{\lambda} \left(\delta (1 - F(x + r) + F(x - r))\right) \right] dF(x) \\ \leq \left(1 - L\_{\lambda}(\delta)\right) \mathbb{P}\left(|\xi\_{0}| \geq r/2\right) \leq \left(1 - L\_{\lambda}(\delta)\right) \frac{2^{2h(\gamma - \varepsilon)} m\_{2h(\gamma - \varepsilon)}}{r^{2h(\gamma - \varepsilon)}} . \end{split} \tag{23}$$

Combining (22) and (23) and using Lemma 2, we obtain the result.

#### **6. Sums of PSI-Processes**

Since the limit of the normalized cumulative sums (*ζn*) is an a.s. continuous stochastic process, we can use Theorem 15.5 from Billingsley's book [11] (p. 127), which gives the conditions for convergence of processes from the Skorokhod space *D*[0, 1] to a process with realizations lying in *C*[0, 1] a.s., in terms of the modulus of continuity

$$\omega\_{\mathbb{\zeta}}(\delta) = \sup\_{s, t \in [0, 1] \atop |s - t| \le \delta} \{ |\zeta(s) - \zeta(t)| \}. \tag{24}$$

It claims that if

(i) the sequence (*ζn*(0)) is tight;

(ii) for any *w* > 0 and *ε* > 0, there exist *δ* ∈ (0, 1) and *n*<sup>0</sup> such that

$$\mathbb{P}(\omega\_{\zeta\_n}(\delta) \ge w) \le \varepsilon, \qquad n \ge n\_{0}; \tag{25}$$

(iii) (*ζn*) converges weakly in terms of finite-dimensional distributions to some random function *ζ* as *n* → ∞,

then (*ζn*) converges to *ζ* as *n* → ∞, in *D*[0, 1] and *ζ* is continuous a.s.

In order to bound *ωζ<sup>n</sup>* in probability, Billingsley suggests using a corollary to Theorem 8.3 in the same book, which can be formulated as follows. Suppose that *ζ* is some random element in *D*[0, 1]; then, for any *δ* > 0 and *w* > 0,

$$\mathbb{P}(\omega\_{\mathbb{S}}(\delta) \ge 3w) \le \sum\_{i=0}^{\lfloor 1/\delta \rfloor - 1} \mathbb{P}\left(\sup\_{t \in [i\delta, (i+1)\delta]} |\zeta(t) - \zeta(i\delta)| \ge w\right). \tag{26}$$

The sum (26) can be estimated efficiently in our settings because *ζ<sup>n</sup>* is stationary by construction for any *n*. Hence, all the probabilities in the sum (26) are the same and

$$\mathbb{P}(\omega\_{\zeta\_n}(\delta) \ge 3w) \le \frac{1}{\delta} \mathbb{P}\left(\sup\_{t \in [0,\delta]} |\zeta\_n(t) - \zeta\_n(0)| \ge w\right). \tag{27}$$

**Remark 4.** *Actually, the events whose probabilities are added on the right-hand side of* (26) *are dependent: for a large n and a small δ, the appearance of a big (*≥ *ε) jump of ζ<sup>n</sup> on* [0, *δ*] *suggests that some ψ*(*i*)(*t*; *λi*) *has many jumps, and hence, the corresponding λ<sup>i</sup> is large; so it is likely that there are many jumps on other intervals as well, and the probability of a big jump there is not too small. Perhaps this observation can be used to find a better bound than the union bound* (27)*, but we have not used it.*

In order to check assumption (ii) of Billingsley's theorem, we apply the following two-stage procedure. We use (27) to bound the "global" probability of jumps greater than *w* on some interval of the length *δ*. We aim to show that for any *w* > 0 and *ε* > 0, one can find positive *C*, *τ*, and *δ* such that

$$\mathbb{P}\left(\sup\_{t\in[0,\delta]} \left|\zeta\_{n}(t) - \zeta\_{n}(0)\right| \geq w\right) \leq C\delta^{1+\tau} \qquad \text{and} \qquad C\delta^{\tau} < \varepsilon \tag{28}$$

for all *n* greater than some *n*0. To this end, we first show that one can find positive *C*, *τ*, *δ*, and *n*<sup>0</sup> such that (28) holds for *n* = *n*<sup>0</sup> and then analyze the local structure of *ζ<sup>n</sup>* to show that (28) actually holds for all *n* ≥ *n*0.

Our analysis of $\sup\_{t\in[0,\delta]} |\zeta\_n(t) - \zeta\_n(0)|$ is based on the results of Section 5. Consider the Poisson processes with random intensity Π*i*(*λit*), *i* = 1, . . . , *n*, used in the construction of *ψ*(1), . . . , *ψ*(*n*), and denote by *κn*(*δ*) the (random) number of these processes that have at least one jump on [0, *δ*]:

$$\kappa\_n(\boldsymbol{\delta}) := \sum\_{i=1}^n \mathbf{1} \{ \boldsymbol{\Pi}\_i(\lambda\_i \boldsymbol{\delta}) > 0 \}\,. \tag{29}$$

This is a binomial random variable with *n* trials and the success probability

$$p\_1 \equiv p\_1(\delta) := 1 - L\_\lambda(\delta). \tag{30}$$

Lemma 2 provides an upper bound for *p*1(*δ*). We are interested just in the case when *p*1(*δ*) is small compared to 1/*n*, that is, when E*κn*(*δ*) is small. Then, the probability that *κn*(*δ*) ≥ *b* decays fast enough even for an appropriately chosen but fixed *b*.

**Lemma 3.** *Let λ satisfy Assumption* 1*. Then, for any a* > 1/*γ, b* > *a*/(*aγ* − 1) *and c* > 0*, one can find positive τ and δ*<sup>0</sup> *such that for all n satisfying nδ*1/*<sup>a</sup>* ≤ *c, it holds*

$$\mathbb{P}(\kappa\_n(\delta) \ge b) \le \delta^{1+\tau}, \qquad \delta \in (0, \delta\_0).$$

**Proof.** The well-known Chernoff bound [16] (Theorem 2.1) ensures that for any *t* ≥ 0,

$$\mathbb{P}(\kappa\_n(\delta) \ge np\_1(\delta) + t) \le \exp\left(-f\left(t/(np\_1(\delta))\right)np\_1(\delta)\right),\tag{31}$$

where *f*(*x*) = (1 + *x*)log(1 + *x*) − *x*. For *a* > 1/*γ*, Lemma 2 along with the assumption $n\delta^{1/a} \le c$ guarantees that $np\_1(\delta) \le C\delta^{\gamma-1/a-\varepsilon}$ for any *ε* ∈ (0, *γ*) and some *C* (which may depend on *ε*). Taking *ε* < *γ* − 1/*a* yields *np*1(*δ*) → 0 as *δ* ↓ 0. Plugging *t* = *b* − *np*1(*δ*), which is positive for small *δ*, into (31) gives

$$\begin{aligned} \log \mathbb{P}(\kappa\_{n}(\delta) \ge b) &\le -f\big(b/(np\_1(\delta)) - 1\big)\, np\_1(\delta) \\ &= -b(\log b - 1) + b\log(np\_1(\delta)) - np\_1(\delta) \\ &\le -b(\log b - 1 - \log c) + b(\gamma - 1/a - \varepsilon)\log \delta. \end{aligned}$$

Restricting *ε* further to be less than *γ* −1/*a* −1/*b*, which is positive by the assumptions, implies that the coefficient of log *δ*, that is *b*(*γ* − 1/*a* − *ε*), is bigger than 1, and Lemma 3 is proved.
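The Chernoff bound (31) used above can be illustrated on a concrete binomial tail (the parameter values are ours, chosen so that *np*<sup>1</sup> = 1 is "small" while *b* is fixed):

```python
import math

def chernoff_bound(n, p, t):
    # right-hand side of (31) with f(x) = (1+x) log(1+x) - x
    m = n * p
    x = t / m
    return math.exp(-((1.0 + x) * math.log(1.0 + x) - x) * m)

def binom_tail(n, p, b):
    # exact P(Bin(n, p) >= b)
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(b, n + 1))

n, p, b = 1000, 0.001, 5
exact = binom_tail(n, p, b)                 # exact tail probability
bound = chernoff_bound(n, p, b - n * p)     # Chernoff upper bound, t = b - np
```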

**Lemma 4.** *Suppose that the random λ satisfies Assumption 1 and that* $\mathbb{E}|\xi\_0|^{2h} < \infty$ *for some h* > 1/*γ*<sup>2</sup>*. Let* 0 < *c*<sup>1</sup> < *c*<sup>2</sup> < ∞*. Then, for any a* ∈ (1/*γ*, (*hγ* − 1)/(1 − *γ*)) (*with the right bound understood as* ∞ *if γ* = 1) *and for any fixed w* > 0*, there exist positive δ*<sup>0</sup> *and τ such that for all* $n \in [c\_1\delta^{-1/a}, c\_2\delta^{-1/a}]$*,*

$$\mathbb{P}\left(\sup\_{t\in[0,\delta]}|\zeta\_n(t)-\zeta\_n(0)|\geq w\right)\leq\delta^{1+\tau},\qquad\delta\in(0,\delta\_0].\tag{32}$$

**Proof.** Let *a* > 1/*γ* and *w* > 0 be fixed, and denote for short *p*<sup>1</sup> = *p*1(*δ*). By the law of total probability,

$$\begin{split} &\mathbb{P}\{\sup\_{t\in[0,\delta]}|\zeta\_{n}(t)-\zeta\_{n}(0)|\geq w\} \\ &=\sum\_{k=0}^{n}\mathbb{P}\{\sup\_{t\in[0,\delta]}|\zeta\_{n}(t)-\zeta\_{n}(0)|\geq w \mid \kappa\_{n}(\delta)=k\}\mathbb{P}\{\kappa\_{n}(\delta)=k\} \\ &\leq\sum\_{k=1}^{b-1}\mathbb{P}\{\sup\_{t\in[0,\delta]}|\zeta\_{n}(t)-\zeta\_{n}(0)|\geq w \mid \kappa\_{n}(\delta)=k\}\mathbb{P}\{\kappa\_{n}(\delta)=k\}+\mathbb{P}\{\kappa\_{n}(\delta)\geq b\} \end{split} \tag{33}$$

for any integer *b* ≥ 2. Consider the event *κn*(*δ*) = *k* ≥ 1, which means that exactly *k* of the *n* processes *ψ*(1), . . . , *ψ*(*n*) jump on [0, *δ*], while the other *n* − *k* processes are constant there. Then, $\sup\_{t\in[0,\delta]} |\zeta\_n(t) - \zeta\_n(0)| \ge w$ implies that at least one of the *k* PSI-processes that jump on [0, *δ*] changes by at least $w\sqrt{n}/k$. So, for *k* ≥ 1,

$$\begin{split} \mathbb{P}\{\sup\_{t\in[0,\delta]} |\zeta\_{\boldsymbol{n}}(t) - \zeta\_{\boldsymbol{n}}(0)| \geq w \mid \kappa\_{\boldsymbol{n}}(\delta) = k\} \\ \leq & k \,\mathbb{P}\{\sup\_{t\in[0,\delta]} |\psi(t;\lambda) - \psi(0;\lambda)| \geq w\sqrt{n}/k \mid \Pi(\lambda \cdot) \text{jumps on } [0,\delta] \} \\ = & \frac{k}{p\_1} \,\mathbb{P}\{\sup\_{t\in[0,\delta]} |\psi(t;\lambda) - \psi(0;\lambda)| \geq w\sqrt{n}/k\}. \end{split} \tag{34}$$

Proposition 5 provides a bound for the probability in the right-hand part of (34), and since *κn*(*δ*) has the binomial distribution with the parameters *n* and *p*1, using the total probability formula, we continue (33) as

$$\begin{split} &\mathbb{P}\left(\sup\_{t\in[0,\delta]} |\zeta\_{n}(t) - \zeta\_{n}(0)| \geq w\right) \\ &\leq \sum\_{k=1}^{b-1} k \mathbb{P}\left(\sup\_{t\in[0,\delta]} |\psi(t;\lambda) - \psi(0;\lambda)| \geq w\sqrt{n}/k\right) \binom{n}{k} p\_{1}^{k-1} (1-p\_{1})^{n-k} + \mathbb{P}\left(\kappa\_{n}(\delta) \geq b\right) \\ &\leq C \sum\_{k=1}^{b-1} k \binom{n}{k} \left(\frac{k^{2h}\delta^{k}}{w^{2h}n^{h}}\right)^{\gamma-\varepsilon} + \mathbb{P}\left(\kappa\_{n}(\delta) \geq b\right) \end{split} \tag{35}$$

for any *ε* ∈ (0, *γ*) and *h* > 0 such that $\mathbb{E}|\xi\_0|^{2h} < \infty$, and some *C* depending on the choice of *ε*, where the last inequality follows from Proposition 5.

Suppose now that *h* > 1/*γ*<sup>2</sup>. Then 1/*γ* < (*hγ* − 1)/(1 − *γ*), where the right part is understood as ∞ if *γ* = 1. Choose *a* ∈ (1/*γ*, (*hγ* − 1)/(1 − *γ*)) and an integer *b* > *a*/(*aγ* − 1). Then, by Lemma 3, there exists a positive *τ* such that $\mathbb{P}(\kappa\_n(\delta) \ge b) \le \delta^{1+\tau}$ for small enough *δ*. The bounds $c\_1\delta^{-1/a} \le n \le c\_2\delta^{-1/a}$ give

$$k\left(\frac{n}{k}\right)\left(\frac{k^{2h}\delta^k}{w^{2h}n^h}\right)^{\gamma-\varepsilon}\leq\frac{k^{2h(\gamma-\varepsilon)}}{(k-1)!w^{2h(\gamma-\varepsilon)}}\frac{n^k\delta^{k(\gamma-\varepsilon)}}{n^{h(\gamma-\varepsilon)}}\leq\frac{c\_2^k k^{2h(\gamma-\varepsilon)}}{c\_1^{h(\gamma-\varepsilon)}w^{2h(\gamma-\varepsilon)}}\delta^{k(\gamma-\varepsilon-1/a)+h(\gamma-\varepsilon)/a}.$$

Choosing *ε* < *γ* − 1/*a* ensures that the power of *δ* is minimal for *k* = 1, and the inequality *a* < (*hγ* − 1)/(1 − *γ*) guarantees that for *k* = 1 this power *γ* + (*hγ* − 1)/*a* − (1 + *h*/*a*)*ε* > 1 for small enough *ε*; thus, (32) follows from (35).

The estimates used in the proof of Lemma 4 essentially rely on the relation between *δ* and *n*. Therefore, this argument cannot be used to provide the bound (28) uniformly for all *n* ≥ *n*0. In order to obtain such a bound, we apply a technique close to the one used in Billingsley's book [11] (Ch. 12). If we impose some moment conditions on *ξ*0, then the following bound holds:

**Lemma 5.** *Suppose that* $\mathbb{E}\xi\_0 = 0$*,* $\mathbb{E}\xi\_0^2 = 1$*, and* $\mathbb{E}|\xi\_0|^{2h} < \infty$ *for some h* > 1*. Then, for some constant C* > 0 *and for all n* = 1, 2, . . . *and* 0 ≤ *s* < *t* ≤ 1*,*

$$\mathbb{E}\left|\zeta\_n(t) - \zeta\_n(s)\right|^{2h} \le C \max\{p\_1(t-s)^h, p\_1(t-s)n^{1-h}\},\tag{36}$$

*where p*1(·) *is defined by* (30)*.*

**Proof.** Due to stationarity of *ζ<sup>n</sup>* for each *n*, it is enough to consider the case *s* = 0. For any *t* ≥ 0, we can represent the increment *ζn*(*t*) − *ζn*(0) as a sum of i.i.d. random variables

$$
\zeta\_n(t) - \zeta\_n(0) \stackrel{d}{=} \frac{1}{\sqrt{n}} \sum\_{i=1}^n \eta\_i, \tag{37}
$$

where

$$\eta\_i \stackrel{d}{=} \left(\xi\_1 - \xi\_0\right) \mathbf{1}\{\Pi(\lambda t) > 0\}\,. \tag{38}$$

Each summand *η<sup>i</sup>* has a symmetric distribution, and two factors in the right-hand part of (38) are independent. By Rosenthal's inequality (see, e.g., [17] (Th. 2.9)), we obtain

$$\mathbb{E}\left|\zeta\_n(t) - \zeta\_n(0)\right|^{2h} \le Cn^{-h} \max\left\{ \left( \text{Var}\sum\_{i=1}^n \eta\_i \right)^h , \; n\, \mathbb{E}|\eta\_1|^{2h} \right\} \tag{39}$$

for some constant *C* > 0. Both moments can be easily evaluated. Since the summands are i.i.d.,

$$\text{Var}\sum\_{i=1}^{n} \eta\_i = n \,\text{Var}\,\eta\_1 = np\_1(t) \,\text{Var}(\xi\_1 - \xi\_0) = 2np\_1(t),$$

because E **1**{Π(*λt*) > 0} = *p*1(*t*). Similarly,

$$\mathbb{E}|\eta\_1|^{2h} = p\_1(t)\,\mathbb{E}|\xi\_1 - \xi\_0|^{2h}.$$

Plugging these two values into (39), we readily obtain (36), maybe with another constant *C* than in (39).
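The identity $\text{Var}\,\eta\_1 = 2p\_1(t)$ can be checked by simulation for a concrete model (ours: *λ* ~ Exp(1), so $L\_\lambda(s) = 1/(1+s)$, standard normal *ξ*'s, and *t* = 0.3):

```python
import math
import random

random.seed(5)
t = 0.3
trials = 400_000
acc = 0.0
for _ in range(trials):
    lam = random.expovariate(1.0)                      # lambda ~ Exp(1): L_lambda(s) = 1/(1+s)
    if random.random() < 1.0 - math.exp(-lam * t):     # event {Pi(lambda t) > 0}
        eta = random.gauss(0.0, 1.0) - random.gauss(0.0, 1.0)  # xi_1 - xi_0
        acc += eta * eta
second_moment = acc / trials                           # estimates E eta_1^2 = Var eta_1

p1 = 1.0 - 1.0 / (1.0 + t)       # p_1(t) = 1 - L_lambda(t)
# since E eta_1 = 0 and Var(xi_1 - xi_0) = 2, the target is 2 p_1(t)
```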

**Corollary 2.** *Suppose that Assumption 1 holds, and h* > 1/*γ in the settings of Lemma 5. Then, for any fixed w* > 0*, one can find positive δ*<sup>1</sup> *and τ such that for all n* ≥ (*t* − *s*)−(*γ*+1)/(*h*+1) *it holds*

$$\mathbb{P}\left(\left|\zeta\_{n}(t) - \zeta\_{n}(s)\right| \geq w\right) \leq (t - s)^{1 + \tau}, \qquad t - s \in (0, \delta\_{1}].\tag{40}$$

**Proof.** By the Markov inequality, we have

$$\mathbb{P}\left(\left|\zeta\_n(t) - \zeta\_n(s)\right| \ge w\right) \le \mathbb{E}\left|\zeta\_n(t) - \zeta\_n(s)\right|^{2h} w^{-2h}, \qquad 0 \le s < t \le 1.$$

Lemma 5 gives a bound for the right-hand side in terms of *p*1(*t* − *s*) and *n*. Lemma 2 provides the upper bound for *p*1(*t* − *s*), and the condition on *n* imposed in the claim implies $n^{-1} \le (t - s)^{(\gamma+1)/(h+1)}$. Hence, for any *ε* > 0, there exists a constant *C*′ > 0 such that for all 0 ≤ *s* < *t* ≤ 1,

$$\mathbb{P}\left(\left|\zeta\_{n}(t) - \zeta\_{n}(s)\right| \geq w\right) \leq C' \max\{ (t-s)^{h(\gamma-\varepsilon)}, (t-s)^{\gamma-\varepsilon+(h-1)(\gamma+1)/(h+1)} \}.$$

Taking *ε* = (*hγ* − 1)/(*h* + 1), which is positive by the assumptions, makes both exponents above equal: *h*(*γ* − *ε*) = *γ* − *ε* + (*h* − 1)(*γ* + 1)/(*h* + 1) = 1 + *ε*. Hence, this choice of *ε* yields (40) with *τ* = *ε* for all 0 ≤ *s* < *t* ≤ 1, but with an extra constant on the right-hand side. Restricting *t* − *s* to a suitable interval (0, *δ*1] allows us to get rid of the constant.

**Proof of Theorem 1.** Without loss of generality, we may assume *T* = 1 (otherwise, perform the non-random time change $t \mapsto t/T$). We need to show that the conditions of Theorem 15.5 of [11] (recalled at the beginning of Section 6) hold. Condition (iii) was already verified (see Corollary 1), and it implies condition (i). So it remains to check condition (ii), which follows from (28).

Suppose that we are given positive *ε* and *w* and want to find *δ* and *n*<sup>0</sup> such that (25) holds. Lemma 4 applied with *c*<sup>1</sup> = 1/2, *c*<sup>2</sup> = 2 implies that for some positive *δ*0, *τ* and any *a* ∈ (1/*γ*,(*hγ* − 1)/(1 − *γ*)) inequality (32) holds for *δ* ∈ (0, *δ*0]. Corollary 2 guarantees that for some positive *δ*1, inequality (40) holds for *n* sufficiently large and *δ* ∈ (0, *δ*1], and in our application below, the lower bound on *n* will be fulfilled if *a* < (*h* + 1)/(*γ* + 1). Choose some *a* ∈ (1/*γ*, min{(*hγ* − 1)/(1 − *γ*),(*h* + 1)/(*γ* + 1)}) (this interval is not empty if *h* > 1/*γ*2), fix a positive *δ* ≤ min{*δ*0, *δ*1} and let *n*<sup>0</sup> = *δ*−1/*a*.

For this choice of parameters, Lemma 4 (again with *c*<sup>1</sup> = 1/2, *c*<sup>2</sup> = 2) ensures that (28) holds for all *n* ∈ [*n*0, 2*n*0]. Suppose now that *n* > 2*n*<sup>0</sup> and let $m = \lfloor n^a \delta \rfloor$. (Note that *a* > 1/*γ* ≥ 1, so *m* ≥ 2 if *n* > 2*n*0.) Then, for *c*<sup>1</sup> = 1/2, *c*<sup>2</sup> = 2, we have $n \in [c\_1(\delta/m)^{-1/a}, c\_2(\delta/m)^{-1/a}]$, so (32) holds with *δ*/*m* instead of *δ*, implying that for any *i* = 1, . . . , *m*,

$$\mathbb{P}\left(\sup\_{t\in[\delta(i-1)/m,\delta i/m]} \left|\zeta\_n(t) - \zeta\_n(\delta(i-1)/m)\right| \ge w\right) \le (\delta/m)^{1+\tau},\tag{41}$$

due to the stationarity of *ζn*. Let

$$Z\_m(\delta) := \max\_{i=1,\dots,m} \left\{ \left| \zeta\_n(\delta i/m) - \zeta\_n(0) \right| \right\}.$$

Take *s* = *iδ*/*m* and *t* = *jδ*/*m* for some 0 ≤ *i* < *j* ≤ *m*. Now, we aim to apply Corollary 2 for these *s* and *t*. Note that *t* − *s* ∈ (0, *δ*1) by the choice of *δ*, so it remains to check that the assumption *n* ≥ (*t* − *s*)−(*γ*+1)/(*h*+1) holds. Indeed, *t* − *s* ≥ *δ*/*m* and *m*/*δ* ≤ *na*; thus, (*t* − *s*)−(*γ*+1)/(*h*+1) ≤ *na*(*γ*+1)/(*h*+1) < *n* by the choice of *a*. Hence, Corollary 2 implies

$$\mathbb{P}\left( \left| \zeta\_n(j\delta/m) - \zeta\_n(i\delta/m) \right| \geq w \right) \leq \left( (j-i)\delta/m \right)^{1+\tau'} $$

for some *τ*′ > 0. Hence, Theorem 12.2 from Billingsley's book [11] implies that

$$\mathbb{P}(Z\_{\mathfrak{m}}(\delta) \ge w) \le K\delta^{1+\tau'} \tag{42}$$

for some *K* > 0, which depends on *τ*′ but not on *δ*.

Suppose now that *Zm*(*δ*) < *w* and $\sup\_{t\in[\delta(i-1)/m,\,\delta i/m]} |\zeta\_n(t) - \zeta\_n(\delta(i-1)/m)| < w$ for all *i* = 1, . . . , *m*. Then, $\sup\_{t\in[0,\delta]} |\zeta\_n(t) - \zeta\_n(0)| < 2w$ by the triangle inequality. Hence,

$$\begin{aligned} \mathbb{P}\{\sup\_{t\in[0,\delta]}|\zeta\_n(t)-\zeta\_n(0)|\geq 2w\} \\ \leq \mathbb{P}(Z\_m(\delta)\geq w) + m\mathbb{P}\left(\sup\_{t\in[0,\delta/m]}|\zeta\_n(t)-\zeta\_n(0)|\geq w\right) \leq (K+1)\delta^{1+\tau\_1} \end{aligned}$$

with *τ*<sup>1</sup> = min{*τ*, *τ*′}, by inequalities (41) and (42). This argument works for any *δ* ≤ min{*δ*0, *δ*1}, with *δ*<sup>0</sup> and *δ*<sup>1</sup> given by Lemma 4 and Corollary 2, and by choosing *δ* > 0 small enough, one can guarantee that $(K + 1)\delta^{\tau\_1} \le \varepsilon$. This proves (28) (with 2*w* instead of *w*, but *w* > 0 is arbitrary) for all *n* ≥ *n*0, and the claim follows by application of Theorem 15.5 from Billingsley's book [11].

#### **7. Conclusions**

The functional limit theorem for normalized cumulative sums of PSI-processes (Theorem 1) can be used in both directions. The PSI-processes are very simple, and some results can be obtained directly for their sums and then imply the corresponding facts for the limiting stationary Gaussian process *ζ*. On the other hand, the theory of stationary Gaussian processes has been deeply developed in the last few decades, and some results of this theory have consequences for the pre-limiting processes (*ζn*), which model a number of real-life phenomena.

When *γ* < 1 in Assumption 1, there is some gap between the condition implied by the counterexample of Section 4, that is, $\mathbf{E}|\xi_0|^{2/\gamma+\varepsilon} < \infty$ for some *ε* > 0, and the actual condition $\mathbf{E}|\xi_0|^{2/\gamma^2+\varepsilon} < \infty$ (see (11)) under which Theorem 1 is proven. Also, if $\mathbf{E}\lambda < \infty$, it is still unclear whether the finiteness of the variance $\mathbf{E}\xi_0^2 < \infty$ alone would be sufficient for the convergence in the Skorokhod space.

**Author Contributions:** Writing – original draft, Y.Y., O.R. and A.G.; Writing – review & editing, Y.Y., O.R. and A.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** The reported study was funded by RFBR, project number 20-01-00646 A.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors express their gratitude to A.V. Liulintsev (a final-year student at the Mathematics and Mechanics Department of St. Petersburg State University, a participant of the project 20-01-00646 A) for active discussion of *M*-processes studied in Section 3.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


## *Article* **Poissonization Principle for a Class of Additive Statistics**

**Igor Borisov \* and Maman Jetpisbaev**

Laboratory of Probability Theory and Mathematical Statistics, Sobolev Institute of Mathematics, Novosibirsk State University, 630090 Novosibirsk, Russia

**\*** Correspondence: sibam@math.nsc.ru

**Abstract:** In this paper, we consider a class of additive functionals of a finite or countable collection of the group frequencies of an empirical point process that corresponds to an at most countable partition of the sample space. Under broad conditions, it is shown that the asymptotic behavior of the distributions of such functionals is similar to the behavior of the distributions of the same functionals of the accompanying Poisson point process. However, the Poisson versions of the additive functionals under consideration, unlike the original ones, have the structure of sums (finite or infinite) of independent random variables, which allows us to reduce the asymptotic analysis of the distributions of additive functionals of an empirical point process to classical problems of the theory of summation of independent random variables.

**Keywords:** empirical point process; Poisson point process; Poissonization; group frequency; additive functional

**MSC:** 60F05

#### **1. Introduction**

In this paper, we study a class of additive functionals (statistics) of a finite or countable collection of group frequencies constructed from a sample of size *n* and a finite or countable partition of the sample space. Under broad conditions, it is shown that, as *n* → ∞, the asymptotic behavior of the distributions of the additive functionals under consideration is completely similar to the behavior of the distributions of the same functionals of the accompanying Poisson point process. From here it is easy to establish that the above-mentioned weak convergence is equivalent to that for the same additive functionals but with independent group frequencies, which are constructed, respectively, from a finite or countable collection of independent copies of the original sample, when in the *i*-th partition element we fix only the points from the *i*-th independent copy of the original sample. In other words, in the case under consideration, we remove the dependence between the initial group frequencies, which have a multinomial distribution. This phenomenon makes it possible to apply directly the diverse toolkit of the theory of summation of independent random variables to the study of the limiting behavior of the additive statistics being considered.

The structure of this paper is as follows. In Section 2, we introduce the empirical and accompanying Poisson vector point processes and formulate some important results regarding their connection. In Section 3, we introduce a class of additive statistics and give a number of examples. Section 4 contains the main result of the paper, i.e., a duality theorem, which states that an original additive statistic with some normalizing and centering constants weakly converges to a limit if, and only if, its Poisson version with the same normalizing and centering constants weakly converges to the same limit. In Section 5, we discuss some applications of the duality theorem. In Section 6, we present moment inequalities connecting the original additive statistics and their Poisson versions. Section 7 is devoted to an asymptotic analysis of the first two moments of additive statistics connected with an infinite multinomial urn model. Section 8 contains the proofs of all results of the paper. Finally, in Section 9, we summarize the results and discuss some of their extensions.

**Citation:** Borisov, I.; Jetpisbaev, M. Poissonization Principle for a Class of Additive Statistics. *Mathematics* **2022**, *10*, 4084. https://doi.org/10.3390/math10214084

Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

Received: 5 September 2022 Accepted: 29 October 2022 Published: 2 November 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **2. Empirical and Poisson Point Processes**

Let $\{X_i^{(k)},\ i \ge 1\}$, $k = \overline{1,m}$, be a finite set of independent copies of a sequence of independent identically distributed random variables with values in an arbitrary measurable space $(\mathcal{X}, \mathcal{A})$ and distribution *P*. For any natural $n_1, \dots, n_m$, consider *m* independent empirical point processes based on the respective samples $X_1^{(k)}, \dots, X_{n_k}^{(k)}$, $k = \overline{1,m}$:

$$V_{n_k}^{(k)}(A) := \sum_{i=1}^{n_k} I_A(X_i^{(k)}), \quad k = \overline{1,m}, \quad A \in \mathcal{A}.$$

Define the *m* independent accompanying Poisson point processes as

$$\Pi_{n_k}^{(k)}(A) := \sum_{i=1}^{\pi_k(n_k)} I_A(X_i^{(k)}), \quad k = \overline{1,m}, \quad A \in \mathcal{A},$$

where $\pi_k(t)$, $k = \overline{1,m}$, are independent standard Poisson processes on the positive half-line, which do not depend on the sequences $\{X_i^{(k)};\ i \ge 1\}$, $k = \overline{1,m}$. In other words, $\Pi_{n_k}^{(k)}(A) = V_{\pi_k(n_k)}^{(k)}(A)$ for all $k = \overline{1,m}$. We consider the point processes $V_{n_k}^{(k)}(\cdot)$ and $\Pi_{n_k}^{(k)}(\cdot)$ as stochastic processes with trajectories from the measurable space $(B_{\mathcal{A}}, \mathcal{C})$ of all bounded functions indexed by the elements of the set $\mathcal{A}$, with the *σ*-algebra $\mathcal{C}$ of all cylindrical subsets of the space $B_{\mathcal{A}}$. The distributions of the stochastic processes $V_{n_k}^{(k)}(\cdot)$ and $\Pi_{n_k}^{(k)}(\cdot)$ on $\mathcal{C}$ are defined in a standard way.
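The two constructions can be contrasted with a short simulation (an illustrative sketch of ours, not part of the article; all function names are chosen for exposition): the empirical process distributes a fixed number *n* of sample points among the partition cells, whereas the accompanying Poisson process first randomizes the sample size as $\pi(n)$.

```python
import random

random.seed(1)

def empirical_counts(n, p):
    """Group frequencies nu_i = V_n(Delta_i) for a sample of fixed size n."""
    counts = [0] * len(p)
    cum = [sum(p[:j + 1]) for j in range(len(p))]
    cum[-1] = 1.0  # guard against floating-point round-off
    for _ in range(n):
        u = random.random()
        counts[next(j for j, c in enumerate(cum) if u <= c)] += 1
    return counts

def poisson_sample_size(n):
    """Draw pi(n) ~ Poisson(n) by counting unit-rate exponential arrivals."""
    t, k = 0.0, 0
    while True:
        t += random.expovariate(1.0)
        if t > n:
            return k
        k += 1

p = [0.5, 0.3, 0.2]
nu = empirical_counts(100, p)                               # frequencies of V_100
pi_counts = empirical_counts(poisson_sample_size(100), p)   # frequencies of Pi_100
# The empirical frequencies are tied by the constraint sum(nu) == 100,
# while the Poissonized ones are independent Poisson(100 * p_i) variables.
```

The sum constraint on the empirical counts is exactly the dependence that Poissonization removes.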

Now, we introduce the vector-valued empirical and accompanying Poisson point processes

$$V_{\bar{n}}(A) := (V_{n_1}^{(1)}(A), \dots, V_{n_m}^{(m)}(A)) \equiv V_{\bar{n}},$$

$$\Pi_{\bar{n}}(A) := (\Pi_{n_1}^{(1)}(A), \dots, \Pi_{n_m}^{(m)}(A)) \equiv \Pi_{\bar{n}},$$

where $\bar{n} = (n_1, n_2, \dots, n_m)$. The vector-valued point processes $V_{\bar{n}}$ and $\Pi_{\bar{n}}$ are considered as random elements with values in the measurable space $((B_{\mathcal{A}})^m, \mathcal{C}^m)$.

Let $A_0 \in \mathcal{A}$ with $p := P(A_0) \in (0, 1)$. Consider the restrictions of the vector point processes $V_{\bar{n}}$ and $\Pi_{\bar{n}}$ to the set

$$\mathcal{A}\_0 := \{ A \in \mathcal{A} : A \subseteq A\_0 \}. \tag{1}$$

These so-called $\mathcal{A}_0$-restrictions are denoted by $V_{\bar{n}}^0$ and $\Pi_{\bar{n}}^0$, respectively. For the distributions $\mathcal{L}(V_{\bar{n}}^0)$ and $\mathcal{L}(\Pi_{\bar{n}}^0)$ in the measurable space $((B_{\mathcal{A}})^m, \mathcal{C}^m)$, there are the following three assertions (some particular versions of these assertions have been proved in [1,2]).

**Theorem 1.** *The following inequality is valid*:

$$
\mathcal{L}(\overline{V}_{\bar{n}}^0) \le \frac{1}{(1-p)^m}\, \mathcal{L}(\overline{\Pi}_{\bar{n}}^0). \tag{2}
$$

**Corollary 1.** *For any non-negative measurable functional F defined on* ((BA)*m*, C*m*)*,*

$$\mathbb{E}F(\overline{V}_{\bar{n}}^0) \le \frac{1}{(1-p)^m}\, \mathbb{E}F(\overline{\Pi}_{\bar{n}}^0);\tag{3}$$

*the expectation on the right-hand side of* (3) *may be infinite.*

The following result plays an essential role in proving the main result of the paper—a duality limit theorem for the distributions $\mathcal{L}(\overline{V}_{\bar{n}})$ and $\mathcal{L}(\overline{\Pi}_{\bar{n}})$ (see Theorem 3 below).

**Theorem 2.** *For each multi-index $\bar{n}$, one can define some vector point processes $\overline{V}_{\bar{n}}^{0*}$ and $\overline{\Pi}_{\bar{n}}^{0*}$ on a common probability space so that they coincide in distribution with the point processes $\overline{V}_{\bar{n}}^{0}$ and $\overline{\Pi}_{\bar{n}}^{0}$, respectively, and*

$$\sup_{\mathcal{A}_c\subset\mathcal{A}_0}\mathbb{P}\left(\sup_{A\in\mathcal{A}_c}\left\|\overline{V}^{0*}_{\bar{n}}(A)-\overline{\Pi}^{0*}_{\bar{n}}(A)\right\|\neq 0\right)\leq 1-(1-p)^{m},$$

*where $\|(z_1, \dots, z_m)\| := \max_{k\le m} |z_k|$, and the outer supremum is taken over all at most countable families $\mathcal{A}_c$ of sets from $\mathcal{A}_0$.*

**Remark 1.** *In Theorem 2, the sup-seminorm $\sup_{A\in\mathcal{A}_c}\|\cdot\|$ is obviously measurable with respect to the cylindrical σ-algebra $\mathcal{C}^m$. If instead of $\mathcal{A}_c$ we substitute the entire class $\mathcal{A}_0$ (possibly uncountable), then this measurability may no longer hold (unless, of course, the point processes under consideration have the separability property). Nevertheless, the assertion of Theorem 2 remains valid in this case if the probability* **P** *is replaced by the outer probability $\mathbf{P}^*(N_o) := \inf_{N\in\mathcal{C}^m:\, N\supseteq N_o} \mathbf{P}(N)$. However, the outer probability has only the property of semiadditivity, which makes it difficult to use.*

*Let measurable sets $\Delta_1, \Delta_2, \dots$ form a finite or countable partition of the sample space under the condition $p_i := P(\Delta_i) > 0$ for all i. Without loss of generality, we can assume that the sequence $\{p_i\}$ is monotonically nonincreasing. Denote by $\nu^{(k)}_{n_k 1}, \nu^{(k)}_{n_k 2}, \dots$, $k = \overline{1,m}$, the corresponding group frequencies defined by the sample $X^{(k)}_1, \dots, X^{(k)}_{n_k}$. Put*

$$\bar{\nu}_{i\bar{n}} := \overline{V}_{\bar{n}}(\Delta_i) = \left(\nu^{(1)}_{n_1 i}, \dots, \nu^{(m)}_{n_m i}\right), \quad i = 1, 2, \dots \tag{4}$$

*Let us agree that everywhere below the limit relation $\bar{n} \to \infty$ will be understood as $n_k \to \infty$ for all $k = \overline{1,m}$.*

#### **3. Additive Statistics: Examples**

In the paper, we consider a class of additive statistics of the form

$$\Phi_f(\overline{V}_{\bar{n}}) := \sum_{i \ge 1} f_{i\bar{n}}(\bar{\nu}_{i\bar{n}}), \tag{5}$$

where $f \equiv \{f_{i\bar{n}}\}$ is an array of arbitrary finite functions defined on $\mathbb{Z}_+^m$ under the condition

$$\sum_{i\geq 1} |f_{i\bar{n}}(0, \dots, 0)| < \infty \quad \forall \bar{n},\tag{6}$$

which ensures that the functional $\Phi_f(\overline{V}_{\bar{n}})$ is well defined in the case of a countable partition of the sample space, since the sum under consideration contains only a finite number of nonzero random vectors $\bar{\nu}_{i\bar{n}}$. In the case of a finite partition and *m* = 1, additive functionals of the form (5) were considered in [3–5].

We now give some examples of such statistics.

(1) Consider a finite partition $\{\Delta_i;\ i = 1, \dots, N\}$ of the sample space. Put $f_{i\bar{n}}(\bar{x}) := \frac{|\bar{x} - \bar{n}p_i|^2}{|\bar{n}p_i|}$, $i = 1, \dots, N$, where $|\cdot|$ is the standard Euclidean norm in $\mathbb{R}^m$. Then the functional

$$\Phi_{\chi^2}(\overline{V}_{\bar{n}}) := \sum_{i=1}^{N} \frac{|\bar{\nu}_{i\bar{n}} - \bar{n}p_i|^2}{|\bar{n}p_i|} \tag{7}$$

is an *m*-variate version of the well-known $\chi^2$-statistic. Note that, in the present paper, we are primarily interested in the case where $N \equiv N(\bar{n}) \to \infty$ as $\bar{n} \to \infty$.
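The functional (7) can be evaluated directly from the table of group frequencies; for *m* = 1 it reduces to the classical Pearson statistic. A minimal sketch (ours, for illustration only; the function name is not from the article):

```python
import math

def chi2_statistic(freqs, n, p):
    """m-variate chi-square functional (7): sum over cells i of
    |nu_i - n*p_i|^2 / |n*p_i|, where nu_i is the vector of the i-th group
    frequencies of the m samples and |.| is the Euclidean norm in R^m."""
    total = 0.0
    for nu_i, p_i in zip(freqs, p):
        center = [n_k * p_i for n_k in n]                      # vector n*p_i
        num = sum((x - c) ** 2 for x, c in zip(nu_i, center))  # |nu_i - n*p_i|^2
        den = math.sqrt(sum(c * c for c in center))            # |n*p_i|
        total += num / den
    return total

# m = 1, N = 2 cells: reduces to the classical Pearson chi-square
chi2_statistic([(6,), (4,)], (10,), [0.5, 0.5])  # (6-5)^2/5 + (4-5)^2/5 = 0.4
```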

(2) Let now the sizes of all *m* samples be equal: $n_j = n$, $j = 1, \dots, m$. In an equivalent reformulation of the original problem, we consider a sample of *m*-dimensional observations $\{(X_i^1, \dots, X_i^m);\ i \le n\}$ under the main hypothesis that the sample vector coordinates are independent and have the same *N*-atomic distribution with unknown masses $p_1, \dots, p_N$. In this case, the log-likelihood function can be represented as the additive functional

$$\Phi_{\log}(\overline{V}_{\bar{n}}) := \sum_{i=1}^{N} (\bar{\nu}_{i\bar{n}}, \bar{1}) \log p_i,$$

where $\bar{1}$ is the unit vector in $\mathbb{R}^m$ and $(\cdot, \cdot)$ is the Euclidean inner product.

(3) Consider a finite or countable partition $\{\Delta_i;\ i \ge 1\}$. Let $f_{i\bar{n}}(\bar{x}) \equiv f(\bar{x}) := I_B(\bar{x})$ be the indicator function of some subset $B \subset \mathbb{Z}_+^m$. Then the functional

$$\Phi_{I_B}(\overline{V}_{\bar{n}}) := \sum_{i \ge 1} I_B(\bar{\nu}_{i\bar{n}}) \tag{8}$$

counts the number of partition elements (cells) containing any number of vector sample observations from the range *B* in a multinomial scheme (finite or infinite) of placing particles into cells (see [6–12]). Note that in the case of an infinite multinomial scheme in (8), it is additionally assumed that $0 \notin B$.

In the case *m* = 2 and $B = \{(x, y) \in \mathbb{Z}_+^2 : x = 0,\ y > 0\}$, the two-sample statistic (8) counts the number of cells that were empty in the first series of trials (the "original" sample) and became nonempty after the second ("additional") series (the "future" sample). Statistics of such a kind play an important role in the theory of species sampling (for example, see [13,14]). In this case the functional (8) is called the number of unseen species in the original sample.
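This two-sample statistic is straightforward to compute from the two rows of group frequencies. A sketch of ours (the function name and sample numbers are illustrative, not from the article):

```python
def unseen_species(nu_original, nu_future):
    """Two-sample statistic (8) with B = {(x, y): x = 0, y > 0}: the number
    of cells empty in the original sample but occupied in the future one."""
    return sum(1 for x, y in zip(nu_original, nu_future) if x == 0 and y > 0)

# six cells; cells 2 and 6 were unseen in the original sample
unseen_species([3, 0, 2, 0, 1, 0], [1, 4, 0, 0, 2, 5])  # -> 2
```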

(4) In the case *m* = 1, consider the joint distribution (see [10]) of the random variables

$$\Phi_{I_B}(V_{n_1}),\ \Phi_{I_B}(V_{n_1+n_2}),\ \dots,\ \Phi_{I_B}(V_{n_1+\dots+n_m}),$$

defined in (8) by the sample (*X*1, ... , *XN*), with *N* = *n*<sup>1</sup> + ... + *nm*. It is clear that studying the asymptotic behavior of the joint distribution of these random variables (for example, proving the multidimensional central limit theorem) can be reduced to the study of the limit distributions of the linear combinations of the form

$$a_1 \Phi_{I_B}(V_{n_1}) + a_2 \Phi_{I_B}(V_{n_1 + n_2}) + \dots + a_m \Phi_{I_B}(V_{n_1 + \dots + n_m})$$

for almost all vectors (*a*1, ... , *am*) with respect to the Lebesgue measure on R*m*. It is easy to see that, for any natural *j* ≤ *m*,

$$V_{n_1 + \dots + n_j} = V_{n_1}^{(1)} + \dots + V_{n_j}^{(j)},$$

where the empirical point processes $V_{n_1}^{(1)}, \dots, V_{n_j}^{(j)}$ are defined by the above-mentioned independent subsamples. So, in this case, we deal with a functional of the form (5) defined by *m* independent empirical point processes corresponding to the *m* independent subsamples $(X_1, \dots, X_{n_1})$, $(X_{n_1+1}, \dots, X_{n_1+n_2})$, ..., $(X_{N-n_m+1}, \dots, X_N)$, and with the array of functions

$$f_{i\bar{n}}(\bar{x}) \equiv f(x_1, \dots, x_m) := a_1 I_B(x_1) + a_2 I_B(x_1 + x_2) + \dots + a_m I_B(x_1 + \dots + x_m). \tag{9}$$

(5) Consider the stochastic process $\{\Phi_{I_B}(\overline{V}_{\bar{n}});\ B \subset \mathbb{Z}_+^m\}$ indexed by all subsets of $\mathbb{Z}_+^m$. As was noted above, studying the asymptotic behavior of the joint distributions of this process can be reduced to studying the asymptotic behavior of the distributions of arbitrary linear combinations of the corresponding one-dimensional projections of this process, i.e., to studying the asymptotic behavior of the distributions of functionals of the form (5) for *m* = 1 and the array of functions

$$f_{i\bar{n}}(x) \equiv f(x) := a_1 I_{B_1}(x) + a_2 I_{B_2}(x) + \dots + a_r I_{B_r}(x) \tag{10}$$

for almost all vectors (*a*1, ... , *ar*). For one-point sets, the asymptotic analysis of the abovementioned joint distributions can be found, for example, in [7–12].

(6) Consider the case *m* = 1 and the functional

$$\Phi_f(V_n) := \sum_{i \ge 1} n p_i\, I_B(\nu_{in}), \tag{11}$$

which measures the sampling ratio of the cells containing any number of particles from the range *B*. For the one-point set *B* = {0}, such a functional was considered in [9]. In general, if instead of $np_i$ in (11) we take arbitrary weights $g(n, i) > 0$ (under condition (6)) with one or another interpretation, the functional $\Phi_f(V_n)$ is interpreted as the total weight of the corresponding cells.

#### **4. Poissonization: Duality Theorem**

In this section, we present the main result of the paper—a duality theorem for the additive statistics under consideration. First of all, we explain the term "Poissonization". It means that, in studying the limit behavior of the original additive statistics, we reduce the problem to studying the following "Poissonian version" of the functional (5) under condition (6):

$$\Phi_f(\overline{\Pi}_{\bar{n}}) := \sum_{i \ge 1} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}), \tag{12}$$

where $\bar{\pi}_{i\bar{n}} = \left(\pi^{(1)}_{n_1 i}, \dots, \pi^{(m)}_{n_m i}\right)$, with $\pi^{(k)}_{n_k i} := \Pi^{(k)}_{n_k}(\Delta_i)$, $i \ge 1$, being a sequence of independent Poisson random variables with respective parameters $n_k p_i$. It is clear that the functional (12) is well defined with probability 1 since only a finite number of the vectors $\{\bar{\pi}_{i\bar{n}}\}$ differ from the zero vector. Independence of the summands is the crucial difference between the Poisson version of an additive functional and the original one. Some elements of Poissonization for additive functionals of the form (8) and (10) are contained, for example, in [9,12]. In [9], the author used the well-known representation of an empirical point process as the conditional Poisson point process given that the number of atoms of the accompanying Poisson point process equals *n*. Moreover, in [9], the simple known representation $\pi(n) = n + O_p(\sqrt{n})$ was employed, where $O_p(\sqrt{n})$ denotes a random variable such that $O_p(\sqrt{n})/\sqrt{n}$ is bounded in probability as $n \to \infty$. In [12], proving the multivariate central limit theorem for the above-mentioned joint distributions (in fact, for functionals of the form (10) in the case of one-point subsets $\{B_i\}$), the authors applied a reduction to the joint distributions of the Poissonian versions of additive functionals using known upper bounds for a multivariate Poisson approximation to a multinomial distribution (see also [15]). The main goal of the paper is to establish a duality theorem, which demonstrates the complete identity of the asymptotic behavior of the distributions of the additive functionals under consideration and their Poissonian versions.

First, we formulate a crucial auxiliary assertion in proving the main result.

**Lemma 1.** *Let $\{\Delta_{\bar{n}}\}$ be an arbitrary scalar array satisfying the condition $f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})\Delta_{\bar{n}} \xrightarrow{p} 0$ as $\bar{n} \to \infty$ for every fixed i. Then, for each multiindex $\bar{n}$, one can define on a common probability space a pair of point processes $\overline{V}^*_{\bar{n},\Delta_{\bar{n}}}$ and $\overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}}$ such that $\mathcal{L}(\overline{V}^*_{\bar{n},\Delta_{\bar{n}}}) = \mathcal{L}(\overline{V}_{\bar{n}})$, $\mathcal{L}(\overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}}) = \mathcal{L}(\overline{\Pi}_{\bar{n}})$, and for any ε* > 0*,*

$$\mathbf{P}\left(|\Delta_{\bar{n}}| \left| \Phi_f(\overline{V}^*_{\bar{n},\Delta_{\bar{n}}}) - \Phi_f(\overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}}) \right| > \varepsilon\right) \to 0 \quad \text{as} \quad \bar{n} \to \infty. \tag{13}$$

**Remark 2.** *Lemma 1 asserts only the coincidence of the marginal distributions (that is, for each $\bar{n}$ separately) of the arrays $\{\overline{V}^*_{\bar{n},\Delta_{\bar{n}}},\ \bar{n} \in \mathbb{Z}_+^m\}$ and $\{\overline{V}_{\bar{n}},\ \bar{n} \in \mathbb{Z}_+^m\}$, and also of $\{\overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}},\ \bar{n} \in \mathbb{Z}_+^m\}$ and $\{\overline{\Pi}_{\bar{n}},\ \bar{n} \in \mathbb{Z}_+^m\}$. Note that the probability in (13) is determined precisely by the marginal distributions of the mentioned random arrays, i.e., formally, it also depends on $\bar{n}$. Without loss of generality, we can assume that the pairs of point processes $(\overline{V}^*_{\bar{n},\Delta_{\bar{n}}}, \overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}})$ are independent in $\bar{n}$, and on this extended probability space, the universal probability measure* **P** *in (13) is given in the standard way and no longer depends on $\bar{n}$. In this case it is correct to speak about the convergence to zero in probability of the sequence of random variables in (13).*

Lemma 1 gives the key to the proof of the following duality theorem, a criterion for the weak convergence of distributions of functionals of the point processes under consideration. The essence of this result is that the asymptotic behavior of the distributions of additive functionals of the point processes *Vn*¯ and Π*n*¯ is exactly the same. In addition, one can also indicate a third class of additive functionals (under condition (6)) that has the same property:

$$\Phi_f^* := \sum_{i \ge 1} f_{i\bar{n}}(\bar{\nu}^*_{i\bar{n}}),$$

where $\{\bar{\nu}^*_{i\bar{n}},\ i \ge 1\}$ is a sequence of independent random vectors such that $\mathcal{L}(\bar{\nu}^*_{i\bar{n}}) = \mathcal{L}(\bar{\nu}_{i\bar{n}})$ for all *i*. The functional $\Phi^*_f$ is well defined due to the Borel–Cantelli lemma and the simple estimate $\mathbf{P}(\bar{\nu}^*_{i\bar{n}} \neq 0) = \mathbf{P}(\bar{\nu}_{i\bar{n}} \neq 0) \le m \max_{k\le m} n_k\, p_i$.

Let us agree that the symbol «=⇒» in what follows will denote the weak convergence of distributions. The main result of the paper is as follows.

**Theorem 3.** *Under the conditions of Lemma* 1*, the following three limit relations are equivalent as n*¯ → ∞:

$$(1)\ \mathcal{L}\left(\Phi_f(\overline{V}_{\bar{n}})\Delta_{\bar{n}} - M_{\bar{n}}\right) \Longrightarrow \mathcal{L}(\gamma),$$

$$(2)\ \mathcal{L}\left(\Phi_f(\overline{\Pi}_{\bar{n}})\Delta_{\bar{n}} - M_{\bar{n}}\right) \Longrightarrow \mathcal{L}(\gamma),$$

$$(3)\ \mathcal{L}\left(\Phi_f^*\Delta_{\bar{n}} - M_{\bar{n}}\right) \Longrightarrow \mathcal{L}(\gamma),$$

*where Mn*¯ *and* Δ*n*¯ *are some scalar arrays and γ is some random variable.*

#### **5. Applications**

Theorem 3 allows us to reduce the asymptotic analysis of the distributions of the additive functionals under consideration to a similar analysis of their Poissonian versions, i.e., to the asymptotic analysis of distributions of sums (finite or infinite) of independent random variables, or to reduce the problem to studying the limit behavior of the distributions $\mathcal{L}(\Phi_f^*)$, completely ignoring the dependence of the random variables $\{\bar{\nu}_{i\bar{n}},\ i \ge 1\}$. Note also that, under some rather broad assumptions, the law $\mathcal{L}(\gamma)$ will be infinitely divisible. A detailed analysis of such conditions and corresponding examples will be given in a separate paper. Here we present only a few of these corollaries, focusing our attention on the equivalence of the first two relations of Theorem 3.

First of all, we note one useful property of the expectations of the functionals under consideration as functions of *n*¯.

**Lemma 2.** *Let $\max_{\bar{n}} \sup_{\bar{x}} |f_{i\bar{n}}(\bar{x})| \le C_i$ for all i, $\sum_{i\ge1} C_i p_i < \infty$, and*

$$\sum_{i\geq 1} \mathbf{E} |f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})| < \infty \quad \forall \bar{n}.\tag{14}$$

*Then the relations $\lim_{\bar{n}\to\infty} |\mathbf{E}\Phi_f(\overline{V}_{\bar{n}})| = \infty$ and $\lim_{\bar{n}\to\infty} |\mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}})| = \infty$ are equivalent. In the case of infinite limits,*

$$\mathbf{E}\Phi_f(\overline{V}_{\bar{n}}) \sim \mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}}) \ \text{ as } \ \bar{n} \to \infty.$$

**Remark 3.** *For functionals of the form (8) in an infinite multinomial scheme, the conditions of Lemma 2 are typical. Let m* = 1 *and $B := \{j : j > k\}$ for any $k \ge 0$. Then*

$$\lim_{n \to \infty} \mathbf{E} \Phi_f(V_n) = \lim_{n \to \infty} \sum_{i \ge 1} \mathbf{P}(\nu_{in} > k) = \infty$$

*since, by virtue of the law of large numbers, $\lim_{n\to\infty} \mathbf{P}(\nu_{in} > k) = 1$ for every fixed i. Moreover, in the case under consideration, obviously, $\mathbf{E}\Phi_f(V_n) \le n$. Similarly, without any restrictions on the probabilities $\{p_i\}$, the infinite limits in Lemma 2 for functionals of the form (8) (and even more so for (11)) also hold for the set B consisting of all odd natural numbers. Here the limit relation $\lim_{n\to\infty} \mathbf{E}\Phi_f(\Pi_n) \equiv \lim_{n\to\infty} \sum_{i\ge1} \mathbf{P}(\pi_{in} \in B) = \infty$ follows immediately from the equality $\mathbf{P}(\pi_{in} \in B) = \frac{1}{2}\left(1 - e^{-2np_i}\right)$.*
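The closed-form probability used in the remark follows from $\sum_{k \text{ odd}} e^{-\lambda}\lambda^k/k! = e^{-\lambda}\sinh\lambda$, and can be checked numerically against direct summation of the Poisson probabilities (a verification sketch of ours, not from the article):

```python
import math

def poisson_probs(lam, terms=200):
    """pmf of Poisson(lam) on 0..terms-1, computed iteratively."""
    probs = [math.exp(-lam)]
    for k in range(1, terms):
        probs.append(probs[-1] * lam / k)
    return probs

def prob_odd(lam):
    """P(pi in B) for B the set of odd naturals, pi ~ Poisson(lam)."""
    return sum(p for k, p in enumerate(poisson_probs(lam)) if k % 2 == 1)

lam = 3.7  # plays the role of n * p_i
# direct summation of the pmf agrees with the closed form (1 - e^{-2*lam})/2
abs(prob_odd(lam) - 0.5 * (1.0 - math.exp(-2.0 * lam))) < 1e-12
```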

It is also worth noting that for some sets *B* the main contribution to the limit behavior of the series $\sum_{i\ge1} \mathbf{P}(\pi_{in} \in B)$ can be made not only by its initial segments but also by its tails. For example, this will be the case for any one-point set $B_k := \{k\}$ with $k > 0$ if the group probabilities are given as $p_i = C i^{-1-b}$ or $p_i = c e^{-C_o i^{\alpha}}$ for some constants $c, C, C_o, b > 0$ and $\alpha \in (0, 1)$. In this case, for any subset *B* of the natural numbers in the definition of the functionals (8) and (11), the expectation limits indicated in Lemma 2 will be infinite (see Section 7 and [9,12]). On the other hand, if $p_i = c e^{-C_o i}$, then for any one-point set the expectations mentioned will be bounded uniformly in *n* (see Section 7 and [9,12]). For more complex functionals with kernels (9) or (10) and the above-mentioned distributions $\{p_i\}$, one can find sufficiently broad conditions that ensure an unbounded increase in their expectations and variances as $\bar{n} \to \infty$ for almost all vectors $(a_1, \dots, a_r) \in \mathbb{R}^r$ (see Section 7).

Now we present one of the corollaries of Theorem 3, namely, the law of large numbers for the additive functionals under consideration, setting in this theorem $\Delta_{\bar{n}} := (\mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}}))^{-1}$, $M_{\bar{n}} := 0$, and $\gamma := 1$.

**Corollary 2.** *Let the conditions of Lemma 2 be fulfilled. If* |**E**Φ*f*(Π*n*¯)| → ∞ *as n*¯ → ∞ *then the following criterion holds*:

$$\frac{\Phi_f(\overline{V}_{\bar{n}})}{\mathbf{E}\Phi_f(\overline{V}_{\bar{n}})} \stackrel{p}{\longrightarrow} 1 \quad \text{iff} \quad \frac{\Phi_f(\overline{\Pi}_{\bar{n}})}{\mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}})} \stackrel{p}{\longrightarrow} 1;$$

*in this case, the normalizations $\mathbf{E}\Phi_f(\overline{V}_{\bar{n}})$ and $\mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}})$ can be swapped.*

**Remark 4.** *In view of Chebyshev's inequality, a sufficient condition for the limit relations in Corollary 2 is as follows:*

$$\frac{\sum_{i\geq 1} \mathbf{D} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})}{\left(\sum_{i\geq 1} \mathbf{E} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})\right)^{2}} \to 0.$$

*For example, let $f_{i\bar{n}}(\cdot) \ge 0$ and $\sup_{\bar{x},i,\bar{n}} f_{i\bar{n}}(\bar{x}) \le C_0$. Then $\mathbf{D} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}) \le C_0 \mathbf{E} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})$ and*

$$\frac{\sum_{i\geq 1} \mathbf{D} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})}{\left(\sum_{i\geq 1} \mathbf{E} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})\right)^{2}} \leq C_{0} \left| \sum_{i\geq 1} \mathbf{E} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}) \right|^{-1} \to 0.$$

*In particular, this estimate is valid in the case $f_{i\bar{n}}(\bar{x}) \equiv f(\bar{x}) := I_B(\bar{x})$, with $0 \notin B$, if only $\mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}}) = \sum_{i\ge1} \mathbf{P}(\bar{\pi}_{i\bar{n}} \in B) \to \infty$.*
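For the indicator case the quantity $\mathbf{E}\Phi_{I_B}(\Pi_n) = \sum_i \mathbf{P}(\pi_{in} \in B)$ is available in closed form when $B = \{j : j > 0\}$, and its growth in *n* can be observed directly. A small numerical sketch of ours (the power-law probabilities $p_i \propto i^{-2}$ are a hypothetical choice for illustration):

```python
import math

def expected_occupied(n, p):
    """E Phi_{I_B}(Pi_n) = sum_i P(pi_in in B) for B = {j : j > 0}, i.e., the
    expected number of occupied cells in the Poissonized scheme; each term is
    P(Poisson(n * p_i) > 0) = 1 - exp(-n * p_i)."""
    return sum(1.0 - math.exp(-n * p_i) for p_i in p)

# hypothetical power-law cell probabilities p_i proportional to i^(-2)
weights = [i ** -2.0 for i in range(1, 10001)]
total = sum(weights)
p = [w / total for w in weights]
e_small, e_large = expected_occupied(10, p), expected_occupied(10000, p)
# e_small < e_large: the expectation grows without bound in n, so the
# sufficient condition of Remark 4 applies to this functional
```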

We now formulate an analog of Lemma 2 for the variances of the functionals under consideration.

**Lemma 3.** *Under the conditions $\max_{\bar{n}} \sup_{\bar{x}} |f_{i\bar{n}}(\bar{x})| \le C_i$ ∀i and $\sum_{i\ge1} C_i^2 p_i < \infty$, the limit relation $\lim_{\bar{n}\to\infty} \mathbf{D}\Phi_f(\overline{V}_{\bar{n}}) = \infty$ holds if and only if $\lim_{\bar{n}\to\infty} \mathbf{D}\Phi_f(\overline{\Pi}_{\bar{n}}) = \infty$. In the case of infinite limits the following equivalence is valid: $\mathbf{D}\Phi_f(\overline{V}_{\bar{n}}) \sim \mathbf{D}\Phi_f(\overline{\Pi}_{\bar{n}})$ as $\bar{n} \to \infty$.*

Lemma 3 and Theorem 3 imply the following important criterion, which allows us to reduce proving the central limit theorem for additive functionals Φ*f*(*Vn*¯) to proving the same assertion for the Poissonian version Φ*f*(Π*n*¯).

**Corollary 3.** *Under the conditions of Lemma 3 and $\mathbf{D}\Phi_f(\overline{\Pi}_{\bar{n}}) \to \infty$ as $\bar{n} \to \infty$, the limit relation*

$$\mathcal{L}\left(\frac{\Phi_f(\overline{V}_{\bar{n}}) - \mathbf{E}\Phi_f(\overline{V}_{\bar{n}})}{\mathbf{D}^{1/2}\Phi_f(\overline{V}_{\bar{n}})}\right) \Longrightarrow \mathcal{N}(0,1) \quad \text{as}\ \bar{n} \to \infty,$$

*is valid if, and only if,*

$$\mathcal{L}\left(\frac{\Phi_f(\overline{\Pi}_{\bar{n}}) - \mathbf{E}\Phi_f(\overline{\Pi}_{\bar{n}})}{\mathbf{D}^{1/2}\Phi_f(\overline{\Pi}_{\bar{n}})}\right) \Longrightarrow \mathcal{N}(0,1) \quad \text{as}\ \bar{n} \to \infty,$$

*where $\mathcal{N}(0,1)$ is the standard normal distribution. In this case, the normalizing and centering sequences in these two limit relations can be, respectively, swapped.*

In order to prove this corollary, we should put in Theorem 3 $\Delta_{\bar{n}} := \mathbf{D}^{-1/2}\Phi_f(\overline{\Pi}_{\bar{n}})$, $M_{\bar{n}} := \mathbf{E}\Phi_f(\overline{V}_{\bar{n}})\,\mathbf{D}^{-1/2}\Phi_f(\overline{\Pi}_{\bar{n}})$, and $\mathcal{L}(\gamma) := \mathcal{N}(0,1)$. In this case, Lemma 3 allows us only to replace the normalizing and centering sequences in Theorem 3 with some equivalent sequences.

**Remark 5.** *The validity of the central limit theorem for the sequence* Φ*f*(Π*n*¯) *in Theorem 3 will be justified if, say, the third-order Lyapunov condition is met:*

$$\frac{\sum_{i\geq 1} \mathbf{E}|f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}) - \mathbf{E}f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})|^3}{\left(\sum_{i\geq 1} \mathbf{D}f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})\right)^{3/2}} \to 0 \quad \text{as}\ \bar{n} \to \infty.$$

*For example, let $\sup_{\bar{x},i,\bar{n}} |f_{i\bar{n}}(\bar{x})| \le C_0$. Then it is easy to see that*

$$\sum_{i\geq 1} \mathbf{E} |f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}) - \mathbf{E} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})|^{3} \leq 2C_{0} \sum_{i\geq 1} \mathbf{D} f_{i\bar{n}}(\bar{\pi}_{i\bar{n}}).$$

*Thus, if $\mathbf{D}\Phi_f(\overline{\Pi}_{\bar{n}}) \to \infty$ as $\bar{n} \to \infty$, then the Lyapunov condition is met and the central limit theorem above holds. So the important special case $f_{i\bar{n}}(\bar{x}) := I_B(\bar{x})$ is covered by the scheme at issue if*

$$\mathbf{D}\Phi_{I_B}(\overline{\Pi}_{\bar{n}}) = \sum_{i \ge 1} \mathbf{P}(\bar{\pi}_{i\bar{n}} \in B)\left(1 - \mathbf{P}(\bar{\pi}_{i\bar{n}} \in B)\right) \to \infty \ \text{ as } \ \bar{n} \to \infty.$$

*Note that examples for which the specified variance property takes place or is violated are given, for example, in [9].*

*Finally, here is another consequence of Theorem 3, relating to the asymptotic behavior of the $\chi^2$-statistics in (7) for m* = 1 *and $N \equiv N(n) \to \infty$. First of all, note that*

$$\mathbf{E}\Phi\_{\chi^2}(\Pi\_n) = N,$$

$$D\_{n} := \mathbf{D} \Phi\_{\chi^2} (\Pi\_{n}) = 2N + \sum\_{i=1}^{N} \frac{1}{np\_i}.$$

**Corollary 4.** *Let N* ≡ *N*(*n*) → ∞ *as n* → ∞*. Then the following two asymptotic relations are equivalent:*

$$\mathcal{L}\left(\frac{\Phi\_{\chi^2}(V\_n) - N}{D\_n^{1/2}}\right) \Longrightarrow \mathcal{N}(0, 1),\tag{15}$$

$$\mathcal{L}\left(\frac{\Phi\_{\chi^2}(\Pi\_n) - N}{D\_n^{1/2}}\right) \Longrightarrow \mathcal{N}(0, 1). \tag{16}$$

Note that in the present case the requirement of Lemma 1 is met: each term $(\nu_{in} - np_i)^2/(np_i)$ (as a sequence in $n$) is bounded in probability by Markov's inequality, and therefore, with the normalizing sequence $\Delta_n := D_n^{-1/2}$, this term tends to zero in probability as $n \to \infty$.

**Remark 6.** *In relations (15) and (16) we may speak of a genuine double limit as $N, n \to \infty$, because this assertion imposes no restrictions on the growth rate of the sequence $N(n)$. The formulation proposed in Corollary 4, equivalent to the one just mentioned, is more convenient for referring to Theorem 3. Note that the centering sequence $N = \mathbf{E}\Phi_{\chi^2}(\Pi_n)$ can be replaced with the equivalent sequence $\mathbf{E}\Phi_{\chi^2}(V_n) = N - 1$. Replacing the variance $D_n$ in the normalization in (15) with the variance of the $\chi^2$-statistic itself, i.e., with the quantity (for example, see [16])*

$$\mathbf{D}\Phi\_{\chi^2}(V\_n) = 2(N-1) + \sum\_{i=1}^{N} \frac{1}{np\_i} - \frac{N^2 + 2N - 2}{n}$$

*is possible only if these two variances are equivalent. For example, this is the case if* $\min_{i\le N} np_i \to \infty$. *But then the growth rate of the sequence* $N \equiv N(n)$ *is subject to appropriate constraints, which is not required in the corollary above. Hence, in that assertion one can speak of a genuine double limit as* $n, N \to \infty$.

The formulated criterion allows us to establish a fairly general sufficient condition for the asymptotic normality of *χ*2-statistics with an increasing number of groups.

**Theorem 4.** *Let N* ≡ *N*(*n*) → ∞ *as n* → ∞*. Then the asymptotic relation* (15) *is valid if*

$$\frac{\sum\_{i=1}^{N} (np\_i)^{-2}}{\left(N + \sum\_{i=1}^{N} (np\_i)^{-1}\right)^{3/2}} \longrightarrow 0 \tag{17}$$

*as n* → ∞*.*

The problem of finding more or less broad sufficient conditions for the asymptotic normality of $\chi^2$-statistics with a growing number of groups was studied by many authors in the second half of the last century (for example, see [3–5,16–18]). Note that all known sufficient conditions for the above weak convergence imply fulfillment of the asymptotic relation (17). For example, the condition $\min_{i\le N} np_i \to \infty$ together with $N \to \infty$ (see [17,18]) obviously entails relation (17) immediately. It is equally obvious that the requirement of the so-called regularity of multinomial models (see [3–5]), i.e.,

$$0 < c\_1 \le \min\_{i \le N} N p\_i \le \max\_{i \le N} N p\_i < c\_2 < \infty,$$

where the constants $c_1$ and $c_2$ are independent of $N$, also implies (17). On the other hand, it is easy to construct examples in which the regularity requirement of the multinomial model is violated but relation (17) is valid. For example, let $p_i := C_N i^{-1-b}$, $i = 1, \ldots, N$, where $b > 0$ and $C_N := \left(\sum_{i\le N} i^{-1-b}\right)^{-1}$. It is easy to see that, as $N \to \infty$, the sums $\sum_{i=1}^{N} p_i^{-2}$ and $\sum_{i=1}^{N} p_i^{-1}$ grow as $N^{3+2b}$ and $N^{2+b}$, respectively. Therefore, as $n, N \to \infty$, the ratio in (17) is equivalent to

$$\frac{N^{3+2b}}{\sqrt{n}(N^{2+b})^{3/2}} = \frac{N^{b/2}}{\sqrt{n}}$$

up to a constant factor. So, here we already need to compare the growth rate of $N$ with that of $n$. Obviously, in this case, in order to fulfill condition (17), one needs to require that $N = o(n^{1/b})$. If the probabilities $p_i$ decrease exponentially, then the admissible growth zone for $N$ narrows to $o(\log n)$. It is worth noting that, for the above-mentioned power-type probabilities, the condition $\min_{i\le N} np_i \to \infty$ implies the asymptotic relation $N = o(n^{1/(b+1)})$, which is more restrictive than the above constraint.
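The power-law example above is easy to probe numerically. The following sketch (illustrative only; the parameter values are assumptions of the demo) computes the ratio in (17) exactly for $p_i = C_N i^{-1-b}$ with $b = 1$. In the regime $N^{1+b} \gg n$ the ratio scales like $N^{b/2}/\sqrt{n}$, so quadrupling $N$ at fixed $n$ should roughly double it:

```python
def lyapunov_ratio(n, N, b):
    # the ratio in (17) for the power-law model p_i = C_N * i^{-1-b}
    weights = [i**(-1.0 - b) for i in range(1, N + 1)]
    CN = 1.0 / sum(weights)
    s1 = sum(1.0 / (n * CN * w) for w in weights)     # sum of (n p_i)^{-1}
    s2 = sum(1.0 / (n * CN * w)**2 for w in weights)  # sum of (n p_i)^{-2}
    return s2 / (N + s1)**1.5

r1 = lyapunov_ratio(n=10**6, N=10**4, b=1.0)
r2 = lyapunov_ratio(n=10**6, N=4 * 10**4, b=1.0)
print(r1, r2, r2 / r1)  # the last value is close to sqrt(4) = 2
```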

#### **6. Probability and Moment Inequalities**

The next theorem is related to estimation of the distribution tails of additive functionals.

**Theorem 5.** *Let $f_{i\bar{n}}(\cdot) \ge 0$ for all i. Then,*

$$\mathbf{P}(\Phi\_f(\overline{V}\_{\bar{n}}) \ge x) \le 2C^\* \mathbf{P}(\Phi\_f(\overline{\Pi}\_{\bar{n}}) \ge x/2), \tag{18}$$

*where* $C^\* := \min\_{j \ge 1} \max \{ (\sum\_{i \le j} p\_i)^{-1}, (\sum\_{i > j} p\_i)^{-1} \}$*. If additionally* $\sup\_x f\_{1\bar{n}}(x) \le c\_0$*, then*

$$\mathbf{P}(\Phi\_f(\overline{V}\_{\bar{n}}) \ge x) \le p\_1^{-1} \mathbf{P}(\Phi\_f(\overline{\Pi}\_{\bar{n}}) \ge x - c\_0). \tag{19}$$

**Remark 7.** *In (19), the constant $c_0$ may depend on $\bar{n}$. Moreover, we can use the truncation of the random variable $f_{1\bar{n}}(\nu_{1\bar{n}})$ at the level $c_0$, adding to the right-hand side of inequality (19) the probability $\mathbf{P}(f_{1\bar{n}}(\nu_{1\bar{n}}) > c_0)$.*

**Corollary 5.** *Under the conditions of Theorem 5, let F be a continuous nondecreasing function defined on* R+*, with F*(0) = 0*. If* **E***F*(2Φ*f*(Π*n*¯)) < ∞ *then*

$$\mathbf{E}F(\Phi\_f(\overline{V}\_{\bar{n}})) \le 2C^\* \mathbf{E}F(2\Phi\_f(\overline{\Pi}\_{\bar{n}})).\tag{20}$$

As an example, consider the functional Φ*IB* (*Vn*¯) defined in (8). Then, as a consequence of (19) and Chernoff's upper bound [19] for the distribution tail of a sum of independent nonidentically distributed Bernoulli random variables (the transition from finite sums to series in this case is obvious), we obtain the following result.

**Corollary 6.** *Put Mn*(*B*) := **<sup>E</sup>**Φ*IB* (Π*n*¯) = <sup>∑</sup>*i*≥<sup>1</sup> **<sup>P</sup>**(*πin* ∈ *<sup>B</sup>*)*. Then for any <sup>ε</sup>* > (*Mn*(*B*))−<sup>1</sup> *the following inequality holds*:

$$\mathbf{P}\left(\left|\frac{\Phi\_{I\_B}(\overline{V}\_{\bar{n}})}{M\_n(B)} - 1\right| > \varepsilon\right) \le 2p\_1^{-1}e^{-\frac{\delta^2 M\_n(B)}{2+\delta}},\tag{21}$$

*where* $\delta := \varepsilon - \frac{1}{M\_n(B)} > 0$*.*

**Remark 8.** *One can replace the Poissonian mean $M_n(B)$ in (21) with the mean $\mathbf{E}\Phi_{I_B}(\overline{V}_{\bar{n}})$, which differs from $M_n(B)$ by no more than 1 due to Barbour–Hall's estimate of the Poisson approximation to the binomial distribution (see [15,20]). Further, if the condition $M_n(B) \to \infty$ is met as $n \to \infty$, then from (21) we obtain not only the law of large numbers (already formulated in Corollary 2) but also, at a certain growth rate of the sequence $M_n(B)$, the strong law of large numbers (SLLN) (see Section 7). If in the case $m = 1$ we consider the infinite intervals $B \equiv B_k := \{i : i > k\}$ for any $k \in \mathbb{Z}_+$, then the SLLN holds at any speed of increase of the sequence $M_n(B)$ to infinity. This follows from estimate (21), the monotonicity of the functions $I_{B_k}(x)$, and the simple technique of proving the SLLN in [9,21].*

#### **7. Asymptotic Analysis of the Means and Variances of Additive Statistics**

In the previous section, it was noted that when proving certain limit theorems for the introduced additive functionals, it is extremely important to have information about the behavior of their means and variances. In this section, for additive statistics (8)–(11), we demonstrate exactly how the asymptotic behavior of these moments is studied. To simplify the notation, we will consider here the case *m* = 1. The subsequent asymptotic analysis is based on the following elementary assertion, which is presented in one way or another in many papers on this topic.

**Lemma 4.** *Let fn*(*x*) *be a sequence of non-negative, integrable, and piecewise monotonic functions defined on* R+*. Suppose that each fn*(*x*) *has M monotonicity intervals, where M is independent of n. Finally, assume that, as n* → ∞*,*

$$\int\_0^\infty f\_n(\mathbf{x})d\mathbf{x} \to \infty, \quad \sup\_{\mathbf{x}\geq 0} f\_n(\mathbf{x}) = o\left(\int\_0^\infty f\_n(\mathbf{x})d\mathbf{x}\right).$$
 Then, as  $n \to \infty,$  
$$\sum\_{j>0} f\_n(j) \sim \int\_0^\infty f\_n(\mathbf{x})d\mathbf{x}.$$

We now give a few examples of calculating the asymptotics we need.

(1) Let $B_k := \{i : i > k\}$ for any $k \in \mathbb{Z}_+$. In Remark 3 it was already noted that $M_n(B_k) \to \infty$ due to the strong law of large numbers for binomially distributed random variables. However, for specific classes of distributions $\{p_i\}$, one can estimate the growth rate of the sequence $\{M_n(B_k)\}$. For example, let $p_i := Ci^{-1-b}$, where $b > 0$, $i = 1, 2, \ldots$. Then, using Lemma 4 and the well-known connection between the tail of a Poisson distribution and the corresponding gamma distribution, we obtain after integration by parts and a change of the integration variable:

$$\begin{split} M\_n(B\_k) \equiv \sum\_{i\geq 1} \mathbf{P}(\pi\_{in} > k) &= \sum\_{i\geq 1} \gamma\_{k+1,1}(np\_i) \\ &\sim (Cn)^{\frac{1}{1+b}} \int\_0^\infty \gamma\_{k+1,1}(y^{-1-b}) dy = \frac{(Cn)^{\frac{1}{1+b}}}{k!} \Gamma\left(k + \frac{b}{1+b}\right), \end{split} \tag{22}$$

where $\gamma_{k+1,1}(z) := \int_0^z \frac{t^k}{k!}e^{-t}\,dt$ and $\Gamma(z) := \int_0^\infty t^{z-1}e^{-t}\,dt$, $z > 0$, are the distribution function of the gamma distribution with parameters $(k+1, 1)$ and the gamma function, respectively. For example, if $k = 0$ then the asymptotics of the expectation of the number of nonempty cells is as follows (see [6,9]):

$$M\_n(B\_0) \sim (Cn)^{\frac{1}{1+b}} \int\_0^\infty (1 - e^{-y^{-1-b}}) dy = (Cn)^{\frac{1}{1+b}} \Gamma\left(\frac{b}{1+b}\right). \tag{23}$$
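Asymptotics (23) can be sanity-checked numerically. The sketch below is an illustration only: the choice $b = 1$ and the truncation level `I` are assumptions of the demo. It compares the exact sum $\sum_{i\ge1}(1 - e^{-np_i})$ with the predicted value $(Cn)^{1/2}\,\Gamma(1/2) = \sqrt{\pi C n}$:

```python
import math

b = 1.0
I = 2 * 10**5                  # truncation of the infinite support, demo only
C = 1.0 / sum(i**(-1.0 - b) for i in range(1, I + 1))

def M_nonempty(n):
    # M_n(B_0) = sum_i P(pi_in > 0) = sum_i (1 - exp(-n p_i)), p_i = C i^{-1-b}
    return sum(1.0 - math.exp(-n * C * i**(-1.0 - b)) for i in range(1, I + 1))

n = 10**5
exact = M_nonempty(n)
predicted = (C * n)**(1.0 / (1.0 + b)) * math.gamma(b / (1.0 + b))
print(exact, predicted, exact / predicted)  # the ratio is close to 1
```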

By analogy to the arguments in proving (22), after an appropriate change of the integration variable, we obtain for the one-point sets the following asymptotics:

$$\begin{split} M\_{n}(\{k\}) &\sim (Cn)^{\frac{1}{1+b}} \int\_{0}^{\infty} \frac{y^{-k(1+b)}}{k!} e^{-y^{-1-b}} dy \\ &= \frac{(Cn)^{\frac{1}{1+b}}}{(1+b)k!} \int\_{0}^{\infty} x^{k-1-\frac{1}{1+b}} e^{-x} dx = \frac{(Cn)^{\frac{1}{1+b}}}{(1+b)k!} \Gamma\left(k - \frac{1}{1+b}\right). \end{split} \tag{24}$$

Thus, from (24) it follows that, for *any subset B of the natural series*, in the case under consideration of a power-law decay of $\{p_i\}$, the following asymptotic representation is true:

$$M\_n(B) \sim \frac{(Cn)^{\frac{1}{1+b}}}{(1+b)} \sum\_{k \in B} \frac{1}{k!} \Gamma\left(k - \frac{1}{1+b}\right). \tag{25}$$

Note that, due to the countable additivity of the finite measure *Mn*(·) and the relations (22)–(24), the sum (possibly infinite) in (25) will always be finite.

**Remark 9.** *Inequality (21), relation (25), and the Borel–Cantelli lemma guarantee that the strong law of large numbers holds for the sequence $\{\Phi_{I_B}(V_n)/M_n(B)\}$ for any subset $B$ of the natural series in the case of a power-law decrease in the probabilities $\{p_i\}$. Moreover, what has been said and the above asymptotics are also preserved for probabilities of the form $p_i := C(i)\, i^{-1-b}$, where $C(x)$ is a slowly varying function under certain minimal constraints (see [9,12]). In this case, one should substitute $C(n)$ for $C$ in the asymptotic relations (22)–(25).*

The asymptotic behavior of the variances of the functionals $\Phi_{I_B}(\Pi_n)$ for some $B$, under broad conditions on the rate of decrease of the sequence $\{p_i\}$, is given in [9]. Here we only demonstrate how this variance is calculated for *arbitrary* subsets $B$ of the natural series under the above conditions on $\{p_i\}$. Analogously to (22), we have for the infinite intervals $B_k$:

$$\begin{split} D\_{n}(B\_{k}) &:= \mathbf{D} \Phi\_{I\_{B\_k}}(\Pi\_{n}) = \sum\_{i\geq 1} \mathbf{P}(\pi\_{in} > k) - \sum\_{i\geq 1} \mathbf{P}^{2}(\pi\_{in} > k) \\ &= \sum\_{i\geq 1} \gamma\_{k+1,1}(np\_{i}) - \sum\_{i\geq 1} \gamma\_{k+1,1}^{2}(np\_{i}) \sim (Cn)^{\frac{1}{1+b}} \int\_{0}^{\infty} \left( \gamma\_{k+1,1}(y^{-1-b}) - \gamma\_{k+1,1}^{2}(y^{-1-b}) \right) dy. \end{split} \tag{26}$$

Similarly to proving (24), we derive the asymptotics of the variance for the one-point sets:

$$\begin{split} D\_n(\{k\}) &= \sum\_{i\geq 1} \mathbf{P}(\pi\_{in} = k) - \sum\_{i\geq 1} \mathbf{P}^2(\pi\_{in} = k) \\ &\sim \frac{(Cn)^{\frac{1}{1+b}}}{1+b} \left( \int\_0^\infty \frac{1}{k!} x^{k-1-\frac{1}{1+b}} e^{-x} dx - \int\_0^\infty \frac{1}{(k!)^2} x^{2k-1-\frac{1}{1+b}} e^{-2x} dx \right) \\ &= \frac{(Cn)^{\frac{1}{1+b}}}{(1+b)k!} \left( \Gamma \left( k - \frac{1}{1+b} \right) - \frac{2^{\frac{1}{1+b} - 2k}}{k!} \Gamma \left( 2k - \frac{1}{1+b} \right) \right). \end{split} \tag{27}$$
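This variance asymptotics can also be checked directly. In the sketch below (again with the illustrative assumptions $b = 1$ and a finite truncation `I`), the exact sum $\sum_i [\mathbf{P}(\pi_{in}=k) - \mathbf{P}^2(\pi_{in}=k)]$ is compared with the limit expression $\frac{(Cn)^{1/(1+b)}}{(1+b)k!}\bigl(\Gamma(k-\tfrac{1}{1+b}) - \tfrac{2^{1/(1+b)-2k}}{k!}\Gamma(2k-\tfrac{1}{1+b})\bigr)$:

```python
import math

b = 1.0
I = 10**5
C = 1.0 / sum(i**(-1.0 - b) for i in range(1, I + 1))

def D_point(n, k):
    # D_n({k}) = sum_i [P(pi_in = k) - P(pi_in = k)^2], pi_in ~ Poisson(n p_i)
    total = 0.0
    for i in range(1, I + 1):
        lam = n * C * i**(-1.0 - b)
        # Poisson pmf at k, computed via logarithms for numerical stability
        q = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += q - q * q
    return total

n, k = 10**5, 2
lhs = D_point(n, k)
rhs = (C * n)**(1.0 / (1.0 + b)) / ((1.0 + b) * math.factorial(k)) * (
    math.gamma(k - 1.0 / (1.0 + b))
    - 2.0**(1.0 / (1.0 + b) - 2 * k) / math.factorial(k)
    * math.gamma(2 * k - 1.0 / (1.0 + b)))
print(lhs, rhs)  # the two values agree to within a few percent
```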

Although the set function *Dn*(·) is not additive, the extension to arbitrary subsets *B* of the natural series of computing the asymptotics of *Dn*(*B*) presents no difficulty. Along with formula (25), which gives one term in the resulting asymptotics, we use the following representation for the second sum:

$$\begin{split} \sum\_{i\geq 1} \mathbf{P}^2(\pi\_{in} \in B) &\sim \frac{(Cn)^{\frac{1}{1+b}}}{1+b} \int\_0^\infty \left(\sum\_{k\in B} \frac{x^k}{k!} \right)^2 x^{-1-\frac{1}{1+b}} e^{-2x} dx \\ &= \frac{(Cn)^{\frac{1}{1+b}}}{1+b} \sum\_{k,l\in B} \frac{2^{\frac{1}{1+b}-k-l}}{k!l!} \Gamma\left(k+l-\frac{1}{1+b}\right). \end{split} \tag{28}$$

Thus, the difference between the right-hand sides of (25) and (28) determines the asymptotics of $D_n(B)$ for any subset of the natural series.

(2) The asymptotics of the first two moments for the functionals (10) for pairwise disjoint sets {*Bj*} is derived in exactly the same way. In the case of one-point sets *Bj* := {*kj*}, the asymptotic behavior of the first moment immediately follows from the previous calculations. As for the variance, we should first note that, due to the orthogonality of the indicator random variables under consideration, we have

$$\mathbf{D}\sum\_{s=1}^{r}a\_{s}I\_{\mathrm{B}\_{s}}(\pi\_{\mathrm{in}})=\sum\_{s=1}^{r}a\_{s}^{2}\mathbf{P}(\pi\_{\mathrm{in}}=k\_{s})-\left(\sum\_{s=1}^{r}a\_{s}\mathbf{P}(\pi\_{\mathrm{in}}=k\_{s})\right)^{2}$$

$$=\sum\_{s=1}^{r}a\_{s}^{2}\mathbf{P}(\pi\_{\mathrm{in}}=k\_{s})-\sum\_{j,s=1}^{r}a\_{s}a\_{j}\mathbf{P}(\pi\_{\mathrm{in}}=k\_{s})\mathbf{P}(\pi\_{\mathrm{in}}=k\_{j}).$$

Summation over *i* of the resulting expression and the previous calculations give the desired asymptotics:

$$\mathbf{D}\Phi\_f(\Pi\_n) \sim \frac{(Cn)^{\frac{1}{1+b}}}{b+1} \sum\_{s,j=1}^r \left[ \frac{a\_s^2}{r\, k\_s!} \Gamma\left(k\_s - \frac{1}{b+1}\right) - \frac{2^{\frac{1}{b+1} - k\_s - k\_j} a\_s a\_j}{k\_s! k\_j!} \Gamma\left(k\_s + k\_j - \frac{1}{b+1}\right) \right].$$

We note that the resulting representation can vanish on a set of vectors $(a_1, \ldots, a_r)$ of zero Lebesgue measure in $\mathbb{R}^r$, i.e., on the surface defined by the relation $\sum_{s,j=1}^{r} B_{s,j} a_s a_j = 0$ for some coefficients $\{B_{s,j}\}$.

For infinite intervals of the form *Bj* := {*i* : *i* > *kj*}, the variance is studied in a similar way. We assume without loss of generality that *k*<sup>1</sup> ≤ *k*<sup>2</sup> ≤ ... ≤ *kr*. To calculate the variance of this functional, it suffices for us to restrict ourselves to the second moment, since the asymptotics of the first one has already been studied. We have

$$\begin{split} \mathbb{E}\left(\sum\_{s=1}^{r} a\_{s} I(\pi\_{in} > k\_{s})\right)^{2} &= \sum\_{s=1}^{r} a\_{s}^{2} \mathbb{P}(\pi\_{in} > k\_{s}) + 2 \mathbb{E} \sum\_{j=1}^{r-1} a\_{j} I\left(\pi\_{in} > k\_{j}\right) \sum\_{s>j}^{r} a\_{s} I\left(\pi\_{in} > k\_{s}\right) \\ &= \sum\_{s=1}^{r} a\_{s}^{2} \mathbb{P}(\pi\_{in} > k\_{s}) + 2 \mathbb{E} \sum\_{j=1}^{r-1} a\_{j} \sum\_{s>j}^{r} a\_{s} I\left(\pi\_{in} > k\_{s}\right) \\ &= \sum\_{s=1}^{r} a\_{s}^{2} \mathbb{P}(\pi\_{in} > k\_{s}) + 2 \sum\_{j=1}^{r-1} a\_{j} \sum\_{s>j}^{r} a\_{s} \mathbb{P}(\pi\_{in} > k\_{s}). \end{split}$$

Further calculations in essence have already been made earlier. So, finally we obtain

$$\mathbf{D}\Phi\_f(\Pi\_{n}) \sim (Cn)^{\frac{1}{1+b}} \sum\_{s,j=1}^r \left[ \frac{a\_s^2}{r} \int\_0^\infty \gamma\_{k\_s+1,1}(v^{-1-b}) dv - a\_s a\_j \int\_0^\infty \gamma\_{k\_s+1,1}(v^{-1-b}) \gamma\_{k\_j+1,1}(v^{-1-b}) dv \right]$$

with comments similar to the above regarding the zeroing of the double sum.

To conclude this section, we give an example where the above-mentioned moments of the functional under consideration do not tend to infinity as $n$ grows. We put $p_j := e^{-Cj}$ with $C := \log 2$. Let us show that

$$\sup\_{n} \sum\_{j \ge 1} \mathbf{P}(\pi\_{nj} = k) < \infty.$$

This estimate obviously implies that the first two moments of the functional Φ*IB* (Π*n*) are uniformly bounded in *n* for *B* := {*k*}. Indeed, one has

$$\sum\_{j\geq 1} \mathbf{P}(\pi\_{nj} = k) = \frac{n^k}{k!} \sum\_{j\geq 1} e^{-ne^{-Cj}} e^{-Ckj} \leq \frac{e^{Ck} n^k}{k!} \int\_1^{\infty} e^{-ne^{-Cx}} e^{-Ckx} dx$$

$$= \frac{e^{Ck} n^k}{Ck!} \int\_0^{e^{-C}} e^{-nt} t^{k-1} dt = \frac{e^{Ck}}{Ck!} \int\_0^{ne^{-C}} e^{-u} u^{k-1} du;$$

here we used the estimate $e^{-ne^{-Cj}}e^{-Ckj} \le e^{Ck}e^{-ne^{-Cx}}e^{-Ckx}$ for all $x \in [j, j+1]$, representing the integral over the semiaxis $[1, \infty)$ as a series of integrals over the indicated segments of unit length. If $n \to \infty$, then the integral in the last expression converges monotonically to the quantity $\Gamma(k)$, which proves our assertion. Note also that a similar example is given in [9].
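The uniform boundedness claimed in this example is easy to observe numerically. The sketch below (the truncation level `J` and the choice $k = 3$ are assumptions of the illustration) evaluates $\sum_{j\ge1}\mathbf{P}(\pi_{nj}=k)$ for $p_j = 2^{-j}$ over several orders of magnitude of $n$:

```python
import math

def sum_point_probs(n, k, J=200):
    # sum_j P(pi_{nj} = k) with pi_{nj} ~ Poisson(n * p_j), p_j = e^{-Cj}, C = log 2
    total = 0.0
    for j in range(1, J + 1):
        lam = n * 2.0**(-j)
        # Poisson pmf at k, computed via logarithms to avoid overflow
        total += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return total

k = 3
vals = [sum_point_probs(10**m, k) for m in range(1, 7)]
print(vals)  # stays bounded as n runs through 10, 100, ..., 10^6
```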

#### **8. Proofs**

**Proof of Theorem 1.** The assertion of the theorem is essentially a consequence of some results from [1,2,22,23]. First we introduce the necessary notation and recall the assertions from [22,23] that we need.

Let $\{Y_i\}$ be a sequence of independent identically distributed random elements taking values in a measurable Abelian group $(\mathbb{G}, \mathcal{A})$ with measurable operation «+». Assume that the zero (neutral) element $0$, as a one-point set, belongs to the $\sigma$-algebra $\mathcal{A}$ and $p := \mathbf{P}(Y_1 = 0) \in (0, 1)$. Denote by $\{Y_i^0\}$ a sequence of independent identically distributed random variables with marginal distribution

$$
\mathcal{L}(Y\_1^0) = \mathcal{L}(Y\_1 | Y\_1 \neq 0),
$$

and also put $S_n := \sum_{i=1}^{n} Y_i$ and $S_n^0 := \sum_{i=1}^{n} Y_i^0$. In [1,2,22], the following assertion was obtained.

**Lemma 5.** *For any natural n, the following representations are valid*:

$$
\mathcal{L}(S\_n) = \mathcal{L}(S^0\_{\nu(n,p)}), \ \mathcal{L}(S\_{\pi(n)}) = \mathcal{L}(S^0\_{\pi(np)}), \tag{29}
$$

*where $\mathcal{L}(\nu(n, p)) \equiv B_{n,p}$ is the binomial distribution with parameters $n$ and $p$, and $\pi(t)$ is a standard Poisson process; moreover, the pair $(\nu(n, p), \pi(np))$ does not depend on the sequence $\{Y_i^0\}$.*

The second important assertion gives an estimate for the Radon–Nikodym derivative of the binomial distribution with respect to the accompanying Poisson law (see [23]).

**Lemma 6.** *For all p* ∈ (0, 1) *and natural n, the following estimate holds:*

$$\sup\_{k\geq 0} \frac{B\_{n,p}(k)}{\mathcal{L}(\pi(np))(k)} \leq \frac{1}{1-p}.\tag{30}$$

**Remark 10.** *There are other estimates for this Radon–Nikodym derivative. For example, in [24], it was established that*

$$\sup\_{k \ge 0} \frac{B\_{n,p}(k)}{\mathcal{L}(\pi(np))(k)} \le \frac{2}{\sqrt{1-p}}$$

*for any n and p* ∈ (0, 1)*. Note that for p* ≥ 3/4 *this estimate is more accurate than (30).*
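Both bounds on the Radon–Nikodym derivative are easy to verify numerically for small parameter values. The following sketch (the parameter grid is chosen arbitrarily for the illustration) computes the supremum over $k$ directly from the two probability mass functions:

```python
import math

def max_rn_derivative(n, p):
    # sup_k B_{n,p}(k) / P(pi(np) = k), computed over k = 0..n
    lam = n * p
    best = 0.0
    for k in range(n + 1):
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(1.0 - p))
        log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
        best = max(best, math.exp(log_binom - log_pois))
    return best

for n, p in [(10, 0.1), (50, 0.05), (200, 0.02), (30, 0.5)]:
    r = max_rn_derivative(n, p)
    # compare with the bound of Lemma 6 and the alternative bound from [24]
    print(n, p, r, 1.0 / (1.0 - p), 2.0 / math.sqrt(1.0 - p))
```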

It is clear that it is enough to prove the assertion for $m = 1$. The general case is then handled by induction on $m$ and follows immediately from the total probability formula and an estimate for the conditional probability when $m - 1$ coordinates of the vector $\overline{V}_{\bar{n}}$ are fixed. From (29) and (30) and the total probability formula (when the sequence $\{Y_i^0\}$ is fixed) we obtain the inequality

$$
\mathcal{L}(S\_n) \le \frac{1}{1-p} \mathcal{L}(S\_{\pi(n)}).\tag{31}
$$

Now we put $Y_i := I_A(X_i^{(1)})$, $A \in \mathcal{A}_0$, where $\mathcal{A}_0$ is defined in (1). Consider the Abelian group

$$\mathcal{G} := \left\{ \sum\_{i=1}^k e\_i I\_A(z\_i) \; ; \; A \in \mathcal{A}\_0 ; \; \forall k \ge 1 \; \forall z\_i \in \mathfrak{X} \; \forall e\_i \in \{-1, 1\} \right\}$$

and equip this group with the cylindric $\sigma$-algebra. It is clear that $Y_i \in \mathcal{G}$ and $\mathbf{P}(Y_1 = 0) = P(A_0) = p \in (0, 1)$. So, inequality (2) follows from (31) and the above-mentioned induction on $m$.

**Proof of Theorem 2.** We carry out our reasoning in the generality and notation of the proof of Theorem 1. Both relations in (29) will be the basis of the construction, where the sequence $\{Y_i^0\}$ is assumed to be the same in constructing the sums $S_n^0$ and $S_{\pi(n)}^0$ on a common probability space. So, to prove the first two assertions of the theorem, we only need to construct the random variables $\nu(n, p)$ and $\pi_{np}$ on a common probability space so that they are as close to each other as possible. The resulting probability space will be the direct product of the two probability spaces on which, respectively, the sequence of independent identically distributed random variables $\{Y_i^0\}$ and the above-mentioned pair of scalar indices are defined. For the optimal definition of the random indices $\nu(n, p)$ and $\pi_{np}$ on a common probability space, we use Dobrushin's theorem (see [25]), which guarantees the existence of marginal copies $\nu^*(n, p)$ and $\pi^*_{np}$ of the mentioned random indices, defined on a common probability space, such that

$$\mathbf{P}(\nu^\*(n, p) \neq \pi\_{np}^\*) = d\_{TV}(\mathcal{L}(\nu(n, p)), \mathcal{L}(\pi\_{np})), \tag{32}$$

where *dTV*(·, ·) is the total variation distance between distributions. Now we use the well-known estimate of Poisson approximation to a binomial distribution (see [15,20]):

$$d\_{TV}(\mathcal{L}(\nu(n,p)), \mathcal{L}(\pi\_{np})) \le p \wedge (np^2) \le p. \tag{33}$$

Applying the described construction to each of the $m$ independent coordinates of the vector point processes under consideration, we easily obtain the assertion of the theorem from (32) and (33).
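A direct numerical check of estimate (33) can be sketched as follows (the parameter pairs are arbitrary choices for the illustration; `d_TV` is computed by brute force from the two probability mass functions):

```python
import math

def dtv_binomial_poisson(n, p):
    # d_TV(Bin(n,p), Poisson(np)) = (1/2) * sum_k |B_{n,p}(k) - P(pi(np) = k)|
    lam = n * p
    s = 0.0
    for k in range(max(4 * n, 20)):
        if k <= n:
            log_b = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1)
                     + k * math.log(p) + (n - k) * math.log(1.0 - p))
            bk = math.exp(log_b)
        else:
            bk = 0.0  # the binomial has no mass above n
        pk = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        s += abs(bk - pk)
    return 0.5 * s

for n, p in [(20, 0.1), (100, 0.03), (1000, 0.001)]:
    d = dtv_binomial_poisson(n, p)
    print(n, p, d, min(p, n * p * p))  # d stays below the bound p ^ (np^2)
```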

**Proof of Lemma 1.** Fix a multi-index *n*¯. Let us assume that the point processes *Vn*¯ and Π*n*¯ are defined on the same probability space in one way or another. Then for any natural *k* we have the estimate

$$|\Phi\_f(\overline{V}\_{\bar{n}}) - \Phi\_f(\overline{\Pi}\_{\bar{n}})| \le \sum\_{i \ge k} |f\_{i\bar{n}}(\bar{\nu}\_{i\bar{n}}) - f\_{i\bar{n}}(\bar{\pi}\_{i\bar{n}})| + \zeta\_{k\bar{n}}, \tag{34}$$

where $\zeta_{k\bar{n}} := \sum_{i<k} |f_{i\bar{n}}(\bar{\nu}_{i\bar{n}})| + \sum_{i<k} |f_{i\bar{n}}(\bar{\pi}_{i\bar{n}})|$. Put $A_0 := \bigcup_{i\ge k} \Delta_i$ and $p(k) := \mathbf{P}(A_0) = \sum_{i\ge k} p_i$. Note that the tail of the series on the right-hand side of inequality (34) is a functional of the $\mathcal{A}_0$-restrictions of the studied vector point processes defined on a common probability space. So we can use Theorem 2, which guarantees the existence of an absolute coupling (depending on $k$) of the mentioned $\mathcal{A}_0$-restrictions with the following lower bound for the coincidence probability (see (4); here, in order not to clutter up the notation, we omit the upper symbol «*»):

$$\mathbf{P}\left(\sup\_{\Delta\_j,\; j \ge k} \left\| \overline{V}\_{\bar{n}}^{\,0}(\Delta\_j) - \overline{\Pi}\_{\bar{n}}^{\,0}(\Delta\_j) \right\| = 0\right) \ge (1-p(k))^m,$$

that is, with probability at least $(1-p(k))^m$ all the coordinates $\nu^{(l)}_{in_l}$, $i \ge k$, $l = 1, \ldots, m$, coincide with the corresponding coordinates $\pi^{(l)}_{in_l}$.

Hence, the coupling method of Theorem 2 makes the first term on the right-hand side of (34) vanish with probability no less than $(1 - p(k))^m$.

Further, by virtue of estimate (2), we conclude that $\mathcal{L}(\bar{\nu}_{i\bar{n}}) \le \frac{1}{(1-p_i)^m}\,\mathcal{L}(\bar{\pi}_{i\bar{n}})$ for any $i$. Therefore, by the conditions of the theorem, we have $\Delta_{\bar{n}} f_{i\bar{n}}(\bar{\nu}_{i\bar{n}}) \stackrel{p}{\to} 0$ for any $i$ as $\bar{n} \to \infty$. So, for any construction (such a construction obviously exists) of the random variable $\zeta_{k\bar{n}}$ on the same probability space as the $\mathcal{A}_0$-restrictions of the point processes mentioned above, we have $\Delta_{\bar{n}}\zeta_{k\bar{n}} \stackrel{p}{\to} 0$ as $\bar{n} \to \infty$ for any fixed $k$. Therefore, using the diagonal method, one can choose $k \equiv k(\bar{n}) \to \infty$ as $\bar{n} \to \infty$ for which $\Delta_{\bar{n}}\zeta_{k(\bar{n})\bar{n}} \stackrel{p}{\to} 0$ as $\bar{n} \to \infty$. After constructing the point processes under consideration on a common probability space by the method of Theorem 2 for each $\bar{n}$ and the already chosen $k(\bar{n})$ (in this case, obviously, $p(k(\bar{n})) \to 0$), the limit relation (13) holds. Lemma 1 is proved.

**Proof of Theorem 3.** The equivalence of items 1 and 2 directly follows from Lemma 1 and the evident two-sided estimate

$$\mathbf{P}(\boldsymbol{\xi} \le \mathbf{x} - \boldsymbol{\varepsilon}) - \mathbf{P}(|\boldsymbol{\xi} - \boldsymbol{\eta}| > \boldsymbol{\varepsilon}) \le \mathbf{P}(\boldsymbol{\eta} \le \mathbf{x}) \le \mathbf{P}(\boldsymbol{\xi} \le \mathbf{x} + \boldsymbol{\varepsilon}) + \mathbf{P}(|\boldsymbol{\xi} - \boldsymbol{\eta}| > \boldsymbol{\varepsilon})$$

for any *x* ∈ R, *ε* > 0, and arbitrary random variables *ξ* and *η* defined on a common probability space. It remains to put

$$\xi := \Phi\_f(\overline{V}^\*\_{\bar{n},\Delta\_{\bar{n}}})\Delta\_{\bar{n}} - M\_{\bar{n}}, \quad \eta := \Phi\_f(\overline{\Pi}^\*\_{\bar{n},\Delta\_{\bar{n}}})\Delta\_{\bar{n}} - M\_{\bar{n}},$$

where the point processes $\overline{V}^*_{\bar{n},\Delta_{\bar{n}}}$ and $\overline{\Pi}^*_{\bar{n},\Delta_{\bar{n}}}$ are defined in Lemma 1.

We now prove the equivalence of items 2 and 3 of the theorem. To this end, we need to reformulate the assertion of Lemma 1, substituting $\Phi^*_f$ for the functional $\Phi_f(\overline{V}_{\bar{n}})$. As the resulting probability space in this assertion, we consider the direct product of the probability spaces on which $\nu_{ni}$ and $\pi_{ni}$ are defined by Dobrushin's theorem. We only note that, after such a construction,

$$\mathbf{P}(\{\bar{\nu}^\*\_{i\bar{n}},\; i \ge k\} \equiv \{\bar{\pi}\_{i\bar{n}},\; i \ge k\}) \ge 1 - m \sum\_{i \ge k} p\_i \to 1 \quad \text{as } k \to \infty.$$

Further, we repeat the corresponding reasoning in the proof of Lemma 1 (using the corresponding analog of (34)), as well as the above-mentioned arguments in proving the equivalence of items 1 and 2.

**Proof of Lemma 2.** We restrict ourselves to the case *m* = 2. For an arbitrary *m*, the assertion can be easily proved by induction on *m* using analogues of the estimates that will be given below. So we have

$$\mathbf{E}\Phi\_f(\overline{V}\_{\bar{n}}) = \sum\_{i\geq 1} \sum\_{k\_1, k\_2 \geq 0} f\_{i\bar{n}}(k\_1, k\_2) \mathbf{P}(\nu\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\nu\_{in\_2}^{(2)} = k\_2),$$

$$\mathbf{E}\Phi\_f(\overline{\Pi}\_{\bar{n}}) = \sum\_{i\geq 1} \sum\_{k\_1, k\_2 \geq 0} f\_{i\bar{n}}(k\_1, k\_2) \mathbf{P}(\pi\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\pi\_{in\_2}^{(2)} = k\_2);$$

here the introduction of the operator **E** under the summation sign in the second formula is legal due to (14) and Fubini's theorem. Now, we estimate the total variation distance between the distributions of the vectors $(\nu^{(1)}_{in_1}, \nu^{(2)}_{in_2})$ and $(\pi^{(1)}_{in_1}, \pi^{(2)}_{in_2})$:

$$\sum\_{k\_1,k\_2\geq0} |\mathbf{P}(\boldsymbol{\nu}^{(1)}\_{in\_1} = k\_1)\mathbf{P}(\boldsymbol{\nu}^{(2)}\_{in\_2} = k\_2) - \mathbf{P}(\boldsymbol{\pi}^{(1)}\_{in\_1} = k\_1)\mathbf{P}(\boldsymbol{\pi}^{(2)}\_{in\_2} = k\_2)|$$

$$\leq \sum\_{k\_1,k\_2\geq0} |\mathbf{P}(\boldsymbol{\nu}^{(1)}\_{in\_1} = k\_1) - \mathbf{P}(\boldsymbol{\pi}^{(1)}\_{in\_1} = k\_1)|\mathbf{P}(\boldsymbol{\nu}^{(2)}\_{in\_2} = k\_2)$$

$$+ \sum\_{k\_1,k\_2\geq0} |\mathbf{P}(\boldsymbol{\nu}^{(2)}\_{in\_2} = k\_2) - \mathbf{P}(\boldsymbol{\pi}^{(2)}\_{in\_2} = k\_2)|\mathbf{P}(\boldsymbol{\pi}^{(1)}\_{in\_1} = k\_1)$$

$$= \sum\_{k\_1\geq0} |\mathbf{P}(\boldsymbol{\nu}^{(1)}\_{in\_1} = k\_1) - \mathbf{P}(\boldsymbol{\pi}^{(1)}\_{in\_1} = k\_1)| + \sum\_{k\_2\geq0} |\mathbf{P}(\boldsymbol{\nu}^{(2)}\_{in\_2} = k\_2) - \mathbf{P}(\boldsymbol{\pi}^{(2)}\_{in\_2} = k\_2)|.$$

We now use once more Barbour–Hall's upper bound (see [15,20]) for the total variation distance between the distributions $\mathcal{L}(\nu^{(j)}_{in_j})$ and $\mathcal{L}(\pi^{(j)}_{in_j})$:

$$\sum\_{k\_j \ge 0} |\mathbf{P}(\nu\_{in\_j}^{(j)} = k\_j) - \mathbf{P}(\pi\_{in\_j}^{(j)} = k\_j)| < 2p\_i, \quad j = \overline{1, m}.$$

Then the total variation distance between the distributions of the bivariate vectors under consideration is estimated as follows:

$$\sum\_{k\_1,k\_2\geq 0} |\mathbf{P}(\nu\_{in\_1}^{(1)}=k\_1)\mathbf{P}(\nu\_{in\_2}^{(2)}=k\_2) - \mathbf{P}(\pi\_{in\_1}^{(1)}=k\_1)\mathbf{P}(\pi\_{in\_2}^{(2)}=k\_2)| \leq 4p\_i.$$

Therefore,

$$\begin{aligned} \Bigg| \sum\_{i\geq 1} \sum\_{k\_1, k\_2 \geq 0} f\_{i\bar{n}}(k\_1, k\_2) \mathbf{P}(\nu\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\nu\_{in\_2}^{(2)} = k\_2) - \sum\_{i\geq 1} \sum\_{k\_1, k\_2 \geq 0} f\_{i\bar{n}}(k\_1, k\_2) \mathbf{P}(\pi\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\pi\_{in\_2}^{(2)} = k\_2) \Bigg| \\ \leq \sum\_{i\geq 1} C\_i \sum\_{k\_1, k\_2 \geq 0} \left| \mathbf{P}(\nu\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\nu\_{in\_2}^{(2)} = k\_2) - \mathbf{P}(\pi\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\pi\_{in\_2}^{(2)} = k\_2) \right| \leq 4 \sum\_{i\geq 1} C\_i p\_i, \end{aligned}$$

or

$$|\mathbf{E}\Phi\_f(\overline{V}\_{\bar{n}}) - \mathbf{E}\Phi\_f(\overline{\Pi}\_{\bar{n}})| \le 4 \sum\_{i \ge 1} C\_i p\_i.$$

From here we obtain the assertion we need.

**Proof of Lemma 3.** As in the proof of Lemma 2, we restrict ourselves to the case *m* = 2. It is clear that we need to examine two series

$$\begin{aligned} S\_1(\overline{V}\_n) &:= \sum\_{i \ge 1} \sum\_{k\_1, k\_2 \ge 0} f\_{in}^2(k\_1, k\_2) \mathbf{P}(\nu\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\nu\_{in\_2}^{(2)} = k\_2), \\ S\_2(\overline{V}\_n) &:= \sum\_{i \ge 1} \left( \sum\_{k\_1, k\_2 \ge 0} f\_{in}(k\_1, k\_2) \mathbf{P}(\nu\_{in\_1}^{(1)} = k\_1) \mathbf{P}(\nu\_{in\_2}^{(2)} = k\_2) \right)^2. \end{aligned}$$

In the same way as in the proof of Lemma 1, we obtain

$$|S\_1(\overline{V}\_n) - S\_1(\overline{\Pi}\_n)| \le 4 \sum\_{i \ge 1} C\_i^2 p\_i.$$

Similarly,

$$|S\_2(\overline{V}\_n) - S\_2(\overline{\Pi}\_n)| \le 4 \sum\_{i \ge 1} C\_i^2 p\_i.$$
From these estimates it follows that

$$|\mathbf{D}\Phi\_f(\overline{\Pi}\_n) - \mathbf{D}\Phi\_f(\overline{V}\_n)| \le 8 \sum\_{i \ge 1} C\_i^2 p\_i,$$

whence we obtain the assertion of Lemma 3.

**Proof of Theorem 4.** By Corollary 4, it suffices to present conditions for the asymptotic normality of the Poisson version of the *χ*2-statistic, i.e., conditions under which relation (16) holds. As such, we take the third-order Lyapunov condition. Indeed, consider the following triangular array of centered random variables that are independent within each row:

$$\zeta\_{in} := \frac{(\pi\_{in} - np\_i)^2}{np\_i} - 1, \quad i = 1, \dots, N(n), \ n \ge 1.$$

The Lyapunov condition of third order, which guarantees the fulfillment of the central limit theorem (16), is as follows:

$$D\_n^{-3/2} \sum\_{i=1}^{N(n)} \mathbf{E} |\zeta\_{in}|^3 \to 0 \quad \text{as } n \to \infty. \tag{36}$$

In order to estimate the absolute third moment in (36), we need the well-known recurrence relation for the central moments of the Poisson distribution:

$$\mathbf{E}(\pi\_{\lambda} - \lambda)^n = \lambda \sum\_{k=0}^{n-2} \binom{n-1}{k} \mathbf{E}(\pi\_{\lambda} - \lambda)^k, \quad n \ge 2,$$

where $\pi\_{\lambda}$ is a Poisson random variable with parameter $\lambda$. From here it follows that $\mathbf{E}(\pi\_{\lambda} - \lambda)^6 = 15\lambda^3 + 25\lambda^2 + \lambda$,

and using the elementary estimate $|a^2 - 1|^3 \le 4(a^6 + 1)$, we obtain

$$\mathbf{E}|\zeta\_{in}|^3 \le \frac{4}{(np\_i)^3} \left( 15(np\_i)^3 + 25(np\_i)^2 + np\_i \right) + 4 = 64 + \frac{100}{np\_i} + \frac{4}{(np\_i)^2}.$$
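As a quick numerical check of the recurrence and the sixth-moment formula used above (an illustrative Python sketch, not part of the original proof; the function name is ours):

```python
from math import comb

def poisson_central_moments(lam, nmax):
    # mu[n] = E(pi_lambda - lambda)^n computed via the recurrence
    # mu[n] = lam * sum_{k=0}^{n-2} C(n-1, k) * mu[k], with mu[0] = 1, mu[1] = 0.
    mu = [1.0, 0.0]
    for n in range(2, nmax + 1):
        mu.append(lam * sum(comb(n - 1, k) * mu[k] for k in range(n - 1)))
    return mu

lam = 3.0
mu = poisson_central_moments(lam, 6)
print(mu[6], 15 * lam**3 + 25 * lam**2 + lam)  # both equal 633.0
```

The recurrence also reproduces the familiar lower central moments, e.g. $\mathbf{E}(\pi\_\lambda-\lambda)^2 = \lambda$ and $\mathbf{E}(\pi\_\lambda-\lambda)^4 = \lambda + 3\lambda^2$.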

It is clear that, to prove relation (36), it suffices to verify that, under the conditions of the theorem,

$$\frac{64N + 100\sum\_{i=1}^{N} \frac{1}{np\_i} + 4\sum\_{i=1}^{N} \frac{1}{(np\_i)^2}}{\left(2N + \sum\_{i=1}^{N} \frac{1}{np\_i}\right)^{3/2}} \leq 100\left(2N + \sum\_{i=1}^{N} \frac{1}{np\_i}\right)^{-1/2} + \frac{4\sum\_{i=1}^{N} \frac{1}{(np\_i)^2}}{\left(N + \sum\_{i=1}^{N} \frac{1}{np\_i}\right)^{3/2}} \to 0,$$

which holds by virtue of (17).
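Numerically, the decay of this ratio is easy to see. In the sketch below, the uniform choice $p\_i = 1/N$ with $n = N$ is purely illustrative: it keeps $np\_i = 1$ bounded while the ratio still decays like $N^{-1/2}$.

```python
def lyapunov_ratio(n, p):
    # Left-hand side of the displayed bound: p is the list of cell
    # probabilities, n the sample size, N = len(p) the number of cells.
    N = len(p)
    a = sum(1.0 / (n * pi) for pi in p)
    b = sum(1.0 / (n * pi) ** 2 for pi in p)
    return (64 * N + 100 * a + 4 * b) / (2 * N + a) ** 1.5

# uniform cells with n = N: the ratio shrinks as N grows
for N in (10, 100, 1000):
    print(N, lyapunov_ratio(N, [1.0 / N] * N))
```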

**Proof of Theorem 5.** For any natural *k*, denote

$$\Phi\_f^{(k)}(\overline{V}\_n) := \sum\_{i \le k} f\_{in}(\bar{\nu}\_{in}).$$

Then

$$\mathbf{P}\left(\Phi\_f(\overline{V}\_n) \ge x\right) \le \mathbf{P}\left(\Phi\_f^{(k)}(\overline{V}\_n) \ge \frac{x}{2}\right) + \mathbf{P}\left(\Phi\_f(\overline{V}\_n) - \Phi\_f^{(k)}(\overline{V}\_n) \ge \frac{x}{2}\right). \tag{37}$$

In the notation of Theorem 1, let $\overline{V}\_n^0$ be the restriction of the point process $\overline{V}\_n$ to the set $A\_0 := \bigcup\_{i\le k} \Delta\_i$ with hit probability $p := \sum\_{i\le k} p\_i$. Under the first probability on the right-hand side of inequality (37), instead of the point process $\overline{V}\_n$, we can substitute $\overline{V}\_n^0$ and use inequality (2) for the distributions of the restrictions of the corresponding point processes.

The difference

$$\Phi\_f(\overline{V}\_n) - \Phi\_f^{(k)}(\overline{V}\_n) = \sum\_{i>k} f\_{in}(\bar{\nu}\_{in})$$

is also an additive functional of the restriction of the point process $\overline{V}\_n$, now to the complementary set $A\_0' := \bigcup\_{i>k} \Delta\_i$ with hit probability $p' := \sum\_{i>k} p\_i$. For this functional, we also use estimate (2). As a result, from (37) and Theorem 1, taking into account the non-negativity of the terms $f\_{in}(\cdot)$, we obtain

$$\mathbf{P}\left(\Phi\_f(\overline{V}\_n) \ge x\right) \le \left(\sum\_{i>k} p\_i\right)^{-m} \mathbf{P}\left(\Phi\_f^{(k)}(\overline{\Pi}\_n) \ge \frac{x}{2}\right)$$

$$+ \left(\sum\_{i\le k} p\_i\right)^{-m} \mathbf{P}\left(\Phi\_f(\overline{\Pi}\_n) - \Phi\_f^{(k)}(\overline{\Pi}\_n) \ge \frac{x}{2}\right) \le 2\mathbf{C}^{\*}\mathbf{P}\left(\Phi\_f(\overline{\Pi}\_n) \ge \frac{x}{2}\right).$$

The theorem is proved.

**Proof of Corollary 5.** The proof is based on the following well-known equality: if *ζ* is a non-negative random variable with finite mean, then

$$\mathbf{E}\zeta = \int\_0^\infty \mathbf{P}(\zeta \ge x)\, dx.$$

Using this equality successively for $\zeta$ equal to $\Phi\_f(\overline{V}\_n)$ and $2\Phi\_f(\overline{\Pi}\_n)$, we easily obtain the moment inequality (20) from (18).
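The tail-integral identity above is easy to check numerically. In this illustrative sketch (our own helper, not from the paper) we take $\zeta$ standard exponential, where both sides equal 1:

```python
import math

def tail_integral(survival, upper=50.0, steps=200000):
    # Midpoint-rule approximation of the integral of P(zeta >= x) over [0, upper];
    # the tail beyond `upper` is negligible for the exponential example below.
    h = upper / steps
    return h * sum(survival((i + 0.5) * h) for i in range(steps))

# zeta ~ Exp(1): P(zeta >= x) = exp(-x) and E[zeta] = 1
print(tail_integral(lambda x: math.exp(-x)))  # close to 1.0
```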

#### **9. Conclusions**

In this paper, we discuss a remarkable asymptotic property of a wide class of additive statistics that allows us to ignore the dependence of the summands in the additive structure of the statistics under consideration and to reduce asymptotic analysis of their distributions to the classical theory of the central limit problem. As consequences, we obtain refinements of certain results concerning the limit behavior of some known classes of additive statistics. Although we limited ourselves only to the law of large numbers and the central limit theorem for the statistics at issue, in the model under consideration it is possible to study sufficient conditions for the weak convergence of their distributions to other infinitely divisible laws as well. In fact, we deal here with a variant of Poisson approximation of empirical point processes, or in other words, with a compound Poisson approximation of an *n*-th partial sum of independent random variables taking values in some function space. So, in the present paper we deal with the classical subject of Probability Theory and the Poisson approximation of sums of independent multivariate random variables (for example, see [1,12,22,23]).

Moreover, one can reformulate the above-mentioned Poissonization duality theorem for more general *U*-statistic-type functionals

$$\mathcal{U}\_f(\overline{V}\_{\bar{n}}) := \sum\_{i\_1 \le \dots \le i\_m} f\_{\bar{n}, i\_1, \dots, i\_m}\left(\bar{\nu}\_{\bar{n}, i\_1}, \dots, \bar{\nu}\_{\bar{n}, i\_m}\right),$$

where $f \equiv \{ f\_{\bar{n}, i\_1, \dots, i\_m}(\cdot)\}$ is an array of finite functions defined on $\mathbb{Z}\_+^d$, with $d := \sum\_{k\le m} n\_k$, satisfying only the restriction

$$\sum\_{i\_1 \le \dots \le i\_m} |f\_{\bar{n}, i\_1, \dots, i\_m}(0, \dots, 0)| < \infty \quad \forall \bar{n}.$$

For example, in this more general setting, one can study the limit behavior of the functionals

$$\mathcal{U}\_I(\overline{V}\_n) := \sum\_{i \ge 1} I\_{\bar{A}}(\nu\_{i-1,n}) I\_A(\nu\_{i,n}) \cdots I\_A(\nu\_{i+m-1,n}) I\_{\bar{A}}(\nu\_{i+m,n}),$$

where $\bar{A}$ is the complement of an arbitrary subset $A \subset \mathbb{Z}\_+$ with $0 \notin A$, and $\nu\_{0,n} := 0$. These functionals count the number of success chains of length *m* in the dependent (finite or infinite) Bernoulli trials $\{I\_A(\nu\_{i,n});\ i \ge 1\}$.
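What such a functional counts can be made concrete with a small Python sketch (a hypothetical helper of ours: the trials are encoded as a 0/1 list, and the boundary convention $\nu\_{0,n} := 0$ is mimicked by padding with failures):

```python
def count_chains(bits, m):
    # Count maximal runs of exactly m consecutive successes in `bits`,
    # i.e., patterns failure, success^m, failure, with the sequence padded
    # by failures on both sides (the convention nu_0 := 0).
    padded = [0] + list(bits) + [0]
    count, run = 0, 0
    for b in padded:
        if b == 1:
            run += 1
        else:
            if run == m:
                count += 1
            run = 0
    return count

print(count_chains([1, 1, 0, 1, 1, 1, 0, 1], 2))  # runs 2, 3, 1 -> prints 1
```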

**Author Contributions:** Conceptualization, I.B.; formal analysis, I.B. and M.J.; methodology, I.B.; writing—original draft, I.B. and M.J.; writing—review and editing, I.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study of I. Borisov was supported by the Russian Science Foundation, project no. 22-21-00414.

**Acknowledgments:** The authors thank the anonymous reviewers for careful reading of the paper and insightful comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Asymptotic Expansions for Symmetric Statistics with Degenerate Kernels**

**Shuya Kanagawa 1,2**


**Abstract:** Asymptotic expansions for U-statistics and V-statistics with degenerate kernels are investigated, and the remainder term $O(n^{1-p/2})$, for some $p \ge 4$, is obtained in both cases. As a consequence, asymptotic expansions for the Cramér–von Mises statistics of the uniform distribution $U(0,1)$ hold with the remainder term $O(n^{1-p/2})$ for any $p \ge 4$. The scheme of the proof is based on three steps. The first one is the almost sure convergence of a Fourier series expansion of the kernel function $u(x, y)$. The key condition for the convergence is the nuclearity of a linear operator $T\_u$ defined by the kernel function. The second one is a representation of U-statistics or V-statistics by single sums of Hilbert space valued random variables. The third one is to apply asymptotic expansions for single sums of Hilbert space valued random variables.

**Keywords:** U-statistics; V-statistics; asymptotic expansion; integral kernel; nuclearity

**MSC:** 60B12; 60F05; 62G20

#### **1. Introduction**

Asymptotic expansions for symmetric statistics have been studied by many authors; see, e.g., Callaert–Janssen–Veraverbeke (1980) [1], Withers (1988) [2], Maesono (2004) [3], and so on. They treat U-statistics with non-degenerate kernels. On the other hand, Bentkus–Götze (1999) [4] and Zubayraev (2011) [5] obtained optimal bounds in asymptotic expansions for U-statistics with degenerate kernels. They treat the following modified U-statistics,

$$W\_n = \frac{1}{n^2} \sum\_{1 \le i < j \le n} \phi\left(\xi\_i, \xi\_j\right) + \frac{1}{n} \sum\_{1 \le i \le n} \phi\_1\left(\xi\_i\right),\tag{1}$$

where *φ*(·, ·) is a symmetric function, *φ*1(·) is a measurable function and {*ξi*} are i.i.d. random variables. *Wn* coincides with V-statistics when

$$
\phi\_1(\mathbf{x}) = \frac{1}{2}\phi(\mathbf{x}, \mathbf{x}).\tag{2}
$$

If $\phi\_1(x) = 0$ for any $x$, then $W\_n$ coincides with U-statistics. They obtained asymptotic expansions with remainder $O(n^{-1})$ for the distribution function of $W\_n$. In this paper, we investigate asymptotic expansions for the simple U-statistics and the V-statistics of degree two defined by

$$U\_n = \frac{2}{n^2} \sum\_{1 \le i < j \le n} u\left(\xi\_i, \xi\_j\right), \quad V\_n = \frac{1}{n^2} \sum\_{1 \le i, j \le n} u\left(\xi\_i, \xi\_j\right), \tag{3}$$

respectively. We obtain asymptotic expansions with remainder $O(n^{1-p/2})$ for some $p \ge 4$ for the distribution function of $U\_n$ or $V\_n$ under some assumptions on $\{\xi\_i\}$ and $u(x, y)$. Our scheme of the proof is based on three steps. The first one is the almost sure convergence of a Fourier series expansion of $u(\xi\_i, \xi\_j)$. The key condition for the convergence is the nuclearity of a linear operator $T\_u$ defined by the kernel function $u(x, y)$. The second one is a representation of U-statistics or V-statistics by single sums of Hilbert space valued random variables. The third one is to apply asymptotic expansions for single sums of Hilbert space valued random variables due to Sazonov–Ulyanov (1995) [6].

**Citation:** Kanagawa, S. Asymptotic Expansions for Symmetric Statistics with Degenerate Kernels. *Mathematics* **2022**, *10*, 4158. https://doi.org/10.3390/math10214158

Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

Received: 21 September 2022; Accepted: 29 October 2022; Published: 7 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Symmetric Statistics**

Let $\xi\_j$, $j \ge 1$, be i.i.d. random variables with a probability distribution $\mu$ on an arbitrary measurable space $(X, \mathcal{B})$. Suppose that $u(x\_1, x\_2, \cdots, x\_k)$ is a real valued symmetric function for some $k \ge 1$, i.e.,

$$
u(x\_1, x\_2, \cdots, x\_k) = u(x\_{i\_1}, x\_{i\_2}, \cdots, x\_{i\_k}), \tag{4}
$$

for any permutation $(i\_1, i\_2, \cdots, i\_k)$ of $(1, 2, \cdots, k)$. A statistic defined by the kernel function $u(x\_1, x\_2, \cdots, x\_k)$ is called a symmetric statistic. The following are typical examples of symmetric statistics.

**Example 1.** *U-statistics with degree k* ≥ 1*:*

$$U\_n = \binom{n}{k}^{-1} \sum\_{1 \le i\_1 < i\_2 < \cdots < i\_k \le n} u\left(\xi\_{i\_1}, \xi\_{i\_2}, \dots, \xi\_{i\_k}\right). \tag{5}$$

**Example 2.** *V-statistics with degree k* ≥ 1*:*

$$V\_n = n^{-k} \sum\_{1 \le i\_1, i\_2, \dots, i\_k \le n} u\left(\xi\_{i\_1}, \xi\_{i\_2}, \cdots, \xi\_{i\_k}\right). \tag{6}$$

In this paper, we treat V-statistics *Vn* and U-statistics *Un* with degree two defined by (3) when the kernel function *u*(*x*, *y*) is degenerate, i.e.,

$$E[u(\xi\_1, x)] = 0,\tag{7}$$

for any real number *x*.
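For concreteness, the degree-two statistics (3) can be computed directly from a sample; the sketch below (our own illustration) uses the hypothetical degenerate kernel $u(x,y) = xy$, which satisfies (7) whenever $E[\xi\_1] = 0$:

```python
def u_stat(sample, u):
    # U-statistic of degree two: (2/n^2) * sum over pairs i < j, cf. (3)
    n = len(sample)
    return 2.0 / n**2 * sum(
        u(sample[i], sample[j]) for i in range(n) for j in range(i + 1, n)
    )

def v_stat(sample, u):
    # V-statistic of degree two: (1/n^2) * sum over all ordered pairs, cf. (3)
    n = len(sample)
    return 1.0 / n**2 * sum(u(x, y) for x in sample for y in sample)

# u(x, y) = x * y is degenerate for centered data: E[u(xi_1, x)] = x * E[xi_1] = 0
kernel = lambda x, y: x * y
sample = [-1.0, 0.5, 0.5]
print(u_stat(sample, kernel), v_stat(sample, kernel))
```

For this centered sample the V-statistic equals $(\sum\_i x\_i)^2 / n^2 = 0$, while the U-statistic omits the diagonal terms and is negative.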

#### **3. Non-Central Limit Theorems for U-Statistics with Degenerate Kernels**

Assume that $\{\xi\_i\}$ are i.i.d. random variables with a distribution $\mu$. Let $u(x, y)$ be a real valued symmetric function on $\mathbf{R} \times \mathbf{R}$, square integrable in the sense that

$$E\left[u(\xi\_1, \xi\_2)^2\right] < \infty. \tag{8}$$

Suppose that $u(x, y)$ is a degenerate kernel satisfying condition (7). Let $L^2(\mathbf{R}, \mu)$ be the space of all square integrable functions with respect to $\mu$. Then, according to Serfling (1980) [7], the kernel $u(x, y)$ induces a bounded linear operator (of trace class) $L^2(\mathbf{R}, \mu) \to L^2(\mathbf{R}, \mu)$ defined by

$$T\_u(f) = E[u(\xi\_1, \mathbf{x}) f(\xi\_1)] = \int\_{-\infty}^{\infty} u(y, \mathbf{x}) f(y) \mu(dy), \quad f \in L^2,\tag{9}$$

which has eigenvalues {*λi*} and eigenfunctions {*gi*} satisfying for each *i* ≥ 1

$$\begin{cases} E[g\_i(\xi\_1)] = 0, & E\left[g\_i^2(\xi\_1)\right] = 1, \\ E\left[g\_i(\xi\_1) g\_j(\xi\_1)\right] = 0 \ (i \neq j), & E[u(\xi\_1, x) g\_i(\xi\_1)] = \lambda\_i g\_i(x). \end{cases} \tag{10}$$

With respect to (10), see Serfling (1980) [7], p. 196, and Dunford and Schwartz (1963), pp. 905, 1009, 1083, 1087, for more details. Then we have

$$\lim\_{n \to \infty} E\left[ \left( u\left( \xi\_i, \xi\_j \right) - \sum\_{k=1}^n \lambda\_k g\_k\left( \xi\_i \right) g\_k\left( \xi\_j \right) \right)^2 \right] = 0,\tag{11}$$

for each *i*, *j* ≥ 1. Serfling (1980) [7] showed the non-central limit theorem for U-statistics with degree 2.

**Theorem 1.** *(Serfling (1980) [7]) Put $\theta = E[u(\xi\_1, \xi\_2)]$. Let $U\_n$ be a U-statistic with the degenerate kernel $u(x, y)$ defined by*

$$U\_n = \frac{2}{n^2} \sum\_{1 \le i < j \le n} u\left(\xi\_i, \xi\_j\right).\tag{12}$$

*Let* {*Zi*} *be i.i.d. random variables with the standard Normal distribution N*(0, 1)*. Then, as n* → ∞

$$nU\_n \quad \Rightarrow \quad \sum\_{j=1}^{\infty} \lambda\_j \left(Z\_j^2 - 1\right),\tag{13}$$

*where "*⇒*" means the weak convergence in* **R***.*

It is well known that the rate of convergence in (13) is $O(n^{-1/2})$ (see, e.g., Serfling (1980) [7] for more details). In the next section, we obtain asymptotic expansions for $U\_n$ and $V\_n$ using asymptotic expansions due to Sazonov–Ulyanov (1995) [6] for sums of Hilbert space valued i.i.d. random variables.
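The weak convergence (13) can be illustrated by simulation. The rank-one choice below is our own assumption for illustration: $u(x,y) = xy$ with standard normal $\xi\_i$, so $\lambda\_1 = 1$ and the limit is $Z^2 - 1$, with mean 0 and variance 2.

```python
import random

random.seed(0)

def n_un(n):
    # n * U_n for the rank-one kernel u(x, y) = x * y:
    # n * U_n = (2/n) * sum_{i<j} x_i x_j = ((sum x_i)^2 - sum x_i^2) / n
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    s = sum(xs)
    return (s * s - sum(x * x for x in xs)) / n

draws = [n_un(200) for _ in range(4000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)  # mean near 0, variance near 2, matching Z^2 - 1
```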

#### **4. Asymptotic Expansions for Single Sums which Hit a Ball in a Hilbert Space**

In this section we consider asymptotic expansions for sums of Hilbert space valued random vectors $\{X\_i\}$ according to Sazonov–Ulyanov (1995) [6]. Let $\{X\_i\}$ be a sequence of i.i.d. random vectors in a separable Hilbert space $H$ with $E[X\_1] = 0$ and $E\|X\_1\|^2 = 1$, where $\|x\|^2 = \langle x, x \rangle$ for $x \in H$ and $\langle \cdot, \cdot \rangle$ is the inner product in $H$. Define the covariance operator $V$ of $X\_1$ by

$$
\langle Vx, y \rangle = E[\langle X\_1 - E[X\_1], x \rangle \langle X\_1 - E[X\_1], y \rangle], \tag{14}
$$

for $x, y \in H$. Denote by $\sigma\_1^2 \ge \sigma\_2^2 \ge \cdots$ the eigenvalues of $V$ and by $e\_1, e\_2, \cdots$ the orthonormal eigenvectors corresponding to these eigenvalues. Put

$$S\_n = \frac{1}{\sigma\sqrt{n}}\sum\_{i=1}^n (X\_i - E[X\_i]), \quad v\_k = \left(\prod\_{i=1}^k \sigma\_i\right)^{-1/k}, \quad c\_k(V) = v\_k^{k-1},\tag{15}$$

where $\sigma^2 = E\|X\_1 - E(X\_1)\|^2$. Define the projection $K : H \to H$ by

$$Ky = \sum\_{i=1}^{6k-5} \langle y, e\_i \rangle e\_i, \quad y \in H. \tag{16}$$

Put

$$\theta\_k(L) = \sup \left\{ \left| E \left[ \exp \left( \sqrt{-1} \langle y, X\_1 \rangle \right) \right] \right| \, \middle| \, \|Ky\| \ge \frac{1}{L} \right\}. \tag{17}$$

for any $L > 0$. Let $Y$ be the $H$-valued Gaussian random variable with mean 0 and the covariance operator $V$. For $a, h \in H$, $r > 0$, $i = 0, 1, \cdots$ we put

$$\Phi\_i(a,r) = P\left\{ \left\| \left(1 - \frac{i}{n}\right)^{1/2} Y - a \right\| < r \right\},\tag{18}$$

$$d\_h \Phi\_i(a, r) = \lim\_{t \to 0} \frac{\Phi\_i(a - th, r) - \Phi\_i(a, r)}{t}. \tag{19}$$

Define the differential operators $d\_h^k$ by

$$d\_h^1 \Phi\_i(a, r) = d\_h \Phi\_i(a, r), \quad d\_h^k \Phi\_i(a, r) = d\_h \left( d\_h^{k-1} \Phi\_i(a, r) \right), \quad k \ge 2. \tag{20}$$

Put

$$\chi\_{j,L} = I\{ \|X\_j\| < L \} \tag{21}$$

where $I\{\cdot\}$ denotes the indicator function. Put

$$\chi\_{j,t} = \chi\_{j,\sqrt{n}(1+t)},\tag{22}$$

and *χ<sup>j</sup>* = *χj*,0. For positive integers *l*1, *l*2, ··· , *ls* we put

$$Q\_s = \left(d\_{X\_1}^{l\_1} - d\_{Y\_1}^{l\_1}\right) \cdots \left(d\_{X\_s}^{l\_s} - d\_{Y\_s}^{l\_s}\right) \tag{23}$$

and for integers *k* ≥ 2, 1 ≤ *i* ≤ *k* − 2, we put

$$A\_i(a,r) = n^{-i/2} \sum\_{j=1}^n {\sum}' \, n^{-j} \binom{n}{j} \left(l^{(j)}\right)^{-1} E\left(Q\_j\right) \Phi\_j(a,r),\tag{24}$$

where $l^{(j)} = l\_1! \cdots l\_j!$ and ${\sum}'$ denotes the summation over all tuples $(l\_1, \dots, l\_j)$ such that

$$l\_1 \ge 3, \ l\_2 \ge 3, \ \dots, \ l\_j \ge 3, \quad l\_1 + l\_2 + \dots + l\_j = 2j + i. \tag{25}$$

The following theorem is the key result for the proofs of our theorems.

#### **Theorem 2.** *(Sazonov–Ulyanov (1995) [6])*

*Suppose that $E\|X\_1\|^p < \infty$ for some $p \ge 4$. For any $t \ge 0$ and integer $k \ge 2$, let $L$ be a positive number, such that*

$$E\left[\|X\_1\|^2 \left(1 - \chi\_{1,L}\right)\right] \le \frac{\sigma\_{6k-5}^2}{3}.\tag{26}$$

*Then, for $L \le n^{1/2}$,*

$$\Delta\_n(a,r) := \left| P\{ \|S\_n - a\| < r \} - P\{ \|Y - a\| < r \} - \sum\_{i=1}^{k-2} A\_i(a,r) \right| \tag{27}$$

$$\leq A(p,s,t) + c(k)\exp\{-s^{\alpha}\}\Big\{c\_{6k-5}(V)E\left[B\_2(a,r)(1-\chi\_1)\right] + \left(1+M(a,r)^{k-2}\right)E\left[B\_{k+1}(a,r)(1-\chi\_1)\right]$$

$$+ c\_{6k-5}(V)\left(1+m^{3}(a,r)\left|\langle Va, a\rangle\right|^{k-2}\right)\left(\frac{L^{2}}{n}\right)^{(k-1)/2}+\theta\_{k}^{n/\left(k\log\left(n/L^{2}\right)\right)}(L)\log\left(n/L^{2}\right)\Big\},$$

*where for $s = \left|\, \|a\| - r \,\right|$ and $\alpha \ge \frac{1}{5}$,*

$$A(p,s,t) := nE[(1 - \chi\_{1,t})] + c(p)(1 + s)^{-p} n^{1 - p/2} E\left[ \|X\_1\|^p (\chi\_{1,t} - \chi\_1) \right],\tag{28}$$

$$B\_j(a, r) = n^{-(j-2)/2} \left( \|X\_1\|^j + m^j(a,r) \left|\langle X\_1, a \rangle\right|^j \right),\tag{29}$$

$$M(a,r) = m^2(a,r) \langle Va, a \rangle \tag{30}$$

*and*

$$m(a,r) := \begin{cases} \min\left\{1, \frac{r}{\|a\|}\right\}, & \|a\| > 0 \\ 0, & a = 0 \end{cases}. \tag{31}$$

*In addition, the terms in the asymptotic expansion satisfy, for $\varepsilon > 0$, the estimates*

$$|A\_i(a,r)| \le c(\varepsilon,i) \exp\left\{-\frac{s^2}{2+\varepsilon}\right\} n^{-i/2} c\_{6i+3}(V) \tag{32}$$

$$\times E\Big[\chi\_i \|X\_1\|^{i+2} + \left|\langle X\_1, a\rangle\right|^{i+2} \chi\_i m^{i+2}(a,r)$$

$$\times \big\{1 + m^{2(i+2)}(a,r) \big(1 + m^{2(i+2)}(a,r) \langle Va, a\rangle^{i-1}\big)\big\} + M(a,r)^{3i+2} \Big]$$

*for even i, and if i is odd, then we have*

$$|A\_i(a,r)| \le c(\varepsilon,i) \exp\left\{-\frac{s^2}{2+\varepsilon}\right\} n^{-i/2} \Big\{ c\_{6i+3}(V) \big(1 + m^2(a,r)\langle Va, a\rangle^{i-1}\big) \tag{33}$$

$$\times E\big[\left|\langle X\_1, a\rangle\right| \chi\_i m(a,r) \big\{\|X\_1\|^2 + \|X\_1\|^{i+1} + \langle X\_1, a\rangle^{i+1} m^{i+1}(a,r)\big\}\big]$$

$$+ c\_{6i+3}(V) m(a,r) \langle Va, a\rangle^{1/2} E\big[\chi\_i \|X\_1\|^{i+1}\big]\Big\}.$$

#### **5. The Sato–Mercer Theorem**

In the proofs of our theorems we use the Fourier series expansion of the kernel function $u(\xi\_i, \xi\_j)$ by eigenvalues and eigenfunctions of the linear operator $T\_u$ defined by (9). Since (11) holds in the sense of $L^2$-convergence, it cannot be applied as it is to show the asymptotic expansions for U-statistics or V-statistics. We show that $u(\xi\_i, \xi\_j)$ can be represented by the Fourier series expansion in (11) almost surely, using the following Sato–Mercer theorem. (See Sato (1992) [8] for more details.)

#### **Theorem 3.** *(The Sato–Mercer theorem)*

*Let $X$ be a separable metric space with a Borel measure $\nu$ on $X$, and let $K(x, y)$ be a function on $X \times X$ for which there exists a Borel-measurable subset $X\_0$ such that*

$$\nu(X \backslash X\_0) = 0.\tag{34}$$

*Suppose that K*(*x*, *y*) *is continuous on X*<sup>0</sup> *and satisfies*

$$\int\_{X} \int\_{X} |K(\mathfrak{x}, y)|^{2} \nu(d\mathfrak{x}) \nu(dy) < \infty \tag{35}$$

*and*

$$\int\_{X} \int\_{X} \mathcal{K}(\mathbf{x}, \mathbf{y}) f(\mathbf{x}) \overline{f(\mathbf{y})} \nu(d\mathbf{x}) \nu(d\mathbf{y}) \ge 0,\tag{36}$$

*for any f* ∈ *L*2(*X*, *ν*)*. Then, the linear operator TK on L*2(*X*, *ν*) *defined by*

$$T\_K f(\mathbf{x}) = \int\_X \mathcal{K}(\mathbf{x}, y) f(y) \nu(dy), \quad f \in L^2(X, \nu) \tag{37}$$

*is nuclear if, and only if,*

$$\int\_{X} K(x,\boldsymbol{x}) \nu(d\boldsymbol{x}) < \infty \tag{38}$$

*holds.*

From Theorem 3, we have the next result.

**Theorem 4.** *Let ξj*, *j* ≥ 1 *be i.i.d. random variables with the distribution μ. Let u*(*x*, *y*) *be a real valued symmetric function on* **R** × **R** *and Tu be a linear operator defined by*

$$T\_u f(\mathbf{x}) = E[u(\xi\_1, \mathbf{x}) f(\xi\_1)] = \int\_{-\infty}^{\infty} u(y, \mathbf{x}) f(y) \mu(dy), \quad f \in L^2(\mathbf{R}, \mu). \tag{39}$$

*Suppose that u*(*x*, *y*) *is the square integrable degenerate kernel of the linear operator Tu, such that*

$$\int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} u^2(x, y) \mu(dx) \mu(dy) < \infty,\tag{40}$$

$$\int\_{-\infty}^{\infty} \int\_{-\infty}^{\infty} u(x, y) f(x) \overline{f(y)} \mu(dx) \mu(dy) \ge 0,\tag{41}$$

*for any f* ∈ *L*2(**R**, *μ*) *and*

$$E[u(\xi\_1, x)] = \int\_{-\infty}^{\infty} u(y, x) \mu(dy) = 0 \tag{42}$$

*for any $x \in \mathbf{R}$. Let $\{\lambda\_k\}$ and $\{g\_k\}$ be eigenvalues and eigenfunctions of the linear operator $T\_u$, respectively. Suppose*

$$\lambda\_k \ge 0, \quad k \ge 1. \tag{43}$$

*Furthermore assume that there exists a Lebesgue measurable subset X*<sup>0</sup> ⊂ **R***, such that*

$$
\mu(X\_0) = 1\tag{44}
$$

*and u*(*x*, *y*) *is continuous on X*0*. Then, we have*

$$
u\left(\xi\_i, \xi\_j\right) = \sum\_{k=1}^{\infty} \lambda\_k g\_k\left(\xi\_i\right) g\_k\left(\xi\_j\right) \quad a.s. \tag{45}
$$

*for each i*, *j* ≥ 1*.*

**Proof.** It is easy to see from (10) that

$$E\left[\sum\_{k=1}^{n} \left| \lambda\_k g\_k(\xi\_i) g\_k(\xi\_j) \right| \right] = \sum\_{k=1}^{n} E\left[ \left| \lambda\_k g\_k(\xi\_i) g\_k(\xi\_j) \right| \right] = \sum\_{k=1}^{n} |\lambda\_k| E\left[ \left| g\_k(\xi\_i) g\_k(\xi\_j) \right| \right] \tag{46}$$

$$\leq \sum\_{k=1}^{n} |\lambda\_k| \left\{ E\left[ g\_k(\xi\_i)^2 \right] \right\}^{1/2} \left\{ E\left[ g\_k(\xi\_j)^2 \right] \right\}^{1/2} = \sum\_{k=1}^{n} |\lambda\_k|,$$

for each $n \ge 1$. Letting $n \to \infty$, (46) implies that

$$E\left[\sum\_{k=1}^{\infty} \left|\lambda\_k g\_k(\xi\_i) g\_k(\xi\_j)\right|\right] \le \sum\_{k=1}^{\infty} |\lambda\_k|. \tag{47}$$

On the other hand, from (40) and (41), *u*(*x*, *y*) satisfies (35) and (36). Therefore, *Tu* is nuclear by Theorem 3. Hence, from (43) and the nuclearity of *Tu*, we have

$$\sum\_{k=1}^{\infty} |\lambda\_k| = \sum\_{k=1}^{\infty} \lambda\_k < \infty. \tag{48}$$

From (47) and (48), we have

$$E\left[\sum\_{k=1}^{\infty} \left| \lambda\_k g\_k\left(\xi\_i\right) g\_k\left(\xi\_j\right) \right| \right] < \infty,\tag{49}$$

which implies

$$\sum\_{k=1}^{\infty} \left| \lambda\_k g\_k(\xi\_i) g\_k(\xi\_j) \right| < \infty \quad a.s. \tag{50}$$

Therefore, (45) is proved from (11) and (50).

**Remark 1.** *If the symmetric function u*(*x*, *y*) *is piecewise continuous on* **R***, then there exists X*<sup>0</sup> ⊂ **R** *satisfying (44) such that u*(*x*, *y*) *is continuous on X*0*. In the next section, we show a typical example of U- or V-statistics defined by such piecewise continuous function u*(*x*, *y*) *as its kernel function.*

#### **6. Asymptotic Expansions for Degenerate V-Statistics and U-Statistics with Degree 2**

For applying Theorem 2 for Hilbert space valued random variables to the proof of asymptotic expansions for *Vn*, we represent *Vn* by sums of Hilbert space valued random variables {*Gi*} by the following method.

According to Kanagawa–Yoshihara (1994) [9], we introduce a separable Hilbert space $H$ equipped with the inner product $\langle \cdot, \cdot \rangle$ and the norm $\|\cdot\|$ as follows:

$$H = \left\{ \mathbf{x} = (x\_1, x\_2, \dots) \in \mathbb{R}^{\infty} \, \middle| \, \sum\_{k=1}^{\infty} |\lambda\_k| x\_k^2 < \infty \right\},\tag{51}$$

$$
\langle \mathbf{x}, \mathbf{y} \rangle = \sum\_{k=1}^{\infty} |\lambda\_k| \mathbf{x}\_k y\_k \tag{52}
$$

and

$$\|\mathbf{x}\| = \left(\sum\_{k=1}^{\infty} |\lambda\_k| \mathbf{x}\_k^2\right)^{1/2}.\tag{53}$$

Using the assumptions of Theorem 4, we have from (10) and (48) that

$$E\left[\sum\_{k=1}^{\infty} |\lambda\_k| g\_k^2(\xi\_i)\right] = \sum\_{k=1}^{\infty} |\lambda\_k| E\left[g\_k^2(\xi\_i)\right] = \sum\_{k=1}^{\infty} |\lambda\_k| < \infty,\tag{54}$$

which implies that we can define *H*-valued random variables by

$$G\_i = (g\_1(\xi\_i), g\_2(\xi\_i), g\_3(\xi\_i), \cdots) \tag{55}$$

for each *i* ≥ 1. Let {*Un*, *n* ≥ 1} and {*Vn*, *n* ≥ 1} be U-statistics and V-statistics with degree 2 defined by (3), respectively.

**Theorem 5.** *Without loss of generality we assume that $\theta = 0$. Suppose that $\xi\_j$, $j \ge 1$, is a sequence of i.i.d. random variables with the distribution $\mu$. Assume that $u(x, y)$ is a square integrable symmetric function with respect to $\mu \times \mu$ satisfying (40)–(42). Suppose that for some $p \ge 4$*

$$E\left[||G\_1||^p\right] < \infty. \tag{56}$$

*Furthermore, without loss of generality, assume that*

$$\sum\_{k=1}^{\infty} \lambda\_k = 1.\tag{57}$$

*Let $Y$ be the $H$-valued Gaussian random variable with mean 0 and the covariance operator $V$ satisfying (14) with the eigenvalues $\sigma\_1^2 \ge \sigma\_2^2 \ge \cdots$ and the orthonormal eigenvectors $e\_1, e\_2, \cdots$. For any $t \ge 0$ and integer $k \ge 2$, let $L$ be a positive number, such that*

$$E\left[\left\|G\_1\right\|^2 \left(1-\chi\_{1,L}\right)\right] \le \frac{\sigma\_{6k-5}^2}{3}.\tag{58}$$

*Then, for $L \le n^{1/2}$ and $\alpha \ge \frac{1}{5}$,*

$$\left| P\{ |nV\_n| \le r \} - P\{ \|\|Y\|\| \le r \} - \sum\_{i=1}^{k-2} A\_i(0, r) \right| \tag{59}$$

$$\leq A(p,s,t) + c(k)\exp\{-r^{\alpha}\} \Big[ c\_{6k-5}(V)E[B\_2(0,r)(1-\chi\_1)] + E[B\_{k+1}(0,r)\chi\_1]$$

$$+ c\_{6k-5}(V)\left(\frac{L^2}{n}\right)^{(k-1)/2} + \theta\_k^{n/\left(k\log\left(n/L^2\right)\right)}(L)\log\left(n/L^2\right) \Big],$$

*where*

$$\|\|Y\|\| = \left| \sum\_{j=1}^{\infty} \lambda\_j \left( Z\_j^2 - 1 \right) \right|,\tag{60}$$

$$A(p,s,t) = nE[(1 - \chi\_{1,t})] + c(p)(1+r)^{-p}n^{1-p/2}E\left[\|G\_1\|^p(\chi\_{1,t} - \chi\_1)\right] \tag{61}$$

*and*

$$B\_j(0, r) = n^{-(j-2)/2} ||G\_1||^j. \tag{62}$$

**Proof.** Put

$$h(\mathbf{x}) = \sum\_{k=1}^{\infty} \lambda\_k \mathbf{x}\_k \tag{63}$$

for

$$\mathbf{x} \in H = \left\{ \mathbf{x} = (\mathbf{x}\_1, \mathbf{x}\_2, \dots) \, \middle| \, \sum\_{k=1}^{\infty} |\lambda\_k| \mathbf{x}\_k^2 < \infty \right\} \tag{64}$$

Recall that

$$\frac{1}{\sqrt{n}}\sum\_{i=1}^{n} G\_i = \left(\frac{1}{\sqrt{n}}\sum\_{i=1}^{n} g\_1(\xi\_i), \frac{1}{\sqrt{n}}\sum\_{i=1}^{n} g\_2(\xi\_i), \dots \right) \in H. \tag{65}$$

Then we have

$$nV\_n = \frac{1}{n} \sum\_{1 \le i,j \le n} u\left(\xi\_i, \xi\_j\right) = \frac{1}{n} \sum\_{1 \le i,j \le n} \sum\_{k=1}^{\infty} \lambda\_k g\_k(\xi\_i) g\_k(\xi\_j) \quad a.s. \tag{66}$$

$$= \frac{1}{n} \sum\_{k=1}^{\infty} \lambda\_k \left\{ \sum\_{1 \le i,j \le n} g\_k(\xi\_i) g\_k(\xi\_j) \right\} = \frac{1}{n} \sum\_{k=1}^{\infty} \lambda\_k \left\{ \sum\_{i=1}^n g\_k(\xi\_i) \right\}^2 = \left\| \frac{1}{\sqrt{n}} \sum\_{i=1}^n G\_i \right\|^2.$$

Thus, we can apply Theorem 2 to show Theorem 5.
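Identity (66) can be checked directly for a finite-rank kernel. The sketch below is our own illustration: two hypothetical eigenfunctions $g\_1, g\_2$, orthonormal and centered under the uniform law on $(0, 2\pi)$, and eigenvalues $\lambda\_1 = 0.7$, $\lambda\_2 = 0.3$.

```python
import math
import random

random.seed(1)

lam = [0.7, 0.3]
g = [lambda x: math.sqrt(2.0) * math.sin(x),
     lambda x: math.sqrt(2.0) * math.cos(x)]

def u(x, y):
    # finite-rank kernel u(x, y) = sum_k lambda_k g_k(x) g_k(y)
    return sum(l * gk(x) * gk(y) for l, gk in zip(lam, g))

xs = [random.uniform(0.0, 2.0 * math.pi) for _ in range(50)]
n = len(xs)

n_vn = sum(u(x, y) for x in xs for y in xs) / n            # n * V_n directly
coords = [sum(gk(x) for x in xs) / math.sqrt(n) for gk in g]
norm_sq = sum(l * c * c for l, c in zip(lam, coords))      # squared H-norm
print(n_vn, norm_sq)  # the two values agree up to rounding
```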

**Theorem 6.** *Suppose that the i.i.d. random variables* {*ξi*, *i* ≥ 1} *obey a continuous distribution. Let v*(*x*, *y*) *be a symmetric function defined by*

$$v(x,y) = \begin{cases} u(x,y), & x \neq y \\ 0, & x = y \end{cases} \tag{67}$$

*Under the same assumptions as in Theorem 5, Equation (59) holds for $U\_n$ with the degenerate kernel $v(x, y)$.*

**Proof.** Since the i.i.d. random variables {*ξi*, *i* ≥ 1} obey a continuous distribution, we have

$$P\left\{\xi\_i \neq \xi\_j\right\} = 1 \quad (i \neq j). \tag{68}$$

Therefore, from (67) and (68)

$$nU\_n = \frac{2}{n} \sum\_{1 \le i < j \le n} u\left(\xi\_i, \xi\_j\right) = \frac{1}{n} \sum\_{1 \le i, j \le n} v\left(\xi\_i, \xi\_j\right) \quad a.s. \tag{69}$$

Since the right hand side of (69) is $n$ times the V-statistic with the degenerate kernel $v(x, y)$ satisfying all assumptions of Theorem 5, Theorem 6 follows from Theorem 5.

**Remark 2.** *From (10), $E[G\_1] = 0$ and $\sigma^2 = E\left[\|G\_1\|^2\right] = 1$ in Theorem 5.*

#### **7. Cramér–von Mises Statistics**

There are examples of U-statistics or V-statistics to which the above theorems are applicable under the assumption of nuclearity of the kernel function.

#### **Example 3.** *(Cramér–von Mises statistics, Sato (1992) [8])*

*Assume that i.i.d. random variables $\xi\_j$, $j \ge 1$, obey the uniform distribution $U(0, 1)$, i.e., $\mu$ is the Lebesgue measure on $[0, 1]$. The kernel function $u(x, y)$ defined by*

$$u(x,y) = \int\_0^1 \frac{\left(I\_{[x,1]}(t) - t\right)\left(I\_{[y,1]}(t) - t\right)}{t(1-t)} dt, \quad x, y \in [0,1] \tag{70}$$

*satisfies the hypothesis of Theorem 5 or Theorem 6. On the other hand, we have*

$$\int_0^1 u(x,x)\, dx = \int_0^1 dx \int_0^1 \frac{\left(I_{[x,1]}(t) - t\right)^2}{t(1-t)}\, dt \tag{71}$$

$$= \int_0^1 \frac{dt}{t(1-t)} \int_0^1 \left(I_{[x,1]}(t) - t\right)^2 dx = 1 < \infty.$$

*Therefore, the integral operator Tu defined by*

$$T_u f(y) = \int_0^1 u(x,y) f(x)\, dx \tag{72}$$

*is nuclear by Theorem 3. Therefore, since the degenerate kernel u*(*x*, *y*) *defined by (70) satisfies all assumptions of Theorem 5, Theorem 5 holds for the Cramér–von Mises statistics. Furthermore, Theorem 6 also holds for U-statistics with the degenerate kernel v*(*x*, *y*) *defined by (67) and (70).*
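The trace identity (71) can be verified numerically. In the sketch below, the inner *t*-integral is first evaluated in closed form by splitting at *t* = *x* (a short computation under the definition (70) gives *u*(*x*, *x*) = −log(*x*(1 − *x*)) − 1); the closed form is spot-checked by direct quadrature and then integrated over *x*.

```python
import math

# Closed form of the diagonal value u(x, x): splitting the t-integral in (70)
# at t = x gives u(x, x) = -log(x (1 - x)) - 1.
def u_diag(x):
    return -math.log(x * (1.0 - x)) - 1.0

# Spot-check the closed form against direct midpoint quadrature of (70) at x = 0.3.
x0, M = 0.3, 100000
ht = 1.0 / M
direct = sum(
    ((1.0 if (j + 0.5) * ht >= x0 else 0.0) - (j + 0.5) * ht) ** 2
    / ((j + 0.5) * ht * (1.0 - (j + 0.5) * ht)) * ht
    for j in range(M)
)
assert abs(direct - u_diag(x0)) < 1e-3

# Midpoint quadrature of the trace integral (71); the log singularities at the
# endpoints are integrable, so the midpoint rule converges.
N = 100000
h = 1.0 / N
trace = sum(u_diag((i + 0.5) * h) * h for i in range(N))
assert abs(trace - 1.0) < 1e-3
```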

#### **8. Conclusions**

Bentkus–Götze (1999) [4] and Zubayraev (2011) [5] obtained the remainder *O*(*n*<sup>−1</sup>) in asymptotic expansions for U-statistics or V-statistics with degenerate kernels. From Theorems 5 and 6, if we assume E[‖*G*<sub>1</sub>‖<sup>*p*</sup>] < ∞, *p* ≥ 4, and some further conditions, then we obtain the remainder *O*(*n*<sup>1−*p*/2</sup>). Applying Theorem 5, we obtain asymptotic expansions for the Cramér–von Mises statistics of the uniform distribution *U*(0, 1) with the remainder *O*(*n*<sup>1−*p*/2</sup>) for any *p* ≥ 4.

**Funding:** This research was funded by the Grant-in-Aid for Scientific Research (C), No. 18K03431, Ministry of Education, Science and Culture, Japan.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No data were used in this study.

**Acknowledgments:** The author would like to express his gratitude to the anonymous referees for their useful comments. He also would like to express his gratitude to V.V. Ulyanov for giving the opportunity to present this work.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **Sharp Estimates for Proximity of Geometric and Related Sums Distributions to Limit Laws**

**Alexander Bulinski 1,\* and Nikolay Slepov <sup>2</sup>**


**\*** Correspondence: alexander.bulinski@math.msu.ru

**Abstract:** The convergence rate in the famous Rényi theorem is studied by means of the Stein method refinement. Namely, it is demonstrated that the new estimate of the convergence rate of the normalized geometric sums to exponential law involving the ideal probability metric of the second order is sharp. Some recent results concerning the convergence rates in Kolmogorov and Kantorovich metrics are extended as well. In contrast to many previous works, there are no assumptions that the summands of geometric sums are positive and have the same distribution. For the first time, an analogue of the Rényi theorem is established for the model of exchangeable random variables. Also within this model, a sharp estimate of convergence rate to a specified mixture of distributions is provided. The convergence rate of the appropriately normalized random sums of random summands to the generalized gamma distribution is estimated. Here, the number of summands follows the generalized negative binomial law. The sharp estimates of the proximity of random sums of random summands distributions to the limit law are established for independent summands and for the model of exchangeable ones. The inverse to the equilibrium transformation of the probability measures is introduced, and in this way a new approximation of the Pareto distributions by exponential laws is proposed. The integral probability metrics and the techniques of integration with respect to sign measures are essentially employed.

**Keywords:** probability metrics; Stein method; geometric sums; generalization of the Rényi theorem; generalized transformation of equilibrium for probability measures and its inverse; generalized gamma distribution

**MSC:** 60F99; 60E10; 60G50; 60G09

Received: 29 October 2022; Accepted: 7 December 2022; Published: 14 December 2022

**Citation:** Bulinski, A.; Slepov, N. Sharp Estimates for Proximity of Geometric and Related Sums Distributions to Limit Laws. *Mathematics* **2022**, *10*, 4747. https:// doi.org/10.3390/math10244747 Academic Editors: Alexander Tikhomirov and Vladimir Ulyanov

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**1. Introduction**

The theory of sums of random variables belongs to the core of modern probability theory. The fundamental contribution to the formation of the classical core was made by A. de Moivre, J. Bernoulli, P.-S. Laplace, D. Poisson, P.L. Chebyshev, A.A. Markov, A.M. Lyapunov, E. Borel, S.N. Bernstein, P. Lévy, J. Lindeberg, H. Cramér, A.N. Kolmogorov, A.Ya. Khinchin, B.V. Gnedenko, J.L. Doob, W. Feller, Yu.V. Prokhorov, A.A. Borovkov, Yu.V. Linnik, I.A. Ibragimov, A. Rényi, P. Erdős, M. Csörgő, P. Révész, C. Stein, P. Hall, V.V. Petrov, V.M. Zolotarev, J. Jacod and A.N. Shiryaev among others. The first steps led to limit theorems for appropriately normalized partial sums of sequences of independent random variables. Besides the laws of large numbers, special attention was paid to the emergence of Gaussian and Poisson limit laws. Note that despite many efforts to find necessary and sufficient conditions for the validity of the central limit theorem (the term was proposed by G. Pólya for a class of limit theorems describing weak convergence of distributions of normalized sums of random variables to the Gaussian law), this problem was completely resolved for independent summands only in the second part of the 20th century in the works by V.M. Zolotarev and V.I. Rotar. Also in the last century, the beautiful theory of infinitely divisible and stable laws was constructed. New developments of infinite divisibility along with classical theory can be found in [1]. For an exposition of the theory of stable distributions and their applications, we refer to [2]; see also references therein.

Parallel to partial sums of a sequence of random variables (and vectors), other significant schemes have appeared, for instance, the arrays of random variables. Moreover, in physics, biology and other domains, researchers found that it was essential to study the sums of random variables when the number of summands was random. Thus, the random sums with random summands became an important object of investigation. One can mention the branching processes which stem from the 19th century population models by I.J. Bienaymé, F. Galton and H.W. Watson that are still intensively being developed, see, e.g., [3]. In the theory of risk, it is worth recalling the celebrated Cramér–Lundberg model for dynamics of the capital of an insurance company, see, e.g., Ch. 6 in [4]. Various examples of models described by random sums are considered in Ch. 1 of [5], including (see Example 1.2.1) the relationship between certain random sums analysis and the famous Pollaczek–Khinchin formula in queuing theory. A vast literature deals with the so-called geometric sums. There, one studies the sum of independent identically distributed random variables, where the summation index follows the geometric distribution and is independent of the summands. Such random sums can model many real world phenomena, e.g., in queuing, insurance and reliability, see the Section "Origin of Geometric Sums" in the Introduction of [6]. Furthermore, a multitude of important stochastic models described by systems of dependent random variables occurred to meet diverse applications, see, e.g., [7]. In particular, the general theory of stochastic processes and random fields arose in the last century (for an introduction to random fields, see, e.g., [8]).

An intriguing problem of estimating the convergence rate to a limit law was addressed by A.C. Berry and C.-G. Esseen. Their papers initiated the study of proximity for distribution functions of the normalized partial sums of independent random variables to the distribution function of a standard Gaussian law in the framework of the classical theory of random sums.

To assess the proximity of distributions, we will employ various integral probability metrics. Usually, for random variables *Y*, *Z* and a specified class H of functions *h* : R → R, one sets

$$d\_{\mathcal{H}}(Y, Z) := \sup\_{h \in \mathcal{H}} |\mathbb{E}[h(Y)] - \mathbb{E}[h(Z)]| \in [0, \infty]. \tag{1}$$

Clearly, *d*H(*Y*, *Z*) is a functional depending on *law*(*Y*) and *law*(*Z*), i.e., distributions of *Y* and *Z*. A class H should be rich enough to guarantee that *d*<sup>H</sup> possesses the properties of a metric (or semi-metric). The general theory of probability metrics is presented, e.g., in [9,10]. In terms of such metrics, one often compares the distribution of a random variable *Y* under consideration with that of a target random variable *Z*. In Section 2, we recall the definitions of the Kolmogorov and Kantorovich (alternatively called Wasserstein) distances and Zolotarev ideal metrics corresponding to the adequate choice of H, denoted below as K, H<sup>1</sup> and H2, respectively.
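As a small illustration of Equation (1), the Kolmogorov distance arises from the class of indicator functions *h<sub>z</sub>*(*x*) = I{*x* ≤ *z*}, so it reduces to the supremum of |*F<sub>Y</sub>*(*z*) − *F<sub>Z</sub>*(*z*)| over *z*. A minimal sketch (the two exponential laws below are illustrative choices, not from the paper):

```python
import math

# Kolmogorov distance between Exp(1) and Exp(1.2) obtained from Equation (1)
# with the indicator class K: d_K(Y, Z) = sup_z |F_Y(z) - F_Z(z)|.
def F(x, lam):
    return 1.0 - math.exp(-lam * x)

grid = [i * 0.001 for i in range(10000)]          # z in [0, 10)
d_K = max(abs(F(z, 1.0) - F(z, 1.2)) for z in grid)

# The supremum is attained where the two densities cross: z* = log(1.2) / 0.2.
z_star = math.log(1.2) / 0.2
exact = abs(F(z_star, 1.0) - F(z_star, 1.2))
assert abs(d_K - exact) < 1e-4
```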

It should be emphasized that for sums of random variables, deep results were established along with the creation and development of different methods of analysis. One can mention the method of characteristic functions due to the works of J. Fourier, P.-S. Laplace and A.M. Lyapunov, the method of moments proposed by P.L. Chebyshev and developed by A.A. Markov, the Lindeberg method of employing auxiliary Gaussian random variables and the Bernstein techniques of large and small boxes. In 1972, C. Stein in [11] (see also [12]) introduced a new method to estimate the proximity of the distribution under consideration to a normal law. Furthermore, this powerful method was developed in the framework of classical limit theorems of probability theory. We describe this method in Section 2. Applying the Stein method along with other tools, one can establish in certain cases the sharp estimates of closeness between a target distribution and other ones in specified metrics (see, e.g., [13,14]). We recommend the books [15,16] and the paper [17] for basic ideas of the ingenious Stein method. The development of this technique under mild moment restrictions for summands is treated in [18,19]. We mention in passing that there are deep generalizations of the Stein techniques involving generators of certain Markov processes; a compact exposition is provided, e.g., on p. 2 of [20].

In the theory of random sums of random summands, the limit theorems with exponential law as a target distribution play a role similar to the central limit theorem for (nonrandom) sums of random variables. Here, one has to underline the principal role of the Rényi classical theorem for geometric sums published in [21]. Recall this famous result. Let *X*1, *X*2, ... be a sequence of independent identically distributed (i.i.d.) random variables such that *μ* := E[*X*1] ≠ 0. Take a geometric random variable *Np* with parameter *p* ∈ (0, 1), defined as follows:

$$\mathbb{P}(N_p = k) = p(1 - p)^k, \quad k \in \mathbb{N} \cup \{0\}. \tag{2}$$

Assume that *Np* and (*Xn*)*n*∈<sup>N</sup> are independent. Set *S*<sup>0</sup> := 0, *Sn* := *X*<sup>1</sup> + ... + *Xn*, *n* ∈ N. Then,

$$W_p := \frac{S_{N_p}}{\mathbb{E}[S_{N_p}]} \xrightarrow{\mathcal{D}} Z \sim Exp(1) \quad \text{as} \ p \to 0+, \tag{3}$$

where $\xrightarrow{\mathcal{D}}$ stands for convergence in distribution, and *Z* follows the exponential law *Exp*(*λ*) with parameter *λ* = 1, E[*SNp* ] = *μ*(1 − *p*)/*p*. In fact, instead of *Np*, A. Rényi considered the shifted geometric random variable *N*(*p*) such that P(*N*(*p*) = *k*) = *p*(1 − *p*)<sup>*k*−1</sup>, *k* ∈ N. Clearly, *Np* has the same law as *N*(*p*) − 1. He supposed that the i.i.d. random variables *X*1, *X*2, ... are non-negative, and that *N*(*p*) and (*Xn*)*n*∈<sup>N</sup> are independent. Then, *SN*(*p*)/E[*SN*(*p*)] converges in distribution to *Z* ∼ *Exp*(1) as *p* → 0+, where E[*SN*(*p*)] = *μ*/*p*. It was explained in [22] that both statements are equivalent and the assumption of nonnegativity of summands can be omitted.
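The Rényi theorem is easy to observe in simulation. The sketch below (an illustration only, with uniform summands chosen arbitrarily) samples the normalized geometric sum *W<sub>p</sub>* for a small *p* and compares its empirical distribution function with that of *Exp*(1) in the Kolmogorov distance.

```python
import math, random

random.seed(1)
p = 0.01          # small parameter of the geometric law
R = 4000          # number of Monte Carlo replications
mu = 2.0          # mean of the illustrative Uniform(0, 4) summands

def geom(p):
    # N_p with P(N_p = k) = p (1 - p)^k, k = 0, 1, 2, ...
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p))

samples = []
for _ in range(R):
    s = sum(random.uniform(0.0, 4.0) for _ in range(geom(p)))
    samples.append(s / (mu * (1.0 - p) / p))   # W_p = S_{N_p} / E[S_{N_p}]

samples.sort()
# Kolmogorov distance between the empirical law of W_p and Exp(1).
d = max(abs((i + 1) / R - (1.0 - math.exp(-x))) for i, x in enumerate(samples))
assert d < 0.1
```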

Building on the previous investigations discussed below in this section, we study different instances of quantifying the approximation of random sums by limit laws and also extend the Stein method employment. The main goals of our paper are the following: (1) to find sharp estimates (i.e., optimal ones which cannot be diminished) of proximity of geometric sums of independent (in general non-identically distributed) random variables to exponential law using the probability metric *d*H<sup>2</sup> ; (2) to prove the new version of the Rényi theorem when the summands are described by a model of exchangeable random variables, establishing the due non-exponential limit law together with an optimal bound of the convergence rate applying *d*H<sup>2</sup> ; (3) to obtain the exact convergence rate of appropriately normalized random sums of random summands to the generalized gamma distribution when the number of summands follows the generalized negative binomial distribution employing *d*H<sup>2</sup> ; (4) to introduce the inverse transformation to an "equilibrium distribution transformation", give full description of its existence and demonstrate the advantage of applying the Stein method combined with that inverse transform; and (5) to use such approach in deriving the new approximation in the Kolmogorov metric *d*K of the Pareto distribution by an exponential one, which is important in signal processing.

The main idea is to apply the Stein method and deduce (Lemma 2) new estimates of the solution of Stein's equation (corresponding to an exponential law *Exp*(*λ*) as a target distribution) when a function *h* appearing in its right-hand side belongs to a class H2. This entails the established sharp estimates. The integral probability metrics and the techniques of integration with respect to sign measures are essentially employed. It should be stressed that we consider random summands which take, in general, positive and negative values and in certain cases need not have the same law.

Now, we briefly comment on the relevance of the five groups of the paper results mentioned above. Some upper bounds for convergence rates in Equation (3) were obtained previously by different tools (the renewal techniques and the memoryless property of the geometric distribution), and the estimates were not sharp. We refer to the results by A.D. Soloviev, V.V. Kalashnikov and S.Y. Vsekhsvyatskii, M. Brown, V.M. Kruglov and V.Yu. Korolev, where the authors either used the Kolmogorov distance or proved specified nonuniform estimates for differences of the corresponding distribution functions. For instance, in [23] the following estimate was proved

$$\sup\_{x \in \mathbb{R}} |\mathbb{P}(W\_p \le x) - \mathbb{P}(Z \le x)| \le p \frac{\mathbb{E}[X\_1^2]}{\mu^2} \max\left\{1, \frac{1}{2(1-p)}\right\},$$

where *Z* ∼ *Exp*(1). Moreover, this estimate is asymptotically exact when *p* → 0+. Some improvements are in [24] under certain (hazard rate) assumptions. E.V. Sugakova obtained a version of the Rényi theorem for independent, in general, not identically distributed random variables. We also mention contributions by V.V. Kalashnikov, E.F. Peköz, A. Röllin, N. Ross and T.L. Hung which gave the estimates in terms of the Zolotarev ideal metrics. We do not reproduce all these results here since they can be viewed on pages 3 and 4 of [22] with references where they were published.

In Corollary 3.6 of [25] for nondegenerate i.i.d. positive random variables *X*1, *X*2, ... with mean *μ* and finite second moment, it was proved that

$$\zeta_2\big(pS(p),\, Z(1/\mu)\big) \le p\left(\mathbb{E}[X_1^2] + 2\mu^2\right),$$

where $S(p) := \sum_{j=1}^{N(p)} X_j$, *ζ*<sup>2</sup> is the Zolotarev ideal metric of order two, and *Z*(*λ*) ∼ *Exp*(*λ*), *λ* > 0. In [22], the estimates for proximity of geometric sums distributions to *Z* ∼ *Exp*(1) were provided in the Kantorovich and *ζ*<sup>2</sup> metrics. A substantial contribution of the authors of [22] is the study of random summands *X*1, *X*2, ... that need not be positive (see also [26]). The general estimate for deviation of *Wp* from *Z* ∼ *Exp*(1) in the ideal metric of order *s* was proved in [27]. We do not assume that *Wp* is constructed by means of i.i.d. random variables and, moreover, demonstrate that our estimate (for summands taking real values) involving the metric *d*H<sup>2</sup> is sharp.

The exchangeable random variables form an important class having various applications in statistics and combinatorics, see, e.g., [28]. As far as we know, the model of exchangeable random variables is studied in the context of random sums for the first time here. It is interesting that instead of the exponential limit law we indicate explicit expression of the new limit law. In addition, we establish the sharp estimate of proximity of random sums distributions to this law using *d*H<sup>2</sup> .

A natural generalization of the Rényi theorem is to study a summation index following a non-geometric distribution. In this way, the upper bound of the convergence rate of random sums of random summands to the generalized gamma distribution was proved in [29]. Theorem 3.1 in [30] contains the estimates in the Kolmogorov and Kantorovich distances for approximations of a non-negative random variable law by a specified (nongeneralized) gamma distribution. The proof relies on Stein's identity for the gamma distribution established in H.M. Luk's PhD thesis (see the reference in [30]). New estimates of the solutions of the gamma Stein equation are given in [31]. We derive the sharp estimate for approximation of random sums by the generalized gamma law using the Zolotarev metric of order two. In a quite recent paper [32], the author established deep results concerning further generalizations of the Rényi theorem. Namely, Theorem 1 of [32] demonstrates how one can provide the upper bounds of the convergence rate of specified random sums to a more general law than an exponential one using the estimates in the Rényi theorem. This approach is appealing since the author employs the ideal metric of order *s* > 0. However, the sharpness of these estimates was not examined.

Note that in [33] the important "equilibrium transformation of distributions" was proposed and employed along with the Stein techniques. We will consider this transformation *X<sup>e</sup>* for a random variable *X* in Section 7 and also tackle other useful transformations. In the present paper, the inverse to the "equilibrium distribution transformation" is introduced. We completely describe the possibility to construct such transformation and provide an explicit formula for the corresponding density. The idea to apply such inverse transformation whenever it exists is based on the result [33] demonstrating that one can obtain a more

precise estimate for proximity in the Kantorovich metric between *X<sup>e</sup>* and *Z* than between *X* and *Z*, where *Z* ∼ *Exp*(1) and E[*X*] = 1, E[*X*2] < ∞. We extend this result. Moreover, we prove that in this way one can obtain a new estimate of approximation of the Pareto distribution by an exponential one. It is shown that our new estimate is advantageous for a wide range of parameters of the Pareto distribution. Let *X<sup>e</sup>* ∼ *Pareto*(*α*, *β*), i.e., the distribution function of *X<sup>e</sup>* is

$$F^e(x) = 1 - \left(\frac{\beta}{x + \beta}\right)^{\alpha}, \quad x \ge 0, \ \alpha > 0, \ \beta > 0.$$

We show that the preimage *X* ∼ *Pareto*(*α* + 1, *β*). Thus, for any *α* > 2, *β* > 0, one has *d*K(*X<sup>e</sup>* , *Z*) ≤ 1/(*α* − 1), where *Z* ∼ *Exp*(*α*/*β*) and *d*<sup>K</sup> stands for the Kolmogorov distance. This bound is more precise than the previous ones applied in signal processing, see, e.g., [34].
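The quality of this exponential approximation of the Pareto law is easy to probe numerically. The sketch below (with the illustrative parameters *α* = 3, *β* = 2) computes the Kolmogorov distance between *Pareto*(*α*, *β*) and *Exp*(*α*/*β*) on a grid and checks it against the bound 1/(*α* − 1).

```python
import math

alpha, beta = 3.0, 2.0    # illustrative Pareto parameters with alpha > 2

def F_pareto(x):
    return 1.0 - (beta / (x + beta)) ** alpha

def F_exp(x):
    return 1.0 - math.exp(-(alpha / beta) * x)

# Kolmogorov distance on a fine grid (the tail beyond x = 40 is negligible here).
grid = [i * 0.001 for i in range(40000)]
d_K = max(abs(F_pareto(x) - F_exp(x)) for x in grid)

assert d_K <= 1.0 / (alpha - 1.0)   # the bound d_K(X^e, Z) <= 1/(alpha - 1)
```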

This paper is organized as follows. After the Introduction, the auxiliary results are provided in Section 2. Here we include the material important for understanding the main results. We recall the concept of probability metrics, consider the Kolmogorov and the Kantorovich distances and examine the Zolotarev ideal metrics. We describe the basic ideas of Stein's method, especially for the exponential target distribution. In this section, we formulate a simple but useful Lemma 1 concerning the essential supremum of the Lipschitz function, an important Lemma 2 giving the solution of the Stein equation for different functional classes. We explain the essential role of the generalized equilibrium transformation proposed in [22] which permits study of the summands taking both positive and negative values. We formulate Lemma 3 to be able to solve an integral equation involving the generalized equilibrium transformation when E[*X*] = 0 and E[*X*2] < ∞. The proofs of auxiliary lemmas are placed in Appendix A. Section 3 is devoted to an approximation of the normalized geometric sums *Wp* by an exponential law. Here, the sharp convergence rate is found (see Theorem 1) by means of the probability metric *d*H<sup>2</sup> . The proof is based on the Lebesgue–Stieltjes integration techniques, the formula of integration by parts for functions of bounded variations, Lemma 2, various limit theorems for integrals and the important result of [22] concerning the estimates involving the Kantorovich distance. In Section 4, for the first time an analog of the Rényi theorem is proved for a model of exchangeable random variables proposed in [35]. We demonstrate (Theorem 2) that, in contrast to Rényi's theorem, the limit distribution for random sums under consideration is a specified mixture of two explicitly indicated laws. Moreover, the sharp convergence rate to this limit law is obtained (Theorem 3) by means of *d*H<sup>2</sup> . 
In Section 5, the distance between the generalized gamma law and the suitably normalized sum of independent random variables is estimated when the number of summands has the generalized negative binomial distribution. Theorem 4 demonstrates that this estimate is sharp. For the proof, we employ various truncation techniques, the transformations of parameters of initial random variables, the monotone convergence theorem and explicit formula for the generalized gamma distribution moments of order *δ* > 0, obtained in [27]. Section 6 provides the pioneering study of the same problem in the framework of exchangeable random variables and also gives the sharp estimate for the *d*H<sup>2</sup> metric (Theorem 5). In Section 7, we introduce the inverse to the equilibrium transformation of the probability measures. Lemma 6 contains a full description of situations when a unique preimage *X* of a random variable *X<sup>e</sup>* exists and gives an explicit formula for distribution of *X*. This approach permits us to obtain the new estimates of closeness of probability measures in the Kolmogorov and Kantorovich metrics (Theorem 6). In particular, due to Theorem 6 and Lemmas 2, 6, it becomes possible to find a useful estimate of proximity of the Pareto law to the exponential one (Example 2). Section 8 containing the conclusions and indications for further research work is followed by Appendix A and the list of references.

#### **2. Auxiliary Results**

Let K := {*h* : *h<sub>z</sub>*(*x*) = I{*x* ≤ *z*}, *x*, *z* ∈ R}, where I{*A*} := 1 if *A* holds and zero otherwise. The choice H = K in Equation (1) corresponds to the Kolmogorov distance. Note that *h* above is a function in *x*, whereas *z* is the index parameterizing the class. A function *h* : R → R is called Lipschitz if

$$\operatorname{Lip}(h) := \sup_{x, u \in \mathbb{R};\ x \neq u} \frac{|h(x) - h(u)|}{|x - u|} < \infty. \tag{4}$$

Then,

$$|h(x) - h(u)| \le C|x - u|, \quad x, u \in \mathbb{R}, \tag{5}$$

and in light of Equation (4), *Lip*(*h*) is the smallest possible constant *C* appearing in Equation (5). We write Lip(*C*), where *C* ∈ [0, ∞), for a collection of the Lipschitz functions having *Lip*(*h*) ≤ *C*. For *s* > 0, set *m* = *m*(*s*) := ⌈*s* − 1⌉ ∈ N ∪ {0} (where, for *a* ∈ R, ⌈*a*⌉ stands for the minimal integer which is greater than or equal to *a*). Introduce a class of functions

$$\mathcal{H}\_s := \{ h : \mathbb{R} \to \mathbb{R}, \ |h^{(m)}(\mathbf{x}) - h^{(m)}(\mathbf{u})| \le |\mathbf{x} - \mathbf{u}|^{s-m}, \ \mathbf{x}, \mathbf{u} \in \mathbb{R} \}, \ s > 0.$$

As usual, *h*<sup>(0)</sup>(*x*) = *h*(*x*), *x* ∈ R. We write *d*H*<sup>s</sup>* for the metric defined according to Equation (1) with H = H*s*. V.M. Zolotarev and many other researchers defined an ideal metric *ζ<sup>s</sup>* of order *s* > 0 involving only bounded functions from H*s*. We will use the collections H<sup>1</sup> and H<sup>2</sup> without the assumption that the functions *h* are bounded on R. This is the reason why we write *d*H*<sup>s</sup>* instead of *ζs*. Thus, we employ

$$\mathcal{H}_1 := \mathrm{Lip}(1), \quad \mathcal{H}_2 := \{h : h' \in \mathrm{Lip}(1)\}.$$

Note that in the definition of H<sup>2</sup> we deal with *h* ∈ *C*<sup>(1)</sup>(R), where the space *C*<sup>(1)</sup>(R) consists of functions *h* : R → R such that *h*′(*x*) exists for all *x* ∈ R and *h*′ is continuous on R (evidently, a Lipschitz function is continuous). One calls *d*H<sup>1</sup> the Kantorovich metric (the term Wasserstein metric appears in the literature as well). One also uses the bounded Kantorovich metric, when the class H<sup>1</sup> consists of all the bounded functions from Lip(1). The metric *ζ<sup>s</sup>* was introduced in [36] and called an ideal metric in light of its important properties. The properties of the *ζ<sup>s</sup>* metrics, where *s* > 0, are collected in Sec. 2 of [32]. We mention in passing that various functionals are ubiquitous in assessing the proximity of distributions. In this regard, we refer, e.g., to [37,38].

To apply the Stein method, we begin with fixing the target random variable *Z* (or its distribution) and describe a class H to estimate *d*H(*Y*, *Z*) for a random variable *Y* under consideration. Then, the problem is to indicate an operator *T* (with specified domain of definition) so that the Stein equation

$$Tf(\mathbf{x}) = h(\mathbf{x}) - \mathbb{E}[h(Z)] \tag{6}$$

has a solution *fh*(*x*), *x* ∈ R, for each function *h* ∈ H. After that, one can substitute *Y* instead of *x* in Equation (6) and take the expectation of both sides, assuming that all these expectations are finite. As a result, one comes to the relation

$$\mathbb{E}[Tf\_h(\boldsymbol{Y})] = \mathbb{E}[h(\boldsymbol{Y})] - \mathbb{E}[h(\boldsymbol{Z})].\tag{7}$$

It is not a priori clear why the estimation of the left-hand side of Equation (7) is more adequate than the estimation of |E[*h*(*Y*)] − E[*h*(*Z*)]| for *h* ∈ H. However, in many situations this occurs, which justifies the method. The choice of *T* depends on the distribution of *Z*. Note that in certain cases (e.g., when *Z* follows the Poisson law) one considers functions *f* defined on a subset of R. We emphasize that the construction of the operator *T* is a nontrivial problem, see, e.g., [33,39–41].

The basic idea in this way is the following. For many probability distributions (Gaussian, Laplace, Exponential, etc.), one can find an operator *T* characterizing the law of a target variable *Z*. In other words, for a rather large class of functions *f* , E[*Tf*(*Y*)] = 0 if and only if *law*(*Y*) = *law*(*Z*) (i.e., the laws of *Y* and *Z* coincide). Thus, if |E[*Tfh*(*Y*)]| is small enough for a suitable class of functions *h*, this leads to the assertion that the law of *Y* is close (in a sense) to the law of *Z*. One has to verify that this kind of "continuity" takes place. Clearly, if for any *h* ∈ H, where H defines the integral probability metric in Equation (1), one can find a solution *fh* of Equation (6), then the relation E[*Tfh*(*Y*)] = 0 for all *fh*, *h* ∈ H, yields *d*H(*Y*, *Z*) = 0 and, consequently, *law*(*Y*) = *law*(*Z*).

Further, we assume that *Z* ∼ *Exp*(*λ*), i.e., *Z* has exponential distribution with parameter *λ* > 0. In this case (see, e.g., Sec. 5 in [17]), one uses the operator

$$Tf(x) := f'(x) - \lambda f(x) + \lambda f(0), \quad x \in \mathbb{R}, \ \lambda > 0, \tag{8}$$

and writes the Stein Equation (6) as follows

$$f'(x) - \lambda f(x) + \lambda f(0) = h(x) - \mathbb{E}[h(Z)], \quad x \in \mathbb{R}. \tag{9}$$

It should be stipulated that E[*h*(*Z*)] ∈ R for a test function *h* ∈ H, and that there exists a differentiable solution *f* of Equation (9). Therefore, if one can find such a solution *f*, then

$$\mathbb{E}[f'(Y)] - \lambda \mathbb{E}[f(Y)] + \lambda f(0) = \mathbb{E}[h(Y)] - \mathbb{E}[h(Z)] \tag{10}$$

under the hypothesis that all these expectations are finite. If *f* : R → R is absolutely continuous, then (see, e.g., Theorem 13.18 of [42]) for almost all *x* ∈ R with respect to the Lebesgue measure, there exists *f*′(*x*). Moreover, one can find a function *g* : R → R, integrable on each interval, that guarantees, for each *x*, *u* ∈ R, that

$$f(x) = f(u) + \int_u^x g(v)\, dv, \tag{11}$$

where *g*(*v*) = *f*′(*v*) for almost all *v* ∈ R. Thus, (*Tf*)(*x*) is defined for such *f* according to Equation (8) for almost all *x* ∈ R. In general, for an arbitrary random variable *Y*, one cannot write E[(*Tf*)(*Y*)], since the value of the expectation depends on the choice of a version of (*Tf*)(*x*), *x* ∈ R. Indeed, let *B* ∈ B(R) be such that *m*(*B*) = 0, where *m* stands for the Lebesgue measure. Assume that *Y* takes values in *B*. Then, it is clear that E[(*Tf*)(*Y*)] depends on the choice of the version of (*Tf*)(*x*) defined on R. However, if the distribution P*<sup>Y</sup>* of a random variable *Y* has a density with respect to *m*, then E[(*Tf*)(*Y*)] will be the same for any version of *Tf* (with respect to the Lebesgue measure). In certain cases, the Stein operator is applied to smoothed functions (see, e.g., [33,43]). Otherwise, Equation (6) does not hold at each point of R (see, e.g., Lemma 2.2 in [16]), and complementary efforts are needed. For our study, it is convenient to employ in Equation (8), in the capacity of *f*′(*x*), *x* ∈ R, the right derivative. In many cases, for a real-valued function *f* defined on a fixed set *D* ⊂ R, one considers sup*x*∈*<sup>D</sup>* |*f*(*x*)| as an "essential supremum". Recall that a function *f*˜ is a version of *f* (and vice versa) if the measure (here the Lebesgue measure) of the set of points *x* such that *f*˜(*x*) ≠ *f*(*x*) is zero. The notation ‖*f*‖<sub>∞</sub> means that one takes inf<sub>*f*˜</sub> sup*x*∈*<sup>D</sup>* |*f*˜(*x*)|, where *f*˜ runs over the class of all versions of *f*. Clearly, ‖*f*‖<sub>∞</sub> will be the same if we change *f* on a subset of *D* of measure zero. Thus, we write ‖*f*′‖<sub>∞</sub> instead of ‖*g*‖<sub>∞</sub> for the *g* appearing in Equation (11). The following simple observation is useful. Its proof is provided in Appendix A.

**Lemma 1.** *A function h is Lipschitz on* R *with Lip*(*h*) = *C* < ∞ *if and only if h is absolutely continuous and (for its essential supremum)* ‖*h*′‖<sub>∞</sub> = *C* < ∞*.*

**Remark 1.** *Note that* 0 ≤ *h*(*x*) ≤ 1*, x* ∈ R*, for any h* ∈ K*. If, for some positive constant C, h* ∈ Lip(*C*)*, then Equation* (5) *yields that* |*h*(*x*)| ≤ *C*|*x*| + |*h*(0)|*. If h*′ *is a Lipschitz function (with Lip*(*h*′) = *C), then h*′′(*x*) *exists for almost all x* ∈ R *and an application of Lemma 1 gives*

$$|h'(x) - h'(0)| = \left| \int\_0^x h''(u) du \right| \le C|x|, \ x \in \mathbb{R}.$$

*Consequently,* |*h*′(*x*)| ≤ *A*|*x*| + *B for some positive A, B (one can take A* = *C, B* = |*h*′(0)|*) and any x* ∈ R*. As h*′(*x*) *is continuous on each interval, it follows that* |*h*(*x*)| ≤ *ax*<sup>2</sup> + *b*|*x*| + *c for some positive a*, *b*, *c and all x* ∈ R *(a* = *C*/2*, b* = |*h*′(0)|*, c* = |*h*(0)|*). Therefore,* |*h*(*x*)| ≤ *A*<sub>0</sub>*x*<sup>2</sup> + *B*<sub>0</sub> *for some positive A*<sub>0</sub>, *B*<sub>0</sub> *and each x* ∈ R*.*

**Lemma 2.** *For any λ* > 0 *and each h* ∈K∪H<sup>1</sup> ∪ H2*, the equation*

$$f'(\mathbf{x}) - \lambda f(\mathbf{x}) = h(\mathbf{x}), \ \mathbf{x} \in \mathbb{R}, \tag{12}$$

*has a solution*

$$f\_h(x) = -e^{\lambda x} \int\_{x}^{\infty} h(u)e^{-\lambda u} du, \ x \in \mathbb{R}, \tag{13}$$

*where f<sub>h</sub>*(0) = −E[*h*(*Z*)]/*λ. If h* ∈ K*, then f<sub>h</sub>*′(*x*) *exists for all x* ∈ R *and* ‖*f<sub>h</sub>*′‖<sub>∞</sub> ≤ 1*. If h* ∈ H<sub>1</sub> ∪ H<sub>2</sub>*, then f<sub>h</sub>*′ *is defined on* R *and* ‖*f<sub>h</sub>*′‖<sub>∞</sub> ≤ ‖*h*′‖<sub>∞</sub>/*λ. For h* ∈ H<sub>2</sub>*, the function f<sub>h</sub>*′′ *is defined on* R *and* ‖*f<sub>h</sub>*′′‖<sub>∞</sub> ≤ min{2‖*h*′‖<sub>∞</sub>, ‖*h*′′‖<sub>∞</sub>/*λ*}*.*

The right-hand side of Equation (13) is well defined for each *x* ∈ R in light of Remark 1. Lemma 4.1 of [33] contains, for *λ* = 1, some of the statements of Lemma 2. We will use the above estimates for any *λ* > 0. Estimates for *h* ∈ H<sub>2</sub> were not considered in [33]. The proof of Lemma 2 is given in Appendix A.
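As a quick numeric sanity check of Lemma 2 (a sketch, not part of the original argument), one can verify for a concrete *h* that Formula (13) solves Equation (12). The assumed example below takes *λ* = 1 and *h*(*x*) = *x*<sup>2</sup>/2, for which integration by parts gives the closed form *f<sub>h</sub>*(*x*) = −(*x*<sup>2</sup>/2 + *x* + 1).

```python
import math

# Hedged sketch: check that Equation (13) with lam = 1 and h(x) = x^2/2
# reproduces the closed form f_h(x) = -(x^2/2 + x + 1), which solves
# the Stein-type Equation (12): f'(x) - f(x) = h(x).

lam = 1.0
h = lambda u: u * u / 2.0

def f_h(x, upper=60.0, n=200000):
    # midpoint-rule approximation of -exp(lam*x) * int_x^upper h(u) e^{-lam*u} du;
    # the tail beyond `upper` is negligible for this h
    du = (upper - x) / n
    s = 0.0
    for i in range(n):
        u = x + (i + 0.5) * du
        s += h(u) * math.exp(-lam * u)
    return -math.exp(lam * x) * s * du

x = 0.7
closed_form = -(x * x / 2.0 + x + 1.0)
print(abs(f_h(x) - closed_form) < 1e-4)   # Equation (13) matches the closed form

# the closed form satisfies f' - f = h exactly: here f'(x) = -(x + 1)
residual = -(x + 1.0) - closed_form - h(x)
print(abs(residual) < 1e-12)
```

The same check can be repeated for any *λ* > 0 after rescaling the integrand.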

The following concept was introduced in [33].

**Definition 1** ([33])**.** *Let X be a non-negative random variable with finite* E[*X*] > 0*. One says that a random variable X<sup>e</sup> has the equilibrium distribution with respect to X if, for any Lipschitz function f* : R → R*,*

$$\mathbb{E}[f(X)] - f(0) = \mathbb{E}[X]\mathbb{E}[f'(X^e)].\tag{14}$$

Note that Definition 1 deals separately with the distributions of *X* and *X<sup>e</sup>*. One says that *X<sup>e</sup>* is the result of the equilibrium transformation applied to *X*. The same terminology is used for the transition from *law*(*X*) to *law*(*X<sup>e</sup>*). For the sake of completeness, we explain in Appendix A (Comments to Definition 1) why one can take the law of *X<sup>e</sup>* having a density with respect to the Lebesgue measure

$$p^{e}(x) = \begin{cases} \frac{1}{\mathbb{E}[X]} \mathbb{P}(X > x), & x \ge 0, \\ 0, & x < 0, \end{cases} \tag{15}$$

to guarantee the validity of Equation (14).
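A minimal numerical sketch of Definition 1 (the concrete law is an assumption for illustration): if *X* ∼ *Exp*(*θ*), then (15) gives *p<sup>e</sup>*(*x*) = *θe*<sup>−*θx*</sup>, i.e., the exponential law is a fixed point of the equilibrium transformation, and identity (14) can be checked by Monte Carlo with the Lipschitz test function *f* = sin.

```python
import random, math

# Sketch under the assumption X ~ Exp(theta), theta = 2: then X^e has the
# same law, and Equation (14), E[f(X)] - f(0) = E[X] * E[f'(X^e)], should hold.

random.seed(1)
theta = 2.0
n = 200000
xs  = [random.expovariate(theta) for _ in range(n)]   # samples of X
xes = [random.expovariate(theta) for _ in range(n)]   # samples of X^e (same law here)

f  = math.sin            # Lipschitz test function, Lip(f) = 1
fp = math.cos            # its derivative

lhs = sum(f(x) for x in xs) / n - f(0.0)              # E[f(X)] - f(0)
rhs = (1.0 / theta) * sum(fp(x) for x in xes) / n     # E[X] * E[f'(X^e)]
print(abs(lhs - rhs) < 0.01)
```

Both sides should be close to *θ*/(*θ*<sup>2</sup> + 1) = 0.4 here, since E[sin *X*] = *θ*/(*θ*<sup>2</sup> + 1) for *X* ∼ *Exp*(*θ*).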

**Remark 2.** *For a non-negative random variable X with finite* E[*X*] > 0*, one can construct a random variable X<sup>e</sup> having density* (15)*. Accordingly, we then have a random vector* (*X*, *X<sup>e</sup>*) *with specified marginal distributions. However, the joint law of X and X<sup>e</sup> is not fixed and can be chosen in an appropriate way. If X*<sub>1</sub>, *X*<sub>2</sub>, ... *is a sequence of independent random variables, we will assume that the sequence* (*X<sub>n</sub>*, *X<sup>e</sup><sub>n</sub>*)<sub>*n*∈N</sub> *consists of independent vectors and that these vectors are independent of all other random variables under consideration which are independent of* (*X<sub>n</sub>*)<sub>*n*∈N</sub>*.*

In the recent paper [22], a generalization of the equilibrium transformation of distributions was proposed, without assuming that the random variable *X* is non-negative.

**Definition 2** ([22])**.** *Let X be a random variable having a distribution function F*(*x*) := P(*X* ≤ *x*)*, x* ∈ R*. Assume the existence of finite* E[*X*] ≠ 0*. An equilibrium distribution function corresponding to X (or to F*(*x*)*) is introduced by way of*

$$F^e(x) := \begin{cases} -\frac{1}{\mathbb{E}[X]} \int\_{-\infty}^{x} F(u) du, & x \le 0,\\ -\frac{\mathbb{E}[X^-]}{\mathbb{E}[X]} + \frac{1}{\mathbb{E}[X]} \int\_0^x (1 - F(u)) du, & x > 0, \end{cases} \tag{16}$$

*where X*<sup>−</sup> := −*X*I{*X* < 0}*. This function can be written as F<sup>e</sup>*(*x*) = ∫<sub>−∞</sub><sup>*x*</sup> *p<sup>e</sup>*(*u*)*du, where*

$$p^{e}(x) = \begin{cases} -\frac{1}{\mathbb{E}[X]} F(x), & x \le 0, \\ \frac{1}{\mathbb{E}[X]} (1 - F(x)), & x > 0, \end{cases} \tag{17}$$

*thus, p<sup>e</sup> is a density (with respect to the Lebesgue measure) of a signed measure Q<sup>e</sup> corresponding to F<sup>e</sup>. In other words, Equation* (17) *gives the Jordan decomposition (see, e.g., Sec. 29 of [44]) of Q<sup>e</sup>.*
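A toy numeric illustration of the signed density (17) (the two-point law below is an assumed example, not from the text): let *X* take the values −1 and 2 with probability 1/2 each, so E[*X*] = 1/2 ≠ 0. Then *p<sup>e</sup>* is negative on part of the line, while the signed measure *Q<sup>e</sup>* still has total mass 1, since ∫*p<sup>e</sup>*(*x*)*dx* = (E[*X*I{*X*>0}] + E[*X*I{*X*<0}])/E[*X*] = 1.

```python
# Sketch of Equation (17) under an assumed two-point law:
# X takes -1 and 2 with probability 1/2 each, so E[X] = 1/2 != 0.
EX = 0.5

def F(x):                 # distribution function of X
    if x < -1:
        return 0.0
    return 0.5 if x < 2 else 1.0

def p_e(x):               # Equation (17)
    return -F(x) / EX if x <= 0 else (1.0 - F(x)) / EX

# midpoint-rule integral of p^e over [-3, 3] (p^e vanishes outside [-1, 2])
n = 300000
total = sum(p_e(-3.0 + (i + 0.5) * 6.0 / n) for i in range(n)) * 6.0 / n
print(abs(total - 1.0) < 1e-3)   # the signed measure Q^e has total mass 1
print(p_e(-0.5) < 0)             # F^e is not monotone: p^e < 0 here
```

Here *p<sup>e</sup>* equals −1 on [−1, 0] and +1 on (0, 2), so *F<sup>e</sup>* decreases and then increases; it is not a distribution function.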

Clearly, for a non-negative random variable, the functions defined in Equations (15) and (16) coincide. For a nonpositive random variable, the function *F<sup>e</sup>* appearing in Equation (16) is a distribution function of a probability measure. In general, when *X* can take both positive and negative values, the function introduced in Equation (16) is not a distribution function. We will call *F<sup>e</sup>* the generalized equilibrium distribution function. Note that |*p<sup>e</sup>*(*x*)| ≤ 1/|E[*X*]|. Thus, *F<sup>e</sup>* is a Lipschitz function and consequently continuous (*F<sup>e</sup>*(*x*) is well defined for each *x* ∈ R since E[*X*] is finite and nonzero). Moreover, *F<sup>e</sup>*, being a Lipschitz function, is absolutely continuous. Each absolutely continuous function has bounded variation. If *G* is a function of bounded variation, then *G* = *G*<sub>1</sub> − *G*<sub>2</sub>, where *G*<sub>1</sub> and *G*<sub>2</sub> are nondecreasing functions (see, e.g., [42], Theorem 12.18). One can employ the canonical choice *G*<sub>1</sub>(*x*) := *Var*<sub>0</sub><sup>*x*</sup>(*G*), where *Var*<sub>*a*</sub><sup>*b*</sup>(*G*) means the variation of *G* on [*a*, *b*], −∞ < *a* ≤ *b* < ∞ (if *a* > *b*, then *Var*<sub>*a*</sub><sup>*b*</sup>(*G*) := −*Var*<sub>*b*</sub><sup>*a*</sup>(*G*)). If *G* is right-continuous (on R), then evidently *G*<sub>1</sub> and *G*<sub>2</sub> are also right-continuous. Thus, for a right-continuous *G* having bounded variation, each nondecreasing function *G<sub>i</sub>* in this representation corresponds to a *σ*-finite measure *Q<sub>i</sub>* on B(R), *i* = 1, 2. More precisely, there exists a unique *σ*-finite measure *Q<sub>i</sub>* on B(R) such that, for each finite interval (*a*, *b*], *Q<sub>i</sub>*((*a*, *b*]) = *G<sub>i</sub>*(*b*) − *G<sub>i</sub>*(*a*), *i* = 1, 2. Recall that one writes, for the Lebesgue–Stieltjes integral with respect to a function *G*,

$$\int\_{\mathbb{R}} f(u) dG(u) := \int\_{\mathbb{R}} f(u) dG\_1(u) - \int\_{\mathbb{R}} f(u) dG\_2(u),\tag{18}$$

whenever the integrals on the right-hand side exist (with values in [−∞, ∞]) and the cases ∞ − ∞ or −∞ + ∞ are excluded. The integral ∫<sub>R</sub> *f*(*u*)*dG<sub>i</sub>*(*u*) means the integration with respect to the measure *Q<sub>i</sub>*, *i* = 1, 2. The signed measure *Q* corresponding to *G* is *Q*<sub>1</sub> − *Q*<sub>2</sub>. Thus, ∫<sub>R</sub> *f*(*u*)*dG*(*u*) means the integration with respect to the signed measure *Q*. Note that if *G* = *U*<sub>1</sub> − *U*<sub>2</sub>, where *U<sub>i</sub>* is right-continuous and nondecreasing (*i* = 1, 2), then

$$\int\_{\mathbb{R}} f(u) dG\_1(u) - \int\_{\mathbb{R}} f(u) dG\_2(u) = \int\_{\mathbb{R}} f(u) dU\_1(u) - \int\_{\mathbb{R}} f(u) dU\_2(u). \tag{19}$$

The left-hand side and the right-hand side of Equation (19) make sense simultaneously and, if so, are equal to each other. Indeed, for any finite interval (*a*, *b*] (*a* ≤ *b*), one has *G*<sub>1</sub>(*b*) − *G*<sub>1</sub>(*a*) − (*G*<sub>2</sub>(*b*) − *G*<sub>2</sub>(*a*)) = *U*<sub>1</sub>(*b*) − *U*<sub>1</sub>(*a*) − (*U*<sub>2</sub>(*b*) − *U*<sub>2</sub>(*a*)). Thus, the signed measures corresponding to *G*<sub>1</sub> − *G*<sub>2</sub> and *U*<sub>1</sub> − *U*<sub>2</sub> coincide on B(R). We mention in passing that one can also employ the Jordan decomposition of a signed measure.
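The decomposition (18) can be illustrated on a toy function of bounded variation (the concrete *G* below is an assumption for illustration): take *G*<sub>1</sub>(*x*) = *x* and *G*<sub>2</sub>(*x*) = I{*x* ≥ 1/2} on [0, 1], so that, for *G* = *G*<sub>1</sub> − *G*<sub>2</sub>, one has ∫ *f dG* = ∫<sub>0</sub><sup>1</sup> *f*(*x*)*dx* − *f*(1/2).

```python
import math

# Toy sketch of Equation (18): G = G1 - G2 with G1(x) = x (Lebesgue measure
# on [0,1]) and G2(x) = 1{x >= 1/2} (a unit point mass at 1/2).
# Then int f dG = int_0^1 f(x) dx - f(1/2).

f = math.cos
n = 100000
riemann = sum(f((i + 0.5) / n) for i in range(n)) / n   # int f dG1, midpoint rule
lhs = riemann - f(0.5)                                   # minus int f dG2 (point mass)
rhs = math.sin(1.0) - math.cos(0.5)                      # closed form for f = cos
print(abs(lhs - rhs) < 1e-6)
```

Replacing (*G*<sub>1</sub>, *G*<sub>2</sub>) by any other pair (*U*<sub>1</sub>, *U*<sub>2</sub>) with the same difference leaves the value unchanged, in line with Equation (19).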

For *F<sup>e</sup>* introduced in Equation (16), the analog of Equation (14) has the form

$$\mathbb{E}[f(X)] - f(0) = \mathbb{E}[X] \int\_{\mathbb{R}} f'(x) dF^{e}(x).\tag{20}$$

Taking into account Equation (17), one can rewrite Equation (20) equivalently as follows

$$\mathbb{E}[f(X)] - f(0) = \int\_{(-\infty,0]} f'(x) (-F(x)) dx + \int\_{(0,\infty)} f'(x) (1 - F(x)) dx.\tag{21}$$

The right-hand side of the latter relation does not depend on the choice of a version of *f*′. Due to Theorem 1(d) of [22], Equation (20) is valid for any Lipschitz function *f*. Evidently, an arbitrary function *f* ∈ H<sub>2</sub> need not be a Lipschitz one and vice versa.

**Lemma 3.** *Let X be a random variable such that* E[*X*<sup>2</sup>] < ∞ *and* E[*X*] ≠ 0*. Then, Equation* (20) *is satisfied for all f* ∈ H<sub>2</sub>*.*

The proof is provided in Appendix A.

#### **3. Limit Theorem for Geometric Sums of Independent Random Variables**

Consider *N<sub>p</sub>* ∼ *Geom*(*p*); see Equation (2). In other words, *N<sub>p</sub>* has a geometric distribution with parameter *p*. Let *X*<sub>1</sub>, *X*<sub>2</sub>, ... be a sequence of independent random variables such that E[*X<sub>k</sub>*] = *μ*, where *μ* ∈ R, *μ* ≠ 0, *k* ∈ N. Assume that *N<sub>p</sub>* and (*X<sub>n</sub>*)<sub>*n*∈N</sub> are independent. Consider the normalized geometric sum

$$\mathcal{W}\_p := \frac{p}{\mu(1-p)} \sum\_{k=1}^{N\_p} X\_{k}, \tag{22}$$

introduced in Equation (3). Since *N<sub>p</sub>* can take the value zero, set, as usual, ∑<sub>*k*=1</sub><sup>0</sup> *X<sub>k</sub>* := 0. One can see that *W<sub>p</sub>* can be viewed as the random sum *S<sub>p</sub>* := ∑<sub>*k*=1</sub><sup>*N<sub>p</sub>*</sup> *X<sub>k</sub>* normalized by *μ*E[*N<sub>p</sub>*].

**Lemma 4.** *Let X*<sub>1</sub>, *X*<sub>2</sub>, ... *and N<sub>p</sub>, where p* ∈ (0, 1)*, be the random variables described above in this section. Then, the following relations hold:*

$$\mathbb{E}[\mathcal{W}\_{p}] = 1, \quad \mathbb{E}|\mathcal{W}\_{p}| \le \frac{\sup\_{k \in \mathbb{N}} \mathbb{E}|X\_k|}{|\mu|},$$

$$\mathbb{E}[\mathcal{W}\_{p}^2] = \frac{p}{\mu^2(1-p)} \mathbb{E}[X\_{N\_{p}+1}^2] + 2. \tag{23}$$

**Proof.** Recall that

$$\mathbb{E}[N\_p] = \sum\_{k=1}^{\infty} kp(1-p)^{k} = \frac{1-p}{p}, \tag{24}$$

$$\mathbb{E}[N\_p^2] = \sum\_{k=1}^{\infty} k^2 p (1-p)^{k} = \frac{(1-p)(2-p)}{p^2}.\tag{25}$$

Thus, one has

$$\mathbb{E}[\mathcal{W}\_p] = \frac{p}{\mu(1-p)} \sum\_{k=1}^{\infty} k \mu \mathbb{P}(N\_p = k) = \frac{p}{1-p} \mathbb{E}[N\_p] = 1.$$

Clearly, E|*Xk*| < ∞ since E[*Xk*] is finite (*k* ∈ N). Therefore

$$\mathbb{E}|\mathcal{W}\_{p}| \le \frac{p}{|\mu|(1-p)} \sum\_{k=1}^{\infty} k \mathbb{E}|X\_k|\mathbb{P}(N\_{p}=k) \le \frac{\sup\_{k \in \mathbb{N}} \mathbb{E}|X\_k|}{|\mu|}.$$

Set *ν<sub>k</sub>* := E[*X<sub>k</sub>*<sup>2</sup>], *k* ∈ N. One has

$$\mathbb{E}[S\_p^2] = \sum\_{k=1}^{\infty} \mathbb{P}(N\_p = k) \mathbb{E}\left(\sum\_{i=1}^k X\_i\right)^2 = \sum\_{k=1}^{\infty} p(1-p)^k \left(\sum\_{i=1}^k \nu\_i + k(k-1)\mu^2\right). \tag{26}$$

According to Equations (24) and (25) one derives the formula

$$\sum\_{k=1}^{\infty} p(1-p)^k \left( k(k-1)\mu^2 \right) = \mu^2 \left( \frac{(1-p)(2-p)}{p^2} - \frac{1-p}{p} \right) = 2 \left( \frac{\mu(1-p)}{p} \right)^2. \tag{27}$$

Convergence of the series ∑<sub>*k*=1</sub><sup>∞</sup> *p*(1 − *p*)<sup>*k*</sup> ∑<sub>*i*=1</sub><sup>*k*</sup> *ν<sub>i</sub>*, which has non-negative terms, holds simultaneously with the validity of the inequality E[*W<sub>p</sub>*<sup>2</sup>] < ∞. Changing the order of summation, we obtain

$$\sum\_{k=1}^{\infty} p(1-p)^k \sum\_{i=1}^k \nu\_i = \sum\_{i=1}^{\infty} (1-p)^i \nu\_i = \left(\frac{1-p}{p}\right) \mathbb{E}[X\_{N\_p+1}^2].$$

The latter formula and Equations (26) and (27) yield

$$\mathbb{E}[\mathcal{W}\_p^2] = \left(\frac{p}{\mu(1-p)}\right)^2 \mathbb{E}[S\_p^2] = \left(\frac{p}{\mu(1-p)}\right)^2 \left(\left(\frac{1-p}{p}\right) \mathbb{E}[X\_{N\_p+1}^2] + 2\left(\frac{\mu(1-p)}{p}\right)^2\right)$$

$$= \frac{p}{\mu^2(1-p)} \mathbb{E}[X\_{N\_p+1}^2] + 2.$$

Equation (23) is established.
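Lemma 4 can be checked by simulation. The Monte Carlo sketch below assumes i.i.d. *Exp*(1) summands (so *μ* = 1 and E[*X*<sup>2</sup><sub>*N<sub>p</sub>*+1</sub>] = 2) and *p* = 0.2; these concrete choices are illustrative assumptions, not part of the lemma.

```python
import random

# Monte Carlo check of Lemma 4 under assumed i.i.d. Exp(1) summands:
# E[W_p] = 1 and, by Equation (23), E[W_p^2] = 2*p/(1-p) + 2.

random.seed(0)
p = 0.2
n_sim = 200000
c = p / (1.0 - p)            # normalization p / (mu * (1 - p)) with mu = 1

m1 = m2 = 0.0
for _ in range(n_sim):
    # N_p ~ Geom(p) with P(N_p = k) = p * (1 - p)^k, k = 0, 1, ...
    N = 0
    while random.random() > p:
        N += 1
    w = c * sum(random.expovariate(1.0) for _ in range(N))
    m1 += w
    m2 += w * w

m1 /= n_sim
m2 /= n_sim
print(abs(m1 - 1.0) < 0.02)                      # E[W_p] = 1
print(abs(m2 - (2 * p / (1 - p) + 2)) < 0.1)     # Equation (23)
```

With these parameters, Equation (23) predicts E[*W<sub>p</sub>*<sup>2</sup>] = 2.5.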

The proof of Theorem 3.1 in [45] shows, for non-negative i.i.d. random variables *X*<sub>1</sub>, *X*<sub>2</sub>, ... (when *μ* = 1; see Formula (3.15) in [45]), that the equilibrium transformation of the distribution of *W<sub>p</sub>* has the following form:

$$\mathcal{W}\_p^{e} = \frac{p}{\mu(1-p)} \left( \sum\_{k=1}^{N\_p} X\_k + X\_{N\_p+1}^{e} \right) = \mathcal{W}\_p + \frac{p}{\mu(1-p)} X\_{N\_p+1}^{e}, \tag{28}$$

where *X<sup>e</sup><sub>N<sub>p</sub>+1</sub>* means that we construct *X<sup>e</sup>*<sub>1</sub>, *X<sup>e</sup>*<sub>2</sub>, ... and then take the random index *N<sub>p</sub>* + 1. In other words,

$$X\_{N\_p+1}^{e} = \sum\_{n=0}^{\infty} X\_{n+1}^{e} \mathbb{I}\{ N\_p = n \}.$$

It was explained in Section 2 that the generalized equilibrium distribution function *F<sup>e</sup><sub>W<sub>p</sub></sub>*(*x*) (see Definition 2) need not be a distribution function when the summands *X*<sub>1</sub>, *X*<sub>2</sub>, ... can take values of different signs. However, employing this function, one can establish the following result.

**Theorem 1.** *Let X*<sub>1</sub>, *X*<sub>2</sub>, ... *be a sequence of independent random variables having finite* E[*X<sub>k</sub>*] = *μ, where μ* ≠ 0*, k* ∈ N*. Assume that N<sub>p</sub> and* (*X<sub>n</sub>*)<sub>*n*∈N</sub> *are independent, where N<sub>p</sub>* ∼ *Geom*(*p*)*,* 0 < *p* < 1*. If Z* ∼ *Exp*(1)*, then*

$$d\_{\mathcal{H}\_2}(\mathcal{W}\_{p}, Z) = \frac{\mathbb{E}[X\_{N\_p+1}^2]}{2\mu^2} \left(\frac{p}{1-p}\right), \tag{29}$$

*where Wp was introduced in Equation* (22)*.*

**Proof.** If E[*W<sub>p</sub>*<sup>2</sup>] = ∞, then *d*<sub>H<sub>2</sub></sub>(*W<sub>p</sub>*, *Z*) = ∞ since, for the function *h*(*x*) = *x*<sup>2</sup>/2, *x* ∈ R, belonging to H<sub>2</sub>, one has E[*h*(*W<sub>p</sub>*)] = ∞, whereas E[*h*(*Z*)] < ∞. According to Equation (23), E[*W<sub>p</sub>*<sup>2</sup>] and E[*X*<sup>2</sup><sub>*N<sub>p</sub>*+1</sub>] are both finite or both infinite. Consequently, Equation (29) is true when E[*W<sub>p</sub>*<sup>2</sup>] = ∞.

Let us turn to the case E[*W<sub>p</sub>*<sup>2</sup>] < ∞. First, we obtain an upper bound for *d*<sub>H<sub>2</sub></sub>(*W<sub>p</sub>*, *Z*). Take *h* ∈ H<sub>2</sub>. Applying Lemmas 1 and 2 and Remark 1, one can write, due to Stein's Equation (10), that

$$|\mathbb{E}[h(\mathcal{W}\_p)] - \mathbb{E}[h(Z)]| = |\mathbb{E}[f\_h'(\mathcal{W}\_p)] - \mathbb{E}[f\_h(\mathcal{W}\_p)] + f\_h(0)|.\tag{30}$$

Using the generalized equilibrium transformation (20), one obtains:

$$\left| \mathbb{E} \left[ f\_h'(\mathcal{W}\_p) \right] - \mathbb{E}[f\_h(\mathcal{W}\_p)] + f\_h(0) \right| = \left| \int\_{\mathbb{R}} f\_h'(x) \, dF\_{\mathcal{W}\_p}(x) - \int\_{\mathbb{R}} f\_h'(x) \, dF\_{\mathcal{W}\_p}^{e}(x) \right|. \tag{31}$$

Due to Lemma 3, this is true for *h* ∈ H<sub>2</sub> because *f<sub>h</sub>* ∈ H<sub>2</sub> according to Lemma 2 (with *λ* = 1). Next, we employ the relation

$$\int\_{\mathbb{R}} f\_h'(x) \, dF\_{\mathcal{W}\_p}(x) - \int\_{\mathbb{R}} f\_h'(x) \, dF\_{\mathcal{W}\_p}^{e}(x) = \int\_{\mathbb{R}} f\_h'(x) \, d(F\_{\mathcal{W}\_p} - F\_{\mathcal{W}\_p}^{e})(x).\tag{32}$$

Evidently, one can write ∫<sub>R</sub> |*f<sub>h</sub>*′(*x*)| *dF<sub>W<sub>p</sub></sub>*(*x*) < ∞. The notation *dF<sup>e</sup><sub>W<sub>p</sub></sub>*(*x*) in the integral refers to the Lebesgue–Stieltjes integral with respect to the function *F<sup>e</sup><sub>W<sub>p</sub></sub>*(*x*) of bounded variation. In fact, the integral with integrator *dF<sup>e</sup><sub>W<sub>p</sub></sub>*(*x*) means that the integration employs the signed measure *Q*<sup>+</sup><sub>*p*</sub> − *Q*<sup>−</sup><sub>*p*</sub>, where *Q*<sup>+</sup><sub>*p*</sub> and *Q*<sup>−</sup><sub>*p*</sub> have the following densities with respect to the Lebesgue measure:

$$q\_p^+(x) := (1 - F\_{\mathcal{W}\_p}(x))\, \mathbb{I}\{x \in (0, \infty)\}, \quad q\_p^-(x) := F\_{\mathcal{W}\_p}(x)\, \mathbb{I}\{x \in (-\infty, 0]\}, \quad x \in \mathbb{R};$$

here we took into account that E[*W<sub>p</sub>*] = 1 according to Lemma 4. Then, for any −∞ < *a* < *b* < ∞, one ascertains that the variation of *F<sup>e</sup><sub>W<sub>p</sub></sub>* on [*a*, *b*] is given by the formula *Var*<sub>*a*</sub><sup>*b*</sup>(*F<sup>e</sup><sub>W<sub>p</sub></sub>*) = ∫<sub>*a*</sub><sup>*b*</sup> |*p<sup>e</sup><sub>W<sub>p</sub></sub>*(*u*)|*du* (see, e.g., Theorem 4.4.7 of [46]). Note that, for any −∞ < *a* < *b* < ∞,

$$\int\_{a}^{b} |p\_{\mathcal{W}\_{p}}^{e}(u)| du \le \mathbb{E}|\mathcal{W}\_{p}| < \infty$$

according to Lemma 4. Thus, *F<sup>e</sup><sub>W<sub>p</sub></sub>* is a function of bounded variation. On the right-hand side of Equation (32), we take the Lebesgue–Stieltjes integral with respect to the function of bounded variation (*F<sub>W<sub>p</sub></sub>* − *F<sup>e</sup><sub>W<sub>p</sub></sub>*)(*x*), *x* ∈ R. Let *F<sup>e</sup><sub>W<sub>p</sub></sub>*(*x*) = *F<sup>e</sup><sub>p,1</sub>*(*x*) − *F<sup>e</sup><sub>p,2</sub>*(*x*), *x* ∈ R, where the *F<sup>e</sup><sub>p,i</sub>* are nondecreasing right-continuous functions (even continuous, since *F<sup>e</sup><sub>W<sub>p</sub></sub>* is continuous), *i* = 1, 2. Thus,

$$F\_{\mathcal{W}\_p}(x) - F\_{\mathcal{W}\_p}^{e}(x) = (F\_{\mathcal{W}\_p}(x) + F\_{p,2}^{e}(x)) - F\_{p,1}^{e}(x), \ x \in \mathbb{R}.$$

With the help of Equations (18) and (19) one makes sure that, for each *n* ∈ N,

$$\begin{split} \int\_{(-n,n]} f\_h'(x) \, d(F\_{\mathcal{W}\_p} - F\_{\mathcal{W}\_p}^{e})(x) &= \int\_{(-n,n]} f\_h'(x) \, d(F\_{\mathcal{W}\_p}(x) + F\_{p,2}^{e}(x)) - \int\_{(-n,n]} f\_h'(x) \, dF\_{p,1}^{e}(x) \\ &= \int\_{(-n,n]} f\_h'(x) \, dF\_{\mathcal{W}\_p}(x) + \int\_{(-n,n]} f\_h'(x) \, dF\_{p,2}^{e}(x) - \int\_{(-n,n]} f\_h'(x) \, dF\_{p,1}^{e}(x) \\ &= \int\_{(-n,n]} f\_h'(x) \, dF\_{\mathcal{W}\_p}(x) - \int\_{(-n,n]} f\_h'(x) \, d\left(F\_{p,1}^{e}(x) - F\_{p,2}^{e}(x)\right) \\ &= \int\_{(-n,n]} f\_h'(x) \, dF\_{\mathcal{W}\_p}(x) - \int\_{(-n,n]} f\_h'(x) \, dF\_{\mathcal{W}\_p}^{e}(x). \end{split}$$

All the integrals in the latter formulas are finite. According to Lemma 2 and Remark 1, one can write |*f<sub>h</sub>*′(*x*)| ≤ *A*<sub>0</sub>|*x*| + *B*<sub>0</sub>, where *A*<sub>0</sub>, *B*<sub>0</sub> are positive constants. Thus, the Lebesgue theorem on dominated convergence ensures that

$$\lim\_{n \to \infty} \int\_{(-n,n]} f'\_h(\mathfrak{x}) \, dF\_{W\_p}(\mathfrak{x}) = \int\_{\mathbb{R}} f'\_h(\mathfrak{x}) \, dF\_{W\_p}(\mathfrak{x}),$$

where the latter integral is finite. Indeed,

$$\int\_{\mathbb{R}} (A\_0|\mathbf{x}| + B\_0) \, dF\_{\mathcal{W}\_p}(\mathbf{x}) = A\_0 \mathbb{E}|\mathcal{W}\_p| + B\_0 < \infty \tag{33}$$

according to Lemma 4. By the same lemma, one has E[*W<sub>p</sub>*] = 1. Therefore, on account of Equation (17), the following relation holds:

$$\int\_{(-n,n]} f\_h'(x) \, dF\_{\mathcal{W}\_p}^{e}(x) = \int\_{(-n,0]} f\_h'(x)(-F\_{\mathcal{W}\_p}(x)) dx + \int\_{(0,n]} f\_h'(x)(1 - F\_{\mathcal{W}\_p}(x)) dx,$$

whereas Corollary 2, Sec. 6, Ch. II of [47] and Lemma 4 entail that

$$\begin{split} \int\_{(-\infty,0]} (A\_0|\mathbf{x}| + B\_0) F\_{\mathcal{W}\_p}(\mathbf{x}) d\mathbf{x} + \int\_{(0,\infty)} (A\_0|\mathbf{x}| + B\_0)(1 - F\_{\mathcal{W}\_p}(\mathbf{x})) d\mathbf{x} \\ \leq A\_0 \mathbb{E}[\mathcal{W}\_p^2] + B\_0 \mathbb{E}|\mathcal{W}\_p| < \infty. \end{split} \tag{34}$$

The Lebesgue theorem on dominated convergence for *σ*-finite measures and Equation (34) yield

$$\lim\_{n \to \infty} \int\_{(-n,n]} f'\_h(\mathfrak{x}) dF^{\mathfrak{c}}\_{W\_p}(\mathfrak{x}) = \int\_{\mathbb{R}} f'\_h(\mathfrak{x}) dF^{\mathfrak{c}}\_{W\_p}(\mathfrak{x}),$$

where the latter integral is finite. Now, we show that

$$\lim\_{n \to \infty} \int\_{(-n,n]} f\_h'(\mathbf{x}) \, d(F\_{W\_p} - F\_{W\_p}^{\varepsilon})(\mathbf{x}) = \int\_{\mathbb{R}} f\_h'(\mathbf{x}) \, d(F\_{W\_p} - F\_{W\_p}^{\varepsilon})(\mathbf{x}).\tag{35}$$

Note that *f<sub>h</sub>*′(*x*)I<sub>(−*n*,*n*]</sub>(*x*) → *f<sub>h</sub>*′(*x*) at each *x* ∈ R as *n* → ∞. To apply this version of the Lebesgue theorem to integrals with respect to a signed measure, it suffices (see, e.g., [48], p. 74) to verify that

$$\int\_{\mathbb{R}} |f'\_{h}(\mathfrak{x})| |d(F\_{\mathcal{W}\_{p}} - F^{c}\_{\mathcal{W}\_{p}})(\mathfrak{x})| \, < \infty,$$

where |*dG*| means that one evaluates the integral with respect to the measure corresponding to the total variation of the measure determined by a right-continuous function *G* of bounded variation. The extension of the Lebesgue theorem on dominated convergence to signed measures is an immediate corollary of the Jordan decomposition mentioned above. Using this decomposition, one obtains the inequality

$$\int\_{\mathbb{R}} |f\_h'(\mathbf{x})| |d(F\_{\mathcal{W}\_p} - F\_{\mathcal{W}\_p}^{\varepsilon})(\mathbf{x})| \leq \int\_{\mathbb{R}} |f\_h'(\mathbf{x})| |dF\_{\mathcal{W}\_p}(\mathbf{x})| + \int\_{\mathbb{R}} |f\_h'(\mathbf{x})| |dF\_{\mathcal{W}\_p}^{\varepsilon}(\mathbf{x})|.$$

Due to Remark 1, one has |*f<sub>h</sub>*′(*x*)| ≤ *A*<sub>0</sub>|*x*| + *B*<sub>0</sub> for all *x* ∈ R and some positive constants *A*<sub>0</sub>, *B*<sub>0</sub>. Then, Equations (33) and (34) yield (as *F<sub>W<sub>p</sub></sub>* generates a probability measure)

$$\int\_{\mathbb{R}} (A\_0|x| + B\_0) dF\_{\mathcal{W}\_p}(x) + \int\_{\mathbb{R}} (A\_0|x| + B\_0) |dF\_{\mathcal{W}\_p}^{e}(x)| < \infty.$$

The functions *f<sub>h</sub>*′ and *F<sub>W<sub>p</sub></sub>* − *F<sup>e</sup><sub>W<sub>p</sub></sub>* are right-continuous and have bounded variation. Then, each of them can be represented as the difference of right-continuous nondecreasing functions, and using, for any *n* ∈ N, the integration by parts formula (see, e.g., Theorem 11, Sec. 6, Ch. 2 of [47]), one has

$$\int\_{(-n,n]} f\_h'(x) \, d(F\_{\mathcal{W}\_p} - F\_{\mathcal{W}\_p}^{e})(x) = f\_h'(x)(F\_{\mathcal{W}\_p}(x) - F\_{\mathcal{W}\_p}^{e}(x))\Big|\_{-n}^{n} - \int\_{(-n,n]} (F\_{\mathcal{W}\_p}(x) - F\_{\mathcal{W}\_p}^{e}(x)) \, df\_h'(x).$$

Since the integral on the right-hand side of Equation (35) is finite, it holds that

$$f\_h'(\mathbf{x})(F\_{\mathcal{W}\_p}(\mathbf{x}) - F\_{\mathcal{W}\_p}^\varepsilon(\mathbf{x})) \to 0, \ \mathbf{x} \to -\infty \text{ or } \ \mathbf{x} \to \infty \tag{36}$$

(the proof is similar to that of Corollary 2, Sec. 6, Ch. 2 in [47]). Then,

$$\int\_{\mathbb{R}} f\_h'(\mathbf{x}) \, d(F\_{\mathcal{W}\_p} - F\_{\mathcal{W}\_p}^{\varepsilon})(\mathbf{x}) = - \lim\_{n \to \infty} \int\_{(-n,n]} (F\_{\mathcal{W}\_p}(\mathbf{x}) - F\_{\mathcal{W}\_p}^{\varepsilon}(\mathbf{x})) df\_h'(\mathbf{x}).$$

The function *f <sup>h</sup>* is absolutely continuous according to Lemma 2. Hence (see also Equations (36) and (A12) in Appendix A) we get

$$\begin{split} \left| \int\_{\mathbb{R}} f\_{h}'(x) \, d(F\_{\mathcal{W}\_{p}}(x) - F\_{\mathcal{W}\_{p}}^{e}(x)) \right| &= \lim\_{n \to \infty} \left| \int\_{(-n,n]} (F\_{\mathcal{W}\_{p}}(x) - F\_{\mathcal{W}\_{p}}^{e}(x)) f\_{h}''(x) \, dx \right| \\ &\leq \| f\_{h}'' \|\_{\infty} \int\_{\mathbb{R}} \left| F\_{\mathcal{W}\_{p}}(x) - F\_{\mathcal{W}\_{p}}^{e}(x) \right| dx \leq \int\_{\mathbb{R}} \left| F\_{\mathcal{W}\_{p}}(x) - F\_{\mathcal{W}\_{p}}^{e}(x) \right| dx, \end{split} \tag{37}$$

because ‖*f<sub>h</sub>*′′‖<sub>∞</sub> ≤ ‖*h*′′‖<sub>∞</sub> ≤ 1 due to Lemmas 1 and 2. Using the homogeneity of the Kantorovich metric for signed measures, which is derived from Formula (20) of [22] (see Lemma 1(a) there), and applying Lemma 3 of that paper, we can write

$$\begin{split} \int\_{\mathbb{R}} \left| F\_{\mathcal{W}\_{p}}(x) - F\_{\mathcal{W}\_{p}}^{e}(x) \right| dx &= \frac{p}{|\mu|(1-p)} \int\_{\mathbb{R}} \left| F\_{S\_{N\_{p}}}(x) - F\_{S\_{N\_{p}}}^{e}(x) \right| dx \\ &\leq \frac{\mathbb{E}[X\_{N\_{p}+1}^{2}]}{2\mu^{2}} \left( \frac{p}{1-p} \right). \end{split} \tag{38}$$

Relations (30)–(32), (37), and (38) and Lemmas 1 and 2 guarantee that *d*<sub>H<sub>2</sub></sub>(*W<sub>p</sub>*, *Z*) does not exceed the right-hand side of Equation (29).

Now, we turn to the lower bound for *d*<sub>H<sub>2</sub></sub>(*W<sub>p</sub>*, *Z*). Choose *h*(*x*) = *x*<sup>2</sup>/2 as the test function. Since *h* ∈ H<sub>2</sub>, we can write

$$d\_{\mathcal{H}\_2}(\mathcal{W}\_p, Z) \ge \left| \mathbb{E}[h(\mathcal{W}\_p)] - \mathbb{E}[h(Z)] \right| = \frac{1}{2} \left| \mathbb{E}[\mathcal{W}\_p^2] - \mathbb{E}[Z^2] \right|. \tag{39}$$

For a random variable *Z* following the exponential law *Exp*(1), one has E[*Z*2] = 2. Formula (23) of Lemma 4 yields

$$d\_{\mathcal{H}\_2}(\mathcal{W}\_{p}, Z) \ge \frac{\mathbb{E}[X\_{N\_p+1}^2]}{2\mu^2} \left(\frac{p}{1-p}\right).$$

Taking into account formula (38), we come to the desired statement. The proof is complete.

**Remark 3.** *Evidently,*

$$\mathbb{E}[X\_{N\_p+1}^2] = \sum\_{n=0}^{\infty} \mathbb{E}[X\_{n+1}^2] p(1-p)^n.$$

*Thus, one obtains*

$$\mathbb{E}[X\_{N\_p+1}^2] \le \sup\_{n \in \mathbb{N}} \mathbb{E}[X\_n^2],$$

*and the latter inequality becomes an equality when* E[*X<sub>n</sub>*<sup>2</sup>] = E[*X*<sub>1</sub><sup>2</sup>] *for all n* ∈ N*. Therefore, the statement of Theorem 1 can be written as follows:*

$$d\_{\mathcal{H}\_2}(\mathcal{W}\_{p}, Z) \le \frac{\sup\_{n \in \mathbb{N}} \mathbb{E}[X\_n^2]}{2\mu^2} \left(\frac{p}{1-p}\right),$$

*and this bound becomes an equality when* E[*X<sub>n</sub>*<sup>2</sup>] = E[*X*<sub>1</sub><sup>2</sup>] *for all n* ∈ N*.*
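The equality in Remark 3 admits a direct arithmetic check: with the test function *h*(*x*) = *x*<sup>2</sup>/2, Equation (23) gives |E[*h*(*W<sub>p</sub>*)] − E[*h*(*Z*)]| = (E[*W<sub>p</sub>*<sup>2</sup>] − 2)/2, which coincides with the right-hand side of (29). A short sketch, under the assumption of i.i.d. summands with *μ* = 1 and E[*X<sub>k</sub>*<sup>2</sup>] = 2 (e.g., *Exp*(1) summands):

```python
# Sketch of the sharpness of the rate in Theorem 1 via h(x) = x^2/2,
# assuming mu = 1 and E[X_k^2] = 2 for all k (e.g., Exp(1) summands).
mu, second_moment = 1.0, 2.0

for p in (0.5, 0.1, 0.01):
    rate = second_moment / (2.0 * mu * mu) * p / (1.0 - p)     # RHS of (29)
    ew2 = p / (mu * mu * (1.0 - p)) * second_moment + 2.0      # Equation (23)
    diff = abs(ew2 - 2.0) / 2.0                                 # |E h(W_p) - E h(Z)|
    print(p, abs(diff - rate) < 1e-12)
```

The two quantities agree for every *p* ∈ (0, 1), confirming that the bound cannot be improved in this case.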

**Remark 4.** *In [22], the authors proved the following inequality*

$$d\_{\mathcal{H}\_2}(\mathcal{W}\_{p}, Z) \le \frac{3\mathbb{E}[X\_{N\_p+1}^2]}{2\mu^2} \left(\frac{p}{1-p}\right).$$

*We established the sharp estimate, with the factor* 3/2 *replaced by* 1/2*, by employing Equation* (20) *for a class of functions comprising the solutions of the Stein equation for h* ∈ H<sub>2</sub>*. The estimate with factor* 1/2 *was also obtained in the recent paper [49], but for i.i.d. summands, and lower bounds were not provided there. In our Theorem 1, the summands have the same expectations but need not have the same distribution.*

**Remark 5.** *If the summands of W<sub>p</sub> are non-negative, we consider W<sup>e</sup><sub>p</sub> appearing in Equation* (28)*. Applying Theorem 1(i) of [22] to relation* (29)*, one obtains*

$$d\_{\mathcal{H}\_1}(\mathcal{W}\_{p}^{e}, Z) = \frac{\mathbb{E}[X\_{N\_p+1}^2]}{2\mu^2} \frac{p}{1-p}.$$

*For i* ∈ N*, consider a random variable X<sub>i</sub> having distribution Exp*(1/*μ*)*. Then X<sup>e</sup><sub>i</sub>* ∼ *Exp*(1/*μ*) *and, consequently, X<sup>e</sup><sub>N<sub>p</sub>+1</sub>* ∼ *Exp*(1/*μ*)*. We can choose X<sup>e</sup><sub>i</sub>, i* ∈ N*, according to Remark 2. Then, the distribution of W<sup>e</sup><sub>p</sub> will be the same if we change X<sup>e</sup><sub>N<sub>p</sub>+1</sub> to X<sub>N<sub>p</sub>+1</sub> in Equation* (28)*. In this way, W<sup>e</sup><sub>p</sub> is a normalized sum of a random number of independent random variables. Using the homogeneity of the Kantorovich metric, one has*

$$d\_{\mathcal{H}\_1}\left(\frac{p}{\mu}\sum\_{k=1}^{N\_p+1}X\_{k}, (1-p)Z\right) = (1-p)\, d\_{\mathcal{H}\_1}\left(\frac{p}{\mu(1-p)}\sum\_{k=1}^{N\_p+1}X\_{k}, Z\right) = \frac{\mathbb{E}[X\_{N\_p+1}^2]}{2\mu^2}p.\tag{40}$$

*Therefore, for an arbitrary sequence* (*X<sub>k</sub>*)<sub>*k*∈N</sub> *satisfying the conditions of Theorem 1, the upper bound for the left-hand side of Equation* (40) *is not less than the right-hand side of Equation* (40)*.*

#### **4. Limit Theorem for Geometric Sums of Exchangeable Random Variables**

Now, we consider exchangeable random variables *X*<sub>1</sub>, *X*<sub>2</sub>, ... satisfying the dependence condition proposed in [35]. Namely, assume that, for all *n* ∈ N, *t<sub>j</sub>* ∈ R (*j* = 1, ..., *n*), and some *ρ* ∈ [0, 1],

$$\mathbb{E}\left[e^{i(t\_1X\_1+\dots+t\_nX\_n)}\right] = \rho\, \mathbb{E}\left[e^{iX\_1(t\_1+\dots+t\_n)}\right] + (1-\rho)\prod\_{j=1}^n \mathbb{E}\left[e^{it\_jX\_j}\right],\tag{41}$$

where *i*<sup>2</sup> = −1. The cases *ρ* = 0 and *ρ* = 1 correspond, respectively, to independent random variables and to those possessing the property of comonotonicity. The latter means that, for *ρ* = 1, the joint behavior of *X*<sub>1</sub>, ..., *X<sub>n</sub>* is strongly correlated and coincides with that of the vector (*X*<sub>1</sub>, ..., *X*<sub>1</sub>).

**Theorem 2.** *Let X*<sub>1</sub>, *X*<sub>2</sub>, ... *be exchangeable random variables with* E[*X*<sub>1</sub>] = *μ, μ* ≠ 0*, satisfying condition* (41) *for some ρ* ∈ (0, 1)*. Suppose that* (*X<sub>n</sub>*)<sub>*n*∈N</sub> *and N<sub>p</sub> are independent, where N<sub>p</sub>* ∼ *Geom*(*p*)*, p* ∈ (0, 1)*. In contrast to the Rényi theorem, one has*

$$\mathcal{W}\_{p} \stackrel{\mathcal{D}}{\rightarrow} Y, \ p \rightarrow 0+,$$

*where the law of Y is the following mixture*

$$\mathbb{P}\_Y = \rho \mathbb{P}\_{VX\_1/\mu} + (1 - \rho)\mathbb{P}\_{Z}, \tag{42}$$

*where the random variables X*<sub>1</sub> *and V are independent, V* ∼ *Exp*(1)*, and Z* ∼ *Exp*(1)*.*

**Proof.** Let *<sup>X</sup>*1, *<sup>X</sup>*2, ... be independent copies of *<sup>X</sup>*1, *<sup>X</sup>*2, ..., respectively. Suppose that *<sup>X</sup>*1, *<sup>X</sup>*2, ... are independent with *Np*. Set *<sup>S</sup>*<sup>0</sup> :<sup>=</sup> 0, *<sup>S</sup>*<sup>0</sup> :<sup>=</sup> 0, *<sup>S</sup><sup>n</sup>* :<sup>=</sup> *<sup>X</sup>*<sup>1</sup> <sup>+</sup> ... <sup>+</sup> *<sup>X</sup>n*, *<sup>n</sup>* <sup>∈</sup> <sup>N</sup>. Denote the characteristic function of a random variable *ξ* by *f<sup>ξ</sup>* (*t*), *t* ∈ R. For each *t* ∈ R, using Equation (41), one has

$$\begin{split} f\_{S\_{N\_p}}(t) &= \sum\_{n=0}^{\infty} \mathbb{E}\left[e^{itS\_n}\right] \mathbb{P}(N\_{p} = n) \\ &= \mathbb{P}(N\_{p} = 0) + \sum\_{n=1}^{\infty} \left(\rho\, \mathbb{E}\left[e^{iX\_1tn}\right] + (1-\rho)\prod\_{j=1}^{n} \mathbb{E}\left[e^{itX\_j}\right]\right) \mathbb{P}(N\_{p} = n) \\ &= p + \sum\_{n=0}^{\infty} \left(\rho\, \mathbb{E}\left[e^{iX\_1tn}\right] + (1-\rho)\prod\_{j=1}^{n} \mathbb{E}\left[e^{it\bar{X}\_j}\right]\right) \mathbb{P}(N\_{p} = n) - \rho p - (1-\rho)p \\ &= \rho f\_{X\_1N\_p}(t) + (1-\rho)\sum\_{n=0}^{\infty} f\_{\bar{S}\_n}(t)\mathbb{P}(N\_p = n) = \rho f\_{X\_1N\_p}(t) + (1-\rho)f\_{\bar{S}\_{N\_p}}(t). \end{split}$$

For each *t* ∈ R, one has

$$f\_{\mathcal{W}\_p}(t) = \rho f\_{\frac{p}{\mu(1-p)}X\_1N\_p}(t) + (1-\rho)f\_{\bar{\mathcal{W}}\_p}(t),\tag{43}$$

where *W̄<sub>p</sub>* = (*p*/(*μ*(1 − *p*))) ∑<sub>*j*=1</sub><sup>*N<sub>p</sub>*</sup> *X̄<sub>j</sub>*.

According to the classical Rényi theorem, *W̄<sub>p</sub>* <sup>D</sup>→ *Z* as *p* → 0+, where *Z* ∼ *Exp*(1). Note that *T<sub>p</sub>* := (*p*/(1 − *p*))*N<sub>p</sub>* <sup>D</sup>→ *V* as *p* → 0+, where *V* ∼ *Exp*(1). In fact, one can apply Theorem 1 with *X<sub>j</sub>* ≡ 1, *j* ∈ N, to check this. For each *t* ∈ R, taking into account that *T<sub>p</sub>* and *X*<sub>1</sub> are independent and applying the Lebesgue theorem on dominated convergence, we see that

$$\mathbb{E}\left[e^{itT_pX_1}\right] = \mathbb{E}\left[\mathbb{E}\left[e^{itT_pX_1}\mid X_1\right]\right] = \int_{\mathbb{R}}\mathbb{E}\left[e^{itT_p x}\right]dF_{X_1}(x) \to \int_{\mathbb{R}}\mathbb{E}\left[e^{itVx}\right]dF_{X_1}(x) = \mathbb{E}\left[e^{itVX_1}\right], \ p \to 0+,$$

since *X*<sup>1</sup> and *V* are independent. Hence,

$$\frac{p}{\mu(1-p)}X_1 N_p \stackrel{\mathcal{D}}{\longrightarrow} \frac{VX_1}{\mu}, \ p \to 0+.$$

In light of Equation (43),

$$W_p \stackrel{\mathcal{D}}{\longrightarrow} Y, \ p \to 0+,$$

where the law of $Y$ is the mixture of the distributions of $VX_1/\mu$ and $Z$ provided by Equation (42). The proof is complete.
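As a quick numerical illustration of the Rényi-type limit used above (a Monte Carlo sketch, not part of the proof; the choice of $Uniform(0,1)$ summands and all parameter values are ours), one can check that the normalized geometric sum $\tilde{W}_p$ is close to $Exp(1)$ for small $p$:

```python
import math
import random

random.seed(3)

p, mu = 0.02, 0.5            # small p; summands X_j ~ Uniform(0, 1), so mu = 1/2
n_samples = 20_000

def geom(p):                 # P(N = k) = p (1 - p)^k, k = 0, 1, ... (inverse transform)
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p))

ws = []
for _ in range(n_samples):
    N = geom(p)
    s = sum(random.random() for _ in range(N))       # S_N = X_1 + ... + X_N
    ws.append(p / (mu * (1.0 - p)) * s)              # normalized geometric sum

# Renyi's theorem: for small p the law of the normalized sum is close to Exp(1)
assert abs(sum(ws) / n_samples - 1.0) < 0.05         # E[Z] = 1
tail = sum(1 for w in ws if w > 1.0) / n_samples
assert abs(tail - math.exp(-1.0)) < 0.03             # P(Z > 1) = 1/e
```

The tolerances are loose on purpose: the distance to the exponential law is of order $p/(1-p)$, plus Monte Carlo noise.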

**Theorem 3.** *Assume that $N_p$ and $(X_n)_{n\in\mathbb{N}}$ satisfy the conditions of Theorem 2. Let $\mu_2 = \mathbb{E}[X_1^2]$. Then,*

$$d_{\mathcal{H}_2}(W_p, Y) = \frac{\mu_2}{2\mu^2}\left(\frac{p}{1-p}\right). \tag{44}$$

**Proof.** Relation (43) for characteristic functions implies the following equality of distributions:

$$W_p \stackrel{\mathcal{D}}{=} \frac{p}{\mu(1-p)}\left((1-\mathbb{I}_\rho)N_p X_1 + \mathbb{I}_\rho \tilde{S}_{N_p}\right),\tag{45}$$

where the indicator $\mathbb{I}_\rho$ equals 1 and 0 with probabilities $1-\rho$ and $\rho$, respectively, and is independent of all the other variables under consideration. Assume at first that $\mu_2 < \infty$. Then, for $h \in \mathcal{H}_2$,

$$\mathbb{E}\left[h(W_p)\right] = \rho\,\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_pX_1\right)\right] + (1-\rho)\,\mathbb{E}[h(\tilde{W}_p)].$$

In view of Equation (42), one has

$$\mathbb{E}[h(Y)] = \rho\,\mathbb{E}\left[h\left(\frac{VX_1}{\mu}\right)\right] + (1-\rho)\,\mathbb{E}[h(Z)].$$

The latter two formulas and the triangle inequality yield

$$\left|\mathbb{E}[h(W_p)] - \mathbb{E}[h(Y)]\right| \le \rho\left|\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_pX_1\right)\right] - \mathbb{E}\left[h\left(\frac{VX_1}{\mu}\right)\right]\right| + (1-\rho)\left|\mathbb{E}[h(\tilde{W}_p)] - \mathbb{E}[h(Z)]\right|. \tag{46}$$

By means of Theorem 1 we have

$$\sup_{h\in\mathcal{H}_2}\left|\mathbb{E}[h(\tilde{W}_p)] - \mathbb{E}[h(Z)]\right| = \frac{\mu_2}{2\mu^2}\left(\frac{p}{1-p}\right). \tag{47}$$

For each $h \in \mathcal{H}_2$, taking into account the independence of $X_1$, $N_p$, and $V$, one can write

$$\left|\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_pX_1\right)\right] - \mathbb{E}\left[h\left(\frac{VX_1}{\mu}\right)\right]\right| = \left|\int_{\mathbb{R}}\left(\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_p x\right)\right] - \mathbb{E}\left[h\left(\frac{xV}{\mu}\right)\right]\right)dF_{X_1}(x)\right|.$$

Due to the homogeneity of $d_{\mathcal{H}_2}$, we infer from Theorem 1 that

$$\sup_{h\in\mathcal{H}_2}\left|\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_p x\right)\right] - \mathbb{E}\left[h\left(\frac{xV}{\mu}\right)\right]\right| = d_{\mathcal{H}_2}\left(\frac{px}{\mu(1-p)}N_p,\ \frac{xV}{\mu}\right)$$

$$= \left(\frac{x}{\mu}\right)^2 d_{\mathcal{H}_2}\left(\frac{p}{1-p}\sum_{k=1}^{N_p}1,\ V\right) = \frac{1}{2}\left(\frac{x}{\mu}\right)^2\frac{p}{1-p}.$$

Consequently, it holds

$$\left|\mathbb{E}\left[h\left(\frac{p}{\mu(1-p)}N_pX_1\right)\right] - \mathbb{E}\left[h\left(\frac{VX_1}{\mu}\right)\right]\right| \le \frac{p}{2(1-p)}\int_{\mathbb{R}}\left(\frac{x}{\mu}\right)^2 dF_{X_1}(x) = \frac{\mu_2}{2\mu^2}\left(\frac{p}{1-p}\right). \tag{48}$$

Equations (46)–(48) lead to the upper bound for $d_{\mathcal{H}_2}(W_p, Y)$.

Note that the function $h(x) = x^2/2$, $x \in \mathbb{R}$, belongs to $\mathcal{H}_2$, and therefore

$$\sup_{h\in\mathcal{H}_2}\left|\mathbb{E}[h(W_p)] - \mathbb{E}[h(Y)]\right| \ge \frac{1}{2}\left(\mathbb{E}[W_p^2] - \mathbb{E}[Y^2]\right). \tag{49}$$

Note that $\mathbb{E}[Z^2] = \mathbb{E}[V^2] = 2$ because $Z \sim Exp(1)$ and $V \sim Exp(1)$. The random variables $X_1$, $V$, $Z$ are independent. Thus, in light of Equation (42), one has

$$\mathbb{E}[Y^2] = 2\rho \frac{\mu\_2}{\mu^2} + 2(1 - \rho). \tag{50}$$

By means of Equations (45), (23) and (25) we obtain

$$\mathbb{E}[W_p^2] = \left(\frac{p}{\mu(1-p)}\right)^2 \rho\,\mathbb{E}[N_p^2]\,\mathbb{E}[X_1^2] + (1-\rho)\,\mathbb{E}[\tilde{W}_p^2]$$

$$= \left(\frac{p}{\mu(1-p)}\right)^2 \rho\,\frac{(1-p)(2-p)}{p^2}\,\mu_2 + (1-\rho)\left(\frac{p}{\mu^2(1-p)}\,\mu_2 + 2\right)$$

$$= \frac{\mu_2}{\mu^2}\left(\rho\,\frac{2-p}{1-p} + (1-\rho)\frac{p}{1-p}\right) + 2(1-\rho). \tag{51}$$

Equations (50) and (51) permit us to find $\mathbb{E}[W_p^2] - \mathbb{E}[Y^2]$. Hence, Equation (49) leads to the inequality

$$\sup_{h\in\mathcal{H}_2}\left|\mathbb{E}[h(W_p)] - \mathbb{E}[h(Y)]\right| \ge \frac{1}{2}\,\frac{\mu_2}{\mu^2}\left(\rho\left(\frac{2-p}{1-p} - 2\right) + (1-\rho)\frac{p}{1-p}\right) = \frac{1}{2}\,\frac{\mu_2}{\mu^2}\,\frac{p}{1-p}. \tag{52}$$

Now, let $\mu_2 = \infty$. Then, $d_{\mathcal{H}_2}(W_p, Y) = \infty$ according to Equation (52). The proof is complete.
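Equation (51) can be sanity-checked by simulation. The sketch below is our own illustration (summands $X_j \sim Uniform(0,1)$, so $\mu = 1/2$, $\mu_2 = 1/3$, with illustrative values of $p$ and $\rho$): it simulates the exchangeable model behind Equation (45) and compares the empirical second moment of $W_p$ with the closed form in Equation (51):

```python
import math
import random

random.seed(4)

p, rho = 0.1, 0.3
mu, mu2 = 0.5, 1.0 / 3.0       # X_j ~ Uniform(0,1): E[X] = 1/2, E[X^2] = 1/3
c = p / (mu * (1.0 - p))       # normalizing constant of W_p

def geom(p):                   # N_p with P(N = k) = p (1 - p)^k
    return int(math.log(1.0 - random.random()) / math.log(1.0 - p))

n_samples = 100_000
acc = 0.0
for _ in range(n_samples):
    N = geom(p)
    if random.random() < rho:
        s = N * random.random()                      # comonotone branch: X_j = X_1 for all j
    else:
        s = sum(random.random() for _ in range(N))   # independent branch
    acc += (c * s) ** 2

# closed form (51) for E[W_p^2]
formula = (mu2 / mu**2) * (rho * (2 - p) / (1 - p) + (1 - rho) * p / (1 - p)) + 2 * (1 - rho)
assert abs(acc / n_samples - formula) < 0.1
```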

#### **5. Convergence of Random Sums of Independent Summands to Generalized Gamma Distribution**

Statements concerning the weak convergence of the distributions of geometric sums to the exponential law are often just particular cases of more general results on the convergence of random sums to the generalized gamma law when the number of summands follows the generalized negative binomial distribution; see, e.g., [27,29,49]. The recent work [29] demonstrated how the mentioned general case can be studied by employing estimates of the proximity of the distributions of geometric sums to the exponential law. We introduce some notation in order to apply Theorem 1 to the analysis of the distance between the distributions of random sums and the generalized gamma law.

Introduce a random variable *Gr*,*<sup>λ</sup>* such that *Gr*,*<sup>λ</sup>* ∼ *G*(*r*, *λ*), where *G*(*r*, *λ*) is the gamma law with positive parameters *r* and *λ*, i.e., its density with respect to the Lebesgue measure has the form

$$g(z; r, \lambda) = \frac{\lambda^r z^{r-1}}{\Gamma(r)}\, e^{-\lambda z}\, \mathbb{I}_{(0,\infty)}(z), \ z \in \mathbb{R},$$

$\Gamma(r)$ being the gamma function. For $r = 1$, one has $G(1,\lambda) = Exp(\lambda)$. Clearly, for $a > 0$, $aG_{r,\lambda} \sim G(r, \lambda/a)$. Set $G^*_{r,\alpha,\lambda} := G^{1/\alpha}_{r,\lambda}$, where $\alpha > 0$. One says that the random variable $G^*_{r,\alpha,\lambda}$ has the generalized gamma distribution $G^*(r,\alpha,\lambda)$. According to Equation (5) of [29], the density of $G^*_{r,\alpha,\lambda}$ is given by the formula

$$g^*(z; r, \alpha, \lambda) = \frac{|\alpha|\lambda^r z^{\alpha r - 1}}{\Gamma(r)}\, e^{-\lambda z^\alpha}\, \mathbb{I}_{(0,\infty)}(z), \quad z \in \mathbb{R}.$$

It is also known (see Equation (6) in [29]) that, for $r \in (0,1)$, $\alpha \in (0,1]$ and $\lambda > 0$, the following relation holds:

$$g^*(z; r, \alpha, \lambda) = \int_0^1 \frac{u}{1-u}\, e^{-\frac{u}{1-u}z}\, q\left(u; r, \alpha, \lambda\right) du, \ z > 0,\tag{53}$$

where $q$ is the density of a certain random variable $Y_{r,\alpha,\lambda}$ whose distribution is supported in $(0,1)$ (see Remark 3 of [49]). We only note that for $\alpha = 1$ the density $q$ admits the representation

$$q\left(u; r, 1, \frac{b}{1-b}\right) = b^r\left(\frac{\sin \pi r}{\pi}\right)\frac{(1-u)^{r-1}}{u(u-b)^r}\,\mathbb{I}_{(b,1)}(u), \ b \in (0,1).$$

Consider a random variable $N^*_{r,\alpha,p}$ having the generalized negative binomial distribution $GNB(r,\alpha,p)$, where $r > 0$, $\alpha \neq 0$ and $p \in (0,1)$, i.e.,

$$\mathbb{P}(N\_{r,a,p}^\* = k) = \int\_0^\infty \frac{z^k}{k!} e^{-z} g^\*\left(z; r, a, \frac{p}{1-p}\right) dz, \ k = 0, 1, \ldots \tag{54}$$

Thus, $GNB(r,\alpha,p)$ is a mixed Poisson distribution. One can verify that $GNB(r,1,p)$ coincides with $NB(r,p)$, the negative binomial law. Recall that $N_{r,p} \sim NB(r,p)$ if

$$\mathbb{P}(\mathsf{N}\_{r,p} = k) = \frac{\Gamma(k+r)}{k!\Gamma(r)} p^r (1-p)^k, \; k = 0, \; 1, \ldots$$

Note also that *N*1,*<sup>p</sup>* ∼ *Geom*(*p*).
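The mixed Poisson representation (54) is easy to simulate. The sketch below (illustrative parameters; Knuth's elementary Poisson sampler is our own helper, since the standard library has none) draws $GNB(r,1,p)$ variates by mixing a Poisson law over $G(r, p/(1-p))$ and compares the empirical probabilities with the $NB(r,p)$ formula above:

```python
import math
import random

random.seed(1)

r, p = 0.6, 0.4
lam = p / (1.0 - p)                  # rate of the mixing gamma law G(r, p/(1-p))

def poisson(mean):                   # Knuth's elementary Poisson sampler
    limit, k, prod = math.exp(-mean), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

n_samples = 200_000
# GNB(r, 1, p): Poisson mixed over Gamma(r, rate lam); gammavariate takes a *scale*
samples = [poisson(random.gammavariate(r, 1.0 / lam)) for _ in range(n_samples)]

def nb_pmf(k):                       # NB(r, p) probability from the formula above
    return math.gamma(k + r) / (math.factorial(k) * math.gamma(r)) * p**r * (1 - p)**k

for k in range(4):
    emp = sum(1 for s in samples if s == k) / n_samples
    assert abs(emp - nb_pmf(k)) < 0.01
```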

Introduce the random variables

$$W^*_{r,\alpha,p} := \frac{1}{\mu}\left(\frac{p}{1-p}\right)^{1/\alpha}\sum_{k=1}^{N^*_{r,\alpha,p}} X_k, \quad S^*_{r,\alpha,p} := \sum_{k=1}^{N^*_{r,\alpha,p}} X_k,\tag{55}$$

where $N^*_{r,\alpha,p} \sim GNB(r,\alpha,p)$, $r > 0$, $\alpha \neq 0$, $p \in (0,1)$, and $\mathbb{E}[X_k] = \mu$, $\mu \neq 0$, $k \in \mathbb{N}$. We assume that $(X_n)_{n\in\mathbb{N}}$ and $N^*_{r,\alpha,p}$ are independent.

**Theorem 4.** *Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of independent random variables with $\mathbb{E}[X_n] = \mu$, $\mu \neq 0$, $n \in \mathbb{N}$. Then, for $W^*_{r,\alpha,p}$ introduced in Equation (55) with parameters $r \in (0,1)$, $\alpha \in (0,1]$, $p \in (0,1)$, and $G_{r,1}$ having the gamma distribution $G(r,1)$, the following relation holds:*

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, G^{1/\alpha}_{r,1}) = \frac{1}{2\mu^2}\left(\frac{p}{1-p}\right)^{2/\alpha}\int_0^1 \mathbb{E}[X^2_{N_u+1}]\left(\frac{1-u}{u}\right) q\left(u; r, \alpha, \frac{p}{1-p}\right) du,\tag{56}$$

*whenever the right-hand side of Equation (56) is finite. Here, $N_u := N^*_{1,1,u}$, $N_u \sim Geom(u)$, $u \in (0,1)$, and $q$ appeared in Equation (53).*

**Proof.** Without loss of generality, we can assume that $\mu = 1$; otherwise, we consider $\bar{X}_n := X_n/\mu$, $n \in \mathbb{N}$. For such a sequence, $\mathbb{E}[\bar{X}^2_{N_u+1}] = \frac{1}{\mu^2}\mathbb{E}[X^2_{N_u+1}]$. Note that $\frac{1-p}{p}G_{r,1}$ has the same distribution as $G_{r,p/(1-p)}$. Applying the homogeneity property of the ideal probability metric of order two, one has

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, G^{1/\alpha}_{r,1}) = \left(\frac{p}{1-p}\right)^{2/\alpha} d_{\mathcal{H}_2}\left(S^*_{r,\alpha,p},\ G^{1/\alpha}_{r,p/(1-p)}\right).$$

The proof of Theorem 1 of [29] starts by establishing, for any bounded Borel function $h$, $r \in (0,1)$, $\alpha \in (0,1]$ and $p \in (0,1)$, that

$$\mathbb{E}\left[h\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right] = \int_0^1 \mathbb{E}\left[h\left(\frac{1-u}{u}Z\right)\right] q\left(u; r, \alpha, \frac{p}{1-p}\right) du,\tag{57}$$

where *Z* ∼ *Exp*(1), and

$$\mathbb{E}\left[h(S^*_{r,\alpha,p})\right] = \int_0^1 \mathbb{E}\left[h(S^*_{1,1,u})\right] q\left(u; r, \alpha, \frac{p}{1-p}\right) du. \tag{58}$$

Let us extend these relations to each $h \in \mathcal{H}_2$. Recalling that, in light of Remark 1, $|h(x)| \le A_0x^2 + B_0$ for some positive constants $A_0$ and $B_0$ (which depend on $h$), we write $h = h^+ - h^-$, where $h^+(x) := h(x)\mathbb{I}\{h(x) \ge 0\}$ and $h^-(x) := -h(x)\mathbb{I}\{h(x) \le 0\}$. Set $h_n(x) := h^+(x)\mathbb{I}_{(-n,n]}(x)$, $n \in \mathbb{N}$. Then the $h_n$, $n \in \mathbb{N}$, are bounded Borel functions such that, for each $x \in \mathbb{R}$, $0 \le h_n(x) \uparrow h^+(x)$ as $n \to \infty$. Hence, the monotone convergence theorem yields

$$\mathbb{E}\left[h^+\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right] = \lim_{n\to\infty}\mathbb{E}\left[h_n\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right].$$

Note that, for each $u \in (0,1)$, $\mathbb{E}\left[h_n\left(\frac{1-u}{u}Z\right)\right] \uparrow \mathbb{E}\left[h^+\left(\frac{1-u}{u}Z\right)\right]$. Applying the monotone convergence theorem once again, we obtain

$$\int_0^1 \mathbb{E}\left[h^+\left(\frac{1-u}{u}Z\right)\right]q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \lim_{n\to\infty}\int_0^1\mathbb{E}\left[h_n\left(\frac{1-u}{u}Z\right)\right]q\left(u;r,\alpha,\frac{p}{1-p}\right)du.$$

So, Equation (57) is valid with $h^+$ in place of $h \in \mathcal{H}_2$. Obviously, $0 \le h^+(x) \le |h(x)| \le A_0x^2 + B_0$, $x \in \mathbb{R}$. Thus,

$$\mathbb{E}\left[h^+\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right] \le A_0\,\mathbb{E}\left[G^{2/\alpha}_{r,p/(1-p)}\right] + B_0 < \infty.$$

According to [27] (page 8), for *δ* > 0, one has

$$\mathbb{E}\left[\left(G^*_{r,\alpha,\lambda}\right)^\delta\right] = \frac{\Gamma(r + \frac{\delta}{\alpha})}{\lambda^{\delta/\alpha}\Gamma(r)}.\tag{59}$$

This permits us to write $\mathbb{E}\left[G^{2/\alpha}_{r,p/(1-p)}\right] = \mathbb{E}\left[\left(G^*_{r,1,p/(1-p)}\right)^{2/\alpha}\right] < \infty$.

In the same manner, we demonstrate that Equation (57) is valid with $h^-$ in place of $h \in \mathcal{H}_2$; moreover, $\mathbb{E}\left[h^-\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right]$ is finite. Therefore, Equation (57) holds for any $h \in \mathcal{H}_2$, and for such $h$, $\mathbb{E}\left[h\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right]$ is finite.
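Moment formula (59) is convenient to verify numerically: since $G^*_{r,\alpha,\lambda} = G^{1/\alpha}_{r,\lambda}$, a plain gamma sampler suffices. A minimal Monte Carlo sketch with illustrative parameter values:

```python
import math
import random

random.seed(5)

r, alpha, lam, delta = 0.8, 0.5, 2.0, 1.0
n_samples = 200_000

# G*_{r,alpha,lam} = G_{r,lam}^{1/alpha}; random.gammavariate takes shape and *scale* = 1/lam
mean_emp = sum(random.gammavariate(r, 1.0 / lam) ** (delta / alpha)
               for _ in range(n_samples)) / n_samples

# right-hand side of (59)
mean_exact = math.gamma(r + delta / alpha) / (lam ** (delta / alpha) * math.gamma(r))
assert abs(mean_emp - mean_exact) < 0.02
```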

By the monotone convergence theorem, $\mathbb{E}[h^+(S^*_{r,\alpha,p})] = \lim_{n\to\infty}\mathbb{E}[h_n(S^*_{r,\alpha,p})]$. In a similar way, $\mathbb{E}[h_n(S^*_{1,1,u})] \uparrow \mathbb{E}[h^+(S^*_{1,1,u})]$ as $n \to \infty$, and applying this theorem once again, we obtain

$$\int_0^1 \mathbb{E}[h^+(S^*_{1,1,u})]\,q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \lim_{n\to\infty}\int_0^1\mathbb{E}[h_n(S^*_{1,1,u})]\,q\left(u;r,\alpha,\frac{p}{1-p}\right)du.$$

Taking into account that Equation (58) is valid for the bounded Borel functions $h_n$, one ascertains that Equation (58) holds if we replace $h$ by $h^+$. To show that the latter integral is finite, we note that $0 \le h^+(x) \le |h(x)| \le A_0x^2 + B_0$ for some positive $A_0$, $B_0$ and all $x \in \mathbb{R}$. Formula (23) of Lemma 4 yields, for each $u \in (0,1)$,

$$\mathbb{E}\left[\left(S^*_{1,1,u}\right)^2\right] \le \frac{1-u}{u}\,\mathbb{E}[X^2_{N_u+1}] + 2\,\frac{(1-u)^2}{u^2}.$$

It was assumed above that the right-hand side of Equation (56) is finite. So,

$$\int_0^1 \left(A_0\left(\frac{1-u}{u}\,\mathbb{E}[X^2_{N_u+1}] + 2\,\frac{(1-u)^2}{u^2}\right) + B_0\right) q\left(u; r, \alpha, \frac{p}{1-p}\right) du < \infty,$$

since, in light of Equation (57), taking $h(x) = 1$ and $h(x) = \frac{x^2}{2}$, $x \in \mathbb{R}$ (these functions belong to $\mathcal{H}_2$), we obtain, respectively,

$$\int\_0^1 q\left(u; r, \alpha, \frac{p}{1-p}\right) du = 1,$$

$$\mathbb{E}[Z^2]\int_0^1\frac{(1-u)^2}{u^2}\,q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \mathbb{E}\left[G^{2/\alpha}_{r,p/(1-p)}\right] < \infty. \tag{60}$$

We demonstrate analogously that Equation (58) holds with $h^-$ in place of $h \in \mathcal{H}_2$ and, if the right-hand side of Equation (56) is finite, then

$$\int_0^1 \mathbb{E}\left[h^-\left(S^*_{1,1,u}\right)\right] q\left(u;r,\alpha,\frac{p}{1-p}\right) du$$

is finite as well. Consequently, Equation (58) is established for each $h \in \mathcal{H}_2$ (whenever the right-hand side of Equation (56) is finite), and $\mathbb{E}[h(S^*_{r,\alpha,p})]$ is finite for such $h$. Therefore, for $h \in \mathcal{H}_2$ and fixed $\alpha, r, p$, one has

$$\mathbb{E}\left[h(S^*_{r,\alpha,p})\right] - \mathbb{E}\left[h\left(G^{1/\alpha}_{r,p/(1-p)}\right)\right] = \int_0^1\left(\mathbb{E}\left[h(S^*_{1,1,u})\right] - \mathbb{E}\left[h\left(\frac{1-u}{u}Z\right)\right]\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du =: J(h).$$

By Theorem 1, for $h \in \mathcal{H}_2$, it holds that

$$\left|\mathbb{E}\left[h(S^*_{1,1,u})\right] - \mathbb{E}\left[h\left(\frac{1-u}{u}Z\right)\right]\right| \le d_{\mathcal{H}_2}\left(S^*_{1,1,u},\ \frac{1-u}{u}Z\right) = \left(\frac{1-u}{u}\right)^2 d_{\mathcal{H}_2}\left(\frac{u}{1-u}S^*_{1,1,u},\ Z\right)$$

$$\le \left(\frac{1-u}{u}\right)^2\frac{u}{1-u}\,\frac{1}{2}\,\mathbb{E}[X^2_{N_u+1}] = \frac{1}{2}\,\frac{1-u}{u}\,\mathbb{E}[X^2_{N_u+1}],$$

where we take into account that $N^*_{1,1,u} \sim NB(1,u)$ and $NB(1,u)$ coincides with $Geom(u)$. Thus, $\frac{u}{1-u}S^*_{1,1,u}$ can be written as

$$\frac{u}{1-u}\sum_{k=1}^{N_u} X_k,$$

where $N_u \sim Geom(u)$, and $N_u$ and $(X_k)_{k\in\mathbb{N}}$ are independent.

Therefore, for each $h \in \mathcal{H}_2$, $\left(\frac{p}{1-p}\right)^{2/\alpha}|J(h)|$ is bounded by the right-hand side of Equation (56), and so the desired upper bound is obtained (recall that $\mu = 1$).

Now, we turn to the lower bound for $d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, G^{1/\alpha}_{r,1})$. Take $h(x) = x^2/2$, which belongs to $\mathcal{H}_2$. Then, applying Equation (23) to evaluate $\mathbb{E}\left[\left(S^*_{1,1,u}\right)^2\right]$, one has

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, G^{1/\alpha}_{r,1}) \ge \frac{1}{2}\left(\frac{p}{1-p}\right)^{2/\alpha}\left|\int_0^1\left(\mathbb{E}\left[\left(S^*_{1,1,u}\right)^2\right] - \left(\frac{1-u}{u}\right)^2\mathbb{E}\left[G^2_{1,1}\right]\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du\right|$$

$$= \frac{1}{2}\left(\frac{p}{1-p}\right)^{2/\alpha}\int_0^1\left(\frac{1-u}{u}\right)\mathbb{E}[X^2_{N_u+1}]\,q\left(u;r,\alpha,\frac{p}{1-p}\right)du, \tag{61}$$

where *G*1,1 = *Z* ∼ *Exp*(1). Thus, Equation (61) completes the proof.

**Corollary 1.** *Let the conditions of Theorem 4 be satisfied and, in addition, $\mu_2 = \sup_{n\in\mathbb{N}}\mathbb{E}[X_n^2] < \infty$. Then, the right-hand side of Equation (56) is finite and*

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, G^{1/\alpha}_{r,1}) \le \frac{\mu_2}{2\mu^2}\left(\frac{p}{1-p}\right)^{1/\alpha}\frac{\Gamma(r+\frac{1}{\alpha})}{\Gamma(r)}.$$

*The inequality becomes an equality if $\mu_2 = \mathbb{E}[X_n^2]$ for all $n \in \mathbb{N}$. In particular, if $\alpha = 1$, then $\frac{\Gamma(r+1)}{\Gamma(r)} = r$.*

**Proof.** According to Equation (57), for *h*(*x*) = *x*, *x* ∈ R,

$$\mathbb{E}\left[G^{1/\alpha}_{r,p/(1-p)}\right] = \mathbb{E}[Z]\int_0^1\left(\frac{1-u}{u}\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du.$$

Thus, since $\mathbb{E}[Z] = 1$, the following relation is valid:

$$\int_0^1\left(\frac{1-u}{u}\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \mathbb{E}\left[G^{1/\alpha}_{r,p/(1-p)}\right].\tag{62}$$

Due to [27] (see page 8 there), one has $\mathbb{E}[G^*_{r,\alpha,\lambda}] = \frac{\Gamma(r+1/\alpha)}{\lambda^{1/\alpha}\Gamma(r)}$. Therefore,

$$\mathbb{E}\left[G^{1/\alpha}_{r,p/(1-p)}\right] = \mathbb{E}\left[G^*_{r,\alpha,p/(1-p)}\right] = \left(\frac{1-p}{p}\right)^{\frac{1}{\alpha}}\frac{\Gamma(r+\frac{1}{\alpha})}{\Gamma(r)}.$$

For $\alpha = 1$, we obtain $\mathbb{E}\left[G_{r,p/(1-p)}\right] = \frac{1-p}{p}\,\frac{\Gamma(r+1)}{\Gamma(r)} = r\,\frac{1-p}{p}$. $\square$
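For concreteness, the bound of Corollary 1 is straightforward to evaluate numerically; the sketch below (illustrative parameter values of our own choosing) also checks the reduction $\Gamma(r+1)/\Gamma(r) = r$ for $\alpha = 1$:

```python
import math

# illustrative parameter values
r, alpha, p, mu, mu2 = 0.5, 1.0, 0.1, 1.0, 2.0

# right-hand side of the Corollary 1 bound
bound = mu2 / (2 * mu**2) * (p / (1 - p)) ** (1 / alpha) \
        * math.gamma(r + 1 / alpha) / math.gamma(r)

# for alpha = 1 the gamma ratio collapses to r
assert abs(math.gamma(r + 1) / math.gamma(r) - r) < 1e-12
assert abs(bound - mu2 / (2 * mu**2) * (p / (1 - p)) * r) < 1e-12
```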

#### **6. Convergence of Random Sums of Exchangeable Summands to Generalized Gamma Distribution**

Consider the model of exchangeable random variables $X_1, X_2, \ldots$ described in Section 4. Introduce the distribution of a random variable $U^*_{r,\alpha,\lambda}$ as the following mixture:

$$\mathbb{P}_{U^*_{r,\alpha,\lambda}} = \rho\,\mathbb{P}_{\frac{V^*_{r,\alpha,\lambda}X_1}{\mu}} + (1-\rho)\,\mathbb{P}_{Z^*_{r,\alpha,\lambda}},\tag{63}$$

where $\rho \in [0,1]$, $\alpha > 0$, $r > 0$, $\mu := \mathbb{E}[X_1]$, $\mu \neq 0$, the random variables $X_1$ and $V^*_{r,\alpha,\lambda}$ are independent, $V^*_{r,\alpha,\lambda} \sim G^*(r,\alpha,\lambda)$, and $Z^*_{r,\alpha,\lambda} \sim G^*(r,\alpha,\lambda)$. Since $\mathbb{E}\left[G^{2/\alpha}_{r,\lambda}\right] = \frac{\Gamma(r+2/\alpha)}{\lambda^{2/\alpha}\Gamma(r)}$ (see, e.g., page 8 of [27]), one has

$$\mathbb{E}\left[(U^*_{r,\alpha,\lambda})^2\right] = \left(\rho\,\frac{\mathbb{E}[X_1^2]}{\mu^2} + (1-\rho)\right)\frac{\Gamma(r+2/\alpha)}{\lambda^{2/\alpha}\Gamma(r)}.\tag{64}$$

Due to the properties of generalized gamma distributions, for any positive number $c$,

$$\frac{1}{c^{1/\alpha}}\,U^*_{r,\alpha,\lambda} \stackrel{\mathcal{D}}{=} \frac{1}{c^{1/\alpha}}\left((1-\mathbb{I}_\rho)\frac{V^*_{r,\alpha,\lambda}X_1}{\mu} + \mathbb{I}_\rho Z^*_{r,\alpha,\lambda}\right) \stackrel{\mathcal{D}}{=} (1-\mathbb{I}_\rho)\frac{V^*_{r,\alpha,\lambda c}X_1}{\mu} + \mathbb{I}_\rho Z^*_{r,\alpha,\lambda c} \stackrel{\mathcal{D}}{=} U^*_{r,\alpha,\lambda c},\tag{65}$$

where the indicator $\mathbb{I}_\rho$ equals 1 and 0 with probabilities $1-\rho$ and $\rho$, respectively, and is independent of all the other variables under consideration. Note that $U^*_{1,1,1}$ has the same distribution as the random variable $Y$ having the law defined in Equation (42). Recall that the generalized negative binomial distribution $GNB(r,\alpha,p)$ is the law of a random variable $N^*_{r,\alpha,p}$; see Equation (54). We will use the following result.

**Lemma 5.** *If $r > 0$, $\alpha \neq 0$, $p \in (0,1)$, then for $N^*_{r,\alpha,p} \sim GNB(r,\alpha,p)$ one has*

$$\mathbb{E}\left[N^*_{r,\alpha,p}\right] = \mathbb{E}\left[G^*_{r,\alpha,p/(1-p)}\right], \qquad \mathbb{E}\left[N^*_{r,\alpha,p}\left(N^*_{r,\alpha,p}-1\right)\right] = \mathbb{E}\left[\left(G^*_{r,\alpha,p/(1-p)}\right)^2\right].\tag{66}$$

**Proof.** According to Equation (54), for each *n* ∈ N,

$$\sum_{k=1}^n k\,\mathbb{P}(N^*_{r,\alpha,p} = k) = \int_0^\infty z\sum_{k=1}^n \frac{z^{k-1}}{(k-1)!}\,e^{-z}\,g^*\left(z; r, \alpha, \frac{p}{1-p}\right)dz,$$

$$\sum_{k=2}^n k(k-1)\,\mathbb{P}(N^*_{r,\alpha,p} = k) = \int_0^\infty z^2\sum_{k=2}^n \frac{z^{k-2}}{(k-2)!}\,e^{-z}\,g^*\left(z; r, \alpha, \frac{p}{1-p}\right)dz.$$

The desired statement follows from the monotone convergence theorem for the Lebesgue integral by letting *n* → ∞.
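Identities (66) can also be checked by simulation: a $GNB(r,\alpha,p)$ variate is a Poisson variate whose mean is drawn from $G^*(r,\alpha,p/(1-p))$. A Monte Carlo sketch (illustrative parameters; the elementary Poisson sampler is our own helper):

```python
import math
import random

random.seed(2)

r, alpha, p = 0.7, 0.5, 0.5
lam = p / (1.0 - p)                  # = 1 here

def poisson(mean):                   # Knuth's elementary Poisson sampler
    limit, k, prod = math.exp(-mean), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

n_samples = 100_000
ns = []
for _ in range(n_samples):
    g = random.gammavariate(r, 1.0 / lam) ** (1.0 / alpha)   # G*_{r,alpha,lam}
    ns.append(poisson(g))

# exact moments of G* from (59) with delta = 1 and delta = 2
m1_exact = math.gamma(r + 1 / alpha) / (lam ** (1 / alpha) * math.gamma(r))
m2_exact = math.gamma(r + 2 / alpha) / (lam ** (2 / alpha) * math.gamma(r))

assert abs(sum(ns) / n_samples - m1_exact) < 0.05                  # E[N*] = E[G*]
emp_fact2 = sum(k * (k - 1) for k in ns) / n_samples
assert abs(emp_fact2 - m2_exact) < 0.2 * m2_exact                  # E[N*(N*-1)] = E[(G*)^2]
```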

**Theorem 5.** *Let $X_1, X_2, \ldots$ be the exchangeable random variables introduced in Section 4, such that $\mathbb{E}[X_1] = \mu$ and $\mathbb{E}[X_1^2] = \mu_2 < \infty$. Assume that Equation (41) holds for some $\rho \in (0,1)$. Suppose that $(X_n)_{n\in\mathbb{N}}$ and $N^*_{r,\alpha,p}$ are independent, where $N^*_{r,\alpha,p} \sim GNB(r,\alpha,p)$. Then, for $W^*_{r,\alpha,p}$ defined in Equation (55) with parameters $r \in (0,1)$, $\alpha \in (0,1]$, $p \in (0,1)$ and $U^*_{r,\alpha,1}$ given in Equation (63), one has*

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, U^*_{r,\alpha,1}) = \frac{\mu_2}{2\mu^2}\left(\frac{p}{1-p}\right)^{1/\alpha}\frac{\Gamma(r+\frac{1}{\alpha})}{\Gamma(r)}.\tag{67}$$

**Proof.** Without loss of generality, we can assume that $\mu = 1$; otherwise, we consider $\bar{X}_n := X_n/\mu$, $n \in \mathbb{N}$. For such a sequence, $\bar{\mu}_2 = \mathbb{E}[\bar{X}_1^2] = \mu_2/\mu^2$. Note that Equation (58) remains true for dependent summands (see Theorem 1 of [29]). Furthermore, for bounded $h(t)$, $t \in \mathbb{R}$, the function $h_x(t) = h(xt)$ is also bounded for any $x \in \mathbb{R}$. Thus, an application of Equation (63) gives

$$\mathbb{E}\left[h\left(U^*_{r,\alpha,\lambda}\right)\right] = \rho\int_{\mathbb{R}}\mathbb{E}\left[h_x\left(G^{1/\alpha}_{r,\lambda}\right)\right]dF_{X_1}(x) + (1-\rho)\,\mathbb{E}\left[h\left(G^{1/\alpha}_{r,\lambda}\right)\right].\tag{68}$$

Now, we apply Equation (57) with bounded $h_x$ and, by Fubini's theorem, obtain

$$\int_{\mathbb{R}}\mathbb{E}\left[h_x\left(G^{1/\alpha}_{r,\lambda}\right)\right]dF_{X_1}(x) = \int_{\mathbb{R}}\int_0^1\mathbb{E}\left[h_x\left(\frac{1-u}{u}V^*\right)\right]q(u;r,\alpha,\lambda)\,du\,dF_{X_1}(x)$$

$$= \int_0^1\mathbb{E}\left[h\left(\frac{1-u}{u}X_1V^*\right)\right]q(u;r,\alpha,\lambda)\,du,\tag{69}$$

where $X_1$ and $V^*$ are independent and $V^* \sim Exp(1)$. Applying Equation (57) to the second summand of Equation (68), Equation (69) yields

$$\mathbb{E}\left[h\left(U^*_{r,\alpha,\lambda}\right)\right] = \rho\int_0^1\mathbb{E}\left[h\left(\frac{1-u}{u}X_1V^*\right)\right]q(u;r,\alpha,\lambda)\,du + (1-\rho)\int_0^1\mathbb{E}\left[h\left(\frac{1-u}{u}Z^*\right)\right]q(u;r,\alpha,\lambda)\,du$$

$$= \int_0^1\mathbb{E}\left[h\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right]q(u;r,\alpha,\lambda)\,du,\tag{70}$$

where $Z^* \sim Exp(1)$ and $U^*_{1,1,1}$ has the same distribution as $Y$; see Equation (42).

Recall that, for $h \in \mathcal{H}_2$, the inequality $|h(x)| \le A_0x^2 + B_0$ holds for all $x \in \mathbb{R}$ and some positive constants $A_0$, $B_0$ (see Remark 1). Moreover, $\mathbb{E}\left[\left(U^*_{r,\alpha,\lambda}\right)^2\right] < \infty$ according to Equation (64). So, employing bounded $h_n(x) = h(x)\mathbb{I}_{(-n,n]}(x)$ tending to $h(x) \in \mathcal{H}_2$ as $n \to \infty$, one can invoke the Lebesgue dominated convergence theorem to claim that $\lim_{n\to\infty}\mathbb{E}\left[h_n(U^*_{r,\alpha,\lambda})\right] = \mathbb{E}\left[h(U^*_{r,\alpha,\lambda})\right]$. We take into account that

$$\int_0^1\mathbb{E}\left|h_n\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right| q(u;r,\alpha,\lambda)\,du \le A_0\,\mathbb{E}\left[\left(U^*_{1,1,1}\right)^2\right]\int_0^1\left(\frac{1-u}{u}\right)^2 q(u;r,\alpha,\lambda)\,du + B_0.$$

The integral on the right-hand side of the latter formula is finite by Equation (60), and $\mathbb{E}\left[\left(U^*_{1,1,1}\right)^2\right] < \infty$ in accordance with Equation (64). Thus, it is possible to apply the Lebesgue dominated convergence theorem to obtain

$$\lim_{n\to\infty}\int_0^1\mathbb{E}\left[h_n\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right]q(u;r,\alpha,\lambda)\,du = \int_0^1\mathbb{E}\left[h\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right]q(u;r,\alpha,\lambda)\,du$$

for any *h* ∈ H2. So, Equation (70) holds for all *h* ∈ H2.

In a similar way, $\lim_{n\to\infty}\mathbb{E}[h_n(S^*_{r,\alpha,p})] = \mathbb{E}\left[h(S^*_{r,\alpha,p})\right]$ for $h \in \mathcal{H}_2$. Indeed, according to the Cauchy–Bunyakovsky–Schwarz inequality, for the identically distributed variables $X_1, X_2, \ldots$ we have $|\mathbb{E}[X_iX_j]| \le \mu_2$ for $i, j \in \mathbb{N}$, and consequently

$$\mathbb{E}\left[\left(S^*_{r,\alpha,p}\right)^2\right] = \sum_{k=0}^\infty\mathbb{P}(N^*_{r,\alpha,p}=k)\,\mathbb{E}\left[\left(\sum_{j=1}^k X_j\right)^2\right] \le \mu_2\sum_{k=0}^\infty\mathbb{P}(N^*_{r,\alpha,p}=k)\,k^2 = \mu_2\,\mathbb{E}\left[\left(N^*_{r,\alpha,p}\right)^2\right].\tag{71}$$

Equations (59) and (66) entail that $\mathbb{E}\left[\left(N^*_{r,\alpha,p}\right)^2\right] < \infty$. Thus, the dominated convergence theorem guarantees that $\lim_{n\to\infty}\mathbb{E}[h_n(S^*_{r,\alpha,p})] = \mathbb{E}\left[h(S^*_{r,\alpha,p})\right]$. Furthermore, one can demonstrate that, for each $h \in \mathcal{H}_2$,

$$\lim_{n\to\infty}\int_0^1\mathbb{E}\left[h_n(S^*_{1,1,u})\right]q(u;r,\alpha,\lambda)\,du = \int_0^1\mathbb{E}\left[h(S^*_{1,1,u})\right]q(u;r,\alpha,\lambda)\,du.\tag{72}$$

For this purpose we note that Equation (71) implies

$$\int_0^1\mathbb{E}\left|h_n(S^*_{1,1,u})\right| q(u;r,\alpha,\lambda)\,du \le B_0 + A_0\,\mu_2\int_0^1\mathbb{E}\left[\left(N^*_{1,1,u}\right)^2\right]q(u;r,\alpha,\lambda)\,du.$$

According to Equation (66) one has

$$\int_0^1\mathbb{E}\left[\left(N^*_{1,1,u}\right)^2\right]q(u;r,\alpha,\lambda)\,du = \int_0^1\left(\mathbb{E}\left[\left(G^*_{1,1,u/(1-u)}\right)^2\right] + \mathbb{E}\left[G^*_{1,1,u/(1-u)}\right]\right)q(u;r,\alpha,\lambda)\,du.$$

The latter integral is finite because one can take *h*(*x*) = *x* and *h*(*x*) = *x*2/2 in Equation (57) and invoke Equation (59). Then, it is possible to use the dominated convergence theorem once again to establish Equation (72).

Now, combining Equation (58) and Equation (70) leads for any *h* ∈ H<sup>2</sup> to the relation

$$\mathbb{E}\left[h(S^*_{r,\alpha,p})\right] - \mathbb{E}\left[h(U^*_{r,\alpha,p/(1-p)})\right] = \int_0^1\left(\mathbb{E}\left[h(S^*_{1,1,u})\right] - \mathbb{E}\left[h\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right]\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du. \tag{73}$$

Note that the random variable $N^*_{1,1,u}$ follows the geometric distribution $Geom(u)$ with parameter $u \in (0,1)$. For each $h \in \mathcal{H}_2$ and any $u \in (0,1)$, by Theorem 3 and in view of the homogeneity of $d_{\mathcal{H}_2}$, we obtain

$$\left|\mathbb{E}\left[h(S^*_{1,1,u})\right] - \mathbb{E}\left[h\left(\frac{1-u}{u}U^*_{1,1,1}\right)\right]\right| \le d_{\mathcal{H}_2}\left(S^*_{1,1,u},\ \frac{1-u}{u}U^*_{1,1,1}\right)$$

$$= \left(\frac{1-u}{u}\right)^2 d_{\mathcal{H}_2}(W_u, Y) \le \left(\frac{1-u}{u}\right)^2\left(\frac{u}{1-u}\right)\frac{\mu_2}{2} = \left(\frac{1-u}{u}\right)\frac{\mu_2}{2}. \tag{74}$$

Employing Equations (73), (74) and (62) one deduces

$$d_{\mathcal{H}_2}(S^*_{r,\alpha,p}, U^*_{r,\alpha,p/(1-p)}) \le \frac{\mu_2}{2}\int_0^1\left(\frac{1-u}{u}\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \frac{\mu_2}{2}\,\mathbb{E}\left[G^{1/\alpha}_{r,p/(1-p)}\right]. \tag{75}$$

Equation (65) implies, by virtue of the homogeneity of $d_{\mathcal{H}_2}$, that

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, U^*_{r,\alpha,1}) = \left(\frac{p}{1-p}\right)^{2/\alpha} d_{\mathcal{H}_2}(S^*_{r,\alpha,p}, U^*_{r,\alpha,p/(1-p)}). \tag{76}$$

Combining Equations (59), (75) and (76), we conclude that the right-hand side of Equation (67) is an upper bound for $d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, U^*_{r,\alpha,1})$.

Choosing $h(x) = x^2/2$ in Equation (73) and employing Equations (52) and (62), one infers

$$d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, U^*_{r,\alpha,1}) \ge \frac{1}{2}\left(\frac{p}{1-p}\right)^{2/\alpha}\left|\int_0^1\left(\mathbb{E}\left[\left(S^*_{1,1,u}\right)^2\right] - \left(\frac{1-u}{u}\right)^2\mathbb{E}\left[\left(U^*_{1,1,1}\right)^2\right]\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du\right|$$

$$= \frac{\mu_2}{2}\left(\frac{p}{1-p}\right)^{2/\alpha}\int_0^1\left(\frac{1-u}{u}\right)q\left(u;r,\alpha,\frac{p}{1-p}\right)du = \frac{\mu_2}{2}\left(\frac{p}{1-p}\right)^{2/\alpha}\mathbb{E}\left[G^*_{r,\alpha,p/(1-p)}\right].$$

Using Equation (59) once again, we see that the right-hand side of Equation (67) is a lower bound for $d_{\mathcal{H}_2}(W^*_{r,\alpha,p}, U^*_{r,\alpha,1})$.

#### **7. Inverse to Equilibrium Transformation**

The development of Stein's method is closely connected with various transformations of distributions. Let $W \ge 0$ be a random variable with $0 < \mu := \mathbb{E}[W] < \infty$. Then, one says that a random variable $W^s$ has the $W$-size biased distribution if, for all $f$ such that $\mathbb{E}[Wf(W)]$ exists,

$$\mathbb{E}[Wf(W)] = \mu\, \mathbb{E}[f(W^s)].$$

The connection of this transformation with Stein's equation was considered in [50,51]. It was pointed out in [51] that this transformation works well for combinatorial problems, such as counting the number of vertices in a random graph having prespecified degrees, see also [52]. In [53], another transformation was introduced. Namely, if a random variable *W* has mean zero and variance *σ*<sup>2</sup> ∈ (0, ∞), then the authors of [53] write (Definition 1.1) that a variable *W*<sup>∗</sup> has *W*-zero biased distribution whenever, for all differentiable *f* such that E*W f*(*W*) exists, the following relation holds

$$\mathbb{E}[Wf(W)] = \sigma^2\, \mathbb{E}[f'(W^*)].$$

This definition is inspired by the equation $\mathbb{E}[Wf(W)] = \sigma^2 \mathbb{E}[f'(W)]$ characterizing the normal law $N(0, \sigma^2)$. The authors of [53] explain that $W^*$ always exists if $\mathbb{E}[W] = 0$ and $\mathrm{var}\,W \in (0, \infty)$. Zero-bias coupling for products of normal random variables is treated in [54]. In Sec. 2 of [30], it is demonstrated that the gamma distribution is uniquely characterised by the property that its size-biased distribution coincides with its zero-biased distribution. Two generalizations of zero biasing were proposed in [55]; see p. 104 of that paper for a discussion of these transformations. We refer also to the survey [56].

Now, we turn to the equilibrium distribution transformation introduced in [33] and concentrate on approximation of the law under consideration by means of an exponential law, see the corresponding Definition 1 in Section 2.

According to the second part of Theorem 2.1 of [33] (in our notation), for $Z \sim Exp(1)$ and a non-negative random variable $X$ with $\mathbb{E}[X] = 1$ and $\mathbb{E}[X^2] < \infty$, the following estimate holds

$$d_{\mathcal{H}_1}(X, Z) \le 2\,\mathbb{E}|X^e - X|,$$

and at the same time

$$d_{\mathcal{H}_1}(X^e, Z) \le \mathbb{E}|X^e - X|. \tag{77}$$

The authors of [33] also proved that $d_{\mathcal{K}}(X^e, Z) \le \mathbb{E}|X^e - X|$. Notice that the estimate for $d_{\mathcal{H}_1}(X^e, Z)$ is more precise than that for $d_{\mathcal{H}_1}(X, Z)$.

Now we turn to Equation (77) and demonstrate how to find the distribution of $X$ when the distribution of $X^e$ is known. In other words, we concentrate on the inverse of the equilibrium distribution transformation.

Assume that $\mathbb{E}[X] > 0$. Recall that a random variable $X^e$ exists if $F_e(x)$ appearing in Equation (16) is a distribution function. For $\mathbb{E}[X] > 0$, the latter statement is equivalent to the nonnegativity of $X$. Indeed, for non-negative $X$, $F_e(x)$ coincides with a distribution function having the density (15). If $F_e(x)$ is a distribution function and $\mathbb{E}[X] > 0$ in Equation (16), then $F_e(x) \ge 0$ for $x < 0$ only if $F(x) = 0$ for $x < 0$.

Thus, a random variable $X^e$ has a (version of the) density $p^e(x)$ introduced in Equation (15). Obviously, the function $p^e(x)$ has the following properties: it is nonincreasing on $[0, \infty)$ and $p^e(x) = 0$ for $x < 0$. This density is right-continuous on $[0, \infty)$, and consequently $p^e(0) < \infty$. Now, we are able to provide a full description of the class of densities of random variables $X^e$ corresponding to all non-negative $X$ with positive mean.

**Lemma 6.** *Let a non-negative random variable $X^e$ have a version of the density* (*with respect to the Lebesgue measure*) *$p^e(x)$, $x \in \mathbb{R}$, such that this function is nonincreasing on $[0, \infty)$, $p^e(x) = 0$ for $x < 0$, and the limit $\lim_{x \to 0+} p^e(x)$ is finite. Then, there exists a unique preimage of the $X^e$ distribution having a distribution function $F$ continuous at $x = 0$. Namely,*

$$F(x) = \begin{cases} 1 - \dfrac{p^e(x)}{p^e(0)}, & x \ge 0, \\ 0, & x < 0. \end{cases} \tag{78}$$

**Proof.** First of all, note that $p^e(0) > 0$, as otherwise $p^e(x) = 0$ for all $x \in \mathbb{R}$ ($p^e$ is a nonincreasing function on $[0, \infty)$). We also know that there exist a left-sided limit and a right-sided limit of $p^e$ at each point $x \in (0, \infty)$, as well as a right-sided limit of $p^e$ at $x = 0$. The set of discontinuity points of $p^e$ is at most countable, and we can take a version which is right-continuous at each point of $[0, \infty)$. Then, Equation (78) introduces a distribution function. Consider a random variable $X$ with distribution function $F$ and check the validity of Equation (14).

The integration by parts formula yields, for any $b > 0$,

$$1 \ge \int_0^b p^e(x)\, dx = b\,p^e(b) + p^e(0) \int_0^b x\, dF(x). \tag{79}$$

The summands on the right-hand side of Equation (79) are non-negative. Therefore, for any $b > 0$, $\mathbb{E}[X\,\mathbb{I}(X \le b)] \le 1/p^e(0)$. Hence, the monotone convergence theorem implies that $\mathbb{E}[X]$ is finite. According to Equation (78),

$$b\,p^e(b)/p^e(0) = b(1 - F(b)) = b\,\mathbb{P}(X > b) \to 0, \quad b \to \infty, \tag{80}$$

since $\mathbb{E}[X] < \infty$. Letting $b \to \infty$ in Equation (79), one obtains $1 = p^e(0)\,\mathbb{E}[X]$. Now, we are ready to verify Equation (14). For any Lipschitz function $f$, $\mathbb{E}[f(X)]$ is finite and

$$\mathbb{E}[f(X)] = \int_0^\infty f(x)\, dF(x) = -\frac{1}{p^e(0)} \int_0^\infty f(x)\, dp^e(x).$$

Taking into account Equation (80), we infer that $f(b)\,p^e(b) \to 0$ as $b \to \infty$. Consequently, applying integration by parts once again ($f$ has bounded variation), we obtain

$$\begin{aligned} \mathbb{E}[X]\,\mathbb{E}[f'(X^e)] &= \frac{1}{p^e(0)} \int_0^\infty f'(x)\, p^e(x)\, dx = \frac{1}{p^e(0)} \int_0^\infty p^e(x)\, df(x) \\ &= \frac{1}{p^e(0)} \left[ -f(0)\,p^e(0) - \int_0^\infty f(x)\, dp^e(x) \right] = \mathbb{E}[f(X)] - f(0). \end{aligned}$$

Uniqueness of the distribution of $X$ corresponding to $X^e$ is a consequence of Equation (15) and the continuity of $F(x)$ at $x = 0$. Indeed, assume that for $X_1$ and $X_2$ the distributions of $X_1^e$ and $X_2^e$ coincide. Then, Equation (15) yields that, for almost all $x \ge 0$,

$$\frac{1}{\mathbb{E}[X_1]}\,\mathbb{P}(X_1 > x) = \frac{1}{\mathbb{E}[X_2]}\,\mathbb{P}(X_2 > x), \tag{81}$$

and therefore $\mathbb{P}(X_1 > x) = c\,\mathbb{P}(X_2 > x)$, where $c$ is a positive constant (the equilibrium distribution in Definition 1 is introduced only for random variables with positive expectation). Since $\mathbb{P}(X_1 = 0) = \mathbb{P}(X_2 = 0) = 0$, one has $\mathbb{P}(X_1 > 0) = \mathbb{P}(X_2 > 0)$. Let $x_n \to 0+$, $n \to \infty$, where the points $x_n$ belong to the set considered in Equation (81), to ensure that $c = 1$. Thus, the distributions of $X_1$ and $X_2$ coincide.

**Remark 6.** *Let $X_p$ be the Bernoulli random variable taking values* 1 *and* 0 *with probabilities $p$ and $1 - p$, respectively. Then, it is easily seen that the distribution of $X_p^e$ is uniform on* [0, 1]*. Thus, in contrast to Lemma 6, without the assumption of continuity of $F$ at the point $x = 0$ one cannot guarantee, in general, the uniqueness of the preimage under the inverse of the equilibrium transformation.*

In the proof of Lemma 6, we found that $\mathbb{E}[X] = 1/p^e(0)$. Set $\lambda = p^e(0)$ and $Z \sim Exp(\lambda)$. Then, $\mathbb{E}[X] = \mathbb{E}[Z]$. In what follows, we suppose that this choice of $\lambda$ is made.
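The identity $\mathbb{E}[X] = 1/p^e(0)$ and the defining relation (14) can be checked numerically in the simplest situation of Remark 6. The sketch below is our own illustration (not taken from [33]): it takes $X^e$ uniform on $[0, 1]$, so that $p^e \equiv 1$ on $[0, 1)$, Lemma 6 returns the point mass at $1$ as the preimage $X$, and the equilibrium identity $\mathbb{E}[f(X)] - f(0) = \mathbb{E}[X]\,\mathbb{E}[f'(X^e)]$ is verified by Monte Carlo for the test function $f(x) = x^2$.

```python
import random

random.seed(0)

# X^e uniform on [0, 1]: density p^e(x) = 1 on [0, 1), so p^e(0) = 1.
# Lemma 6 gives the preimage F(x) = 1 - p^e(x)/p^e(0), i.e. X = 1 a.s.,
# and E[X] = 1/p^e(0) = 1.
p_e0 = 1.0
mean_X = 1.0 / p_e0

# Check E[f(X)] - f(0) = E[X] * E[f'(X^e)] for f(x) = x^2, f'(x) = 2x.
f = lambda x: x * x
f_prime = lambda x: 2.0 * x

n = 200_000
samples_Xe = [random.random() for _ in range(n)]   # X^e ~ U[0, 1]
lhs = f(1.0) - f(0.0)                              # X is the point mass at 1
rhs = mean_X * sum(f_prime(x) for x in samples_Xe) / n

print(lhs, rhs)  # both close to 1
assert abs(lhs - rhs) < 0.01
```

The same check applied to the Bernoulli variable $X_p$ of Remark 6 would succeed as well, which is exactly the announced non-uniqueness without continuity of $F$ at zero.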

Recall that random variables $U$ and $V$ are stochastically ordered if either $\mathbb{P}(U \le x) \le \mathbb{P}(V \le x)$ for every $x \in \mathbb{R}$, or the opposite inequality holds for all $x \in \mathbb{R}$. Now, we clarify one of the statements of Theorem 2.1 of [33] (see also Theorem 3 of [22], where a result similar to Theorem 2.1 of [33] is formulated employing generalized distributions).

**Theorem 6.** *Let a random variable $X^e$ satisfy the conditions of Lemma 6 with $\mathbb{E}[X^e] < \infty$, and let $X$ be the preimage under the equilibrium transformation. Then, Equation* (77) *holds. Moreover, the inequality becomes an equality when $X$ and $X^e$ are stochastically ordered.*

**Proof.** Apply the Stein Equation (10) along with the equilibrium transformation (14). Then, in light of $\mathbb{E}[X] = \frac{1}{\lambda}$ and $\mathbb{E}[f_h(X)] - f_h(0) = \frac{1}{\lambda}\,\mathbb{E}[f_h'(X^e)]$, we can write

$$\begin{split} \left| \mathbb{E}[h(X^e)] - \mathbb{E}[h(Z)] \right| &= \left| \mathbb{E}\big( f_h'(X^e) - \lambda f_h(X^e) \big) + \lambda f_h(0) \right| \\ &= \lambda \left| \mathbb{E}\big( f_h(X^e) - f_h(X) \big) \right| \leq \lambda \|f_h'\|_\infty\, \mathbb{E}|X^e - X| \leq \|h'\|_\infty\, \mathbb{E}|X^e - X|. \end{split} \tag{82}$$

The last inequality in (82) is true due to Lemma 2. Now, we demonstrate that equality in (82) can be attained. Taking $h(x) = x - \frac{1}{\lambda}$, we have the solution $f_h(x) = -\frac{1}{\lambda}x$ of Equation (12). Then,

$$\left| \mathbb{E}[h(X^e)] - \mathbb{E}[h(Z)] \right| = \lambda \left| \mathbb{E}\big( f_h(X^e) - f_h(X) \big) \right| = \left| \mathbb{E}(X^e - X) \right|.$$

Employing the integration by parts formula, one can show that the expression on the right-hand side of the last equality equals the Kantorovich distance between $X$ and $X^e$ when these variables are stochastically ordered. Note that $x(1 - F(x)) \to 0$ and $x(1 - F_e(x)) \to 0$ as $x \to \infty$, while $xF(x) \to 0$ and $xF_e(x) \to 0$ as $x \to -\infty$, because $\mathbb{E}[X]$ and $\mathbb{E}[X^e]$ are finite. Thus,

$$\begin{aligned} \left| \mathbb{E}[X^e] - \mathbb{E}[X] \right| &= \left| \int_{\mathbb{R}} x\, \big( dF_{X^e}(x) - dF_X(x) \big) \right| \\ &= \left| - \int_{\mathbb{R}} \big( F_{X^e}(x) - F_X(x) \big)\, dx \right| = \int_{\mathbb{R}} \left| F_{X^e}(x) - F_X(x) \right| dx, \end{aligned}$$

since $F_{X^e}(x) \ge F_X(x)$ (or $\le$) for all $x \in \mathbb{R}$. It is well known that the Kantorovich distance is the minimal metric for the compound metric $\tau(U, V) = \mathbb{E}|U - V|$ (see, e.g., [9], Ch. 1, §1.3). Therefore,

$$\int_{\mathbb{R}} |F_{X^e}(x) - F_X(x)|\, dx = \inf \mathbb{E}|U - V|,$$

where the infimum is taken over all joint laws of $(U, V)$ such that $\mathbb{P}_U = \mathbb{P}_{X^e}$ and $\mathbb{P}_V = \mathbb{P}_X$ (see also Remark 2 and [10], Corollary 5.3.2). Consequently, in the framework of Theorem 6, $\mathbb{E}[X^e] - \mathbb{E}[X] = \mathbb{E}|X^e - X|$.

**Remark 7.** *By means of Lemma 2 and Equation* (82)*, one can establish the estimate*

$$d\_{\mathcal{K}}(X^{\varepsilon}, Z) \le \lambda \mathbb{E}|X^{\varepsilon} - X|. \tag{83}$$

*For each function $h$ belonging to* $\mathcal{K}$*, in a way similar to Equation* (82)*, one can apply Equation* (10) *together with the equilibrium transformation. Now, it is sufficient to study the Stein equation with a right derivative. Formula* (13) *gives a solution of the Stein equation according to Lemma 2. Note that for $f_h$ the right derivative coincides almost everywhere with the derivative, and the law of $X^e$ is absolutely continuous according to Equation* (15)*. Thus, for the Lipschitz function $f_h$ (see Lemma 2), one can use the equilibrium transformation.*

**Example 1.** Consider the distribution functions $F_\varepsilon(x)$ of random variables $X_\varepsilon$ taking values $\varepsilon$ and $2 - \varepsilon$ with probabilities 1/2, where $0 < \varepsilon < 1$. Formula (15) yields that $X_\varepsilon^e$ has the following piecewise-linear distribution function:

$$F_\varepsilon^e(x) = \begin{cases} 0, & \text{if } x < 0, \\ x, & \text{if } 0 \le x < \varepsilon, \\ x/2 + \varepsilon/2, & \text{if } \varepsilon \le x < 2 - \varepsilon, \\ 1, & \text{if } 2 - \varepsilon \le x. \end{cases}$$

If $\varepsilon \ge 1/2$ then, for all $x \in \mathbb{R}$, the inequality $F_\varepsilon^e(x) \ge F_\varepsilon(x)$ holds, i.e., $X_\varepsilon$ and $X_\varepsilon^e$ are stochastically ordered. For $\varepsilon < 1/2$, the inequality is violated in the right neighborhood of the point $\varepsilon$. Thus, besides the stochastically ordered pairs $(X, X^e)$, there are also pairs of a different kind.
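The dichotomy at $\varepsilon = 1/2$ is easy to check on a grid; the following sketch (our own illustration in the notation of Example 1) compares $F_\varepsilon$ and $F_\varepsilon^e$.

```python
def F(x, eps):
    # distribution function of X_eps taking values eps and 2 - eps w.p. 1/2
    if x < eps:
        return 0.0
    if x < 2 - eps:
        return 0.5
    return 1.0

def F_e(x, eps):
    # equilibrium distribution function computed via Formula (15)
    if x < 0:
        return 0.0
    if x < eps:
        return x
    if x < 2 - eps:
        return x / 2 + eps / 2
    return 1.0

grid = [k / 1000 for k in range(2001)]  # points of [0, 2]

# For eps >= 1/2 the pair is stochastically ordered: F_e >= F everywhere.
assert all(F_e(x, 0.6) >= F(x, 0.6) for x in grid)

# For eps < 1/2 the ordering fails just to the right of x = eps:
# there F_e = x/2 + eps/2 < 1/2 = F.
eps = 0.3
x = eps + 0.01
print(F_e(x, eps), F(x, eps))  # approx. 0.305 vs 0.5
assert F_e(x, eps) < F(x, eps)
```

The violated inequality appears exactly on the interval $(\varepsilon, 1 - \varepsilon)$, in agreement with the solution of $x/2 + \varepsilon/2 \ge 1/2$.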

Now, we turn to another example of stochastically ordered *X* and *X<sup>e</sup>* .

**Example 2.** Take $X^e$ having the Pareto distribution. The notation $X^e \sim Pareto(\alpha, \beta)$ means that $X^e$ has the density $f^e(x) = \frac{\alpha\beta^\alpha}{(x+\beta)^{\alpha+1}}$, $x \ge 0$, and the corresponding distribution function $F^e(x) = 1 - \left(\frac{\beta}{x+\beta}\right)^\alpha$, where $x \ge 0$, $\alpha > 0$, $\beta > 0$.

Further, we consider only $\alpha > 1$, since in this case $\mathbb{E}[X^e] = \frac{\beta}{\alpha - 1}$ is finite. By means of Lemma 6, we obtain the distribution of the preimage under the equilibrium transformation:

$$F(x) = 1 - \frac{f^e(x)}{f^e(0)} = 1 - \frac{\alpha\beta^\alpha}{(x + \beta)^{\alpha+1}} \cdot \frac{\beta^{\alpha+1}}{\alpha\beta^\alpha} = 1 - \left(\frac{\beta}{x + \beta}\right)^{\alpha+1}, \quad x \ge 0.$$

Thus, one can state that $X \sim Pareto(\alpha + 1, \beta)$. It is not difficult to see that $F^e(x) \le F(x)$ for $x \in \mathbb{R}$, i.e., the random variables $X^e$ and $X$ are stochastically ordered. Due to Theorem 6, one has

$$d_{\mathcal{H}_1}(X^e, Z) = \mathbb{E}[X^e - X] = \mathbb{E}[X^e] - \mathbb{E}[X] = \frac{\beta}{\alpha - 1} - \frac{\beta}{\alpha} = \frac{\beta}{\alpha(\alpha - 1)}, \tag{84}$$

$$d_{\mathcal{K}}(X^e, Z) \le \frac{\alpha}{\beta}\, \mathbb{E}|X^e - X| = \frac{1}{\alpha - 1}.$$

In such a way, we find a bound for the Kolmogorov distance between the distributions $Pareto(\alpha, \beta)$ and $Exp(\alpha/\beta)$. Relation (84) demonstrates the convergence rate of $d_{\mathcal{H}_1}(X^e, Z)$ to zero as $\alpha \to \infty$. The Kolmogorov-distance estimate is nontrivial for $\alpha > 2$.

**Remark 8.** It is interesting that estimation of the proximity of the Pareto law to the Exponential one became important in signal processing, see [34] and references therein. Let *X* ∼ *Pareto*(*α*, *β*), where *α* > 0, *β* > 0, and *Z* ∼ *Exp*(*λ*). In [34], the author indicates that the Pinsker–Csiszár inequality was employed to derive

$$d\_{\mathcal{K}}(X, Z) \le \sqrt{2D\_{KL}(X||Z)},\tag{85}$$

where *DKL*(*X*||*Z*) is the Kullback–Leibler divergence between laws of *X* and *Z*. More precisely, in the left-hand side of Equation (85) one can write the total variation distance *dTV*(*X*, *Z*) between distributions of *X* and *Z*. Clearly, *d*K(*X*, *Z*) ≤ *dTV*(*X*, *Z*). By evaluating *DKL*(*X*||*Z*) and performing an optimal choice of parameter *λ*, it was demonstrated (formula (19) in [34]) that, for *α* > 1 and any *β* > 0,

$$d\_{\mathcal{K}}(X, Z) \le \sqrt{\frac{2}{\alpha(\alpha - 1)}}\tag{86}$$

if $\lambda = \frac{\alpha - 1}{\beta}$. The author of [34] (on page 8) writes that in his previous work [57] the inequality

$$d_{\mathcal{K}}(X, Z) \le \frac{3}{\alpha} \tag{87}$$

was established with the same choice of $\lambda$. Next, he also writes that "in the most cases $\alpha > 2$" and notes that the estimate in Equation (86), involving the Kullback–Leibler divergence, is more precise for $\alpha > \frac{9}{7}$ than the estimate in Equation (87) obtained by the Stein method. Moreover, on page 4 of [34] we read: "The problem with the Stein approach is that the bounds do not suggest a suitable way in which, for a given Pareto model, an appropriate approximating Exponential distribution can be specified". However, we have demonstrated that application of the inverse equilibrium transformation together with the Stein method permits indicating, whenever $\alpha > 2$, the corresponding Exponential distribution with proximity closer than the right-hand sides of Equations (86) and (87) can provide.
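The three bounds are elementary to compare; the sketch below (our own illustration, with the crossover points $\alpha = 9/7$ and $\alpha = 2$ obtained by solving the corresponding inequalities) tabulates them.

```python
import math

stein   = lambda a: 3.0 / a                            # Equation (87)
kl      = lambda a: math.sqrt(2.0 / (a * (a - 1.0)))   # Equation (86)
inverse = lambda a: 1.0 / (a - 1.0)                    # bound from Example 2

# The KL bound beats the Stein bound exactly for alpha > 9/7:
# 2/(a(a-1)) < 9/a^2  <=>  7a > 9.
assert kl(9.0 / 7.0 + 0.01) < stein(9.0 / 7.0 + 0.01)
assert kl(9.0 / 7.0 - 0.01) > stein(9.0 / 7.0 - 0.01)

# The inverse-equilibrium bound 1/(alpha - 1) is the sharpest of the three
# whenever alpha > 2: a(a-1) < 2(a-1)^2  <=>  a > 2.
for a in (2.5, 3.0, 5.0, 10.0):
    assert inverse(a) < kl(a) < stein(a)
```

For instance, at $\alpha = 3$ the three bounds are $1/2$, $\sqrt{1/3} \approx 0.577$ and $1$, respectively.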

#### **8. Conclusions**

Our principal goal was to find sharp estimates of the proximity of the distributions of random sums to exponential and more general laws. This goal is achieved when we employ the probability metric $d_{\mathcal{H}_2}$. Thus, it would be valuable to find the best possible approximations of random sums distributions by specified laws using the metrics $\zeta_s$ of order $s > 0$. The results of [32] provide the basis for this approach.

There are various complementary refinements of the Rényi theorem. One approach is related to the employment of Brownian motion. It is interesting that in [58] (p. 1071) the authors proposed an explanation of the Rényi theorem involving the embedding theorem. We provide a slightly different, complete proof. Let $X_1, X_2, \ldots$ be i.i.d. random variables with mean $\mu := \mathbb{E}X_1$ and $\sigma^2 := \mathrm{var}\,X_1 < \infty$, and let $S_n$, $n \in \mathbb{N}$, denote the corresponding partial sums. According to Theorem 12.6 of [59], which is due to A.V. Skorokhod and V. Strassen, there exists a standard Brownian motion $B(t)$, $t \ge 0$, (perhaps defined on an extension of the initial probability space) such that

$$\frac{1}{\sqrt{t}} \sup_{0 \le u \le t} |S_{[u]} - \mu u - \sigma B(u)| \xrightarrow{\mathbb{P}} 0, \quad t \to \infty, \tag{88}$$

and

$$\lim_{t \to \infty} \frac{S_{[t]} - \mu t - \sigma B(t)}{\sqrt{2t \log \log t}} = 0 \quad a.s., \tag{89}$$

where $\xrightarrow{\mathbb{P}}$ stands for convergence in probability, and a.s. means almost surely. Thus, in light of Equation (89), we can write, for $t \ge 0$,

$$S\_{[t]} = \mu t + \sigma B(t) + R(t),\tag{90}$$

where $\sup_{0 \le u \le t} |R(u)|/\sqrt{t} \xrightarrow{\mathbb{P}} 0$ and $R(t)/\sqrt{2t \log \log t} \to 0$ a.s. as $t \to \infty$. Substitute $N_p$ (see Equation (2)) into Equation (90) instead of $t$. It is easily seen that $N_p \xrightarrow{\mathbb{P}} \infty$ (i.e., for each $t > 0$, one has $\mathbb{P}(N_p \le t) \to 0$ as $p \to 0+$), and by means of characteristic functions one can verify that $pN_p \xrightarrow{\mathcal{D}} Z$ as $p \to 0+$, where $Z \sim Exp(1)$. Therefore, $\mu p N_p \xrightarrow{\mathcal{D}} \mu Z$, $p \to 0+$. In the proof of Lemma 4, we showed (Equation (24)) that $\mathbb{E}[N_p] = (1 - p)/p$. Consequently,

$$\begin{aligned} \text{var}[pB(N\_p)] &= p^2 \mathbb{E}[B(N\_p)^2] = p^2 \sum\_{k=0}^{\infty} \mathbb{E}[B(k)^2] p (1-p)^k \\ &= p^2 \sum\_{k=0}^{\infty} k p (1-p)^k = p^2 \mathbb{E}[N\_p] = p^2 \frac{1-p}{p} = p(1-p) \to 0, \quad p \to 0+. \end{aligned}$$
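The series computation admits a two-line numerical check (assuming, as in Equation (2), that $N_p$ is geometric on $\{0, 1, 2, \ldots\}$ with $\mathbb{P}(N_p = k) = p(1-p)^k$):

```python
# Truncated series for var[p B(N_p)] = p^2 * E[N_p]; the tail beyond
# k = 10^4 is negligible for p = 0.05 since (1 - p)^k decays geometrically.
p = 0.05
series = p ** 2 * sum(k * p * (1 - p) ** k for k in range(10_000))
print(series, p * (1 - p))  # both equal 0.0475 up to rounding
assert abs(series - p * (1 - p)) < 1e-12
```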

Hence, $p\sigma B(N_p) \xrightarrow{\mathbb{P}} 0$ as $p \to 0+$. Now, we demonstrate that $pR(N_p) \xrightarrow{\mathbb{P}} 0$, $p \to 0+$. For any $\varepsilon > 0$ and any $t > 0$,

$$\begin{aligned} \mathbb{P}(p|\mathcal{R}(N\_p)| > \varepsilon) &\leq \mathbb{P}(p|\mathcal{R}(N\_p)| > \varepsilon, N\_p \leq t) + \mathbb{P}(N\_p > t) \\ &\leq \mathbb{P}(p \sup\_{0 \leq u \leq t} |\mathcal{R}(u)| > \varepsilon) + \mathbb{P}(N\_p > t). \end{aligned}$$

In light of Equation (88), for arbitrary $\gamma > 0$ and $\varepsilon > 0$, one can take $t_0 = t_0(\gamma)$ such that $\mathbb{P}(\sup_{0 \le u \le t_0} |R(u)| > \varepsilon\sqrt{t_0}) < \gamma/2$. Then, for any $0 < p \le 1/\sqrt{t_0}$, we obtain

$$\mathbb{P}\Big(p \sup_{0 \le u \le t_0} |R(u)| > \varepsilon\Big) < \gamma/2.$$

Since $N_p \xrightarrow{\mathbb{P}} \infty$, we can find $p_0 > 0$ such that $\mathbb{P}(N_p > t_0) < \gamma/2$ if $0 < p \le p_0$. Therefore, $pR(N_p) \xrightarrow{\mathbb{P}} 0$ as $p \to 0+$. The Slutsky lemma yields the desired relation

$$pS_{N_p} \xrightarrow{\mathcal{D}} \mu Z, \quad p \to 0+,$$

which implies Equation (3). However, it seems that there is no clear intuitive reason why the law of the random sum converges to an exponential in the Rényi theorem. Moreover, in Ch. 3, Sec. 2 "The Rényi Limit Theorem" of [20] (see Sec. 2.1 "Motivation"), one can find examples demonstrating that intuition behind the Rényi theorem is poor.
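Even without intuition, the limit can at least be observed by simulation. The sketch below is our own illustration: it takes exponential summands with mean $\mu$ (any summand distribution with mean $\mu$ would do in the limit) and the geometric index $N_p$ from Equation (2), samples $pS_{N_p}$, and compares the result with the limiting $Exp(1/\mu)$ law.

```python
import math
import random

random.seed(1)

mu, p, n = 2.0, 0.01, 20_000   # summands X_i ~ Exp(1/mu), i.e. mean mu

def scaled_geometric_sum():
    # N_p geometric on {0, 1, 2, ...} with P(N_p = k) = p(1-p)^k;
    # returns p * S_{N_p} = p * (X_1 + ... + X_{N_p})
    s = 0.0
    while random.random() >= p:            # one more failure, w.p. 1 - p
        s += random.expovariate(1.0 / mu)  # add a summand with mean mu
    return p * s

samples = [scaled_geometric_sum() for _ in range(n)]

# E[p S_{N_p}] = mu * (1 - p), close to mu for small p
mean = sum(samples) / n
assert abs(mean - mu * (1 - p)) < 0.1

# The empirical CDF should be close to the Exp(1/mu) CDF 1 - exp(-x/mu)
for x in (1.0, 2.0, 4.0):
    ecdf = sum(s <= x for s in samples) / n
    assert abs(ecdf - (1.0 - math.exp(-x / mu))) < 0.03
```

For exponential summands the agreement is in fact exact up to an atom of mass $p$ at zero, which is one more way to see why the exponential law is the fixed point here.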

Actually, relation (90) leads to refinements of Equation (3). In [58], it is proved that if *X*<sup>1</sup> has finite exponential moments and other specified conditions are satisfied then there exists a more sophisticated approximation for distribution of *Wp*, and its accuracy is estimated. The results are applied to the study of *M*/*G*/1 queue for both light-tailed and heavy-tailed service time distributions. Note that in [58], Section 5, the authors study the model where the distribution of *X*<sup>1</sup> can depend on *p*. For future research, it would be desirable to establish analogues of our theorems for such a model.

The results concerning the accuracy of approximating a distribution under consideration by an exponential law are applicable to some queuing models. Let, for an $M/G/1$ queue, the inter-arrival times follow the $Exp(\lambda)$ distribution and $S$ stand for the general service time. Introduce the stationary waiting time $W$ and define $\rho := \lambda\mathbb{E}[S]$ to be the load. Due to [60], if $\mathbb{E}[S^3] < \infty$ then $(1 - \rho)W \xrightarrow{\mathcal{D}} Z$ as $\rho \to 1$, where $Z \sim Exp(1)$. Theorem 3.1 of [45] contains an upper bound for $d_{\mathcal{H}_1}(W_p, Z)$, where $Z \sim Exp(1)$. This estimate is used by the authors for the analysis of queueing systems with a single server. It would be interesting to obtain sharp approximations in the framework of queueing systems.

For the model of exchangeable random variables, Theorem 2 in Section 2 ensures the weak convergence of the distributions under consideration to a specified mixture of explicitly indicated laws. Theorem 3 provides a sharp convergence rate estimate to this limit law by means of the ideal probability metric of the second order. It would be worthwhile to establish such an estimate of the proximity of distributions using the Lévy–Prokhorov distance, because convergence in this metric is equivalent to the weak convergence of distributions of random variables. Moreover, at present there is no unified theory of probability metrics. In this regard, one can mention Proposition 1.2 of [17], stating that if a random variable $Z$ has a Lebesgue density bounded by $C$ then, for any random variable $Y$,

$$d_{\mathcal{K}}(Y, Z) \le \sqrt{C\, d_{\mathcal{H}_1}(Y, Z)}.$$

However, this estimate only gives the sub-optimal convergence rates. We also highlight the important total variation distance *dTV*. The authors of [61] study the sum *W* := ∑*j*∈*<sup>J</sup> Xj*, where {*Xj*, *j* ∈ *J*} is a family of locally dependent non-negative integer-valued random variables. Using the perturbations of Stein's operator, they establish the upper bounds for *dTV*(*W*, *M*) where the law of *M* is a mixture of Poisson distribution and either binomial or

negative binomial distribution. It would be desirable to obtain the sharp estimates and, moreover, consider a more general model where the set of summation is random. In this connection, it seems helpful to employ the paper [62], where the authors proved results concerning the weak convergence of distributions of statistics constructed from samples of random size. In addition, it would be interesting to extend these results to stratified samples by invoking Lemma 1 of [63].

Special attention is paid to various generalizations of geometric sums. In Theorem 3.3 of [64], the authors consider random sums with summation index $T_n := Y_1 + \ldots + Y_n$, where $Y_1, Y_2, \ldots$ are i.i.d. random variables following the geometric law $Geom(p)$, see Equation (2). Then, they show that $S_{T_n}/\mathbb{E}[S_{T_n}]$ converges in distribution to the gamma law with certain parameters as $p \to 0+$. In [62], it is demonstrated that the Linnik and the Mittag–Leffler laws arise naturally in the framework of limit theorems for random sums. Hopefully, in the future a complete picture of limit laws involving the general theory of distribution mixtures will appear. In addition, it is desirable to study various models of random sums of dependent random variables. On this track, it could be useful to consider the decompositions of exchangeable random sequences extending the fundamental de Finetti theorem, see, e.g., [65].

One can try to generalize the results of Section 7 for accumulative laws proposed in [66]. These laws are akin to both the Pareto distribution and the lognormal distribution. In addition, we refer to [43] where the "variance-gamma distributions" were studied. These distributions form a four-parameter family and comprise as special and limiting cases the normal, gamma and Laplace distributions. Employment of these distributions permits enlarging a range of applications in modeling and fitting real data.

To complete the indication of further research directions, we note that the next essential and nontrivial step is to establish the limit theorem in functional spaces for processes generated by a sequence of random sums of random variables. For such stochastic processes, one can obtain the analogues of the classical invariance principles.

**Author Contributions:** Conceptualization, A.B. and N.S.; methodology, A.B. and N.S.; formal analysis, A.B. and N.S.; investigation, A.B. and N.S.; writing—original draft preparation, A.B. and N.S.; writing—review and editing, A.B. and N.S.; supervision, A.B.; project administration, A.B.; funding acquisition, A.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was supported by the Lomonosov Moscow State University project "Fundamental Mathematics and Mechanics".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors are grateful to Alexander Tikhomirov for the invitation to present the manuscript for this issue. In addition, they would like to thank three anonymous Reviewers for the careful reading of the manuscript and valuable remarks.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Proof of Lemma 1.** If $Lip(h) = C < \infty$, then $h$ is absolutely continuous (see, e.g., §13 in [42]), and consequently $h'(x)$ exists for almost all $x \in \mathbb{R}$. Thus, $|h'(x)| \le C$ for almost all $x \in \mathbb{R}$ in light of Equation (4). Assume that the essential supremum $\|h'\|_\infty = C_0 < C$. Then, for any $\varepsilon > 0$, one can find a version of $h'$, defined on $\mathbb{R}$, such that $\sup_{x \in \mathbb{R}} |h'(x)| \le C_0 + \varepsilon$. (It was explained in Section 2 that one can consider a measurable extension of $h'$ to $\mathbb{R}$.) Then, due to Equation (11) with $h'$ instead of $f$, we obtain Equation (5) with $C_0 + \varepsilon$ instead of $C$. Consequently, $Lip(h) \le C_0 < C$. We come to a contradiction.

On the other hand, let $h$ be absolutely continuous. Then, for almost all $x \in \mathbb{R}$, there exists $h'(x)$, and Equation (11) is valid with $h'$ instead of $f$. Assume that the essential supremum $\|h'\|_\infty = C < \infty$. Then, for any $\varepsilon > 0$, there is a version of $h'$ such that $\sup_{x \in \mathbb{R}} |h'(x)| \le C + \varepsilon$. According to Equation (11), relation (5) holds with $C + \varepsilon$ instead of $C$. Since $\varepsilon > 0$ can be taken arbitrarily small, one can claim that $Lip(h) \le C$. Suppose that $Lip(h) \le C_0 < C$. Then, for almost all $x \in \mathbb{R}$, there exists $h'$ and $|h'| \le C_0$. Thus, we have found a version with $\|h'\|_\infty \le C_0$. The contradiction shows that $Lip(h) = C$. Hence, the desired statement is proved.

**Proof of Lemma 2.** Let $x_0$ be a continuity point of a function $h \in \mathcal{K} \cup \mathcal{H}_1 \cup \mathcal{H}_2$. Then, the same is true for the function $h(u)e^{-\lambda u}$, $u \in \mathbb{R}$. Hence, the function $\int_x^\infty h(u)e^{-\lambda u}\, du$ has the derivative $-h(x_0)e^{-\lambda x_0}$ at the point $x_0$ (in light of Remark 1, the integral $\int_x^\infty h(u)e^{-\lambda u}\, du$ is well defined for any $x \in \mathbb{R}$). Thus, for each point $x$ of continuity of $h$, there exists

$$f\_h'(\mathbf{x}) = -\lambda e^{\lambda \mathbf{x}} \int\_{\mathbf{x}}^{\infty} h(\mathbf{u}) e^{-\lambda \mathbf{u}} d\mathbf{u} - e^{\lambda \mathbf{x}} (-h(\mathbf{x}) e^{-\lambda \mathbf{x}}) = \lambda f\_h(\mathbf{x}) + h(\mathbf{x}).\tag{A1}$$

For each fixed $z \in \mathbb{R}$ and the function $h(x) = \mathbb{I}\{x \le z\}$, $x \in \mathbb{R}$, Equation (12) is verified in a similar way for the right derivative of $f_h$ at the point $z \in \mathbb{R}$. Taking $x = 0$ in Equation (12), we obtain $f_h(0) = -\mathbb{E}[h(Z)]/\lambda$. Evidently, $-e^{\lambda x}\int_x^\infty e^{-\lambda u}\, du = -1/\lambda$. Therefore, Equation (A1) yields

$$f_h'(x) = -\lambda e^{\lambda x} \int_x^\infty (h(u) - h(x))\, e^{-\lambda u}\, du. \tag{A2}$$

If a function $h$ belongs to $\mathcal{K}$ then, for any $u, x \in \mathbb{R}$, the inequality $|h(u) - h(x)| \le 1$ holds. Consequently, for $h \in \mathcal{K}$, one has $\|f_h'\|_\infty \le 1$ (where $f_h'$ means the right derivative of a version of $f_h$, and we operate with the essential supremum).

Taking into account Lemma 1, for a function $h \in \mathcal{H}_1$ and any $x \le u$, one can write $|h(u) - h(x)| \le Lip(h)(u - x) = \|h'\|_\infty (u - x)$. For $h \in \mathcal{H}_2$ and $x \le u$, by the Lagrange finite-increments formula, $|h(u) - h(x)| \le |h'(v)|(u - x) \le \|h'\|_\infty (u - x)$, where $x < v < u$. Hence, for any $x \in \mathbb{R}$ and $h \in \mathcal{H}_1 \cup \mathcal{H}_2$,

$$|f_h'(x)| = \lambda e^{\lambda x} \left| \int_x^\infty (h(u) - h(x))\, e^{-\lambda u}\, du \right| \le \lambda e^{\lambda x}\, \|h'\|_\infty \int_x^\infty (u - x)\, e^{-\lambda u}\, du = \frac{\|h'\|_\infty}{\lambda},$$

since

$$\lambda e^{\lambda x} \int_x^\infty (u - x)\, e^{-\lambda u}\, du = \int_0^\infty \lambda v\, e^{-\lambda v}\, dv = \frac{1}{\lambda}. \tag{A3}$$

Taking into account Equation (12), one can see that, for any $h \in \mathcal{H}_2$, $f_h' = \lambda f_h + h$, where $f_h$ and $h$ have derivatives at each point $x \in \mathbb{R}$. Using Equations (A2) and (A3), we obtain, for $x \in \mathbb{R}$,

$$f_h''(x) = \lambda f_h'(x) + h'(x) = -\lambda^2 e^{\lambda x} \int_x^{\infty} (h(u) - h(x))\, e^{-\lambda u}\, du + h'(x)$$

$$= -\lambda^2 e^{\lambda x} \int_x^{\infty} \big(h(u) - h(x) - h'(x)(u - x)\big)\, e^{-\lambda u}\, du. \tag{A4}$$

By means of Equation (A3) and the Lagrange finite-increments formula, we can write

$$|f_h''(x)| \le 2\|h'\|_{\infty}\, \lambda^2 e^{\lambda x} \int_x^{\infty} (u - x)\, e^{-\lambda u}\, du = 2\|h'\|_{\infty}. \tag{A5}$$

Let us apply the Taylor formula with the integral representation of the remainder term:

$$h(u) = h(x) + h'(x)(u - x) + R(u, x), \quad R(u, x) = \int_x^u (u - t)\, h''(t)\, dt, \quad u, x \in \mathbb{R}. \tag{A6}$$

This representation, known for the Riemann integral (see, e.g., [67], §9.17), holds in the framework of the Lebesgue integral if one can apply the recurrent integration by parts to *R*(*u*, *x*), i.e.,

$$\int_x^u (u - t)\, h''(t)\, dt = -h'(x)(u - x) + \int_x^u h'(t)\, dt = -h'(x)(u - x) + h(u) - h(x). \tag{A7}$$

The integral on the left-hand side of Equation (A7) exists by virtue of Lemma 1 since $h' \in Lip(1)$. Therefore, $h''(x)$ is defined for almost all *x* ∈ R and (with the essential supremum) $\|h''\|_\infty \le 1$. The latter equality in Equation (A7) is obvious since $h'$ is a continuous function on R. The first equality in Equation (A7) is valid due to the integration-by-parts formula for the Lebesgue integral. Indeed, the functions $h'(t)$ and $(u - t)$ are absolutely continuous for *t* belonging to [*x*, *u*]. Thus, we can apply, e.g., Theorem 13.29 of [42] to justify the first equality in Equation (A7). Consequently, due to Equations (A4) and (A6), one can write

$$|f_h''(x)| \le \left| -\lambda^2 e^{\lambda x} \int_x^{\infty} \left( \int_x^u (u - t)\, h''(t)\, dt \right) e^{-\lambda u}\, du \right|$$

$$\le \frac{\|h''\|_{\infty}}{2} \left| \int_x^{\infty} \lambda^2 (u - x)^2\, e^{-\lambda(u - x)}\, du \right| = \frac{\|h''\|_{\infty}\, \Gamma(3)}{2\lambda} = \frac{\|h''\|_{\infty}}{\lambda}, \tag{A8}$$

where $\Gamma(\alpha) := \int_0^{\infty} u^{\alpha-1} e^{-u}\, du$, *α* > 0. Relations (A5) and (A8) lead to the last statement of Lemma 2. The proof is complete.
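As a numerical sanity check (not part of the proof), one can evaluate (A2) directly and confirm the bound $\|f_h'\|_\infty \le \|h'\|_\infty/\lambda$ obtained above; the choices *h*(*x*) = sin *x* (so $\|h'\|_\infty = 1$) and *λ* = 2 below are purely illustrative.

```python
import numpy as np

# Evaluate f_h'(x) from (A2) by truncated trapezoidal quadrature and check
# |f_h'(x)| <= ||h'||_inf / lambda.  h(x) = sin(x) and lam = 2 are illustrative.
lam = 2.0

def f_h_prime(x, lam, h, tail=40.0, num=20001):
    # (A2): f_h'(x) = -lam * e^{lam x} * int_x^inf (h(u) - h(x)) e^{-lam u} du;
    # the factor e^{lam x} is folded into the integrand for numerical stability.
    u = np.linspace(x, x + tail / lam, num)
    g = (h(u) - h(x)) * np.exp(-lam * (u - x))
    integral = np.sum((g[1:] + g[:-1]) / 2 * np.diff(u))  # trapezoidal rule
    return -lam * integral

xs = np.linspace(-5.0, 5.0, 201)
vals = np.array([abs(f_h_prime(x, lam, np.sin)) for x in xs])
print(vals.max(), 1.0 / lam)  # the maximum stays below 1/lam = 0.5
```

For this *h*, a closed-form computation gives $f_h'(x) = \sin x - \lambda(\lambda\sin x + \cos x)/(1+\lambda^2)$, whose amplitude $\sqrt{5}/5 \approx 0.447$ indeed stays below $1/\lambda = 0.5$.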

**Comments to Definition 1**. For each Lipschitz function *f*, one can claim that E[*f*(*X*)] is finite since E|*X*| < ∞ and, in light of Remark 1, one has |*f*(*x*)| ≤ *C*|*x*| + |*f*(0)|, where *C* = *Lip*(*f*), *x* ∈ R. Clearly, it is sufficient to verify Equation (14) for any Lipschitz function *f* such that *f*(0) = 0 (otherwise we take the Lipschitz function *f*(*x*) − *f*(0), *x* ∈ R). Evidently, $p_e(x)$, *x* ∈ R, introduced by Equation (15), is a probability density because, for a non-negative random variable *X*, according to [47], Ch. 2, formula (69),

$$\mathbb{E}[X] = \int_{[0,\infty)} \mathbb{P}(X > u)\, du. \tag{A9}$$

We will show that, for such *f* and the density $p_e$ of $X_e$, one has

$$\int_{[0,\infty)} f(u)\, dF(u) = \int_{[0,\infty)} f'(u)\, \mathbb{P}(X > u)\, du, \tag{A10}$$

where *F* is the distribution function of *X* and E[*X*] > 0. We take the integrals over [0, ∞) as *X* ≥ 0 and $p_e(x) = 0$ for *x* < 0.

We know that the function *f* has a derivative at almost all points *x* ∈ R. Therefore, the right-hand side of Equation (A10) does not depend on the choice of a version of $f'$ (P(*X* > *u*) is a measurable bounded function). The integral on the right-hand side of Equation (A10) is finite because $\|f'\|_\infty \le C$ in light of Lemma 1 and since the right-hand side of Equation (A9) is finite. One can take the integrals over (0, ∞) in Equation (A10) as *f*(0) = 0 and *m*({0}) = 0, where *m* stands for the Lebesgue measure.

The function *f* is a function of finite variation (as *f* is Lipschitz). Therefore, *f* = $f_1$ − $f_2$, where $f_1$ and $f_2$ are nondecreasing functions. We can take the canonical representation with $f_1(x) = \mathrm{Var}_0^x(f)$ and $f_2(x) = f_1(x) - f(x)$, *x* ∈ R, where $\mathrm{Var}_a^b(f)$ is the variation of *f* on [*a*, *b*], *a* < *b* (see, e.g., [42], Theorem 12.18). If *f* ∈ *Lip*(*C*), then $\mathrm{Var}_a^b(f) \le C(b - a)$. For *a* < *c* < *b*, one has (see, e.g., [42], Lemma 12.15)

$$
\operatorname{Var}_a^c(f) + \operatorname{Var}_c^b(f) = \operatorname{Var}_a^b(f).
$$

We see that such $f_1$ and $f_2$ are Lipschitz functions when *f* is. Hence, for almost all *x* ∈ R, there exist $f_1'(x)$, $f_2'(x)$, and $f'(x) = f_1'(x) - f_2'(x)$. Thus, it is enough to demonstrate that

$$\int_{(0,\infty)} f_i(u)\, dF(u) = \int_{(0,\infty)} f_i'(u)\, \mathbb{P}(X > u)\, du, \quad i = 1, 2.$$

These integrals are finite since $f_1$ and $f_2$ are Lipschitz functions. Note that

$$\int_{(0,\infty)} f_i(u)\, dF(u) = -\int_{(0,\infty)} f_i(u)\, d(1 - F(u)) = -\int_{(0,\infty)} f_i(u)\, d\mathbb{P}(X > u).$$

By applying Theorem 11 of Sec. 6, Ch. 2 of [47] to the nondecreasing continuous function $f_i$ and the nondecreasing right-continuous function (−P(*X* > *u*)), one obtains, for each *b* > 0, the following formula:

$$\int_{(0,b]} f_i(u)\, d\mathbb{P}(X > u) = f_i(b)\mathbb{P}(X > b) - f_i(0)\mathbb{P}(X > 0) - \int_{(0,b]} \mathbb{P}(X > u)\, df_i(u) \tag{A11}$$

$$= f_i(b)\mathbb{P}(X > b) - \int_{(0,b]} \mathbb{P}(X > u)\, f_i'(u)\, du.$$

We take into account that $f_i(0) = 0$ and that the *σ*-finite measure $Q_i$ corresponding to $f_i$ is absolutely continuous w.r.t. *m*, with the Radon–Nikodým derivative $\frac{dQ_i}{dm}(x) = f_i'(x)$, *x* ∈ R, *i* = 1, 2. In addition, we can write P(*X* > *u*) in Equation (A11) since, for almost all *u* ∈ R, the left limit of this function coincides with P(*X* > *u*) (there exists at most a countable set of jumps of P(*X* > *u*), *u* ∈ R). Obviously, $f_i(b)\mathbb{P}(X > b) \to 0$ as *b* → ∞ because $|f_i(u)| \le A_i u + B_i$ for some positive $A_i$, $B_i$ and all *u* ∈ R. Indeed, according to formula (73) of Sec. 6, Ch. 2 of [47], the condition E|*X*| < ∞ yields

$$b\mathbb{P}(|X| > b) \to 0, \ b \to \infty.$$

By the Lebesgue dominated convergence theorem one infers that

$$\int_{(0,b]} f_i(u)\, d\mathbb{P}(X > u) \to \int_{(0,\infty)} f_i(u)\, d\mathbb{P}(X > u), \quad b \to \infty,$$

and

$$\lim_{b \to \infty} \int_{(0,b]} \mathbb{P}(X > u)\, f_i'(u)\, du = \int_{(0,\infty)} \mathbb{P}(X > u)\, f_i'(u)\, du.$$

This permits us to claim the validity of Equation (A10), which entails the desired Equation (15).
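The identity (A10) can also be illustrated numerically. The following sketch (with the illustrative choices *X* ~ Exp(1) and *f*(*u*) = 1 − e<sup>−*u*</sup>, a Lipschitz function with *f*(0) = 0, not taken from the paper) compares a Monte Carlo estimate of the left-hand side with a quadrature of the right-hand side; both equal 1/2.

```python
import numpy as np

# Check of (A10):  E[f(X)] = int_0^inf f'(u) P(X > u) du  for X >= 0, f(0) = 0.
# Illustrative case: X ~ Exp(1), f(u) = 1 - exp(-u), so f'(u) = exp(-u) and
# P(X > u) = exp(-u); both sides equal 1/2.
rng = np.random.default_rng(0)
x = rng.exponential(1.0, size=1_000_000)
lhs = np.mean(1.0 - np.exp(-x))                   # Monte Carlo E[f(X)]

u = np.linspace(0.0, 40.0, 400_001)
g = np.exp(-u) * np.exp(-u)                       # f'(u) * P(X > u)
rhs = np.sum((g[1:] + g[:-1]) / 2 * np.diff(u))   # trapezoidal rule

print(lhs, rhs)  # both close to 0.5
```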

**Proof of Lemma 3.** For *f* ∈ H<sup>2</sup>, in light of Remark 1, one can state that $|f(x)| \le A_0 x^2 + B_0$ for some positive numbers $A_0$ and $B_0$. Let *F* be the distribution function of *X*. Since E[*X*<sup>2</sup>] < ∞, due to Corollary 2, Sec. 6, Ch. 2, v. 1, [47], one has

$$x^2 F(x) \to 0, \ x \to -\infty; \quad x^2 (1 - F(x)) \to 0, \ x \to \infty.$$

Hence, we obtain that *f*(*x*)*F*(*x*) → 0 as *x* → −∞ and *f*(*x*)(1 − *F*(*x*)) → 0 as *x* → ∞. The continuous function *f* has bounded variation on each finite interval. Thus, *f* = $f_1$ − $f_2$, where $f_1$ and $f_2$ are nondecreasing continuous functions. Then, for any *a* < 0 and *i* = 1, 2, the integration-by-parts formula (see, e.g., Theorem 11, Sec. 6, Ch. 2, [47]) and Equation (18) give

$$\int_{(a,0]} (f_1(x) - f_2(x))\, dF(x) = f(0)F(0) - f(a)F(a) - \left( \int_{(a,0]} F(x)\, df_1(x) - \int_{(a,0]} F(x)\, df_2(x) \right)$$

$$= f(0)F(0) - f(a)F(a) - \int_{(a,0]} F(x)\, df(x).$$

We take into account that the integrands are bounded measurable functions and that the measures corresponding to *F*, $f_1$, and $f_2$ are finite on any interval (*a*, 0]. Therefore, such integrals are finite. According to the Lebesgue dominated convergence theorem (recall that E[*X*<sup>2</sup>] < ∞), one has

$$\lim_{a \to -\infty} \int_{(a,0]} f(x)\, dF(x) = \int_{(-\infty,0]} f(x)\, dF(x),$$

and the limit is finite. The monotone convergence theorem for *σ*-finite measures yields

$$\lim_{a \to -\infty} \left( \int_{(a,0]} F(x)\, df_1(x) - \int_{(a,0]} F(x)\, df_2(x) \right) = \int_{(-\infty,0]} F(x)\, df_1(x) - \int_{(-\infty,0]} F(x)\, df_2(x).$$

We have seen that *f*(*a*)*F*(*a*) → 0 as *a* → −∞. Hence, in light of Equation (18)

$$\int_{(-\infty,0]} F(x)\, df_1(x) - \int_{(-\infty,0]} F(x)\, df_2(x) = \int_{(-\infty,0]} F(x)\, df(x).$$

Therefore, for *i* = 1, 2, each integral $\int_{(-\infty,0]} F(x)\, df_i(x)$ is finite, as $\int_{(-\infty,0]} F(x)\, df(x)$ is finite. Thus,

$$\int_{(-\infty,0]} f(x)\, dF(x) = f(0)F(0) - \int_{(-\infty,0]} F(x)\, df(x) = f(0)F(0) + \int_{(-\infty,0]} (-F(x))\, f'(x)\, dx$$

as *f* is absolutely continuous. Indeed, for any *x* ∈ R,

$$f(x) = f(0) + \int_{(0,x]} f'(u)\, du,$$

where the (continuous) $f'$ belongs to $L_1[a, b]$ for any finite interval [*a*, *b*]. Thus, $(f')^+ \in L_1[a, b]$ and $(f')^- \in L_1[a, b]$. Set

$$f_1(x) := f(0) + \int_{(0,x]} (f'(u))^+\, du, \quad f_2(x) := \int_{(0,x]} (f'(u))^-\, du.$$

Then *f*<sup>1</sup> and *f*<sup>2</sup> are nondecreasing continuous functions on R, *f* = *f*<sup>1</sup> − *f*<sup>2</sup> and

$$\int_{(a,0]} F(x)\, df(x) = \int_{(a,0]} F(x)\, df_1(x) - \int_{(a,0]} F(x)\, df_2(x),$$

where these three integrals are finite. For (non-negative) *σ*-finite measures corresponding to *f*<sup>1</sup> and *f*2, one can write

$$\int_{(a,0]} F(x)\, df_1(x) = \int_{(a,0]} F(x)\, (f'(x))^+\, dx, \quad \int_{(a,0]} F(x)\, df_2(x) = \int_{(a,0]} F(x)\, (f'(x))^-\, dx.$$

Thus, one has

$$\int_{(a,0]} F(x)\, df(x) = \int_{(a,0]} F(x)\, (f'(x))^+\, dx - \int_{(a,0]} F(x)\, (f'(x))^-\, dx$$

$$= \int_{(a,0]} F(x)\, \big((f'(x))^+ - (f'(x))^-\big)\, dx = \int_{(a,0]} F(x)\, f'(x)\, dx. \tag{A12}$$

The bound $\|f'\|_\infty \le 1$ follows from Lemma 1. Therefore, the Lebesgue dominated convergence theorem yields (as E|*X*| < ∞)

$$\lim_{a \to -\infty} \int_{(a,0]} F(x)\, f'(x)\, dx = \int_{(-\infty,0]} F(x)\, f'(x)\, dx.$$

We have demonstrated that

$$\int_{(-\infty,0]} F(x)\, df(x) = \int_{(-\infty,0]} F(x)\, f'(x)\, dx.$$

In a similar way, we consider $\int_{(0,b]} f(x)\, d(1 - F(x))$ and, letting *b* → ∞, come to the relation

$$-\int_{(0,\infty)} f(x)\, d(1 - F(x)) = f(0)(1 - F(0)) + \int_{(0,\infty)} (1 - F(x))\, df(x)$$

$$= f(0)(1 - F(0)) + \int_{(0,\infty)} (1 - F(x))\, f'(x)\, dx.$$

This establishes Equation (21).

#### **References**


## *Article* **High-Dimensional Consistencies of KOO Methods for the Selection of Variables in Multivariate Linear Regression Models with Covariance Structures**

**Yasunori Fujikoshi <sup>1,\*</sup> and Tetsuro Sakurai <sup>2</sup>**


**Abstract:** In this paper, we consider the high-dimensional consistencies of KOO methods for selecting response variables in multivariate linear regression with covariance structures. Here, the covariance structures are considered as (1) independent covariance structure with the same variance, (2) independent covariance structure with different variances, and (3) uniform covariance structure. A sufficient condition for model selection consistency is obtained using a KOO method under a high-dimensional asymptotic framework, such that sample size *n*, the number *p* of response variables, and the number *k* of explanatory variables are large, as in *p*/*n* → *c*<sup>1</sup> ∈ (0, 1) and *k*/*n* → *c*<sup>2</sup> ∈ [0, 1), where *c*<sup>1</sup> + *c*<sup>2</sup> < 1.

**Keywords:** consistency property; covariance structures; high-dimensional asymptotic framework; KOO methods; multivariate linear regression

**MSC:** 62H12; 62H10

### **1. Introduction**

We focus on a multivariate linear regression model of *p* response variables $y_1, \ldots, y_p$ on a subset of *k* explanatory variables $x_1, \ldots, x_k$. Suppose that there are *n* observations on a *p*-dimensional response vector $y = (y_1, \ldots, y_p)'$ and a *k*-dimensional explanatory vector $x = (x_1, \ldots, x_k)'$, and let **Y** : *n* × *p* and **X** : *n* × *k* be the observation matrices of *y* and *x* with sample size *n*, respectively. The multivariate linear regression model including all the explanatory variables under normality is written as follows:

$$\mathbf{Y} \sim \mathrm{N}_{n \times p}(\mathbf{X}\Theta,\ \Sigma \otimes \mathbf{I}_n), \tag{1}$$

where Θ is a *k* × *p* unknown matrix of regression coefficients, and Σ is a *p* × *p* unknown covariance matrix that is positive definite. N*n*×*p*(·, ·) is the normal matrix distribution, such that the mean of **Y** is **X**Θ, and the covariance matrix of vec (**Y**) is Σ ⊗ **I***n*; equivalently, the rows of **Y** are independently normal with the same covariance matrix Σ. Here, vec(**Y**) is the *np* × 1 column vector that is obtained by stacking the columns of **Y** on top of one another. We assumed that rank(**X**) = *k*.
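The row-wise reading of Model (1) translates directly into a sampling recipe: each row of **Y** is an independent *p*-variate normal vector with mean given by the corresponding row of **X**Θ and common covariance Σ. A minimal sketch (all dimensions and parameter values are illustrative, not from the paper):

```python
import numpy as np

# Sample Y ~ N_{n x p}(X Theta, Sigma (x) I_n): the rows of Y are independent
# N_p((X Theta)_i, Sigma).  Dimensions and parameters are illustrative.
rng = np.random.default_rng(1)
n, p, k = 2000, 5, 3
X = rng.standard_normal((n, k))
Theta = rng.standard_normal((k, p))
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)                   # a positive definite Sigma

Y = X @ Theta + rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Sanity check: the sample covariance of the residual rows approximates Sigma.
resid = Y - X @ Theta
S = resid.T @ resid / n
print(np.linalg.norm(S - Sigma) / np.linalg.norm(Sigma))  # small relative error
```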

In multivariate linear regression, the selection of variables for the model is an important concern. One of the approaches is to first consider variable selection models and then apply model selection criteria such as AIC and BIC. Such a criterion for Full Model (1) is expressed as follows:

$$\mathrm{GIC} = -2\log L(\widehat{\Xi}) + dg, \tag{2}$$

where $L(\widehat{\Xi})$ is the maximized likelihood, $\Xi = \{\Theta, \Sigma\}$, *d* > 0 is the penalty term, and *g* is the number of unknown parameters, given by $g = kp + \frac{1}{2}p(p+1)$. For AIC and BIC, *d* is

**Citation:** Fujikoshi, Y.; Sakurai, T. High-Dimensional Consistencies of KOO Methods for the Selection of Variables in Multivariate Linear Regression Models with Covariance Structures. *Mathematics* **2023**, *11*, 671. https://doi.org/10.3390/math 11030671

Academic Editors: Alexander Tikhomirov and Liangxiao Jiang

Received: 18 November 2022 Revised: 28 December 2022 Accepted: 17 January 2023 Published: 28 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

defined as 2 and log *n*, respectively. In the selection of *k* variables *x*1, ... , *xk*, we identified {*x*1, ... , *xk*} with the index set {1, ... , *k*} ≡ *ω*, and denote GIC for subset *j* ⊂ *ω* by GIC*j*. Then, the model selection based on GIC chooses the following model:

$$
\hat{j} = \arg\min_{j} \mathrm{GIC}_j. \tag{3}
$$

Here, the minimum is usually taken over all combinations of the variables. There is a computational problem for the methods based on GIC, including the AIC and BIC methods, since we need to compute $2^k - 1$ statistics for the selection of *k* explanatory variables. To avoid this computational problem, [1] proposed a method that is essentially due to [2]. The method, which was named the knock-one-out (KOO) method by [3], determines "selection" or "no selection" for each variable by comparing the model with that variable removed against the full model. More precisely, the KOO method chooses the model or the set of variables given by

$$\hat{j} = \{ j \in \omega \mid \mathrm{GIC}_{\omega \setminus j} > \mathrm{GIC}_{\omega} \}, \tag{4}$$

where *ω*\*j* is a short expression for *ω*\{*j*}, the set obtained by removing element *j* from the set *ω*. In general, the KOO method can be combined not only with AIC but with any variable selection criterion or method.

In the literature on multivariate linear regression, numerous papers have dealt with the variable selection problem as it relates to selecting explanatory variables. When Σ is an unknown positive definite matrix, [4–6], for example, indicated that, in a high-dimensional case, AIC and $C_p$ have consistency properties, but BIC is not necessarily consistent. KOO methods in the multivariate regression model were studied by [3,7,8]. For the KOO method in discriminant analysis, see [9,10]. For a review, see [11].

In this paper, we assume that the covariance structure is one of three covariance structures: (1) an independent covariance structure with the same variance, (2) an independent covariance structure with different variances, and (3) a uniform covariance structure. The numbers of unknown parameters in covariance structures (1)–(3) are 1, *p*, and 2, respectively. Sufficient conditions for the KOO method given by (4) to be consistent are derived under a high-dimensional asymptotic framework such that sample size *n*, the number *p* of response variables, and the number *k* of explanatory variables are large, as in *p*/*n* → *c*<sup>1</sup> ∈ (0, 1) and *k*/*n* → *c*<sup>2</sup> ∈ [0, 1), where *c*<sup>1</sup> + *c*<sup>2</sup> < 1. Ref. [12] considered similar problems under covariance structures (1) and (3), as well as under (4), an autoregressive covariance structure, but did not consider them under (2). Moreover, in the study of asymptotic consistencies, they assumed that *k* was fixed, whereas in this paper *k* may tend to infinity, such that *k*/*n* → *c*<sup>2</sup> ∈ [0, 1). From the numerical experiments in [12], we know the probability of choosing the true model by the KOO method in Cases (1), an independent covariance structure with the same variance, and (3), a uniform covariance structure; it is shown in the following table (Table 1).



In this table (Table 1), *k* is the number of nonzero true explanatory variables, and the true parameter values were omitted. In [12], *k* was treated as finite. In this paper, *k* may tend to infinity, such that *k*/*n* → *c*<sup>2</sup> ∈ [0, 1).

The present paper is organized as follows. In Section 2, we present notations and preliminaries. In Section 3, we state KOO methods with Covariance Structures (1)–(3) in terms of key statistics. Further, an approach for their consistencies is stated in Section 3. In Sections 4–6, we discuss consistency properties of KOO methods under Covariance Structures (1)–(3). In Section 7, our conclusions are discussed.

#### **2. Notations and Preliminaries**

Suppose that *j* denotes a subset of *ω* = {1, ... , *k*} containing *k<sup>j</sup>* elements, and **X***<sup>j</sup>* denotes the *n* × *k<sup>j</sup>* matrix comprising the columns of **X** indexed by the elements of *j*. Then, **X***ω* = **X**. Further, we assumed that covariance matrix Σ had a covariance structure Σ*c*. Then, we have a generic candidate model:

$$M_{c,j}: \quad \mathbf{Y} \sim \mathrm{N}_{n \times p}(\mathbf{X}_j \Theta_j,\ \Sigma_{c,j} \otimes \mathbf{I}_n), \tag{5}$$

where Θ*<sup>j</sup>* is a *k<sup>j</sup>* × *p* unknown matrix of regression coefficients. We assumed that rank(**X**) = *k*. When Σ*c*,*<sup>j</sup>* is a *p* × *p* unknown covariance matrix, we could write the GIC in (2) as follows:

$$\mathrm{GIC}_{c,j} = n \log|\widehat{\Sigma}_j| + np(\log 2\pi + 1) + d\left\{ k_j p + \frac{1}{2}p(p+1) \right\}, \tag{6}$$

where $n\widehat{\Sigma}_j = \mathbf{Y}'(\mathbf{I}_n - \mathbf{P}_j)\mathbf{Y}$ and $\mathbf{P}_j = \mathbf{X}_j(\mathbf{X}_j'\mathbf{X}_j)^{-1}\mathbf{X}_j'$. When *j* = *ω*, model $M_{c,\omega}$ is called the full model. $\widehat{\Sigma}_{c,\omega}$ and $\mathbf{P}_\omega$ are defined from $\widehat{\Sigma}_{c,j}$ and $\mathbf{P}_j$ with *j* = *ω*, $k_\omega = k$, and $\mathbf{X}_\omega = \mathbf{X}$.

In this paper, we considered the cases in which the covariance matrix Σ*<sup>c</sup>* belonged to each of the following three structures:

(1) Independent covariance structure with the same variance (ICSS).

$$
\Sigma_v = \sigma_v^2 \mathbf{I}_p,
$$

(2) Independent covariance structure with different variances (ICSD).

$$\Sigma_b = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_p^2),$$

(3) Uniform covariance structure (UCS).

$$
\Sigma_u = \sigma_u^2 \left( \rho_u^{1 - \delta_{ij}} \right)_{1 \le i, j \le p}.
$$
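As explicit matrices, the three structures look as follows (the parameter values below are illustrative); note the parameter counts 1, *p*, and 2 mentioned in the Introduction, versus *p*(*p* + 1)/2 for an unstructured Σ.

```python
import numpy as np

# The three covariance structures as p x p matrices (parameter values illustrative).
p = 4
sigma2 = 2.0

Sigma_v = sigma2 * np.eye(p)                      # (1) ICSS: 1 parameter

variances = np.array([1.0, 2.0, 3.0, 4.0])
Sigma_b = np.diag(variances)                      # (2) ICSD: p parameters

rho = 0.3                                         # (3) UCS: 2 parameters
# sigma_u^2 * rho^{1 - delta_ij}: sigma_u^2 on the diagonal, sigma_u^2 * rho off it.
Sigma_u = sigma2 * np.where(np.eye(p, dtype=bool), 1.0, rho)

print(Sigma_u)
```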

The models considered in this paper can be expressed as in (5) with Σ*v*,*j*, Σ*b*,*j*, and Σ*u*,*<sup>j</sup>* for Σ*c*,*j*. Let *f*(**Y**; Θ*j*, Σ*c*,*j*) be the density of **Y** in (5) with Σ = Σ*c*,*j*. In the derivation of the GIC, under the covariance structure Σ = Σ*c*,*j*, we use the following equality:

$$\begin{split} -2\log\max_{\Theta_j, \Sigma_{c,j}} f(\mathbf{Y};\Theta_j,\Sigma_{c,j}) &= np\log(2\pi) \\ &+\min_{\Sigma_{c,j}} \Big\{ n\log|\Sigma_{c,j}| + \mathrm{tr}\,\Sigma_{c,j}^{-1}\mathbf{Y}'(\mathbf{I}_n-\mathbf{P}_j)\mathbf{Y} \Big\}. \end{split} \tag{7}$$

Let $\widehat{\Sigma}_{c,j}$ be the quantity minimizing the right-hand side of (7). Then, in our models, it satisfies $\mathrm{tr}\,\widehat{\Sigma}_{c,j}^{-1}\mathbf{Y}'(\mathbf{I}_n - \mathbf{P}_j)\mathbf{Y} = np$, and we obtain

$$\begin{split} \mathrm{GIC}_{c,j} &= -2\log f(\mathbf{Y}; \widehat{\Theta}_j, \widehat{\Sigma}_{c,j}) + d\, m_{c,j} \\ &= n\log|\widehat{\Sigma}_{c,j}| + np(\log 2\pi + 1) + d\, m_{c,j}, \end{split} \tag{8}$$

where *mc*,*<sup>j</sup>* is the number of independent unknown parameters under *Mc*,*j*, and *d* is a positive constant that may depend on *n*. For AIC and BIC, *d* is defined by 2 ([13]) and log *n* ([14]), respectively.

#### **3. Approach to Consistencies of KOO Methods**

Our KOO method is based on

$$T_{c,j;d} = \mathrm{GIC}_{c,\omega\setminus j} - \mathrm{GIC}_{c,\omega}. \tag{9}$$

In fact, the KOO method chooses the following model:

$$\hat{j}_{c;d} = \left\{ j \mid T_{c,j;d} > 0 \right\}. \tag{10}$$

Its consistency can be proven by showing the following two properties:

$$\mathrm{Q1}: \quad [\mathrm{F1}] \equiv \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) \to 0, \tag{11}$$

$$\mathrm{Q2}: \quad [\mathrm{F2}] \equiv \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0) \to 0, \tag{12}$$

as in [11]. The result can be shown by using the following inequality:

$$\begin{split} \Pr(\hat{j}_{c;d} = j_*) &= \Pr\left( \bigcap_{j \in j_*} \{T_{c,j;d} > 0\} \cap \bigcap_{j \notin j_*} \{T_{c,j;d} < 0\} \right) \\ &= 1 - \Pr\left( \bigcup_{j \in j_*} \{T_{c,j;d} \le 0\} \cup \bigcup_{j \notin j_*} \{T_{c,j;d} \ge 0\} \right) \\ &\ge 1 - \sum_{j \in j_*} \Pr(T_{c,j;d} \le 0) - \sum_{j \notin j_*} \Pr(T_{c,j;d} \ge 0). \end{split}$$

Here, [F1] denotes the probability that true variables are not selected, and [F2] denotes the probability that nontrue variables are selected. Such notations are used for other variable selection methods as well. $x_j$ is included in the true set of variables if $\theta_j \ne \mathbf{0}$.

Here, we list some of our main assumptions:

A1: The set $j_*$ of the true explanatory variables is included in the full set, i.e., $j_* \subset \omega$, and the set $j_*$ is finite.

A2: The high-dimensional asymptotic framework: *p* → ∞, *n* → ∞, *k* → ∞, *p*/*n* → *c*<sup>1</sup> ∈ (0, 1), *k*/*n* → *c*<sup>2</sup> ∈ [0, 1), where 0 < *c*<sup>1</sup> + *c*<sup>2</sup> < 1.

A general model selection criterion $\hat{j}_{c;d}$ is high-dimensionally consistent if

$$\lim \Pr(\hat{j}_{c;d} = j_*) = 1,$$

under a high-dimensional asymptotic framework. Here, "lim" means the limit under A2.

#### **4. Asymptotic Consistency under an Independent Covariance Structure**

In this section, we show an asymptotic consistency of the KOO method on the basis of a general information criterion under an independent covariance structure. A generic candidate model when the set of explanatory variables is *j* can be expressed as follows:

$$M_{v,j}: \quad \mathbf{Y} \sim \mathrm{N}_{n \times p}(\mathbf{X}_j \Theta_j,\ \Sigma_{v,j} \otimes \mathbf{I}_n), \tag{13}$$

where $\Sigma_{v,j} = \sigma_{v,j}^2 \mathbf{I}_p$ and $\sigma_{v,j}^2 > 0$. Let us denote the density of **Y** under (13) by $f(\mathbf{Y}; \Theta_j, \sigma_{v,j}^2)$. Then, we have

$$\begin{aligned} -2\log f(\mathbf{Y}; \Theta_j, \sigma_{v,j}^2) &= np\log(2\pi) + np\log\sigma_{v,j}^2 \\ &+ \frac{1}{\sigma_{v,j}^2}\,\mathrm{tr}\,(\mathbf{Y} - \mathbf{X}_j\Theta_j)'(\mathbf{Y} - \mathbf{X}_j\Theta_j). \end{aligned}$$

Therefore, the maximum likelihood estimators of $\Theta_j$ and $\sigma_{v,j}^2$ under $M_{v,j}$ are given as follows:

$$
\widehat{\Theta}_j = (\mathbf{X}_j'\mathbf{X}_j)^{-1}\mathbf{X}_j'\mathbf{Y}, \quad \widehat{\sigma}_{v,j}^2 = \frac{1}{np}\,\mathrm{tr}\,\mathbf{Y}'(\mathbf{I}_n - \mathbf{P}_j)\mathbf{Y}. \tag{14}
$$

General Information Criterion (8) is given by

$$\mathrm{GIC}_{v,j} = np\log\widehat{\sigma}_{v,j}^2 + np(\log 2\pi + 1) + d\, m_{v,j}, \tag{15}$$

where *d* is a positive constant, and $m_{v,j} = k_j p + 1$.

Using (9) and (15), we have

$$\begin{split} T_{v,j;d} &\equiv \mathrm{GIC}_{v,\omega\setminus j} - \mathrm{GIC}_{v,\omega} \\ &= np\log\left(1 + U_{2j} U_1^{-1}\right) - dp, \end{split} \tag{16}$$

where

$$\begin{aligned} U_1 &= \mathrm{tr}\,\mathbf{Y}'(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{Y} = \sum_{\ell=1}^{p} \mathbf{y}_\ell'(\mathbf{I}_n - \mathbf{P}_\omega)\mathbf{y}_\ell, \\ U_{2j} &= \mathrm{tr}\,\mathbf{Y}'(\mathbf{P}_\omega - \mathbf{P}_{\omega\setminus j})\mathbf{Y} = \sum_{\ell=1}^{p} \mathbf{y}_\ell'(\mathbf{P}_\omega - \mathbf{P}_{\omega\setminus j})\mathbf{y}_\ell. \end{aligned}$$

$U_1/\sigma_{v,j_*}^2$ and $U_{2j}/\sigma_{v,j_*}^2$ are independently distributed as a central and a noncentral chi-squared distribution, respectively. More precisely, assume that

$$\mathrm{E}(\mathbf{Y}) = \mathbf{X}_{j_*}\Theta_{j_*}, \tag{17}$$

and let $\sigma_{v,*}^2 = \sigma_{v,j_*}^2$. Then, using basic distributional properties (see [15]) of quadratic forms of normal variates and Wishart matrices, we have the following results:

$$(1)\ U_1/\sigma_{v,*}^2 \sim \chi^2_{(n-k)p}, \quad (2)\ U_{2j}/\sigma_{v,*}^2 \sim \chi^2_p(\delta_{v,j}^2), \quad (3)\ U_1 \perp U_{2j}, \tag{18}$$

where the noncentrality parameter $\delta_{v,j}^2$ is defined by

$$\delta_{v,j}^2 = \frac{1}{\sigma_{v,*}^2}\,\mathrm{tr}\,(\mathbf{X}_{j_*}\Theta_{j_*})'(\mathbf{P}_\omega - \mathbf{P}_{\omega\setminus j})\,\mathbf{X}_{j_*}\Theta_{j_*}.$$

If $j \notin j_*$, $\delta_{v,j}^2 = 0$, and if $j \in j_*$, in general, $\delta_{v,j}^2 \ne 0$. For a sufficient condition for the consistency of the KOO method based on $\mathrm{GIC}_{v,j}$, we assume

$$\mathrm{A3v}: \ \text{for any } j \in j_*, \ \delta_{v,j}^2 = \mathrm{O}(np), \ \text{and} \ \lim_{p/n \to c_1} \frac{1}{np}\,\delta_{v,j}^2 = \eta_{v,j}^2 > 0. \tag{19}$$
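The distributional results (18) can be checked by a small Monte Carlo experiment; the sketch below (with illustrative dimensions and an illustrative true set $j_*$) compares the empirical means of $U_1/\sigma_{v,*}^2$ and, for an inactive variable, $U_{2j}/\sigma_{v,*}^2$ with the means $(n-k)p$ and *p* of the corresponding chi-squared laws.

```python
import numpy as np

# Monte Carlo check of (18) under ICSS: U1/sigma^2 ~ chi2_{(n-k)p} and, for
# j outside j_*, U2j/sigma^2 ~ chi2_p (central).  Setup is illustrative.
rng = np.random.default_rng(2)
n, p, k = 50, 4, 3
sigma2 = 2.0
X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[:2] = 1.0                                   # first two variables active; the third inactive

def proj(A):
    # Orthogonal projector onto the column space of A.
    return A @ np.linalg.solve(A.T @ A, A.T)

P_w, P_wj = proj(X), proj(X[:, :2])               # full model and omega \ {3}
u1, u2 = [], []
for _ in range(2000):
    Y = X @ Theta + np.sqrt(sigma2) * rng.standard_normal((n, p))
    u1.append(np.trace(Y.T @ (np.eye(n) - P_w) @ Y) / sigma2)
    u2.append(np.trace(Y.T @ (P_w - P_wj) @ Y) / sigma2)

print(np.mean(u1), (n - k) * p)                   # empirical mean vs (n-k)p = 188
print(np.mean(u2), p)                             # empirical mean vs p = 4
```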

Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $\mathrm{GIC}_{v,j}$ in (15), whose selection method is given by $\hat{j}_{v;d} = \{j \mid T_{v,j;d} > 0\}$. When $j \notin j_*$, from (16), we can write

$$T_{v,j;d} = np\log\left\{1 + \chi_p^2/\chi_m^2\right\} - dp, \quad m = (n-k)p.$$

Therefore, we have

$$\begin{split} [\mathrm{F2}] &= \sum_{j \notin j_*} \Pr(np\log\left\{1 + \chi_p^2/\chi_m^2\right\} \ge dp) \\ &= (k - k_{j_*})\Pr(U \ge h) \\ &\le (k - k_{j_*})\Pr(U \ge h_0), \end{split} \tag{20}$$

where

$$U = \frac{\chi_p^2}{\chi_m^2} - \frac{p}{m-2}, \quad h = e^{d/n} - 1 - \frac{p}{m-2}, \quad h_0 = \frac{d}{n} - \frac{p}{m-2}. \tag{21}$$

Note that *h*<sup>0</sup> < *h*. Then, under the assumption *h*<sup>0</sup> > 0, we have

$$[\mathrm{F2}] \le \left(k - k_{j_*}\right) h^{-2\ell}\, \mathrm{E}[U^{2\ell}] \le \left(k - k_{j_*}\right) h_0^{-2\ell}\, \mathrm{E}[U^{2\ell}]. \tag{22}$$

Related to the assumption *h*<sup>0</sup> > 0, we assumed

$$\mathrm{A4v}: \ d > \frac{np}{m-2} \to \frac{1}{1-c_2}, \ \text{and} \ d = \mathrm{O}(n^a), \quad 0 < a < 1. \tag{23}$$

The first part in A4v implies *h*<sup>0</sup> > 0. It is easy to see that

$$\mathrm{E}[U^2] = \frac{2p(m+p-2)}{(m-2)^2(m-4)} = \mathrm{O}((n^2 p)^{-1}).$$

Here, for the first equality, the assumption *m* > 4 is required. Further, $h_0^{-2} = \mathrm{O}(n^{2(1-a)})$. Therefore, from (22), we have that [F2] → 0.
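The moment formula for E[*U*<sup>2</sup>] used above is easy to verify by simulation; the values of *p* and *m* below are illustrative.

```python
import numpy as np

# Monte Carlo check of E[U^2] = 2p(m + p - 2) / ((m-2)^2 (m-4)) for
# U = chi2_p / chi2_m - p/(m-2), with independent numerator and denominator.
rng = np.random.default_rng(4)
p, m, reps = 10, 100, 400_000
U = rng.chisquare(p, reps) / rng.chisquare(m, reps) - p / (m - 2)
emp = np.mean(U**2)
theo = 2 * p * (m + p - 2) / ((m - 2) ** 2 * (m - 4))
print(emp, theo)  # close agreement
```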

When $j \in j_*$, we can write $T_{v,j;d} = np\log\big\{1 + \chi_p^2(\delta_{v,j}^2)/\chi_m^2\big\} - dp$. Therefore, we can express [F1] as

$$[\mathrm{F1}] = \sum_{j \in j_*} \Pr(\widetilde{T}_{v,j;d} \le 0),$$

where

$$
\widetilde{T}_{v,j;d} = \frac{p}{n}\log\left\{1 + \frac{\chi_p^2(\delta_{v,j}^2)}{\chi_m^2}\right\} - \frac{d}{n}.
$$

Assumptions A3v and A4v easily show that

$$
\tilde{T}\_{v,j;d} \to c\_1 \log(1 + \eta\_{v,j}^2) > 0.
$$

This implies that $\Pr(\tilde{T}\_{v,j;d} \le 0) \to 0$.

These imply the following theorem.

**Theorem 1.** *Suppose that Assumptions* A1, A2, A3v, *and* A4v *are satisfied. Then, the* KOO *method based on the general information criterion* GIC*v*,*<sup>j</sup>* *defined by* (15) *is asymptotically consistent.*
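The KOO selection rule behind Theorem 1 is straightforward to implement: compute $T\_{v,j;d}$ for each candidate variable $j$ and keep the variables with $T\_{v,j;d} > 0$. Below is a minimal simulation sketch under the ICSS model; the dimensions, signal strength, and penalty choice $d = 2\log n$ are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 200, 4, 6          # sample size, number of responses, candidate regressors
d = 2 * np.log(n)            # illustrative BIC-like penalty, d = O(n^a) with a < 1
j_true = [0, 1]              # indices of the truly active regressors

X = rng.standard_normal((n, k))
Theta = np.zeros((k, p))
Theta[j_true, :] = 2.0       # strong signal on the active rows
Y = X @ Theta + rng.standard_normal((n, p))   # ICSS errors: Sigma = I_p

def rss(X_sub):
    """tr Y'(I - P)Y, where P projects onto the column space of X_sub."""
    Q, _ = np.linalg.qr(X_sub)
    R = Y - Q @ (Q.T @ Y)
    return np.sum(R * R)

s_full = rss(X)              # residual sum under the full model omega
selected = []
for j in range(k):
    s_drop = rss(np.delete(X, j, axis=1))     # model omega \ j
    T = n * p * np.log(s_drop / s_full) - d * p
    if T > 0:
        selected.append(j)
```

With this signal strength, `selected` recovers the active set `[0, 1]`.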

An alternative approach for "[F1] → 0": when $j \in j\_\*$, we can write

$$T\_{v,j;d} = np \log \left\{ 1 + \chi\_p^2(\delta\_{v,j}^2) / \chi\_m^2 \right\} - dp.$$

Therefore, we have

$$\begin{aligned} [\text{F1}] &= \sum\_{j \in j\_\*} \Pr(np \log\left\{1 + \chi\_p^2(\delta\_{v,j}^2) / \chi\_m^2\right\} \le dp) \\ &= \sum\_{j \in j\_\*} \Pr(\tilde{\mathcal{U}}\_j \le \tilde{h}\_j), \end{aligned}$$

where, for *j* ∈ *j*∗,

$$
\tilde{\mathcal{U}}\_j = \frac{\chi\_p^2(\delta\_{v,j}^2)}{\chi\_m^2} - \frac{p + \delta\_{v,j}^2}{m - 2}, \quad \tilde{h}\_j = \mathrm{e}^{d/n} - 1 - \frac{p + \delta\_{v,j}^2}{m - 2} = h - \frac{\delta\_{v,j}^2}{m - 2}.
$$

Then, under $d = \mathrm{O}(n^a)$ $(0 < a < 1)$, A3v in (19), and the assumption $\tilde{h}\_j < 0$ (or equivalently $h < \delta\_{v,j}^2/(m-2)$), we have

$$[\text{F1}] \le k\_{j\_\*} \max\_{j} |\tilde{h}\_j|^{-2\ell}\, \mathbb{E}[\tilde{\mathcal{U}}\_j^{2\ell}].$$

It is easily seen that

$$\mathbb{E}[\tilde{\mathcal{U}}\_j^2] = \frac{2(p + 2\delta\_{v,j}^2)(m + p - 2 + \delta\_{v,j}^2)}{(m - 2)^2(m - 4)} = \mathrm{O}((n^2 p)^{-1}),$$

where the assumption $m > 4$ is required; moreover, under $d = \mathrm{O}(n^a)$ $(0 < a < 1)$ and A3v,

$$|\tilde{h}\_j|^2 \to \frac{\eta\_{v,j}^2}{c\_1(1-c\_2)}.$$

These imply that [F1] → 0. In this approach, it was assumed that $\tilde{h}\_j < 0$ (or equivalently $h < \delta\_{v,j}^2/(m-2)$).

#### **5. Asymptotic Consistency under an Independent Covariance Structure with Different Variances**

In this section, we assume that the covariance matrix Σ has an independent covariance structure with different variances, i.e., $\Sigma = \Sigma\_b = \mathrm{diag}(\sigma\_{b1}^2, \ldots, \sigma\_{bp}^2)$. First, let us derive the key statistic $T\_{b,j;d} = \text{GIC}\_{b,\omega\backslash j} - \text{GIC}\_{b,\omega}$. Consider a candidate model with E(**Y**) = **X**Θ,

$$M\_{b,\omega}: \mathbf{Y} \sim \mathrm{N}\_{n \times p}(\mathbf{X}\boldsymbol{\Theta}, \Sigma\_b \otimes \mathbf{I}\_n). \tag{24}$$

Let the density in the full model be expressed as *f*(**Y**; Θ, Σ*b*). Then, we have

$$\begin{split} -2\log f(\mathbf{Y}; \Theta, \Sigma\_b) &= np \log(2\pi) \\ &+ \sum\_{\ell=1}^{p} \left\{ n \log \sigma\_{b\ell}^{2} + \frac{1}{\sigma\_{b\ell}^{2}} (\mathbf{y}\_{\ell} - \mathbf{X}\theta\_{\ell})'(\mathbf{y}\_{\ell} - \mathbf{X}\theta\_{\ell}) \right\}. \end{split}$$

It holds that

$$\begin{split} -2\log\max\_{\Theta,\Sigma\_b} f(\mathbf{Y};\Theta,\Sigma\_b) &= np(\log 2\pi+1) \\ &+ \sum\_{\ell=1}^{p} n\log\frac{1}{n}\mathbf{y}\_{\ell}'(\mathbf{I}\_n-\mathbf{P}\_{\omega})\mathbf{y}\_{\ell}. \end{split}\tag{25}$$

Next, consider the model obtained by removing the *j*th explanatory variable from the full model $M\_{b,\omega}$, denoted by $M\_{b,\omega\backslash j}$. Similarly,

$$\begin{split} -2\log\max\_{\Theta\_{\omega\backslash j},\Sigma\_b} f(\mathbf{Y};\Theta\_{\omega\backslash j},\Sigma\_b) &= np(\log 2\pi+1) \\ &+ \sum\_{\ell=1}^{p} n\log\frac{1}{n}\mathbf{y}\_{\ell}'(\mathbf{I}\_n-\mathbf{P}\_{\omega\backslash j})\mathbf{y}\_{\ell}. \end{split}\tag{26}$$

Using (25) and (26), we can obtain a general information criterion (8) for two models, *Mb*,*ω* and *Mb*,*ω*\*j*, and we have

$$\begin{split} T\_{b,j;d} &\equiv \text{GIC}\_{b,\omega\backslash j} - \text{GIC}\_{b,\omega} \\ &= \sum\_{\ell=1}^{p} n \log\left(1 + \mathcal{U}\_{2\ell}\mathcal{U}\_{1\ell}^{-1}\right) - dp, \end{split} \tag{27}$$

where

$$\begin{aligned} \mathcal{U}\_{1\ell} &= \mathbf{y}\_{\ell}'(\mathbf{I}\_n - \mathbf{P}\_{\omega})\mathbf{y}\_{\ell}, \quad \ell = 1, \ldots, p, \\ \mathcal{U}\_{2\ell} &= \mathbf{y}\_{\ell}'(\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j})\mathbf{y}\_{\ell}, \quad \ell = 1, \ldots, p. \end{aligned}$$

Let us assume that

$$\mathbb{E}(\mathbf{Y}) = \mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*} \text{ and } \sigma\_{b,\*}^{2} = \sigma\_{b,j\_\*}^{2}. \tag{28}$$

Then, as in (18), we have the following results:

$$\begin{aligned} &(1)\ \mathcal{U}\_{1\ell}/\sigma\_{b,\*}^2 \sim \chi^2\_{n-k}, \quad \ell = 1, \ldots, p, \\ &(2)\ \mathcal{U}\_{2\ell}/\sigma\_{b,\*}^2 \sim \chi^2\_1(\delta^2\_{b,j;\ell}), \quad \ell = 1, \ldots, p, \\ &(3)\ \mathcal{U}\_{1\ell}, \mathcal{U}\_{2\ell}\ (\ell = 1, \ldots, p) \text{ are independent,} \end{aligned} \tag{29}$$

where the noncentrality parameters $\delta^2\_{b,j;\ell}$ are defined by

$$
\delta\_{b,j;\ell}^2 = \frac{1}{\sigma\_{b,\*}^2} (\mathbf{X}\_{j\_\*} \theta\_\*^{(\ell)})' (\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j}) (\mathbf{X}\_{j\_\*} \theta\_\*^{(\ell)}),
$$

with $\Theta\_\* = (\theta\_\*^{(1)}, \ldots, \theta\_\*^{(p)})$. If $j \notin j\_\*$, then $\delta^2\_{b,j;\ell} = 0$, and if $j \in j\_\*$, then $\delta^2\_{b,j;\ell} \ne 0$. For a sufficient condition for consistency of the KOO method based on $\text{GIC}\_{b,j}$, we assumed

$$\text{A3b}: \text{For any } j \in j\_\*, \ \lim\,(n-k)^{-1}\delta\_{b,j;\ell}^2 = \eta\_{b,j;\ell}^2 > 0, \text{ and}$$

$$\lim \frac{1}{p} \sum\_{\ell=1}^p \log\left\{1 + \frac{1}{n-k}\delta\_{b,j;\ell}^2\right\} = \eta\_{b,j}^2 > 0. \tag{30}$$

Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T\_{b,j;d}$ in (9), whose selection method is given by $\hat{j}\_{b;d} = \{j \mid T\_{b,j;d} > 0\}$. When $j \notin j\_\*$, we have

$$\begin{aligned} [\text{F2}] &= \sum\_{j \notin j\_\*} \Pr\left(\sum\_{\ell=1}^p n \log\left\{1 + \mathcal{U}\_{2\ell}\mathcal{U}\_{1\ell}^{-1}\right\} \ge dp\right) \\ &\le \sum\_{j \notin j\_\*} \sum\_{\ell=1}^p \Pr\left(n \log\left\{1 + \mathcal{U}\_{2\ell}\mathcal{U}\_{1\ell}^{-1}\right\} \ge d\right). \end{aligned}$$

This implies that

$$\begin{split} [\text{F2}] &\le p(k - k\_{j\_\*}) \Pr(n \log\left\{1 + \chi\_1^2/\chi\_{n-k}^2\right\} \ge d) \\ &= p(k - k\_{j\_\*}) \Pr(V \ge r), \end{split} \tag{31}$$

where

$$\begin{split} V &= \frac{\chi\_1^2}{\chi\_{n-k}^2} - \frac{1}{n-k-2}, \\ r &= \mathbf{e}^{d/n} - 1 - \frac{1}{n-k-2}, \quad r\_0 = \frac{d}{n} - \frac{1}{n-k-2}. \end{split} \tag{32}$$

Note that $r\_0 < r$. Then, under the assumption $r\_0 > 0$, we have

$$[\text{F2}] \le p\left(k - k\_{j\_\*}\right) r^{-2\ell}\, \mathbb{E}[V^{2\ell}] \le p\left(k - k\_{j\_\*}\right) r\_0^{-2\ell}\, \mathbb{E}[V^{2\ell}].\tag{33}$$

Related to the assumption *r*<sup>0</sup> > 0, we assumed

$$\text{A4b}: \ d > \frac{n}{n-k-2} \to \frac{1}{1-c\_2}, \text{and } d = \text{O}(n^a), \quad 0 < a < 1. \tag{34}$$

The first part in A4b implies $r\_0 > 0$. It is easy to see that

$$\operatorname{E}[V^2] = \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = \operatorname{O}((n^2)^{-1}).$$

Further, $r\_0^{-2} = \mathrm{O}(n^{2(1-a)})$. Therefore, from (33), we have that [F2] → 0.
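The closed form for $\mathrm{E}[V^2]$ can also be verified exactly from the standard chi-square moments $\mathrm{E}[(\chi^2\_1)^2] = 3$, $\mathrm{E}[(\chi^2\_q)^{-1}] = 1/(q-2)$, and $\mathrm{E}[(\chi^2\_q)^{-2}] = 1/((q-2)(q-4))$, together with the independence in (29). A sketch using exact rational arithmetic (the $(n, k)$ pairs are arbitrary test values):

```python
from fractions import Fraction

def ev2(n, k):
    """E[V^2] assembled from standard chi-square moments, where
    V = chi2_1 / chi2_{n-k} - 1/(n-k-2) with independent numerator/denominator."""
    q = n - k
    e_r = Fraction(1, q - 2)                  # E[chi2_1 / chi2_q]
    e_r2 = Fraction(3, (q - 2) * (q - 4))     # E[(chi2_1 / chi2_q)^2]
    return e_r2 - 2 * e_r * Fraction(1, q - 2) + Fraction(1, (q - 2) ** 2)

def ev2_paper(n, k):
    """The closed form stated in the text: 2(n-k-1)/((n-k-2)^2 (n-k-4))."""
    q = n - k
    return Fraction(2 * (q - 1), (q - 2) ** 2 * (q - 4))

agree = all(ev2(n, k) == ev2_paper(n, k) for n, k in [(50, 5), (200, 10), (1000, 3)])
```

The two expressions agree exactly for every admissible $(n, k)$, since $3(q-2) - (q-4) = 2(q-1)$.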

When $j \in j\_\*$, we can write $T\_{b,j;d} = n \sum\_{\ell=1}^{p} \log\{1 + \mathcal{U}\_{2\ell}\mathcal{U}\_{1\ell}^{-1}\} - dp$. Therefore, we can express [F1] as follows:

$$[\text{F1}] = \sum\_{j \in j\_\*} \Pr(\tilde{T}\_{b,j;d} \le 0),$$

where

$$\tilde{T}\_{b,j;d} = \frac{1}{p} \sum\_{\ell=1}^p \log\left\{1 + \frac{\chi^2\_{1;\ell}(\delta^2\_{b,j;\ell})}{\chi^2\_{n-k;\ell}}\right\} - \frac{d}{n}.$$

Assumptions A3b and A4b easily show that

$$
\bar{T}\_{b,j;d} \to \eta\_{b,j}^2 > 0.
$$

This implies that $\Pr(\tilde{T}\_{b,j;d} \le 0) \to 0$. These imply the following theorem.

**Theorem 2.** *Suppose that Assumptions* A1, A2, A3b, *and* A4b *are satisfied. Then, the* KOO *method based on* $T\_{b,j;d}$ *in* (27) *is asymptotically consistent.*

Let us consider an alternative approach for "[F1] → 0", as in the case of the independent covariance structure with the same variance. When $j \in j\_\*$, we can write

$$\begin{split} [\text{F1}] &= \sum\_{j \in j\_\*} \Pr\left(\sum\_{\ell=1}^{p}\left\{n\log\left(1 + \frac{\chi^2\_{1;\ell}(\delta^2\_{b,j;\ell})}{\chi^2\_{n-k;\ell}}\right) - d\right\} \le 0\right) \\ &\le \sum\_{j \in j\_\*}\sum\_{\ell=1}^{p}\Pr\left(n\log\left(1 + \frac{\chi^2\_{1;\ell}(\delta^2\_{b,j;\ell})}{\chi^2\_{n-k;\ell}}\right) - d \le 0\right) \\ &= \sum\_{j \in j\_\*}\sum\_{\ell=1}^{p}\Pr\left(\tilde{V}\_{j,\ell} \le \tilde{r}\_{j,\ell}\right). \end{split}$$

Here, for *j* ∈ *j*∗,

$$\begin{aligned} \tilde{V}\_{j,\ell} &= \frac{\chi^2\_{1;\ell}(\delta^2\_{b,j;\ell})}{\chi^2\_{n-k;\ell}} - \frac{1 + \delta^2\_{b,j;\ell}}{n-k-2}, \quad \ell = 1, \ldots, p, \\ \tilde{r}\_{j,\ell} &= \mathrm{e}^{d/n} - 1 - \frac{1 + \delta^2\_{b,j;\ell}}{n-k-2} = r - \frac{\delta^2\_{b,j;\ell}}{n-k-2}, \quad \ell = 1, \ldots, p, \end{aligned}$$

where $r$ is the same as in (32). Note that $\chi^2\_{1;\ell}(\delta^2\_{b,j;\ell})$, $\ell = 1, \ldots, p$, are distributed as noncentral chi-squared distributions $\chi^2\_1(\delta^2\_{b,j;\ell})$ and are independent. Then, under the assumption $\tilde{r}\_{j,\ell} < 0$ (or equivalently $r < \delta^2\_{b,j;\ell}/(n-k-2)$), we have

$$[\text{F1}] \le k\_{j\_\*}\sum\_{\ell=1}^{p}|\tilde{r}\_{j,\ell}|^{-2s}\,\mathbb{E}[\tilde{V}\_{j,\ell}^{2s}], \quad s = 1, 2, \ldots \tag{35}$$

In the above upper bound, it holds that

$$|\tilde{r}\_{j,\ell}| \sim \delta\_{b,j;\ell}^2/(n-k) \to \eta\_{b,j;\ell}^2. \tag{36}$$

Useful bounds are obtained from the first few moments of $\tilde{V}\_{j,\ell}$. For example,

$$\begin{aligned} \mathbb{E}[\tilde{V}^2\_{j,\ell}] &= \frac{2(1+2\delta^2\_{b,j;\ell})(n-k-1+\delta^2\_{b,j;\ell})}{(n-k-2)^2(n-k-4)} = \mathrm{O}(n^{-1}), \\ \mathbb{E}[\tilde{V}^4\_{j,\ell}] &= \mathrm{O}(n^{-2}). \end{aligned}$$

Then, Bound (35) with *s* = 2 can be asymptotically expressed as follows:

$$k\_{j\_\*} \sum\_{\ell=1}^p \eta\_{b,j;\ell}^{-4} \mathbb{E}[\tilde{V}\_{j,\ell}^4] = k\_{j\_\*} p\left(\frac{1}{p} \sum\_{\ell=1}^p \eta\_{b,j;\ell}^{-4}\right) \times \mathcal{O}(n^{-2})\,.$$

The above expression is $\mathrm{O}(n^{-1})$ under the assumption that $\frac{1}{p}\sum\_{\ell=1}^{p}\eta\_{b,j;\ell}^{-4}$ tends to a finite quantity.

#### **6. Asymptotic Consistency under a Uniform Covariance Structure**

In this section, we show the asymptotic consistency of the KOO method based on a general information criterion under a uniform covariance structure. First, following [12], we derive $\text{GIC}\_{u,j}$ as in (6) and a key statistic $T\_{u,j;d}$ as in (9). A uniform covariance structure is given by

$$
\Sigma\_u = \sigma\_u^2(\rho\_u^{1-\delta\_{ij}}) = \sigma\_u^2\{(1-\rho\_u)\mathbf{I}\_p + \rho\_u\mathbf{1}\_p\mathbf{1}\_p'\},\tag{37}
$$

with Kronecker delta *δij*. The covariance structure is expressed as follows:

$$
\Sigma\_u = \alpha\left(\mathbf{I}\_p - \frac{1}{p}\mathbf{G}\_p\right) + \beta\frac{1}{p}\mathbf{G}\_p,
$$

where

$$\alpha = \sigma\_u^2(1 - \rho\_u), \quad \beta = \sigma\_u^2\{1 + (p-1)\rho\_u\}, \quad \mathbf{G}\_p = \mathbf{1}\_p\mathbf{1}\_p',$$

and $\mathbf{1}\_p = (1, \ldots, 1)'$. The matrices $\mathbf{I}\_p - \frac{1}{p}\mathbf{G}\_p$ and $\frac{1}{p}\mathbf{G}\_p$ are mutually orthogonal idempotent matrices, so we have

$$|\Sigma\_u| = \beta\alpha^{p-1}, \quad \Sigma\_u^{-1} = \frac{1}{\alpha}\left(\mathbf{I}\_p - \frac{1}{p}\mathbf{G}\_p\right) + \frac{1}{\beta}\cdot\frac{1}{p}\mathbf{G}\_p.$$
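The determinant and inverse formulas follow directly from the idempotent decomposition and are easy to verify numerically. A minimal sketch (the values of $p$, $\sigma\_u^2$, and $\rho\_u$ are illustrative):

```python
import numpy as np

p, sigma2, rho = 6, 2.0, 0.3          # illustrative dimension and parameters
alpha = sigma2 * (1 - rho)
beta = sigma2 * (1 + (p - 1) * rho)

G = np.ones((p, p))                   # G_p = 1_p 1_p'
Sigma = alpha * (np.eye(p) - G / p) + beta * (G / p)

# the same matrix in its original uniform (compound-symmetric) form
uniform_form = sigma2 * ((1 - rho) * np.eye(p) + rho * np.ones((p, p)))

# the claimed inverse, built from the same idempotent components
Sigma_inv = (np.eye(p) - G / p) / alpha + (G / p) / beta
```

Both identities, $|\Sigma\_u| = \beta\alpha^{p-1}$ and $\Sigma\_u\Sigma\_u^{-1} = \mathbf{I}\_p$, hold to machine precision.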

Now, we consider the multivariate regression model *Mu*,*<sup>j</sup>* given by

$$M\_{u,j}: \mathbf{Y} \sim \mathrm{N}\_{n \times p}(\mathbf{X}\_j\boldsymbol{\Theta}\_j, \Sigma\_{u,j} \otimes \mathbf{I}\_n),\tag{38}$$

where $\Sigma\_{u,j} = \alpha\_j\left(\mathbf{I}\_p - p^{-1}\mathbf{G}\_p\right) + \beta\_j p^{-1}\mathbf{G}\_p$. Let $\mathbf{H} = (\boldsymbol{h}\_1, \mathbf{H}\_2)$ be an orthogonal matrix with $\boldsymbol{h}\_1 = p^{-1/2}\mathbf{1}\_p$, and let

$$\mathbf{W}\_j = \mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_j)\mathbf{Y} \text{ and } \mathbf{U}\_j = \mathbf{H}'\mathbf{W}\_j\mathbf{H}.$$

Here, *h*<sup>1</sup> is a characteristic vector of Σ*u*,*j*, and each column vector of **H**<sup>2</sup> is a characteristic vector of Σ*u*,*j*. Let the density function of **Y** under *Mu*,*<sup>j</sup>* be denoted by *f*(**Y**; Θ*j*, *αj*, *βj*). Then, we have

$$\begin{aligned} -2\log\max\_{\Theta\_j} f(\mathbf{Y}; \Theta\_j, \alpha\_j, \beta\_j) = np\log(2\pi) &+ n(p-1)\log\alpha\_j + n\log\beta\_j \\ &+ \text{tr}\,\boldsymbol{\Psi}\_j^{-1}\mathbf{U}\_j, \end{aligned}$$

where Ψ*<sup>j</sup>* = diag(*βj*, *αj*, ... , *αj*). Therefore, the maximum likelihood estimators of *α<sup>j</sup>* and *β<sup>j</sup>* under *Mu*,*<sup>j</sup>* are given by

$$\begin{split} \widehat{\alpha}\_j &= \frac{1}{n(p-1)}\text{tr}\,\mathbf{H}\_2'\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_j)\mathbf{Y}\mathbf{H}\_2, \\ \widehat{\beta}\_j &= \frac{1}{n}\boldsymbol{h}\_1'\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_j)\mathbf{Y}\boldsymbol{h}\_1. \end{split}$$

The number of independent parameters under *Mu*,*<sup>j</sup>* is *m<sup>j</sup>* = *k<sup>j</sup> p* + 2. Noting that Ψ*<sup>j</sup>* is diagonal, we can obtain the general information criterion (GIC) in (8) for **Y** in (38) as follows:

$$\text{GIC}\_{u,j} = n(p-1)\log\widehat{\alpha}\_j + n\log\widehat{\beta}\_j + np(\log 2\pi + 1) + d(k\_j p + 2). \tag{39}$$

Therefore, we have

$$\begin{split} T\_{u,j;d} &\equiv \text{GIC}\_{u,\omega\backslash j} - \text{GIC}\_{u,\omega} \\ &= n(p-1)\log\left\{\widehat{\alpha}\_{\omega\backslash j}\left(\widehat{\alpha}\_{\omega}\right)^{-1}\right\} + n\log\left\{\widehat{\beta}\_{\omega\backslash j}\left(\widehat{\beta}\_{\omega}\right)^{-1}\right\} - dp \\ &= Z\_{1j} + Z\_{2j}. \end{split} \tag{40}$$

Here, *Z*1*<sup>j</sup>* and *Z*2*<sup>j</sup>* are defined as follows:

$$Z\_{1j} = n(p-1)\log\left\{1 + V\_{2j}^{(1)} \left(V\_1^{(1)}\right)^{-1}\right\} - d(p-1),$$

$$Z\_{2j} = n\log\left\{1 + V\_{2j}^{(2)} \left(V\_1^{(2)}\right)^{-1}\right\} - d,\tag{41}$$

using the following $V\_1^{(i)}$, $V\_{2j}^{(i)}$, $i = 1, 2$:

$$\begin{split} V\_1^{(1)} &= \text{tr}\,\mathbf{H}\_2'\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\omega})\mathbf{Y}\mathbf{H}\_2, \quad V\_{2j}^{(1)} = \text{tr}\,\mathbf{H}\_2'\mathbf{Y}'(\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j})\mathbf{Y}\mathbf{H}\_2, \\ V\_1^{(2)} &= \boldsymbol{h}\_1'\mathbf{Y}'(\mathbf{I}\_n - \mathbf{P}\_{\omega})\mathbf{Y}\boldsymbol{h}\_1, \quad V\_{2j}^{(2)} = \boldsymbol{h}\_1'\mathbf{Y}'(\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j})\mathbf{Y}\boldsymbol{h}\_1. \end{split}$$
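To make the estimators concrete, the following simulation sketch computes $\widehat{\alpha}\_\omega$ and $\widehat{\beta}\_\omega$ for data generated from the uniform-covariance model and checks that they are close to the true values; all dimensions and parameter values are illustrative, and $\text{tr}\,\mathbf{H}\_2'\mathbf{W}\mathbf{H}\_2$ is computed as $\text{tr}\,\mathbf{W} - \boldsymbol{h}\_1'\mathbf{W}\boldsymbol{h}\_1$ (valid since $\mathbf{H}$ is orthogonal):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, k = 2000, 10, 3                 # illustrative sample size and dimensions
sigma2, rho = 2.0, 0.3
alpha = sigma2 * (1 - rho)            # true alpha_*
beta = sigma2 * (1 + (p - 1) * rho)   # true beta_*

# symmetric square root of Sigma_u via the idempotent decomposition
G = np.ones((p, p))
Sigma_half = np.sqrt(alpha) * (np.eye(p) - G / p) + np.sqrt(beta) * (G / p)

X = rng.standard_normal((n, k))
Theta = rng.standard_normal((k, p))
Y = X @ Theta + rng.standard_normal((n, p)) @ Sigma_half

h1 = np.ones(p) / np.sqrt(p)          # h_1 = p^{-1/2} 1_p
Q, _ = np.linalg.qr(X)
R = Y - Q @ (Q.T @ Y)                 # (I_n - P_omega) Y
W = R.T @ R                           # W = Y'(I_n - P_omega)Y

beta_hat = h1 @ W @ h1 / n
alpha_hat = (np.trace(W) - h1 @ W @ h1) / (n * (p - 1))
```

For this sample size, both estimates land close to the true $(\alpha, \beta)$.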

Related to the distributional reductions of $Z\_{1j}, Z\_{2j}$, $j = 1, \ldots, k$, we frequently use the following lemma.

**Lemma 1.** *Let* **W** *have a noncentral Wishart distribution* W*p*(*m*, Σ; Ω)*. Let the covariance matrix* Σ *be decomposed into characteristic roots and vectors as follows:*

$$\begin{aligned} \Sigma &= \mathbf{H}\boldsymbol{\Lambda}\mathbf{H}' \\ &= (\mathbf{H}\_1, \ldots, \mathbf{H}\_h)\,\text{diag}\left(\lambda\_1\mathbf{I}\_{q\_1}, \ldots, \lambda\_h\mathbf{I}\_{q\_h}\right)(\mathbf{H}\_1, \ldots, \mathbf{H}\_h)', \end{aligned}$$

*where* $\lambda\_1 > \cdots > \lambda\_h > 0$ *and* $\mathbf{H}$ *is an orthogonal matrix. Then,* $\text{tr}\,\mathbf{H}\_j'\mathbf{W}\mathbf{H}\_j/\lambda\_j$, $j = 1, \ldots, h$, *are independently distributed as noncentral chi-squared distributions with* $mq\_j$ *degrees of freedom and noncentrality parameters* $\delta\_j^2 = \text{tr}\,\mathbf{H}\_j'\boldsymbol{\Omega}\mathbf{H}\_j/\lambda\_j$*.*

**Proof.** The result may be proven by considering the characteristic function of $(\text{tr}\,\mathbf{H}\_1'\mathbf{W}\mathbf{H}\_1, \ldots, \text{tr}\,\mathbf{H}\_h'\mathbf{W}\mathbf{H}\_h)$, which is expressed as follows (see Theorem 2.1.2 in [15]):

$$\begin{aligned} \mathbb{E}\left[\mathrm{e}^{it\_1 \text{tr}\,\mathbf{H}\_1'\mathbf{W}\mathbf{H}\_1 + \cdots + it\_h \text{tr}\,\mathbf{H}\_h'\mathbf{W}\mathbf{H}\_h}\right] &= \mathbb{E}\left[\operatorname{etr}(\mathbf{K}\mathbf{W})\right] \\ &= |\mathbf{I}\_p - 2\Sigma\mathbf{K}|^{-m/2}\operatorname{etr}\left\{\boldsymbol{\Omega}\mathbf{K}(\mathbf{I}\_p - 2\Sigma\mathbf{K})^{-1}\right\}, \end{aligned}$$

where $\mathbf{K} = it\_1\mathbf{H}\_1\mathbf{H}\_1' + \cdots + it\_h\mathbf{H}\_h\mathbf{H}\_h'$ and $\operatorname{etr}(\cdot) = \exp\{\text{tr}(\cdot)\}$. The result can be easily obtained by checking that the last expression above equals

$$\prod\_{j=1}^{h}(1 - 2i\lambda\_j t\_j)^{-mq\_j/2}\exp\left\{\frac{it\_j}{1 - 2i\lambda\_j t\_j}\text{tr}\,\mathbf{H}\_j'\boldsymbol{\Omega}\mathbf{H}\_j\right\}.$$
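Lemma 1 is easy to check by simulation, at least in the central case $\boldsymbol{\Omega} = \mathbf{0}$: with two eigenvalue blocks, $\text{tr}\,\mathbf{H}\_j'\mathbf{W}\mathbf{H}\_j/\lambda\_j$ should behave as independent $\chi^2\_{mq\_j}$ variables. A minimal sketch (all sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, reps = 8, 50_000
lam = np.array([3.0, 3.0, 1.0])        # two eigenvalue blocks: q1 = 2, q2 = 1
p = lam.size
H = np.linalg.qr(rng.standard_normal((p, p)))[0]   # eigenvectors of Sigma
Sigma_half = H @ np.diag(np.sqrt(lam)) @ H.T       # symmetric square root

# reps draws of the central Wishart_p(m, Sigma): W = Z'Z, rows of Z ~ N_p(0, Sigma)
Z = rng.standard_normal((reps, m, p)) @ Sigma_half
W = np.einsum('rmi,rmj->rij', Z, Z)

H1, H2 = H[:, :2], H[:, 2:]
t1 = np.einsum('ia,rij,ja->r', H1, W, H1) / lam[0]  # should be chi2 with m*q1 = 16 df
t2 = np.einsum('ia,rij,ja->r', H2, W, H2) / lam[2]  # should be chi2 with m*q2 = 8 df
corr = np.corrcoef(t1, t2)[0, 1]                    # should be near zero
```

The sample means of `t1` and `t2` match the claimed degrees of freedom, and their sample correlation is negligible.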

Assume that the true model is expressed as

$$M\_{u,j\_\*}: \mathbf{Y} \sim \mathrm{N}\_{n \times p}(\mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*}, \Sigma\_{u,\*} \otimes \mathbf{I}\_n),\tag{42}$$

where $\Sigma\_{u,\*} = \alpha\_\*\left(\mathbf{I}\_p - p^{-1}\mathbf{G}\_p\right) + \beta\_\* p^{-1}\mathbf{G}\_p$. Using Lemma 1, we have the following lemma.

**Lemma 2.** *Under True Model* (42)*, the noncentrality parameters in the distributions of* $V\_{2j}^{(1)}$ *and* $V\_{2j}^{(2)}$ *are given by*


$$\begin{split} \delta\_{1j}^{2} &= \frac{1}{\alpha\_{\*}}\,\text{tr}\,\mathbf{H}\_{2}'(\mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*})'(\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j})(\mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*})\mathbf{H}\_{2}, \\ \delta\_{2j}^{2} &= \frac{1}{\beta\_{\*}}\,\boldsymbol{h}\_{1}'(\mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*})'(\mathbf{P}\_{\omega} - \mathbf{P}\_{\omega\backslash j})(\mathbf{X}\_{j\_\*}\boldsymbol{\Theta}\_{j\_\*})\boldsymbol{h}\_{1}. \end{split}$$

*Here, if* $j \notin j\_\*$*, then* $\delta\_{1j}^2 = 0$ *and* $\delta\_{2j}^2 = 0$*.*

Now, we consider the high-dimensional asymptotic consistency of the KOO method based on $T\_{u,j;d}$ in (40), whose selection method is given by $\hat{j}\_{u;d} = \{j \mid T\_{u,j;d} > 0\}$. For a sufficient condition for the consistency of $\hat{j}\_{u;d}$, we assumed

A3u: For any $j \in j\_\*$, $\delta\_{1j}^2 = \mathrm{O}(np)$, $\delta\_{2j}^2 = \mathrm{O}(n)$, and

$$\lim \frac{1}{np}\delta\_{1j}^2 = \eta\_{1j}^2 > 0, \quad \lim \frac{1}{n}\delta\_{2j}^2 = \eta\_{2j}^2 > 0.\tag{43}$$

When $j \notin j\_\*$, we have

$$\begin{aligned} [\text{F2}] &= \sum\_{j \notin j\_\*}\Pr(Z\_{1j} + Z\_{2j} \ge 0) \\ &\le \sum\_{j \notin j\_\*}\left\{\Pr(Z\_{1j} \ge 0) + \Pr(Z\_{2j} \ge 0)\right\} \\ &\le (k - k\_{j\_\*})\left\{\Pr(Z^{(1)} \ge s\_0^{(1)}) + \Pr(Z^{(2)} \ge s\_0^{(2)})\right\}. \end{aligned}$$

Here,

$$\begin{aligned} Z^{(1)} &= \frac{\chi^2\_{p-1}}{\chi^2\_{(p-1)(n-k)}} - \frac{p-1}{(p-1)(n-k)-2}, \\ s^{(1)} &= \mathrm{e}^{d/n} - 1 - \frac{p-1}{(p-1)(n-k)-2}, \quad s\_0^{(1)} = \frac{d}{n} - \frac{p-1}{(p-1)(n-k)-2}, \\ Z^{(2)} &= \frac{\chi^2\_1}{\chi^2\_{n-k}} - \frac{1}{n-k-2}, \\ s^{(2)} &= \mathrm{e}^{d/n} - 1 - \frac{1}{n-k-2}, \quad s\_0^{(2)} = \frac{d}{n} - \frac{1}{n-k-2}. \end{aligned}$$

Note that $s\_0^{(1)} < s^{(1)}$ and $s\_0^{(2)} < s^{(2)}$. Then, under the assumption that $s\_0^{(1)} > 0$ and $s\_0^{(2)} > 0$, we have

$$[\text{F2}] \le \left(k - k\_{j\_\*}\right)\left[\left(s\_0^{(1)}\right)^{-2\ell}\mathbb{E}\left[\left(Z^{(1)}\right)^{2\ell}\right] + \left(s\_0^{(2)}\right)^{-2\ell}\mathbb{E}\left[\left(Z^{(2)}\right)^{2\ell}\right]\right].\tag{44}$$

Related to the assumptions $s\_0^{(1)} > 0$ and $s\_0^{(2)} > 0$, we assumed

$$\begin{aligned} \text{A4u}: d &> \frac{n(p-1)}{(p-1)(n-k)-2} \to \frac{1}{1-c\_2}, & d &> \frac{n}{n-k-2} \to \frac{1}{1-c\_2}, \\ \text{and } d &= \mathcal{O}(n^a), & 0 < a < 1. \end{aligned} \tag{45}$$

The first part in A4u implies $s\_0^{(1)} > 0$ and $s\_0^{(2)} > 0$. It is easy to see that

$$\begin{aligned} \mathrm{E}[(Z^{(1)})^2] &= \frac{2(p-1)^2(n-k+1)}{\{(p-1)(n-k)-2\}^2\{(p-1)(n-k)-4\}} = \mathrm{O}((n^3)^{-1}),\\ \mathrm{E}[(Z^{(2)})^2] &= \frac{2(n-k-1)}{(n-k-2)^2(n-k-4)} = \mathrm{O}((n^2)^{-1}).\end{aligned}$$

Further, $\left(s\_0^{(1)}\right)^{-2} = \mathrm{O}(n^{2(1-a)})$ and $\left(s\_0^{(2)}\right)^{-2} = \mathrm{O}(n^{2(1-a)})$. Therefore, from (44), we have that [F2] → 0.

When $j \in j\_\*$, we can write $T\_{u,j;d} = Z\_{1j} + Z\_{2j}$, with $Z\_{1j}$ and $Z\_{2j}$ given in (41). Therefore, we can express [F1] as follows:

$$[\text{F1}] = \sum\_{j \in j\_\*} \Pr(\tilde{T}\_{u,j;d} \le 0),$$

where

$$
\tilde{T}\_{u,j;d} = \frac{1}{np} T\_{u,j;d} = \frac{p-1}{p}\log\left\{1 + V\_{2j}^{(1)}\left(V\_1^{(1)}\right)^{-1}\right\} + \frac{1}{p}\log\left\{1 + V\_{2j}^{(2)}\left(V\_1^{(2)}\right)^{-1}\right\} - \frac{d}{n}.
$$

Assumptions A3u and A4u easily show that

$$
\tilde{T}\_{u,j;d} \to \log\left(1 + \frac{\eta\_{1j}^2}{1 - c\_2}\right) > 0.
$$

This implies that $\Pr(\tilde{T}\_{u,j;d} \le 0) \to 0$, and [F1] → 0. These imply the following theorem.

**Theorem 3.** *Suppose that Assumptions* A1, A2, A3u, *and* A4u *are satisfied. Then, the* KOO *method based on* $T\_{u,j;d}$ *in* (40) *is asymptotically consistent.*

#### **7. Concluding Remarks**

In this paper, we considered selecting regression variables in a *p*-variate regression model with one of three covariance structures: (1) ICSS (an independent covariance structure with the same variance), (2) ICSD (an independent covariance structure with different variances), and (3) UCS (a uniform covariance structure). We proposed using a KOO method based on a general information criterion with a penalty term *d*, and we established the high-dimensional consistency of the KOO methods with $d = \mathrm{O}(n^a)$, $0 < a < 1$. Ref. [12] studied the asymptotic consistencies of KOO methods in (1) and (3); however, in their approach, the number of explanatory variables was fixed, whereas in this paper, the number of explanatory variables may tend to infinity. KOO methods are computationally feasible. The idea goes back to [1,2], and high-dimensional properties were recently studied in [7–9,11].

A high-dimensional study of the KOO method under an autoregressive covariance structure (AUTO) and an extension of our results to the case of non-normality remain as future work.

**Author Contributions:** Conceptualization, Y.F.; Methodology, Y.F. and T.S.; Software, T.S.; Writing—original draft, Y.F. and T.S.; Writing—review & editing, Y.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to express their gratitude to Vladimir V. Ulyanov and the three referees for their valuable comments and suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Limit Theorem for Spectra of Laplace Matrix of Random Graphs**

**Alexander N. Tikhomirov**

Institute of Physics and Mathematics, Komi Science Center of Ural Branch of RAS, 167982 Syktyvkar, Russia; tikhomirov@ipm.komisc.ru

**Abstract:** We consider the limit of the empirical spectral distribution of Laplace matrices of generalized random graphs. Applying the Stieltjes transform method, we prove under general conditions that the limit spectral distribution of Laplace matrices converges to the free convolution of the semicircular law and the normal law.

**Keywords:** semicircular law; random graph; normal law; Stieltjes transform; Laplace matrix

**MSC:** 60B20; 60C05

#### **1. Introduction and Summary**

The spectral theory of random graphs is a branch of mathematics that has been studied intensively in recent decades. It investigates the asymptotic behavior of eigenvalues and eigenvectors of matrices associated with graphs, in particular adjacency matrices and Laplace matrices (see the definitions below), as the number of vertices of the graph tends to infinity; see, for instance, [1–8]. The adjacency matrix of the generalized Erdős–Rényi random graph is a special case of the generalized Wigner matrix (matrices with elements that are independent up to symmetry, with zero means and different variances). Many deep results have been obtained recently for such matrices. The methods of studying the spectrum asymptotics of adjacency matrices are the same as for Wigner matrices: the method of moments and the Stieltjes transform method. It should be noted that the most profound results for the spectrum of Wigner random matrices were obtained by methods related to the Stieltjes transform; see [3,9,10].

Laplace matrices have one significant difference: the dependence of the diagonal elements on the remaining elements of the matrix. This significantly complicates the study. For instance, the limit distribution of the empirical spectral function of the Laplace matrix of a complete (non-random) graph was first found in 2006; see [11]. In most of the works devoted to the spectrum asymptotics of Laplace matrices of random graphs, the method of moments is used; see [2,4,12]. In this paper, we consider the empirical spectral distribution function of the Laplace matrices of both weighted and unweighted generalized Erdős–Rényi random graphs. We have obtained simple sufficient conditions for the convergence of the empirical spectral distribution function of the Laplace matrices of random graphs to a distribution function that is a free convolution of the semicircular law and the standard normal law. The conditions are expressed in terms of the properties of the graph edge probability matrix and the weight variance matrix (for weighted graphs). To prove the convergence, we exclusively use the Stieltjes transform method.

We consider a non-oriented simple graph (without loops or multiple edges) $\{V, E\}$ with $|V| = n$ vertices and edge set $E$ such that the edges $e \in E$ occur independently, each with probability $p\_e$. Consider the adjacency $n \times n$ matrix

$$\mathbf{A} = \left[A\_{jk}\right],\tag{1}$$

**Citation:** Tikhomirov, A.N. Limit Theorem for Spectra of Laplace Matrix of Random Graphs. *Mathematics* **2023**, *11*, 764. https:// doi.org/10.3390/math11030764

Academic Editor: Christophe Chesneau

Received: 6 December 2022 Revised: 31 January 2023 Accepted: 1 February 2023 Published: 2 February 2023

**Copyright:** © 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

where

$$A\_{jk} = \begin{cases} 0, \text{ if } (j,k) \notin E, \\ 1, \text{ if } (j,k) \in E. \end{cases}$$

Define a degree of vertex *j* ∈ *V* as

$$d\_j := \sum\_{k:(j,k)\in E} A\_{jk}.$$

We shall assume that $A\_{jk}$, $1 \le j \le k \le n$, are independent random variables with $\mathbb{E}A\_{jk} = p\_{jk}^{(n)}$. Note that $\mathbb{E}d\_j = \sum\_{k:k\ne j} p\_{jk}^{(n)}$. The matrix $\mathbf{A}$ is symmetric, i.e., $A\_{jk} = A\_{kj}$. We introduce the quantity

$$
\widehat{a}\_n = \frac{1}{n} \sum\_{j,k=1}^n p\_{jk}^{(n)} (1 - p\_{jk}^{(n)}).\tag{2}
$$

We introduce the diagonal matrix

$$\mathbf{D} = \text{diag}(d\_1, \ldots, d\_n),$$

and define the normalized and centered Laplace matrix of the unweighted graph *G* as

$$
\widehat{\mathbf{L}} = \frac{1}{\sqrt{\widehat{a}\_n}} \Big[ (\mathbf{D} - \mathbf{A}) - \mathbb{E}(\mathbf{D} - \mathbf{A}) \Big].
$$

We shall also consider weighted graphs $G = (V, E, w)$ with weight function $w\_{jk} = w\_{kj} = X\_{jk}$, where $X\_{jk}$, $1 \le j \le k \le n$, are independent random variables such that

$$\mathbb{E}X\_{jk} = 0, \quad \mathbb{E}X\_{jk}^2 = \sigma\_{jk}^2.$$

The distribution of *Xjk* may depend on *n*, but for brevity, we shall omit the index *n* in the notations. We introduce the quantity

$$a\_n = \frac{1}{n} \sum\_{i,j=1}^n p\_{ij}^{(n)} \sigma\_{ij}^2. \tag{3}$$

The quantity $a\_n$ may be interpreted as the expected mean degree of graph $G$. With graph $G$, we consider the weighted adjacency matrix

$$\tilde{\mathbf{A}} = \left[A\_{ij}X\_{ij}\right]$$

and normalized Laplace or Markov matrix

$$
\tilde{\mathbf{L}} = \frac{1}{\sqrt{a\_n}}(\tilde{\mathbf{D}} - \tilde{\mathbf{A}}),
$$

where

$$\tilde{\mathbf{D}} = \text{diag}(\tilde{d}\_1, \ldots, \tilde{d}\_n) \text{ with } \tilde{d}\_i = \sum\_{j:j \neq i} A\_{ij}X\_{ij}.$$

We shall denote by $\lambda\_1(\mathbf{B}) \ge \lambda\_2(\mathbf{B}) \ge \cdots \ge \lambda\_n(\mathbf{B})$ the ordered eigenvalues of a symmetric $n \times n$ matrix $\mathbf{B}$. We shall consider the spectra of the matrices $\widehat{\mathbf{L}}$ and $\tilde{\mathbf{L}}$. For brevity of notation, we shall write $\hat{\mu}\_j = \lambda\_j(\widehat{\mathbf{L}})$ and $\tilde{\mu}\_j = \lambda\_j(\tilde{\mathbf{L}})$. We introduce the corresponding empirical spectral distributions (ESDs)

$$\widehat{G}\_n(x) := \frac{1}{n} \sum\_{j=1}^{n} \mathbb{I}\{\widehat{\mu}\_{j} \le x\}, \quad \widetilde{G}\_n(x) := \frac{1}{n} \sum\_{j=1}^{n} \mathbb{I}\{\widetilde{\mu}\_{j} \le x\}. \tag{4}$$
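
As a numerical illustration of the definitions above, the following sketch builds the weighted matrices and an ESD for an Erdős–Rényi profile (*pjk* ≡ *p* and standard normal weights, both purely illustrative choices) and checks that the resulting ESD is a genuine distribution function:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 0.1  # illustrative sizes: p_jk = p, sigma_jk = 1

# symmetric Bernoulli indicators A_jk and symmetric weights X_jk
A = np.triu(rng.random((n, n)) < p, 1)
A = A + A.T
X = np.triu(rng.standard_normal((n, n)), 1)
X = X + X.T

a_n = n * p                        # a_n = (1/n) sum_{i,j} p_ij sigma_ij^2 here
W = A * X                          # weighted adjacency matrix
D = np.diag(W.sum(axis=1))         # diagonal matrix of weighted degrees
L = (D - W) / np.sqrt(a_n)         # normalized Laplace (Markov) matrix

mu = np.linalg.eigvalsh(L)         # eigenvalues of the symmetric matrix L

def esd(x):
    # empirical spectral distribution: fraction of eigenvalues <= x
    return np.mean(mu <= x)

# the ESD is a nondecreasing step function running from 0 to 1
grid = np.linspace(mu.min() - 1, mu.max() + 1, 50)
vals = [esd(x) for x in grid]
assert vals[0] == 0.0 and vals[-1] == 1.0
assert all(b >= a for a, b in zip(vals, vals[1:]))
```
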

In 2006, it was shown in [11], under the conditions *pij*(*n*) ≡ 1 and *σ²ij* ≡ 1 for all 1 ≤ *i*, *j* ≤ *n*, that the ESD *G̃n*(*x*) converges weakly in probability to the non-random distribution function *G*(*x*), defined as the free convolution of the Gaussian distribution function and the semicircular distribution function (for the definition of free convolution, see, for instance, [13]).

In [4], in 2010, the authors considered the limit of *G̃n*(*x*) for weighted Erdős–Rényi graphs (*pij*(*n*) ≡ *pn*) with equal-variance weights (*σ²ij* ≡ *σ²*). Assuming that *pn* is bounded away from zero and one and that the random variables *Xij* have finite fourth moments, they proved that *G̃n*(*x*) converges weakly to the same function *G*(*x*).

In [14], in 2020, Yizhe Zhu considered the so-called graphon approach to the limiting spectral distribution of Wigner-type matrices. The author described the moments of the limit spectral measure in terms of the graphon of the variance profile matrix Σ = (*σ²ij*) and the number of trees with a fixed number of vertices. Recently, Chatterjee and Hazra published the paper [12], in which the approach of Zhu was developed further.

In [15], in 2021, the author stated simple conditions on the probabilities *pij* under which the ESD of adjacency matrices converges to the semicircular law. In the present paper, we consider the convergence of the ESDs *G̃n*(*x*) and *Ĝn*(*x*) to the function *G*(*x*) under similar conditions.

First, we formulate some conditions which we shall use in the present paper.

• Condition *CP*(0):

$$a\_n \to \infty \text{ as } n \to \infty. \tag{5}$$

• Condition *CP*(0*a*): There exists a constant *C*<sup>0</sup> s.t.

$$\sup\_{n\geq 1} \max\_{1\leq j,k\leq n} \frac{1}{a\_n} p\_{jk}^{(n)} \sigma\_{jk}^2 \leq C\_0 < \infty.$$

• Condition *CP*(1):

$$\lim\_{n \to \infty} \frac{1}{na\_n} \sum\_{j=1}^n \sum\_{k=1}^n \Big|p\_{jk}^{(n)} \sigma\_{jk}^2 - \frac{a\_n}{n}\Big| = 0.$$

• Condition *CX*(1): For any *τ* > 0

$$L\_{n}(\tau) := \frac{1}{na\_n} \sum\_{i,j=1}^n p\_{ij}^{(n)} \mathbb{E}X\_{ij}^2 \mathbb{I}\{ |X\_{ij}| > \tau \sqrt{a\_n} \} \to 0 \text{ as } n \to \infty. \tag{6}$$

**Remark 1.** *Condition CP*(1) *is equivalent to the following two conditions taken together:*

• *Condition CP*(1*a*)*:*

$$\lim\_{n \to \infty} \frac{1}{n} \sum\_{j=1}^{n} \Big|\frac{1}{a\_n} \sum\_{k=1}^{n} p\_{jk}^{(n)} \sigma\_{jk}^2 - 1\Big| = 0. \tag{7}$$

• *Condition CP*(1*b*)*:*

$$\lim\_{n \to \infty} \frac{1}{na\_n} \sum\_{j=1}^n \sum\_{k=1}^n \Big|p\_{jk}^{(n)} \sigma\_{jk}^2 - \frac{1}{n} \sum\_{l=1}^n p\_{jl}^{(n)} \sigma\_{jl}^2\Big| = 0.$$
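
The equivalence in Remark 1 rests on the triangle inequality, and it can be checked numerically; the sketch below uses an arbitrary, hypothetical nonnegative profile *qjk* standing in for *pjk σ²jk* (an illustrative choice only) and confirms the two-sided comparison between the quantity in *CP*(1) and those in *CP*(1*a*), *CP*(1*b*):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# hypothetical symmetric nonnegative profile q_jk standing in for p_jk * sigma_jk^2
q = rng.random((n, n))
q = (q + q.T) / 2

a_n = q.sum() / n                       # a_n = (1/n) sum_{j,k} q_jk
row_mean = q.mean(axis=1)               # (1/n) sum_l q_jl

T  = np.abs(q - a_n / n).sum() / (n * a_n)            # quantity in CP(1)
Ta = np.abs(q.sum(axis=1) / a_n - 1).mean()           # quantity in CP(1a)
Tb = np.abs(q - row_mean[:, None]).sum() / (n * a_n)  # quantity in CP(1b)

eps = 1e-12
assert T <= Ta + Tb + eps   # CP(1a) and CP(1b) together imply CP(1)
assert Ta <= T + eps        # CP(1) implies CP(1a)
assert Tb <= 2 * T + eps    # and, with the previous bound, CP(1b)
```
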

The main result of the present paper is the following theorem.

**Theorem 1.** *Let conditions CP*(0)*, CP*(0*a*)*, CP*(1)*, and CX*(1) *hold. Then, the ESDs G̃n*(*x*) *converge in probability to the distribution function G*(*x*)*, which is the additive free convolution of the standard normal distribution function and the semicircular distribution function:*

$$\lim\_{n \to \infty} \widetilde{G}\_n(x) = G(x).$$

**Corollary 1.** *Assume that σ²jk* ≡ *σ² and pjk*(*n*) ≡ *pn for all* 1 ≤ *j*, *k* ≤ *n and all n* ≥ 1*. Assume that npn* → ∞ *as n* → ∞ *and that condition CX*(1) *holds. Then, the ESDs G̃n*(*x*) *converge in probability to the distribution function G*(*x*)*, which is the additive free convolution of the standard normal distribution function and the semicircular distribution function:*

$$\lim\_{n \to \infty} \widetilde{G}\_n(x) = G(x).$$

**Proof of Corollary.** Note that in the case *pjk*(*n*) ≡ *pn* and *σ²jk* ≡ *σ²*, we have

$$a\_n = n p\_n \sigma^2.$$

Condition *CP*(0) is fulfilled, and it is easy to see that all the remaining conditions of Theorem 1 are fulfilled as well.

**Theorem 2.** *Let conditions*

$$
\widehat{a}\_n \to \infty \text{ as } n \to \infty,\tag{8}
$$

*and*

$$\lim\_{n \to \infty} \frac{1}{n\widehat{a}\_n} \sum\_{j=1}^n \sum\_{k=1}^n \Big|p\_{jk}^{(n)}(1 - p\_{jk}^{(n)}) - \frac{\widehat{a}\_n}{n}\Big| = 0 \tag{9}$$

*hold. Then, the ESDs Ĝn*(*x*) *converge in probability to the distribution function G*(*x*)*, which is the additive free convolution of the standard normal distribution function and the semicircular distribution function,*

$$\lim\_{n \to \infty} \widehat{G}\_n(x) = G(x).$$

In what follows, we shall omit the superscript (*n*) in the notation *pij*(*n*), writing *pij* instead.

#### **2. Toy Example**

Consider a graph *G* = (*V*, *E*) with clique number *d* = *d*(*n*), where |*V*| = *n*. The clique number of a graph *G* is the size of a largest clique of the graph. Let M denote such a maximum clique. Define the weights of the vertices as follows:

$$W\_i = \begin{cases} d, & \text{if } i \in \mathcal{M}, \\ 1, & \text{otherwise.} \end{cases}$$

We introduce the edge probabilities as follows:

$$p\_{ij} = W\_i W\_j / d^2 = \begin{cases} \frac{1}{d^2}, & \text{if } i \notin \mathcal{M}, j \notin \mathcal{M}, \\ \frac{1}{d}, & \text{if } i \in \mathcal{M}, j \notin \mathcal{M}, \text{ or } i \notin \mathcal{M}, j \in \mathcal{M}, \\ 1, & \text{if } i, j \in \mathcal{M}. \end{cases} \tag{10}$$

We assume that *σ²jk* ≡ *σ²* = 1 for 1 ≤ *j*, *k* ≤ *n*. In this case, we have

$$\sum\_{j,k=1}^{n} p\_{jk} = \Big(\frac{n-d}{d} + d\Big)^2,\tag{11}$$

and

$$a\_n = \frac{n}{d^2} (1 + \alpha\_n)^2,\text{ where } \alpha\_n = \frac{d(d-1)}{n}.\tag{12}$$
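
Identities (11) and (12) can be verified numerically; the sketch below (with illustrative sizes *n* = 2000, *d* = 20) builds the probability matrix (10) directly from the vertex weights:

```python
import numpy as np

n, d = 2000, 20                 # illustrative sizes; d^2/n = 0.2
W = np.ones(n)
W[:d] = d                       # weight d on the clique M, weight 1 elsewhere
P = np.outer(W, W) / d**2       # edge probabilities p_ij = W_i W_j / d^2

# identity (11): sum of all probabilities
assert np.isclose(P.sum(), ((n - d) / d + d) ** 2)

# identity (12): a_n = (n/d^2)(1 + alpha_n)^2 with alpha_n = d(d-1)/n
a_n = P.sum() / n
alpha_n = d * (d - 1) / n
assert np.isclose(a_n, (n / d**2) * (1 + alpha_n) ** 2)
```
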

**Proposition 1.** *Under condition*

$$\lim\_{n \to \infty} \frac{d^2(n)}{n} = 0 \tag{13}$$

*conditions CP*(0)*, CP*(0*a*) *and CP*(1) *hold.*

#### **Proof.** We have

$$\begin{split} \frac{1}{na\_n} \sum\_{j,k=1}^n \Big|p\_{jk} - \frac{a\_n}{n}\Big| &= \frac{1}{na\_n} \Big(\frac{1}{d^2} (2\alpha\_n + \alpha\_n^2)(n-d)^2 + 2\Big|\frac{1}{d} - \frac{1}{d^2}(1+\alpha\_n)^2\Big|\, d(n-d) \\ &\qquad + d^2\Big(1 - \frac{1}{d^2}(1+\alpha\_n)^2\Big)\Big) \\ &= \frac{\alpha\_n(2+\alpha\_n)(n-d)^2}{n^2(1+\alpha\_n)^2} + 2\Big|1 - \frac{1}{d}(1+\alpha\_n)^2\Big| \frac{d^2(n-d)}{n^2(1+\alpha\_n)^2} \\ &\qquad + \frac{d^4}{n^2(1+\alpha\_n)^2} \Big(1 - \frac{1}{d^2}(1+\alpha\_n)^2\Big). \end{split} \tag{14}$$

It is straightforward to check that for *d* = *d*(*n*) satisfying condition (13), we have *αn* = *o*(1), *an* → ∞ as *n* → ∞, and

$$\lim\_{n \to \infty} \frac{1}{na\_n} \sum\_{j,k=1}^n |p\_{jk} - \frac{a\_n}{n}| = 0. \tag{15}$$

This means that conditions *CP*(0) and *CP*(1) hold. Furthermore,

$$\max\_{1 \le k \le n} \sum\_{l=1}^{n} p\_{kl} \le \frac{n}{d} + d. \tag{16}$$

It is straightforward to check as well that

$$\sup\_{n\geq 1} \frac{\max\_{1\leq k,l\leq n} p\_{kl}}{a\_n} \leq C\_0. \tag{17}$$

Thus, Proposition 1 is proved.

#### **3. Proof of Theorem 1**

We shall use the method of Stieltjes transforms for the proof of Theorem 1. Introduce the resolvent matrix of the matrix **L̃**,

$$\mathbf{R} := \mathbf{R}\_{\widetilde{\mathbf{L}}}(z) = (\widetilde{\mathbf{L}} - z\mathbf{I})^{-1},$$

where **I** := **I***n* denotes the *n* × *n* identity matrix. Let *mn*(*z*) denote the Stieltjes transform of the empirical spectral distribution function of the matrix **L̃**,

$$m\_n(z) = \int\_{-\infty}^{\infty} \frac{1}{x - z} d\widetilde{G}\_n(x) = \frac{1}{n} \text{Tr} \mathbf{R}.$$
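
The identity *mn*(*z*) = (1/*n*)Tr **R** holds for any symmetric matrix and can be checked numerically; the sketch below uses a generic random symmetric matrix as a stand-in for **L̃**:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
M = rng.standard_normal((n, n))
M = (M + M.T) / np.sqrt(2 * n)          # a stand-in symmetric matrix

z = 1.0 + 1.0j                          # z = u + iv with v > 0
R = np.linalg.inv(M - z * np.eye(n))    # resolvent R = (M - zI)^{-1}
m_trace = np.trace(R) / n               # (1/n) Tr R

lam = np.linalg.eigvalsh(M)
m_spec = np.mean(1.0 / (lam - z))       # integral of (x - z)^{-1} against the ESD

assert np.isclose(m_trace, m_spec)
assert m_trace.imag > 0                 # Stieltjes transforms map C+ into C+
```
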

For the proof of Theorem 1, it is enough to prove the convergence of the Stieltjes transforms for any fixed *z* = *u* + *iv* with *v* > 0; moreover, it is enough to prove that *mn*(*z*) converges to some function, say *s*(*z*), on some set with a non-empty interior. According to Lemma A2, it is enough to prove the convergence of the expected Stieltjes transform *sn*(*z*) = E*mn*(*z*) = E(1/*n*)Tr**R** only. Using Lemma A1, the result of Theorem 1 follows from the relation

$$s\_n(z) - s\_g(z + s\_n(z)) \to 0 \text{ as } n \to \infty,$$

where *sg*(*z*) denotes the Stieltjes transform of the standard Gaussian distribution,

$$s\_g(z) = \frac{1}{\sqrt{2\pi}} \int\_{-\infty}^{\infty} \frac{1}{x - z} \exp\Big\{-\frac{x^2}{2}\Big\} dx.$$

First, we need some additional notation. By **L̃**(*j*), we denote the matrix obtained from **L̃** by replacing the diagonal entries *L̃ll*, *l* = 1, ... , *n* with *L̃*(*j*)*ll* = (1/√*an*) ∑*r*≠*j* *AlrXlr*. Note that the diagonal entries of the matrix **L̃**(*j*) (except *L̃*(*j*)*jj*) do not depend on the random variables *Xjk*, *Ajk* for *k* = 1, ... , *n*. We denote by **D̃**(*j*) the diagonal matrix with diagonal entries *D̃*(*j*)*ll* = (1/√*an*) *AjlXjl*. Denote by **R̃**(*j*) the resolvent matrix corresponding to the matrix **L̃**(*j*),

$$
\tilde{\mathbf{R}}^{(j)} = (\tilde{\mathbf{L}}^{(j)} - z\mathbf{I})^{-1}.
$$

We have

$$\mathbf{R} = \widetilde{\mathbf{R}}^{(j)} - \mathbf{R}\widetilde{\mathbf{D}}^{(j)}\widetilde{\mathbf{R}}^{(j)}.\tag{18}$$
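
Formula (18) is an instance of the general resolvent identity A⁻¹ − B⁻¹ = A⁻¹(B − A)B⁻¹; it can be verified directly on small matrices (the matrices below are generic stand-ins, not the specific Laplace matrices of the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
L = rng.standard_normal((n, n))
L = (L + L.T) / 2                        # stand-in for the full symmetric matrix
Dj = np.diag(rng.standard_normal(n))     # diagonal perturbation D^(j)
Lj = L - Dj                              # matrix with modified diagonal, L = L^(j) + D^(j)

z = 0.5 + 1.0j
I = np.eye(n)
R  = np.linalg.inv(L  - z * I)           # resolvent of the full matrix
Rj = np.linalg.inv(Lj - z * I)           # resolvent of the modified matrix

# identity (18): R = R^(j) - R D^(j) R^(j)
assert np.allclose(R, Rj - R @ Dj @ Rj)
```
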

Using this formula, we may write

$$R\_{jj} = \widetilde{R}\_{jj}^{(j)} - \frac{1}{\sqrt{a\_n}} \sum\_{r=1}^{n} A\_{jr} X\_{jr} R\_{jr} \widetilde{R}\_{rj}^{(j)}.\tag{19}$$

According to Lemma A5, we obtain

$$\lim\_{n \to \infty} \Big| \frac{1}{n} \text{Tr} \mathbf{R} - \frac{1}{n} \sum\_{j=1}^{n} \widetilde{R}\_{jj}^{(j)} \Big| = 0. \tag{20}$$

Furthermore, let us denote by **L̃**(*j*,0) the matrix obtained from **L̃**(*j*) by deleting both the *j*-th column and the *j*-th row, and let **R̃**(*j*,0) denote the resolvent matrix corresponding to the matrix **L̃**(*j*,0). Using the Schur complement formula, we may write

$$\widetilde{R}\_{jj}^{(j)} = \frac{1}{\widetilde{L}\_{jj}^{(j)} - z - \sum\_{l,k:\, l\neq j, k\neq j} [\widetilde{\mathbf{R}}^{(j,0)}(z)]\_{kl} \widetilde{L}\_{jl} \widetilde{L}\_{jk}}.\tag{21}$$

Introduce the following notations

$$\begin{split} \varepsilon\_{j1} &:= \sum\_{\substack{l \neq k \\ l \neq j, k \neq j}} [\widetilde{R}^{(j,0)}]\_{kl} \widetilde{L}\_{jl} \widetilde{L}\_{jk}, \quad \varepsilon\_{j2} = \frac{1}{a\_{n}} \sum\_{k:k \neq j} [\widetilde{R}^{(j,0)}]\_{kk} (A\_{jk} - p\_{jk}) X\_{jk}^{2}, \\ \varepsilon\_{j3} &= \frac{1}{a\_{n}} \sum\_{k:k \neq j} [\widetilde{R}^{(j,0)}]\_{kk} p\_{jk} (X\_{jk}^{2} - \sigma\_{jk}^{2}), \\ \varepsilon\_{j4} &= \frac{1}{a\_{n}} \sum\_{k:k \neq j} [\widetilde{R}^{(j,0)}]\_{kk} \Big(p\_{jk} \sigma\_{jk}^{2} - \frac{1}{n} \sum\_{l=1}^{n} p\_{jl} \sigma\_{jl}^{2}\Big), \\ \varepsilon\_{j5} &= \frac{1}{n} \sum\_{k:k \neq j} \widetilde{R}^{(j,0)}\_{kk} \Big( \frac{1}{a\_{n}} \sum\_{l=1}^{n} p\_{jl} \sigma\_{jl}^{2} - 1 \Big), \\ \varepsilon\_{j6} &= \frac{1}{n} \sum\_{k:k \neq j} \widetilde{R}^{(j,0)}\_{kk} - \frac{1}{n} \sum\_{k=1}^{n} R\_{kk}, \\ \varepsilon\_{j7} &= \frac{1}{n} \sum\_{k=1}^{n} R\_{kk} - \mathbb{E} \frac{1}{n} \sum\_{k=1}^{n} R\_{kk}. \end{split}$$

Put *εj* = *εj*1 + ··· + *εj*7. Let

$$\zeta\_j := \widetilde{L}\_{jj}^{(j)} = \frac{1}{\sqrt{a\_n}} \sum\_{k \ne j} A\_{jk} X\_{jk}.$$

In these notations, we may write

$$\mathbb{E}[\widetilde{R}^{(j)}]\_{jj} = \mathbb{E} \frac{1}{\zeta\_{j} - z - s\_n(z) - \varepsilon\_{j}}.$$

We continue as follows

$$\mathbb{E}\widetilde{R}^{(j)}\_{jj} = \mathbb{E}\frac{1}{\zeta\_{j} - z - s\_{n}(z)} + \mathbb{E}\frac{\varepsilon\_{j}}{\zeta\_{j} - z - s\_{n}(z)}\widetilde{R}^{(j)}\_{jj}.\tag{22}$$

Averaging the last equality over *j* = 1, . . . , *n*, we obtain

$$s\_{n}(z) = \mathbb{E}\frac{1}{\zeta\_{\mathbb{J}} - z - s\_{n}(z)} + \mathbb{E}\frac{\varepsilon\_{\mathbb{J}}}{\zeta\_{\mathbb{J}} - z - s\_{n}(z)}\widetilde{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})} + \mathbb{E}(R\_{\mathbb{J}\mathbb{J}} - \widetilde{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})}),\tag{23}$$

where J denotes a random variable uniformly distributed on the set {1, ... , *n*} and independent of all other random variables. Denote by *Fn*(*x*) the distribution function of *ζ*J and let

$$\Delta\_{n} = \sup\_{x} |F\_{n}(x) - \Phi(x)|,$$

where Φ(*x*) denotes the distribution function of the standard normal law. Denote the Stieltjes transform of the standard normal law by *sg*(*z*),

$$s\_g(z) = \int\_{-\infty}^{\infty} \frac{1}{x - z} d\Phi(x).$$

Note that

$$\mathbb{E}\frac{1}{\zeta\_{\mathbb{J}} - z - s\_{n}(z)} - s\_g(z + s\_{n}(z)) = \int\_{-\infty}^{\infty} \frac{1}{x - z - s\_{n}(z)} d(F\_{n}(x) - \Phi(x)).\tag{24}$$

Integrating by parts, we obtain

$$\Big|\mathbb{E}\frac{1}{\zeta\_{\mathbb{J}} - z - s\_{n}(z)} - s\_g(z + s\_{n}(z))\Big| \le 2v^{-2}\Delta\_{n}.\tag{25}$$

According to Lemma A3,

$$\Big| \mathbb{E} \frac{1}{\zeta\_{\mathbb{J}} - z - s\_n(z)} - s\_g(z + s\_{n}(z)) \Big| \to 0 \text{ as } n \to \infty. \tag{26}$$

Note that

$$\Big|\mathbb{E}\frac{\varepsilon\_{\mathbb{J}}}{\zeta\_{\mathbb{J}} - z - s\_n(z)} \widetilde{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})}\Big| \le v^{-2} \mathbb{E}|\varepsilon\_{\mathbb{J}}|.\tag{27}$$

It remains to prove that E|*ε*J| → 0 and E(*R*J,J − *R̃*(J)J,J) → 0 as *n* → ∞. These claims follow from Lemmas A6–A11, Lemma A2, and equality (20).

Thus, Theorem 1 is proved.

#### **4. The Proof of Theorem 2**

Similarly to the previous section, we write the diagonal entries of the matrix **L̂** as

$$
\widehat{L}\_{jj} = \frac{1}{\sqrt{\widehat{a}\_{n}}} \sum\_{k \neq j} (A\_{jk} - p\_{jk}). \tag{28}
$$

Let **R̂** = (**L̂** − *z***I**)−1 denote the resolvent matrix of the matrix **L̂**. Let *j* ∈ {1, ... , *n*} be fixed. We denote by **L̂**(*j*) the matrix obtained from **L̂** by replacing the diagonal entries *L̂ll*, *l* = 1, ... , *n* with *L̂*(*j*)*ll* = (1/√*ân*) ∑*r*≠*j*(*Alr* − *plr*). Let **D̂**(*j*) = **L̂** − **L̂**(*j*). By definition, **D̂**(*j*) = diag(*d̂*(*j*)1, ... , *d̂*(*j*)*n*) is a diagonal matrix with *d̂*(*j*)*l* = (1/√*ân*)(*Ajl* − *pjl*) for *l* = 1, ... , *n*. Note that the diagonal entries of the matrix **L̂**(*j*) (except *L̂*(*j*)*jj*) do not depend on the random variables *Ajk* for *k* = 1, ... , *n*. By **L̂**(*j*,0), we denote the matrix obtained from **L̂**(*j*) by deleting both the *j*-th column and the *j*-th row. **R̂**(*j*,0) denotes the resolvent matrix corresponding to the matrix **L̂**(*j*,0). Analogously to (21), we represent the diagonal entries of the resolvent matrix **R̂**(*j*) = (**L̂**(*j*) − *z***I**)−1 in the form

$$
\widehat{R}\_{jj}^{(j)} = \frac{1}{\widehat{L}\_{jj}^{(j)} - z - \sum\_{l,k:\, l \neq j, k \neq j} \widehat{R}\_{kl}^{(j,0)} \widehat{L}\_{jl} \widehat{L}\_{jk}}.\tag{29}
$$

Introduce the following notations

$$\begin{split} \widehat{\varepsilon}\_{j1} &:= \sum\_{\substack{l \neq k \\ l \neq j, k \neq j}} [\widehat{R}^{(j,0)}]\_{kl} \widehat{L}\_{jl} \widehat{L}\_{jk}, \quad \widehat{\varepsilon}\_{j2} = \frac{1}{\widehat{a}\_n} \sum\_{k:k\neq j} [\widehat{R}^{(j,0)}]\_{kk} ((A\_{jk} - p\_{jk})^2 - p\_{jk}(1 - p\_{jk})), \\ \widehat{\varepsilon}\_{j3} &= \frac{1}{\widehat{a}\_n} \sum\_{k:k\neq j} \widehat{R}^{(j,0)}\_{kk} \Big( p\_{jk}(1 - p\_{jk}) - \frac{\widehat{a}\_n}{n} \Big), \\ \widehat{\varepsilon}\_{j4} &= \frac{1}{n} \sum\_{k:k\neq j} \widehat{R}^{(j,0)}\_{kk} - \frac{1}{n} \sum\_{k=1}^n \widehat{R}\_{kk}, \\ \widehat{\varepsilon}\_{j5} &= \frac{1}{n} \sum\_{k=1}^n \widehat{R}\_{kk} - \mathbb{E} \frac{1}{n} \sum\_{k=1}^n \widehat{R}\_{kk}. \end{split}$$

Put *ε̂j* = *ε̂j*1 + ··· + *ε̂j*5. Let

$$
\widehat{\zeta}\_j := \widehat{L}\_{jj}^{(j)} = \frac{1}{\sqrt{\widehat{a}\_n}} \sum\_{k \ne j} (A\_{jk} - p\_{jk}).
$$

In these notations, we may write

$$\mathbb{E}[\widehat{R}^{(j)}]\_{jj} = \mathbb{E}\frac{1}{\widehat{\zeta}\_{j} - z - \widehat{s}\_{n}(z) - \widehat{\varepsilon}\_{j}},$$

where *ŝn*(*z*) = E(1/*n*)Tr**R̂**. We continue as follows:

$$\mathbb{E}[\widehat{R}^{(j)}]\_{jj} = \mathbb{E}\frac{1}{\widehat{\zeta}\_{j} - z - \widehat{s}\_{n}(z)} + \mathbb{E}\frac{\widehat{\varepsilon}\_{j}}{\widehat{\zeta}\_{j} - z - \widehat{s}\_{n}(z)}\widehat{R}^{(j)}\_{jj}(z). \tag{30}$$

Averaging the last equality over *j* = 1, . . . , *n*, we obtain

$$\widehat{s}\_{n}(z) = \mathbb{E} \frac{1}{\widehat{\zeta}\_{\mathbb{J}} - z - \widehat{s}\_n(z)} + \mathbb{E} \frac{\widehat{\varepsilon}\_{\mathbb{J}}}{\widehat{\zeta}\_{\mathbb{J}} - z - \widehat{s}\_n(z)} \widehat{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})} + \mathbb{E} (\widehat{R}\_{\mathbb{J}\mathbb{J}} - \widehat{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})}),\tag{31}$$

where J denotes a random variable uniformly distributed on the set {1, ... , *n*} and independent of all other random variables. Denote by *F̂n*(*x*) the distribution function of *ζ̂*J and put *Δ̂n* = sup*x* |*F̂n*(*x*) − Φ(*x*)|. Similarly to inequality (25), we have

$$\Big|\mathbb{E}\frac{1}{\widehat{\zeta}\_{\mathbb{J}} - z - \widehat{s}\_{n}(z)} - s\_g(z + \widehat{s}\_{n}(z))\Big| \le \frac{1}{v^{2}}\widehat{\Delta}\_{n}.\tag{32}$$

According to Lemma A12

$$\Big| \mathbb{E} \frac{1}{\widehat{\zeta}\_{\mathbb{J}} - z - \widehat{s}\_{n}(z)} - s\_g(z + \widehat{s}\_{n}(z)) \Big| \to 0 \text{ as } n \to \infty. \tag{33}$$

Furthermore, since Im *z* + Im *ŝn*(*z*) ≥ *v* and |*R̂*(J)JJ| ≤ *v*−1, we have

$$\Big|\mathbb{E}\frac{\widehat{\varepsilon}\_{\mathbb{J}}}{\widehat{\zeta}\_{\mathbb{J}} - z - \widehat{s}\_{n}(z)}\widehat{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})}\Big| \leq v^{-2} \mathbb{E}|\widehat{\varepsilon}\_{\mathbb{J}}|.\tag{34}$$

By Lemmas A13–A17,

$$\lim\_{n \to \infty} \mathbb{E}|\widehat{\varepsilon}\_{\mathbb{J}}| = 0.\tag{35}$$

Furthermore, we note that

$$
\widehat{\mathbf{R}} = \widehat{\mathbf{R}}^{(\mathbb{J})} - \widehat{\mathbf{R}}^{(\mathbb{J})} \widehat{\mathbf{D}}^{(\mathbb{J})} \widehat{\mathbf{R}}.\tag{36}
$$

This relation implies that

$$|\mathbb{E}(\widehat{R}\_{\mathbb{J}\mathbb{J}} - \widehat{R}\_{\mathbb{J}\mathbb{J}}^{(\mathbb{J})})| \le \max\_{1 \le j \le n} \mathbb{E} \|\widehat{\mathbf{R}} - \widehat{\mathbf{R}}^{(j)}\| \le v^{-2} \max\_{1 \le j \le n} \mathbb{E} \|\widehat{\mathbf{D}}^{(j)}\|. \tag{37}$$

It is straightforward to check that

$$\mathbb{E}\|\widehat{\mathbf{D}}^{(j)}\| \le \frac{1}{\sqrt{\widehat{a}\_{n}}} \mathbb{E} \max\_{1 \le l \le n} |A\_{jl} - p\_{jl}| \le \frac{1}{\sqrt{\widehat{a}\_{n}}} \to 0 \text{ as } n \to \infty. \tag{38}$$

Combining relations (33), (35), (38), we obtain

$$\varkappa\_n(z) := \widehat{s}\_n(z) - s\_g(z + \widehat{s}\_n(z)) \to 0 \text{ as } n \to \infty. \tag{39}$$

The last relation and Lemma A1 complete the proof of Theorem 2.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A**

*Definition of Additive Free Convolution*

We give the definition of the additive free convolution of distribution functions following the paper [16] (Section 5).

**Definition A1.** *A pair* (A, *ϕ*) *consisting of a unital algebra* A *and a linear functional ϕ* : A → C *with ϕ*(1) = 1 *is called a free probability space. Elements of* A *are called random variables, the numbers ϕ*(*ai*(1) ··· *ai*(*n*)) *for random variables a*1, ... , *ak* ∈ A *are called moments, and the collection of all moments is called the joint distribution of a*1, ... , *ak. Equivalently, we may say that the joint distribution of a*1, ... , *ak is given by the linear functional μa*1,...,*ak* : C⟨*X*1, ... , *Xk*⟩ → C *with μa*1,...,*ak*(*P*(*X*1, ... , *Xk*)) = *ϕ*(*P*(*a*1, ... , *ak*))*, where* C⟨*X*1, ... , *Xk*⟩ *denotes the algebra of all polynomials in k non-commutative indeterminates X*1, ..., *Xk.*

If, for a given element *a* ∈ A, there exists a unique probability measure *μa* on R such that ∫ *tk* d*μa*(*t*) = *ϕ*(*ak*) for all *k* ∈ N, we identify the distribution of *a* with the probability measure *μa*.

**Definition A2.** *Let* (A, *ϕ*) *be a non-commutative probability space. Random variables a*1, ... , *ak* ∈ A *are called free if, for any m* ∈ N*, any polynomials P*1, ... , *Pm, and any indices with i*(1) ≠ *i*(2), *i*(2) ≠ *i*(3), ... , *i*(*m* − 1) ≠ *i*(*m*)*, one has ϕ*(*P*1(*ai*(1)) ··· *Pm*(*ai*(*m*))) = 0 *whenever ϕ*(*Pl*(*ai*(*l*))) = 0 *for all l* = 1, ... , *m.*

Consider two random variables *a* and *b* which are free. Then, the distribution of *a* + *b* (in the sense of linear functionals) depends only on the distributions of *a* and *b*.

**Definition A3.** *For free random variables a and b, the distribution of a* + *b is called the free additive convolution of μ<sup>a</sup> and μ<sup>b</sup> and is denoted by*

$$
\mu\_{a + b} = \mu\_a \boxplus \mu\_b.
$$

To compute the free convolution of concrete distributions, we may use the so-called *R*-transform introduced by Voiculescu [17]. Let *s*(*z*) be the Stieltjes transform of some distribution function *F*(*x*). Denote by *s*−1(*z*) the inverse function of *s*(*z*) in the sense of composition. Define the *R*-transform as follows:

$$\mathcal{R}(z) = -s^{-1}(z) - \frac{1}{z}.$$

Let *F*(*x*) be the semicircle distribution function. Its Stieltjes transform satisfies the equation

$$s^2(z) + zs(z) + 1 = 0.$$

Denote by *Rsc*(*z*) the *R*-transform of the semicircular law. Simple calculations show that

$$R\_{\text{sc}}(z) = z.$$
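
Both the quadratic equation for the semicircular Stieltjes transform and the identity *Rsc*(*z*) = *z* can be checked numerically; the sketch below picks the branch of the square root explicitly (the test points *z* = 2*i* and *w* = 0.3*i* are arbitrary illustrative choices):

```python
import numpy as np

def s_sc(z):
    # Stieltjes transform of the semicircular law: the root of
    # s^2 + z*s + 1 = 0 lying in the upper half-plane for Im z > 0
    s = (-z + np.sqrt(z * z - 4 + 0j)) / 2
    return s if s.imag > 0 else (-z - np.sqrt(z * z - 4 + 0j)) / 2

z = 2.0j                               # arbitrary test point with Im z > 0
s = s_sc(z)
assert abs(s * s + z * s + 1) < 1e-12  # the defining quadratic equation

# from s^2 + zs + 1 = 0: s^{-1}(w) = -w - 1/w, hence R_sc(w) = -s^{-1}(w) - 1/w = w
w = 0.3j
s_inv = -w - 1 / w
assert abs(s_sc(s_inv) - w) < 1e-12    # s^{-1} really inverts s_sc
assert abs(-s_inv - 1 / w - w) < 1e-12 # R_sc(w) = w
```
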

We denote by *Rf c*(*z*) the *R*-transform of the free convolution of the semicircular law and the Gaussian law. Let *Rg* denote the *R*-transform of the standard normal law. Then

$$R\_{fc}(z) = R\_{sc}(z) + R\_g(z).$$

See for instance, refs. [18,19]. Using the definition of the *R*-transform via the Stieltjes transform, we obtain

$$-s\_{fc}^{-1}(z) = z - s\_g^{-1}(z).$$

It is straightforward to show that this equality implies

$$s\_{fc}(z) = s\_g(z + s\_{fc}(z)).\tag{A1}$$
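
Equation (A1) characterizes *sf c* as a fixed point, and for Im *z* large the map *s* ↦ *sg*(*z* + *s*) is a contraction; the sketch below computes the Gaussian Stieltjes transform by a simple Riemann sum (an illustrative discretization) and solves (A1) by fixed-point iteration:

```python
import numpy as np

# Stieltjes transform of the standard normal law via a Riemann sum
x = np.linspace(-12, 12, 100_001)
dx = x[1] - x[0]
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def s_g(z):
    return np.sum(phi / (x - z)) * dx

# solve s = s_g(z + s); for v = Im z > 1 the map is a contraction (|s_g'| <= 1/v^2)
z = 2.0j
s = 0.0j
for _ in range(100):
    s = s_g(z + s)

assert abs(s - s_g(z + s)) < 1e-10   # s is (numerically) a fixed point of (A1)
assert s.imag > 0                    # the limit is again a Stieltjes transform value
```
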

We prove the following simple but important lemma.

**Lemma A1.** *Let a sequence of Stieltjes transforms of the distribution functions Fn*(*x*) *satisfy the equations*

$$s\_n(z) = s\_g(z + s\_n(z)) + \varkappa\_n(z),\tag{A2}$$

*where*

$$\varkappa\_n(z) \to 0 \text{ as } n \to \infty.$$

*Then, the distribution functions Fn*(*x*) *weakly converge to the distribution function Ff c*(*x*)*, which is the free convolution of the semicircular law and the standard normal law.*

**Proof.** It is enough to prove that the Stieltjes transforms *sn*(*z*) converge, in some region with nonempty interior, to the Stieltjes transform *sf c*(*z*) satisfying equation (A1). We shall consider the region of *z* = *u* + *iv* with *v* > √2. Since the absolute value of the derivative of *sg*(*z*) does not exceed 1/*v*2 < 1/2, we may write

$$|s\_n(z) - s\_m(z)| \le \frac{1}{v^2}|s\_n(z) - s\_m(z)| + |\varkappa\_n(z)| + |\varkappa\_m(z)|,$$

or

$$|s\_n(z) - s\_m(z)| \le 2|\varkappa\_n(z)| + 2|\varkappa\_m(z)| \to 0 \text{ as } n, m \to \infty. \tag{A3}$$


The sequence of Stieltjes transforms *sn*(*z*) is a Cauchy sequence; consequently, there exists a limit of this sequence, say *sf c*(*z*),

$$\lim\_{n \to \infty} s\_n(z) = s\_{fc}(z).$$

Taking the limit in equation (A2), we obtain

$$s\_{fc}(z) = s\_g(z + s\_{fc}(z)).$$

The last equality implies that *sf c*(*z*) is the Stieltjes transform of the free convolution of the semicircular law and the standard Gaussian law. Thus, the lemma is proved.

#### **Appendix B. Weighted Graphs**

*Appendix B.1. Variance of Stieltjes Transform of Empirical Measure*



In this section, we estimate the variance of *mn*(*z*) = (1/*n*)Tr**R**, where **R** := **R**L̃(*z*) = (**L̃** − *z***I**)−1. We prove the following lemma.

**Lemma A2.** *For any z* = *u* + *iv with v* > 0*, the following inequality holds*

$$\lim\_{n \to \infty} \mathbb{E} \Big|\frac{1}{n} \text{Tr} \mathbf{R} - \frac{1}{n} \mathbb{E} \text{Tr} \mathbf{R}\Big| = 0. \tag{A4}$$

**Proof.** The proof of this lemma uses the martingale representation of *ξ* − E*ξ*. In random matrix theory, this method was first used by Girko; see, for instance, [20]. We introduce the sequence of *σ*-algebras M*k* generated by the random variables *Xj*,*l* for 1 ≤ *j*, *l* ≤ *k*. It is easy to see that M*k* ⊂ M*k*+1. Denote by E*k* the conditional expectation with respect to the *σ*-algebra M*k*. For *k* = 0, E0 = E. Introduce the random variables

$$\gamma\_k := \mathbb{E}\_k \frac{1}{n} \text{Tr} \mathbf{R} - \mathbb{E}\_{k-1} \frac{1}{n} \text{Tr} \mathbf{R}.\tag{A5}$$

The sequence *γk*, *k* = 1, . . . , *n*, is a martingale difference sequence, and

$$\frac{1}{n}\text{Tr}\mathbf{R} - \mathbb{E}\frac{1}{n}\text{Tr}\mathbf{R} = \sum\_{k=1}^{n} \gamma\_k.$$
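
The telescoping of martingale differences used here can be seen on a minimal discrete example (two independent fair coins standing in for the filtration; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.standard_normal((2, 2))     # values f(X1, X2) on two independent fair coins

Ef = f.mean()                       # E_0 f = E f
E1 = f.mean(axis=1, keepdims=True)  # E_1 f: condition on X1, average over X2
E2 = f                              # E_2 f = f (fully measurable)

gamma1 = E1 - Ef                    # martingale differences gamma_k = E_k f - E_{k-1} f
gamma2 = E2 - E1

# telescoping: the sum of the differences recovers f - E f pointwise
assert np.allclose(gamma1 + gamma2, f - Ef)
# each difference has mean zero
assert abs(gamma1.mean()) < 1e-12 and abs(gamma2.mean()) < 1e-12
```
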

Introduce the sub-matrices **L̃**(*k*) obtained from **L̃** by deleting both the *k*-th row and the *k*-th column. Denote by **R̃**(*k*) = **R̃**(*k*)(*z*) the corresponding resolvent matrix, **R̃**(*k*)(*z*) = (**L̃**(*k*) − *z***I**)−1. Note that the matrix **L̃**(*k*) depends on the random variables *Xkl*, *l* = 1, ... , *n* via its diagonal entries. To overcome this difficulty, we introduce the matrix **L̃**(*k*,0) obtained from **L̃**(*k*) by replacing its diagonal entries with *L̃*(*k*,0)*jj* := (1/√*an*) ∑*l*:*l*≠*k*,*l*≠*j* *AjlXjl*. The corresponding resolvent matrix is denoted by **R̃**(*k*,0). We now have

$$\mathbb{E}\_k \text{Tr} \mathbf{R}^{(k,0)} = \mathbb{E}\_{k-1} \text{Tr} \mathbf{R}^{(k,0)}.$$

This allows us to write

$$\begin{split} \gamma\_{k} &= \mathbb{E}\_{k} \Big(\frac{1}{n} (\operatorname{Tr} \mathbf{R} - \operatorname{Tr} \mathbf{R}^{(k)})\Big) - \mathbb{E}\_{k-1} \Big(\frac{1}{n} (\operatorname{Tr} \mathbf{R} - \operatorname{Tr} \mathbf{R}^{(k)})\Big) \\ &+ \mathbb{E}\_{k} \Big(\frac{1}{n} (\operatorname{Tr} \mathbf{R}^{(k)} - \operatorname{Tr} \mathbf{R}^{(k,0)})\Big) - \mathbb{E}\_{k-1} \Big(\frac{1}{n} (\operatorname{Tr} \mathbf{R}^{(k)} - \operatorname{Tr} \mathbf{R}^{(k,0)})\Big) =: \gamma\_{k}^{(1)} + \gamma\_{k}^{(2)}. \end{split}$$

By the interlacing theorem, for *z* = *u* + *iv*,

$$\left| \frac{1}{n} \text{Tr} \mathbf{R}\_{\mathbf{L}}(z) - \frac{1}{n} \text{Tr} \mathbf{R}^{(k)}(z) \right| \le \frac{1}{n\upsilon}. \tag{A6}$$

From here, we immediately obtain

$$|\gamma\_k^{(1)}| \le \frac{2}{nv},$$

and

$$\sum\_{k=1}^{n} \mathbb{E} |\gamma\_k^{(1)}|^2 \le \frac{4}{nv^2}.\tag{A7}$$

To complete the proof, it remains to show that

$$\lim\_{n \to \infty} \sum\_{k=1}^{n} \mathbb{E} |\gamma\_k^{(2)}|^2 = 0. \tag{A8}$$

Note that

$$\mathbb{E}|\gamma\_k^{(2)}|^2 \le 2\mathbb{E}\Big|\frac{1}{n}\text{Tr}\mathbf{R}^{(k)} - \frac{1}{n}\text{Tr}\mathbf{R}^{(k,0)}\Big|^2. \tag{A9}$$

Introduce the diagonal matrix **D**(*k*) with diagonal entries

$$D\_{ll}^{(k)} = \frac{1}{\sqrt{a\_n}} A\_{kl} X\_{kl}, \quad l \neq k.$$

In these notations, we have

$$\frac{1}{n}\text{Tr}\mathbf{R}^{(k)} - \frac{1}{n}\text{Tr}\mathbf{R}^{(k,0)} = \frac{1}{n}\text{Tr}\,\mathbf{R}^{(k)}\mathbf{D}^{(k)}\mathbf{R}^{(k,0)} = \frac{1}{n\sqrt{a\_n}}\sum\_{l\neq k, j\neq k} R\_{lj}^{(k)}A\_{kj}X\_{kj}R\_{jl}^{(k,0)}.\tag{A10}$$

This implies that

$$\sum\_{k=1}^{n} \mathbb{E} |\gamma\_k^{(2)}|^2 \le \frac{4}{n^2 a\_n} \sum\_{k=1}^{n} \mathbb{E} \Big|\sum\_{j \ne k} A\_{kj} X\_{kj} \Big( \sum\_{l \ne k} R\_{lj}^{(k)} R\_{jl}^{(k,0)} \Big) \Big|^2. \tag{A11}$$

We continue this inequality as follows

$$\begin{split} \sum\_{k=1}^{n} \mathbb{E} |\gamma\_{k}^{(2)}|^{2} &\leq \frac{8}{n^{2} a\_{n}} \sum\_{k=1}^{n} \mathbb{E} \Big| \sum\_{j \neq k} A\_{kj} X\_{kj} \Big( \sum\_{l \neq k} R\_{lj}^{(k)} R\_{jl}^{(k,0)} \Big) \mathbb{I} \{ A\_{kj} |X\_{kj}| \leq \tau \sqrt{a\_{n}} \} \Big|^{2} \\ &+ \frac{8}{n^{2} a\_{n}} \sum\_{k=1}^{n} \mathbb{E} \Big|\sum\_{j \neq k} A\_{kj} X\_{kj} \Big( \sum\_{l \neq k} R\_{lj}^{(k)} R\_{jl}^{(k,0)} \Big) \mathbb{I} \{ A\_{kj} |X\_{kj}| > \tau \sqrt{a\_{n}} \} \Big|^{2}. \end{split} \tag{A12}$$

Applying Cauchy's inequality to the second term in the right-hand side of the last inequality, we obtain

$$\begin{split} \frac{8}{n^{2}a\_{n}}\sum\_{k=1}^{n}\mathbb{E}\Big|\sum\_{j\neq k}A\_{kj}X\_{kj}\Big(\sum\_{l\neq k}R^{(k)}\_{lj}R^{(k,0)}\_{jl}\Big)\mathbb{I}\{A\_{kj}|X\_{kj}| > \tau\sqrt{a\_{n}}\}\Big|^{2} \\ \leq \frac{8}{na\_{n}}\sum\_{k=1}^{n}\sum\_{j\neq k}\mathbb{E}A\_{jk}X\_{kj}^{2}\Big|\sum\_{l\neq k}R^{(k)}\_{lj}R^{(k,0)}\_{jl}\Big|^{2}\mathbb{I}\{A\_{kj}|X\_{kj}| > \tau\sqrt{a\_{n}}\}. \end{split} \tag{A13}$$

It is straightforward to check that

$$\Big| \sum\_{l \neq k} R\_{lj}^{(k)} R\_{jl}^{(k,0)} \Big|^2 \leq v^{-4}. \tag{A14}$$

Using this bound, we obtain

$$\frac{8}{n^{2}a\_{n}}\sum\_{k=1}^{n}\mathbb{E}\Big|\sum\_{j\neq k}A\_{kj}X\_{kj}\Big(\sum\_{l\neq k}R\_{lj}^{(k)}R\_{jl}^{(k,0)}\Big)\mathbb{I}\{A\_{kj}|X\_{kj}|>\tau\sqrt{a\_{n}}\}\Big|^{2}\leq 8v^{-4}L\_{n}(\tau).\tag{A15}$$

We estimate now the first term in the r.h.s. of (A12). Using that

$$\mathbf{R}^{(k)} = \mathbf{R}^{(k,0)} + \mathbf{R}^{(k,0)} \mathbf{D}^{(k)} \mathbf{R}^{(k)},\tag{A16}$$

we may write

$$\begin{split} \frac{8}{n^{2}a\_{n}} & \sum\_{k=1}^{n} \mathbb{E} \Big| \sum\_{j \neq k} A\_{kj} X\_{kj} \Big( \sum\_{l \neq k} R^{(k)}\_{lj} R^{(k,0)}\_{jl} \Big) \mathbb{I} \{ A\_{kj} |X\_{kj}| \leq \tau \sqrt{a\_{n}} \} \Big|^{2} \\ & \leq \frac{8}{n^{2}a\_{n}} \sum\_{k=1}^{n} \mathbb{E} \Big| \sum\_{j \neq k} A\_{kj} X\_{kj} \Big( \sum\_{l \neq k} R^{(k,0)}\_{lj} R^{(k,0)}\_{jl} \Big) \mathbb{I} \{ A\_{kj} |X\_{kj}| \leq \tau \sqrt{a\_{n}} \} \Big|^{2} \\ & + \frac{8}{n^{2}a\_{n}^{2}} \sum\_{k=1}^{n} \mathbb{E} \Big| \sum\_{j \neq k} A\_{kj} X\_{kj} \Big( \sum\_{l \neq k} \sum\_{s=1}^{n} X\_{ks} A\_{ks} R^{(k,0)}\_{ls} R^{(k,0)}\_{sj} R^{(k,0)}\_{jl} \Big) \mathbb{I} \{ A\_{kj} |X\_{kj}| \leq \tau \sqrt{a\_{n}} \} \Big|^{2}. \end{split} \tag{A17}$$

By the independence of random variables *AjkXjk* for *<sup>j</sup>* <sup>=</sup> 1, . . . , *<sup>n</sup>* and matrix **<sup>R</sup>**(*k*,0), we have

$$\begin{split} &\frac{8}{n^{2}a\_{n}}\sum\_{k=1}^{n}\mathbb{E}\Big|\sum\_{j\neq k}A\_{kj}X\_{kj}\Big(\sum\_{l\neq k}R^{(k,0)}\_{lj} R^{(k,0)}\_{jl}\Big)\mathbb{I}\{A\_{kj}|X\_{kj}|\leq\tau\sqrt{a\_{n}}\}\Big|^{2} \\ &\leq \frac{8}{n^{2}a\_{n}v^{4}}\sum\_{k=1}^{n}\sum\_{j\neq k}p\_{jk}\sigma\_{jk}^{2}+\frac{1}{n^{2}a\_{n}^{2}\tau^{2}v^{4}}\sum\_{k=1}^{n}\Big(\sum\_{j=1}^{n}p\_{jk}\mathbb{E}X\_{jk}^{2}\mathbb{I}\{|X\_{jk}|>\tau\sqrt{a\_{n}}\}\Big)^{2} \\ &\leq \frac{8}{nv^{4}}+\Big(\frac{L\_{n}(\tau)}{\tau v^{2}}\Big)^{2}. \end{split} \tag{A18}$$

For the second term in the r.h.s. of (A17), we have

$$\begin{split} &\frac{8}{n^{2}a\_{n}^{2}}\sum\_{k=1}^{n}\mathbb{E}\left|\sum\_{j\neq k}A\_{kj}X\_{kj}\left(\sum\_{l\neq k}\sum\_{s=1}^{n}X\_{ks}A\_{ks}R\_{ls}^{(k,0)}R\_{sj}^{(k,0)}R\_{jl}^{(k,0)}\right)\mathbb{I}\{A\_{kj}|X\_{kj}|\leq\tau\sqrt{a\_{n}}\}\right|^{2} \\ &=\frac{8}{n^{2}a\_{n}^{2}}\sum\_{k=1}^{n}\mathbb{E}\left|\sum\_{s\neq k}A\_{ks}X\_{ks}\left(\sum\_{j=1}^{n}X\_{kj}A\_{kj}\sum\_{l\neq k}R\_{ls}^{(k,0)}R\_{sj}^{(k,0)}R\_{jl}^{(k,0)}\mathbb{I}\{A\_{kj}|X\_{kj}|\leq\tau\sqrt{a\_{n}}\}\right)\right|^{2} \\ &\leq\frac{8}{na\_{n}^{2}}\sum\_{k=1}^{n}\mathbb{E}\sum\_{s\neq k}A\_{ks}|X\_{ks}|^{2}\left|\sum\_{j=1}^{n}X\_{kj}A\_{kj}\sum\_{l\neq k}R\_{ls}^{(k,0)}R\_{sj}^{(k,0)}R\_{jl}^{(k,0)}\mathbb{I}\{A\_{kj}|X\_{kj}|\leq\tau\sqrt{a\_{n}}\}\right|^{2}. \end{split} \tag{A19}$$

Note that

$$\sum\_{r=1}^{n}|R\_{rj}^{(k)}|\left|\sum\_{l\neq k}R\_{lr}^{(k,0)}R\_{jl}^{(k,0)}\right|\leq\left(\sum\_{r=1}^{n}|R\_{jr}^{(k)}|^{2}\right)^{\frac{1}{2}}\left(\sum\_{r=1}^{n}\big|[(\mathbf{R}^{(k,0)})^{2}]\_{jr}\big|^{2}\right)^{\frac{1}{2}}\leq\upsilon^{-3}.\tag{A20}$$

Using this inequality, we obtain

$$\begin{split} \frac{8}{n^{2}a\_{n}^{2}}\sum\_{k=1}^{n}\mathbb{E}\left|\sum\_{j\neq k}A\_{kj}X\_{kj}\left(\sum\_{l\neq k}\sum\_{r=1}^{n}X\_{kr}A\_{kr}R\_{lr}^{(k,0)}R\_{rj}^{(k,0)}R\_{jl}^{(k,0)}\right)\right|^{2}\prod\_{r=1}^{n}\mathbb{I}\{A\_{kr}|X\_{kr}|\le\tau\sqrt{a\_{n}}\} \\ \leq\frac{8\tau^{2}}{na\_{n}\upsilon^{6}}\sum\_{k=1}^{n}\sum\_{j\neq k}p\_{jk}\sigma\_{jk}^{2}=\frac{8\tau^{2}}{\upsilon^{6}}. \end{split} \tag{A21}$$

Combining inequalities (A7), (A12), (A20), we obtain

$$\mathbb{E}\Big|\frac{1}{n}(\text{Tr}\,\mathbf{R}-\mathbb{E}\,\text{Tr}\,\mathbf{R})\Big|^{2}\le\frac{C}{n\upsilon^{2}}+\frac{C\tau^{2}}{\upsilon^{6}}+\frac{CL\_{n}(\tau)}{\upsilon^{4}}.\tag{A22}$$

Passing to the limit, first as *n* → ∞ and then as *τ* → 0, we obtain

$$\lim\_{n \to \infty} \mathbb{E} |\frac{1}{n} (\text{Tr} \mathbf{R} - \mathbb{E} \text{Tr} \mathbf{R})|^2 = 0. \tag{A23}$$

Thus, the lemma is proved.
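As a purely illustrative numerical sketch (not part of the proof), one can watch the variance of the normalized trace of the resolvent shrink as $n$ grows. The ensemble below is our assumption for the experiment, not the paper's exact model: Bernoulli($a\_n/n$) edges with i.i.d. standard normal weights and a Laplace-type normalization by $\sqrt{a\_n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalized_trace_samples(n, a_n, z, n_rep=300):
    """Monte Carlo samples of (1/n) Tr R for a Laplace-type matrix with
    Bernoulli(a_n/n) edges and i.i.d. standard normal weights (illustrative model)."""
    p, vals = a_n / n, []
    for _ in range(n_rep):
        W = np.triu((rng.random((n, n)) < p) * rng.standard_normal((n, n)), 1)
        W = W + W.T                                       # symmetric weighted adjacency
        L = (W - np.diag(W.sum(axis=1))) / np.sqrt(a_n)   # normalized Laplace-type matrix
        vals.append(np.trace(np.linalg.inv(L - z * np.eye(n))) / n)
    return np.array(vals)

z = 0.5 + 1.0j
v_small = normalized_trace_samples(16, 4.0, z).var()
v_large = normalized_trace_samples(128, 32.0, z).var()
print("Var(m_n(z)) at n=16 :", v_small)
print("Var(m_n(z)) at n=128:", v_large)
```

With a fourfold increase in the linear size, the sample variance of $\frac{1}{n}\mathrm{Tr}\,\mathbf{R}$ drops markedly, consistent with (A23).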

In what follows, we shall assume that *z* = *u* + *iv* is fixed.

*Appendix B.2. Convergence of Diagonal Entries Distribution Functions of Laplace Matrices to the Normal Law*

**Lemma A3.** *Under conditions CP*(0) *and CX*(0)*, we have*

$$\lim\_{n} \frac{1}{n} \sum\_{j=1}^{n} \frac{\max\_{1 \le k \le n} p\_{jk} \sigma\_{jk}^{2}}{a\_n} = 0. \tag{A24}$$

**Proof.** Fix an arbitrary *τ* > 0. We may write

$$\frac{1}{n}\sum\_{j=1}^{n}\frac{\max\limits\_{1\le k\le n}p\_{jk}\sigma\_{jk}^{2}}{a\_{n}}\le\tau^{2}+\frac{1}{na\_{n}}\sum\_{j=1}^{n}\sum\_{k=1}^{n}p\_{jk}\mathbb{E}|X\_{jk}|^{2}\mathbb{I}\{|X\_{jk}|>\tau\sqrt{a\_{n}}\}.\tag{A25}$$

By condition *CX*(0), we obtain

$$\limsup\_{n\to\infty} \frac{1}{n} \sum\_{j=1}^n \frac{\max\_{1\le k\le n} p\_{jk} \sigma\_{jk}^2}{a\_n} \le \tau^2.$$

Because *τ* is arbitrary, we obtain the claim.

**Lemma A4.** *Under conditions CP*(0)*, CP*(2) *and CX*(0)*, CX*(1)*, we have*

$$\lim\_{n \to \infty} \sup\_{\mathbf{x}} |F\_{\mathbb{R}}(\mathbf{x}) - \Phi(\mathbf{x})| = 0 \tag{A26}$$

**Proof.** Let $J$ be a random variable, independent of the $A\_{jk}$ and $X\_{jk}$, uniformly distributed on the set $\{1,\ldots,n\}$. We consider the characteristic function of $\zeta\_{J}=\frac{1}{\sqrt{a\_n}}\sum\_{k=1}^{n}A\_{J,k}X\_{J,k}$, that is, $f\_n(t)=\mathbb{E}\exp\{it\zeta\_{J}\}=\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\exp\{it\zeta\_{j}\}$. Introduce the following set of indices

$$\mathcal{M} = \mathcal{M}\_1 \cap \mathcal{M}\_2 \cap \mathcal{M}\_3,\tag{A27}$$

where

$$\mathcal{M}\_{1} := \left\{ j \in \{1, \ldots, n\} \, : \, \Big|\frac{1}{a\_{n}}\sum\_{k=1}^{n} p\_{jk} \sigma\_{jk}^{2} - 1\Big| \le \frac{1}{16} \right\},$$

$$\mathcal{M}\_{2} := \left\{ j \in \{1, \ldots, n\} \, : \, \frac{1}{a\_{n}} \sum\_{k=1}^{n} p\_{jk} \mathbb{E}X\_{jk}^{2} \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \} \le \frac{1}{16} \right\},$$

$$\mathcal{M}\_{3} := \left\{ j \in \{1, \ldots, n\} \, : \, \frac{1}{a\_{n}} \max\_{1 \le k \le n} p\_{jk} \sigma\_{jk}^{2} \le \frac{1}{16t^{2}} \right\}. \tag{A28}$$

We denote by $\mathcal{A}^{c}$ the complement of a set $\mathcal{A}$ and by $|\mathcal{A}|$ its cardinality. Note that, by condition *CP*(1),

$$\frac{|\mathcal{M}\_1^c|}{n} \le 16 \frac{1}{n a\_{n}} \sum\_{j=1}^n \sum\_{k=1}^n \Big|p\_{jk} \sigma\_{jk}^2 - \frac{a\_{n}}{n}\Big| \to 0, \text{ as } n \to \infty. \tag{A29}$$

Analogously, by *CX*(1),

$$\frac{|\mathcal{M}\_2^c|}{n} \le 16L\_n(\tau) \to 0,\text{ as } n \to \infty. \tag{A30}$$

Finally, by Lemma A3

$$\frac{|\mathcal{M}\_{3}^{c}|}{n} \le 16t^2 \frac{1}{n a\_{n}} \sum\_{j=1}^{n} \max\_{1 \le k \le n} p\_{jk} \sigma\_{jk}^2 \to 0 \text{ as } n \to \infty. \tag{A31}$$

Combining the last three relations, we obtain

$$\lim\_{n \to \infty} \frac{|\mathcal{M}^{c}|}{n} = 0.\tag{A32}$$

Note that by the independence of *Ajk* and *Xjk*,

$$f\_{nj}(t) := \mathbb{E}\exp\{it\zeta\_{j}\} = \prod\_{k=1}^n \mathbb{E}\exp\Big\{\frac{it}{\sqrt{a\_n}}A\_{jk}X\_{jk}\Big\} =: \prod\_{k=1}^n f\_{njk}(t).$$

Furthermore,

$$f\_{njk}(t) = 1 + p\_{jk}\Big(\mathbb{E}\exp\Big\{\frac{it}{\sqrt{a\_{n}}}X\_{jk}\Big\} - 1\Big),\tag{A33}$$

and by condition *CP*(0)

$$|f\_{njk}(t) - 1| \le \frac{t^2}{2a\_n} p\_{jk} \sigma\_{jk}^2 \le \frac{t^2}{2a\_n} \max\_{1 \le j,k \le n} p\_{jk} \sigma\_{jk}^2 \to 0 \text{ as } n \to \infty. \tag{A34}$$

Without loss of generality, we may assume that

$$\max\_{1 \le j,k \le n} |f\_{njk}(t) - 1| \le \frac{1}{4} \tag{A35}$$

and applying Taylor's formula, we write that

$$\ln f\_{njk}(t) = p\_{jk}\left(\mathbb{E}\exp\Big\{\frac{it}{\sqrt{a\_{n}}}X\_{jk}\Big\} - 1\right) + 2\theta(t)p\_{jk}^2 \left|\mathbb{E}\exp\Big\{\frac{it}{\sqrt{a\_{n}}}X\_{jk}\Big\} - 1\right|^2,\tag{A36}$$

where *θ*(*t*) denotes some function such that |*θ*(*t*)| ≤ 1. Furthermore, by Taylor's formula,

$$\begin{split} \mathbb{E}\exp\Big\{\frac{it}{\sqrt{a\_{n}}}X\_{jk}\Big\} - 1 &= -\frac{t^{2}}{2a\_{n}}\sigma\_{jk}^{2} + \theta\_{1}(t)\frac{|t|^{3}}{6a\_{n}^{3/2}}\mathbb{E}|X\_{jk}|^{3}\mathbb{I}\{|X\_{jk}| \le \tau\sqrt{a\_{n}}\} \\ &+ \theta\_{2}(t)\mathbb{E}\left|\exp\Big\{\frac{it}{\sqrt{a\_{n}}}X\_{jk}\Big\} - 1 - \frac{it}{\sqrt{a\_{n}}}X\_{jk} + \frac{t^{2}}{2a\_{n}}X\_{jk}^{2}\right|\mathbb{I}\{|X\_{jk}| > \tau\sqrt{a\_{n}}\}, \end{split} \tag{A37}$$

where *θi*(*t*), *i* = 1, 2, denote functions such that |*θi*(*t*)| ≤ 1. Using this equality, we may write

$$\begin{split} \ln f\_{njk}(t) &= -\frac{t^2}{2a\_{n}} p\_{jk} \sigma\_{jk}^2 + \theta\_1(t) \frac{\tau |t|^3}{6a\_{n}} p\_{jk} \sigma\_{jk}^2 \\ &+ \theta\_2(t) \frac{t^2}{a\_{n}} p\_{jk} \mathbb{E} |X\_{jk}|^2 \mathbb{I} \{ |X\_{jk}| \ge \tau \sqrt{a\_{n}} \} + \theta\_3(t) \frac{t^4}{4a\_{n}^2} p\_{jk}^2 \sigma\_{jk}^4. \end{split} \tag{A38}$$

Summing this equality over *k* = 1, . . . , *n*, we obtain

$$\begin{split} \ln f\_{nj}(t) &= -\frac{t^2}{2} \frac{1}{a\_{n}} \sum\_{k=1}^n p\_{jk} \sigma\_{jk}^2 + \theta\_1(t) \tau \frac{|t|^3}{6a\_{n}} \sum\_{k=1}^n p\_{jk} \sigma\_{jk}^2 \\ &+ \theta\_2(t) \frac{t^2}{a\_{n}} \sum\_{k=1}^n p\_{jk} \mathbb{E}|X\_{jk}|^2 \mathbb{I}\{ |X\_{jk}| \ge \tau \sqrt{a\_{n}} \} \\ &+ \theta\_3(t) \frac{t^4}{4} \frac{\max\_{1 \le j,k \le n} p\_{jk} \sigma\_{jk}^2}{a\_{n}} \frac{1}{a\_{n}} \sum\_{k=1}^n p\_{jk} \sigma\_{jk}^2. \end{split} \tag{A39}$$

For $0 < \tau < \frac{8}{17|t|}$, we have

$$\Big|\ln f\_{nj}(t) + \frac{t^2}{2}\Big| \le \frac{t^2}{3}. \tag{A40}$$

This implies that for *j* ∈ M

$$\begin{split} |f\_{nj}(t) - \exp\{-\frac{t^2}{2}\}| &\leq C\left(t^2\Big(\Big|\frac{1}{a\_{n}}\sum\_{k=1}^n p\_{jk}\sigma\_{jk}^2 - 1\Big| + \frac{1}{a\_{n}}\sum\_{k=1}^n p\_{jk}\mathbb{E}|X\_{jk}|^2\mathbb{I}\{|X\_{jk}| > \tau\sqrt{a\_{n}}\}\Big)\right. \\ &\left.+ \tau|t|^3 + \frac{t^4 \max\_{1\leq j,k\leq n} p\_{jk}\sigma\_{jk}^2}{a\_{n}}\right). \end{split} \tag{A41}$$

From this inequality, it follows that

$$\begin{split} |f\_{n}(t) - \exp\{-\frac{t^{2}}{2}\}| &\leq \frac{2|\mathcal{M}^{c}|}{n} \\ &+ \frac{1}{n} \sum\_{j=1}^{n} \left(t^{2} \Big(\Big|\frac{1}{a\_{n}} \sum\_{k=1}^{n} p\_{jk} \sigma\_{jk}^{2} - 1\Big| + \frac{1}{a\_{n}} \sum\_{k=1}^{n} p\_{jk} \mathbb{E}|X\_{jk}|^{2} \mathbb{I}\{|X\_{jk}| > \tau \sqrt{a\_{n}}\}\Big)\right. \\ &\left.+ \tau |t|^{3} + \frac{t^{4} \max\_{1 \leq j,k \leq n} p\_{jk} \sigma\_{jk}^{2}}{a\_{n}} \right). \end{split} \tag{A42}$$

By conditions *CP*(0) and *CX*(0), relation (A32) and Lemma A3, we obtain

$$\lim\_{n \to \infty} f\_n(t) = \exp\{-\frac{t^2}{2}\}.\tag{A43}$$

Thus, the lemma is proved.
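A quick Monte Carlo sanity check of the characteristic-function limit (A43), under the illustrative assumption $p\_{jk} = a\_n/n$ with standard normal weights (all parameter choices below are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Empirical characteristic function of zeta_j = (1/sqrt(a_n)) sum_k A_{jk} X_{jk}
# versus the Gaussian limit exp(-t^2/2); the concrete ensemble (p_{jk} = a_n/n,
# standard normal X) and all sizes are illustrative choices.
n, a_n, n_rep = 2000, 200.0, 4000
p = a_n / n
A = rng.random((n_rep, n)) < p
X = rng.standard_normal((n_rep, n))
zeta = (A * X).sum(axis=1) / np.sqrt(a_n)   # one sample of zeta per row

errs = [abs(np.exp(1j * t * zeta).mean() - np.exp(-t**2 / 2)) for t in (0.5, 1.0, 2.0)]
print("max deviation of the empirical c.f.:", max(errs))
```

The deviation is of the order of the Monte Carlo noise, in line with (A43).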

**Lemma A5.** *Under the conditions of Theorem 1, we have*

$$\lim\_{n \to \infty} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} |R\_{jj} - \overline{R}\_{jj}^{(j)}| = 0. \tag{A44}$$

**Proof.** We denote by $\|\mathbf{V}\|$ the operator norm of a matrix **V**. The matrices $\overline{\mathbf{R}}^{(j)}$ and $\overline{\mathbf{D}}^{(j)}$ are defined at the beginning of Section 3, before relation (18). Note that

$$\|\mathbf{R}\overline{\mathbf{D}}^{(j)}\overline{\mathbf{R}}^{(j)}\| \le \upsilon^{-2}\|\overline{\mathbf{D}}^{(j)}\|. \tag{A45}$$

It is easy to check that

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|R\_{jj}-\overline{R}\_{jj}^{(j)}|\leq\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\|\mathbf{R}-\overline{\mathbf{R}}^{(j)}\|.\tag{A46}$$

Using that

$$\mathbf{R} = \overline{\mathbf{R}}^{(j)} - \mathbf{R}\overline{\mathbf{D}}^{(j)}\overline{\mathbf{R}}^{(j)},\tag{A47}$$

we obtain

$$\|\mathbf{R} - \overline{\mathbf{R}}^{(j)}\| \le \upsilon^{-2} \|\overline{\mathbf{D}}^{(j)}\|. \tag{A48}$$

Furthermore, for any *τ* > 0, we have

$$\mathbb{E}\|\overline{\mathbf{D}}^{(j)}\| \le \frac{1}{\sqrt{a\_{n}}} \mathbb{E} \max\_{1 \le l \le n,\, l \ne j} \{|X\_{jl}|A\_{jl}\} \le \tau + \frac{1}{\tau a\_{n}} \sum\_{l=1}^{n} p\_{jl} \mathbb{E}X\_{jl}^{2} \mathbb{I}\{|X\_{jl}| > \tau \sqrt{a\_{n}}\}.\tag{A49}$$

Summing this inequality in *j* = 1, . . . , *n*, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|R\_{jj}-\overline{R}\_{jj}^{(j)}|\leq \upsilon^{-2}\Big(\tau+\frac{1}{\tau}L\_n(\tau)\Big).\tag{A50}$$

Since *τ* is arbitrary, this inequality and condition *CX*(0) together imply (A44). Thus, Lemma A5 is proved.
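The elementary truncation bound used in (A49), $\max\_l |x\_l|/\sqrt{a} \le \tau + \frac{1}{\tau a}\sum\_l x\_l^2\,\mathbb{I}\{|x\_l| > \tau\sqrt{a}\}$, holds pointwise for any sample; a small sketch checking it on simulated heavy-tailed data (the Student-$t$ choice and all constants are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

def max_bound(x, a, tau):
    """Right-hand side of the (A49)-style bound:
    tau + (1/(tau*a)) * sum of x_l^2 over the indices with |x_l| > tau*sqrt(a)."""
    tail = x[np.abs(x) > tau * np.sqrt(a)]
    return tau + (tail ** 2).sum() / (tau * a)

a, tau = 50.0, 0.3
for _ in range(100):
    x = rng.standard_t(df=3, size=200)   # heavy-ish tails so the tail term is exercised
    assert np.abs(x).max() / np.sqrt(a) <= max_bound(x, a, tau) + 1e-12
print("truncation bound holds on all samples")
```

The inequality is deterministic: either every $|x\_l| \le \tau\sqrt{a}$, or the maximizer itself sits in the tail sum.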

*Appendix B.3. The Bounds of $\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j\nu}|$, for $\nu = 1, \ldots, 7$*

**Lemma A6.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n} \sum\_{j=1}^{n} \mathbb{E}|\varepsilon\_{j1}| \le \frac{\tau}{\upsilon} + \frac{1}{\upsilon} \left(\frac{\max\_{1 \le j,k \le n} p\_{jk} \sigma\_{jk}^{2}}{a\_{n}}\right)^{\frac{1}{2}} L\_{n}(\tau)^{\frac{1}{2}}.\tag{A51}$$

**Proof.** By definition of *εj*1, we may write

$$\varepsilon\_{j1} := \frac{1}{a\_n} \sum\_{l \neq k:\, l \neq j,\, k \neq j} [\overline{R}^{(j,0)}]\_{kl} A\_{jk} A\_{jl} X\_{jk} X\_{jl}. \tag{A52}$$

Applying the Cauchy inequality, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j1}| \le \left(\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j1}|^{2}\right)^{\frac{1}{2}}.\tag{A53}$$

Simple calculations show that

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j1}| \le \left(\frac{1}{na\_n^2}\sum\_{j=1}^{n}\sum\_{k\neq j}\sum\_{l\neq j} \mathbb{E}|\overline{R}\_{kl}^{(j,0)}|^2 p\_{jk}p\_{jl}\sigma\_{jk}^2\sigma\_{jl}^2\right)^{\frac{1}{2}}.\tag{A54}$$

We introduce the following notation:

$$\mathbf{W}^{(j)} = \big(|\overline{R}\_{kl}^{(j,0)}|^2\big)\_{k,l=1}^n, \quad \mathbf{H}^{(j)} = \big(p\_{j1} \sigma\_{j1}^2, \ldots, p\_{jn} \sigma\_{jn}^2\big)^T. \tag{A55}$$

In these notations, we write

$$\frac{1}{n}\sum\_{j=1}^n \mathbb{E}|\varepsilon\_{j1}| \le \left(\frac{1}{na\_n^2} \sum\_{j=1}^n [\mathbf{H}^{(j)}]^{T}\mathbf{W}^{(j)} \mathbf{H}^{(j)}\right)^{\frac{1}{2}}.$$

Using that

$$\sum\_{l=1}^{n} |\overline{R}\_{kl}^{(j,0)}|^2 \le \frac{1}{\upsilon^2},\tag{A56}$$

we obtain that the spectral norm of the matrix **W**(*j*) satisfies the inequality

$$\|\mathbf{W}^{(j)}\| \le \frac{1}{\upsilon^2},\tag{A57}$$

and

$$[\mathbf{H}^{(j)}]^{T}\mathbf{W}^{(j)}\mathbf{H}^{(j)} \le \|\mathbf{W}^{(j)}\|\,\|\mathbf{H}^{(j)}\|^2 \le \frac{1}{\upsilon^2} \sum\_{k=1}^n p\_{jk}^2 \sigma\_{jk}^4. \tag{A58}$$

Using the last bound, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j1}| \le \frac{1}{\upsilon} \left(\frac{1}{n a\_{n}^2} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk}^2 \sigma\_{jk}^4\right)^{\frac{1}{2}}.\tag{A59}$$

Furthermore, we apply the bound

$$\sigma\_{jk}^2 \le \tau^2 a\_{n} + \mathbb{E}X\_{jk}^2 \mathbb{I}\{ |X\_{jk}| > \tau \sqrt{a\_{n}} \}. \tag{A60}$$

We obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j1}| \le \frac{1}{\upsilon} \left(\tau^2 + \frac{1}{na\_n^2} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk}^2 \sigma\_{jk}^2 \mathbb{E}|X\_{jk}|^2 \mathbb{I}\{|X\_{jk}| > \tau\sqrt{a\_n}\}\right)^{\frac{1}{2}}.\tag{A61}$$

We continue as follows

$$\frac{1}{n}\sum\_{j=1}^n \mathbb{E}|\varepsilon\_{j1}| \le \frac{\tau}{\upsilon} + \frac{1}{\upsilon} \left(\frac{\max\limits\_{1 \le j,k \le n} p\_{jk}\sigma\_{jk}^2}{a\_{n}}\right)^{\frac{1}{2}} L\_{n}(\tau)^{\frac{1}{2}}.$$

Thus, the lemma is proved.
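The two resolvent bounds driving this proof, (A56) and (A57), are deterministic consequences of $\|\mathbf{R}\| \le 1/\upsilon$ for the resolvent of a Hermitian matrix; a numerical sketch (the symmetric test matrix and its scaling are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# For Hermitian H and z = u + iv with v > 0, each row of R = (H - zI)^{-1} has
# squared l2-norm at most 1/v^2 (as in (A56)), and the entrywise matrix
# W = (|R_{kl}|^2) has spectral norm at most 1/v^2 (as in (A57)).
n, v = 60, 0.7
H = rng.standard_normal((n, n))
H = (H + H.T) / np.sqrt(2 * n)            # arbitrary symmetric test matrix
R = np.linalg.inv(H - (0.3 + 1j * v) * np.eye(n))

W = np.abs(R) ** 2
assert W.sum(axis=1).max() <= 1 / v**2 + 1e-10   # row bound (A56)
assert np.linalg.norm(W, 2) <= 1 / v**2 + 1e-10  # spectral-norm bound (A57)
print("resolvent bounds (A56)-(A57) hold")
```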

**Lemma A7.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j2}| \le \frac{1}{\upsilon}L\_n(\tau) + \frac{\tau}{\upsilon}.\tag{A62}$$

**Proof.** We recall the definition of *εj*2,

$$\varepsilon\_{j2} = \frac{1}{a\_{n}} \sum\_{k:k \neq j} [\overline{R}^{(j,0)}]\_{kk} (A\_{jk} - p\_{jk}) X\_{jk}^2. \tag{A63}$$

Using the triangle inequality and Cauchy's inequality, we may write

$$\begin{split} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E}|\varepsilon\_{j2}| \leq \frac{1}{n a\_{n} \upsilon} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E}X\_{jk}^{2} \mathbb{I}\{ |X\_{jk}| \geq \tau \sqrt{a\_{n}} \} \\ + \left( \frac{1}{n a\_{n}^{2}} \sum\_{j=1}^{n} \mathbb{E} \left| \sum\_{k: k \neq j} [\overline{R}^{(j,0)}]\_{kk} (A\_{jk} - p\_{jk}) X\_{jk}^{2} \mathbb{I}\{ |X\_{jk}| \le \tau \sqrt{a\_{n}} \} \right|^{2} \right)^{\frac{1}{2}}. \end{split} \tag{A64}$$

Since $\mathbb{E}[\overline{R}^{(j,0)}]\_{kk}(A\_{jk}-p\_{jk})X\_{jk}^{2}\mathbb{I}\{|X\_{jk}|\le\tau\sqrt{a\_n}\}=0$, and the random variables $A\_{jk}$, $X\_{jk}$ are independent for $k=1,\ldots,n$ and independent of $[\overline{R}^{(j,0)}]\_{kk}$, we obtain

$$\begin{split} \frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j2}| &\le \frac{1}{\upsilon}L\_{n}(\tau) + \frac{\tau}{\upsilon}\left(\frac{1}{na\_{n}}\sum\_{j=1}^{n}\sum\_{k:k\neq j}p\_{jk}\sigma\_{jk}^{2}\right)^{\frac{1}{2}} \\ &= \frac{1}{\upsilon}L\_{n}(\tau) + \frac{\tau}{\upsilon}. \end{split}\tag{A65}$$

Thus, the lemma is proved.

**Lemma A8.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j3}| \le \frac{3}{\upsilon}L\_{n}(\tau) + \frac{\tau}{\upsilon}.\tag{A66}$$

**Proof.** By definition of *εj*3, we have

$$\varepsilon\_{j3} = \frac{1}{a\_n} \sum\_{k:k \neq j} [\overline{R}^{(j,0)}(z)]\_{kk} p\_{jk} (X\_{jk}^2 - \sigma\_{jk}^2). \tag{A67}$$

We may write

$$\begin{split} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} |\varepsilon\_{j3}| &\leq \frac{1}{\upsilon} \frac{1}{n a\_n} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} |X\_{jk}^2 - \sigma\_{jk}^2| \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_n} \} \\ &+ \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} \left| \frac{1}{a\_n} \sum\_{k=1}^{n} p\_{jk} \overline{R}\_{kk}^{(j,0)} (X\_{jk}^2 - \sigma\_{jk}^2) \mathbb{I} \{ |X\_{jk}| \le \tau \sqrt{a\_n} \} \right|. \end{split} \tag{A68}$$

Furthermore,

$$\begin{split} \frac{1}{na\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} |X\_{jk}^{2} - \sigma\_{jk}^{2}| \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \} &\leq L\_{n}(\tau) \\ &+ \frac{1}{na\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \sigma\_{jk}^{2}\, \mathbb{E} \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \}. \end{split} \tag{A69}$$

Using inequality (A60), we obtain

$$\begin{split} \frac{1}{na\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \sigma\_{jk}^{2}\, \mathbb{E} \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \} &\leq L\_{n}(\tau) \\ + \frac{1}{na\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} |X\_{jk}|^{2} \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \} &\leq 2L\_{n}(\tau). \end{split}$$

We estimate now the second term in the right-hand side of (A68). Applying triangle inequality, we obtain

$$\begin{split} &\frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} \left| \frac{1}{a\_n} \sum\_{k=1}^{n} p\_{jk} \overline{R}\_{kk}^{(j,0)} (X\_{jk}^2 - \sigma\_{jk}^2) \mathbb{I} \{ |X\_{jk}| \le \tau \sqrt{a\_n} \} \right| \\ &\le \frac{1}{n} \sum\_{j=1}^{n} \left| \frac{1}{a\_n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} \overline{R}\_{kk}^{(j,0)}\, \mathbb{E} (X\_{jk}^2 - \sigma\_{jk}^2) \mathbb{I} \{ |X\_{jk}| \le \tau \sqrt{a\_n} \} \right| \\ &+ \left( \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} \left| \frac{1}{a\_n} \sum\_{k=1}^{n} p\_{jk} \overline{R}\_{kk}^{(j,0)} \big( X\_{jk}^2 \mathbb{I} \{ |X\_{jk}| \le \tau \sqrt{a\_n} \} - \mathbb{E} X\_{jk}^2 \mathbb{I} \{ |X\_{jk}| \le \tau \sqrt{a\_n} \} \big) \right|^2 \right)^{\frac{1}{2}}. \end{split} \tag{A70}$$

Simple calculations show that

$$\begin{split} &\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\left|\frac{1}{a\_{n}}\sum\_{k=1}^{n}p\_{jk}\overline{R}\_{kk}^{(j,0)}\Big(X\_{jk}^{2}\mathbb{I}\{|X\_{jk}|\leq\tau\sqrt{a\_{n}}\}-\mathbb{E}X\_{jk}^{2}\mathbb{I}\{|X\_{jk}|\leq\tau\sqrt{a\_{n}}\}\Big)\right|^{2} \\ &\leq\frac{1}{\upsilon^{2}na\_{n}^{2}}\sum\_{j=1}^{n}\sum\_{k=1}^{n}p\_{jk}^{2}\mathbb{E}|X\_{jk}|^{4}\mathbb{I}\{|X\_{jk}|\leq\tau\sqrt{a\_{n}}\} \\ &\leq\frac{\tau^{2}}{\upsilon^{2}}\frac{1}{na\_{n}}\sum\_{j=1}^{n}\sum\_{k=1}^{n}p\_{jk}\sigma\_{jk}^{2}=\frac{\tau^{2}}{\upsilon^{2}}. \end{split}\tag{A71}$$

Finally, we note that

$$\mathbb{E}(X\_{jk}^2 - \sigma\_{jk}^2)\mathbb{I}\{|X\_{jk}| \le \tau\sqrt{a\_n}\} = -\,\mathbb{E}(X\_{jk}^2 - \sigma\_{jk}^2)\mathbb{I}\{|X\_{jk}| > \tau\sqrt{a\_n}\}.\tag{A72}$$

Combining inequalities (A68), (A70), (A71), we obtain the result of the lemma. Thus, the lemma is proved.
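Identity (A72) is simply $\mathbb{E}(X^2-\sigma^2)=0$ split over the truncation event; an exact check on a toy discrete distribution (the law and truncation level below are our choices) with rational arithmetic:

```python
from fractions import Fraction as F

# Identity (A72) on a toy discrete law: with s2 := E X^2, E(X^2 - s2) = 0, so the
# expectation splits as E(X^2 - s2) I{|X| <= c} = -E(X^2 - s2) I{|X| > c}.
vals  = [F(-3), F(-1), F(0), F(2), F(4)]
probs = [F(1, 10), F(3, 10), F(2, 10), F(3, 10), F(1, 10)]
s2 = sum(p * x * x for x, p in zip(vals, probs))   # here s2 = 4
c = F(3, 2)                                        # truncation level
lhs = sum(p * (x * x - s2) for x, p in zip(vals, probs) if abs(x) <= c)
rhs = -sum(p * (x * x - s2) for x, p in zip(vals, probs) if abs(x) > c)
assert lhs == rhs
print("truncation identity holds, both sides equal", lhs)
```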

**Lemma A9.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j4}| \le \frac{1}{\upsilon n a\_{n}}\sum\_{j=1}^{n}\sum\_{k=1}^{n}\Big|p\_{jk}\sigma\_{jk}^{2} - \frac{1}{n}\sum\_{l=1}^{n}p\_{jl}\sigma\_{jl}^{2}\Big|.\tag{A73}$$

**Proof.** By definition of *εj*4, we have

$$\varepsilon\_{j4} = \frac{1}{a\_{n}} \sum\_{k:k\neq j} \overline{R}\_{kk}^{(j,0)} \left( p\_{jk} \sigma\_{jk}^2 - \frac{1}{n} \sum\_{l=1}^n p\_{jl} \sigma\_{jl}^2 \right). \tag{A74}$$

Using that $|\overline{R}\_{kk}^{(j,0)}| \le \frac{1}{\upsilon}$, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j4}| \le \frac{1}{\upsilon n a\_{n}}\sum\_{j=1}^{n}\sum\_{k=1}^{n}\Big|p\_{jk}\sigma\_{jk}^{2} - \frac{1}{n}\sum\_{l=1}^{n}p\_{jl}\sigma\_{jl}^{2}\Big|.\tag{A75}$$

**Lemma A10.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j5}| \le \frac{1}{\upsilon n}\sum\_{j=1}^{n}\left|\frac{1}{a\_n}\sum\_{l=1}^{n}p\_{jl}\sigma\_{jl}^2 - 1\right|.\tag{A76}$$

**Proof.** Recall that

$$\varepsilon\_{j5} = \frac{1}{n} \sum\_{k:k\neq j} \overline{R}\_{kk}^{(j,0)} \left( \frac{1}{a\_n} \sum\_{l=1}^n p\_{jl} \sigma\_{jl}^2 - 1 \right). \tag{A77}$$

Using that $|\overline{R}\_{kk}^{(j,0)}| \le \upsilon^{-1}$, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j5}| \le \frac{1}{\upsilon n}\sum\_{j=1}^{n}\left|\frac{1}{a\_n}\sum\_{l=1}^{n}p\_{jl}\sigma\_{jl}^2 - 1\right|.\tag{A78}$$

Thus, the lemma is proved.

**Lemma A11.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j6}| \le \frac{\tau}{\upsilon^2} + \frac{1}{n\upsilon^2\tau}L\_{n}(\tau). \tag{A79}$$

**Proof.** By definition of *εj*6, we have

$$\varepsilon\_{j6} = \frac{1}{n} \sum\_{k:k \neq j} [\overline{R}^{(j,0)}]\_{kk} - \frac{1}{n} \sum\_{k=1}^{n} [R]\_{kk}. \tag{A80}$$

By the triangle inequality, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\varepsilon\_{j6}| \leq \frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\Big|\frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j,0)} - \frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j)}\Big| + \frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\Big|\frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j)} - \frac{1}{n}\text{Tr}\mathbf{R}\Big|.\tag{A81}$$

By the eigenvalue interlacing theorem, we have

$$\Big|\frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j,0)} - \frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j)}\Big| \le \frac{1}{n\upsilon}.\tag{A82}$$

It remains to estimate the second term in the r.h.s. of (A81). Note that

$$\overline{\mathbf{R}}^{(j)} - \mathbf{R} = \overline{\mathbf{R}}^{(j)} \overline{\mathbf{D}}^{(j)} \mathbf{R}.\tag{A83}$$

This equality implies that

$$\text{Tr}\overline{\mathbf{R}}^{(j)} - \text{Tr}\mathbf{R} = \frac{1}{\sqrt{a\_{n}}} \sum\_{l=1}^{n} \sum\_{k=1}^{n} R\_{kl} A\_{jk} X\_{jk} \overline{R}\_{lk}^{(j)}.\tag{A84}$$

Summing this equality in *j*, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\left|\frac{1}{n}\text{Tr}\overline{\mathbf{R}}^{(j)} - \frac{1}{n}\text{Tr}\mathbf{R}\right| \le \frac{1}{n^2\sqrt{a\_n}}\sum\_{j=1}^{n}\mathbb{E}|\sum\_{l=1}^{n}\sum\_{k=1}^{n}R\_{kl}A\_{jk}X\_{jk}\overline{R}\_{lk}^{(j)}|.\tag{A85}$$

Using that

$$\sum\_{l=1}^{n} |R\_{kl}\overline{R}\_{lk}^{(j)}| \le \frac{1}{\upsilon^2},\tag{A86}$$

we obtain

$$\begin{split} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} \Big|\frac{1}{n} \text{Tr} \overline{\mathbf{R}}^{(j)} - \frac{1}{n} \text{Tr} \mathbf{R}\Big| &\leq \frac{1}{\upsilon^{2} n^{2} \sqrt{a\_{n}}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} |X\_{jk}| \mathbb{I} \{ |X\_{jk}| \leq \tau \sqrt{a\_{n}} \} \\ &+ \frac{1}{n^{2} \upsilon^{2} a\_{n} \tau} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} \mathbb{E} X\_{jk}^{2} \mathbb{I} \{ |X\_{jk}| > \tau \sqrt{a\_{n}} \} \leq \frac{\tau}{\upsilon^{2}} + \frac{1}{n \upsilon^{2} \tau} L\_{n}(\tau). \end{split} \tag{A87}$$

Thus, the lemma is proved.
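The interlacing bound behind (A82), that deleting one row and column of a Hermitian matrix changes the trace of its resolvent by at most $1/\upsilon$, can be checked numerically on any Hermitian matrix; a sketch with an arbitrary symmetric test matrix (all constants are our choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Interlacing bound behind (A82): removing one row and column of a Hermitian
# matrix changes Tr (H - z)^{-1} by at most 1/Im(z).
n, v = 50, 0.5
H = rng.standard_normal((n, n))
H = (H + H.T) / 2
z = -0.2 + 1j * v
tr_full = np.trace(np.linalg.inv(H - z * np.eye(n)))
Hm = np.delete(np.delete(H, 7, axis=0), 7, axis=1)   # delete one row/column
tr_minor = np.trace(np.linalg.inv(Hm - z * np.eye(n - 1)))
assert abs(tr_full - tr_minor) <= 1 / v + 1e-9
print("interlacing trace bound holds")
```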

#### **Appendix C. Unweighted Graphs**

*Appendix C.1. Convergence of Diagonal Entries Distribution Functions of Laplace Matrices to the Normal Law*

We denote by $\widehat{F}\_n(x)$ the distribution function of the random variable $\widehat{\zeta}\_{J}$ and

$$\widehat{\Delta}\_{\mathfrak{n}} := \sup\_{\mathfrak{x}} |\widehat{F}\_{\mathfrak{n}}(\mathfrak{x}) - \Phi(\mathfrak{x})|. \tag{A88}$$

**Lemma A12.** *Under the conditions of Theorem 2, we have*

$$\lim\_{n \to \infty} \sup\_{\mathbf{x}} |\widehat{F}\_n(\mathbf{x}) - \Phi(\mathbf{x})| = 0. \tag{A89}$$

**Proof.** We consider the characteristic function of $\widehat{\zeta}\_{J}$, $\widehat{f}\_{n}(t) = \frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}\exp\{it\widehat{\zeta}\_{j}\}$. Introduce the following set of indices

$$\widehat{\mathcal{M}} := \left\{ j \in \{ 1, \ldots, n \} \, : \, \frac{1}{\widehat{a}\_n} \sum\_{k=1}^n \Big|p\_{jk}(1 - p\_{jk}) - \frac{\widehat{a}\_n}{n}\Big| \le \frac{1}{16} \right\}. \tag{A90}$$

Recall that $\mathcal{A}^{c}$ denotes the complement of a set $\mathcal{A}$ and $|\mathcal{A}|$ its cardinality. Note that, by condition *CP*(1),

$$\frac{|\widehat{\mathcal{M}}^c|}{n} \le 16 \frac{1}{n \widehat{a}\_n} \sum\_{j=1}^n \sum\_{k=1}^n \Big|p\_{jk}(1-p\_{jk}) - \frac{\widehat{a}\_n}{n}\Big| \to 0, \text{ as } n \to \infty. \tag{A91}$$

Note that, by independence of *Ajk*,

$$\widehat{f}\_{nj}(t) := \mathbb{E} \exp\{it\widehat{\zeta}\_{j}\} = \prod\_{k=1}^{n} \mathbb{E} \exp\Big\{\frac{it}{\sqrt{\widehat{a}\_{n}}} (A\_{jk} - p\_{jk})\Big\} =: \prod\_{k=1}^{n} \widehat{f}\_{njk}(t).$$

Applying the Taylor formula, we may write

$$\widehat{f}\_{njk}(t) = 1 - \frac{t^2 p\_{jk} (1 - p\_{jk})}{2\widehat{a}\_{n}} + \theta(t) \frac{|t|^3}{6\widehat{a}\_{n}^{3/2}} p\_{jk} (1 - p\_{jk}),\tag{A92}$$

where *θ*(*t*) denotes some function such that |*θ*(*t*)| ≤ 1.

Using this equality, we may write

$$\begin{split} \ln \widehat{f}\_{njk}(t) &= -\frac{t^2}{2\widehat{a}\_{n}} p\_{jk}(1-p\_{jk}) + \theta\_1(t) \frac{|t|^3}{6\widehat{a}\_{n}^{3/2}} p\_{jk}(1-p\_{jk}) \\ &+ \theta\_2(t) \frac{t^4 p\_{jk}^2 (1-p\_{jk})^2}{\widehat{a}\_{n}^2} + \theta\_3(t) \frac{t^6 p\_{jk}^2 (1-p\_{jk})^2}{\widehat{a}\_{n}^3} . \end{split} \tag{A93}$$

Summing this equality over *k* = 1, . . . , *n*, we obtain

$$\begin{split} \ln \widehat{f}\_{nj}(t) &= -\frac{t^2}{2} - \frac{t^2}{2} \frac{1}{\widehat{a}\_{n}} \sum\_{k=1}^n \left( p\_{jk}(1-p\_{jk}) - \frac{\widehat{a}\_n}{n} \right) + \theta\_1(t) \frac{|t|^3}{6\widehat{a}\_n^{3/2}} \sum\_{k=1}^n p\_{jk}(1-p\_{jk}) \\ &+ \theta\_2(t) \frac{t^4}{\widehat{a}\_{n}^2} \sum\_{k=1}^n p\_{jk}^2 (1-p\_{jk})^2 + \theta\_3(t) \frac{t^6}{\widehat{a}\_{n}^3} \sum\_{k=1}^n p\_{jk}^2 (1-p\_{jk})^2. \end{split} \tag{A94}$$

Note that

$$\frac{1}{\widehat{a}\_n} \sum\_{k=1}^n p\_{jk} (1 - p\_{jk}) \le \frac{17}{16}, \text{ for } j \in \widehat{\mathcal{M}}, \tag{A95}$$

and

$$\lim\_{n \to \infty} \frac{|\widehat{\mathcal{M}}^c|}{n} = 0.\tag{A96}$$

Similar to (A42), we may write

$$\begin{split} |\widehat{f}\_{n}(t) - \exp\{-\frac{t^{2}}{2}\}| &\leq \frac{2|\widehat{\mathcal{M}}^{c}|}{n} + \frac{t^{2}}{2} \frac{1}{n\widehat{a}\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} \Big|p\_{jk}(1 - p\_{jk}) - \frac{\widehat{a}\_{n}}{n}\Big| \\ &+ \frac{C|t|^{3}}{\sqrt{\widehat{a}\_{n}}} + \frac{Ct^{4}}{\widehat{a}\_{n}} + \frac{C|t|^{6}}{\widehat{a}\_{n}^{2}}. \end{split} \tag{A97}$$

This inequality implies that

$$\lim\_{n \to \infty} \widehat{f}\_{n}(t) = \exp\{-\frac{t^2}{2}\}. \tag{A98}$$

Thus, Lemma A12 is proved.
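The Bernoulli characteristic function in (A92) can be computed in closed form, so the third-order Taylor bound there can be verified directly; a sketch over a grid of $t$ and $p$ (the grid and the value playing the role of $\widehat{a}\_n$ are our choices):

```python
import numpy as np

# Exact characteristic function of (A - p)/sqrt(a) for A ~ Bernoulli(p), versus
# the third-order Taylor bound from (A92); `a` plays the role of \hat{a}_n.
a = 30.0
for p in (0.05, 0.3, 0.5, 0.9):
    for t in np.linspace(-4.0, 4.0, 81):
        f = p * np.exp(1j * t * (1 - p) / np.sqrt(a)) \
            + (1 - p) * np.exp(-1j * t * p / np.sqrt(a))
        lhs = abs(f - (1.0 - t**2 * p * (1 - p) / (2 * a)))
        rhs = abs(t) ** 3 * p * (1 - p) / (6 * a ** 1.5)
        assert lhs <= rhs + 1e-12
print("third-order Taylor bound holds on the whole grid")
```

The bound holds because $\mathbb{E}|A-p|^{3} \le p(1-p)$ for a centered Bernoulli variable.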

In what follows, we shall assume that *z* = *u* + *iv* is fixed.

*Appendix C.2. The Bounds of $\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j\nu}|$, for $\nu = 1, \ldots, 5$*

**Lemma A13.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j1}| \le \left(\frac{1}{4\widehat{a}\_{n}\upsilon^2}\right)^{\frac{1}{2}}.\tag{A99}$$

**Proof.** By definition of $\widehat{\varepsilon}\_{j1}$, we may write

$$\widehat{\varepsilon}\_{j1} := \frac{1}{\widehat{a}\_n} \sum\_{l \neq k:\, l \neq j,\, k \neq j} [\widehat{R}^{(j,0)}]\_{kl} (A\_{jk} - p\_{jk})(A\_{jl} - p\_{jl}). \tag{A100}$$

Applying the Cauchy inequality, we obtain

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j1}| \le \left(\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j1}|^2\right)^{\frac{1}{2}}.\tag{A101}$$

Simple calculations show that

$$\begin{split} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} |\widehat{\varepsilon}\_{j1}| &\leq \left( \frac{1}{n\widehat{a}\_n^2} \sum\_{j=1}^{n} \sum\_{k \neq j} \sum\_{l \neq j} \mathbb{E} |\widehat{R}\_{kl}^{(j,0)}|^2 p\_{jk} p\_{jl} (1 - p\_{jk}) (1 - p\_{jl}) \right)^{\frac{1}{2}} \\ &\leq \left( \frac{1}{4n\widehat{a}\_n^2} \sum\_{j=1}^{n} \sum\_{k \neq j} \sum\_{l \neq j} \mathbb{E} |\widehat{R}\_{kl}^{(j,0)}|^2 p\_{jk} (1 - p\_{jk}) \right)^{\frac{1}{2}} \\ &\leq \left( \frac{1}{4n\widehat{a}\_n^2 \upsilon^2} \sum\_{j=1}^{n} \sum\_{k \neq j} p\_{jk} (1 - p\_{jk}) \right)^{\frac{1}{2}} \leq \left( \frac{1}{4\widehat{a}\_n \upsilon^2} \right)^{\frac{1}{2}}. \end{split} \tag{A102}$$

Thus, Lemma A13 is proved.

**Lemma A14.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j2}| \le \frac{1}{\sqrt{\widehat{a}\_n}\,\upsilon}.\tag{A103}$$

**Proof.** We recall the definition of $\widehat{\varepsilon}\_{j2}$,

$$\widehat{\varepsilon}\_{j2} = \frac{1}{\widehat{a}\_{n}} \sum\_{k:k\neq j} [\widehat{R}^{(j,0)}]\_{kk} \big((A\_{jk} - p\_{jk})^2 - p\_{jk} (1 - p\_{jk})\big). \tag{A104}$$

Using the triangle inequality and the Cauchy inequality, we may write

$$\begin{split} \frac{1}{n} \sum\_{j=1}^{n} \mathbb{E} |\widehat{\varepsilon}\_{j2}| &\leq \left( \frac{1}{n\widehat{a}\_{n}^{2}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} \mathbb{E} |\widehat{R}\_{kk}^{(j,0)}|^{2} p\_{jk} (1 - p\_{jk}) (1 - 2p\_{jk})^{2} \right)^{\frac{1}{2}} \\ &\leq \left( \frac{1}{\widehat{a}\_{n} \upsilon^{2}} \frac{1}{n\widehat{a}\_{n}} \sum\_{j=1}^{n} \sum\_{k=1}^{n} p\_{jk} (1 - p\_{jk}) \right)^{\frac{1}{2}} = \left( \frac{1}{\widehat{a}\_{n} \upsilon^{2}} \right)^{\frac{1}{2}}. \end{split} \tag{A105}$$

Thus, Lemma A14 is proved.
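The variance identity behind the first inequality in (A105), $\mathbb{E}\big((A-p)^2 - p(1-p)\big)^2 = p(1-p)(1-2p)^2$ for $A \sim \mathrm{Bernoulli}(p)$, checked exactly in rational arithmetic (the test values of $p$ are our choices):

```python
from fractions import Fraction as F

# For A ~ Bernoulli(p): E((A-p)^2 - p(1-p))^2 = p(1-p)(1-2p)^2, the identity
# used in the first inequality of (A105).
for p in (F(1, 10), F(1, 3), F(1, 2), F(7, 8)):
    var = p * (1 - p)
    second_moment = p * ((1 - p) ** 2 - var) ** 2 + (1 - p) * (p ** 2 - var) ** 2
    assert second_moment == p * (1 - p) * (1 - 2 * p) ** 2
print("Bernoulli fourth-moment identity verified")
```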

**Lemma A15.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j3}| \le \frac{1}{\upsilon}\frac{1}{n\widehat{a}\_n}\sum\_{j=1}^{n}\sum\_{k=1}^{n}\Big|p\_{jk}(1-p\_{jk}) - \frac{\widehat{a}\_n}{n}\Big|.\tag{A106}$$

**Proof.** By definition of *εj*3, we have

$$\widehat{\varepsilon}\_{j3} = \frac{1}{\widehat{a}\_{n}} \sum\_{k:k \neq j} [\widehat{R}^{(j,0)}]\_{kk} \Big(p\_{jk}(1 - p\_{jk}) - \frac{\widehat{a}\_{n}}{n}\Big). \tag{A107}$$

We may write

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j3}| \le \frac{1}{\upsilon}\frac{1}{n\widehat{a}\_n}\sum\_{j=1}^{n}\sum\_{k=1}^{n}|p\_{jk}(1-p\_{jk}) - \frac{\widehat{a}\_n}{n}|.\tag{A108}$$

Thus, Lemma A15 is proved.

**Lemma A16.** *Under the conditions of Theorem 1, we have*

$$\frac{1}{n}\sum\_{j=1}^{n}\mathbb{E}|\widehat{\varepsilon}\_{j4}| \le \frac{1}{\upsilon^2\sqrt{\widehat{a}\_n}}.\tag{A109}$$

**Proof.** Recall that

$$\widehat{\varepsilon}\_{j4} = \frac{1}{n} \sum\_{k:k \neq j} \widehat{R}\_{kk}^{(j,0)} - \frac{1}{n} \sum\_{k=1}^{n} \widehat{R}\_{kk}. \tag{A110}$$

Note that

$$\left| \frac{1}{n} \text{Tr} \widehat{\mathbf{R}}^{(j)} - \frac{1}{n} \text{Tr} \widehat{\mathbf{R}}^{(j,0)} \right| \le \frac{1}{n\upsilon}.\tag{A111}$$

Furthermore,

$$\widehat{\mathbf{R}} - \widehat{\mathbf{R}}^{(j)} = \widehat{\mathbf{R}} \widehat{\mathbf{D}}^{(j)} \widehat{\mathbf{R}}^{(j)}.\tag{A112}$$

Recall that $\|\mathbf{A}\|$ denotes the operator norm of a matrix **A**. The last equality and the inequality $\max\{\|\widehat{\mathbf{R}}\|, \|\widehat{\mathbf{R}}^{(j)}\|\} \le \upsilon^{-1}$ imply that

$$|\frac{1}{n}\text{Tr}(\widehat{\mathbf{R}} - \widehat{\mathbf{R}}^{(j)})| \le \|\widehat{\mathbf{R}} - \widehat{\mathbf{R}}^{(j)}\| \le \|\widehat{\mathbf{R}}\| \|\widehat{\mathbf{D}}^{(j)}\| \|\widehat{\mathbf{R}}^{(j)}\| \le \upsilon^{-2} \|\widehat{\mathbf{D}}^{(j)}\|.\tag{A113}$$

Note that

$$\mathbb{E}\|\widehat{\mathbf{D}}^{(j)}\| \le \frac{1}{\sqrt{\widehat{a}\_n}} \mathbb{E} \max\_{1 \le k \le n} |A\_{jk} - p\_{jk}| \le \frac{1}{\sqrt{\widehat{a}\_n}}.\tag{A114}$$

Combining the last two inequalities, we obtain the claim. Thus, Lemma A16 is proved.

*Appendix C.3. Variance of $\frac{1}{n}\text{Tr}\widehat{\mathbf{R}}$*

In this section, we estimate the variance of $\widehat{m}\_n(z) = \frac{1}{n}\text{Tr}\widehat{\mathbf{R}}$, where $\widehat{\mathbf{R}} = \widehat{\mathbf{R}}(z) = (\widehat{\mathbf{L}} - z\mathbf{I})^{-1}$. We prove the following lemma.

**Lemma A17.** *For any v* > 0 *and z* = *u* + *iv, the following inequality holds*

$$\lim\_{n \to \infty} \mathbb{E} |\frac{1}{n} \text{Tr} \widehat{\mathbf{R}} - \mathbb{E} \frac{1}{n} \text{Tr} \widehat{\mathbf{R}}| = 0. \tag{A115}$$

**Proof.** The proof of this lemma is similar to the proof of Lemma A2. We introduce the sequence of *σ*-algebras $\mathcal{M}\_k$ generated by the random variables $A\_{j,l}$ for $1 \le j, l \le k$. It is easy to see that $\mathcal{M}\_k \subset \mathcal{M}\_{k+1}$. Denote by $\mathbb{E}\_k$ the conditional expectation with respect to the *σ*-algebra $\mathcal{M}\_k$. For $k = 0$, $\mathbb{E}\_0 = \mathbb{E}$. Introduce the random variables

$$\widehat{\gamma\_k} := \mathbb{E}\_k(\frac{1}{n}\mathrm{Tr}\widehat{\mathbf{R}}) - \mathbb{E}\_{k-1}(\frac{1}{n}\mathrm{Tr}\widehat{\mathbf{R}}).\tag{A116}$$

The sequence $\widehat{\gamma}\_k$, for $k = 1, \ldots, n$, is a martingale difference sequence and

$$\frac{1}{n}\text{Tr}\widehat{\mathbf{R}} - \mathbb{E}\frac{1}{n}\text{Tr}\widehat{\mathbf{R}} = \sum\_{k=1}^{n} \widehat{\gamma}\_k.$$
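Since the increments $\widehat{\gamma}\_k$ form a martingale difference sequence, they are orthogonal in $L\_2$; this is the step, left implicit in the proof, that reduces the variance bound to the sums appearing in (A118) and (A119). Spelled out:

```latex
% For k < l, conditioning on \mathcal{M}_{l-1} gives
% \mathbb{E}\,\widehat{\gamma}_k \overline{\widehat{\gamma}_l}
%   = \mathbb{E}\big(\widehat{\gamma}_k\,\mathbb{E}_{l-1}\overline{\widehat{\gamma}_l}\big) = 0,
% so the cross terms vanish and
\mathbb{E}\Big|\frac{1}{n}\operatorname{Tr}\widehat{\mathbf{R}}
     - \mathbb{E}\,\frac{1}{n}\operatorname{Tr}\widehat{\mathbf{R}}\Big|^{2}
  = \mathbb{E}\Big|\sum_{k=1}^{n}\widehat{\gamma}_k\Big|^{2}
  = \sum_{k=1}^{n}\mathbb{E}\big|\widehat{\gamma}_k\big|^{2}
  \le 2\sum_{k=1}^{n}\Big(\mathbb{E}\big|\widehat{\gamma}_k^{(1)}\big|^{2}
     + \mathbb{E}\big|\widehat{\gamma}_k^{(2)}\big|^{2}\Big).
```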

Furthermore, introduce the matrices $\widehat{\mathbf{L}}^{(k)}$ obtained from $\widehat{\mathbf{L}}$ by replacing the diagonal entries with $\widehat{L}^{(k)}\_{ll} := \frac{1}{\sqrt{\widehat{a}\_n}} \sum\_{j:j \neq l, j \neq k}(A\_{jl} - p\_{jl})$. Denote by $\widehat{\mathbf{R}}^{(k)}(z)$ the corresponding resolvent matrix, $\widehat{\mathbf{R}}^{(k)}(z) = (\widehat{\mathbf{L}}^{(k)} - z\mathbf{I})^{-1}$. We introduce the matrix $\widehat{\mathbf{L}}^{(k,0)}$ obtained from $\widehat{\mathbf{L}}^{(k)}$ by deleting both the $k$-th row and the $k$-th column. The corresponding resolvent matrix we denote by $\widehat{\mathbf{R}}^{(k,0)} = (\widehat{\mathbf{L}}^{(k,0)} - z\mathbf{I}\_{n-1})^{-1}$. We now have

$$\mathbb{E}\_k \text{Tr} \widehat{\mathbf{R}}^{(k,0)} = \mathbb{E}\_{k-1} \text{Tr} \widehat{\mathbf{R}}^{(k,0)}.$$

This allows us to write

$$\begin{split} \hat{\gamma}\_{k} &= \mathbb{E}\_{k} (\frac{1}{n} (\operatorname{Tr} \hat{\mathbf{R}} - \operatorname{Tr} \hat{\mathbf{R}}^{(k)})) - \mathbb{E}\_{k-1} (\frac{1}{n} (\operatorname{Tr} \hat{\mathbf{R}} - \operatorname{Tr} \hat{\mathbf{R}}^{(k)})) \\ &+ \mathbb{E}\_{k} (\frac{1}{n} (\operatorname{Tr} \hat{\mathbf{R}}^{(k)} - \operatorname{Tr} \hat{\mathbf{R}}^{(k,0)})) - \mathbb{E}\_{k-1} (\frac{1}{n} (\operatorname{Tr} \hat{\mathbf{R}}^{(k)} - \operatorname{Tr} \hat{\mathbf{R}}^{(k,0)})) =: \hat{\gamma}\_{k}^{(1)} + \hat{\gamma}\_{k}^{(2)}. \end{split}$$

By the interlacing theorem,

$$\left| \frac{1}{n} \text{Tr} \widehat{\mathbf{R}}^{(k)} - \frac{1}{n} \text{Tr} \widehat{\mathbf{R}}^{(k,0)} \right| \le \frac{1}{n\upsilon}. \tag{A117}$$

From here, we immediately obtain

$$\left|\widehat{\gamma}\_{k}^{(2)}\right| \leq \frac{2}{n\upsilon},$$

and

$$\sum\_{k=1}^{n} \mathbb{E} |\hat{\gamma}\_k^{(2)}|^2 \le \frac{4}{n\upsilon^2}. \tag{A118}$$

To complete the proof, it remains to show that

$$\lim\_{n \to \infty} \sum\_{k=1}^{n} \mathbb{E} |\hat{\gamma}\_k^{(1)}|^2 = 0. \tag{A119}$$

Note that

$$\mathbb{E}\left|\hat{\gamma}\_k^{(1)}\right|^2 \le 2\mathbb{E}|\frac{1}{n}\text{Tr}\hat{\mathbf{R}} - \frac{1}{n}\text{Tr}\hat{\mathbf{R}}^{(k)}|^2. \tag{A120}$$

Introduce the diagonal matrix $\widehat{\mathbf{D}}^{(k)}$ with diagonal entries

$$
\widehat{D}\_{ll}^{(k)} = \frac{1}{\sqrt{\widehat{a}\_n}} (A\_{kl} - p\_{kl}), \quad l \neq k.
$$

In these notations, we have

$$\frac{1}{n}\text{Tr}\widehat{\mathbf{R}} - \frac{1}{n}\text{Tr}\widehat{\mathbf{R}}^{(k)} = -\frac{1}{n}\text{Tr}\,\widehat{\mathbf{R}}\widehat{\mathbf{D}}^{(k)}\widehat{\mathbf{R}}^{(k)} = -\frac{1}{n\sqrt{\widehat{a}\_n}}\sum\_{l\neq k,\, j\neq k} \widehat{R}\_{lj}(A\_{kj} - p\_{kj})\widehat{R}^{(k)}\_{jl}.\tag{A121}$$

This implies that

$$\sum\_{k=1}^{n} \mathbb{E} |\hat{\gamma}\_k^{(1)}|^2 \le \frac{4}{n^2 \hat{a}\_n} \sum\_{k=1}^{n} \mathbb{E} |\sum\_{j \ne k} (A\_{kj} - p\_{kj}) \left(\sum\_{l \ne k} \hat{R}\_{lj} \hat{R}\_{jl}^{(k)}\right)|^2. \tag{A122}$$

We continue this inequality as follows

$$\sum\_{k=1}^{n} \mathbb{E} |\widehat{\gamma}\_{k}^{(1)}|^{2} \le \frac{8}{n^{2} \widehat{a}\_{n}} \sum\_{k=1}^{n} \mathbb{E} \left| \sum\_{j \ne k} (A\_{kj} - p\_{kj}) \left( \sum\_{l \ne k} \widehat{R}\_{lj}^{(k)} \widehat{R}\_{jl}^{(k)} \right) \right|^{2}$$

$$\le \frac{8}{n^{2} \upsilon^{4} \widehat{a}\_{n}} \sum\_{k=1}^{n} \sum\_{j \ne k} p\_{jk} (1 - p\_{jk}) \le \frac{8}{n \upsilon^{2}}.\tag{A123}$$

Inequalities (A118) and (A123) complete the proof. Thus, Lemma A17 is proved.

**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## **Second Order Chebyshev–Edgeworth-Type Approximations for Statistics Based on Random Size Samples**

**Gerd Christoph 1,\*,† and Vladimir V. Ulyanov 2,3,†**


**Abstract:** This article completes our studies on the formal construction of asymptotic approximations for statistics based on a random number of observations. Second order Chebyshev–Edgeworth expansions of asymptotically normally or chi-squared distributed statistics from samples with negative binomial or Pareto-like distributed random sample sizes are obtained. The results can have applications for a wide spectrum of asymptotically normally or chi-square distributed statistics. Random, non-random, and mixed scaling factors for each of the studied statistics produce three different limit distributions. In addition to the expected normal or chi-squared distributions, Student's *t*-, Laplace, Fisher, gamma, and weighted sums of generalized gamma distributions also occur.

**Keywords:** second order Chebyshev–Edgeworth expansions; negative binomially distributed sample sizes; Pareto-like distributed sample sizes; asymptotically normally distributed statistics; asymptotically chi-square distributed statistics; scaled Student's *t*-distribution; normal distribution; discrete Pareto distribution; generalized Laplace distribution; weighted sums of generalized gamma distributions

**MSC:** 62E17; 62H10; 60E05

#### **1. Introduction**

To improve the convergence properties of sums of independent identically distributed random variables in the Central Limit Theorem, asymptotic expansions of the distribution functions of normalized sums were considered. The history of asymptotic expansions in nonparametric statistics is presented in detail in Wallace [1], Bickel [2], and Hall [3], among others. Chebyshev–Edgeworth expansions, with which we are concerned here, are presented in great detail in Bhattacharya and Rao [4] for random vectors and in Petrov [5] for one-dimensional random variables. For instance, in Pfanzagl [6] and Bentkus et al. [7], the authors emphasize that asymptotic expansions can provide more effective approximations for asymptotic studies in statistical theory. Second order approximations of distribution functions of sums of random variables are of great importance because they take into account the skewness and kurtosis of the random variable in addition to the expected value and the variance, as in the Central Limit Theorem. In Burnashev [8], second order expansions are proved for the asymptotically normally distributed sample median *Mm* on a sample of size *m* and its MSE. Based on this, for a Laplace population with density *e*<sup>−|*x*|</sup>/2, the actual MSE computed from exact data is compared numerically with its approximations. For the normal approximation, the influence of the remaining term is below 10% only for *m* > 250, while for the approximation with the second order expansion, the influence of the remaining term is below 10% already from *m* = 8. For a Cauchy population with smooth and heavy-tailed density 1/(*π*(1 + *x*<sup>2</sup>)), for the normal approximation, the influence of the remaining term is below 10% for *m* ≥ 23, while for the approximation

**Citation:** Christoph, G.; Ulyanov, V.V. Second Order Chebyshev–Edgeworth-Type Approximations for Statistics Based on Random Size Samples. *Mathematics* **2023**, *11*, 1848. https:// doi.org/10.3390/math11081848

Academic Editor: Steve Drekic

Received: 27 February 2023 Revised: 3 April 2023 Accepted: 11 April 2023 Published: 13 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

with the second order expansion, the influence of the remaining term is below 10% already from *m* = 11. Consequently, as Burnashev [8] pointed out, asymptotic expansions can significantly improve the exactness of statistical conclusions, even in the case of a small number of observations. The results in the abovementioned papers are based on nonrandom sample sizes or non-random number of observations.

When planning statistical studies, situations often arise where the sample sizes are unknown in advance and they are modeled as realizations of random variables. Many models from medicine, finance, risk theory, physics, and reliability lead to samples with random dimensions. For instance, in the papers by Nunes et al. [9,10,11], different models in medical research with random size samples were investigated in order to prevent false conclusions. In Esquível et al. [12], the authors give an informative overview of statistical inference with a random number of observations and some applications. Results for mean and variance for normally distributed samples, calculation of quantiles, and interval estimates with random sample size were also proved. Döbler [13] gives a detailed review of the literature on random sums as well as recent results on approximation in various metrics. In Schluter and Trede [14] (Theorem 1, Proposition 1), the authors show, using the convergence of a negative binomial random sum, that the growth rate of cities is Student t-distributed with 2 degrees of freedom. Their empirical investigations verify the result. The references in the above-cited papers provide further applications for random dimension sampling.

Bening et al. [15,16] proved convergence rates and asymptotic expansions for distributions of statistics *TNn* based on samples with random dimension *Nn* ≥ 1. Here, *Tm* is a statistic based on a non-random number *m* ≥ 1 of independent observations. The random sample sizes *Nn* ≥ 1 form a sequence of integer random variables that depend on a natural parameter *n*, with *Nn* → ∞ in probability as *n* → ∞. Inequalities with a convergence rate are assumed for the approximations of the distribution functions of both the normalized statistics *Tm* and the normalized random sample sizes *Nn*. As examples, convergence rates and first order asymptotic expansions are derived for the statistics *TNn* , where *Tm* is an asymptotically normal statistic and the random sample size *Nn* is either negative binomially or Pareto-like distributed.

In Christoph et al. [17], inequalities for the second order approximations of the distribution functions of normalized negative binomial and Pareto-like sample sizes were proved. Consequently, second order Chebyshev–Edgeworth approximations and the corresponding Cornish–Fisher expansions could be obtained for the distribution of the normalized arithmetic mean of a sample with normalized negative binomial or Pareto-like sample sizes where the remainders are of order *n*<sup>−</sup>3/2.

The present work provides a supplement to our paper, Christoph and Ulyanov [18], where we have developed a formal second order design for asymptotic Chebyshev– Edgeworth approximations. We considered asymptotically normal statistics with sample size having negative binomial distribution as well as asymptotically chi-squared statistics with Pareto-like distributed sample sizes. In addition to the distributions of statistic *Tm* and random sample size *Nn*, three scaling factors for *TNn* are also introduced, leading to different expansions. It is the first paper to consider approximations for asymptotic chi-square statistics based on random sample sizes. Some more applications of random sample size sampling were also mentioned.

In the present paper, we provide similar results for asymptotically normal statistics of samples with Pareto-like distributed sample sizes and for asymptotically chi-squared statistics with sample size having negative binomial distribution.

For better reader convenience, we list in Section 2 some notations, conditions, and statements that were also used in Christoph and Ulyanov [18]. Section 3 states the necessary approximations for the statistics *Tm* and the sample sizes *Nn*. The dependence of the limit distributions of the scaled statistic *TNn* on the distributions of the statistic *Tm* and the sample size *Nn*, as well as the scaling factors, is discussed in Section 4. Section 5 then presents the main results. As examples, we consider the same statistic *Tm* as in Christoph and Ulyanov [18] (Corollaries 1 and 2), but with changed sample sizes. Section 6 provides the proofs of the main results, leaving three auxiliary lemmas to Appendix A. Conclusions are presented in Section 7.

#### **2. Notation and Preliminaries**

Let (Ω, A, P) be a probability space on which all occurring random variables are defined. Denote the positive integers, the real axis, the integer part [*y*] of a real *y*, and the indicator function as follows:

$$\mathbb{N}\_{+} = \{1, 2, \ldots\}, \quad \mathbb{R} = (-\infty, \infty), \quad y - 1 < [y] \le y \quad \text{and} \quad \mathbb{I}\_{A} = \mathbb{I}\_{A}(x) = \begin{cases} 1, & x \in A \subset \mathbb{R}, \\ 0, & x \notin A \subset \mathbb{R}. \end{cases}$$

Let *X*1, *X*2, *X*<sup>3</sup> ... ∈ R be independent identically distributed random variables. Define the statistic

$$T\_m := T\_m(X\_1, \dots, X\_m) \quad \text{with} \quad m \in \mathbb{N}\_+,$$

based on the random sample {*X*1, *X*2,..., *Xm*} with a non-random sample size *m* ∈ N+.

Consider the sequence of discrete random variables *N*1, *N*2, ..., depending on an integer parameter *n* ≥ 1. This integer *Nn* ≥ 1 indicates the random dimension of the observations *X*1, ... , *XNn* . Let us assume that the sample size *Nn* does not depend on *X*1, *X*2, *X*<sup>3</sup> ..., where *Nn* → ∞ in probability when *n* → ∞. Define for each *n* ∈ N<sup>+</sup> the statistic *TNn* obtained from a random sample {*X*1, *X*2,..., *XNn* } by

$$T\_{N\_n}(\omega) := T\_{N\_n(\omega)}\left(X\_1(\omega), X\_2(\omega), \dots, X\_{N\_n(\omega)}(\omega)\right) \quad \text{for each} \quad \omega \in \Omega. \tag{1}$$

It follows from Esquível et al. [12] (Theorem 2.1.1) that the statistic *TNn* is well-defined in (1).

Since we want to prove second order approximations for the statistic *TNn* in form of inequalities, we need the corresponding assumptions for the statistic *Tm* and for the random sample size *Nn* as well.

For the statistic *Tm* with E*Tm* = 0 and the random sample sizes *Nn* ∈ N<sup>+</sup> we suppose conditions on the structure of the approximating functions as well as on the convergence rate:

**Assumption 1.** *There are a distribution function F*(*x*)*, bounded functions f*1(*x*)*, f*2(*x*) *which are differentiable for all x* ≠ 0*, γ* ∈ {−1, −1/2, 0, 1/2, 1}*, a* > 1/2 *as well as* 0 < *C*<sup>1</sup> < ∞ *such that*

$$\sup\_{x}\left|\mathbb{P}\left(m^{\gamma}T\_{m} \le x\right) - F(x) - m^{-1/2} f\_1(x) - \mathbb{I}\_{a \ge 1}(a)\, m^{-1} f\_2(x)\right| \le C\_1\, m^{-a}, \quad m \ge 1. \tag{2}$$

**Assumption 2.** *There exists a distribution function H*(*y*) *with H*(0+) = 0*, a bounded variation function h*2(*y*)*, a sequence of numbers* 0 < *gn* ↑ ∞*, b* > 0*, and* 0 < *C*<sup>2</sup> < ∞ *such that for n* ∈ N<sup>+</sup>

$$\begin{aligned} \sup\_{y \ge 0} |\mathbb{P}(g\_n^{-1} N\_n \le y) - H(y)| &\le C\_2\, n^{-b}, &\quad \text{for} \quad 0 < b \le 1, \\ \sup\_{y \ge 0} |\mathbb{P}(g\_n^{-1} N\_n \le y) - H(y) - n^{-1} h\_2(y)| &\le C\_2\, n^{-b}, &\quad \text{for} \quad b > 1. \end{aligned} \tag{3}$$

**Remark 1.** *Assumptions 1 and 2 require inequalities for the approximations of Tm and Nn for all m*, *n* ∈ N+*, leading to inequalities for the approximations of TNn . See also Remark 5 below on Poisson and binomial random variables Nn. For these sample sizes, we are so far only aware of estimates of the remaining terms with small-o or large-*O *convergence rates. About the differences between inequalities and* O *order bounds, see, e.g., Fujikoshi and Ulyanov [19] (Chapter 1).*

**Remark 2.** *In Bening et al. [16], these conditions are formulated more generally. Assumption 1 requires the existence of f*1*,. . . , fl with a* > *l*/2 *and Assumption 2 that of h*1*,. . . ,hk with b* > *k*/2*.* *We restrict ourselves here, as in Christoph and Ulyanov [18], to the required approximation functions.*

Assumptions 1 and 2 lead to the approximations for the distribution functions of statistics *TNn* :

**Proposition 1.** (Christoph and Ulyanov [18], Proposition 1) *Let γ* ∈ {−1, −1/2, 0, 1/2, 1}*. The statistic Tm and the sample size Nn are supposed to satisfy Assumptions 1 and 2, respectively. Then,*

$$\sup\_{x\in\mathbb{R}}\left|\mathbb{P}\left(g\_{n}^{\gamma}T\_{N\_n}\le x\right)-G\_{n}\left(x,1/g\_{n}\right)\right|\le C\_{1}\,\mathbb{E}\left(N\_{n}^{-a}\right)+\left(C\_{3}D\_{n}+C\_{4}\right)n^{-b},\tag{4}$$

*where a* > 0, *b* > 0 *are the convergence rates in (2) and (3),*

$$G\_n(x, 1/g\_n) = \int\_{1/g\_n}^{\infty} \left( F(x\,y^\gamma) + \frac{f\_1(x y^\gamma)}{\sqrt{g\_n y}} + \frac{f\_2(x y^\gamma)}{g\_n y} \right) d\left( H(y) + \frac{h\_2(y)}{n}\right), \tag{5}$$

$$D\_n = \sup\_{x} \int\_{1/g\_n}^{\infty} \left| \frac{\partial}{\partial y} \left( F(x y^\gamma) + \frac{f\_1(x y^\gamma)}{\sqrt{g\_n y}} + \frac{f\_2(x y^\gamma)}{g\_n y} \right) \right| dy,\tag{6}$$

*and f*1(*z*), *f*2(*z*), *h*2(*y*) *are given in (2) and (3). The constants C*1, *C*3, *C*<sup>4</sup> *do not depend on n.*

Bening et al. [16] proved general transfer theorems under the conditions indicated in Remark 2 only for case *γ* ≥ 0. Therefore, the proof is repeated in Christoph and Ulyanov [20] (Appendix A.1).

#### **3. Second Order Estimates for Both the Statistics** *Tm* **and the Sample Sizes** *Nn*

First we consider the following statistics *Tm* with non-random sample size *m* and E*Tm* = 0 with the corresponding second order approximations. Let the asymptotically normal statistic *Tm* satisfy the following inequality:

$$\left| \mathbb{P}(\sqrt{m}\,T\_{m} \le x) - \Phi(x) - \left( m^{-1/2} (p\_0 + p\_2 x^2) + m^{-1} (p\_1 x + p\_3 x^3 + p\_5 x^5)\, \mathbb{I}\_{a > 1}(a) \right) \varphi(x) \right| \le C\, m^{-a} \tag{7}$$

with *a* > 0, where Φ(*x*) refers to the standard normal distribution function with density function *ϕ*(*y*):

$$\Phi(x) = \int\_{-\infty}^{x} \varphi(y)\, dy, \quad x \in \mathbb{R}, \quad \text{and} \quad \varphi(y) = \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}, \quad y \in \mathbb{R}.$$

Asymptotically chi-squared distributed statistics *Tm* satisfy the following inequality:

$$\left| \mathbb{P}(mT\_m \le \mathbf{x}) - G\_d(\mathbf{x}) - m^{-1} (q\_1 \mathbf{x} + q\_2 \mathbf{x}^2) \mathbf{g}\_d(\mathbf{x}) \right| \le C m^{-2},\tag{8}$$

where *Gd*(*x*), *d* ∈ N+, denotes the chi-squared distribution function with *d* degrees of freedom and the density function *gd*(*y*):

$$g\_d(y) = \frac{1}{2^{d/2} \Gamma(d/2)} y^{(d-2)/2} e^{-y/2}, \ y > 0, \text{ and } \quad G\_d(\mathbf{x}) = \mathbb{P}(\chi^2\_d \le \mathbf{x}) = \int\_0^\mathbf{x} g\_d(y) dy, \ \mathbf{x} > 0.$$
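As a quick sanity check of these definitions, the following sketch (our illustration, not code from the paper; all helper names are ours) integrates the density $g\_d$ numerically and compares the result with the closed form $G\_2(x) = 1 - e^{-x/2}$ available for $d = 2$:

```python
import math

def g_d(y, d):
    """Chi-squared density with d degrees of freedom."""
    return y ** (d / 2 - 1) * math.exp(-y / 2) / (2 ** (d / 2) * math.gamma(d / 2))

def G_d_numeric(x, d, steps=200_000):
    """G_d(x) = P(chi^2_d <= x), approximated by the midpoint rule on (0, x)."""
    h = x / steps
    return h * sum(g_d((i + 0.5) * h, d) for i in range(steps))

x = 3.0
approx = G_d_numeric(x, 2)       # numerical value of G_2(3)
exact = 1.0 - math.exp(-x / 2)   # for d = 2 the closed form is 1 - e^{-x/2}
```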

In Christoph and Ulyanov [18] (Sections 3.1 and 3.2), some examples of such statistics *Tm* are given that satisfy (7) or (8) and consequently, Assumption 1.

As already announced, we consider the following random sample sizes *Nn* with the corresponding second order approximations.

The Pareto-like random sample sizes *Nn*(*s*) are defined as follows:

Let *Yj*(*s*) ∈ N+, *j* = 1, 2, ... be independent discrete Pareto II random variables with parameter *s* > 0, which are discretized from continuous Lomax (Pareto II) random variables on N+, for a review, see, e.g., Buddana and Kozubowski [21]. For *s* > 0, there are defined

$$\mathbb{P}(Y\_j(s) \le k) = \frac{k}{s + k}, \quad N\_n(s) = \max\_{1 \le j \le n} Y\_j(s) \quad \text{and} \quad \mathbb{P}(N\_n(s) \le k) = \left(\frac{k}{s + k}\right)^n, \quad n, k \in \mathbb{N}\_+.\tag{9}$$
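The maximum construction in (9) is straightforward to simulate. The sketch below (our own illustration, with hypothetical helper names) draws $Y\_j(s)$ by inversion of its distribution function and checks the closed-form distribution function of $N\_n(s)$ against an empirical frequency:

```python
import math
import random

def sample_discrete_pareto(s, rng):
    """Draw Y(s) with P(Y(s) <= k) = k / (s + k), k = 1, 2, ..., by inversion."""
    u = rng.random()
    # smallest integer k >= 1 with k / (s + k) >= u, i.e. k >= s * u / (1 - u)
    return max(1, math.ceil(s * u / (1.0 - u)))

def sample_size(n, s, rng):
    """N_n(s) = max of n independent discrete Pareto II variables, as in (9)."""
    return max(sample_discrete_pareto(s, rng) for _ in range(n))

rng = random.Random(1)
n, s, k, trials = 10, 1.0, 25, 20_000
empirical = sum(sample_size(n, s, rng) <= k for _ in range(trials)) / trials
exact = (k / (s + k)) ** n  # P(N_n(s) <= k) from (9)
```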

**Proposition 2.** (Christoph and Ulyanov [18], Proposition 4) *Let Nn*(*s*) *be the discrete Paretolike random variable whose distribution function is given in (9); then, for all integers n* ≥ 1 *and fixed positive s* > 0*, we have*

$$\sup\_{y > 0} \left| \mathbb{P} \left( \frac{N\_n(s)}{n} \le y \right) - W\_s(y) - \frac{h\_{2;s}(y)}{n} \right| \le \frac{C\_2(s)}{n^2} \tag{10}$$

*where*

$$W\_s(y) = e^{-s/y},\ y > 0, \quad h\_{2;s}(y) = \frac{s\, e^{-s/y}}{2\, y^2} \left(s - 1 + 2Q\_1(n\, y)\right),\ y > 0,\tag{11}$$

*with jump correcting function Q*1(*y*) = 1/2 − (*y* − [*y*]) *and C*2(*s*) > 0 *does not depend on n. Furthermore,*

$$\mathbb{E}\left(N\_n(s)\right)^{-a} \le C(a,s)\ n^{-\min\{a,2\}},\tag{12}$$

*with optimal bound in* (12) *for* 0 < *a* ≤ 2 *, where a is the convergence rate in* (7)*.*

**Remark 3.** *The inverse exponential random variable W*(*s*) *with distribution function* $H\_s(y) = \mathbb{P}(W(s) \le y) = e^{-s/y}\, \mathbb{I}\_{(0,\infty)}(y)$ *and rate parameter s* > 0 *is "heavy tailed" with shape parameter 1, as is* $\mathbb{P}(N\_n(s) \le y)$*. Thus, the expected values of these two random variables do not exist.*

Suppose the positive integer *Nn*(*r*) has a (shifted by 1) negative binomial distribution with probability of success 1/*n*, *n* ∈ N+, parameter *r* > 0, probabilities

$$\mathbb{P}(N\_n(r) = j) = \frac{\Gamma(j + r - 1)}{\Gamma(r)\,\Gamma(j)} \left(\frac{1}{n}\right)^r \left(1 - \frac{1}{n}\right)^{j - 1}, \ j \in \mathbb{N}\_+ \ \text{ and } \ g\_n = \mathbb{E}(N\_n(r)) = r(n - 1) + 1. \tag{13}$$
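The shifted negative binomial probabilities and the mean $g\_n = r(n-1)+1$ in (13) can be verified numerically; the sketch below is our own illustration (helper names are ours), truncating the support where the tail is negligible:

```python
import math

def nb_shifted_pmf(j, r, n):
    """P(N_n(r) = j) for the 1-shifted negative binomial in (13);
    log-gamma is used for numerical stability."""
    log_p = (math.lgamma(j + r - 1) - math.lgamma(r) - math.lgamma(j)
             + r * math.log(1.0 / n) + (j - 1) * math.log(1.0 - 1.0 / n))
    return math.exp(log_p)

r, n = 2.0, 10
support = range(1, 2001)                      # truncation; the tail is negligible here
probs = [nb_shifted_pmf(j, r, n) for j in support]
total = sum(probs)
mean = sum(j * p for j, p in zip(support, probs))
g_n = r * (n - 1) + 1                         # E N_n(r) from (13)
```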

In statistical studies, for counting models, the negative binomial and Poisson distributions are the two most important ones. In Schluter and Trede [14] (Section 2.1), the authors emphasize that the negative binomial distribution with its two parameters can capture the over-dispersion typically observed in count data, while this is not the case with the one-parameter Poisson distribution. They proved in a more general framework

$$\limsup\_{n \to \infty} \sup\_{y} |\mathbb{P}(N\_n(r)/\mathbb{g}\_n \le y) - G\_{r,r}(y)| = 0,\tag{14}$$

while *Gr*,*r*(*y*) denotes the gamma distribution that has identical scale and shape parameters *r* > 0, whose density is

$$\mathcal{g}\_{r,r}(y) = \frac{r^r}{\Gamma(r)} y^{r-1} e^{-ry} \mathbb{I}\_{(0,\infty)}(y), \quad y \in \mathbb{R}.$$

In Bening and Korolev [22] (Lemma 2.2), the result (14) was also obtained.

**Proposition 3.** (Christoph and Ulyanov [18], Proposition 3) *Let r* > 0*. The discrete random variable Nn*(*r*) *has probabilities and expected value gn given in (13). Then, for all n* ∈ N+*:*

$$\sup\_{y \ge 0} \left| \mathbb{P} \left( \frac{N\_n(r)}{g\_n} \le y \right) - G\_{r,r}(y) - \frac{h\_{2r}(y)}{n} \right| \le C\_2(r)\ n^{-\min\{r, 2\}},\tag{15}$$

*where C*2(*r*) > 0 *does not depend on n and with the jump correcting function Q*1(*y*) = 1/2 − (*y* − [*y*])*,*

$$h\_{2r}(y) = \begin{cases} 0, & \text{for } r \le 1, \\ \frac{g\_{r,r}(y)}{2r} \left( (y-1)(2-r) + 2Q\_1(g\_n y) \right), & \text{for } r > 1. \end{cases} \tag{16}$$

*Moreover, the negative moments* $\mathbb{E}(N\_n(r))^{-a}$ *satisfy, for all r* > 0 *and a* > 0*, the estimate*

$$\mathbb{E}\left(N\_n(r)\right)^{-a} \le C(r) \begin{cases} n^{-\min\{r,a\}}, & r \ne a, \\ \ln(n)\ n^{-a}, & r = a, \end{cases} \tag{17}$$

*and the convergence rate in the case r* = *a cannot be improved.*

**Remark 4.** *Second order Chebyshev–Edgeworth expansions (10) and (15) with r* > 1 *were first proved in Christoph et al. [17] (Theorems 4 and 1). Approximations in (10) and (15) with remainder estimations Cs*/*n or Cr n*<sup>−</sup> min{*r*,1} *are given, e.g., in Bening et al. [16] and Gavrilenko et al. [23]. In Christoph et al. [24] (Corollaries 5.4 and 6.5), leading terms for the negative moments of Nn*(*r*) *and Nn*(*s*) *are derived that lead to (17) and (12).*

**Remark 5.** *The negative binomial distribution belongs to the class of Panjer distributions, which also includes the Poisson and binomial distributions. Samples with binomial or Poisson distributed sample sizes were studied among others in the above-cited papers [9–12]. Convergence rate bounds for statistics based on such samples are given in Döbler [13], Korolev [25], Bulinski and Slepov [26]. Döbler [13], Korolev and Shevtsova [27], Sunklodas [28] obtained Berry–Esseen bounds for sums based on samples with binomial and Poisson sample sizes. To the best of the authors' knowledge, Chebyshev–Edgeworth expansions for these lattice distributed random variables have only been proven so far with bounds of small-o or large-*O *rates, see, e.g., Petrov [29] (Chapter 6, Theorem 6) or Kolassa and McCullagh [30]. Therefore, inequality* (3) *in Assumption 2 is not fulfilled.*

#### **4. Limit Distributions of Statistics with Random Size Samples using Different Scaling Factors**

We now consider the statistics *Tm* and the sample sizes *Nn*, which are supposed to satisfy the inequalities (2) and (3) in Assumptions 1 and 2, respectively. Let us investigate the scaled statistics $g\_n^{\gamma} N\_n^{\gamma^\* - \gamma} T\_{N\_n}$ with the sequence $g\_n \uparrow \infty$ as $n \to \infty$. We analyze the two cases Φ and *Gd* as limiting distributions *F* in Assumption 1 with respect to the exponents *γ*\* and *γ*: If *F* = Φ, then *γ*\* = 1/2 and *γ* ∈ {−1/2, 0, 1/2}, while if *F* = *Gd*, then *γ*\* = 1 and *γ* ∈ {−1, 0, 1}. Then, conditioning on *Nn* and using (2) and (3), we have

$$\mathbb{P}\left(g\_n^{\gamma} N\_n^{\gamma^\*-\gamma} T\_{N\_n} \le x\right) = \mathbb{P}\left(N\_n^{\gamma^\*} T\_{N\_n} \le x \left(N\_n / g\_n\right)^{\gamma}\right) = \sum\_{m=1}^{\infty} \mathbb{P}\left(m^{\gamma^\*} T\_m \le x (m / g\_n)^{\gamma}\right) \mathbb{P}(N\_n = m)$$

$$\stackrel{(2)}{\approx} \mathbb{E}\left(F\left(x (N\_n / g\_n)^{\gamma}\right)\right) = \int\_{1/g\_n}^{\infty} F(x y^{\gamma})\, d\mathbb{P}(N\_n / g\_n \le y) \stackrel{(3)}{\approx} \int\_{1/g\_n}^{\infty} F(x y^{\gamma})\, dH(y). \tag{18}$$

Consequently, the limit distribution of the scaled statistic $g\_n^{\gamma} N\_n^{\gamma^\* - \gamma} T\_{N\_n}$ is a scale mixture of the underlying *F* with mixing distribution *H*: $\mathbb{P}\big(g\_n^{\gamma} N\_n^{\gamma^\* - \gamma} T\_{N\_n} \le x\big) \to \int\_0^{\infty} F(x y^{\gamma})\, dH(y)$ as $n \to \infty$. Refer to, e.g., Choy and Chan [31], Fujikoshi et al. [32] (Chapter 13), and Fujikoshi and Ulyanov [19] (Chapter 2) and the references therein.

The limiting distributions $\int\_{1/g\_n}^{\infty} F(x y^{\gamma})\, dH(y)$ therefore arise only from the leading distributions *F*(*x*) and *H*(*y*) in the inequalities (2) and (3), and they also depend on the parameter *γ*.

In Christoph and Ulyanov [18] (Sections 5 and 6), the cases *F*(*x*) = Φ(*x*) with *H*(*y*) = *Gr*,*r*(*y*) as well as *F*(*x*) = *Gd*(*x*) with *H*(*y*) = *Ws*(*y*) were considered. Now, we interchange the distributions of the random sample sizes *Nn*. We first study the limiting distributions of asymptotically normally distributed statistics with Pareto-like distributed sample sizes *Nn*(*s*) and also asymptotically chi-squared distributed statistics with negative binomially distributed sample sizes *Nn*(*r*). Since $W\_s(1/n) = e^{-sn}$ and $G\_{r,r}(1/g\_n) \le \frac{r^{r-1}}{\Gamma(r)}\, g\_n^{-r}$ hold, the integration range in the last integral in (18) can be extended from $(1/g\_n, \infty)$ to $(0, \infty)$ for further investigations.

*4.1. The Case F*(*x*) = Φ(*x*) *and H*(*y*) = *Ws*(*y*)

In Christoph and Ulyanov [20,33], asymptotically normally distributed statistics *Tm* for samples of *m*-dimensional normally distributed vectors were considered: correlation coefficient as well as the three geometric features: the length of a vector, the distance, and the angle between two vectors. Inequalities for second order approximations for statistic *Tm* are derived when the dimension *m* is replaced by Pareto-like distributed random dimension *Nn*(*s*). For the median of a sample with random sample size *Nn*(*s*) analogous results are shown in Christoph et al. [24] (Section 6). All these asymptotically normally distributed statistics *TNn*(*s*) with Pareto-like random dimensions or sample sizes have the same limiting distribution.

Let *γ* ∈ {1/2, 0, −1/2}. Since E*Nn*(*s*) = ∞, we choose *gn* = *n*. Then, the limit laws for

$$\mathbb{P}\left(n^{\gamma} N\_n(s)^{1/2-\gamma} T\_{N\_n(s)} \le x\right) \text{ are } V\_{\gamma}(x;s) = \int\_0^{\infty} \Phi(x\, y^{\gamma})\, dW\_s(y) = \int\_0^{\infty} \Phi(x\, y^{\gamma})\, \frac{s}{y^2}\, e^{-s/y}\, dy$$

with corresponding densities

$$v\_{\gamma}(x;s) = \frac{s}{\sqrt{2\pi}} \int\_0^\infty y^{\gamma-2} e^{-(x^2 y^{2\gamma}/2 + s/y)}\, dy = \begin{cases} l\_{1/\sqrt{s}}(x) = \frac{\sqrt{2s}}{2}\, e^{-\sqrt{2s}|x|}, & \gamma = \frac{1}{2}, \\ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, & \gamma = 0, \\ s\_2^\*(x; \sqrt{s}) = \frac{1}{2\sqrt{2s}} \left(1 + \frac{x^2}{2s}\right)^{-3/2}, & \gamma = -\frac{1}{2}. \end{cases} \tag{19}$$

Therefore, the limit distributions $V\_\gamma(x;s)$ are the Laplace law $L\_{1/\sqrt{s}}(x)$ with density $l\_{1/\sqrt{s}}(x)$ and scale parameter $\lambda = 1/\sqrt{s}$ for $\gamma = 1/2$; the standard normal law $\Phi(x)$ with density $\varphi(x)$ for $\gamma = 0$; and, for $\gamma = -1/2$, the scaled Student's t-distribution $S\_2^\*(x; \sqrt{s})$ with 2 degrees of freedom and density $s\_2^\*(x; \sqrt{s})$. These mixed scale distributions $V\_\gamma(x;s)$ are discussed in more detail in Christoph and Ulyanov [20] (Section 4.2).
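For $\gamma = -1/2$, the mixture integral in (19) reduces, after the substitution $t = 1/y$, to a gamma integral, which is how the scaled Student's t density with 2 degrees of freedom arises. The sketch below (our own numerical illustration, with hypothetical helper names) confirms the identity by quadrature:

```python
import math

def v_half_neg(x, s, steps=400_000, t_max=40.0):
    """v_{-1/2}(x; s) from (19): after substituting t = 1/y, the mixture integral
    becomes (s / sqrt(2 pi)) * int_0^inf sqrt(t) exp(-(x^2/2 + s) t) dt."""
    c = x * x / 2.0 + s
    h = t_max / steps
    integral = h * sum(math.sqrt((i + 0.5) * h) * math.exp(-c * (i + 0.5) * h)
                       for i in range(steps))
    return s / math.sqrt(2.0 * math.pi) * integral

def t2_scaled_density(x, s):
    """Closed form in (19): scaled Student's t density with 2 degrees of freedom."""
    return (1.0 + x * x / (2.0 * s)) ** -1.5 / (2.0 * math.sqrt(2.0 * s))

x, s = 1.0, 2.0
numeric = v_half_neg(x, s)
closed = t2_scaled_density(x, s)
```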

*4.2. The Case F*(*x*) = *Gd*(*x*) *and H*(*y*) = *Gr*,*r*(*y*)

Asymptotically chi-squared distributed statistics of samples with random sample size were considered for the first time in Christoph and Ulyanov [18] in case of *H*(*y*) = *Ws*(*y*) = *e*−*s*/*y*, *y* > 0.

Now, negatively binomial distributed sample sizes *Nn*(*r*) are considered. With *γ* ∈ {1, 0, −1} and *gn* = E*Nn*(*r*) = *r*(*n* − 1) + 1, the limit distributions for

$$\mathbb{P}\left(g_n^{\gamma}\, N_n(r)^{1-\gamma}\, T_{N_n(r)} \le x\right) \text{ are } V_{\gamma}(x;d,r) = \int_0^{\infty} G_d(x\,y^{\gamma})\,dG_{r,r}(y) = \int_0^{\infty} G_d(x\,y^{\gamma})\,\frac{r^r}{\Gamma(r)}\,y^{r-1}\,e^{-r y}\,dy.$$

The corresponding densities are

$$\begin{split} v_{\gamma}(x;d,r) &= \frac{r^r x^{d/2-1}}{\Gamma(r)\,2^{d/2}\,\Gamma(d/2)} \int_0^{\infty} y^{r+\gamma d/2-1}\, e^{-(x y^{\gamma}/2 + r y)}\, dy \\ &= \begin{cases} f^*(x;d,2r) = \frac{\Gamma(d/2+r)}{\Gamma(d/2)\,\Gamma(r)}\, \frac{x^{d/2-1}}{(2r)^{d/2}} \left(1+\frac{x}{2r}\right)^{-(d+2r)/2}, & \gamma = 1, \\ g_d(x) = \frac{1}{2^{d/2}\Gamma(d/2)}\, x^{d/2-1}\, e^{-x/2}, & \gamma = 0, \\ w_{r-d/2}(x;d,r) = \frac{r}{\Gamma(r)\,\Gamma(d/2)} \left(\frac{r x}{2}\right)^{r/2+d/4-1} K_{r-d/2}\big(\sqrt{2 r x}\big), & \gamma = -1. \end{cases} \end{split} \tag{20}$$

We prove (20) for *γ* = ±1 in Section 6 in the proof of Theorem 2.

The scale mixtures *Vγ*(*x*; *d*, *r*) are: for *γ* = 1, the F-distribution *F*∗(*x*; *d*, 2*r*) = *F*(*x*/*d*; *d*, 2*r*), scaled by *d*, with parameters *d* ∈ N⁺ and *r* > 0 and density *f*∗(*x*; *d*, 2*r*) = (1/*d*) *f*(*x*/*d*; *d*, 2*r*); for *γ* = 0, the chi-squared distribution *Gd*(*x*) with *d* degrees of freedom and density *gd*(*x*); and for *γ* = −1, a gamma distribution of generalized type *Wr*−*d*/2(*x*; *d*, *r*) with density *wr*−*d*/2(*x*; *d*, *r*). The modified Bessel function of the third kind, or Macdonald function, *Kλ*(*u*) also occurred in Christoph and Ulyanov [18,20] in generalized gamma and Laplace densities.
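The *γ* = −1 entry of (20) can be spot-checked numerically for *d* = 3, *r* = 2, where the Macdonald function has an elementary closed form and (20) reduces to *w*1/2(*x*; 3, 2) = √(4*x*) e^−√(4*x*); the mixture density is ∫₀^∞ *gd*(*x*/*y*) *y*⁻¹ *gr*,*r*(*y*) *dy*. A minimal sketch with illustrative names:

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

def chi2_pdf_d3(t):
    """Chi-squared density g_3(t) = sqrt(t) e^(-t/2) / sqrt(2 pi)."""
    return math.sqrt(t) * math.exp(-t / 2.0) / math.sqrt(2.0 * math.pi)

def gamma22_pdf(y):
    """Gamma density g_{2,2}(y) = 4 y e^(-2y) (shape r = 2, rate r = 2)."""
    return 4.0 * y * math.exp(-2.0 * y)

def v_minus1(x):
    # density of the scale mixture for gamma = -1:
    # int_0^inf g_3(x/y) (1/y) g_{2,2}(y) dy
    return simpson(lambda y: chi2_pdf_d3(x / y) * gamma22_pdf(y) / y, 1e-6, 30.0)

def w_closed(x):
    # closed form w_{1/2}(x; 3, 2) = sqrt(4x) e^(-sqrt(4x))
    return math.sqrt(4.0 * x) * math.exp(-math.sqrt(4.0 * x))

for x in (0.5, 1.0, 2.5):
    print(x, abs(v_minus1(x) - w_closed(x)))
```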

**Remark 6.** *The Macdonald function satisfies the order-reflection formula K*−*λ*(*u*) = *Kλ*(*u*)*, and Kλ*(*u*) *may be expressed in closed form for λ* = *m* + 1/2 *with integer m. In Oldham et al. [34] (Formulas 51:4:1 and 26:13:3), the Macdonald functions K*−*λ*(*u*) = *Kλ*(*u*) *for λ* = 1/2, 3/2, 5/2, 7/2, 9/2 *are given explicitly. Using Prudnikov et al. [35] (Formulas 2.3.16.1–3), the densities wr*−*d*/2(*x*; *d*, *r*) = *wm*+1/2(*x*; *d*, *r*) *can be calculated:*

$$w_{m+1/2}(x;d,r) = \frac{r^r x^{d/2-1}}{\Gamma(r)\,2^{d/2}\,\Gamma(d/2)} \begin{cases} (-1)^m \sqrt{\pi}\, \frac{\partial^m}{\partial r^m} \left(r^{-1/2}\, e^{-\sqrt{2rx}}\right), & m = 0, 1, 2, \ldots, \\ (-2)^{-m} \sqrt{\frac{\pi}{r}}\, \frac{\partial^{-m}}{\partial x^{-m}}\, e^{-\sqrt{2rx}}, & m = 0, -1, -2, \ldots \end{cases} \tag{21}$$

**Example 1.** *Some densities wm*<sup>+</sup>1/2(*x*; *d*,*r*) *for m* = *r* − (*d* + 1)/2 = −2, −1, 0, 1, 2*:*

$$\begin{aligned} m=-2: \quad & d=7,\; r=2, & w_{-3/2}(x;7,2) &= \tfrac{4x}{15}\,\big(1+\sqrt{4x}\big)\, e^{-\sqrt{4x}},\\ m=-1: \quad & d=4,\; r=3/2, & w_{-1/2}(x;4,3/2) &= \tfrac{3}{4}\,\sqrt{3x}\; e^{-\sqrt{3x}},\\ m=0: \quad & d=4,\; r=5/2, & w_{1/2}(x;4,5/2) &= \tfrac{25x}{12}\; e^{-\sqrt{5x}},\\ m=0: \quad & d=3,\; r=2, & w_{1/2}(x;3,2) &= \sqrt{4x}\; e^{-\sqrt{4x}},\\ m=1: \quad & d=3,\; r=3, & w_{3/2}(x;3,3) &= \tfrac{3}{8}\,\big(6x+\sqrt{6x}\big)\, e^{-\sqrt{6x}},\\ m=2: \quad & d=3,\; r=4, & w_{5/2}(x;3,4) &= \tfrac{1}{12}\,\big((8x)^{3/2}+24x+3\sqrt{8x}\big)\, e^{-\sqrt{8x}}. \end{aligned}$$
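Each closed form in Example 1 is a probability density, so a simple sanity check (a sketch, not part of the paper) is to verify numerically that each integrates to 1 over (0, ∞):

```python
import math

def simpson(f, a, b, n=60000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

densities = {
    "w_{-3/2}(x;7,2)":   lambda x: (4*x/15) * (1 + math.sqrt(4*x)) * math.exp(-math.sqrt(4*x)),
    "w_{-1/2}(x;4,3/2)": lambda x: 0.75 * math.sqrt(3*x) * math.exp(-math.sqrt(3*x)),
    "w_{1/2}(x;4,5/2)":  lambda x: (25*x/12) * math.exp(-math.sqrt(5*x)),
    "w_{1/2}(x;3,2)":    lambda x: math.sqrt(4*x) * math.exp(-math.sqrt(4*x)),
    "w_{3/2}(x;3,3)":    lambda x: (3/8) * (6*x + math.sqrt(6*x)) * math.exp(-math.sqrt(6*x)),
    "w_{5/2}(x;3,4)":    lambda x: (1/12) * ((8*x)**1.5 + 24*x + 3*math.sqrt(8*x)) * math.exp(-math.sqrt(8*x)),
}
for name, w in densities.items():
    # each total mass should be 1 up to quadrature/truncation error
    print(name, round(simpson(w, 0.0, 400.0), 6))
```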

**Remark 7.** *If m* = *r* − (*d* + 1)/2 *is an integer, the distribution functions Wm*<sup>+</sup>1/2(*x*; *d*,*r*) *of the densities wm*<sup>+</sup>1/2(*x*; *d*,*r*) *can also be calculated explicitly by substitution and partial integration.*

**Example 2.** *Distribution functions Wλ*(*x*; *d*,*r*) *for given densities wλ*(*x*; *d*,*r*) *with λ* = ±1/2*:*

$$w_{-1/2}(x;4,\tfrac{3}{2}) = \frac{3}{4}\sqrt{3x}\, e^{-\sqrt{3x}} \quad\text{and}\quad W_{-1/2}(x;4,\tfrac{3}{2}) = 1 - \frac{1}{2}\left(2\sqrt{3x} + 3x + 2\right) e^{-\sqrt{3x}}, \tag{22}$$

$$w_{1/2}(x;4,\tfrac{5}{2}) = \frac{25x}{12}\, e^{-\sqrt{5x}} \quad\text{and}\quad W_{1/2}(x;4,\tfrac{5}{2}) = 1 - \left(\frac{(5x)^{3/2}}{6} + \frac{5x}{2} + \sqrt{5x} + 1\right) e^{-\sqrt{5x}}, \tag{23}$$

$$w_{1/2}(x;3,2) = \sqrt{4x}\, e^{-\sqrt{4x}} \quad\text{and}\quad W_{1/2}(x;3,2) = 1 - \left(2x + 2\sqrt{x} + 1\right) e^{-\sqrt{4x}}. \tag{24}$$
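The pairs (22)-(24) can be checked by numerically integrating each density and comparing with the stated distribution function (a sketch; helper names are illustrative):

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

pairs = [
    # (density w(t), distribution function W(x)) from (22)-(24)
    (lambda t: 0.75 * math.sqrt(3*t) * math.exp(-math.sqrt(3*t)),
     lambda x: 1 - 0.5 * (2*math.sqrt(3*x) + 3*x + 2) * math.exp(-math.sqrt(3*x))),
    (lambda t: (25*t/12) * math.exp(-math.sqrt(5*t)),
     lambda x: 1 - ((5*x)**1.5/6 + 5*x/2 + math.sqrt(5*x) + 1) * math.exp(-math.sqrt(5*x))),
    (lambda t: math.sqrt(4*t) * math.exp(-math.sqrt(4*t)),
     lambda x: 1 - (2*x + 2*math.sqrt(x) + 1) * math.exp(-math.sqrt(4*x))),
]
for w, W in pairs:
    for x in (0.5, 2.0, 5.0):
        # W(x) should equal the integral of w over (0, x]
        assert abs(simpson(w, 0.0, x) - W(x)) < 1e-5
print("all pairs consistent")
```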

**Remark 8.** *The generalized gamma distribution G*∗(*x*; *β*, *α*, *λ*) *has two shape parameters α and β, a scale parameter λ, and the density*

$$g^*(x;\beta,\alpha,\lambda) = \frac{|\alpha|\,\lambda^{\beta}}{\Gamma(\beta)}\, x^{\alpha\beta - 1}\, e^{-\lambda x^{\alpha}}, \quad x \ge 0, \quad |\alpha| > 0,\ \beta > 0,\ \lambda > 0. \tag{25}$$

*The density (25) is given in Korolev and Zeifman [36] and Korolev and Gorshenin [37] and covers many known densities. Generalized gamma distributions are defined in the literature in many different ways, which do not always correspond to the ones occurring above.*

**Remark 9.** *The densities wm*+1/2(*x*; *d*, *r*) *with integer m* = *r* − (*d* + 1)/2 *are generalized gamma densities g*∗(*x*; *β*, *α*, *λ*) *given in formula* (25) *or may be represented as linear combinations of such densities. The parameters α* = 1/2 *and λ* = √(2*r*) *apply in all densities g*∗(*x*; *β*, *α*, *λ*)*. The parameter β also depends on the number of derivatives m* = *r* − (*d* + 1)/2 *in the densities* (21)*.*

**Example 3.** *Some linear combinations of generalized gamma densities:*

$$\begin{aligned} w_{1/2}(x;3,2) &= g^*(x;3,1/2,\sqrt{4}),\\ w_{3/2}(x;3,3) &= \tfrac{3}{4}\, g^*(x;4,1/2,\sqrt{6}) + \tfrac{1}{4}\, g^*(x;3,1/2,\sqrt{6}),\\ w_{5/2}(x;3,4) &= \tfrac{1}{2}\, g^*(x;5,1/2,\sqrt{8}) + \tfrac{3}{8}\, g^*(x;4,1/2,\sqrt{8}) + \tfrac{1}{8}\, g^*(x;3,1/2,\sqrt{8}). \end{aligned}$$
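These linear combinations can be confirmed pointwise from formula (25); e.g., with *α* = 1/2 and *λ* = √6, (25) gives *g*∗(*x*; 4, 1/2, √6) = 3*x* e^−√(6*x*) and *g*∗(*x*; 3, 1/2, √6) = (3√6/2)√*x* e^−√(6*x*). A sketch (function names are illustrative):

```python
import math

def gen_gamma_pdf(x, beta, alpha, lam):
    """Generalized gamma density g*(x; beta, alpha, lambda) from (25)."""
    return (abs(alpha) * lam**beta / math.gamma(beta)
            * x**(alpha*beta - 1) * math.exp(-lam * x**alpha))

def w32(x):
    """w_{3/2}(x; 3, 3) = (3/8)(6x + sqrt(6x)) e^(-sqrt(6x))."""
    return (3/8) * (6*x + math.sqrt(6*x)) * math.exp(-math.sqrt(6*x))

def w52(x):
    """w_{5/2}(x; 3, 4) = (1/12)((8x)^{3/2} + 24x + 3 sqrt(8x)) e^(-sqrt(8x))."""
    return (1/12) * ((8*x)**1.5 + 24*x + 3*math.sqrt(8*x)) * math.exp(-math.sqrt(8*x))

for x in (0.1, 1.0, 4.0):
    combo32 = (0.75 * gen_gamma_pdf(x, 4, 0.5, math.sqrt(6))
               + 0.25 * gen_gamma_pdf(x, 3, 0.5, math.sqrt(6)))
    combo52 = (0.5 * gen_gamma_pdf(x, 5, 0.5, math.sqrt(8))
               + 3/8 * gen_gamma_pdf(x, 4, 0.5, math.sqrt(8))
               + 1/8 * gen_gamma_pdf(x, 3, 0.5, math.sqrt(8)))
    assert abs(combo32 - w32(x)) < 1e-10
    assert abs(combo52 - w52(x)) < 1e-10
print("linear combinations match")
```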

#### **5. Main Results**

Inequalities for approximations to the scaled statistics $\mathbb{P}\big(g_n^{\gamma}\, N_n^{\gamma^*-\gamma}\, T_{N_n} \le x\big)$ for *γ* ∈ {0, ±1/2, ±1} will be presented. Here, *γ*∗ = 1/2 and *γ* ∈ {0, ±1/2} when the statistic *Tm* is asymptotically normally distributed, or *γ*∗ = 1 and *γ* ∈ {0, ±1} when the normalized *Tm* has a chi-squared limit distribution.

#### *5.1. Asymptotically Normal Statistics Tm and Pareto-like Sample Sizes Nn*(*s*)

Let the asymptotically normal statistic *Tm* satisfy inequality (7) with coefficients *pk* and rate of convergence *a* > 0. The Pareto-like sample size *Nn* = *Nn*(*s*), *s* > 0, is given in (9) and fulfills inequality (10). For the scaling factors, select *γ*∗ = 1/2 and *γ* ∈ {0, ±1/2} in formula (18).

**Theorem 1.** *Under the conditions given above, the following approximations apply:*

*i: Let γ* = 1/2*. The non-random scaling factor* √*n for the statistic TNn*(*s*) *leads to approximations by the Laplace distribution L*1/√*s*(*x*) *with the density l*1/√*s*(*x*) *stated in* (19) *for γ* = 1/2*:*

$$\sup_x \left| \mathbb{P}\left(\sqrt{n}\, T_{N_n(s)} \le x\right) - L_{1/\sqrt{s};n}(x) \right| \le C_s\, n^{-\min\{a,\,2\}},$$

*where a* > 0 *is the rate of convergence in* (7) *and*

$$\begin{split} L_{1/\sqrt{s};n}(x) = L_{1/\sqrt{s}}(x) + l_{1/\sqrt{s}}(x) &\left( \frac{\mathbb{I}_{\{a>1/2\}}(a)}{\sqrt{n}} \left[ p_2 x^2 + p_0 \left( \frac{|x|}{\sqrt{2s}} + \frac{1}{2s} \right) \right] \right.\\ &\left. + \frac{\mathbb{I}_{\{a>1\}}(a)}{n} \left[ p_5 x^3 |x| \sqrt{2s} + p_3 x^3 + \left( p_1 + \frac{s-1}{4} \right) x \left( \frac{|x|}{\sqrt{2s}} + \frac{1}{2s} \right) \right] \right). \end{split}$$

*ii: Let γ* = 0*. The random scaling factor* √*Nn*(*s*) *with TNn*(*s*) *leads to the normal approximation* Φ(*x*)*:*

$$\sup_x \left| \mathbb{P}\left(\sqrt{N_n(s)}\, T_{N_n(s)} \le x\right) - \Phi(x) - \varphi_{n,2}(x) \right| \le C_s\, n^{-\min\{a,\,2\}},$$

*where a* > 0 *is the rate of convergence in* (7) *and*

$$\varphi_{n,2}(x) = \varphi(x) \left( \frac{\sqrt{\pi}\,(p_0 + p_2 x^2)}{2\sqrt{s\,n}}\, \mathbb{I}_{\{a>1/2\}}(a) + \frac{p_1 x + p_3 x^3 + p_5 x^5}{s\,n}\, \mathbb{I}_{\{a>1\}}(a) \right).$$

*iii: Let γ* = −1/2*. The mixed scaling factor n*−1/2 *Nn*(*s*) *at TNn*(*s*) *results in the scaled Student's t-distribution S*∗2(*x*; √*s*) *with density s*∗2(*x*; √*s*) *given in* (19) *for γ* = −1/2*:*

$$\sup_x \left| \mathbb{P}\left(n^{-1/2} N_n(s)\, T_{N_n(s)} \le x\right) - S^*_{n;2}(x;\sqrt{s}) \right| \le C_s\, n^{-\min\{a,\,2\}},$$

*where a* > 0 *is the rate of convergence in* (7) *and*

$$\begin{split} S^*_{n;2}(x;\sqrt{s}) = S^*_2(x;\sqrt{s}) + s^*_2(x;\sqrt{s}) &\left( \frac{\mathbb{I}_{\{a>1/2\}}(a)}{\sqrt{n}} \left[ p_0 + \frac{3 p_2 x^2}{x^2+2s} \right] \right.\\ &\left. + \frac{\mathbb{I}_{\{a>1\}}(a)}{n} \left[ \frac{3 p_1 x}{x^2+2s} + \frac{15 p_3 x^3}{(x^2+2s)^2} + \frac{105 p_5 x^5}{(x^2+2s)^3} + \frac{3(s-1)}{4}\, \frac{x}{x^2+2s} \right] \right). \end{split}$$

As applications of Theorem 1, we now examine the Student *t*-distribution, the Student *t*-test statistic, and the sample mean as asymptotically normal statistics *Tm*, considered in Christoph and Ulyanov [18] (Section 3.1 and Corollary 1) for the case of negative binomial sample sizes *Nn* = *Nn*(*r*).

#### **Corollary 1.** *Let the conditions of Theorem 1 be satisfied:*

*i: Let γ* = 1/2*. In the case of the Student's t-statistic Tm* = *Z*/√*χ*²*m with m degrees of freedom, estimated in [18] (Formula (18)), inequality* (7) *is valid with p*0 = *p*2 = *p*5 = 0*, p*1 = *p*3 = 1/4 *and a* = 2*. The non-random scaling factor* √*n and Pareto-like sample sizes Nn*(*s*) *lead to:*

$$\sup_x \left| \mathbb{P}\left( \frac{\sqrt{n}\, Z}{\sqrt{\chi^2_{N_n(s)}}} \le x \right) - L_{1/\sqrt{s}}(x) - \frac{l_{1/\sqrt{s}}(x)}{8n} \left( 2 x^3 + x \left( 1 + |x| \sqrt{2s} \right) \right) \right| \le C_s\, n^{-2}.$$

*ii: Let γ* = 0*. Let Tm* = (*X̄m* − *μ*)/*σ̂m be the Student's t-statistic with sample mean X̄m and sample variance σ̂m, which was considered in [18] (Formulas (21) and (20)). The first-order approximation* (7) *with p*0 = *λ*3/6*, p*2 = *λ*3/3*, a* = 1*, the Pareto-like random sample sizes Nn*(*s*)*, and the random scaling factor* √*Nn*(*s*) *result in:*

$$\sup_x \left| \mathbb{P}\left( \sqrt{N_n(s)}\, T_{N_n(s)} \le x \right) - \Phi(x) - \varphi(x)\, \frac{\sqrt{\pi}\,(\lambda_3 + 2\lambda_3 x^2)}{12\sqrt{s\,n}} \right| \le C_s\, n^{-1}.$$

*iii: Let γ* = −1/2*. Considering the sample mean Tm* = *X̄m, estimated in [18] (Formulas (15) and (16)), one has (7) with p*0 = −*p*2 = *λ*3/6*, p*1 = *λ*4/8 − 5*λ*3²/24*, p*3 = −*λ*4/24 + 5*λ*3²/36*, p*5 = −*λ*3²/72*, a* = 3/2*. Pareto-like random sample sizes Nn*(*s*) *and the mixed scaling factor n*−1/2 *Nn*(*s*) *then yield*

$$\sup_x \left| \mathbb{P}\left( n^{-1/2} N_n(s)\, T_{N_n(s)} \le x \right) - S^*_2(x;\sqrt{s}) - s^*_{n;2}(x;\sqrt{s}) \right| \le C_s\, n^{-3/2},$$

*with*

$$\begin{split} s^*_{n;2}(x;\sqrt{s}) = s^*_2(x;\sqrt{s}) &\left( \frac{1}{\sqrt{n}} \left( \frac{\lambda_3}{6} - \frac{\lambda_3 x^2}{2(x^2+2s)} \right) \right.\\ &\left. + \frac{1}{n} \left( \frac{(3\lambda_4 - 5\lambda_3^2)\, x}{8(x^2+2s)} - \frac{5(3\lambda_4 - 10\lambda_3^2)\, x^3}{24(x^2+2s)^2} - \frac{35\lambda_3^2\, x^5}{24(x^2+2s)^3} + \frac{3(s-1)\, x}{4(x^2+2s)} \right) \right). \end{split}$$

*5.2. Asymptotically Chi-Squared Distributed Tm with Negative Binomially Distributed Sample Sizes Nn*(*r*)

Let the asymptotically chi-squared distributed statistics *Tm* satisfy inequality (8) with coefficients *q*1, *q*<sup>2</sup> and the rate of convergence *a* = 2. The negative binomially distributed sample sizes *Nn* = *Nn*(*r*) with parameter *r* > 0 and success probability 1/*n* are given in (13) and fulfill the inequality (15). For the scaling factors, choose *γ*<sup>∗</sup> = 1 and *γ* ∈ {0, ±1} in formula (18).

**Theorem 2.** *Under the conditions given above, the following approximations apply.*

*i: Let γ* = 1*. The non-random scaling factor gn* = E*Nn*(*r*) = *r*(*n* − 1) + 1 *at the statistics TNn*(*r*) *leads to approximations by the scaled F-distribution F*∗(*x*; *d*, 2*r*) = *F*(*x*/*d*; *d*, 2*r*) *with parameters d* ∈ N⁺ *and r* > 0 *and density f*∗(*x*; *d*, 2*r*) = (1/*d*) *f*(*x*/*d*; *d*, 2*r*) *given in* (20) *with γ* = 1*:*

$$\sup_x \left| \mathbb{P}\left( g_n\, T_{N_n(r)} \le x \right) - F^*(x;d,2r) - f^*_n(x;d,2r) \right| \le C_r \begin{cases} n^{-\min\{r,\,2\}}, & r \ne 2, \\ n^{-2}\ln n, & r = 2, \end{cases}$$

*where*

$$f_n^*(x;d,2r) = \frac{f^*(x;d,2r)}{g_n}\, \mathbb{I}_{\{r>1\}}(r) \left( \left( q_1 - \frac{2-r}{2} \right) \frac{x\,(2r+x)}{2r+d-2} + q_2\, x^2 + \frac{x\,(2-r)}{2} \right). \tag{26}$$

*ii: For γ* = 0 *and the random scaling factor Nn*(*r*) *at TNn*(*r*)*, the limit distribution Gd*(*x*) *remains unchanged:*

$$\sup_x \left| \mathbb{P}\left( N_n(r)\, T_{N_n(r)} \le x \right) - G_d(x;n) \right| \le C_r \begin{cases} n^{-\min\{r,\,2\}}, & r \ne 2, \\ n^{-2}\ln n, & r = 2, \end{cases}$$

*where*

$$G_d(x;n) = G_d(x) + \frac{g_d(x)}{g_n}\, \mathbb{I}_{\{r>1\}}(r)\, (q_1 x + q_2 x^2)\, \frac{r}{r-1}.$$

*iii: Let γ* = −1 *and r* ≥ 2*. The mixed scaling factor g*−<sup>1</sup> *<sup>n</sup> N*<sup>2</sup> *<sup>n</sup>*(*r*) *at TNn*(*r*) *results in a gamma distribution of generalized type Wr*<sup>−</sup>*d*/2(*x*; *d*,*r*) *with density wr*<sup>−</sup>*d*/2(*x*; *d*,*r*) *given in* (20) *for γ* = −1*:*

$$\sup_x \left| \mathbb{P}\left( \frac{N_n^2(r)}{g_n}\, T_{N_n(r)} \le x \right) - W_{r-d/2;n}(x;d,r) \right| \le C_r \begin{cases} n^{-2}, & r > 2, \\ n^{-2}\ln n, & r = 2, \end{cases}$$

*where*

$$\begin{split} W_{r-d/2;n}(x;d,r) = W_{r-d/2}(x;d,r) &+ \frac{w_{r-d/2}(x;d,r)}{g_n}\, \mathbb{I}_{\{r>1\}}(r) \bigg( 2\,q_2\,r\,x + \frac{(r-2)\,x}{2} \\ &+ \frac{\sqrt{2rx}}{2} \big( 2\,q_1 + 2\,q_2\,(d+2-2r) + 2 - r \big)\, \frac{K_{r-d/2-1}(\sqrt{2rx})}{K_{r-d/2}(\sqrt{2rx})} \bigg). \end{split}$$

The restriction *r* ≥ 2 in Theorem 2(iii) has a purely proof-technical character. In Proposition 4, a result is shown with *r* = 3/2.

**Remark 10.** *The ratio R*(*u*; *d*, *r*) = *Kλ*−1(*u*)/*Kλ*(*u*) *can be calculated explicitly for λ* = *m* + 1/2 *with integer m* = *r* − (*d* + 1)/2*. Then, for example, R*(√(3*x*); 4, 3/2) = 1 + 1/√(3*x*) *and R*(√(4*x*); 3, 2) = 1*.*

**Example 4.** *Let γ* = −1 *in* (20)*, r* = 2 *and d* = 3*. Then, for an asymptotically chi-squared distributed test statistic Tm satisfying* (8)*, with scale factor N*²*n*(2)/(2*n* − 1)*, the estimate holds:*

$$\sup_{x>0} \left| \mathbb{P}\left( \frac{N_n^2(2)}{2n-1}\, T_{N_n(2)} \le x \right) - W_{1/2}(x;3,2) + \frac{w_{1/2}(x;3,2)}{4\,(2n-1)} \left( \sqrt{4x}\, \big( q_2 \sqrt{4x} + q_1 + q_2 \big) \right) \right| \le C_2\, \frac{\ln n}{n^2},$$
 

*where W*1/2(*x*; 3, 2) *and w*1/2(*x*; 3, 2) *are specified in* (24)*.*

As applications of Theorem 2, we now examine Hotelling's *T*²0 distribution and normalized quotients of two independent chi-squared distributions as asymptotically chi-squared distributed statistics, considered in Christoph and Ulyanov [18] (Section 3.2 and Corollary 2), where the sample sizes *Nn* = *Nn*(*s*) had a Pareto-like distribution.

#### **Corollary 2.** *The conditions of Theorem 2 shall be fulfilled:*

*i: Let γ* = 1*. Consider Hotelling's generalized T*²0*-statistic T*²0 = *Tm* = tr **S***q***S**−1*m with independently distributed random matrices* **S***q and* **S***m having Wishart distributions Wp*(*q*, **I***p*) *and Wp*(*m*, **I***p*)*, respectively. Then, inequality (8) holds with limit distribution Gd*(*x*)*, d* = *p q*, *q*1 = (*p* + 1 − *q*)/2 *and q*2 = (*p* + 1 + *q*)/(2*d* + 4)*. The non-random scaling factor gn* = E*Nn*(*r*) *at TNn*(*r*) *leads to*

$$\sup_x \left| \mathbb{P}\left( g_n\, T_{N_n(r)} \le x \right) - F^*(x;p\,q,2r) - f_n^*(x;p\,q,2r) \right| \le C_r \begin{cases} n^{-\min\{r,\,2\}}, & r \ne 2, \\ n^{-2}\ln n, & r = 2, \end{cases} \tag{27}$$

*where the scaled F-distribution F*∗(*x*; *p q*, 2*r*) *with density f*∗(*x*; *p q*, 2*r*) *is given in* (20) *for γ* = 1 *and*

$$f_n^*(x;p\,q,2r) = \frac{f^*(x;p\,q,2r)}{g_n}\, \mathbb{I}_{\{r>1\}}(r) \left( \left( \frac{p+1-q}{2} - \frac{2-r}{2} \right) \frac{x\,(2r+x)}{2r+p\,q-2} + \frac{(p+1+q)\,x^2}{2\,p\,q+4} + \frac{x\,(2-r)}{2} \right). \tag{28}$$

*ii: Let γ* = 0*, and let χ*²*d and χ*²*m be independent and Tm* = *χ*²*d*/*χ*²*m be scale mixtures satisfying inequality* (8) *with coefficients q*1 = (*d* − 2)/2 *and q*2 = −1/2*. Random degrees of freedom Nn*(*r*) *instead of m and the random scaling factor Nn*(*r*) *lead to*

$$\sup_{x>0} \left| \mathbb{P}\left( N_n(r)\, T_{N_n(r)} \le x \right) - G_d(x;n) \right| \le C_r \begin{cases} n^{-\min\{r,\,2\}}, & r \ne 2, \\ n^{-2}\ln n, & r = 2, \end{cases}$$

*where*

$$G_d(x;n) = G_d(x) + \frac{g_d(x)}{2\,g_n}\, \mathbb{I}_{\{r>1\}}(r)\, \big( (d-2)x - x^2 \big)\, \frac{r}{r-1}.$$

*iii: Let γ* = −1*. The statistics Tm* = *χ*²4/*χ*²*m satisfy inequality (8) with the limiting distribution G*4(*x*) *and the coefficients q*1 = 1 *and q*2 = −1/2*. The mixed scaling factor g*−1*n N*²*n*(*r*) *at TNn*(*r*) *results in a limiting gamma distribution of generalized type Wr*−*d*/2(*x*; *d*, *r*)*. Only if r* − (*d* + 1)/2 = *m is an integer can the involved Macdonald functions Kr*−*d*/2(√(2*r x*)) *be explicitly calculated. Since d* = 4*, we choose r* = 5/2 *and find r* − (*d* + 1)/2 = 0*. Then, uniformly in x* > 0*:*

$$\left| \mathbb{P}\left( \frac{N_n^2(5/2)}{(5n-3)/2}\, \frac{\chi_4^2}{\chi^2_{N_n(5/2)}} \le x \right) - W_{1/2}(x;4,5/2) + \frac{w_{1/2}(x;4,5/2)}{2\,(5n-3)} \Big( 9x - \sqrt{5x} \Big) \right| \le \frac{C_{3/2}}{n^{3/2}},$$

*where W*1/2(*x*; 4, 5/2) *and w*1/2(*x*; 4, 5/2) *are specified in* (23)*.*

**Remark 11.** *In the paper Monahkov [38], an estimate analogous to* (27) *is shown, but with 11 approximation terms in the corresponding formula* (28)*. Instead of* (8) *with q*1 = (*p* + 1 − *q*)/2*, q*2 = (*p* + 1 + *q*)/(2*d* + 4) *and d* = *p q, the following equivalent inequality is used; see Fujikoshi et al. [39] (Theorem 4.1(ii)):*

$$\sup_x \left| \mathbb{P}\left( m\, \mathrm{tr}\left( \mathbf{S}_q \mathbf{S}_m^{-1} \right) \le x \right) - G_d(x) - \frac{d}{4m} \big( a_0\, G_d(x) + a_1\, G_{d+2}(x) + a_2\, G_{d+4}(x) \big) \right| \le \frac{C}{m^2},$$

*where a*<sup>0</sup> = *q* − *p* − 1*, a*<sup>1</sup> = −2*q, a*<sup>2</sup> = *q* + *p* + 1 *with a*<sup>0</sup> + *a*<sup>1</sup> + *a*<sup>2</sup> = 0 *and d* = *p q.*

**Proposition 4.** *Let γ* = −1*. Consider the statistics Tm* = *χ*²4/*χ*²*m satisfying inequality (8) with the limiting distribution G*4(*x*)*, the coefficients q*1 = 1 *and q*2 = −1/2*, and the mixed scaling factor g*−1*n N*²*n*(*r*) *at TNn*(*r*)*. If r* = 3/2 *and d* = 4*, then r* − (*d* + 1)/2 = −1*, gn* = (3*n* − 1)/2 *and, uniformly in x* > 0*:*

$$\left| \mathbb{P}\left( \frac{N_n^2(3/2)}{(3n-1)/2}\, \frac{\chi_4^2}{\chi^2_{N_n(3/2)}} \le x \right) - W_{-1/2}(x;4,3/2) + \frac{w_{-1/2}(x;4,3/2)}{2(3n-1)} \left( 7x + \sqrt{3x} + 1 \right) \right| \le \frac{C_{3/2}}{n^{3/2}},$$

*where W*−1/2(*x*; 4, 3/2) *and w*−1/2(*x*; 4, 3/2) *are specified in* (22)*.*

#### **6. Proofs**

For the proofs of Theorems 1 and 2, we use Proposition 1. The statistics *Tm* and the sample size *Nn* are either asymptotically normally and discretely Pareto-like distributed (i.e., *F* = Φ and *H* = *Ws*) or asymptotically chi-squared and negative binomially distributed (i.e., *F* = *Gd* and *H* = *Gr*,*r*). In both cases, the quantity *Dn* defined in (6) is uniformly bounded for all *n* ∈ N⁺; see Christoph and Ulyanov [18] (Lemma A1). Next, the bounds that are required in (4) for the negative moments of the sample sizes, E*Nn*(*s*)−*a* and E*Nn*(*r*)−*a*, are provided by (12) and (17). Furthermore, it follows from Christoph and Ulyanov [18] (Proposition 2 and Lemma A2) that in both cases the domain of integration of the integrals in the function *Gn*(*x*, 1/*gn*) defined in (5) can be extended from (1/*gn*, ∞) to (0, ∞):

$$\sup_x \left| G_n(x, 1/g_n) - G_{n,2}(x) \right| \le C\, g_n^{-b},$$

where *b* = 2 if *F* = Φ and *H* = *Ws* or *b* = min{*r*, 2} if *F* = *Gd* and *H* = *Gr*,*r*, respectively, and

$$G_{n,2}(x) = \begin{cases} \int_0^{\infty} F(x\,y^{\gamma})\, dH(y), & \text{for} \quad 0 < b \le 1/2, \\ \int_0^{\infty} \left( F(x\,y^{\gamma}) + \frac{f_1(x\,y^{\gamma})}{\sqrt{g_n\,y}} \right) dH(y) =: G_{n,1}(x), & \text{for} \quad 1/2 < b \le 1, \\ G_{n,1}(x) + \int_0^{\infty} \frac{f_2(x\,y^{\gamma})}{g_n\,y}\, dH(y) + \frac{1}{g_n} \int_0^{\infty} F(x\,y^{\gamma})\, dh_2(y), & \text{for} \quad b > 1. \end{cases} \tag{29}$$

We still have to calculate the integrals in (29) that contain *f*1, *f*2, and *h*2, respectively.

**Proof of Theorem 1.** We now consider *F* = Φ, *H* = *Ws* and *γ* ∈ {0, ±1/2}. Here, *f*1(*x yγ*) = (*p*0 + *p*2*x*²*y*2*γ*)*ϕ*(*x yγ*), *f*2(*x yγ*) = (*p*1*x yγ* + *p*3*x*³*y*3*γ* + *p*5*x*⁵*y*5*γ*)*ϕ*(*x yγ*), and we divide the function *h*2(*y*) = *h*2;*s*(*y*) given in (11) into two parts: *h*∗2;*s*(*y*) = *s*(*s* − 1) e−*s*/*y*/(2*y*²) and *h*∗∗2;*s*(*y*) = *s Q*1(*n y*) *y*−² *e*−*s*/*y*. The densities of the limit distributions *Vγ*(*x*, *s*) = ∫₀^∞ Φ(*x yγ*)*dWs*(*y*) were given in (19). If *γ* = 1/2, to calculate the integrals in (29) involving *f*1(*x*√*y*), *f*2(*x*√*y*) and *h*∗2;*s*(*y*), we use Prudnikov et al. [35] (Formulas 2.3.16.2 and 2.3.16.3):

$$\int\_0^\infty y^{-m-1/2} e^{-py-q/y} dy = \begin{cases} (-1)^{-m} \sqrt{\pi} \frac{\partial^{-m}}{\partial p^{-m}} \left( p^{-1/2} e^{-2\sqrt{pq}} \right), & m = 0, -1, -2, \dots \\ (-1)^m \frac{\sqrt{\pi}}{\sqrt{p}} \frac{\partial^m}{\partial q^m} \left( e^{-2\sqrt{pq}} \right), & m = 0, 1, 2, \dots \end{cases}, \quad p, q > 0,\tag{30}$$

for *p* = *x*²/2 > 0, *q* = *s* > 0 and *m* = 0, 1, 2, respectively. The corresponding integral with *h*∗∗2;*s*(*y*) was estimated in Christoph et al. [17] (see the proof of Theorem 5) by *c*(*s*)*e*−√(*π s n*)/2 ≤ *C*(*s*)*n*−².
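Formula (30) can be spot-checked numerically; for *m* = 1 it reduces to ∫₀^∞ *y*⁻³ᐟ² e^(−*py*−*q*/*y*) *dy* = √(π/*q*) e^(−2√(*pq*)). A sketch with illustrative names:

```python
import math

def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

def lhs(p, q):
    # int_0^inf y^(-3/2) exp(-p y - q/y) dy  (the m = 1 case of (30))
    return simpson(lambda y: y**-1.5 * math.exp(-p*y - q/y), 1e-9, 60.0)

def rhs(p, q):
    # (-1) (sqrt(pi)/sqrt(p)) d/dq exp(-2 sqrt(pq)) = sqrt(pi/q) exp(-2 sqrt(pq))
    return math.sqrt(math.pi / q) * math.exp(-2.0 * math.sqrt(p * q))

for p, q in ((0.5, 1.0), (1.0, 2.0), (2.0, 0.5)):
    assert abs(lhs(p, q) - rhs(p, q)) < 1e-6
print("formula (30), m = 1: ok")
```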

In the case of *γ* = 0, we obtain ∫₀^∞ Φ(*x*)*dh*2(*y*) = Φ(*x*)(*h*2(∞) − lim*y*→0 *h*2(*y*)) = 0. To calculate the integrals with *f*1(*x*) and *f*2(*x*), we use [35] (Formula 2.3.3.1) with *α* = 3/2, 2 and *q* = *s*:

$$\int_0^{\infty} y^{-\alpha-1}\, e^{-q/y}\, dy \stackrel{1/y=z}{=} \int_0^{\infty} z^{\alpha-1}\, e^{-qz}\, dz = \Gamma(\alpha)\, q^{-\alpha}, \quad \alpha > 0, \quad q > 0. \tag{31}$$
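Formula (31) can also be checked directly after the substitution 1/*y* = *z* used in its statement (a sketch):

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

def lhs(alpha, q):
    # after the substitution 1/y = z: int_0^inf z^(alpha-1) e^(-qz) dz
    return simpson(lambda z: z**(alpha - 1) * math.exp(-q * z), 1e-12, 80.0 / q)

for alpha, q in ((1.5, 1.0), (2.5, 3.0), (3.5, 0.7)):
    assert abs(lhs(alpha, q) - math.gamma(alpha) * q**-alpha) < 1e-3
print("formula (31): ok")
```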

If *γ* = −1/2, the integrals with *f*1(*x*/√*y*), *f*2(*x*/√*y*) and *h*∗2;*s*(*y*) are calculated using (31) with *α* = 3/2, 5/2, 7/2, 9/2 and *q* = *s* + *x*²/2. From Christoph and Ulyanov [20] (see the proof of Theorem 8), it follows that *n*−¹ sup*x* |∫₀^∞ Φ(*x*/√*y*)*dh*∗∗2;*s*(*y*)| ≤ *C*(*s*)*n*−², and Theorem 1 is proved.

**Proof of Theorem 2.** Now, we consider the case *F*(*x*) = *Gd*(*x*), *H*(*y*) = *Gr*,*r*(*y*) and *γ* ∈ {0, ±1}. This combination has not yet been studied in the literature. Only for *γ* = 1 is there a result by Monahkov [38]; see Remark 11 above. Then, *f*1(*x yγ*) = 0, *f*2(*x yγ*) = (*q*1*x yγ* + *q*2*x*²*y*2*γ*)*gd*(*x yγ*), and we divide the function *h*2(*y*) = *h*2;*r*(*y*) given in (16) into two parts: *h*∗2;*r*(*y*) = (2*r*)−¹ *gr*,*r*(*y*)(*y* − 1)(2 − *r*) and *h*∗∗2;*r*(*y*) = *r*−¹ *gr*,*r*(*y*)*Q*1(*gn y*).

For *γ* = 1, the density *v*1(*x*; *d*, *r*) in (20) and the integrals in (29) with *f*2(*x y*) and *h*∗2;*r*(*y*) are computed with (31) for *α* = *r* + *d*/2, *r* + *d*/2 − 1. The integral with *h*∗∗2;*r*(*y*) is estimated in (A1) in Lemma A1. Together with the inequality |1/*gn* − 1/(*rn*)| ≤ max{2, *r*}(*r* − 1)(*rn*)−², we get (26).

In the case of *γ* = 0, we obtain ∫₀^∞ *Gd*(*x*)*dh*2(*y*) = *Gd*(*x*)(*h*2;*r*(∞) − lim*y*→0 *h*2;*r*(*y*)) = 0. To calculate the integrals with *f*2(*x*), we use (31) with *α* = *r* − 1 and *q* = *r*.

If *γ* = −1, the density *v*−1(*x*; *d*, *r*) in (20) and the integrals with *f*2(*x*/*y*) and *h*∗2;*r*(*y*) are calculated using Prudnikov et al. [35] (Formula 2.3.16.1):

$$\int_0^{\infty} y^{\alpha-1}\, e^{-p y - q/y}\, dy = 2\,(q/p)^{\alpha/2}\, K_{\alpha}\big(2\sqrt{pq}\big), \quad p, q > 0,$$

with *α* = *r* − *d*/2, *r* − *d*/2 − 1, *r* − *d*/2 − 2, *p* = *r* and *q* = *x*/2. We use the order-reflection formula *Kα*(*u*) = *K*−*α*(*u*) and the recursion formula; see Oldham et al. [34] (Chapter 51.5):

$$K\_{r-d/2-2}(\sqrt{2rx}) = K\_{d/2+2-r}(\sqrt{2rx}) = \frac{2\left(d/2-r+1\right)}{\sqrt{2rx}} K\_{d/2-r+1}(\sqrt{2rx}) + K\_{d/2-r}(\sqrt{2rx}).$$

The integral with *h*∗∗ 2;*r*(*y*) is estimated in (A4) in Lemma A2 and Theorem 2 is proved.
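For half-integer order, the Macdonald function has elementary closed forms, K₁/₂(*u*) = √(π/(2*u*)) e^(−*u*) and K₃/₂(*u*) = √(π/(2*u*)) e^(−*u*)(1 + 1/*u*), so Formula 2.3.16.1 and the recursion used above can be spot-checked numerically (a sketch; function names are illustrative):

```python
import math

def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    acc = f(a) + f(b)
    for k in range(1, n):
        acc += f(a + k * h) * (4 if k % 2 else 2)
    return acc * h / 3.0

def K_half(u):
    """K_{1/2}(u) = sqrt(pi/(2u)) e^(-u); K_{-1/2} = K_{1/2} by order reflection."""
    return math.sqrt(math.pi / (2.0 * u)) * math.exp(-u)

def K_three_halves(u):
    """K_{3/2}(u) = sqrt(pi/(2u)) e^(-u) (1 + 1/u)."""
    return K_half(u) * (1.0 + 1.0 / u)

def prudnikov(alpha, p, q):
    # int_0^inf y^(alpha-1) e^(-py - q/y) dy = 2 (q/p)^(alpha/2) K_alpha(2 sqrt(pq))
    return simpson(lambda y: y**(alpha - 1) * math.exp(-p*y - q/y), 1e-9, 50.0)

for p, q in ((1.0, 2.0), (2.0, 0.5)):
    u = 2.0 * math.sqrt(p * q)
    assert abs(prudnikov(0.5, p, q) - 2.0 * (q/p)**0.25 * K_half(u)) < 1e-6
    assert abs(prudnikov(1.5, p, q) - 2.0 * (q/p)**0.75 * K_three_halves(u)) < 1e-6
    # recursion K_{nu+1}(u) = (2 nu / u) K_nu(u) + K_{nu-1}(u) at nu = 1/2
    assert abs(K_three_halves(u) - (1.0/u) * K_half(u) - K_half(u)) < 1e-12
print("Macdonald identities: ok")
```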

**Proof of Proposition 4.** We consider *γ* = −1, *r* = 3/2, *d* = 4 and *gn* = (3*n* − 1)/2. The integrals in (29) with *f*2(*x*/*y*) and *h*∗2;*r*(*y*) are calculated using (30) with *m* = −1, −2, −3, *p* = *r* and *q* = *x*/2. The integral with *h*∗∗2;*r* is estimated in (A5) in Lemma A3, and Proposition 4 is proved.

#### **7. Conclusions**

The common goal of the present work and that of Christoph and Ulyanov [18] is to develop formal second order Chebyshev–Edgeworth expansions for sample statistics with random sample sizes. Corresponding expansions are assumed for the statistics with non-random sample sizes as well as for the random sample sizes. The statistics examined are asymptotically normally distributed and, for the first time in this setting, also asymptotically chi-squared distributed. The random sample sizes have negative binomial or Pareto-like distributions. The formal construction of the approximating functions allows the results to be used for a whole family of asymptotically normal or chi-squared distributed statistics. The Student *t*-distribution with *m* degrees of freedom, the one-sample Student *t*-test statistic, and the sample mean are considered as examples of asymptotic normal statistics. Hotelling's generalized *T*<sup>2</sup> <sup>0</sup> statistic and scale mixture of a normalized quotient of two independent chi-squared random variables were studied as examples of the asymptotic chi-squared distributions. In addition, random, non-random, and mixed scaling factors for the statistics are considered, which have a significant influence on the limit distributions. The limit laws are scale mixtures of the normal with mixing gamma or chi-squared with mixing inverse exponential distributions. In addition to the normal distribution and the chi-square distribution, there are a variety of limit distributions: the Laplace, the scaled Student t-, the scaled Fisher, the generalized gamma, and linear combinations of generalized gamma distributions.

The remainder terms in the approximations of the scaled statistics are estimated by inequalities.

**Author Contributions:** Conceptualization, G.C. and V.V.U.; methodology, V.V.U. and G.C.; formal analysis, G.C. and V.V.U.; investigation, G.C. and V.V.U.; writing—original draft, G.C. and V.V.U.; writing—review and editing, V.V.U. and G.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding. It was carried out within the project "Analysis of the quality of approximations in the statistical analysis of multivariate observations" of the Magdeburg University, the program of the Moscow Center for Fundamental and Applied Mathematics, Lomonosov Moscow State University, and HSE University Basic Research Programs.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank the Editor for his support and the Reviewers for their appropriate comments which have improved the quality of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Auxiliary Lemmas**

**Lemma A1.** *Let r* > 1*. Then*

$$|J_1(x)| = \left| \int_0^\infty G_d(x\,y)\, dh_{2,r}^{**}(y) \right| \le \frac{c(r,d)}{g_n^{r-1}} \quad \text{with} \quad h_{2,r}^{**}(y) = r^{-1} g_{r,r}(y)\, Q_1(g_n\, y). \tag{A1}$$

**Proof of Lemma A1.** We use the Fourier series expansion of the jump correcting function *Q*1(*y*) at all non-integer points *y*; see Prudnikov et al. [35] (Formula 5.4.2.9 for *a* = 0):

$$Q\_1(y) = \frac{1}{2} - (y - [y]) = \sum\_{k=1}^{\infty} \frac{\sin(2\pi k \, y)}{k \, \pi}, \quad y \neq [y]. \tag{A2}$$

and Prudnikov et al. [35] (Formula 2.5.31.4):

$$\int\_0^\infty y^{a-1} e^{-py} \sin(by) dy = \frac{\Gamma(a)}{(b^2 + p^2)^{a/2}} \sin(a \arctan(b/p)) \quad \text{with} \quad a > -1, \ b, p > 0. \tag{A3}$$
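Formula (A3) is easy to verify numerically; for *a* = 2, *p* = *b* = 1 it reduces to the classical value ∫<sub>0</sub><sup>∞</sup> *y e*<sup>−*y*</sup> sin *y* *dy* = 1/2. A stdlib-only check (our illustration, using composite Simpson on a truncated range; the parameter values are arbitrary):

```python
import math

def lhs(a: float, p: float, b: float, upper: float = 60.0, n: int = 60_000) -> float:
    """Composite Simpson rule for \\int_0^upper y^{a-1} e^{-p y} sin(b y) dy."""
    h = upper / n
    def f(y: float) -> float:
        return 0.0 if y == 0.0 else y ** (a - 1.0) * math.exp(-p * y) * math.sin(b * y)
    odd = math.fsum(f((2 * i - 1) * h) for i in range(1, n // 2 + 1))
    even = math.fsum(f(2 * i * h) for i in range(1, n // 2))
    return (f(0.0) + f(upper) + 4.0 * odd + 2.0 * even) * h / 3.0

def rhs(a: float, p: float, b: float) -> float:
    """Closed form on the right-hand side of (A3)."""
    return math.gamma(a) / (b * b + p * p) ** (a / 2.0) * math.sin(a * math.atan(b / p))

print(lhs(2.0, 1.0, 1.0), rhs(2.0, 1.0, 1.0))  # both ~ 1/2
```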

Integration by parts in the integral *J*<sub>1</sub>(*x*), using (A2), interchanging sum and integral, and applying (A3) with *a* = *r* + *d*/2 − 1, *p* = *r* + *x*/2 and *b* = 2*πkg<sub>n</sub>* leads to

$$\begin{split} J_1(x) &= -\frac{r^{r-1}\, x^{d/2}}{\Gamma(r)\, 2^{d/2}\, \Gamma(d/2)} \int_0^\infty y^{r+d/2-2}\, Q_1(g_n y)\, e^{-(r+x/2)y}\, dy \\ &= -\frac{r^{r-1}\, x^{d/2}}{\pi\, \Gamma(r)\, 2^{d/2}\, \Gamma(d/2)} \sum_{k=1}^\infty \frac{1}{k} \int_0^\infty y^{r+d/2-2}\, e^{-(r+x/2)y} \sin\left(2\pi k g_n y\right) dy \\ &= -\frac{r^{r-1}\, \Gamma(r+d/2-1)}{\pi\, \Gamma(r)\, 2^{d/2}\, \Gamma(d/2)} \sum_{k=1}^\infty \frac{a_k(x;n)}{k} \end{split}$$

with

$$a\_k(\mathbf{x};n) = \frac{\mathbf{x}^{d/2}\sin\left((r+d/2-1)\arctan(2\pi k \mathbf{g}\_n/(r+\mathbf{x}/2))\right)}{\left((2\pi k \mathbf{g}\_n)^2 + (r+\mathbf{x}/2)^2\right)^{(r+d/2-1)/2}}.$$

Now, we split the exponent (*r* + *d*/2 − 1)/2 = (*r* − 1)/2 + *d*/4 and obtain

$$|a_k(x;n)| \le \frac{x^{d/2}}{(2\pi k g_n)^{r-1}\,(r+x/2)^{d/2}} \le \frac{2^{d/2}}{(2\pi k\, g_n)^{r-1}}.$$

Since *r* > 1, we find, uniformly in *x* ≥ 0,

$$|J_1(x)| \le \frac{c_1(r,d)}{g_n^{r-1}} \sum_{k=1}^{\infty} k^{-r} = \frac{c(r,d)}{g_n^{r-1}}$$

and Lemma A1 is proved.
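The Fourier expansion (A2) used in this proof can be sanity-checked numerically (our illustration, not part of the proof; the evaluation point is an arbitrary non-integer):

```python
import math

def q1(y: float) -> float:
    """Jump-correcting function Q1(y) = 1/2 - (y - [y])."""
    return 0.5 - (y - math.floor(y))

def q1_fourier(y: float, n_terms: int) -> float:
    """Partial sum of the Fourier expansion (A2)."""
    return math.fsum(math.sin(2.0 * math.pi * k * y) / (k * math.pi)
                     for k in range(1, n_terms + 1))

y = 0.3  # any non-integer point
print(q1(y), q1_fourier(y, 50_000))  # both close to 0.2
```

The partial sums converge slowly (like 1/*N* away from the jumps), which is why a large number of terms is used.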

**Lemma A2.** *Let r* ≥ 2*. Then*

$$|J_{-1}(x)| = \left| \int_0^\infty G_d(x/y)\, dh_{2,r}^{**}(y) \right| \le \frac{c(r,d)}{g_n} \quad \text{with} \quad h_{2,r}^{**}(y) = r^{-1} g_{r,r}(y)\, Q_1(g_n y). \tag{A4}$$

**Proof of Lemma A2.** Integration by parts in the integral *J*−1(*x*), using the Fourier series expansion (A2), interchanging sum and integral, we find

$$J_{-1}(x) = \frac{r^{r-1}\, x^{d/2}}{\Gamma(r)\, 2^{d/2}\, \Gamma(d/2)} \int_0^\infty y^{r-d/2-2}\, Q_1(g_n y)\, e^{-(ry+x/(2y))}\, dy = \frac{r^{r-1}}{\pi\, \Gamma(r)\, 2^{d/2}\, \Gamma(d/2)} \sum_{k=1}^\infty \frac{J_{k,n}(x)}{k}$$

$$\text{with}\quad J_{k,n}(x) = \int_0^\infty x^{d/2}\, y^{r-d/2-2}\, e^{-(ry+x/(2y))} \sin\left(2\pi k g_n y\right) dy.$$

In the literature, we have only found integrals *J<sub>k,n</sub>*(*x*) with power functions *y*<sup>−1/2</sup> and *y*<sup>−3/2</sup>. Therefore, we integrate by parts in the integral *J<sub>k,n</sub>*(*x*):

$$J\_{k,n}(\mathbf{x}) = \frac{-1}{2} \int\_0^\infty \left( (d - 2r + 4) f\_1(\mathbf{x}, y) + 2r f\_2(\mathbf{x}, y) - f\_3(\mathbf{x}, y) \right) e^{-(ry + \mathbf{x}/(2y))} \frac{\cos(2\pi k \mathbf{g}\_n y)}{2\pi k \mathbf{g}\_n} dy,$$

where *f*<sub>1</sub>(*x*, *y*) = *x*<sup>*d*/2</sup>*y*<sup>*r*−*d*/2−3</sup>, *f*<sub>2</sub>(*x*, *y*) = *x*<sup>*d*/2</sup>*y*<sup>*r*−*d*/2−2</sup> and *f*<sub>3</sub>(*x*, *y*) = *x*<sup>*d*/2+1</sup>*y*<sup>*r*−*d*/2−4</sup>. Since *r* ≥ 2 and *d* ≥ 1, we obtain *y*<sup>*r*−2</sup> *e*<sup>−*ry*/2</sup> ≤ *c<sub>r</sub>* and (*x*/*y*)<sup>(*d*−1)/2</sup> *e*<sup>−*x*/(4*y*)</sup> ≤ *c<sub>d</sub>*. Using (30) with *m* = 0, 1, 2, *p* = *r*/2, and *q* = *x*/4, we find

$$\int_0^\infty f_1(x, y)\, dy \le c_r c_d\, x^{1/2} \int_0^\infty y^{-3/2}\, e^{-(ry/2 + x/(4y))}\, dy = c_r c_d\, 2\sqrt{\pi}\, e^{-\sqrt{rx/2}} \le \mathbb{C}_1(r, d),$$

$$\int_0^\infty f_2(x, y)\, dy \le c_r c_d\, x^{1/2} \int_0^\infty y^{-1/2}\, e^{-(ry/2 + x/(4y))}\, dy = c_r c_d\, \sqrt{2\pi x/r}\, e^{-\sqrt{rx/2}} \le \mathbb{C}_2(r, d),$$

$$\int_0^\infty f_3(x, y)\, dy \le c_r c_d\, x^{3/2} \int_0^\infty y^{-5/2}\, e^{-(ry/2 + x/(4y))}\, dy = c_r c_d\, 2\sqrt{\pi r}\left(\sqrt{2x} + 2/\sqrt{r}\right) e^{-\sqrt{rx/2}} \le \mathbb{C}_3(r, d)$$

and

$$|J_{k,n}| \le \frac{1}{4\pi k\, g_n} \left( |d - 2r + 4|\, \mathbb{C}_1(r,d) + 2r\, \mathbb{C}_2(r,d) + \mathbb{C}_3(r,d) \right) \le \frac{\mathbb{C}^*(r,d)}{k\, g_n}.$$

Hence,

$$|J_{-1}(x)| \le \frac{r^{r-1}}{\pi\, \Gamma(r)\, 2^{d/2}\, \Gamma(d/2)}\, \frac{\pi^2}{6\, g_n}\, \mathbb{C}^*(r, d) \le \frac{c(r, d)}{g_n}.$$

Lemma A2 is proved.
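The closed-form values used for the three integrals above are special cases of the standard formula ∫<sub>0</sub><sup>∞</sup> *y*<sup>*ν*−1</sup> *e*<sup>−*py*−*q*/*y*</sup> *dy* = 2(*q*/*p*)<sup>*ν*/2</sup>*K<sub>ν</sub>*(2√(*pq*)); for *ν* = ∓1/2 this reduces to √(*π*/*q*) *e*<sup>−2√(*pq*)</sup> and √(*π*/*p*) *e*<sup>−2√(*pq*)</sup>. A stdlib-only numerical check (our illustration; the values of *p* and *q* are arbitrary):

```python
import math

def integral(power: float, p: float, q: float, upper: float = 60.0, n: int = 200_000) -> float:
    """Midpoint rule for \\int_0^upper y^power e^{-p y - q/y} dy; the integrand
    vanishes rapidly at both endpoints, so this is adequate for a sanity check."""
    h = upper / n
    return h * math.fsum(
        ((i + 0.5) * h) ** power * math.exp(-p * (i + 0.5) * h - q / ((i + 0.5) * h))
        for i in range(n)
    )

p, q = 0.75, 0.5  # e.g. p = r/2 with r = 3/2 and q = x/4 with x = 2
ref = math.exp(-2.0 * math.sqrt(p * q))
print(integral(-1.5, p, q), math.sqrt(math.pi / q) * ref)  # nu = -1/2 case
print(integral(-0.5, p, q), math.sqrt(math.pi / p) * ref)  # nu = +1/2 case
```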

**Lemma A3.** *Let γ* = −1*, r* = 3/2*, d* = 4 *and g<sub>n</sub>* = (3*n* − 1)/2*. Then*

$$|J_{-1}^*(x)| = \left| \int_0^\infty G_d(x/y)\, dh_{2,3/2}^{**}(y) \right| \le \frac{c(3/2, 4)}{\sqrt{g_n}} \quad \text{with} \quad h_{2,3/2}^{**}(y) = (2/3)\, g_{3/2,3/2}(y)\, Q_1(g_n y). \tag{A5}$$

**Proof of Lemma A3.** Integration by parts in the integral *J*<sup>∗</sup><sub>−1</sub>(*x*), using the Fourier series expansion (A2) and interchanging sum and integral, we find

$$J_{-1}^*(x) = \frac{\sqrt{3/2}\, x^2}{4\,\Gamma(3/2)} \int_0^\infty y^{-5/2}\, Q_1(g_n y)\, e^{-(3y/2+x/(2y))}\, dy = \frac{\sqrt{3/2}}{2\pi^{3/2}} \sum_{k=1}^\infty \frac{J_{k,n}^*(x)}{k}$$

with

$$J_{k,n}^*(x) = x^2 \int_0^\infty y^{-5/2}\, e^{-(3y/2 + x/(2y))} \sin\left(2\pi k g_n y\right) dy.$$

Using Prudnikov et al. [35] (Formula 2.5.37.3), with the real constants *p* > 0, *q* > 0 and *b* > 0, we obtain

$$\int_0^\infty y^{-3/2}\, e^{-py - q/y} \sin(b\, y)\, dy = \frac{\sqrt{\pi}}{\sqrt{q}}\, e^{-2\sqrt{q}\,z_+} \sin(2\sqrt{q}\,z_-), \quad \text{where} \quad 2z_\pm^2 = \sqrt{p^2 + b^2} \pm p. \tag{A6}$$

It was shown in Christoph et al. [17] (Proof of Theorem 5) that Leibniz's integral rule allows differentiation with respect to *q* under the integral sign in (A6). Therefore,

$$\int_0^\infty y^{-5/2}\, e^{-py - q/y} \sin(b\,y)\, dy = \frac{\sqrt{\pi}}{2}\, e^{-2\sqrt{q}\,z_+} \left( q^{-3/2} \sin(2\sqrt{q}\,z_-) + 2\, q^{-1} z_+ \sin(2\sqrt{q}\,z_-) - 2\, q^{-1} z_- \cos(2\sqrt{q}\,z_-) \right).$$

Since 0 < *z*<sub>−</sub> ≤ *z*<sub>+</sub>, *p* = 3/2, *q* = *x*/2, *b* = 2*πkg<sub>n</sub>*, *k* ≥ 1 and *g<sub>n</sub>* ≥ 1, we find *z*<sub>+</sub> ≥ (*πkg<sub>n</sub>*)<sup>1/2</sup>,

$$|J_{k,n}^*(x)| \le x^2\, \frac{\sqrt{\pi}}{2}\, e^{-\sqrt{2x}\,z_+} \left(\frac{2\sqrt{2}}{x^{3/2}} + \frac{8}{x}\, z_+\right) = \frac{\sqrt{\pi}}{z_+}\, e^{-\sqrt{2x}\,z_+} \left(\sqrt{2x}\,z_+ + 4x\,z_+^2\right) \le \frac{e^{-1} + 8e^{-2}}{\sqrt{k\, g_n}}$$

and

$$|J_{-1}^*(x)| \le \frac{\sqrt{3/2}}{2\pi^{3/2}} \sum_{k=1}^{\infty} \frac{e^{-1} + 8e^{-2}}{k^{3/2}\, \sqrt{g_n}} \le \frac{c(3/2,4)}{\sqrt{g_n}}.$$

Lemma A3 is proved.
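The differentiated integral formula used in this proof can be sanity-checked numerically (our illustration; the parameter values *p*, *q*, *b* are arbitrary admissible choices):

```python
import math

def lhs(p: float, q: float, b: float, upper: float = 60.0, n: int = 300_000) -> float:
    """Midpoint rule for \\int_0^upper y^{-5/2} e^{-p y - q/y} sin(b y) dy."""
    h = upper / n
    return h * math.fsum(
        ((i + 0.5) * h) ** -2.5
        * math.exp(-p * (i + 0.5) * h - q / ((i + 0.5) * h))
        * math.sin(b * (i + 0.5) * h)
        for i in range(n)
    )

def rhs(p: float, q: float, b: float) -> float:
    """Closed form obtained by differentiating (A6) with respect to q."""
    zp = math.sqrt((math.hypot(p, b) + p) / 2.0)
    zm = math.sqrt((math.hypot(p, b) - p) / 2.0)
    s, c = math.sin(2.0 * math.sqrt(q) * zm), math.cos(2.0 * math.sqrt(q) * zm)
    return (math.sqrt(math.pi) / 2.0) * math.exp(-2.0 * math.sqrt(q) * zp) * (
        q ** -1.5 * s + 2.0 * zp * s / q - 2.0 * zm * c / q
    )

print(lhs(1.5, 1.0, 2.0), rhs(1.5, 1.0, 2.0))  # should agree closely
```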

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

**Yoon-Tae Kim and Hyun-Suk Park \***

Division of Data Science and Data Science Convergence Research Center, Hallym University, Chuncheon 24252, Republic of Korea; ytkim@hallym.ac.kr

**\*** Correspondence: hspark@hallym.ac.kr; Tel.: +82-33-248-2036

**Abstract:** The *Kolmogorov* and *total variation* distances between the laws of random variables have upper bounds represented by the *L*<sup>1</sup>-norm of densities when the random variables have densities. In this paper, we derive an upper bound, in terms of densities, analogous to the Kolmogorov and total variation cases, for several probabilistic distances (e.g., Kolmogorov distance, total variation distance, *Wasserstein* distance, *Fortet–Mourier* distance, etc.) between the laws of *F* and *G*, where *F* is a random variable following an invariant measure that admits a density, and *G* is a random variable that is differentiable in the sense of Malliavin calculus and also admits a density function.

**Keywords:** Malliavin calculus; invariant measure; density function; Stein's bound; fourth moment theorem; probabilistic distance; Scheffé's theorem

**MSC:** 60H07; 60F17; 60F25

#### **1. Introduction**

Let *B* = {*B*(*h*), *h* ∈ H}, where H is a real separable Hilbert space, be an isonormal Gaussian process defined on some probability space (Ω, F, P) (see Definition 1). The authors in [1] discovered a celebrated central limit theorem, called the "*fourth moment theorem*", for a sequence of random variables belonging to a fixed Wiener chaos associated with *B* (see Section 2 for the definition of Wiener chaos).

**Theorem 1** (Fourth moment theorem)**.** *Let* {*F<sub>n</sub>*, *n* ≥ 1} *be a sequence of random variables belonging to the q*(≥ 2)*th Wiener chaos with* E[*F<sub>n</sub>*<sup>2</sup>] = 1 *for all n* ≥ 1*. Then F<sub>n</sub>* <sup>L</sup>−→ *Z if and only if* E[*F<sub>n</sub>*<sup>4</sup>] → 3 = E[*Z*<sup>4</sup>]*, where Z is a standard Gaussian random variable and the notation* <sup>L</sup>−→ *denotes convergence in distribution.*

After that, the authors in [2] obtained a quantitative bound on the distances between the laws of *F* and *Z* by developing techniques based on a combination of Malliavin calculus (see, e.g., [3–7]) and Stein's method for normal approximation (see, e.g., [8–10]). These distances can be defined in several ways. More precisely, the distance between the laws of *F* and *Z* satisfies

$$d(F, Z) \le C_d \sqrt{\mathbb{E}[(1 - \langle DF, -DL^{-1}F \rangle_{\mathfrak{H}})^2]},\tag{1}$$

where *D* and *L*<sup>−1</sup> denote the Malliavin derivative and the pseudo-inverse of the Ornstein–Uhlenbeck generator, respectively (see Definitions 2 and 5), and the constant *C<sub>d</sub>* in (1) only depends on the distance *d* considered. In the particular case where *F* is an element in

**Citation:** Kim, Y.-T.; Park, H.-S. Bound for an Approximation of Invariant Density of Diffusions via Density Formula in Malliavin Calculus. *Mathematics* **2023**, *11*, 2302. https://doi.org/10.3390/math11102302

Academic Editor: Manuel Alberto M. Ferreira

Received: 17 March 2023 Revised: 29 April 2023 Accepted: 2 May 2023 Published: 15 May 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

the *q*th Wiener chaos of *B* with E[*F*<sup>2</sup>] = 1, the upper bound (1) for the Kolmogorov distance (*C<sub>d</sub>* = 1) is given by

$$d_{Kol}(F, Z) \le \sqrt{\frac{q - 1}{3q} (\mathbb{E}[F^4] - 3)}, \tag{2}$$

where E[*F*<sup>4</sup>] − 3 is the fourth cumulant of *F*.
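To make bound (2) concrete, consider (our illustration, not an example from the paper) *F<sub>n</sub>* = ∑<sub>*i*=1</sub><sup>*n*</sup> (*X<sub>i</sub>*<sup>2</sup> − 1)/√(2*n*) with i.i.d. standard Gaussians *X<sub>i</sub>*, an element of the second Wiener chaos with E[*F<sub>n</sub>*<sup>2</sup>] = 1, whose fourth cumulant is available exactly:

```python
import math

# Our illustration: F_n = sum_{i=1}^n (X_i^2 - 1) / sqrt(2n), X_i i.i.d. N(0,1),
# lies in the 2nd Wiener chaos (q = 2) and has E[F_n^2] = 1.
def fourth_cumulant(n: int) -> float:
    # Cumulants add over independent summands and are homogeneous of degree 4:
    # kappa_4(X^2 - 1) = 48, so kappa_4(F_n) = n * 48 / (2n)^2 = 12/n.
    return n * 48.0 / (2.0 * n) ** 2

def kolmogorov_bound(n: int, q: int = 2) -> float:
    # Right-hand side of (2): sqrt((q - 1)/(3q) * (E[F^4] - 3)).
    return math.sqrt((q - 1) / (3.0 * q) * fourth_cumulant(n))

for n in (10, 100, 1000):
    print(n, kolmogorov_bound(n))  # decreases like sqrt(2/n)
```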

The application of Stein's method combined with Malliavin calculus has been extended from the normal distribution to the cases of Gamma and Pearson distributions (see, e.g., [11,12]). Furthermore, the authors in [13] extended the upper bound (1) to a more general class of probability distributions. For a differentiable random variable in the sense of Malliavin calculus, they obtained an upper bound on the distance between its law and the law of a random variable with a density that is continuous, bounded, and strictly positive on the interval (*l*, *u*) (−∞ ≤ *l* < *u* ≤ ∞) and has finite variance. Their approach is based on the construction of an ergodic diffusion that has the density *p* as an invariant measure. The diffusion with the invariant density *p* has the form

$$dX_t = b(X_t)\,dt + \sqrt{a(X_t)}\,dW_t,\tag{3}$$

where *W* is a standard Brownian motion. Then, they consider the generator of the diffusion process *X* and use integration by parts (see Definition 3 for the integration by parts formula) to find an upper bound for the distance between the law of a differentiable random variable *G* and the law of a random variable *F* with density *p<sub>F</sub>*. This bound contains *D* and *L*<sup>−1</sup>, as in the bound (1). Precisely, for a suitable class of functions F,

$$\begin{split} \sup_{f \in \mathcal{F}} &\left|\mathbb{E}[f(G) - f(F)]\right| \\ \le\ & C\,\mathbb{E}\left[\left|\frac{1}{2}a(G) + \mathbb{E}\left[\langle -DL^{-1}(b(G) - \mathbb{E}[b(G)]), DG\rangle_{\mathfrak{H}}\,\Big|\,G\right]\right|\right] + C\,|\mathbb{E}[b(G)]|. \end{split} \tag{4}$$

If a random variable *G* admits a density with respect to the Lebesgue measure, the Kolmogorov (i.e., F = {**1**<sub>(*l*,*z*)</sub>; *z* ∈ (*l*, *u*)}) and *total variation* (F = {**1**<sub>*B*</sub>; *B* ∈ B(R)}) distances can be bounded by

$$\sup\_{f \in \mathcal{F}} |\mathbb{E}[f(G) - f(F)]| \le \int\_{-\infty}^{\infty} |p\_G(\mathbf{x}) - p\_F(\mathbf{x})| d\mathbf{x}.\tag{5}$$
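A minimal numerical illustration of (5) (ours; the choice of two centered Gaussian laws is arbitrary) compares the Kolmogorov distance with the *L*<sup>1</sup> distance of the densities:

```python
import math

def density(x: float, sigma: float) -> float:
    """Centered Gaussian density with standard deviation sigma."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def cdf(x: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

sigma_f, sigma_g = 1.0, 1.3
lo, hi, n = -12.0, 12.0, 100_000
h = (hi - lo) / n
# L1 distance of the densities (midpoint rule).
l1 = h * math.fsum(abs(density(lo + (i + 0.5) * h, sigma_f) - density(lo + (i + 0.5) * h, sigma_g))
                   for i in range(n))
# Kolmogorov distance via a grid search over the same interval.
d_kol = max(abs(cdf(lo + i * h, sigma_f) - cdf(lo + i * h, sigma_g)) for i in range(n + 1))
print(d_kol, l1)  # bound (5): the Kolmogorov distance is dominated by the L1 distance
```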

We note that, by Scheffé's theorem, pointwise convergence of densities is stronger than convergence in distribution. In this paper, we assume that the law of *G* admits a density with respect to the Lebesgue measure. This assumption on *G* is satisfied for all distributions considered throughout the examples in the paper [13]. Using the bound (4) and the diffusion coefficient in (3) given by

$$a(x) = \frac{-2\int_l^x (y-m)\,p_F(y)\,dy}{p_F(x)},$$

we derive a bound for the general distances on the left-hand side of (4), expressed in terms of the density functions of the two random variables *F* and *G*, as in the case of the Kolmogorov and total variation distances. In addition, we deal with the computation of the conditional expectation in (4). When *G* is general, it is difficult to compute this expectation explicitly. The random variables in all examples covered in [13] are just functions of a Gaussian vector; in this case, the expectation can be computed explicitly. If the law of these random variables admits a density with respect to the Lebesgue measure, as in all examples considered in [13], we can find a formula from which this expectation is easily computed.

The rest of the paper is organized as follows. Section 2 reviews some basic notations and the results of Malliavin calculus. In Section 3, we describe the construction of a diffusion process with an invariant density *p* and derive an upper bound between the laws of *F* and *G* in terms of densities. In Section 4, we introduce a method that can directly compute the conditional expectation in (4). Finally, as an application of our main results, in Section 5, we obtain an upper bound of an example considered in [13]. Throughout this paper, *c* (or *C*) stands for an absolute constant with possibly different values in different places.

#### **2. Preliminaries**

In this section, we briefly review some basic facts about Malliavin calculus for Gaussian processes. For a more detailed explanation, see [6,7]. Fix a real separable Hilbert space H, with inner product ⟨·, ·⟩<sub>H</sub>.

**Definition 1.** *We say that a stochastic process B* = {*B*(*h*), *h* ∈ H} *defined on* (Ω, F, *P*) *is an isonormal Gaussian process if B is a centered Gaussian family of random variables such that* E[*B*(*g*)*B*(*h*)] = ⟨*g*, *h*⟩<sub>H</sub> *for every g*, *h* ∈ H*.*

For the rest of this paper, we assume that F is the *σ*-field generated by *B*. To simplify the notation, we write *L*<sup>2</sup>(Ω) instead of *L*<sup>2</sup>(Ω, F, *P*). For each *q* ≥ 1, we write H<sub>*q*</sub> to denote the closed linear subspace of *L*<sup>2</sup>(Ω) generated by the random variables *H<sub>q</sub>*(*B*(*h*)), *h* ∈ H, ‖*h*‖<sub>H</sub> = 1, where *H<sub>q</sub>* is the *q*th Hermite polynomial. The space H<sub>*q*</sub> is called the *q*th Wiener chaos of *B*. Let S denote the class of smooth and cylindrical random variables *F* of the form

$$F = f(B(\varphi_1), \ldots, B(\varphi_m)), \ m \ge 1,\tag{6}$$

where *f* : R*<sup>m</sup>* → R is a C∞-function such that its partial derivatives have at most polynomial growth, and *ϕ<sup>i</sup>* ∈ H, *i* = 1, ··· , *m*. Then, the space S is dense in *Lq*(Ω) for every *q* ≥ 1.
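The Hermite polynomials behind the Wiener chaoses can be generated by the three-term recurrence *H*<sub>*q*+1</sub>(*x*) = *xH<sub>q</sub>*(*x*) − *qH*<sub>*q*−1</sub>(*x*); a small stdlib-only sketch (our illustration; the quadrature grid and cutoff are arbitrary choices) checks the orthogonality E[*H<sub>q</sub>*(*X*)*H<sub>r</sub>*(*X*)] = *q*! δ<sub>*qr*</sub> for *X* ~ N(0, 1), which underlies the orthogonality of the chaoses in *L*<sup>2</sup>(Ω):

```python
import math

def hermite(q: int, x: float) -> float:
    """Probabilists' Hermite polynomial H_q via H_{q+1} = x H_q - q H_{q-1}."""
    h0, h1 = 1.0, x
    if q == 0:
        return h0
    for k in range(1, q):
        h0, h1 = h1, x * h1 - k * h0
    return h1

def inner(q: int, r: int, n: int = 100_000, lim: float = 10.0) -> float:
    """E[H_q(X) H_r(X)] for X ~ N(0,1), by the midpoint rule on [-lim, lim]."""
    h = 2.0 * lim / n
    return h * math.fsum(
        hermite(q, x) * hermite(r, x) * math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        for i in range(n)
        for x in [-lim + (i + 0.5) * h]
    )

print(inner(3, 3), math.factorial(3))  # ~ 3! = 6
print(inner(3, 2))                     # ~ 0 (different chaoses are orthogonal)
```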

**Definition 2.** *For a given integer p* ≥ 1 *and F* ∈ S*, the pth Malliavin derivative of F with respect to B is the element of L*<sup>2</sup>(Ω; H<sup>⊙*p*</sup>)*, where* H<sup>⊙*p*</sup> *denotes the pth symmetric tensor power of* H*, defined by*

$$D^p F = \sum_{i_1,\dots,i_p=1}^m \frac{\partial^p f}{\partial x_{i_1} \cdots \partial x_{i_p}} (B(\varphi_1),\dots,B(\varphi_m))\, \varphi_{i_1} \otimes \cdots \otimes \varphi_{i_p}.\tag{7}$$

For a fixed *p* ∈ [1, ∞) and an integer *k* ≥ 1, we denote by D<sup>*k*,*p*</sup> the closure of the class S of smooth random variables with respect to the norm

$$||F||\_{k,p}^p = \mathbb{E}[|F|^p] + \sum\_{\ell=1}^k \mathbb{E}[||D^\ell F||\_{\mathfrak{H}^{\otimes \ell}}^p].$$

For a given integer *p* ≥ 1, we denote by *δ<sup>p</sup>* : *L*<sup>2</sup>(Ω; H<sup>⊗*p*</sup>) → *L*<sup>2</sup>(Ω) the adjoint of the operator *D<sup>p</sup>* : D<sup>*p*,2</sup> → *L*<sup>2</sup>(Ω; H<sup>⊙*p*</sup>), called the *multiple divergence operator* of order *p*. The domain of *δ<sup>p</sup>*, denoted by Dom(*δ<sup>p</sup>*), is the subset of *L*<sup>2</sup>(Ω; H<sup>⊗*p*</sup>) composed of those elements *u* such that

$$|\mathbb{E}[\langle D^p F, u\rangle_{\mathfrak{H}^{\otimes p}}]| \le C\,(\mathbb{E}[|F|^2])^{1/2} \quad \text{for all } F \in \mathbb{D}^{p,2}.$$

**Definition 3.** *If u* ∈ Dom(*δ<sup>p</sup>*)*, then δ<sup>p</sup>*(*u*) *is the element of L*<sup>2</sup>(Ω) *defined by the duality relationship*

$$\mathbb{E}[F\delta^p(u)] = \mathbb{E}[\langle D^p F, u \rangle_{\mathfrak{H}^{\otimes p}}] \text{ for every } F \in \mathbb{D}^{p, 2}. \tag{8}$$

The above Formula (8) is called the integration by parts formula. For a given integer *q* ≥ 1 and *f* ∈ H<sup>⊙*q*</sup>, the *q*th multiple integral of *f* is defined by *I<sub>q</sub>*(*f*) = *δ<sup>q</sup>*(*f*). Let *h* ∈ H with ‖*h*‖<sub>H</sub> = 1. Then, for any integer *q* ≥ 1, we have *I<sub>q</sub>*(*h*<sup>⊗*q*</sup>) = *q*!*H<sub>q</sub>*(*B*(*h*)). From this, the linear mapping *I<sub>q</sub>* : H<sup>⊙*q*</sup> → H<sub>*q*</sub> has an isometry property. It is well known that any square integrable random variable *F* ∈ *L*<sup>2</sup>(Ω) can be expanded into a series of multiple integrals:

$$F = \mathbb{E}[F] + \sum_{q=1}^{\infty} I_q(f_q),$$

where the series converges in *L*<sup>2</sup>, and the functions *f<sub>q</sub>* ∈ H<sup>⊙*q*</sup>, *q* ≥ 1, are uniquely determined by *F*. Moreover, if *F* ∈ D<sup>*m*,2</sup>, then *f<sub>q</sub>* = (1/*q*!) E[*D<sup>q</sup>F*] for all *q* ≤ *m*.

**Definition 4.** *For a given F* ∈ *L*2(Ω)*, we say that F belongs to Dom*(*L*) *if*

$$\sum\_{q=1}^{\infty} q^2 \mathbb{E}[J\_q(F)^2] < \infty,$$

*where J<sub>q</sub> is the projection operator from L*<sup>2</sup>(Ω) *onto* H<sub>*q*</sub>*, that is, J<sub>q</sub>*(*F*) = *Proj*(*F*|H<sub>*q*</sub>)*, q* = 0, 1, 2, . . .*. For such an F, the operator L is defined through the projection operators J<sub>q</sub>, q* = 0, 1, 2, . . .*, as LF* = −∑<sub>*q*≥1</sub> *q J<sub>q</sub>F.*

It is not difficult to see that the operator *L* coincides with the infinitesimal generator of the Ornstein–Uhlenbeck semigroup {*P<sub>t</sub>*, *t* ≥ 0}. The following gives a crucial relationship between the operators *D*, *δ*, and *L*: Let *F* ∈ *L*<sup>2</sup>(Ω). Then, we have *F* ∈ Dom(*L*) if and only if *F* ∈ D<sup>1,2</sup> and *DF* ∈ Dom(*δ*). In this case, *δ*(*DF*) = −*LF*; that is, for *F* ∈ *L*<sup>2</sup>(Ω), the statement *F* ∈ Dom(*L*) is equivalent to *F* ∈ Dom(*δD*).

**Definition 5.** *For any F* ∈ *L*<sup>2</sup>(Ω)*, we define the operator L*<sup>−1</sup>*, called the pseudo-inverse of L, as L*<sup>−1</sup>*F* = ∑<sub>*q*≥1</sub> (1/*q*) *J<sub>q</sub>*(*F*)*.*

Note that *L*<sup>−1</sup> is an operator with values in D<sup>2,2</sup> and *LL*<sup>−1</sup>*F* = *F* − E[*F*] for all *F* ∈ *L*<sup>2</sup>(Ω).

#### **3. Diffusion Process with Invariant Measures**

In this section, we explain how a diffusion process is constructed to have an invariant measure *μ* that admits a density function, say *p*, with respect to the Lebesgue measure (see [13,14] for more information). Let *μ* be a probability measure on *I* = (*l*, *u*) (−∞ ≤ *l* < *u* ≤ ∞) with a continuous, bounded, and strictly positive density function *p*. We take a continuous function *b* : *I* → R such that there exists *e* ∈ (*l*, *u*) for which *b*(*x*) > 0 for *x* ∈ (*l*, *e*) and *b*(*x*) < 0 for *x* ∈ (*e*, *u*). Moreover, the function *bp* is bounded on *I* and

$$\int_l^u b(x)\, p(x)\, dx = 0. \tag{9}$$

For *x* ∈ *I*, let us set

$$a(x) = \frac{2\int_l^x b(y)\, p(y)\, dy}{p(x)}.\tag{10}$$

Then, the stochastic differential equation (sde)

$$dX\_t = b(X\_t)dt + \sqrt{a(X\_t)}dB\_t\tag{11}$$

has a unique ergodic Markovian weak solution with the invariant measure *μ*.
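As a sketch of the construction (ours, not an example from this section; the uniform target law and the drift *b*(*x*) = 1/2 − *x* are assumed inputs), the coefficient *a*(*x*) from (10) can be computed numerically and compared with its closed form *x*(1 − *x*):

```python
import math

# Our illustration: target density p = 1 on I = (0, 1) (uniform law) with drift
# b(x) = 1/2 - x, so that b > 0 on (0, 1/2), b < 0 on (1/2, 1) and
# \int_0^1 b(x) p(x) dx = 0, as required.
def a(x: float, n: int = 10_000) -> float:
    """Diffusion coefficient (10): a(x) = 2 \\int_0^x b(y) p(y) dy / p(x)."""
    h = x / n
    integral = h * math.fsum(0.5 - (i + 0.5) * h for i in range(n))  # midpoint rule
    return 2.0 * integral  # p(x) = 1 on (0, 1)

for x in (0.25, 0.5, 0.75):
    print(x, a(x), x * (1.0 - x))  # closed form: a(x) = x(1 - x)
```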

The authors prove in [15] that the convergence of elements of a Markov chaos to a Pearson distribution can still be bounded using just the first four moments, by means of the new concept of a *chaos grade*. Pearson diffusions are examples of the Markov triple and Itô diffusion given by the sde

$$dX_t = -(X_t - m)\,dt + \sqrt{a(X_t)}\,dB_t, \tag{12}$$

where *m* is the expectation of *μ*, and

$$a(x) = \frac{-2\int_{l}^{x} (y - m)\,p(y)\,dy}{p(x)} \text{ for } x \in (l, u). \tag{13}$$

Let us define

$$\tilde{h}_f(y) = \frac{2\int_l^y (f(u) - \mathbb{E}[f(F)])\, p(u)\, du}{a(y)\, p(y)},$$

where *F* is a random variable with law *μ*. For *f* ∈ C<sub>0</sub>(*I*), where C<sub>0</sub>(*I*) = { *f* : *I* → R | *f* is continuous on *I* and vanishes at the boundary of *I*}, we define

$$h_f(x) = \int_0^x \tilde{h}_f(y)\, dy.$$

Then, *h<sub>f</sub>* satisfies

$$f(x) - \mathbb{E}[f(F)] = b(x)\,h_f'(x) + \frac{1}{2}a(x)\,h_f''(x).$$

In [13], the authors derive a Stein bound between the probability measure *μ* and the law of an arbitrary random variable *G*. This bound extends the results in [2,12], which correspond to the cases where *μ* is a standard Gaussian and a Gamma distribution, respectively.

**Theorem 2** (Kusuoka and Tudor (2012) [13])**.** *Let F be a random variable having the target law μ associated to the diffusion given by sde (11). Let G be an I-valued random variable in* D<sup>1,2</sup> *with b*(*G*) ∈ *L*<sup>2</sup>(Ω)*. Then, for every f* : *I* → R *such that* $\tilde{h}_f$ *and* $\tilde{h}_f'$ *are bounded, the following holds:*

$$\begin{split} & \left| \mathbb{E}\left[ f(G) - f(F) \right] \right| \\ \le\ & \|\tilde{h}_f'\|_\infty\, \mathbb{E}\left[\left|\frac{1}{2}a(G) + \langle -DL^{-1}(b(G) - \mathbb{E}[b(G)]), DG\rangle_{\mathfrak{H}}\right|\right] + \|\tilde{h}_f\|_\infty\, |\mathbb{E}[b(G)]|. \end{split} \tag{14}$$

*and*

$$\begin{split} & \left| \mathbb{E}\left[ f(G) - f(F) \right] \right| \\ \le\ & \|\tilde{h}_f'\|_\infty\, \mathbb{E}\left[\left|\mathbb{E}\left[\frac{1}{2}a(G) + \langle -DL^{-1}(b(G) - \mathbb{E}[b(G)]), DG\rangle_{\mathfrak{H}}\,\Big|\, G\right]\right|\right] + \|\tilde{h}_f\|_\infty\, |\mathbb{E}[b(G)]|. \end{split} \tag{15}$$

When the laws of *F* and *G* admit densities *p<sub>F</sub>* and *p<sub>G</sub>* (with respect to the Lebesgue measure), respectively, we derive an upper bound of the form (14) in terms of the densities of *F* and *G* by using Theorem 2.

**Theorem 3.** *Let F be a random variable having the law μ with the density p<sub>F</sub> associated to the diffusion given by sde (11). Let G be a random variable in* D<sup>1,2</sup> *with b*(*G*) ∈ *L*<sup>2</sup>(Ω)*. Suppose that the law of G has the density p<sub>G</sub> with respect to the Lebesgue measure. Then, for every f* : *I* → R *such that* $\tilde{h}_f$ *and* $\tilde{h}_f'$ *are bounded, we find that*

$$\begin{split} & \left| \mathbb{E}\left[ f(G) - f(F) \right] \right| \\ \le\ & \|\tilde{h}_f'\|_\infty\, \mathbb{E}\left[\left| \int_G^\infty b(y) \left( \frac{p_F(y)}{p_F(G)} - \frac{p_G(y)}{p_G(G)} \right) dy \right|\right] \\ & + \left( \|\tilde{h}_f'\|_\infty\, \mathbb{E}\left[ \frac{\int_G^\infty p_G(y)\, dy}{p_G(G)} \right] + \|\tilde{h}_f\|_\infty \right) |\mathbb{E}[b(G)]|. \end{split} \tag{16}$$

**Proof.** Let *ϕ* : R → R be a *C*<sup>1</sup>-function with a bounded derivative *ϕ*′ and compact support. Using integration by parts yields

$$\begin{split} \mathbb{E}\Big[\phi'(G)\,\mathbb{E}\big[\big\langle -DL^{-1}(b(G)-\mathbb{E}[b(G)]),\, DG\big\rangle_{\mathfrak{H}}\,\big|\,G\big]\Big] &= \mathbb{E}\big[\big\langle -DL^{-1}(b(G)-\mathbb{E}[b(G)]),\, D\phi(G)\big\rangle_{\mathfrak{H}}\big] \\ &= \mathbb{E}\big[\phi(G)\,(b(G)-\mathbb{E}[b(G)])\big] \\ &= -\int_{-\infty}^{\infty} \phi(x)\, \frac{d}{dx}\left( \int_x^{\infty} (b(y)-\mathbb{E}[b(G)])\,p_G(y)\,dy \right) dx \\ &= \Big[-\phi(x) \int_x^{\infty} (b(y)-\mathbb{E}[b(G)])\,p_G(y)\,dy \Big]_{-\infty}^{\infty} \\ &\quad + \int_{-\infty}^{\infty} \phi'(x) \int_x^{\infty} (b(y)-\mathbb{E}[b(G)])\,p_G(y)\,dy\, dx \\ &= \mathbb{E}\left[\phi'(G)\, \frac{\int_G^{\infty} (b(y)-\mathbb{E}[b(G)])\,p_G(y)\,dy}{p_G(G)} \right]. \end{split} \tag{17}$$

The above equality (17) obviously shows that

$$\mathbb{E}\left[\langle -DL^{-1}(b(G)-\mathbb{E}[b(G)]), DG\rangle_{\mathfrak{H}}\,\Big|\,G\right] = \frac{\int_G^\infty (b(y)-\mathbb{E}[b(G)])\,p_G(y)\,dy}{p_G(G)}. \tag{18}$$

Using the relations (10) and (18), the first expectation on the right-hand side of (15) can be written as

$$\begin{split} &\mathbb{E}\left[\left|\frac{1}{2}a(G) + \mathbb{E}\left[\langle -DL^{-1}(b(G) - \mathbb{E}[b(G)]), DG \rangle_{\mathfrak{H}} \,\Big|\, G \right] \right|\right] \\ =\ & \mathbb{E}\left[\left|\frac{\int_{-\infty}^{G} b(y)\, p_F(y)\, dy}{p_F(G)} + \frac{\int_{G}^{\infty} (b(y) - \mathbb{E}[b(G)])\, p_G(y)\, dy}{p_G(G)} \right| \right]. \end{split} \tag{19}$$

Since

$$\frac{\int_l^u b(y)\, p_F(y)\, dy}{p_F(G)} = 0,$$

we have that

$$\frac{\int\_{-\infty}^{G} b(y) p\_F(y) dy}{p\_F(G)} = -\frac{\int\_{G}^{\infty} b(y) p\_F(y) dy}{p\_F(G)}.$$

This implies that (19) can be written as

$$\begin{split} & \mathbb{E}\left[ \left| \mathbb{E} \left[ \frac{1}{2} a(G) + \langle -DL^{-1}(b(G) - \mathbb{E}[b(G)]), DG \rangle\_{\mathcal{H}} \Big| G \right] \right| \right] \\ & \leq \quad \mathbb{E} \left[ \left| \frac{\int\_{G}^{\infty} b(y) p\_{F}(y) dy}{p\_{F}(G)} - \frac{\int\_{G}^{\infty} b(y) p\_{G}(y) dy}{p\_{G}(G)} \right| \right] \\ & \quad + \left| \mathbb{E}[b(G)] \right| \mathbb{E} \left[ \frac{\int\_{G}^{\infty} p\_{G}(y) dy}{p\_{G}(G)} \right]. \end{split} \tag{20}$$

Combining (15) and (20) completes the proof of this theorem.

**Remark 1.** *In Theorem 2 of [13], the authors prove that if a random variable G* ∈ D1,2 *has the invariant measure μ, then* E[*b*(*G*)] = 0 *and*

$$\mathbb{E}\left[\frac{1}{2}a(\mathcal{G}) + \left\langle -DL^{-1}b(\mathcal{G}), DG \right\rangle\_{\mathfrak{H}} \Big| \mathcal{G} \right] = 0. \tag{21}$$

*Furthermore, if μ admits the density pF, it is obvious from (19) that (21) holds.*

**Remark 2.** *We think it would be interesting to give numerical examples illustrating the computational validity of Theorem 3. In this respect, although not a numerical example, we give a simple example deducing an upper bound for the distance between the laws of two centered Gaussian random variables.*

**Proposition 1.** *Let F and G be two centered Gaussian random variables with variances σ<sub>F</sub>*<sup>2</sup> > 0 *and σ<sub>G</sub>*<sup>2</sup> > 0*. Then,*

$$d_{\mathcal{F}}(F,G) \le \sup_{f \in \mathcal{F}} \|\tilde{h}_f'\|_{\infty} \left|\sigma_F^2 - \sigma_G^2\right|, \tag{22}$$

*where* F *is the class of functions to be chosen depending on the type of the distance d.*

**Proof.** Obviously, the random variable *F* has the law *μ* with the density

$$p_F(x) = \frac{1}{\sqrt{2\pi}\,\sigma_F} \exp\left(-\frac{x^2}{2\sigma_F^2}\right),$$

associated to the diffusion given by the sde with *b*(*x*) = −*x* and *a*(*x*) = 2*σ<sub>F</sub>*<sup>2</sup>. Since E[*b*(*G*)] = 0, the second summand in (16) vanishes. Hence, from Theorem 3, it follows that

$$\begin{split} & \left| \mathbb{E}\left[ f(G) - f(F) \right] \right| \\ \le\ & \|\tilde{h}_f'\|_{\infty}\, \mathbb{E}\left[\left| e^{\frac{G^2}{2\sigma_F^2}} \int_G^{\infty} y\, e^{-\frac{y^2}{2\sigma_F^2}}\, dy - e^{\frac{G^2}{2\sigma_G^2}} \int_G^{\infty} y\, e^{-\frac{y^2}{2\sigma_G^2}}\, dy \right|\right] \\ =\ & \|\tilde{h}_f'\|_{\infty}\, \mathbb{E}\left[\left| \sigma_F^2\, e^{\frac{G^2}{2\sigma_F^2}} \int_{\frac{G^2}{2\sigma_F^2}}^{\infty} e^{-u}\, du - \sigma_G^2\, e^{\frac{G^2}{2\sigma_G^2}} \int_{\frac{G^2}{2\sigma_G^2}}^{\infty} e^{-u}\, du \right|\right] \\ =\ & \|\tilde{h}_f'\|_{\infty}\, |\sigma_F^2 - \sigma_G^2|. \end{split} \tag{23}$$

Since the distance *d*<sup>F</sup> (*F*, *G*) between two distributions *F* and *G* is given by

$$d_{\mathcal{F}}(F,G) = \sup_{f \in \mathcal{F}} \left| \mathbb{E}\left[ f(G) - f(F) \right] \right|,$$

the proof of this proposition is completed.
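The key Gaussian integral identity behind the last two equalities in (23), *e*<sup>*g*<sup>2</sup>/(2*σ*<sup>2</sup>)</sup> ∫<sub>*g*</sub><sup>∞</sup> *y e*<sup>−*y*<sup>2</sup>/(2*σ*<sup>2</sup>)</sup> *dy* = *σ*<sup>2</sup>, can be checked numerically (our sanity check; *g* and *σ* are arbitrary):

```python
import math

def tail_integral(g: float, sigma: float, upper: float = 60.0, n: int = 200_000) -> float:
    """Midpoint rule for \\int_g^upper y e^{-y^2/(2 sigma^2)} dy."""
    h = (upper - g) / n
    return h * math.fsum(
        (g + (i + 0.5) * h) * math.exp(-((g + (i + 0.5) * h) ** 2) / (2.0 * sigma * sigma))
        for i in range(n)
    )

g, sigma = 0.7, 1.3
lhs = math.exp(g * g / (2.0 * sigma * sigma)) * tail_integral(g, sigma)
print(lhs, sigma * sigma)  # both equal sigma^2
```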

Depending on the choice of F, several types of distances can be defined (see Section 5.2). It is interesting to compare the upper bound in Proposition 3.6.1 of [6], obtained from an elementary application of Stein's method, with the upper bound in (22); this comparison shows how our study differs from the existing ones.

#### **4. Computation of E[⟨−DL<sup>−1</sup>(b(G) − E[b(G)]), DG⟩<sub>H</sub> | G]**

When *G* is general, it is difficult to find an explicit computation of the right-hand side of (15). In particular, when ⟨−*DL*<sup>−1</sup>(*b*(*G*) − E[*b*(*G*)]), *DG*⟩<sub>H</sub> is not measurable with respect to the *σ*-field generated by *G*, there are cases where it is impossible to compute the expectation. The next proposition, from [4], gives an explicit expression.

**Proposition 2.** *Let DG* = Ψ<sub>*G*</sub>(*B*)*, where B is an isonormal Gaussian process and* Ψ<sub>*G*</sub> : R<sup>H</sup> → H *is a measurable function, uniquely defined a.e. Then, we have*

$$\begin{aligned} & \left\langle -DL^{-1}(G - \mathbb{E}[G]), DG \right\rangle\_{\mathfrak{H}} \\ &= \int\_0^\infty e^{-t} \langle \Psi\_G(B), \mathbb{E}' \left[ \Psi\_G(e^{-t}B + \sqrt{1 - e^{-2t}}B') \right] \rangle\_{\mathfrak{H}} dt, \end{aligned} \tag{24}$$

*so that*

$$\begin{split} &\mathbb{E}\left[\left\langle -DL^{-1}(G-\mathbb{E}[G]), DG \right\rangle\_{\mathfrak{H}} |G\right] \\ &= \int\_{0}^{\infty} e^{-t} \mathbb{E}\left[ \langle \Psi\_{G}(B), \Psi\_{G}(e^{-t}B + \sqrt{1-e^{-2t}}B') \rangle\_{\mathfrak{H}} |G\right] dt. \end{split} \tag{25}$$

*Here, B and* $B'$ *are defined on the product space* $(\Omega \times \Omega', \mathcal{F}\otimes\mathcal{F}', P \otimes P')$ *such that* $B'$ *is an independent copy of B;* $\mathbb{E}$ *and* $\mathbb{E}'$ *denote the expectation with respect to* $P \otimes P'$ *and* $P'$*, respectively.*

Let $G = h(N) - \mathbb{E}[h(N)]$, where $h : \mathbb{R}^d \to \mathbb{R}$ is a $C^1$-function with bounded derivative and $N = (N\_1, \dots, N\_d)$ is a *d*-dimensional Gaussian random vector with zero mean and covariance $\langle h\_i, h\_j\rangle\_{\mathfrak{H}} = \mathbb{E}[N\_i N\_j] = C\_{i,j}$, $i, j = 1, \dots, d$, where $\{h\_i,\ i = 1, \dots, d\}$ stands for the canonical basis of $\mathfrak{H}$. By using Proposition 2, the following useful formula can be proved:

$$\left\langle -DL^{-1}(G - \mathbb{E}[G]), DG \right\rangle\_{\mathfrak{H}}$$

$$=\int\_{0}^{\infty} e^{-x} \mathbb{E}' \left[ \sum\_{i,j=1}^{d} \mathbb{C}\_{i,j} \frac{\partial h}{\partial \boldsymbol{x}\_{i}}(N) \frac{\partial h}{\partial \boldsymbol{x}\_{j}} \left( e^{-x}N + \sqrt{1 - e^{-2x}}N' \right) \right] d\boldsymbol{x}.\tag{26}$$
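As a sanity check on formula (26), the following numerical sketch treats a toy case of our own choosing, not taken from the paper: $d = 1$, $C\_{1,1} = 1$ and $h(x) = x^2$, so that $G = N^2 - 1$ lies in the second Wiener chaos, where $\langle -DL^{-1}G, DG\rangle\_{\mathfrak{H}} = 2N^2$ is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

def trapezoid(y, x):
    # simple trapezoidal rule (np.trapz was removed in NumPy 2.0)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Toy case (our choice, not from the paper): d = 1, C_{1,1} = 1 and
# h(x) = x^2, so dh/dx = 2x and G = h(N) - E[h(N)] = N^2 - 1.
def rhs_of_26(n_value, n_mc=100_000, n_t=400):
    """Monte Carlo evaluation of the right-hand side of (26), given N = n_value."""
    t = np.linspace(0.0, 20.0, n_t)        # truncate the integral over (0, infinity)
    n_prime = rng.standard_normal(n_mc)    # samples of the independent copy N'
    inner = np.array([
        np.mean(2.0 * n_value
                * 2.0 * (np.exp(-s) * n_value
                         + np.sqrt(1.0 - np.exp(-2.0 * s)) * n_prime))
        for s in t
    ])
    return trapezoid(np.exp(-t) * inner, t)

# For G = N^2 - 1 one has <-DL^{-1}G, DG>_H = 2 N^2, so the integral
# in (26) should reproduce 2 n^2 for each fixed value n of N.
for n_value in (0.5, 1.0, 2.0):
    assert abs(rhs_of_26(n_value) - 2.0 * n_value**2) < 0.1
print("formula (26) reproduces 2N^2 for h(x) = x^2")
```

The expectation $\mathbb{E}'$ over the independent copy kills the $N'$ term, so the integrand reduces to $4N^2e^{-2t}$, whose integral is $2N^2$.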

In order to show the significance of the bound (15), the authors in [13] consider several random variables *G*. Among these, we consider random variables with the uniform and the Laplace distribution. The random variable defined by

$$G = e^{-\frac{1}{2}\left(B(f)^2 + B(g)^2\right)},$$

where *B*(*f*) and *B*(*g*) are independent standard Gaussian random variables, has the uniform distribution U([0, 1]). The authors in [13] compute the right-hand side of (26) to prove that

$$\mathbb{E}[\left\langle -DL^{-1}(G - \mathbb{E}[G]), DG \right\rangle\_{\mathfrak{H}}|G] = G(1 - G). \tag{27}$$

Computing in this way is tedious and lengthy. To avoid it, we can instead use Equation (18) to prove that (27) holds. Since *G* has the uniform distribution U([0, 1]), we have

$$\begin{split} \mathbb{E}\left[ \left\langle -DL^{-1}(G-\frac{1}{2}), DG \right\rangle\_{\mathfrak{H}} \middle| G \right] &=& \frac{\int\_{G}^{\infty} (y-\frac{1}{2}) \mathbf{1}\_{[0,1]}(y) dy}{\mathbf{1}\_{[0,1]}(G)} \\ &=& G(1-G). \end{split} \tag{28}$$
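The claim that *G* has the uniform distribution can be checked with a quick Monte Carlo sketch; here $B(f)$ and $B(g)$ are simulated as independent standard normals, and the sample size is our choice:

```python
import numpy as np

rng = np.random.default_rng(1)

# B(f), B(g): independent standard Gaussian Wiener integrals
bf = rng.standard_normal(1_000_000)
bg = rng.standard_normal(1_000_000)

# G = exp(-(B(f)^2 + B(g)^2)/2); B(f)^2 + B(g)^2 ~ chi^2_2 = Exp(rate 1/2),
# and exp(-X/2) with X ~ Exp(1/2) is uniform on (0, 1)
g = np.exp(-0.5 * (bf**2 + bg**2))

assert abs(float(g.mean()) - 0.5) < 1e-2          # E[U] = 1/2
assert abs(float(g.var()) - 1.0 / 12.0) < 1e-2    # Var[U] = 1/12
# the empirical CDF at a few points matches the U(0, 1) CDF
for u in (0.1, 0.5, 0.9):
    assert abs(float(np.mean(g <= u)) - u) < 5e-3
print("G is (empirically) uniform on [0, 1]")
```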

In the case where *G* has a Laplace distribution, the authors in [13] consider two random variables:

$$G\_1 \quad = \frac{1}{2} \left( B(h\_1)^2 + B(h\_2)^2 - B(h\_3)^2 - B(h\_4)^2 \right),\tag{29}$$

$$G\_2 \quad = \quad B(h\_1)B(h\_2) + B(h\_3)B(h\_4), \tag{30}$$

where *hi*, *i* = 1, ... , 4, are orthonormal functions in *L*2([0, *T*]). It can be easily seen that *Gi*, *i* = 1, 2, has the Laplace distribution with parameter 1. In the paper [13], the authors prove, using Theorem 2 in [13], that for *i* = 1, 2,

$$\mathbb{E}\left[\left<-DL^{-1}G\_{i\prime}DG\_{i}\right>\_{\mathfrak{H}}\big|G\_{i}\right]=1+|G\_{i}|.\tag{31}$$

The authors argue that these identities are difficult to prove directly. Here, we introduce a method that proves the identities (31) directly by using the formula given in (18). Since $G\_i$, $i = 1, 2$, has a Laplace distribution with parameter 1 (and hence mean zero), we find that for $i = 1, 2$,

$$\mathbb{E}\left[\left<-DL^{-1}(G\_i-\frac{1}{2}),DG\_i\right>\_{\mathcal{H}}|G\_i\right]=\frac{\frac{1}{2}\int\_{G\_i}^{\infty}ye^{-|y|}dy}{\frac{1}{2}e^{-|G\_i|}}.\tag{32}$$

An elementary computation yields that, on the event $\{G\_i \ge 0\}$,

$$\frac{\frac{1}{2}\int\_{G\_i}^{\infty}ye^{-|y|}\,dy}{\frac{1}{2}e^{-|G\_i|}} = \frac{e^{-G\_i}(1+G\_i)}{e^{-G\_i}} = 1+G\_i, \tag{33}$$

and, on the event $\{G\_i < 0\}$,

$$\begin{split} \frac{\frac{1}{2}\int\_{G\_i}^{\infty}ye^{-|y|}\,dy}{\frac{1}{2}e^{-|G\_i|}} &= \frac{\frac{1}{2}\int\_{G\_i}^{0}ye^{y}\,dy + \frac{1}{2}\int\_0^{\infty}ye^{-y}\,dy}{\frac{1}{2}e^{G\_i}} \\ &= \frac{e^{G\_i}(1-G\_i)}{e^{G\_i}} = 1-G\_i. \end{split}\tag{34}$$

Combining (33) and (34) proves that the identity (31) holds.
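The identity (31) can also be confirmed numerically, by evaluating the ratio in (32) with quadrature at a few points; this is an illustrative check, with scipy assumed available:

```python
import numpy as np
from scipy.integrate import quad

def laplace_kernel(g):
    """Right-hand side of (32): (1/2)*int_g^inf y e^{-|y|} dy over (1/2) e^{-|g|}."""
    f = lambda y: y * np.exp(-abs(y))
    if g < 0:  # split at the kink of |y| for accurate quadrature
        num = quad(f, g, 0.0)[0] + quad(f, 0.0, np.inf)[0]
    else:
        num = quad(f, g, np.inf)[0]
    return num / np.exp(-abs(g))

# identity (31): the kernel equals 1 + |g| on both half-lines, cf. (33) and (34)
for g in (-3.0, -0.7, 0.0, 0.7, 3.0):
    assert abs(laplace_kernel(g) - (1.0 + abs(g))) < 1e-8
print("identity (31) verified numerically")
```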

#### **5. Example**

In this section, we illustrate the upper bound on probabilistic distances in Theorem 3 through an example considered in [13]. We denote the Wiener integral of $h \in L^2([0,T])$ by $W(h)$. Let $\{h\_i,\ i = 1, 2, \dots\}$ be an orthonormal basis of $L^2([0,T])$ and $\{G\_N,\ N = 1, 2, \dots\}$ a sequence of random variables defined by

$$G\_N = e^{-\frac{1}{\sqrt{2N}} \sum\_{i=1}^N \left( \mathcal{W}(h\_i)^2 - 1 \right)}.\tag{35}$$

Let *F* be a random variable having the lognormal distribution with parameters *m* = 0 and *σ*<sup>2</sup> = 1. Then, the density of *F* is given by

$$p\_F(x) = \frac{1}{\sqrt{2\pi}\,x} \exp\left(-\frac{1}{2} (\log x)^2\right) \mathbf{1}\_{(0,\infty)}(x).\tag{36}$$

Next, we compute the density of the random variable $G\_N$ given by (35). We first compute the cumulative distribution function of $G\_N$. Let us set $X\_N = \sum\_{i=1}^{N} W(h\_i)^2$. Then, the random variable $X\_N = N - \sqrt{2N}\log G\_N$ has a Gamma distribution with parameters $\alpha = \frac{N}{2}$ and $\beta = \frac{1}{2}$, that is,

$$p\_{X\_N}(x) = \frac{1}{2^{\frac{N}{2}}\Gamma(\frac{N}{2})}\, x^{\frac{N}{2}-1} e^{-\frac{x}{2}} \mathbf{1}\_{(0,\infty)}(x).\tag{37}$$

Using (37), we find that for *x* ≥ 0,

$$\begin{aligned} \mathbb{P}(G\_N \le x) &= \mathbb{P}\left(-\frac{1}{\sqrt{2N}} \sum\_{i=1}^N (W(h\_i)^2 - 1) \le \log x\right) \\ &= \mathbb{P}\left(X\_N \ge N - \sqrt{2N}\log x\right) \\ &= \int\_{N-\sqrt{2N}\log x}^{\infty} p\_{X\_N}(y)\,dy. \end{aligned} \tag{38}$$

Differentiating Equation (38) proves that

$$p\_{G\_N}(x) = \frac{\sqrt{2N}}{x}\, p\_{X\_N} \left( N - \sqrt{2N} \log x \right). \tag{39}$$

From (39), it follows that

$$\begin{split} p\_{G\_N}(x) &= \frac{\sqrt{2N}}{2^{\frac{N}{2}}\Gamma(\frac{N}{2})x} \left( N - \sqrt{2N} \log x \right)^{\frac{N}{2}-1} e^{-\frac{1}{2} \left( N - \sqrt{2N} \log x \right)} \mathbf{1}\_{(0,\infty)}(x) \\ &= \frac{\sqrt{2N}}{2^{\frac{N}{2}}\Gamma(\frac{N}{2})x} \exp \left\{ \left( \frac{N}{2} - 1 \right) \log(N - \sqrt{2N} \log x) - \frac{1}{2} (N - \sqrt{2N} \log x) \right\} \mathbf{1}\_{(0,\infty)}(x). \end{split} \tag{40}$$
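As a plausibility check on (39) and (40), the density of $G\_N$ can be evaluated with scipy's Gamma density and compared against a direct simulation; $N = 50$ is our illustrative choice:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

N = 50  # illustrative value (our choice)

def p_GN(x):
    """Density (39)/(40): p_{G_N}(x) = (sqrt(2N)/x) p_{X_N}(N - sqrt(2N) log x)."""
    arg = N - np.sqrt(2 * N) * np.log(x)
    return np.sqrt(2 * N) / x * stats.gamma.pdf(arg, a=N / 2.0, scale=2.0)

# G_N takes values in (0, e^{sqrt(N/2)}), where N - sqrt(2N) log x > 0
upper = float(np.exp(np.sqrt(N / 2.0)))
total = quad(p_GN, 0.0, upper, points=[0.5, 1.0, 2.0, 5.0], limit=200)[0]
assert abs(total - 1.0) < 1e-6

# Monte Carlo cross-check: G_N = exp(-(X_N - N)/sqrt(2N)) with X_N ~ chi^2_N
rng = np.random.default_rng(2)
xn = rng.chisquare(N, size=500_000)
gn = np.exp(-(xn - N) / np.sqrt(2 * N))
p_emp = float(np.mean(gn <= 1.0))
p_ana = quad(p_GN, 0.0, 1.0, limit=200)[0]
assert abs(p_emp - p_ana) < 5e-3
print("density (39)/(40) of G_N integrates to 1 and matches simulation")
```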

#### *5.1. Scheffé's Theorem*

First, we prove that $G\_N$ converges in distribution to *F* by using Scheffé's theorem, and then find the rate of convergence in the Kolmogorov and total variation distances. The right-hand side of (40) can be written as

$$\begin{split} p\_{G\_N}(x) &= \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma(\frac{N}{2}) x} \exp\left\{ \left(\frac{N}{2} - 1\right) \log N - \frac{N}{2} \right\} \\ &\quad\times \exp\left\{ \left(\frac{N}{2} - 1\right) \log \left(1 - \sqrt{\frac{2}{N}} \log x \right) + \sqrt{\frac{N}{2}} \log x \right\} \mathbf{1}\_{(0, \infty)}(x). \end{split} \tag{41}$$

For any fixed *x* ∈ (0, ∞), we have, from (36) and (41), that

$$\begin{split} p\_{G\_N}(x) - p\_F(x) &= \left[ \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} \exp\left\{ \left(\frac{N}{2} - 1\right) \log N - \frac{N}{2} \right\} - \frac{1}{\sqrt{2\pi}} \right] \frac{1}{x} e^{-\frac{1}{2} (\log x)^2} \\ &\quad + \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} \exp\left\{ \left(\frac{N}{2} - 1\right) \log N - \frac{N}{2} \right\} \\ &\quad \times \frac{1}{x} \left[ \exp\left\{ \left(\frac{N}{2} - 1\right) \log \left(1 - \sqrt{\frac{2}{N}} \log x \right) + \sqrt{\frac{N}{2}} \log x \right\} - e^{-\frac{1}{2} (\log x)^2} \right] \\ &=: A\_{1,N} + A\_{2,N}. \end{split} \tag{42}$$

To estimate the first term $A\_{1,N}$ in (42), we can use the following specific version of Stirling's formula for the Γ function, incorporating upper and lower bounds (see [16]):

**Lemma 1.** *Let* $S(x) = x^{x-\frac{1}{2}} e^{-x}$*. Then, for all* $x > 0$*,*

$$\sqrt{2\pi}\,S(x) \le \Gamma(x) \le \sqrt{2\pi}\,S(x)\,e^{\frac{1}{12x}}.\tag{43}$$

The term |*A*1,*N*| in (42) can be written as

$$|A\_{1,N}| = \frac{1}{\sqrt{2\pi}} \left|1 - A\_{11,N} \times A\_{12,N}\right| \frac{1}{x} e^{-\frac{1}{2}(\log x)^2},\tag{44}$$

where

$$\begin{array}{rcl} A\_{11,N} &=& \frac{\sqrt{2\pi}\sqrt{\frac{2}{N}}(\frac{N}{2})^{\frac{N}{2}}e^{-\frac{N}{2}}}{\Gamma(\frac{N}{2})},\\ A\_{12,N} &=& \frac{\sqrt{2N}e^{(\frac{N}{2}-1)\log N - \frac{N}{2}}}{2^{\frac{N}{2}}\sqrt{\frac{2}{N}}(\frac{N}{2})^{\frac{N}{2}}e^{-\frac{N}{2}}}.\end{array}$$

Obviously,

$$A\_{12,N} = \frac{\sqrt{2N}2^{\frac{N}{2}-1}(\frac{N}{2})^{\frac{N}{2}-1}}{2^{\frac{N}{2}}\sqrt{\frac{2}{N}}(\frac{N}{2})^{\frac{N}{2}}} = 1. \tag{45}$$

Hence, from (43) and (44),

$$\begin{split} |A\_{1,N}| &= \frac{1}{\sqrt{2\pi}} \left| \frac{\Gamma(\frac{N}{2}) - \sqrt{2\pi} (\frac{N}{2})^{\frac{N}{2} - \frac{1}{2}} e^{-\frac{N}{2}}}{\Gamma(\frac{N}{2})} \right| \frac{1}{x} e^{-\frac{1}{2}(\log x)^2} \\ &\leq \frac{1}{\sqrt{2\pi}} \left(1 - e^{-\frac{1}{6N}}\right) \frac{1}{x} e^{-\frac{1}{2}(\log x)^2} \\ &= \left( \frac{1}{6\sqrt{2\pi}N} + o\left(\frac{1}{N}\right) \right) \frac{1}{x} e^{-\frac{1}{2}(\log x)^2}. \end{split} \tag{46}$$

Using the Taylor expansion of $\log\left(1-\sqrt{\frac{2}{N}}\log x\right)$,

$$\log\left(1-\sqrt{\frac{2}{N}}\log x\right) = -\sqrt{\frac{2}{N}}\log x - \frac{1}{N}(\log x)^2 + o\_x(N^{-1}),$$

we write *A*2,*<sup>N</sup>* as

$$\begin{split} A\_{2,N} &= \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma(\frac{N}{2})} \exp\left\{ \left( \frac{N}{2} - 1 \right) \log N - \frac{N}{2} \right\} \\ &\quad\times \frac{1}{x} \left[ \exp\left\{ -\frac{1}{2} (\log x)^2 + o(1) \right\} - e^{-\frac{1}{2} (\log x)^2} \right]. \end{split} \tag{47}$$

Since

$$\lim\_{N \to \infty} \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma(\frac{N}{2})} \exp\left\{ \left(\frac{N}{2} - 1\right) \log N - \frac{N}{2} \right\} = \frac{1}{\sqrt{2\pi}},$$

we have that $\lim\_{N\to\infty} A\_{2,N} = 0$, and hence, from (42),

$$\lim\_{N \to \infty} p\_{G\_N}(x) = p\_F(x) \quad \text{for all } x \in (0,\infty).$$

This convergence implies, by Scheffé's theorem, that as *N* → ∞,

$$\int\_0^\infty |p\_{G\_N}(x) - p\_F(x)|\, dx \to 0.$$

Hence, $G\_N$ converges in distribution to *F*. An upper bound for the Kolmogorov and total variation distances is given in (5). Next, we find the rate of convergence for these distances by using the bound (5). By the change of variables $\log x = z$, we find, from (36) and (40), that

$$\begin{split} d(G\_N, F) &\leq \int\_{0}^{\infty} \left| p\_{F}(x) - p\_{G\_N}(x) \right| dx \\ &= \int\_{-\infty}^{\infty} \left| \frac{1}{\sqrt{2\pi}} e^{-\frac{z^{2}}{2}} - \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} e^{(\frac{N}{2}-1)\log N - \frac{N}{2}}\, e^{(\frac{N}{2}-1)\log(1 - \sqrt{\frac{2}{N}}z) + \sqrt{\frac{N}{2}}z} \right| dz. \end{split} \tag{48}$$

Using the Taylor expansion of $\log\left(1 - \sqrt{\frac{2}{N}}\, z\right)$, the right-hand side of (48) can be represented as

$$\begin{split} d(G\_N,F) &\leq \int\_{-\infty}^{\infty} \left| \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}} - \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} e^{(\frac{N}{2}-1)\log N - \frac{N}{2}}\, e^{-\frac{z^2}{2} + \sqrt{\frac{2}{N}} z + o\_z(N^{-\frac{1}{2}})} \right| dz \\ &\leq \left| \frac{1}{\sqrt{2\pi}} - \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} e^{(\frac{N}{2}-1)\log N - \frac{N}{2}} \right| \int\_{-\infty}^{\infty} e^{-\frac{z^2}{2}}\, dz \\ &\quad + \frac{\sqrt{2N}}{2^{\frac{N}{2}} \Gamma\left(\frac{N}{2}\right)} e^{(\frac{N}{2}-1)\log N - \frac{N}{2}} \int\_{-\infty}^{\infty} e^{-\frac{z^2}{2}} \left| 1 - e^{\sqrt{\frac{2}{N}} z + o\_z(N^{-\frac{1}{2}})} \right| dz \\ &=: B\_{1,N} + B\_{2,N}. \end{split} \tag{49}$$

From (46), it follows that

$$B\_{1,N} \le \frac{C}{\sqrt{N}}.\tag{50}$$

Obviously,

$$B\_{2,N} \le C \int\_{-\infty}^{\infty} e^{-\frac{z^2}{2}} \left| 1 - e^{\sqrt{\frac{2}{N}}z + \frac{z^2}{N} + o\_z(N^{-\frac{1}{2}})} \right| dz \le \frac{C}{\sqrt{N}}.\tag{51}$$

From (50) and (51), we conclude that the rate of convergence of the Kolmogorov and total variation distances between the laws of *F* and $G\_N$ is of order $\frac{1}{\sqrt{N}}$.
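The $O(1/\sqrt{N})$ rate can be observed numerically by computing the $L^1$ distance between $p\_{G\_N}$ and the lognormal density for two values of *N*; the truncation of the integration range below is our own choice and a rough check, not a proof:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def l1_distance(N):
    """Integral of |p_F - p_{G_N}| over (0, inf), computed in the coordinate z = log x."""
    def integrand(z):
        x = np.exp(z)
        arg = N - np.sqrt(2 * N) * z
        p_gn = np.sqrt(2 * N) / x * stats.gamma.pdf(arg, a=N / 2.0, scale=2.0)
        p_f = stats.lognorm.pdf(x, s=1.0)   # standard lognormal density
        return abs(p_f - p_gn) * x          # dx = e^z dz
    # the mass of both densities outside z in [-12, sqrt(N/2)] is negligible
    return quad(integrand, -12.0, float(np.sqrt(N / 2.0)),
                points=[-2.0, 0.0, 2.0], limit=400)[0]

d100, d400 = l1_distance(100), l1_distance(400)
assert d400 < d100                 # the distance decreases in N
assert 0.15 < d400 / d100 < 1.0    # consistent with the O(1/sqrt(N)) rate
print(f"L1 distance: N=100 -> {d100:.4f}, N=400 -> {d400:.4f}")
```

Quadrupling *N* should roughly halve the distance if the leading $C/\sqrt{N}$ term dominates.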

#### *5.2. General Distance*

In this section, we consider general distances between the laws of *F* and *GN* defined by

$$d\_{\mathcal{F}}(G\_N, F) = \sup\_{f \in \mathcal{F}} \left| \mathbb{E}[f(G\_N)] - \mathbb{E}[f(F)] \right|, \tag{52}$$

where $\mathcal{F}$ is a class of functions defined on $\mathbb{R}$. Depending on the choice of $\mathcal{F}$, several types of distances can be defined. In addition to the Kolmogorov and total variation distances, the following distances can be obtained: for example, if $\mathcal{F} = \{ f : \|f\|\_L \le 1\}$, where $\|\cdot\|\_L$ denotes the Lipschitz seminorm defined by

$$\|f\|\_{L} = \sup\left\{ \frac{|f(x) - f(y)|}{|x - y|} : x \neq y \right\},$$

then the distance in (52) is called the *Wasserstein distance*. If $\mathcal{F} = \{ f : \|f\|\_L + \|f\|\_{\infty} \le 1\}$, the *Fortet–Mourier distance* is obtained. The rate of convergence in this distance can be found by using the bound given in Theorem 3. The drift coefficient of the associated diffusion is given by

$$a(x) = \frac{2e^{m + \frac{\sigma^2}{2}}}{p\_F(x)} \left[ \Phi\left(\frac{\log x - m}{\sigma}\right) - \Phi\left(\frac{\log x - m}{\sigma} - \sigma\right) \right],\tag{53}$$

where the function Φ denotes the distribution function of the standard Gaussian distribution. Let us set *G*¯*<sup>N</sup>* = *GN* − E[*GN*]. From (18) and (39), it follows that

$$\begin{aligned} &\quad \mathbb{E}\left[\left<-DL^{-1}\mathcal{G}\_{N}\mathcal{D}\mathcal{G}\right>\_{\mathcal{D}}\big|\mathcal{G}\_{N}\right] \\ &=\quad \frac{\int\_{G\_{N}}^{\infty}(y-m)p\_{G\_{N}}(y)dy}{p\_{G\_{N}}(\mathcal{G}\_{N})} \\ &=\quad \frac{\mathcal{G}\_{N}\int\_{G\_{N}}^{\infty}(y-m)\frac{\sqrt{2N}}{y}p\_{X\_{N}}\left(N-\sqrt{2N}\log y\right)dy}{\sqrt{2N}p\_{X\_{N}}\left(N-\sqrt{2N}\log G\_{N}\right)} \\ &=\quad \frac{\mathcal{G}\_{N}\int\_{-\infty}^{X\_{N}}(e^{-\frac{1}{\sqrt{2N}}(x-N)}-m)p\_{X\_{N}}(x)dx}{\sqrt{2N}p\_{X\_{N}}(X\_{N})},\end{aligned} \tag{54}$$

where *m* is the expectation of *GN* given by

$$m = e^{\sqrt{\frac{N}{2}}} \left( 1 + \sqrt{\frac{2}{N}} \right)^{-\frac{N}{2}}.$$

The right-hand side of (54) can be written as

$$\begin{split} & \quad \mathbb{E}\left[\left<-DL^{-1}\bar{G}\_{N}\,DG\right>\_{\mathcal{B}}|\bar{G}\_{N}\right] \\ &= \frac{e^{\sqrt{\frac{N}{2}}}G\_{N}\int\_{-\infty}^{X\_{N}}\left[e^{-\frac{1}{\sqrt{N}}}-\left(1+\sqrt{\frac{2}{N}}\right)^{-\frac{N}{2}}\right]p\_{X\_{N}}(x)dx}{\sqrt{2N}p\_{X\_{N}}(X\_{N})} \\ &= \frac{e^{\sqrt{\frac{N}{2}}}G\_{N}X\_{N}^{1-\frac{N}{2}}e^{-\frac{X\_{N}}{2}}}{\sqrt{2N}} \\ & \quad \times \int\_{0}^{X\_{N}}\left[e^{-\frac{1}{\sqrt{2N}}}-\left(1+\sqrt{\frac{2}{N}}\right)^{-\frac{N}{2}}\right]x^{\frac{N}{2}-1}e^{-\frac{x}{2}}dx \\ &= \frac{e^{\sqrt{\frac{N}{2}}}G\_{N}X\_{N}^{1-\frac{N}{2}}e^{-\frac{X\_{N}}{2}}}{\sqrt{2N}}\left\{\int\_{0}^{X\_{N}}x^{\frac{N}{2}-1}e^{-\frac{1}{2}(\sqrt{\frac{N}{N}}+1)x}dx \\ & \quad - \int\_{0}^{X\_{N}}\left(1+\sqrt{\frac{2}{N}}\right)^{-\frac{N}{2}}x^{\frac{N}{2}-1}e^{-\frac{x}{2}}dx\right\}. \end{split} \tag{55}$$

Using the change of variables $\left(1+\sqrt{\frac{2}{N}}\right)x = y$, we express the right-hand side of (55) as

$$\begin{split} \mathbb{E}\left[\left\langle -DL^{-1}\bar{G}\_N, DG\_N\right\rangle\_{\mathfrak{H}}\,\middle|\,G\_N\right] &= \frac{e^{\sqrt{\frac{N}{2}}}G\_N X\_N^{1-\frac{N}{2}}e^{\frac{X\_N}{2}}}{\sqrt{2N}}\left(1+\sqrt{\frac{2}{N}}\right)^{-\frac{N}{2}} \\ &\quad\times \int\_{X\_N}^{\left(1+\sqrt{\frac{2}{N}}\right)X\_N} x^{\frac{N}{2}-1}e^{-\frac{x}{2}}\,dx. \end{split} \tag{56}$$

By using the expansion

$$\left(1 + \sqrt{\frac{2}{N}}\right)^{-\frac{N}{2}} = e^{-\sqrt{\frac{N}{2}} + \frac{1}{2} - \frac{1}{3\sqrt{2N}} + o\left(N^{-\frac{1}{2}}\right)},$$

the right-hand side of (56) can be expressed as

$$\begin{split} \mathbb{E}\left[\left\langle -DL^{-1}\bar{G}\_N, DG\_N\right\rangle\_{\mathfrak{H}}\,\middle|\,G\_N\right] &= \frac{e^{\frac{1}{2}-\frac{1}{3\sqrt{2N}}+o(N^{-\frac{1}{2}})}G\_N X\_N^{1-\frac{N}{2}}e^{\frac{X\_N}{2}}}{\sqrt{2N}} \\ &\quad\times \int\_{X\_N}^{\left(1+\sqrt{\frac{2}{N}}\right)X\_N} x^{\frac{N}{2}-1}e^{-\frac{x}{2}}\,dx. \end{split} \tag{57}$$

The change of variables $\frac{x-X\_N}{\sqrt{\frac{2}{N}}X\_N} = z$ shows that (57) equals

$$\begin{split} \mathbb{E}\left[\left\langle -DL^{-1}\bar{G}\_N, DG\_N\right\rangle\_{\mathfrak{H}}\,\middle|\,G\_N\right] &= e^{\frac{1}{2}-\frac{1}{3\sqrt{2N}}+o\left(N^{-\frac{1}{2}}\right)}G\_N\frac{X\_N}{N} \\ &\quad\times \int\_0^1\left(1+\sqrt{\frac{2}{N}}z\right)^{\frac{N}{2}-1}e^{-\frac{X\_N z}{\sqrt{2N}}}\,dz. \end{split} \tag{58}$$

The Taylor expansion of $\log\left(1 + \sqrt{\frac{2}{N}}\,z\right)$, $0 \le z \le 1$, is given by

$$\log\left(1+\sqrt{\frac{2}{N}}z\right) = \sqrt{\frac{2}{N}}z - \frac{1}{N}z^2 + o(N^{-1}).\tag{59}$$

Applying the expansion (59) to the function $\left(1 + \sqrt{\frac{2}{N}}z\right)^{\frac{N}{2}-1}$, we have

$$\begin{split} \left(1+\sqrt{\frac{2}{N}}z\right)^{\frac{N}{2}-1} &= e^{\left(\frac{N}{2}-1\right)\left(\sqrt{\frac{2}{N}}z-\frac{1}{N}z^{2}+o(N^{-1})\right)} \\ &= e^{\sqrt{\frac{N}{2}}z-\frac{z^{2}}{2}+N\,o(N^{-1})}\, e^{-\sqrt{\frac{2}{N}}z+o(N^{-\frac{1}{2}})}. \end{split} \tag{60}$$

Substituting (60) into the integrand in (58) yields that

$$\begin{split} \mathbb{E}\left[\left\langle -DL^{-1}\bar{G}\_N, DG\_N\right\rangle\_{\mathfrak{H}}\,\middle|\,G\_N\right] &= e^{\frac{1}{2}-\frac{1}{3\sqrt{2N}}+o(N^{-\frac{1}{2}})}G\_N\frac{X\_N}{N} \\ &\quad\times \int\_{0}^{1} e^{-\frac{z^{2}}{2}-\frac{X\_N z}{\sqrt{2N}}+\sqrt{\frac{N}{2}}z+o(N^{-\frac{1}{2}})}\,dz. \end{split} \tag{61}$$

From (36) and (53), the drift coefficient of diffusion is given by

$$\begin{split} \frac{1}{2}a(G\_N) &= \frac{e^{\frac{1}{2}}}{p\_F(G\_N)} \int\_{\log G\_N - 1}^{\log G\_N} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\, dz \\ &= \sqrt{2\pi}\, e^{\frac{1}{2}} G\_N\, e^{\frac{1}{4N}(X\_N - N)^2} \int\_{-\frac{1}{\sqrt{2N}}(X\_N - N) - 1}^{-\frac{1}{\sqrt{2N}}(X\_N - N)} \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}\, dz \\ &= \frac{e^{\frac{1}{2}} G\_N\, e^{\frac{1}{4N}(X\_N - N)^2}}{\sqrt{2N}} \int\_{X\_N}^{X\_N + \sqrt{2N}} e^{-\frac{(y - N)^2}{4N}}\, dy \\ &= \frac{e^{\frac{1}{2}} G\_N}{\sqrt{2N}} \int\_{X\_N}^{X\_N + \sqrt{2N}} e^{-\frac{(y - X\_N)^2}{4N} - \frac{(X\_N - N)(y - X\_N)}{2N}}\, dy. \end{split} \tag{62}$$

The change of variables $\frac{y-X\_N}{\sqrt{2N}} = z$ makes the right-hand side of (62) equal to

$$\frac{1}{2}a(G\_N) = e^{\frac{1}{2}}G\_N \int\_0^1 e^{-\frac{z^2}{2} - \frac{X\_N z}{\sqrt{2N}} + \sqrt{\frac{N}{2}}z}\, dz. \tag{63}$$

From (61) and (63), we write $\frac{1}{2}a(G\_N) - g\_{\bar{G}\_N}(\bar{G}\_N) = D\_{1,N} + D\_{2,N} + D\_{3,N}$, where

$$\begin{split} D\_{1,N} &= e^{\frac{1}{2}}\left(1 - e^{-\frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})}\right) G\_N \int\_0^1 e^{-\frac{z^2}{2} - \frac{X\_N z}{\sqrt{2N}} + \sqrt{\frac{N}{2}}z}\, dz, \\ D\_{2,N} &= e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} G\_N \left(1 - \frac{X\_N}{N}\right) \int\_0^1 e^{-\frac{z^2}{2} - \frac{X\_N z}{\sqrt{2N}} + \sqrt{\frac{N}{2}}z}\, dz, \\ D\_{3,N} &= e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} G\_N \frac{X\_N}{N} \int\_0^1 e^{-\frac{z^2}{2} - \frac{X\_N z}{\sqrt{2N}} + \sqrt{\frac{N}{2}}z} \left(1 - e^{o(N^{-\frac{1}{2}})}\right) dz. \end{split}$$

**Lemma 2.** *For every x* > 0*, we have*

$$\mathbb{E}[G\_N^x] = e^{\frac{x^2}{2} + o\_x(N^{-\beta})},\tag{64}$$

*where* $0 < \beta < \frac{1}{2}$*.*

**Proof.** We write $G\_N = e^{\sqrt{\frac{N}{2}}} \times e^{-\frac{X\_N}{\sqrt{2N}}}$, where $X\_N \sim \Gamma\left(\frac{N}{2}, \frac{1}{2}\right)$. Hence,

$$\begin{split} \mathbb{E}[G\_N^x] &= e^{x\sqrt{\frac{N}{2}}}\, \mathbb{E}\left[e^{-\frac{x}{\sqrt{2N}}X\_N}\right] \\ &= e^{x\sqrt{\frac{N}{2}}}\left(1 + \frac{2x}{\sqrt{2N}}\right)^{-\frac{N}{2}}. \end{split} \tag{65}$$

Since

$$\log\left(1+\frac{2x}{\sqrt{2N}}\right) = \frac{2x}{\sqrt{2N}} - \frac{x^2}{N} + o\_x\left(N^{-\alpha}\right), \quad \alpha < \frac{3}{2},$$

we have

$$\begin{split} \left(1+\frac{2x}{\sqrt{2N}}\right)^{-\frac{N}{2}} &= e^{-\frac{N}{2}\log\left(1+\frac{2x}{\sqrt{2N}}\right)} \\ &= e^{-x\sqrt{\frac{N}{2}}+\frac{x^2}{2}+o\_x(N^{-\beta})}, \quad 0 < \beta < \frac{1}{2}. \end{split} \tag{66}$$

Substituting (66) into (65) proves this lemma.
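Lemma 2 can be checked directly from the closed form (65), evaluated in log scale for numerical stability; the values of *x* and *N* below are our illustrative choices:

```python
import numpy as np

def log_moment_GN(x, N):
    """log E[G_N^x] from the closed form (65), computed stably via log1p."""
    return x * np.sqrt(N / 2.0) - (N / 2.0) * np.log1p(2.0 * x / np.sqrt(2.0 * N))

# Lemma 2: log E[G_N^x] -> x^2/2 as N -> infinity, for each fixed x > 0
for x in (0.5, 1.0, 2.0):
    errs = [abs(log_moment_GN(x, N) - x**2 / 2.0) for N in (10**2, 10**4, 10**6)]
    assert errs[0] > errs[1] > errs[2]    # the error decreases in N
    assert errs[2] < 1e-2                 # and is already small at N = 10^6
print("Lemma 2 verified from the closed form (65)")
```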

Next, we estimate $\mathbb{E}[|D\_{k,N}|]$, $k = 1, 2, 3$. The Cauchy–Schwarz inequality and Lemma 2 give the estimate

$$\begin{split} \mathbb{E}[|D\_{1,N}|] &\leq e^{\frac{1}{2}}\left|1 - e^{-\frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})}\right| \sqrt{\mathbb{E}[G\_N^2]} \left( \int\_0^1 e^{-z^2}\, \mathbb{E}\left[G\_N^{2z}\right] dz \right)^{\frac{1}{2}} \\ &\leq \frac{c}{3\sqrt{2N}} (1 + o(1))\, e^{o(1)} \leq \frac{c}{\sqrt{N}}. \end{split} \tag{67}$$

By Hölder's inequality and Lemma 2, we have

$$\begin{split} \mathbb{E}[|D\_{2,N}|] &\leq e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} \left(\mathbb{E}[G\_{N}^{3}]\right)^{\frac{1}{3}} \frac{\left(\mathbb{E}[|N - X\_{N}|^{3}]\right)^{\frac{1}{3}}}{N} \left( \int\_{0}^{1} e^{-\frac{3z^{2}}{2}}\, \mathbb{E}[G\_{N}^{3z}]\, dz \right)^{\frac{1}{3}} \\ &\leq e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} e^{\frac{3}{2} + o(N^{-\beta})} \left( \mathbb{E}\left[\left(\frac{N - X\_{N}}{\sqrt{2N}}\right)^{4} \right] \right)^{\frac{1}{4}} \sqrt{\frac{2}{N}} \left( \int\_{0}^{1} e^{3z^{2} + o\_{z}(N^{-\beta})}\, dz \right)^{\frac{1}{3}} \\ &\leq e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} e^{\frac{3}{2} + o(N^{-\beta})} \left( 3 + \frac{12}{N} \right)^{\frac{1}{4}} \sqrt{\frac{2}{N}}\, e^{1 + o(N^{-\beta})} \\ &\leq \frac{c}{\sqrt{N}}. \end{split} \tag{68}$$

Similarly,

$$\begin{split} \mathbb{E}[|D\_{3,N}|] &\leq e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} \left( \mathbb{E}[G\_N^3] \right)^{\frac{1}{3}} \frac{\left( \mathbb{E}[X\_N^3] \right)^{\frac{1}{3}}}{N} \left( \int\_0^1 e^{-\frac{3z^2}{2}}\, \mathbb{E}[G\_N^{3z}]\, dz \right)^{\frac{1}{3}} \left|1 - e^{o(N^{-\frac{1}{2}})}\right| \\ &\leq e^{\frac{1}{2} - \frac{1}{3\sqrt{2N}} + o(N^{-\frac{1}{2}})} e^{\frac{3}{2} + o(N^{-\beta})} (1 + o(1))\, e^{1 + o(N^{-\beta})} \left|1 - e^{o(N^{-\frac{1}{2}})}\right| \\ &\leq \frac{c}{\sqrt{N}}. \end{split} \tag{69}$$

Combining the bounds in (67), (68) and (69), we obtain

$$\left| \mathbb{E} \left[ f(G\_{N}) - f(F) \right] \right| \leq \| h\_{f}' \|\_{\infty}\, \mathbb{E} \left[ \left| \frac{1}{2} a(G\_{N}) - g\_{\bar{G}\_{N}}(\bar{G}\_{N}) \right| \right] \leq \frac{c}{\sqrt{N}}. \tag{70}$$

Therefore, we find that the rate of convergence of the general distance is of order $\frac{1}{\sqrt{N}}$.

#### **6. Conclusions and Future Works**

When a random variable *F* follows an invariant measure admitting a density and *G* is a random variable that is differentiable in the sense of Malliavin and admits a density, this paper derives an upper bound, expressed in terms of the two densities, on several probabilistic distances (e.g., the Kolmogorov, total variation, Wasserstein, and Fortet–Mourier distances) between the laws of *F* and *G*. Among these distances, it is well known that an upper bound for the Kolmogorov and total variation distances can easily be expressed in terms of densities. The significant feature of our work is to show that the bounds for distances other than these two can also be expressed in terms of the two density functions. A key insight of this study is that our results make it possible to bound the distance between two distributions in terms of their densities even when the distance itself is difficult to express through them.

Future work will be carried out in two directions: (1) using the results of this paper, we plan to study upper bounds sharper than those obtained in the papers [15,17]; (2) in the case when *G* is a random variable belonging to a fixed Wiener chaos, we will prove the fourth moment theorem by using the bound obtained in this paper.

**Author Contributions:** Conceptualization, Y.-T.K. and H.-S.P.; methodology, H.-S.P.; writing—original draft, H.-S.P.; writing—review and editing, H.-S.P.; funding acquisition, H.-S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by Hallym University Research Fund (HRF-202209-001).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We are very grateful to the anonymous referees for their suggestions and valuable advice.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **On Structured Random Matrices Defined by Matrix Substitutions**

**Manuel L. Esquível 1,\* and Nadezhda P. Krasii <sup>2</sup>**


**Abstract:** The structure of the random matrices introduced in this work is given by deterministic matrices—the skeletons of the random matrices—built with an algorithm of matrix substitutions with entries in a finite field of integers modulo some prime number, akin to the algorithm of one-dimensional automatic sequences. A random matrix has the structure of a given skeleton if, to each value of an entry of the skeleton in the finite field, there corresponds a random variable whose expected value, at least, is the corresponding value in the finite field. Affine matrix substitutions are introduced, and fixed point theorems are proven that allow the consideration of steady states of the structure, which are essential for efficient observation. For some more restricted classes of structured random matrices, the parameter estimation of the entries is addressed, as well as the convergence in law and some aspects of the spectral analysis of the random operators associated with the random matrix. Finally, aiming at possible applications, it is shown that there is a procedure to associate a canonical random surface to every structured random matrix of a certain class.

**Keywords:** random fields; random matrices; random linear operators; notions of recurrence; symbolic dynamics; automata sequences

**MSC:** 60G60; 60B20; 47B80; 37B20; 37B10; 11B85

### **1. Introduction**

Let us start with some motivations. A generic problem in *Big Data* analysis may have as a starting point a large matrix whose columns represent the questions and whose rows represent the subjects' answers (see [1], p. 28). The typical observed matrix may appear to be random. The questions can admit answers that are either categorical—and so can be modelled by random variables taking values in a finite set—or quantitative, modelled by random variables taking values in some set of numbers; in the latter case, we can again obtain random variables taking values in a finite set by considering a partition of the range of the real-valued random variables into intervals. A natural generic question about these matrices is to determine the existence of a possible structure of the matrix. One initial idea, to better understand this line of problems, is to build matrices with random entries but with a prescribed structure and to try to recover this structure by means of statistical tests or by the spectral analysis of the matrix. These ideas give a practical motivation for this study.

Let us situate our work in the context of the subject of *substitutions*. The analysis of *scalar*, or *string*, substitutions, so to say, is a widely studied subject, for which [2,3] are comprehensive references. Important results in the subject of *substitutions* are also to be found under the denomination of *automatic sequences*, for instance in [4,5]. To the best of our present knowledge, the study of matrix-valued substitutions has received no special attention in the literature. In this work, we propose a first approach to this topic. There

**Citation:** Esquível, M.L.; Krasii, N.P. On Structured Random Matrices Defined by Matrix Substitutions. *Mathematics* **2023**, *11*, 2505. https:// doi.org/10.3390/math11112505

Academic Editor: Luca Gemignani

Received: 30 March 2023 Revised: 16 May 2023 Accepted: 16 May 2023 Published: 29 May 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

has been work on multidimensional substitutions, but from a perspective different from the one adopted here; it can be studied in [6–8], in the chapter by J. Peyrière in [9], and in other references therein.

An important starting point for the study of spectral statistics of random matrices is the work [10]. In it, the author focuses on three ensembles of asymmetric Gaussian random matrices derived from the Gaussian Orthogonal, Gaussian Unitary, and Gaussian Symplectic random matrix ensembles by relaxing the Hermitian character. The three sets of matrices share a common Gaussian probability measure, but they exhibit profound differences in their spectral patterns; these differences are qualitatively described in that work, although the quantitative description was further improved by other authors. The difficult study of generic properties of random matrices related to spectral analysis has received much attention in recent years, as demonstrated by the works [11–18]. Readable introductions to the subject are presented in [19–25].

For a remarkably general formulation of the circular law, most useful for our purposes, we refer to the following result; it conveys the flavour of a universality result that may serve as a relevant guide for the statistical analysis of possible particular types of structure in large observed matrices.

**Theorem 1** (Circular law, Tao and Vu [22])**.** *Let* $M\_n$ *be an* $n \times n$ *matrix whose entries are independent, identically distributed copies of a centred and standardised complex random variable. Then, given*

$$\mu\_{\frac{M\_n}{\sqrt{n}}}(x,y) := \frac{1}{n} \#\left\{ 1 \le i \le n : \Re \lambda\_i \le x,\ \Im \lambda\_i \le y \right\},$$

*the empirical spectral distribution of the eigenvalues* $\lambda\_i$ *of* $(1/\sqrt{n})M\_n$*, we have that the sequence* $\left(\mu\_{\frac{M\_n}{\sqrt{n}}}(x,y)\right)\_{n\ge 1}$ *converges to the uniform measure on the unit disc given by:*

$$d\mu\_{\mathrm{circ}}(x, y) = \frac{1}{\pi} \mathbf{1}\_{\{x^2 + y^2 \le 1\}}(x, y)\, dx\, dy.$$
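The circular law can be illustrated with a short simulation; real standard Gaussian entries are one admissible choice of a centred, standardised distribution, and the matrix size $n = 500$ is our own illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# iid real standard Gaussian entries, scaled by 1/sqrt(n)
M = rng.standard_normal((n, n)) / np.sqrt(n)
lam = np.linalg.eigvals(M)

# almost all eigenvalues lie in a slightly inflated unit disc
frac_in_disc = float(np.mean(np.abs(lam) <= 1.05))
assert frac_in_disc > 0.97

# under the uniform law on the disc, P(|lambda| <= r) = r^2
frac_half = float(np.mean(np.abs(lam) <= 0.5))
assert abs(frac_half - 0.25) < 0.05
print(f"fraction of eigenvalues in the inflated unit disc: {frac_in_disc:.3f}")
```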

We stress that, before this optimal formulation was reached, several other technically involved formulations were obtained, attesting to the intrinsic difficulty of the subject displayed in the works referred to above. Let us quote Terence Tao for a synthesis of the recent short history of the subject: *A rigorous proof of the circular law was then established by Bai, assuming additional moment and boundedness conditions on the individual entries. These additional conditions were then slowly removed in a sequence of papers by Götze–Tikhomirov, Girko, Pan–Zhou, and Tao–Vu*.

We now refer to recent developments in the study of random matrices having some structure, the main topic dealt with in the present work: in particular, results on the spacing distribution, on invertibility, on the appearance of large structures, and on the spectral analysis of these random matrices. These works give an idea of the amount of exploratory work needed in the subject of random matrices with structure.

In [26], the authors consider four specific sparse patterned random matrices, namely the Symmetric Circulant, Reverse Circulant, Toeplitz, and the Hankel matrices. The entries are assumed to be Bernoulli with success probability linearly decreasing to zero. The moment approach is used to show that the expected empirical spectral distribution converges weakly for all these sparse matrices. The work in [27] is a complementary reference where the author investigates the existence and properties of the limiting spectral distribution of different patterned random matrices as the dimension grows. The method of moments and normal approximation with some combinatorics is used to deal with the Wigner matrix, the sample covariance matrix, the Toeplitz matrix, the Hankel matrix, the sample auto-covariance matrix, and the k-Circulant matrices.

In [28], a bound on the growth of the smallest singular value is found for random matrices with independent uniformly anti-concentrated entries, with no restriction that the entries have zero mean or identical distributions. The result covers inhomogeneous matrices with different variances of the entries, as long as the sum of the second moments has sub-quadratic growth in the order of the matrix. Following this work, the reference [29] extends the results of Tao and Vu and of Krishnapur on the universality of empirical spectral distributions to a class of inhomogeneous complex random matrices whose entries are linear images of standardised independent random variables satisfying a lower bound and Pastur's condition. The proof uses an anti-concentration inequality for sums of non-identically distributed independent complex random variables.

In [30], the semicircle law is established for a sequence of random symmetric matrices that may be considered as adjacency matrices of random graphs; the random matrices have independent entries given by the product of independent standardised random variables, the weights of the edges, with Bernoulli random variables that determine the presence of the edges. The empirical distribution of the eigenvalues of the normalised random matrix converges in the Kolmogorov distance to the distribution function of the semicircle law under boundedness and average conditions.

The work [31] deals with random ray pattern matrices, that is, matrices for which each nonzero entry has modulus one. A ray pattern matrix corresponds to a weighted digraph. A random model of ray pattern matrices of order *n* is introduced, where a uniformly random ray pattern matrix is defined to be the adjacency matrix of a simple random digraph whose arcs are weighted with i.i.d. random variables uniformly distributed over the unit circle in the complex plane. In this paper, it is shown that the threshold function for a random ray pattern matrix to be ray nonsingular is 1/*n*. This function is also a threshold function for the property that giant strong components appear in the simple random digraph.

The work [32] deals with patterned random matrices that are real symmetric but have substantially fewer independent entries than general real symmetric matrices. The main result is the analytic derivation of the spacing distribution for matrices of order three. As expected, the spacing distribution displays a range of behaviours depending on the structural constraints imposed on the matrices.

In this work, we propose and study an algorithm to build sequences of random matrices, with independent entries, that have a built-in structure. Furthermore, we explore some aspects of these random matrices related to identification, spectral analysis, and an idea for applications. An overview of the content of this work is now detailed.


#### **2. Structured Matrices Built by Substitutions**

We start by presenting two examples of an algorithm to build sequences of arbitrarily large matrices with entries in a finite set. For technical reasons we suppose that the entries of the structured matrices take values in some finite field, for instance:

$$\mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z} = \{0, 1, 2, \dots, p-1\}\,,$$

*p* being a prime number. The identification of the entries of the matrix as elements of $\mathbb{Z}_p$ matters, essentially for the matrix substitution procedure used to build these structured matrices. Further ahead we will also consider that the entries of the matrix represent integers, regarded as real numbers.

We will proceed to show, in Section 3, that in a certain class of matrix substitution maps we define, namely the affine matrix substitution maps, every such map admits either a fixed point or a periodic point.

#### *2.1. A Matrix Sequence Built by Iterated Application of a Matrix Substitution*

In the following examples, we suppose that the matrix entries take values in the field $\mathbb{Z}_3 = \{0, 1, 2\}$. We now consider an example of a sequence of matrices with a structure defined by substitutions. The main idea of the construction of this sequence is the following. We start with some initial matrix *M*0. The second matrix in the sequence, *M*1, is obtained by replacing each entry of *M*0 by the matrix *σ*0, *σ*1 or *σ*2, according as the entry of *M*0 being replaced is, respectively, 0, 1, or 2.

$$M\_0 = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix} \ \sigma\_0 = \begin{pmatrix} 0 & 1 & 2 \\ 1 & 1 & 2 \\ 2 & 0 & 1 \end{pmatrix} \ \ \sigma\_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{pmatrix} \ \ \sigma\_2 = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \ . \tag{1}$$

In Section 3 we present a formal description of this procedure in a more general case. With this algorithm we have that,

$$M\_1 = \begin{pmatrix} 1 & 2 & 2 & 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 2 & 0 & 2 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 1 & 2 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 2 & 1 & 2 & 2 \\ 0 & 2 & 0 & 1 & 1 & 2 & 0 & 1 & 2 \\ 1 & 0 & 1 & 2 & 0 & 1 & 0 & 0 & 1 \end{pmatrix} \tag{2}$$
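The substitution step just described is easy to implement. The following sketch (in Python, with function and variable names of our own choosing) rebuilds *M*1 from *M*0 and the blocks in (1):

```python
def substitute(M, sigma):
    # Replace each entry k of M by the d x d block sigma[k]; an n x n
    # matrix becomes an (n*d) x (n*d) matrix.
    n, d = len(M), len(sigma[0])
    return [[sigma[M[i // d][j // d]][i % d][j % d]
             for j in range(n * d)] for i in range(n * d)]

M0 = [[2, 0, 1], [1, 2, 1], [1, 0, 2]]
s0 = [[0, 1, 2], [1, 1, 2], [2, 0, 1]]
s1 = [[1, 0, 0], [0, 2, 0], [1, 0, 1]]
s2 = [[1, 2, 2], [0, 1, 2], [0, 0, 1]]

M1 = substitute(M0, [s0, s1, s2])  # the 9 x 9 matrix displayed in (2)
```

Iterating `substitute` on *M*1 with the same blocks produces the next, 27 × 27, matrix of the sequence, and so on.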

and, iterating once more, the 27 × 27 matrix *M*2, which we do not display here.


#### *2.2. A Matrix Sequence Built by Kronecker Power Iterations*

An apparently different way of building substitution structured matrices is by means of Kronecker powers of an initially given matrix, which we now illustrate. The initial matrix is given by:

$$R\_0 = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}.$$

The sequence of matrices taking values in $\mathbb{Z}_3 = \{0, 1, 2\}$ is defined by induction: the matrix of index *n* + 1 is the Kronecker product of the matrix of index *n* with *R*0, taken modulo 3 to keep the entries of the matrix in $\mathbb{Z}_3$, that is,

$$R_{n+1} := [R_n \otimes R_0] \pmod{3}\,.$$

So, the second matrix of the sequence is,

$$R_1 = \begin{pmatrix} 1 & 2 & 0 & 2 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 2 & 0 & 1 & 1 & 0 & 0 & 0 \\ 2 & 0 & 1 & 1 & 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 & 0 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & 0 & 2 & 1 & 0 & 2 \\ 2 & 1 & 0 & 0 & 0 & 0 & 1 & 2 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 2 & 2 \\ 1 & 0 & 2 & 0 & 0 & 0 & 2 & 0 & 1 \end{pmatrix}\ ,$$
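This induction is equally simple to implement; a minimal sketch (with our own naming) computes *R*1 as the Kronecker product of *R*0 with itself, reduced modulo 3:

```python
def kron_mod(A, B, p=3):
    # Kronecker product of A and B with every entry reduced modulo p.
    ra, ca = len(A), len(A[0])
    rb, cb = len(B), len(B[0])
    return [[(A[i // rb][j // cb] * B[i % rb][j % cb]) % p
             for j in range(ca * cb)] for i in range(ra * rb)]

R0 = [[2, 1, 0], [0, 1, 1], [1, 0, 2]]
R1 = kron_mod(R0, R0)  # the 9 x 9 matrix displayed above
```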

and the third matrix of the sequence, *R*2, is the 27 × 27 matrix obtained by one further Kronecker step, which we do not display here.

**Remark 1** (Kronecker power matrices are matrix substitutions)**.** *We observe that the above example of a Kronecker power matrix sequence corresponds to a special kind of substitution, the linear matrix substitution (see Definition 3 ahead). In fact, the algorithm for building a Kronecker power sequence of matrices is given by substitutions in the sense of Section 2.1 with the matrices σ*0, *σ*1 *and σ*2 *defined by:*

$$
\sigma\_0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
\
\sigma\_1 = R\_0 = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}
\
\sigma\_2 = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 2 \\ 2 & 0 & 1 \end{pmatrix}.
$$

*This is a consequence of the fact that computing a Kronecker power sequence starting with the matrix R*<sup>0</sup> *is equivalent to computing a matrix substitution given by:*

$$\sigma_0 = (0 \cdot R_0 \bmod 3) = \mathbf{0}_{3\times3}\,,\quad \sigma_1 = (1 \cdot R_0 \bmod 3) = R_0\,,\quad \sigma_2 = (2 \cdot R_0 \bmod 3)\,.$$
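The claimed equivalence can be checked mechanically: substituting each entry *k* by the block *k* · *R*0 mod 3 yields exactly the Kronecker product reduced modulo 3. A small sketch under our own naming:

```python
R0 = [[2, 1, 0], [0, 1, 1], [1, 0, 2]]
p = 3

# sigma_k = (k * R0) mod p, the linear substitution blocks of Remark 1
sigma = [[[(k * a) % p for a in row] for row in R0] for k in range(p)]

def substitute(M, sig):
    # replace each entry k of M by the block sig[k]
    n, d = len(M), len(sig[0])
    return [[sig[M[i // d][j // d]][i % d][j % d]
             for j in range(n * d)] for i in range(n * d)]

def kron_mod(A, B):
    # Kronecker product reduced modulo p
    r, c = len(B), len(B[0])
    return [[(A[i // r][j // c] * B[i % r][j % c]) % p
             for j in range(len(A[0]) * c)] for i in range(len(A) * r)]

# the two constructions agree on the first iterate (and hence on all)
assert substitute(R0, sigma) == kron_mod(R0, R0)
```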

We observe that the two kinds of substitutions give rise to different structured matrices. For instance, the distributions of the absolute values of the eigenvalues (in $\mathbb{C}$, that is, supposing that the entries are complex) of the seventh iteration of the two types of matrix substitutions are different, as shown in the histograms of Figure 1.

**Figure 1.** Histogram of absolute values of the eigenvalues of the structured matrices *R*<sup>7</sup> and *M*7.

Another significant difference between the two constructions is noticeable in the form of the dispersion, in the plane, of the eigenvalues that can be seen in Figure 2.

**Figure 2.** Dispersion of real and imaginary parts of the eigenvalues of *R*<sub>7</sub> and *M*<sub>7</sub>.

**Remark 2.** *The dispersion of eigenvalues observed in Figure 2 is to be compared with the dispersion for samples of randomised matrices of both kinds, Kronecker and simple, presented in Figure 3 ahead. It is as if the general structure of this dispersion persists despite the randomisation, at least whenever the variance of the random variables is small. This leads us to conjecture that it may be important to determine the spectral distribution of the substitution matrices in order to draw inferences about the spectral distribution of the randomised matrices.*

**Figure 3.** Eigenvalue distribution in $\mathbb{C}$ of a sample of 40 matrices with affine substitution induced structure and increasing variance.

#### **3. On the Fixed Points of Affine Matrix Substitutions**

In this Section we present fixed point theorems for *affine matrix substitutions*. The work presented here rests upon a procedure to build sequences of structured matrices by means of matrix substitutions. In order for such matrices to be a usable model, subject to observation, some stable structure should result from the procedure. Our view is that this stable structure should be either a fixed point or at least a periodic point of a map on some space of matrices. We opt to consider spaces of infinite matrices. A general and historical approach to the subject of infinite matrices is given in [33]. A more recent account of important results on this subject is given in [34]. Furthermore, a flavour of a specific kind of problem can be found in [35]. The perspective of considering an infinite matrix as a linear operator on some Banach space of power summable sequences is exploited in the reference book [36], in which the concept of band-dominated operators, corresponding to operators that are limits of operators defined by infinite matrices with a finite number of non-null rows and columns, plays an important role. A particular case of this concept is of crucial importance in our work to prove the existence of a particular kind of observable fixed point.

To begin with, we define some spaces of finite and infinite matrices with entries in $\mathbb{Z}_p$.

#### *3.1. Some Spaces of Matrices*

Let us briefly describe the setting. For simplicity, let *p* be a prime number and let $\mathbb{Z}_p = \{0, 1, \dots, p-1\}$ be the finite field with $\#\mathbb{Z}_p = p$. The set $\mathbb{Z}_p$ may be thought of as the alphabet when the perspective of finite automata is adopted or, in the context of Big Data, the set that codifies the possible answers. We next define the space of infinite matrices with entries in the field $\mathbb{Z}_p$.

$$\mathcal{M}_{+\infty} := \left\{ M = \left[ a_{ij} \right]_{i,j \ge 1} : a_{ij} \in \mathbb{Z}_p \right\} = \mathbb{Z}_p^{(\mathbb{N}\setminus\{0\}) \times (\mathbb{N}\setminus\{0\})} \,. \tag{3}$$

We have that $\mathcal{M}_{+\infty}$ is a vector space over the field $\mathbb{Z}_p$. Let $\mathcal{M}_0$ be a particular subspace of $\mathcal{M}_{+\infty}$ which may be identified with a set of finite square matrices if all infinite tails of rows and of columns having only $0 \in \mathbb{Z}_p$ as entries are discarded, that is:

$$\mathcal{M}_0 := \left\{ M = \left[ a_{ij} \right]_{i,j \ge 1} \in \mathcal{M}_{+\infty} : \exists n \ge 1\ \forall i,j \ge n\ \ a_{ij} = 0 \right\} .$$

We have that $\mathcal{M}_0$ is a vector subspace of $\mathcal{M}_{+\infty}$, and we observe that $M \in \mathcal{M}_0$ can have null rows and columns. We now decompose $\mathcal{M}_0$ by observing that for each $M \in \mathcal{M}_0$ there always exists $n_M$, the first integer $n \ge 1$ such that for all $i,j > n_M$ we have $a_{ij} = 0$. Using this property, let us define $\mathcal{M}^{\#}_{n\times n} = \mathcal{M}^{\#}_{n\times n}(\mathbb{Z}_p) \subset \mathcal{M}_0$ as:

$$\mathcal{M}^{\#}_{n \times n} := \left\{ M = \left[ a_{ij} \right]_{i,j \ge 1} : \left( \exists i,\ a_{in} \ne 0 \ \lor\ \exists j,\ a_{nj} \ne 0 \right) \land \left( \forall i,j > n,\ a_{ij} = 0 \right) \right\}.$$

that is, $\mathcal{M}^{\#}_{n\times n}$ is the subset of $\mathcal{M}_0$ of infinite square matrices having a leading principal matrix of exact order *n*, in the sense that neither the row nor the column of order *n* has all its entries equal to zero, while all rows and columns of order greater than or equal to *n* + 1 have only zero entries. $\mathcal{M}^{\#}_{n\times n}$ is not a subspace, as the sum of two matrices in $\mathcal{M}^{\#}_{n\times n}$ may be an element of $\mathcal{M}^{\#}_{k\times k}$ for some $k < n$, by the fact that the entries belong to $\mathbb{Z}_p$ and the sum is computed modulo *p*. We then may define:

$$\mathcal{M}\_{n \times n}(\mathbb{Z}\_p) = \mathcal{M}\_{n \times n} := \bigcup\_{1 \le k \le n} \mathcal{M}\_{k \times k}^\# \,. \tag{4}$$

which is a vector space of infinite matrices over $\mathbb{Z}_p$ and a subset of $\mathcal{M}_0$, defined in such a way that the decomposition is a partition, and we have,

$$\mathcal{M}\_0 = \bigcup\_{n \ge 1} \mathcal{M}\_{n \times n}(\mathbb{Z}\_p) \,. \tag{5}$$

We now introduce a sequence of infinite matrices associated with a given matrix substitution map. This sequence will be obtained by operating substitutions either on the finite matrix corresponding to the leading principal matrix of the infinite matrix or, directly, on the infinite matrix.

**Definition 1** (Matrix substitution map)**.** *The matrix substitution map associated with matrix substitution rules is defined in the following sequence of steps.*


*Given a global substitution rule, that is, an assignment of a matrix $\sigma_k$ of fixed order d to each $k \in \mathbb{Z}_p$, we first define the map σ by:*

$$\forall j \in \mathbb{Z}_p\quad \sigma(j) = \sum_{k=0}^{p-1} \sigma_k\, \mathbb{1}_{\{k\}}(j)\ . \tag{6}$$

*We now have an associated finite matrix substitution map denoted by* Φ<∞*<sup>σ</sup> defined by:*

$$\forall A = \left[a_{ij}\right]_{1 \le i,j \le n} \in \mathcal{M}^{<\infty}_{n \times n}\quad \Phi^{<\infty}_{\sigma}(A) = \left[\sigma(a_{ij})\right]_{1 \le i,j \le n} \in \mathcal{M}^{<\infty}_{d \cdot n \times d \cdot n}\ . \tag{7}$$


$$\forall m \ge 0 \quad M_{m+1} = \Phi^{<\infty}_{\sigma}(M_m)\ ,\ M_0 \in \mathcal{M}^{<\infty}_{n \times n}\ ;\qquad M_{m+1} = \Phi_{\sigma}(M_m)\ ,\ M_0 \in \mathcal{M}_{+\infty}\ . \tag{8}$$

**Remark 3** (A substantiation for operating on finite order matrices)**.** *The procedure of applying matrix substitutions to the leading principal matrix of the infinite matrices is designed to overcome the restriction of having σ*0 *always equal to the null matrix, with only* $0 \in \mathbb{Z}_p$ *entries.*

**Remark 4** (Generalisations and open problems)**.** *It is possible to generalize this procedure in several ways. For instance, we could have two different matrix substitution maps applied successively. There are several interesting problems under the perspective of this setting.*


#### *3.2. Fixed Points of Matrix Substitution Maps*

In this Section we consider the existence of fixed points of matrix substitution maps, both for matrices in $\mathcal{M}_{+\infty}$ and in $\mathcal{M}_0$.

3.2.1. Fixed Points for Matrix Substitution Maps over Infinite Matrices

Let us first deal with fixed points in $\mathcal{M}_{+\infty}$ (see the definition in Formula (3)) of a linear matrix substitution map $\Phi_\sigma$. We consider the definition of a matrix substitution map given in Definition 1 for matrices in the space of infinite matrices $\mathcal{M}_{+\infty}$. For infinite matrices we will show that a matrix substitution map defined on $\mathcal{M}_{+\infty}$ may be seen as a usual substitution of constant length on a finite set in the sense of ([3], p. 87).

**Theorem 2** (On the existence of fixed points for infinite matrices)**.** *Let $\sigma : \mathbb{Z}_p \to \mathcal{M}^{<\infty}_{d\times d}(\mathbb{Z}_p)$ be a global substitution taking values in a space of finite matrices of order d with entries in $\mathbb{Z}_p$, and let $\Phi_\sigma$ be the associated matrix substitution map defined on $\mathcal{M}_{+\infty}$. Then, there exist an integer ρ and $M \in \mathcal{M}_{+\infty}$ such that,*

$$M = \Phi^{\rho}_{\sigma}(M) := \underbrace{\Phi_{\sigma} \circ \Phi_{\sigma} \circ \cdots \circ \Phi_{\sigma}}_{\rho \text{ times}}(M)\ ,$$

*that is, M is a fixed point of the map $\Phi^{\rho}_{\sigma}$ defined on $\mathcal{M}_{+\infty}$.*

**Proof.** We will show that to each matrix substitution map there corresponds a unique substitution map in the usual sense, and then we will apply a well-known result that guarantees the existence of fixed points for usual substitution maps (see [3], pp. 87–88). We first observe that, given $s = [s_{ij}]_{1\le i,j\le d}$, a $d \times d$ matrix with entries in $\mathbb{Z}_p$, we have an enumeration of these entries given by $(s_k)_{k=1,\dots,d^2}$ with:

$$s_{ij} = s_{(i-1)d+j} = s_k\ .$$

This type of enumeration of a finite matrix will be applied to the matrices of the substitutions $\sigma_k$ in order to convert each matrix $\sigma_k$ into a *word* of length $d^2$ constituted by letters taken from $\mathbb{Z}_p$. The reversion of this enumeration works as follows. Given a finite word having $d^2$ letters, we associate to it the $d \times d$ square matrix having as its first row the first *d* letters of the word, as its second row the letters of order $d+1$ to $2d$, and so on and so forth. It is clear that applying the enumeration and then the reversion gives the initial matrix.
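Both enumerations used in this proof, and their reversions, are concrete bijections that can be written down directly; a small sketch (our own naming, with the anti-diagonal index formula from the proof):

```python
def mat_to_word(s):
    # first enumeration: s_ij -> s_{(i-1)d+j}, i.e. row-major flattening
    return [x for row in s for x in row]

def word_to_mat(w, d):
    # reversion: row k of the matrix is the k-th group of d letters
    return [w[i * d:(i + 1) * d] for i in range(d)]

def diag_index(i, j):
    # second enumeration: position of entry (i, j), i, j >= 1, when the
    # infinite matrix is read along anti-diagonals in the up-down direction
    return (i + j - 1) * (i + j - 2) // 2 + i

m = [[0, 1], [2, 0]]
assert word_to_mat(mat_to_word(m), 2) == m  # reversion undoes enumeration
assert [diag_index(1, 1), diag_index(1, 2), diag_index(2, 1)] == [1, 2, 3]
```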

Next, we have that, given an infinite matrix $M = [m_{ij}]_{i,j\ge1}$ with entries in $\mathbb{Z}_p$, we have an enumeration of these entries given by $(\tilde m_l)_{l\ge1}$ with:

$$m_{ij} = \tilde m_{\frac{(i+j-1)(i+j-2)}{2}+i} = \tilde m_l\ .$$

This second type of enumeration will be applied to convert an infinite matrix with entries in $\mathbb{Z}_p$ into an infinite *word*. Again, let us detail how the reversion of this enumeration process works. Take an infinite word and consider the associated infinite matrix as follows: the first letter of the word is the first entry of the matrix; the second and third letters of the word give the first anti-diagonal, just below the first entry, in the up–down direction; the fourth, fifth, and sixth letters of the word give the second anti-diagonal, just below the first one, in the up–down direction, and so on and so forth. It is clear also that applying the second enumeration and then this reversion process gives the initial matrix. Now, take the global matrix substitution rule σ that replaces each $k \in \mathbb{Z}_p$ by the $d \times d$ matrix $\sigma_k$. Consider the associated words $\tilde\sigma_k$ with letters in $\mathbb{Z}_p$ obtained by applying the first enumeration to the matrices $\sigma_k$. Take an infinite matrix *M* with entries in $\mathbb{Z}_p$ and apply the second enumeration rule to *M* to obtain an infinite word $\tilde M = (\tilde m_l)_{l\ge1}$; we may define first a usual substitution rule $\tilde\sigma$ on $\mathbb{Z}_p$ by $\tilde\sigma(k) = \tilde\sigma_k$, and also a usual word substitution map $\tilde\Phi_{\tilde\sigma}$ on the set of infinite words built with letters in $\mathbb{Z}_p$ by:

$$\tilde\Phi_{\tilde\sigma}(\tilde M) = (\tilde\sigma(\tilde m_l))_{l\ge1}\ ,$$

which is the infinite word obtained from the infinite word $\tilde M$ by replacing each one of its letters $k \in \mathbb{Z}_p$ by the corresponding word $\tilde\sigma_k$. Recall Proposition V.1 in ([3], p. 88), which guarantees the existence of some infinite word $\tilde M$ and some integer ρ such that:

$$\tilde\Phi^{\rho}_{\tilde\sigma}(\tilde M) = \underbrace{\tilde\Phi_{\tilde\sigma} \circ \tilde\Phi_{\tilde\sigma} \circ \cdots \circ \tilde\Phi_{\tilde\sigma}}_{\rho \text{ times}}(\tilde M) = \tilde M\ ,$$

and consider the infinite matrix *M* such that the second type of enumeration applied to it returns $\tilde M$. It is clear that if we apply the second enumeration process to $\Phi^{\rho}_{\sigma}(M)$ we obtain $\tilde\Phi^{\rho}_{\tilde\sigma}(\tilde M)$, which is equal to $\tilde M$, and by reverting the enumeration process on $\tilde M$ we finally obtain *M*, that is:

$$\Phi^{\rho}_{\sigma}(M) = M\ ,$$

as stated above.
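The word-level mechanism used in the proof can be watched in action: when the word $\tilde\sigma(0)$ begins with the letter 0, every iterate of the word substitution is a prefix of the next one, so the iterates converge to an infinite fixed word. A toy sketch with an invented two-letter rule (not one of the substitutions of this paper):

```python
def word_sub(rule, w):
    # constant-length substitution: replace each letter by its word, concatenated
    return [c for letter in w for c in rule[letter]]

rule = {0: [0, 1], 1: [1, 0]}  # toy rule; sigma(0) begins with the letter 0
w = [0]
for _ in range(5):
    w = word_sub(rule, w)

# each iterate is a prefix of the next, so the limit word is a fixed point
assert word_sub(rule, w)[:len(w)] == w
```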

3.2.2. Fixed Points for Affine Matrix Substitution Maps Defined over Finite Matrices

We can obtain finite dimensional fixed points of matrix substitution maps by applying Theorem 2.

**Definition 2** (Generalised fixed points for a finite matrix substitution map)**.** *Let us consider a given integer n* ≥ 1*. The matrix $M \in \mathcal{M}^{<\infty}_{n\times n}(\mathbb{Z}_p)$ (see Definition 1) is a finite matrix fixed point of the matrix substitution map $\Phi^{<\infty}_{\sigma}$ if and only if there exists an integer ρ* ≥ 1 *such that the leading principal part of order n of $(\Phi^{<\infty}_{\sigma})^{\rho}(M)$ is equal to M.*

**Proposition 1.** *For any integer n* ≥ 2 *and a given matrix substitution map $\Phi^{<\infty}_{\sigma}$ there exist fixed points in the sense of Definition 2.*

**Proof.** We only have to apply Theorem 2 in order to obtain a point $M \in \mathcal{M}_{+\infty}$ that is fixed by $\Phi^{\rho}_{\sigma}$ for some ρ, and then to consider the leading principal matrix of order *n* of *M*. We obtain that $(\Phi^{<\infty}_{\sigma})^{\rho}(M) \in \mathcal{M}^{<\infty}_{nd^{\rho}\times nd^{\rho}}(\mathbb{Z}_p)$ and, since the leading principal part of $\Phi^{\rho}_{\sigma}(M)$ of order $nd^{\rho}$ is equal to the finite matrix $(\Phi^{<\infty}_{\sigma})^{\rho}(M)$, we will have that the leading principal part of order *n* of $(\Phi^{<\infty}_{\sigma})^{\rho}(M)$ is equal to *M*.

We will next pursue the goal of obtaining fixed points of matrix substitution maps in an algorithmic way, that is, by dealing with finite matrices. Let us now introduce topological structures over the spaces of matrices defined in Section 3. In order to define semi-norms over $\mathcal{M}_{n\times n}$, a space we may identify with the space of finite matrices of order *n* over the field $\mathbb{Z}_p = \mathbb{Z}/p\mathbb{Z}$, we will consider the trivial absolute value $|\cdot|_p$ (see [37], pp. 197–198), given by:

$$\forall k \in \mathbb{Z}\_p \ \left| k \right|\_p = \begin{cases} 0 & \text{if } k = 0 \\ 1 & \text{if } k \neq 0 \end{cases}.$$

If $\mathbb{Z}_p$ is considered as a vector space over itself then, due to the properties of an absolute value over a field, we have that $|\cdot|_p$ may be considered as a norm over the vector space $\mathbb{Z}_p$. For $M \in \mathcal{M}_{n\times n}(\mathbb{Z}_p)$, let the modified sum semi-norm be given, for $m \ge 1$, by:

$$\left\|\left\|M\right\|\right\|\_{m} := \frac{1}{m^2} \sum\_{1 \le i,j \le m} \left|a\_{ij}\right|\_p \le 1\,\,. \tag{9}$$

Essentially, $\|M\|_m$ counts the proportion of nonzero elements in the leading principal matrix of order *m* of *M*. We observe that, with *m* the order of the semi-norm and *n* the order of the matrix, as $m > n$ grows, $\|M\|_m$ will tend to zero. $\|\cdot\|_m$ is a semi-norm as the proportion of nonzero entries of the sum of two matrices, with entries in the field $\mathbb{Z}_p$, is at most the sum of the proportions for each matrix. As a consequence of the decomposition of $\mathcal{M}_{n\times n}(\mathbb{Z}_p)$ in Formula (4), we have that:

$$\|\|M\|\|\_{\left[n\right]} = \|\|M\|\|\_{\mathcal{M}\_{n \times n}(\mathbb{Z}\_p)} := \frac{1}{n^2} \sum\_{1 \le i, j \le n} |a\_{ij}|\_p \le 1 \,\,\,\,\tag{10}$$

is a norm over $\mathcal{M}_{n\times n}(\mathbb{Z}_p)$ and, with the norm $\|\cdot\|_{[n]}$, the space of matrices $\mathcal{M}_{n\times n}(\mathbb{Z}_p)$ is, obviously, a Fréchet space. Now, let $j : \mathcal{M}_{n\times n} \to \mathcal{M}_{(n+1)\times(n+1)}$ be the natural injection, which is well defined taking into account Formula (4). Since, for $M \in \mathcal{M}_{n\times n}$, we have $|a_{ij}|_p = 0$ whenever $i = n+1$ or $j = n+1$, we then have,

$$\begin{split} \left\lVert j(M) \right\rVert\_{\left[n+1\right]} &= \frac{1}{(n+1)^2} \sum\_{1 \le i,j \le n+1} \left| a\_{ij} \right|\_p \\ &\le \frac{1}{(n+1)^2} \sum\_{1 \le i,j \le n} \left| a\_{ij} \right|\_p + \frac{1}{(n+1)^2} \sum\_{i=n+1 \lor j=n+1} \left| a\_{ij} \right|\_p \\ &\le \frac{1}{n^2} \sum\_{1 \le i,j \le n} \left| a\_{ij} \right|\_p = \left\lVert M \right\rVert\_{\left[n\right]} .\end{split} \tag{11}$$

As a consequence, *j* maps $(\mathcal{M}_{n\times n}, \|\cdot\|_{[n]})$ continuously into $(\mathcal{M}_{(n+1)\times(n+1)}, \|\cdot\|_{[n+1]})$. Furthermore, we may then consider over $\mathcal{M}_0$ the inductive topology generated by the family of Fréchet spaces $(\mathcal{M}_{n\times n}, \|\cdot\|_{[n]})_{n\ge1}$ (see ([38], pp. 53–65), ([39], pp. 57–60), or ([40], pp. 222–225)).
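The semi-norm (9) and the monotonicity estimate (11) are easy to check numerically; a minimal sketch with function names of our own choosing:

```python
def seminorm(M, m):
    # ||M||_[m]: proportion of nonzero entries (trivial absolute value on Z_p)
    # in the leading principal m x m block; entries outside M count as zero,
    # which models the zero-padding natural injection j
    total = sum(1 for i in range(m) for j in range(m)
                if i < len(M) and j < len(M) and M[i][j] != 0)
    return total / m ** 2

M = [[1, 0, 2], [0, 1, 0], [2, 0, 1]]  # a matrix of exact order 3 over Z_3
# the injection pads with zeros, so the norm can only decrease, as in (11)
assert seminorm(M, 4) <= seminorm(M, 3)
```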

**Remark 5** (On the topology of the space M0)**.** *Let τ be this topology over* M0*. As a consequence of the well known results in the theory of LF spaces, we have that:*


**Remark 6** (A comparable topology)**.** *If we consider over* M+<sup>∞</sup> *the family of semi-norms* (*sm*)*m*≥1*, given by:*

$$s_m\left(\left[a_{ij}\right]_{i,j\ge1}\right) := \sup_{n\le m} \frac{1}{n^2} \sum_{1\le i,j\le n} \left|a_{ij}\right|_p\ , \tag{12}$$

*we have that* $\mathcal{M}_{+\infty}$ *is a Fréchet space (see [38], p. 64 for a proof of this result), that* $(\mathcal{M}_0, \tau)$ *embeds continuously in* $\mathcal{M}_{+\infty}$*, and that the closure of* $(\mathcal{M}_0, \tau)$ *is* $\mathcal{M}_{+\infty}$*.*

Now, let us consider $M_\sigma \equiv (M_n)_{n\ge0}$ with $M_{n+1} = \Phi_\sigma(M_n)$. Our first goal is to study the contraction properties of $\Phi_\sigma$ over $\mathcal{M}_0$. The second goal is to extend $\Phi_\sigma$ to $\mathcal{M}_{+\infty}$, also as a contraction. This allows us to identify an invariant set. For that purpose we have to identify conditions under which $\Phi_\sigma$ is linear, or affine, over $\mathcal{M}_0$.

**Definition 3** (Linear matrix substitutions)**.** *The matrix substitution map $\Phi_\sigma$ (see Formulas* (6)*–*(8)*) is defined to be a linear matrix substitution map over* $\mathcal{M}_0$ *iff for all $k, k' \in \mathbb{Z}_p$ we have that:*

$$
\sigma\_k + \sigma\_{k'} = \sigma\_{(k+k' \text{ mod } p)} \quad \text{and} \quad k' \cdot \sigma\_k = \sigma\_{(k' \cdot k \text{ mod } p)} \tag{13}
$$

**Remark 7** (A substantiation of Definition 3)**.** *With $k + k' \in \mathbb{Z}_p$ and $k' \cdot k \in \mathbb{Z}_p$ we will obviously have that,*

$$\begin{split} \Phi_{\sigma}(M+N) &= \left[ \sigma\left(a_{ij} + b_{ij}\right) \right]_{1 \le i,j \le n} = \left[ \sigma\left(a_{ij}\right) \right]_{1 \le i,j \le n} + \left[ \sigma\left(b_{ij}\right) \right]_{1 \le i,j \le n} \\ &= \Phi_{\sigma}(M) + \Phi_{\sigma}(N)\ . \end{split}$$

*In fact, for the sum property—as for the product property the justification is similar—we have by definition,*

$$\left[\sigma\left(a_{ij}\right) = \sigma_k \text{ iff } a_{ij} = k\right] \text{ and } \left[\sigma\left(b_{ij}\right) = \sigma_{k'} \text{ iff } b_{ij} = k'\right]\ ,$$

*and so,*

$$\sigma(a_{ij}) + \sigma(b_{ij}) = \sigma_k + \sigma_{k'} = \sigma_{(k+k' \bmod p)} = \sigma(a_{ij} + b_{ij}) \text{ iff } a_{ij} + b_{ij} = (k + k' \bmod p)\ .$$

**Remark 8** (A consequence of Definition 3)**.** *Condition* (13) *for a matrix substitution to be linear implies that $\sigma_0 = \mathbf{0}$, the matrix with all entries equal to $0 \in \mathbb{Z}_p$, because we should have for all $k \in \{0, 1, 2, \dots, p-1\}$ that $\sigma_0 + \sigma_k = \sigma_k$.*

**Remark 9** (Examples of linear matrix global substitution rules)**.** *A first example of a linear matrix substitution in $\mathbb{Z}_3$ is given by:*

$$\sigma_0 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}\quad \sigma_1 = \begin{pmatrix} 0 & 2 \\ 1 & 1 \end{pmatrix}\quad \sigma_2 = \begin{pmatrix} 0 & 1 \\ 2 & 2 \end{pmatrix}\ .$$

*Let us return to the example of Section 2.2. We observe that:*

$$\sigma_2 + \sigma_1\ (\bmod\ 3) = \sigma_{(2+1 \bmod 3)} = \sigma_0 = \mathbf{0}_{3 \times 3}\ ,$$

*thus showing that the substitution is a linear matrix substitution. A linear matrix substitution is essentially determined by its σ*1 *substitution, and so every linear matrix substitution derives from the Kronecker power construction of Section 2.2 with initial matrix equal to σ*1*. We stress that not all matrix substitutions are linear, as the first example in Section 2.1 shows. In fact, with the notations and definitions of that first example, we have that:*

$$(\sigma_1 + \sigma_1)_{(\bmod\ 3)} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 2 \end{pmatrix}\quad (\sigma_2 - \sigma_0)_{(\bmod\ 3)} = \begin{pmatrix} 1 & 1 & 0 \\ 2 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}\ ,$$

*and $(\sigma_2 - \sigma_0 \bmod 3) \ne (\sigma_1 + \sigma_1 \bmod 3)$, thus showing that the substitution is not linear.*
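Condition (13) involves finitely many identities and can be checked mechanically. The sketch below (with our own naming) confirms that the Kronecker example of Section 2.2 is linear while the first example of Section 2.1 is not:

```python
def add_mod(A, B, p):
    # entrywise sum of two equally sized matrices, modulo p
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale_mod(c, A, p):
    # entrywise scalar multiple of a matrix, modulo p
    return [[(c * a) % p for a in row] for row in A]

def is_linear(sig, p):
    # check condition (13) for all pairs of letters, entrywise modulo p
    return all(add_mod(sig[k], sig[q], p) == sig[(k + q) % p]
               and scale_mod(q, sig[k], p) == sig[(q * k) % p]
               for k in range(p) for q in range(p))

R0 = [[2, 1, 0], [0, 1, 1], [1, 0, 2]]
kron_sig = [scale_mod(k, R0, 3) for k in range(3)]       # Section 2.2
first_sig = [[[0, 1, 2], [1, 1, 2], [2, 0, 1]],          # Section 2.1
             [[1, 0, 0], [0, 2, 0], [1, 0, 1]],
             [[1, 2, 2], [0, 1, 2], [0, 0, 1]]]

assert is_linear(kron_sig, 3) and not is_linear(first_sig, 3)
```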

**Remark 10** (On the contraction character of a matrix substitution map)**.** *Let us suppose that we have some matrix with constant entries, for instance:*

$$M = \left[ a_{ij} \right]_{i,j \ge 1} \in \mathcal{M}_{n \times n} \text{ with } a_{ij} \equiv p-1 \text{ for } 1 \le i,j \le n\,.$$

*Then, with the trivial absolute value over $\mathbb{Z}_p$,*

$$\|M\|_{[n]} = \frac{1}{n^2} \sum_{1 \le i,j \le n} |a_{ij}|_p = \frac{1}{n^2} \sum_{1 \le i,j \le n} 1 = 1\,.$$

*Now suppose, in the worst-case scenario, that $\sigma_{p-1} \in \mathcal{M}^{<\infty}_{d\times d}$ is a matrix with all its entries, except one, equal to $p-1$, the exception being* 0*. We now have as a consequence that all the entries of the leading principal matrix of order $d \cdot n$ of $M_{n+1} = \Phi_\sigma(M_n)$, with $M_n = M$, will be equal to $p-1$, except for $n^2$ entries that will be equal to* 0*. It then follows that,*

$$\begin{aligned} \|M_{n+1}\|_{[d\cdot n]} &= \|\Phi_{\sigma}(M_n)\|_{[d\cdot n]} = \frac{1}{(d\cdot n)^{2}} \sum_{1 \le i,j \le d\cdot n} \left|a_{ij}\right|_{p} \\ &= \frac{1}{(d\cdot n)^{2}} \left[(d\cdot n)^{2} - n^{2}\right] = 1 - \frac{1}{d^{2}} = \left(1 - \frac{1}{d^{2}}\right) \|M_n\|_{[n]} \end{aligned}$$

*since $\|M_n\|_{[n]} = 1$. This example shows that the contraction properties of $\Phi_\sigma$ depend on the proportion of zeros vis-à-vis the nonzero entries of the substitutions.*

**Proposition 2** (Linear matrix substitutions that are contractions)**.** *Let $\Phi_\sigma$ be a linear matrix substitution map associated with a global substitution rule $\sigma$ such that the minimum number of zeros in each $\sigma_k$, for $k \in \{1, \dots, p-1\}$, is $r$ with $1 \le r < d^2$. We recall that $\sigma_0$ is the square matrix with $d^2$ entries all equal to $0 \in \mathbb{F}_p$. Then, the map $\Phi_\sigma$ is a contraction from $\mathcal{M}_{n\times n}$ into $\mathcal{M}_{n\cdot d\times n\cdot d}$ for every $n \ge 1$.*

**Proof.** Take a matrix $A \in \mathcal{M}_{n\times n}$ such that the number of zero entries in the leading principal matrix of order $n$ of $A$ is $s$, with $0 \le s < n^2$. The case where $A$ is a null matrix is irrelevant because, in this case, $\Phi_\sigma(A)$ is the null matrix. In the leading principal matrix of order $nd$ of $\Phi_\sigma(A)$ there will be $sd^2$ zero entries due to the substitution of each zero in $A$ by the $d^2$ zeros of the matrix $\sigma_0$, which is a matrix of order $d$. Now, there are $n^2 - s$ entries of $A$ which are different from zero, and to each of these non-null entries there correspond at least $r$ zero entries in $\Phi_\sigma(A)$. As a consequence, the total number of zero entries in $\Phi_\sigma(A)$ is bounded from below by $sd^2 + (n^2 - s)r$. As such, the proportion of nonzero elements in $\Phi_\sigma(A)$ has the following upper bound:

$$\|\Phi_{\sigma}(A)\|_{[d\cdot n]} \le 1 - \frac{sd^2 + (n^2 - s)r}{n^2 d^2} = \left(1 - \frac{s}{n^2}\right)\left(1 - \frac{r}{d^2}\right) = \left(1 - \frac{r}{d^2}\right) \|A\|_{[n]}\ ,\tag{14}$$

and so, Φ*<sup>σ</sup>* is a contraction with constant 1 − *r*/*d*<sup>2</sup> < 1.
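The counting argument of the proof can be checked numerically. The sketch below is our own illustration (not from the paper), with an assumed substitution rule over $\mathbb{F}_3$ in which every $\sigma_k$, $k \neq 0$, has exactly $r = 1$ zero entry; the proportion-of-nonzeros norm then satisfies the bound of Formula (14).

```python
import numpy as np

# Illustrative sketch (assumed rule, not from the paper): the matrix
# substitution map Phi_sigma over F_p and the norm ||.||_[n] of Proposition 2.
p, d = 3, 2
rng = np.random.default_rng(0)

# A hypothetical global substitution rule: sigma_0 = 0, and each sigma_k,
# k = 1, ..., p-1, has exactly r = 1 zero entry.
sigma = {0: np.zeros((d, d), dtype=int)}
for k in range(1, p):
    block = rng.integers(1, p, size=(d, d))
    block[0, 0] = 0                  # force r = 1 zero per nonzero symbol
    sigma[k] = block

def phi(M):
    """Replace each entry M[i, j] by the block sigma[M[i, j]] (order n -> n*d)."""
    n = M.shape[0]
    return np.block([[sigma[int(M[i, j])] for j in range(n)] for i in range(n)])

def norm(M):
    """Proportion of nonzero entries: (1/n^2) sum |a_ij|_p, with |x|_p = 1 for x != 0."""
    return np.count_nonzero(M) / M.size

n, r = 4, 1
A = rng.integers(0, p, size=(n, n))
# contraction bound of Formula (14): ||Phi(A)|| <= (1 - r/d^2) ||A||
assert norm(phi(A)) <= (1 - r / d**2) * norm(A) + 1e-12
print(norm(A), norm(phi(A)))
```

With this rule the bound is attained with equality, since every nonzero symbol contributes exactly $d^2 - 1$ nonzero entries.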

**Remark 11** (On the fixed points of linear matrix substitution maps)**.** *We have first to observe that, if $\Phi_\sigma$ is a linear matrix substitution map associated with any global substitution rule $\sigma$, then any null matrix $M = [a_{ij}]_{i,j\ge 1} \in \mathcal{M}_{n\times n}$, that is, such that $a_{ij} \equiv 0 \in \mathbb{F}_p$, is a fixed point of $\Phi_\sigma$. In fact, since $a_{ij} \equiv 0$ and $\sigma_0 = \mathbf{0}$, we have,*

$$
\Phi_{\sigma}(M) = M = \mathbf{0}\ .
$$

*Let us now describe the other, non-null, fixed points of $\Phi_\sigma$, a linear matrix substitution map, belonging to $\mathcal{M}_0$ (see Formula (5)). Consider a non-null matrix $M = [a_{ij}]_{i,j\ge 1} \in \mathcal{M}_{n\times n}$ such that $\Phi_\sigma(M) = M$. By recalling that $\Phi_\sigma(M) \in \mathcal{M}_{nd\times nd}$ and reverting to the leading principal matrices of both $M$ (a finite matrix of order $n$) and $\Phi_\sigma(M)$ (which in turn is a finite matrix of order $nd$), we may conclude that, with $a_{11} \neq 0$, if $a_{11} = k$ for $k \in \{1, 2, \dots, p-1\} \subset \mathbb{F}_p$, then $\sigma(a_{11}) = \sigma_k$ must coincide with the leading principal matrix of order $d$ of $M$. Moreover, we should also have, due to $\Phi_\sigma(M) = M$, that:*

$$\forall (i, j) \neq (1, 1), a\_{ij} \neq a\_{11} \text{ and } \forall l \in \{1, 2, \dots, p - 1\}, l \neq k \Rightarrow \sigma\_l = \mathbf{0} \text{ .} $$

*We may conclude that, given a linear matrix substitution map, either the corresponding global substitution rule has the particular structure described above or there exist no fixed points in $\mathcal{M}_0$ besides the null matrix.*

In order to overcome this limitation on the fixed points of linear matrix substitution maps, we may consider other matrix substitution maps, such as the ones defined next.

**Definition 4** (Affine matrix substitutions)**.** *A matrix substitution map* Φ *is an affine matrix substitution map if there exists a linear global substitution rule σ and a constant global substitution rule ν<sup>c</sup> such that,*

$$
\Phi = \Phi_{\sigma} +_{(\text{mod } p)} \Phi_{\nu_c} = \Phi_{\sigma +_{(\text{mod } p)} \nu_c}\ , \tag{15}
$$

*with* Φ*<sup>σ</sup> the linear matrix substitution map associated with σ and* Φ*ν<sup>c</sup> the constant matrix substitution map associated with νc.*

**Remark 12.** *The important equality in the right-hand side of Formula* (15) *can be verified by resorting to the definition of a matrix substitution map associated with a global substitution rule.*

We will now consider Definition 2 of the generalised fixed points for finite matrix substitution maps. Recall that, according to the definition in Formula (7), we have that $\Phi^{<\infty}_{\sigma}(M) \in \mathcal{M}^{<\infty}_{d\cdot n\times d\cdot n}$, and introduce the following notation,

$$\left. \Phi^{<\infty}_{\sigma+\nu_c}(M) \right|_{n}\ , \tag{16}$$

to denote the leading principal part of order $n$ of $\Phi^{<\infty}_{\sigma+\nu_c}(M)$ for $M \in \mathcal{M}_{n\times n}$.

**Theorem 3** (Fixed points of affine matrix substitutions)**.** *Consider an affine matrix substitution $\Phi_{\sigma+\nu_c} = \Phi_\sigma + \Phi_{\nu_c}$ such that, for the global substitution rule $\sigma$ of the linear part, the minimum number of zeros in each $\sigma_k$, for $k \in \{1, \dots, p-1\}$, is $r$ with $1 \le r < d^2$. Then we have that:*

*1. The map $\Phi_\sigma$ is a contraction from $\mathcal{M}_{n\times n}$ into $\mathcal{M}_{n\cdot d\times n\cdot d}$ for every $n \ge 1$.*

*2. The map $\Phi_{\sigma+\nu_c}$ is a contraction on $\mathcal{M}_0$.*

*3. There exist $s \ge 1$ and $L = [a_{ij}]_{i,j\ge 1} \in \mathcal{M}_{s\times s}$, a generalised fixed point of $\Phi_{\sigma+\nu_c}$, that is, such that $\left.\Phi^{<\infty}_{\sigma+\nu_c}(L)\right|_s = L$.*

**Proof.** The first statement follows from Formula (14) of Proposition 2. Recall that, by Formula (10), we have that $\|M\|_{[n]} = \|M\|_n$ for $M \in \mathcal{M}_{n\times n}$, where, for an integer $m$, $\|\cdot\|_m$ is the semi-norm defined in Formula (9). For $M, N \in \mathcal{M}_{n\times n}$ we have that:

$$\|\Phi_{\sigma+\nu_c}(M) - \Phi_{\sigma+\nu_c}(N)\|_{[d\cdot n]} = \|\Phi_{\sigma}(M-N)\|_{[d\cdot n]} \le \left(1 - \frac{r}{d^2}\right) \|M - N\|_{[n]}\ ,\tag{17}$$

thus showing that the second statement is a consequence of the definition of the inductive topology of $\mathcal{M}_0$ and of a natural definition of a contraction in an *LF* topological vector space. The last statement follows from a usual Banach fixed point theorem type argument, suitably modified. We first show the Cauchy sequence contraction inequality. Let $M \in \mathcal{M}_{n\times n}$ be given and consider the matrix substitution sequence $M_{\sigma+\nu_c} \equiv (M_n)_{n\ge 0}$ defined, by induction, by:

$$\forall n \ge 0 \quad M_{n+1} = \Phi_{\sigma+\nu_c}(M_n) = \Phi^{(n+1)}_{\sigma+\nu_c}(M_0)\ ,$$

with $M_0 = M$ and the iterated application map given, for instance for the second order iteration, by $\Phi^{(2)}_{\sigma+\nu_c} = \Phi_{\sigma+\nu_c} \circ \Phi_{\sigma+\nu_c}$. We now show that $M_{\sigma+\nu_c}$ is a Cauchy sequence in $\mathcal{M}_0$. For that, see ([38], p. 30), we have to show that for every $U$, a neighbourhood of zero in $\mathcal{M}_0$, there exists some integer $m_0 \ge 1$ such that for all $p \ge 1$ and $m \ge m_0$ we have $M_{m+p} - M_m \in U$. We start by using Formula (17) to establish a contraction Cauchy sequence type inequality:

$$\begin{split} \|M_{m+p} - M_m\|_{[d^{m+p}\cdot n]} &\le \sum_{k=1}^{p} \|M_{m+k} - M_{m+k-1}\|_{[d^{m+k}\cdot n]} \\ &= \sum_{k=1}^{p} \left\|\Phi^{(m+k)}_{\sigma+\nu_c}(M_0) - \Phi^{(m+k-1)}_{\sigma+\nu_c}(M_0)\right\|_{[d^{m+k}\cdot n]} \\ &\le \left(\sum_{k=1}^{p} \left(1 - \frac{r}{d^2}\right)^{m+k-1}\right) \left\|\Phi_{\sigma+\nu_c}(M_0) - M_0\right\|_{[d\cdot n]} \\ &\le \left(\frac{d^2}{r}\right)\left(1 - \frac{r}{d^2}\right)^{m} \left\|\Phi_{\sigma+\nu_c}(M_0) - M_0\right\|_{[d\cdot n]}\ . \end{split} \tag{18}$$

Since, by Köthe's theorem, $\mathcal{M}_0$ is a complete space, the conclusion now follows by the following argument. Let us rewrite the inequality (18) in the form:

$$M\_{m+p} - M\_m \in B\_{\left[d^{m+p} \cdot n\right]}(0, c\lambda^m) \; , \tag{19}$$


with $B_{[d^{m+p}\cdot n]}(0, c\lambda^m)$ the ball centred at zero with radius $c\lambda^m$ in $\mathcal{M}_{d^{m+p}\cdot n\times d^{m+p}\cdot n}$, with,

$$c := \frac{d^2}{r} \|\Phi_{\sigma+\nu_c}(M_0) - M_0\|_{[d\cdot n]} \ \text{ and } \ \lambda := 1 - \frac{r}{d^2}\ .$$

Now, let $U$ be a convex neighbourhood of zero in $\mathcal{M}_0$. Then, see ([38], p. 57), for all $n \ge 1$ we have that $U \cap \mathcal{M}_{n\times n}$ is a neighbourhood of zero in $\mathcal{M}_{n\times n}$ and so,

$$\exists \epsilon > 0 \ : \ B_{\mathcal{M}_{n\times n}}(0, \epsilon) \subseteq U \cap \mathcal{M}_{n\times n}\ (\subset U)\ .$$

Let $m_0$ be an integer such that for all $m \ge m_0$ we have that $c\lambda^m < \epsilon$, which is possible as $\lambda < 1$. Now, due to the decreasing properties of the norms of the spaces $\mathcal{M}_{n\times n}$, we have that

$$\forall p \ge 1,\ \forall m \ge m_0 \quad B_{[d^{m+p}\cdot n]}(0, c\lambda^m) \subset B_{\mathcal{M}_{n\times n}}(0,\epsilon) \subset U\ ,$$

thus showing that $M_{\sigma+\nu_c}$ is a Cauchy sequence in $\mathcal{M}_0$. Finally, as a consequence of the properties of the topology of the space $\mathcal{M}_0$, we have that the sequence $M_{\sigma+\nu_c}$ converges in $\mathcal{M}_0$ and so, for some $s \ge 1$, we have that $M_{\sigma+\nu_c}$ converges in $\mathcal{M}_{s\times s}$. As a consequence, there exists $L \in \mathcal{M}_{s\times s}$ such that:

$$\lim_{n\to+\infty} \left\| L - \Phi^{(n)}_{\sigma+\nu_c}(M) \right\|_{[s]} = 0\ . \tag{20}$$

We now observe that:

$$\begin{split} \left\|L - \Phi_{\sigma+\nu_c}(L)\right\|_{[s+1]} &\le \left\|\Phi_{\sigma+\nu_c}(L) - \Phi^{(n+1)}_{\sigma+\nu_c}(M)\right\|_{[s+1]} \\ &\quad + \left\|\Phi^{(n)}_{\sigma+\nu_c}(M) - \Phi^{(n+1)}_{\sigma+\nu_c}(M)\right\|_{[s+1]} \\ &\quad + \left\|L - \Phi^{(n)}_{\sigma+\nu_c}(M)\right\|_{[s+1]}\ . \end{split}$$

Now, by the contraction property of $\Phi_{\sigma+\nu_c}$ shown in Formula (17) and by the canonical injection of $\mathcal{M}_{s+1\times s+1}$ into $\mathcal{M}_{s\times s}$ shown in Formula (11), we have that:

$$\left\|\Phi_{\sigma+\nu_c}(L) - \Phi^{(n+1)}_{\sigma+\nu_c}(M)\right\|_{[s+1]} \le \left\|L - \Phi^{(n)}_{\sigma+\nu_c}(M)\right\|_{[s+1]} \le \left\|L - \Phi^{(n)}_{\sigma+\nu_c}(M)\right\|_{[s]}\ ,$$

and so, by Formulas (18) and (20), we have that $\|L - \Phi_{\sigma+\nu_c}(L)\|_{[s+1]} = 0$, and this implies that $\left.\Phi^{<\infty}_{\sigma+\nu_c}(L)\right|_s = L$, that is, $L$ is a generalised fixed point for the finite matrix substitution map $\Phi_{\sigma+\nu_c}$.
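The iteration used in this proof can be sketched numerically. The rules below are assumptions chosen for illustration (a hypothetical $\sigma$ over $\mathbb{F}_3$ and a constant block $C$ standing for $\nu_c$), not the paper's examples:

```python
import numpy as np

# Sketch (assumed rules) of the Banach-type iteration in the proof of
# Theorem 3: the affine map Phi_{sigma + nu_c} substitutes each entry k by
# the block (sigma_k + C) mod p, where C is the block of the constant rule.
p, d = 3, 2
sigma = {0: np.zeros((d, d), dtype=int),       # sigma_0 = 0
         1: np.array([[0, 1], [2, 1]]),        # each sigma_k, k != 0,
         2: np.array([[0, 2], [1, 2]])}        # has r = 1 zero entry
C = np.array([[1, 0], [0, 1]])                 # hypothetical constant rule nu_c

def phi_affine(M):
    """One application of Phi_{sigma + nu_c}: order n -> order n*d."""
    n = M.shape[0]
    return np.block([[(sigma[int(M[i, j])] + C) % p for j in range(n)]
                     for i in range(n)])

# Iterate from a 1 x 1 seed: the leading principal parts stabilise, which is
# how the generalised fixed point L of Theorem 3 is approximated.
M = np.array([[1]])
for _ in range(6):
    M = phi_affine(M)
L2 = phi_affine(M)
# the leading principal part of order 2 no longer changes under iteration
print(np.array_equal(M[:2, :2], L2[:2, :2]))
```

On this example the order-2 leading principal part stabilises after the first iteration, in line with the geometric decay of Formula (18).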

**Remark 13** (Comparing Theorem 3 and Proposition 2)**.** *Theorem 3 is an improvement of Proposition 2 in two directions. It is a constructive result, since it gives an algorithm to obtain a fixed point, and, while in Proposition 2 the fixed point was a fixed point of some number of iterations of the matrix substitution map, in Theorem 3 the fixed point obtained is a fixed point of only one iteration of the matrix substitution map.*

#### **4. Random Matrices Associated to Structured Matrices**

In this Section we consider structured random matrices derived from the structured matrices considered in Section 2. Our approach to the spectral analysis of random matrices derived from matrices built with a matrix substitution procedure relies on the general theory of random linear operators as exposed in [41]. Other, more recent, approaches to this subject are given in [42–44]. Take a structured matrix built by substitutions (which we will denominate the **skeleton** of the random matrix) and consider the associated random matrix having random variables as entries, such that to each occurrence of a field element $i \in \mathbb{F}_p$ in the skeleton there corresponds a random variable with, at least, the same expected value as a given random variable $X_i$, the same $X_i$ for each $i \in \mathbb{F}_p$. We will also consider the more stringent assumption that the entries of the random matrix corresponding to the same field element $i \in \mathbb{F}_p$ are equi-distributed with the given random variable $X_i$. The random matrix may or may not have independent entries; as usual, the study of the independent case is easier and we will assume independence. For instance, take the matrix $M_1$ in Formula (2), that is:

$$M\_1 = \left[ m\_{i,j}^1 \right]\_{i,j} = \begin{pmatrix} 1 & 2 & 2 & 0 & 1 & 2 & 1 & 0 & 0 \\ 0 & 1 & 2 & 1 & 1 & 2 & 0 & 2 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 & 1 & 2 & 0 & 2 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 2 & 1 & 2 & 2 \\ 0 & 2 & 0 & 1 & 1 & 2 & 0 & 1 & 2 \\ 1 & 0 & 1 & 2 & 0 & 1 & 0 & 0 & 1 \\ \end{pmatrix}$$

This matrix is the skeleton of the following random matrix:

$$M_1(X_\#) = \begin{pmatrix} X_{\#1} & X_{\#2} & X_{\#2} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#1} & X_{\#0} & X_{\#0} \\ X_{\#0} & X_{\#1} & X_{\#2} & X_{\#1} & X_{\#1} & X_{\#2} & X_{\#0} & X_{\#2} & X_{\#0} \\ X_{\#0} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#0} & X_{\#1} & X_{\#1} & X_{\#0} & X_{\#1} \\ X_{\#1} & X_{\#0} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#2} & X_{\#1} & X_{\#0} & X_{\#0} \\ X_{\#0} & X_{\#2} & X_{\#0} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#0} & X_{\#2} & X_{\#0} \\ X_{\#1} & X_{\#0} & X_{\#1} & X_{\#0} & X_{\#0} & X_{\#1} & X_{\#1} & X_{\#0} & X_{\#1} \\ X_{\#1} & X_{\#0} & X_{\#0} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#1} & X_{\#2} & X_{\#2} \\ X_{\#0} & X_{\#2} & X_{\#0} & X_{\#1} & X_{\#1} & X_{\#2} & X_{\#0} & X_{\#1} & X_{\#2} \\ X_{\#1} & X_{\#0} & X_{\#1} & X_{\#2} & X_{\#0} & X_{\#1} & X_{\#0} & X_{\#0} & X_{\#1} \end{pmatrix}\ ,$$

built with the rules detailed above, and so it is a structured random matrix $M_1(X_\#) = \left[X_\#(m^1_{i,j})\right]_{i,j}$ with skeleton $M_1 = \left[m^1_{i,j}\right]_{i,j}$, such that the entries are independent and verify, at least, $\mathbb{E}\left[X_\#(m^1_{i,j})\right] = m^1_{i,j}$.
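Under the more stringent equi-distribution assumption, one concrete choice of entry laws (an assumption for illustration, not prescribed by the text) is $X_i \sim \mathrm{Poisson}(i)$, which verifies $\mathbb{E}[X_i] = i$, with $X_0 \equiv 0$; a realisation of $M_1(X_\#)$ may then be drawn as follows:

```python
import numpy as np

# Sketch (assumed entry laws): one realisation of the structured random
# matrix M1(X_#) from the skeleton M1 of Formula (2), with independent
# entries and X_i ~ Poisson(i), so that E[X_i] = i for i in {0, 1, 2}.
rng = np.random.default_rng(1)
M1 = np.array([[1,2,2,0,1,2,1,0,0],
               [0,1,2,1,1,2,0,2,0],
               [0,0,1,2,0,1,1,0,1],
               [1,0,0,1,2,2,1,0,0],
               [0,2,0,0,1,2,0,2,0],
               [1,0,1,0,0,1,1,0,1],
               [1,0,0,0,1,2,1,2,2],
               [0,2,0,1,1,2,0,1,2],
               [1,0,1,2,0,1,0,0,1]])

def realisation(skeleton):
    """One draw of M1(X_#): entry (i, j) ~ Poisson(m_ij), independently."""
    return rng.poisson(skeleton)

sample = realisation(M1)
# the empirical mean over many draws approximates the skeleton, entrywise
mean = np.mean([realisation(M1) for _ in range(2000)], axis=0)
print(np.max(np.abs(mean - M1)))
```

Note that entries whose skeleton value is $0$ are almost surely $0$ under this choice, since $\mathrm{Poisson}(0)$ is degenerate at zero.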

We will address, in Sections 4.1, 4.2 and 4.4, several questions regarding these structured random matrices, to wit:


#### *4.1. Testing for a Given Matrix Structure in a Realisation of a Stochastic Matrix*

In this Section we will address the problem of testing whether a given observed matrix can be considered a realisation of a random matrix associated with a structured matrix built by a substitution map; this will be performed in a simple case. Let us suppose that we are given a realisation $M = [x_{ij}]_{1\le i,j\le N}$ of a random matrix $\mathcal{X} = [X_{ij}]_{1\le i,j\le N}$ having a structure derived from a matrix substitution map. We will admit the following assumptions.


Consider now, for each $i \in \mathbb{F}_p$, the sequence $(X^i_n)_{1\le n\le N_i}$ formed by the random variables of the random matrix that correspond to the entries in the skeleton with value $i$; we observe that $\sum_{i\in\mathbb{F}_p} N_i = N^2$. We assume furthermore that:

**(C)** For each $i \in \mathbb{F}_p$ we have that $X_i \sim \mathcal{G}_i(\theta)$, that is, the corresponding random variable $X_i$ has a probability law $\mathcal{G}_i(\theta)$ with $\theta \in \mathbf{\Theta}_i \subset \mathbb{R}^q$ a parameter.

Due to hypotheses **(B)** and **(C)**, the sequence $(X^i_n)_{1\le n\le N_i}$ is a sample of the given random variable $X_i$, and so a test procedure, such as a likelihood ratio test, can be applied to determine whether the matrix realisation $M$ comes from a prescribed model of a random matrix with entry distributions verifying assumption **(C)** and with the skeleton given by a fixed point of the substitution map, according to assumption **(A)**.
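The sample-extraction step described above can be sketched as follows; the grouping by skeleton value is from the text, while the Poisson entry model is an assumption for illustration:

```python
import numpy as np
from collections import defaultdict

# Sketch of the sample-extraction step behind the test of Section 4.1:
# entries of an observed matrix are grouped by the value of the corresponding
# skeleton entry, giving, for each i in F_p, a sample of size N_i from the
# law G_i(theta).  The model G_i = Poisson(i) is a hypothetical choice.

def samples_by_symbol(observed, skeleton):
    """Group observed entries by skeleton symbol: one sample per i in F_p."""
    groups = defaultdict(list)
    for idx, k in np.ndenumerate(skeleton):
        groups[int(k)].append(observed[idx])
    return {k: np.array(v) for k, v in groups.items()}

rng = np.random.default_rng(4)
skeleton = rng.integers(0, 3, size=(50, 50))
observed = rng.poisson(skeleton)          # hypothetical model of the entries

groups = samples_by_symbol(observed, skeleton)
# the sample sizes N_i add up to N^2, as observed in the text
assert sum(len(v) for v in groups.values()) == skeleton.size
# each group mean estimates the parameter of G_i (here E[Poisson(i)] = i)
print({k: round(v.mean(), 2) for k, v in groups.items()})
```

A likelihood ratio test would then compare, on each of these samples, the fitted parameter against the value prescribed by the model.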

**Remark 14** (On the detection of a structured random matrix)**.** *Let us suppose that we have an observed large matrix which we suppose to be a realisation of a random matrix with independent centred entries. If the random variables are identically distributed then, by force of the circular law, as quoted in Theorem 1, the spectral distribution of the normalised random matrix should be approximately the uniform distribution on the unit disc; a rejection of such a null hypothesis can be thought of as a strong indication of the existence of some particular structure in the matrix, namely that the entries are not identically distributed. For a formulation of such a statistical test, see [45] and also [46–48] and other references therein. Let us observe that it may be impossible to discern whether structure is present or not; in fact, there are examples showing that, if the coefficient of variation is large, the distribution of eigenvalues of a structured matrix may have a pattern similar to the distribution of eigenvalues of an unstructured matrix.*

#### *4.2. Convergence in Law of Random Structured Matrices Built by Arbitrary Substitutions*

In this section, we show that if we consider a matrix that is a fixed point of a matrix substitution map, then the sequence of random matrices having as skeletons the sequence of iterates, by the matrix substitution map, of a given matrix converges in law to the random matrix that has as skeleton the fixed point of the matrix substitution map. We suppose that we are in the following context, with the following notations.


We recall that if $M_0 \in \mathcal{M}_{+\infty}$ and $M_n = \Phi_\sigma(M_{n-1})$ for $n \ge 1$, then $M_\infty = \lim_{n\to+\infty} M_n$, the convergence taking place in the topology of $\mathcal{M}_{+\infty}$ defined by the increasing sequence of semi-norms given in Formula (12) (see Remark 6).

**Theorem 4** (Convergence in law of random structured matrices)**.** *Suppose that for each $i \in \mathbb{F}_p$ the characteristic function of the random variable $X_i$ is continuous at zero. If, for $n \ge 1$, $M_n(X_\#)$ and $M_\infty(X_\#)$ are the random structured matrices with skeletons $M_n$ and $M_\infty$, respectively, as defined above, then:*

$$\operatorname{Law}(M_n(X_\#)) \underset{n \to +\infty}{\longrightarrow} \operatorname{Law}(M_\infty(X_\#))\ . \tag{21}$$

**Proof.** Before applying Lévy's continuity theorem we clarify the convergence in $\mathcal{M}_{+\infty}$. The increasing family of semi-norms $(s_m)_{m\ge 1}$ defined by:

$$s_m(M) = s_m\left(\left[a_{ij}\right]_{i,j\ge 1}\right) := \sup_{n\le m} \frac{1}{n^2} \sum_{1 \le i,j \le n} \left|a_{ij}\right|_p\ ,$$

gives the maximum proportion of non-null terms in the leading principal parts of dimension less than or equal to $m$ of the matrix $M = [a_{ij}]_{i,j\ge 1}$. Taking $M_0 \in \mathcal{M}_{+\infty}$ and $M_n = \Phi_\sigma(M_{n-1})$ for $n \ge 1$, we have that $M_\infty = \lim_{n\to+\infty} M_n$ in $\mathcal{M}_{+\infty}$ if and only if:

$$\forall m \ge 1\ , \quad \lim_{n\to+\infty} s_m(M_n - M_\infty) = 0\ .$$

If this is the case, taking now $\epsilon < 1/m^2$, for a given $m \ge 1$, if $s_m(M_n - M_\infty) \le \epsilon$ then necessarily the leading principal parts of order $m$ of $M_n$ and $M_\infty$ are equal. This implies that all the entries of the leading principal parts of order $m$ of $M_n(X_\#)$ and $M_\infty(X_\#)$ have the same laws. Now, given an infinite random matrix $M(X_\#) = [a_{ij}(X_\#)]_{i,j\ge 1}$ with skeleton $M = [a_{ij}]_{i,j\ge 1}$, we may consider its characteristic function $\varphi_{M(X_\#)}$, for each $t \in \mathbb{R}$, by:

$$\forall t \in \mathbb{R}\ , \quad \varphi_{M(X_\#)}(t) = \left[\varphi_{a_{ij}(X_\#)}(t)\right]_{i,j\ge 1} = \left[\mathbb{E}\left[e^{it\,a_{ij}(X_\#)}\right]\right]_{i,j\ge 1}\ .$$

For each $t \in \mathbb{R}$, we have that $\varphi_{M_n(X_\#)}(t)$ and $\varphi_{M_\infty(X_\#)}(t)$ are infinite matrices with coefficients in $\mathbb{C}$. We consider on the space $\mathcal{M}_\infty(\mathbb{C})$ of infinite matrices $[z_{ij}]_{i,j\ge 1}$ with coefficients $z_{ij} \in \mathbb{C}$ the topology defined by the increasing family of semi-norms:

$$\rho_m\left(\left[z_{ij}\right]_{i,j\ge 1}\right) = \sup_{n\le m} \sum_{1\le i,j\le n} \left|z_{ij}\right|\ ,$$

and we now show that:

$$\begin{aligned} \lim_{n \to +\infty} \varphi_{M_n(X_\#)}(t) &=_{\mathcal{M}_\infty(\mathbb{C})} \varphi_{M_\infty(X_\#)}(t) \\ &\Leftrightarrow \forall m \ge 1 \quad \lim_{n \to +\infty} \rho_m\left( \varphi_{M_n(X_\#)}(t) - \varphi_{M_\infty(X_\#)}(t) \right) = 0\ , \end{aligned}$$

for every fixed $t \in \mathbb{R}$. It is enough to consider $\epsilon < 1/m^2$ for any fixed $m \ge 1$. As seen above, if $n \ge 1$ is such that $s_m(M_n - M_\infty) \le \epsilon$, then necessarily the leading principal parts of order $m$ of $M_n(X_\#)$ and $M_\infty(X_\#)$ have the same laws, and so the characteristic functions of the entries of the respective leading principal parts of order $m$ also coincide, whence $\rho_m\left(\varphi_{M_n(X_\#)}(t) - \varphi_{M_\infty(X_\#)}(t)\right) = 0$. As a consequence of Lévy's continuity theorem (see ([49], p. 389) or ([50], p. 144)), we have the thesis of the theorem in Formula (21).

#### *4.3. Spectral Analysis of Some Structured Random Matrices*

In this Section we will provide results shedding light on the spectral analysis of some random structured matrices. The first result shows that, under some mild assumptions, a random structured matrix defines, almost surely for each one of its realisations, a Hilbert–Schmidt operator on $l^2(\mathbb{N})$, the Hilbert space of square summable sequences. The two main references needed in this Section are [51,52] for the results on Hilbert–Schmidt operators and [41] for random linear operators.

**Theorem 5** (Random structured matrices with vanishing second moments)**.** *Consider a random structured matrix $M(X(\#)) = \left[X^{m_{ij}}_{ij}\right]_{i,j}$ with skeleton $M = [m_{ij}]_{i,j}$, only verifying $\mathbb{E}[X^{m_{ij}}_{ij}] = m_{ij}$ besides the independence of the entries. Let $(e_i)_{i\ge 1}$ be the canonical orthonormal basis of $l^2(\mathbb{N})$, that is, $e_i = (e^1_i, e^2_i, \dots, e^n_i, \dots)$ with $e^n_i = \delta^n_i$, the Kronecker delta. We assume that the second moments $\mathbb{E}\big[|X^{m_{ij}}_{ij}|^2\big]$ of the random matrix entries go to zero sufficiently fast as $i, j$ grow indefinitely; more precisely:*

$$\sum_{i,j} \mathbb{E}\left[\left|X^{m_{ij}}_{ij}\right|^2\right] = C < +\infty\ . \tag{22}$$

*Then we have that:*

$$\mathbb{P}\left[\sum\_{i,j} \left| \langle M(X(\#))e\_i, e\_j \rangle \right|^2 < +\infty \right] = 1 \,\,. \tag{23}$$

*Moreover, for $\omega \in \Omega$ almost surely, $M(X(\#))(\omega)$ defines a bounded operator in $l^2(\mathbb{N})$ which is also a Hilbert–Schmidt operator in $l^2(\mathbb{N})$.*

**Proof.** The proof essentially relies on Skorohod's sufficient condition for random linear operators in Hilbert space. We observe that:

$$\sum_{i,j} \left| \langle M(X(\#))e_i, e_j \rangle \right|^2 = \sum_{i,j} \left| X^{m_{ij}}_{ij} \right|^2\ .$$

The condition in Formula (22) implies, by Lebesgue's monotone convergence theorem, that:

$$\mathbb{E}\left[\sum_{i,j}\left|X^{m_{ij}}_{ij}\right|^2\right] = \sum_{i,j}\mathbb{E}\left[\left|X^{m_{ij}}_{ij}\right|^2\right] = C < +\infty\ ,$$

and so, by a standard argument, we have the conclusion announced in Formula (23):

$$\mathbb{P}\left[\sum\_{i,j} \left| \langle \mathcal{M}(X(\#))e\_i, e\_j \rangle \right|^2 < +\infty \right] = \mathbb{P}\left[\sum\_{i,j} \left| X\_{ij}^{m\_{ij}} \right|^2 < +\infty \right] = 1 \text{ .}$$

We first have, for $\omega \in \Omega$ almost surely, that the operator $M(\omega) := M(X(\#)(\omega))$ is bounded, since, for all $s \in l^2(\mathbb{N})$, that is, such that $s = (s_i)_{i\ge 1}$ with $\sum_{i\ge 1}|s_i|^2 < +\infty$, we have, by Parseval's equality and by the Cauchy–Schwarz inequality:

$$\begin{split} \|M(\omega)(s)\|^2 = \sum_{j\ge 1}\left|\langle M(\omega)(s), e_j\rangle\right|^2 &= \sum_{j\ge 1}\Big|\sum_{i\ge 1}\langle M(\omega)(e_i), e_j\rangle \langle s, e_i\rangle\Big|^2 \\ &\le \sum_{j\ge 1}\left[\left(\sum_{i\ge 1}\left|\langle M(\omega)(e_i), e_j\rangle\right|^2\right)\left(\sum_{i\ge 1}\left|\langle s, e_i\rangle\right|^2\right)\right] \\ &= \left(\sum_{i,j\ge 1}\left|\langle M(\omega)(e_i), e_j\rangle\right|^2\right)\|s\|^2\ , \end{split} \tag{24}$$

and thus, by Formula (23), the operator $M(\omega)$ is bounded. The final conclusion results from Remark 2 in Skorohod's treatise ([41], p. 8), stating that the condition expressed in Formula (23) is sufficient for the matrix operator defined by the random matrix $M(X(\#))$ to be a Hilbert–Schmidt operator, almost surely. In fact, by Theorem 2 in ([51], p. 34), a sufficient condition for the operator $M(\omega)$ to be a Hilbert–Schmidt operator is that:

$$\sum_{i\ge 1} \left\| M(\omega)(e_i) \right\|^2 = \sum_{j\ge 1}\sum_{i\ge 1} \left| \langle M(\omega)(e_i), e_j \rangle \right|^2 < +\infty\ ,$$

and so the last result announced follows.
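A small numerical sketch of condition (22) and its consequence (with an assumed variance profile $\mathbb{E}\big[|X^{m_{ij}}_{ij}|^2\big] = 1/(ij)^2$, summable, chosen only for illustration): truncations of the squared Hilbert–Schmidt norm remain bounded as the order grows.

```python
import numpy as np

# Sketch of Theorem 5 (assumed second moments, not from the paper): with
# E|X_ij|^2 = 1/(i*j)^2, summable over i, j >= 1, the truncated squared
# Hilbert-Schmidt norm  sum_{i,j} |<M e_i, e_j>|^2 = sum_{i,j} |X_ij|^2
# stays bounded as the truncation order N grows.
rng = np.random.default_rng(2)

def hs_norm_sq(N):
    """Squared Hilbert-Schmidt norm of an N x N truncation."""
    i = np.arange(1, N + 1).reshape(-1, 1)
    j = np.arange(1, N + 1).reshape(1, -1)
    X = rng.normal(0.0, 1.0, size=(N, N)) / (i * j)   # Var X_ij = 1/(i*j)^2
    return np.sum(X**2)

for N in (10, 100, 1000):
    print(N, hs_norm_sq(N))
```

The expected value of the printed quantity converges to $\left(\sum_{i\ge 1} 1/i^2\right)^2 = (\pi^2/6)^2$, consistent with the monotone convergence step of the proof.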

As a consequence of Theorem 5 and of the spectral theorem, we obtain a spectral representation for the kind of structured random matrices studied in this Section.

**Remark 15** (On the definition of eigenvalues of random structured matrices)**.** *Since every Hilbert–Schmidt operator is compact and the random matrix entries are real, the spectral theorem for compact self-adjoint operators (see [52], p. 113) shows that, for $\omega \in \Omega$ almost surely, there is an orthonormal system $(\phi_i(\omega))_{i\ge 1}$ of eigenvectors of $M(\omega)$, with corresponding eigenvalues $(\lambda_i(\omega))_{i\ge 1}$, such that for all $s \in l^2(\mathbb{N})$ we have that:*

$$M(\omega)(s) = \sum_{i \ge 1} \lambda_i(\omega)\, \langle s, \phi_i(\omega)\rangle\, \phi_i(\omega)\ ,$$

*and since the operator M*(*ω*) *is Hilbert–Schmidt we have that:*

$$\begin{split} \sum_{j\ge 1} \left\| M(\omega)(\phi_j(\omega)) \right\|^2 &= \sum_{j\ge 1} \Big\| \sum_{i\ge 1} \lambda_i(\omega)\, \langle \phi_j(\omega), \phi_i(\omega)\rangle\, \phi_i(\omega) \Big\|^2 = \sum_{j\ge 1} \left\| \lambda_j(\omega)\, \phi_j(\omega) \right\|^2 \\ &= \sum_{j\ge 1} \left| \lambda_j(\omega) \right|^2 < +\infty\ . \end{split}$$

*So, the random structured matrices studied in this Section have, almost surely, square summable eigenvalue sequences.*

The next result shows that the image of a nonrandom vector by some of the structured random matrices in this Section is, asymptotically, a Gaussian vector.

**Theorem 6** (Gaussian character of images of nonrandom vectors by some structured random matrices)**.** *Consider a random structured matrix $M(X(\#)) = \left[X^{m_{ij}}_{ij}\right]_{i,j}$ with skeleton $M = [m_{ij}]_{i,j}$, only verifying $\mathbb{E}[X^{m_{ij}}_{ij}] = m_{ij}$ and that $\mathbb{V}\big[X^{m_{ij}}_{ij}\big]$ is bounded, besides the independence of the entries. Suppose that $\mathbf{x} \in l^2(\mathbb{N}) \cap l^1(\mathbb{N})$. Suppose additionally that:*

$$\delta_L := \max_{j \le L} \frac{\mathbb{E}\left[ \left| \langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} \right|^3 \right]}{\mathbb{E}\left[ \left| \langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} \right|^2 \right]} \xrightarrow[L \to +\infty]{} 0\ . \tag{25}$$

*Then M*(*X*(#))(*x*) *is a vector which has components that are asymptotically Gaussian, a property that we summarise in the form:*

$$\sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} \ \sim \ \mathcal{N}(D, C^2) = \mathcal{N}\left( \sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle m_{ij}\ ,\ \sum_{j\ge 1} \left|\langle \mathbf{x}, e_j \rangle\right|^2 \mathbb{V}\left[X^{m_{ij}}_{ij}\right] \right)\ ,$$

*for each component of M*(*X*(#))(*x*)*.*

**Proof.** The proof is an application of Lyapunov's central limit theorem for independent but not identically distributed random variables (see [53], p. 362). We consider the operator $M(X(\#)) : l^2(\mathbb{N}) \to l^2(\mathbb{N})$ and, for notational purposes, that $(e_i)_{i\ge 1}$ is the canonical orthonormal basis of $l^2(\mathbb{N})$ and $(e^*_i)_{i\ge 1}$ its dual basis. With the notation $M(\omega) := M(X(\#)(\omega))$ we have that $M(\omega)(\mathbf{x}) = \sum_{i\ge 1} \langle M(\omega)(\mathbf{x}), e^*_i \rangle e^*_i$ and, if we take a nonrandom vector $\mathbf{x} = \sum_{i\ge 1} \langle \mathbf{x}, e_i\rangle e_i$, we have that $M(\omega)(\mathbf{x}) = \sum_{i\ge 1} \langle \mathbf{x}, e_i\rangle M(\omega)(e_i)$, an expression that may be developed into:

$$\begin{split} M(\omega)(\mathbf{x}) &= \sum_{i\ge 1} \left\langle \sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle M(\omega)(e_j), e^*_i \right\rangle e^*_i = \sum_{i\ge 1} \sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle \left\langle M(\omega)(e_j), e^*_i \right\rangle e^*_i \\ &= \sum_{i\ge 1} \left( \sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} \right) e^*_i\ , \end{split}$$

using the fact that $M(\omega) = \left[X^{m_{ij}}_{ij}\right]_{i,j}$. We observe that, using previous notations, we have that:

$$\mathbb{E}\left[\langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij}\right] = \langle \mathbf{x}, e_j \rangle m_{ij} \ \text{ and } \ \mathbb{V}\left[\langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij}\right] = \left|\langle \mathbf{x}, e_j \rangle\right|^2 \mathbb{V}\left[X^{m_{ij}}_{ij}\right]\ .$$

Now, due to Lyapunov's central limit theorem, the assumption made in Formula (25), and the Berry–Esseen estimate for the rate of convergence, we may write, for a quantity $A = A(L) = \mathcal{O}(L)$:

$$\mathbb{P}\left[\frac{\sum_{j\le L}\left(\langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} - \mathbb{E}\left[\langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij}\right]\right)}{\sqrt{\sum_{j\le L} \left|\langle \mathbf{x}, e_j \rangle\right|^2 \mathbb{V}\left[X^{m_{ij}}_{ij}\right]}} \le x\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt + A\,\delta_L\ .$$

The above expression may be written as:

$$\begin{split} \mathbb{P}\left[\sum_{j\le L} \langle \mathbf{x}, e_j \rangle X^{m_{ij}}_{ij} \le x \sqrt{\sum_{j\le L} \left| \langle \mathbf{x}, e_j \rangle \right|^2 \mathbb{V}\left[X^{m_{ij}}_{ij}\right]} + \sum_{j\le L} \langle \mathbf{x}, e_j \rangle m_{ij} \right] \\ = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt + A\,\delta_L\ . \end{split} \tag{26}$$

Since $\mathbf{x} \in l^2(\mathbb{N})$ and the variances $\mathbb{V}\big[X^{m_{ij}}_{ij}\big]$ of the entries of the matrix $M(\omega)$ are bounded, we have that:

$$\sum_{j\ge 1} \left| \langle \mathbf{x}, e_j \rangle \right|^2 \mathbb{V}\left[X^{m_{ij}}_{ij}\right] = C^2 < +\infty\ .$$

Since $\mathbf{x} \in l^1(\mathbb{N}) \cap l^2(\mathbb{N})$ and the $m_{ij} \in \mathbb{F}_p$ are bounded, we have that $\sum_{j\ge 1} \langle \mathbf{x}, e_j \rangle m_{ij}$ converges absolutely. As a consequence, let:

$$\sum_{j\geq 1} \langle x, e_j \rangle m_{ij} = D \in \mathbb{R} \ .$$

Consider the partial sums $\sum_{j\le L} \langle x, e_j \rangle m_{ij} = D_L$ and $\sum_{j\le L} \left|\langle x, e_j \rangle\right|^2 \mathbb{V}\left[X_{ij}^{m_{ij}}\right] := C_L^2$. We may write Formula (26) in the form:

$$\mathbb{P}\left[\sum_{j\le L} \langle x, e_j \rangle X_{ij}^{m_{ij}} \le x C_L + D_L \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt + A \delta_L$$

which, by a change of variable, amounts to:

$$\mathbb{P}\left[\sum_{j\le L} \langle x, e_j \rangle X_{ij}^{m_{ij}} \le y \right] = \frac{1}{\sqrt{2\pi C_L^2}} \int_{-\infty}^{y} e^{-\frac{\left(u - D_L\right)^2}{2C_L^2}}\, du + A \delta_L \,. \tag{27}$$

Since we have that:

$$\begin{split} \frac{1}{\sqrt{2\pi C^{2}}} \int_{-\infty}^{y} e^{-\frac{(u-D)^{2}}{2C^{2}}}\, du &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{y-D}{C}} e^{-\frac{t^{2}}{2}}\, dt = \lim_{L \to +\infty} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{y-D_L}{C_L}} e^{-\frac{t^{2}}{2}}\, dt \\ &= \lim_{L \to +\infty} \frac{1}{\sqrt{2\pi C_{L}^{2}}} \int_{-\infty}^{y} e^{-\frac{(u-D_{L})^{2}}{2C_{L}^{2}}}\, du \ . \end{split}$$

and from Formula (27), we have immediately:

$$\lim_{L \to +\infty} \mathbb{P}\left[\sum_{j \le L} \langle x, e_j \rangle X_{ij}^{m_{ij}} \le y \right] = \frac{1}{\sqrt{2\pi C^2}} \int_{-\infty}^y e^{-\frac{(u-D)^2}{2C^2}}\, du \ .$$

We may conclude that, on account of the independence of the entries of the random matrix, for every nonrandom $x$ the image $M(X(\#))(x)$ is a random vector whose components $\sum_{j\geq 1} \langle x, e_j \rangle X_{ij}^{m_{ij}}$ are asymptotically Gaussian.
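The limit above can be checked numerically on one truncated component. The sketch below is a hypothetical illustration only: the coordinates $2^{-j}$ for $x$, the symbol means cycling through $\{0,1,2\}$, and the uniformly perturbed entries are all our own assumptions, not the paper's example. It compares the empirical mean and variance of $\sum_{j\le L}\langle x,e_j\rangle X_{ij}^{m_{ij}}$ with the parameters $D_L$ and $C_L^2$ of the Gaussian limit:

```python
import random

random.seed(0)
L = 12
c = [2.0 ** -(j + 1) for j in range(L)]    # coordinates <x, e_j>: an l^1-and-l^2 choice
m = [j % 3 for j in range(L)]              # hypothetical symbol means m_{ij} in {0, 1, 2}

D_L = sum(cj * mj for cj, mj in zip(c, m))       # partial sum of the means
C_L2 = sum(cj ** 2 / 3.0 for cj in c)            # Var[Uniform(-1, 1)] = 1/3 per entry

def sample_component():
    # X_{ij}^{m_{ij}} simulated as m_{ij} plus a centered uniform perturbation
    return sum(cj * (mj + random.uniform(-1.0, 1.0)) for cj, mj in zip(c, m))

n = 20000
draws = [sample_component() for _ in range(n)]
mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
# empirical moments should match the Gaussian limit parameters D_L and C_L^2
print(abs(mean - D_L), abs(var - C_L2))
```

With $n = 20000$ samples the empirical mean and variance agree with $D_L$ and $C_L^2$ to within sampling error.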

**Remark 16.** *The spectral analysis discussed in Remark 15 ensures that a spectral decomposition of the random structured matrix operator M*(*X*(#)(*ω*)) *exists for almost every ω, and so not only the eigenvalues but also the eigenvectors are random variables. Theorem 6 shows that if there exists an almost surely constant eigenvector of the operator M*(*X*(#)(*ω*))*, then the corresponding eigenvalue is Gaussian.*

Whenever the distributions of the three symbols are identical, the effect of having a structured matrix naturally disappears, as a consequence of Theorem 1. With different distributions, the effects of having structured matrices appear.

For the illustrative example in Figure 3 we have chosen,

$$X\_0 \sim \mathcal{N}(0, \sigma^2) \text{ and } X\_1 \sim \mathcal{N}(1, \sigma^2) \text{ and } X\_2 \sim \mathcal{N}(2, \sigma^2)$$

and we took successively larger values for the variance.
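A minimal numerical sketch of this kind of experiment can be written in a few lines. The skeleton below is an arbitrary i.i.d. choice of symbols over $\{0,1,2\}$, purely our own placeholder and not the substitution fixed point used for Figure 3; each entry is then drawn from $X_0$, $X_1$, or $X_2$ according to its symbol:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 200, 0.5
skeleton = rng.integers(0, 3, size=(n, n))        # placeholder skeleton with symbols {0, 1, 2}
# entry (i, j) is drawn from N(m_ij, sigma^2), i.e., from X_0, X_1 or X_2 by symbol
M = skeleton + sigma * rng.standard_normal((n, n))
eigenvalues = np.linalg.eigvals(M)
print(eigenvalues.shape)
```

Plotting `eigenvalues` in the complex plane for successively larger `sigma` reproduces the qualitative transition discussed below.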

**Remark 17** (Identifying a random structured model by spectral analysis)**.** *There are two conclusions that we may draw from a first analysis of Figure 3. The first is that, as expected, for smaller variances the distribution of the eigenvalues of the random matrix is similar to the distribution of the eigenvalues in the plane of the structure matrix, that is, of the skeleton of the random matrix with entries considered in the complex field. The second observation, stressing well-known facts, is that for sufficiently large variance the distribution of the eigenvalues of the random matrix is similar to the distribution of the eigenvalues of a random matrix with independent and identically distributed entries, as in Theorem 1.*

#### *4.4. Modelling: Random Surfaces Associated to Random Matrices*

In this Section we show that to each structured infinite matrix we can, under some hypotheses, associate in a canonical way a random field, for instance defining a random surface over the unit square in the plane. The procedure is akin to the ones used to define the multiplicative chaos of Mandelbrot, Kahane, and Peyrière (see [54]), with the difference that we use products of real-valued random variables instead of non-negative ones.

We first make a technical observation. The general theory of infinite products of random variables of arbitrary sign is considerably more elaborate than the theory of infinite sums of random variables (see, for instance, [55–57]). Nevertheless, when the sequence of products is a (sub- or super-) martingale, convergence results are immediately available. Consider an infinite matrix $M$ which is a fixed point of some matrix substitution map. This assumption is motivated by the idea that an observed matrix structure must have some permanence in time in order to be observed. We will define an infinite random structured matrix with given skeleton $M$ as a matrix $[X_{i,j}]_{i,j\geq 1}$ having as entries independent random variables such that $\mathbb{E}\left[X_{i,j}^{m_{i,j}}\right] = m_{i,j}$.

We now associate with the columns of the random matrix $[X_{i,j}]_{i,j\geq 1}$ the following sequence of random variables $(L_j)_{j\geq 1}$:

$$L_j = L_j(\alpha, \gamma) := \gamma\, \frac{1}{\mathfrak{x}_j^{\alpha}} \sum_{i=1}^{+\infty} \frac{X_{i,j}^{m_{i,j}}}{p^i} \quad \text{with} \quad \mathfrak{x}_j := \sum_{i=1}^{+\infty} \frac{m_{i,j}}{p^i}$$

with $\alpha \geq 1$ and $0 < \gamma \leq 1$. We will also suppose that there are no columns with only zeros in any of the substitution matrices, which implies that there exists $\epsilon > 0$ such that $\mathfrak{x}_j > \epsilon$. The parameters $\alpha$ and $\gamma$ will be chosen to satisfy the conditions stated below. In order to define the random surface we take a partition of $]0,1[^2$ by a sequence of dyadic cells. A representation of a decreasing sequence of dyadic cells in $]0,1[^2$ is given in Figure 4.

**Figure 4.** A decreasing sequence of dyadic cells.
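The normalizing constants $\mathfrak{x}_j$ defined above are easy to compute; with symbols bounded by $p-1$, the series is at most $(p-1)\sum_i p^{-i} = 1$. A small exact-arithmetic check (the all-ones skeleton column and the truncation level are our own illustrative assumptions):

```python
from fractions import Fraction

p = 2
K = 60  # truncation level for the series over i

def x_j(column):
    # x_j = sum_i m_{i,j} / p^i, truncated at len(column) terms, computed exactly
    return sum(Fraction(m_i, p ** i) for i, m_i in enumerate(column, start=1))

# hypothetical skeleton column with all symbols equal to 1 (so m_{i,j} <= p - 1)
column = [1] * K
xj = x_j(column)
# the truncated sum is 1 - 2^{-K}, strictly below the bound 1
print(float(xj))
```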

In order to link the column random variables of the sequence $(L_j)_{j\geq 1}$ to the dyadic cells, we consider each decreasing sequence of dyadic cells:

$$C = (c(i\_1, i\_1i\_2, i\_1i\_2i\_3, \dots, i\_1i\_2i\_3\dots i\_N))\_{N \ge 1}$$

which is uniquely identified by the indexes $i_1,\ i_1i_2,\ i_1i_2i_3,\ \dots,\ i_1i_2i_3\cdots i_N,\ \dots$, with $i_1, i_2, i_3, \dots, i_N, \dots \in \{1, 2, 3, 4\}$. We have the following algorithm to rename the column random variables of the sequence $(L_j)_{j\geq 1} \equiv (L_j(\alpha))_{j\geq 1}$:

$$\begin{array}{ccccc} W\_1 = L\_1 & W\_2 = L\_2 & W\_3 = L\_3 & W\_4 = L\_4\\ W\_{1,1} = L\_5 & W\_{1,2} = L\_6 & W\_{1,3} = L\_7 & W\_{1,4} = L\_8\\ W\_{2,1} = L\_9 & W\_{2,2} = L\_{10} & W\_{2,3} = L\_{11} & W\_{2,4} = L\_{12}\\ W\_{3,1} = L\_{13} & W\_{3,2} = L\_{14} & W\_{3,3} = L\_{15} & W\_{3,4} = L\_{16}\\ W\_{4,1} = L\_{17} & W\_{4,2} = L\_{18} & W\_{4,3} = L\_{19} & W\_{4,4} = L\_{20} \end{array}$$
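The renaming above enumerates the 4-ary address tree level by level, so the index $j$ of the variable $L_j$ assigned to an address $W_{i_1\cdots i_k}$ is computable in closed form. The helper below is our own formula, inferred from the table (the level-$k$ block starts at $1 + (4^k - 4)/3$):

```python
def l_index(address):
    """Map a 4-ary address (i1, ..., ik), each i in {1, 2, 3, 4}, to the index j of L_j."""
    k = len(address)
    start = 1 + (4 ** k - 4) // 3          # first index used at level k: 1, 5, 21, ...
    offset = sum((i - 1) * 4 ** (k - 1 - r) for r, i in enumerate(address))
    return start + offset

# reproduces the table: W_1 = L_1, W_{1,1} = L_5, W_{2,3} = L_11, W_{4,4} = L_20
print(l_index((1,)), l_index((1, 1)), l_index((2, 3)), l_index((4, 4)))
```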

The linking algorithm of the column random variables to the dyadic cells of $[0,1]^2$, in its first and second steps, is as indicated in Figure 5.

**Figure 5.** The placement of the first four random variables: first step (**left**); The placement of the next 16 random variables: second step (**right**).

We now detail the sequence of random variables that gives the height of the random surface. For that purpose we define a sequence of random variables $(M_N)_{N\geq 1}$ uniquely associated with a decreasing sequence of dyadic cells in the following way:

$$M_N = M_N(c(i_1, i_1 i_2, \dots, i_1 i_2 \cdots i_N)) := W_{i_1} \cdot W_{i_1 i_2} \cdot W_{i_1 i_2 i_3} \cdots W_{i_1 i_2 i_3 \cdots i_N} = \prod_{k=1}^N W_{i_1 i_2 i_3 \cdots i_k} \tag{28}$$

observing that $M_N = M_N(c(i_1, i_1i_2, \dots, i_1i_2\cdots i_N))$, with $c(i_1, i_1i_2, \dots, i_1i_2\cdots i_N)$ the finite sequence of dyadic cells that goes until the $N$th step. We further observe that for every $(s,t) \in\, ]0,1[\times]0,1[$ there exists a unique sequence $C(s,t) = (c(i_1, i_1i_2, \dots, i_1i_2\cdots i_N))_{N\geq 1}$ of decreasing dyadic cells such that:

$$\{(s,t)\} = \bigcap_{N \geq 1} c(i_1, i_1i_2, i_1i_2i_3, \dots, i_1i_2i_3\cdots i_N) \ .$$

This decreasing sequence of dyadic cells of a given point allows, under an additional hypothesis, the definition of the random surface via the sequence $(M_N)_{N\geq 1} = (M_N(C(s,t)))_{N\geq 1}$.
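The address of a point in the nested dyadic cells is obtained by reading off binary digits of its coordinates. The sketch below assumes one particular labeling of the four sub-cells, namely $1 + \mathrm{bit}(s) + 2\,\mathrm{bit}(t)$; the paper fixes the actual convention through Figure 5, so this choice is ours:

```python
def dyadic_address(s, t, depth):
    """Address (i1, ..., iN) of the dyadic cells containing (s, t) in ]0,1[^2.

    The labeling of the four sub-cells (here 1 + bit(s) + 2*bit(t)) is an
    assumed convention, not necessarily the one of Figure 5.
    """
    address = []
    for _ in range(depth):
        s, t = 2 * s, 2 * t
        bs, bt = int(s), int(t)       # the next binary digit of each coordinate
        s, t = s - bs, t - bt
        address.append(1 + bs + 2 * bt)
    return address

print(dyadic_address(0.3, 0.7, 3))
```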

Consider the left (negative) tail average for the distribution of $X_{i,j}^{m_{i,j}}$, given by:

$$a_{i,j} := -\int_{-\infty}^{0} \mathfrak{x}\, dF_{X_{i,j}^{m_{i,j}}}(\mathfrak{x}) \ .$$

We have the following result.

**Theorem 7** (Existence of a nontrivial random field associated with a structured random matrix)**.** *Suppose that the following assumptions are verified:*

*(a) The left tail averages verify:*

$$\sum_{i=1}^{\infty} \frac{a_{i,j}}{p^i} \leq m < +\infty$$

*for some constant m.*

*(b) The variances of the random variables $X_{i,j}^{m_{i,j}}$ verify $\mathbb{V}\left[X_{i,j}^{m_{i,j}}\right] = \mathfrak{x}_j^{2\alpha_0} \cdot v_i$, for a certain $\alpha_0 = \alpha_0(m)$ to be determined later, and with $v_i$ such that:*

$$1 < V := \sum\_{i=1}^{\infty} \frac{v\_i}{p^{2i}} < +\infty$$

*Then, there is a combination of the parameters $\alpha$, $\gamma$ such that, for each $(s,t) \in\, ]0,1[\times]0,1[$, the sequence $(M_N)_{N\geq 1} = (M_N(C(s,t)))_{N\geq 1}$ is a supermartingale that converges almost surely to a random variable $X_{(s,t)}$, defining the random field $(X_{(s,t)})_{(s,t)\in]0,1[^2}$, that is:*

$$X\_{(s,t)} := \lim\_{N \to +\infty} M\_N(c(i\_1, i\_1 i\_2, \dots, i\_1 i\_2 \dots i\_N)) \text{ a.s.} \tag{29}$$

*and $\mathbb{E}\left[X_{(s,t)}\right] < +\infty$. Moreover, $\mathbb{V}\left[X_{(s,t)}\right] \geq 1$, that is, the random variable $X_{(s,t)}$ is not constant.*

**Proof.** We first observe that, since $\mathfrak{x}_j \geq \epsilon$, we have:

$$\mathbb{E}\left[\left|L_j(\alpha,\gamma)\right|\right] \leq \frac{\gamma}{\mathfrak{x}_j^{\alpha}} \left(\sum_{i=1}^{+\infty} \frac{\mathbb{E}\left[\left|X_{i,j}^{m_{i,j}}\right|\right]}{p^i}\right) = \frac{\gamma}{\mathfrak{x}_j^{\alpha}} \left(\sum_{i=1}^{+\infty} \frac{m_{i,j} + a_{i,j}}{p^i}\right) \leq \gamma\, \frac{1+m}{\epsilon^{\alpha}} \ .$$

We now choose $\alpha = \alpha_0$ such that $(1+m)/\epsilon^{\alpha_0} \leq 1$. Due to the independence of the random variables $X_{i,j}^{m_{i,j}}$, we have that:

$$\mathbb{V}\left[L_j\right] = \frac{\gamma^2}{\mathfrak{x}_j^{2\alpha_0}} \sum_{i=1}^{+\infty} \frac{\mathbb{V}\left[X_{i,j}^{m_{i,j}}\right]}{p^{2i}} = \frac{\gamma^2}{\mathfrak{x}_j^{2\alpha_0}} \sum_{i=1}^{+\infty} \frac{\mathfrak{x}_j^{2\alpha_0} \cdot v_i}{p^{2i}} = \gamma^2 V \ .$$

We now choose $\gamma = \gamma_0 \leq 1$ such that $\gamma_0^2 V = 1$. The random variables of the sequence $(W_{i_1}, W_{i_1i_2}, W_{i_1i_2i_3}, \dots, W_{i_1i_2i_3\cdots i_N})_{N\geq 1}$ are, in fact, distinct random variables of the sequence $(L_j(\alpha_0, \gamma_0))_{j\geq 1}$ and so are independent. It is well known that, since

$$0 \leq \mathbb{E}\left[L_j(\alpha_0, \gamma_0)\right] = \left|\mathbb{E}\left[L_j(\alpha_0, \gamma_0)\right]\right| \leq \mathbb{E}\left[\left|L_j(\alpha_0, \gamma_0)\right|\right] \leq 1,$$

a sequence such as the one defined by Formula (28) is a supermartingale with respect to its natural filtration (see, for instance, ([58], p. 475)). Due to the independence we have that:

$$\begin{split} \mathbb{E}[|M_N|] &= \mathbb{E}\left[\left|W_{i_1}\right|\right] \cdot \mathbb{E}\left[\left|W_{i_1i_2}\right|\right] \cdot \mathbb{E}\left[\left|W_{i_1i_2i_3}\right|\right] \cdots \mathbb{E}\left[\left|W_{i_1i_2i_3\cdots i_N}\right|\right] \\ &= \prod_{k=1}^{N} \mathbb{E}\left[\left|W_{i_1i_2i_3\cdots i_k}\right|\right] \leq 1 \end{split}$$

that is, $\sup_{N\geq 1} \mathbb{E}[|M_N|] \leq 1$, and so, due to a well-known theorem of Doob (see, for instance, ([58], p. 508)), the first conclusion follows. Using the facts that $\mathbb{V}\left[L_j(\alpha_0, \gamma_0)\right] = 1$ and that the random variables $W_{i_1i_2i_3\cdots i_k}$ are distinct elements of the sequence $(L_j(\alpha_0, \gamma_0))_{j\geq 1}$, and observing that for $k \neq l$ we have that,

$$\begin{aligned} \mathbb{V}[L_k \cdot L_l] &= \mathbb{V}[L_k] \cdot \mathbb{V}[L_l] + \mathbb{V}[L_k] \cdot \mathbb{E}[L_l]^2 + \mathbb{V}[L_l] \cdot \mathbb{E}[L_k]^2 \\ &= 1 + \mathbb{E}[L_l]^2 + \mathbb{E}[L_k]^2 \geq 1 \end{aligned}$$

by induction, we now can state that:

$$\mathbb{V}[M_N] = \mathbb{V}\left[\prod_{k=1}^N W_{i_1i_2i_3\cdots i_k}\right] \geq 1$$

and so the second conclusion also follows.
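The factorization $\mathbb{E}[|M_N|] = \prod_k \mathbb{E}[|W_{i_1\cdots i_k}|]$ used in the proof rests only on the independence of the factors. It can be verified exactly on a toy two-point distribution (our own illustration, not derived from any skeleton) by enumerating all outcomes:

```python
from itertools import product

# toy distribution for the factors W: (value, probability) pairs, an illustration only
values = [(-0.5, 0.5), (1.2, 0.5)]
N = 3

# E[|W|] for a single factor
e_abs_w = sum(abs(v) * p for v, p in values)

# exhaustive E[|M_N|] over all outcomes of N independent factors
e_abs_m = 0.0
for outcome in product(values, repeat=N):
    prob = 1.0
    prod_val = 1.0
    for v, p in outcome:
        prob *= p
        prod_val *= v
    e_abs_m += abs(prod_val) * prob

print(e_abs_m, e_abs_w ** N)
```

Since $\mathbb{E}[|W|] = 0.85 \le 1$ here, the products satisfy $\mathbb{E}[|M_N|] = 0.85^N \le 1$, matching the bound used with Doob's theorem.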

Let us give an idea of a random field built under the hypotheses of Theorem 7. In Figure 6, we present a low-order approximation of the random surface associated with the example introduced by Formula (1) in Section 2.1. The skeleton for this approximation is the matrix $M_7$, a square matrix having around 43 million entries.

**Figure 6.** A low-order approximation of the random surface, built upon the skeleton $M_7$: surface plot (**left**); contour plot (**right**).

**Remark 18** (On the covariance of the random field $(X_{(s,t)})_{(s,t)\in]0,1[^2}$)**.** *Due to the general procedure considered in the construction of the random field, it is possible to determine some interesting results on the covariance. In fact, for two distinct points $(s,t), (s',t') \in\, ]0,1[^2$, consider the corresponding martingale sequences with elements $M_N(C(s,t))$ and $M_{N+P}(C(s',t'))$, with $N, P \geq 1$. Let us suppose that the integer $0 \leq N_0 < N$ is the largest integer such that the points $(s,t)$ and $(s',t')$ both belong to the same dyadic cell. It is then clear that:*

$$\begin{split} & \operatorname{Cov}\left[M_{N}(C(s,t)), M_{N+P}(C(s',t'))\right] \\ &= \mathbb{E}\left[M_{N_{0}}(C(s,t))^{2}\right] \mathbb{E}\left[\prod_{k=N_{0}+1}^{N} W_{i_1i_2i_3\cdots i_k}(C(s,t))\right] \mathbb{E}\left[\prod_{k=N_{0}+1}^{N+P} W_{i_1i_2i_3\cdots i_k}(C(s',t'))\right] \\ &\quad - \mathbb{E}\left[M_{N}(C(s,t))\right] \mathbb{E}\left[M_{N+P}(C(s',t'))\right] . \end{split}$$

*If all the random variables of the sequence $(L_j)_{j\geq 1}$ have mean equal to 1, and hence, necessarily, a second-order absolute moment strictly larger than 1 (for instance, equal to 2), then, again by the Lebesgue convergence theorem, we have that:*

$$\text{Cov}\left[X\_{(s,t)}, X\_{(s',t')}\right] = 2^{N\_0} - 1 \text{ .}$$

*where, as already said, $N_0 \geq 0$ is the largest integer such that the points $(s,t)$ and $(s',t')$ both belong to the same dyadic cell. If the points do not belong to any common dyadic cell (see Figure 5), that is, if $N_0 = 0$, the covariance is null. The closer the points are, the larger the integer $N_0$, and so the larger the covariance.*
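The covariance formula of the remark is easy to evaluate from the dyadic addresses of the two points: $N_0$ is the length of the common address prefix. The helper below bundles the address computation with the formula; the sub-cell labeling convention and the test points are our own illustrative assumptions:

```python
def dyadic_address(s, t, depth):
    # address of (s, t) in the nested dyadic cells; the sub-cell labeling is assumed
    address = []
    for _ in range(depth):
        s, t = 2 * s, 2 * t
        bs, bt = int(s), int(t)
        s, t = s - bs, t - bt
        address.append(1 + bs + 2 * bt)
    return address

def n0(p1, p2, depth=32):
    """Largest N0 such that both points lie in the same dyadic cell of level N0."""
    a, b = dyadic_address(*p1, depth), dyadic_address(*p2, depth)
    k = 0
    while k < depth and a[k] == b[k]:
        k += 1
    return k

def covariance(p1, p2):
    # Cov[X_(s,t), X_(s',t')] = 2^{N0} - 1 under the remark's moment assumptions
    return 2 ** n0(p1, p2) - 1

print(covariance((0.1, 0.1), (0.9, 0.9)), covariance((0.26, 0.26), (0.24, 0.24)))
```

Distant points share no dyadic cell and are uncorrelated, while nearby points share a long prefix and have a large covariance, as stated in the remark.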

#### **5. Conclusions and Future Work**

In this work, we introduced structured random matrices having a skeleton built from a matrix substitution process with entries in a finite field. We showed that the iterated application of a particular kind of matrix substitution generates a sequence of matrices that admits a periodic point, which may be a fixed point, or a fixed point for the sequence of matrix principal parts of a given order. The random matrices with independent entries, having as skeletons matrices derived from this matrix substitution process, have remarkable properties whenever the random variables satisfy some uniform properties. We showed, under adequate hypotheses, that:


A more detailed analysis of the spectral properties of the random matrices introduced here is left for future work. Furthermore, matrices with a high percentage of zeros can be generated by considering special global matrix substitution maps; the detailed properties of these matrices will also be the object of future work. Finally, a problem reciprocal to the one considered in this work is to determine whether a large matrix is a fixed point of some global matrix substitution map. A reasonable conjecture is that for every large matrix there exists a global matrix substitution map admitting a fixed point that is close, in some sense, to the given matrix.

**Author Contributions:** Conceptualization, M.L.E.; methodology, M.L.E.; software, M.L.E.; validation, M.L.E. and N.P.K.; formal analysis, M.L.E. and N.P.K.; investigation, M.L.E. and N.P.K.; resources, M.L.E. and N.P.K.; writing—original draft preparation, M.L.E.; writing—review and editing, M.L.E. and N.P.K.; visualization, M.L.E. and N.P.K.; supervision, M.L.E.; project administration, M.L.E.; funding acquisition, M.L.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** For the first author, this work was partially supported through the project of the Centro de Matemática e Aplicações, UID/MAT/00297/2020, financed by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology). The APC was supported by Fidelidade-Companhia de Seguros, S.A., to which the authors express their warmest acknowledgment.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This work was published with financial support from the New University of Lisbon. The authors express gratitude for the comments, corrections, and questions of the referees, which led to a revised and better version of this work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
