**1. Introduction**

Let $X\_1, X\_2, \ldots$ be a sequence of independent and, for simplicity in this Introduction, identically distributed (i.i.d.) random variables (r.v.s) with $a := \mathbf{E}X\_1 \ne 0$. Let $N$ be a random variable independent of $\{X\_1, X\_2, \ldots\}$ and having the geometric distribution $\mathrm{Geom}(p)$ with parameter $p \in (0,1)$, i.e., $\mathbf{P}(N = n) = p(1-p)^{n-1}$ for $n \in \mathbb{N}$. Denote also by $N\_0 := N - 1$ the shifted geometric r.v. Let $S\_n := \sum\_{k=1}^{n} X\_k$, $n \in \mathbb{N}$, $S\_0 := 0$. The well-known Rényi theorem states that the distribution of a properly normalized geometric random sum $S\_N$ converges weakly to the exponential law as $p$ tends to zero. More precisely,

$$W := \frac{S\_N}{\mathbf{E}S\_N} \xrightarrow{d} \mathcal{E} \text{ as } p \downarrow 0, \quad \text{where } \mathcal{E} \sim \mathrm{Exp}(1) \text{ and } \mathbf{E}S\_N = \mathbf{E}N \cdot \mathbf{E}X\_1 = a/p. \tag{1}$$

Here, the notation $\mathrm{Exp}(\lambda)$ stands for the exponential distribution with density $\lambda e^{-\lambda x}\,\mathbb{1}\_{(0,\infty)}(x)$, $\lambda > 0$. Originally, Rényi proved Equation (1) under the additional assumption of nonnegativity of $\{X\_n\}$. However, one can verify that Equation (1) also holds: (i) for alternating $\{X\_n\}$ (by an alternating r.v. we mean a r.v. that may take values of both signs); and (ii) for

$$W\_0 := \frac{S\_{N\_0}}{\mathbf{E}S\_{N\_0}} = \frac{pS\_{N\_0}}{a(1-p)}$$

in place of $W$ (still without any support assumptions on the distribution of $\{X\_k\}$). This can be done, for example, by showing that the characteristic function (ch.f.) of $W$ (and also of $W\_0$) converges pointwise to that of the exponential distribution.
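The convergence in Equation (1) is easy to observe numerically. The sketch below (our illustration, not part of the original argument; the choice $X\_k \sim$ Uniform$(-0.5, 1.5)$, which takes values of both signs, is arbitrary) simulates $W$ and compares its empirical survival function with that of Exp(1):

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_samples = 0.005, 20_000
a = 0.5  # E X_1 for X ~ Uniform(-0.5, 1.5): alternating summands

# N ~ Geom(p) on {1, 2, ...}
N = rng.geometric(p, size=n_samples)

# draw all summands at once and sum them group by group
x = rng.uniform(-0.5, 1.5, size=N.sum())
offsets = np.concatenate(([0], np.cumsum(N)[:-1]))
S_N = np.add.reduceat(x, offsets)

# W = S_N / E S_N with E S_N = a / p
W = p * S_N / a

# the empirical survival function of W should be close to exp(-t), the Exp(1) tail
for t in (0.5, 1.0, 2.0):
    assert abs((W > t).mean() - np.exp(-t)) < 0.03
```

With $p = 0.005$, the deviation at each checkpoint stays within the Monte Carlo noise plus an approximation error of order $p$, in line with the bounds discussed below.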

The importance of every limit theorem only increases if it is accompanied by the corresponding estimates of the rate of convergence. There are several bounds on the accuracy of approximation in Equation (1), mainly w.r.t. the Kolmogorov (uniform) and *ζ*-metrics, which are cited below. All of them assume additional conditions on the distribution of random summands including the finiteness of higher-order moments.

Recall that both the Kolmogorov and the $\zeta\_s$-metrics are defined as simple probability metrics with $\zeta$-structure (see Section 2 of [1]) between probability distributions (d.f.s $F$, $G$) of r.v.s $X$, $Y$:

$$\zeta\_{\mathcal{H}}(F,G) \equiv \zeta\_{\mathcal{H}}\left(\mathcal{L}(X),\mathcal{L}(Y)\right) \equiv \zeta\_{\mathcal{H}}(X,Y) := \sup\_{h \in \mathcal{H}} \left| \int\_{\mathbb{R}} h \, dF - \int\_{\mathbb{R}} h \, dG \right| \tag{2}$$

for specific classes $\mathcal{H}$ of real Borel functions on $\mathbb{R}$ (to simplify the notation, here and in what follows, we use r.v.s as well as their distributions and d.f.s in the arguments of simple probability metrics interchangeably; this should not cause any misunderstanding). The Kolmogorov metric $\rho$ is obtained with $\mathcal{H} = \left\{\mathbb{1}\_{(-\infty,a)} \mid a \in \mathbb{R}\right\}$, the class of indicators of all open intervals with unbounded left endpoint:

$$\rho(F, G) := \sup\_{x \in \mathbb{R}} |F(x) - G(x)|,$$

while the $\zeta$-metric of order $s > 0$, originally introduced by Zolotarev [2] (see also [3]) as an example of an ideal metric with $\zeta$-structure, is defined as $\zeta\_{\mathcal{H}}$ with $\mathcal{H} = \mathcal{F}\_s^{\infty}$, where

$$\mathcal{F}\_s^{\infty} := \left\{ h \in \mathcal{F}\_s \colon h \text{ is bounded} \right\},$$

$$\mathcal{F}\_s := \left\{ h \colon \mathbb{R} \to \mathbb{R} \colon \left| h^{(m)}(x) - h^{(m)}(y) \right| \le |x - y|^{s - m} \ \forall x, y \in \mathbb{R} \right\} \text{ with } m := \lceil s - 1 \rceil \in \mathbb{N}\_0, \quad s > 0,$$

that is,

$$\zeta\_s(F, G) := \sup\_{h \in \mathcal{F}\_s^{\infty}} \left| \int\_{\mathbb{R}} h \, dF - \int\_{\mathbb{R}} h \, dG \right|. \tag{3}$$

Observe that $h \in \mathcal{F}\_s$ iff $h' \in \mathcal{F}\_{s-1}$, $s > 1$. If $\mathbf{E}|X|^s < \infty$ and $\mathbf{E}|Y|^s < \infty$, then $\zeta\_s(F, G) < \infty$, and the least upper bound w.r.t. $h \in \mathcal{F}\_s^{\infty}$ in Equation (3) may be replaced with that over the wider class $\mathcal{F}\_s$. For further properties of $\zeta\_s$-metrics, we refer to the works in [3,4] and Section 4 of [5].

In the present paper, we focus mostly on the $\zeta\_1$-metric between distributions with finite first moments; under this assumption, the definition of the $\zeta\_1$-metric can be rewritten as

$$\zeta\_1(F,G) = \sup\_{h \in \mathrm{Lip}\_1} \left| \int\_{\mathbb{R}} h \, dF - \int\_{\mathbb{R}} h \, dG \right|, \tag{4}$$

where

$$\mathrm{Lip}\_c := \left\{ h \colon \mathbb{R} \to \mathbb{R} \, \Big| \, |h(x) - h(y)| \le c\,|x - y| \quad \forall x, y \in \mathbb{R} \right\}, \quad c > 0,$$

so that $\mathrm{Lip}\_1 = \mathcal{F}\_1$. It is worth noting that $\zeta\_1$ has several alternative representations. The Kantorovich–Rubinstein theorem states that $\zeta\_1(X,Y)$ is the minimal metric w.r.t. the compound metric $\mathbf{E}|X - Y|$, while the results in [6] imply that the optimal coupling is attained at the comonotonic pair (that is, with $(X,Y) = (F^{-1}(U), G^{-1}(U))$, $U$ having the uniform distribution on $(0,1)$, and $F^{-1}$, $G^{-1}$ being the generalized inverse d.f.s):

$$\zeta\_1(F,G) = \min\_{(X',Y')\colon\ X' \stackrel{d}{=} X,\ Y' \stackrel{d}{=} Y} \mathbf{E}|X'-Y'| = \int\_0^1 \left| F^{-1}(u) - G^{-1}(u) \right| du = \int\_{-\infty}^{\infty} \left| F(x) - G(x) \right| dx. \tag{5}$$

The rightmost representation in Equation (5), as the mean metric between the d.f.s $F$ and $G$, follows from the geometric interpretation. The metric $\zeta\_1$ is also called the Kantorovich or the Wasserstein distance.
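For two samples of equal size, all three representations in Equation (5) can be computed directly: the optimal coupling pairs the sorted values (the comonotonic pair), and the area between the empirical d.f.s gives the same number. A minimal sketch (our illustration; the normal samples are an arbitrary choice):

```python
import numpy as np

def w1_cdf(xs, ys):
    """Kantorovich distance as the area between empirical d.f.s (rightmost form in (5))."""
    grid = np.sort(np.concatenate([xs, ys]))
    # empirical d.f. values on each interval [grid[i], grid[i+1])
    Fx = np.searchsorted(np.sort(xs), grid, side="right") / len(xs)
    Fy = np.searchsorted(np.sort(ys), grid, side="right") / len(ys)
    return np.sum(np.abs(Fx[:-1] - Fy[:-1]) * np.diff(grid))

rng = np.random.default_rng(1)
xs = rng.normal(0.0, 1.0, 1000)
ys = rng.normal(0.5, 2.0, 1000)

# comonotonic coupling: pair equal quantiles, i.e., the sorted samples
w1_coupling = np.mean(np.abs(np.sort(xs) - np.sort(ys)))

assert abs(w1_coupling - w1_cdf(xs, ys)) < 1e-9
```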

Thus, coming back to the convergence rate estimates in Equation (1), we first mention the paper by Solovyev [7], which gives the following uniform bound for nonnegative {*Xk*}, as pointed out in [8]:

$$
\rho(W\_0, \mathcal{E}) \le 24p \, \frac{\gamma\_r}{r-2}, \quad 2 < r \le 3,\tag{6}
$$

where $\gamma\_r = \left(\mathbf{E}X\_1^r / a^r\right)^{1/(r-1)}$.

Kalashnikov and Vsekhsvyatskii [9] proved a uniform upper bound for nonnegative summands in terms of their moments of order *s* ∈ (1, 2]:

$$\rho(W, \mathcal{E}) \le C p^{s-1}\, \frac{\mathbf{E}X\_1^s}{a^s},\tag{7}$$

where *C* is an absolute constant.

Kruglov and Korolev [10] gave the following nonuniform bound on the accuracy of the exponential approximation to the normalized geometric distribution (i.e., for degenerate $\{X\_n\}$):

$$\begin{split} \left| \mathbf{P}(pN < x) - (1 - e^{-x}) \right| &\le x \, \mathbb{1}\_{\{x < p\}} + \left( e^{-x} - e^{-Q(p)x} \right) \mathbb{1}\_{\{x \ge p\}} \le \\ &\le x \left[ \mathbb{1}\_{\{x < p\}} + \frac{p}{2(1 - p)}\, e^{-x}\, \mathbb{1}\_{\{x \ge p\}} \right], \end{split} \tag{8}$$

where $Q(p) = (1 - p/2)/(1 - p)$.

Brown [8] proved an asymptotically exact (as *p* → 0) upper bound for nonnegative summands, which does not require moments of order greater than two:

$$\rho(W\_0, \mathcal{E}) \le p \, \frac{\mathbf{E}X\_1^2}{a^2} \max\left(1, \frac{1}{2(1-p)}\right). \tag{9}$$

Brown also showed that Equation (9) is tighter than Equation (6) for all 2 < *r* ≤ 3 and *p* ∈ (0, 0.5]. Moreover, Equation (9) can be treated as a specification of Equation (7) for *s* = 2 with a concrete value of *C*.

Sugakova [11] presented some bounds for the d.f. $F\_{S\_{N\_0}}(t)$ for $t > 1$ using the characteristics of the renewal process built on top of independent and not necessarily identically distributed alternating $\{X\_n\}$ with identical means.

Kalashnikov [12] provided estimates of the rate of convergence in the Rényi theorem for i.i.d. alternating $\{X\_n\}$ w.r.t. $\zeta\_s$-metrics of order $s \in [1, 2]$ and the uniform metric (the latter under the additional assumption of a bounded density); in particular, for any $s \in (1, 2]$,

$$\zeta\_s(W, \mathcal{E}) \le p^{s-1}\, \zeta\_s(X\_1, \mathcal{E}),\tag{10}$$

$$\zeta\_1(W, \mathcal{E}) \le p\, \zeta\_1(X\_1, \mathcal{E}) + 2(1 - p)p^{s-1}\, \zeta\_s(X\_1, \mathcal{E}),\tag{11}$$

provided that **E***X*1 = 1.

Among other valuable things, Peköz and Röllin [13] exploited Stein's method and equilibrium (stationary renewal) distributions (see Section 3) to estimate the Kantorovich distance between the exponential distribution and that of a normalized geometric random sum $W$ of square-integrable, independent, and not necessarily identically distributed nonnegative random summands $\{X\_n\}$ with identical positive means, under the technical assumption $\mathbf{E}X\_k = 1$:

$$\zeta\_1(W, \mathcal{E}) \le 2p \sum\_{n=1}^{\infty} \mathbf{P}(N=n)\, \zeta\_1(X\_n, X\_n^e),\tag{12}$$

where $X\_n^e$ has the equilibrium distribution w.r.t. $X\_n$, $n \in \mathbb{N}$. Using the trivial bound $\zeta\_1(X, Y) \le \mathbf{E}|X| + \mathbf{E}|Y|$, which follows from representation (5) and holds true for arbitrary r.v.s $X$, $Y$ with finite first moments, the inequality in Equation (12) can be naturally extended to

$$\zeta\_1(W, \mathcal{E}) \le 2p \sup\_{n} \zeta\_1(X\_n, X\_n^e) \le p \sup\_{n} \left(\mathbf{E}X\_n^2 + 2\right),\tag{13}$$

as done in [14].

Equation (22) of Hung [15] gives the following bound for the Trotter distance between $W$ and $\mathcal{E}$ in the case of i.i.d. nonnegative summands $\{X\_n\}$ with $\mathbf{E}X\_1 = 1$:

$$d\_T(W, \mathcal{E}; h) := \sup\_{t \in \mathbb{R}} |\mathbf{E}h(W + t) - \mathbf{E}h(\mathcal{E} + t)| \le p^{s-1} \left( \mathbf{E}X\_1^2 + 3 \right), \quad h \in \mathcal{F}\_s^{\infty},\ s \in (1, 2]. \tag{14}$$

Given that $\zeta\_s(W, \mathcal{E}) = \sup\_{h \in \mathcal{F}\_s^{\infty}} d\_T(W, \mathcal{E}; h)$, the estimate in Equation (14) may be rewritten as

$$\zeta\_s(W, \mathcal{E}) \le p^{s-1} \left( \mathbf{E}X\_1^2 + 3 \right) \quad \text{for} \quad s \in (1, 2]. \tag{15}$$

To compare Equation (15) with Kalashnikov's bound in Equation (10), observe that, by Theorem 1(i,c) below, by the dual representation of the $\zeta\_s(X,Y)$-metric as the minimal metric w.r.t. the compound metric $\mathbf{E}|X - Y|^s$ for $s \in (0, 1]$ (see, e.g., Corollary 5.2.2 of [4]), and, finally, by Theorem 1(g) below, for $s \in (1, 2]$ we have

$$\begin{split} \zeta\_s(X\_1, \mathcal{E}) &= \zeta\_{s-1}(X\_1^e, \mathcal{E}^e) = \zeta\_{s-1}(X\_1^e, \mathcal{E}) = \inf\_{(X,Y)\colon X \stackrel{d}{=} X\_1^e,\ Y \stackrel{d}{=} \mathcal{E}} \mathbf{E}\,|X - Y|^{s-1} \le \mathbf{E}\,|X\_1^e - \mathcal{E}|^{s-1} \le \\ &\le \mathbf{E}\,|X\_1^e - \mathcal{E}| + 1 \le \mathbf{E}X\_1^e + \mathbf{E}\mathcal{E} + 1 = \mathbf{E}X\_1^2/2 + 2 < \mathbf{E}X\_1^2 + 3, \end{split}$$

hence, Kalashnikov's bound in Equation (10) is tighter than Equation (15).

Thus, most existing estimates of the rate of convergence in the Rényi theorem were obtained under the additional assumption of nonnegativity of the random summands $\{X\_n\}$. However, there are many applications where geometric random sums appear with alternating random summands, for example, as profits-or-losses in financial mathematics, risk theory, queuing theory, etc. Hence, extensions of such sharp and natural estimates as Equations (9), (12), and (13), say, to alternating random summands would not only represent a theoretical interest, but could also be in great demand in various applications of probability theory.

In the present paper, we focus on $\zeta\_1$-estimates; in particular, we extend the bounds in Equations (12) and (13) to the alternating case. More precisely, in Theorem 4 below, we prove that, for square-integrable, independent, and not necessarily identically distributed random summands $\{X\_n\}$ with identical nonzero means (for simplicity, equal to one), the following estimates hold:

$$\zeta\_1(W, \mathcal{E}) \le 2p \sum\_{n=1}^{\infty} \mathbf{P}(N=n)\, \zeta\_1\left(\mathcal{L}(X\_n), \mathcal{L}^e(X\_n)\right) \le p \left(\mathbf{E}X\_N^2 - 2\mathbf{P}(X\_N \le 0)\right),\tag{16}$$

$$\zeta\_1(W\_0, \mathcal{E}) \le \frac{2p}{1-p}\, \zeta\_1\left(\delta\_0, \mathcal{L}^e(X\_N)\right) = \frac{p}{1-p}\, \mathbf{E}X\_N^2,\tag{17}$$

where $\delta\_0$ is the Dirac measure concentrated at zero and $\mathcal{L}^e(X\_n)$ is the equilibrium transform of $\mathcal{L}(X\_n)$, a generalization of the equilibrium distribution introduced in Section 3 below, which, generally speaking, is no longer a probability measure (therefore, we write $\mathcal{L}^e(X\_n)$ instead of $\mathcal{L}(X\_n^e)$), but allows eliminating the support constraints on the distribution of $X\_n$. The notion of the $\zeta\_1$-metric between signed measures is introduced in Section 2 below and coincides with that of the ordinary $\zeta\_1$-metric in the case of probability measures. Thus, the intermediate estimate in Equation (16) coincides with estimate (12), but now also holds true for alternating random summands $\{X\_n\}$. Furthermore, it can easily be seen that the right-hand side of Equation (16) does not exceed

$$p \sup\_{n} \left(\mathbf{E}X\_n^2 - 2\,\mathbf{P}(X\_n \le 0)\right)$$

and, hence, is tighter than estimate (13) and does not require that the $\{X\_n\}$ take only positive values. The comparison of estimate (16) with Kalashnikov's bound in Equation (11) for $s = 2$,

$$\begin{split} \zeta\_1(W, \mathcal{E}) &\le p\, \zeta\_1(X\_1, \mathcal{E}) + 2p(1 - p)\, \zeta\_2(X\_1, \mathcal{E}) = \\ &= p\, \zeta\_1(X\_1, \mathcal{E}) + 2p(1 - p)\, \zeta\_1\left(\mathcal{L}^e(X\_1), \mathrm{Exp}(1)\right) \end{split} \tag{18}$$

(for the equality here, see Theorem 1(i) below) is complicated in the general case, since, due to Theorem 3 below, the rightmost expression does not exceed

$$2p(2-p)\, \zeta\_1\left(\mathcal{L}^e(X\_1), \mathrm{Exp}(1)\right),$$

which is asymptotically twice greater than the intermediate expression in Equation (16), while the intermediate estimate in Equation (16), by the triangle inequality, yields the bound

$$\zeta\_1(W, \mathcal{E}) \le 2p\, \zeta\_1(X\_1, \mathcal{E}) + 2p\, \zeta\_1\left(\mathcal{L}^e(X\_1), \mathrm{Exp}(1)\right)$$

with the first term twice as large as that in Equation (18).

We use the same techniques and recipes as in [13]. First, we bound the left-hand side of Equation (16) from above by $\zeta\_1\left(\mathcal{L}(W), \mathcal{L}^e(W)\right)$ using Stein's method (see Theorem 3 in Section 4). Second, we estimate $\zeta\_1\left(\mathcal{L}(W), \mathcal{L}^e(W)\right)$ by the $\zeta\_1$-distances between $X\_n$ and their equilibrium transforms $\mathcal{L}^e(X\_n)$, $n \in \mathbb{N}$. Third, we construct an optimal upper bound for $\zeta\_1\left(\mathcal{L}(X\_n), \mathcal{L}^e(X\_n)\right)$ in terms of the second moments of $X\_n$ and $\mathbf{P}(X\_n \le 0)$, $n \in \mathbb{N}$ (see Theorem 2 in Section 3). The resulting upper bounds for $\zeta\_1(W, \mathcal{E})$ and $\zeta\_1(W\_0, \mathcal{E})$ are given in Theorem 4 of Section 5. Furthermore, we provide asymptotic lower bounds for $\zeta\_1(W, \mathcal{E})$ and $\zeta\_1(W\_0, \mathcal{E})$ (see Theorem 5 in Section 5) in terms of the so-called *asymptotically best constants* introduced in Section 5. The constructed lower bounds turn out to be asymptotically four times smaller than the upper ones. Finally, we extend the obtained estimates of the accuracy of the exponential approximation to non-geometric random sums of independent random variables with non-identical nonzero means of identical signs (see Theorem 6 in Section 5).

#### **2. The Kantorovich Distance between Signed Measures**

In the next sections, we need to calculate the Kantorovich (or $\zeta\_1$-) distance between measures on $(\mathbb{R}, \mathcal{B})$ that are no longer probabilities, but still have unit mass on $\mathbb{R}$. Denote by $\mathcal{M}^1$ the linear space of signed measures on $(\mathbb{R}, \mathcal{B})$ with finite total variations and finite first moments, and by $\mathcal{M}^1\_0$ the subspace of measures $\sigma \in \mathcal{M}^1$ with $\sigma(\mathbb{R}) = 0$.

The Kantorovich norm on $\mathcal{M}^1\_0$ is defined as (see Section 3.2 of [16])

$$\|\sigma\|\_{K} := \sup\_{f \in \mathrm{Lip}\_1} \left| \int\_{\mathbb{R}} f \, d\sigma \right|.$$

Now, let $\mu, \nu \in \mathcal{M}^1$ with $\mu(\mathbb{R}) = \nu(\mathbb{R})$, so that $\mu - \nu \in \mathcal{M}^1\_0$. The *induced Kantorovich distance* $\zeta\_1$ between $\mu$ and $\nu$ is

$$\zeta\_1(\mu, \nu) := \|\mu - \nu\|\_K = \sup\_{f \in \mathrm{Lip}\_1} \left| \int\_{\mathbb{R}} f \, d\mu - \int\_{\mathbb{R}} f \, d\nu \right|. \tag{19}$$

It is easy to see that, in the case of probability measures $\mu$ and $\nu$, Equation (19) coincides with the definition of the $\zeta\_1$-distance given in Equation (4).

Using the Jordan decompositions $\mu = \mu^+ - \mu^-$ and $\nu = \nu^+ - \nu^-$, as well as the alternative representation in Equation (5) of the $\zeta\_1$-distance between the nonnegative measures $\lambda = \mu^+ + \nu^-$ and $\pi = \nu^+ + \mu^-$ with $\lambda(\mathbb{R}) = \pi(\mathbb{R})$ in terms of their d.f.s, after a proper normalization, one can rewrite Equation (19) as

$$\zeta\_1(\mu, \nu) = \zeta\_1(\lambda, \pi) = \int\_{\mathbb{R}} \left| F\_\lambda(x) - F\_\pi(x) \right| dx = \int\_{\mathbb{R}} \left| F\_\mu(x) - F\_\nu(x) \right| dx,\tag{20}$$

where $F\_\mu(x) = \mu\left((-\infty, x)\right)$, $F\_\nu(x) = \nu\left((-\infty, x)\right)$, $x \in \mathbb{R}$, are the d.f.s of the signed measures $\mu$ and $\nu$, respectively. In other words, the alternative representation of Zolotarev's $\zeta\_1$-distance in terms of d.f.s in Equation (5) is preserved for signed measures with identical masses of $\mathbb{R}$.

We also use the convolution of signed measures $\mu * \lambda$, which is defined word for word as that of probability distributions. The uniqueness and multiplication theorems (see, e.g., Chapter 6 of [17] or Section 3.8 of [18]) state that the characteristic function of $\mu$ (the Fourier–Stieltjes transform of $F\_\mu$)

$$\widehat{\mu}(t) := \int\_{\mathbb{R}} e^{itx}\, \mu(dx) = \int\_{\mathbb{R}} e^{itx}\, dF\_{\mu}(x), \quad t \in \mathbb{R},$$

defines the signed measure $\mu$, as well as its d.f. $F\_\mu$, uniquely, and

$$
\widehat{\mu \ast \nu} = \widehat{\mu} \cdot \widehat{\nu}.
$$

The following lemma, which is a simple corollary to representation (20), shows that the well-known properties of homogeneity and regularity of the Kantorovich distance between probability distributions are preserved for signed measures, but with a slight correction.

**Lemma 1.** *The Kantorovich distance $\zeta\_1$ on the space $\mathcal{M}^1\_D$ of finite signed Borel measures on the real line with the masses of $\mathbb{R}$ equal to $D \in \mathbb{R}$ and finite first moments possesses the following properties:*

**(a) Homogeneity of order 1.** *For every $\mu, \nu \in \mathcal{M}^1\_D$ and $c \ne 0$, with $\mu\_c(B) := \mu(cB)$, $\nu\_c(B) := \nu(cB)$ and $cB := \{cx \mid x \in B\}$, $B \in \mathcal{B}$, we have*

$$
\zeta\_1(\mu\_c, \nu\_c) = \frac{1}{|c|}\,\zeta\_1(\mu, \nu).
$$

**(b) Regularity.** *For all $\mu, \nu \in \mathcal{M}^1\_D$ and $\lambda \in \mathcal{M}^1$, we have*

$$
\zeta\_1(\mu * \lambda, \nu * \lambda) \le |\lambda|(\mathbb{R}) \cdot \zeta\_1(\mu, \nu),
$$

*where* |*λ*| := *λ*<sup>+</sup> + *λ*− *is the total variation of λ.*

To avoid abusing the notation, in what follows, we also use *ζ*1(*<sup>F</sup>*, *G*) for the Kantorovich distance between (signed) measures uniquely restored (Section 3.5, Theorem 3.29 of [19]) from distribution functions *F* and *G*.
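Representation (20) makes $\zeta\_1$ between finitely supported signed measures elementary to compute. The sketch below (our illustration; the atoms and weights are arbitrary) also checks the homogeneity property of Lemma 1(a):

```python
def zeta1(mu, nu):
    """zeta_1 between finite signed measures with equal total mass, via Equation (20).

    A measure is a dict {atom: weight}; weights may be negative.
    """
    pts = sorted(set(mu) | set(nu))
    d = 0.0
    for x0, x1 in zip(pts[:-1], pts[1:]):
        F_mu = sum(w for pt, w in mu.items() if pt <= x0)  # d.f. value on [x0, x1)
        F_nu = sum(w for pt, w in nu.items() if pt <= x0)
        d += abs(F_mu - F_nu) * (x1 - x0)
    return d

mu = {0.0: 1.5, 1.0: -0.5}  # signed measure with mu(R) = 1
nu = {0.5: 0.5, 2.0: 0.5}   # probability measure, nu(R) = 1

# homogeneity of order 1: mu_c(B) = mu(cB) moves an atom at pt to pt / c
c = 2.0
mu_c = {pt / c: w for pt, w in mu.items()}
nu_c = {pt / c: w for pt, w in nu.items()}
assert abs(zeta1(mu_c, nu_c) - zeta1(mu, nu) / abs(c)) < 1e-12
```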

#### **3. The Equilibrium Transform of Probability Distributions**

The notion of *equilibrium distribution* w.r.t. nonnegative r.v.s with finite positive means originally arises in the renewal theory as the distribution of the initial delay of a renewal process which makes its renewal rate constant (Chapter 11, § 4 of [20]) and, more generally, the renewal process stationary (Chapter 5, § 4 of [21]), which is why it is also called the *stationary renewal distribution*. Equilibrium distribution appears also as the limit distribution of the residual waiting times, or hitting probabilities (Chapter 11, § 4 of [20]) and in the celebrated Pollaczek–Khinchin–Beekman formula which expresses the ruin probability in the classical risk process in terms of geometric random sum of i.i.d. r.v.s whose common distribution is the equilibrium transform of the distributions of claims. Due to the definition given in a more general form in Equation (21) below, equilibrium distribution is also called the *integrated tail* one ([12], p. 37, [22]). Concerning the equilibrium transform, we would also like to mention the work of Harkness and Shantaram [23] who considered the *iterated equilibrium transform* for d.f.s with nonnegative support and investigated limit theorems for normalized iterations, the description of limit laws being given in [24]. In particular, the authors of [23] calculated the ch.f. of the equilibrium transform that can be used as the definition of the equilibrium transform in the general case and hence, with the inverse formula, can give a hint to definition in Equation (21) of the equilibrium d.f. with no support constraints.

We introduce an extension of the equilibrium distribution that is applicable for alternating random variables with finite nonzero first moments, but leads out of the class of probability distributions.

Let $P$ be a probability measure with the d.f. $F(x) = P((-\infty, x))$, $x \in \mathbb{R}$, the ch.f. $f(t) = \int\_{\mathbb{R}} e^{itx}\, P(dx) = \int\_{\mathbb{R}} e^{itx}\, dF(x)$, $t \in \mathbb{R}$, and a finite first moment $a := \int\_{\mathbb{R}} x\, P(dx) = \int\_{\mathbb{R}} x\, dF(x)$. If a r.v. $X$ (on some probability space $(\Omega, \Sigma, \mathbf{P})$) has the distribution $P$, we also write $P = \mathcal{L}(X)$, $f(t) = \mathbf{E}e^{itX} =: f\_X(t)$, $F(x) = \mathbf{P}(X < x) =: F\_X(x)$, $a = \mathbf{E}X$.

**Definition 1.** *The equilibrium d.f. (distribution) w.r.t. the d.f. $F$ (probability distribution $P$ / law $\mathcal{L}(X)$) with $a \ne 0$ is a function of bounded variation (a (signed) measure $P^e$ / $\mathcal{L}^e(X)$ on $\mathcal{B}(\mathbb{R})$ with that d.f.)*

$$F^{e}(x) := \begin{cases} -\dfrac{1}{a} \displaystyle\int\_{-\infty}^{x} F(y) \, dy, & \text{if } x \le 0, \\ -\dfrac{\mathbf{E}X^{-}}{a} + \dfrac{1}{a} \displaystyle\int\_{0}^{x} (1 - F(y)) \, dy, & \text{if } x > 0, \end{cases} \tag{21}$$

$$= \frac{1}{a} \left( x^+ - \int\_{-\infty}^{x} F(y) \, dy \right), \quad x \in \mathbb{R}. \tag{22}$$

In Theorem 1(a) below, it is proved that $F^e$, indeed, has bounded variation; some useful properties of the equilibrium transform are stated there as well.

We call $F^e$ / $P^e$ / $\mathcal{L}^e(X)$ the equilibrium transform (d.f./distribution) w.r.t. $F$ / $P$ / $\mathcal{L}(X)$ / $X$, correspondingly, although it may not be a probability d.f./distribution at all. At the same time, it can easily be seen that $\mathcal{L}^e(X)$ is a probability measure if and only if $X$ does not change sign (that is, if and only if $P$ is concentrated either on $(-\infty, 0]$ or on $[0, \infty)$), in which case one may construct a random variable $X^e$ with the distribution $\mathcal{L}(X^e) = \mathcal{L}^e(X)$ and such that $X$ and $X^e$ are either both nonnegative or both nonpositive.
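As a concrete illustration (ours; the uniform law is an arbitrary alternating example), let $X \sim$ Uniform$(-0.5, 1.5)$, so that $a = \mathbf{E}X = 0.5$, $\mathbf{E}|X| = 0.625$ and $\mathbf{E}X^2 = 7/12$. The Lebesgue density of $F^e$ (Equation (23) below) is negative on $(-0.5, 0]$, and simple midpoint quadrature confirms the unit total mass, the total variation $\mathbf{E}|X|/|\mathbf{E}X| = 1.25$, and the first-moment formula of Theorem 1(g):

```python
import numpy as np

a = 0.5  # E X for X ~ Uniform(-0.5, 1.5)
F = lambda t: np.clip((t + 0.5) / 2.0, 0.0, 1.0)  # d.f. of X

def p_e(t):
    """Equilibrium 'density': -F/a on (-inf, 0], (1 - F)/a on (0, inf)."""
    return np.where(t <= 0, -F(t) / a, (1.0 - F(t)) / a)

# midpoint-rule quadrature over the support [-0.5, 1.5]
dx = 1e-6
t = np.arange(-0.5 + dx / 2, 1.5, dx)

mass = np.sum(p_e(t)) * dx          # total signed mass: F^e(+inf) - F^e(-inf) = 1
tv   = np.sum(np.abs(p_e(t))) * dx  # total variation:  E|X| / |E X| = 1.25
m1   = np.sum(t * p_e(t)) * dx      # first moment:     E X^2 / (2 E X) = 7/12

assert abs(mass - 1.0) < 1e-7
assert abs(tv - 1.25) < 1e-7
assert abs(m1 - 7 / 12) < 1e-7
```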

In what follows, to indicate the r.v. whose equilibrium transform is considered, we use the corresponding lower index and write $F^e\_X$ and $f^e\_X$ for $(F\_X)^e$ and $(f\_X)^e$, respectively.

**Theorem 1.** *Let $X$ be a r.v. with the d.f. $F$ and $a \ne 0$, and let $F^e$ be the equilibrium d.f. w.r.t. $F$ defined in Equation (21). Then:*

**(a) Absolute continuity.** *The function Fe has bounded variation on* R *with*

$$|\mathcal{L}^e(X)|(\mathbb{R}) = \mathbf{E}|X|/|\mathbf{E}X|, \quad F^e(-\infty) = 0, \quad F^e(+\infty) = 1,$$

*and, hence, $\mathcal{L}^e(X)$ is a Borel measure with unit mass on $\mathbb{R}$; moreover, $F^e$ is a.c. with the Lebesgue derivative*

$$p^e(x) = \begin{cases} -\dfrac{1}{a}F(x), & \text{if } x \le 0, \\ \dfrac{1}{a}(1 - F(x)), & \text{if } x > 0, \end{cases} \tag{23}$$

*and $\operatorname{supp} \mathcal{L}^e(X)$ coincides with the convex hull of $\operatorname{supp} \mathcal{L}(X)$.*

**(b) Characteristic function.** *The ch.f. (Fourier–Stieltjes transform) of Fe has the form*

$$f^{e}(t) := \int\_{\mathbb{R}} e^{itx}\, dF^{e}(x) = \frac{f(t) - 1}{tf'(0)} = \frac{f(t) - 1}{ita} \quad \text{if } t \neq 0, \quad \text{and} \quad f^{e}(0) = 1. \tag{24}$$


**(c) Fixed point.** *For $a > 0$, the distribution $\mathrm{Exp}(1/a)$ is the unique fixed point of the equilibrium transform.*

**(d) Stein-type identity.**

$$\mathbf{E}g(X) - g(0) = \mathbf{E}X \cdot \int\_{\mathbb{R}} g'(x) \, dF^{e}(x) \tag{25}$$

*for all Lipschitz functions $g \colon \mathbb{R} \to \mathbb{R}$.*

**(e) Mixture preservation.** *For arbitrary d.f.s $F\_1, F\_2, \ldots$ with identical nonzero expectations and a discrete probability distribution $\{p\_n\}$, $p\_n \ge 0$, $n \in \mathbb{N}$, $\sum\_{n=1}^{\infty} p\_n = 1$, we have*

$$\left(\sum\_{n=1}^{\infty} p\_n F\_n\right)^{e} = \sum\_{n=1}^{\infty} p\_n F\_n^{e}. \tag{26}$$

**(f) Homogeneity.** *For all c* ∈ R \ {0}*, we have*

$$(F\_{cX})^{e}(x) = F\_X^{e}(x/c), \quad x \in \mathbb{R}, \tag{27}$$

*or, in terms of (constant-sign) r.v.s, $(cX)^e \stackrel{d}{=} cX^e$, $c \in \mathbb{R} \setminus \{0\}$. In other words, the equilibrium transform respects scaling.*

**(g) Moments.** *If $\mathbf{E}|X|^{r+1} < \infty$ for some $r > 0$, then for all $k \in \mathbb{N} \cap [1, r]$ we have*

$$\int\_{\mathbb{R}} x^k \, dF^{e}(x) = \frac{\mathbf{E}X^{k+1}}{(k+1)\,\mathbf{E}X}, \quad \int\_{\mathbb{R}} |x|^r \, dF^{e}(x) = \frac{\mathbf{E}X|X|^r}{(r+1)\,\mathbf{E}X}, \tag{28}$$

$$\int\_{\mathbb{R}} x^k \, |dF^{e}|(x) = \frac{\mathbf{E}|X|X^k}{(k+1)\,|\mathbf{E}X|}, \quad \int\_{\mathbb{R}} |x|^r \, |dF^{e}|(x) = \frac{\mathbf{E}|X|^{r+1}}{(r+1)\,|\mathbf{E}X|}. \tag{29}$$

**(h) Single summand property.** *Let $N, X\_1, X\_2, \ldots$ be independent r.v.s such that $a\_n := \mathbf{E}X\_n \in (0, \infty)$, $n \in \mathbb{N}$, $\mathbf{P}(N \in \mathbb{N}\_0) = 1$, $S\_N := X\_1 + \ldots + X\_N$, $S\_0 := 0$, and $A := \mathbf{E}S\_N = \sum\_{n=1}^{\infty} a\_n \mathbf{P}(N \ge n)$ be finite, and let $M$ be an $\mathbb{N}$-valued r.v. with the distribution*

$$\mathbf{P}(M=m) = \frac{a\_m}{A} \mathbf{P}(N \ge m), \quad m \in \mathbb{N}.$$

*Then,*

$$\mathcal{L}^{e}(S\_N) = \sum\_{m=1}^{\infty} \mathbf{P}(M=m)\, \mathcal{L}(S\_{m-1}) \* \mathcal{L}^{e}(X\_m),\tag{30}$$

*where* ∗ *denotes the convolution of two Borel measures, or, in terms of* (*constant-sign*) *r.v.s,*

$$S\_N^{e} \stackrel{d}{=} S\_{M-1} + X\_M^{e},$$

*where all the r.v.s are independent. In particular, if $N \sim \mathrm{Geom}(p)$ and all the $X\_k$'s have identical nonzero expectations, then $M \stackrel{d}{=} N$ and*

$$\mathcal{L}^{e}(S\_N) = \mathcal{L}^{e}(S\_{N-1}) = \sum\_{n=1}^{\infty} p(1-p)^{n-1}\, \mathcal{L}(S\_{n-1}) \* \mathcal{L}^{e}(X\_n), \tag{31}$$

*which can be also rewritten, in the case of i.i.d.* {*Xk*}*, in the form*

$$
\mathcal{L}^{e}(S\_N) = \mathcal{L}^{e}(S\_{N-1}) = \mathcal{L}(S\_{N-1}) \* \mathcal{L}^{e}(X\_1).
$$

**(i) Relation between** $\zeta$**-distances.** *For arbitrary d.f.s $F$ and $G$ with finite moments of order $s > 1$ and identical expectations $a \ne 0$, we have*

$$\zeta\_s(F, G) = |a| \, \zeta\_{s-1}(F^{e}, G^{e}). \tag{32}$$

Theorem 2 below also provides an optimal upper bound for $\zeta\_1(F, F^e)$ given $F(0+)$ and the second-order moment of $F$.
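The identity in Equation (25) can also be checked numerically for an alternating $X$. A minimal sketch (our illustration; $X \sim$ Uniform$(-0.5, 1.5)$ and $g = \sin$ are arbitrary choices), with both sides computed by midpoint quadrature using the density from Equation (23):

```python
import numpy as np

a = 0.5  # E X for X ~ Uniform(-0.5, 1.5), whose density is 1/2 on (-0.5, 1.5)
F = lambda t: np.clip((t + 0.5) / 2.0, 0.0, 1.0)
p_e = lambda t: np.where(t <= 0, -F(t) / a, (1.0 - F(t)) / a)  # density of F^e

dx = 1e-6
t = np.arange(-0.5 + dx / 2, 1.5, dx)

g, g_prime = np.sin, np.cos  # any Lipschitz g will do

lhs = np.sum(g(t) * 0.5) * dx - g(0.0)      # E g(X) - g(0)
rhs = a * np.sum(g_prime(t) * p_e(t)) * dx  # E X times the integral of g' dF^e

assert abs(lhs - rhs) < 1e-7
```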

**Remark 1.** *Theorem 1(h) shows that the equilibrium transform of the geometric random sum of independent r.v.s with identical nonzero means does not depend on whether or not one takes the geometric distribution starting from zero.*

Let us make several historical remarks. Some of the properties of the equilibrium distribution stated in Theorem 1 were known for a nonnegative r.v. $X$. Thus, the characteristic function of $X^e$ given in Equation (24) was found in [23], and Equation (25) was taken as the definition of (the distribution of) $X^e$ in [13,14]. In Theorem 2.1 of [13], it was proved that the exponential distribution is the only fixed point of the equilibrium transform; this fact is also proved directly in Lemma 5.2 of [14]. In [14] (p. 268), it is observed that $(cX)^e \stackrel{d}{=} cX^e$ for $c > 0$. Some moment calculations were given in [22]. The single summand property for $S\_N$ was demonstrated in the proof of Theorem 3.1 of [13] for nonnegative, but not necessarily independent, $\{X\_k\}$. The fact that $\mathcal{L}^e(S\_N) = \mathcal{L}^e(S\_{N-1})$ for i.i.d. nonnegative $\{X\_k\}$ was observed in [8] (p. 1394). The equality in Equation (32) for $F(0) = G(0) = 0$ and $s = 2$ was stated in [12] (p. 37).
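In the geometric i.i.d. case, the single summand property of Theorem 1(h) reduces to the ch.f. identity $f^e\_{S\_N}(t) = f\_{S\_{N-1}}(t)\, f^e\_{X\_1}(t)$, which can be checked exactly when closed-form ch.f.s are available. A sketch (our illustration; Exp(1) summands are an arbitrary choice):

```python
p, a = 0.3, 1.0  # N ~ Geom(p); X_k i.i.d. ~ Exp(1), so f_X(t) = 1/(1 - it) and a = 1

def f_X(t):
    return 1 / (1 - 1j * t)

def f_SN(t):  # ch.f. of the geometric sum S_N
    return p * f_X(t) / (1 - (1 - p) * f_X(t))

def f_SN0(t):  # ch.f. of S_{N-1}
    return p / (1 - (1 - p) * f_X(t))

def f_e(f, mean, t):  # equilibrium ch.f. from Equation (24)
    return (f(t) - 1) / (1j * t * mean)

A = a / p  # E S_N
for t in (0.25, 1.0, -2.0, 3.7):
    assert abs(f_e(f_SN, A, t) - f_SN0(t) * f_e(f_X, a, t)) < 1e-12
```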

To prove Theorem 1, we require the following auxiliary statement.

**Lemma 2.** *For every $n \in \mathbb{N}$ and $z\_1, \ldots, z\_n \in \mathbb{C}$, we have*

$$\prod\_{k=1}^{n} z\_k - 1 = \sum\_{k=1}^{n} (z\_k - 1) \prod\_{j=1}^{k-1} z\_j = \sum\_{k=1}^{n} (z\_k - 1) \prod\_{j=k+1}^{n} z\_j, \tag{33}$$

*where $\prod\_{j=a}^{b}(\cdot) := 1$ for $b < a$.*
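Both telescoping expansions in Equation (33) are easy to confirm numerically (a quick sketch with arbitrary complex inputs; 0-based indexing replaces the $k = 1, \ldots, n$ of the statement):

```python
import random

def prod(zs):
    out = 1
    for z in zs:
        out *= z
    return out

random.seed(7)
z = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]

lhs = prod(z) - 1
rhs_forward = sum((z[k] - 1) * prod(z[:k]) for k in range(6))       # products of preceding z_j
rhs_backward = sum((z[k] - 1) * prod(z[k + 1:]) for k in range(6))  # products of following z_j

assert abs(lhs - rhs_forward) < 1e-12
assert abs(rhs_forward - rhs_backward) < 1e-12
```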

**Proof.** We use induction w.r.t. $n$. For $n = 1$, Equation (33) is trivial. Let Equation (33) hold for $n = 1, \ldots, m-1$; let us prove it for $n = m$. Using the inductive hypothesis in the second equality below, we get

$$\prod\_{k=1}^{m} z\_k - 1 = (z\_m - 1)\prod\_{k=1}^{m-1} z\_k + \prod\_{k=1}^{m-1} z\_k - 1 = (z\_m - 1)\prod\_{k=1}^{m-1} z\_k + \sum\_{k=1}^{m-1} (z\_k - 1)\prod\_{j=1}^{k-1} z\_j = \sum\_{k=1}^{m} (z\_k - 1)\prod\_{j=1}^{k-1} z\_j.$$

The second equality in Equation (33) can be deduced from the first one simply by re-numbering $\{z\_k\}\_{k=1}^{n}$: $z\_k \leftarrow z\_{n-k+1}$, $k = 1, \ldots, n$.

**Proof of Theorem 1.** (a) It follows immediately from the definition in Equation (21) of $F^e$ that $F^e$ is a.c. with the density given in Equation (23). In turn, Equation (23) implies that $\operatorname{supp} \mathcal{L}^e(X)$ is the convex hull of $\operatorname{supp} \mathcal{L}(X)$ and, accounting for $|\mathcal{L}^e(X)|(\mathbb{R}) = \int\_{\mathbb{R}} |p^e(x)|\, dx = \mathbf{E}|X|/|\mathbf{E}X| < \infty$, also that $F^e$ has bounded variation. The limiting values $F^e(\pm\infty)$ can be found directly from the definition of $F^e$.

(b) Using the density of *Fe* (see Equation (23)) and integrating by parts, we have

$$\begin{split} f^e(t) &= \int\_{\mathbb{R}} e^{itx} p^e(x) \, dx = \frac{1}{a} \int\_{\mathbb{R}} e^{itx} \left( \mathbb{1}\_{(0,\infty)}(x) - F(x) \right) dx = \frac{1}{ita} \int\_{\mathbb{R}} \left( \mathbb{1}\_{(0,\infty)}(x) - F(x) \right) de^{itx} = \\ &= \frac{1}{ita} \left( -e^{itx} F(x) \Big|\_{-\infty}^{0} + e^{itx}\left(1 - F(x)\right) \Big|\_{0}^{\infty} + \int\_{\mathbb{R}} e^{itx} \, dF(x) \right) = \frac{f(t) - 1}{ita}, \end{split}$$

which coincides with Equation (24).

(c) This statement follows immediately from the uniqueness of the solution to the linear equation

$$f^e(t) \equiv \frac{f(t) - 1}{ita} = f(t) \quad \Leftrightarrow \quad f(t) = \frac{1}{1 - ita}, \quad \text{that is, } F \sim \mathrm{Exp}(1/a).$$

(d)–(g) These statements follow from the definition and integration by parts for (d) and (g), or from the linearity of the Lebesgue–Stieltjes integral for (e).

(h) Let us denote $f\_0(t) \equiv 1$, $f\_k(t) = \mathbf{E}e^{itX\_k}$, $k \in \mathbb{N}$, $t \in \mathbb{R}$. Using the fact that

$$f_{S_N}(t) = \sum_{n=0}^{\infty}\mathbf{P}(N=n)\,\mathbf{E}e^{itS_n} = \sum_{n=0}^{\infty}\mathbf{P}(N=n)\prod_{k=0}^{n} f_k(t),$$

together with the equation for the equilibrium ch.f. in Equations (24) and (33), we get

$$\begin{split} f_{S_N}^e(t) &= \frac{f_{S_N}(t) - 1}{t\,f_{S_N}'(0)} = \frac{1}{itA}\sum_{n=1}^{\infty}\mathbf{P}(N=n)\left(\prod_{k=1}^{n} f_k(t) - 1\right) = \\ &= \sum_{n=1}^{\infty}\mathbf{P}(N=n)\sum_{k=1}^{n}\frac{f_k(t) - 1}{itA}\prod_{j=1}^{k-1} f_j(t) = \\ &= \sum_{n=1}^{\infty}\mathbf{P}(N=n)\sum_{k=1}^{n}\frac{a_k}{A}\,f_k^e(t)\,f_{S_{k-1}}(t). \end{split}$$

Changing the order of summation, which is possible by virtue of the absolute convergence of the above series, and recalling the definition of L (*M*), we obtain

$$f_{S_N}^e(t) = \sum_{k=1}^{\infty} f_k^e(t)\,f_{S_{k-1}}(t)\cdot\frac{a_k}{A}\sum_{n=k}^{\infty}\mathbf{P}(N=n) = \sum_{k=1}^{\infty} f_k^e(t)\,f_{S_{k-1}}(t)\,\mathbf{P}(M=k),$$

which is equivalent to Equation (30) by virtue of the uniqueness theorem.

If now $N \sim \operatorname{Geom}(p)$ and $a_1 = a_2 = \ldots = a$, then $A = a\mathbf{E}N = a/p$ and $\mathbf{P}(M = k) = p(1-p)^{k-1} = \mathbf{P}(N = k)$, $k \in \mathbb{N}$. Denoting by $M_0$ the r.v. corresponding to $N_0 := N - 1$ with the distribution

$$\mathbf{P}(M_0 = k) := a_k\,\mathbf{P}(N_0 \ge k)\Big/\sum_{k=1}^{\infty} a_k\,\mathbf{P}(N_0 \ge k) = \mathbf{P}(N_0 \ge k)/\mathbf{E}N_0 = p(1-p)^{k-1}, \quad k \in \mathbb{N},$$

we observe that $M_0 \stackrel{d}{=} N \stackrel{d}{=} M$. This proves Equation (31).
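The identity $\mathbf{P}(N_0 \ge k)/\mathbf{E}N_0 = p(1-p)^{k-1}$ behind this observation is easy to confirm numerically (a sketch; $p = 0.3$ is an arbitrary test value):

```python
p = 0.3                       # arbitrary test value of the geometric parameter
EN0 = (1 - p) / p             # E N_0 for N_0 = N - 1, N ~ Geom(p)
for k in range(1, 20):
    tail = (1 - p) ** k       # P(N_0 >= k) = P(N >= k + 1)
    assert abs(tail / EN0 - p * (1 - p) ** (k - 1)) < 1e-12
```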

(i) This statement follows from Theorem 4.2(a), Equation (4.20) of [5]. It can also be proved independently, namely, by virtue of (d) we have

$$\begin{split} \zeta_s(F, G) &= \sup_{h\in\mathcal{F}_s}\left|\int_{\mathbb{R}} h\,dF - \int_{\mathbb{R}} h\,dG\right| = |a|\sup_{h\in\mathcal{F}_s}\left|\int_{\mathbb{R}} h'\,dF^e - \int_{\mathbb{R}} h'\,dG^e\right| = \\ &= |a|\sup_{h\in\mathcal{F}_{s-1}}\left|\int_{\mathbb{R}} h\,dF^e - \int_{\mathbb{R}} h\,dG^e\right| = |a|\,\zeta_{s-1}(F^e, G^e). \end{split}$$

To conclude this section, we construct an optimal upper bound for the Kantorovich distance between an arbitrary probability distribution with nonzero mean and its equilibrium transform, given its second moment and the mass of the nonpositive half-line. Before formulating the corresponding result, note that Cantelli's (one-sided Chebyshev's) inequality yields $\mathbf{P}(X \le 0) \le 1 - 1/\mathbf{E}X^2$ for an arbitrary r.v. $X$ with $\mathbf{E}X = 1$ and $\mathbf{E}X^2 < \infty$, and, hence,

$$\mathbb{E}X^2 \ge \frac{1}{1 - \mathbb{P}(X \le 0)}.$$

This remark explains the choice of the domain of parameters *q* and *b* in the following Theorem 2.

**Theorem 2.** *Take any $q \in [0, 1)$ and $b \ge 1/\sqrt{1-q}$, and let $X$ be a square integrable r.v. with $\mathbf{E}X = 1$, $\mathbf{E}X^2 = b^2$, and $\mathbf{P}(X \le 0) = q$. Then,*

$$\zeta_1\big(\mathcal{L}(X), \mathcal{L}^e(X)\big) \le \frac{b^2}{2} - q,\tag{34}$$

*where $\mathcal{L}^e(X)$ is the equilibrium transform of $\mathcal{L}(X)$. The equality in Equation* (34) *is attained for every $q \in (0, 1)$ and $b \ge 1/\sqrt{1-q}$ on the two-point distribution $\mathcal{L}(X) = q\delta_u + (1-q)\delta_v$ with*

$$u = 1 - \sqrt{\frac{1 - q}{q}(b^2 - 1)}, \quad v = 1 + \sqrt{\frac{q}{1 - q}(b^2 - 1)},\tag{35}$$

*and for $q = 0$ and $b = 1$ on the degenerate distribution $\mathcal{L}(X) = \delta_1$.*
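The equality case in Equation (34) is easy to confirm numerically (a sketch; $q = 0.2$ and $b = 1.5$ are arbitrary admissible test values with $b \ge 1/\sqrt{1-q}$, and $F^e$ is taken in the piecewise form derived in Case 1 of the proof below):

```python
import math

q, b = 0.2, 1.5                      # arbitrary admissible test values
u = 1 - math.sqrt((1 - q) / q * (b * b - 1))   # Equation (35)
v = 1 + math.sqrt(q / (1 - q) * (b * b - 1))

def F(x):                            # two-point d.f.: mass q at u, 1-q at v
    return 0.0 if x < u else (q if x < v else 1.0)

def Fe(x):                           # equilibrium transform (Case 1, p = q)
    if x <= u:
        return 0.0
    if x <= 0:
        return q * (u - x)
    if x <= v:
        return q * u + (1 - q) * x
    return 1.0

n = 200000                           # midpoint rule on [u, v]
h = (v - u) / n
zeta1 = sum(abs(F(u + (i + 0.5) * h) - Fe(u + (i + 0.5) * h)) for i in range(n)) * h
assert abs(zeta1 - (b * b / 2 - q)) < 1e-3   # zeta_1 = b^2/2 - q
```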

**Remark 2.** *With the account of Theorem 1(f) and Lemma 1(a), for arbitrary $\mathbf{E}X \neq 0$, Equation* (34) *takes the form*

$$\zeta_1\big(\mathcal{L}(X), \mathcal{L}^e(X)\big) \le \frac{1}{2}\cdot\frac{\mathbf{E}X^2}{|\mathbf{E}X|} - |\mathbf{E}X|\cdot\mathbf{P}(X \le 0).$$

**Proof of Theorem 2.** Let $F$ be the d.f. of $X$ and $F^e$ be its equilibrium transform. Consider the following functional on the space $\mathbf{F}$ of probability d.f.s with unit mean and finite second moment:

$$J(F) = \zeta_1(F, F^e) - \frac{1}{2}\int_{\mathbb{R}} x^2\,dF(x) + F(0+), \quad F \in \mathbf{F}. \tag{36}$$

Then, Equation (34) would follow from

$$\sup_{F \in \mathbf{F}} J(F) \le 0. \tag{37}$$

Let us prove Equation (37).

Since $h \in \operatorname{Lip}_1$ if and only if $(-h) \in \operatorname{Lip}_1$, the modulus sign in the definition of $\zeta_1(F, F^e)$ (see Equation (19)) may be omitted. Hence, we can rewrite

$$J(F) = \sup_{h\in\operatorname{Lip}_1} J_1(F, h), \quad\text{where}\quad J_1(F, h) = \int_{\mathbb{R}} h\,dF - \int_{\mathbb{R}} h\,dF^e - \frac{1}{2}\int_{\mathbb{R}} x^2\,dF(x) + F(0+), \quad F \in \mathbf{F}.$$

Note that $J_1(F, h)$ is, by definition, linear w.r.t. $F \in \mathbf{F}$ for every $h \in \operatorname{Lip}_1$. According to Theorems 2 and 3 of [25], for any fixed $h \in \operatorname{Lip}_1$, the least upper bound $\sup_{F\in\mathbf{F}} J_1(F, h)$ over probability d.f.s $F$ satisfying two linear conditions (we can also fix the value $b^2 \ge 1$ of the second moment and then take the least upper bound over all $b \ge 1$) coincides with that over the set of three-point distributions from $\mathbf{F}$. Since every three-point distribution has finite moments of all orders, the condition of finiteness of the second-order moments may be eliminated, so that

$$\sup_{F\in\mathbf{F}} J(F) = \sup_{h\in\operatorname{Lip}_1}\,\sup_{F\in\mathbf{F}_3} J_1(F, h),$$

where $\mathbf{F}_3$ is the space of all discrete probability d.f.s with at most three jumps and unit first moment. Furthermore, according to Hoeffding [26], the least upper bound $\sup_{F\in\mathbf{F}_3} J_1(F, h)$ over discrete probability d.f.s $F$ with a finite number of jumps satisfying one moment condition is attained on two-point distributions; hence,

$$\sup_{F\in\mathbf{F}} J(F) = \sup_{h\in\operatorname{Lip}_1}\,\sup_{F\in\mathbf{F}_2} J_1(F, h) = \sup_{F\in\mathbf{F}_2} J(F),$$

where $\mathbf{F}_2$ is the space of all discrete probability d.f.s with at most two jumps and unit first moment. Therefore, to prove Equation (37), it suffices to show that $J(F) \le 0$ for every $F \in \mathbf{F}_2$.

Let $F$ correspond to a two-point distribution $p\,\delta_u + (1-p)\,\delta_v$ with $u < v$ and $p \in [0, 1)$. The condition $\int_{\mathbb{R}} x\,dF(x) = 1$ yields $u < 1 \le v$ and $v = (1 - pu)/(1 - p)$, so that there are only three possibilities:

**Case 1:** $u \le 0 < 1 \le v$ and $p \in [0, 1)$. Then,

$$q = \mathbb{P}(X \le 0) = p, \quad b^2 = \mathbb{E}X^2 = \frac{pu^2 - 2pu + 1}{1 - p},\tag{38}$$

and, by the definition of $F^e$ given in Equation (21), we have

$$F^e(x) = \begin{cases} 0, & \text{for } x \le u, \\ p(u - x), & \text{for } u < x \le 0, \\ pu + (1-p)x, & \text{for } 0 < x \le v, \\ 1, & \text{for } x > v. \end{cases}$$

Observing that the difference $F(x) - F^e(x)$ has exactly one sign change at $x = p(1-u)/(1-p) = v - 1 \in [0, v)$ and using Equation (20), after some elementary calculations, we get

$$\zeta_1\big(F, F^e\big) = \frac{1}{2}u^2 p - up + \frac{1}{2}(1-u)^2\frac{p^2}{1-p} + \frac{1}{2}(1-p),$$

and, hence,

$$J(F) = \zeta_1\big(F, F^e\big) - \frac{pu^2 - 2pu + 1}{2(1-p)} + p = 0,$$

which means that $J(F) = 0$ for an arbitrary two-point probability distribution with unit first moment and a nonpositive atom. Expressing $u$ and $v$ in terms of $q$ and $b^2$ (see Equation (38)), we get Equation (35).

**Case 2:** $0 < u < 1 \le v$ and $p \in [0, u]$. Then, $q = \mathbf{P}(X \le 0) = 0$,

$$F^e(x) = \begin{cases} 0, & \text{for } x \le 0, \\ x, & \text{for } 0 < x \le u, \\ u + (1-p)(x - u), & \text{for } u < x \le v, \\ 1, & \text{for } x > v, \end{cases}$$

and, since $F^e(x) - F(x) \ge 0$ for all $x \in \mathbb{R}$, we get $\zeta_1(F, F^e) = \frac{1}{2}u^2 + \frac{1}{2}(v-u)(u + 1 - 2p) = 1 - \frac{1}{2}\mathbf{E}X^2$. Hence,

$$J(F) = \zeta_1\big(F, F^e\big) - \frac{1}{2}\mathbf{E}X^2 + q = 1 - \mathbf{E}X^2 \le 0,$$

since $\mathbf{E}X^2 \ge (\mathbf{E}X)^2 = 1$ by Jensen's inequality. The equality here and, hence, in Equation (34) is attained in the case of the degenerate distribution $\delta_1$.

**Case 3:** $0 < u < 1 < v$ and $p \in (u, 1)$. Then, $q = 0$ and $F^e$ has the same form as in the previous case, but the function $F^e(x) - F(x)$ now has exactly one sign change at $x = p(1-u)/(1-p) = v - 1 \in (u, v)$, and, hence, $\zeta_1(F, F^e) = \frac{1}{2}u^2 + \frac{1}{2}(p-u)^2\frac{1}{1-p} + \frac{1}{2}(1-p)$. Thus,

$$J(F) = \zeta_1\big(F, F^e\big) - \frac{1}{2}\mathbf{E}X^2 + q = u^2 - p < 0,$$

since $u^2 < u < p$ in this case, and the equality in Equation (37) (and, hence, in Equation (34)) is not attained. $\square$
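Case 3 can also be confirmed numerically (a sketch; $u = 0.3$ and $p = 0.5$ are arbitrary admissible test values with $u < p < 1$):

```python
u, p = 0.3, 0.5                  # arbitrary admissible test values (u < p)
v = (1 - p * u) / (1 - p)        # unit-mean constraint

def F(x):                        # two-point d.f.: mass p at u, 1-p at v
    return 0.0 if x < u else (p if x < v else 1.0)

def Fe(x):                       # equilibrium transform (same form as Case 2)
    if x <= 0:
        return 0.0
    if x <= u:
        return x
    if x <= v:
        return u + (1 - p) * (x - u)
    return 1.0

n = 200000                       # midpoint rule on [0, v]
h = v / n
zeta1 = sum(abs(F((i + 0.5) * h) - Fe((i + 0.5) * h)) for i in range(n)) * h
EX2 = p * u * u + (1 - p) * v * v
J = zeta1 - EX2 / 2              # q = P(X <= 0) = 0 here
assert abs(J - (u * u - p)) < 1e-3
assert J < 0
```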

**Remark 3.** *Analyzing the proof, one can make sure that Equation* (34) *admits a slight improvement:*

$$\zeta_1\big(\mathcal{L}(X), \mathcal{L}^e(X)\big) \le \frac{\mathbf{E}X^2}{2} - \mathbf{P}(X \le 0) - \mathbf{E}(1-X)^2\mathbb{1}_{(0,1]}(X)$$

*for any r.v. X with $\mathbf{E}X = 1$ and finite second moment. The proof differs only by the subtraction of an additional term $\int_{(0,1]}(1-x)^2\,dF(x)$ in the definition in Equation* (36) *of $J(F)$, which is still linear w.r.t. F and, hence, does not change the logic. One has only to check that the new $J(F)$ is nonpositive for two-point distributions. In Case 1, $J(F)$ is retained. In Cases 2 and 3, the additional term is of the form $p(1-u)^2$, and it can be checked that this term does not affect the sign of $J(F)$.*

## **4. Stein's Method**

Stein's method, first introduced in [27] for normal approximation, is a powerful technique that makes it possible to estimate distances with *ζ*-structure (see Equation (2)) between probability distributions and a fixed target distribution (of a r.v.) *Z*. A comprehensive survey of Stein's method may be found, e.g., in [14]. Suppose that the distance $\zeta_{\mathcal{H}}$ is of the form given in Equation (2) for a specific class $\mathcal{H}$ of real-valued functions. As mentioned in the Introduction, this is the case for both the uniform (Kolmogorov) and the Kantorovich distances, with $\mathcal{H} = \big\{\mathbb{1}_{(-\infty,a)}(\cdot) \mid a \in \mathbb{R}\big\}$ and $\mathcal{H} = \operatorname{Lip}_1$, respectively.

The first step of Stein's method is to construct the so-called *Stein operator* $\mathcal{A}$ on some space $\mathcal{F}$ of real functions, such that

$$\mathbb{E}\mathcal{A}f(Z) = 0 \quad \forall f \in \mathcal{F}.\tag{39}$$

The second step is to find the solution $f_h$ to the *Stein equation*

$$\mathcal{A}f\_h(\mathbf{x}) = h(\mathbf{x}) - \mathbf{E}h(Z) \tag{40}$$

for every *h* ∈ H. Once the solution is found, it becomes possible to estimate the distance between the distributions of *X* and *Z* as

$$\begin{split} \zeta_{\mathcal{H}}(X, Z) &= \sup_{h\in\mathcal{H}}\left|\int_{\mathbb{R}} h\,dF_X - \int_{\mathbb{R}} h\,dF_Z\right| = \sup_{h\in\mathcal{H}}\left|\int_{\mathbb{R}} h\,dF_X - \mathbf{E}h(Z)\right| = \\ &= \sup_{h\in\mathcal{H}}\left|\int_{\mathbb{R}}\big(h - \mathbf{E}h(Z)\big)\,dF_X\right| = \sup_{h\in\mathcal{H}}\left|\int_{\mathbb{R}}\mathcal{A}f_h\,dF_X\right| = \sup_{h\in\mathcal{H}}\big|\mathbf{E}\mathcal{A}f_h(X)\big|. \end{split} \tag{41}$$

The final estimate for $\zeta_{\mathcal{H}}(X, Z)$ is usually derived by bounding the last expression in Equation (41) from above using the properties of the Stein operator $\mathcal{A}$ and those of the solutions $f_h$ to the Stein Equation (40).

One can check that, for $Z \stackrel{d}{=} \mathcal{E} \sim \operatorname{Exp}(1)$, the following operator satisfies Equation (39) on the space $\mathcal{F}$ of absolutely continuous functions with $\mathbf{E}|f(\mathcal{E})| < \infty$ and thus serves as a Stein operator:

$$\mathcal{A}f(\mathbf{x}) = f'(\mathbf{x}) - f(\mathbf{x}) + f(\mathbf{0}). \tag{42}$$

Peköz and Röllin [13] found an explicit solution to Stein Equation (40) in this case:

$$f_h(x) = -e^x\int_x^{+\infty}\tilde{h}(t)\,e^{-t}\,dt, \quad\text{where}\quad \tilde{h}(t) = h(t) - \mathbf{E}h(\mathcal{E}), \tag{43}$$

for every $h$ with $\mathbf{E}|h(\mathcal{E})| < \infty$. Note that $f_h(0) = 0$.
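A quick numerical sketch of this solution (illustrative only): for the test function $h = \sin$ one has $\mathbf{E}h(\mathcal{E}) = 1/2$, the integral in Equation (43) can be evaluated in closed form, and the result satisfies the Stein equation $f_h'(x) - f_h(x) + f_h(0) = h(x) - \mathbf{E}h(\mathcal{E})$.

```python
import math

Eh = 0.5                                   # E sin(E) = 1/2 for E ~ Exp(1)
def htilde(t):
    return math.sin(t) - Eh

def f_h(x, T=40.0, n=40000):
    # Equation (43): f_h(x) = -e^x * int_x^inf htilde(t) e^{-t} dt,
    # with e^x folded into the integrand; the integral truncated at x + T
    step = T / n
    s = 0.0
    for i in range(n):
        t = x + (i + 0.5) * step
        s += htilde(t) * math.exp(-(t - x)) * step
    return -s

# for h = sin the integral can be computed explicitly
closed = lambda x: 0.5 * (1 - math.sin(x) - math.cos(x))
assert abs(f_h(0.5) - closed(0.5)) < 1e-6

# the closed form satisfies A f_h = h - E h(E) exactly (note f_h(0) = 0)
for x in (0.0, 0.5, 2.0, 5.0):
    fhp = 0.5 * (math.sin(x) - math.cos(x))   # f_h'(x)
    assert abs(fhp - closed(x) + closed(0.0) - htilde(x)) < 1e-12
```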

The following theorem extends Theorem 2.1 of Peköz and Röllin [13] to distributions with no support constraints and provides estimates of the accuracy of the exponential approximation in terms of the Kantorovich distance characterizing the proximity to the equilibrium transform.

**Theorem 3.** *Let X be a square integrable r.v. with* **E***X* = 1 *and* E ∼ Exp(1)*. Then,*

$$\begin{aligned} \zeta_1(X, \mathcal{E}) &\le 2\,\zeta_1\big(\mathcal{L}(X), \mathcal{L}^e(X)\big), \\ \zeta_1\big(\mathcal{L}^e(X), \operatorname{Exp}(1)\big) &\le \zeta_1\big(\mathcal{L}(X), \mathcal{L}^e(X)\big), \end{aligned}$$

*where $\mathcal{L}^e(X)$ is the equilibrium transform of $\mathcal{L}(X)$.*

**Proof.** Let $f_h$ be defined by Equation (43). Then, by Equations (41), (42), and (25), we have

$$\zeta_1(X, \mathcal{E}) = \sup_{h\in\operatorname{Lip}_1}\big|\mathbf{E}\mathcal{A}f_h(X)\big| = \sup_{h\in\operatorname{Lip}_1}\big|\mathbf{E}f_h'(X) - \mathbf{E}f_h(X)\big| = \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}} f_h'\,dF_X - \int_{\mathbb{R}} f_h'\,dF_X^e\right|$$

and

$$\begin{split} \zeta_1\big(\mathcal{L}^e(X), \operatorname{Exp}(1)\big) &= \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}} h(x)\,dF_X^e(x) - \mathbf{E}h(\mathcal{E})\right| = \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}}\tilde{h}(x)\,dF_X^e(x)\right| = \\ &= \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}}\mathcal{A}f_h(x)\,dF_X^e(x)\right| = \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}} f_h'(x)\,dF_X^e(x) - \int_{\mathbb{R}} f_h(x)\,dF_X^e(x)\right| = \\ &= \sup_{h\in\operatorname{Lip}_1}\left|\int_{\mathbb{R}} f_h(x)\,dF_X(x) - \int_{\mathbb{R}} f_h(x)\,dF_X^e(x)\right|. \end{split}$$

In Lemma 4.1 of [13] (see also Lemma 5.3 of [14]), it is proved that $f_h \in \operatorname{Lip}_1$ and $f_h' \in \operatorname{Lip}_2$ for $h \in \operatorname{Lip}_1$. This remark, together with the observation that $\mathcal{L}(X)$ and $\mathcal{L}^e(X)$ have finite first moments, immediately leads to the statement of the theorem. $\square$

Less formally, Theorem 3 states that, if $\mathcal{L}(X)$ and $\mathcal{L}^e(X)$ are close, then so are $\mathcal{L}(X)$ and $\operatorname{Exp}(1)$; hence, it may be regarded as a continuity counterpart of the fixed-point property stated in Theorem 1(c).
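The first bound of Theorem 3 can be illustrated numerically (a sketch; the test case $X \sim U(0,2)$ with $\mathbf{E}X = 1$ is an arbitrary choice, for which $F^e(x) = x - x^2/4$ on $[0,2]$ and $\zeta_1(\mathcal{L}(X), \mathcal{L}^e(X)) = 1/3$):

```python
import math

# zeta_1(X, E) = int_0^inf |F_X(x) - (1 - e^{-x})| dx for F_X(x) = min(x/2, 1)
n = 400000
T = 40.0                       # truncation of the integration range
h = T / n
z_XE = 0.0
for i in range(n):
    x = (i + 0.5) * h
    z_XE += abs(min(x / 2, 1.0) - (1 - math.exp(-x))) * h

z_eq = 1 / 3                   # closed form: int_0^2 (x/2 - x^2/4) dx = 1/3
assert z_XE <= 2 * z_eq        # first inequality of Theorem 3
```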

## **5. Main Results**

**Theorem 4.** *Let $X_1, X_2, \dots$ be a sequence of independent square integrable random variables with $\mathbf{E}X_n = a \neq 0$ and $S_n := \sum_{i=1}^n X_i$ for $n \in \mathbb{N}$, $S_0 := 0$. Let $p \in (0, 1)$, $N \sim \operatorname{Geom}(p)$ be independent of all $\{X_n\}$, $N_0 := N - 1$, and let $W := S_N/\mathbf{E}S_N = pS_N/a$ and $W_0 := S_{N_0}/\mathbf{E}S_{N_0} = pS_{N_0}/(a(1-p))$ be normalized geometric random sums, $\mathcal{E} \sim \operatorname{Exp}(1)$. Then,*

$$\zeta_1(W, \mathcal{E}) \le \frac{2p}{|a|}\sum_{n=1}^{\infty}\mathbf{P}(N=n)\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big) \le p\left(\frac{\mathbf{E}X_N^2}{a^2} - 2\,\mathbf{P}(X_N \le 0)\right), \tag{44}$$
  $\zeta\_1(\mathcal{W}\_0, \mathcal{E}) \le \frac{p}{1-p} \cdot \frac{\mathbb{E} \mathcal{X}\_N^2}{a^2} \,. \tag{45}$ 

Before proceeding to the proof, we need the following auxiliary statement.

**Lemma 3.** *Under the conditions of Theorem 4, we have*


$$\begin{aligned} \zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big) &\le \sum_{n=1}^{\infty} p(1-p)^{n-1}\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big), \\ \zeta_1\big(\mathcal{L}(S_{N_0}), \mathcal{L}^e(S_{N_0})\big) &\le \frac{\mathbf{E}X_N^2}{2|a|}. \end{aligned}$$

**Proof.** Let $F_n$ be the d.f. of $X_n$, $n \in \mathbb{N}$. Then, according to Equation (20), Theorem 1(h), Tonelli's theorem, and the obvious fact that $\mathcal{L}(S_n) = \mathcal{L}(S_{n-1}) * \mathcal{L}(X_n)$, we have

$$\begin{split} \zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big) &= \int_{\mathbb{R}}\big|F_{S_N}(x) - F_{S_N}^e(x)\big|\,dx \le \\ &\le \sum_{n=1}^{\infty} p(1-p)^{n-1}\int_{\mathbb{R}}\int_{\mathbb{R}}\big|F_n(x-s) - F_n^e(x-s)\big|\,dF_{S_{n-1}}(s)\,dx = \\ &= \sum_{n=1}^{\infty} p(1-p)^{n-1}\int_{\mathbb{R}}\int_{\mathbb{R}}\big|F_n(x-s) - F_n^e(x-s)\big|\,dx\,dF_{S_{n-1}}(s) = \\ &= \sum_{n=1}^{\infty} p(1-p)^{n-1}\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big), \end{split}$$

which proves the first claim of the lemma, and, similarly,

$$\begin{split} \zeta_1\big(\mathcal{L}(S_{N_0}), \mathcal{L}^e(S_{N_0})\big) &\le \sum_{n=1}^{\infty} p(1-p)^{n-1}\int_{\mathbb{R}}\int_{\mathbb{R}}\big|\mathbb{1}_{(0,+\infty)}(x-s) - F_n^e(x-s)\big|\,dF_{S_{n-1}}(s)\,dx = \\ &= \sum_{n=1}^{\infty} p(1-p)^{n-1}\int_{\mathbb{R}}\big|\mathbb{1}_{(0,+\infty)}(x) - F_n^e(x)\big|\,dx = \sum_{n=1}^{\infty} p(1-p)^{n-1}\,\zeta_1\big(\delta_0, \mathcal{L}^e(X_n)\big), \end{split}$$

where $\delta_0$ denotes the Dirac measure concentrated at $0$. As can easily be seen from the definition of the equilibrium transform given in Equation (21),

$$\begin{aligned} \text{if } a > 0, \text{ then}\quad & F^e(x) \le 0, \; x \le 0, \qquad F^e(x) \le 1, \; x \ge 0, \\ \text{if } a < 0, \text{ then}\quad & F^e(x) \ge 0, \; x \le 0, \qquad F^e(x) \ge 1, \; x \ge 0; \end{aligned}$$

hence, we write

$$\big|\mathbb{1}_{(0,+\infty)}(x) - F_n^e(x)\big| = \begin{cases} -F_n^e(x)\operatorname{sign} a, & x \le 0, \\ \big(1 - F_n^e(x)\big)\operatorname{sign} a, & x > 0, \end{cases}$$

and also using Equation (28), we obtain

$$\zeta_1\big(\delta_0, \mathcal{L}^e(X_n)\big) = \operatorname{sign} a\cdot\left(-\int_{-\infty}^{0} F_n^e(x)\,dx + \int_{0}^{+\infty}\big(1 - F_n^e(x)\big)\,dx\right) = \operatorname{sign} a\cdot\int_{\mathbb{R}} x\,dF_n^e(x) = \frac{\mathbf{E}X_n^2}{2|a|}.$$

The second claim of the lemma follows now by the total probability formula and independence conditions.
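The identity $\zeta_1(\delta_0, \mathcal{L}^e(X)) = \mathbf{E}X^2/(2|a|)$ is easy to check numerically (a sketch; the test case $X \sim U(0,1)$ is an arbitrary choice with $a = \mathbf{E}X = 1/2$, $\mathbf{E}X^2 = 1/3$, and $F^e(x) = 2x - x^2$ on $[0,1]$):

```python
# |1_{(0,+inf)}(x) - F^e(x)| = 1 - (2x - x^2) = (1 - x)^2 on (0, 1]
n = 100000
h = 1.0 / n
zeta = 0.0
for i in range(n):
    x = (i + 0.5) * h
    zeta += (1 - (2 * x - x * x)) * h
assert abs(zeta - 1 / 3) < 1e-6    # equals E X^2 / (2|a|) = (1/3)/(2 * 1/2)
```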

**Proof of Theorem 4.** Due to the homogeneity of both the Kantorovich metric (Lemma 1(a)) and the equilibrium transform (Theorem 1(f)), without loss of generality we can assume that $a = 1$. The second inequality in Equation (44) is an implication of Theorem 2, so it remains only to prove the first inequality in Equation (44) and the inequality in Equation (45). Indeed, by Theorems 3 and 1(f) and Lemmas 1 and 3, we have

$$\begin{aligned} \zeta_1(W, \mathcal{E}) &\le 2\,\zeta_1\big(\mathcal{L}(W), \mathcal{L}^e(W)\big) = \\ &= 2p\,\zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big) \le 2p\sum_{n=1}^{\infty}\mathbf{P}(N=n)\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big) \end{aligned}$$

and

$$\begin{split} \zeta_1(W_0, \mathcal{E}) &\le 2\,\zeta_1\big(\mathcal{L}(W_0), \mathcal{L}^e(W_0)\big) = \\ &= \frac{2p}{1-p}\,\zeta_1\big(\mathcal{L}(S_{N_0}), \mathcal{L}^e(S_{N_0})\big) \le \frac{p}{1-p}\,\mathbf{E}X_N^2. \qquad \qed \end{split}$$

**Corollary 1.** *Under the conditions of Theorem 4 and $\sup_n \mathbf{E}X_n^2 < \infty$, we have*

$$\zeta_1(W, \mathcal{E}) \le \frac{2p}{|a|}\sup_n\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big) \le p\sup_n\left(\frac{\mathbf{E}X_n^2}{a^2} - 2\,\mathbf{P}(X_n \le 0)\right),\tag{46}$$

$$\zeta_1(W_0, \mathcal{E}) \le \frac{p}{(1-p)a^2}\sup_n \mathbf{E}X_n^2. \tag{47}$$

**Remark 4.** *The right-hand side of Equation* (47) *is no less than that of Equation* (46) *because of the factor $\frac{1}{1-p} > 1$ and the absence of the nonpositive term $-2\,\mathbf{P}(X_n \le 0)$. This agrees with the intuition that $W$ may be closer to $\mathcal{E}$ than $W_0$, because $S_N$ contains a.s. one summand more than $S_{N_0}$.*

**Corollary 2.** *Under the conditions of Theorem 4, we have*

$$\zeta_2(W, \mathcal{E}) \le \frac{3p}{|a|}\sum_{n=1}^{\infty}\mathbf{P}(N=n)\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big) \le \frac{3p}{2}\left(\frac{\mathbf{E}X_N^2}{a^2} - 2\,\mathbf{P}(X_N \le 0)\right), \tag{48}$$

$$\zeta_2(W_0, \mathcal{E}) \le \frac{p}{1-p}\cdot\frac{3\,\mathbf{E}X_N^2}{2a^2}. \tag{49}$$

Recently, Korolev and Zeifman [28] obtained a bound similar to Equation (49), but with the constant factor $1/2$ on the right-hand side instead of $3/2$, i.e., three times smaller. The estimate in Equation (48) is also worse than Kalashnikov's bound in Equation (10), obtained in the i.i.d. case with $\mathbf{E}X_1 = 1$, since Equation (10) with $s = 2$, by Theorem 3, yields

$$\zeta_2(W, \mathcal{E}) \le p\,\zeta_1(X_1, \mathcal{E}) \le 2p\,\zeta_1\big(\mathcal{L}(X_1), \mathcal{L}^e(X_1)\big),$$

while Equation (48) in the i.i.d. case with **E***X*1 = 1 reduces to

$$
\zeta_2(W, \mathcal{E}) \le 3p\,\zeta_1\big(\mathcal{L}(X_1), \mathcal{L}^e(X_1)\big),
$$

which is 1.5 times greater.

**Proof.** Using subsequently Theorem 1(i,c), the triangle inequality for the Kantorovich metric, Theorem 3, and Lemma 3 together with the homogeneity of the Kantorovich distance and the equilibrium transform, we obtain

$$\begin{split} \zeta_2(W, \mathcal{E}) &= \zeta_1\big(\mathcal{L}^e(W), \mathcal{L}^e(\mathcal{E})\big) = \zeta_1\big(\mathcal{L}^e(W), \mathcal{L}(\mathcal{E})\big) \le \zeta_1\big(\mathcal{L}^e(W), \mathcal{L}(W)\big) + \zeta_1(W, \mathcal{E}) \le \\ &\le 3\,\zeta_1\big(\mathcal{L}(W), \mathcal{L}^e(W)\big) \le \frac{3p}{|a|}\sum_{n=1}^{\infty}\mathbf{P}(N=n)\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big). \end{split}$$

Similarly,

$$\zeta_2(W_0, \mathcal{E}) \le 3\,\zeta_1\big(\mathcal{L}(W_0), \mathcal{L}^e(W_0)\big) \le \frac{3}{2}\cdot\frac{p}{1-p}\cdot\frac{\mathbf{E}X_N^2}{a^2}. \qquad \square$$

To study the problem of the accuracy of the estimates obtained above in Equations (46) and (47), let us introduce the *asymptotically best constant* for the Kantorovich distance in the Rényi theorem for geometric random sums of i.i.d. r.v.s in a way similar to the definition of the asymptotically best constant [29] in the classical Berry–Esseen inequality (see also [3,30–35]):

$$C_{\mathrm{AB}} := \sup_{\{X_n\}\ \mathrm{i.i.d.}:\ \mathbf{E}X_1 \neq 0,\ \mathbf{E}X_1^2 < \infty}\ \lim_{p\to 0+}\zeta_1(W, \mathcal{E})\,\frac{(\mathbf{E}X_1)^2}{p\,\mathbf{E}X_1^2},\tag{50}$$

which serves as a lower bound to the constant *C* in the inequality

$$\zeta_1(W, \mathcal{E}) \le C\,p\,\mathbf{E}X_1^2/(\mathbf{E}X_1)^2,\tag{51}$$

even if it is supposed to hold only for sufficiently small $p$. Similarly, define $C_{\mathrm{AB}}^0$ for $W_0$. The inequality in Equation (46) (similarly, Equation (47)) trivially yields the validity of Equation (51) with $C = 1$ for all $p \in (0, 1)$. Since

$$C \ge C_{\mathrm{AB}},$$

it is easy to conclude that $C_{\mathrm{AB}} \le 1$.

**Theorem 5.** *For the asymptotically best constants $C_{\mathrm{AB}}$ and $C_{\mathrm{AB}}^0$ defined in Equation* (50) *for $W$ and $W_0$, we have*

$$
C_{\mathrm{AB}} \ge 1/4, \qquad C_{\mathrm{AB}}^0 \ge 1/4.
$$

**Proof.** Taking all $X_n := 1$, we get $\mathbf{E}X_n = \mathbf{E}X_n^2 = 1$ and $W = pN$, $W_0 = pN_0/(1-p)$, where $N \sim \operatorname{Geom}(p)$ and $N_0 := N - 1$. To estimate $\zeta_1(W, \mathcal{E})$ from below, we use the definition of the Kantorovich distance in Equation (19) and take $h(x) = \frac{1}{t}\sin(tx) \in \operatorname{Lip}_1$ as a test function, where $t \in \mathbb{R}\setminus\{0\}$ is a free parameter to be chosen later. Recalling the ch.f.s of the exponential and the geometric distributions, we obtain

$$\mathbf{E}h(\mathcal{E}) = \frac{1}{t}\operatorname{Im}\mathbf{E}e^{it\mathcal{E}} = \operatorname{Im}\frac{1}{t(1-it)} = \frac{1}{1+t^2},$$

$$\begin{split} \mathbf{E}h(W) &= \mathbf{E}h(pN) = \frac{1}{t}\operatorname{Im}\mathbf{E}e^{itpN} = \frac{1}{t}\operatorname{Im}\left[\frac{pe^{itp}}{1-(1-p)e^{itp}}\right] = \\ &= \frac{1}{t}\operatorname{Im}\left[\frac{pe^{itp}\big(1-(1-p)e^{-itp}\big)}{1+(1-p)^2 - 2(1-p)\cos(tp)}\right] = \frac{p\sin(tp)}{tp^2 + 2t(1-p)\big(1-\cos(tp)\big)}, \\ \mathbf{E}h(W_0) &= \mathbf{E}h\left(\frac{pN_0}{1-p}\right) = \frac{1}{t}\operatorname{Im}\mathbf{E}e^{itpN_0/(1-p)} = \frac{p(1-p)\sin\big(\frac{tp}{1-p}\big)}{tp^2 + 2t(1-p)\big(1-\cos\big(\frac{tp}{1-p}\big)\big)}. \end{split}$$

Thus,

$$\begin{split} C_{\mathrm{AB}} &\ge \lim_{p\to 0+}\sup_{t\neq 0}\frac{|\mathbf{E}h(W) - \mathbf{E}h(\mathcal{E})|}{p} \ge \sup_{t\neq 0}\lim_{p\to 0+}\left|\frac{\mathbf{E}h(W) - \mathbf{E}h(\mathcal{E})}{p}\right| = \\ &= \sup_{t\neq 0}\lim_{p\to 0+}\left|\frac{p^3t^3 + o(p^3)}{p^3t(t^2+1)^2 + o(p^3)}\right| = \sup_{t\neq 0}\frac{t^2}{(t^2+1)^2} = \frac{1}{4}, \end{split}$$

and, similarly,

$$C_{\mathrm{AB}}^0 \ge \sup_{t\neq 0}\lim_{p\to 0+}\left|\frac{\mathbf{E}h(W_0) - \mathbf{E}h(\mathcal{E})}{p}\right| = \sup_{t\neq 0}\frac{t^2}{(t^2+1)^2} = \frac{1}{4}. \qquad \square$$
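The computations in this proof are easy to cross-check numerically (a sketch; the parameter values are arbitrary test choices): the closed form of $\mathbf{E}h(W)$ agrees with direct summation of the geometric series, the ratio $(\mathbf{E}h(W) - \mathbf{E}h(\mathcal{E}))/p$ approaches $t^2/(t^2+1)^2$ as $p \to 0$, and the supremum of $t^2/(t^2+1)^2$ is $1/4$, attained at $t = 1$.

```python
import math

p, t = 0.05, 1.3     # arbitrary test values
h = lambda x: math.sin(t * x) / t

# closed form of E h(W), W = pN, vs direct summation of the geometric series
series = sum(h(p * n) * p * (1 - p) ** (n - 1) for n in range(1, 5000))
closed = p * math.sin(t * p) / (t * (p * p + 2 * (1 - p) * (1 - math.cos(t * p))))
assert abs(series - closed) < 1e-9

# (E h(W) - E h(E))/p -> t^2/(t^2+1)^2 as p -> 0; sup over t equals 1/4
t = 1.0
Eh_exp = 1 / (1 + t * t)
for q in (1e-3, 1e-4):
    EhW = q * math.sin(t * q) / (t * (q * q + 2 * (1 - q) * (1 - math.cos(t * q))))
    ratio = (EhW - Eh_exp) / q
    assert abs(ratio - t * t / (1 + t * t) ** 2) < 20 * q
sup_grid = max(x * x / (1 + x * x) ** 2 for x in (i / 100 for i in range(1, 1001)))
assert abs(sup_grid - 0.25) < 1e-12
```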

Theorem 1(h) allows extending Theorem 4 to non-geometric random sums of independent random variables with arbitrary means of identical signs. Namely, the following statement holds.

**Theorem 6.** *Let X*1, *X*2,... *be a sequence of independent random variables, independent of all else, with*

$$a\_n := \mathbf{E}X\_n > 0, \quad b\_n := \mathbf{E}X\_n^2 < \infty, \quad n \in \mathbb{N},$$

*and $S_n := \sum_{i=1}^n X_i$ for $n \in \mathbb{N}$, $S_0 := 0$. Let $N$ be an $\mathbb{N}_0$-valued r.v.,*

$$A := \mathbf{E} \mathbf{S}\_N = \sum\_{n=1}^{\infty} a\_n \mathbf{P}(N \ge n) < \infty$$

*and M be a* N*-valued r.v. with the distribution*

$$\mathbb{P}(M=m) = \frac{a\_m}{A} \mathbb{P}(N \ge m), \quad m \in \mathbb{N}.$$

*Assume also that $\mathbf{E}S_M < \infty$. Then, with $W := S_N/\mathbf{E}S_N = A^{-1}S_N$, for any joint distribution $\mathcal{L}(N, M)$, we have*

$$\zeta_1(W, \mathcal{E}) \le 2A^{-1}\left(\sup_n \mathbf{E}|X_n|\cdot\mathbf{E}|N-M| + \sum_{m\in\mathbb{N}}\mathbf{P}(M=m)\,\zeta_1\big(\mathcal{L}(X_m), \mathcal{L}^e(X_m)\big)\right) \le \tag{52}$$

$$\leq 2A^{-1}\left(\sup\_{n} \mathbb{E}|X\_{n}| \cdot \mathbb{E}|N-M| + \mathbb{E}\left(\frac{b\_{M}}{2a\_{M}} - a\_{M} \cdot \mathbb{P}(X\_{M} \leq 0|M)\right)\right).\tag{53}$$
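The definition of $M$ can be checked numerically to give a proper probability distribution with the normalizing constant $A = \mathbf{E}S_N$ (a sketch; the means $a_n = n$ and $N$ uniform on $\{0, \dots, 5\}$ are hypothetical test choices):

```python
a = {n: float(n) for n in range(1, 7)}        # hypothetical means a_n = n
pN = {n: 1 / 6 for n in range(0, 6)}          # N uniform on {0,...,5}
tail = lambda m: sum(pN.get(n, 0.0) for n in range(m, 6))   # P(N >= m)

A = sum(a[n] * tail(n) for n in range(1, 7))  # sum a_n P(N >= n)
ESN = sum(pN[n] * sum(a[i] for i in range(1, n + 1)) for n in range(0, 6))
assert abs(A - ESN) < 1e-12                   # A = E S_N

pM = {m: a[m] * tail(m) / A for m in range(1, 7)}
assert abs(sum(pM.values()) - 1) < 1e-12      # M is a proper distribution
```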

**Remark 5.** *If both expectations $\mathbf{E}N$ and $\mathbf{E}M$ are finite, then $\mathbf{E}|N - M|$ in Equations* (52) *and* (53) *can be replaced with $\zeta_1(N, M)$.*

**Remark 6.** *Theorem 6 reduces to ([13], Theorem 3.1) in the case of nonnegative $\{X_n\}$ and to Theorem 4, Equation* (44)*, in the case of $N \sim \operatorname{Geom}(p)$ and identical $a := \mathbf{E}X_n \neq 0$, $n \in \mathbb{N}$. For shifted geometric $N$, i.e., $\mathbf{P}(N = n) = p(1-p)^n$, $n \in \mathbb{N}_0$, under the assumptions of Theorem 4, Theorem 6 yields the bound*

$$\begin{split} \zeta_1(W_0, \mathcal{E}) &\le \frac{p}{1-p}\left(2\sup_n\frac{\mathbf{E}|X_n|}{|a|} + \frac{2}{|a|}\sum_{n\in\mathbb{N}}\mathbf{P}(N=n-1)\,\zeta_1\big(\mathcal{L}(X_n), \mathcal{L}^e(X_n)\big)\right) \le \\ &\le \frac{p}{1-p}\left(\frac{\mathbf{E}X_{N+1}^2}{a^2} + 2\sup_n\frac{\mathbf{E}|X_n|}{|a|} - 2\,\mathbf{P}(X_{N+1} \le 0)\right), \end{split}$$

*whose rightmost part is, generally speaking (for example, in the i.i.d. case), worse than the estimate in Equation* (45)*, since $\mathbf{E}|X_n| \ge |a|$ for all $n \in \mathbb{N}$ and $\mathbf{P}(X_{N+1} \le 0) \le 1$.*

**Proof of Theorem 6.** By Theorem 3 and homogeneity of the Kantorovich distance and the equilibrium transform (see Lemma 1(a) and Theorem 1(f)), we have

$$\zeta_1(W, \mathcal{E}) \le 2\,\zeta_1\big(\mathcal{L}(W), \mathcal{L}^e(W)\big) = 2A^{-1}\,\zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big). \tag{54}$$

Let us bound $\zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big)$ from above.

For a given joint distribution $\mathcal{L}(N, M)$, let $p_{nm} := \mathbf{P}(N = n, M = m)$, $n \in \mathbb{N}_0$, $m \in \mathbb{N}$. Denoting $S_{j,k} := \sum_{i=j}^{k} X_i$ for $j \le k$ and using the representation in Equation (20) and Theorem 1(h), we have

$$\begin{split} \zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big) &= \int_{\mathbb{R}} \big|F_{S_N}(x) - F^e_{S_N}(x)\big|\,dx = \int_{\mathbb{R}} \big|F_{S_N}(x) - F_{S_{M-1}} * F^e_{X_M}(x)\big|\,dx = \\ &= \int_{\mathbb{R}} \Big|\sum_{n\in\mathbb{N}_0,\, m\in\mathbb{N}} p_{nm}\big(F_{S_n}(x) - F_{S_{m-1}} * F^e_{X_m}(x)\big)\Big|\,dx \le \\ &\le \sum_{n,m} p_{nm} \int_{\mathbb{R}} \big|F_{S_n}(x) - F_{S_{m-1}} * F^e_{X_m}(x)\big|\,dx \le \\ &\le \sum_{n<m} p_{nm} \int_{\mathbb{R}} \big|\mathbb{1}_{(0,+\infty)}(x) - F_{S_{n+1,m-1}} * F^e_{X_m}(x)\big|\,dx + \sum_{n\ge m} p_{nm} \int_{\mathbb{R}} \big|F_{S_{m,n}}(x) - F^e_{X_m}(x)\big|\,dx. \end{split}$$

Adding and subtracting $F_{S_{n+1,m}}(x)$ under the modulus sign in the integrands of the first sum (over $n < m$) and $F_{X_m}(x)$ in the second one (over $n \ge m$), and then using the triangle inequality and Lemma 1(b), we obtain

$$\begin{split} \zeta_1\big(\mathcal{L}(S_N), \mathcal{L}^e(S_N)\big) &\le \sum_{n<m} p_{nm}\, \zeta_1(\delta_0, S_{n+1,m}) + \sum_{n\ge m} p_{nm}\, \zeta_1(S_{m+1,n}, \delta_0) + \sum_{n,m} p_{nm}\, \zeta_1\big(\mathcal{L}(X_m), \mathcal{L}^e(X_m)\big) = \\ &= \sum_{n,m} p_{nm}\, \mathbf{E}\Big|\sum_{i=(n\wedge m)+1}^{n\vee m} X_i\Big| + \sum_{m\in\mathbb{N}} \mathbf{P}(M = m)\, \zeta_1\big(\mathcal{L}(X_m), \mathcal{L}^e(X_m)\big) \le \\ &\le \sup_{i} \mathbf{E}|X_i| \cdot \sum_{n,m} p_{nm}\,|n-m| + \sum_{m\in\mathbb{N}} \mathbf{P}(M = m)\, \zeta_1\big(\mathcal{L}(X_m), \mathcal{L}^e(X_m)\big) = \\ &= \sup_{i} \mathbf{E}|X_i| \cdot \mathbf{E}|N-M| + \sum_{m\in\mathbb{N}} \mathbf{P}(M = m)\, \zeta_1\big(\mathcal{L}(X_m), \mathcal{L}^e(X_m)\big). \end{split}$$

Substituting the latter bound into Equation (54) yields Equation (52). The bound in Equation (53) follows from Equation (52) by Theorem 2 (see also Remark 2).
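The key simplification in the last display, $\zeta_1(\delta_0, Y) = \mathbf{E}|Y|$ for any integrable $Y$, is exact and easy to verify numerically (the choice of $Y$ below is an arbitrary illustrative assumption):

```python
# Check of the identity zeta_1(delta_0, Y) = E|Y| used in the proof, on an
# empirical distribution of an "alternating" Y taking values of both signs.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
y = rng.normal(0.3, 1.0, 100_000)

# Wasserstein-1 (Kantorovich) distance to the Dirac measure delta_0 ...
zeta = wasserstein_distance(y, np.zeros(1))
# ... coincides with the first absolute moment E|Y|
print(zeta, np.abs(y).mean())
```

Indeed, $\zeta_1(\delta_0, Y) = \int_{\mathbb{R}} |F_Y(x) - \mathbb{1}_{[0,+\infty)}(x)|\,dx = \mathbf{E}|Y|$, which is how the sums $\zeta_1(\delta_0, S_{n+1,m})$ and $\zeta_1(S_{m+1,n}, \delta_0)$ collapse to first absolute moments above.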

**Author Contributions:** Conceptualization, I.S.; methodology, I.S. and M.T.; formal analysis, I.S. and M.T.; investigation, I.S. and M.T.; writing—original draft preparation, I.S. and M.T.; writing—review and editing, I.S. and M.T.; supervision, I.S.; funding acquisition, I.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The results of Sections 1–3 (including Theorem 1) were obtained under support by the Russian Science Foundation, project No. 18-11-00155. The rest of the study was funded by RFBR, project number 20-31-70054, and by the grant of the President of Russia No. MD–189.2019.1.

**Acknowledgments:** The authors would like to thank Professor Victor Korolev for the careful editing of the manuscript and the anonymous referee for a suggestion resulting in Theorem 6.

**Conflicts of Interest:** The authors declare no conflict of interest.
