*Article* **A Central Limit Theorem for Predictive Distributions**

**Patrizia Berti <sup>1</sup>, Luca Pratelli <sup>2</sup> and Pietro Rigo <sup>3,</sup>\***


**Abstract:** Let *S* be a Borel subset of a Polish space and *F* the set of bounded Borel functions *f* : *S* → R. Let *a<sub>n</sub>*(·) = *P*(*X*<sub>*n*+1</sub> ∈ · | *X*<sub>1</sub>, ..., *X<sub>n</sub>*) be the *n*-th predictive distribution corresponding to a sequence (*X<sub>n</sub>*) of *S*-valued random variables. If (*X<sub>n</sub>*) is conditionally identically distributed, there is a random probability measure *μ* on *S* such that ∫ *f da<sub>n</sub>* → ∫ *f dμ* a.s. for all *f* ∈ *F*. Define *D<sub>n</sub>*(*f*) = *d<sub>n</sub>*(∫ *f da<sub>n</sub>* − ∫ *f dμ*) for all *f* ∈ *F*, where *d<sub>n</sub>* > 0 is a constant. In this note, it is shown that, under some conditions on (*X<sub>n</sub>*) and with a suitable choice of *d<sub>n</sub>*, the finite-dimensional distributions of the process *D<sub>n</sub>* = {*D<sub>n</sub>*(*f*) : *f* ∈ *F*} stably converge to a Gaussian kernel with a known covariance structure. In addition, *E*{*ϕ*(*D<sub>n</sub>*(*f*)) | *X*<sub>1</sub>, ..., *X<sub>n</sub>*} converges in probability for all *f* ∈ *F* and *ϕ* ∈ *C<sub>b</sub>*(R).

**Keywords:** bayesian predictive inference; central limit theorem; conditional identity in distribution; exchangeability; predictive distribution; stable convergence

**MSC:** 60B10; 60G25; 60G09; 60F05; 62F15; 62M20

## **1. Introduction**

All random elements appearing in the sequel are defined on a common probability space, say (Ω, A, *P*). We denote by *S* a Borel subset of a Polish space and by B the Borel *σ*-field on *S*. We let

> P = {probability measures on B} and *F* = {real bounded Borel functions on *S*}.

Moreover, if *λ* ∈ P and *f* ∈ *F*, we write *λ*(*f*) to denote

$$
\lambda(f) = \int f \, d\lambda.
$$

In other terms, depending on the context, *λ* is regarded as a function on B or a function on *F*. This slight abuse of notation is quite usual (see, e.g., [1,2]) and very useful for the purposes of this note.

Let

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

*X* = (*X*1, *X*2,...)

be a sequence of *S*-valued random variables and

$$\mathcal{F}\_0 = \{\emptyset, \Omega\} \quad \text{and} \quad \mathcal{F}\_n = \sigma(X\_1, \dots, X\_n).$$

The *predictive distributions* of *X* are the random probability measures on (*S*, B) given by

$$a\_n(\cdot) = P(X\_{n+1} \in \cdot \mid \mathcal{F}\_n) \qquad \text{for all } n \ge 0.$$

**Citation:** Berti, P.; Pratelli, L.; Rigo, P. A Central Limit Theorem for Predictive Distributions. *Mathematics* **2021**, *9*, 3211. https://doi.org/ 10.3390/math9243211

Academic Editors: Emanuele Dolera and Federico Bassetti

Received: 30 October 2021 Accepted: 8 December 2021 Published: 12 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Under some conditions, there is a further random probability measure *μ* on (*S*, B) such that

$$
\mu(f) \stackrel{a.s.}{=} \lim\_{n} a\_n(f) \qquad \text{for each } f \in F. \tag{1}
$$

For instance, condition (1) holds if *X* is exchangeable. More generally, it holds if *X* is conditionally identically distributed (c.i.d.), as defined in Section 2. Note also that, since *S* is separable, condition (1) implies *an* → *μ* weakly. Regarding *an* and *μ* as measurable functions from Ω into P, one obtains

$$P\left(\left\{\omega \in \Omega : a\_{n,\omega} \to \mu\_{\omega} \text{ weakly} \right\} \right) = 1.$$

Assume condition (1), fix a sequence *dn* of positive constants, and define

$$D\_n(f) = d\_n \left\{ a\_n(f) - \mu(f) \right\} \qquad \text{for each } f \in F.$$

This note deals with the process

$$D\_n = \left\{ D\_n(f) : f \in F \right\}.$$

Our goal is to show that, under some conditions on *X* and with a suitable choice of the constants *dn*, the finite-dimensional distributions of *Dn* stably converge, as *n* → ∞, to a certain Gaussian limit.

To be more precise, we recall that a *kernel* on (*S*, B) is a measurable map *α* : *S* → P. This means that *α*(*x*) ∈ P, for each *x* ∈ *S*, and the function *x* → *α*(*x*)(*A*) is B-measurable for each *A* ∈ B. In what follows, we write

$$\alpha(x)(f) = \int f(y)\, \alpha(x)(dy) \qquad \text{for all } x \in S \text{ and } f \in F.$$

Next, as in [3], suppose the predictive distributions of *X* satisfy the recursive equation

$$a\_{n+1} = q\_n\, a\_n + (1 - q\_n)\, \alpha(X\_{n+1}) \qquad \text{a.s. for all } n \ge 0,\tag{2}$$

where *q*0, *q*1,... ∈ (0, 1) are constants and *α* is a kernel on (*S*, B). Moreover, let

$$\nu(\cdot) = P(X\_1 \in \cdot)$$

be the marginal distribution of *X*1. Under condition (2), *X* is c.i.d. whenever *α* is a regular conditional distribution for *ν* given a sub-*σ*-field G⊂B; see ([3] Section 5). Hence, we assume

$$\alpha(\cdot)(A) = E\_{\nu}(\mathbf{1}\_A \mid \mathcal{G}), \quad \nu\text{-a.s.},\tag{3}$$

for all *A* ∈ B and some sub-*σ*-field G⊂B. For instance, condition (3) holds if

$$\alpha(x) = \delta\_x \qquad \text{for all } x \in S,$$

where *δ<sub>x</sub>* denotes the unit mass at the point *x* (just let G = B). In addition, we assume

$$\sum\_{n} (1 - q\_n)^2 < \infty \quad \text{and} \quad \lim\_{n} d\_n \sup\_{k \ge n} (1 - q\_{k-1}) = 0$$

where

$$d\_n = \left(\sum\_{k\geq n} (1 - q\_k)^2\right)^{-1/2}.$$
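As a numerical illustration (not part of the argument), these conditions are easy to check for a concrete weight sequence. The Python sketch below uses the illustrative choice *q<sub>n</sub>* = (*n* + 2)/(*n* + 3) and approximates *d<sub>n</sub>* by truncating the tail sum at a finite horizon:

```python
# Numerical check of the two conditions on (q_n), using the illustrative
# choice q_n = (n + 2)/(n + 3), i.e., 1 - q_n = 1/(n + 3).
N = 200_000                               # truncation horizon for the tail sums
w = [1.0 / (n + 3) for n in range(N)]     # w[n] = 1 - q_n

# tail[n] = sum_{k >= n} (1 - q_k)^2, accumulated backwards
tail = [0.0] * (N + 1)
for n in range(N - 1, -1, -1):
    tail[n] = tail[n + 1] + w[n] ** 2

for n in (10, 100, 1_000, 10_000):
    d_n = tail[n] ** (-0.5)               # d_n = (sum_{k>=n} (1 - q_k)^2)^(-1/2)
    sup_term = w[n - 1]                   # 1 - q_k decreases, so the sup over
    print(n, d_n, d_n * sup_term)         # k >= n is attained at k = n
```

Here *d<sub>n</sub>* grows like √*n* while sup<sub>*k*≥*n*</sub>(1 − *q*<sub>*k*−1</sub>) decays like 1/*n*, so the product vanishes, as required.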

In this framework, it is shown that

$$\left(D\_n(f\_1), \ldots, D\_n(f\_p)\right) \longrightarrow \mathcal{N}\_p(0, \Sigma) \qquad \text{stably} \tag{4}$$

for all *p* ≥ 1 and all *f*1,..., *fp* ∈ *F*, where Σ is the random covariance matrix with entries

$$\sigma\_{jk} = \int \alpha(x)(f\_j)\, \alpha(x)(f\_k)\, \mu(dx) - \mu(f\_j)\, \mu(f\_k).$$

We actually prove something more than (4). Let *Cb*(R) denote the set of real bounded continuous functions on R. Then, it is shown that

$$E\left\{\varphi\left(D\_n(f)\right) \mid \mathcal{F}\_n\right\} \stackrel{P}{\longrightarrow} \mathcal{N}(0, \sigma^2)(\varphi)\tag{5}$$

for all *f* ∈ *F* and *ϕ* ∈ *Cb*(R), where

$$\sigma^2 = \int \alpha(x)(f)^2\, \mu(dx) - \mu(f)^2.$$

Based on (5), it is not hard to deduce condition (4).

Before concluding the Introduction, several remarks are in order.

(i) A remarkable special case is *α*(*x*) = *δ<sub>x</sub>* for all *x* ∈ *S*. Indeed, Equation (2) holds with *α* = *δ* in some meaningful situations, including Dirichlet sequences; see ([3] Section 4) for other examples. Thus, suppose *α* = *δ*. Then, the above formulae reduce to *σ<sub>jk</sub>* = *μ*(*f<sub>j</sub> f<sub>k</sub>*) − *μ*(*f<sub>j</sub>*) *μ*(*f<sub>k</sub>*) and *σ*<sup>2</sup> = *μ*(*f*<sup>2</sup>) − *μ*(*f*)<sup>2</sup>. Moreover, if *ν* is non-atomic and

$$\prod\_{j=0}^{n} q\_j \to 0 \quad \text{and} \quad \sum\_{n} \prod\_{j=0}^{n} q\_j = \infty,$$

then *μ* takes the form

$$\mu \stackrel{a.s.}{=} \sum\_{n} V\_n\, \delta\_{Y\_n},$$

where (*Vn*) and (*Yn*) are independent sequences and (*Yn*) is i.i.d. with *Y*<sup>1</sup> ∼ *ν*; see ([3] Theorem 20) and [4] for details.
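As a numerical aside, the recursion (2) with *α* = *δ* is easy to simulate. The Python sketch below iterates *a*<sub>*n*+1</sub> = *q<sub>n</sub> a<sub>n</sub>* + (1 − *q<sub>n</sub>*) *δ*<sub>*X*<sub>*n*+1</sub></sub>, sampling *X*<sub>*n*+1</sub> from *a<sub>n</sub>* at each step; the finite state space, uniform *ν*, and Pólya-type weights *q<sub>n</sub>* = (*n* + *θ*)/(*n* + 1 + *θ*) are illustrative choices (with these weights, the sequence is a Dirichlet sequence):

```python
import random

random.seed(0)

# Simulate a_{n+1} = q_n a_n + (1 - q_n) δ_{X_{n+1}} with α = δ on the
# finite state space S = {0,...,4}; uniform ν and θ = 1 are illustrative.
theta = 1.0
S = 5
a = [1.0 / S] * S                                    # a_0 = ν
trajectory = []
for n in range(20_000):
    x = random.choices(range(S), weights=a)[0]       # X_{n+1} ~ a_n
    q = (n + theta) / (n + 1 + theta)                # Polya-type weights
    a = [q * p + (1 - q) * (1.0 if s == x else 0.0) for s, p in enumerate(a)]
    trajectory.append(a[0])                          # a_n(f) with f = 1_{{0}}

print(trajectory[-1], abs(trajectory[-1] - trajectory[-5000]))
```

The trajectory of *a<sub>n</sub>*(*f*) settles down, in line with the a.s. convergence *a<sub>n</sub>*(*f*) → *μ*(*f*); the random limit varies from run to run.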


(vi) Conditions (4)–(5) are our main results. They can be motivated in at least two ways. Firstly, from the theoretical perspective, conditions (4)–(5) fit into the results concerning the asymptotic behavior of conditional expectations (see, e.g., [6–8] and references therein). Secondly, from the practical perspective, conditions (4)–(5) play a role in all those fields where predictive distributions are basic objects. The main example is Bayesian predictive inference. Indeed, the predictive distributions investigated in this note have been introduced in connection with Bayesian prediction problems; see [3]. Another example is the asymptotic behavior of certain urn schemes. Related subjects, where (4)–(5) are potentially useful, are empirical processes for dependent data, Glivenko-Cantelli-type theorems and merging of opinions. Without any claim of being exhaustive, a list of references is: [3,5,9–21].

## **2. Preliminaries**

In this note, N<sub>*p*</sub>(0, *C*) denotes the Gaussian law on the Borel sets of R<sup>*p*</sup> with mean 0 and covariance matrix *C*, where *C* is symmetric and positive semidefinite. If *p* = 1 and *c* ≥ 0 is a scalar, we write N(0, *c*) instead of N<sub>1</sub>(0, *c*) and

$$\mathcal{N}(0,c)(\varphi) = \int \varphi(\mathbf{x}) \, \mathcal{N}(0,c)(d\mathbf{x}),$$

for all bounded measurable *ϕ* : R → R. Note that, if Σ is a random covariance matrix, N<sub>*p*</sub>(0, Σ) is a random probability measure on the Borel sets of R<sup>*p*</sup>.

Let us briefly recall *stable convergence*. Let A<sup>+</sup> = {*H* ∈ A : *P*(*H*) > 0}. Fix a random probability measure *K* on (*S*, B) and define

$$
\lambda\_H(A) = E\{K(A) \mid H\} \qquad \text{for all } A \in \mathcal{B} \text{ and } H \in \mathcal{A}^+.
$$

Each *λ<sup>H</sup>* is a probability measure on B. Then, *Xn converges stably to K*, written *Xn* → *K* stably, if

$$P(X\_n \in \cdot \mid H) \longrightarrow \lambda\_H \text{ weakly for all } H \in \mathcal{A}^+.$$

In particular, *X<sub>n</sub>* converges in distribution to *λ*<sub>Ω</sub>. However, stable convergence is stronger than convergence in distribution. To see this, take a further random variable *X* : Ω → *S*. Then, *X<sub>n</sub>* → *X* in probability if, and only if, *X<sub>n</sub>* → *δ<sub>X</sub>* stably. Thus, stable convergence is strictly connected to convergence in probability. Moreover, (*X<sub>n</sub>*, *X*) → *K* × *δ<sub>X</sub>* stably whenever *X<sub>n</sub>* → *K* stably. Therefore, if *X<sub>n</sub>* converges stably, (*X<sub>n</sub>*, *X*) still converges stably for *any* *S*-valued random variable *X*.

We next turn to conditional identity in distribution. Say that *X* is *conditionally identically distributed* (c.i.d.) if

$$P(X\_k \in \cdot \mid \mathcal{F}\_n) = P(X\_{n+1} \in \cdot \mid \mathcal{F}\_n) \quad \text{a.s. for all } k > n \ge 0.$$

Thus, at each time *n*, the future observations (*Xk* : *k* > *n*) are identically distributed given the past. This is actually weaker than exchangeability. Indeed, *X* is exchangeable if, and only if, it is stationary and c.i.d.

C.i.d. sequences were introduced in [9,22] and then investigated in various papers; see, e.g., [3–5,11,23–29].

The asymptotics of c.i.d. sequences is similar to that of exchangeable ones. To see this, suppose *X* is c.i.d. and define the empirical measures

$$\mu\_n = \frac{1}{n} \sum\_{j=1}^n \delta\_{X\_j}.$$

Then, there is a random probability measure *μ* on (*S*, B) such that

$$\mu(A) \stackrel{a.s.}{=} \lim\_{m} \mu\_m(A) \qquad \text{for each fixed } A \in \mathcal{B}.$$

It follows that

$$E\left\{\mu(A) \mid \mathcal{F}\_n\right\} = \lim\_{m} E\left\{\mu\_m(A) \mid \mathcal{F}\_n\right\} = \lim\_{m} \frac{1}{m} \sum\_{j=n+1}^{m} P(X\_j \in A \mid \mathcal{F}\_n) = P(X\_{n+1} \in A \mid \mathcal{F}\_n) \qquad \text{a.s.}$$

for all *n* ≥ 0 and *A* ∈ B. Therefore, as in the exchangeable case, the predictive distributions can be written as

$$a\_n(\cdot) = P(X\_{n+1} \in \cdot \mid \mathcal{F}\_n) = E\{\mu(\cdot) \mid \mathcal{F}\_n\} \qquad \text{a.s.}$$

Using the martingale convergence theorem, this implies

$$\mu(f) \stackrel{a.s.}{=} \lim\_{n} E\left\{\mu(f) \mid \mathcal{F}\_n\right\} = \lim\_{n} a\_n(f) \qquad \text{for all } f \in F.$$

Furthermore, *X* is asymptotically exchangeable, in the sense that the probability distribution of the shifted sequence (*X<sub>n</sub>*, *X*<sub>*n*+1</sub>, ...) converges weakly to an exchangeable probability measure on (*S*<sup>∞</sup>, B<sup>∞</sup>).
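The merging of predictive and empirical measures can be seen concretely in a classical exchangeable model. The Python sketch below uses a Pólya urn (Dirichlet sequence) on a finite space, whose predictive has the closed form *a<sub>n</sub>* = (*θν* + ∑<sub>*i*≤*n*</sub> *δ*<sub>*X<sub>i</sub>*</sub>)/(*n* + *θ*); the parameters are illustrative:

```python
import random

random.seed(1)

# Polya urn (Dirichlet sequence) on {0, 1, 2}: the predictive has the
# closed form a_n = (θ ν + Σ_{i<=n} δ_{X_i}) / (n + θ).  Parameters are
# illustrative; we compare a_n with the empirical measure μ_n on A = {0}.
theta, S = 2.0, 3
nu = [1.0 / S] * S                        # ν = uniform
counts = [0] * S                          # counts[s] = #{i <= n : X_i = s}
n = 0
for _ in range(50_000):
    pred = [(theta * nu[s] + counts[s]) / (n + theta) for s in range(S)]
    x = random.choices(range(S), weights=pred)[0]    # X_{n+1} ~ a_n
    counts[x] += 1
    n += 1

a_n = (theta * nu[0] + counts[0]) / (n + theta)      # predictive mass of {0}
mu_n = counts[0] / n                                 # empirical mass of {0}
print(a_n, mu_n, abs(a_n - mu_n))
```

In this model, *a<sub>n</sub>*(*A*) − *μ<sub>n</sub>*(*A*) = *θ*(*ν*(*A*) − *μ<sub>n</sub>*(*A*))/(*n* + *θ*) is of order 1/*n*, so the two measures merge.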

Finally, we state a technical result to be used later on.

**Lemma 1.** *Let* (*Y<sub>n</sub>*) *be a sequence of real integrable random variables, adapted to the filtration* (F<sub>*n*</sub>)*, and*

$$Z\_n = E(Y\_{n+1} \mid \mathcal{F}\_n).$$

*Let V be a real non-negative random variable and* 0 < *b*<sub>1</sub> < *b*<sub>2</sub> < ... *an increasing sequence of constants, such that b<sub>n</sub>* ↑ ∞ *and b<sub>n</sub>*/*b*<sub>*n*+1</sub> → 1*. Suppose* (*Y<sub>n</sub>*<sup>2</sup>) *is uniformly integrable, Z<sub>n</sub>* → *Z a.s. for some random variable Z, and define*

$$T\_n = b\_n \left( Z\_n - Z \right).$$

*Then,*

$$E\left\{\varphi(T\_n) \mid \mathcal{F}\_n\right\} \stackrel{P}{\longrightarrow} \mathcal{N}(0, V)(\varphi) \qquad \text{for all } \varphi \in C\_b(\mathbb{R}),$$

*provided*

$$b\_n^2 \sum\_{k \ge n} (Z\_k - Z\_{k-1})^2 \stackrel{P}{\longrightarrow} V;\tag{6}$$

$$\lim\_{n} b\_n\, E\left\{ \sup\_{k \ge n} |Z\_k - Z\_{k-1}| \right\} = 0;\tag{7}$$

$$\sum\_{k\geq n} E\left|E(Z\_{k+1}\mid \mathcal{F}\_k) - Z\_k\right| = o(1/b\_n). \tag{8}$$

**Proof.** Just repeat the proof of ([10] Theorem 1) with *b<sub>n</sub>* in place of √*n*.

## **3. Main Result**

Let us go back to the notation of Section 1. Recall that *q<sub>n</sub>* ∈ (0, 1) is a constant for each *n* ≥ 0 and *d<sub>n</sub>* = (∑<sub>*k*≥*n*</sub>(1 − *q<sub>k</sub>*)<sup>2</sup>)<sup>−1/2</sup>. We aim to prove the following CLT.

**Theorem 1.** *Assume conditions* (2)*–*(3) *and*

$$\sum\_{n} (1 - q\_n)^2 < \infty \quad \text{and} \quad \lim\_{n} d\_n \sup\_{k \ge n} (1 - q\_{k-1}) = 0.$$

*Then, there is a random probability measure μ on* (*S*, B) *such that*

$$\mu(f) \stackrel{a.s.}{=} \lim\_{n} a\_n(f) \qquad \text{and} \qquad E\left\{\varphi(D\_n(f)) \mid \mathcal{F}\_n\right\} \stackrel{P}{\longrightarrow} \mathcal{N}(0, \sigma^2)(\varphi)$$

*for all f* ∈ *F and ϕ* ∈ *Cb*(R)*, where*

$$\sigma^2 = \int \alpha(x)(f)^2\, \mu(dx) - \mu(f)^2.$$

*As a consequence,*

$$\left(D\_n(f\_1), \ldots, D\_n(f\_p)\right) \longrightarrow \mathcal{N}\_p(0, \Sigma) \qquad \text{stably}$$

*for all p* ≥ 1 *and all f*<sub>1</sub>, ..., *f<sub>p</sub>* ∈ *F, where the covariance matrix* Σ *has entries*

$$\sigma\_{jk} = \int \alpha(x)(f\_j)\, \alpha(x)(f\_k)\, \mu(dx) - \mu(f\_j)\, \mu(f\_k).$$

**Proof.** Due to conditions (2)–(3), *X* is c.i.d.; see ([3] Section 5). Hence, as noted in Section 2, there is a random probability measure *μ* on (*S*, B) such that

$$a\_n(f) \stackrel{a.s.}{=} E\{\mu(f) \mid \mathcal{F}\_n\} \qquad \text{for all } f \in F.$$

By martingale convergence, it follows that *a<sub>n</sub>*(*f*) → *μ*(*f*) a.s. for all *f* ∈ *F*.

We next prove condition (5). Fix *f* ∈ *F* and define

$$b\_n = d\_n, \quad Y\_n = a\_n(f), \quad Z = \mu(f) \quad \text{and} \quad V = \sigma^2.$$

Then, (*Y<sub>n</sub>*<sup>2</sup>) is uniformly integrable (since *f* is bounded) and *b<sub>n</sub>* satisfies the conditions of Lemma 1. Moreover,

$$Z\_n = E(Y\_{n+1} \mid \mathcal{F}\_n) = E\{E(\mu(f) \mid \mathcal{F}\_{n+1}) \mid \mathcal{F}\_n\} = E\{\mu(f) \mid \mathcal{F}\_n\} = a\_n(f) \quad \text{a.s.},$$

so that *Z<sub>n</sub>* → *Z* a.s. Therefore, Lemma 1 applies. Hence, to prove (5), it suffices to check conditions (6)–(8).

Let *c* = sup| *f* |. Since *E*(*Zk*<sup>+</sup><sup>1</sup> | F*k*) = *Zk* a.s., condition (8) is trivially true. Moreover, condition (2) implies

$$\begin{array}{rcl} Z\_k - Z\_{k-1} &=& a\_k(f) - a\_{k-1}(f) \\ &=& q\_{k-1}\, a\_{k-1}(f) + (1 - q\_{k-1})\, \alpha(X\_k)(f) - a\_{k-1}(f) \\ &=& (1 - q\_{k-1}) \left\{ \alpha(X\_k)(f) - a\_{k-1}(f) \right\} \qquad \text{a.s. for all } k \ge 1. \end{array}$$

Hence, condition (7) holds, since

$$d\_n\, E\left\{ \sup\_{k \ge n} |Z\_k - Z\_{k-1}| \right\} \le 2c\, d\_n \sup\_{k \ge n} (1 - q\_{k-1}) \longrightarrow 0.$$

It remains to prove condition (6), namely

$$d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 \left\{ \alpha(X\_k)(f) - a\_{k-1}(f) \right\}^2 \stackrel{P}{\longrightarrow} \sigma^2.$$

First note that, since *a*<sub>*k*−1</sub>(*f*)<sup>2</sup> → *μ*(*f*)<sup>2</sup> a.s. as *k* → ∞, one obtains

$$d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 a\_{k-1}(f)^2 = \frac{\sum\_{k \ge n} (1 - q\_{k-1})^2 a\_{k-1}(f)^2}{\sum\_{k \ge n} (1 - q\_k)^2} \stackrel{a.s.}{\longrightarrow} \mu(f)^2.$$

Next, define

$$R\_k = \alpha(X\_k)(f)^2 \quad \text{and} \quad M\_n = d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 \{ R\_k - E(R\_k \mid \mathcal{F}\_{k-1}) \}.$$

Then,

$$\begin{array}{rcl} E(M\_n^2) &=& d\_n^4 \sum\_{k \ge n} (1 - q\_{k-1})^4\, E\left\{ \left( R\_k - E(R\_k \mid \mathcal{F}\_{k-1}) \right)^2 \right\} \\ &\le& 4c^4\, d\_n^4 \sum\_{k \ge n} (1 - q\_{k-1})^4 \\ &\le& 4c^4\, d\_n^2 \sup\_{k \ge n} (1 - q\_{k-1})^2 \cdot d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 \\ &\longrightarrow& 0. \end{array}$$

Moreover,

$$E(R\_k \mid \mathcal{F}\_{k-1}) = E\left\{ \int \alpha(x)(f)^2\, \mu(dx) \mid \mathcal{F}\_{k-1} \right\} \stackrel{a.s.}{\longrightarrow} \int \alpha(x)(f)^2\, \mu(dx).$$

Therefore,

$$d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 R\_k = M\_n + d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2\, E(R\_k \mid \mathcal{F}\_{k-1}) \stackrel{P}{\longrightarrow} \int \alpha(x)(f)^2\, \mu(dx).$$

By the same argument, it follows that

$$d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2\, \alpha(X\_k)(f)\, a\_{k-1}(f) \stackrel{P}{\longrightarrow} \mu(f) \int \alpha(x)(f)\, \mu(dx).$$

In addition, as proved in the Claim below,

$$\int \alpha(x)(f)\, \mu(dx) \stackrel{a.s.}{=} \mu(f).$$

Collecting all pieces together, one finally obtains

$$d\_n^2 \sum\_{k \ge n} (1 - q\_{k-1})^2 \left\{ \alpha(X\_k)(f) - a\_{k-1}(f) \right\}^2 \stackrel{P}{\longrightarrow} \mu(f)^2 + \int \alpha(x)(f)^2\, \mu(dx) - 2\, \mu(f)^2 = \sigma^2.$$

Hence, condition (6) holds.

This concludes the proof of (5). We next prove that (5) ⇒ (4). Let *p* ≥ 1 and *f*1, ... , *fp* ∈ *F*. Fix *u*1,..., *up* ∈ R and define

$$U\_n = \sum\_{j=1}^{p} u\_j\, D\_n(f\_j) \quad \text{and} \quad \sigma\_u^2 = \sum\_{j,k} u\_j\, u\_k\, \sigma\_{jk}.$$

Moreover, for each *H* ∈ A<sup>+</sup>, define the probability measure

$$\lambda\_H(A) = E\left\{ \mathcal{N}(0, \sigma\_u^2)(A) \mid H \right\} \qquad \text{for each Borel set } A \subset \mathbb{R}.$$

We have to show that

$$P(U\_n \in \cdot \mid H) \longrightarrow \lambda\_H \text{ weakly for each } H \in \mathcal{A}^+. \tag{9}$$

To this end, call *φ<sup>H</sup>* the characteristic function of *λH*, namely

$$\phi\_H(t) = E\left(\int e^{itx} \mathcal{N}(0, \sigma\_u^2)(dx) \mid H\right) = E\left(e^{-t^2 \sigma\_u^2/2} \mid H\right) \quad \text{for all } t \in \mathbb{R}.$$

Letting *f* = ∑<sub>*j*=1</sub><sup>*p*</sup> *u<sub>j</sub> f<sub>j</sub>*, one obtains

$$U\_n = D\_n(f) \quad \text{and} \quad \sigma\_u^2 = \int \alpha(x)(f)^2\, \mu(dx) - \mu(f)^2.$$

Therefore, condition (5) yields

$$E\left(e^{it\,U\_n}\right) = E\left(E\left\{e^{it\,D\_n(f)} \mid \mathcal{F}\_n\right\}\right) \longrightarrow E\left(e^{-t^2\,\sigma\_u^2/2}\right) = \phi\_\Omega(t)$$

for each *t* ∈ R. Hence, condition (9) holds for *H* = Ω. Next, suppose *H* ∈ ⋃<sub>*n*</sub> F<sub>*n*</sub> and *P*(*H*) > 0. Then, for large *n*, one obtains

$$E\left(\mathbf{1}\_H\, e^{it\,U\_n}\right) = E\left(\mathbf{1}\_H\, E\left\{e^{it\,D\_n(f)} \mid \mathcal{F}\_n\right\}\right).$$

Hence, for each *t* ∈ R, condition (5) still implies

$$P(H)\,\phi\_H(t) = E\left(\mathbf{1}\_H\, e^{-t^2\,\sigma\_u^2/2}\right) = \lim\_{n} E\left(\mathbf{1}\_H\, E\left\{e^{it\,D\_n(f)} \mid \mathcal{F}\_n\right\}\right) = \lim\_{n} E\left(\mathbf{1}\_H\, e^{it\,U\_n}\right).$$

Therefore, condition (9) holds whenever *H* ∈ ⋃<sub>*n*</sub> F<sub>*n*</sub> and *P*(*H*) > 0. Based on this fact, by standard arguments, condition (9) easily follows for each *H* ∈ A<sup>+</sup>.

To conclude the proof of the Theorem, it remains only to show that:

**Claim:** ∫ *α*(*x*)(*f*) *μ*(*dx*) = *μ*(*f*) a.s. for all *f* ∈ *F*.

**Proof of the Claim:** By (3), *α* is a regular conditional distribution for *ν* given a sub-*σ*-field of B, where *ν* is the marginal distribution of *X*<sub>1</sub>. Therefore, as proved in ([3] Lemma 6), there is a set *A* ∈ B such that *ν*(*A*) = 1 and

$$\int \alpha(z)(f)\, \alpha(x)(dz) = \alpha(x)(f) \qquad \text{for all } x \in A \text{ and } f \in F.$$

Since *X* is c.i.d. (and, thus, identically distributed) one also obtains *P*(*Xn* ∈ *A*) = *ν*(*A*) = 1 for all *n* ≥ 1.

Having noted these facts, fix *f* ∈ *F*. Since *a*<sub>0</sub> = *ν* and *α* is a regular conditional distribution for *ν*,

$$\int \alpha(x)(f)\, a\_0(dx) = a\_0(f).$$

Moreover, if ∫ *α*(*x*)(*f*) *a<sub>n</sub>*(*dx*) = *a<sub>n</sub>*(*f*) a.s. for some *n* ≥ 0, then

$$\begin{array}{rcl} \int \alpha(x)(f)\, a\_{n+1}(dx) &=& q\_n \int \alpha(x)(f)\, a\_n(dx) + (1 - q\_n) \int \alpha(x)(f)\, \alpha(X\_{n+1})(dx) \\ &=& q\_n\, a\_n(f) + (1 - q\_n)\, \alpha(X\_{n+1})(f) \\ &=& a\_{n+1}(f) \qquad \text{a.s.} \end{array}$$

By induction, one obtains ∫ *α*(*x*)(*f*) *a<sub>n</sub>*(*dx*) = *a<sub>n</sub>*(*f*) a.s. for each *n* ≥ 0. Hence,

$$\int \alpha(x)(f)\, \mu(dx) = \lim\_{n} \int \alpha(x)(f)\, a\_n(dx) = \lim\_{n} a\_n(f) = \mu(f) \qquad \text{a.s.}$$

We do not know whether *E*{*ϕ*(*D<sub>n</sub>*(*f*)) | F<sub>*n*</sub>} converges a.s. (and not only in probability) under the conditions of Theorem 1. However, it can be shown that *E*{*ϕ*(*D<sub>n</sub>*(*f*)) | F<sub>*n*</sub>} converges a.s. under slightly stronger conditions on *q<sub>n</sub>*.

Under conditions (2)–(3), for Theorem 1 to work, it suffices that

$$\lim\_{n} n^{b} \left(1 - q\_{n}\right) = c \qquad \text{for some } b > 1/2 \text{ and } c > 0. \tag{10}$$

In addition, if (10) holds, then

$$\frac{n^{b-1/2}}{d\_n} \to \frac{c}{\sqrt{2b-1}}.$$

Hence, letting *D*<sub>*n*</sub><sup>∗</sup> = *n*<sup>*b*−1/2</sup>(*a<sub>n</sub>* − *μ*), one obtains

$$\left(D\_n^\*(f\_1), \ldots, D\_n^\*(f\_p)\right) \longrightarrow \mathcal{N}\_p\left(0, \frac{c^2}{2b-1}\Sigma\right) \qquad \text{stably}$$

for all *p* ≥ 1 and all *f*1,..., *fp* ∈ *F*, provided conditions (2), (3) and (10) hold.
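The limit *n*<sup>*b*−1/2</sup>/*d<sub>n</sub>* → *c*/√(2*b* − 1) can be verified numerically. The Python sketch below uses the illustrative choice 1 − *q<sub>n</sub>* = *c*(*n* + 1)<sup>−*b*</sup> with *b* = 3/4 and *c* = 1/2, which satisfies (10), and truncates the tail sums at a finite horizon:

```python
import math

# Check n^(b - 1/2) / d_n -> c / sqrt(2b - 1) under condition (10),
# for the illustrative choice 1 - q_n = c (n + 1)^(-b).
b, c = 0.75, 0.5
N = 1_000_000                             # truncation horizon for the tail sums
targets = {50, 200, 1_000}
d = {}                                    # d[n] approximates d_n

tail = 0.0                                # accumulates sum_{k >= n} (1 - q_k)^2
for k in range(N - 1, -1, -1):
    tail += (c * (k + 1) ** (-b)) ** 2
    if k in targets:
        d[k] = tail ** (-0.5)

limit = c / math.sqrt(2 * b - 1)
for n in sorted(d):
    print(n, n ** (b - 0.5) / d[n], limit)
```

The printed ratios approach *c*/√(2*b* − 1) = 1/√2 up to truncation error.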

We close this note with some examples.

**Example 1.** *Let*

$$q\_n = \frac{n + \theta\_n}{n + 1 + \theta\_{n+1}}$$

*where* (*θ<sub>n</sub>*) *is a bounded increasing sequence with θ*<sub>0</sub> > 0*. Then, X is c.i.d. (because of* (2)*–*(3)*) but is exchangeable if and only if θ<sub>n</sub>* = *θ*<sub>0</sub> *for all n. In any case, since condition* (10) *holds with b* = *c* = 1*, Theorem 1 applies and d<sub>n</sub> can be replaced by* √*n. Letting D*<sub>*n*</sub><sup>∗</sup> = √*n* (*a<sub>n</sub>* − *μ*)*, it follows that*

$$\left(D\_n^\*(f\_1), \ldots, D\_n^\*(f\_p)\right) \longrightarrow \mathcal{N}\_p(0, \Sigma) \qquad \text{stably.}$$

*It is worth noting that, in the special case θ<sup>n</sup>* = *θ*<sup>0</sup> *for all n, the predictive distributions of X reduce to*

$$a\_n = \frac{\theta\_0\, \nu + \sum\_{i=1}^n \alpha(X\_i)}{n + \theta\_0}.$$

*Therefore, X is a Dirichlet sequence if α* = *δ. The general case, where α is any kernel satisfying condition* (3)*, is investigated in [30]. It turns out that X satisfies most properties of Dirichlet sequences. In particular, μ has the same distribution as*

$$\mu^\* = \sum\_{n} V\_n\, \alpha(Y\_n),$$

*where* (*Vn*) *and* (*Yn*) *are independent sequences,* (*Yn*) *is i.i.d. with Y*<sup>1</sup> ∼ *ν, and* (*Vn*) *has the stick breaking distribution. Nevertheless, as shown in the next example, X can behave quite differently from a Dirichlet sequence.*

**Example 2** (Example 1 continued)**.** *Let* H *be a countable partition of S such that H* ∈ B *and ν*(*H*) > 0 *for all H* ∈ H*. Define*

$$\alpha(x) = \sum\_{H \in \mathcal{H}} \mathbf{1}\_H(x)\, \nu(\cdot \mid H) = \nu(\cdot \mid H\_x) \qquad \text{for all } x \in S,$$

*where H<sub>x</sub> is the only element of the partition* H *such that x* ∈ *H<sub>x</sub>. Then, α is a regular conditional distribution for ν given σ*(H) *(i.e., condition* (3) *holds). If the q<sub>n</sub> are as in Example 1 with θ<sub>n</sub>* = *θ*<sub>0</sub> *for all n, one obtains*

$$a\_n = \frac{\theta\_0\, \nu + \sum\_{i=1}^n \nu(\cdot \mid H\_{X\_i})}{n + \theta\_0}.$$

*Therefore,*

$$a\_n \ll \nu \qquad \text{for all } n \ge 0. \tag{11}$$

*This is a striking difference with respect to Dirichlet sequences. For instance, if ν is non-atomic, condition* (11) *yields*

$$P(X\_i = X\_j \text{ for some } i \neq j) = 0,$$

*while P*(*X<sub>i</sub>* = *X<sub>j</sub> for some i* ≠ *j*) = 1 *if X is a Dirichlet sequence. Note also that, for each f* ∈ *F,*

$$\sigma^2 = \int \alpha(x)(f)^2\, \mu(dx) - \mu(f)^2 = \sum\_{H \in \mathcal{H}} \nu(f \mid H)^2\, \mu(H) - \mu(f)^2,$$

*while σ*<sup>2</sup> = *μ*(*f*<sup>2</sup>) − *μ*(*f*)<sup>2</sup> *if X is a Dirichlet sequence. Other choices of α, which make X quite different from a Dirichlet sequence, are in [30].*
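The inequality between these two variances is an instance of conditional Jensen: averaging *f* over the cells of H can only shrink the second moment. A toy computation follows (Python; the finite space, partition, and *f* are illustrative, and *ν* stands in for the random measure *μ*, purely to make the comparison numeric):

```python
# Compare sum_H ν(f|H)^2 ν(H) - ν(f)^2 with ν(f^2) - ν(f)^2 on
# S = {0,...,5}, partition {{0,1,2},{3,4,5}}, f(x) = x, ν uniform.
# (ν stands in for the random measure μ, purely for illustration.)
S = list(range(6))
partition = [[0, 1, 2], [3, 4, 5]]
nu = {x: 1.0 / len(S) for x in S}
f = lambda x: float(x)

nu_f = sum(f(x) * nu[x] for x in S)                  # ν(f) = 2.5
nu_f2 = sum(f(x) ** 2 * nu[x] for x in S)            # ν(f²) = 55/6

var_partition = -nu_f ** 2
for H in partition:
    nu_H = sum(nu[x] for x in H)
    cond_mean = sum(f(x) * nu[x] for x in H) / nu_H  # ν(f | H)
    var_partition += cond_mean ** 2 * nu_H

var_dirichlet = nu_f2 - nu_f ** 2
print(var_partition, var_dirichlet)                  # 2.25 < 55/6 - 6.25
```

Here the partitioned variance is 2.25 versus 55/6 − 6.25 ≈ 2.917 in the Dirichlet case, confirming the shrinkage.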

**Example 3.** *A meaningful special case is* ∑*n*(1 − *qn*) < ∞*. In this case,*

$$\prod\_{j=0}^{\infty} q\_j := \lim\_{n} \prod\_{j=0}^{n} q\_j$$

*exists and is strictly positive. Hence, μ admits the representation*

$$\mu = \nu \prod\_{j=0}^{\infty} q\_j + \sum\_{i=1}^{\infty} \alpha(X\_i)\, (1 - q\_{i-1}) \prod\_{j=i}^{\infty} q\_j.$$

*As an example, under conditions* (2)*–*(3)*, Theorem 1 applies whenever*

$$q\_n = \exp\{-(c+n)^{-2}\} \qquad \text{for some constant } c > 0.$$

*With this choice of q<sub>n</sub>, one obtains* (1 − *q<sub>n</sub>*)(*c* + *n*)<sup>2</sup> → 1*, so that* ∑<sub>*n*</sub>(1 − *q<sub>n</sub>*) < ∞ *and μ can be written as above. Note also that*

$$\lim\_{n} \frac{d\_n}{(c+n)^{3/2}} = \sqrt{3}.$$

*Therefore, for fixed <sup>f</sup>* <sup>∈</sup> *F, the rate of convergence of an*(*f*) *to <sup>μ</sup>*(*f*) *is <sup>n</sup>*−3/2 *and not the usual <sup>n</sup>*−1/2*.*
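As a numerical aside, the √3 limit above is easy to confirm. The Python sketch below computes *d<sub>n</sub>* for *q<sub>n</sub>* = exp{−(*c* + *n*)<sup>−2</sup>} with the illustrative value *c* = 1:

```python
import math

# Check d_n / (c + n)^(3/2) -> sqrt(3) for q_n = exp(-(c + n)^(-2));
# c = 1 is an illustrative value.  Since (1 - q_n)^2 decays like
# (c + n)^(-4), a moderate truncation horizon suffices.
c = 1.0
N = 200_000
targets = {10, 100, 1_000}
d = {}                                    # d[n] approximates d_n

tail = 0.0                                # accumulates sum_{k >= n} (1 - q_k)^2
for k in range(N - 1, -1, -1):
    one_minus_q = 1.0 - math.exp(-((c + k) ** -2))
    tail += one_minus_q ** 2
    if k in targets:
        d[k] = tail ** (-0.5)

for n in sorted(d):
    print(n, d[n] / (c + n) ** 1.5, math.sqrt(3))
```

The printed ratios approach √3 ≈ 1.732, consistent with the *n*<sup>−3/2</sup> rate.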

**Author Contributions:** Methodology, P.B., L.P. and P.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No 817257.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We are grateful to Giorgio Letta and Eugenio Regazzini. They not only introduced us to probability theory, they also shared with us their enthusiasm and some of their expertise.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

