*Article* **Predictive Constructions Based on Measure-Valued Pólya Urn Processes**

**Sandra Fortini <sup>1,\*</sup>, Sonia Petrone <sup>1</sup> and Hristo Sariev <sup>2</sup>**


**\*** Correspondence: sandra.fortini@unibocconi.it

**Abstract:** Measure-valued Pólya urn processes (MVPPs) are Markov chains with an additive structure that extend the generalized $k$-color Pólya urn model to a continuum of possible colors. We prove that, for any MVPP $(\mu\_n)\_{n\geq 0}$ on a Polish space $\mathbb{X}$, the normalized sequence $(\mu\_n/\mu\_n(\mathbb{X}))\_{n\geq 0}$ agrees with the marginal predictive distributions of some random process $(X\_n)\_{n\geq 1}$. Moreover, $\mu\_n = \mu\_{n-1} + R\_{X\_n}$, $n \geq 1$, where $x \mapsto R\_x$ is a random transition kernel on $\mathbb{X}$; thus, if $\mu\_{n-1}$ represents the contents of an urn, then $X\_n$ denotes the color of the ball drawn with distribution $\mu\_{n-1}/\mu\_{n-1}(\mathbb{X})$, and $R\_{X\_n}$ the subsequent reinforcement. In the case $R\_{X\_n} = W\_n\delta\_{X\_n}$, for some non-negative random weights $W\_1, W\_2, \ldots$, the process $(X\_n)\_{n\geq 1}$ is better understood as a randomly reinforced extension of Blackwell and MacQueen's Pólya sequence. We study the asymptotic properties of the predictive distributions and the empirical frequencies of $(X\_n)\_{n\geq 1}$ under different assumptions on the weights. We also investigate a generalization of the above models via a randomization of the law of the reinforcement.

**Keywords:** predictive distributions; random probability measures; reinforced processes; Pólya sequences; urn schemes; Bayesian inference; conditional identity in distribution; total variation distance

**MSC:** 60G57; 60B10; 60G25; 60F05; 60G09

## **1. Introduction**

Let $(X\_n)\_{n\geq 1}$ be a sequence of homogeneous random observations, taking values in a Polish space $\mathbb{X}$. The central assumption in the Bayesian approach to inductive reasoning is that $(X\_n)\_{n\geq 1}$ is exchangeable, that is, its law is invariant under finite permutations. Then, by de Finetti's theorem, there exists a random probability measure $\tilde{P}$ on $\mathbb{X}$ such that, given $\tilde{P}$, the random variables $X\_1, X\_2, \ldots$ are conditionally independent and identically distributed with marginal distribution $\tilde{P}$ (see [1], Section 3), denoted

$$X\_n \mid \tilde{P} \stackrel{i.i.d.}{\sim} \tilde{P}.\tag{1}$$

Furthermore, *P*˜ is the almost sure (a.s.) weak limit of the predictive distributions and the empirical frequencies,

$$\mathbb{P}(X\_{n+1}\in\cdot \mid X\_1,\ldots,X\_n) \stackrel{w}{\longrightarrow} \tilde{P}(\cdot) \quad \text{a.s.} \qquad \text{and} \qquad \frac{1}{n} \sum\_{i=1}^n \delta\_{X\_i}(\cdot) \stackrel{w}{\longrightarrow} \tilde{P}(\cdot) \quad \text{a.s.} \tag{2}$$

The model (1) is completed by choosing a prior distribution for *P*˜. Inference consists in computing the conditional (posterior) distribution of *P*˜ given an observed sample (*X*1, ... , *Xn*), with most inferential conclusions depending on some average with respect to the posterior distribution; for example, under squared loss, for any measurable set

**Citation:** Fortini, S.; Petrone, S.; Sariev, H. Predictive Constructions Based on Measure-Valued Pólya Urn Processes. *Mathematics* **2021**, *9*, 2845. https://doi.org/10.3390/ math9222845

Academic Editors: Emanuele Dolera and Federico Bassetti

Received: 4 October 2021 Accepted: 8 November 2021 Published: 10 November 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

$B \subseteq \mathbb{X}$, the best estimate of $\tilde{P}(B)$ is the posterior mean, $\mathbb{E}[\tilde{P}(B) \mid X\_1, \ldots, X\_n]$. In addition, the posterior mean can be used for predictive inference, since

$$\mathbb{P}(X\_{n+1}\in B|X\_1,\ldots,X\_n) = \mathbb{E}[\tilde{P}(B)|X\_1,\ldots,X\_n].\tag{3}$$
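As a concrete illustration of identity (3) (our own toy example, not taken from the paper), consider the Beta–Bernoulli model, where $\tilde{P}(\{1\}) \sim \text{Beta}(a, b)$: both sides of (3) reduce to $(a + \sum\_i x\_i)/(a + b + n)$. The sketch below checks this numerically, computing the predictive probability by integrating against the Beta posterior density.

```python
from math import gamma, isclose

def beta_pdf(p, a, b):
    """Density of the Beta(a, b) distribution at p."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * p ** (a - 1) * (1 - p) ** (b - 1)

def posterior_mean(a, b, x):
    """E[P~ | X_1,...,X_n]: mean of the Beta(a + s, b + n - s) posterior."""
    n, s = len(x), sum(x)
    return (a + s) / (a + b + n)

def predictive_prob(a, b, x, grid=100_000):
    """P(X_{n+1} = 1 | X_1,...,X_n): midpoint Riemann sum of the
    integral of p against the posterior density, as in (3)."""
    n, s = len(x), sum(x)
    h = 1.0 / grid
    return sum(p * beta_pdf(p, a + s, b + n - s) * h
               for p in (h * (i + 0.5) for i in range(grid)))

data = [1, 0, 1, 1]
assert isclose(predictive_prob(2.0, 2.0, data),
               posterior_mean(2.0, 2.0, data), rel_tol=1e-6)
print(posterior_mean(2.0, 2.0, data))  # 0.625
```

Here the identity holds exactly; the numerical integration merely computes the left-hand side of (3) without appealing to the closed form.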

A different modeling strategy uses the Ionescu–Tulcea theorem to define the law of the process from the sequence of predictive distributions, $(\mathbb{P}(X\_{n+1} \in \cdot \mid X\_1, \ldots, X\_n))\_{n\geq 1}$. In that case, one can refer to Theorem 3.1 in [2] for necessary and sufficient conditions on the predictive distributions to be consistent with exchangeability. The predictive approach to model building is deeply rooted in Bayesian statistics, where the parameter $\tilde{P}$ plays an auxiliary role and the focus is on observable "facts"; see [2–6]. Moreover, using the predictive distributions as primary objects allows one to make predictions instantly and can ease computations. See [7] for a review of some well-known predictive constructions of priors for Bayesian inference.

In this work, we consider a class of predictive constructions based on measure-valued Pólya urn processes (MVPP). MVPPs have been introduced in the probabilistic literature [8,9] as an extension of *k*-color urn models, but their implications for (Bayesian) statistics have yet to be explored. A first aim of the paper is thus to show the potential use of MVPPs as predictive constructions in Bayesian inference. In fact, some popular models in Bayesian nonparametric inference can be framed in such a way, see Equation (8). A second aim of the paper is to suggest novel extensions of MVPPs that we believe can offer more flexibility in statistical applications.

MVPPs are essentially measure-valued Markov processes with an additive structure; the formal definition is postponed to Section 2.1 (Definition 1). Given an MVPP $(\mu\_n)\_{n\geq 0}$, we consider a sequence of random observations characterized by $\mathbb{P}(X\_1 \in \cdot) = \mu\_0(\cdot)/\mu\_0(\mathbb{X})$ and, for $n \geq 1$,

$$\mathbb{P}(X\_{n+1}\in \cdot \mid X\_1, \mu\_1, \dots, X\_n, \mu\_n) = \frac{\mu\_n(\cdot)}{\mu\_n(\mathbb{X})}.\tag{4}$$

The random measure $\mu\_n$ is not necessarily measurable with respect to $(X\_1, \ldots, X\_n)$, so the predictive construction (4) is more flexible than models based solely on the predictive distributions of $(X\_n)\_{n\geq 1}$; for example, $(\mu\_n)\_{n\geq 0}$ allows for the presence of latent variables or other sources of observable data (see also [10] for a covariate-based predictive construction). However, (4) can lead to an imbalanced design, which may break the symmetry imposed by exchangeability. Nevertheless, it is still possible that the sequence $(X\_n)\_{n\geq 1}$ satisfies (2) for some $\tilde{P}$, in which case Lemma 8.2 in [1] implies that $(X\_n)\_{n\geq 1}$ is asymptotically exchangeable with directing random measure $\tilde{P}$.

In Theorem 1, we show that, taking $(\mu\_n)\_{n\geq 0}$ as primary, the sequence $(X\_n)\_{n\geq 1}$ in (4) can be chosen such that

$$
\mu\_n = \mu\_{n-1} + R\_{X\_n}, \tag{5}
$$

where $x \mapsto R\_x$ is a measurable map from $\mathbb{X}$ to the space of finite measures on $\mathbb{X}$. Models of the kind (4)–(5) are computationally efficient. Indeed, as new observations become available, predictions can be updated at a constant computational cost and with limited storage of information. If, in addition, $(X\_n)\_{n\geq 1}$ is asymptotically exchangeable, then (4)–(5) can provide a computationally simple approximation of an exchangeable scheme for Bayesian inference, along the lines of [11].

The recursive formula (5) allows us to interpret the dynamics of MVPPs in terms of an urn sampling scheme, as the name suggests. Let $\mu\_0$ be a non-random finite measure on $\mathbb{X}$. Suppose we have an urn whose contents are described by $\mu\_0$, in the sense that $\mu\_0(B)$ denotes the total mass of balls with colors in $B \subseteq \mathbb{X}$. At time $n = 1$, a ball is extracted at random from the urn, and we denote its color by $X\_1$. The urn is then reinforced according to a replacement rule $(R\_x)\_{x\in\mathbb{X}}$, so that the updated composition becomes $\mu\_1 \equiv \mu\_0 + R\_{X\_1}$. At any time $n > 1$, a ball of color $X\_n$ is picked with probability distribution $\mu\_{n-1}/\mu\_{n-1}(\mathbb{X})$, and the contents of the urn are subsequently reinforced by $R\_{X\_n}$. In the case where the space of colors is finite, $|\mathbb{X}| = k$, the above procedure is better known as a generalized $k$-color Pólya urn [12].
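For a finite color space, the sampling scheme above is easy to simulate. The following minimal sketch (our illustration; the two-color setup and the replacement matrix are arbitrary assumptions) draws a ball with probability proportional to its mass, as in (4), and then adds the corresponding row of the replacement rule, as in (5).

```python
import random

def polya_urn(mu0, R, n, rng):
    """Run n draws of a generalized k-color Polya urn.

    mu0: initial masses per color; R[x]: replacement row added after
    a draw of color x. Returns the final masses and the draws."""
    mu = list(mu0)
    draws = []
    for _ in range(n):
        # draw a color with probability mu / mu(X), as in (4)
        u, acc, x = rng.random() * sum(mu), 0.0, 0
        for i, m in enumerate(mu):
            acc += m
            if u < acc:
                x = i
                break
        draws.append(x)
        # reinforce: mu_n = mu_{n-1} + R_x, as in (5)
        mu = [m + r for m, r in zip(mu, R[x])]
    return mu, draws

rng = random.Random(1)
mu, draws = polya_urn([1.0, 1.0], [[2.0, 0.0], [0.0, 2.0]], 50, rng)
assert abs(sum(mu) - (2.0 + 2.0 * 50)) < 1e-9  # each draw adds total mass 2
assert len(draws) == 50 and set(draws) <= {0, 1}
```

The diagonal replacement matrix chosen here reinforces only the drawn color, which is precisely the situation studied below for general color spaces.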

We focus our analysis on MVPPs for which $R\_x$ is concentrated at $x$; thus, after each draw, we reinforce only the color of the observed ball. More formally, we consider MVPPs with a reinforcement measure of the kind $R\_{X\_n} = W\_n\delta\_{X\_n}$, $n \geq 1$, where $W\_n$ is a non-negative random variable. In that case, Equations (4) and (5) become

$$\mathbb{P}(X\_{n+1}\in\cdot \mid X\_1, W\_1, \dots, X\_n, W\_n) = \sum\_{i=1}^n \frac{W\_i}{\mu\_0(\mathbb{X}) + \sum\_{j=1}^n W\_j} \delta\_{X\_i}(\cdot) + \frac{\mu\_0(\mathbb{X})}{\mu\_0(\mathbb{X}) + \sum\_{j=1}^n W\_j} \mu\_0'(\cdot),\tag{6}$$

and

$$
\mu\_n = \mu\_{n-1} + W\_n \delta\_{X\_n}.\tag{7}
$$

A notable example is Blackwell and MacQueen's *Pólya sequence* [13], which is a random process $(X\_n)\_{n\geq 1}$ characterized by $\mathbb{P}(X\_1 \in \cdot) = \nu(\cdot)$ and, for $n \geq 1$,

$$\mathbb{P}(X\_{n+1}\in \cdot \mid X\_1, \dots, X\_n) = \sum\_{i=1}^n \frac{1}{\theta + n} \delta\_{X\_i}(\cdot) + \frac{\theta}{\theta + n} \nu(\cdot),\tag{8}$$

for some probability measure $\nu$ on $\mathbb{X}$ and a constant $\theta > 0$. By [13], $(X\_n)\_{n\geq 1}$ is exchangeable and corresponds to the model (1) with a Dirichlet process prior with parameters $(\theta, \nu)$. It is easily seen that (8) is related to the MVPP $(\mu\_n)\_{n\geq 0}$ given by $\mu\_0 = \theta\nu$ and, for $n \geq 1$,

$$
\mu\_n = \mu\_{n-1} + \delta\_{X\_n}.
$$

Accordingly, we will call an MVPP a randomly reinforced Pólya process (RRPP) if it admits the representation (6)–(7).
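The predictive rule (6) can be sampled directly. The sketch below is our illustration under arbitrary assumptions: base measure $\nu = \text{Unif}(0,1)$, $\theta = \mu\_0(\mathbb{X}) = 1$, and i.i.d. weights $W\_n \sim \text{Unif}(0.5, 1.5)$; taking $W\_n \equiv 1$ reduces the sampler to the Pólya sequence (8).

```python
import random

def rrpp_sample(theta, n, draw_w, rng):
    """Sample X_1,...,X_n from the predictive rule (6), with base
    measure nu = Unif(0,1) and weights supplied by draw_w()."""
    xs, ws = [], []
    for _ in range(n):
        u = rng.random() * (theta + sum(ws))
        if u < theta or not xs:
            x = rng.random()             # new color drawn from nu
        else:
            u -= theta                   # pick past color X_i w.p. W_i / total
            x, acc = xs[-1], 0.0         # fallback guards float edge cases
            for xi, wi in zip(xs, ws):
                acc += wi
                if u < acc:
                    x = xi
                    break
        xs.append(x)
        ws.append(draw_w())              # reinforcement weight W_n
    return xs

rng = random.Random(7)
xs = rrpp_sample(1.0, 200, lambda: rng.uniform(0.5, 1.5), rng)
assert len(xs) == 200
assert len(set(xs)) < 200  # ties occur: (6) puts positive mass on past values
```

Because the weights here are independent of the colors, this run falls in the c.i.d. regime discussed below; color-dependent weights would instead favor dominant colors.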

Existing studies on MVPPs consider models that have a mostly balanced design, i.e., $R\_x(\mathbb{X}) = r$ for all $x \in \mathbb{X}$, and assume irreducibility-like conditions for $(R\_x)\_{x\in\mathbb{X}}$; see [8,9,14,15] and Remark 4 in [16]. In contrast, RRPPs require that $R\_x(\{x\}^c) = 0$, and so are excluded from the analysis in those papers. In fact, this difference in reinforcement mechanisms mirrors the dichotomy within $k$-color urn models, where the replacement $R$ is best described in terms of a matrix with random elements. There, the class of randomly reinforced urns [17] assumes an $R$ with zero off-diagonal elements (i.e., only the color of the observed ball is reinforced), whereas the generalized Pólya urn models require the mean replacement matrix to be irreducible. Similarly to the $k$-color case, RRPPs require different techniques, which yield completely different results from those in [8,9,14–16]. As an example, Theorem 1 in [16] and our Theorem 2 both prove convergence of the kind (2), yet the limit probability measure in [16] is non-random.

The RRPP has been implicitly studied in [17–23], among others, with the focus on the process $(X\_n)\_{n\geq 1}$. Those papers deal primarily with the $k$-color case (with the exception of [18,19,23]) and can be categorized on the basis of their assumptions on $(W\_n)\_{n\geq 1}$. For example, [18,19,21,22] assume that $W\_n$ and $(X\_1, W\_1, \ldots, X\_{n-1}, W\_{n-1}, X\_n)$ are independent, in which case the process $(X\_n)\_{n\geq 1}$ is conditionally identically distributed (c.i.d.) [21]; that is, conditionally on the current information, all future observations are identically distributed. It follows from [21] that c.i.d. processes preserve many of the properties of exchangeable sequences and, in particular, satisfy (2)–(3). In contrast, [17,20,23] assume that the reinforcement $W\_n$ depends on the particular color $X\_n$, and prove a version of (2) where $\tilde{P}$ is concentrated on the set of dominant colors, for which the expected reinforcement is maximum. In this work, we reconsider the above models in the framework of RRPPs. For the c.i.d. case, we prove results whose analogues have already been established in [23] for the model with dominant colors. In particular, we extend the convergence in (2) to convergence in total variation and give a unified central limit theorem. We also examine the number of distinct values generated by the sequence $(X\_n)\_{n\geq 1}$.

In some applications, the definition of an MVPP can be too restrictive, as it assumes that the probability law of the reinforcement $R$ is known. However, we can envisage situations where the law is itself random, so we extend the definition of an MVPP by introducing a random parameter $V$. The resulting generalized measure-valued Pólya urn process (GMVPP) turns out to be a mixture of Markov processes and admits the representation (4)–(5), conditionally on the parameter $V$. When the reinforcement measure $R\_x$ is concentrated on $x$, we call $(\mu\_n)\_{n\geq 0}$ a generalized randomly reinforced Pólya process (GRRPP). We give a characterization of GRRPPs with exchangeable weights $(W\_n)\_{n\geq 1}$ and show that the process $((X\_n, W\_n))\_{n\geq 1}$ is partially conditionally identically distributed (partially c.i.d.) [24]; that is, conditionally on the past observations and the concurrent observation from the other sequence, the future observations are marginally identically distributed. We also extend some of the results for RRPPs to the generalized setting.

The paper is structured as follows. In Section 2.1, we recall the definition of a measure-valued Pólya urn process and prove the representation (4)–(5) for a suitably selected sequence $(X\_n)\_{n\geq 1}$. Section 2.2 defines a particular subclass of MVPPs, called randomly reinforced Pólya processes (RRPPs), which share with exchangeable Pólya sequences the property of reinforcing only the observed color. Section 3 is devoted to the study of the asymptotic properties of RRPPs. In Section 4, we give the definitions of GMVPPs and GRRPPs and obtain basic results.

## **2. Definitions and a Representation Result**

Let $(\mathbb{X}, d)$ be a complete separable metric space, endowed with its Borel $\sigma$-field $\mathcal{X}$. Denote by

$$\mathsf{M}\_F(\mathsf{X}), \qquad \mathsf{M}\_F^\*(\mathsf{X}), \qquad \mathsf{M}\_P(\mathsf{X}), \dots$$

the collections of measures $\mu$ on $\mathbb{X}$ that are finite, finite and non-null, and probability measures, respectively. We regard $\mathsf{M}\_F(\mathbb{X})$, $\mathsf{M}\_F^\*(\mathbb{X})$ and $\mathsf{M}\_P(\mathbb{X})$ as measurable spaces equipped with the $\sigma$-fields generated by $\mu \mapsto \mu(B)$, $B \in \mathcal{X}$. We further let

$$\mathbb{K}\_F(\mathbb{X}, \mathbb{Y}), \qquad \mathbb{K}\_P(\mathbb{X}, \mathbb{Y}),$$

be the collections of transition kernels $K$ from $\mathbb{X}$ to $\mathbb{Y}$ that are finite and probability kernels, respectively. Any non-null measure $\mu \in \mathsf{M}\_F^\*(\mathbb{X})$ has a normalized version $\mu' = \mu/\mu(\mathbb{X})$. If $f : \mathbb{X} \to \mathbb{Y}$ is measurable, then $f^\sharp : \mathsf{M}\_F(\mathbb{X}) \to \mathsf{M}\_F(\mathbb{Y})$ denotes the induced mapping of measures, $f^\sharp(\mu)(\cdot) = \mu(f^{-1}(\cdot))$, $\mu \in \mathsf{M}\_F(\mathbb{X})$.

All random quantities are defined on a common probability space $(\Omega, \mathcal{H}, \mathbb{P})$, which is assumed to be rich enough to support any required randomization. The symbol "$\perp$" will be used to denote independence between random objects, and "$\stackrel{d}{=}$" equality in distribution.

## *2.1. Measure-Valued Pólya Urn Processes*

Let $\mu \in \mathsf{M}\_F^\*(\mathbb{X})$ describe the contents of an urn, as in Section 1. Once a ball is picked at random from the urn, it is reinforced according to a replacement rule, which is formally a kernel $R \in \mathsf{K}\_F(\mathbb{X}, \mathbb{X})$ that maps colors $x \mapsto R\_x(\cdot)$ to finite measures; thus,

$$
\mu + R\_x \tag{9}
$$

represents the updated urn composition when a ball of color $x$ has been observed. In general, $R$ is random, and there exists a probability kernel $\mathcal{R} \in \mathsf{K}\_P(\mathbb{X}, \mathsf{M}\_F(\mathbb{X}))$ such that $R\_x \sim \mathcal{R}\_x$, $x \in \mathbb{X}$. Then, the distribution of (9) prior to the sampling of the urn is given by

$$\hat{\mathcal{R}}\_{\mu}(\cdot) = \int\_{\mathcal{X}} \psi\_{\mu}^{\sharp}(\mathcal{R}\_{x})(\cdot) \mu'(dx),\tag{10}$$

where $\psi\_\mu$ is the measurable map $\nu \mapsto \nu + \mu$ from $\mathsf{M}\_F(\mathbb{X})$ to $\mathsf{M}\_F^\*(\mathbb{X})$. By Lemma 3.3 in [9], $\mu \mapsto \hat{\mathcal{R}}\_\mu$ is a measurable map from $\mathsf{M}\_F^\*(\mathbb{X})$ to $\mathsf{M}\_P(\mathsf{M}\_F^\*(\mathbb{X}))$.

**Definition 1** (Measure-Valued Pólya Urn Process [9])**.** *A sequence* $(\mu\_n)\_{n\geq 0}$ *of random finite measures on* $\mathbb{X}$ *is called a measure-valued Pólya urn process (MVPP) with parameters* $\mu\_0 \in \mathsf{M}\_F^\*(\mathbb{X})$ *and* $\mathcal{R} \in \mathsf{K}\_P(\mathbb{X}, \mathsf{M}\_F(\mathbb{X}))$ *if it is a Markov process with transition kernel* $\hat{\mathcal{R}}$ *given by* (10)*. If, in particular,* $\mathcal{R}\_x = \delta\_{R\_x}$ *for some* $R \in \mathsf{K}\_F(\mathbb{X}, \mathbb{X})$*, then* $(\mu\_n)\_{n\geq 0}$ *is said to be a deterministic MVPP.*

The representation theorem below formalizes the idea of an MVPP as an urn scheme.

**Theorem 1.** *A sequence* $(\mu\_n)\_{n\geq 0}$ *of random finite measures is an MVPP with parameters* $(\mu\_0, \mathcal{R})$ *if and only if, for every* $n \geq 1$*,*

$$
\mu\_n = \mu\_{n-1} + R\_{X\_n} \quad \text{a.s.} \tag{11}
$$

*where* $(X\_n)\_{n\geq 1}$ *is a sequence of* $\mathbb{X}$*-valued random variables such that* $X\_1 \sim \mu\_0'$ *and, for* $n \geq 2$*,*

$$\mathbb{P}(X\_n \in \cdot \mid X\_1, \mu\_1, \dots, X\_{n-1}, \mu\_{n-1}) = \mu\_{n-1}'(\cdot),\tag{12}$$

*and R is a random finite transition kernel on* X *such that*

$$\mathbb{P}(R\_{X\_n} \in \cdot \mid X\_1, \mu\_1, \dots, X\_{n-1}, \mu\_{n-1}, X\_n) = \mathcal{R}\_{X\_n}(\cdot). \tag{13}$$

**Proof.** If $(\mu\_n)\_{n\geq 0}$ satisfies (11)–(13) for every $n \geq 1$, then it holds a.s. that

$$\mathbb{P}(\mu\_n \in \cdot \mid \mu\_1, \dots, \mu\_{n-1}) = \mathbb{E}[\psi\_{\mu\_{n-1}}^\sharp(\mathcal{R}\_{X\_n})(\cdot) \mid \mu\_1, \dots, \mu\_{n-1}] = \hat{\mathcal{R}}\_{\mu\_{n-1}}(\cdot).$$

Conversely, suppose $(\mu\_n)\_{n\geq 0}$ is an MVPP with parameters $(\mu\_0, \mathcal{R})$. Since $\mathcal{R}$ is a probability kernel from $\mathbb{X}$ to $\mathsf{M}\_F(\mathbb{X})$ and $\mathsf{M}\_F(\mathbb{X})$ is Polish, there exists by Lemma 4.22 in [25] a measurable function $f(x, u)$ such that, for every $x \in \mathbb{X}$,

$$f(x, U) \sim \mathcal{R}\_x,$$

whenever *U* is a uniform random variable on [0, 1], denoted *U* ∼ Unif[0, 1].

Let us prove by induction that there exists a sequence $((X\_n, U\_n))\_{n\geq 1}$ such that $X\_1 \sim \mu\_0'$, $U\_1 \perp X\_1$, $U\_1 \sim \text{Unif}[0, 1]$, $\mu\_1 = \mu\_0 + f(X\_1, U\_1)$ a.s., $(\mu\_2, \mu\_3, \ldots) \perp (X\_1, U\_1) \mid \mu\_1$, and, for every $n \geq 2$,


Then, Equations (11)–(13) follow from (*i*)–(*iii*) with $R\_{X\_n} = f(X\_n, U\_n)$.

Regarding the base case, let $\tilde{X}\_1$ and $\tilde{U}\_1$ be independent random variables such that $\tilde{U}\_1 \sim \text{Unif}[0, 1]$ and $\tilde{X}\_1 \sim \mu\_0'$. It follows that, for any measurable set $B \subseteq \mathsf{M}\_F(\mathbb{X})$,

$$\mathbb{P}(\mu\_1 \in B) = \hat{\mathcal{R}}\_{\mu\_0}(B) = \mathbb{E}[\psi\_{\mu\_0}^\sharp(\mathcal{R}\_{\tilde{X}\_1})(B)] = \mathbb{P}((\mu\_0 + f(\tilde{X}\_1, \tilde{U}\_1)) \in B);$$

thus, $\mu\_1 \stackrel{d}{=} \mu\_0 + f(\tilde{X}\_1, \tilde{U}\_1)$. By Theorem 8.17 in [25], there exist random variables $X\_1$ and $U\_1$ such that

$$(\mu\_1, X\_1, U\_1) \stackrel{d}{=} (\mu\_0 + f(\tilde{X}\_1, \tilde{U}\_1), \tilde{X}\_1, \tilde{U}\_1),$$

and $(\mu\_2, \mu\_3, \ldots) \perp (X\_1, U\_1) \mid \mu\_1$. Then, in particular, $(X\_1, U\_1) \stackrel{d}{=} (\tilde{X}\_1, \tilde{U}\_1)$ and $(\mu\_1, \mu\_0 + f(X\_1, U\_1)) \stackrel{d}{=} (\mu\_0 + f(\tilde{X}\_1, \tilde{U}\_1), \mu\_0 + f(\tilde{X}\_1, \tilde{U}\_1))$, so

$$
\mu\_1 = \mu\_0 + f(X\_1, U\_1) \qquad \text{a.s.}
$$

Regarding the induction step, assume that (*i*)–(*v*) hold true up to some $n > 1$. Let $\tilde{X}\_{n+1}$ and $\tilde{U}\_{n+1}$ be such that $\tilde{U}\_{n+1} \sim \text{Unif}[0, 1]$, $\tilde{U}\_{n+1} \perp (X\_1, U\_1, \mu\_1, \ldots, X\_n, U\_n, \mu\_n, \tilde{X}\_{n+1})$, and

$$\mathbb{P}(\tilde{X}\_{n+1} \in \cdot \mid X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n) = \mu\_n'(\cdot).$$

It follows from (*v*) that, for any measurable set *B* ⊆ M*F*(X),

$$\begin{aligned} \mathbb{P}(\mu\_{n+1} \in B \mid X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n) &= \mathbb{E}[\psi\_{\mu\_n}^\sharp(\mathcal{R}\_{\tilde{X}\_{n+1}})(B) \mid X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n] \\ &= \mathbb{P}((\mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1})) \in B \mid X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n); \end{aligned}$$

thus, $\mu\_{n+1} \stackrel{d}{=} \mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1})$, conditionally on $X\_1, U\_1, \mu\_1, \ldots, X\_n, U\_n, \mu\_n$. By Theorem 8.17 in [25], there exist random variables $X\_{n+1}$ and $U\_{n+1}$ such that

$$\begin{aligned} & (\mu\_{n+1}, X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n, X\_{n+1}, U\_{n+1}) \\ & \qquad \stackrel{d}{=} (\mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1}), X\_1, U\_1, \mu\_1, \dots, X\_n, U\_n, \mu\_n, \tilde{X}\_{n+1}, \tilde{U}\_{n+1}), \end{aligned}$$

and $(\mu\_{n+2}, \mu\_{n+3}, \ldots) \perp (X\_{n+1}, U\_{n+1}) \mid (X\_1, U\_1, \mu\_1, \ldots, X\_n, U\_n, \mu\_n, \mu\_{n+1})$. Then, in particular, $U\_{n+1} \sim \text{Unif}[0, 1]$, $U\_{n+1} \perp (X\_1, U\_1, \mu\_1, \ldots, X\_n, U\_n, \mu\_n, X\_{n+1})$, and

$$\mathbb{P}(X\_{n+1} \in \cdot \mid X\_1, \mathcal{U}\_1, \mu\_1, \dots, X\_n, \mathcal{U}\_n, \mu\_n) = \mu'\_n(\cdot).$$

Moreover,

$$\left(\mu\_{n+1}, \mu\_n + f(X\_{n+1}, U\_{n+1})\right) \stackrel{d}{=} \left(\mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1}), \mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1})\right);$$

therefore,

$$\mathbb{P}(\mu\_{n+1} = \mu\_n + f(X\_{n+1}, U\_{n+1})) = \mathbb{P}(\mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1}) = \mu\_n + f(\tilde{X}\_{n+1}, \tilde{U}\_{n+1})) = 1.$$

By Theorem 8.12 in [25], statement (*v*) with $n + 1$ is equivalent to $\mu\_{n+2} \perp (X\_1, U\_1) \mid (\mu\_1, \ldots, \mu\_{n+1})$ and $\mu\_{n+2} \perp (X\_{k+1}, U\_{k+1}) \mid (X\_1, U\_1, \ldots, X\_k, U\_k, \mu\_1, \ldots, \mu\_{n+1})$, $k = 1, \ldots, n$. The latter follows from the induction hypothesis since, by (*iv*), we have $(\mu\_{k+2}, \ldots, \mu\_{n+2}) \perp (X\_{k+1}, U\_{k+1}) \mid (X\_1, U\_1, \ldots, X\_k, U\_k, \mu\_1, \ldots, \mu\_{k+1})$ for every $k = 1, \ldots, n$.

The process $(X\_n)\_{n\geq 1}$ in Theorem 1 corresponds to the sequence of observed colors from the implied urn sampling scheme. Furthermore, the replacement rule takes the form $R\_{X\_n} = f(X\_n, U\_n)$, where $f$ is some measurable function, $U\_n \sim \text{Unif}[0, 1]$, and $U\_n \perp (X\_1, U\_1, \ldots, X\_{n-1}, U\_{n-1}, X\_n)$, from which it follows that

$$
\mu\_n = \mu\_{n-1} + f(X\_n, U\_n),
\tag{14}
$$

and

$$\mathbb{P}(X\_{n+1}\in \cdot \mid X\_1, \dots, X\_n, (U\_m)\_{m\geq 1}) = \frac{\mu\_0(\cdot) + \sum\_{i=1}^n f(X\_i, U\_i)(\cdot)}{\mu\_0(\mathbb{X}) + \sum\_{i=1}^n f(X\_i, U\_i)(\mathbb{X})}. \tag{15}$$

Thus, the sequence $(U\_n)\_{n\geq 1}$ models the additional randomness in the reinforcement measure $R$. Janson [9] obtains a rather similar result; Theorem 1.3 in [9] states that any MVPP $(\mu\_n)\_{n\geq 0}$ can be coupled with a deterministic MVPP $(\bar{\mu}\_n)\_{n\geq 0}$ on $\mathbb{X} \times [0, 1]$ in the sense that

$$
\bar{\mu}\_n = \mu\_n \times \lambda, \tag{16}
$$

where $\lambda$ is the Lebesgue measure on $[0, 1]$, and $\mu\_n \times \lambda$ is the product measure on $\mathbb{X} \times [0, 1]$. In our case, the MVPP defined by $\bar{\mu}\_0 = \mu\_0 \times \lambda$ and, for $n \geq 1$,

$$
\bar{\mu}\_n = \bar{\mu}\_{n-1} + f(X\_n, U\_n) \times \lambda,
$$

has a non-random replacement rule $R\_{x,u} = f(x, u) \times \lambda$ and satisfies (16) on a set of probability one.

## *2.2. Randomly Reinforced Pólya Processes*

It follows from (8) that any Pólya sequence generates a deterministic MVPP through

$$
\mu\_n = \mu\_{n-1} + \delta\_{X\_n}.
$$

Here, we consider a randomly reinforced extension of Pólya sequences in the form of an MVPP with replacement rule $R\_x = W(x) \cdot \delta\_x$, $x \in \mathbb{X}$, where $W(x)$ is a non-negative random variable.

**Definition 2** (Randomly Reinforced Pólya Process)**.** *We call an MVPP with parameters* $(\mu\_0, \mathcal{R})$ *a randomly reinforced Pólya process (RRPP) if there exists* $\eta \in \mathsf{K}\_P(\mathbb{X}, \mathbb{R}\_+)$ *such that* $\mathcal{R}\_x = \xi\_x^\sharp(\eta\_x)$*,* $x \in \mathbb{X}$*, where* $\xi\_x : \mathbb{R}\_+ \to \mathsf{M}\_F(\mathbb{X})$ *is the map* $w \mapsto w\delta\_x$*.*

Observe that, for RRPPs, the reinforcement measure *f*(*x*, *u*) in (14)–(15) concentrates its mass on *x*; thus, we obtain the following variant of the representation result in Theorem 1.

**Proposition 1.** *Let* $(\mu\_n)\_{n\geq 0}$ *be an RRPP with parameters* $(\mu\_0, \eta)$*. Then, there exist a measurable function* $h : \mathbb{X} \times [0, 1] \to \mathbb{R}\_+$ *and a sequence* $((X\_n, U\_n))\_{n\geq 1}$ *such that, setting* $W\_n = h(X\_n, U\_n)$*, we have for every* $n \geq 1$ *that*

$$
\mu\_n = \mu\_{n-1} + W\_n \delta\_{X\_n} \quad \text{a.s.}, \tag{17}
$$

*where* $X\_1 \sim \mu\_0'$ *and, for* $n \geq 1$*,* $U\_n \sim \text{Unif}[0, 1]$*,* $U\_n \perp (X\_1, U\_1, \ldots, X\_{n-1}, U\_{n-1}, X\_n)$*, and*

$$\mathbb{P}(X\_{n+1}\in\cdot \mid X\_1, W\_1, \dots, X\_n, W\_n) = \sum\_{i=1}^n \frac{W\_i}{\mu\_0(\mathbb{X}) + \sum\_{j=1}^n W\_j} \delta\_{X\_i}(\cdot) + \frac{\mu\_0(\mathbb{X})}{\mu\_0(\mathbb{X}) + \sum\_{j=1}^n W\_j} \mu\_0'(\cdot). \tag{18}$$

*Moreover,*

$$\mathbb{P}(W\_n \in \cdot \mid X\_1, W\_1, \dots, X\_{n-1}, W\_{n-1}, X\_n) = \eta\_{X\_n}(\cdot). \tag{19}$$

It follows from (19) that $W(x) \equiv h(x, U) \sim \eta\_x$, $x \in \mathbb{X}$, whenever $U \sim \text{Unif}[0, 1]$. Then, the random measure

$$R\_x = W(x) \cdot \delta\_x \tag{20}$$

is such that $R\_x \sim \mathcal{R}\_x$, where $\mathcal{R}\_x$ appears in Definition 2.

## **3. Asymptotic Properties of RRPP**

In this section, we study the asymptotic properties of RRPPs through the sequence $(X\_n)\_{n\geq 1}$ in the representation (17). We show that the limit behavior of $(\mu\_n)\_{n\geq 0}$ depends on the relationship between weights and observations. In particular, when $W(x) \equiv W$ in (20) is constant with respect to the color $x$, the process $(X\_n)\_{n\geq 1}$ is conditionally identically distributed (c.i.d.) and, for every $A \in \mathcal{X}$, the normalized sequence $(\mu\_n'(A))\_{n\geq 0}$ is a bounded martingale. We consider the c.i.d. case in Section 3.3. In contrast, if some colors $x$ have a higher expected reinforcement, then they tend to dominate the observation process and, as $n$ grows to infinity, the probability measure $\mu\_n'$ concentrates its mass on the subset of dominant colors; see Theorem 2.

### *3.1. Preliminaries*

Our focus is on the convergence of the normalized sequence $(\mu\_n')\_{n\geq 0}$, which by Theorem 1 is a.s. equal to the sequence of predictive distributions (18). We also consider the sequence of empirical frequencies of $(X\_n)\_{n\geq 1}$, defined for $n \geq 1$ by

$$
\hat{\mu}'\_n = \frac{1}{n} \sum\_{i=1}^n \delta\_{X\_i}.
$$

We obtain conditions under which the convergence in (2) extends to convergence in total variation, where the total variation distance between any two probability measures *α*, *β* ∈ M*P*(X) is given by

$$d\_{TV}(\alpha, \beta) = \sup\_{B \in \mathcal{X}} |\alpha(B) - \beta(B)|.$$
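For discrete measures on a common countable support, the supremum above is attained at $B = \{x : \alpha(\{x\}) > \beta(\{x\})\}$, which gives the familiar half-$\ell\_1$ formula; a quick sketch (our illustration):

```python
def tv_distance(alpha, beta):
    """Total variation distance between two discrete probability
    measures, given as dicts mapping atoms to probabilities."""
    support = set(alpha) | set(beta)
    return 0.5 * sum(abs(alpha.get(x, 0.0) - beta.get(x, 0.0)) for x in support)

a = {"red": 0.5, "blue": 0.5}
b = {"red": 0.2, "blue": 0.3, "green": 0.5}
assert abs(tv_distance(a, b) - 0.5) < 1e-12
assert tv_distance(a, a) == 0.0
```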

To state some of the results, we recall the definition of the support of a probability measure $\gamma \in \mathsf{M}\_P(\mathbb{R}\_+)$,

$$\operatorname{supp}(\gamma) = \{ w \geq 0 : \gamma((w - \epsilon, w + \epsilon)) > 0 \text{ for all } \epsilon > 0 \}.$$

Of particular interest is the conditional probability of observing a new color, given by

$$\theta\_n \equiv \mathbb{P}(X\_{n+1} \notin \{X\_1, \dots, X\_n\} \mid X\_1, W\_1, \dots, X\_n, W\_n) = \frac{\theta}{\theta + \sum\_{j=1}^n W\_j} \mu\_0'(\{X\_1, \dots, X\_n\}^c),$$

for $n \geq 1$, where $\theta = \mu\_0(\mathbb{X})$. This informs us about the number of distinct values in a sample $(X\_1, \ldots, X\_n)$ of size $n$,

$$L\_n = \#\{k \in \{1, \dots, n\} : X\_k \notin \{X\_1, \dots, X\_{k-1}\}\},$$

since $\theta\_n = \mathbb{P}(L\_{n+1} = L\_n + 1 \mid X\_1, W\_1, \ldots, X\_n, W\_n)$.
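These quantities are straightforward to simulate. In the sketch below (our illustration; the diffuse base measure $\text{Unif}(0,1)$, $\theta = 1$ and constant weights $W\_j \equiv 1$ are assumptions), a new color appears at step $k + 1$ with probability $\theta/(\theta + k)$, and `distinct_count` computes $L\_n$ from a realized sample.

```python
import random

def distinct_count(xs):
    """L_n = number of indices k with X_k not in {X_1,...,X_{k-1}}."""
    seen, count = set(), 0
    for x in xs:
        if x not in seen:
            seen.add(x)
            count += 1
    return count

def polya_sequence(theta, n, rng):
    """Simulate a Polya sequence (8) with base measure nu = Unif(0,1)."""
    xs = []
    for k in range(n):
        if rng.random() * (theta + k) < theta:
            xs.append(rng.random())     # new color drawn from nu
        else:
            xs.append(rng.choice(xs))   # repeat a uniformly chosen past value
    return xs

assert distinct_count([3, 1, 3, 2, 1]) == 3
rng = random.Random(0)
L = distinct_count(polya_sequence(1.0, 1000, rng))
assert 1 <= L <= 1000  # L_n grows only logarithmically in n, cf. (22)
```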

The following modes of convergence are used when we investigate the rate of convergence of the distance between $\mu\_n'$ and $\hat{\mu}\_n'$.

*Almost sure (a.s.) conditional convergence*. Let $\mathcal{G} = (\mathcal{G}\_n)\_{n\geq 0}$ be a filtration and $\tilde{Q} \in \mathsf{K}\_P(\Omega, \mathbb{X})$. A sequence $(Y\_n)\_{n\geq 1}$ is said to converge to $\tilde{Q}$ in the sense of a.s. conditional convergence w.r.t. $\mathcal{G}$ if the conditional distribution of $Y\_n$, given $\mathcal{G}\_n$, converges weakly on a set of probability one to $\tilde{Q}$; that is, as $n \to \infty$,

$$\mathbb{P}(Y\_n \in \cdot \mid \mathcal{G}\_n) \stackrel{w}{\longrightarrow} \tilde{Q}(\cdot) \quad \text{a.s.}$$

We refer to [22] for more details.

*Stable convergence*. Stable convergence is a strong form of convergence in distribution, albeit weaker than a.s. conditional convergence. A sequence $(Y\_n)\_{n\geq 1}$ is said to converge stably to $\tilde{Q}$ if

$$\mathbb{E}\left[V f(Y\_n)\right] \longrightarrow \mathbb{E}\left[V \int\_{\mathbb{X}} f(x)\tilde{Q}(dx)\right],$$

for all continuous bounded functions *f* and any integrable random variable *V*. The main application of stable convergence is in central limit theorems that allow for mixing variables in the limit. See [26] for a complete reference on stable convergence.

In the sequel, the stable and a.s. conditional limits will be some Gaussian law, which we denote by $\mathcal{N}(\mu, \sigma^2)$ for parameters $(\mu, \sigma^2)$, where $\mathcal{N}(\mu, 0) = \delta\_\mu$.

## *3.2. RRPP with Dominant Colors*

Using (20), let us define, for *x* ∈ X,

$$w(x) = \mathbb{E}[W(x)] \qquad \text{and} \qquad \bar{w} = \sup\_{x \in \mathbb{X}} w(x).$$

We further let

$$\mathcal{D} = \{ x \in \mathbb{X} : w(x) = \bar{w} \},$$

be the set of dominant colors. The model (18) with $\mathcal{D} \subset \mathbb{X}$ has been studied in [23] under the assumption that $\bar{w}$ is strictly greater than the next largest value of $w(\cdot)$ in the support of $w^\sharp(\mu\_0')$. Then, the probability of observing a non-dominant color, $x \in \mathcal{D}^c$, vanishes, and the predictive and the empirical distributions converge in total variation to a common random probability measure, which is concentrated on $\mathcal{D}$. For completeness, we report here the main results from [23].

**Theorem 2** ([23], Theorem 3.3)**.** *For any RRPP* (*μn*)*n*≥<sup>0</sup> *that satisfies*

$$\begin{aligned} W(x) &\le \beta < \infty; \\ \bar{w} &\in \operatorname{supp}(w^\sharp(\mu_0')); \\ \bar{w} &> \bar{w}^c \equiv \sup\{u \ge 0 : u \in \operatorname{supp}(w^\sharp(\mu_0'(\cdot \mid \mathcal{D}^c)))\}, \end{aligned} \tag{21}$$

*there exists a random probability measure $\tilde{P}$ on $\mathbb{X}$ with $\tilde{P}(\mathcal{D}) = 1$ a.s. such that*

$$d_{TV}(\mu_n', \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0 \qquad \text{and} \qquad d_{TV}(\hat{\mu}_n, \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0.$$

Under conditions (21), Theorem 3.3 in [23] implies $\sum_{i=1}^n W_i/n \stackrel{a.s.}{\longrightarrow} \bar{w}$. If $\mu_0$ is further diffuse, then $\sum_{n=1}^\infty \theta_n = \infty$ a.s., and so $L_n \stackrel{a.s.}{\longrightarrow} \infty$ by Theorem 1 in [27]; in that case, Proposition 3.4 in [23] shows that the actual growth rate is that of a Pólya sequence,

$$\frac{L_n}{\log n} \stackrel{a.s.}{\longrightarrow} \frac{\theta}{\bar{w}}. \tag{22}$$

In addition to the uniform convergence in Theorem 2, the authors in [23] obtain set-wise rates of convergence. To state their result, we introduce, for any *A* ∈ X ,

$$q_A = \lim_{n\to\infty} \mathbb{E}\left[W_{n+1}^2\,\delta_{X_{n+1}}(A) \mid X_1, W_1, \dots, X_n, W_n\right],$$

which exists a.s. under the assumptions of Theorem 2.

**Theorem 3** ([23], Theorem 4.2)**.** *Let* (*μn*)*n*≥<sup>0</sup> *be an RRPP satisfying* (21)*. Suppose $\bar{w} > 2\bar{w}^c$. Define*

$$V(A) = \frac{1}{\bar{w}^2}\left\{(\tilde{P}(A^c))^2 q_A + (\tilde{P}(A))^2 q_{A^c}\right\} \quad \text{and} \quad U(A) = V(A) - \tilde{P}(A)\tilde{P}(A^c).$$

*Then,*

$$\sqrt{n}\left(\mu_n'(A) - \hat{\mu}_n(A)\right) \stackrel{stably}{\longrightarrow} \mathcal{N}(0, U(A)),$$

*and*

$$\sqrt{n}\left(\mu_n'(A) - \tilde{P}(A)\right) \stackrel{a.s.\ cond.}{\longrightarrow} \mathcal{N}(0, V(A)) \qquad \text{w.r.t.}\ (\mathcal{F}_n^{X,W})_{n\ge 1},$$

*where $\mathcal{F}_n^{X,W} = \sigma(X_1, W_1, \dots, X_n, W_n)$, $n \ge 1$, is the filtration generated by* $((X_n, W_n))_{n\ge 1}$*.*

## *3.3. RRPP with Independent Weights*

Let $(\mu_n)_{n\ge 0}$ be an RRPP with reinforcement distribution $\eta_x \equiv \eta$ that does not depend on $x$. Using the notation of Section 3.2, we have

$$w(x) \equiv \bar{w},\tag{23}$$

and, thus, D = X. An equivalent formulation can be given in terms of the sequence of weights (*Wn*)*n*≥<sup>1</sup> in Proposition 1, whereby

$$W_n = h(U_n),\tag{24}$$

for some measurable function $h$, with $U_n \perp (X_1, U_1, \dots, X_{n-1}, U_{n-1}, X_n)$ and $U_n \sim \text{Unif}[0,1]$. Then, $W_n \stackrel{i.i.d.}{\sim} \eta$ and $W_n \perp (X_1, \dots, X_n)$, which implies that $\mathbb{E}[W_1] = \bar{w}$.
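The representation (24) is the usual inverse-transform construction: taking $h = F^{-1}$, the quantile function of $\eta$, yields $W_n = F^{-1}(U_n) \sim \eta$. A minimal sketch, assuming for illustration that $\eta$ is a standard exponential distribution (so $F^{-1}(u) = -\log(1-u)$ and $\bar{w} = 1$):

```python
import math
import random

def h(u):
    """Quantile function F^{-1} of the illustrative choice eta = Exp(1)."""
    return -math.log(1.0 - u)

rng = random.Random(42)
# W_n = h(U_n) with U_n i.i.d. Unif[0,1] gives i.i.d. weights W_n ~ eta,
# independent of the colors, as required by (24).
weights = [h(rng.random()) for _ in range(100_000)]
w_bar = sum(weights) / len(weights)   # should approximate E[W_1] = 1
```

Any other reinforcement distribution on $\mathbb{R}_+$ is obtained the same way by swapping in its quantile function.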

The model (18) with weights (24) has been studied by [18,19,22], among others, where the authors obtain central limit theorems and study the growth rate of *Ln* when *w*¯ < ∞.

Their results rely on the fact that $(X_n)_{n\ge 1}$ is conditionally identically distributed (c.i.d.) with respect to the filtration generated by $((X_n, W_n))_{n\ge 1}$. Following [21], an $\mathbb{X}$-valued random sequence $(Y_n)_{n\ge 1}$ adapted to a filtration $(\mathcal{F}_n)_{n\ge 1}$ is said to be c.i.d. with respect to $(\mathcal{F}_n)_{n\ge 1}$ if and only if $(Y_n)_{n\ge 1}$ is identically distributed and, for every $n, k \ge 1$,

$$\mathbb{P}(Y_{n+k} \in \cdot \mid \mathcal{F}_n) = \mathbb{P}(Y_{n+1} \in \cdot \mid \mathcal{F}_n). \tag{25}$$

**Proposition 2** ([19], Lemma 6)**.** *For any RRPP* (*μn*)*n*≥<sup>0</sup> *with η<sup>x</sup>* ≡ *η, the observation process* (*Xn*)*n*≥<sup>1</sup> *is c.i.d. with respect to the filtration generated by* ((*Xn*, *Wn*))*n*≥1*.*

C.i.d. processes preserve many of the properties of exchangeable sequences; see [21]. For example, if $(Y_n)_{n\ge 1}$ is c.i.d., then there exists a random probability measure such that (2)–(3) hold with respect to the filtration used in definition (25). It follows for the model in Proposition 2 that there exists $\tilde{P} \in K_P(\Omega, \mathbb{X})$ such that, for every $A \in \mathcal{X}$,

$$\mu_n'(A) \stackrel{a.s.}{\longrightarrow} \tilde{P}(A).$$

In fact, by (25), the sequence $(\mu_n'(A))_{n\ge 0}$ is a bounded martingale. On the other hand, (23) implies that $\mathcal{D} = \mathbb{X}$; therefore, any RRPP with $\eta_x \equiv \eta$ whose weights are bounded, $W_1 \le \beta < \infty$, satisfies the assumptions of Theorem 2. In that case,

$$d_{TV}(\mu_n', \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0.$$

In the proof of Theorem 4.2 in [23], the boundedness condition in (21) is needed to show that (*i*) $\sum_{i=1}^n W_i/n \stackrel{a.s.}{\longrightarrow} \bar{w}$; and (*ii*) $\mu_n'$ converges set-wise to $\tilde{P}$, which is non-trivial in that setting. Here, (*i*) is granted as $(W_n)_{n\ge 1}$ is i.i.d., and (*ii*) has already been established; thus, we obtain the following result for RRPPs with independent weights.

**Theorem 4.** *For any RRPP* (*μn*)*n*≥<sup>0</sup> *with <sup>η</sup><sup>x</sup>* <sup>≡</sup> *<sup>η</sup>, there exists a random probability measure <sup>P</sup>*˜ *on* X *such that*

$$d_{TV}(\mu_n', \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0 \qquad \text{and} \qquad d_{TV}(\hat{\mu}_n, \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0.$$

**Proof.** Let $((X_n, W_n))_{n\ge 1}$ be the joint observation process associated with $(\mu_n)_{n\ge 0}$ by Proposition 1. As $\eta_x \equiv \eta$, Equation (19) implies that $W_n \stackrel{i.i.d.}{\sim} \eta$; thus, by the strong law of large numbers,

$$\frac{1}{n}\sum_{i=1}^n W_i \stackrel{a.s.}{\longrightarrow} \bar{w} \le \infty. \tag{26}$$

Let us define, for *n* ≥ 1,

$$P_n(\cdot) = \mathbb{P}(X_{n+1} \in \cdot \mid \mathcal{F}_n^{X,W}), \qquad \text{where} \quad \mathcal{F}_n^{X,W} = \sigma(X_1, W_1, \dots, X_n, W_n).$$

By Proposition 2, $(X_n)_{n\ge 1}$ is c.i.d. with respect to $(\mathcal{F}_n^{X,W})_{n\ge 1}$, so there exists, by Lemmas 2.1 and 2.4 in [21], a random probability measure $\tilde{P}$ on $\mathbb{X}$ such that, for every $A \in \mathcal{X}$,

$$P_n(A) \stackrel{a.s.}{\longrightarrow} \tilde{P}(A). \tag{27}$$

Moreover, $\int_{\mathbb{X}} f(x)\,P_n(dx) = \mathbb{E}[\int_{\mathbb{X}} f(x)\,\tilde{P}(dx) \mid \mathcal{F}_n^{X,W}]$ a.s. for every bounded measurable $f : \mathbb{X} \to \mathbb{R}$. Fix $m \ge 1$. By a monotone class argument, we can show that, for every bounded measurable $f : \mathbb{X}^2 \to \mathbb{R}$,

$$\int_{\mathbb{X}} f(X_m, x)\,P_n(dx) = \mathbb{E}\left[\int_{\mathbb{X}} f(X_m, x)\,\tilde{P}(dx) \,\Big|\, \mathcal{F}_n^{X,W}\right] \qquad \text{a.s., for all } n > m;$$

thus, $P_n(\{X_m\}) = \mathbb{E}[\tilde{P}(\{X_m\}) \mid \mathcal{F}_n^{X,W}]$ a.s., and so $(P_n(\{X_m\}))_{n>m}$ is a uniformly integrable martingale. It follows from martingale convergence that, as $n \to \infty$,

$$P_n(\{X_m\}) \stackrel{a.s.}{\longrightarrow} \tilde{P}(\{X_m\}). \tag{28}$$

Using (26)–(28), we can repeat the argument in the proof of Proposition 3.1 in [23] to show that (*i*) $d_{TV}(P_n, \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0$, and so $d_{TV}(\mu_n', \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0$ by Proposition 1; and (*ii*) $d_{TV}(\hat{\mu}_n, \tilde{P}) \stackrel{a.s.}{\longrightarrow} 0$.
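The merging in Theorem 4 is easy to observe numerically by tracking the total variation distance between the predictive distribution $\mu_n'$ and the empirical distribution $\hat{\mu}_n$ along a simulated chain. A minimal sketch, assuming for illustration a finite color space $\{0, 1, 2\}$, a uniform $\mu_0'$ with $\theta = 1$, and i.i.d. weights $W_n \sim \text{Unif}[0, 2]$:

```python
import random

def tv(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)

rng = random.Random(1)
colors = [0, 1, 2]
theta = 1.0
mass = {c: theta / len(colors) for c in colors}   # urn contents mu_n
counts = {c: 0 for c in colors}                   # for the empirical measure
total, n = theta, 0
for _ in range(50_000):
    # draw X_{n+1} from the normalized urn mu_n'
    u, acc, x = rng.uniform(0, total), 0.0, colors[-1]
    for c in colors:
        acc += mass[c]
        if u < acc:
            x = c
            break
    w = rng.uniform(0, 2)            # reinforcement W_{n+1}, w_bar = 1
    mass[x] += w
    total += w
    counts[x] += 1
    n += 1
pred = {c: mass[c] / total for c in colors}       # predictive mu_n'
emp = {c: counts[c] / n for c in colors}          # empirical hat{mu}_n
d = tv(pred, emp)                                 # small for large n
```

On a finite space, convergence in total variation coincides with set-wise convergence, so this only illustrates, not proves, the theorem.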

Equation (26) implies that $\theta_n \stackrel{a.s.}{\longrightarrow} 0$. If, in addition, $\bar{w} < \infty$, then $\sum_{n=1}^\infty \theta_n = \infty$ a.s. and $L_n \stackrel{a.s.}{\longrightarrow} \infty$. In fact, as long as $\bar{w} < \infty$, the sequence $(L_n)_{n\ge 1}$ grows at the same rate as in (22).

**Proposition 3** ([18], Lemma 6)**.** *Let $\eta \in M_P(\mathbb{R}_+)$ and $\mu_0$ be diffuse. If $\bar{w} < \infty$, then*

$$\frac{L_n}{\log n} \stackrel{a.s.}{\longrightarrow} \frac{\theta}{\bar{w}}.$$
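Proposition 3 can be checked by simulation. When $\eta_x \equiv \eta$, the conditional probability of observing a new color after $n$ steps is $\theta_n = \theta/(\theta + \sum_{i=1}^n W_i)$, so $L_n$ can be simulated without tracking the colors themselves. A minimal sketch, assuming for illustration $\theta = 1$ and $W_n \sim \text{Unif}[0.5, 1.5]$ (so $\bar{w} = 1$), averaged over independent runs:

```python
import math
import random

def simulate_Ln(n_steps, theta, rng):
    """Number of distinct colors L_n for an RRPP with a diffuse base measure:
    each step produces a new color with probability theta / (theta + sum W_i)."""
    S, L = 0.0, 0
    for _ in range(n_steps):
        if rng.random() < theta / (theta + S):
            L += 1
        S += rng.uniform(0.5, 1.5)   # i.i.d. reinforcement, E[W_1] = 1
    return L

n, runs = 100_000, 20
rng = random.Random(0)
ratios = [simulate_Ln(n, 1.0, rng) / math.log(n) for _ in range(runs)]
avg_ratio = sum(ratios) / runs       # close to theta / w_bar = 1
```

The same loop with constant weights $W_n = 1$ recovers the classical $\theta \log n$ growth of a Pólya sequence.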

If $\bar{w} = \infty$, then $\theta_n$ may approach zero fast enough that we stop seeing new observations as $n \to \infty$. For example, let us consider random reinforcement with a totally skewed stable distribution $S_\alpha(1, \sigma, 0)$ for $\alpha \in (0, 2]$ and $\sigma > 0$. If $\alpha < 1$, then $\bar{w} = \infty$, and we show that $n^{1/\alpha}\theta_n$ is stochastically bounded, which implies that $L_n$ converges to a finite limit.

**Proposition 4.** *Let $\eta$ be a $S_\alpha(1, \sigma, 0)$ distribution with stability parameter $\alpha < 1$, and $\mu_0$ be diffuse. Then, $\theta_n = O_P(n^{-1/\alpha})$ and*

$$\lim\_{n \to \infty} L\_n < \infty \quad a.s.$$

**Proof.** From the properties of stable distributions, we obtain $n^{-1/\alpha}\sum_{i=1}^n W_i \stackrel{d}{=} W_1$ for every $n \ge 1$ and, as a consequence,

$$\theta_n = n^{-1/\alpha}\,\frac{\theta}{n^{-1/\alpha}\theta + n^{-1/\alpha}\sum_{i=1}^n W_i} \stackrel{d}{=} n^{-1/\alpha}\,\frac{\theta}{n^{-1/\alpha}\theta + W_1} \le n^{-1/\alpha}\,\frac{\theta}{W_1}.$$

By Theorem 5.4.1 in [28], $\mathbb{E}[1/W_1] < \infty$, and so $1/W_1 < \infty$ a.s. It follows for every $M > 0$ that $\mathbb{P}(n^{1/\alpha}\theta_n > M) \le \mathbb{P}(\theta/W_1 > M)$, which can be made arbitrarily small by taking $M$ large enough. Regarding the second assertion, as $1/\alpha > 1$, we have

$$\mathbb{E}[\lim_{n\to\infty} L_n] = \lim_{n\to\infty}\sum_{i=1}^n \mathbb{E}[\mathbb{1}_{\{L_i = L_{i-1}+1\}}] = \sum_{n=1}^\infty \mathbb{E}[\theta_n] \le \sum_{n=1}^\infty \frac{\theta}{n^{1/\alpha}}\,\mathbb{E}[1/W_1] < \infty.$$

Proposition 4 can be extended to any fat-tailed reinforcement distribution $\eta$ by means of a generalized central limit theorem (see, e.g., [28] (p. 62)).
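The effect in Proposition 4 is easy to observe by simulation. As a stand-in for the stable law $S_\alpha(1, \sigma, 0)$ (per the remark above, any fat-tailed distribution in its domain of attraction behaves the same way), one can take Pareto-type weights $W_n = U_n^{-1/\alpha}$ with tail index $\alpha = 1/2$, for which $\bar{w} = \infty$:

```python
import random

def simulate_Ln_heavy(n_steps, theta, alpha, rng):
    """L_n for an RRPP whose i.i.d. weights W_n = U_n^(-1/alpha) are Pareto
    with tail index alpha < 1 (infinite mean); theta_n then vanishes so fast
    that only finitely many distinct colors ever appear."""
    S, L, history = 0.0, 0, []
    for _ in range(n_steps):
        if rng.random() < theta / (theta + S):
            L += 1
        u = 1.0 - rng.random()            # Unif(0, 1], avoids division by zero
        S += u ** (-1.0 / alpha)          # heavy-tailed reinforcement
        history.append(L)
    return history

rng = random.Random(7)
history = simulate_Ln_heavy(100_000, 1.0, 0.5, rng)
L_early, L_final = history[999], history[-1]
# L_n plateaus: essentially no new colors once the accumulated mass blows up
```

Compare with the $\bar{w} < \infty$ case, where $L_n$ keeps growing logarithmically.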

The rate of convergence of (18) and $\hat{\mu}_n$ has already been studied for the model with independent weights under different assumptions; see, e.g., [19] (p. 1363), Examples 4.2 and 4.5 in the technical report accompanying [18], and Corollary 4.1 in [22] for $\mathbb{X} = \{0, 1\}$. In the next theorem, we combine ideas from [18,20] to give a fairly general result.

**Theorem 5.** *Let $\eta \in M_P(\mathbb{R}_+)$. If $\mathbb{E}[W_1^2] < \infty$, then*

$$\sqrt{n}\left(\mu_n'(A) - \hat{\mu}_n(A)\right) \stackrel{stably}{\longrightarrow} \mathcal{N}(0, U(A)), \qquad \text{where} \quad U(A) = \frac{\operatorname{Var}(W_1)}{\bar{w}^2}\,\tilde{P}(A)\tilde{P}(A^c). \tag{29}$$

*If, in addition, $\mathbb{E}[W_1^4] < \infty$, then, with respect to the filtration generated by* $((X_n, W_n))_{n\ge 1}$*,*

$$\sqrt{n}\left(\mu_n'(A) - \tilde{P}(A)\right) \stackrel{a.s.\ cond.}{\longrightarrow} \mathcal{N}(0, V(A)), \qquad \text{where} \quad V(A) = \frac{\mathbb{E}[W_1^2]}{\bar{w}^2}\,\tilde{P}(A)\tilde{P}(A^c). \tag{30}$$

**Proof.** Let us define, for *n* ≥ 1,

$$P_n(\cdot) = \mathbb{P}(X_{n+1} \in \cdot \mid X_1, W_1, \dots, X_n, W_n).$$

The assertions in Theorem 5 have already been established by [18] when $W_1 \ge \gamma > 0$. In that case, Examples 4.2 and 4.5 in the technical report accompanying [18] show that (29) is a consequence of the fact that

$$\mathbb{E}\left[\max_{1\le k\le n} |Y_{n,k}|\right] \longrightarrow 0 \qquad \text{and} \qquad \sum_{k=1}^n Y_{n,k}^2 \stackrel{p}{\longrightarrow} U(A), \tag{31}$$

where $Y_{n,k} = \frac{1}{\sqrt{n}}\left(\delta_{X_k}(A) - kP_k(A) + (k-1)P_{k-1}(A)\right)$, and (30) follows from

$$\mathbb{E}\left[\sup\_{n\geq 1}\sqrt{n}|P\_{n-1}(A)-P\_n(A)|\right] < \infty \qquad \text{and} \qquad n\sum\_{k\geq n} \left(P\_{k-1}(A)-P\_k(A)\right)^2 \xrightarrow{a.s.} V(A).$$

Replicating the approach of Proposition 9 in [20], we avoid using the assumption $W_1 \ge \gamma > 0$ by conditioning on the sets $H_n = \{2\sum_{i=1}^n W_i \ge n\bar{w}\}$, $n \ge 1$. By (26), $\mathbb{1}_{H_n} \stackrel{a.s.}{\longrightarrow} 1$, so (29) follows from (31) with

$$Y_{n,k} = \frac{1}{\sqrt{n}}\,\mathbb{1}_{H_{k-1}}\left\{\delta_{X_k}(A) - kP_k(A) + (k-1)P_{k-1}(A)\right\},$$

whereas (30) is, ultimately, a result of

$$\mathbb{E}\left[\sup\_{n\geq 1}\sqrt{n}\cdot\mathbb{1}\_{H\_n}|P\_{n-1}(A)-P\_n(A)|\right]<\infty \quad \text{and} \quad n\sum\_{k\geq n}\left(P\_{k-1}(A)-P\_k(A)\right)^2\xrightarrow{a.s.}V(A).$$

$$\square$$

## **4. Generalized Measure-Valued Pólya Urn Processes**

The definition of an MVPP assumes that the law of the reinforcement $R$ is fixed; yet, in some situations, $R$ can itself be random (e.g., an RRPP with exchangeable weights; see Section 4.1). To avoid measurability issues, we assume a parametric model for $R$, with the parameter taking values in a Polish space $\mathbb{V}$.

**Definition 3** (Generalized Measure-Valued Pólya Urn Process)**.** *Let $V$ be a $\mathbb{V}$-valued random variable. A sequence $(\mu_n)_{n\ge 0}$ of random finite measures on $\mathbb{X}$ is called a generalized measure-valued Pólya urn process (GMVPP) with uncertainty parameter $V$, initial state $\mu_0 \in M_F^*(\mathbb{X})$, and replacement rule $R \in K_P(\mathbb{V}\times\mathbb{X}, M_F(\mathbb{X}))$ if $\mu_1 \mid V \sim \hat{R}^V_{\mu_0}$ and, for every $n \ge 2$,*

$$\mathbb{P}(\mu_n \in \cdot \mid V, \mu_1, \dots, \mu_{n-1}) = \hat{R}^V_{\mu_{n-1}}(\cdot),$$

*where $\hat{R}$ is the transition probability kernel from $\mathbb{V}\times M_F^*(\mathbb{X})$ to $M_F^*(\mathbb{X})$ given by*

$$(v, \mu) \mapsto \hat{R}^v_\mu(\cdot) = \int_{\mathbb{X}} \psi_\mu^\sharp\big(R(v, x)\big)(\cdot)\,\mu'(dx),$$

*and $\psi_\mu$ is the map $\nu \mapsto \nu + \mu$.*

It follows from Definition 3 that any GMVPP is a mixture of Markov chains with initial state $\mu_0$ and transition kernel $\hat{R}^V$. A separate modeling approach, which we do not examine here, defines a measure-valued Markov chain with transition kernel

$$\mu \mapsto \int_{\mathbb{X}} \psi_\mu^\sharp\big(R(\mu, x)\big)(\cdot)\,\mu'(dx).$$

In fact, some of the predictive constructions in [11,29] can be framed in such a way.

Theorem 1 extends to GMVPPs, provided that we condition all quantities on the parameter *V*. As a consequence, there exists a measurable function *f* from V × X × [0, 1] to M*F*(X) and a random sequence ((*Xn*, *Un*))*n*≥<sup>1</sup> such that

$$\mu_n = \mu_{n-1} + f(V, X_n, U_n) \quad \text{a.s.}, \tag{32}$$

where $U_n \sim \text{Unif}[0,1]$, $U_n \perp (V, X_1, U_1, \dots, X_{n-1}, U_{n-1}, X_n)$, $X_1 \mid (V, (U_m)_{m\ge 1}) \sim \mu_0'$, and, for $n \ge 1$,

$$\mathbb{P}(X_{n+1} \in \cdot \mid V, X_1, \dots, X_n, (U_m)_{m\ge 1}) = \frac{\mu_0(\cdot) + \sum_{i=1}^n f(V, X_i, U_i)(\cdot)}{\mu_0(\mathbb{X}) + \sum_{i=1}^n f(V, X_i, U_i)(\mathbb{X})},\tag{33}$$

and

$$\mathbb{P}(f(V, X_n, U_n) \in \cdot \mid V, X_1, U_1, \dots, X_{n-1}, U_{n-1}, X_n) = R(V, X_n)(\cdot). \tag{34}$$

The definition of a randomly reinforced Pólya process is similarly generalized to cover the case of a random reinforcement distribution *η*.

**Definition 4** (Generalized Randomly Reinforced Pólya Process)**.** *We call a GMVPP with parameters $(V, \mu_0, R)$ a generalized randomly reinforced Pólya process (GRRPP) if there exists $\eta \in K_P(\mathbb{V}\times\mathbb{X}, \mathbb{R}_+)$ such that $R(v, x) = \xi_x^\sharp(\eta(v, x))$, where $\xi_x : \mathbb{R}_+ \to M_F(\mathbb{X})$ is the map $w \mapsto w\delta_x$.*

For GRRPPs, the function *f* in the representation (32)–(34) can be written as

$$f(v, x, u) = h(v, x, u)\,\delta_x,$$

where *h* is a measurable function from V × X × [0, 1] to R<sup>+</sup> such that *h*(*v*, *x*, *U*) ∼ *η*(*v*, *x*) for all *v* ∈ V and *x* ∈ X, whenever *U* ∼ Unif[0, 1]. Letting *Wn* = *h*(*V*, *Xn*, *Un*), we obtain

$$
\mu\_n = \mu\_{n-1} + W\_n \delta\_{X\_n} \quad \text{a.s.} \tag{35}
$$

where

$$\mathbb{P}(X_{n+1} \in \cdot \mid V, X_1, \dots, X_n, (U_m)_{m\ge 1}) = \frac{\mu_0(\cdot) + \sum_{i=1}^n W_i\,\delta_{X_i}(\cdot)}{\mu_0(\mathbb{X}) + \sum_{i=1}^n W_i},\tag{36}$$

and

$$\mathbb{P}(\mathcal{W}\_n \in \cdot \mid V, X\_1, \mathcal{U}\_1, \dots, X\_{n-1}, \mathcal{U}\_{n-1}, X\_n) = \eta(V, X\_n)(\cdot). \tag{37}$$

The weights $W_n$ in (36) allow us to incorporate additional information about the observations $(X_n)_{n\ge 1}$. As an example, consider the problem of computer-based classification, where the output usually includes confidence scores, which reflect the software's confidence that the classifications are correct. In analyzing the number and size of the types already discovered, or the probability of detecting a new type, a typical procedure would take into account only those classifications whose confidence scores exceed a certain threshold. Alternatively, we could adopt a Bayesian perspective and weigh each classification by its confidence score. Denoting by $((X_n, W_n))_{n\ge 1}$ the sequence of classifications and confidence scores, we would model the distribution of the next classification by (36).
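As a concrete sketch of this weighting scheme (the labels, scores, and prior below are invented for illustration; the model only prescribes the update rule (36)), the predictive probabilities for the next classification are obtained by adding each observed label's confidence score to its prior mass and renormalizing:

```python
def weighted_predictive(observations, base, theta=1.0):
    """Predictive distribution (36) over labels: mu_0 = theta * base, and
    each observed pair (X_i, W_i) adds mass W_i to the atom at X_i."""
    mass = {label: theta * p for label, p in base.items()}
    total = theta + sum(w for _, w in observations)
    for label, w in observations:
        mass[label] = mass.get(label, 0.0) + w
    return {label: m / total for label, m in mass.items()}

# Hypothetical classifications with their confidence scores as weights
obs = [("cat", 0.9), ("dog", 0.6), ("cat", 0.8)]
pred = weighted_predictive(obs, base={"cat": 0.5, "dog": 0.5}, theta=1.0)
# "cat" carries mass 0.5 + 0.9 + 0.8 = 2.2 out of a total 1.0 + 2.3 = 3.3
```

Thresholding, by contrast, would amount to replacing each score with $\mathbb{1}\{W_i > c\}$ before the update.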

## *4.1. GRRPP with Exchangeable Weights*

Let $(\mu_n)_{n\ge 0}$ be a GRRPP with reinforcement distribution $\eta(v)$ that does not depend on $x$. Then,

$$W_n = h(V, U_n),$$

for some measurable function *h*(*v*, *u*). The next result shows that the sequence (*Wn*)*n*≥<sup>1</sup> is exchangeable with directing random measure *η*˜ ≡ *η*(*V*). Moreover, (*μn*)*n*≥<sup>0</sup> is completely parameterized by (*μ*0, *η*˜).

**Theorem 6.** *A sequence* (*μn*)*n*≥<sup>0</sup> *of random finite measures is a GRRPP with parameters* (*μ*0, *η*˜) *for η*˜ ∈ K*P*(Ω, R+) *if and only if μ*<sup>0</sup> = *θν and, for every n* ≥ 1*,*

$$
\mu\_n = \mu\_{n-1} + \mathcal{W}\_n \delta\_{X\_n} \quad a.s.,
$$

*where <sup>θ</sup>* ∈ (0, <sup>∞</sup>)*, <sup>ν</sup>* ∈ M*P*(X)*,* (*Wn*)*n*≥<sup>1</sup> *is an exchangeable process with directing random measure <sup>η</sup>*˜*, and* (*Xn*)*n*≥<sup>1</sup> *is a sequence of* X*-valued random variables such that <sup>X</sup>*<sup>1</sup> | (*Wk*)*k*≥<sup>1</sup> ∼ *<sup>ν</sup> and, for n* ≥ 1*,*

$$\mathbb{P}(X\_{n+1}\in \cdot \mid X\_1, \dots, X\_{n\prime} \left(W\_k\right)\_{k\geq 1}) = \sum\_{i=1}^n \frac{W\_i}{\theta + \sum\_{j=1}^n W\_j} \delta\_{X\_i}(\cdot) + \frac{\theta}{\theta + \sum\_{j=1}^n W\_j} \nu(\cdot). \tag{38}$$

**Proof.** Let $(\mu_n)_{n\ge 0}$ be a GRRPP with parameters $(\mu_0, \tilde{\eta})$, and consider the representation (35)–(37). Put $\theta = \mu_0(\mathbb{X})$ and $\nu = \mu_0'$. It follows from (37) that

$$W_n \mid \tilde{\eta} \stackrel{i.i.d.}{\sim} \tilde{\eta};$$

thus, (*Wn*)*n*≥<sup>1</sup> is exchangeable. Moreover, *Wn* = *h*(*V*, *Un*), *n* ≥ 1, so (38) follows from (36). Conversely, suppose *μ<sup>n</sup>* = *μn*−<sup>1</sup> + *WnδXn* , where the process ((*Xn*, *Wn*))*n*≥<sup>1</sup> is as described. It follows from (38) and Theorem 8.12 in [25] that

$$(\mathcal{W}\_k)\_{k \ge 1} \perp X\_1 \quad \text{and} \quad (\mathcal{W}\_{n+k})\_{k \ge 1} \perp (X\_1, \dots, X\_{n+1}) \mid (\mathcal{W}\_1, \dots, \mathcal{W}\_n), \ n \ge 1. \tag{39}$$

Since (*Wn*)*n*≥<sup>1</sup> is exchangeable with directing random measure *η*˜, we have

$$W_n \mid \tilde{\eta} \stackrel{i.i.d.}{\sim} \tilde{\eta}.\tag{40}$$

Furthermore, *η*˜ is measurable with respect to the tail *σ*-field of (*Wn*)*n*≥1, so, by (39),

$$\tilde{\eta} \perp X_1 \quad \text{and} \quad \tilde{\eta} \perp (X_1, \dots, X_{n+1}) \mid (W_1, \dots, W_n), \ n \ge 1. \tag{41}$$

Using (39)–(41), we can show that

$$W_1 \perp X_1 \mid \tilde{\eta} \quad \text{and} \quad W_{n+1} \perp (X_1, W_1, \dots, X_n, W_n, X_{n+1}) \mid \tilde{\eta}, \ n \ge 1.$$

Then, $\mathbb{P}(\mu_1 \in \cdot \mid \tilde{\eta}) = \mathbb{P}(\mu_0 + W_1\delta_{X_1} \in \cdot \mid \tilde{\eta}) = \int_{\mathbb{X}} \psi_{\mu_0}^\sharp(\xi_x^\sharp(\tilde{\eta}))(\cdot)\,\mu_0'(dx)$ and, for $n \ge 2$,

$$\begin{split} \mathbb{P}(\mu_n \in \cdot \mid \tilde{\eta}, \mu_1, \dots, \mu_{n-1}) &= \mathbb{E}\left[\mathbb{P}(\mu_{n-1} + W_n\delta_{X_n} \in \cdot \mid \tilde{\eta}, X_1, W_1, \dots, X_{n-1}, W_{n-1}, X_n) \mid \tilde{\eta}, \mu_1, \dots, \mu_{n-1}\right] \\ &= \mathbb{E}\left[\psi_{\mu_{n-1}}^\sharp(\xi_{X_n}^\sharp(\tilde{\eta}))(\cdot) \mid \tilde{\eta}, \mu_1, \dots, \mu_{n-1}\right] \\ &= \int_{\mathbb{X}} \psi_{\mu_{n-1}}^\sharp(\xi_x^\sharp(\tilde{\eta}))(\cdot)\,\mu_{n-1}'(dx). \end{split}$$

It follows from the proof of Theorem 6 that $(X_1, W_1) \sim \mu_0' \times \mathbb{E}[\tilde{\eta}]$ and, for $n \ge 1$,

$$\mathbb{P}\left(\left(X\_{n+1},\mathcal{W}\_{n+1}\right)\in\cdot \mid X\_1,\mathcal{W}\_1,\dots,X\_n,\mathcal{W}\_n\right) = \left(\mu'\_n \times \mathbb{E}[\bar{\eta}|\mathcal{W}\_1,\dots,\mathcal{W}\_n]\right)(\cdot). \tag{42}$$

Since $\mu_n'$ and $\mathbb{E}[\tilde{\eta} \mid W_1, \dots, W_n]$ are both symmetric with respect to $((X_1, W_1), \dots, (X_n, W_n))$, the right-hand side of (42) is a symmetric function of $((X_1, W_1), \dots, (X_n, W_n))$. This is a necessary but not sufficient condition for $((X_n, W_n))_{n\ge 1}$ to be exchangeable; see Proposition 3.2 and Example 3.1 in [2]. In Proposition 5, we show that $((X_n, W_n))_{n\ge 1}$ is exchangeable if and only if either $\mu_0'$ is degenerate or the weights are a.s. identical. On the other hand, for every $n, k \ge 1$, the sequence $((X_n, W_n))_{n\ge 1}$ satisfies

$$\mathbb{P}(\mathcal{W}\_k \in \cdot \mid X\_1) = \mathbb{P}(\mathcal{W}\_1 \in \cdot \mid X\_1), \qquad \mathbb{P}(X\_k \in \cdot \mid \mathcal{W}\_1) = \mathbb{P}(X\_1 \in \cdot \mid \mathcal{W}\_1), \tag{43}$$

and

$$\begin{aligned} \mathbb{P}(W_{n+k} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, X_{n+1}) &= \mathbb{P}(W_{n+1} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, X_{n+1}), \\ \mathbb{P}(X_{n+k} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, W_{n+1}) &= \mathbb{P}(X_{n+1} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, W_{n+1}). \end{aligned} \tag{44}$$

By [24], Equations (43) and (44) define a process that is partially conditionally identically distributed (partially c.i.d.). Analogously to the c.i.d. case, partially c.i.d. processes preserve many of the properties of partially exchangeable sequences; see [24].

**Proposition 5.** *Under the conditions of Theorem 6,* ((*Xn*, *Wn*))*n*≥<sup>1</sup> *is partially c.i.d. Moreover,* ((*Xn*, *Wn*))*n*≥<sup>1</sup> *is exchangeable if and only if either μ* <sup>0</sup> *is degenerate or Wn* = *W*<sup>1</sup> *a.s., n* ≥ 1*. In that case,* ((*Xn*, *Wn*))*n*≥<sup>1</sup> *is partially exchangeable.*

**Proof.** It follows that $((X_n, W_n))_{n\ge 1}$ is partially c.i.d. if and only if $X_2 \stackrel{d}{=} X_1 \mid W_1$, $W_2 \stackrel{d}{=} W_1 \mid X_1$, and (44) holds for every $n \ge 1$ with $k = 2$. By hypothesis, $(W_n)_{n\ge 1}$ is exchangeable and $(W_n)_{n\ge 1} \perp X_1$, so $W_2 \stackrel{d}{=} W_1 \mid X_1$. Moreover, applying (39) repeatedly, we obtain

$$\begin{split} \mathbb{P}(W_{n+2} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, X_{n+1}) &= \mathbb{E}\left[\mathbb{P}(W_{n+2} \in \cdot \mid X_1, \dots, W_{n+1}, X_{n+2}) \mid X_1, W_1, \dots, X_n, W_n, X_{n+1}\right] \\ &= \mathbb{E}\left[\mathbb{P}(W_{n+2} \in \cdot \mid W_1, \dots, W_{n+1}) \mid W_1, \dots, W_n\right] \\ &= \mathbb{P}(W_{n+1} \in \cdot \mid W_1, \dots, W_n) = \mathbb{P}(W_{n+1} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, X_{n+1}). \end{split}$$

On the other hand, by (38),

$$\begin{split} \mathbb{P}(X_{n+2} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, W_{n+1}) &= \mathbb{E}\left[\mu_{n+1}'(\cdot) \mid X_1, W_1, \dots, X_n, W_n, W_{n+1}\right] \\ &= \frac{\mu_n(\cdot) + W_{n+1}\,\mu_n'(\cdot)}{\mu_{n+1}(\mathbb{X})} = \mu_n'(\cdot) \\ &= \mathbb{P}(X_{n+1} \in \cdot \mid X_1, W_1, \dots, X_n, W_n, W_{n+1}). \end{split}$$

Analogously, $\mathbb{P}(X_2 \in \cdot \mid W_1) = \nu(\cdot) = \mathbb{P}(X_1 \in \cdot \mid W_1)$, which completes the proof of the first part.

If $\mu_0'$ is degenerate, then $((X_n, W_n))_{n\ge 1}$ is trivially exchangeable. If instead $W_n = W_1$ a.s., then one can show that $((X_n, W_n))_{n\ge 1}$ satisfies condition (*b*) of Proposition 3.2 in [2], which, together with the symmetry of (42), implies by Theorem 3.1 in [2] that $((X_n, W_n))_{n\ge 1}$ is exchangeable.

Conversely, suppose that ((*Xn*, *Wn*))*n*≥<sup>1</sup> is exchangeable. As ((*Xn*, *Wn*))*n*≥<sup>1</sup> is partially c.i.d., the predictive distributions (42) converge to a product random measure [24]. It follows from de Finetti's theorem that ((*Xn*, *Wn*))*n*≥<sup>1</sup> is partially exchangeable, so, in particular,

$$(X\_1, W\_1, X\_2, W\_2) \stackrel{d}{=} (X\_1, W\_2, X\_2, W\_1).$$

However, $W_2 \perp X_2 \mid (X_1, W_1)$ from (36), so $W_1 \perp X_2 \mid (X_1, W_2)$. Thus, for every bounded measurable function $\tilde{f}$, there exists a measurable function $g_{\tilde{f}}$ such that

$$\mathbb{E}[\tilde{f}(X_2) \mid X_1, W_1, W_2] = g_{\tilde{f}}(X_1, W_2) \quad \text{a.s.}$$

Integrating $\tilde{f}(X_2)$ with respect to (38) and rearranging the terms, we obtain

$$W_1\big(\tilde{f}(X_1) - g_{\tilde{f}}(X_1, W_2)\big) = \theta\big(g_{\tilde{f}}(X_1, W_2) - \mathbb{E}[\tilde{f}(X_1)]\big) \quad \text{a.s.}$$

Assume that $\mu_0'$ is non-degenerate. Then, there is an $\tilde{f}$ such that $\mathbb{P}(\tilde{f}(X_1) = \mathbb{E}[\tilde{f}(X_1)]) = 0$; e.g., take $\tilde{f} = \mathbb{1}_B$ for some $B \in \mathcal{X}$ such that $0 < \mathbb{P}(X_1 \in B) < 1$. It follows that

$$\begin{aligned} \mathbb{P}(\tilde{f}(X_1) - g_{\tilde{f}}(X_1, W_2) = 0) &= \mathbb{P}(\tilde{f}(X_1) = \mathbb{E}[\tilde{f}(X_2) \mid X_1, W_1, W_2]) \\ &= \mathbb{P}(\tilde{f}(X_1) = \mathbb{E}[\tilde{f}(X_1)]) = 0; \end{aligned}$$

therefore,

$$W_1 = \frac{\theta\big(g_{\tilde{f}}(X_1, W_2) - \mathbb{E}[\tilde{f}(X_1)]\big)}{\tilde{f}(X_1) - g_{\tilde{f}}(X_1, W_2)} \quad \text{a.s.}$$

In other words, there exists a measurable function $\tilde{h}$ such that $W_1 = \tilde{h}(X_1, W_2)$ a.s., and so $W_2 = \tilde{h}(X_1, W_1)$ a.s. by partial exchangeability. It follows from $X_1 \perp (W_1, W_2)$ that, for every $A \in \mathcal{B}(\mathbb{R}_+)$,

$$\mathbb{P}(W_2 \in A \mid W_1) = \mathbb{P}(W_2 \in A \mid X_1, W_1) = \mathbb{1}_A(W_2) \quad \text{a.s.};$$

thus, $W_2 = W_1$ a.s. and, from exchangeability, $W_n = W_1$ a.s., $n \ge 1$.

## *4.2. Asymptotic Properties of GRRPP with Exchangeable Weights*

It follows from (38) that the GRRPP with exchangeable weights is a mixture of RRPPs with independent weights, with the mixing distribution affecting only the sequence (*Wn*)*n*≥1. Thus, we expect that the results in Section 3.3 carry over to this more general setting. In this section, we concentrate on the behavior of *θ<sup>n</sup>* and the sequence (*Ln*)*n*≥1.

Assume that P(*W*<sup>1</sup> > 0|*η*˜) > 0. If E[*W*1] < ∞, then 0 < E[*W*1|*η*˜] < ∞ a.s., and, by the law of large numbers for exchangeable random variables (see [1], Section 2),

$$\frac{1}{n}\sum_{i=1}^n W_i \stackrel{a.s.}{\longrightarrow} \mathbb{E}[W_1 \mid \tilde{\eta}] \in (0, +\infty).$$

Then, if $\mu_0$ is diffuse, $n\,\theta_n \stackrel{a.s.}{\longrightarrow} \theta/\mathbb{E}[W_1 \mid \tilde{\eta}]$ and $\sum_{n=1}^\infty \theta_n = \infty$ a.s., so Theorem 1 in [27] implies

$$\frac{L_n}{\log n} = \frac{L_n}{\sum_{k=1}^n \theta_k}\left(\frac{1}{\log n}\sum_{k=1}^n \frac{1}{k}\,(k\,\theta_k)\right) \stackrel{a.s.}{\longrightarrow} \frac{\theta}{\mathbb{E}[W_1 \mid \tilde{\eta}]}.$$
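The random limit $\theta/\mathbb{E}[W_1 \mid \tilde{\eta}]$ can be seen by conditioning on the mixing variable in a simulation: given $V$, the weights are i.i.d., and different values of $V$ drive $L_n/\log n$ to different constants. A minimal sketch, assuming for illustration $W_n = V\,U_n'$ with $U_n' \sim \text{Unif}[0.5, 1.5]$, so that $\mathbb{E}[W_1 \mid V] = V$:

```python
import math
import random

def Ln_given_V(n_steps, theta, V, rng):
    """L_n for a GRRPP run conditionally on the mixing variable V: given V,
    the weights W_n = V * Unif[0.5, 1.5] are i.i.d. with mean V."""
    S, L = 0.0, 0
    for _ in range(n_steps):
        if rng.random() < theta / (theta + S):
            L += 1
        S += V * rng.uniform(0.5, 1.5)
    return L

n, runs, theta = 100_000, 20, 1.0
rng = random.Random(3)
ratio_V2 = sum(Ln_given_V(n, theta, 2.0, rng) for _ in range(runs)) / runs / math.log(n)
# conditional limit theta / E[W_1 | V] = 1/2 when V = 2
```

Rerunning with a freshly drawn $V$ for each chain produces a spread of ratios, reflecting that the limit is genuinely random.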

If E[*W*1] = ∞, then *Ln* may converge to a finite limit, as *n* → ∞. For example, let us consider a strictly stable reinforcement distribution as in Proposition 4.

**Proposition 6.** *Let* (*μn*)*n*≥<sup>0</sup> *be a GRRPP with parameters* (*V*, *μ*0, *η*) *such that V is a strictly positive random variable with* E[*V*−1] < ∞*, μ*<sup>0</sup> *is diffuse, and η*(*v*)*, v* > 0 *is a Sα*(1, *v*, 0) *distribution with stability parameter α* < 1*. Then, θ<sup>n</sup>* = *OP*(*n*−1/*α*) *and*

$$\lim_{n \to \infty} L_n < \infty \quad a.s.$$

**Proof.** From the choice of weights in the representation (35), we can take

$$W_n = V F^{-1}(U_n),$$

where *Un* <sup>∼</sup> Unif[0, 1], *Un* <sup>⊥</sup> (*V*, *<sup>X</sup>*1, *<sup>U</sup>*1, ... , *Xn*−1, *Un*−1, *Xn*), and *<sup>F</sup>*−<sup>1</sup> is the inverse of the *Sα*(1, 1, 0) distribution function. Then,

$$\theta_n = \frac{\theta}{\theta + \sum_{i=1}^n W_i} \le n^{-1/\alpha}\,\frac{\theta}{V\,n^{-1/\alpha}\sum_{i=1}^n F^{-1}(U_i)} \stackrel{d}{=} n^{-1/\alpha}\,\frac{\theta}{VY},$$

for some $Y \sim S_\alpha(1, 1, 0)$ such that $Y \perp V$. It follows for every $M > 0$ that $\mathbb{P}(n^{1/\alpha}\theta_n > M) \le \mathbb{P}(\theta/(VY) > M)$, which can be made arbitrarily small by taking $M$ large enough. Regarding the second assertion, as $1/\alpha > 1$ and $\mathbb{E}[\theta/(VY)] < \infty$ by Theorem 5.4.1 in [28], we have

$$\mathbb{E}\Big[\lim\_{n\to\infty} L\_n\Big] = \lim\_{n\to\infty} \sum\_{i=1}^n \mathbb{E}\big[\mathbf{1}\_{\{L\_i = L\_{i-1} + 1\}}\big] = \sum\_{n=1}^\infty \mathbb{E}[\theta\_n] \le \sum\_{n=1}^\infty \frac{\theta}{n^{1/\alpha}}\, \mathbb{E}[1/(VY)] < \infty.$$
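The finite-limit behavior in Proposition 6 can also be illustrated numerically. The sketch below (our illustration, not from the paper) takes $V \equiv 1$ for simplicity and replaces the strictly stable weights by Pareto weights $W_i = U_i^{-1/\alpha}$ with $\alpha = 1/2$, which share the tail index $\alpha < 1$ and hence have $\mathbb{E}[W_1] = \infty$; the helper name is hypothetical.

```python
import random

def count_distinct(n, theta=1.0, alpha=0.5, seed=0):
    """Count the distinct values after n draws when the weights
    W_i = U_i**(-1/alpha) are Pareto with tail index alpha < 1
    (so E[W] = infinity), mimicking the heavy tail of the
    S_alpha(1, v, 0) reinforcement."""
    rng = random.Random(seed)
    total, distinct = 0.0, 0
    for _ in range(n):
        if rng.random() < theta / (theta + total):
            distinct += 1
        total += rng.random() ** (-1.0 / alpha)  # Pareto(alpha) weight
    return distinct

# Since sum_n theta_n < infinity a.s., only finitely many new values appear
for n in (10**3, 10**4, 10**5):
    print(n, count_distinct(n))
```

With a fixed seed, each call replays the same prefix of draws, so the printed counts are nondecreasing in $n$ and typically settle at a small constant, in contrast with the logarithmic growth of the integrable-weight case.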

Extensions of Proposition 6 can be obtained by exploiting central limit theorems for exchangeable random variables; see [30,31].

## **5. Discussion**

In this paper, we study the extension of randomly reinforced urns [17] to an unbounded set of possible colors. The resulting measure-valued urn process provides a predictive characterization of the law of an asymptotically exchangeable sequence of random variables, which corresponds to the observation process of an implied urn sampling scheme. In fact, the model (6)–(7) fits into a line of recent research that explores efficient predictive constructions for fast online prediction or approximately Bayesian solutions; see [11,29,32] and references therein. In this regard, one direction for future work is to generalize the functional relationship in (7) and/or, as one referee suggested, to consider finitely additive measures, along the lines discussed in [33].

We investigate the asymptotic properties of the sequences of predictive distributions and empirical frequencies of the observation process, and prove their convergence in total variation distance to a common random limit. The rate of convergence of their difference is given set-wise, so another possible direction for future research is to consider a stronger distance. As far as we know, the topic of merging of the predictive and empirical distributions is largely unexplored. Within the relevant literature, we mention the works [4,34], where the authors study the rate of convergence of the Wasserstein or Prokhorov distances under exchangeability, and the papers by Berti et al. [21,35], who consider the c.i.d. case and regard the difference between the predictive and empirical measures as a map in the space of real bounded functions.

**Author Contributions:** Formal analysis, S.F., S.P., H.S.; writing—original draft preparation, S.F., S.P., H.S.; writing—review and editing, S.F., S.P., H.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 817257). H.S. was partially supported by the Bulgarian Ministry of Education and Science under the National Research Programme "Young scientists and postdoctoral students" approved by DCM No. 577/17.08.2018.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We wish to express our sincere gratitude to Regazzini for his deeply inspiring ideas and for instilling in us his passion for research. We thank the four anonymous referees for their valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**

