Article

Predictive Constructions Based on Measure-Valued Pólya Urn Processes

Sandra Fortini 1, Sonia Petrone 1 and Hristo Sariev 2,*
1 Department of Decision Sciences, Bocconi University, 20136 Milano, Italy
2 Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, 1113 Sofia, Bulgaria
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(22), 2845; https://doi.org/10.3390/math9222845
Submission received: 4 October 2021 / Revised: 4 November 2021 / Accepted: 8 November 2021 / Published: 10 November 2021

Abstract

Measure-valued Pólya urn processes (MVPP) are Markov chains with an additive structure that serve as an extension of the generalized $k$-color Pólya urn model towards a continuum of possible colors. We prove that, for any MVPP $(\mu_n)_{n\geq 0}$ on a Polish space $\mathbb{X}$, the normalized sequence $(\mu_n/\mu_n(\mathbb{X}))_{n\geq 0}$ agrees with the marginal predictive distributions of some random process $(X_n)_{n\geq 1}$. Moreover, $\mu_n = \mu_{n-1} + R_{X_n}$, $n \geq 1$, where $x \mapsto R_x$ is a random transition kernel on $\mathbb{X}$; thus, if $\mu_{n-1}$ represents the contents of an urn, then $X_n$ denotes the color of the ball drawn with distribution $\mu_{n-1}/\mu_{n-1}(\mathbb{X})$, and $R_{X_n}$ the subsequent reinforcement. In the case $R_{X_n} = W_n\,\delta_{X_n}$, for some non-negative random weights $W_1, W_2, \ldots$, the process $(X_n)_{n\geq 1}$ is better understood as a randomly reinforced extension of Blackwell and MacQueen's Pólya sequence. We study the asymptotic properties of the predictive distributions and the empirical frequencies of $(X_n)_{n\geq 1}$ under different assumptions on the weights. We also investigate a generalization of the above models via a randomization of the law of the reinforcement.

1. Introduction

Let $(X_n)_{n\geq 1}$ be a sequence of homogeneous random observations, taking values in a Polish space $\mathbb{X}$. The central assumption in the Bayesian approach to inductive reasoning is that $(X_n)_{n\geq 1}$ is exchangeable, that is, its law is invariant under finite permutations. Then, by de Finetti's theorem, there exists a random probability measure $\tilde{P}$ on $\mathbb{X}$ such that, given $\tilde{P}$, the random variables $X_1, X_2, \ldots$ are conditionally independent and identically distributed with marginal distribution $\tilde{P}$ (see [1], Section 3), denoted
$$X_n \mid \tilde{P} \overset{i.i.d.}{\sim} \tilde{P}. \quad (1)$$
Furthermore, $\tilde{P}$ is the almost sure (a.s.) weak limit of the predictive distributions and the empirical frequencies,
$$P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n) \overset{w}{\longrightarrow} \tilde{P}(\cdot) \quad a.s. \qquad \text{and} \qquad \frac{1}{n}\sum_{i=1}^n \delta_{X_i}(\cdot) \overset{w}{\longrightarrow} \tilde{P}(\cdot) \quad a.s. \quad (2)$$
The model (1) is completed by choosing a prior distribution for $\tilde{P}$. Inference consists in computing the conditional (posterior) distribution of $\tilde{P}$ given an observed sample $(X_1, \ldots, X_n)$, with most inferential conclusions depending on some average with respect to the posterior distribution; for example, under squared loss, for any measurable set $B \subseteq \mathbb{X}$, the best estimate of $\tilde{P}(B)$ is the posterior mean, $E[\tilde{P}(B) \mid X_1, \ldots, X_n]$. In addition, the posterior mean can be utilized for predictive inference, since
$$P(X_{n+1} \in B \mid X_1, \ldots, X_n) = E[\tilde{P}(B) \mid X_1, \ldots, X_n]. \quad (3)$$
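A minimal numerical illustration of identity (3) (not part of the paper; the Beta–Bernoulli setup and all parameter values below are illustrative assumptions): with a Beta prior on $\tilde{P}(\{1\})$ and Bernoulli observations, the one-step-ahead predictive probability coincides with the posterior mean.

```python
import numpy as np

# Beta(a, b) prior on p = P~({1}); observations are Bernoulli(p).
a, b = 2.0, 3.0
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.7, size=50)          # a sample X_1, ..., X_n

# Posterior is Beta(a + sum(x), b + n - sum(x)); its mean is also
# the predictive probability P(X_{n+1} = 1 | X_1, ..., X_n), as in (3).
posterior_mean = (a + x.sum()) / (a + b + len(x))

# Monte Carlo check: average P~({1}) over posterior draws.
draws = rng.beta(a + x.sum(), b + len(x) - x.sum(), size=100_000)
print(posterior_mean, draws.mean())        # the two numbers agree up to MC error
```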
A different modeling strategy uses the Ionescu–Tulcea theorem to define the law of the process from the sequence of predictive distributions, $(P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n))_{n\geq 1}$. In that case, one can refer to Theorem 3.1 in [2] for necessary and sufficient conditions on $(P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n))_{n\geq 1}$ to be consistent with exchangeability. The predictive approach to model building is deeply rooted in Bayesian statistics, where the parameter $\tilde{P}$ is assigned an auxiliary role and the focus is on observable “facts”; see [2,3,4,5,6]. Moreover, using the predictive distributions as primary objects allows one to make predictions instantly or helps ease computations. See [7] for a review of some well-known predictive constructions of priors for Bayesian inference.
In this work, we consider a class of predictive constructions based on measure-valued Pólya urn processes (MVPP). MVPPs have been introduced in the probabilistic literature [8,9] as an extension of k-color urn models, but their implications for (Bayesian) statistics have yet to be explored. A first aim of the paper is thus to show the potential use of MVPPs as predictive constructions in Bayesian inference. In fact, some popular models in Bayesian nonparametric inference can be framed in such a way, see Equation (8). A second aim of the paper is to suggest novel extensions of MVPPs that we believe can offer more flexibility in statistical applications.
MVPPs are essentially measure-valued Markov processes with an additive structure, the formal definition being postponed to Section 2.1 (Definition 1). Given an MVPP $(\mu_n)_{n\geq 0}$, we consider a sequence of random observations that are characterized by $P(X_1 \in \cdot) = \mu_0(\cdot)/\mu_0(\mathbb{X})$ and, for $n \geq 1$,
$$P(X_{n+1} \in \cdot \mid X_1, \mu_1, \ldots, X_n, \mu_n) = \frac{\mu_n(\cdot)}{\mu_n(\mathbb{X})}. \quad (4)$$
The random measure $\mu_n$ is not necessarily measurable with respect to $\sigma(X_1, \ldots, X_n)$, so the predictive construction (4) is more flexible than models based solely on the predictive distributions of $(X_n)_{n\geq 1}$; for example, $(\mu_n)_{n\geq 0}$ allows for the presence of latent variables or other sources of observable data (see also [10] for a covariate-based predictive construction). However, (4) can lead to an imbalanced design, which may break the symmetry imposed by exchangeability. Nevertheless, it is still possible that the sequence $(X_n)_{n\geq 1}$ satisfies (2) for some $\tilde{P}$, in which case Lemma 8.2 in [1] implies that $(X_n)_{n\geq 1}$ is asymptotically exchangeable with directing random measure $\tilde{P}$.
In Theorem 1, we show that, taking $(\mu_n)_{n\geq 0}$ as primary, the sequence $(X_n)_{n\geq 1}$ in (4) can be chosen such that
$$\mu_n = \mu_{n-1} + R_{X_n}, \quad (5)$$
where $x \mapsto R_x$ is a measurable map from $\mathbb{X}$ to the space of finite measures on $\mathbb{X}$. Models of the kind (4)–(5) are computationally efficient. Indeed, as new observations become available, predictions can be updated at a constant computational cost and with limited storage of information. If, in addition, $(X_n)_{n\geq 1}$ is asymptotically exchangeable, then (4)–(5) can provide a computationally simple approximation of an exchangeable scheme for Bayesian inference, along the lines of [11].
The recursive formula (5) allows us to interpret the dynamics of MVPPs in terms of an urn sampling scheme, as the name suggests. Let $\mu_0$ be a non-random finite measure on $\mathbb{X}$. Suppose we have an urn whose contents are described by $\mu_0$, in the sense that $\mu_0(B)$ denotes the total mass of balls with colors in $B \in \mathcal{X}$. At time $n = 1$, a ball is extracted at random from the urn, and we denote its color by $X_1$. The urn is then reinforced according to a replacement rule $(R_x)_{x\in\mathbb{X}}$, so that the updated composition becomes $\mu_1 \equiv \mu_0 + R_{X_1}$. At any time $n > 1$, a ball of color $X_n$ is picked with probability distribution $\mu_{n-1}/\mu_{n-1}(\mathbb{X})$, and the contents of the urn are subsequently reinforced by $R_{X_n}$. In the case where the space of colors is finite, $|\mathbb{X}| = k$, the above procedure is better known as a generalized $k$-color Pólya urn [12].
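The following sketch is not part of the paper; it simulates the urn scheme just described for a finite color space, with an illustrative replacement rule (the function `simulate_mvpp`, the color labels, and the exponential reinforcement are all assumptions made for the example).

```python
import numpy as np

def simulate_mvpp(mu0, replacement, n_steps, rng):
    """Simulate (X_1, ..., X_n) and the final urn contents mu_n.

    mu0         : dict color -> initial mass (a finite measure on a finite color set)
    replacement : function (color, rng) -> dict color -> added mass (a draw from R_x)
    """
    mu = dict(mu0)
    colors = []
    for _ in range(n_steps):
        labels = list(mu)
        masses = np.array([mu[c] for c in labels], dtype=float)
        x = rng.choice(labels, p=masses / masses.sum())   # draw a ball ~ normalized mu_{n-1}
        for c, w in replacement(x, rng).items():           # reinforce: mu_n = mu_{n-1} + R_x
            mu[c] = mu.get(c, 0.0) + w
        colors.append(x)
    return colors, mu

rng = np.random.default_rng(1)
# Example replacement rule: put back a random amount of the observed color only (an RRPP).
reinforce_observed = lambda x, rng: {x: rng.exponential(1.0)}
xs, mu_n = simulate_mvpp({"red": 1.0, "blue": 2.0}, reinforce_observed, 1000, rng)
```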
We focus our analysis on MVPPs for which $R_x$ is concentrated on $x$; thus, after each draw, we reinforce only the color of the observed ball. More formally, we consider MVPPs that have a reinforcement measure of the kind $R_{X_n} = W_n\,\delta_{X_n}$, $n \geq 1$, where $W_n$ is some non-negative random variable. In that case, Equations (4) and (5) become
$$P(X_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n) = \sum_{i=1}^n \frac{W_i}{\mu_0(\mathbb{X}) + \sum_{j=1}^n W_j}\,\delta_{X_i}(\cdot) + \frac{\mu_0(\cdot)}{\mu_0(\mathbb{X}) + \sum_{j=1}^n W_j}, \quad (6)$$
and
$$\mu_n = \mu_{n-1} + W_n\,\delta_{X_n}. \quad (7)$$
A notable example is Blackwell and MacQueen's Pólya sequence [13], which is a random process $(X_n)_{n\geq 1}$ characterized by $P(X_1 \in \cdot) = \nu(\cdot)$ and, for $n \geq 1$,
$$P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n) = \sum_{i=1}^n \frac{1}{\theta + n}\,\delta_{X_i}(\cdot) + \frac{\theta}{\theta + n}\,\nu(\cdot), \quad (8)$$
for some probability measure $\nu$ on $\mathbb{X}$ and a constant $\theta > 0$. By [13], $(X_n)_{n\geq 1}$ is exchangeable and corresponds to the model (1) with a Dirichlet process prior with parameters $(\theta, \nu)$. It is easily seen that (8) is related to the MVPP $(\mu_n)_{n\geq 0}$ given by $\mu_0 = \theta\,\nu$ and, for $n \geq 1$,
$$\mu_n = \mu_{n-1} + \delta_{X_n}.$$
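As a concrete illustration (not from the paper), the next sketch samples a Pólya sequence (8); the standard normal base measure and the parameter values are arbitrary choices.

```python
import numpy as np

def polya_sequence(theta, base_sampler, n, rng):
    """Sample X_1, ..., X_n from the Blackwell-MacQueen urn (8)."""
    xs = []
    for i in range(n):
        # With probability theta / (theta + i) draw a new value from nu,
        # otherwise repeat a past value chosen uniformly at random.
        if rng.uniform() < theta / (theta + i):
            xs.append(base_sampler(rng))
        else:
            xs.append(xs[rng.integers(i)])
    return xs

rng = np.random.default_rng(2)
sample = polya_sequence(theta=5.0, base_sampler=lambda r: r.standard_normal(), n=1000, rng=rng)
print(len(set(sample)))   # number of distinct values, of order theta * log(n)
```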
Therefore, we will call any MVPP a randomly reinforced Pólya process (RRPP) if it admits representation (6)–(7).
Existing studies on MVPPs look at models that have mostly a balanced design, i.e., $R_x(\mathbb{X}) = r$ for all $x \in \mathbb{X}$, and assume irreducibility-like conditions for $(R_x)_{x\in\mathbb{X}}$; see [8,9,14,15] and Remark 4 in [16]. In contrast, RRPPs require that $R_x(\{x\}^c) = 0$, and so are excluded from the analysis in those papers. In fact, this difference in reinforcement mechanisms mirrors the dichotomy within $k$-color urn models, where the replacement $R$ is best described in terms of a matrix with random elements. There, the class of randomly reinforced urns [17] assumes an $R$ with zero off-diagonal elements (i.e., only the color of the observed ball is reinforced), whereas the generalized Pólya urn models require the mean replacement matrix to be irreducible. Similarly to the $k$-color case, RRPPs call for different techniques, which yield completely different results from those in [8,9,14,15,16]. As an example, Theorem 1 in [16] and our Theorem 2 prove convergence of the kind (2), yet the limit probability measure in [16] is non-random.
The RRPP has been implicitly studied by [17,18,19,20,21,22,23], among others, with the focus being on the process $(X_n)_{n\geq 1}$. Those papers deal primarily with the $k$-color case (with the exception of [18,19,23]) and can be categorized on the basis of their assumptions on $(W_n)_{n\geq 1}$. For example, [18,19,21,22] assume that $W_n$ is independent of $(X_1, W_1, \ldots, X_{n-1}, W_{n-1}, X_n)$, in which case the process $(X_n)_{n\geq 1}$ is conditionally identically distributed (c.i.d.) [21], that is, conditionally on current information, all future observations are identically distributed. It follows from [21] that c.i.d. processes preserve many of the properties of exchangeable sequences and, in particular, satisfy (2)–(3). In contrast, [17,20,23] assume that the reinforcement $W_n$ depends on the particular color $X_n$, and prove a version of (2) where $\tilde{P}$ is concentrated on the set of dominant colors, for which the expected reinforcement is maximum. In this work, we reconsider the above models in the framework of RRPPs. For the c.i.d. case, we prove results whose analogues have already been established by [23] for the model with dominant colors. In particular, we extend the convergence in (2) to convergence in total variation and give a unified central limit theorem. We also examine the number of distinct values generated by the sequence $(X_n)_{n\geq 1}$.
In some applications, the definition of an MVPP can be too restrictive, as it assumes that the probability law of the reinforcement $R$ is known. However, we can envisage situations where this law is itself random, so we extend the definition of an MVPP by introducing a random parameter $V$. The resulting generalized measure-valued Pólya urn process (GMVPP) turns out to be a mixture of Markov processes and admits representation (4)–(5), conditionally on the parameter $V$. When the reinforcement measure $R_x$ is concentrated on $x$, we call $(\mu_n)_{n\geq 0}$ a generalized randomly reinforced Pólya process (GRRPP). We give a characterization of GRRPPs with exchangeable weights $(W_n)_{n\geq 1}$ and show that the process $((X_n, W_n))_{n\geq 1}$ is partially conditionally identically distributed (partially c.i.d.) [24], that is, conditionally on the past observations and the concurrent observation from the other sequence, the future observations are marginally identically distributed. We also extend some of the results for RRPPs to the generalized setting.
The paper is structured as follows. In Section 2.1, we recall the definition of a measure-valued Pólya urn process and prove representation (4)–(5) for a suitably selected sequence $(X_n)_{n\geq 1}$. Section 2.2 defines a particular subclass of MVPPs, called randomly reinforced Pólya processes (RRPP), which share with exchangeable Pólya sequences the property of reinforcing only the observed color. Section 3 is devoted to the study of the asymptotic properties of RRPPs. In Section 4, we give the definitions of GMVPPs and GRRPPs and obtain basic results.

2. Definitions and a Representation Result

Let $(\mathbb{X}, d)$ be a complete separable metric space, endowed with its Borel $\sigma$-field $\mathcal{X}$. Denote by
$$M_F(\mathbb{X}), \quad M_F^*(\mathbb{X}), \quad M_P(\mathbb{X}),$$
the collections of measures $\mu$ on $\mathbb{X}$ that are finite, finite and non-null, and probability measures, respectively. We regard $M_F(\mathbb{X})$, $M_F^*(\mathbb{X})$ and $M_P(\mathbb{X})$ as measurable spaces equipped with the $\sigma$-fields generated by $\mu \mapsto \mu(B)$, $B \in \mathcal{X}$. We further let
$$K_F(\mathbb{X}, \mathbb{Y}), \quad K_P(\mathbb{X}, \mathbb{Y}),$$
be the collections of transition kernels $K$ from $\mathbb{X}$ to $\mathbb{Y}$ that are finite and probability kernels, respectively. Any non-null measure $\mu \in M_F^*(\mathbb{X})$ has a normalized version $\mu/\mu(\mathbb{X})$. If $f : \mathbb{X} \to \mathbb{Y}$ is measurable, then $f : M_F(\mathbb{X}) \to M_F(\mathbb{Y})$ also denotes the induced mapping of measures, $f(\mu)(\cdot) = \mu(f^{-1}(\cdot))$, $\mu \in M_F(\mathbb{X})$.
All random quantities are defined on a common probability space $(\Omega, \mathcal{H}, P)$, which is assumed to be rich enough to support any required randomization. The symbol “$\perp$” will be used to denote independence between random objects, and “$\overset{d}{=}$” equality in distribution.

2.1. Measure-Valued Pólya Urn Processes

Let $\mu \in M_F^*(\mathbb{X})$ describe the contents of an urn, as in Section 1. Once a ball is picked at random from the urn, the urn is reinforced according to a replacement rule, which is formally a kernel $R \in K_F(\mathbb{X}, \mathbb{X})$ that maps colors $x \mapsto R_x(\cdot)$ to finite measures; thus,
$$\mu + R_x, \quad (9)$$
represents the updated urn composition if a ball of color $x$ has been observed. In general, $R$ is random and there exists a probability kernel $\mathcal{R} \in K_P(\mathbb{X}, M_F(\mathbb{X}))$ such that $R_x \sim \mathcal{R}_x$, $x \in \mathbb{X}$. Then, the distribution of (9) prior to the sampling of the urn is given by
$$\hat{\mathcal{R}}_\mu(\cdot) = \int_{\mathbb{X}} \psi_\mu(\mathcal{R}_x)(\cdot)\,\frac{\mu(dx)}{\mu(\mathbb{X})}, \quad (10)$$
where $\psi_\mu$ is the measurable map $\nu \mapsto \nu + \mu$ from $M_F(\mathbb{X})$ to $M_F(\mathbb{X})$. By Lemma 3.3 in [9], $\mu \mapsto \hat{\mathcal{R}}_\mu$ is a measurable map from $M_F^*(\mathbb{X})$ to $M_P(M_F(\mathbb{X}))$.
Definition 1
(Measure-Valued Pólya Urn Process [9]). A sequence $(\mu_n)_{n\geq 0}$ of random finite measures on $\mathbb{X}$ is called a measure-valued Pólya urn process (MVPP) with parameters $\mu_0 \in M_F^*(\mathbb{X})$ and $\mathcal{R} \in K_P(\mathbb{X}, M_F(\mathbb{X}))$ if it is a Markov process with transition kernel $\hat{\mathcal{R}}$ given by (10). If, in particular, $\mathcal{R}_x = \delta_{R_x}$ for some $R \in K_F(\mathbb{X}, \mathbb{X})$, then $(\mu_n)_{n\geq 0}$ is said to be a deterministic MVPP.
The representation theorem below formalizes the idea of an MVPP as an urn scheme.
Theorem 1.
A sequence $(\mu_n)_{n\geq 0}$ of random finite measures is an MVPP with parameters $(\mu_0, \mathcal{R})$ if and only if, for every $n \geq 1$,
$$\mu_n = \mu_{n-1} + R_{X_n} \quad a.s., \quad (11)$$
where $(X_n)_{n\geq 1}$ is a sequence of $\mathbb{X}$-valued random variables such that $X_1 \sim \mu_0/\mu_0(\mathbb{X})$ and, for $n \geq 2$,
$$P(X_n \in \cdot \mid X_1, \mu_1, \ldots, X_{n-1}, \mu_{n-1}) = \frac{\mu_{n-1}(\cdot)}{\mu_{n-1}(\mathbb{X})}, \quad (12)$$
and $R$ is a random finite transition kernel on $\mathbb{X}$ such that
$$P(R_{X_n} \in \cdot \mid X_1, \mu_1, \ldots, X_{n-1}, \mu_{n-1}, X_n) = \mathcal{R}_{X_n}(\cdot). \quad (13)$$
Proof. 
If $(\mu_n)_{n\geq 0}$ satisfies (11)–(13) for every $n \geq 1$, then it holds a.s. that
$$P(\mu_n \in \cdot \mid \mu_1, \ldots, \mu_{n-1}) = E\big[\psi_{\mu_{n-1}}(\mathcal{R}_{X_n})(\cdot) \mid \mu_1, \ldots, \mu_{n-1}\big] = \hat{\mathcal{R}}_{\mu_{n-1}}(\cdot).$$
Conversely, suppose $(\mu_n)_{n\geq 0}$ is an MVPP with parameters $(\mu_0, \mathcal{R})$. As $\mathcal{R}$ is a probability kernel from $\mathbb{X}$ to $M_F(\mathbb{X})$ and $M_F(\mathbb{X})$ is Polish, there exists by Lemma 4.22 in [25] a measurable function $f(x, u)$ such that, for every $x \in \mathbb{X}$,
$$f(x, U) \sim \mathcal{R}_x,$$
whenever $U$ is a uniform random variable on $[0, 1]$, denoted $U \sim \mathrm{Unif}[0, 1]$.
Let us prove by induction that there exists a sequence $((X_n, U_n))_{n\geq 1}$ such that $X_1 \sim \mu_0/\mu_0(\mathbb{X})$, $U_1 \perp X_1$, $U_1 \sim \mathrm{Unif}[0, 1]$, $\mu_1 = \mu_0 + f(X_1, U_1)$ a.s., $(\mu_2, \mu_3, \ldots) \perp (X_1, U_1) \mid \mu_1$, and, for every $n \geq 2$,
(i) $P(X_n \in \cdot \mid X_1, U_1, \mu_1, \ldots, X_{n-1}, U_{n-1}, \mu_{n-1}) = \frac{\mu_{n-1}(\cdot)}{\mu_{n-1}(\mathbb{X})}$;
(ii) $U_n \sim \mathrm{Unif}[0, 1]$ and $U_n \perp (X_1, U_1, \mu_1, \ldots, X_{n-1}, U_{n-1}, \mu_{n-1}, X_n)$;
(iii) $\mu_n = \mu_{n-1} + f(X_n, U_n)$ a.s.;
(iv) $(\mu_{n+1}, \mu_{n+2}, \ldots) \perp (X_n, U_n) \mid (X_1, U_1, \mu_1, \ldots, X_{n-1}, U_{n-1}, \mu_{n-1}, \mu_n)$;
(v) $\mu_{n+1} \perp (X_1, U_1, \ldots, X_n, U_n) \mid (\mu_1, \ldots, \mu_n)$.
Then, Equations (11)–(13) follow from (i)–(iii) with $R_{X_n} = f(X_n, U_n)$.
Regarding the base case, let $\tilde{X}_1$ and $\tilde{U}_1$ be independent random variables such that $\tilde{U}_1 \sim \mathrm{Unif}[0, 1]$ and $\tilde{X}_1 \sim \mu_0/\mu_0(\mathbb{X})$. It follows that, for any measurable set $B \subseteq M_F(\mathbb{X})$,
$$P(\mu_1 \in B) = \hat{\mathcal{R}}_{\mu_0}(B) = E\big[\psi_{\mu_0}(\mathcal{R}_{\tilde{X}_1})(B)\big] = P\big(\mu_0 + f(\tilde{X}_1, \tilde{U}_1) \in B\big);$$
thus, $\mu_1 \overset{d}{=} \mu_0 + f(\tilde{X}_1, \tilde{U}_1)$. By Theorem 8.17 in [25], there exist random variables $X_1$ and $U_1$ such that
$$(\mu_1, X_1, U_1) \overset{d}{=} \big(\mu_0 + f(\tilde{X}_1, \tilde{U}_1),\, \tilde{X}_1,\, \tilde{U}_1\big),$$
and $(\mu_2, \mu_3, \ldots) \perp (X_1, U_1) \mid \mu_1$. Then, in particular, $(X_1, U_1) \overset{d}{=} (\tilde{X}_1, \tilde{U}_1)$ and $(\mu_1, \mu_0 + f(X_1, U_1)) \overset{d}{=} (\mu_0 + f(\tilde{X}_1, \tilde{U}_1), \mu_0 + f(\tilde{X}_1, \tilde{U}_1))$, so
$$\mu_1 = \mu_0 + f(X_1, U_1) \quad a.s.$$
Regarding the induction step, assume that (i)–(v) hold true up to some $n > 1$. Let $\tilde{X}_{n+1}$ and $\tilde{U}_{n+1}$ be such that $\tilde{U}_{n+1} \sim \mathrm{Unif}[0, 1]$, $\tilde{U}_{n+1} \perp (X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n, \tilde{X}_{n+1})$, and
$$P(\tilde{X}_{n+1} \in \cdot \mid X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n) = \frac{\mu_n(\cdot)}{\mu_n(\mathbb{X})}.$$
It follows from (v) that, for any measurable set $B \subseteq M_F(\mathbb{X})$,
$$P(\mu_{n+1} \in B \mid X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n) = E\big[\psi_{\mu_n}(\mathcal{R}_{\tilde{X}_{n+1}})(B) \mid X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n\big] = P\big(\mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1}) \in B \mid X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n\big);$$
thus, $\mu_{n+1} \overset{d}{=} \mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1})$, conditionally on $X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n$. By Theorem 8.17 in [25], there exist random variables $X_{n+1}$ and $U_{n+1}$ such that
$$(\mu_{n+1}, X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n, X_{n+1}, U_{n+1}) \overset{d}{=} \big(\mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1}), X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n, \tilde{X}_{n+1}, \tilde{U}_{n+1}\big),$$
and $(\mu_{n+2}, \mu_{n+3}, \ldots) \perp (X_{n+1}, U_{n+1}) \mid (X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n, \mu_{n+1})$. Then, in particular, $U_{n+1} \sim \mathrm{Unif}[0, 1]$, $U_{n+1} \perp (X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n, X_{n+1})$, and
$$P(X_{n+1} \in \cdot \mid X_1, U_1, \mu_1, \ldots, X_n, U_n, \mu_n) = \frac{\mu_n(\cdot)}{\mu_n(\mathbb{X})}.$$
Moreover,
$$\big(\mu_{n+1},\, \mu_n + f(X_{n+1}, U_{n+1})\big) \overset{d}{=} \big(\mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1}),\, \mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1})\big);$$
therefore,
$$P\big(\mu_{n+1} = \mu_n + f(X_{n+1}, U_{n+1})\big) = P\big(\mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1}) = \mu_n + f(\tilde{X}_{n+1}, \tilde{U}_{n+1})\big) = 1.$$
By Theorem 8.12 in [25], statement (v) with $n + 1$ is equivalent to $\mu_{n+2} \perp (X_1, U_1) \mid (\mu_1, \ldots, \mu_{n+1})$ and $\mu_{n+2} \perp (X_{k+1}, U_{k+1}) \mid (X_1, U_1, \ldots, X_k, U_k, \mu_1, \ldots, \mu_{n+1})$, $k = 1, \ldots, n$. The latter follows from the induction hypothesis since, by (iv), we have $(\mu_{k+2}, \ldots, \mu_{n+2}) \perp (X_{k+1}, U_{k+1}) \mid (X_1, U_1, \ldots, X_k, U_k, \mu_1, \ldots, \mu_{k+1})$ for every $k = 1, \ldots, n$. □
The process $(X_n)_{n\geq 1}$ in Theorem 1 corresponds to the sequence of observed colors from the implied urn sampling scheme. Furthermore, the replacement rule takes the form $R_{X_n} = f(X_n, U_n)$, where $f$ is some measurable function, $U_n \sim \mathrm{Unif}[0, 1]$, and $U_n \perp (X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n)$, from which it follows that
$$\mu_n = \mu_{n-1} + f(X_n, U_n), \quad (14)$$
and
$$P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n, (U_m)_{m\geq 1}) = \frac{\mu_0(\cdot) + \sum_{i=1}^n f(X_i, U_i)(\cdot)}{\mu_0(\mathbb{X}) + \sum_{i=1}^n f(X_i, U_i)(\mathbb{X})}. \quad (15)$$
Thus, the sequence $(U_n)_{n\geq 1}$ models the additional randomness in the reinforcement measure $R$. Janson [9] obtains a rather similar result; Theorem 1.3 in [9] states that any MVPP $(\mu_n)_{n\geq 0}$ can be coupled with a deterministic MVPP $(\bar{\mu}_n)_{n\geq 0}$ on $\mathbb{X} \times [0, 1]$ in the sense that
$$\bar{\mu}_n = \mu_n \times \lambda, \quad (16)$$
where $\lambda$ is the Lebesgue measure on $[0, 1]$, and $\mu_n \times \lambda$ is the product measure on $\mathbb{X} \times [0, 1]$. In our case, the MVPP defined by $\bar{\mu}_0 = \mu_0 \times \lambda$ and, for $n \geq 1$,
$$\bar{\mu}_n = \bar{\mu}_{n-1} + f(X_n, U_n) \times \lambda,$$
has a non-random replacement rule $R_{(x, u)} = f(x, u) \times \lambda$ and satisfies (16) on a set of probability one.

2.2. Randomly Reinforced Pólya Processes

It follows from (8) that any Pólya sequence generates a deterministic MVPP through
$$\mu_n = \mu_{n-1} + \delta_{X_n}.$$
Here, we consider a randomly reinforced extension of Pólya sequences in the form of an MVPP with replacement rule $R_x = W(x)\cdot\delta_x$, $x \in \mathbb{X}$, where $W(x)$ is a non-negative random variable.
Definition 2
(Randomly Reinforced Pólya Process). We call an MVPP with parameters $(\mu_0, \mathcal{R})$ a randomly reinforced Pólya process (RRPP) if there exists $\eta \in K_P(\mathbb{X}, \mathbb{R}_+)$ such that $\mathcal{R}_x = \xi_x(\eta_x)$, $x \in \mathbb{X}$, where $\xi_x : \mathbb{R}_+ \to M_F(\mathbb{X})$ is the map $w \mapsto w\,\delta_x$.
Observe that, for RRPPs, the reinforcement measure $f(x, u)$ in (14)–(15) concentrates its mass on $x$; thus, we obtain the following variant of the representation result in Theorem 1.
Proposition 1.
Let $(\mu_n)_{n\geq 0}$ be an RRPP with parameters $(\mu_0, \eta)$. Then, there exist a measurable function $h : \mathbb{X} \times [0, 1] \to \mathbb{R}_+$ and a sequence $((X_n, U_n))_{n\geq 1}$ such that, letting $W_n = h(X_n, U_n)$, we have for every $n \geq 1$ that
$$\mu_n = \mu_{n-1} + W_n\,\delta_{X_n} \quad a.s., \quad (17)$$
where $X_1 \sim \mu_0/\mu_0(\mathbb{X})$ and, for $n \geq 1$, $U_n \sim \mathrm{Unif}[0, 1]$, $U_n \perp (X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n)$, and
$$P(X_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n) = \sum_{i=1}^n \frac{W_i}{\mu_0(\mathbb{X}) + \sum_{j=1}^n W_j}\,\delta_{X_i}(\cdot) + \frac{\mu_0(\cdot)}{\mu_0(\mathbb{X}) + \sum_{j=1}^n W_j}. \quad (18)$$
Moreover,
$$P(W_n \in \cdot \mid X_1, W_1, \ldots, X_{n-1}, W_{n-1}, X_n) = \eta_{X_n}(\cdot). \quad (19)$$
It follows from (19) that $W(x) \equiv h(x, U) \sim \eta_x$, $x \in \mathbb{X}$, whenever $U \sim \mathrm{Unif}[0, 1]$. Then, the random measure
$$R_x = W(x)\cdot\delta_x \quad (20)$$
is such that $R_x \sim \mathcal{R}_x$, where $\mathcal{R}_x$ appears in Definition 2.

3. Asymptotic Properties of RRPP

In this section, we study the asymptotic properties of RRPPs through the sequence $(X_n)_{n\geq 1}$ in the representation (17). We show that the limit behavior of $(\mu_n)_{n\geq 0}$ depends on the relationship between weights and observations. In particular, when $W(x) \equiv W$ in (20) is constant with respect to the color $x$, the process $(X_n)_{n\geq 1}$ is conditionally identically distributed (c.i.d.) and, for every $A \in \mathcal{X}$, the normalized sequence $(\mu_n(A)/\mu_n(\mathbb{X}))_{n\geq 0}$ is a bounded martingale. We consider the c.i.d. case in Section 3.3. In contrast, if some colors $x$ have a higher expected reinforcement, then they tend to dominate the observation process and, as $n$ grows to infinity, the probability measure $\mu_n/\mu_n(\mathbb{X})$ concentrates its mass on the subset of dominant colors; see Theorem 2.

3.1. Preliminaries

Our focus is on the convergence of the normalized sequence $(\mu_n/\mu_n(\mathbb{X}))_{n\geq 0}$, which by Theorem 1 is a.s. equal to the sequence of predictive distributions (18). We also consider the sequence of empirical frequencies of $(X_n)_{n\geq 1}$, defined for $n \geq 1$ by
$$\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^n \delta_{X_i}.$$
We obtain conditions under which the convergence in (2) extends to convergence in total variation, where the total variation distance between any two probability measures $\alpha, \beta \in M_P(\mathbb{X})$ is given by
$$d_{TV}(\alpha, \beta) = \sup_{B \in \mathcal{X}} |\alpha(B) - \beta(B)|.$$
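For probability measures, the supremum defining $d_{TV}$ equals half the total mass of the difference; for measures with atomic and diffuse parts this gives a simple explicit expression. The sketch below (illustrative, not from the paper; the exponential weights and the uniform base measure are assumptions) computes $d_{TV}$ between the predictive distribution (18) and the empirical frequencies $\hat{\mu}_n$ along one simulated path.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
theta, n = 1.0, 5000
atoms, weights, total_w = [], [], 0.0           # past colors X_i and weights W_i

for i in range(n):
    # draw X_{i+1} from the predictive (18); new colors come from nu = Unif(0, 1)
    if rng.uniform() < theta / (theta + total_w):
        x = rng.uniform()
    else:
        x = rng.choice(atoms, p=np.array(weights) / total_w)
    w = rng.exponential(1.0)                     # W_{i+1} ~ eta, independent of the past
    atoms.append(x); weights.append(w); total_w += w

# d_TV between the predictive and the empirical frequencies: half the l1 distance over
# the observed atoms plus the diffuse mass theta / (theta + sum W_j) of the predictive.
pred, emp = defaultdict(float), defaultdict(float)
for x, w in zip(atoms, weights):
    pred[x] += w / (theta + total_w)
    emp[x] += 1.0 / n
tv = 0.5 * (sum(abs(pred[x] - emp[x]) for x in pred) + theta / (theta + total_w))
print(tv)                                        # small for large n (cf. Theorem 4)
```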
To state some of the results, we recall the definition of the support of a probability measure $\gamma \in M_P(\mathbb{R}_+)$,
$$\mathrm{supp}(\gamma) = \{u \geq 0 : \gamma((u - \epsilon, u + \epsilon)) > 0 \text{ for all } \epsilon > 0\}.$$
Of particular interest is the conditional probability of observing a new color, given by
$$\theta_n \equiv P\big(X_{n+1} \notin \{X_1, \ldots, X_n\} \mid X_1, W_1, \ldots, X_n, W_n\big) = \frac{\theta}{\theta + \sum_{j=1}^n W_j}\cdot\frac{\mu_0(\{X_1, \ldots, X_n\}^c)}{\mu_0(\mathbb{X})},$$
for $n \geq 1$, where $\theta = \mu_0(\mathbb{X})$. This informs us about the number of distinct values in a sample $(X_1, \ldots, X_n)$ of size $n$,
$$L_n = \#\big\{k \in \{1, \ldots, n\} : X_k \notin \{X_1, \ldots, X_{k-1}\}\big\},$$
since $\theta_n = P(L_{n+1} = L_n + 1 \mid X_1, W_1, \ldots, X_n, W_n)$.
The following modes of convergence are used when we investigate the rate of convergence of the distance between $\mu_n/\mu_n(\mathbb{X})$ and $\hat{\mu}_n$.
Almost sure (a.s.) conditional convergence. Let $\mathcal{G} = (\mathcal{G}_n)_{n\geq 0}$ be a filtration and $\tilde{Q} \in K_P(\Omega, \mathbb{X})$. A sequence $(Y_n)_{n\geq 1}$ is said to converge to $\tilde{Q}$ in the sense of a.s. conditional convergence w.r.t. $\mathcal{G}$ if the conditional distribution of $Y_n$, given $\mathcal{G}_n$, converges weakly on a set of probability one to $\tilde{Q}$, that is, as $n \to \infty$,
$$P(Y_n \in \cdot \mid \mathcal{G}_n) \overset{w}{\longrightarrow} \tilde{Q}(\cdot) \quad a.s.$$
We refer to [22] for more details.
Stable convergence. Stable convergence is a strong form of convergence in distribution, albeit weaker than a.s. conditional convergence. A sequence $(Y_n)_{n\geq 1}$ is said to converge stably to $\tilde{Q}$ if
$$E\big[V f(Y_n)\big] \longrightarrow E\Big[V \int_{\mathbb{X}} f(x)\,\tilde{Q}(dx)\Big],$$
for all bounded continuous functions $f$ and any integrable random variable $V$. The main application of stable convergence is in central limit theorems that allow for mixing variables in the limit. See [26] for a complete reference on stable convergence.
In the sequel, the stable and a.s. conditional limits will be some Gaussian law, which we denote by $\mathcal{N}(\mu, \sigma^2)$ for parameters $(\mu, \sigma^2)$, where $\mathcal{N}(\mu, 0) = \delta_\mu$.

3.2. RRPP with Dominant Colors

Using (20), let us define, for $x \in \mathbb{X}$,
$$w(x) = E[W(x)] \quad \text{and} \quad \bar{w} = \sup_{x\in\mathbb{X}} w(x).$$
We further let
$$D = \{x \in \mathbb{X} : w(x) = \bar{w}\},$$
be the set of dominant colors. The model (18) with $D \subsetneq \mathbb{X}$ has been studied by [23] under the assumption that $\bar{w}$ is strictly greater than the next largest value of $w(\cdot)$ in the support of $w(\mu_0)$. Then, the probability of observing a non-dominant color, $x \in D^c$, vanishes, and the predictive and the empirical distributions converge in total variation to a common random probability measure, which is concentrated on $D$. For completeness, we report here the main results from [23].
Theorem 2
([23], Theorem 3.3). For any RRPP $(\mu_n)_{n\geq 0}$ that satisfies
$$W(x) \leq \beta < \infty; \qquad \bar{w} \in \mathrm{supp}(w(\mu_0)); \qquad \bar{w} > \bar{w}_c \equiv \sup\big\{u \geq 0 : u \in \mathrm{supp}\big(w(\mu_0(\cdot \mid D^c))\big)\big\}, \quad (21)$$
there exists a random probability measure $\tilde{P}$ on $\mathbb{X}$ with $\tilde{P}(D) = 1$ a.s. such that
$$d_{TV}\Big(\frac{\mu_n}{\mu_n(\mathbb{X})}, \tilde{P}\Big) \overset{a.s.}{\longrightarrow} 0 \quad \text{and} \quad d_{TV}(\hat{\mu}_n, \tilde{P}) \overset{a.s.}{\longrightarrow} 0.$$
Under conditions (21), Theorem 3.3 in [23] implies $\sum_{i=1}^n W_i/n \overset{a.s.}{\to} \bar{w}$. If $\mu_0$ is further diffuse, then $\sum_{n=1}^\infty \theta_n = \infty$ a.s., and so $L_n \overset{a.s.}{\to} \infty$ by Theorem 1 in [27]; in that case, Proposition 3.4 in [23] shows that the actual growth rate is that of a Pólya sequence,
$$\frac{L_n}{\log n} \overset{a.s.}{\longrightarrow} \frac{\theta}{\bar{w}}. \quad (22)$$
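A toy two-color simulation (not from the paper) of the dominance phenomenon in Theorem 2: the color with the larger expected reinforcement ends up carrying essentially all of the normalized mass. The bounded uniform weights below are an illustrative choice satisfying (21).

```python
import numpy as np

rng = np.random.default_rng(5)
mu = {"a": 1.0, "b": 1.0}                     # mu_0 on two colors
mean_w = {"a": 2.0, "b": 1.0}                 # color "a" is dominant: w(a) > w(b)

for _ in range(50_000):
    labels = list(mu)
    masses = np.array([mu[c] for c in labels])
    x = rng.choice(labels, p=masses / masses.sum())
    mu[x] += rng.uniform(0, 2 * mean_w[x])    # bounded weights with mean w(x)

total = sum(mu.values())
print({c: m / total for c, m in mu.items()})  # mass of "b" is close to 0
```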
In addition to the uniform convergence in Theorem 2, the authors in [23] obtain set-wise rates of convergence. To state their result, we introduce, for any $A \in \mathcal{X}$,
$$q_A = \lim_{n\to\infty} E\big[W_{n+1}^2\,\delta_{X_{n+1}}(A) \mid X_1, W_1, \ldots, X_n, W_n\big],$$
which exists a.s. under the assumptions of Theorem 2.
Theorem 3
([23], Theorem 4.2). Let $(\mu_n)_{n\geq 0}$ be an RRPP satisfying (21). Suppose $\bar{w} > 2\bar{w}_c$. Define
$$V(A) = \frac{1}{\bar{w}^2}\Big[\big(\tilde{P}(A^c)\big)^2 q_A + \big(\tilde{P}(A)\big)^2 q_{A^c}\Big] \quad \text{and} \quad U(A) = V(A) - \tilde{P}(A)\tilde{P}(A^c).$$
Then,
$$\sqrt{n}\,\Big(\frac{\mu_n(A)}{\mu_n(\mathbb{X})} - \hat{\mu}_n(A)\Big) \overset{stably}{\longrightarrow} \mathcal{N}(0, U(A)),$$
and
$$\sqrt{n}\,\Big(\frac{\mu_n(A)}{\mu_n(\mathbb{X})} - \tilde{P}(A)\Big) \overset{a.s.\ cond.}{\longrightarrow} \mathcal{N}(0, V(A)) \quad \text{w.r.t.}\ (\mathcal{F}_n^{X,W})_{n\geq 1},$$
where $\mathcal{F}_n^{X,W} = \sigma(X_1, W_1, \ldots, X_n, W_n)$, $n \geq 1$, is the filtration generated by $((X_n, W_n))_{n\geq 1}$.

3.3. RRPP with Independent Weights

Let $(\mu_n)_{n\geq 0}$ be an RRPP with reinforcement distribution $\eta_x \equiv \eta$ that does not depend on $x$. Using the notation of Section 3.2, we have
$$w(x) \equiv \bar{w}, \quad (23)$$
and, thus, $D = \mathbb{X}$. An equivalent formulation can be given in terms of the sequence of weights $(W_n)_{n\geq 1}$ in Proposition 1, whereby
$$W_n = h(U_n), \quad (24)$$
for some measurable function $h$, with $U_n \perp (X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n)$ and $U_n \sim \mathrm{Unif}[0, 1]$. Then, $W_n \overset{i.i.d.}{\sim} \eta$ and $W_n \perp (X_1, \ldots, X_n)$, which implies that $E[W_1] = \bar{w}$.
The model (18) with weights (24) has been studied by [18,19,22], among others, where the authors obtain central limit theorems and study the growth rate of $L_n$ when $\bar{w} < \infty$. Their results rely on the fact that $(X_n)_{n\geq 1}$ is conditionally identically distributed (c.i.d.) with respect to the filtration generated by $((X_n, W_n))_{n\geq 1}$. By [21], an $\mathbb{X}$-valued random sequence $(Y_n)_{n\geq 1}$ that is adapted to a filtration $(\mathcal{F}_n)_{n\geq 1}$ is said to be c.i.d. with respect to $(\mathcal{F}_n)_{n\geq 1}$ if and only if $(Y_n)_{n\geq 1}$ is identically distributed and, for every $n, k \geq 1$,
$$P(Y_{n+k} \in \cdot \mid \mathcal{F}_n) = P(Y_{n+1} \in \cdot \mid \mathcal{F}_n). \quad (25)$$
Proposition 2
([19], Lemma 6). For any RRPP $(\mu_n)_{n\geq 0}$ with $\eta_x \equiv \eta$, the observation process $(X_n)_{n\geq 1}$ is c.i.d. with respect to the filtration generated by $((X_n, W_n))_{n\geq 1}$.
C.i.d. processes preserve many of the properties of exchangeable sequences; see [21]. For example, if $(Y_n)_{n\geq 1}$ is c.i.d., then there exists a random probability measure such that (2)–(3) hold true with respect to the filtration used in the definition (25). It follows for the model in Proposition 2 that there exists $\tilde{P} \in K_P(\Omega, \mathbb{X})$ such that, for every $A \in \mathcal{X}$,
$$\frac{\mu_n(A)}{\mu_n(\mathbb{X})} \overset{a.s.}{\longrightarrow} \tilde{P}(A).$$
In fact, by (25), the sequence $(\mu_n(A)/\mu_n(\mathbb{X}))_{n\geq 0}$ is a bounded martingale. On the other hand, (23) implies that $D = \mathbb{X}$; therefore, any RRPP with $\eta_x \equiv \eta$ whose weights are bounded, $W_1 \leq \beta < \infty$, satisfies the assumptions of Theorem 2. In that case,
$$d_{TV}\Big(\frac{\mu_n}{\mu_n(\mathbb{X})}, \tilde{P}\Big) \overset{a.s.}{\longrightarrow} 0.$$
It follows from Theorem 4.2 in [23] that the boundedness condition in (21) is needed only to show that (i) $\sum_{i=1}^n W_i/n \overset{a.s.}{\to} \bar{w}$, and (ii) $\mu_n/\mu_n(\mathbb{X})$ converges set-wise to $\tilde{P}$, which is non-trivial in that setting. Here, (i) is granted since $(W_n)_{n\geq 1}$ is i.i.d., and (ii) has already been established; thus, we obtain the following result for RRPPs with independent weights.
Theorem 4.
For any RRPP $(\mu_n)_{n\geq 0}$ with $\eta_x \equiv \eta$, there exists a random probability measure $\tilde{P}$ on $\mathbb{X}$ such that
$$d_{TV}\Big(\frac{\mu_n}{\mu_n(\mathbb{X})}, \tilde{P}\Big) \overset{a.s.}{\longrightarrow} 0 \quad \text{and} \quad d_{TV}(\hat{\mu}_n, \tilde{P}) \overset{a.s.}{\longrightarrow} 0.$$
Proof. 
Let $((X_n, W_n))_{n\geq 1}$ be the joint observation process associated to $(\mu_n)_{n\geq 0}$ by Proposition 1. As $\eta_x \equiv \eta$, Equation (19) implies that $W_n \overset{i.i.d.}{\sim} \eta$; thus, by the strong law of large numbers,
$$\frac{1}{n}\sum_{i=1}^n W_i \overset{a.s.}{\longrightarrow} \bar{w}. \quad (26)$$
Let us define, for $n \geq 1$,
$$P_n(\cdot) = P(X_{n+1} \in \cdot \mid \mathcal{F}_n^{X,W}), \quad \text{where } \mathcal{F}_n^{X,W} = \sigma(X_1, W_1, \ldots, X_n, W_n).$$
By Proposition 2, $(X_n)_{n\geq 1}$ is c.i.d. with respect to $(\mathcal{F}_n^{X,W})_{n\geq 1}$, so there exists by Lemmas 2.1 and 2.4 in [21] a random probability measure $\tilde{P}$ on $\mathbb{X}$ such that, for every $A \in \mathcal{X}$,
$$P_n(A) \overset{a.s.}{\longrightarrow} \tilde{P}(A). \quad (27)$$
Moreover, $\int_{\mathbb{X}} f(x)\,P_n(dx) = E\big[\int_{\mathbb{X}} f(x)\,\tilde{P}(dx) \mid \mathcal{F}_n^{X,W}\big]$ a.s. for every bounded measurable $f : \mathbb{X} \to \mathbb{R}$. Fix $m \geq 1$. By a monotone class argument, we can show that, for every bounded measurable $f : \mathbb{X}^2 \to \mathbb{R}$,
$$\int_{\mathbb{X}} f(X_m, x)\,P_n(dx) = E\Big[\int_{\mathbb{X}} f(X_m, x)\,\tilde{P}(dx)\,\Big|\,\mathcal{F}_n^{X,W}\Big] \quad a.s., \quad \text{for all } n > m;$$
thus, $P_n(\{X_m\}) = E[\tilde{P}(\{X_m\}) \mid \mathcal{F}_n^{X,W}]$ a.s., and so $(P_n(\{X_m\}))_{n > m}$ is a uniformly integrable martingale. It follows from martingale convergence that, as $n \to \infty$,
$$P_n(\{X_m\}) \overset{a.s.}{\longrightarrow} \tilde{P}(\{X_m\}). \quad (28)$$
Using (26)–(28), we can repeat the argument in the proof of Proposition 3.1 in [23] to show that (i) $d_{TV}(P_n, \tilde{P}) \overset{a.s.}{\to} 0$, and so $d_{TV}(\mu_n/\mu_n(\mathbb{X}), \tilde{P}) \overset{a.s.}{\to} 0$ by Proposition 1; and (ii) $d_{TV}(\hat{\mu}_n, \tilde{P}) \overset{a.s.}{\to} 0$. □
Equation (26) implies that $\theta_n \overset{a.s.}{\to} 0$. If, in addition, $\bar{w} < \infty$, then $\sum_{n=1}^\infty \theta_n = \infty$ a.s. and $L_n \overset{a.s.}{\to} \infty$. In fact, as long as $\bar{w} < \infty$, the sequence $(L_n)_{n\geq 1}$ grows at the same rate as in (22).
Proposition 3
([18], Lemma 6). Let $\eta \in M_P(\mathbb{R}_+)$ and let $\mu_0$ be diffuse. If $\bar{w} < \infty$, then
$$\frac{L_n}{\log n} \overset{a.s.}{\longrightarrow} \frac{\theta}{\bar{w}}.$$
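A quick simulation check of Proposition 3 (illustrative, not from the paper): with a diffuse $\mu_0$ and i.i.d. weights with finite mean $\bar{w}$, the number of distinct values grows like $(\theta/\bar{w})\log n$. The Gamma weight distribution and all parameter values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n = 2.0, 200_000
w_mean = 3.0                                # bar{w} = E[W_1]
total_w, L_n = 0.0, 0

for i in range(n):
    # mu_0 diffuse: a "new" draw is a.s. distinct, so only theta_n matters for L_n
    if rng.uniform() < theta / (theta + total_w):
        L_n += 1
    total_w += rng.gamma(shape=1.5, scale=w_mean / 1.5)   # E[W] = w_mean

print(L_n / np.log(n), theta / w_mean)      # close up to simulation noise
```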
If $\bar{w} = \infty$, then $\theta_n$ may approach zero fast enough that we stop seeing new observations as $n \to \infty$. For example, let us consider random reinforcement with a totally skewed stable distribution $S_\alpha(1, \sigma, 0)$ for $\alpha \in (0, 2]$ and $\sigma > 0$. If $\alpha < 1$, then $\bar{w} = \infty$, and we show that $n^{1/\alpha}\theta_n$ is stochastically bounded, which implies that $L_n$ converges to a finite limit.
Proposition 4.
Let $\eta$ be an $S_\alpha(1, \sigma, 0)$ distribution with stability parameter $\alpha < 1$, and let $\mu_0$ be diffuse. Then, $\theta_n = O_P(n^{-1/\alpha})$ and
$$\lim_{n\to\infty} L_n < \infty \quad a.s.$$
Proof. 
From the properties of stable distributions, we obtain $n^{-1/\alpha}\sum_{i=1}^n W_i \overset{d}{=} W_1$ for every $n \geq 1$ and, as a consequence,
$$\theta_n = \frac{n^{-1/\alpha}\,\theta}{n^{-1/\alpha}\,\theta + n^{-1/\alpha}\sum_{i=1}^n W_i} \overset{d}{=} \frac{n^{-1/\alpha}\,\theta}{n^{-1/\alpha}\,\theta + W_1} \leq n^{-1/\alpha}\,\frac{\theta}{W_1}.$$
By Theorem 5.4.1 in [28], $E[1/W_1] < \infty$, and so $1/W_1 < \infty$ a.s. It follows for every $M > 0$ that $P(n^{1/\alpha}\theta_n > M) \leq P(\theta/W_1 > M)$, which can be made arbitrarily small by taking $M$ large enough. Regarding the second assertion, as $1/\alpha > 1$, we have
$$E\big[\lim_{n\to\infty} L_n\big] = \lim_{n\to\infty}\sum_{i=1}^n E\big[\mathbb{1}\{L_i = L_{i-1} + 1\}\big] = \sum_{n=1}^\infty E[\theta_n] \leq \sum_{n=1}^\infty \theta\,n^{-1/\alpha}\,E[1/W_1] < \infty. \quad \square$$
Proposition 4 can be extended to any fat-tailed reinforcement distribution $\eta$ by means of a generalized central limit theorem (see, e.g., [28] (p. 62)).
The rate of convergence between (18) and $\hat{\mu}_n$ has already been studied for the model with independent weights under various assumptions; see, e.g., [19] (p. 1363), Examples 4.2 and 4.5 in the technical report to [18], and Corollary 4.1 in [22] for $\mathbb{X} = \{0, 1\}$. In the next theorem, we combine ideas from [18,20] to give a fairly general result.
Theorem 5.
Let $\eta \in M_P(\mathbb{R}_+)$. If $E[W_1^2] < \infty$, then, for every $A \in \mathcal{X}$,
$$\sqrt{n}\,\Big(\frac{\mu_n(A)}{\mu_n(\mathbb{X})} - \hat{\mu}_n(A)\Big) \overset{stably}{\longrightarrow} \mathcal{N}(0, U(A)), \quad \text{where} \quad U(A) = \frac{\mathrm{Var}(W_1)}{\big(E[W_1]\big)^2}\,\tilde{P}(A)\tilde{P}(A^c). \quad (29)$$
If, in addition, $E[W_1^4] < \infty$, then, with respect to the filtration generated by $((X_n, W_n))_{n\geq 1}$,
$$\sqrt{n}\,\Big(\frac{\mu_n(A)}{\mu_n(\mathbb{X})} - \tilde{P}(A)\Big) \overset{a.s.\ cond.}{\longrightarrow} \mathcal{N}(0, V(A)), \quad \text{where} \quad V(A) = \frac{E[W_1^2]}{\bar{w}^2}\,\tilde{P}(A)\tilde{P}(A^c). \quad (30)$$
Proof. 
Let us define, for $n \geq 1$,
$$P_n(\cdot) = P(X_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n).$$
The assertions in Theorem 5 have already been established by [18] when $W_1 \geq \gamma > 0$. In that case, Examples 4.2 and 4.5 in the technical report to [18] show that (29) is a consequence of the fact that
$$E\Big[\max_{1\leq k\leq n} |Y_{n,k}|\Big] \to 0 \quad \text{and} \quad \sum_{k=1}^n Y_{n,k}^2 \overset{p}{\to} U(A), \quad (31)$$
where $Y_{n,k} = \frac{1}{\sqrt{n}}\big(\delta_{X_k}(A) - k\,P_k(A) + (k-1)\,P_{k-1}(A)\big)$, and (30) follows from
$$E\Big[\sup_{n\geq 1} \sqrt{n}\,|P_{n-1}(A) - P_n(A)|\Big] < \infty \quad \text{and} \quad n\sum_{k\geq n}\big(P_{k-1}(A) - P_k(A)\big)^2 \overset{a.s.}{\to} V(A).$$
Replicating the approach of Proposition 9 in [20], we avoid the assumption $W_1 \geq \gamma > 0$ by conditioning on the sets $H_n = \{2\sum_{i=1}^n W_i \geq n\,\bar{w}\}$, $n \geq 1$. By (26), $\mathbb{1}_{H_n} \overset{a.s.}{\to} 1$, so (29) follows from (31) with
$$Y_{n,k} = \frac{1}{\sqrt{n}}\,\mathbb{1}_{H_{k-1}}\big(\delta_{X_k}(A) - k\,P_k(A) + (k-1)\,P_{k-1}(A)\big),$$
whereas (30) is, ultimately, a result of
$$E\Big[\sup_{n\geq 1} \sqrt{n}\,\mathbb{1}_{H_n}\,|P_{n-1}(A) - P_n(A)|\Big] < \infty \quad \text{and} \quad n\sum_{k\geq n}\big(P_{k-1}(A) - P_k(A)\big)^2 \overset{a.s.}{\to} V(A). \quad \square$$

4. Generalized Measure-Valued Pólya Urn Processes

The definition of an MVPP assumes that the law of the reinforcement, $\mathcal{R}$, is fixed, yet, in some situations, $\mathcal{R}$ can itself be random (e.g., an RRPP with exchangeable weights; see Section 4.1). To avoid measurability issues, we assume a parametric model for $\mathcal{R}$, with the parameter taking values in a Polish space $\mathbb{V}$.
Definition 3
(Generalized Measure-Valued Pólya Urn Process). Let $V$ be a $\mathbb{V}$-valued random variable. A sequence $(\mu_n)_{n\geq 0}$ of random finite measures on $\mathbb{X}$ is called a generalized measure-valued Pólya urn process (GMVPP) with uncertainty parameter $V$, initial state $\mu_0 \in M_F^*(\mathbb{X})$ and replacement rule $\mathcal{R} \in K_P(\mathbb{V}\times\mathbb{X}, M_F(\mathbb{X}))$ if $\mu_1 \mid V \sim \hat{\mathcal{R}}^V_{\mu_0}$ and, for every $n \geq 2$,
$$P(\mu_n \in \cdot \mid V, \mu_1, \ldots, \mu_{n-1}) = \hat{\mathcal{R}}^V_{\mu_{n-1}}(\cdot),$$
where $\hat{\mathcal{R}}$ is the transition probability kernel from $\mathbb{V} \times M_F^*(\mathbb{X})$ to $M_F(\mathbb{X})$ given by
$$(v, \mu) \mapsto \hat{\mathcal{R}}^v_\mu(\cdot) = \int_{\mathbb{X}} \psi_\mu(\mathcal{R}_{(v, x)})(\cdot)\,\frac{\mu(dx)}{\mu(\mathbb{X})},$$
and $\psi_\mu$ is the map $\nu \mapsto \nu + \mu$.
It follows from Definition 3 that any GMVPP is a mixture of Markov chains with initial state $\mu_0$ and transition kernel $\hat{\mathcal{R}}^V$. A separate modeling approach, which we do not examine here, defines a measure-valued Markov chain with transition kernel
$$\mu \mapsto \int_{\mathbb{X}} \psi_\mu(\mathcal{R}_{(\mu, x)})(\cdot)\,\frac{\mu(dx)}{\mu(\mathbb{X})}.$$
In fact, some of the predictive constructions in [11,29] can be framed in such a way.
Theorem 1 extends to GMVPPs, provided that we condition all quantities on the parameter $V$. As a consequence, there exist a measurable function $f$ from $\mathbb{V}\times\mathbb{X}\times[0, 1]$ to $M_F(\mathbb{X})$ and a random sequence $((X_n, U_n))_{n\geq 1}$ such that
$$\mu_n = \mu_{n-1} + f(V, X_n, U_n) \quad a.s., \quad (32)$$
where $U_n \sim \mathrm{Unif}[0, 1]$, $U_n \perp (V, X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n)$, $X_1 \perp (V, (U_m)_{m\geq 1})$ with $X_1 \sim \mu_0/\mu_0(\mathbb{X})$, and, for $n \geq 1$,
$$P(X_{n+1} \in \cdot \mid V, X_1, \ldots, X_n, (U_m)_{m\geq 1}) = \frac{\mu_0(\cdot) + \sum_{i=1}^n f(V, X_i, U_i)(\cdot)}{\mu_0(\mathbb{X}) + \sum_{i=1}^n f(V, X_i, U_i)(\mathbb{X})}, \quad (33)$$
and
$$P(f(V, X_n, U_n) \in \cdot \mid V, X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n) = \mathcal{R}_{(V, X_n)}(\cdot). \quad (34)$$
The definition of a randomly reinforced Pólya process is similarly generalized to cover the case of a random reinforcement distribution η .
Definition 4
(Generalized Randomly Reinforced Pólya Process). We call a GMVPP with parameters $(V, \mu_0, \mathcal{R})$ a generalized randomly reinforced Pólya process (GRRPP) if there exists $\eta \in K_P(\mathbb{V}\times\mathbb{X}, \mathbb{R}_+)$ such that $\mathcal{R}_{(v, x)} = \xi_x(\eta(v, x))$, where $\xi_x : \mathbb{R}_+ \to M_F(\mathbb{X})$ is the map $w \mapsto w\,\delta_x$.
For GRRPPs, the function $f$ in the representation (32)–(34) can be written as
$$f(v, x, u) = h(v, x, u)\cdot\delta_x,$$
where $h$ is a measurable function from $\mathbb{V}\times\mathbb{X}\times[0, 1]$ to $\mathbb{R}_+$ such that $h(v, x, U) \sim \eta(v, x)$ for all $v \in \mathbb{V}$ and $x \in \mathbb{X}$, whenever $U \sim \mathrm{Unif}[0, 1]$. Letting $W_n = h(V, X_n, U_n)$, we obtain
$$\mu_n = \mu_{n-1} + W_n\,\delta_{X_n} \quad a.s., \quad (35)$$
where
$$P(X_{n+1} \in \cdot \mid V, X_1, \ldots, X_n, (U_m)_{m\geq 1}) = \frac{\mu_0(\cdot) + \sum_{i=1}^n W_i\,\delta_{X_i}(\cdot)}{\mu_0(\mathbb{X}) + \sum_{i=1}^n W_i}, \quad (36)$$
and
$$P(W_n \in \cdot \mid V, X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n) = \eta(V, X_n)(\cdot). \quad (37)$$
The weights W n in (36) allow us to incorporate additional information about the observations ( X n ) n 1 . As an example, consider the problem of computer-based classification, where the output usually includes confidence scores, which reflect the software’s confidence that the classifications are correct. In analyzing the number and dimension of the types already discovered, or the probability of detecting a new type, a typical procedure would take into account only those classifications whose confidence scores are above a certain threshold. Alternatively, we could adopt a Bayesian perspective and weigh each classification according to its confidence score. Denoting by ( ( X n , W n ) ) n 1 the sequence of classifications and confidence scores, we would model the distribution of the next classification by (36).
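One possible reading of this classification example, sketched in code (the labels, scores, and base measure below are made up): each observed label enters the predictive (36) with its confidence score as weight.

```python
from collections import defaultdict

theta = 1.0
labels = ["cat", "dog", "bird"]
nu = {l: 1.0 / len(labels) for l in labels}   # base measure over labels (uniform here)

stream = [("cat", 0.9), ("dog", 0.4), ("cat", 0.8), ("bird", 0.99)]  # (label, confidence score)

# Predictive probability of the next label, each past classification weighted by its score,
# as in (36) with W_i equal to the confidence score.
w_sum = sum(w for _, w in stream)
pred = defaultdict(float)
for x, w in stream:
    pred[x] += w / (theta + w_sum)
for l in labels:
    pred[l] += theta / (theta + w_sum) * nu[l]
print(dict(pred))
```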

4.1. GRRPP with Exchangeable Weights

Let $(\mu_n)_{n\geq 0}$ be a GRRPP with reinforcement distribution $\eta(v)$ that does not depend on $x$. Then,
$$W_n = h(V, U_n),$$
for some measurable function $h(v, u)$. The next result shows that the sequence $(W_n)_{n\geq 1}$ is exchangeable with directing random measure $\tilde{\eta} \equiv \eta(V)$. Moreover, $(\mu_n)_{n\geq 0}$ is completely parameterized by $(\mu_0, \tilde{\eta})$.
Theorem 6.
A sequence $(\mu_n)_{n\geq 0}$ of random finite measures is a GRRPP with parameters $(\mu_0, \tilde{\eta})$, for some $\tilde{\eta} \in K_P(\Omega, \mathbb{R}_+)$, if and only if $\mu_0 = \theta\,\nu$ and, for every $n \geq 1$,
$$\mu_n = \mu_{n-1} + W_n\,\delta_{X_n} \quad a.s.,$$
where $\theta \in (0, \infty)$, $\nu \in M_P(\mathbb{X})$, $(W_n)_{n\geq 1}$ is an exchangeable process with directing random measure $\tilde{\eta}$, and $(X_n)_{n\geq 1}$ is a sequence of $\mathbb{X}$-valued random variables such that $X_1 \perp (W_k)_{k\geq 1}$ with $X_1 \sim \nu$ and, for $n \geq 1$,
$$P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n, (W_k)_{k\geq 1}) = \sum_{i=1}^n \frac{W_i}{\theta + \sum_{j=1}^n W_j}\,\delta_{X_i}(\cdot) + \frac{\theta}{\theta + \sum_{j=1}^n W_j}\,\nu(\cdot). \quad (38)$$
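A sketch of the structure described in Theorem 6 (illustrative, not from the paper): draw the mixing variable $V$ first, then weights i.i.d. given $V$ (hence exchangeable), and sample the observations from the weighted urn (38). The Gamma/Exponential choices for the mixing and weight distributions are assumptions made for the example.

```python
import numpy as np

def grrpp_path(theta, base_sampler, n, rng):
    V = rng.gamma(2.0, 1.0)                     # mixing variable; eta(V) = Exponential with mean V here
    xs, ws, total_w = [], [], 0.0
    for i in range(n):
        if rng.uniform() < theta / (theta + total_w):
            x = base_sampler(rng)               # new color from nu
        else:
            x = rng.choice(xs, p=np.array(ws) / total_w)
        w = rng.exponential(V)                  # W_i | V  i.i.d. ~ eta(V), so (W_n) is exchangeable
        xs.append(x); ws.append(w); total_w += w
    return xs, ws

rng = np.random.default_rng(6)
xs, ws = grrpp_path(theta=1.0, base_sampler=lambda r: r.standard_normal(), n=2000, rng=rng)
```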
Proof. 
Let $(\mu_n)_{n\geq 0}$ be a GRRPP with parameters $(\mu_0, \tilde{\eta})$, and consider the representation (35)–(37). Put $\theta = \mu_0(\mathbb{X})$ and $\nu = \mu_0/\mu_0(\mathbb{X})$. It follows from (37) that
$$W_n \mid \tilde{\eta} \overset{i.i.d.}{\sim} \tilde{\eta};$$
thus, $(W_n)_{n\geq 1}$ is exchangeable. Moreover, $W_n = h(V, U_n)$, $n \geq 1$, so (38) follows from (36).
Conversely, suppose $\mu_n = \mu_{n-1} + W_n\,\delta_{X_n}$, where the process $((X_n, W_n))_{n\geq 1}$ is as described. It follows from (38) and Theorem 8.12 in [25] that
$$(W_k)_{k\geq 1} \perp X_1 \quad \text{and} \quad (W_{n+k})_{k\geq 1} \perp (X_1, \ldots, X_{n+1}) \mid (W_1, \ldots, W_n), \quad n \geq 1. \quad (39)$$
Since $(W_n)_{n\geq 1}$ is exchangeable with directing random measure $\tilde{\eta}$, we have
$$W_n \mid \tilde{\eta} \overset{i.i.d.}{\sim} \tilde{\eta}. \quad (40)$$
Furthermore, $\tilde{\eta}$ is measurable with respect to the tail $\sigma$-field of $(W_n)_{n\geq 1}$, so, by (39),
$$\tilde{\eta} \perp X_1 \quad \text{and} \quad \tilde{\eta} \perp (X_1, \ldots, X_{n+1}) \mid (W_1, \ldots, W_n), \quad n \geq 1. \quad (41)$$
Using (39)–(41), we can show that
$$W_1 \perp X_1 \mid \tilde{\eta} \quad \text{and} \quad W_{n+1} \perp (X_1, W_1, \ldots, X_n, W_n, X_{n+1}) \mid \tilde{\eta}, \quad n \geq 1.$$
Then, $P(\mu_1 \in \cdot \mid \tilde{\eta}) = P(\mu_0 + W_1\,\delta_{X_1} \in \cdot \mid \tilde{\eta}) = \int_{\mathbb{X}} \psi_{\mu_0}(\xi_x(\tilde{\eta}))(\cdot)\,\nu(dx)$ and, for $n \geq 2$,
$$P(\mu_n \in \cdot \mid \tilde{\eta}, \mu_1, \ldots, \mu_{n-1}) = E\big[P(\mu_{n-1} + W_n\,\delta_{X_n} \in \cdot \mid \tilde{\eta}, X_1, \ldots, W_{n-1}, X_n) \mid \tilde{\eta}, \mu_1, \ldots, \mu_{n-1}\big] = E\big[E[\psi_{\mu_{n-1}}(\xi_{X_n}(\tilde{\eta}))(\cdot) \mid X_1, \ldots, X_{n-1}, (W_m)_{m\geq 1}] \mid \tilde{\eta}, \mu_1, \ldots, \mu_{n-1}\big] = \int_{\mathbb{X}} \psi_{\mu_{n-1}}(\xi_x(\tilde{\eta}))(\cdot)\,\frac{\mu_{n-1}(dx)}{\mu_{n-1}(\mathbb{X})}. \quad \square$$
It follows from the proof of Theorem 6 that $(X_1, W_1) \sim \frac{\mu_0}{\mu_0(\mathbb{X})} \times E[\tilde{\eta}]$ and, for $n \geq 1$,
$$P\big((X_{n+1}, W_{n+1}) \in \cdot \mid X_1, W_1, \ldots, X_n, W_n\big) = \Big(\frac{\mu_n}{\mu_n(\mathbb{X})} \times E[\tilde{\eta} \mid W_1, \ldots, W_n]\Big)(\cdot). \quad (42)$$
As $\mu_n/\mu_n(\mathbb{X})$ and $E[\tilde{\eta} \mid W_1, \ldots, W_n]$ are both symmetric with respect to $((X_1, W_1), \ldots, (X_n, W_n))$, (42) is a symmetric function of $((X_1, W_1), \ldots, (X_n, W_n))$. This is a necessary but not sufficient condition for $((X_n, W_n))_{n\geq 1}$ to be exchangeable; see Proposition 3.2 and Example 3.1 in [2]. In Proposition 5, we show that $((X_n, W_n))_{n\geq 1}$ is exchangeable if and only if either $\mu_0$ is degenerate or the weights are a.s. identical. On the other hand, for every $n, k \geq 1$, the sequence $((X_n, W_n))_{n\geq 1}$ satisfies
$$P(W_k \in \cdot \mid X_1) = P(W_1 \in \cdot \mid X_1), \qquad P(X_k \in \cdot \mid W_1) = P(X_1 \in \cdot \mid W_1), \quad (43)$$
and
$$P(W_{n+k} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, X_{n+1}) = P(W_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, X_{n+1}), \qquad P(X_{n+k} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, W_{n+1}) = P(X_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, W_{n+1}). \quad (44)$$
By [24], Equations (43) and (44) define a process that is partially conditionally identically distributed (partially c.i.d.). Analogously to the c.i.d. case, partially c.i.d. processes preserve many of the properties of partially exchangeable sequences; see [24].
Proposition 5.
Under the conditions of Theorem 6, $((X_n, W_n))_{n\geq 1}$ is partially c.i.d. Moreover, $((X_n, W_n))_{n\geq 1}$ is exchangeable if and only if either $\mu_0$ is degenerate or $W_n = W_1$ a.s. for every $n \geq 1$. In that case, $((X_n, W_n))_{n\geq 1}$ is partially exchangeable.
Proof. 
It follows that $((X_n, W_n))_{n\geq 1}$ is partially c.i.d. if and only if $X_2 \overset{d}{=} X_1$ conditionally on $W_1$, $W_2 \overset{d}{=} W_1$ conditionally on $X_1$, and (44) holds for every $n \geq 1$ with $k = 2$. By hypothesis, $(W_n)_{n\geq 1}$ is exchangeable and $(W_n)_{n\geq 1} \perp X_1$, so $W_2 \overset{d}{=} W_1$ conditionally on $X_1$. Moreover, applying (39) repeatedly, we obtain
$$P(W_{n+2} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, X_{n+1}) = E\big[P(W_{n+2} \in \cdot \mid X_1, \ldots, W_{n+1}, X_{n+2}) \mid X_1, W_1, \ldots, X_n, W_n, X_{n+1}\big] = E\big[P(W_{n+2} \in \cdot \mid W_1, \ldots, W_{n+1}) \mid W_1, \ldots, W_n\big] = P(W_{n+1} \in \cdot \mid W_1, \ldots, W_n) = P(W_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, X_{n+1}).$$
On the other hand, by (38),
$$P(X_{n+2} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, W_{n+1}) = E\Big[\frac{\mu_{n+1}(\cdot)}{\mu_{n+1}(\mathbb{X})}\,\Big|\,X_1, W_1, \ldots, X_n, W_n, W_{n+1}\Big] = \frac{\mu_n(\cdot) + W_{n+1}\,\mu_n(\cdot)/\mu_n(\mathbb{X})}{\mu_{n+1}(\mathbb{X})} = \frac{\mu_n(\cdot)}{\mu_n(\mathbb{X})} = P(X_{n+1} \in \cdot \mid X_1, W_1, \ldots, X_n, W_n, W_{n+1}).$$
Analogously, $P(X_2 \in \cdot \mid W_1) = E\big[\mu_1(\cdot)/\mu_1(\mathbb{X}) \mid W_1\big] = \nu(\cdot) = P(X_1 \in \cdot \mid W_1)$, which completes the proof of the first part.
If $\mu_0$ is degenerate, then $((X_n, W_n))_{n\geq 1}$ is trivially exchangeable. If instead $W_n = W_1$ a.s., then one can show that $((X_n, W_n))_{n\geq 1}$ satisfies condition (b) of Proposition 3.2 in [2], which, together with the symmetry of (42), implies by Theorem 3.1 in [2] that $((X_n, W_n))_{n\geq 1}$ is exchangeable.
Conversely, suppose that $((X_n, W_n))_{n\geq 1}$ is exchangeable. As $((X_n, W_n))_{n\geq 1}$ is partially c.i.d., the predictive distributions (42) converge to a product random measure [24]. It follows from de Finetti's theorem that $((X_n, W_n))_{n\geq 1}$ is partially exchangeable, so, in particular,
$$(X_1, W_1, X_2, W_2) \overset{d}{=} (X_1, W_2, X_2, W_1).$$
However, $W_2 \perp X_2 \mid (X_1, W_1)$ from (36), so $W_1 \perp X_2 \mid (X_1, W_2)$. Thus, for every bounded measurable function $\tilde{f}$, there exists a measurable function $g_{\tilde{f}}$ such that
$$E[\tilde{f}(X_2) \mid X_1, W_1, W_2] = g_{\tilde{f}}(X_1, W_2) \quad a.s.$$
Integrating $\tilde{f}(X_2)$ with respect to (38) and rearranging terms, we obtain
$$W_1\big(\tilde{f}(X_1) - g_{\tilde{f}}(X_1, W_2)\big) = \theta\big(g_{\tilde{f}}(X_1, W_2) - E[\tilde{f}(X_1)]\big) \quad a.s.$$
Assume that $\mu_0$ is non-degenerate. Then, there is an $\tilde{f}$ such that $P\big(\tilde{f}(X_1) = E[\tilde{f}(X_1)]\big) = 0$; e.g., take $\tilde{f} = \mathbb{1}_B$ for some $B \in \mathcal{X}$ such that $0 < P(X_1 \in B) < 1$. It follows that
$$P\big(\tilde{f}(X_1) = g_{\tilde{f}}(X_1, W_2)\big) = P\big(\tilde{f}(X_1) = E[\tilde{f}(X_2) \mid X_1, W_1, W_2]\big) = P\big(\tilde{f}(X_1) = E[\tilde{f}(X_1)]\big) = 0;$$
therefore,
$$W_1 = \theta\,\frac{g_{\tilde{f}}(X_1, W_2) - E[\tilde{f}(X_1)]}{\tilde{f}(X_1) - g_{\tilde{f}}(X_1, W_2)} \quad a.s.$$
In other words, there exists a measurable function $\tilde{h}$ such that $W_1 = \tilde{h}(X_1, W_2)$ a.s., and so $W_2 = \tilde{h}(X_1, W_1)$ a.s. by partial exchangeability. It follows from $X_1 \perp (W_1, W_2)$ that, for every $A \in \mathcal{B}(\mathbb{R}_+)$,
$$P(W_2 \in A \mid W_1) = P(W_2 \in A \mid X_1, W_1) = \mathbb{1}_A(W_2) \quad a.s.;$$
thus, $W_2 = W_1$ a.s. and, from exchangeability, $W_n = W_1$ a.s., $n \geq 1$. □

4.2. Asymptotic Properties of GRRPP with Exchangeable Weights

It follows from (38) that a GRRPP with exchangeable weights is a mixture of RRPPs with independent weights, with the mixing distribution affecting only the sequence $(W_n)_{n\geq 1}$. Thus, we expect the results in Section 3.3 to carry over to this more general setting. In this section, we concentrate on the behavior of $\theta_n$ and of the sequence $(L_n)_{n\geq 1}$.
Assume that $P(W_1 > 0 \mid \tilde{\eta}) > 0$. If $E[W_1] < \infty$, then $0 < E[W_1 \mid \tilde{\eta}] < \infty$ a.s., and, by the law of large numbers for exchangeable random variables (see [1], Section 2),
$$\frac{1}{n}\sum_{i=1}^n W_i \overset{a.s.}{\longrightarrow} E[W_1 \mid \tilde{\eta}] \in (0, +\infty).$$
Then, if $\mu_0$ is diffuse, $n\,\theta_n \overset{a.s.}{\to} \theta/E[W_1 \mid \tilde{\eta}]$ and $\sum_{n=1}^\infty \theta_n = \infty$ a.s., so Theorem 1 in [27] implies
$$\frac{L_n}{\log n} = \frac{L_n}{\sum_{k=1}^n \theta_{k-1}}\cdot\frac{1}{\log n}\sum_{k=1}^n \frac{1}{k}\,(k\,\theta_{k-1}) \overset{a.s.}{\longrightarrow} \frac{\theta}{E[W_1 \mid \tilde{\eta}]}.$$
If $E[W_1] = \infty$, then $L_n$ may converge to a finite limit as $n \to \infty$. For example, let us consider a strictly stable reinforcement distribution as in Proposition 4.
Proposition 6.
Let $(\mu_n)_{n\geq 0}$ be a GRRPP with parameters $(V, \mu_0, \eta)$ such that $V$ is a strictly positive random variable with $E[V^{-1}] < \infty$, $\mu_0$ is diffuse, and $\eta(v)$, $v > 0$, is an $S_\alpha(1, v, 0)$ distribution with stability parameter $\alpha < 1$. Then, $\theta_n = O_P(n^{-1/\alpha})$ and
$$\lim_{n\to\infty} L_n < \infty \quad a.s.$$
Proof. 
It follows from how the weights in the representation (35) are chosen that we can take
$$W_n = V\,F^{-1}(U_n),$$
where $U_n \sim \mathrm{Unif}[0, 1]$, $U_n \perp (V, X_1, U_1, \ldots, X_{n-1}, U_{n-1}, X_n)$, and $F^{-1}$ is the inverse of the $S_\alpha(1, 1, 0)$ distribution function. Then,
$$\theta_n = \frac{\theta}{\theta + \sum_{i=1}^n W_i} \leq \frac{n^{-1/\alpha}\,\theta}{V\,n^{-1/\alpha}\sum_{i=1}^n F^{-1}(U_i)} \overset{d}{=} \frac{n^{-1/\alpha}\,\theta}{V\,Y},$$
for some $Y \sim S_\alpha(1, 1, 0)$ such that $Y \perp V$. It follows for every $M > 0$ that $P(n^{1/\alpha}\theta_n > M) \leq P(\theta/(VY) > M)$, which can be made arbitrarily small by taking $M$ large enough. Regarding the second assertion, as $1/\alpha > 1$ and $E[\theta/(VY)] < \infty$ by Theorem 5.4.1 in [28], we have
$$E\big[\lim_{n\to\infty} L_n\big] = \lim_{n\to\infty}\sum_{i=1}^n E\big[\mathbb{1}\{L_i = L_{i-1} + 1\}\big] = \sum_{n=1}^\infty E[\theta_n] \leq \sum_{n=1}^\infty \theta\,n^{-1/\alpha}\,E\big[1/(VY)\big] < \infty. \quad \square$$
Extensions of Proposition 6 can be obtained by exploiting the central limit theorems for exchangeable random variables, which are found in [30,31].

5. Discussion

In this paper, we study the extension of randomly reinforced urns [17] to an unbounded set of possible colors. The resulting measure-valued urn process provides a predictive characterization of the law of an asymptotically exchangeable sequence of random variables, which corresponds to the observation process of an implied urn sampling scheme. In fact, the model (6)–(7) fits into a line of recent research, which explores efficient predictive constructions for fast online prediction or approximately-Bayesian solutions, see [11,29,32] and references therein. To that end, one direction for future work is to generalize the functional relationship in (7) and/or, as one referee suggested, to consider finitely-additive measures, along the lines discussed in [33].
We investigate the asymptotic properties of the sequences of predictive distributions and empirical frequencies of the observation process, and prove their convergence in total variation distance to a common random limit. The rate of convergence of their difference is given set-wise, so another possible direction for future research is to consider a stronger distance. As far as we know, the topic of merging of the predictive and empirical distributions is largely unexplored. Within the relevant literature, we mention the works [4,34], where the authors study the rate of convergence of the Wasserstein or Prokhorov distances under exchangeability, and the papers by Berti et al. [21,35], which consider the c.i.d. case and regard the difference between the predictive and empirical measures as a map in the space of real bounded functions.

Author Contributions

Formal analysis, S.F., S.P., H.S.; writing—original draft preparation, S.F., S.P., H.S.; writing—review and editing, S.F., S.P., H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 817257). H.S. was partially supported by the Bulgarian Ministry of Education and Science under the National Research Programme “Young scientists and postdoctoral students” approved by DCM No. 577/17.08.2018.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We wish to express our sincere gratitude to Regazzini for his deeply inspiring ideas and for instilling in us his passion for research. We thank the four anonymous referees for the valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aldous, D.J. Exchangeability and related topics. École D’Été De Probab. De St.-Flour XIII 1983 1985, 1117, 1–198. [Google Scholar]
  2. Fortini, S.; Ladelli, L.; Regazzini, E. Exchangeability, predictive distributions and parametric models. Sankhya Ser. A 2000, 62, 86–109. [Google Scholar]
  3. Cifarelli, D.M.; Regazzini, E. De Finetti’s contribution to probability and statistics. Statist. Sci. 1996, 11, 253–282. [Google Scholar] [CrossRef]
  4. Cifarelli, D.M.; Dolera, E.; Regazzini, E. Frequentistic approximations to Bayesian prevision of exchangeable random elements. Int. J. Approx. Reason. 2016, 78, 138–152. [Google Scholar] [CrossRef] [Green Version]
  5. Fortini, S.; Petrone, S. Predictive distribution (de Finetti’s view). In Wiley StatsRef: Statistics Reference Online; Wiley Online Library, 2014; pp. 1–9. Available online: https://onlinelibrary.wiley.com/doi/full/10.1002/9781118445112.stat07831 (accessed on 4 October 2021).
  6. Regazzini, E. Old and recent results on the relationship between predictive inference and statistical modeling either in nonparametric or parametric form. In Bayesian Statistics 6; Oxford University Press: Oxford, UK, 1999; pp. 571–588. [Google Scholar]
  7. Fortini, S.; Petrone, S. Predictive construction of priors in Bayesian nonparametrics. Braz. J. Probab. Stat. 2012, 26, 423–449. [Google Scholar] [CrossRef]
  8. Mailler, C.; Marckert, J.F. Measure-valued Pólya urn processes. Electron. Commun. Probab. 2017, 22, 33. [Google Scholar] [CrossRef]
  9. Janson, S. Random replacements in Pólya urns with infinitely many colours. Electron. Commun. Probab. 2019, 24, 11. [Google Scholar] [CrossRef]
  10. Aletti, G.; Ghiglietti, A.; Rosenberger, W.F. Nonparametric covariate-adjusted reponse-adaptive design based on a functional urn model. Ann. Stat. 2018, 46, 3838–3866. [Google Scholar] [CrossRef] [Green Version]
  11. Fortini, S.; Petrone, S. Quasi-Bayes properties of a procedure for sequential learning in mixture models. J. R. Stat. Soc. Ser. B 2020, 82, 1087–1114. [Google Scholar] [CrossRef]
  12. Zhang, L.X.; Hu, F.; Cheung, S.H.; Chan, W.S. Immigrated urn models—Theoretical properties and applications. Ann. Stat. 2011, 39, 643–671. [Google Scholar] [CrossRef]
  13. Blackwell, D.; MacQueen, J.B. Ferguson distributions via Pólya urn schemes. Ann. Stat. 1973, 1, 353–355. [Google Scholar] [CrossRef]
  14. Bandyopadhyay, A.; Thacker, D. Pólya urn schemes with infinitely many colors. Bernoulli 2017, 23, 3243–3267. [Google Scholar] [CrossRef] [Green Version]
  15. Janson, S. A.s. convergence for infinite colour Pólya urns associated with random walks. Ark. Mat. 2021, 59, 87–123. [Google Scholar] [CrossRef]
  16. Mailler, C.; Villemonais, D. Stochastic approximation on non-compact measure spaces and application to measure-valued Pólya processes. Ann. Appl. Probab. 2020, 30, 2393–2438. [Google Scholar] [CrossRef]
  17. Muliere, P.; Paganoni, A.M.; Secchi, P. A randomly reinforced urn. J. Stat. Plan. Inference 2006, 136, 1853–1874. [Google Scholar] [CrossRef]
  18. Bassetti, F.; Crimaldi, I.; Leisen, F. Conditionally identically distributed species sampling sequences. Adv. Appl. Probab. 2010, 42, 433–459. [Google Scholar] [CrossRef] [Green Version]
  19. Berti, P.; Crimaldi, I.; Pratelli, L.; Rigo, P. Rate of convergence of predictive distributions for dependent data. Bernoulli 2009, 15, 1351–1367. [Google Scholar] [CrossRef]
  20. Berti, P.; Crimaldi, I.; Pratelli, L.; Rigo, P. Central limit theorems for multicolor urns with dominated colors. Stoch. Process. Appl. 2010, 120, 1473–1491. [Google Scholar] [CrossRef] [Green Version]
  21. Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for a class of identically distributed random variables. Ann. Probab. 2004, 32, 2029–2052. [Google Scholar] [CrossRef] [Green Version]
  22. Crimaldi, I. An almost sure conditional convergence result and an application to a generalized Pólya urn. Int. Math. Forum 2009, 4, 1139–1156. [Google Scholar]
  23. Sariev, H.; Fortini, S.; Petrone, S. Infinite-Color Randomly Reinforced Urns with Dominant Colors. 2021. Preprint. Available online: https://arxiv.org/abs/2106.04307 (accessed on 4 October 2021).
  24. Fortini, S.; Petrone, S.; Sporysheva, P. On a notion of partially conditionally identically distributed sequences. Stoch. Process. Appl. 2018, 128, 819–846. [Google Scholar] [CrossRef] [Green Version]
  25. Kallenberg, O. Foundations of Modern Probability, 3rd ed.; Springer: New York, NY, USA, 2021. [Google Scholar]
  26. Häusler, E.; Luschgy, H. Stable Convergence and Stable Limit Theorems; Springer: Cham, Switzerland, 2015. [Google Scholar]
  27. Dubins, L.; Freedman, D. A sharper form of the Borel-Cantelli lemma and the strong law. Ann. Math. Stat. 1965, 36, 800–807. [Google Scholar] [CrossRef]
  28. Uchaikin, V.V.; Zolotarev, V.M. Chance and Stability: Stable Distributions and Their Applications; Walter de Gruyter: Berlin, Germany, 2011. [Google Scholar]
  29. Fong, E.; Holmes, C.; Walker, S. Martingale Posterior Distributions. 2021. Preprint. Available online: https://arxiv.org/abs/2103.15671 (accessed on 4 October 2021).
  30. Fortini, S.; Ladelli, L.; Regazzini, E. A central limit problem for partially exchangeable random variables. Theory Probab. Appl. 1997, 41, 224–246. [Google Scholar] [CrossRef] [Green Version]
  31. Fortini, S.; Ladelli, L.; Regazzini, E. Central limit theorem with exchangeable summands and mixtures of stable laws as limits. Boll. Unione Mat. Ital. 2012, 5, 515–542. [Google Scholar]
  32. Berti, P.; Dreassi, E.; Pratelli, L.; Rigo, P. A class of models for Bayesian predictive inference. Bernoulli 2021, 27, 702–726. [Google Scholar] [CrossRef]
  33. de Cooman, G.; Bock, J.D.; Diniz, M.A. Coherent predictive inference under exchangeability with imprecise probabilities. J. Artif. Intell. Res. 2015, 52, 1–95. [Google Scholar] [CrossRef] [Green Version]
  34. Dolera, E.; Regazzini, E. Uniform rates of the Glivenko-Cantelli convergence and their use in approximating Bayesian inferences. Bernoulli 2019, 25, 2982–3015. [Google Scholar] [CrossRef] [Green Version]
  35. Berti, P.; Pratelli, L.; Rigo, P. Limit theorems for empirical processes based on dependent data. Electron. J. Probab. 2012, 17, 1–18. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
