*Article* **Trapping the Ultimate Success**

**Alexander Gnedin \* and Zakaria Derbazi**

School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK; z.derbazi@qmul.ac.uk

**\*** Correspondence: a.gnedin@qmul.ac.uk

**Abstract:** We introduce a betting game where the gambler aims to guess the last success epoch in a series of inhomogeneous Bernoulli trials paced randomly in time. At a given stage, the gambler may bet on either the event that no further successes occur, or the event that exactly one success is yet to occur, or may choose any proper range of future times (a trap). When a trap is chosen, the gambler wins if the last success epoch is the only one that falls in the trap. The game is closely related to the sequential decision problem of maximising the probability of stopping on the last success. We use this connection to analyse the best-choice problem with random arrivals generated by a Pólya-Lundberg process.

**Keywords:** best choice problem; optimal stopping time; last record; trapping strategy

**MSC:** 60G40

## **1. Introduction**

Suppose a series of inhomogeneous Bernoulli trials, with a given profile of success probabilities *p* = (*p<sub>k</sub>*, *k* ≥ 1), is paced randomly in time by some independent point process. As the outcomes and epochs of the first *k* ≥ 0 trials become known at some time *t*, the gambler is asked to bet on the time of the last success. The gambler is allowed to choose either a bygone action, a next action, or a proper subset of future times called a *trap*. The gambler wins with bygone if no further successes occur, and with next if exactly one success occurs after time *t*. If a trapping action is chosen, the gambler wins if the last success epoch is isolated by the trap from the other success epochs.

Motivation to study this game stems from connections to the best-choice problems with random arrivals [1–9] and the random records model [10,11]. A prototype problem of this kind involves a sequence of rankable items arriving by a Poisson process with a finite horizon, where the *k*th arrival is relatively the best (a record) with probability *p<sub>k</sub>* = 1/*k*. The optimisation task is to maximise the probability of selecting the overall best item (the last record) using a non-anticipating stopping strategy. Cowan and Zabczyk [5] showed that the optimal strategy is *myopic*, which means that the decision to stop on a particular record arrival only depends on whether the winning chance with bygone exceeds that with next. They also determined the critical cut-offs of the optimal strategy and studied some asymptotics. Similar results have been obtained for the best-choice problem with some other pacing processes [1,4,7,9]. In this context, trapping can be employed to test optimality of the myopic strategy, which fails if in some situations the action bygone outperforms next but a trapping action is better still. Simple trapping strategies are easy to evaluate and provide insight into the occurrence of records.

Regarding the pacing point process, we shall assume that it is mixed binomial [12]. This setting covers, in particular, the wide class of mixed Poisson processes. In essence, this pacing process is characterised by the *prior* distribution *π* of the total number of trials, and some background continuous distribution to spread the epochs of the trials in an i.i.d. manner. Without loss of generality, the distribution will be assumed uniform; hence, given the number of trials, they are scattered in time like the uniform order statistics on [0, 1]. We enrich the model with a natural size parameter by letting *π* vary within a family of power series distributions.

**Citation:** Gnedin, A.; Derbazi, Z. Trapping the Ultimate Success. *Mathematics* **2022**, *10*, 158. https://doi.org/10.3390/math10010158

Academic Editors: Emanuele Dolera and Federico Bassetti

Received: 30 October 2021 Accepted: 30 December 2021 Published: 5 January 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The most obvious instance of a trapping action amounts to leaving some fraction of time to isolate the last success. We call this trapping action the *z-strategy*, with a parameter designating the proportion of time getting skipped (as compared to the real-time cut-off in the name of the familiar '1/*e*-strategy' of the best choice [13,14]). The overall optimality of the class of *z*-strategies among all trapping actions will be explored for a fixed and a random number of trials. For the problem of stopping on the last success, the optimality of the myopic strategy will be shown to hold if the sequence of its cut-offs is decreasing and interlacing with another set of critical points of *z*-strategies.

Then we specialise to the best-choice problem driven by a Pólya-Lundberg pacing process, when the number of trials follows a logarithmic series distribution. In different terms, the model was introduced by Bruss and Yor [15]. Bruss and Rogers [4] recently observed that the strategy stopping at the first record after time threshold 1/*e* is not optimal. We present a more detailed analysis; in particular, we use a curious property of certain hypergeometric functions to show that the cut-offs of the myopic strategy are increasing, hence the monotone case of optimal stopping [16] does not hold. Simulation suggests, however, that the myopic strategy is very close to optimality, both in terms of the cut-offs and the winning probability. A better approximation to optimality is achieved by the strategy that stops as soon as bygone becomes more beneficial than trapping with a *z*-strategy.

Viewed inside a bigger picture, the log-series prior appears as the edge *ν* = 0 instance of the random records model with negative binomial distribution NB(*ν*, *q*) of the number of trials. It is known that for *ν* = 1, corresponding to the geometric prior, all cut-offs coincide [17,18], while for integer *ν* > 1 they are decreasing [7]. In [19], we show that for 0 < *ν* < 1 the myopic strategy is not optimal, with the pattern of cut-offs as in the log-series case treated here.

## **2. Setting the Scene**

#### *2.1. The Probability Model*

Let *π* be a power series distribution

$$
\pi_n = c(q)\, w_n q^n, \quad n \ge 0, \tag{1}
$$

with weights *w*<sub>0</sub> ≥ 0, *w<sub>n</sub>* > 0 for *n* ≥ 1, and scale parameter *q* > 0 varying within the interval of convergence of ∑<sub>*n*</sub> *w<sub>n</sub>q<sup>n</sup>*.
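As an illustration of (1), the following minimal sketch normalises a truncated weight sequence numerically. The function name `power_series_pmf` and the truncation lengths are assumptions made for this example only; the two weight choices are the standard Poisson and logarithmic series cases.

```python
import math

def power_series_pmf(weights, q):
    """Normalise w_n * q**n over the supplied weights w_0, ..., w_N.

    Truncating the weight sequence is an assumption of this sketch: c(q)
    is approximated by the reciprocal of the truncated sum.
    """
    if q <= 0:
        raise ValueError("q must be positive")
    terms = [w * q**n for n, w in enumerate(weights)]
    total = sum(terms)
    return [term / total for term in terms]

# Poisson prior: w_n = 1/n!, so that c(q) = exp(-q).
poisson = power_series_pmf([1 / math.factorial(n) for n in range(60)], q=2.0)

# Logarithmic series prior: w_0 = 0, w_n = 1/n, with 0 < q < 1.
log_series = power_series_pmf([0.0] + [1 / n for n in range(1, 400)], q=0.5)
```

The truncation is harmless only when *q* lies well inside the interval of convergence; near its boundary more terms would be needed.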

The associated mixed binomial process (*N<sub>t</sub>*, *t* ∈ [0, 1]) is an orderly counting process with the uniform order statistics property. The process can also be seen as a time inhomogeneous pure-birth process, with a transition rate expressible through the generating function of (*w<sub>n</sub>*), see [20].

Conditionally on *N<sub>t</sub>* = *k*:

$$\pi(j \mid t,k) := \mathbb{P}(N_1 - N_t = j \mid N_t = k) = f_k(x) \binom{k+j}{j} w_{k+j}\, x^j, \quad j \ge 0, \tag{2}$$

with scale variable

$$x := (1 - t)q \tag{3}$$

and a normalisation function *f<sub>k</sub>*(*x*).

(iii) (*N*<sub>*t*+*s*(1−*t*)</sub> − *N<sub>t</sub>*, *s* ∈ [0, 1]) is a mixed binomial process on [0, 1], with the number of trials distributed according to (2).

The conditioning relation (2) appears in many statistical problems related to censored or partially observable data.
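The posterior (2) is easy to evaluate numerically. The sketch below is illustrative rather than part of the original analysis: the function name `posterior_remaining`, the callable `weight` returning *w<sub>n</sub>*, and the truncation level `j_max` are assumptions, and the normalisation *f<sub>k</sub>*(*x*) is approximated by the reciprocal of a truncated sum.

```python
from math import comb, exp, factorial

def posterior_remaining(weight, q, t, k, j_max=100):
    """Approximate pi(j | t, k) of (2) for j = 0, ..., j_max.

    weight(n) returns w_n; x = (1 - t) * q is the scale variable (3).
    The series is truncated at j_max, which is an assumption of this sketch.
    """
    x = (1.0 - t) * q
    terms = [comb(k + j, j) * weight(k + j) * x**j for j in range(j_max + 1)]
    total = sum(terms)  # approximates 1 / f_k(x)
    return [term / total for term in terms]

# Poisson weights w_n = 1/n!: the posterior is Poisson with mean x, whatever k is.
poisson_post = posterior_remaining(lambda n: 1 / factorial(n), q=2.0, t=0.25, k=3)

# Geometric weights w_n = 1: the posterior is negative binomial, pi(0 | t, k) = (1 - x)**(k + 1).
geometric_post = posterior_remaining(lambda n: 1.0, q=0.6, t=0.5, k=2)
```

For Poisson weights the factor *k* cancels because C(*k*+*j*, *j*)/(*k*+*j*)! = 1/(*k*! *j*!), which is why the posterior depends on the state only through *x*.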

In principle, instead of considering a family of distributions for (*N<sub>t</sub>*) with parameter *q*, we could deal with one counting process on the *x*-scale. We prefer not to adhere to this viewpoint, as the 'real time' variable is more intuitive. Nevertheless, we will use (3) to switch back and forth between *t* and *x*, as *x* is better suited to power series work.

Let *p* = (*p<sub>k</sub>*, *k* ≥ 1) be a profile of success probabilities. We assume that

$$0 \le p_1 \le 1, \quad 0 \le p_k < 1 \quad \text{for } k > 1, \quad \text{and} \quad \sum_{k=1}^{\infty} p_k = \infty.$$

The *k*th trial, occurring at the *k*th epoch, is a success with probability *p<sub>k</sub>*, independently of other trials and the pacing process. Thus, the point process of success epochs is obtained from (*N<sub>t</sub>*) by thinning out the *k*th point with probability 1 − *p<sub>k</sub>*. Taken by itself, the process counting the success epochs is typically intractable [10]. A notable exception is the random records model (*p<sub>k</sub>* = 1/*k*) with the geometric prior *π*, when the process is Poisson [1].
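The thinning construction can be simulated directly. In this minimal sketch the number of trials `n` is fixed for simplicity (under the mixed binomial model it would first be drawn from the prior *π*), and the names `simulate_successes` and `rng` are chosen for the example only.

```python
import random

def simulate_successes(n, p, rng):
    """Return the success states (epoch, index) among n trials on [0, 1].

    Epochs are uniform order statistics; the k-th point is kept with
    probability p(k), independently of the epochs and of other trials.
    """
    epochs = sorted(rng.random() for _ in range(n))
    return [(t, k) for k, t in enumerate(epochs, start=1) if rng.random() < p(k)]

rng = random.Random(2022)
# Random records model: the k-th trial is a record with probability 1/k.
records = simulate_successes(20, lambda k: 1.0 / k, rng)
```

Since *p*<sub>1</sub> = 1 in the records model, the first trial is always a success, and the returned states increase in both components.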

We shall identify *state* (*t*, *k*) with the event *N<sub>t</sub>* = *k*. The notation (*t*, *k*)◦ will be used to denote the event that the *k*th trial epoch is *t* and the outcome is a success. If there is at least one success, the sequence of successes (*t<sub>i</sub>*, *k<sub>i</sub>*)◦ increases in both components.

#### *2.2. The Trapping Game and Stopping Problem*

A single episode of the trapping game refers to the generic state (*t*, *k*). The gambler plays either next or bygone, or chooses a proper subset of the interval (*t*, 1]. The trap [*t* + *z*(1 − *t*), 1], for 0 < *z* < 1, will be called a *z-strategy*; this action leaves a (1 − *z*) portion of the remaining time to isolate the last success epoch from other successes.

Let ℱ<sub>*t*</sub> be the sigma-algebra generated by the epochs and outcomes of trials on [0, *t*]. By a stopping strategy *τ* we mean a random variable taking values in [0, 1] and adapted to the filtration (ℱ<sub>*t*</sub>, *t* ∈ [0, 1]). The performance of *τ* is assessed by the probability of the event that (*τ*, *N<sub>τ</sub>*)◦ is the last success state.

We call a stopping strategy Markovian if in the event *τ* ≥ *t* a decision to stop or to continue in state (*t*, *k*)◦ does not depend on the trials before time *t*. The general theory [21] implies existence of the optimal stopping strategy and that it can be found within the class of Markovian strategies.

Conditional on ℱ<sub>*t*</sub>, the probability that (*t*, *k*)◦ is the last success equals the winning probability with bygone, while the probability that (*t*, *k*)◦ is the penultimate success equals the winning probability with next. If for every (*t*, *k*), where bygone is at least as good as next, also every state (*t*′, *k*′) ∈ [*t*, 1] × {*k*, *k* + 1, ···} has this property, then the optimal stopping problem is *monotone* [21].

Define the *myopic* stopping strategy *τ*∗ to be the first record (*t*, *k*)◦, if any, such that bygone is at least as beneficial as next. In the monotone case the myopic strategy is optimal among all stopping strategies.

Suppose for each *k* ≥ 1 there exists a cut-off time *a<sub>k</sub>* such that the action bygone is at least as good as next precisely for *t* ∈ [*a<sub>k</sub>*, 1]. Then *τ*∗ coincides with the time of the first success (*t*, *k*)◦ satisfying *t* ≥ *a<sub>k</sub>* (or *τ*∗ = 1 if there is no such trial). The problem is monotone, hence *τ*∗ is optimal, if the cut-offs are non-increasing, that is *a*<sub>1</sub> ≥ *a*<sub>2</sub> ≥ ···.
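The cut-offs can be located numerically. The following is a minimal sketch and not the authors' procedure: it assumes that bygone and next are evaluated by truncating the posterior (2) at a hypothetical level `j_max`, that *k* ≥ 1 (so that *p<sub>i</sub>* < 1 for all later trials), and that the gap between bygone and next changes sign at most once in *t*, so that bisection applies. All function names are illustrative.

```python
from math import comb, exp

def bygone_next(weight, p, q, t, k, j_max=400):
    """Winning probabilities of bygone and next in state (t, k), k >= 1.

    Averages over the posterior (2), truncated at j_max (an assumption):
    bygone wins if trials k+1..k+j are all failures, next wins if exactly
    one of them is a success.
    """
    x = (1.0 - t) * q
    norm = bygone = nxt = 0.0
    all_fail = 1.0   # product of (1 - p_i) over i = k+1..k+j
    odds_sum = 0.0   # sum of p_i / (1 - p_i) over i = k+1..k+j
    for j in range(j_max + 1):
        if j > 0:
            p_i = p(k + j)
            all_fail *= 1.0 - p_i
            odds_sum += p_i / (1.0 - p_i)
        mass = comb(k + j, j) * weight(k + j) * x**j
        norm += mass
        bygone += mass * all_fail
        nxt += mass * all_fail * odds_sum
    return bygone / norm, nxt / norm

def myopic_cutoff(weight, p, q, k, tol=1e-10):
    """Smallest t with bygone >= next, found by bisection on [0, 1]."""
    def gap(t):
        b, n = bygone_next(weight, p, q, t, k)
        return b - n
    if gap(0.0) >= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0  # invariant: gap(lo) < 0 <= gap(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi

# Records (p_k = 1/k) with geometric weights w_n = 1 and q = 0.9.
cutoffs = [myopic_cutoff(lambda n: 1.0, lambda i: 1.0 / i, 0.9, k) for k in (1, 2, 5)]
```

In this geometric example the computed cut-offs agree across *k*, consistent with the coincidence of cut-offs for the geometric prior noted in the Introduction.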

## **3. The Game with Fixed Number of Trials**

In this section, we assess the outcomes of actions in state (*t*, *k*) conditioned on the total number of trials *n* > *k*. This can be interpreted as the game of an informed gambler who knows *n* but not the outcomes of unseen trials *k* + 1, ··· , *n*. The time *t* is not important and a comparison of bygone with next is tantamount to the discrete-time optimal stopping at the last success [22,23]. The best action will be shown to coincide with a *z*-strategy provided next beats bygone.
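For a known *n*, the three kinds of action can be compared by direct computation. The sketch below is an illustration under stated assumptions, not the formulas of the paper: bygone and next follow from the definitions (no success, respectively exactly one success, among trials *k* + 1, ..., *n*), while the expression for the *z*-strategy is derived here from the model, using that each remaining trial falls in the trap [*t* + *z*(1 − *t*), 1] independently with probability 1 − *z* and that the gambler wins exactly when the trap contains a single success. Function names are hypothetical.

```python
from math import comb

def no_success(p, a, n):
    """P(trials a+1, ..., n are all failures); this is bygone when a = k."""
    prob = 1.0
    for i in range(a + 1, n + 1):
        prob *= 1.0 - p(i)
    return prob

def exactly_one_success(p, a, n):
    """P(exactly one success among trials a+1, ..., n); this is next when a = k."""
    total = 0.0
    for m in range(a + 1, n + 1):
        others = 1.0
        for i in range(a + 1, n + 1):
            if i != m:
                others *= 1.0 - p(i)
        total += p(m) * others
    return total

def z_strategy(p, k, n, z):
    """Winning probability of the z-strategy given n trials in total.

    The number j of remaining trials landing in the trap is binomial
    (n - k, 1 - z); these are the last j trials, and the gambler wins
    if exactly one of them is a success.
    """
    m = n - k
    return sum(
        comb(m, j) * (1.0 - z)**j * z**(m - j) * exactly_one_success(p, n - j, n)
        for j in range(m + 1)
    )

def record(i):
    """Success profile of the random records model, p_i = 1/i."""
    return 1.0 / i

bygone = no_success(record, 2, 6)
nxt = exactly_one_success(record, 2, 6)
trap = z_strategy(record, 2, 6, 0.3)
```

As *z* tends to 0 the trap fills the whole interval (*t*, 1], and the winning probability of the *z*-strategy reduces to that of next.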
