#### *3.1.* bygone *vs.* next

The number of successes in trials *k* + 1, ··· , *n* has probability generating function

$$\lambda \mapsto \prod\_{m=k+1}^n \left( 1 - p\_m + p\_m \lambda \right) = \left( 1 + \lambda \sum\_{i=k+1}^n \frac{p\_i}{1 - p\_i} \right) \prod\_{m=k+1}^n \left( 1 - p\_m \right) + O(\lambda^2).$$

From this expansion, the probability of no success is

$$s\_0(k+1,n) := \prod\_{m=k+1}^n (1-p\_m)$$

and the probability of exactly one success is

$$s\_1(k+1,n) := \sum\_{i=k+1}^n \frac{p\_i}{1-p\_i} \prod\_{m=k+1}^n (1-p\_m) = s\_0(k+1,n) \sum\_{i=k+1}^n \frac{p\_i}{1-p\_i}.$$
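These two probabilities are easy to check numerically. The following sketch is illustrative and not part of the paper; the profile $p\_k = 1/k$ and the range of trials are arbitrary choices. It compares the product formulas with brute-force enumeration over all outcomes:

```python
from itertools import product
from math import isclose

def s0(a, n, p):
    """P(no success among trials a, ..., n) = prod (1 - p_m)."""
    out = 1.0
    for m in range(a, n + 1):
        out *= 1 - p[m]
    return out

def s1(a, n, p):
    """P(exactly one success among trials a, ..., n) = s0 * sum p_i/(1-p_i)."""
    return s0(a, n, p) * sum(p[i] / (1 - p[i]) for i in range(a, n + 1))

# Brute-force check over all 2^(n-a+1) outcomes, record profile p_k = 1/k.
n, a = 8, 4
p = {k: 1.0 / k for k in range(1, n + 1)}

def exact(count):
    total = 0.0
    for outcome in product([0, 1], repeat=n - a + 1):
        if sum(outcome) == count:
            pr = 1.0
            for m, o in zip(range(a, n + 1), outcome):
                pr *= p[m] if o else 1 - p[m]
            total += pr
    return total

assert isclose(s0(a, n, p), exact(0)) and isclose(s1(a, n, p), exact(1))
```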

There is an obvious recursion

$$s\_1(k,n) = (1-p\_k)s\_1(k+1,n) + p\_k s\_0(k+1,n),$$

which we can write as

$$\begin{aligned} s\_1(k,n) - s\_1(k+1,n) &= p\_k \{ s\_0(k+1,n) - s\_1(k+1,n) \} \\ &= p\_k\, s\_0(k+1,n) \left( 1 - \sum\_{i=k+1}^n \frac{p\_i}{1 - p\_i} \right) . \end{aligned} \tag{4}$$

Note that the sequence,

$$1 - \sum\_{i=k+1}^{n} \frac{p\_i}{1 - p\_i}, \quad 0 \le k \le n - 1,\tag{5}$$

has the sign pattern

$$-\,,\ \cdots,\ -\,,\ \ge 0\,,\ +\,,\ \cdots,\ +\,,$$

Let $k^\*$ be the first index at which the sign is non-negative. It follows that:

(i) $s\_0(k+1,n) \ge s\_1(k+1,n)$ if and only if $k \ge k^\*$;

(ii) the sequence $s\_1(\cdot\,, n)$ is unimodal in its first argument.
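The single sign change and the resulting index can be located numerically. In the sketch below the profile $p\_k = 1/(k+1)$ and $n = 20$ are illustrative choices (any profile with $p\_k < 1$ works the same way):

```python
# Locate k* and confirm the single sign change of the sequence (5).
n = 20
p = {k: 1.0 / (k + 1) for k in range(1, n + 1)}

def one_minus_tail(k):
    """The sequence (5): 1 - sum_{i=k+1}^n p_i/(1-p_i)."""
    return 1 - sum(p[i] / (1 - p[i]) for i in range(k + 1, n + 1))

signs = [one_minus_tail(k) for k in range(n)]
k_star = next(k for k in range(n) if signs[k] >= 0)

assert all(v < 0 for v in signs[:k_star])    # negative before k*
assert all(v >= 0 for v in signs[k_star:])   # non-negative from k* on
```

Since the tail sum decreases in $k$, the sequence (5) is increasing and the sign can change at most once.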


Each *A* ⊂ {1, ··· , *n*} corresponds to a stopping strategy in the discrete time problem [22,23]. We say that *A* wins if the index of the last success falls in *A* while no other index of success does.

**Lemma 1.** *Among all A* ⊂ {1, ··· , *n*}*, the set A*<sup>∗</sup> := {*k*<sup>∗</sup> + 1, ··· , *n*} *wins with the maximal probability.*

**Proof.** Clearly, $n \in A$ is necessary for *A* to be optimal. By induction, suppose we have shown that $\{k+1, \cdots, n\} \subset A$. Including *k* adds to the winning probability the amount

$$c \, p\_k \{ s\_0(k+1, n) - s\_1(k+1, n) \},$$

where $c \ge 0$ depends on $A \cap \{1, \cdots, k-1\}$ only. By the sign pattern above, this quantity is non-negative precisely for $k \ge k^\*$.

The next lemma improves upon Theorem 3.1 of [24] by offering a weaker condition for monotonicity.

**Lemma 2.** *For $k^\* = k^\*(n)$, if $p\_{k^\*+1} \ge p\_{n+1}$, then $\max\_k s\_1(k, n) \ge \max\_k s\_1(k, n+1)$.*

**Proof.** It is readily checked that the maximum value of $s\_1(\cdot\,, n+1)$ is achieved at either $k^\*$ or $k^\*+1$.

Firstly, compare the winning probability of $A^\*$ for $n$ trials with that of $B := \{k^\*+1, \cdots, n+1\}$ for $n+1$ trials. A difference results from the event that the $(n+1)$st trial is a success and the number of successes among trials $k^\*+1, \cdots, n$ does not exceed 1. Hence the difference of winning probabilities is

$$\left(s\_0(k^\*+1,n) - s\_1(k^\*+1,n)\right)p\_{n+1} = \left( 1 - \sum\_{i=k^\*+1}^n \frac{p\_i}{1-p\_i} \right) s\_0(k^\*+1,n)\, p\_{n+1} \ge 0.$$

Secondly, compare $A^\*$ with the other possible maximiser, $C := \{k^\*+2, \cdots, n+1\}$. The difference of the winning probabilities of $A^\*$ in the setting with $n$ trials and $C$ with $n+1$ trials has four components:


After simplification, (a) + (b) − (c) − (d) becomes

$$\left(1 - \sum\_{i=k^\*+2}^n \frac{p\_i}{1 - p\_i}\right) (p\_{k^\*+1} - p\_{n+1}),$$

which has the same sign as $p\_{k^\*+1} - p\_{n+1}$, because the first factor is non-negative by the optimality of $A^\*$.

#### *3.2. z-Strategies*

For *n* fixed, the winning probability of a *z*-strategy in state (*t*, *k*) does not depend on *t* and is given by a Bernstein polynomial in *z* ∈ [0, 1],

$$S\_1(k,n;z) := \sum\_{j=0}^{n-k-1} \binom{n-k}{j} z^j (1-z)^{n-k-j} s\_1(k+j+1,n). \tag{6}$$

In particular, $S\_1(k, n; 0) = s\_1(k+1, n)$ is the probability to win with next. Similarly,

$$S\_0(k,n;z) := \sum\_{j=0}^{n-k} \binom{n-k}{j} z^j (1-z)^{n-k-j} s\_0(k+j+1,n)$$

is the probability that none of the successes occurs in the time interval $(t + z(1-t), 1]$, so $S\_0(k, n; 0) = s\_0(k+1, n)$ equals the probability to win with bygone.

From (i) and (ii) above,

$$k \ge k^\* \Longleftrightarrow S\_0(k, n; 0) \ge S\_1(k, n; 0) \implies S\_1(k, n; 0) = \max\_z S\_1(k, n; z). \tag{7}$$

This is also valid for the maximum taken over *all* trapping actions.
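The Bernstein polynomial (6), together with its boundary values, can be checked numerically. This is an illustrative sketch (not from the paper); the profile and the chosen indices are arbitrary:

```python
from math import comb

def s0(a, n, p):
    out = 1.0
    for m in range(a, n + 1):
        out *= 1 - p[m]
    return out

def s1(a, n, p):
    """P(exactly one success among trials a..n), robust to p_m = 1."""
    total = 0.0
    for i in range(a, n + 1):
        term = p[i]
        for m in range(a, n + 1):
            if m != i:
                term *= 1 - p[m]
        total += term
    return total

def S1(k, n, z, p):                 # formula (6)
    return sum(comb(n - k, j) * z**j * (1 - z)**(n - k - j) * s1(k + j + 1, n, p)
               for j in range(n - k))

def S0(k, n, z, p):
    return sum(comb(n - k, j) * z**j * (1 - z)**(n - k - j) * s0(k + j + 1, n, p)
               for j in range(n - k + 1))

n = 12
p = {m: 1.0 / m for m in range(1, n + 1)}
assert abs(S1(3, n, 0.0, p) - s1(4, n, p)) < 1e-12   # z = 0 recovers next
assert abs(S0(3, n, 0.0, p) - s0(4, n, p)) < 1e-12   # z = 0 recovers bygone
```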

From the unimodality of $s\_1(\cdot\,, n)$ and the shape-preserving properties of the Bernstein polynomials (see [25], Theorem 3.3), it follows that (6) is unimodal. Thus, either the maximum is at 0 and next beats all *z*-strategies, or there exists a unique optimal *z*-strategy. The next result, stating that the optimum holds in a strong sense, is a continuous-time counterpart of Lemma 1.

**Theorem 1.** *If S*0(*k*, *n*; 0) < *S*1(*k*, *n*; 0) *then the optimal trapping action is a z-strategy with threshold determined as the unique maximiser of S*1(*k*, *n*; ·)*.*

**Proof.** By a change of variables we reduce the claim to the case $(t, k) = (0, 0)$. There is certainly a final interval that belongs to the optimal trap because, close to the terminal time, the probability of two or more successes is of order $o(1-t)$. Now, suppose $[z, 1]$ belongs to the trap and we are assessing whether the length element $[z - h, z]$ is worth including. The change of the winning probability due to the inclusion is a multiple of

$$\begin{aligned} &\sum\_{j=1}^n \binom{n-1}{j-1} z^{j-1}(1-z)^{n-j}\, p\_j\{s\_0(j+1,n) - s\_1(j+1,n)\}\, nh + o(h) \\ &\quad = (1-z)^{n-1}\sum\_{j=1}^n \binom{n-1}{j-1}\left(\frac{z}{1-z}\right)^{j-1} p\_j\{s\_0(j+1,n) - s\_1(j+1,n)\}\, nh + o(h), \end{aligned} \tag{8}$$

with some positive factor depending on the structure of the trap within [0, *z* − *h*]. By (4), in the variable *z*/(1 − *z*) the polynomial ∑(···) has at most one variation of sign in the coefficients. Applying Descartes' rule of signs, we see that the polynomial has at most one positive root. This implies that the optimal trap is a final interval with the cut-off coinciding with the root, or [0, 1] (action next) if there are no roots.

It remains to check that the root, if any, coincides with the maximiser of

$$S\_1(0,n;z) = \sum\_{j=0}^n \binom{n}{j} z^j (1-z)^{n-j} s\_1(j+1,n).$$

Indeed, we have for the derivative using (4)

$$\begin{aligned} D\_z S\_1(0, n; z) &= \\ \sum\_{j=1}^n \binom{n-1}{j-1} n z^{j-1} (1-z)^{n-j} s\_1(j+1, n) - \sum\_{j=0}^{n-1} \binom{n-1}{j} n z^j (1-z)^{n-j-1} s\_1(j+1, n) \\ &= \sum\_{k=1}^n (\cdots) - \sum\_{k=1}^n \binom{n-1}{k-1} n z^{k-1} (1-z)^{n-k} s\_1(k, n) \\ &= \sum\_{k=1}^n \binom{n-1}{k-1} n z^{k-1} (1-z)^{n-k} \{s\_1(k+1, n) - s\_1(k, n)\} \\ &= \sum\_{k=1}^n \binom{n-1}{k-1} n z^{k-1} (1-z)^{n-k} p\_k \{s\_1(k+1, n) - s\_0(k+1, n)\}, \end{aligned}$$

which is the negative of the polynomial in (8). This provides the desired conclusion.

#### *3.3. Examples*

The best-choice problem is related to the profile $p\_k = 1/k$. The associated Bernstein polynomials satisfy

$$S\_1(k, n; z) \to -z \log z, \quad n \to \infty,$$

where the convergence is uniform. Both the maximiser and the maximum value converge to $1/e$ as $n \to \infty$.

The case $k = 0$ was studied in much detail [13,14,17,26]. The winning probability of the *z*-strategy can alternatively be written as a Taylor polynomial

$$S\_1(0, n; z) = 1 - z - \sum\_{j=2}^{n} \frac{(1 - z)^j}{j(j - 1)},$$

which decreases pointwise to $z \mapsto -z \log z$ as $n$ increases (see Figure 1). The maximisers increase monotonically to $1/e$, and also $\max\_z S\_1(0, n; z) \downarrow 1/e$. These facts underlie the minimax property that the $1/e$-strategy ensures a winning probability of at least $1/e$ for every $n \ge 1$.
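The agreement between the Bernstein form and the Taylor polynomial can be verified directly; this is an illustrative sketch (not from the paper), with $n$ and the test points chosen arbitrarily:

```python
from math import comb, isclose

def s1(a, n, p):
    """P(exactly one success among trials a..n), robust to p_m = 1."""
    total = 0.0
    for i in range(a, n + 1):
        term = p[i]
        for m in range(a, n + 1):
            if m != i:
                term *= 1 - p[m]
        total += term
    return total

def S1_bernstein(n, z, p):          # formula (6) with k = 0
    return sum(comb(n, j) * z**j * (1 - z)**(n - j) * s1(j + 1, n, p)
               for j in range(n))

def S1_taylor(n, z):                # the Taylor-polynomial form
    return 1 - z - sum((1 - z)**j / (j * (j - 1)) for j in range(2, n + 1))

n = 10
p = {m: 1.0 / m for m in range(1, n + 1)}
for z in (0.0, 0.2, 0.3678794411714423, 0.9):
    assert isclose(S1_bernstein(n, z, p), S1_taylor(n, z))
```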

The nice monotonicity properties do not extend to $k > 0$: the minimax value is below $1/e$ and the $1/e$-strategy is not minimax. This is already seen in the case $k = 1$, where the Bernstein polynomials become

$$\begin{aligned} S\_1(1, n; z) &= \quad \frac{n-1}{n} - \sum\_{j=2}^{n-1} \frac{(n-j)(1-z)^j}{n j(j-1)} \\ &= \quad S\_1(0, n; z) + \sum\_{j=1}^{n-1} \frac{(1-z)^{j+1}}{n j} - \frac{(1-z)}{n}. \end{aligned}$$

The first formula is derived by conditioning on the highest rank *j* of trials that occur before the threshold of *z*-strategy.

**Figure 1.** The winning probability $S\_1(k, n; z)$ of the *z*-strategy in the best-choice problem for $k = 0$ and $1$.

The more general profile

$$p\_k = \frac{\theta}{\theta + k - 1}, \quad k \ge 1,\tag{9}$$

with parameter *θ* > 0, plays a central role in the combinatorial structures related to the Ewens sampling formula for random partitions [27]. The term *Karamata–Stirling law* was coined in [28] for the distribution of the number of successes with these probabilities. The number of successes in trials *k* + 1, ··· , *n* has probability generating function

$$
\lambda \mapsto \frac{(k + \theta \lambda)\_{n-k}}{(k + \theta)\_{n-k}}.
$$

As $n \to \infty$, $S\_1(k, n; z) \to -\theta z^{\theta} \log z$. The maximum values still converge to $1/e$, but the maximisers approach $e^{-1/\theta}$. The shapes vary considerably with $\theta$; see Figure 2. For large $\theta$, the minimax winning probability is close to zero.

**Figure 2.** Bernstein polynomials for $p\_k = \theta/(\theta + k - 1)$.

#### **4. Random Number of Trials:** *z***-Strategies**

We proceed with the continuous time setting, assuming *p* and *π* are given. In state (*t*, *k*), the probability of isolating the last success by means of a *z*-strategy is a convex mixture of the Bernstein polynomials:

$$\mathcal{S}\_1(t,k;z) := \sum\_{j=1}^{\infty} \pi(j|t,k) \sum\_{i=0}^{j-1} \binom{j}{i} z^i (1-z)^{j-i} s\_1(k+i+1,k+j). \tag{10}$$

The *z* = 0 instance,

$$\mathcal{S}\_1(t,k;0) = \sum\_{j=1}^{\infty} \pi(j|t,k)\,s\_1(k+1,k+j)$$

is the probability to win with next, and $\mathcal{S}\_1(t, k; 1) = 0$. Similarly, the probability that none of the successes is trapped by the *z*-strategy is:

$$\mathcal{S}\_0(t,k;z) := \sum\_{j=0}^{\infty} \pi(j|t,k) \sum\_{i=0}^{j} \binom{j}{i} z^i (1-z)^{j-i} s\_0(k+i+1,k+j),$$

and $\mathcal{S}\_0(t, k; 0)$ is the probability to win with bygone.

Being a convex mixture of unimodal functions, S1(*t*, *k*; ·) itself need not be unimodal. Accordingly, the optimal trap need not be a final interval. It may rather include a few disjoint intervals akin to 'islands' in the discrete time best-choice problems [29].

Concavity is a simple condition to ensure unimodality. We say that *s*1(·, *n*) is concave if for every *n* ≥ 1 the second difference in the first variable is non-positive.

**Theorem 2.** *Suppose s*1(·, *n*) *is concave. Then* S1(*t*, *k*; ·) *is unimodal with maximum at some z*∗*. If z*<sup>∗</sup> ∈ (0, 1) *then for z* = *z*<sup>∗</sup> *the z-strategy is optimal among all trapping actions, and if z*<sup>∗</sup> = 0 *then* next *outperforms every trapping action.*

**Proof.** By the shape-preserving properties of Bernstein polynomials [25], the internal sum in (10) is a concave function in *z*, therefore the mixture S1(*t*, *k*; ·) is also concave hence unimodal. The maximum is attained at 0 if *Dz*S1(*t*, *k*; 0) ≤ 0, and *z*<sup>∗</sup> > 0 otherwise. The overall optimality follows from the unimodality as in Theorem 1.

The concavity is easy to express in terms of *p* explicitly. The second difference in the variable *k* of the probability generating function

$$
\lambda \mapsto \prod\_{j=k}^n (1 - p\_j + \lambda p\_j),
$$

becomes

$$\left\{ (1 - p\_k + \lambda p\_k)(1 - p\_{k+1} + \lambda p\_{k+1}) - 2(1 - p\_{k+1} + \lambda p\_{k+1}) + 1 \right\} \prod\_{j=k+2}^n (1 - p\_j + \lambda p\_j).$$

Computing $D\_{\lambda}$ at $\lambda = 0$ yields, up to the positive factor $s\_0(k+2, n)$, the second difference of $s\_1(\cdot\,, n)$:

$$(p\_k - 2p\_k p\_{k+1} - p\_{k+1}) + (p\_k p\_{k+1} - p\_k + p\_{k+1}) \sum\_{j=k+2}^n \frac{p\_j}{1 - p\_j}.\tag{11}$$

From this, a sufficient condition for the concavity of *s*1(·, *n*) is

$$p\_k - 2p\_k p\_{k+1} - p\_{k+1} \le 0, \quad p\_k p\_{k+1} - p\_k + p\_{k+1} \le 0, \quad k \ge 1. \tag{12}$$

Notably, (12) ensures unimodality for arbitrary *π* and only involves two consecutive success probabilities. The price to pay for the simplicity is that the condition is restrictive, as seen in Figure 3.

**Figure 3.** The concavity condition (12) holds for profiles *p* with (*pk*, *pk*<sup>+</sup>1) squeezed between the parabolas.

For the profile (9), a direct calculation shows that (11) is non-positive, hence $s\_1(\cdot\,, n)$ is concave, if and only if

$$\frac{1}{2} \le \theta \le 1.$$

This covers only part of the parameter range, but it includes the two cases most important for applications, $\theta = 1$ and $\theta = 1/2$.
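The characterisation can be probed numerically. The sketch below evaluates expression (11) for the profile (9) on a grid; the values $\theta = 0.75$ (inside the range) and $\theta = 3$ (outside) are illustrative:

```python
def second_diff_sign(theta, k, n):
    """Expression (11) for the profile p_k = theta/(theta + k - 1)."""
    p = lambda m: theta / (theta + m - 1)
    tail = sum(p(j) / (1 - p(j)) for j in range(k + 2, n + 1))
    return (p(k) - 2 * p(k) * p(k + 1) - p(k + 1)
            + (p(k) * p(k + 1) - p(k) + p(k + 1)) * tail)

inside = [second_diff_sign(0.75, k, n) for n in range(3, 30) for k in range(1, n - 1)]
outside = [second_diff_sign(3.0, k, n) for n in range(3, 30) for k in range(1, n - 1)]
assert all(v <= 1e-9 for v in inside)    # concave for theta in [1/2, 1]
assert any(v > 1e-9 for v in outside)    # concavity fails for theta = 3
```

For this profile the two coefficients in (11) reduce to $\theta(1-2\theta)$ and $\theta(\theta-1)$ over a positive denominator, which is where the range $[1/2, 1]$ comes from.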

#### **5. Tests for the Monotone Case of Optimal Stopping**

Using (2) and (3), we can cast the winning probabilities with actions bygone, next and a *z*-strategy as:

$$\begin{aligned} \mathcal{S}\_0(t, k; 0) &= \quad f\_k(\mathbf{x}) P\_k(\mathbf{x}), \\ \mathcal{S}\_1(t, k; 0) &= \quad f\_k(\mathbf{x}) Q\_k(\mathbf{x}), \\ \mathcal{S}\_1(t, k; z) &= \quad f\_k(\mathbf{x}) R\_k(\mathbf{x}, z), \end{aligned} \tag{13}$$

where *x* = *q*(1 − *t*) and

$$\begin{aligned} P\_k(\mathbf{x}) &:= \sum\_{j=0}^{\infty} \binom{k+j}{j} w\_{k+j} \mathbf{x}^j s\_0(k+1, k+j), \\ Q\_k(\mathbf{x}) &:= \sum\_{j=1}^{\infty} \binom{k+j}{j} w\_{k+j} \mathbf{x}^j s\_1(k+1, k+j), \\ R\_k(\mathbf{x}, z) &:= \sum\_{j=1}^{\infty} \binom{k+j}{j} w\_{k+j} \mathbf{x}^j \sum\_{i=0}^{j-1} \binom{j}{i} z^i (1-z)^{j-i} s\_1(k+i+1, k+j). \end{aligned}$$

Thus, $Q\_k(\mathbf{x}) = R\_k(\mathbf{x}, 0)$. Next we look at some critical points for the trapping game and the optimal stopping problem.

**Lemma 3.** *The equation $P\_k(x) = Q\_k(x)$ has at most one root $\alpha\_k > 0$, for every $k \ge 1$.*

**Proof.** Coefficients of the series *Pk*(*x*) − *Qk*(*x*) have at most one change of sign from + to −, hence Descartes' rule of signs for power series [30] entails that there is at most one positive root.

We set $\alpha\_k = \infty$ if the root does not exist. Define the cut-off

$$a\_k := \left(1 - \frac{\alpha\_k}{q}\right)\_+.$$

This is the earliest time when bygone becomes at least as good as next. Keep in mind that if the sequence $(\alpha\_k)$ is monotone, then $(a\_k)$ is also monotone, but with the direction of monotonicity reversed. The monotone case of optimal stopping holds for every $q$, hence $\tau^\*$ is optimal, if $\alpha\_k \uparrow$.

**Example 1.** *In the paradigmatic case $p\_k = 1/k$ and the geometric prior with $w\_n = 1$, we have*

$$s\_0(k+1,n) = \frac{k}{n}, \quad s\_1(k+1,n) = \frac{k}{n} \sum\_{j=k+1}^n \frac{1}{j-1},$$

*and explicitly computable power series*

$$P\_k(\mathbf{x}) = \frac{1}{(1-\mathbf{x})^{k}}, \quad Q\_k(\mathbf{x}) = \frac{|\log(1-\mathbf{x})|}{(1-\mathbf{x})^{k}}.$$

*The equation $P\_k(x) = Q\_k(x)$ yields identical roots $\alpha\_k = 1 - 1/e$ and coinciding cut-offs $a\_k = (1 - (1-e^{-1})/q)\_+$. Thus, $\tau^\*$ stops at the first success trial after a time threshold. See [1,7,17–19] for details on this remarkable case.*
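Example 1 is easy to reproduce numerically by truncating the defining series for $P\_k$ and $Q\_k$ and comparing with the closed forms; the truncation depth and test point below are illustrative:

```python
from math import comb, isclose, log, e

def s0(a, n):                       # profile p_k = 1/k: prod_{m=a}^{n} (1 - 1/m)
    return (a - 1) / n

def s1(a, n):
    return s0(a, n) * sum(1.0 / (j - 1) for j in range(a, n + 1))

def P(k, x, terms=400):             # geometric prior: w_n = 1
    return sum(comb(k + j, j) * x**j * s0(k + 1, k + j) for j in range(terms))

def Q(k, x, terms=400):
    return sum(comb(k + j, j) * x**j * s1(k + 1, k + j) for j in range(1, terms))

k, x = 3, 0.5
assert isclose(P(k, x), 1 / (1 - x)**k, rel_tol=1e-9)
assert isclose(Q(k, x), -log(1 - x) / (1 - x)**k, rel_tol=1e-9)
assert isclose(-log(1 - (1 - 1 / e)), 1.0)   # P_k = Q_k at x = 1 - 1/e
```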

**Lemma 4.** *The equation $D\_zR\_k(x, 0) = 0$ has at most one root $\beta\_k > 0$, for every $k \ge 0$. If the root exists, then $\beta\_k \le \alpha\_{k+1}$.*

**Proof.** We follow the argument in Lemma 3. The derivative at *z* = 0 is

$$D\_z R\_k(\mathbf{x}, 0) = p\_{k+1} \sum\_{j=1}^{\infty} \binom{k+j}{j} w\_{k+j}\, j\, \mathbf{x}^j \left\{ s\_0(k+2, k+j) - s\_1(k+2, k+j) \right\},$$

which has at most one change of sign for *x* ≥ 0, and then from + to −. Furthermore,

$$\begin{aligned} D\_{z}R\_k(\mathbf{x},0) & \ge p\_{k+1} \sum\_{j=1}^{\infty} \binom{k+j}{j} w\_{k+j} \mathbf{x}^j \{s\_0(k+2,k+j) - s\_1(k+2,k+j)\} \\ & = p\_{k+1} \{P\_{k+1}(\mathbf{x}) - Q\_{k+1}(\mathbf{x})\}. \end{aligned}$$

The inequality follows by comparing the two series and noting that the weights of the positive terms in $D\_zR\_k$ are higher.

If there is no finite root, we set *β<sup>k</sup>* = ∞. Let

$$b\_k := \left(1 - \frac{\beta\_k}{q}\right)\_+.$$

We have $D\_zR\_k(q(1-t), 0) < 0$ for $t \in (b\_k, 1]$, and $b\_k \ge a\_{k+1}$ by Lemma 4. Thus, $b\_k$ is the earliest time from which the action next at index $k$ cannot be improved by a *z*-strategy with small enough $z$.

To summarise the above: for $t < a\_k$ the action next is better than bygone, and for $t < b\_k$ a trapping strategy is better than next.

**Theorem 3.** *The optimal stopping problem belongs to the monotone case (for every admissible q) if and only if $\alpha\_1 \le \alpha\_2 \le \cdots$. In that case, we have the interlacing pattern of roots*

$$
\cdots \le \alpha_k \le \beta_k \le \alpha_{k+1} \le \beta_{k+1} \le \cdots. \tag{14}
$$

**Proof.** We argue in probabilistic terms. The bivariate sequence of success epochs (*t*, *k*)◦ is an increasing Markov chain. The monotone case of optimal stopping occurs iff the set of states where bygone outperforms next is closed, which holds iff this is an upper subset with respect to the partial order in [0, 1] × {1, 2, ···}. The latter property amounts to the monotonicity condition *α<sup>k</sup>* ↑.

By Lemma 4, the inequality $\beta\_k \le \alpha\_{k+1}$ always holds. In the monotone case, if in some state $(t, k)^\circ$ the actions bygone and next are equally good, then trapping cannot improve upon these by the optimality of the myopic strategy. In analytic terms, the above translates into the inequality $\alpha\_k \le \beta\_k$.

#### **6. The Best-Choice Problem under the Log-Series Prior**

In this section we consider the random records model with the classic profile *pk* = 1/*k*, and a pacing process with the logarithmic series prior

$$
\pi\_{\mathfrak{n}} = \mathfrak{c}(q) \frac{q^n}{n}, \quad n \ge 1,\tag{15}
$$

(so $\pi\_0 = 0$), where $0 < q < 1$ and $c(q) = |\log(1-q)|^{-1}$. See [31] for Poisson mixture representations of $\pi$. The function $\mathcal{S}\_1(t, k; \cdot)$ is concave, hence by Theorem 2 it is sufficient to consider *z*-strategies.

Let *T*<sup>1</sup> be the time of the first trial.

**Lemma 5.** *Under the logarithmic series prior* (15) *the pacing process has the following features:* (*i*) *The time of the first trial T*<sup>1</sup> *has probability density function*

$$t \mapsto \frac{c(q) \, q}{1 - (1 - t)q}, \quad t \in [0, 1].$$

(*ii*) (*Nt*, *t* ∈ [0, 1]) *is a Pólya-Lundberg birth process with transition rates*

$$\mathbb{P}(N\_{t+\mathbf{d}t} - N\_t = 1 \mid N\_t = k) = \begin{cases} \frac{c\left((1-t)q\right)q}{1 - (1-t)q}, & k = 0, \\\frac{k}{t+q^{-1}-1}, & k \ge 1. \end{cases}$$

(*iii*) *Given Nt* = *k, the posterior distribution π*(· | *t*, *k*) *of N*<sup>1</sup> − *Nt is* NB(*k*,(1 − *t*)*q*). *In particular, conditionally on T*<sup>1</sup> = *t*1*, the posterior distribution is geometric with the 'failure' probability* (1 − *t*1)*q.*

**Proof.** Assertion (i) follows from

$$\mathbb{P}(T\_1 > t) = \mathbb{P}(N\_t = 0) = \sum\_{n=1}^{\infty} \frac{c(q)q^n(1-t)^n}{n},$$

and (iii) from the identity

$$
\binom{k+j}{j} \frac{x^j}{k+j} = \binom{k+j-1}{j} \frac{x^j}{k}
$$

underlying the formula for *π*(*j*|*t*, *k*) in terms of *x* = (1 − *t*)*q*.
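Part (iii) can be verified numerically: the unnormalised posterior of $N\_1 - N\_t$ given $N\_t = k$ is proportional to $\pi\_{k+j}\binom{k+j}{k}t^k(1-t)^j$, and after normalisation it should match the NB$(k, x)$ probability mass function. The parameter values below are illustrative:

```python
from math import comb, isclose

q, t, k = 0.6, 0.4, 2
x = (1 - t) * q

# Unnormalised posterior under the log-series prior pi_n ~ q^n / n.
post = [q**(k + j) / (k + j) * comb(k + j, k) * t**k * (1 - t)**j
        for j in range(200)]
norm = sum(post)
post = [v / norm for v in post]

# Negative binomial NB(k, x) pmf: C(k+j-1, j) (1-x)^k x^j.
nb = [comb(k + j - 1, j) * (1 - x)**k * x**j for j in range(200)]

assert all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(post, nb))
```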

In view of part (ii), we will use NB(0, *q*) to denote the log-series prior (15).

## *6.1. Hypergeometrics*

The power series of interest can be expressed via the Gaussian hypergeometric function

$$F(a,b;c;x) := \sum\_{j=0}^{\infty} \frac{(a)\_j (b)\_j}{(c)\_j} \frac{x^j}{j!}.$$

Recall the differentiation formula

$$D\_{\mathbf{x}}F(a,b;c;\mathbf{x}) = \frac{ab}{c}F(a+1,b+1;c+1;\mathbf{x}),$$

the parameter transformation formula

$$F(a,b;c;\mathbf{x}) = (1-\mathbf{x})^{c-a-b} F(c-a,c-b;c;\mathbf{x}),$$

and Euler's integral representation for *c* > *b* > 0

$$F(a,b;c;x) = \frac{\Gamma(c)}{\Gamma(b)\Gamma(c-b)} \int\_0^1 \frac{y^{b-1}(1-y)^{c-b-1}dy}{(1-xy)^a}.$$

The probability generating function for the number of successes following state (*t*, *k*), for *k* ≥ 1, is given by a hypergeometric function:

$$\begin{aligned} \lambda &\mapsto (1-x)^k \sum\_{j=0}^{\infty} \binom{k+j-1}{j} x^j \frac{(k+\lambda)\_j}{(k+1)\_j} = \\ & (1-x)^k \sum\_{j=0}^{\infty} \frac{(k)\_j (k+\lambda)\_j}{(k+1)\_j} \frac{x^j}{j!} = \\ & (1-x)^k \, F(k+\lambda, k; k+1; x). \end{aligned}$$

Expanding at *λ* = 0 we identify two basic power series as:

$$\begin{array}{rcl}P\_k(\mathbf{x})&=&k^{-1}\,F(k,k;k+1;\mathbf{x}),\\Q\_k(\mathbf{x})&=&k^{-1}\,D\_aF(k,k;k+1;\mathbf{x}),\end{array}$$

where as before *x* = (1 − *t*)*q* ∈ [0, 1] and *Da* is the derivative in the first parameter. The differentiation formula implies backward recursions:

$$\begin{array}{rcl}D\_{\mathbf{x}}P\_{k}(\mathbf{x})&=&kP\_{k+1}(\mathbf{x}),\\D\_{\mathbf{x}}Q\_{k}(\mathbf{x})&=&P\_{k+1}(\mathbf{x})+k\,Q\_{k+1}(\mathbf{x}).\end{array} \tag{16}$$
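The recursions (16) can be confirmed on truncated series. The sketch below builds the Taylor coefficients of $P\_k$ and $Q\_k$ from the hypergeometric representation (the coefficient formulas $a\_j = (k)\_j/((k+j)\,j!)$ and $b\_j = a\_j\sum\_{m<j}1/(k+m)$ follow from the series for $F$ and $D\_aF$) and compares the termwise-differentiated series numerically; the truncation depth and test point are illustrative:

```python
from math import isclose, log

def P_coeffs(k, terms):
    """Coefficients of P_k(x) = k^{-1} F(k,k;k+1;x): a_j = (k)_j / ((k+j) j!)."""
    a = [1.0 / k]
    for j in range(1, terms):
        a.append(a[-1] * (k + j - 1) ** 2 / (j * (k + j)))
    return a

def Q_coeffs(k, terms):
    """Coefficients of Q_k(x): b_j = a_j * sum_{m=0}^{j-1} 1/(k+m)."""
    a = P_coeffs(k, terms)
    h, b = 0.0, []
    for j in range(terms):
        b.append(a[j] * h)
        h += 1.0 / (k + j)
    return b

def ev(c, x):
    return sum(cj * x**j for j, cj in enumerate(c))

def ev_deriv(c, x):
    return sum(j * cj * x**(j - 1) for j, cj in enumerate(c) if j)

k, x, T = 2, 0.4, 300
assert abs(ev_deriv(P_coeffs(k, T), x) - k * ev(P_coeffs(k + 1, T), x)) < 1e-8
assert abs(ev_deriv(Q_coeffs(k, T), x)
           - ev(P_coeffs(k + 1, T), x) - k * ev(Q_coeffs(k + 1, T), x)) < 1e-8
# Sanity: for k = 1, P_1(x) = |log(1-x)|/x (Section 6.2).
assert isclose(ev(P_coeffs(1, T), x), -log(1 - x) / x, rel_tol=1e-9)
```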

The normalisation function for the probabilities (13) is $f\_k(\mathbf{x}) = k(1-\mathbf{x})^k$ for $k \ge 1$, and $f\_0(\mathbf{x}) = |\log(1-\mathbf{x})|^{-1}$. Applying the transformation formula yields $P\_k(\mathbf{x}) = k^{-1}(1-\mathbf{x})^{1-k}F(1,1;k+1;\mathbf{x})$; hence, we may write the winning probability with bygone as the series

$$\mathcal{S}\_0(t,k;0) = (1-x) \sum\_{j=0}^{\infty} \frac{j! \, x^j}{(k+1)\_j}, \quad x = (1-t)q.$$

It is readily seen that, as *k* increases, this function decreases to 1 − *x*. This result was already observed in [18] using a probabilistic argument. The convergence to 1 − *x* relates to the fact that for large *k*, the point process of record epochs approaches a Poisson process.

For $R\_k(\mathbf{x}, z)$, we derive an integral formula. Consider first the case $k \ge 1$. The number of record epochs following $(t, k)$ and falling in the final interval $[t + z(1-t), 1]$ has probability generating function

$$\begin{split} \lambda & \mapsto (1-x)^k \sum\_{j=0}^{\infty} \binom{k+j-1}{j} x^j \sum\_{i=0}^j \binom{j}{i} z^i (1-z)^{j-i} \frac{(k+i+\lambda)\_{j-i}}{(k+i+1)\_{j-i}} = \\ & (1-x)^k \sum\_{i=0}^{\infty} \binom{k+i-1}{i} (xz)^i F(k+i+\lambda, k+i; k+i+1; \mathbf{x} - \mathbf{x}z) = \\ & \qquad k(1-x)^k \sum\_{i=0}^{\infty} \binom{k+i}{i} (xz)^i \int\_0^1 \frac{y^{k+i-1}\, \mathrm{d}y}{(1-\mathbf{x}y+\mathbf{x}yz)^{k+i+\lambda}} = \\ & \qquad k(1-x)^k \int\_0^1 \frac{y^{k-1}(1-\mathbf{x}y+\mathbf{x}yz)^{1-\lambda}\, \mathrm{d}y}{(1-\mathbf{x}y)^{k+1}}. \end{split}$$

Differentiating at $\lambda = 0$ yields $\mathcal{S}\_1(t, k; z)$, which is the same as $k(1-x)^k R\_k(x, z)$ for $x = (1-t)q$, whence

$$R\_k(\mathbf{x}, z) = \int\_0^1 \frac{y^{k-1} (1 - \mathbf{x}y + \mathbf{x}yz) |\log(1 - \mathbf{x}y + \mathbf{x}yz)| \mathbf{d}y}{(1 - \mathbf{x}y)^{k+1}}.\tag{17}$$

For *k* = 0, a similar calculation with log-series weights NB(0, *x*) gives

$$R\_0(\mathbf{x}, z) = \int\_0^1 \frac{(1 - \mathbf{x}y + \mathbf{x}yz)\, |\log(1 - \mathbf{x}y + \mathbf{x}yz)|}{y(1 - \mathbf{x}y)} \, \mathrm{d}y.$$

## *6.2. The Myopic Strategy*

The positive root obtained by equating

$$P\_1(\mathbf{x}) = \frac{|\log(1-\mathbf{x})|}{\mathbf{x}} \quad \text{and} \quad Q\_1(\mathbf{x}) = \frac{|\log(1-\mathbf{x})|^2}{2\mathbf{x}}$$

is $\alpha\_1 = 1 - e^{-2} = 0.864665\cdots$. On the other hand, solving $D\_zR\_1(x, 0) = 0$ yields a smaller value $\beta\_1 = 0.756004\cdots$; hence the interlacing condition of Theorem 3 fails for $k = 1$. Translated into the terms of the best-choice problem, this means that $\tau^\*$ stops at the first trial if this occurs before $a\_1 = (1 - \alpha\_1/q)\_+$, but a *z*-strategy is more beneficial for the bigger range of times $t \le b\_1 = (1 - \beta\_1/q)\_+$. Therefore, at least for $q > \beta\_1$, it is not optimal to stop at the first trial before $b\_1$, and the myopic strategy can be beaten.
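The value $\alpha\_1 = 1 - e^{-2}$ can be recovered by bisection on the closed forms of $P\_1$ and $Q\_1$ (the root $\beta\_1$ requires the integral representation of $R\_1$ and is not repeated here); the bracket below is an arbitrary choice:

```python
from math import log, e

P1 = lambda x: -log(1 - x) / x
Q1 = lambda x: log(1 - x) ** 2 / (2 * x)

# Bisection for the root of P_1 = Q_1, i.e. |log(1-x)| = 2.
lo, hi = 0.5, 0.99
for _ in range(200):
    mid = (lo + hi) / 2
    if Q1(mid) < P1(mid):
        lo = mid
    else:
        hi = mid

assert abs(lo - (1 - e**-2)) < 1e-12
```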

The root $\alpha\_2 = 0.755984\cdots$ is found by equating

$$P\_2(\mathbf{x}) = \frac{2(\mathbf{x} - L + \mathbf{x}L)}{(1 - \mathbf{x})\mathbf{x}^2} \quad \text{and} \quad Q\_2(\mathbf{x}) = \frac{-2\mathbf{x} + 2L - L^2 + \mathbf{x}L^2}{(1 - \mathbf{x})\mathbf{x}^2},$$

where, for shorthand, $L := -\log(1-\mathbf{x})$. The formulas become more complicated for larger $k$.

We see that $\alpha\_1 > \alpha\_2$, which suggests monotonicity of the whole sequence. To show this, pass to the quotient and re-define the root $\alpha\_k$ as the unique solution on $[0, 1)$ to

$$\frac{Q\_k(\mathbf{x})}{P\_k(\mathbf{x})} = 1 \quad \Longleftrightarrow \quad \frac{D\_a F(k, k; k+1; \mathbf{x})}{F(k, k; k+1; \mathbf{x})} = 1,\tag{18}$$

where *Da* acts in the first parameter. As *x* increases from 0 to 1, this logarithmic derivative runs from 0 to ∞.

**Lemma 6.** *The logarithmic derivative* (18) *increases in k; hence the sequence of roots $\alpha\_k$ is strictly decreasing.*

**Proof.** Euler's integral specialises as:

$$F(k+\lambda,k;k+1;\mathbf{x}) = k \int\_0^1 \frac{y^{k-1}}{(1-xy)^{k+\lambda}}\, \mathrm{d}y.$$

Expanding in parameter at *λ* = 0 gives the integral representations

$$P\_k(\mathbf{x}) = \int\_0^1 \frac{y^{k-1}}{(1-\mathbf{x}y)^k} \mathbf{d}y, \qquad Q\_k(\mathbf{x}) = \int\_0^1 \frac{y^{k-1}|\log(1-\mathbf{x}y)|}{(1-\mathbf{x}y)^k} \mathbf{d}y.$$

From these formulas,

$$\begin{split} Q\_k(\mathbf{x}) P\_{k+1}(\mathbf{x}) &= \int\_0^1 \frac{y^{k-1} |\log(1-\mathbf{x}y)|}{(1-\mathbf{x}y)^k} \mathbf{d}y \int\_0^1 \frac{z^k}{(1-\mathbf{x}z)^{k+1}} \mathbf{d}z \\ &= \int\_0^1 \int\_0^1 \frac{y^{k-1} z^{k-1} |\log(1-\mathbf{x}y)|}{(1-\mathbf{x}y)^k (1-\mathbf{x}z)^k} \frac{z}{(1-\mathbf{x}z)} \mathbf{d}y \mathbf{d}z. \end{split}$$

By the same kind of argument, a similar formula is obtained for *Qk*<sup>+</sup>1(*x*)*Pk*(*x*). Splitting the integration domain, and using symmetries of the integrand yields for *x* ∈ [0, 1)

$$\begin{aligned} Q\_k(\mathbf{x})P\_{k+1}(\mathbf{x}) - Q\_{k+1}(\mathbf{x})P\_k(\mathbf{x}) &= \int\_0^1\!\!\int\_0^1 \frac{y^{k-1}z^{k-1}|\log(1-xy)|}{(1-xy)^{k+1}(1-xz)^{k+1}}\,(z-y)\, \mathrm{d}y\, \mathrm{d}z \\ &= \iint\_{y<z} \frac{y^{k-1}z^{k-1}(z-y)\left\{|\log(1-xy)| - |\log(1-xz)|\right\}}{(1-xy)^{k+1}(1-xz)^{k+1}}\, \mathrm{d}y\, \mathrm{d}z \;\le\; 0, \end{aligned}$$

which implies the asserted monotonicity.
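For $k = 1$, the integral representations reduce to the closed forms of Section 6.2, which gives a quick quadrature check (Simpson's rule; the test point $x = 0.7$ is arbitrary):

```python
from math import log, isclose

def simpson(f, a, b, n=4000):       # composite Simpson's rule, n even
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    return s * h / 3

x, k = 0.7, 1
P_int = simpson(lambda y: y**(k - 1) / (1 - x * y)**k, 0, 1)
Q_int = simpson(lambda y: y**(k - 1) * abs(log(1 - x * y)) / (1 - x * y)**k, 0, 1)

assert isclose(P_int, -log(1 - x) / x, rel_tol=1e-8)
assert isclose(Q_int, log(1 - x)**2 / (2 * x), rel_tol=1e-8)
```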

Figure 4 shows some shapes of $f\_k(x)P\_k(x)$ and $f\_k(x)Q\_k(x)$ for $k = 1, 2, 3$.

**Figure 4.** The next and bygone curves for $k = 1, 2, 3$.

The log-series distribution weights satisfy $w\_{n+1}/w\_n \uparrow 1$. Comparison with the geometric distribution, as in [19], in combination with the lemma gives $\alpha\_k \downarrow 1 - 1/e$ as $k \to \infty$. The same limit has been shown for the analogous roots in the best-choice problem with the negative binomial prior NB$(\nu, q)$ for integer $\nu \ge 1$; however, the direction of monotonicity in that setting is different [7].

To summarise the findings of this section, we have:

**Theorem 4.** *The monotone case of optimal stopping does not hold. The myopic strategy τ*∗ *is not optimal and has the following features:*


## *6.3. Optimality and Bounds*

For state $(t, k)$ and $x = q(1-t)$, define the *continuation value* $V\_k(x)$ to be the maximum probability of the best choice achievable by stopping strategies starting in that state. By the optimality principle, the overall optimal stopping strategy, starting from $(0,0)$, stops at the first record $(t, k)^\circ$ satisfying $k(1-x)^kP\_k(x) \ge V\_k(x)$.

Given $N\_t = k$, let $T\_{k+1}$ be the next trial epoch (or 1 in the event $N\_1 = k$). Similarly to the argument in Lemma 5, we find that the random variable $(1 - T\_{k+1})/(1-t)$ has density

$$y \mapsto \frac{kx(1-x)^k}{(1-x+xy)^{k+1}}, \ y \in (0,1].$$

At the $(k+1)$st trial, the optimal stopping strategy stops if this is a record and bygone is more beneficial than the optimal continuation; hence, integrating out $T\_{k+1}$, we obtain

$$V\_k(\mathbf{x}) = \int\_0^1 \left[ \frac{1}{k+1} \max\{ (1-y)^{k+1} P\_{k+1}(y), V\_{k+1}(y) \} + \frac{k}{k+1} V\_{k+1}(y) \right] \frac{k \mathbf{x} (1-\mathbf{x})^k \mathbf{d}y}{(1-\mathbf{x}+\mathbf{x}y)^{k+1}}.$$

This has the equivalent differential form for *k* ≥ 1,

$$\frac{1}{k}\left(1-\mathbf{x}\right)D\_{\mathbf{x}}V\_{k}(\mathbf{x}) = \frac{k}{k+1}\left((1-\mathbf{x})^{k+1}P\_{k+1}(\mathbf{x}) - V\_{k+1}(\mathbf{x})\right)\_{+} + k\{V\_{k+1}(\mathbf{x}) - V\_{k}(\mathbf{x})\}.\tag{19}$$

For the special instance *k* = 0, integrating out the variable *T*<sup>1</sup> gives

$$V\_0(\mathbf{x}) = \int\_0^1 \max\left( (1-y)P\_1(y), V\_1(y) \right) \frac{\mathrm{d}y}{(1-\mathbf{x}+\mathbf{x}y)\,|\log(1-\mathbf{x})|}\,,$$

or, in the differential form with initial conditions *V*0(0) = 1 and *Vk*(0) = 0, for *k* ≥ 1

$$(1-\mathbf{x})|\log(1-\mathbf{x})|\,D\_{\mathbf{x}}V\_{0}(\mathbf{x})=\max\{(1-\mathbf{x})P\_{1}(\mathbf{x}),V\_{1}(\mathbf{x})\}-V\_{0}(\mathbf{x}).\tag{20}$$

By Corollary 4, the continuation value coincides with the winning probability of next in a segment of the range; therefore:

$$V\_k(\mathbf{x}) = k(1 - \mathbf{x})^k Q\_k(\mathbf{x}), \quad \text{for } 0 \le \mathbf{x} \le 1 - 1/\mathbf{e}, \ k \ge 0. \tag{21}$$

As a check, for $k \ge 1$ let $\mathcal{V}\_k(\mathbf{x}) := k^{-1}(1-\mathbf{x})^{-k}V\_k(\mathbf{x})$. With this change of variables, (19) simplifies to

$$D\_x \mathcal{V}\_k(\mathbf{x}) = (P\_{k+1}(\mathbf{x}) - \mathcal{V}\_{k+1}(\mathbf{x}))\_+ + (k+1)\,\mathcal{V}\_{k+1}(\mathbf{x}).$$

For $x$ in the range where $P\_{k+1}(\mathbf{x}) - \mathcal{V}\_{k+1}(\mathbf{x}) \ge 0$, this becomes the recursion (16).

Outside the range covered by (21), Equations (19) and (20) should be complemented by a '*k* = ∞' boundary condition

\lim\_{k \to \infty} V\_k(\mathbf{x}) = \begin{cases} 1/e, & \text{for } 1 - 1/e \le \mathbf{x} \le 1, \\ -(1 - \mathbf{x})\log(1 - \mathbf{x}), & \text{for } 0 \le \mathbf{x} \le 1 - 1/e. \end{cases}

Figure 5 shows the stop, continuation and *z*-strategy curves for $k = 1, 2$ and 3. Numerical computation suggests that the equation $k(1-x)^kP\_k(x) = V\_k(x)$, $k \ge 1$, has a unique solution $\gamma\_k$, and that these critical points increase with $k$, so the optimal stopping strategy is similar to the myopic one. The critical points have lower bounds $\delta\_k$, defined as the solution to $k(1-x)^kP\_k(x) = I\_k(x)$, and upper bounds $\rho\_k$, defined as the critical points where bygone is as good as the *z*-strategy.

To approximate the continuation value in the range $1 - 1/e < x < 1$, we computed bounds that are easier to evaluate:

$$k(1-\mathbf{x})^k Q\_k(\mathbf{x}) \le k(1-\mathbf{x})^k \max\_z R\_k(\mathbf{x}, z) \le V\_k(\mathbf{x}) < I\_k(\mathbf{x}).$$

The upper *information* bound $I\_k(x)$ (see Figure 6) is the winning probability of an informed gambler who in state $(t, k)$ (with $x = q(1-t)$) knows the total number of trials $N\_1$, as in Section 3. The two lower bounds stem from comparison with the myopic and *z*-strategies. The points $\beta\_k$ computed for $k \le 10$ all satisfy $\beta\_k < \alpha\_k$, so the first relation becomes an equality for $0 \le x \le \beta\_k$. Therefore, the critical points satisfy

$$
\delta\_k < \gamma\_k < \rho\_k \le \alpha\_k.
$$

The results of computation are presented in Figure 5 and Tables 1–4. The data show excellent performance of the strategy that by the first trial chooses between stopping and proceeding with a *z*-strategy.

**Figure 5.** Stop, continuation, z-strategy values and bounds; *k* = 1, 2, 3 and zoomed-in view for *k* = 3.

**Figure 6.** Information bounds on the optimal strategy *Ik*(*x*).

**Table 1.** Critical points: $\alpha\_k$: solution to $P\_k(x) = Q\_k(x)$; $\beta\_k$: solution to $D\_zR\_k(x, 0) = 0$; $\gamma\_k$: solution to $k(1-x)^kP\_k(x) = V\_k(x)$; $\delta\_k$: solution to $k(1-x)^kP\_k(x) = I\_k(x)$; $\rho\_k$: solution to $P\_k(x) = \max\_z R\_k(x, z)$.



**Table 2.** Winning probability and bounds for *k* = 1.

**Table 3.** Winning probability and bounds for *k* = 2.


**Table 4.** Winning probability and bounds for *k* = 3.


**Author Contributions:** Methodology, A.G.; validation, A.G. and Z.D.; formal analysis, A.G. and Z.D.; writing—original draft preparation, A.G.; writing—review and editing, A.G. and Z.D.; visualization, A.G. and Z.D.; supervision, A.G. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme, grant agreement No 817257.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**

