*Article* **Geometric Ergodicity of the Random Walk Metropolis with Position-Dependent Proposal Covariance**

**Samuel Livingstone**

Department of Statistical Science, University College London, London WC1E 6BT, UK; samuel.livingstone@ucl.ac.uk

**Abstract:** We consider a Metropolis–Hastings method with proposal $\mathcal{N}(x, hG(x)^{-1})$, where $x$ is the current state, and study its ergodicity properties. We show that suitable choices of $G(x)$ can change these ergodicity properties compared to the Random Walk Metropolis case $\mathcal{N}(x, h\Sigma)$, either for better or worse. We find that if the proposal variance is allowed to grow unboundedly in the tails of the distribution then geometric ergodicity can be established when the target distribution for the algorithm has tails that are heavier than exponential, in contrast to the Random Walk Metropolis case, but that the growth rate must be carefully controlled to prevent the rejection rate approaching unity. We also illustrate that a judicious choice of $G(x)$ can result in a geometrically ergodic chain when probability concentrates on an ever-narrower ridge in the tails, something that is again not true for the Random Walk Metropolis.

**Keywords:** Monte Carlo; MCMC; Markov chains; computational statistics; Bayesian inference

#### **1. Introduction**

Markov chain Monte Carlo (MCMC) methods are techniques for estimating expectations with respect to some probability measure *π*(·), which need not be normalised. This is done by sampling a Markov chain which has limiting distribution *π*(·), and computing empirical averages. A popular form of MCMC is the Metropolis–Hastings algorithm [1,2], where at each time step a 'proposed' move is drawn from some candidate distribution, and then accepted with some probability, otherwise the chain stays at the current point. Interest lies in finding choices of candidate distribution that will produce sensible estimators for expectations with respect to *π*(·).
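As a point of reference for what follows, the simplest such candidate distribution (the Random Walk Metropolis discussed below) can be written in a few lines. This is an illustrative sketch only; the standard normal target and step size `h` are arbitrary choices, not taken from the paper:

```python
import numpy as np

def rwm(log_pi, x0, h=1.0, n=10_000, rng=None):
    """Minimal Random Walk Metropolis sampler (illustrative sketch).

    Proposes y ~ N(x, h) and accepts with probability
    min(1, pi(y)/pi(x)); otherwise the chain stays at x.
    """
    rng = np.random.default_rng(rng)
    x = float(x0)
    chain = np.empty(n)
    for i in range(n):
        y = x + np.sqrt(h) * rng.standard_normal()
        # Symmetric proposal: the Hastings ratio reduces to pi(y)/pi(x)
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
        chain[i] = x
    return chain

# Example: target a standard normal and estimate E[X] by an ergodic average
chain = rwm(lambda x: -0.5 * x**2, x0=0.0, h=2.0, n=20_000, rng=0)
```

The ergodic average `chain.mean()` is then the estimator of the expectation of interest.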

The quality of these estimators can be assessed in many different ways, but a common approach is to understand conditions on *π*(·) that will result in a chain which converges to its limiting distribution at a *geometric* rate. If such a rate can be established, then a Central Limit Theorem will exist for expectations of functionals with finite second absolute moment under *π*(·) if the chain is reversible.

A simple yet often effective choice is a symmetric candidate distribution centred at the current point in the chain (with a fixed variance), resulting in the *Random Walk Metropolis* (RWM) (e.g., [3]). The convergence properties of a chain produced by the RWM are well studied. In one dimension, convergence is essentially geometric if *π*(*x*) decays at an exponential or faster rate in the tails [4], while in higher dimensions an additional curvature condition is required [5]. Slower rates of convergence have also been established in the case of heavier tails [6].

Recently, some MCMC methods were proposed which generalise the RWM, whereby proposals are still centred at the current point *x* and symmetric, but the variance changes with *x* [7–11]. An extension to infinite-dimensional Hilbert spaces is also suggested in Reference [12]. The motivation is that the chain can become more 'local', perhaps making larger jumps when out in the tails, or mimicking the local dependence structure of *π*(·) to propose more intelligent moves. Designing MCMC methods of this nature is particularly relevant for modern Bayesian inference problems, where posterior distributions are often high dimensional and exhibit nonlinear correlations [13]. We term this approach the *Position-dependent Random Walk Metropolis* (PDRWM), although technically this is a misnomer, since proposals are no longer random walks. Other choices of candidate distribution designed for distributions that exhibit nonlinear correlations were introduced in Reference [13]. Although powerful, these require derivative information for log *π*(*x*), something which can be unavailable in modern inference problems (e.g., [14]). We note that no such information is required for the PDRWM, as shown by the particular cases suggested in References [7–11]. However, there are relations between the approaches, to the extent that understanding how the properties of the PDRWM differ from the standard RWM should also aid understanding of the methods introduced in Reference [13].

**Citation:** Livingstone, S. Geometric Ergodicity of the Random Walk Metropolis with Position-Dependent Proposal Covariance. *Mathematics* **2021**, *9*, 341. https://doi.org/10.3390/math9040341

Academic Editor: Panagiotis-Christos Vassiliou. Received: 19 January 2021; Accepted: 4 February 2021; Published: 8 February 2021.

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In this article, we consider the convergence rate of a Markov chain generated by the PDRWM to its limiting distribution. Our main interest lies in whether this generalisation can change these *ergodicity* properties compared to the standard RWM with fixed covariance. We focus on the case in which the candidate distribution is Gaussian, and illustrate that such changes can occur in several different ways, either for better or worse. Our aim is not to give a complete characterisation of the approach, but rather to illustrate the possibilities through carefully chosen examples, which are known to be indicative of more general behaviour.

In Section 2 necessary concepts about Markov chains are briefly reviewed, before the PDRWM is introduced in Section 3. Some results in the one-dimensional case are given in Section 4, before a higher-dimensional model problem is examined in Section 5. Throughout *π*(·) denotes a probability measure (we use the terms probability measure and distribution synonymously), and *π*(*x*) its density with respect to Lebesgue measure *dx*.

Since an early version of this work appeared online, some contributions to the literature have been made that are worthy of mention. A Markov kernel constructed as a state-dependent mixture is introduced in Reference [15] and its properties are studied in some cases that are similar in spirit to the model problem of Section 5. An algorithm called *Directional Metropolis–Hastings*, which encompasses a specific instance of the PDRWM, is introduced and studied in Reference [16], and a modification of the same idea is used to develop the *Hop* kernel within the *Hug and Hop* algorithm of Reference [17]. Kamatani considers in Reference [18] an algorithm designed for the infinite-dimensional setting, of a similar design to that discussed in Reference [12], and studies its ergodicity properties.

#### **2. Markov Chains and Geometric Ergodicity**

We will work on the Borel space $(\mathcal{X}, \mathcal{B})$, with $\mathcal{X} \subset \mathbb{R}^d$ for some $d \ge 1$, so that each $X_t \in \mathcal{X}$ for a discrete-time Markov chain $\{X_t\}_{t \ge 0}$ with time-homogeneous transition kernel $P : \mathcal{X} \times \mathcal{B} \to [0, 1]$, where $P(x, A) = \mathbb{P}[X_{i+1} \in A \,|\, X_i = x]$ and $P^n(x, A)$ is defined similarly for $X_{i+n}$. All chains we consider will have invariant distribution *π*(·), and be both *π*-irreducible and aperiodic, meaning *π*(·) is the limiting distribution from *π*-almost any starting point [19]. We use $|\cdot|$ to denote the Euclidean norm.

In Markov chain Monte Carlo the objective is to construct estimators of $\mathbb{E}_\pi[f]$, for some $f : \mathcal{X} \to \mathbb{R}$, by computing

$$\hat{f}_n = \frac{1}{n} \sum_{i=1}^n f(X_i), \quad X_i \sim P^i(x, \cdot).$$

If *π*(·) is the limiting distribution for the chain then *P* will be *ergodic*, meaning $\hat{f}_n \xrightarrow{a.s.} \mathbb{E}_\pi[f]$ from *π*-almost any starting point. For finite *n* the quality of $\hat{f}_n$ intuitively depends on how quickly $P^n(x, \cdot)$ approaches *π*(·). We call the chain *geometrically ergodic* if

$$\|P^n(x, \cdot) - \pi(\cdot)\|_{TV} \le M(x)\rho^n,\tag{1}$$

from *π*-almost any $x \in \mathcal{X}$, for some $M(x) < \infty$ and $\rho < 1$, where $\|\mu(\cdot) - \nu(\cdot)\|_{TV} := \sup_{A \in \mathcal{B}} |\mu(A) - \nu(A)|$ is the total variation distance between distributions *μ*(·) and *ν*(·) [19].

For *π*-reversible Markov chains geometric ergodicity implies that if $\mathbb{E}_\pi[f^2] < \infty$ for some $f : \mathcal{X} \to \mathbb{R}$, then

$$\sqrt{n}\left(\hat{f}_n - \mathbb{E}_\pi[f]\right) \xrightarrow{d} \mathcal{N}(0, v(P, f)), \tag{2}$$

for some asymptotic variance $v(P, f)$ [20]. Equation (2) enables the construction of asymptotic confidence intervals for $\hat{f}_n$.
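Although $v(P, f)$ is rarely available in closed form, it can be estimated from a single run of the chain, for example by the classical batch means method. The sketch below is illustrative; the function name and the choice of 100 batches are arbitrary:

```python
import numpy as np

def batch_means_variance(fx, n_batches=100):
    """Estimate the asymptotic variance v(P, f) appearing in the CLT (2)
    by the method of batch means.

    Splits f(X_1), ..., f(X_n) into consecutive batches; the sample
    variance of the batch averages, scaled by the batch length,
    estimates v(P, f).
    """
    fx = np.asarray(fx, dtype=float)
    b = len(fx) // n_batches            # batch length
    fx = fx[: b * n_batches]            # drop any remainder
    means = fx.reshape(n_batches, b).mean(axis=1)
    return b * means.var(ddof=1)
```

An asymptotic 95% confidence interval for $\mathbb{E}_\pi[f]$ is then $\hat{f}_n \pm 1.96\sqrt{\hat{v}/n}$.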

In practice, geometric ergodicity does not guarantee that ˆ *fn* will be a sensible estimator, as *M*(*x*) can be arbitrarily large if the chain is initialised far from the typical set under *π*(·), and *ρ* may be very close to 1. However, chains which are not geometrically ergodic can often either get 'stuck' for a long time in low-probability regions or fail to explore the entire distribution adequately, sometimes in ways that are difficult to diagnose using standard MCMC diagnostics.

#### *Establishing Geometric Ergodicity*

It is shown in Chapter 15 of Reference [21] that Equation (1) is equivalent to the condition that there exists a *Lyapunov* function *V* : X → [1, ∞) and some *λ* < 1, *b* < ∞ such that

$$PV(x) \le \lambda V(x) + b\mathbb{I}_C(x),\tag{3}$$

where $PV(x) := \int V(y)P(x, dy)$. The set $C \subset \mathcal{X}$ must be *small*, meaning that for some $m \in \mathbb{N}$, $\varepsilon > 0$ and probability measure *ν*(·)

$$P^m(x, A) \ge \varepsilon \nu(A),\tag{4}$$

for any *x* ∈ *C* and *A* ∈ B. Equations (3) and (4) are referred to as *drift* and *minorisation* conditions. Intuitively, *C* can be thought of as the centre of the space, and Equation (3) ensures that some one dimensional projection of {*Xt*}*t*≥<sup>0</sup> drifts towards *C* at a geometric rate when outside. In fact, Equation (3) is sufficient for the return time distribution to *C* to have geometric tails [21]. Once in *C*, (4) ensures that with some probability the chain forgets its past and hence *regenerates*. This regeneration allows the chain to couple with another initialised from *π*(·), giving a bound on the total variation distance through the *coupling inequality* (e.g., [19]). More intuition is given in Reference [22].
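As a purely illustrative numerical companion to (3), the ratio $PV(x)/V(x)$ can be estimated by Monte Carlo for a concrete kernel. The sketch below does this for the Random Walk Metropolis with proposal $\mathcal{N}(x, h)$ on the Laplace-type target $\pi(x) \propto e^{-|x|}$, using $V(x) = e^{s|x|}$; the values $s = 0.1$, $h = 1$ and the test point $x = 10$ are arbitrary choices, not taken from the paper:

```python
import numpy as np

def drift_ratio(x, s=0.1, h=1.0, n=200_000, rng=0):
    """Monte Carlo estimate of PV(x)/V(x) for the RWM with proposal
    N(x, h) and target pi(x) ∝ exp(-|x|), using V(x) = exp(s|x|).

    Uses PV(x)/V(x) = 1 + E[ alpha(x, Y) (V(Y)/V(x) - 1) ], Y ~ N(x, h);
    a value below 1 at large |x| is consistent with geometric drift (3).
    """
    g = np.random.default_rng(rng)
    y = x + np.sqrt(h) * g.standard_normal(n)
    # Laplace target: alpha(x, y) = min(1, exp(-(|y| - |x|)))
    alpha = np.minimum(1.0, np.exp(-(np.abs(y) - np.abs(x))))
    # V(y)/V(x) - 1 = exp(s(|y| - |x|)) - 1
    return 1.0 + np.mean(alpha * (np.exp(s * (np.abs(y) - np.abs(x))) - 1.0))
```

At $x = 10$ the estimated ratio is strictly below one, consistent with a geometric drift back towards the centre of the space.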

Transition kernels considered here will be of the *Metropolis–Hastings* type, given by

$$P(x, dy) = \alpha(x, y)Q(x, dy) + r(x)\delta_x(dy),\tag{5}$$

where $Q(x, dy) = q(y|x)dy$ is some candidate kernel, *α* is called the acceptance rate and $r(x) = 1 - \int \alpha(x, y)Q(x, dy)$. Here we choose

$$\alpha(x, y) = 1 \wedge \frac{\pi(y)q(x|y)}{\pi(x)q(y|x)},\tag{6}$$

where *a* ∧ *b* denotes the minimum of *a* and *b*. This choice implies that *P* satisfies detailed balance for *π*(·) [23], and hence the chain is *π*-reversible (note that other choices for *α* can result in non-reversible chains; see Reference [24] for details).

Roberts and Tweedie [5], following on from Reference [21], introduced the following regularity conditions.

**Theorem 1.** *(Roberts and Tweedie). Suppose that π(x) is bounded away from 0 and ∞ on compact sets, and there exist $\delta_q > 0$ and $\varepsilon_q > 0$ such that for every x*

$$|x - y| \le \delta\_q \Rightarrow q(y|x) \ge \varepsilon\_q.$$

*Then the chain with kernel (5) is $\mu_{\text{Leb}}$-irreducible and aperiodic, and every nonempty compact set is small.*

For the choices of *Q* considered in this article these conditions hold, and we will restrict ourselves to forms of *π*(*x*) for which the same is true (apart from a specific case in Section 5). Under the conditions of Theorem 1, (1) holds if and only if a Lyapunov function $V : \mathcal{X} \to [1, \infty)$ with $\mathbb{E}_\pi[V] < \infty$ exists such that

$$\limsup\_{|\mathbf{x}|\to\infty} \frac{PV(\mathbf{x})}{V(\mathbf{x})} < 1. \tag{7}$$

When *P* is of the Metropolis–Hastings type, (7) can be written

$$\limsup_{|x|\to\infty} \int \left[\frac{V(y)}{V(x)} - 1\right] \alpha(x, y) Q(x, dy) < 0. \tag{8}$$

In this case, a simple criterion for lack of geometric ergodicity is

$$\limsup\_{|x|\to\infty} r(x) = 1. \tag{9}$$

Intuitively this implies that the chain is likely to get 'stuck' in the tails of a distribution for large periods.

Jarner and Tweedie [25] introduce a necessary condition for geometric ergodicity through a *tightness* condition.

**Theorem 2.** *(Jarner and Tweedie). If for any ε* > 0 *there is a δ* > 0 *such that for all x* ∈ X

$$P(x, B_\delta(x)) > 1 - \varepsilon,$$

*where Bδ*(*x*) := {*y* ∈ X : *d*(*x*, *y*) < *δ*}*, then a necessary condition for P to produce a geometrically ergodic chain is that for some s* > 0

$$\int e^{s|x|} \pi(dx) < \infty.$$

The result highlights that when *π*(·) is heavy-tailed the chain must be able to make very large moves and still be capable of returning to the centre quickly for (1) to hold.

#### **3. Position-Dependent Random Walk Metropolis**

In the RWM, $Q(x, dy) = q(y - x)dy$ with $q(y - x) = q(x - y)$, meaning (6) reduces to $\alpha(x, y) = 1 \wedge \pi(y)/\pi(x)$. A common choice is $Q(x, \cdot) = \mathcal{N}(x, h\Sigma)$, with Σ chosen to mimic the global covariance structure of *π*(·) [3]. Various results exist concerning the optimal choice of *h* in a given setting (e.g., [26]). It is straightforward to see that the tightness condition of Theorem 2 holds here, so that the tails of *π*(*x*) must be uniformly exponential or lighter for geometric ergodicity. In one dimension this is in fact a sufficient condition [4], while for higher dimensions additional conditions are required [5]. We return to this case in Section 5.

In the PDRWM $Q(x, \cdot) = \mathcal{N}(x, hG(x)^{-1})$, so (6) becomes

$$\alpha(x, y) = 1 \wedge \frac{\pi(y)|G(y)|^{\frac{1}{2}}}{\pi(x)|G(x)|^{\frac{1}{2}}} \exp\left(-\frac{1}{2h}(x - y)^{T}[G(y) - G(x)](x - y)\right).$$

The motivation for designing such an algorithm is that proposals are more able to reflect the local dependence structure of *π*(·). In some cases this dependence may vary greatly in different parts of the state-space, making a global choice of Σ ineffective [9].
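The closed-form acceptance probability can be sanity-checked against the generic Metropolis–Hastings ratio (6), since for proposals $\mathcal{N}(x, hG(x)^{-1})$ the quadratic forms in the two proposal densities are each scaled by $1/(2h)$. The following sketch performs this check; the target and the precision function $G(x) = (1 + |x|^2)I$ used in the test are arbitrary illustrations, not taken from the paper:

```python
import numpy as np

def log_gauss(y, mean, prec, h):
    """Log-density of N(mean, h * prec^{-1}) at y, omitting the
    dimension-dependent constant (which cancels in the MH ratio)."""
    d = y - mean
    return 0.5 * np.linalg.slogdet(prec)[1] - 0.5 * (d @ prec @ d) / h

def alpha_generic(log_pi, G, x, y, h):
    """Acceptance rate from the generic MH formula (6)."""
    log_ratio = (log_pi(y) - log_pi(x)
                 + log_gauss(x, y, G(y), h) - log_gauss(y, x, G(x), h))
    return float(np.exp(min(log_ratio, 0.0)))  # min(1, e^r), overflow-safe

def alpha_pdrwm(log_pi, G, x, y, h):
    """Acceptance rate from the closed-form PDRWM expression:
    1 ∧ [pi(y)|G(y)|^{1/2} / (pi(x)|G(x)|^{1/2})]
      * exp(-(x-y)^T [G(y) - G(x)] (x-y) / (2h))."""
    Gx, Gy = G(x), G(y)
    d = x - y
    log_ratio = (log_pi(y) - log_pi(x)
                 + 0.5 * (np.linalg.slogdet(Gy)[1] - np.linalg.slogdet(Gx)[1])
                 - (d @ (Gy - Gx) @ d) / (2.0 * h))
    return float(np.exp(min(log_ratio, 0.0)))
```

The two functions agree to machine precision, confirming that the closed form is simply (6) specialised to Gaussian proposals.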

Readers familiar with differential geometry will recognise the volume element $|G(x)|^{1/2}dx$ and the linear approximations to the distance between *x* and *y* taken at each point through *G*(*x*) and *G*(*y*) if $\mathcal{X}$ is viewed as a Riemannian manifold with metric *G*. We do not explore these observations further here, but the interested reader is referred to Reference [27] for more discussion.

How to choose *G*(*x*) is an obvious question. In fact, specific variants of this method have appeared on many occasions in the literature, some of which we now summarise.


The latter cases also motivate any choice of the form

$$G(x)^{-1} = \sum_{i=1}^{n} w(x, z_i) (z_i - x) (z_i - x)^T$$

for some past samples $\{z_1, ..., z_n\}$ and weight function $w : \mathcal{X} \times \mathcal{X} \to [0, \infty)$ with $\sum_i w(x, z_i) = 1$ that decays as $|x - z_i|$ grows, which would also mimic the local curvature of *π*(·) (taking care to appropriately regularise and diminish adaptation so as to preserve ergodicity, as outlined in Reference [10]).
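A sketch of this construction, with an illustrative Gaussian-kernel choice for the weight function $w$ and a small jitter term to keep the resulting covariance positive-definite (both the function name and the parameter values are assumptions for illustration):

```python
import numpy as np

def local_proposal_cov(x, samples, bandwidth=1.0, jitter=1e-6):
    """Build G(x)^{-1} = sum_i w(x, z_i) (z_i - x)(z_i - x)^T from past
    samples z_i, with weights that decay in |x - z_i| and sum to one."""
    x = np.asarray(x, dtype=float)
    Z = np.asarray(samples, dtype=float)          # shape (n, d)
    D = Z - x                                     # displacements z_i - x
    # Gaussian-kernel weights, normalised so that sum_i w(x, z_i) = 1
    logw = -0.5 * np.sum(D**2, axis=1) / bandwidth**2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    cov = (D * w[:, None]).T @ D                  # sum_i w_i d_i d_i^T
    return cov + jitter * np.eye(len(x))          # regularise to keep PD
```

The jitter term plays the role of the regularisation mentioned above; in an adaptive scheme the sample set would be frozen or adapted diminishingly.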

Some of the above schemes are examples of adaptive MCMC, in which a candidate from among a family of Markov kernels {*P<sup>θ</sup>* : *θ* ∈ Θ} is selected by learning the parameter *θ* ∈ Θ during the simulation [10]. Additional conditions on the adaptation process (i.e., the manner in which *θ* is learned) are required to establish ergodicity results for the resulting stochastic processes. We consider the decisions on how to learn *θ* appropriately to be a separate problem and beyond the scope of the present work, and instead focus attention on establishing geometric ergodicity of the base kernels *P<sup>θ</sup>* for any fixed *θ* ∈ Θ. We note that this is typically a pre-requisite for establishing convergence properties of any adaptive MCMC method [10].

#### **4. Results in One Dimension**

Here we consider two different general scenarios as $|x| \to \infty$: (i) *G*(*x*) is bounded above and below, and (ii) $G(x) \to 0$ at some specified rate. Of course there is also the possibility that $G(x) \to \infty$, though intuitively this would result in chains that spend a long time in the tails of a distribution, so we do not consider it (if $G(x) \to \infty$ then chains will in fact exhibit the *negligible moves* property studied in Reference [29]). Proofs of the Propositions in Sections 4 and 5 can be found in Appendix A.

We begin with a result that emphasises that a growing variance is a necessary requirement for geometric ergodicity in the heavy-tailed case.

**Proposition 1.** *If $G(x) \ge \sigma^{-2}$ for some $\sigma^{-2} > 0$, then unless $\int e^{\eta|x|} \pi(dx) < \infty$ for some $\eta > 0$ the PDRWM cannot produce a geometrically ergodic Markov chain.*

The above is a simple extension of a result that is well-known in the RWM case. Essentially the tails of the distribution should be exponential or lighter to ensure fast convergence. This motivates consideration of three different types of behaviour for the tails of *π*(·).

**Assumption 1.** *The density π(x) satisfies one of the following tail conditions for all $y, x \in \mathcal{X}$ such that $|y| > |x| > t$, for some finite $t > 0$.*

*1. $\pi(y)/\pi(x) \le \exp\{-a(|y| - |x|)\}$ for some $a > 0$*


Naturally condition 1 implies condition 2, and condition 2 implies condition 3. If condition 1 is not satisfied then *π*(·) is generally called *heavy-tailed*. When *π*(*x*) satisfies condition 2 or 3 but not condition 1, the RWM typically fails to produce a geometrically ergodic chain [4]. We show in the sequel, however, that this is not always the case for the PDRWM. We assume throughout this section that the following condition on *G*(*x*) holds.

**Assumption 2.** *The function $G : \mathcal{X} \to (0, \infty)$ is bounded above by some $\sigma_b^{-2} < \infty$ for all $x \in \mathcal{X}$, and bounded below for all $x \in \mathcal{X}$ with $|x| < t$, for some $t > 0$.*

The heavy-tailed case is known to be a challenging scenario, but the RWM will produce a geometrically ergodic Markov chain if *π*(*x*) is log-concave. Next we extend this result to the case of sub-quadratic variance growth in the tails.

**Proposition 2.** *If $\exists r < \infty$ such that $G(x) \propto |x|^{-\gamma}$ whenever $|x| > r$, then the PDRWM will produce a geometrically ergodic chain in both of the following cases:*


The second part of Proposition 2 is not true for the RWM, for which condition 2 of Assumption 1 alone is not sufficient for geometric ergodicity [4].

We do not provide a complete proof that the PDRWM will not produce a geometrically ergodic chain when only condition 3 of Assumption 1 holds and $G(x) \propto |x|^{-\gamma}$ for some $\gamma < 2$, but we do show informally that this will be the case. Assuming that in the tails $\pi(x) \propto |x|^{-p}$ for some $p > 1$, then for large *x*

$$\alpha(x, x + cx^{\gamma/2}) = 1 \wedge \left(\frac{x}{x + cx^{\gamma/2}}\right)^{p + \gamma/2} \exp\left(-\frac{c^2 x^{\gamma}}{2h} \left[\frac{1}{(x + cx^{\gamma/2})^{\gamma}} - \frac{1}{x^{\gamma}}\right]\right). \tag{10}$$

The first expression on the right hand side converges to 1 as *x* → ∞, which is akin to the case of fixed proposal covariance. The second term will be larger than one for *c* > 0 and less than one for *c* < 0. So the algorithm will exhibit the same 'random walk in the tails' behaviour which is often characteristic of the RWM in this scenario, meaning that the acceptance rate fails to enforce a geometric drift back into the centre of the space.

When $\gamma = 2$ the above intuition will not necessarily hold, as the terms in Equation (10) will be roughly constant with *x*. When only condition 3 of Assumption 1 holds, it is therefore tempting to make the choice $G(x) = x^{-2}$ for $|x| > r$. Informally, we can see that such behaviour may lead to a favourable algorithm if a small enough *h* is chosen. For any fixed $x > r$ a typical proposal will now take the form $y = (1 + \xi\sqrt{h})x$, where $\xi \sim \mathcal{N}(0, 1)$. It therefore holds that

$$y = e^{\xi\sqrt{h}}x + r(x, h, \xi),\tag{11}$$

where for any fixed *x* and *ξ* the term $r(x, h, \xi)/\sqrt{h} \to 0$ as $h \to 0$. The first term on the right-hand side of Equation (11) corresponds to the proposal of the *multiplicative Random Walk Metropolis*, which is known to be geometrically ergodic under condition 3 of Assumption 1 (e.g., [3]), as this equates to taking a logarithmic transformation of *x*, which 'lightens' the tails of the target density to the point where it becomes log-concave. So in practice we can expect good performance from this choice of *G*(*x*). The above intuition does not, however, provide enough to establish geometric ergodicity, as the final term on the right-hand side of (11) grows unboundedly with *x* for any fixed choice of *h*. The difference between the acceptance rates of the multiplicative Random Walk Metropolis and the PDRWM with $G(x) = x^{-2}$ will be the exponential term in Equation (10). This will instead become polynomial by letting the proposal noise *ξ* follow a distribution with polynomial tails (e.g., Student's t), which is known to be a favourable strategy for the RWM when only condition 3 holds [6]. One can see that if the heaviness of the proposal distribution is carefully chosen then the acceptance rate may well enforce a geometric drift into the centre of the space, though for brevity we restrict attention to Gaussian proposals in this article.
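The multiplicative Random Walk Metropolis mentioned above is easy to state explicitly: proposing $y = xe^{\xi\sqrt{h}}$ is an ordinary RWM on $\log x$, and the change of variables contributes a Jacobian factor $y/x$ to the acceptance ratio. The sketch below is illustrative, with an arbitrary Pareto-type target $\pi(x) \propto x^{-2}$ on $[1, \infty)$ that satisfies only a polynomial tail condition:

```python
import numpy as np

def multiplicative_rwm(log_pi, x0, h=1.0, n=50_000, rng=None):
    """Multiplicative RWM on (0, ∞): propose y = x * exp(sqrt(h) * xi),
    xi ~ N(0, 1).  This is an ordinary RWM on log x, so the change of
    variables adds the Jacobian factor y/x to the acceptance ratio."""
    g = np.random.default_rng(rng)
    x = float(x0)
    chain = np.empty(n)
    for i in range(n):
        y = x * np.exp(np.sqrt(h) * g.standard_normal())
        # Accept with prob min(1, pi(y) y / (pi(x) x))
        if np.log(g.uniform()) < log_pi(y) - log_pi(x) + np.log(y / x):
            x = y
        chain[i] = x
    return chain

# Illustrative heavy-tailed target: pi(x) ∝ x^{-2} on [1, ∞)
log_pi = lambda x: -2.0 * np.log(x) if x >= 1.0 else -np.inf
chain = multiplicative_rwm(log_pi, x0=2.0, h=1.0, n=50_000, rng=0)
```

The chain explores the polynomial tail comfortably because on the log scale the target becomes light-tailed.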

The final result of this section provides a note of warning that lack of care in choosing *G*(*x*) can have severe consequences for the method.

**Proposition 3.** *If $G(x)x^2 \to 0$ as $|x| \to \infty$, then the PDRWM will not produce a geometrically ergodic Markov chain.*

The intuition for this result is straightforward when explained. In the tails, typically $|y - x|$ will be of the same order of magnitude as $G(x)^{-1/2}$, meaning $|y - x|/|x|$ grows arbitrarily large as $|x|$ grows. As such, proposals will 'overshoot' the typical set of the distribution, sending the sampler further out into the tails, and will therefore almost always be rejected. The result can be related superficially to a lack of geometric ergodicity for Metropolis–Hastings algorithms in which the proposal mean is comprised of the current state translated by a drift function (often based on $\nabla \log \pi(x)$) when this drift function grows faster than linearly with $|x|$ (e.g., [30,31]).
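This overshooting can be illustrated numerically. The sketch below estimates the mean acceptance probability $\int \alpha(x, y)q(y|x)dy$ by Monte Carlo for a one-dimensional PDRWM with proposal variance $hv(x)$, $v(x) = (1 + x^2)^{5/2}$ (an arbitrary illustrative rate fast enough that $G(x)x^2 \to 0$), on the Laplace-type target $\pi(x) \propto e^{-|x|}$; none of these specific choices are taken from the paper:

```python
import numpy as np

def mean_acceptance(x, power=2.5, h=1.0, n=20_000, rng=0):
    """Monte Carlo estimate of the mean acceptance probability at x for
    a 1-d PDRWM with proposal variance h * v(x), v(x) = (1 + x^2)^power,
    targeting pi(x) ∝ exp(-|x|).  Any power > 1 makes G(x) x^2 → 0."""
    g = np.random.default_rng(rng)
    v = lambda t: (1.0 + t**2) ** power
    # log q(b | a), dropping constants that cancel in the ratio
    log_q = lambda b, a: -0.5 * np.log(v(a)) - (b - a) ** 2 / (2.0 * h * v(a))
    y = x + np.sqrt(h * v(x)) * g.standard_normal(n)
    log_ratio = -(np.abs(y) - np.abs(x)) + log_q(x, y) - log_q(y, x)
    # min(1, e^r) computed overflow-safely
    return float(np.mean(np.exp(np.minimum(log_ratio, 0.0))))
```

The estimate decays towards zero as *x* grows, in line with the lack-of-ergodicity criterion (9).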

#### **5. A Higher-Dimensional Case Study**

An easy criticism of the above analysis is that the one-dimensional scenario is sometimes not indicative of the more general behaviour of a method. We note, however, that typically the geometric convergence properties of Metropolis–Hastings algorithms do carry over somewhat naturally to more than one dimension when *π*(·) is suitably regular (e.g., [5,32]). Because of this we expect that the growth conditions specified above could be transferred to the determinant of *G*(*x*) when the dimension is greater than one (leaving the details of this argument for future work).

A key difference in the higher-dimensional setting is that *G*(*x*) now dictates both the *size* and *direction* of proposals. In the case $G(x)^{-1} = \Sigma$, some additional regularity conditions on *π*(*x*) are required for geometric ergodicity in more than one dimension, outlined in References [5,32]. An example is also given in Reference [5] of the simple two-dimensional density $\pi(x, y) \propto \exp(-x^2 - y^2 - x^2y^2)$, which fails to meet these criteria. The difficult models are those for which probability concentrates on a ridge in the tails, which becomes ever narrower as $|x|$ increases. In this instance, proposals from the RWM are less and less likely to be accepted as $|x|$ grows. Another well-known example of this phenomenon is the *funnel* distribution introduced in Reference [33].

To explore the behaviour of the PDRWM in this setting, we design a model problem, the *staircase* distribution, with density

$$s(x) \propto 3^{-\lfloor x_2 \rfloor} \mathbb{I}_R(x), \ \ R := \{ y \in \mathbb{R}^2 : y_2 \ge 1, |y_1| \le 3^{1 - \lfloor y_2 \rfloor} \}, \tag{12}$$

where $\lfloor z \rfloor$ denotes the integer part of $z > 0$. Graphically the density is a sequence of cuboids on the upper-half plane of $\mathbb{R}^2$ (starting at $y_2 = 1$), each centred on the vertical axis, with each successive cuboid one third of the width and height of the previous. The density resembles an ever-narrowing staircase, as shown in Figure 1.

**Figure 1.** The staircase distribution, with density given by Equation (12).

We denote by $Q_R$ the proposal kernel associated with the Random Walk Metropolis algorithm with fixed covariance $h\Sigma$. In fact, the specific choice of *h* and Σ does not matter provided that the result is positive-definite. For the PDRWM we denote by $Q_P$ the proposal kernel with covariance matrix

$$hG(x)^{-1} = \begin{pmatrix} 3^{-2\lfloor x_2 \rfloor} & 0\\ 0 & 1 \end{pmatrix},$$

which will naturally adapt the scale of the first coordinate to the width of the ridge.
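For concreteness, the staircase density (12) and the proposal covariance above can be coded directly; the function names are illustrative choices, not taken from the paper:

```python
import numpy as np

def staircase_density(x):
    """Unnormalised staircase density (12): proportional to 3^{-floor(x2)}
    on the region R = {y : y2 >= 1, |y1| <= 3^{1 - floor(y2)}}."""
    x1, x2 = x
    if x2 < 1.0:
        return 0.0                        # below the first stair
    k = np.floor(x2)
    if abs(x1) > 3.0 ** (1.0 - k):
        return 0.0                        # outside the narrowing ridge
    return 3.0 ** (-k)

def pdrwm_cov(x, h=1.0):
    """Proposal covariance h G(x)^{-1} for Q_P: the first coordinate's
    scale shrinks as 3^{-floor(x2)}, matching the width of the ridge."""
    k = np.floor(x[1])
    return h * np.diag([3.0 ** (-2.0 * k), 1.0])
```

A proposal drawn with `pdrwm_cov` therefore stays within the width of the current stair with non-vanishing probability however far up the staircase the chain sits, which is the mechanism behind Proposition 5.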

**Proposition 4.** *The Metropolis–Hastings algorithm with proposal $Q_R$ does not produce a geometrically ergodic Markov chain when $\pi(x) = s(x)$.*

The design of the PDRWM proposal kernel $Q_P$ in this instance is such that the proposal covariance reduces at the same rate as the width of the stairs, therefore naturally adapting the proposal to the width of the ridge on which the density concentrates. This state-dependent adaptation results in a geometrically ergodic chain, as the following result shows.

**Proposition 5.** *The Metropolis–Hastings algorithm with proposal $Q_P$ produces a geometrically ergodic Markov chain when $\pi(x) = s(x)$.*

#### **6. Discussion**

In this paper we have analysed the ergodic behaviour of a Metropolis–Hastings method with proposal kernel $Q(x, \cdot) = \mathcal{N}(x, hG(x)^{-1})$. In one dimension we have characterised the behaviour in terms of growth conditions on $G(x)^{-1}$ and tail conditions on the target distribution, and in higher dimensions a carefully constructed model problem is discussed. The fundamental question of interest was whether generalising an existing Metropolis–Hastings method by allowing the proposal covariance to change with position can alter the ergodicity properties of the sampler. We can confirm that this is indeed possible, either for better or worse, depending on the choice of covariance. The take-home points for practitioners are (i) lack of sufficient care in the design of *G*(*x*) can have severe consequences (as in Proposition 3), and (ii) careful choice of *G*(*x*) can have much more beneficial ones, perhaps the most surprising of which are in the higher-dimensional setting, as shown in Section 5.

We feel that such results can also offer insight into similar generalisations of different Metropolis–Hastings algorithms (e.g., [13,34]). For example, it seems intuitive that any method in which the variance grows at a faster than quadratic rate in the tails is unlikely to produce a geometrically ergodic chain. There are connections between the PDRWM and some extensions of the Metropolis-adjusted Langevin algorithm [34], the ergodicity properties of which are discussed in Reference [35]. The key difference between the schemes is the inclusion of the drift term $G(x)^{-1}\nabla \log \pi(x)/2$ in the latter. It is this term which in the main governs the behaviour of the sampler, which is why the behaviour of the PDRWM differs from that scheme. Markov processes are also used in a wide variety of application areas beyond the design of Metropolis–Hastings algorithms (e.g., [36]), and we hope that some of the results established in the present work prove to be beneficial in some of these other settings.

We can apply these results to the specific variants discussed in Section 3. Provided that sensible choices of regions/weights are made and that an adaptation scheme which obeys the diminishing adaptation criterion is employed, the Regional adaptive Metropolis–Hastings, Locally weighted Metropolis and Kernel-adaptive Metropolis–Hastings samplers should all satisfy $G(x) \to \Sigma$ as $|x| \to \infty$, meaning they can be expected to inherit the ergodicity properties of the standard RWM (the behaviour in the centre of the space, however, will likely be different). In the State-dependent Metropolis method, provided $b < 2$, the sampler should also behave reasonably. Whether or not a large enough value of *b* would be found by a particular adaptation rule is not entirely clear, and this could be an interesting direction of further study. The Tempered Langevin diffusion scheme, however, will fail to produce a geometrically ergodic Markov chain whenever the tails of *π*(*x*) are lighter than those of a Cauchy distribution. To allow reasonable tail exploration when this is the case, two pragmatic options would be to upper bound $G(x)^{-1}$ manually or to use this scheme in conjunction with another, as there is evidence that the sampler can perform favourably when exploring the centre of a distribution [8]. None of the specific variants discussed here are able to mimic the local curvature of *π*(*x*) in the tails, so as to enjoy the favourable behaviour exemplified in Proposition 5. This is possible using Hessian information as in Reference [13], but should also be possible in some cases using appropriate surrogates.

**Funding:** This research was supported by a UCL IMPACT PhD scholarship co-funded by Xerox Research Centre Europe and EPSRC.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The author thanks Alexandros Beskos, Krzysztof Łatuszyński and Gareth Roberts for several useful discussions, Michael Betancourt for proofreading the paper, and Mark Girolami for general supervision and guidance.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Appendix A. Proofs**

**Proof of Proposition 1.** In this case, for any choice of *ε* > 0 there is a *δ* > 0 such that *Q*(*x*, *Bδ*(*x*)) > 1 − *ε*. Noting that *P*(*x*, *Bδ*(*x*)) ≥ *Q*(*x*, *Bδ*(*x*)) when *P* is of Metropolis– Hastings type, Theorem 2 can be applied directly.

**Proof of Proposition 2.** For the log-concave case, take $V(x) = e^{s|x|}$ for some $s > 0$, and let $B_A$ denote the integral (8) over the set *A*. We first break up $\mathcal{X}$ into $(-\infty, 0] \cup (0, x - cx^{\gamma/2}] \cup (x - cx^{\gamma/2}, x + cx^{\gamma/2}] \cup (x + cx^{\gamma/2}, x + cx^{\gamma}] \cup (x + cx^{\gamma}, \infty)$ for some $x > 0$ and fixed constant $c \in (0, \infty)$, and show that the integral is strictly negative on at least one of these sets, and can be made arbitrarily small as $x \to \infty$ on all others. The $-\infty$ case is analogous from the tail conditions on *π*(*x*). From the conditions we can choose $x > r$ and therefore write $G(x)^{-1} = \eta x^{\gamma}$ for some fixed $\eta < \infty$.

On (−∞, 0], we have

$$\begin{aligned} B_{(-\infty,0]} &= e^{-sx}\int_{-\infty}^{0} e^{s|y|}\,\alpha(x,y)\,Q(x,dy) - \int_{-\infty}^{0}\alpha(x,y)\,Q(x,dy),\\ &\le e^{-sx}\int_{0}^{\infty} e^{sy}\,Q(-x,dy). \end{aligned}$$

The integral is now proportional to the moment generating function of a truncated Gaussian distribution (see Appendix B), so is given by

$$e^{-sx + h\eta x^{\gamma}s^{2}/2}\left[1 - \Phi\left(x^{1-\gamma/2}/\sqrt{h\eta} - \sqrt{h\eta}\,s x^{\gamma/2}\right)\right].$$

A simple bound on the Gaussian tail is $\sqrt{2\pi}\,x\,\Phi^{c}(x) < e^{-x^{2}/2}$ for $x > 0$ [37], so setting $\vartheta = x^{1-\gamma/2}/\sqrt{h\eta} - \sqrt{h\eta}\,s x^{\gamma/2}$ we have

$$\begin{aligned} B_{(-\infty,0]} &\le \frac{1}{\sqrt{2\pi}}\exp\left(-2sx + \frac{h\eta s^{2}}{2}x^{\gamma} - \frac{1}{2}\left(\frac{1}{h\eta}x^{2-\gamma} - 2sx + h\eta s^{2}x^{\gamma}\right) - \log\vartheta\right),\\ &= \frac{1}{\sqrt{2\pi}}\exp\left(-sx - \frac{1}{2h\eta}x^{2-\gamma} - \log\vartheta\right), \end{aligned}$$

which tends to zero as $x \to \infty$, and so can be made arbitrarily small.
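As an aside, the Gaussian tail bound used above can be checked numerically. This is a sanity check only, not part of the proof; here $\Phi^{c}$ denotes the standard Gaussian upper tail probability, computed via the complementary error function:

```python
import math

def Phi_c(x):
    """Standard Gaussian upper tail probability P(Z > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Check that sqrt(2*pi) * x * Phi_c(x) < exp(-x**2 / 2) for a range of x > 0.
for x in [0.5, 1.0, 2.0, 5.0, 10.0]:
    lhs = math.sqrt(2 * math.pi) * x * Phi_c(x)
    rhs = math.exp(-x * x / 2)
    assert lhs < rhs
```

The bound is essentially the Mills-ratio inequality $\Phi^{c}(x) < \phi(x)/x$; the margin becomes thin for large $x$ but the inequality remains strict.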

On (0, *<sup>x</sup>* <sup>−</sup> *cxγ*/2], note that *<sup>e</sup>s*(|*y*|−|*x*|) <sup>−</sup> 1 is clearly negative throughout this region provided that *c* < *x*1−*<sup>γ</sup>*/2, which can be enforced by choosing *x* large enough for any given *<sup>c</sup>* <sup>&</sup>lt; <sup>∞</sup>. So the integral is straightforwardly bounded as *<sup>B</sup>*(0,*x*−*cxγ*/2] <sup>≤</sup> 0 for all *<sup>x</sup>* ∈ X .

On (*<sup>x</sup>* <sup>−</sup> *cx<sup>γ</sup>*/2, *<sup>x</sup>* <sup>+</sup> *cxγ*/2], provided *<sup>x</sup>* <sup>−</sup> *cxγ*/2 <sup>&</sup>gt; *<sup>r</sup>* then for any *<sup>y</sup>* in this region we can either upper or lower bound *α*(*x*, *y*) with the expression

$$\exp\left(-a(y-x) + \frac{\gamma}{2}\log\left|\frac{x}{y}\right| - \frac{1}{2h\eta}\left[(x-y)^{2}y^{-\gamma} - (x-y)^{2}x^{-\gamma}\right]\right).$$

A Taylor expansion of *y*−*<sup>γ</sup>* about *x* gives

$$y^{-\gamma} = x^{-\gamma} - \gamma x^{-\gamma-1}(y-x) + \frac{\gamma(\gamma+1)}{2}x^{-\gamma-2}(y-x)^{2} - \cdots$$

and multiplying by (*<sup>y</sup>* <sup>−</sup> *<sup>x</sup>*)<sup>2</sup> gives

$$(y-x)^{2}y^{-\gamma} = \frac{(y-x)^{2}}{x^{\gamma}} - \gamma\frac{(y-x)^{3}}{x^{\gamma+1}} + \frac{\gamma(\gamma+1)}{2}\frac{(y-x)^{4}}{x^{\gamma+2}} - \cdots$$

If <sup>|</sup>*<sup>y</sup>* <sup>−</sup> *<sup>x</sup>*<sup>|</sup> <sup>=</sup> *cxγ*/2 then this is:

$$\frac{c^{2}x^{\gamma}}{x^{\gamma}} - \gamma\frac{c^{3}x^{3\gamma/2}}{x^{\gamma+1}} + \frac{\gamma(\gamma+1)}{2}\frac{c^{4}x^{2\gamma}}{x^{\gamma+2}} - \cdots$$

As $\gamma < 2$ we have $3\gamma/2 < \gamma + 1$, and similarly for successive terms, meaning each term after the first vanishes as $|x| \to \infty$. So for large $x$, any $y \in (x - cx^{\gamma/2}, x + cx^{\gamma/2})$ and any $\delta > 0$ we have

$$(y-x)^{2}y^{-\gamma} \ge \frac{(y-x)^{2}}{x^{\gamma}} - \gamma\frac{(y-x)^{3}}{x^{\gamma+1}} - 2h\eta\delta.\tag{A1}$$

So we can analyse how the acceptance rate behaves. First note that for fixed $\epsilon > 0$

$$\alpha(x, x+\epsilon) \le \exp\left(-a\epsilon + \frac{\gamma}{2}\log\left|\frac{x}{x+\epsilon}\right| + \frac{\gamma}{2h\eta}\frac{\epsilon^{3}}{x^{\gamma+1}} + \delta\right) \to \exp(-a\epsilon + \delta),$$

recalling that $\delta$ can be made arbitrarily small. In fact, the $e^{-a\epsilon}$ term will be dominant for any $\epsilon$ for which $\epsilon^{3}/x^{\gamma+1} \to 0$, i.e., any $\epsilon = o(x^{(\gamma+1)/3})$. If $\gamma < 2$ then $\epsilon = cx^{\gamma/2}$ satisfies this condition. So for any $y > x$ in this region we can choose an $x$ such that

$$\alpha(x, y) \le \exp\left(-a(y - x) + \delta_{x}\right),$$

where $\delta_{x} \to 0$ as $x \to \infty$. Similarly, for any fixed $\epsilon > 0$, we have

$$\alpha(x, x-\epsilon) \ge \exp\left(a\epsilon + \frac{\gamma}{2}\log\left|\frac{x}{x-\epsilon}\right| - \frac{\gamma}{2h\eta}\frac{\epsilon^{3}}{x^{\gamma+1}} - \delta\right) \to \exp(a\epsilon - \delta).$$

So by a similar argument we have $\alpha(x, y) = 1$ here for all large enough $x$. Combining gives

$$B_{(x-cx^{\gamma/2},\,x+cx^{\gamma/2}]} \le \int_{0}^{cx^{\gamma/2}}\left[e^{(s-a)z+\delta_{x}} - e^{-az+\delta_{x}} + e^{-sz} - 1\right]q_{x}(dz),$$

where $q_{x}(\cdot)$ denotes a zero-mean Gaussian distribution with the same variance as $Q(x,\cdot)$. Using the change of variables $z' = z/(\sqrt{h\eta}\,x^{\gamma/2})$ we can write the above integral as

$$\int_{0}^{c/\sqrt{h\eta}}\left[e^{(s-a)\sqrt{h\eta}\,x^{\gamma/2}z'+\delta_{x}} - e^{-a\sqrt{h\eta}\,x^{\gamma/2}z'+\delta_{x}} + e^{-s\sqrt{h\eta}\,x^{\gamma/2}z'} - 1\right]\mu(dz'),$$

where $\mu(\cdot)$ denotes a Gaussian distribution with zero mean and unit variance. Provided $s < a$, then by dominated convergence as $x \to \infty$ this asymptotes to

$$-\int_{0}^{c/\sqrt{h\eta}}\mu(dz') = -\frac{1}{2}\,\mathrm{erf}\left(\frac{c}{\sqrt{2h\eta}}\right) < 0,$$

where $\mathrm{erf}(z) := (2/\sqrt{\pi})\int_{0}^{z}e^{-t^{2}}\,dt$ is the Gaussian error function. On $(x + cx^{\gamma/2}, x + cx^{\gamma}]$ we can upper bound the acceptance rate as

$$\alpha(x, y) \le \frac{\pi(y)}{\pi(x)}\exp\left(\frac{1}{2}\log\frac{|G(y)|}{|G(x)|} + \frac{G(x)}{2h}(x-y)^{2}\right).$$

If $y \ge x$ and $x > x_{0}$ we have

$$\alpha(x, y) \le \exp\left(-a(|y| - |x|) + \frac{1}{2h\eta}\frac{(x-y)^{2}}{x^{\gamma}}\right).$$

For <sup>|</sup>*<sup>y</sup>* <sup>−</sup> *<sup>x</sup>*<sup>|</sup> <sup>=</sup> *cx*this becomes

$$\alpha(x, y) \le \exp\left(-acx^{\ell} + \frac{c^{2}}{2h\eta}x^{2\ell-\gamma}\right),$$

so provided $\ell < \gamma$ the first term inside the exponential will dominate the second for large enough $x$. In the boundary case $\ell = \gamma$ we have

$$\alpha(x, y) \le \exp\left(\left(\frac{c^{2}}{2h\eta} - a\right)cx^{\gamma}\right),$$

so provided we choose $c$ such that $a > c^{2}/(2h\eta)$ then the acceptance rate will also decay exponentially. Because of this we have

$$\begin{aligned} B_{(x+cx^{\gamma/2},\,x+cx^{\gamma}]} &\le \int_{x+cx^{\gamma/2}}^{x+cx^{\gamma}} e^{s(y-x)}\,\alpha(x,y)\,Q(x,dy),\\ &\le e^{(c^{2}/(2h\eta)+s-a)cx^{\gamma/2}}\,Q(x,(x+cx^{\gamma/2}, x+cx^{\gamma}]), \end{aligned}$$

so provided $a > c^{2}/(2h\eta) + s$ then this term can be made arbitrarily small.

On $(x + cx^{\gamma}, \infty)$, using the same properties of truncated Gaussians, we have

$$\begin{aligned} B_{(x+cx^{\gamma},\infty)} &\le e^{-sx}\int_{x+cx^{\gamma}}^{\infty} e^{sy}\,Q(x,dy),\\ &= e^{s^{2}h\eta x^{\gamma}/2}\,\Phi^{c}\left(\left(\frac{c}{\sqrt{h\eta}} - \sqrt{h\eta}\,s\right)x^{\gamma/2}\right), \end{aligned}$$

which can be made arbitrarily small provided that $s$ is chosen to be small enough, using the same simple bound on $\Phi^{c}$ as for the case of $B_{(-\infty,0]}$.

Combining gives that the integral (8) is asymptotically bounded above by $-\frac{1}{2}\,\mathrm{erf}\left(c/\sqrt{2h\eta}\right)$, which is strictly less than zero as $c$, $h$ and $\eta$ are all positive. This completes the proof under Assumption 1.

Under Assumption 2 the proof is similar. Take $V(x) = e^{s|x|^{\beta}}$, and divide $\mathcal{X}$ up into the same regions. Outside of $(x - cx^{\gamma/2}, x + cx^{\gamma/2}]$ the same arguments show that the integral can be made arbitrarily small. On this set, note that in the tails

$$(x + cx^{\ell})^{\beta} - x^{\beta} = \beta c x^{\ell+\beta-1} + \frac{\beta(\beta-1)}{2}c^{2}x^{2\ell+\beta-2} + \cdots$$

For *<sup>y</sup>* <sup>−</sup> *<sup>x</sup>* <sup>=</sup> *cx*-, then for - < 1 − *β* this becomes negligible. So in this case we further divide the typical set into (*x*, *<sup>x</sup>* <sup>+</sup> *cx*1−*β*] <sup>∪</sup> (*<sup>x</sup>* <sup>+</sup> *cx*1−*β*, *<sup>x</sup>* <sup>+</sup> *cxγ*/2). On (*<sup>x</sup>* <sup>−</sup> *cx*1−*β*, *<sup>x</sup>* <sup>+</sup> *cx*1−*β*) the integral is bounded above by *<sup>e</sup>*−*c*1*Q*(*x*,(*<sup>x</sup>* <sup>−</sup> *cx*1−*β*, *<sup>x</sup>* <sup>+</sup> *cx*1−*β*)) <sup>→</sup> 0, for some suitably chosen *<sup>c</sup>*<sup>1</sup> <sup>&</sup>gt; 0. On (*<sup>x</sup>* <sup>−</sup> *cx<sup>γ</sup>*/2, *<sup>x</sup>* <sup>−</sup> *cx*1−*β*] <sup>∪</sup> (*<sup>x</sup>* <sup>+</sup> *cx*1−*β*, *<sup>x</sup>* <sup>+</sup> *cxγ*/2] then for *<sup>y</sup>* <sup>&</sup>gt; *<sup>x</sup>* we have *<sup>α</sup>*(*x*, *<sup>y</sup>*) <sup>≤</sup> *<sup>e</sup>*−*c*2(*yβ*−*xβ*), so we can use the same argument as in the the log-concave case to show that the integral will be strictly negative in the limit.

**Proof of Proposition 3.** First note that in this case, for any $g : \mathbb{R} \to (0,\infty)$ such that $g(x)/|x| \to \infty$ but $g(x)\sqrt{G(x)} \to 0$ as $|x| \to \infty$, we have

$$Q\left(x, (x - g(x),\, x + g(x))\right) = \Phi\left(g(x)\sqrt{G(x)}\right) - \Phi\left(-g(x)\sqrt{G(x)}\right) \to 0$$

as $|x| \to \infty$. The chain therefore has the property that $\mathbb{P}(\{|X_{i+1}| > g(X_{i})/2\} \cup \{X_{i+1} = X_{i}\})$ can be made arbitrarily close to one as $|X_{i}|$ grows, which leads to two possible behaviours. If the form of $\pi(\cdot)$ forces such large jumps to be rejected then $r(x) \to 1$ and the lack of geometric ergodicity follows from (9). If this is not the case then the chain will be transient (this can be made rigorous using a standard Borel–Cantelli argument; see, e.g., the proof of Theorem 12.2.2 on p. 299 of [21]).

**Proof of Proposition 4.** It is sufficient to construct a sequence of points $x_{p} \in \mathbb{R}^{2}$ such that $|x_{p}| \to \infty$ as $p \to \infty$, show that $r(x_{p}) \to 1$ in the same limit, and then apply (9). Take $x_{p} = (0, p)$ for $p \in \mathbb{N}$. In this case

$$r(x_{p}) = 1 - \int\alpha(x_{p}, y)\,Q_{R}(x_{p}, dy).$$

Note that for every $\epsilon > 0$ there is a $\delta < \infty$ such that $Q(x_{p}, B_{\delta}^{c}(x_{p})) < \epsilon$ for all $x_{p}$, where $B_{\delta}(x) := \{y \in \mathbb{R}^{2} : |y - x| \le \delta\}$. The set $A(x_{p}, \delta) := B_{\delta}(x_{p}) \cap R$ denotes the possible values of $y \in B_{\delta}(x_{p})$ for which the acceptance rate is non-zero. Note that $A(x_{p}, \delta) \subset S(x_{p}, \delta) := \{y \in B_{\delta}(x_{p}) : |y_{1}| \le 3^{1-p+\delta}\}$, which is simply a strip that can be made arbitrarily narrow for any fixed $\delta$ by taking $p$ large enough. Combining these ideas gives

$$\begin{aligned} \int\alpha(x_{p}, y)\,Q_{R}(x_{p}, dy) &\le \int_{A(x_{p},\delta)}\alpha(x_{p}, y)\,Q_{R}(x_{p}, dy) + \epsilon\\ &\le Q_{R}(x_{p}, S(x_{p},\delta)) + \epsilon. \end{aligned}$$

Both of the quantities on the last line can be made arbitrarily small by choosing *p* suitably large. Thus, *r*(*xp*) → 1 as |*xp*| → ∞, as required.

**Proof of Proposition 5.** First note that $\inf_{x \in R} Q_{P}(x, R)$ is bounded away from zero, unlike in the case of $Q_{R}$, owing to the design of $Q_{P}$. The acceptance rate here simplifies, since for any $y \in R$

$$\frac{\pi(y)\,|G(y)|^{1/2}}{\pi(x)\,|G(x)|^{1/2}} = 1,$$

meaning only the expression $\exp\left(-\frac{1}{2}(y-x)^{T}[G(y) - G(x)](y-x)\right)$ needs to be considered. In this case the expression is simply

$$\exp\left(-\frac{1}{2}\left(3^{2|y_{2}|} - 3^{2|x_{2}|}\right)(y_{1} - x_{1})^{2}\right).$$

Provided that $x_{1} \neq y_{1}$, then when $1 \le y_{2} < x_{2}$ this expression is strictly greater than one, whereas in the reverse case it is strictly less than one. The resulting Metropolis–Hastings kernel $P$ using proposal kernel $Q_{P}$ will therefore satisfy $\int y_{2}\,P(x, dy) < x_{2}$ for large enough $x_{2}$, and hence geometric ergodicity follows by taking the Lyapunov function $V(x) = e^{s|x_{2}|}$ (which can be used here since the domain of $x_{1}$ is compact) and following an identical argument to that given on pages 404–405 of Reference [21] for the proof of geometric ergodicity of the random walk on the half-line model, for suitably small $s > 0$, taking the small set $C := [0,1] \times [1, r]$ for suitably large $r < \infty$ and $\nu(\cdot) = \int_{\cdot}\pi(x)\,dx$.

#### **Appendix B. Needed Facts about Truncated Gaussian Distributions**

Here we collect some elementary facts used in the article; for more detail see, e.g., [38]. If $X$ follows a truncated Gaussian distribution $\mathcal{N}^{T}_{[a,b]}(\mu, \sigma^{2})$ then it has density

$$f(\mathbf{x}) = \frac{1}{\sigma Z\_{a,b}} \phi \left( \frac{\mathbf{x} - \mu}{\sigma} \right) \mathbb{I}\_{[a,b]}(\mathbf{x}),$$

where $\phi(x) = e^{-x^{2}/2}/\sqrt{2\pi}$, $\Phi(x) = \int_{-\infty}^{x}\phi(y)\,dy$ and $Z_{a,b} = \Phi((b-\mu)/\sigma) - \Phi((a-\mu)/\sigma)$. Defining $B = (b-\mu)/\sigma$ and $A = (a-\mu)/\sigma$, we have

$$\mathbb{E}[X] = \mu + \frac{\phi(A) - \phi(B)}{Z_{a,b}}\,\sigma$$

and

$$\mathbb{E}[e^{tX}] = e^{\mu t + \sigma^{2}t^{2}/2}\left[\frac{\Phi(B - \sigma t) - \Phi(A - \sigma t)}{Z_{a,b}}\right].$$

In the special case $b = \infty$, $a = 0$ this becomes $e^{\mu t + \sigma^{2}t^{2}/2}\,\Phi(\mu/\sigma + \sigma t)/Z_{a,b}$.
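These identities can be sanity-checked numerically. The sketch below (an illustrative check with parameter values of my own choosing, not from the article) compares the moment generating function formula against a rejection-sampling Monte Carlo estimate:

```python
import math, random

def Phi(x):
    """Standard Gaussian CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2.0)))

def truncnorm_mgf(t, mu, sigma, a, b):
    """E[exp(t*X)] for X ~ N(mu, sigma^2) truncated to [a, b]."""
    A, B = (a - mu) / sigma, (b - mu) / sigma
    Z = Phi(B) - Phi(A)
    return (math.exp(mu * t + sigma ** 2 * t ** 2 / 2)
            * (Phi(B - sigma * t) - Phi(A - sigma * t)) / Z)

# Monte Carlo check by rejection sampling (hypothetical parameter values).
random.seed(1)
mu, sigma, a, b, t = -1.0, 2.0, 0.0, float("inf"), 0.3
draws = []
while len(draws) < 200000:
    x = random.gauss(mu, sigma)
    if a <= x <= b:
        draws.append(math.exp(t * x))
mc = sum(draws) / len(draws)
```

Setting $t = 0$ recovers $\mathbb{E}[e^{0 \cdot X}] = 1$, which gives a further quick consistency check on the formula.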

#### **References**


### *Article* **Non-Homogeneous Markov Set Systems**

**P.-C.G. Vassiliou**

Department of Statistical Science, University College London, Gower St, London WC1E 6BT, UK; vasiliou@math.auth.gr

**Abstract:** A more realistic way to describe a model is to use intervals which contain the required values of the parameters. In practice we estimate the parameters from a set of data, and it is natural that they will lie in confidence intervals. In the present study we consider Non-Homogeneous Markov Systems (NHMS) processes for which the required basic parameters lie in intervals. We call such processes Non-Homogeneous Markov Set Systems (NHMSS). First we study the set of the expected relative population structures of memberships, and we prove that under certain convexity conditions on the intervals of the parameters this set is compact and convex. Next, we establish that if the NHMSS starts with two different sets of initial distributions and allocation probability sets, then under certain conditions the two expected relative population structures asymptotically coincide geometrically fast. We continue by proving a series of theorems on the asymptotic behavior of the expected relative population structure of a NHMSS and the properties of its limit set. Finally, we present an application for geriatric and stroke patients in a hospital, and through it we solve problems that surface in practice.

**Keywords:** Non-Homogeneous Markov Systems; Markov Set Systems; limiting set

#### **1. Introduction**

The class of stochastic processes called Non-Homogeneous Markov Systems (NHMS) was first defined in [1]. The class of NHMS provided a general framework for many applied probability models used to model populations of a wide diversity of entities. The primary motive was to provide a general framework for a wide class of stochastic models in social processes ([2]). They also include as special cases non-homogeneous Markov chain models in manpower systems, such as [3–5]. The literature on NHMS has flourished since then to a large extent, and there presently exists a large volume of theoretical results as well as a variety of applications. In Section 2 of the present paper we provide a definition and a concise description of a NHMS. As we will discuss in Section 2, it is important for the reader to bear in mind that the well-known non-homogeneous Markov chain is actually a special case of a NHMS.

In many stochastic processes, and so naturally in Markov chains, and specifically in non-homogeneous Markov chains and NHMS, the values of the various parameters are assumed to be exact, while in practice these are estimated from data. Therefore, it is more realistic to view the values of the parameters as being contained in intervals with the desired confidence probability. This approach has been used for systems of linear equations, in which case the solutions are given as the set of all possible solutions; two books have been written on this topic [6,7]. For the analogous problem for differential equations a book was written by [8], and for homogeneous Markov chains a book with this approach also exists [9].

In Section 3 of the present paper we add some further assumptions on a NHMS on our way to defining a non-homogeneous Markov set system (NHMSS). In this way a NHMSS will be a NHMS whose basic parameters are assumed to lie in compact convex intervals.

**Citation:** Vassiliou, P.-C.G. Non-Homogeneous Markov Set Systems. *Mathematics* **2021**, *9*, 471. https://doi.org/10.3390/math9050471

Academic Editor: Andras Telcs

Received: 24 January 2021; Accepted: 19 February 2021; Published: 25 February 2021

The NHMSS is a stochastic system which has a population of members which increases at every point in time. In addition, the initial members need not be the same entities at different time points, since there is wastage from the system. The members of the population move among the different states, exit from the system (population), and new members come into the population (system) as replacements or to expand the system. In the case of non-homogeneous Markov set chains there is only one particle in the population, which never leaves the system, and no procedure exists to replace it. Mathematically, the NHMSS has more elements in the one-step-in-time equation, with more parameters introduced into the stochastic difference equation. As the equation is applied recursively, we end up with a series of components which interact together, with more of the parameters lying in intervals. Hence the problems to be solved are a lot harder than in the simple case of the Markov set chain, and new strategies and tools must be used. The introduction of the concept of a membership is crucial in dealing with the different individual members as time progresses. The Minkowski sum of vectors and its properties for convex combinations of compact sets will play a vital role which was not needed in the case of Markov set chains. One of the hard problems we encounter which does not exist for Markov set chains is finding the range of an infinite series. The Hausdorff metric for compact sets and the coefficient of ergodicity, together with properties of appropriate norms and the manipulation of infinite series, help to provide the following:

In Section 4 we establish, in the form of a theorem using the Minkowski sum of two sets, conditions under which in a NHMSS the set of all possible expected relative population structures at a certain point in time is a convex set. We also establish a theorem providing conditions under which the set of all expected relative population structures at a certain point in time is a convex polygon.

In Section 5 we study the asymptotic behavior of an NHMSS, a problem that has been of central importance for homogeneous Markov chains, non-homogeneous Markov chains, NHMS and homogeneous Markov set chains. In Theorem 4, with the use of the coefficient of ergodicity and the Hausdorff metric, we prove the following: suppose that in an NHMSS the sets of initial structures are different but compact and convex; the sets of allocation probabilities of the memberships are also different but convex and compact; and the inherent non-homogeneous Markov set chain is common. Then the Hausdorff distance between the two resulting sets of all possible expected relative structures goes to zero geometrically fast; this is equivalent to concluding that the two sets asymptotically coincide geometrically fast. In Theorem 5 we prove that if, in an NHMSS, the total population of memberships converges to a finite number geometrically fast, and the sets of initial structures and allocation probabilities of memberships are compact and convex, then the set of all possible expected relative population structures converges to a limit set geometrically fast. These two theorems have important consequences for NHMS processes as well. The first is Theorem 6, which relaxes important assumptions of the basic asymptotic theorem for NHMS, provided here as Theorem 7. The second, Theorem 8, answers a novel question for NHMS, i.e., it provides conditions under which two different NHMS with the same number of states and the same population, but different initial states and different allocation probabilities of memberships, converge to the same relative population structure geometrically fast if they share the same sequence of transition probabilities of memberships.

In Section 6 we study properties of the limit set of expected relative population structures. In Theorem 10 we prove the first property: under some mild conditions the limit set of the expected relative population structures of an NHMSS remains invariant when multiplied from the right by any transition probability matrix of the inherent non-homogeneous Markov chain selected from the respective interval. We also prove that the limit set is the only set with this property if the interval of selection of the transition probabilities of the inherent non-homogeneous Markov chain is product scrambling. In Theorem 11 the second property is established: consider two NHMSS which differ in the sense that they have different sets of initial distributions, different sets of allocation probabilities and different intervals for selecting the transition probabilities of the inherent non-homogeneous Markov chains, but whose respective intervals are uniformly scrambling with a common bound and which have the same total population of memberships. We prove that the Hausdorff distance between the limit sets of the expected relative population structures of the two NHMSS is bounded by a function of the Hausdorff distance between the two tight intervals of selection of the stochastic matrices of the inherent non-homogeneous Markov set chains, multiplied by the bound on their uniform coefficients of ergodicity.

In Section 7 we present a representative application for geriatric and stroke patients in a hospital. Through this application we provide solutions to problems arising in practice, by providing respective lemmas and a general algorithm with computational-geometry procedures which are applicable to any population system.

#### **2. The Non-Homogeneous Markov System**

Consider a population which has $T(t)$ memberships at time $t$. These memberships could be held by any kind of entities, i.e., human beings, animals, T-cells in a biological entity, fish in an organized area of the sea, cars on a highway, etc. We assume that $T(t)$ is known for every $t$; for example, in a hospital the memberships are the beds for patients, and from the hospital management's planning the number of beds is known. Suppose that the population is stratified into classes which we call states, and that there is a finite number of states, i.e., the state space is $\mathcal{S} = \{1, 2, \dots, k\}$. We assume that the evolution of the population is in discrete time, i.e., $t = 1, 2, \dots$, and we call the vector of random variables $N(t) = [N_{1}(t), N_{2}(t), \dots, N_{k}(t)]$, where $N_{i}(t)$ is the number of memberships in state $i$ at time $t$, the *population structure* of the NHMS. Define $q(t) = N(t)/T(t)$ to be the *relative population structure*. At every time instant $t = 1, 2, \dots$ we have internal transitions of members among the states in $\mathcal{S}$, with probabilities which we collect in the $k \times k$ matrix $P(t)$; we have wastage from all the states, with probabilities which we collect in the $1 \times k$ vector $p_{k+1}(t) = [p_{1,k+1}(t), p_{2,k+1}(t), \dots, p_{k,k+1}(t)]$; finally, we have recruitment or allocation probabilities of replacements or new entrants into the various states at time $t$, which we collect in the $1 \times k$ stochastic vector $p_{0}(t) = [p_{01}(t), p_{02}(t), \dots, p_{0k}(t)]$. We assume that the system is expanding, i.e., $\Delta T(t) = T(t) - T(t-1) \ge 0$. During the time interval $(t-1, t]$ a member of the system in state $i$ either moves internally to another state $j$ of the system with probability $p_{ij}(t)$, or leaves the system, in which case his membership remains at the exit of the system.
New entrants to the system are of two types: those who replace leavers, and those added to the system to meet the target of $T(t)$ total memberships. A new entrant receives his membership at the entrance and is allocated or recruited to state $j$ with probability $p_{0j}(t)$. Hence, the probability of movement of a membership from state $i$ to state $j$ at time $t$ is $q_{ij}(t) = p_{ij}(t) + p_{i,k+1}(t)p_{0j}(t)$. We collect these probabilities in the $k \times k$ matrix $Q(t) = P(t) + p_{k+1}^{\prime}(t)p_{0}(t)$, which is evidently a stochastic matrix. We call the Markov chain defined by the sequence of matrices $\{Q(t)\}_{t=0}^{\infty}$ the *imbedded or inherent non-homogeneous Markov chain* of the NHMS.

Of interest is the *expected relative population structure*. Let $X_{t}$ be the random variable representing the state in which a membership of the system is at time $t$. Define

$$q(s,t) = [q_{1}(s,t), q_{2}(s,t), \dots, q_{k}(s,t)],$$

where

$$q_{j}(s,t) = \mathbb{P}[X_{t} = j \mid q(s)] \quad\text{for } s \le t,\tag{1}$$

then from ([10] p. 140) we get that

$$\mathbb{E}[q(s,t)] = a(t-1)\,\mathbb{E}[q(s,t-1)]\,Q(t) + b(t-1)\,p_{0}(t),\tag{2}$$

where

$$a(t-1) = \frac{T(t-1)}{T(t)} \quad \text{and} \quad b(t-1) = \frac{T(t) - T(t-1)}{T(t)} \tag{3}$$

from which we get that

$$\mathbb{E}[q(0,t)] = \frac{T(0)}{T(t)}\,q(0)\,Q(0,t) + \frac{1}{T(t)}\sum_{\tau=1}^{t}\Delta T(\tau)\,p_{0}(\tau)\,Q(\tau,t),\tag{4}$$

where $Q(s,t) = Q(s+1)Q(s+2)\cdots Q(t)$ for $s \le t$. We set $Q(s,t) = I$, the identity matrix, for $s \ge t$, and $q(s,t) = \mathbf{0}$ for $s > t$. We call any such process as described above a *Non-Homogeneous Markov System (NHMS)* in discrete time and with discrete state space. It is important for the reader to realize that the well-known ordinary Markov chain is a very special case of a NHMS, with $T(t) = 1$, $p_{0}(t) = \mathbf{0}$, $p_{k+1}(t) = \mathbf{0}$ and $Q(t) = P$.
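The recursion (2) and the construction of the inherent chain $Q(t) = P(t) + p_{k+1}^{\prime}(t)p_{0}(t)$ are easy to check numerically. The sketch below is an illustrative implementation; the function names and the two-state example parameters are hypothetical, chosen only so that each row of $P(t)$ plus the corresponding wastage probability sums to one.

```python
def mat_vec(v, M):
    """Row-vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def expected_structure(q0, T, P_seq, pk1_seq, p0_seq):
    """Iterate the one-step recursion (2): E[q(0,t)] for t = 0..len(T)-1."""
    k = len(q0)
    q = q0[:]
    out = [q[:]]
    for t in range(1, len(T)):
        P, pk1, p0 = P_seq[t - 1], pk1_seq[t - 1], p0_seq[t - 1]
        # inherent chain: Q(t) = P(t) + p_{k+1}'(t) p_0(t)
        Q = [[P[i][j] + pk1[i] * p0[j] for j in range(k)] for i in range(k)]
        a = T[t - 1] / T[t]
        b = (T[t] - T[t - 1]) / T[t]
        q = [a * x + b * p0[j] for j, x in enumerate(mat_vec(q, Q))]
        out.append(q[:])
    return out

# Hypothetical two-state example: internal transitions P, wastage p_{k+1},
# recruitment p_0, and a growing total population T(t).
P = [[0.6, 0.3], [0.2, 0.5]]
pk1 = [0.1, 0.3]          # each row of P plus its wastage entry sums to one
p0 = [0.7, 0.3]
T = [100.0, 110.0, 125.0, 125.0]
traj = expected_structure([0.5, 0.5], T, [P] * 3, [pk1] * 3, [p0] * 3)
```

A simple sanity check, reflecting the fact that each $Q(t)$ is stochastic and that (2) is a convex combination, is that every computed $\mathbb{E}[q(0,t)]$ remains a stochastic vector.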

As we mentioned in the Introduction, the stochastic process NHMS was first introduced in [1] as a discrete-time, discrete-state-space stochastic process, with motives which have their roots in actual applications in manpower systems; see for example [1,2,11], and also the review papers [12,13]. Since then, a large literature on theoretical developments on many aspects of a NHMS has been published, which also includes the developments in [14,15] of NHMS in a general state space. In [16] there appeared the link between the theory of NHMS and martingale theory. Lately, another area of large interest has been the Law of Large Numbers in NHMS ([17]), which has as its motivating root the study of Laws of Large Numbers for homogeneous Markov chains by Markov himself. Many applications in areas of great diversity have also appeared in the literature, and we can selectively refer to some of them. Let us start with [18–20], which are applications to the evolution of the HIV virus on the T-cells of the human body; a population consisting of patients with asthma was studied in [21]; reliability studies were presented in [22]; applications in biomedical research appeared in [23,24]; various applications to human populations appeared in [25–29]; there is an interesting application to consumer credit [30] and to infections of populations [31]; a very interesting application to DNA and web navigation [32]; interesting ecological applications [33]; and results in physical chemistry [34]. Finally, there are a large number of publications by the research school of Prof. McClean on hospital systems, which are large manpower systems [35–39].

#### **3. Non-Homogeneous Markov Set System**

In Section 2 we defined the NHMS process, and we will now define for the first time the non-homogeneous Markov set system. So far, in the well-developed theory of NHMS, the various parameters are assumed to be exact, while in practice they are naturally estimated from data. Therefore, it is more realistic to view them as being contained in intervals with the desired confidence probability. In summary, as we will see below, a NHMSS is a NHMS whose parameters are defined in intervals. In addition, the study of NHMSS provides a new area of theoretical research, requiring different mathematical tools in many instances than the corresponding theory of NHMS, and with a potential to be applied to other stochastic processes.

The practical advantages of NHMSS are rather apparent, since the assumptions on the parameters are less restrictive. The assumption that the parameters lie in appropriate intervals absorbs, in a way, the errors of point estimates, which increase their variability. In addition, it provides the tools to study NHMS whose parameters lie in "desired" intervals, which increases considerably the control of the system, since we can choose policies for the system within intervals with desired outcomes for the expected relative population structures, or so as to avoid troublesome situations.

We will start with the definition of an interval for a stochastic vector, following [9], who first defined Markov set chains. Denote by $M_{n}(\mathbb{R})$, or simply $M_{n}$, the set of all $n \times n$ matrices with elements from the field $\mathbb{R}$.

**Definition 1.** *Let SM*1,*n* *be the set of all* 1 × *n stochastic vectors. Also let λ and μ be non-negative* 1 × *n vectors with λ* ≤ *μ componentwise. Then define the corresponding interval in SM*1,*n* *by*

$$[\lambda, \mu] = \{ p : p \in SM\_{1,n} \text{ with } \lambda \le p \le \mu \},$$

*where λ, μ are chosen such that* [*λ*, *μ*] ≠ ∅*.*

**Example 1.** *It is sometimes helpful to view mathematics geometrically. Let SM*1,3 *be the set of all* 1 × 3 *stochastic vectors; it is easy to see that this is the convex hull of the vectors e*1 = (1, 0, 0)*, e*2 = (0, 1, 0) *and e*3 = (0, 0, 1) *in* R3*. Now, all the non-negative vectors x such that λ* ≤ *x* ≤ *μ lie within and on the surface of a rectangular box whose coordinates are determined by λ*, *μ. The interval* [*λ*, *μ*] *is the intersection of the two spaces described above. We can visualize this more easily if we consider the triangle e*1*e*2*e*3 *in* R2*. Let λ* = (0.1, 0.2, 0.3) *and μ* = (0.5, 0.7, 0.8)*; then the interval* [*λ*, *μ*] *can easily be drawn in the following way. Draw two lines parallel to the line e*2*e*3 *at the points λ*1 = 0.1 *and μ*1 = 0.5*; also draw two lines parallel to the line e*1*e*3 *at the points λ*2 = 0.2 *and μ*2 = 0.7*; finally draw two lines parallel to the line e*1*e*2 *at the points λ*3 = 0.3 *and μ*3 = 0.8*. Then the interval* [*λ*, *μ*] *is the common area between these lines.*

Tight intervals are important in what follows:

**Definition 2.** *Let* [*λ*, *μ*] *be an interval, then if*

$$
\lambda\_i = \min\_{x \in [\lambda, \mu]} x\_i \quad \text{and} \quad \mu\_i = \max\_{x \in [\lambda, \mu]} x\_i,
$$

*then λi, μ<sup>i</sup> are called tight, respectively. If λi, μ<sup>i</sup> are tight for all i, then the interval* [*λ*, *μ*] *is called tight.*

Intervals can be tested for tightness using the following Lemma ([9]). Moreover, with the use of this Lemma, an interval which is not tight can be tightened by an algorithm without actually changing it; that is, the new, tightened interval contains the same stochastic vectors.

**Lemma 1.** *([9]). Let* [*λ*, *μ*] *be an interval. Then for each coordinate i*

$$(i) \ \lambda\_i \text{ is tight if and only if } \ \lambda\_i + \sum\_{k \neq i} \mu\_k \ge 1.$$

$$(ii) \ \mu\_i \text{ is tight if and only if } \mu\_i + \sum\_{k \neq i} \lambda\_k \le 1.$$
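Lemma 1 translates directly into a computable test, and applying its two conditions repeatedly yields the tightening algorithm mentioned above. The following is a minimal Python sketch (the function names and the small tolerance `eps` are our own; the interval is the one from Example 1):

```python
def is_tight(lam, mu, eps=1e-9):
    # Lemma 1: lam_i tight iff lam_i + sum_{k != i} mu_k >= 1,
    #          mu_i  tight iff mu_i  + sum_{k != i} lam_k <= 1.
    n = len(lam)
    lo = all(lam[i] + sum(mu) - mu[i] >= 1 - eps for i in range(n))
    hi = all(mu[i] + sum(lam) - lam[i] <= 1 + eps for i in range(n))
    return lo and hi

def tighten(lam, mu):
    # Raise each lam_i / lower each mu_i to the attainable extremes;
    # the set of stochastic vectors in [lam, mu] is unchanged.
    while not is_tight(lam, mu):
        lam = [max(lam[i], 1 - (sum(mu) - mu[i])) for i in range(len(lam))]
        mu = [min(mu[i], 1 - (sum(lam) - lam[i])) for i in range(len(mu))]
    return lam, mu

lam, mu = tighten([0.1, 0.2, 0.3], [0.5, 0.7, 0.8])
print(lam, mu)   # the interval of Example 1 tightens to mu = (0.5, 0.6, 0.7)
```

Note that the interval of Example 1 is not tight: for instance, no vector in it can attain *μ*2 = 0.7, since the other components contribute at least 0.1 + 0.3.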

We now need the following definition of when a coordinate of a vector in a tight interval is called *free*.

**Definition 3.** *Let* [*λ*, *μ*] *be a tight interval and p* ∈ [*λ*, *μ*]*. Then if λ<sup>i</sup>* < *pi* < *μ<sup>i</sup> for some coordinate i, then the coordinate pi in p is called free.*

Tight intervals and convex sets are well linked and play an important role in the preservation of many properties. In this respect, the following Lemma is very useful.

**Lemma 2.** *([9]). Let* [*λ*, *μ*] *be a tight interval. Then* [*λ*, *μ*] *is a convex polytope. A vector p* ∈ [*λ*, *μ*] *is a vertex of* [*λ*, *μ*] *if and only if p has at most one free component.*
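Lemma 2 suggests a brute-force way to list the vertices of a tight interval: fix every component but one at its lower or upper bound and let the remaining coordinate absorb the leftover probability mass. A small Python sketch (the function name is ours; the interval is the tightened version of the one in Example 1):

```python
from itertools import product

def vertices(lam, mu, eps=1e-9):
    # Candidates with at most one free component (Lemma 2): every
    # component but one sits at lam_k or mu_k, the free one absorbs
    # the remaining probability mass; keep the feasible candidates.
    n, verts = len(lam), set()
    for free in range(n):
        others = [k for k in range(n) if k != free]
        for choice in product((0, 1), repeat=n - 1):
            v = [0.0] * n
            for k, c in zip(others, choice):
                v[k] = mu[k] if c else lam[k]
            v[free] = 1.0 - sum(v)
            if lam[free] - eps <= v[free] <= mu[free] + eps:
                verts.add(tuple(round(x, 9) for x in v))
    return sorted(verts)

V = vertices([0.1, 0.2, 0.3], [0.5, 0.6, 0.7])
print(V)   # three vertices: this tight interval is a triangle
```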

We will now extend the definition of an interval of a vector to an interval of a matrix and to a tight interval of a matrix.

**Definition 4.** *Let SMn be the set of all n* × *n stochastic matrices. Also let* **Λ** *and M be non-negative n* × *n matrices with* **Λ** ≤ *M componentwise. Then define the corresponding interval in SMn by*

$$[\Lambda, M] = \{ P : P \in SM\_n \text{ with } \Lambda \le P \le M \},$$

*where* **Λ**, *M are chosen such that* [**Λ**, *M*] ≠ ∅*.*

We now proceed to define a tight interval of matrices:

**Definition 5.** *Let* [**Λ**, *M*] *be an interval of matrices. If*

$$\lambda\_{ij} = \min\_{P \in [\Lambda, M]} p\_{ij} \quad \text{and} \quad \mu\_{ij} = \max\_{P \in [\Lambda, M]} p\_{ij},$$

*for all i and j, then* [**Λ**, *M*] *is called tight.*

The interval [**Λ**, *M*] can be constructed also by rows, i.e.,

$$[\Lambda, M] = \left\{ P : p\_i \in [\lambda\_i, \mu\_i] \text{ for all } i, \text{ with } p\_i, \lambda\_i, \mu\_i \text{ being the rows of the respective matrices } P, \Lambda, M \right\}.$$

In what follows we will define a non-homogeneous Markov set system. We will keep the entire notation introduced in Section 2 for a NHMS and we will build on that.

Let [M] = [*Q̌*, *Q̂*] be an interval of *k* × *k* stochastic matrices with *Q̌* ≤ *Q*(*t*) ≤ *Q̂* for every *t* ∈ N, which is tight, i.e.,

$$[\mathbb{M}] = \left\{ Q(t) : Q(t) \text{ is a } k \times k \text{ stochastic matrix with } \check{Q} \le Q(t) \le \hat{Q} \right\}$$

with

$$\begin{aligned} \check{q}\_{ij} &= \min\_{Q(t) \in \left[\check{Q}, \hat{Q}\right]} q\_{ij}(t) \text{ for every } t \in \mathbb{N}, \\ \hat{q}\_{ij} &= \max\_{Q(t) \in \left[\check{Q}, \hat{Q}\right]} q\_{ij}(t) \text{ for every } t \in \mathbb{N}, \end{aligned}$$

and the notation [*Q̌*, *Q̂*] will be taken to imply that [*Q̌*, *Q̂*] ≠ ∅. We will now make the following basic assumptions:

**Assumption 1.** *The imbedded non-homogeneous Markov chain of the NHMS has all its probability transition matrices in* [M]*.*

We call [M] the *probability transition matrix set (PTMS) of the imbedded non-homogeneous Markov chain.*

Now define by

$$\left[\mathbb{M}^2\right] = \{ Q(0)Q(1) : Q(0), Q(1) \in [\mathbb{M}] \},$$

... ... ...

$$[\mathbb{M}^n] = \{ \mathbf{Q}(0)\mathbf{Q}(1)...\mathbf{Q}(n-1) : \mathbf{Q}(0),...,\mathbf{Q}(n-1) \in [\mathbb{M}] \}.$$

We call the sequence {[M*n*]}<sup>∞</sup> *<sup>n</sup>*=<sup>1</sup> the *inherent or imbedded non-homogeneous Markov set chain.*

**Assumption 2.** *Let* [S0] *be the set of* 1 × *k stochastic vectors from which the initial distribution q*(0) *is chosen.*

$$[\mathbb{S}\_0] = [\check{q}(0), \hat{q}(0)] = \{ q(0) : q(0) \text{ is a stochastic vector with } \check{q}(0) \le q(0) \le \hat{q}(0) \}.$$

**Assumption 3.** *Let* [R0] *be the set of* 1 × *k stochastic vectors from which the allocation probabilities p*0(*t*) *are being selected. That is*

$$[\mathbb{R}\_0] = [\check{p}\_0, \hat{p}\_0] = \{ p\_0(t) : p\_0(t) \text{ is a stochastic vector with } \check{p}\_0 \le p\_0(t) \le \hat{p}\_0 \text{ for every } t \}.$$

*We call a NHMS whose parameters are assumed to be in intervals as in Assumptions 1–3 a Non-homogeneous Markov Set System (NHMSS).*

Note that it is apparent by now that the Markov set chains initiated in [9,40–43] are special cases of a NHMSS.

#### **4. The Set of the Expected Relative Population Structures of a NHMSS**

In geometry the Minkowski sum (also known as dilation) of two sets of position vectors A and B in Euclidean space is formed by adding each vector in A to each vector in B. That is

$$\mathbb{A} + \mathbb{B} = \{a + b \,:\, a \in \mathbb{A}, b \in \mathbb{B}\}.$$

**Example 2.** *If we have two sets* A *and* B *consisting of three position vectors (informally, three points) representing the vertices of two triangles in* R<sup>2</sup> *with coordinates*

$$\mathbb{A} = \{(1,0), (0,1), (0,-1)\} \text{ and } \mathbb{B} = \{(0,0), (1,1), (1,-1)\},$$

*then their Minkowski sum is*

$$\mathbb{A} + \mathbb{B} = \{ (1,0), (2,1), (2,-1), (0,1), (1,2), (0,-1), (1,-2) \},$$

*which comprises the vertices of a hexagon.*
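The computation in Example 2 is easy to reproduce; a short Python sketch (the function name is ours):

```python
def minkowski_sum(A, B):
    # add every vector of A to every vector of B, dropping duplicates
    return {(a[0] + b[0], a[1] + b[1]) for a in A for b in B}

A = {(1, 0), (0, 1), (0, -1)}
B = {(0, 0), (1, 1), (1, -1)}
S = minkowski_sum(A, B)
print(sorted(S))   # the seven points listed in Example 2
```

Of the nine pairwise sums, three coincide at (1, 0), leaving the seven points above; (1, 0) lies inside the convex hull of the other six, which form the hexagon.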

For Minkowski addition, the zero set {**0**}, containing only the zero vector **0**, is an identity element: for every subset V of a vector space, V + {**0**} = V.

The empty set is also important in Minkowski addition, because it annihilates every other subset: for every subset V of a vector space, its sum with the empty set is empty, i.e., V + ∅ = ∅.

We are now in a position to state the following Lemma ([44])

**Lemma 3.** *If* V *is a convex set then λ*V + *μ*V *is also a convex set and furthermore λ*V + *μ*V = (*λ* + *μ*)V *for every λ*, *μ* > 0*. Conversely, if this "distributive property" holds for all real numbers λ*, *μ* > 0*, then the set is convex.*

**Remark 1.** *For two convex polygons* V<sup>1</sup> *and* V<sup>2</sup> *in the plane with m and n vertices, their Minkowski sum is a convex polygon with at most m* + *n vertices and may be computed in time O*(*m* + *n*) *by a very simple procedure.*

We need the following sets for the Lemma that follows

$$\operatorname{Rng}(q) = \{ y : y = qQ \text{ for some } Q \in [\mathbb{M}] \}, \text{ for any } q \in SM\_{1,n}.$$

Also

$$\operatorname{Rng}(\mathbb{S}) = \cup\_{\mathfrak{q} \in \mathbb{S}} \operatorname{Rng}(\mathfrak{q}) = \{ y : y = \mathfrak{q}Q \text{ for some } \mathbb{Q} \in [\mathbb{M}] \text{ and some } \mathfrak{q} \in \mathbb{S} \}.$$

**Lemma 4.** *Consider an NHMSS with T*(*t*) ≥ 0 *finite, which is expanding* (Δ*T*(*t*) ≥ 0)*. Also let* [S0] *be the set from which the initial distribution of memberships is drawn,* [R0] *the set from which the allocation probabilities in the various states are chosen at every time step, and finally let* [M] *be the set to which all the transition probability matrices of the inherent Markov chain of memberships belong. Then the set Rngt*(S0, R0) *of all possible expected relative population structures at time t is given by*

$$\begin{aligned} \operatorname{Rng}\_t(\mathbb{S}\_0, \mathbb{R}\_0) = \{ \mathbb{E}[q(0, t)] : \mathbb{E}[q(0, t)] \in\ & \frac{T(0)}{T(t)} \cup\_{q(0) \in [\mathbb{S}\_0]} q(0) \left[\mathbb{M}^t\right] \\ &+ \frac{1}{T(t)} \sum\_{\tau=1}^t \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_0]} p\_0(\tau) \left[\mathbb{M}^{t-\tau}\right] \}, \end{aligned}\tag{5}$$

*with* M<sup>0</sup> = [M] *and* M−<sup>1</sup> = {*I*}.

**Proof.** Following the relevant proofs that lead to Equations (2.1), (2.2) and (3.4) in [10], or Equations (2) and (4) in the present paper, we can easily prove (5).
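For any particular selection of matrices from [M] and vectors from [S0], [R0], the member of *Rngt*(S0, R0) described by (5) can be computed directly. A Python sketch with hypothetical numbers: the two matrices `Q1`, `Q2` are assumed to lie in [M], so every convex combination of them does too (intervals are convex), and we take Q(*τ*, *t*) = Q(*τ*+1) · · · Q(*t*) with the convention Q(*t*, *t*) = *I*:

```python
import numpy as np

rng = np.random.default_rng(0)

# two stochastic matrices assumed to lie in the interval [M]
Q1 = np.array([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]])
Q2 = np.array([[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]])

T = [100 + 5 * t for t in range(11)]    # expanding total population, DT(t) >= 0
q0 = np.array([0.5, 0.3, 0.2])          # initial structure, drawn from [S0]
p0 = np.array([0.4, 0.4, 0.2])          # allocation vector, drawn from [R0]

Qs = [a * Q1 + (1 - a) * Q2 for a in rng.uniform(size=10)]

def Qprod(mats):
    out = np.eye(3)
    for M in mats:
        out = out @ M
    return out

t = 10
# E[q(0,t)] = T(0)/T(t) q(0) Q(0,t) + 1/T(t) sum_tau DT(tau) p0(tau) Q(tau,t)
Eq = (T[0] / T[t]) * q0 @ Qprod(Qs[:t])
for tau in range(1, t + 1):
    Eq = Eq + ((T[tau] - T[tau - 1]) / T[t]) * p0 @ Qprod(Qs[tau:t])
print(Eq.round(4), Eq.sum())   # a stochastic vector: the weights sum to one
```

Since T(0)/T(t) + Σ ΔT(τ)/T(t) = 1 and each product of stochastic matrices is stochastic, the result is again a stochastic vector; sweeping over selections traces out points of *Rngt*(S0, R0).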

We will need the following Lemma from ([9] p. 39):

**Lemma 5.** *In a non-homogeneous Markov set chain let the set of initial distributions* [S0] *be convex and let* [M] *be a tight interval from which the transition probability sequence of matrices is being selected. Then the set* [S*t*] *of all possible probability distributions at the various states at time t is a convex set.*

In the next theorem we show under which conditions the set of all possible expected relative population structures in a NHMSS is a convex set.

**Theorem 1.** *Let an NHMSS with T*(*t*) *finite and which is expanding* (Δ*T*(*t*) ≥ 0)*. If* [S0], [R0] *are convex sets and* [M] *a tight interval then the set of all possible expected relative population structures is a convex set.*

**Proof.** Let {E[*q***˚**(0, *t*)]} be the set of all possible expected relative structures of the initial memberships; then from Lemma 4 we have

$$\left\{\mathbb{E}[\mathring{q}(0,t)]\right\} = \left\{\mathbb{E}[\mathring{q}(0,t)] : \mathbb{E}[\mathring{q}(0,t)] \in \frac{T(0)}{T(t)} \cup\_{q(0) \in [\mathbb{S}\_0]} q(0) \left[\mathbb{M}^t\right] \right\},\tag{6}$$

since *T*(0)/*T*(*t*) ≥ 0, [S0] is a convex set and [M] a tight interval, from Lemmas 4 and 5 we get that {E[*q*˚(0, *t*)]} is a convex set. Also, the set

$$\left\{\mathbb{E}[r\_{\tau}]\right\} = \left\{\mathbb{E}[r\_{\tau}] : \mathbb{E}[r\_{\tau}] \in \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_0]} p\_0(\tau) \left[\mathbb{M}^{t-\tau}\right] \right\},$$

is the set of all possible expected structures of new memberships at time *t* which entered in the system at time *τ*. Now, since the system is expanding, i.e., Δ*T*(*τ*) ≥ 0, [R0] is a convex set and [M] a tight interval, then with the same reasoning as in (6), we get that {E[*rτ*]} is a convex set. Also, from Remark 1 we get that the Minkowski sum of sets

$$\frac{1}{T(t)}\sum\_{\tau=1}^{t}\{\mathbb{E}[r\_{\tau}]\},$$

is a convex set. Hence, since the two sets on the right-hand side of Equation (5) are convex, according to Remark 1 their Minkowski sum *Rngt*(S0, R0) is a convex set.

We will now borrow the following Theorem from ([9] p. 40).

**Theorem 2.** *In a non-homogeneous Markov set chain let the set of initial distributions* [S0] *be a convex polytope and let* [M] *be a tight interval from which the transition probability sequence of matrices is being selected. Then the set* [S*t*] *of all possible probability distributions at the various states at time t is a convex polytope with vertices of the form E<sub>i</sub>E<sub>i1</sub> · · · E<sub>ik</sub> for some vertices E<sub>i</sub> of* [S0] *and some vertices E<sub>ij</sub> of* [M]*.*

In the next theorem we show under which conditions the set of all possible expected relative population structures in a NHMSS is a convex polytope.

**Theorem 3.** *Let an NHMSS with T*(*t*) *finite and which is expanding* (Δ*T*(*t*) ≥ 0)*. If* [S0], [R0] *are convex polytopes and* [M] *a tight interval then the set of all possible expected relative population structures is a convex polytope.*

**Proof.** The proof follows the steps of the proof of Theorem 1 using Theorem 2.

#### **5. Asymptotic Behavior of NHMSS**

The problem of asymptotic behavior has been one of central importance for homogeneous Markov chains, non-homogeneous Markov chains, NHMS, and non-homogeneous Markov set chains. In the present section we prove a series of theorems with which we establish the asymptotic behavior of a NHMSS.

Since Markov himself and, later, Dobrushin, the coefficient of ergodicity T(*Q*) of a *k* × *k* stochastic matrix *Q* = (*qij*), *i*, *j* = 1, . . . , *k*, has been a fundamental tool in the study of Markov chains. We have

$$\mathcal{T}(Q) = \frac{1}{2} \max\_{i,j} \sum\_{l=1}^{k} \left| q\_{il} - q\_{jl} \right|, \tag{7}$$

thus 0 ≤ T(*Q*) ≤ 1. We clarify here that if T(*Q*) < 1 the stochastic matrix *Q* is called *scrambling*. Scrambling matrices are regular, but not all regular matrices are scrambling. Yet if *Q* is a regular stochastic matrix then some power of *Q*, say *Q<sup>n</sup>*, is scrambling. We define by
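Equation (7) is straightforward to compute, and a small example also illustrates the remark that a regular matrix need not itself be scrambling while some power of it is. A Python sketch (the function names and the particular matrix are ours):

```python
def ergodicity_coefficient(Q):
    # Dobrushin's coefficient of ergodicity (7): half the largest
    # l1 distance between any two rows of Q
    k = len(Q)
    return 0.5 * max(sum(abs(Q[i][l] - Q[j][l]) for l in range(k))
                     for i in range(k) for j in range(k))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Q = [[0.5, 0.5, 0.0],
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]   # regular, but rows 2 and 3 have disjoint support
print(ergodicity_coefficient(Q))             # 1.0: Q itself is not scrambling
print(ergodicity_coefficient(matmul(Q, Q)))  # 0.75 < 1: Q^2 is scrambling
```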

$$\bar{\mathcal{T}}([\mathbb{M}]) = \max\_{Q \in \left[\check{Q}, \hat{Q}\right]} \mathcal{T}(Q),\tag{8}$$

if T̄([M]) < 1 we say that [M] is *uniformly scrambling*. More on uniform scrambling, and on the interpretation of the coefficient of ergodicity of a matrix *A* ∈ R*n*×*n* as a matrix norm when the norm is restricted to a specified subspace, can be found in [45]. For explicit forms for ergodicity coefficients and their properties see also [46,47]. In what follows we will use the following norm for a matrix *A* ∈ *Mn*(R):

$$||A|| = \max\_{i} \sum\_{j=1}^{n} |a\_{ij}|.$$

We will use the concept of T̄([M]) < 1 to study asymptotic behavior in a NHMSS. It is important to note that if we consider (M*<sup>C</sup>*, ||·||), the space of non-empty compact subsets of ([M], ||·||), then (M*<sup>C</sup>*, ||·||) is a metric space ([48]). This space can be topologized using the Hausdorff metric *d*(·, ·) defined by

$$d(\mathbb{S}\_1, \mathbb{S}\_2) = \max\{\delta(\mathbb{S}\_1, \mathbb{S}\_2), \delta(\mathbb{S}\_2, \mathbb{S}\_1)\},\tag{9}$$

where

$$\delta(\mathbb{S}\_1, \mathbb{S}\_2) = \max\_{Q\_1 \in \mathbb{S}\_1} \min\_{Q\_2 \in \mathbb{S}\_2} ||Q\_1 - Q\_2||, \quad \mathbb{S}\_1, \mathbb{S}\_2 \in \mathbb{M}^C.\tag{10}$$

From [48] we also get that (M*<sup>C</sup>*, *d*(·, ·)) is a metric space. We will need the following Lemmas.
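For finite sets of matrices, the Hausdorff metric (9)–(10) with the max-row-sum norm above can be computed directly; a Python sketch (the function names are ours):

```python
def mat_norm(A):
    # maximum absolute row sum: the norm ||A|| used in the text
    return max(sum(abs(x) for x in row) for row in A)

def hausdorff(S1, S2):
    # d(S1, S2) = max(delta(S1, S2), delta(S2, S1)), as in (9)-(10)
    def diff(P, Q):
        return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]
    def delta(X, Y):
        return max(min(mat_norm(diff(P, Q)) for Q in Y) for P in X)
    return max(delta(S1, S2), delta(S2, S1))

I2 = [[1.0, 0.0], [0.0, 1.0]]
U2 = [[0.5, 0.5], [0.5, 0.5]]
print(hausdorff([I2], [U2]))          # 1.0
print(hausdorff([I2, U2], [U2, I2]))  # 0.0: equal sets are at distance zero
```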

**Lemma 6** ([49])**.** *The following statements are equivalent:*


**Lemma 7** ([50] p. 541)**.** *Suppose that*

$$\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty}$$

*converges to zero as <sup>t</sup>* <sup>→</sup> <sup>∞</sup> *geometrically fast with <sup>T</sup>*(*t*) <sup>≥</sup> *<sup>T</sup>*(*<sup>t</sup>* <sup>−</sup> <sup>1</sup>). *Then* {*T*(*t*)}<sup>∞</sup> *<sup>t</sup>*=<sup>0</sup> *converges geometrically fast.*

**Remark 2.** *The restriction*

$$\lim\_{t \to \infty} \frac{\Delta T(t)}{T(t)} = 0,$$

*is a general assumption with the physical interpretation that the proportional growth rate vanishes in the limit. This assumption allows limt*→∞*T*(*t*) = ∞.

We will now study what happens asymptotically to the two sets of the expected relative population structures when the initial structures belong to two different sets as well as the allocation probabilities in the respective cases.

**Theorem 4.** *Let an NHMSS for which* [M] *is a tight interval which is uniformly scrambling. Assume that*

$$\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty} \text{ converges to zero geometrically fast.}$$

*Also, assume that* [S], [R0] *are compact and convex. Let* S1, S2 *be compact subsets of* [S] *and* R1, R2 *compact subsets of* [R0]*; then*

$$d(\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1), \operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2)) \to 0 \text{ as } t \to \infty \text{ geometrically fast.}$$

**Proof.** From (5) we get that

$$\begin{aligned} \operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1) = \{ \mathbb{E}[q(0, t)] : \mathbb{E}[q(0, t)] \in\ & \frac{T(0)}{T(t)} \cup\_{q(0) \in [\mathbb{S}\_1]} q(0) \left[ \mathbb{M}^t \right] \\ &+ \frac{1}{T(t)} \sum\_{\tau=1}^t \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_1]} p\_0(\tau) \left[ \mathbb{M}^{t-\tau} \right] \}, \end{aligned}\tag{11}$$

and from (11), if we replace *q*(0) with *q***¯**(0), [S1] with [S2], *p*0(*τ*) with *p***¯**0(*τ*), [R1] with [R2], and E[*q*(0, *t*)] with E[*q***¯**(0, *t*)], we get *Rngt*(S2, R2). Since [S1], [R1] are compact and [M] is a tight interval, it is not difficult to show that *Rngt*(S1, R1) is compact. The same applies for *Rngt*(S2, R2), since [S2], [R2] are also compact. Hence, we may take their Hausdorff metric. Now, we have that

$$\delta(\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1), \operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2)) = \max\_{\substack{q(0) \in [\mathbb{S}\_1],\ p\_0(\tau) \in [\mathbb{R}\_1], \\ Q(t) \in [\mathbb{M}] \text{ for all } \tau, t}} \ \min\_{\substack{\bar{q}(0) \in [\mathbb{S}\_2],\ \bar{p}\_0(\tau) \in [\mathbb{R}\_2], \\ \bar{Q}(t) \in [\mathbb{M}] \text{ for all } \tau, t}} \left\| \mathbb{E}[q(0,t)] - \mathbb{E}[\bar{q}(0,t)] \right\|. \tag{12}$$

by continuity of

$$f(\mathbb{E}[\bar{q}(0,t)]) = \min \| \mathbb{E}[q(0,t)] - \mathbb{E}[\bar{q}(0,t)] \|,$$

then for some *q*∗(0) ∈ [S1], *p*∗0(*τ*) ∈ [R1] for every *τ* ∈ [1, *t*], and *Q*∗(*t*) ∈ [M] for every *t* ∈ [0, *t*], and by denoting

$$\mathbb{E}[\boldsymbol{q}^\*(0,t)] = \frac{T(0)}{T(t)}\boldsymbol{q}^\*(0)\boldsymbol{Q}^\*(0,t) + \frac{1}{T(t)}\sum\_{\tau=1}^t \Delta T(\tau)p\_0^\*(\tau)\boldsymbol{Q}^\*(\tau,t),\tag{13}$$

we get

$$\begin{aligned} \delta(\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1), \operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2)) &= \min\_{\substack{\bar{q}(0) \in [\mathbb{S}\_2],\ \bar{p}\_0(\tau) \in [\mathbb{R}\_2], \\ \bar{Q}(t) \in [\mathbb{M}] \text{ for all } \tau, t}} \left\| \mathbb{E}[q^\*(0,t)] - \mathbb{E}[\bar{q}(0,t)] \right\| \\ &\leq \left\| \mathbb{E}[q^\*(0,t)] - \mathbb{E}[\bar{q}^\*(0,t)] \right\|, \end{aligned}\tag{14}$$

where E[*q***¯**∗(0, *t*)] is given by (13) if we replace *q*∗(0) with any *q***¯**∗(0) ∈ [S2] and *p*∗0(*τ*) with any *p***¯**∗0(*τ*) ∈ [R2] for every *τ* ∈ [1, *t*]. Now, from (13) and (14) we get that

$$\delta\left(\mathrm{Rug}\_t(\mathbb{S}\_1, \mathbb{R}\_1), \mathrm{Rug}\_t(\mathbb{S}\_2, \mathbb{R}\_2)\right) \le \frac{T(0)}{T(t)} ||[\boldsymbol{q}^\*(0) - \bar{\boldsymbol{q}}^\*(0)] \mathbf{Q}^\*(0, t)||$$

$$+ \frac{1}{T(t)} \sum\_{\tau=1}^t \Delta T(\tau) ||p\_0^\*(\tau) \mathbf{Q}^\*(\tau, t) - \bar{p}\_0^\*(\tau) \mathbf{Q}^\*(\tau, t)||$$

$$\le \frac{T(0)}{T(t)} \mathcal{T}(\mathbf{Q}^\*(0, t)) ||[\boldsymbol{q}^\*(0) - \bar{\boldsymbol{q}}^\*(0)]||$$

$$+ \frac{1}{T(t)} \sum\_{\tau=1}^t \Delta T(\tau) \mathcal{T}(\mathbf{Q}^\*(\tau, t)) ||p\_0^\*(\tau) - \bar{p}\_0^\*(\tau)||$$

(since [M] is uniformly scrambling, T̄ = max*Q*∗(*t*)∈[M] T(*Q*∗(*t*)) < 1)

$$\begin{aligned} &\leq \frac{T(0)}{T(t)} \bar{\mathcal{T}}^{t} \left\| q^{\*}(0) - \bar{q}^{\*}(0) \right\| + \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \bar{\mathcal{T}}^{t-\tau} \left\| p\_{0}^{\*}(\tau) - \bar{p}\_{0}^{\*}(\tau) \right\| \\ &\leq \frac{T(0)}{T(t)} \bar{\mathcal{T}}^{t} \delta(\mathbb{S}\_{1}, \mathbb{S}\_{2}) + \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \bar{\mathcal{T}}^{t-\tau} \delta(\mathbb{R}\_{1}, \mathbb{R}\_{2}). \end{aligned}\tag{15}$$

Since {Δ*T*(*t*)/*T*(*t*)} converges to zero geometrically fast, from Lemmas 6 and 7 we get that there are *c*1, *c*2 > 0 and 0 < *b*1, *b*2 < 1 such that

$$\left| \frac{1}{T(t)} - \frac{1}{T} \right| \le c\_1 b\_1^t \quad \text{and} \quad \left| \frac{\Delta T(t)}{T(t)} \right| \le c\_2 b\_2^t. \tag{16}$$

From (15) and (16) we get that

$$\begin{aligned} \delta(\operatorname{Rng}\_t(\mathbb{S}\_{1},\mathbb{R}\_{1}),\operatorname{Rng}\_t(\mathbb{S}\_{2},\mathbb{R}\_{2})) &\leq T(0) \left(\frac{1}{T} + c\_{1}b\_{1}^{t}\right) \bar{\mathcal{T}}^{t}\delta(\mathbb{S}\_{1},\mathbb{S}\_{2}) + \delta(\mathbb{R}\_{1},\mathbb{R}\_{2}) \left(\frac{1}{T} + c\_{1}b\_{1}^{t}\right) \sum\_{\tau=1}^{t} c\_{2}b\_{2}^{\tau}\bar{\mathcal{T}}^{t-\tau} \\ &\leq T(0) \left(\frac{1}{T} + c\_{1}b\_{1}^{t}\right) \bar{\mathcal{T}}^{t}\delta(\mathbb{S}\_{1},\mathbb{S}\_{2}) + \delta(\mathbb{R}\_{1},\mathbb{R}\_{2}) \left(\frac{1}{T} + c\_{1}b\_{1}^{t}\right) c\_{2}b\_{2}^{t-1} \sum\_{\tau=1}^{t} \left(\frac{\bar{\mathcal{T}}}{b\_{2}}\right)^{t-\tau} \quad (\text{assuming } \bar{\mathcal{T}} < b\_{2}) \\ &\leq T(0)\left(\frac{1}{T} + c\_1 b\_1^t\right) \bar{\mathcal{T}}^t \delta(\mathbb{S}\_1, \mathbb{S}\_2) + \delta(\mathbb{R}\_1, \mathbb{R}\_2) \left(\frac{1}{T} + c\_1 b\_1^t\right) c\_2 b\_2^{t-1} \left(1 - \frac{\bar{\mathcal{T}}}{b\_2}\right)^{-1}. \end{aligned}\tag{17}$$

From (17) we conclude that *δ*(*Rngt*(S1, R1), *Rngt*(S2, R2)) converges to zero geometrically fast. Similarly, we get the same conclusion if T̄ > *b*2.

Now, in a similar way we may prove that *δ*(*Rngt*(S2, R2), *Rngt*(S1, R1)) converges to zero geometrically fast. Thus, we arrive at the desired conclusion.
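The key step in the chain of inequalities above is the contraction property ||(*q* − *q̄*)*Q*|| ≤ T(*Q*) ||*q* − *q̄*|| for stochastic vectors *q*, *q̄* (for a 1 × *n* vector the norm above is its l1 norm). This can be checked numerically; a Monte-Carlo Python sketch over randomly generated stochastic matrices and vectors:

```python
import numpy as np

rng = np.random.default_rng(1)

def dobrushin(Q):
    # coefficient of ergodicity (7)
    k = len(Q)
    return 0.5 * max(np.abs(Q[i] - Q[j]).sum() for i in range(k) for j in range(k))

# Monte-Carlo check of ||(q - qbar) Q||_1 <= T(Q) ||q - qbar||_1
worst = 0.0
for _ in range(1000):
    Q = rng.dirichlet(np.ones(4), size=4)      # random 4x4 stochastic matrix
    q, qbar = rng.dirichlet(np.ones(4), size=2)
    lhs = np.abs((q - qbar) @ Q).sum()
    rhs = dobrushin(Q) * np.abs(q - qbar).sum()
    worst = max(worst, lhs - rhs)
print(worst <= 1e-12)   # True: the contraction bound holds in every trial
```

The bound holds because *q* − *q̄* has components summing to zero, which is exactly the situation in which Dobrushin's coefficient acts as a matrix norm.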

We will now establish under what conditions the convergence to the limiting set in a NHMSS is geometrically fast.

**Theorem 5.** *Let an NHMSS for which* [M] *is a tight interval which is uniformly scrambling. Assume that*

$$\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty} \text{ converges to zero geometrically fast.}$$

*Also, assume that* [S], [R0] *are compact and convex. Then we have that*

$$d(\operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0), \operatorname{Rng}\_\infty(\mathbb{S}, \mathbb{R}\_0)) \to 0 \text{ geometrically fast.} \tag{18}$$

**Proof.** We will first show that

$$\operatorname{Rng}(\operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0)) = \operatorname{Rng}\_{t+1}(\mathbb{S}, \mathbb{R}\_0). \tag{19}$$

By the evaluation of the function *Rng*(., .) given in Lemma 4 we get that

$$\begin{aligned} \operatorname{Rng}(\operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0)) &= \frac{T(t)}{T(t+1)} \cup\_{\mathbb{E}[q(0,t)] \in \operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0)} \mathbb{E}[q(0,t)][\mathbb{M}] + \frac{1}{T(t+1)} \Delta T(t+1) \cup\_{p\_0(t+1) \in [\mathbb{R}\_0]} p\_0(t+1) \\ &= \frac{T(t)}{T(t+1)} \frac{T(0)}{T(t)} \cup\_{q(0) \in [\mathbb{S}]} q(0) \left[\mathbb{M}^{t}\right][\mathbb{M}] + \frac{T(t)}{T(t+1)} \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_0]} p\_0(\tau) \left[\mathbb{M}^{t-\tau}\right][\mathbb{M}] \\ &\quad + \frac{1}{T(t+1)} \Delta T(t+1) \cup\_{p\_0(t+1) \in [\mathbb{R}\_0]} p\_0(t+1) \\ &= \frac{T(0)}{T(t+1)} \cup\_{q(0) \in [\mathbb{S}]} q(0) \left[\mathbb{M}^{t+1}\right] + \frac{1}{T(t+1)} \sum\_{\tau=1}^{t+1} \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_0]} p\_0(\tau) \left[\mathbb{M}^{t-\tau+1}\right] \\ &= \operatorname{Rng}\_{t+1}(\mathbb{S}, \mathbb{R}\_0). \end{aligned}$$

Now, we will show that

$$\operatorname{Rng}\_t(\operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_0)) = \operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_0). \tag{20}$$

Assume that E[*q*(∞)] is an element of *Rng*∞(S, R0) then from Lemma 4 we get that

$$\begin{aligned} \operatorname{Rng}(\operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_{0})) &= \frac{T}{T} \cup\_{\mathbb{E}[q(\infty)] \in \operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_{0})} \mathbb{E}[q(\infty)][\mathbb{M}] + \frac{1}{T}(T - T) \cup\_{p\_{0}(\infty) \in [\mathbb{R}\_{0}]} p\_{0}(\infty) \\ &= \operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_{0})[\mathbb{M}] = \operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_{0}), \end{aligned}\tag{21}$$

where the last equality will be proved in Theorem 10. From (21) we recursively get (20).

Since the conditions of Theorem 4 hold in the present theorem we get that

$$d(\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1), \operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2)) \to 0 \text{ geometrically fast.} \tag{22}$$

Now, in (22) replace *Rngt*(S1, R1) with *Rngt*(*Rng*∞(S, R0)) and *Rngt*(S2, R2) with *Rngt*(S, R0), then we get that

$$d(\operatorname{Rng}\_t(\operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_0)), \operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0)) \to 0 \text{ geometrically fast.} \tag{23}$$

From (20) and (23) we arrive at

$$d(\operatorname{Rng}\_t(\mathbb{S}, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}, \mathbb{R}\_0)) \to 0 \text{ geometrically fast.}$$

Theorems 4 and 5 above have an important consequence, since they provide a generalization, by relaxing important assumptions, of the basic asymptotic theory for an NHMS. The theorem is the following, and can be proved by following steps analogous to those in the proofs of Theorems 4 and 5 and Theorem 3.3 in [11]:

**Theorem 6.** *Consider an NHMS and assume that*

 $\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty}$  converges geometrically fast.

*Let* {*Q*(*t*)}<sup>∞</sup> *t*=0 *be the sequence of transition matrices and* {*p*0(*t*)}<sup>∞</sup> *t*=0 *the sequence of allocation probabilities. If* sup*<sup>t</sup>* T(*Q*(*t*)) < 1 *then*

$$\lim\_{t \to \infty} \mathbb{E}(q(0, t)) = q(\infty) \text{ geometrically fast,}$$

*where q*(∞) *is the row of the stable matrix Q*(∞) = lim*t*→<sup>∞</sup> *Q*(0, *t*).
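The geometric convergence in Theorem 6 can be illustrated computationally: when every *Q*(*t*) is scrambling with sup*<sup>t</sup>* T(*Q*(*t*)) < 1, the rows of the products *Q*(0, *t*) coalesce geometrically fast to a common row *q*(∞). A Python sketch with hypothetical matrices (each *Q*(*t*) below has all entries at least 0.05, which keeps the coefficients of ergodicity uniformly below one):

```python
import numpy as np

rng = np.random.default_rng(2)

base = np.array([[0.6, 0.3, 0.1],
                 [0.2, 0.5, 0.3],
                 [0.3, 0.3, 0.4]])

P = np.eye(3)
spread = []
for t in range(30):
    # each Q(t): a random stochastic matrix with entries >= 0.05, hence
    # scrambling with T(Q(t)) <= 1 - 3 * 0.05 = 0.85 uniformly in t
    Q = 0.5 * base + 0.5 * rng.dirichlet(np.ones(3), size=3)
    P = P @ Q
    spread.append(np.abs(P[0] - P[1]).sum() + np.abs(P[0] - P[2]).sum())

print(spread[0], spread[-1])   # the rows of Q(0,t) coalesce: spread shrinks toward 0
```

Since each multiplication contracts the row differences by at least the factor T(*Q*(*t*)) ≤ 0.85, the spread is bounded by a geometric sequence, matching the rate claimed in the theorem.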

The basic asymptotic theorem for an NHMS and which has been used in many papers to provide further results is:

**Theorem 7.** *Consider an NHMS and assume that* (*a*) lim*t*→∞ ||*Q*(*t*) − *Q*|| = 0 *with Q a regular stochastic matrix;* (*b*) lim*t*→∞ ||*p*0(*t*) − *p*0|| = 0*; and* (*c*) lim*t*→∞[Δ*T*(*t*)/*T*(*t*)] = 0*. Then*

$$\lim\_{t \to \infty} \left\| \mathbb{E} (q(0, t)) - q(\infty) \right\| = 0$$

*where q*(∞) *is the row of the stable matrix Q*(∞) = lim*t*→∞ *Q<sup>t</sup>.*

Theorem 7 has been used extensively in the theory of NHMS to produce further results, see for example [10–12,17,51], and the relaxation of the necessary conditions in Theorem 6 is apparent.

Another consequence is that Theorem 4 provides conditions under which two different NHMS with the same number of states and the same total population, but different initial structures and different allocation probabilities of memberships, converge to the same expected relative population structure geometrically fast, provided they have the same sequence of transition probabilities of memberships (the PTMS of the embedded Markov chains). This is stated in detail in the following theorem, which can be proved by following steps analogous to those in the proofs of Theorems 4 and 5.

**Theorem 8.** *Consider two NHMS, a and b, which have the same number of states, a common sequence of transition probability matrices of memberships* {*Q*(*t*)}<sup>∞</sup> *t*=0*, and the same total population of memberships. Assume that*

$$\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty} \text{ converges geometrically fast.}$$

*Let qa*(0) *and qb*(0) *be their initial relative population structures, respectively, and* {*pa*(*t*)}<sup>∞</sup> *t*=0 *and* {*pb*(*t*)}<sup>∞</sup> *t*=0 *their sequences of allocation probabilities. If* sup*<sup>t</sup>* T(*Q*(*t*)) < 1 *then*

$$\lim\_{t \to \infty} \mathbb{E}(q\_a(0, t)) = \lim\_{t \to \infty} \mathbb{E}(q\_b(0, t)) = q(\infty) \text{ geometrically fast,}$$

*where q*(∞) *is the row of the stable matrix Q*(∞) = lim*t*→<sup>∞</sup> *Q*(0, *t*).

#### **6. Properties of the Limit Set**

In the present section we establish some important properties of the limit set *Rng*∞(S, R0). We start with the following definition:

**Definition 6.** *Let n be an integer such that* T(*Q*1*Q*2 · · · *Qn*) < 1 *for all Q*1, *Q*2, . . . , *Qn* ∈ [M]*. Then* [M] *is said to be product scrambling and n its scrambling integer.*

We will make use of the following Theorem 3.3 in [9]:

**Theorem 9.** *([9]). Let x, y be non-empty compact subsets of SM*1,*n. Then using the Hausdorff metric we have*

$$d(x[\mathbb{M}], y[\mathbb{M}]) \le \bar{\mathcal{T}}([\mathbb{M}])\, d(x, y).$$

We will now establish with a Theorem some important properties of the limit set.

**Theorem 10.** *Consider an NHMSS for which* [M] *is an interval which is uniformly scrambling. Assume that T*(*t*) ≥ *T*(*t* − 1) > 0 *for all t* = 1, 2, ... *and*

$$\left\{ \frac{\Delta T(t)}{T(t)} \right\}\_{t=0}^{\infty} \text{ converges to zero geometrically fast.}$$

*Also, assume that* [S0], [R0] *are compact and convex. Then if we define by*

$$\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0}, \mathbb{R}\_{0}) = \lim\_{t \to \infty} \operatorname{Rng}\_{t}(\mathbb{S}\_{0}, \mathbb{R}\_{0}) = \cap\_{t=1}^{\infty} \operatorname{Rng}\_{t}(\mathbb{S}\_{0}, \mathbb{R}\_{0}),$$

*the limit set Rng*∞(S0, R0) *satisfies*

$$\operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0) = \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}].$$

*If in addition* [M] *is product scrambling with integer n then it is the unique set that has this property.*

**Proof.** For the first part of Theorem 10, since *Rng*∞(S0, R0) is compact, it is sufficient to show that

$$d(\operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}]) = 0.$$

In this respect

$$\begin{aligned} &\delta(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0}),\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0})[\mathbb{M}]) \\ &= \max\_{\substack{q(0)\in[\mathbb{S}\_{0}],\ p\_{0}(\tau)\in[\mathbb{R}\_{0}], \\ Q(t)\in[\mathbb{M}] \text{ for all } \tau,t}} \ \min\_{\substack{\bar{q}(0)\in[\mathbb{S}\_{0}],\ \bar{p}\_{0}(\tau)\in[\mathbb{R}\_{0}], \\ \bar{Q}(t)\in[\mathbb{M}] \text{ for all } \tau,t}} \left\| \lim\_{t\to\infty}\{\mathbb{E}[q(0,t+1)]\} - \lim\_{t\to\infty}\{\mathbb{E}[\bar{q}(0,t)]\}[\mathbb{M}] \right\|, \end{aligned}$$

where

$$\{\mathbb{E}[q(0, t+1)]\} = \cup\_{q(0) \in [\mathbb{S}\_0]} \frac{T(0)}{T(t+1)}\, q(0) \left[ \mathbb{M}^{t+1} \right] + \frac{1}{T(t+1)} \sum\_{\tau=1}^{t+1} \Delta T(\tau) \cup\_{p\_0(\tau) \in [\mathbb{R}\_0]} p\_0(\tau) \left[ \mathbb{M}^{t-\tau+1} \right].$$

Now there exist a *q*∗(0) ∈ [S0], *Q*∗(*t*) ∈ [M] for every *t*, and *p*∗0(*τ*) ∈ [R0] for every *τ* ∈ [1, *t* + 1], i.e.,

$$\mathbb{E}[q(0,t+1)] = \frac{T(0)}{T(t+1)}\, q^\*(0) \mathbf{Q}^\*(0,t+1) + \frac{1}{T(t+1)} \sum\_{\tau=1}^{t+1} \Delta T(\tau)\, p\_0^\*(\tau) \mathbf{Q}^\*(\tau, t+1),$$

such that

$$\delta(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0}, \mathbb{R}\_{0}), \operatorname{Rng}\_{\infty}(\mathbb{S}\_{0}, \mathbb{R}\_{0})[\mathbb{M}]) = \min\_{\substack{q(0) \in [\mathbb{S}\_{0}],\, p\_{0}(\tau) \in [\mathbb{R}\_{0}], \\ Q(t) \in [\mathbb{M}] \text{ for all } \tau, t}} \left\| \lim\_{t \to \infty} \mathbb{E}[q^{\*}(0, t+1)] - \lim\_{t \to \infty} \{\mathbb{E}[q(0, t)]\}[\mathbb{M}] \right\|. \tag{24}$$

Now, for any admissible values of the parameters of E[*q*(0, *t*)], the difference above is still greater than or equal to the one attained at the minimizing values. Thus, we are free to choose the parameters *q*∗(0) ∈ [S0], *Q*∗(0, *t*) ∈ [M]*t* and *p*∗0(*τ*) ∈ [R0] for every *τ* ∈ [0, *t*]. By the same reasoning we may choose *Q*∗(*t*) in place of [M]. Thus, we get that

$$\delta(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0}),\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0})[\mathbb{M}])$$

$$\leq \lim\_{t \to \infty} \left\| \frac{T(0)}{T(t+1)} q^{\*}(0) \mathbf{Q}^{\*}(0,t+1) - \frac{T(0)}{T(t)} q^{\*}(0) \mathbf{Q}^{\*}(0,t+1) \right\|$$

$$+ \lim\_{t \to \infty} \left\| \frac{1}{T(t+1)} \sum\_{\tau=1}^{t+1} \Delta T(\tau) p\_{0}^{\*}(\tau) \mathbf{Q}^{\*}(\tau,t+1) - \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) p\_{0}^{\*}(\tau) \mathbf{Q}^{\*}(\tau,t+1) \right\|$$

$$\leq \lim\_{t \to \infty} \left| \frac{T(0)}{T(t+1)} - \frac{T(0)}{T(t)} \right| \left\| q^{\*}(0) \mathbf{Q}^{\*}(0,t+1) \right\| + \lim\_{t \to \infty} \left| \frac{1}{T(t+1)} - \frac{1}{T(t)} \right| \left\| \sum\_{\tau=1}^{t} \Delta T(\tau) p\_{0}^{\*}(\tau) \mathbf{Q}^{\*}(\tau,t+1) \right\| + \lim\_{t \to \infty} \frac{\Delta T(t)}{T(t+1)} \left\| p\_{0}^{\*}(t) \right\|. \tag{25}$$

Since [M] is uniformly scrambling we have that $\bar{\mathcal{T}} = \max\_{t\in\mathbb{N}} \mathcal{T}(Q^{\*}(t)) < 1$, and thus $\mathcal{T}(Q^{\*}(\tau, t)) \le \bar{\mathcal{T}}^{\,t-\tau}$ and

$$\left\| \sum\_{\tau=1}^{t} \Delta T(\tau)\, p\_0^\*(\tau)\, \mathbf{Q}^\*(\tau, t+1) \right\| \le \sum\_{\tau=1}^{t} \Delta T(\tau)\, \bar{\mathcal{T}}^{\,t-\tau+1} \omega,$$

which goes to zero as *t* → ∞. Hence

$$\delta(\operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}]) = 0.$$

In a similar way we could prove that

$$\delta(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0})[\mathbb{M}],\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0},\mathbb{R}\_{0}))=0,$$

which leads us to the conclusion

$$d(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{0}, \mathbb{R}\_{0})[\mathbb{M}], \operatorname{Rng}\_{\infty}(\mathbb{S}\_{0}, \mathbb{R}\_{0})) = 0.$$

For the second part of the theorem we have, in addition, that [M] is product scrambling with index *n*. Assume that there is a second compact set $\operatorname{Rng}^{\*}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)$ for which

$$
\operatorname{Rng}\_{\infty}^\*(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}] = \operatorname{Rng}\_{\infty}^\*(\mathbb{S}\_0, \mathbb{R}\_0).
$$

Then we get that

$$d(\operatorname{Rng}^\*\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)) = d(\operatorname{Rng}^\*\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}^n], \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)[\mathbb{M}^n])$$

$$\leq \mathcal{T}[\mathbb{M}^n]\, d(\operatorname{Rng}^\*\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)) \quad \text{(by Theorem 9)},$$

and since $\mathcal{T}[\mathbb{M}^n] < 1$ we get

$$d(\operatorname{Rng}\_{\infty}^\*(\mathbb{S}\_0, \mathbb{R}\_0), \operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)) = 0$$

from which we conclude that $\operatorname{Rng}^{\*}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)$ and $\operatorname{Rng}\_{\infty}(\mathbb{S}\_0, \mathbb{R}\_0)$ are the same set.

We will now establish an interesting result: under certain conditions, if we have two different inherent Markov set chains for two NHMSS, then the Hausdorff metric of the two sets of all possible expected relative structures of the NHMSS is bounded by the product of a function of the common bound of the two uniform coefficients of ergodicity of the two intervals and the Hausdorff metric of the two intervals. We will need the following lemma from ([9], p. 70), or [52].

**Lemma 8.** *Let* $q(0), \tilde{q}(0)$ *be stochastic vectors and* $Q(t) \in [\mathbb{M}]$, $\tilde{Q}(t) \in [\tilde{\mathbb{M}}]$ *for t* = 1, 2, ...*, where* $\mathcal{T}(\mathbb{M}) \le \mathcal{T} < 1$ *and* $\mathcal{T}(\tilde{\mathbb{M}}) \le \mathcal{T} < 1$*. Then*

$$\left\| \boldsymbol{q}(0)\mathbf{Q}(1,t) - \widetilde{\boldsymbol{q}}(0)\widetilde{\mathbf{Q}}(1,t) \right\| \leq \mathcal{T}^{t} \left\| \boldsymbol{q}(0) - \widetilde{\boldsymbol{q}}(0) \right\| + \left( \mathcal{T}^{t-1} + \dots + 1 \right) \mathbf{D},$$

*where* $\mathbf{D} = \max\_n \left\| Q(n) - \tilde{Q}(n) \right\|$.
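Lemma 8 can be checked numerically. The sketch below is my own illustration, under the assumptions that $\mathcal{T}(\cdot)$ is the Dobrushin delta coefficient, the vector norm is ℓ1, and D is the maximum row-sum norm of the matrix differences:

```python
import numpy as np

rng = np.random.default_rng(0)

def dobrushin(Q):
    # Delta coefficient of ergodicity: half the largest l1 distance between rows.
    n = len(Q)
    return max(0.5 * np.abs(Q[i] - Q[j]).sum() for i in range(n) for j in range(n))

def rand_stochastic(n):
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

n, t = 3, 6
Qs  = [rand_stochastic(n) for _ in range(t)]   # Q(1), ..., Q(t)
Qts = [rand_stochastic(n) for _ in range(t)]   # Q~(1), ..., Q~(t)
q0  = rng.random(n); q0 /= q0.sum()            # stochastic vectors q(0), q~(0)
qt0 = rng.random(n); qt0 /= qt0.sum()

T = max(dobrushin(Q) for Q in Qs + Qts)        # common bound, T < 1
D = max(np.abs(A - B).sum(axis=1).max() for A, B in zip(Qs, Qts))

left, right = q0, qt0
for A, B in zip(Qs, Qts):                      # the products Q(1, t), Q~(1, t)
    left, right = left @ A, right @ B

lhs = np.abs(left - right).sum()
rhs = T**t * np.abs(q0 - qt0).sum() + sum(T**k for k in range(t)) * D
assert lhs <= rhs + 1e-12                      # the bound of Lemma 8
```

The bound follows from the telescoping decomposition of the difference of the two products, each term of which is contracted by the remaining factors.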

**Theorem 11.** *Let two NHMSS have inherent Markov set chains with two different tight intervals* [M] *and* $[\tilde{\mathbb{M}}]$ *which have a common bound, i.e.,* $\mathcal{T}(\mathbb{M}) \le \mathcal{T} < 1$ *and* $\mathcal{T}(\tilde{\mathbb{M}}) \le \mathcal{T} < 1$*. Let, for the first NHMSS, q*(0) ∈ [S1] *and p*0(*τ*) ∈ [R1] *for τ* = 1, ..., *t, where* [S1]*,* [R1] *are convex and compact. For the second NHMSS we assume that* $\tilde{q}(0) \in [\mathbb{S}\_2]$ *and* $\tilde{p}\_0(\tau) \in [\mathbb{R}\_2]$ *for τ* = 1, ..., *t, where* [S2]*,* [R2] *are convex and compact. Let the two NHMSS have a common total population of memberships* {*T*(*t*)}∞*t*=0*, which is known. Assume that T*(*t*) ≥ *T*(*t* − 1) > 0 *and*

$$\left\{\frac{\Delta T(t)}{T(t)}\right\}\_{t=0}^{\infty} \text{ converges to zero geometrically fast.}$$

*Denote by Rngt*(S1, R1, M) *the set of all possible expected relative population structures for the first NHMSS, and by* $\operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2, \tilde{\mathbb{M}})$ *that of the second one, respectively. Then the Hausdorff metric of the two limit sets of expected relative population structures is bounded by*

$$d\left(\operatorname{Rng}\_{\infty}(\mathbb{S}\_1, \mathbb{R}\_1, \mathbb{M}), \operatorname{Rng}\_{\infty}(\mathbb{S}\_2, \mathbb{R}\_2, \tilde{\mathbb{M}})\right) \le (1 - \mathcal{T})^{-1} d\left(\mathbb{M}, \tilde{\mathbb{M}}\right).$$

**Proof.** From Lemma 4 we get that

$$\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1, \mathbb{M}) = \left\{ \mathbb{E}[q(0, t)] : \mathbb{E}[q(0, t)] \in \frac{T(0)}{T(t)} \cup\_{q(0) \in [\mathbb{S}\_1]} q(0) \left[\mathbb{M}^t\right] + \frac{1}{T(t)}\sum\_{\tau=1}^{t}\Delta T(\tau)\cup\_{p\_0(\tau)\in[\mathbb{R}\_1]} p\_0(\tau)\left[\mathbb{M}^{t-\tau}\right] \right\}, \tag{26}$$

and an analogous description is valid for the set $\operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2, \tilde{\mathbb{M}})$. Since [S1], [R1], [S2], [R2] are convex and compact and [M] and $[\tilde{\mathbb{M}}]$ are tight intervals, from Theorem 1 we get that the sets *Rngt*(S1, R1, M) and $\operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2, \tilde{\mathbb{M}})$ are convex and compact; hence it is meaningful to consider their Hausdorff metric

$$d\left(\operatorname{Rng}\_{t}(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M}),\operatorname{Rng}\_{t}(\mathbb{S}\_{2},\mathbb{R}\_{2},\tilde{\mathbb{M}})\right) = \max\left\{\delta\left(\operatorname{Rng}\_{t}(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M}),\operatorname{Rng}\_{t}(\mathbb{S}\_{2},\mathbb{R}\_{2},\tilde{\mathbb{M}})\right), \; \delta\left(\operatorname{Rng}\_{t}(\mathbb{S}\_{2},\mathbb{R}\_{2},\tilde{\mathbb{M}}),\operatorname{Rng}\_{t}(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M})\right)\right\}. \tag{27}$$

Now, we have that

$$\delta\left(\operatorname{Rng}\_t(\mathbb{S}\_1, \mathbb{R}\_1, \mathbb{M}), \operatorname{Rng}\_t(\mathbb{S}\_2, \mathbb{R}\_2, \tilde{\mathbb{M}})\right) = \max\_{\substack{q(0) \in [\mathbb{S}\_1],\, p\_0(\tau) \in [\mathbb{R}\_1], \\ Q(t) \in [\mathbb{M}] \text{ for all } \tau, t}} \;\; \min\_{\substack{\tilde{q}(0) \in [\mathbb{S}\_2],\, \tilde{p}\_0(\tau) \in [\mathbb{R}\_2], \\ \tilde{Q}(t) \in [\tilde{\mathbb{M}}] \text{ for all } \tau, t}} \left\| \mathbb{E}[q(0, t)] - \mathbb{E}[\tilde{q}(0, t)] \right\|. \tag{28}$$

There exist some *q*∗(0) ∈ [S1], *p*∗0(*τ*) for every *τ* ∈ [1, *t*], and *Q*∗(*t*) ∈ [M] for every *t*, which determine an E[*q*∗(0, *t*)]; also, for any $\tilde{q}^{\*}(0) \in [\mathbb{S}\_2]$, any $\tilde{p}^{\*}\_0(\tau)$ for every *τ* ∈ [1, *t*], and any $\tilde{Q}^{\*}(t) \in [\tilde{\mathbb{M}}]$ for every *t*, which determine an $\mathbb{E}[\tilde{q}^{\*}(0, t)]$, we have

$$\delta\left(\mathrm{Rng}\_{t}(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M}),\mathrm{Rng}\_{t}(\mathbb{S}\_{2},\mathbb{R}\_{2},\mathbb{M})\right) \leq \left\|\mathbb{E}[\mathbf{q}^{\*}(0,t)] - \mathbb{E}[\tilde{\mathbf{q}}^{\*}(0,t)]\right\|$$

$$\leq \frac{T(0)}{T(t)} \left\|\mathbf{q}^{\*}(0)\mathbf{Q}^{\*}(0,t) - \tilde{\mathbf{q}}^{\*}(0)\tilde{\mathbf{Q}}^{\*}(0,t)\right\|\tag{29}$$

$$+ \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \left\|p\_{0}^{\*}(\tau)\mathbf{Q}^{\*}(\tau,t) - \tilde{p}\_{0}^{\*}(\tau)\tilde{\mathbf{Q}}^{\*}(\tau,t)\right\|.$$

Using Lemma 8 we get that

$$\frac{T(0)}{T(t)} \left\| q^{\*}(0)\mathbf{Q}^{\*}(0,t) - \tilde{q}^{\*}(0)\widetilde{\mathbf{Q}}^{\*}(0,t) \right\| \leq \frac{T(0)}{T(t)} \mathcal{T}^{t} \left\| q^{\*}(0) - \tilde{q}^{\*}(0) \right\| + \frac{T(0)}{T(t)} \left( \mathcal{T}^{t-1} + \dots + 1 \right) \max\_{t} \left\| \mathbf{Q}^{\*}(t) - \widetilde{\mathbf{Q}}^{\*}(t) \right\|$$

$$\leq \frac{T(0)}{T(t)} \mathcal{T}^{t} \delta(\mathbb{S}\_{1}, \mathbb{S}\_{2}) + \frac{T(0)}{T(t)} \left( \mathcal{T}^{t-1} + \dots + 1 \right) \delta(\mathbb{M}, \tilde{\mathbb{M}}). \tag{30}$$

Similarly, we have

$$\frac{1}{T(t)}\sum\_{\tau=1}^{t}\Delta T(\tau)\left\|p\_0^\*(\tau)\mathbf{Q}^\*(\tau,t)-\tilde{p}\_0^\*(\tau)\widetilde{\mathbf{Q}}^\*(\tau,t)\right\| \leq \frac{1}{T(t)}\sum\_{\tau=1}^{t}\Delta T(\tau)\,\delta(\mathbb{R}\_1,\mathbb{R}\_2)\,\mathcal{T}^{t-\tau} + \frac{1}{T(t)}\sum\_{\tau=1}^{t}\Delta T(\tau)\left(\mathcal{T}^{t-1}+\dots+1\right)\delta(\mathbb{M},\tilde{\mathbb{M}}). \tag{31}$$

From (29),(30) and (31) as *t* → ∞ we get that

$$\delta\left(\operatorname{Rng}\_{\infty}(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M}),\operatorname{Rng}\_{\infty}(\mathbb{S}\_{2},\mathbb{R}\_{2},\tilde{\mathbb{M}})\right) \leq \left(1-\mathcal{T}\right)^{-1}\delta\left(\mathbb{M},\tilde{\mathbb{M}}\right), \tag{32}$$

since

$$\lim\_{t \to \infty} \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \left( \mathcal{T}^{t-1} + \dots + 1 \right) \delta\left( \mathbb{M}, \tilde{\mathbb{M}} \right) = \lim\_{t \to \infty} \frac{1}{T(t)}\, \delta\left( \mathbb{M}, \tilde{\mathbb{M}} \right) \sum\_{\tau=1}^{t} \Delta T(\tau) \sum\_{k=\tau}^{t-1} \mathcal{T}^{t-k}$$

$$\leq \lim\_{t \to \infty} \frac{1}{T(t)} (1 - \mathcal{T})^{-1} \delta\left( \mathbb{M}, \tilde{\mathbb{M}} \right) \sum\_{\tau=1}^{t} \Delta T(\tau) = \left[ 1 - \frac{T(0)}{T} \right] (1 - \mathcal{T})^{-1} \delta\left( \mathbb{M}, \tilde{\mathbb{M}} \right),$$

also

$$\lim\_{t \to \infty} \frac{T(0)}{T(t)} \mathcal{T}^t \delta(\mathbb{S}\_1, \mathbb{S}\_2) + \lim\_{t \to \infty} \frac{T(0)}{T(t)} \left( \mathcal{T}^{t-1} + \dots + 1 \right) \delta(\mathbb{M}, \tilde{\mathbb{M}}) = \frac{T(0)}{T} (1 - \mathcal{T})^{-1} \delta(\mathbb{M}, \tilde{\mathbb{M}}),$$

and finally

$$\lim\_{t \to \infty} \frac{1}{T(t)} \sum\_{\tau=1}^{t} \Delta T(\tau) \delta(\mathbb{R}\_1, \mathbb{R}\_2) \mathcal{T}^{t-\tau+1} = 0,$$

as we saw in the proof of Theorem 8. In the same way that we proved (32), we can prove that

$$\delta\left(\operatorname{Rng}\_{\infty}\left(\mathbb{S}\_{2},\mathbb{R}\_{2},\tilde{\mathbb{M}}\right),\operatorname{Rng}\_{\infty}\left(\mathbb{S}\_{1},\mathbb{R}\_{1},\mathbb{M}\right)\right) \leq \left(1-\mathcal{T}\right)^{-1}\delta\left(\tilde{\mathbb{M}},\mathbb{M}\right). \tag{33}$$

From (32) and (33) we arrive at the conclusion of the Theorem.

#### **7. An Illustrative Representative Example**

In the present section we give an illustrative, representative example for a Geriatric and Stroke Patients system, and through it we present the methodology, in terms of the computational geometry algorithms needed, for an application to any population system. The NHMS model used is a general Coxian phase-type model, special forms of which have been used by the school of research of McClean and her co-authors [38,53–59]. We distinguish three states, which are called hospital pathways. For the system of Geriatric and Stroke Patients these states are labeled "Acute Care", "Rehabilitative" and "Long Stay". From each state there are movements outside the hospital due to discharge or death. Geriatric patients may thus be thought of as progressing through stages of acute care, rehabilitation and long-stay care, where most patients are eventually rehabilitated and discharged. Geriatric medical services are an important asset in the care of the elderly, and their quality is certainly an indication of the level of civilization of a society. At the same time, their funding could easily be reduced due to political pressure for savings in health care expenditure.

It is apparent that the number of pathways could be increased as much as needed to accommodate the important characteristics of any patient system. However, we do not consider a larger number of states here, due to space restrictions. Similarly, the internal movements in a population of patients could be of any number, to accommodate any important characteristics.

Consider a hospital which starts with *T*(0) = 400 patients and in a very short time reaches its full capacity of 435 patients, i.e., *T*(1) = 420, *T*(2) = 430, *T*(3) = 435. Assume three hospital pathways and let the initial relative population structure be any stochastic vector which lies in the set

$$\mathbb{S}\_0 = \{ [0 \ 0 \ 0], \ [1 \ 1 \ 1] \}.$$

The physical meaning of selecting S0 as above is that the initial relative structure could be any stochastic vector, i.e., S0 contains all possible initial structures. For example, *q*(0) = [0.2 0.3 0.5] means that 20% of the patients are in pathway 1, 30% are in pathway 2, and 50% are in pathway 3. Now, there are some initial structures which might not be acceptable to the management of the hospital, such as, say, *q*(0) = [0 0 1.0]. In this case, in the initial design of the hospital, measures should be taken, in cooperation with other nearby hospitals, so that such a situation is avoided. Then S0 could be chosen to be

$$\mathbb{S}\_0 = \{ [0 \ 0 \ 0], \ [0.1 \ 1 \ 0.9] \}.$$

This is a convex set which excludes the initial relative structure *q*(0) = [0 0 1.0]. However, it also needs to be tight. How to make it tight is explained below with the use of Algorithm 1. Naturally, we could exclude more than one relative structure from the chosen initial set, but the procedure would be the same.

Most new patients enter the system in hospital pathway one, either by taking an empty place or as a virtual replacement of a discharged patient. Hence, let R<sup>0</sup> be the convex set from which allocation probabilities are drawn and let it be the interval

$$
\mathbb{R}\_0 = \{ [0.5 \ 0.2 \ 0.1], \ [0.7 \ 0.4 \ 0.1] \}.
$$

The set of stochastic vectors of allocation probabilities that lie in the above interval is a convex set with vertices *r*1 = [0.7 0.2 0.1] and *r*2 = [0.5 0.4 0.1]. How to find the vertices will be explained below. The physical meaning of the above interval is that any stochastic vector that belongs to the interval R0 is a candidate to represent a registration policy for the hospital. Naturally, we can restrict R0 to an interval which contains the desired recruitment policies of the hospital management, and in this way, using the results of the present paper, find the consequences of these policies. The recruitment vectors are the best control variables for human populations ([2]), such as the hospitals in this case. Methods of control by recruitment can be found in [50,60–62]. However, the interval R0 needs to be tight, and how to make it tight is explained below with the use of Algorithm 1.

Now, by observing past data it is not difficult to determine an interval of matrices [M] = $[\check{Q}, \hat{Q}]$ in which all stochastic transition matrices of the movements of memberships lie. Please note that the fact that the matrices $\check{Q}, \hat{Q}$ are not necessarily stochastic makes this task easy. We need [M] = $[\check{Q}, \hat{Q}]$ to be tight. In order to test that the interval [M] is tight we use Lemma 1 and Definition 5.

We choose [M] to be

$$
\check{Q} = \begin{pmatrix} 0.5 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.3 & 0.2 & 0.3 \end{pmatrix} \text{ and } \quad \hat{Q} = \begin{pmatrix} 0.7 & 0.4 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.5 & 0.5 & 0.3 \end{pmatrix}.
$$

In order that [M] be tight, every row should be tight; hence, using Lemma 1, we can see that the interval is tight. Consequently, the set of stochastic matrices that belong to [M] is a convex polytope, and we need to find its vertices. The same applies to S0 and R0, which are tight; we need to find the vertices of the convex polytopes in which all stochastic vectors of these intervals lie.

Applying Lemma 1 we find that the set S0 is tight, and applying Algorithm 1 we get that S0 is convex with vertices *υ*1 = [1 0 0], *υ*2 = [0 1 0] and *υ*3 = [0 0 1]. The physical meaning of the set S0 here is that at the start of our study, at time 0, the hospital could have any relative population structure. Applying Lemma 1 we find that the set R0 is tight, and applying Algorithm 1 we get that R0 is convex with vertices *r*1 = [0.7 0.2 0.1] and *r*2 = [0.5 0.4 0.1].

**Algorithm 1** ([9]) **Finding the vertices of a tight interval** [*p*, *q*].

For each *i* = 1, 2, 3 construct the vectors; for *i* = 1 and the vector *p*:

[*p*1 *p*2 *p*3], [*p*1 *p*2 *q*3], [*p*1 *q*2 *p*3], [*p*1 *q*2 *q*3],

replace *p*1 with *p̃*1 such that the above vectors become stochastic. If any of the four resulting stochastic vectors

[*p̃*1 *p*2 *p*3], [*p̃*1 *p*2 *q*3], [*p̃*1 *q*2 *p*3], [*p̃*1 *q*2 *q*3],

belongs to the interval [*p*, *q*], then it is a vertex. Do the same for the vector *q*. END.
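A Python sketch of Algorithm 1 is given below. This is my reading of the algorithm, written for a general *n*-component interval rather than only *n* = 3: all coordinates but one are pinned at a lower or upper bound, the free coordinate is solved from the stochasticity constraint, and a candidate is kept when it stays inside [*p*, *q*].

```python
from itertools import product

def interval_vertices(p, q, tol=1e-9):
    """Vertices of the set of stochastic vectors in a tight interval [p, q]."""
    n = len(p)
    verts = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Pin every coordinate except i at one of its bounds ...
        for choice in product(*[(p[j], q[j]) for j in others]):
            xi = 1.0 - sum(choice)            # ... and solve coordinate i
            if p[i] - tol <= xi <= q[i] + tol:
                x = [0.0] * n
                for j, v in zip(others, choice):
                    x[j] = v
                x[i] = round(xi, 9)
                if not any(max(abs(a - b) for a, b in zip(x, w)) <= tol
                           for w in verts):   # skip duplicates
                    verts.append(x)
    return verts
```

Applied to R0 = {[0.5 0.2 0.1], [0.7 0.4 0.1]} this returns the two vertices *r*1 = [0.7 0.2 0.1] and *r*2 = [0.5 0.4 0.1] stated in the text, and applied to S0 = {[0 0 0], [1 1 1]} it returns the three unit vectors.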

To find the vertices of the convex set of stochastic matrices that belong to the tight interval [M], we apply Algorithm 1 to each row vector in $[\check{Q}, \hat{Q}]$ and we find that the vertices are

$$V\_1 = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.5 & 0.2 & 0.3 \end{pmatrix}, \quad V\_2 = \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.3 & 0.4 & 0.3 \end{pmatrix},$$

$$V\_3 = \begin{pmatrix} 0.5 & 0.4 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.5 & 0.2 & 0.3 \end{pmatrix}, \quad V\_4 = \begin{pmatrix} 0.5 & 0.4 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.3 & 0.4 & 0.3 \end{pmatrix}.$$

Now we compute all the row vectors *υiVj* for *i* = 1, 2, 3 and *j* = 1, 2, 3, 4. Since the *υi* are the unit vectors, *υiVj* is simply the *i*-th row of *Vj*, so the 12 products are, with repetitions,

$$[0.7 \ 0.2 \ 0.1], \ [0.5 \ 0.4 \ 0.1], \ [0.2 \ 0.6 \ 0.2], \ [0.5 \ 0.2 \ 0.3], \ [0.3 \ 0.4 \ 0.3].$$

These 12 stochastic vectors belong to a convex set, hence using any of the computational geometry methods in [61] we find that the vertices of their convex hull are

> *ω*1 = [0.7 0.2 0.1], *ω*2 = [0.5 0.4 0.1], *ω*3 = [0.2 0.6 0.2], *ω*4 = [0.5 0.2 0.3], *ω*5 = [0.3 0.4 0.3].

Hence

$$\operatorname{Rng}\_1(\mathbb{S}\_0, \mathbb{R}\_0) = \frac{T(0)}{T(1)} \operatorname{conv}\{\omega\_1, \omega\_2, \omega\_3, \omega\_4, \omega\_5\} + \frac{1}{T(1)}[T(1) - T(0)] \operatorname{conv}\{r\_1, r\_2\},$$

from which we get that *Rng*1(S0, R0) is the convex set with vertices given by the above Minkowski sum; thus

$$\begin{gathered} (0.7 \ 0.2 \ 0.1), \ (0.69 \ 0.21 \ 0.1), \ (0.51 \ 0.39 \ 0.1), \ (0.5 \ 0.4 \ 0.1), \ (0.22 \ 0.58 \ 0.2), \\ (0.21 \ 0.59 \ 0.2), \ (0.51 \ 0.2 \ 0.29), \ (0.5 \ 0.21 \ 0.29), \ (0.32 \ 0.39 \ 0.29), \ (0.31 \ 0.4 \ 0.29). \end{gathered}$$

Taking into account the rounding errors from all the multiplications and additions, we compute the vertices to one decimal place of accuracy and get that *Rng*1(S0, R0) is the convex hull of the vertices

[0.7 0.2 0.1], [0.5 0.4 0.1], [0.2 0.6 0.2], [0.5 0.2 0.3], and [0.3 0.4 0.3].

This result verifies Theorem 1. With the vertices of the convex set *Rng*1(S0, R0) now available, we repeat the previous process to find *Rng*2(S0, R0).
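The computation of *Rng*1(S0, R0) above can be reproduced in a few lines. The sketch below is an illustration written for this example, not the paper's code: since every vector here is stochastic, the third coordinate is redundant, so a planar monotone-chain convex hull on the first two coordinates suffices.

```python
# Vertex matrices V1..V4 of [M] and the vertices r1, r2 of R0, from the example.
V = [
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.5, 0.2, 0.3]],
    [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]],
    [[0.5, 0.4, 0.1], [0.2, 0.6, 0.2], [0.5, 0.2, 0.3]],
    [[0.5, 0.4, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]],
]
r = [(0.7, 0.2), (0.5, 0.4)]                   # r1, r2 projected to (x1, x2)
T0, T1 = 400, 420

def hull2d(points):
    # Andrew's monotone chain convex hull in the plane.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    hull = []
    for seq in (pts, pts[::-1]):               # lower chain, then upper chain
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        hull += chain[:-1]
    return hull

# upsilon_i V_j is simply row i of V_j, so collect the (projected) rows.
rows = {(M[i][0], M[i][1]) for M in V for i in range(3)}
omega = hull2d(rows)
assert len(omega) == 5                         # the five vertices omega_1..omega_5

# Minkowski sum: candidate vertices of Rng_1 are the pairwise scaled sums.
sums = {tuple(round((T0 * w[k] + (T1 - T0) * rv[k]) / T1, 2) for k in range(2))
        for w in omega for rv in r}
# To one decimal place they collapse back onto the five omega vertices.
assert {tuple(round(c, 1) for c in s) for s in sums} == set(omega)
```

The ten pairwise sums, before the final rounding, are exactly the ten vertices listed in the text.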

Now, in this way, at every point in time we have the convex compact set *Rngt*(S0, R0) of all possible expected population structures. If any of these are problematic in some way, then the hospital evidently has a lead time to adopt new policies, and an instrument to visualize their consequences.

To verify that a tight interval of transition probability matrices is uniformly scrambling, we need a sufficient condition as a criterion. This is given in the following easily proved lemma.

**Lemma 9.** *Let* [M] = $[\check{Q}, \hat{Q}]$*. Then* [M] *is uniformly scrambling if* $\mathcal{T}(\hat{Q} - \check{Q}) < 1$.
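For the interval $[\check{Q}, \hat{Q}]$ chosen above, this criterion can be checked mechanically. The sketch below is my own illustration and assumes $\mathcal{T}(\cdot)$ is the Dobrushin delta coefficient of ergodicity, applied here to the (non-stochastic) difference matrix; it also confirms that the four vertices V1..V4 found earlier are scrambling.

```python
import numpy as np

def dobrushin(A):
    # Delta coefficient: half the largest l1 distance between two rows of A.
    n = len(A)
    return max(0.5 * np.abs(A[i] - A[j]).sum() for i in range(n) for j in range(n))

Q_lo = np.array([[0.5, 0.2, 0.1], [0.2, 0.6, 0.2], [0.3, 0.2, 0.3]])  # Q-check
Q_hi = np.array([[0.7, 0.4, 0.1], [0.2, 0.6, 0.2], [0.5, 0.5, 0.3]])  # Q-hat

assert dobrushin(Q_hi - Q_lo) < 1              # Lemma 9's criterion holds

# Hence every stochastic matrix in [M] is scrambling; in particular the
# four vertices have coefficient of ergodicity strictly below one.
vertices = [
    np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.5, 0.2, 0.3]]),
    np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]]),
    np.array([[0.5, 0.4, 0.1], [0.2, 0.6, 0.2], [0.5, 0.2, 0.3]]),
    np.array([[0.5, 0.4, 0.1], [0.2, 0.6, 0.2], [0.3, 0.4, 0.3]]),
]
assert all(dobrushin(Vk) < 1 for Vk in vertices)
```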

Let [M] be the interval we have been working with so far; then it is easy to check that $\mathcal{T}(\hat{Q} - \check{Q}) < 1$, and hence any stochastic matrix selected from [M] will be scrambling. We select as *Q*(1), *Q*(2), *Q*(3), *Q*(4) the four vertices of the convex set of stochastic matrices in [M], and as *Q*(5), *Q*(6) any convex combinations of them. We also select *q*(0) = [0.5 0.25 0.25], and as vectors of allocation probabilities we select *p*0(1) = [0.6 0.3 0.1] and *p*0(2) = [0.5 0.4 0.1], which both belong to R0. Then we compute

> E[*q*(0, 1)] = [0.5 0.3 0.2], E[*q*(0, 2)] = [0.48 0.36 0.16], E[*q*(0, 3)] = [0.4 0.43 0.17], E[*q*(0, 4)] = [0.34 0.48 0.18], E[*q*(0, 5)] = [0.35 0.46 0.19], E[*q*(∞)] = [0.4 0.4 0.2].

Hence the expected relative population structure converges in about six steps, that is, geometrically fast, which verifies Theorem 6. Also, it is easy to see that

$$\mathbb{E}[q(\infty)]V\_1 = \begin{pmatrix} 0.4 & 0.4 & 0.2 \end{pmatrix} \begin{pmatrix} 0.7 & 0.2 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.5 & 0.2 & 0.3 \end{pmatrix} = \begin{pmatrix} 0.4 & 0.4 & 0.2 \end{pmatrix},$$

and, up to the rounding used throughout this example, the same happens with all the vertices of [M], as was proved in Theorem 10.
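The iteration behind the values of E[*q*(0, *t*)] above can be sketched as follows. The text does not fully specify which matrices *Q*(5), *Q*(6), ... were used, so this illustration fixes *Q*(*t*) = *V*1 throughout (one admissible selection from [M]) and uses the standard NHMS recursion for the expected relative structure. It demonstrates the geometric convergence asserted by Theorem 6; note that with this constant selection the limit is the stationary vector of *V*1 rather than the particular limit reported above.

```python
import numpy as np

V1 = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.5, 0.2, 0.3]])
T = [400, 420, 430, 435] + [435] * 50      # total memberships, constant from t = 3
p0 = np.array([0.6, 0.3, 0.1])             # allocation vector (held fixed here)
q = np.array([0.5, 0.25, 0.25])            # initial relative structure

traj = [q]
for t in range(1, len(T)):
    dT = T[t] - T[t - 1]
    # E[q(0, t)] = (T(t-1)/T(t)) E[q(0, t-1)] Q(t) + (dT(t)/T(t)) p0(t)
    q = (T[t - 1] / T[t]) * (q @ V1) + (dT / T[t]) * p0
    traj.append(q)

diffs = [np.abs(a - b).sum() for a, b in zip(traj[1:], traj[:-1])]
assert diffs[20] < diffs[10] < diffs[5]    # increments decay geometrically
assert np.abs(traj[-1] @ V1 - traj[-1]).sum() < 1e-8   # the limit is invariant
```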

The hospital thus knows in advance, with a good lead time, where the policies and tendencies of the hospital system will converge in terms of expected relative population structure. Hence it is able to decide whether this is a desirable situation; to find out whether it can cope with the available resources in doctors, nurses and medical material; to judge whether its medical facilities are adequate; and it can also obtain an estimate of the cost of the system, see [62,63].

Consider now that the previous NHMS is system *a*, and let *b* be a second NHMS with initial population structure *q*(0) = [0.6 0.2 0.2], allocation probabilities *p*0(1) = [0.5 0.4 0.1] and *p*0(2) = [0.7 0.2 0.1], and the remaining parameters the same. Then the asymptotic relative population structure is again E[*q*(∞)] = [0.4 0.4 0.2], which verifies Theorem 8. The physical meaning of this result is that, when the hospital has been at full capacity for some time, different initial structures and allocation probabilities leave the expected relative population structure unchanged, under the condition that the maximum ergodicity coefficient of the transition probability matrices is less than one.

To be able to use the last theorem, Theorem 11, for the benefit of the hospital, we need a way to find a numerical value for $d(\mathbb{M}, \tilde{\mathbb{M}})$. The following lemma, which is not difficult to prove, provides a solution to the problem.

**Lemma 10.** *Let* [M], $[\tilde{\mathbb{M}}]$ *be two tight intervals of transition probability matrices for the memberships. Let V*1, *V*2, ..., *Vm be the vertices of the convex set* [M]*, and U*1, *U*2, ..., *Un be the vertices of the convex set* $[\tilde{\mathbb{M}}]$*. Then*

> $\delta(\mathbb{M}, \tilde{\mathbb{M}}) = \left\| \left[ a\_1^{\*} V\_1 + a\_2^{\*} V\_2 + \dots + a\_m^{\*} V\_m \right] - \left[ b\_1^{\*} U\_1 + b\_2^{\*} U\_2 + \dots + b\_n^{\*} U\_n \right] \right\|,$

*where* $a\_1^{\*}, a\_2^{\*}, \dots, a\_m^{\*}$ *is the solution of the optimization problem*

$$\max \left[ a\_1 V\_1 + a\_2 V\_2 + \dots + a\_m V\_m \right] \text{ with}$$

$$a\_1 + a\_2 + \dots + a\_m = 1,\\ a\_1 \ge 0, a\_2 \ge 0, \dots, a\_m \ge 0.$$

*and* $b\_1^{\*}, b\_2^{\*}, \dots, b\_n^{\*}$ *is the solution of the optimization problem*

$$\min \left[ b\_1 \mathbf{U}\_1 + b\_2 \mathbf{U}\_2 + \dots + b\_n \mathbf{U}\_n \right] \text{ with}$$

$$b\_1 + b\_2 + \dots + b\_n = 1,\\ b\_1 \ge 0, b\_2 \ge 0, \dots, b\_n \ge 0.$$
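In practice, an optimization of this kind can be carried out with standard linear programming. The sketch below is an illustration, not the paper's procedure (and it assumes scipy is available): it computes the distance, in the maximum row-sum norm, from a matrix *V* to the convex hull of vertices *U*1, ..., *Un*. Since this distance is a convex function of *V*, its maximum over [M] is attained at one of the vertices *V*1, ..., *Vm*, so $\delta(\mathbb{M}, \tilde{\mathbb{M}})$ is the maximum of this quantity over the *Vi*.

```python
import numpy as np
from scipy.optimize import linprog

def dist_to_hull(V, vertices):
    """min over convex combinations B of the vertices of ||V - B||, where
    ||.|| is the maximum row-sum norm, formulated as a linear program."""
    U = np.array(vertices, dtype=float)        # shape (m, r, c)
    V = np.array(V, dtype=float)
    m, r, c = U.shape
    n = m + r * c + 1                          # weights b, slacks E, bound z
    cost = np.zeros(n); cost[-1] = 1.0         # minimize z
    A_ub, b_ub = [], []
    for i in range(r):
        for j in range(c):
            e = m + i * c + j
            row = np.zeros(n); row[:m] = -U[:, i, j]; row[e] = -1.0
            A_ub.append(row); b_ub.append(-V[i, j])    # V - B <= E
            row = np.zeros(n); row[:m] = U[:, i, j]; row[e] = -1.0
            A_ub.append(row); b_ub.append(V[i, j])     # B - V <= E
    for i in range(r):                                 # each row sum of E <= z
        row = np.zeros(n); row[m + i*c : m + (i+1)*c] = 1.0; row[-1] = -1.0
        A_ub.append(row); b_ub.append(0.0)
    A_eq = [np.concatenate([np.ones(m), np.zeros(r * c + 1)])]  # sum b = 1
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.array(A_eq), b_eq=[1.0], bounds=[(0, None)] * n)
    return res.fun
```

With the vertices *V*1..*V*4 of the running example, `dist_to_hull(V1, [V1, V2, V3, V4])` is zero, as it must be, and for a second interval with vertices *U*1, ..., *Un* one takes `max(dist_to_hull(Vi, Us) for Vi in Vs)`.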

In what follows we summarize in Algorithm 2, for the convenience of the interested reader, the previous steps, which are necessary for applying the present results to any population system.

#### **Algorithm 2**

Use Lemma 1 to check that [S0] is tight. Apply **Algorithm 1** to find the vertices of the convex set [S0] :

*υ*1, *υ*2,..., *υs*.

Use Lemma 1 to check that [R0] is tight. Apply **Algorithm 1** to find the vertices of the convex set [R0] :

*r*1,*r*2,...,*rπ*.

Use Lemma 1 to check that each row in $[\check{Q}, \hat{Q}]$ is tight. If so, then [M] is tight. Apply **Algorithm 1** to each row vector in $[\check{Q}, \hat{Q}]$ to find the vertices of [M]:

$$V\_1, V\_2, \dots, V\_m.$$

Compute all the row vectors

$$\mathbf{v}\_{i}\mathbf{V}\_{j}\text{ for all }i=1,2,\dots,s; j=1,2,\dots,m.$$

Use any of the computational geometry methods in [64] to find the vertices of the convex hull of the vectors *υiVj* for all *i* = 1, 2, ..., *s*; *j* = 1, 2, ..., *m*. Let

$$\omega\_1, \omega\_2, \dots, \omega\_k$$

be the vertices found. Compute, using properties of the Minkowski sum of vectors,

$$\operatorname{Rng}\_1(\mathbb{S}\_0, \mathbb{R}\_0) = \frac{T(0)}{T(1)} \operatorname{conv}\{\omega\_1, \omega\_2, \dots, \omega\_k\} + \frac{1}{T(1)}[T(1) - T(0)] \operatorname{conv}\{r\_1, r\_2, \dots, r\_{\pi}\}.$$

Repeat the process until it converges to *Rng*∞(S0, R0), which happens geometrically fast, i.e., in 6 to 8 steps. Use Lemma 9 to verify that [M] = $[\check{Q}, \hat{Q}]$ is uniformly scrambling, i.e., $\mathcal{T}(\hat{Q} - \check{Q}) < 1$. Use Lemma 10 to find bounds

> $\delta(\mathbb{M}, \tilde{\mathbb{M}}) < \mu$ *and* $\delta(\tilde{\mathbb{M}}, \mathbb{M}) < \tilde{\mu}$.

Set

$$d\left(\mathbb{M}, \tilde{\mathbb{M}}\right) < \max\left(\mu, \tilde{\mu}\right).$$

#### **8. Conclusions**

The concept of the non-homogeneous Markov set system (NHMSS) was introduced, which is an NHMS with its parameters lying in intervals. It was established under which conditions the set of all possible expected relative population structures of an NHMSS at a given point in time is a convex set and a convex polytope. It was then found that if, in an NHMSS, the sets of initial structures are different but compact and convex, the sets of allocation probabilities of the memberships are different but convex and compact, and the inherent non-homogeneous Markov set chain is common, then the Hausdorff metric of the two different sets of all possible expected relative structures goes to zero geometrically fast, i.e., asymptotically the two sets coincide geometrically fast. We then established that in an NHMSS, if the total population of memberships converges to a finite limit geometrically fast, and the sets of initial structures and allocation probabilities of memberships are compact and convex, then the set of all possible expected relative population structures converges to a limit set geometrically fast. It was proved that these results generalize certain well-known results for NHMS's. It was further proved that, under some mild conditions, the limit set of the expected relative population structures of an NHMSS remains invariant when it is multiplied from the right by any transition probability matrix of the inherent non-homogeneous Markov chain selected from the respective interval. It was also proved that the limit set is the only set with this property if the interval of selection of transition probabilities of the inherent non-homogeneous Markov chain is product scrambling.
Finally, two NHMSS were considered which are different in the sense that they have different sets for selecting initial distributions, different sets for selecting allocation probabilities, and different intervals for selecting the transition probabilities of the inherent non-homogeneous Markov chains, while they have in common that their respective intervals are uniformly scrambling with a common bound and that they have the same total population of memberships. It was proved that the Hausdorff metric of the limit sets of the expected relative population structures of the two NHMSS is bounded by the product of a function of the common bound of their uniform coefficients of ergodicity and the Hausdorff metric of the two tight intervals from which the stochastic matrices of the inherent non-homogeneous Markov set chains are selected.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**

