**Some Dissimilarity Measures of Branching Processes and Optimal Decision Making in the Presence of Potential Pandemics**

#### **Niels B. Kammerer <sup>1</sup> and Wolfgang Stummer <sup>2,\*</sup>**


Received: 26 June 2020; Accepted: 28 July 2020; Published: 8 August 2020

**Abstract:** We compute exact values respectively bounds of dissimilarity/distinguishability measures–in the sense of the Kullback-Leibler information distance (relative entropy) and some transforms of more general power divergences and Rényi divergences–between two competing discrete-time *Galton-Watson branching processes with immigration* GWI for which both the offspring and the immigration (importation) are arbitrarily Poisson-distributed; in particular, we allow for arbitrary types of extinction-concerning criticality and thus for non-stationarity. We apply this to optimal decision making in the context of the spread of potentially pandemic infectious diseases (such as e.g., the current COVID-19 pandemic), covering e.g., different levels of dangerousness and different kinds of intervention/mitigation strategies. Asymptotic distinguishability behaviour and diffusion limits are investigated, too.

**Keywords:** Galton-Watson branching processes with immigration; Hellinger integrals; power divergences; Kullback-Leibler information distance/divergence; relative entropy; Rényi divergences; epidemiology; COVID-19 pandemic; Bayesian decision making; INARCH(1) model; GLM model; Bhattacharyya coefficient/distance

#### **Contents**




#### **1. Introduction**

(This paper is a thoroughly revised, extended and retitled version of the preprint arXiv:1005.3758v1 of both authors) Over the past twenty years, *density-based divergences D*(*P*, *Q*) –also known as (dis)similarity measures, directed distances, disparities, distinguishability measures, proximity measures–between probability distributions *P* and *Q*, have turned out to be of substantial importance for decisive statistical tasks such as parameter estimation, testing for goodness-of-fit, Bayesian decision procedures, change-point detection, clustering, as well as for other research fields such as information theory, artificial intelligence, machine learning, signal processing (including image and speech processing), pattern recognition, econometrics, and statistical physics. For some comprehensive overviews on the divergence approach to statistics and probability, the reader is referred to the insightful books of e.g., Liese & Vajda [1], Read & Cressie [2], Vajda [3], Csiszár & Shields [4], Stummer [5], Pardo [6], Liese & Miescke [7], Basu et al. [8], Voinov et al. [9], the survey articles of e.g., Liese & Vajda [10], Vajda & van der Meulen [11], the structure-building papers of Stummer & Vajda [12], Kißlinger & Stummer [13] and Broniatowski & Stummer [14], and the references therein. Divergence-based bounds of minimal mean decision risks (e.g., Bayes risks in finance) can be found e.g., in Stummer & Vajda [15] and Stummer & Lao [16].

Amongst the above-mentioned dissimilarity measures, an important omnipresent subclass are the so-called *f*-divergences of Csiszár [17], Ali & Silvey [18] and Morimoto [19]; important special cases thereof are the total variation distance and the very frequently used *λ*-*order power divergences I<sub>λ</sub>*(*P*, *Q*) (also known as alpha-entropies, Cressie-Read measures, Tsallis cross-entropies) with *λ* ∈ R. The latter cover e.g., the very prominent Kullback-Leibler information divergence *I*<sub>1</sub>(*P*, *Q*) (also called relative entropy), the (squared) Hellinger distance *I*<sub>1/2</sub>(*P*, *Q*), as well as the Pearson chi-square divergence *I*<sub>2</sub>(*P*, *Q*). It is well known that the power divergences can be built with the help of the *λ*-*order Hellinger integrals H<sub>λ</sub>*(*P*, *Q*) (where e.g., the case *λ* = 1/2 corresponds to the well-known Bhattacharyya coefficient), which are information measures of interest in their own right and which are also the crucial ingredients of *λ*-*order Rényi divergences R<sub>λ</sub>*(*P*, *Q*) (see e.g., Liese & Vajda [1], van Erven & Harremoës [20]); the case *R*<sub>1/2</sub>(*P*, *Q*) corresponds to the well-known Bhattacharyya distance.
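For two Poisson laws, these building blocks are available in closed form: the *λ*-order Hellinger integral between Poisson(*a*) and Poisson(*b*) equals exp(*a*<sup>λ</sup>*b*<sup>1−λ</sup> − *λa* − (1 − *λ*)*b*). The following minimal Python sketch (function names are ours, not from the paper) checks this closed form against direct summation and evaluates the Bhattacharyya coefficient (*λ* = 1/2):

```python
import math

def hellinger_integral_numeric(a, b, lam, n_terms=500):
    """H_lambda(Poisson(a), Poisson(b)) by truncated direct summation
    of sum_x p_a(x)^lam * p_b(x)^(1-lam), with a, b > 0."""
    total = 0.0
    for x in range(n_terms):
        log_pa = -a + x * math.log(a) - math.lgamma(x + 1)  # log Poisson(a) pmf at x
        log_pb = -b + x * math.log(b) - math.lgamma(x + 1)  # log Poisson(b) pmf at x
        total += math.exp(lam * log_pa + (1.0 - lam) * log_pb)
    return total

def hellinger_integral_closed(a, b, lam):
    """Closed form exp(a^lam * b^(1-lam) - lam*a - (1-lam)*b)."""
    return math.exp(a ** lam * b ** (1.0 - lam) - lam * a - (1.0 - lam) * b)

# Bhattacharyya coefficient (lambda = 1/2) and Bhattacharyya distance
a, b = 2.0, 3.5
bc = hellinger_integral_closed(a, b, 0.5)
bhattacharyya_distance = -math.log(bc)
```

The Bhattacharyya distance is then simply the negative logarithm of the coefficient; analogous transforms of *H<sub>λ</sub>* yield the power and Rényi divergences.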

The above-mentioned information/dissimilarity measures have been also investigated in non-static, time-dynamic frameworks such as for various different contexts of *stochastic processes* like *processes with independent increments* (see e.g., Newman [21], Liese [22], Memin & Shiryaev [23], Jacod & Shiryaev [24], Liese & Vajda [1], Linkov & Shevlyakov [25]), *Poisson point processes* (see e.g., Liese [26], Jacod & Shiryaev [24], Liese & Vajda [1]), *diffusion processes and solutions of stochastic differential equations with continuous paths* (see e.g., Kabanov et al. [27], Liese [28], Jacod & Shiryaev [24], Liese & Vajda [1], Vajda [29], Stummer [30–32], Stummer & Vajda [15]), and *generalized binomial processes* (see e.g., Stummer & Lao [16]); further related literature can be found e.g., in references of the aforementioned papers and books.

Another important class of time-dynamic models is given by *discrete-time integer-valued branching processes*, in particular *(Bienaymé-)Galton-Watson processes without immigration* GW respectively *with immigration (resp. importation, invasion)* GWI, which have numerous applications in biotechnology, population genetics, internet traffic research, clinical trials, asset price modelling, derivative pricing, and many others. As far as important terminology is concerned, we subsume both models under the abbreviation GW(I), and write simply GWI whenever GW appears as a parameter special case of GWI; recall that a GW(I) is called *subcritical* respectively *critical* respectively *supercritical* if its offspring mean is less than 1 respectively equal to 1 respectively larger than 1.

For applications of GW(I) in *epidemiology*, see e.g., the works of Bartoszynski [33], Ludwig [34], Becker [35,36], Metz [37], Heyde [38], von Bahr & Martin-Löf [39], Ball [40], Jacob [41], Barbour & Reinert [42], Section 1.2 of Britton & Pardoux [43]; for more details see Section 2.3 below.

For connections of GW(I) to *time series of counts* including GLM models, see e.g., Dion, Gauthier & Latour [44], Grunwald et al. [45], Kedem & Fokianos [46], Held, Höhle & Hofmann [47], and Weiß [48]; a more comprehensive discussion can be found in Section 2.2 below.

As far as the combined study of information measures and GW processes is concerned, let us first mention that (transforms of) power divergences have been used for supercritical Galton-Watson processes without immigration for instance as follows: Feigin & Passy [49] study the problem of finding an offspring distribution which is closest (in terms of a relative-entropy-type distance) to the original offspring distribution and under which ultimate extinction is certain. Furthermore, Mordecki [50] gives an equivalent characterization for the stable convergence of the corresponding log-likelihood process to a mixed Gaussian limit, in terms of conditions on Hellinger integrals of the involved offspring laws. Moreover, Sriram & Vidyashankar [51] study the properties of offspring-distribution parameters which minimize the squared Hellinger distance between the model offspring distribution and the corresponding non-parametric maximum likelihood estimator of Guttorp [52]. For the setup of GWI with Poisson offspring and nonstochastic immigration of constant value 1, Linkov & Lunyova [53] investigate the asymptotics of Hellinger integrals in order to deduce large deviation assertions in hypotheses testing problems.

In contrast to the above-mentioned contexts, this paper pursues the following main goals:


Because of the involved Poisson distributions, these goals can be tackled with a high degree of tractability, which is worked out in detail with the following structure (see also the full table of contents after this paragraph): in Section 2, we first introduce (i) the basic ingredients of Galton-Watson processes together with their interpretations in the above-mentioned pandemic setup where it is essential to study *all* types of criticality (being connected with levels of reproduction numbers), (ii) the employed fundamental information measures such as Hellinger integrals, power divergences and Rényi divergences, (iii) the underlying decision-making framework, as well as (iv) connections to time series of counts and asymptotical distinguishability. Thereafter, we start our detailed technical analyses by giving *recursive* exact values respectively *recursive* bounds–as well as their applications–of Hellinger integrals *H<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) (see Section 3), power divergences *I<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) and Rényi divergences *R<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) (see Sections 4 and 5). *Explicit closed-form* bounds of Hellinger integrals *H<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) will be worked out in Section 6, whereas Section 7 deals with Hellinger integrals and power divergences of the above-mentioned Galton-Watson type diffusion approximations.

#### **2. The Framework and Application Setups**

#### *2.1. Process Setup*

We investigate dissimilarity measures and apply them to decisions, in the following context. Let the integer-valued random variable *X<sub>n</sub>* (*n* ∈ N<sub>0</sub>) denote the size of the *n*th generation of a population (of persons, organisms, spreading news, other kind of objects, etc.) with specified characteristics, and suppose that for the modelling of the time-evolution *n* ↦ *X<sub>n</sub>* we have the choice between the following two (e.g., alternative, competing) models (H) and (A):

(H) a discrete-time homogeneous *Galton-Watson process with immigration GWI*, given by the recursive description

$$X\_0 \in \mathbb{N}; \qquad \mathbb{N}\_0 \ni X\_n = \sum\_{k=1}^{X\_{n-1}} Y\_{n-1,k} + \tilde{Y}\_{n}, \qquad n \in \mathbb{N},\tag{1}$$

where *Y<sub>n−1,k</sub>* is the number of offspring of the *k*th object (e.g., organism, person) within the (*n* − 1)th generation, and *Ỹ<sub>n</sub>* denotes the number of immigrating objects in the *n*th generation. Notice that we employ an arbitrary *deterministic* (i.e., degenerate random) initial generation size *X*<sub>0</sub>. We always assume that under the corresponding dynamics-governing law *P*<sub>H</sub>


(A) a discrete-time homogeneous *Galton-Watson process with immigration GWI* given by the same recursive description (1), but with different dynamics-governing law *P*<sub>A</sub> under which (GWI1) holds with parameter *β*<sub>A</sub> > 0 (instead of *β*<sub>H</sub> > 0), (GWI2) holds with *α*<sub>A</sub> ≥ 0 (instead of *α*<sub>H</sub> ≥ 0), and (GWI3) holds. As a side remark, in some contexts the two models (H) and (A) may function as a "sandwich" of a more complicated, not fully known model.

Basic and advanced facts on general GWI (introduced by Heathcote [54]) can be found e.g., in the monographs of Athreya & Ney [55], Jagers [56], Asmussen & Hering [57], Haccou [58]; see also e.g., Heyde & Seneta [59], Basawa & Rao [60], Basawa & Scott [61], Sankaranarayanan [62], Wei & Winnicki [63], Winnicki [64], Guttorp [52] as well as Yanev [65] (and also the references in all of those) for adjacent fundamental statistical issues including the involved technical and conceptual challenges.

For the sake of brevity, wherever we introduce or discuss corresponding quantities *simultaneously* for both models H and A, we will use the subscript • as a synonym for either the symbol H or A. For illustration, recall the well-known fact that the corresponding conditional probabilities *P*•(*X<sub>n</sub>* = · | *X<sub>n−1</sub>* = *k*) are again Poisson-distributed, with parameter *β*• · *k* + *α*•.
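This conditional-Poisson property suggests a direct way to simulate paths of either model: given *X<sub>n−1</sub>* = *k*, draw *X<sub>n</sub>* from a Poisson distribution with parameter *β*• · *k* + *α*•. A minimal sketch (our own illustration, not from the paper; the Knuth-style sampler is one standard choice for moderate intensities):

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw from Poisson(lam) via Knuth's multiplication method
    (adequate for the moderate intensities used here)."""
    if lam <= 0.0:
        return 0
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_gwi(x0, beta, alpha, n_steps, seed=0):
    """Path of a Poissonian GWI: X_n | X_{n-1} = k  ~  Poisson(beta*k + alpha)."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(poisson_sample(beta * path[-1] + alpha, rng))
    return path

# e.g., a subcritical model (beta < 1) with immigration
path = simulate_gwi(x0=10, beta=0.8, alpha=1.0, n_steps=50)
```

The same routine covers model H or A by plugging in the respective parameter pair (*β*•, *α*•).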

In order to achieve a transparently representable structure of our results, we subsume the involved parameters as follows:


Notice that for (unbridgeable) technical reasons, we *do not allow for* "crossovers" between "immigration and no-immigration" (i.e., *α*<sub>A</sub> = 0 and *α*<sub>H</sub> ≠ 0, respectively *α*<sub>A</sub> ≠ 0 and *α*<sub>H</sub> = 0). For practice, this is not a strong restriction, since one may take e.g., *α*<sub>A</sub> = 10<sup>−12</sup> and *α*<sub>H</sub> = 1.

For the non-immigration case *α*• = 0 one has the following *extinction properties* (see e.g., Harris [66], Athreya & Ney [55]). As usual, let us define the extinction time *τ* := min{*i* ∈ N : *X<sub>ℓ</sub>* = 0 for all integers ℓ ≥ *i*} if this minimum exists, and *τ* := ∞ else. Correspondingly, let B := {*τ* < ∞} be the extinction set. If the *offspring mean β*• satisfies *β*• < 1–which is called the *subcritical* case–or *β*• = 1–which is known as the *critical* case–then extinction is certain, i.e., there holds *P*(B | *X*<sub>0</sub> = 1) = 1. However, if the offspring mean satisfies *β*• > 1–which is called the *supercritical* case–then there is a positive probability that the population never dies out, i.e., *P*(B | *X*<sub>0</sub> = 1) ∈ ]0, 1[. In the latter case, *X<sub>n</sub>* explodes (a.s. on the non-extinction set) to infinity as *n* → ∞.
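For Poisson(*β*•) offspring, these extinction properties can be made concrete: the extinction probability *P*(B | *X*<sub>0</sub> = 1) is the smallest fixed point of the offspring generating function *f*(*s*) = exp(*β*•(*s* − 1)), and iterating *f* from 0 converges to it. A small numerical sketch (our own illustration; the function name is ours):

```python
import math

def extinction_probability(beta, tol=1e-12, max_iter=100000):
    """Smallest root of q = exp(beta*(q - 1)), i.e., the extinction
    probability of a Galton-Watson process with Poisson(beta) offspring
    started from X_0 = 1 (no immigration)."""
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(beta * (q - 1.0))  # iterate the offspring pgf
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

q_sub = extinction_probability(0.8)    # subcritical: certain extinction
q_super = extinction_probability(2.0)  # supercritical: strictly below 1
```

For *β*• ≤ 1 the iteration approaches 1 (certain extinction), while for *β*• > 1 it settles at a value strictly below 1.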

In contrast, for the (nondegenerate, nonvanishing) immigration case *α*• ≠ 0 there is *no extinction*, viz. *P*(B | *X*<sub>0</sub> = 1) = 0, although there may be a zero population *X*<sub>ℓ0</sub> = 0 for some intermediate time ℓ<sub>0</sub> ∈ N; but due to the immigration, with probability one there is always a later time ℓ<sub>1</sub> > ℓ<sub>0</sub> such that *X*<sub>ℓ1</sub> > 0. Nevertheless, also for the setup *α*• ≠ 0 it is important to know whether *β*• ⋚ 1–which is still called (super-, sub-)criticality–since e.g., in the case *β*• < 1 the population size *X<sub>n</sub>* converges (as *n* → ∞) to a stationary distribution on N whereas for *β*• > 1 the behaviour is non-stationary (non-ergodic), see e.g., Athreya & Ney [55].

At this point, let us emphasize that in our investigations (both for *α*• = 0 and for *α*• ≠ 0) we *do allow for* "crossovers" between "different criticalities", i.e., we deal with all cases *β*<sub>A</sub> ⋚ 1 versus all cases *β*<sub>H</sub> ⋚ 1; as will be explained in the following, this unifying flexibility is especially important for corresponding epidemiological-model comparisons (e.g., for the sake of decision making).

One of our main goals is to quantitatively compare (the time-evolution of) two competing GWI models H and A with respective parameter sets (*β*<sub>H</sub>, *α*<sub>H</sub>) and (*β*<sub>A</sub>, *α*<sub>A</sub>), in terms of the information measures *H<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) (Hellinger integrals), *I<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) (power divergences), *R<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) (Rényi divergences). The latter two express a distance (degree of dissimilarity) between H and A. From this, we shall particularly derive applications for decision making under uncertainty (including tests).
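As a first, one-step flavour of such comparisons (the paper's recursive quantities are built from exactly these conditional ingredients): conditionally on *X<sub>n−1</sub>* = *k*, both models are Poisson, and the Kullback-Leibler divergence between Poisson(*λ*<sub>A</sub>) and Poisson(*λ*<sub>H</sub>) has the well-known closed form *λ*<sub>A</sub> log(*λ*<sub>A</sub>/*λ*<sub>H</sub>) − *λ*<sub>A</sub> + *λ*<sub>H</sub>. A hedged sketch (function names are ours):

```python
import math

def kl_poisson(lam_a, lam_h):
    """KL(Poisson(lam_a) || Poisson(lam_h)) = lam_a*log(lam_a/lam_h) - lam_a + lam_h."""
    return lam_a * math.log(lam_a / lam_h) - lam_a + lam_h

def one_step_kl(k, beta_a, alpha_a, beta_h, alpha_h):
    """KL divergence between the two competing one-step conditional laws
    P_A(X_n = . | X_{n-1} = k) and P_H(X_n = . | X_{n-1} = k)."""
    return kl_poisson(beta_a * k + alpha_a, beta_h * k + alpha_h)

# e.g., supercritical alternative vs. subcritical hypothesis, given k = 20 incidences
d = one_step_kl(20, beta_a=1.2, alpha_a=0.5, beta_h=0.8, alpha_h=0.5)
```

Note that this one-step quantity grows with the current state *k*, which already hints at why crossovers of criticality matter for distinguishability.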

#### *2.2. Connections to Time Series of Counts*

It is well known that a Galton-Watson process with Poisson offspring (with parameter *β*•) and Poisson immigration (with parameter *α*•) is "distributionally" equal to each of the following models (listed in "tree-type" chronological order):

(M1) a Poissonian *Generalized Integer-valued Autoregressive process* GINAR(1) in the sense of Gauthier & Latour [67] (see also Dion, Gauthier & Latour [44], Latour [68], as well as Grunwald et al. [45]), that is, a first-order autoregressive time series with Poissonian thinning (with parameter *β*•) and Poissonian innovations (with parameter *α*•);

	- (M2i) under the name BIN(1) by Rydberg & Shephard [69] for the description of the number *X<sup>n</sup>* of stock transactions/trades recorded up to time *n*;
	- (M2ii) under the name *Poisson autoregressive model* PAR(1) by Brandt & Williams [70] for the description of event counts in political and other social science applications;
	- (M2iii) under the name *Autoregressive Conditional Poisson model* ACP(1,0) by Heinen [71];
	- (M2iv) by Held, Höhle & Hofmann [47] as well as Held et al. [72], as a description of the time-evolution of counts from infectious disease surveillance databases, where *β*• (respectively, *α*•) is interpreted as driving parameter of epidemic (respectively, endemic) component; in principle, this type of modelling can be also implicitly recovered as a special case of the epidemics-treating work of Finkenstädt, Bjornstad & Grenfell [73], by assuming trend- and season-neglecting (e.g., intra-year) measles data in urban areas of about 10 million people (provided that their population size approximation extends linearly);
	- (M2v) under the name *integer-valued Generalized Autoregressive Conditional Heteroscedastic model* INGARCH(1,0) by Ferland, Latour & Oraichi [74] (since the conditional variance is *Var*<sub>*P*•</sub>[*X<sub>n</sub>*|F<sub>*n*−1</sub>] = *α*• + *β*• · *X<sub>n−1</sub>*), see also Weiß [75]; this model has been more succinctly named the INARCH(1) model by Weiß [76,77], and frequently applied thereafter; for an "overlapping-generation type" interpretation of the INARCH(1) model, which is an adequate description for the time-evolution of overdispersed counts with an autoregressive serial dependence structure, see Weiß & Testik [78]; for a corresponding comprehensive recent survey (also of more general count time series), the reader is referred to the book of Weiß [48];

Moreover, according to the general considerations of Grunwald et al. [45], the Poissonian Galton-Watson model with immigration may possibly be "distributionally equal" to an integer-valued autoregressive model with random coefficient (thinning).

Nowadays, besides the name *homogeneous Galton-Watson model with immigration GWI*, the name *INARCH(1)* seems to be the most used one, and we follow this terminology (with emphasis on GWI). Typical features of the above-mentioned models (M1) to (M2v), are the use of Z as the set of times, and the assumptions *α*• > 0 as well as *β*• ∈]0, 1[, which guarantee stationarity and ergodicity (see above). In contrast, we employ N<sup>0</sup> as the set of times, degenerate (and thus, non-equilibrium) starting distribution, and arbitrary *α*• ≥ 0 as well as *β*• > 0. For such a situation, as explained above, we quantitatively compare two competing GWI models H and A with respective parameter sets (*β*H, *α*H) and (*β*A, *α*A). Since–as can be seen e.g., in (29) below—we basically employ only (conditionally) distributional ingredients, such as the corresponding likelihood ratio (see e.g., (13) to (15), (27) to (29) below), *all the results of the Sections 3–6 can be immediately carried over to the above-mentioned time-series contexts* (where we even allow for non-stationarities, in fact we start with a one-point/Dirac distribution); for the sake of brevity, in the rest of the paper this will not be mentioned explicitly anymore.
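In the stationary regime used in those time-series contexts (*α*• > 0, *β*• ∈ ]0, 1[), the stationary mean m = *α*•/(1 − *β*•) and stationary variance m/(1 − *β*•²) follow from the fixed points of the exact moment maps m ↦ *α*• + *β*•m and v ↦ *α*• + *β*•m + *β*•²v (law of total variance under the conditional Poisson law). A quick numerical cross-check (our own sketch, not code from the paper):

```python
def stationary_moments(beta, alpha):
    """Stationary mean and variance of a Poissonian GWI / INARCH(1)
    with beta in ]0,1[ and alpha > 0: fixed points of
    m = alpha + beta*m  and  v = alpha + beta*m + beta^2 * v."""
    mean = alpha / (1.0 - beta)
    var = mean / (1.0 - beta ** 2)
    return mean, var

# iterate the exact one-step moment recursion until it (numerically) reaches the fixed point
beta, alpha = 0.7, 2.0
m, v = 0.0, 0.0
for _ in range(2000):
    m, v = alpha + beta * m, alpha + beta * m + beta ** 2 * v
```

Since 1 − *β*•² < 1, the stationary variance strictly exceeds the stationary mean, matching the overdispersion property of these models.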

Notice that a Poissonian GWI as well as all models (M1) and (M2) are–despite their *conditional* Poisson law–typically overdispersed since

$$\operatorname{E}\_{P\_{\bullet}}[X\_{n}] = \alpha\_{\bullet} + \beta\_{\bullet} \cdot \operatorname{E}\_{P\_{\bullet}}[X\_{n-1}] \le \alpha\_{\bullet} + \beta\_{\bullet} \cdot \operatorname{E}\_{P\_{\bullet}}[X\_{n-1}] + \beta\_{\bullet}^{2} \cdot \operatorname{Var}\_{P\_{\bullet}}[X\_{n-1}] = \operatorname{Var}\_{P\_{\bullet}}[X\_{n}], \quad n \in \mathbb{N}\backslash\{1\},$$

with equality iff (i.e., if and only if) *α*• = 0 (NI) and *X<sub>n−2</sub>* = 0 (extinction at *n* − 2, with *n* ≥ 3).
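The displayed mean/variance comparison can be iterated exactly from the deterministic start *X*<sub>0</sub>, using E[*X<sub>n</sub>*] = *α*• + *β*•E[*X<sub>n−1</sub>*] together with the law of total variance. A minimal sketch (function name is ours):

```python
def gwi_moments(x0, beta, alpha, n_steps):
    """Exact moments of a Poissonian GWI with deterministic X_0 = x0:
    E[X_n]   = alpha + beta*E[X_{n-1}],
    Var[X_n] = alpha + beta*E[X_{n-1}] + beta^2 * Var[X_{n-1}]
    (law of total variance with the conditional Poisson law)."""
    mean, var = float(x0), 0.0  # X_0 deterministic, hence zero variance
    history = [(mean, var)]
    for _ in range(n_steps):
        mean, var = alpha + beta * mean, alpha + beta * mean + beta ** 2 * var
        history.append((mean, var))
    return history

# variance exceeds mean from n = 2 onwards (overdispersion)
moments = gwi_moments(x0=10, beta=0.8, alpha=1.0, n_steps=10)
```

At *n* = 1 mean and variance coincide (the one-step law is exactly Poisson), and the strict overdispersion sets in from *n* = 2.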

#### *2.3. Applicability to Epidemiology*

The above-mentioned framework can be used for any of the numerous fields of applications of discrete-time branching processes, and of the closely related INARCH(1) models. For the sake of brevity, we explain this—as a kind of running-example—in detail for the currently highly important context of the epidemiology of infectious diseases. For insightful non-mathematical introductions to the latter, see e.g., Kaslow & Evans [79], Osterholm & Hedberg [80]; for a first entry as well as overviews on modelling, the reader is referred to e.g., Grassly & Fraser [81], Keeling & Rohani [82], Yan [83,84], Britton [85], Diekmann, Heesterbeek & Britton [86], Cummings & Lessler [87], Just et al. [88], Britton & Giardina [89], Britton & Pardoux [43]. A survey on the particular role of branching processes in epidemiology can be found e.g., in Jacob [41].

Undoubtedly, by nature, the spreading of an infectious disease through a (human, animal, plant) population is a branching process with possible immigration. Indeed, typically one has the following mechanism:


All the above-mentioned times *t<sub>k</sub>* and time intervals are random, by nature. Two further connected quantities are also important for modelling (see e.g., Yan & Chowell [84] (p. 241ff), including a history of corresponding terminology). Firstly, the *generation interval* (generation time, transmission interval) is the time interval from the onset of infectiousness in a primary case (called the infector) to the onset of infectiousness in a secondary case (called the infectee) infected by the primary case; clearly, the generation interval is random, and so is its duration (often, the (population-)mean of the latter is also called generation interval). Typically, generation intervals are important ingredients of branching process models of infectious diseases. Secondly, the *serial interval* describes the time interval from the onset of symptoms in a primary case to the onset of symptoms in a secondary case infected by the primary case. By nature, the serial interval is random, and so is its duration (often, the (population-)mean of the latter is also called serial interval). Typically, the serial interval is easier to observe than the generation interval, and thus, the latter is often approximately estimated from data of the former. For further investigations on generation and serial intervals, the reader is referred to e.g., Fine [90], Svensson [91,92], Wallinga & Lipsitch [93], Forsberg White & Pagano [94], Nishiura [95], Scalia Tomba et al. [96], Trichereau et al. [97], Vink, Bootsma & Wallinga [98], Champredon & Dushoff [99], Just et al. [88], and–especially for the novel COVID-19 pandemic–An der Heiden & Hamouda [100], Ferretti et al. [101], Ganyani et al. [102], Li et al. [103], Nishiura, Linton & Akhmetzhanov [104], Park et al. [105].

With the help of the above-mentioned *individual* ingredients, one can aggregatedly build numerous different *population-wide* models of infectious diseases in discrete time as well as in continuous time; the latter are typically observed only in discrete-time steps (discrete-time sampling), and hence in the following we concentrate on discrete-time modelling (of the real or the observational process). In fact, we confine ourselves to the important task of modelling the evolution *n* ↦ *X<sub>n</sub>* of the number of *incidences* at "stage" *n*, where *incidence* refers to the number of *new* infected/infectious individuals. Here, *n* may be a generation number where, inductively, *n* = 0 refers to the generation of the first appearing primary cases in the population (also called *initial importations*), and *n* refers to the generation of offspring of all individuals of generation *n* − 1. Alternatively, *n* may be the index of a physical ("calendar") point of time *t<sub>n</sub>*, which may be deterministic or random; e.g., (*t<sub>n</sub>*)<sub>*n*∈N</sub> may be a strictly increasing sequence of (i) equidistant deterministic time points (and thus, one can identify *t<sub>n</sub>* = *n* in appropriate time units such as days, weeks, bi-weeks, months), or (ii) non-equidistant deterministic time points, or (iii) random time points (as a side remark, let us mention that in some situations, *X<sub>n</sub>* may alternatively denote the number of *prevalences* at "stage" *n*, where *prevalence* refers to the total number of infected/infectious individuals (e.g., through some methodical tricks like "self-infection")).

In the light of this, one can loosely define an *epidemic* as the rapid spread of an infectious disease within a specified population, where the numbers *X<sub>n</sub>* of incidences are high (or much higher than expected) for that kind of population. A *pandemic* is a geographically large-scale (e.g., multicontinental or worldwide) epidemic. An *outbreak/onset* of an epidemic in the narrow sense is the (time of) change at which an infectious disease turns into an epidemic, which is typically quantified by exceedance over a threshold; analogously, an *outbreak/onset* of a pandemic is the (time of) change at which the epidemic turns into a pandemic. Of course, one goal of infectious-disease modelling is to quantify "early enough" the potential danger of an emerging outbreak of an epidemic or a pandemic.

Returning to possible models of the incidence-evolution *n* ↦ *X<sub>n</sub>*, its description may be theoretically derived from more detailed, time-finer, highly sophisticated, individual-based "mechanistic" infectious-disease models such as e.g., continuous-time susceptible-exposed-infectious-recovered (SEIR) models (see the above-mentioned introductory texts); however, as e.g., pointed out in Held et al. [72], the estimation of the correspondingly involved numerous parameters may be too ambitious for routinely collected, non-detailed disease data, such as e.g., daily/weekly counts *X<sub>n</sub>* of incidences–especially in decisive emerging/early phases of a novel disease (such as the current COVID-19 pandemic). Accordingly, in the following we assume that *X<sub>n</sub>* can be approximately described by a Poissonian Galton-Watson process with immigration, respectively a ("distributionally equal") Poissonian autoregressive Generalized Linear Model in the sense of (M2). Depending on the situation, this can be quite reasonable, for the following arguments (apart from the usual "if the data say so"). Firstly, it is well known (see e.g., Bartoszynski [33], Ludwig [34], Becker [35,36], Metz [37], Heyde [38], von Bahr & Martin-Löf [39], Ball [40], Jacob [41], Barbour & Reinert [42], Section 1.2 of Britton & Pardoux [43]) that in populations with a relatively high number of susceptible individuals and a relatively low number of infectious individuals (e.g., in a large population and in decisive emerging/early phases of the disease spreading), the incidence-evolution *n* ↦ *X<sub>n</sub>* can be well approximated by a (e.g., Poissonian) Galton-Watson process with possible immigration where *n* plays the role of a *generation number*. If the above-mentioned generation interval is "nearly" deterministic (leading to nearly synchronous, non-overlapping generations)–which is the case e.g., for (phases of) Influenza A(H1N1)pdm09, Influenza A(H3N2), Rubella (cf. Vink, Bootsma & Wallinga [98]), and COVID-19 (cf. Ferretti et al. [101])–and the length of the generation interval is approximated by its mean length and the latter is tuned to be equal to the unit time between consecutive observations, then *n* plays the role of an *observation* (*surveillance*) *time*. This effect is even more realistic if the period of infectiousness is nearly deterministic and relatively short. Secondly, as already mentioned above, the spreading of an infectious disease is intrinsically a (not necessarily Poissonian Galton-Watson) branching mechanism, which may be blurred by other effects in such a way that a Poissonian autoregressive Generalized Linear Model is still a reasonably fitting model for the observational process in disease surveillance. The latter has been used e.g., by Finkenstädt, Bjornstad & Grenfell [73], Held, Höhle & Hofmann [47], and Held et al. [72]; they all use non-constant parameters (e.g., to describe seasonal effects, which are however unknown in early phases of a novel infectious disease such as COVID-19).
In contrast, we employ different new–namely divergence-based–statistical techniques, for which we assume constant parameters but also indicate procedures for the detection of changes; the extension to non-constant parameters is straightforward.

Returning to Galton-Watson processes, let us mention as a *side remark* that they can be also used to model the above-mentioned within-host replication dynamics (D2) (e.g., in the time interval [*t<sub>k</sub><sup>E</sup>*, *t<sub>k</sub><sup>I</sup>*[ and beyond) on a sub-cellular level, see e.g., Spouge [106], as well as Taneyhill, Dunn & Hatcher [107] for parasitic pathogens; on the other hand, one can also employ Galton-Watson processes for quantifying snowball-effect (avalanche-effect, cascade-effect) type, economic-crisis triggered consequences of large epidemics and pandemics, such as e.g., the potential spread of transmissible (i) foreclosures of homes (cf. Parnes [108]), or clearly also (ii) company insolvencies, downsizings and credit-risk downgradings; moreover, the time-evolution of integer-valued indicators concerning the spread of (rational or unwarranted) fears resp. perceived threats may be modelled, too.

Summing up things, we model the evolution *n* ↦ *X<sub>n</sub>* of the number of incidences at stage *n* by a Poissonian Galton-Watson process with immigration GWI

$$X\_0 \in \mathbb{N}; \qquad \mathbb{N}\_0 \ni X\_n = \sum\_{k=1}^{X\_{n-1}} Y\_{n-1,k} + \tilde{Y}\_{n}, \qquad n \in \mathbb{N}, \qquad \text{cf. (1), (GWI1)-(GWI3) with law } P\_{\bullet},$$

(where *Y<sub>n−1,k</sub>* corresponds to the *Y<sub>k</sub>* of (D3), equipped with an additional stage-index *n* − 1), respectively by a corresponding "distributionally equal"–possibly non-stationary–Poissonian autoregressive Generalized Linear Model in the sense of (M2); depending on the situation, we may also fix a (deterministic or random) upper time horizon other than infinity. Recall that both models are overdispersed, which is consistent with the current debate on overdispersion in connection with the current COVID-19 pandemic. In infectious-disease language, the sum ∑<sub>*k*=1</sub><sup>*X*<sub>*n*−1</sub></sup> *Y*<sub>*n*−1,*k*</sub> can also be loosely interpreted as the *epidemic component* (in a narrow sense) driven by the parameter *β*•, and *Ỹ<sub>n</sub>* as the *endemic component* driven by the parameter *α*•. In fact, the offspring mean (here, *β*•) is called the *reproduction number* and plays a major role–also e.g., in the current public debate about the COVID-19 pandemic–because it crucially determines the rapidity of the spread of the disease and–as already indicated above in the second and third paragraph after (PS3)–also the probability that the epidemic/pandemic becomes (maybe temporarily) extinct or at least stationary at a low level (that is, *endemic*). For this to happen, *β*• should be subcritical, i.e., *β*• < 1, and even better, close to zero. Of course, the size of the *importation mean α*• ≥ 0 matters, too, but only in a secondary order.

Keeping this in mind, let us discuss on which factors the reproduction number *β*• and the *importation mean α*• depend upon, and how they can be influenced/controlled. To begin with, by recalling the above-mentioned points (D1) to (D5) and by adapting the considerations of e.g., Grassly & Fraser [81] to our model, one encounters the fact that the distribution of the offspring *Yn*−1,*k*—here driven by the reproduction number (offspring mean) *β*•—depends on the following factors:

	- (B1a) degree of *biological* infectiousness; this reflects the within-host dynamics (D2) of the "representative" individual *k*, in particular the duration and amount of the corresponding replication and shedding/excretion of the infectious pathogens; this degree depends thus on (i) the number of host-invading pathogens (called the *initial infectious dose*), (ii) the type of the pathogen with respect to e.g., its principal capabilities of replication speed, range of spread and drug-sensitivity, (iii) features of the immune system of the host *k* including the level of innate or acquired immunity, and (iv) the interaction between the genetic determinants of disease progression in both the pathogen and the host;
	- (B1b) degree of *behavioural* infectiousness; this depends on the contact patterns of an infected/infectious individual (and, if relevant, the contact patterns of intermediate hosts or vectors), in relation to the disease-specific type of route(s) of transmission of the infectious pathogens (for an overview of the latter, see e.g., Table 3 of Kaslow & Evans [79]); a long-distance-travel behaviour may also lead to the disease exportation to another, outside population (and thus, for the latter to a disease importation);
	- (B1c) degree of *environmental* infectiousness; this depends on the location and environment of the host *k*, which influences the duration of outside-host survival of the pathogens (and, if relevant, of the intermediate hosts or vectors) as well as the speed and range of their outside-host spread; for instance, high temperature may kill the pathogens, high airflow or rainfall dynamics may ease their spread, etc.
	- (B2a) degree of *biological* susceptibility;
	- (B2b) degree of *behavioural* susceptibility;
	- (B2c) degree of *environmental* susceptibility.

All these factors (B1a) to (B2c) can in principle be influenced/controlled to a certain–respective–extent. Let us briefly discuss this for *human* infectious diseases, where one major goal of epidemic risk management is to operate countermeasures/interventions in order to slow down the disease transmission (e.g., by reducing the reproduction number *β*• to less than 1) and eventually even break the chain of transmission, for the sake of containment or mitigation; preparedness and preparation are further motives, for instance as a part of governmental pandemic risk management.

For instance, (B1a) can be reduced or even erased through pharmaceutical interventions such as medication (if available), and preventive strengthening of the immune system through non-extreme sports activities and healthy food.

Moreover, the following exemplary control measures for (B2) can be either put into action by common-sense self-behaviour, or by large-scale public recommendations (e.g., through mass media), or by rules/requirements from authorities:

(i) personal preventive measures such as frequent washing and disinfecting of hands; keeping hands away from face; covering coughs; avoidance of handshakes and hugs with non-family-members; maintaining physical distance (e.g., of two meters) from non-family-members; wearing a face-mask of respective security degree (such as homemade cloth face mask, particulate-filtering face-piece respirator, medical (non-surgical) mask, surgical mask); self-quarantine;


As far as the degree of *biological* susceptibility (B2a) is concerned, one obvious therapeutic countermeasure is a mass vaccination program/campaign (if available).

In case of *highly virulent* infectious diseases causing epidemics and pandemics with substantial *fatality rates*, some of the above-mentioned control strategies and countermeasures may (have to) be "drastic" (e.g., lockdown), and thus imply considerable social and economic costs, with a huge impact and potential danger of triggering severe social, economic and political disruptions.

In order to prepare corresponding suggestions for decisions about appropriate control measures (e.g., public policies), it is therefore important–especially for a novel infectious disease such as the current COVID-19 pandemic–to have a model for the time-evolution of the incidences in (i) a natural (basically uncontrolled) set-up, as well as in (ii) the control set-ups under consideration. As already mentioned above, we assume that all these situations can be distilled into an incidence evolution *n* → *X*<sub>*n*</sub> which follows a Poissonian Galton-Watson process with respectively different parameter pairs (*β*•, *α*•). Correspondingly, we always compare two alternative models (H) and (A) with parameter pairs (*β*H, *α*H) and (*β*A, *α*A) which reflect either a "pure" statistical uncertainty (under the *same* uncontrolled or controlled set-up), or the uncertainty between two *different* potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones); the economic impact can also be taken into account, within a Bayesian decision framework discussed in Section 2.5 below. As will be explained in the next subsections, we achieve such comparisons by means of density-based dissimilarity distances/divergences and related quantities thereof.

From the above-mentioned detailed explanations, it is immediately clear that for the described epidemiological context one should investigate *all* types of criticality and importation means for the therein involved two Poissonian Galton-Watson processes with/without immigration (respectively the equally distributed INARCH(1) models); in particular, this motivates (or even "justifies") the necessity of the very lengthy detailed studies in the Sections 3–7 below.

#### *2.4. Information Measures*

Having two competing models (H) and (A) at hand, it makes sense to study questions such as "how far apart are they?" and thus "how dissimilar are they?". This can be quantified in terms of divergences in the sense of directed (i.e., not necessarily symmetric) distances, for which the triangular inequality usually fails. Let us first discuss our employed divergence subclasses in a *general* set-up of two *equivalent* probability measures *P*H, *P*<sup>A</sup> on a measurable space (Ω, F). In terms of the parameter *λ* ∈ R, the *power divergences*–also known as Cressie-Read divergences, relative Tsallis entropies, or the generalized cross-entropy family–are defined as (see e.g., Liese & Vajda [1,10])

$$0 \le I\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) := \begin{cases} I\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right), & \text{if } \lambda = 1, \\ \frac{1}{\lambda(\lambda-1)}\left(H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) - 1\right), & \text{if } \lambda \in \mathbb{R}\backslash\{0,1\}, \\ I\left(P\_{\mathcal{H}}||P\_{\mathcal{A}}\right), & \text{if } \lambda = 0, \end{cases} \tag{2}$$

where

$$I(P\_{\mathcal{A}}||P\_{\mathcal{H}}) := \int p\_{\mathcal{A}} \log \frac{p\_{\mathcal{A}}}{p\_{\mathcal{H}}} \, d\mu \ge 0 \tag{3}$$

is the *Kullback-Leibler information divergence* (also known as *relative entropy*) and

$$H\_{\lambda} \left( P\_{\mathcal{A}} || P\_{\mathcal{H}} \right) \ := \int\_{\Omega} p\_{\mathcal{A}}^{\lambda} p\_{\mathcal{H}}^{1-\lambda} \, d\mu \geq 0 \tag{4}$$

is the *Hellinger integral of order λ* ∈ R\{0, 1}; for this, we assume as usual without loss of generality that the probability measures *P*H, *P*<sup>A</sup> are dominated by some *σ*−finite measure *µ*, with densities

$$p\_{\mathcal{A}} = \frac{\mathrm{d}P\_{\mathcal{A}}}{\mathrm{d}\mu} \qquad \text{and} \qquad p\_{\mathcal{H}} = \frac{\mathrm{d}P\_{\mathcal{H}}}{\mathrm{d}\mu} \tag{5}$$

defined on Ω (the zeros of *p*H, *p*<sup>A</sup> are handled in (3) and (4) with the usual conventions). Clearly, for *λ* ∈ {0, 1} one trivially gets

$$H\_0\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = H\_1\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = 1\,.$$
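As a quick worked illustration of (4)–and a preview of the Poissonian computations in the later sections–take *P*<sub>A</sub> = Poi(*a*) and *P*<sub>H</sub> = Poi(*b*) with *a*, *b* > 0, and let *µ* be the counting measure on N<sub>0</sub>; then

$$H\_{\lambda}\left(\mathrm{Poi}(a)\,||\,\mathrm{Poi}(b)\right) = \sum\_{k=0}^{\infty} \left(\frac{e^{-a}a^{k}}{k!}\right)^{\lambda} \left(\frac{e^{-b}b^{k}}{k!}\right)^{1-\lambda} = \exp\left( a^{\lambda}b^{1-\lambda} - \lambda a - (1-\lambda)b \right),$$

which for *λ* ∈ ]0, 1[ lies in ]0, 1] by the weighted arithmetic-geometric-mean inequality *a*<sup>*λ*</sup>*b*<sup>1−*λ*</sup> ≤ *λa* + (1 − *λ*)*b*.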

The Kullback-Leibler information divergences (relative entropies) in (2) and (3) can alternatively be expressed as (see, e.g., Liese & Vajda [1])

$$I\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = \lim\_{\lambda \nearrow 1} \frac{1 - H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right)}{\lambda(1 - \lambda)}, \qquad I\left(P\_{\mathcal{H}}||P\_{\mathcal{A}}\right) = \lim\_{\lambda \searrow 0} \frac{1 - H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right)}{\lambda(1 - \lambda)}.\tag{6}$$

Apart from the Kullback-Leibler information divergence (relative entropy), other prominent examples of power divergences are the squared Hellinger distance (1/2) *I*<sub>1/2</sub>(*P*<sub>A</sub>||*P*<sub>H</sub>) and Pearson's *χ*<sup>2</sup>-divergence 2 *I*<sub>2</sub>(*P*<sub>A</sub>||*P*<sub>H</sub>); the Hellinger integral *H*<sub>1/2</sub>(*P*<sub>A</sub>||*P*<sub>H</sub>) is also known as (a multiple of) the *Bhattacharyya coefficient*. Extensive studies of basic and advanced general facts on power divergences, Hellinger integrals and the related Renyi divergences of order *λ* ∈ R\{0, 1}

$$0 \le R\_{\lambda} \left( P\_{\mathcal{A}} || P\_{\mathcal{H}} \right) := \frac{1}{\lambda(\lambda - 1)} \log H\_{\lambda} \left( P\_{\mathcal{A}} || P\_{\mathcal{H}} \right) \quad \text{with } \log 0 = -\infty \tag{7}$$

can be found e.g., in Liese & Vajda [1,10], Jacod & Shiryaev [24], van Erven & Harremoes [20] (as a side remark, *R*1/2 (*P*A||*P*H) is also known as (multiple of) *Bhattacharyya distance*). For instance, the integrals in (3) and (4) do not depend on the choice of *µ*. Furthermore, one has the skew symmetries

$$H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = H\_{1-\lambda}\left(P\_{\mathcal{H}}||P\_{\mathcal{A}}\right) \quad \text{as well as} \quad I\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = I\_{1-\lambda}\left(P\_{\mathcal{H}}||P\_{\mathcal{A}}\right) \tag{8}$$

for all *λ* ∈ R (see e.g., Liese & Vajda [1]). As far as finiteness is concerned, for *λ* ∈]0, 1[ one gets the rudimentary bounds

$$0 < H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) \le 1, \quad \text{and equivalently,} \tag{9}$$

$$0 \le I\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right) = \frac{1 - H\_{\lambda}\left(P\_{\mathcal{A}}||P\_{\mathcal{H}}\right)}{\lambda(1-\lambda)} < \frac{1}{\lambda(1-\lambda)}, \tag{10}$$

where the lower bound in (10) (upper bound in (9)) is achieved iff *P*<sup>A</sup> = *P*H. For *λ* ∈ R\]0, 1[, one gets the bounds

$$0 \le I\_{\lambda} \left( P\_{\mathcal{A}} || P\_{\mathcal{H}} \right) \le \infty, \qquad \text{and equivalently,} \qquad 1 \le H\_{\lambda} \left( P\_{\mathcal{A}} || P\_{\mathcal{H}} \right) \le \infty \,\tag{11}$$

where, in contrast to above, the lower bound of *H<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) and the lower bound of *I<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) are both achieved iff *P*<sub>A</sub> = *P*<sub>H</sub>; however, the power divergence *I<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) and the Hellinger integral *H<sub>λ</sub>*(*P*<sub>A</sub>||*P*<sub>H</sub>) might be infinite, depending on the particular setup.
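The closed-form Hellinger integral between two Poisson laws makes the bounds (9) to (11) easy to check numerically; a purely illustrative sketch (all parameter values hypothetical):

```python
import math

def hellinger_poisson(a, b, lam):
    """H_lam(Poi(a)||Poi(b)) in closed form: summing p_A^lam * p_H^(1-lam)
    over k collapses to exp(a^lam * b^(1-lam) - lam*a - (1-lam)*b)."""
    return math.exp(a**lam * b**(1 - lam) - lam*a - (1 - lam)*b)

def power_divergence(a, b, lam):
    # I_lam via (2), for lam not in {0, 1}
    return (hellinger_poisson(a, b, lam) - 1.0) / (lam*(lam - 1.0))

def renyi_divergence(a, b, lam):
    # R_lam via (7)
    return math.log(hellinger_poisson(a, b, lam)) / (lam*(lam - 1.0))

a, b = 2.0, 0.5
for lam in (0.25, 0.5, 0.75):
    H = hellinger_poisson(a, b, lam)
    assert 0.0 < H <= 1.0                                              # bound (9)
    assert 0.0 <= power_divergence(a, b, lam) < 1.0/(lam*(1.0 - lam))  # bound (10)
assert hellinger_poisson(a, b, 2.0) >= 1.0       # bound (11), lam outside ]0,1[
```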

The Hellinger integrals can be also used for bounds of the well-known *total variation*

$$0 \le \operatorname{V}(P\_{\mathcal{A}}||P\_{\mathcal{H}}) := 2 \sup\_{A \in \mathcal{F}} \left\{ P\_{\mathcal{A}}(A) - P\_{\mathcal{H}}(A) \right\} = \int\_{\Omega} |p\_{\mathcal{A}} - p\_{\mathcal{H}}| \, \mathrm{d}\mu$$

with *p*<sup>A</sup> and *p*<sup>H</sup> defined in (5). Certainly, the total variation is one of the best known statistical distances, see e.g., Le Cam [109]. For arbitrary *λ* ∈]0, 1[ there holds (cf. Liese & Vajda [1])

$$1 - \frac{V(P\_{\mathcal{A}}||P\_{\mathcal{H}})}{2} \le H\_{\lambda}(P\_{\mathcal{A}}||P\_{\mathcal{H}}) \le \left(1 + \frac{V(P\_{\mathcal{A}}||P\_{\mathcal{H}})}{2}\right)^{\max\{\lambda, 1-\lambda\}} \left(1 - \frac{V(P\_{\mathcal{A}}||P\_{\mathcal{H}})}{2}\right)^{\min\{\lambda, 1-\lambda\}}$$

From this, together with the particular choice *λ* = 1/2, we can derive the fundamental universal bounds

$$2\left(1 - H\_{\frac{1}{2}}(P\_{\mathcal{A}}||P\_{\mathcal{H}})\right) \le V(P\_{\mathcal{A}}||P\_{\mathcal{H}}) \le 2\sqrt{1 - \left(H\_{\frac{1}{2}}(P\_{\mathcal{A}}||P\_{\mathcal{H}})\right)^2}.\tag{12}$$
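As a numerical sanity check of (12), one can compare a (truncated-sum) total variation with the Bhattacharyya-coefficient bounds; a small sketch under hypothetical Poisson marginals:

```python
import math

def poi_pmf(mean, k):
    # Poisson weight computed in log-space (avoids factorial overflow)
    return math.exp(k*math.log(mean) - mean - math.lgamma(k + 1))

a, b = 2.0, 0.5
# V = sum_k |p_A(k) - p_H(k)|, truncated at k = 100 (tail negligible here)
V = sum(abs(poi_pmf(a, k) - poi_pmf(b, k)) for k in range(100))
# Bhattacharyya coefficient H_{1/2} for two Poisson laws, in closed form
H_half = math.exp(math.sqrt(a*b) - (a + b)/2)
assert 2*(1 - H_half) <= V <= 2*math.sqrt(1 - H_half**2)   # the bounds (12)
```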

We apply these concepts to our setup of Section 2.1 with two competing models (H) and (A) of Galton-Watson processes with immigration, where one can take Ω ⊂ N<sub>0</sub><sup>N<sub>0</sub></sup> to be the space of all paths of (*X<sub>n</sub>*)<sub>*n*∈N</sub>. In more detail, in terms of the extinction set B := {*τ* < ∞} and the parameter-set notation (PS1) to (PS3), it is known that for PSP the two laws *P*<sub>H</sub> and *P*<sub>A</sub> are equivalent, whereas for PNI the two restrictions *P*<sub>H</sub>|<sub>B</sub> and *P*<sub>A</sub>|<sub>B</sub> are equivalent (see e.g., Lemma 1.1.3 of Guttorp [52]); with a slight abuse of notation we shall henceforth omit |<sub>B</sub>. Consistently, for fixed time *n* ∈ N<sub>0</sub> we introduce *P*<sub>A,*n*</sub> := *P*<sub>A</sub>|<sub>F<sub>*n*</sub></sub> and *P*<sub>H,*n*</sub> := *P*<sub>H</sub>|<sub>F<sub>*n*</sub></sub>, as well as the corresponding Radon-Nikodym derivative (likelihood ratio)

$$Z\_{n} := \frac{\mathrm{d}P\_{\mathcal{A},n}}{\mathrm{d}P\_{\mathcal{H},n}} \tag{13}$$

where (F<sub>*n*</sub>)<sub>*n*∈N</sub> denotes the corresponding canonical filtration generated by *X* := (*X<sub>n</sub>*)<sub>*n*∈N</sub>; in other words, F<sub>*n*</sub> reflects the "process-intrinsic" information known at stage *n*. Clearly, *Z*<sub>0</sub> = 1. By choosing the reference measure *µ* = *P*<sub>H,*n*</sub>, one obtains from (4) the Hellinger integral *H<sub>λ</sub>*(*P*<sub>A,0</sub>||*P*<sub>H,0</sub>) = 1, as well as, for all *n* ∈ N,

$$H\_{\lambda}\left(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}\right) = E\_{P\_{\mathcal{H},n}}\left[\left(Z\_{n}\right)^{\lambda}\right], \tag{14}$$

$$I\left(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}\right) = E\_{P\_{\mathcal{A},n}}\left[\log Z\_{n}\right], \tag{15}$$

from which one can immediately build *I<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) (*λ* ∈ R), respectively *R<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) (*λ* ∈ R\{0, 1}), respectively bounds of *V*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>), via (2), respectively (7), respectively (12).
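The representation (14) suggests a simple Monte Carlo sketch: simulate GWI paths under (H), accumulate the log-likelihood ratio log *Z<sub>n</sub>* step by step (each transition being Poisson with mean *β*•*X*<sub>*k*−1</sub> + *α*•), and average (*Z<sub>n</sub>*)<sup>*λ*</sup>. All parameter values below are hypothetical:

```python
import math
import numpy as np

def log_poi_pmf(mean, k):
    return k*math.log(mean) - mean - math.lgamma(k + 1)

def mc_hellinger(betaH, alphaH, betaA, alphaA, x0, n, lam, reps, seed=0):
    """Monte Carlo estimate of H_lam(P_{A,n}||P_{H,n}) = E_{P_{H,n}}[(Z_n)^lam],
    cf. (13) and (14); requires alpha > 0 so all transition means are positive."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(reps):
        x, logZ = x0, 0.0
        for _ in range(n):
            mH, mA = betaH*x + alphaH, betaA*x + alphaA
            x = int(rng.poisson(mH))                      # step under (H)
            logZ += log_poi_pmf(mA, x) - log_poi_pmf(mH, x)
        acc += math.exp(lam*logZ)
    return acc / reps

H_est = mc_hellinger(0.5, 0.2, 2.0, 10.0, x0=5, n=4, lam=0.5, reps=5000)
assert 0.0 < H_est < 1.0   # consistent with bound (9)
```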

The outcoming values (respectively bounds) of *H<sup>λ</sup>* (*P*A,*n*||*P*H,*n*) are quite diverse and depend on the choice of the involved parameter pairs (*β*H, *α*H), (*β*A, *α*A) as well as *λ*; the exact details will be given in the Sections 3 and 6 below.

Before we achieve this, in the following we explain how the outcoming dissimilarity results can be applied to Bayesian testing and more general Bayesian decision making, as well as to Neyman-Pearson testing.

#### *2.5. Decision Making under Uncertainty*

Within the above-mentioned context of two competing models (H) and (A) of Galton-Watson processes with immigration, let us briefly discuss how knowledge about the time-evolution of the Hellinger integrals *H<sup>λ</sup>* (*P*A,*n*||*P*H,*n*)–or equivalently, of the power divergences *I<sup>λ</sup>* (*P*A,*n*||*P*H,*n*), cf. (2)—can be used in order to take decisions under uncertainty, within a framework of Bayesian decision making BDM, or alternatively, of Neyman-Pearson testing NPT.

In our context of BDM, we decide between an action *d*<sub>H</sub> "associated with" the (say) hypothesis law *P*<sub>H</sub> and an action *d*<sub>A</sub> "associated with" the (say) alternative law *P*<sub>A</sub>, based on the sample-path observation X<sub>*n*</sub> := {*X<sub>l</sub>* : *l* ∈ {0, 1, . . . , *n*}} of the GWI-generation-sizes (e.g., infectious-disease incidences, cf. Section 2.3) up to observation horizon *n* ∈ N. Following the lines of Stummer & Vajda [15] (adapted to our branching process context), for our BDM let us consider as admissible decision rules *δ<sub>n</sub>* : Ω<sub>*n*</sub> → {*d*<sub>H</sub>, *d*<sub>A</sub>} the ones generated by all path sets *G<sub>n</sub>* ⊆ Ω<sub>*n*</sub> (where Ω<sub>*n*</sub> denotes the space of all possible paths of (*X<sub>k</sub>*)<sub>*k*∈{1,...,*n*}</sub>) through

$$\delta\_{n}(\mathcal{X}\_{n}) := \delta\_{G\_{n}}(\mathcal{X}\_{n}) := \begin{cases} d\_{\mathcal{A}}, & \text{if } \mathcal{X}\_{n} \in G\_{n}, \\ d\_{\mathcal{H}}, & \text{if } \mathcal{X}\_{n} \notin G\_{n}, \end{cases}$$

as well as loss functions of the form

$$
\begin{pmatrix}
\mathcal{L}(d\_{\mathcal{H}}, \mathcal{H}) & \mathcal{L}(d\_{\mathcal{H}}, \mathcal{A}) \\
\mathcal{L}(d\_{\mathcal{A}}, \mathcal{H}) & \mathcal{L}(d\_{\mathcal{A}}, \mathcal{A})
\end{pmatrix} := \begin{pmatrix}
0 & L\_{\mathcal{A}} \\
L\_{\mathcal{H}} & 0
\end{pmatrix} \tag{16}
$$

with pregiven constants *L*<sub>A</sub> > 0, *L*<sub>H</sub> > 0 (e.g., arising as bounds from quantities in worst-case scenarios); notice that in (16), *d*<sub>H</sub> is assumed to be a zero-loss action under H and *d*<sub>A</sub> a zero-loss action under A. By definition, the *Bayes decision rule δ*<sub>*G*<sub>*n*</sub>,min</sub> minimizes–over *G<sub>n</sub>*–the *mean decision loss*

$$\begin{split} \mathcal{L}(\delta\_{G\_{n}}) &:= p\_{\mathcal{H}}^{\mathrm{prior}} \cdot L\_{\mathcal{H}} \cdot \Pr\left( \delta\_{G\_{n}}(\mathcal{X}\_{n}) = d\_{\mathcal{A}} \,\Big|\, \mathcal{H} \right) + p\_{\mathcal{A}}^{\mathrm{prior}} \cdot L\_{\mathcal{A}} \cdot \Pr\left( \delta\_{G\_{n}}(\mathcal{X}\_{n}) = d\_{\mathcal{H}} \,\Big|\, \mathcal{A} \right) \\ &= p\_{\mathcal{H}}^{\mathrm{prior}} \cdot L\_{\mathcal{H}} \cdot P\_{\mathcal{H},n}(G\_{n}) + p\_{\mathcal{A}}^{\mathrm{prior}} \cdot L\_{\mathcal{A}} \cdot P\_{\mathcal{A},n}(\Omega\_{n} - G\_{n}) \end{split} \tag{17}$$

for given prior probabilities *p*<sub>H</sub><sup>prior</sup> := *Pr*(H) ∈ ]0, 1[ for H and *p*<sub>A</sub><sup>prior</sup> := *Pr*(A) = 1 − *p*<sub>H</sub><sup>prior</sup> for A. As a side remark let us mention that, in a certain sense, the involved model (parameter) uncertainty expressed by the "superordinate" Bernoulli-type law *Pr* = *Bin*(1, *p*<sub>H</sub><sup>prior</sup>) can also be reinterpreted as a rudimentary static random environment caused e.g., by a random Bernoulli-type external static force. By straightforward calculations, one gets with (13) the minimizing path set *G*<sub>*n*,min</sub> = {*Z<sub>n</sub>* ≥ *p*<sub>H</sub><sup>prior</sup> *L*<sub>H</sub> / (*p*<sub>A</sub><sup>prior</sup> *L*<sub>A</sub>)}, leading to the *minimal mean decision loss*, i.e., the *Bayes risk*,

$$\mathcal{R}\_{n} := \min\_{G\_{n}} \mathcal{L}(\delta\_{G\_{n}}) = \mathcal{L}(\delta\_{G\_{n,\min}}) = \int\_{\Omega\_{n}} \min\left\{ p\_{\mathcal{H}}^{\mathrm{prior}} L\_{\mathcal{H}},\, p\_{\mathcal{A}}^{\mathrm{prior}} L\_{\mathcal{A}} Z\_{n} \right\} \mathrm{d}P\_{\mathcal{H},n} \,. \tag{18}$$
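For a single observation step (*n* = 1, where *Z*<sub>1</sub> reduces to a ratio of two Poisson transition weights), the Bayes risk (18) can be evaluated by direct summation; a minimal numerical sketch with hypothetical parameter values:

```python
import math

def poi_pmf(mean, k):
    # Poisson weight in log-space
    return math.exp(k*math.log(mean) - mean - math.lgamma(k + 1))

# one step from X_0 = 5: X_1 | X_0 ~ Poisson(beta*X_0 + alpha)
mH = 0.5*5 + 0.2             # mild scenario (H)
mA = 2.0*5 + 10.0            # dangerous scenario (A)
pH_prior, LH, LA = 0.6, 2.0, 1.0
pA_prior = 1.0 - pH_prior

# minimizing path set: decide d_A iff Z_1 >= (pH_prior*LH)/(pA_prior*LA)
threshold = pH_prior*LH / (pA_prior*LA)

# Bayes risk (18): R_1 = sum_k min{pH_prior*LH, pA_prior*LA*Z_1(k)} * p_H(k)
R1 = sum(min(pH_prior*LH, pA_prior*LA*poi_pmf(mA, k)/poi_pmf(mH, k)) * poi_pmf(mH, k)
         for k in range(200))
```

Here *Z*<sub>1</sub>(*k*) = *p*<sub>A</sub>(*k*)/*p*<sub>H</sub>(*k*), and the truncation at *k* = 200 is harmless for these transition means.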

Notice that—by straightforward standard arguments—the *alternative* decision procedure

take action *d*<sub>A</sub> (resp. *d*<sub>H</sub>) if *L*<sub>H</sub> · *p*<sub>H</sub><sup>post</sup>(X<sub>*n*</sub>) ≤ (resp. >) *L*<sub>A</sub> · *p*<sub>A</sub><sup>post</sup>(X<sub>*n*</sub>)

with posterior probabilities *p*<sub>H</sub><sup>post</sup>(X<sub>*n*</sub>) := *p*<sub>H</sub><sup>prior</sup> / ((1 − *p*<sub>H</sub><sup>prior</sup>) · *Z<sub>n</sub>*(X<sub>*n*</sub>) + *p*<sub>H</sub><sup>prior</sup>) =: 1 − *p*<sub>A</sub><sup>post</sup>(X<sub>*n*</sub>), leads exactly to the same actions as *δ*<sub>*G*<sub>*n*,min</sub></sub>. By adapting Lemma 6.5 of Stummer & Vajda [15]–which on general probability spaces gives *fundamental universal* inequalities relating Hellinger integrals (or equivalently, power divergences) and Bayes risks–one gets for all *L*<sub>H</sub> > 0, *L*<sub>A</sub> > 0, *p*<sub>H</sub><sup>prior</sup> ∈ ]0, 1[, *λ* ∈ ]0, 1[ and *n* ∈ N the upper bound

$$\mathcal{R}\_{n} \le \Lambda\_{\mathcal{A}}^{\lambda}\, \Lambda\_{\mathcal{H}}^{1-\lambda}\, H\_{\lambda}\left(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}\right), \qquad \text{with } \Lambda\_{\mathcal{H}} := p\_{\mathcal{H}}^{\mathrm{prior}} L\_{\mathcal{H}},\; \Lambda\_{\mathcal{A}} := (1 - p\_{\mathcal{H}}^{\mathrm{prior}}) L\_{\mathcal{A}}, \tag{19}$$

as well as the lower bound

$$(\mathcal{R}\_{n})^{\min\{\lambda, 1-\lambda\}} \cdot (\Lambda\_{\mathcal{H}} + \Lambda\_{\mathcal{A}} - \mathcal{R}\_{n})^{\max\{\lambda, 1-\lambda\}} \ge \Lambda\_{\mathcal{A}}^{\lambda}\, \Lambda\_{\mathcal{H}}^{1-\lambda}\, H\_{\lambda}\left(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}\right)$$

which implies in particular the "direct" lower bound

$$\mathcal{R}\_{\mathfrak{n}} \ge \frac{\Lambda\_{\mathcal{A}}^{\max\{1, \frac{\lambda}{1-\lambda}\}} \Lambda\_{\mathcal{H}}^{\max\{1, \frac{1-\lambda}{\lambda}\}}}{(\Lambda\_{\mathcal{A}} + \Lambda\_{\mathcal{H}})^{\max\{\frac{\lambda}{1-\lambda}, \frac{1-\lambda}{\lambda}\}}} \cdot (H\_{\lambda} \left(P\_{\mathcal{A},\mathfrak{n}} || P\_{\mathcal{H},\mathfrak{n}}\right))^{\max\{\frac{1}{\lambda}, \frac{1}{1-\lambda}\}} \,. \tag{20}$$

By using (19) (respectively (20)) together with the exact values and the upper (respectively lower) bounds of the Hellinger integrals *H<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) derived in the following sections, we end up with upper (respectively lower) bounds of the Bayes risk R<sub>*n*</sub>. Of course, with the help of (2) the bounds (19) and (20) can be (i) immediately rewritten in terms of the power divergences *I<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) and (ii) thus be *directly* interpreted in terms of dissimilarity-size arguments. As a side remark, in such a Bayesian context the *λ*-order Hellinger integral *H<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) = *E*<sub>*P*<sub>H,*n*</sub></sub>[(*Z<sub>n</sub>*)<sup>*λ*</sup>] (cf. (14)) can also be interpreted as a *λ*-order Bayes-factor moment (with respect to *P*<sub>H,*n*</sub>), since *Z<sub>n</sub>* = *Z<sub>n</sub>*(X<sub>*n*</sub>) = (*p*<sub>A</sub><sup>post</sup>(X<sub>*n*</sub>)/*p*<sub>H</sub><sup>post</sup>(X<sub>*n*</sub>)) / (*p*<sub>A</sub><sup>prior</sup>/*p*<sub>H</sub><sup>prior</sup>) is the Bayes factor (i.e., the posterior odds ratio of (A) to (H), divided by the prior odds ratio of (A) to (H)).
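As a numerical cross-check (in a hypothetical one-step Poisson setting, with *λ* = 1/2, for which the exponents in (20) simplify to 1 and 2), the bounds (19) and (20) can be verified directly:

```python
import math

def poi_pmf(mean, k):
    return math.exp(k*math.log(mean) - mean - math.lgamma(k + 1))

x0, lam = 5, 0.5
mH, mA = 0.5*x0 + 0.2, 2.0*x0 + 10.0     # mild (H) vs. dangerous (A) means
LamH, LamA = 0.6*2.0, 0.4*1.0            # Lam_H = p_H^prior*L_H, Lam_A = p_A^prior*L_A

# one-step Bayes risk (18) and Hellinger integral (4), by truncated summation
R1 = sum(min(LamH, LamA*poi_pmf(mA, k)/poi_pmf(mH, k)) * poi_pmf(mH, k)
         for k in range(200))
H_lam = sum(poi_pmf(mA, k)**lam * poi_pmf(mH, k)**(1 - lam) for k in range(200))

assert R1 <= LamA**lam * LamH**(1 - lam) * H_lam       # upper bound (19)
assert R1 >= (LamA*LamH/(LamA + LamH)) * H_lam**2      # lower bound (20), lam = 1/2
```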

At this point, the potential applicant should be warned about the *usual way of* asynchronous decision making, where one first *tests* (A) versus (H) (i.e., *L*<sup>A</sup> = *L*<sup>H</sup> = 1 which leads to 0–1 losses in (16)) and afterwards, based on the outcoming result (e.g., in favour of (A)), takes the attached economic decision (e.g., *d*A); this can lead to distortions compared with synchronous decision making with "full" monetary losses *L*<sup>A</sup> and *L*H, as is shown in Stummer & Lao [16] within an economic context in connection with discrete approximations of financial diffusion processes (they call this distortion effect a *non-commutativity between Bayesian statistical and investment decisions*).

For different types of–mainly parameter estimation (squared-error type loss function) concerning—Bayesian analyses based on GW(I) generation size observations, see e.g., Jagers [56], Heyde [38], Heyde & Johnstone [110], Johnson et al. [111], Basawa & Rao [60], Basawa & Scott [61], Scott [112], Guttorp [52], Yanev & Tsokos [113], Mendoza & Gutierrez-Pena [114], and the references therein.

Within our running-example epidemiological context of Section 2.3, let us briefly discuss the role of the above-mentioned losses *L*<sub>A</sub> and *L*<sub>H</sub>. To begin with, as mentioned above the *unit-free* choice *L*<sub>A</sub> = *L*<sub>H</sub> = 1 corresponds to *Bayesian testing*. Recall that this concerns two alternative infectious-disease models (H) and (A) with parameter pairs (*β*H, *α*H) and (*β*A, *α*A) (recall the interpretation of *β*• as reproduction number and *α*• as importation mean), which reflect either a "pure" statistical uncertainty (under the *same* uncontrolled or controlled set-up), or the uncertainty between two *different* potential control set-ups (for the sake of assessing the potential impact/efficiency of some planned interventions, compared with alternative ones). As far as *non-unit-free*–e.g., macroeconomic or monetary–losses are concerned, recall that some of the above-mentioned control strategies (countermeasures, public policies, governmental pandemic risk management plans) may imply considerable social and economic costs, with a huge impact and potential danger of triggering severe social, economic and political disruptions; a corresponding tradeoff between health and economic issues can be incorporated by choosing *L*<sub>A</sub> and *L*<sub>H</sub> to be (e.g., monetary) values which reflect estimates or upper bounds of losses due to wrong decisions, e.g., if at stage *n* due to the observed data one erroneously thinks (reinforced by fear) that a novel infectious disease (e.g., COVID-19) will lead to (or re-emerge as) a severe pandemic and consequently decides for a lockdown with drastic future economic consequences, versus, if one erroneously thinks (reinforced by carelessness) that the infectious disease is (or stays) non-severe and consequently eases some/all control measures, which will lead to extremely devastating future economic consequences.
For the estimates/bounds of *L*<sup>A</sup> and *L*H, one can e.g., employ (i) the comprehensive stochastic studies of Feicht & Stummer [115] on the

quantitative degree of elasticity and speed of recovery of economies after a sudden macroeconomic disaster, or (ii) the more short-term, German-specific, scenario-type (basically non-stochastic) studies of Dorn et al. [116,117] in connection with the current COVID-19 pandemic.

Of course, the above-mentioned Bayesian decision procedure can also be operated in a *sequential way*. For instance, suppose that we are faced with a novel infectious disease (e.g., COVID-19) of non-negligible fatality rate, and let (A) reflect a "potentially dangerous" infectious-disease-transmission situation (e.g., a substantially supercritical reproduction number *β*<sub>A</sub> = 2 and an importation mean of *α*<sub>A</sub> = 10, for *weekly* appearing new incidence-generations), whereas (H) describes a "relatively harmless/mild" situation (e.g., a substantially subcritical *β*<sub>H</sub> = 0.5, *α*<sub>H</sub> = 0.2). Moreover, let *d*<sub>A</sub> respectively *d*<sub>H</sub> denote (non-quantitatively) the decision/action to accept (A) respectively (H). It can then be reasonable to decide to stop the observation process *n* → X<sub>*n*</sub> (also called *surveillance* or *online-monitoring*) of incidence numbers at the first time at which *n* → *Z<sub>n</sub>* = *Z<sub>n</sub>*(X<sub>*n*</sub>) exceeds the threshold *p*<sub>H</sub><sup>prior</sup>/*p*<sub>A</sub><sup>prior</sup>; if this happens, one takes *d*<sub>A</sub> as decision (and e.g., declares the situation an *occurrence of an epidemic outbreak* and starts with control/intervention measures (however, as explained above, one should synchronously involve also the potential economic losses)), whereas as long as this does not happen, one continues the observation (and implicitly takes *d*<sub>H</sub> as decision). This can be modelled in terms of the pair (*τ̃*, *d*<sub>A</sub>) with (random) stopping time *τ̃* := inf{*n* ∈ N : *Z<sub>n</sub>* ≥ *p*<sub>H</sub><sup>prior</sup>/*p*<sub>A</sub><sup>prior</sup>} (with the usual convention that the infimum of the empty set is infinity), and the corresponding decision *d*<sub>A</sub>.
After the time *τ̃* < ∞ and, e.g., immediate subsequent employment of some control/counter measures, one can e.g., take the old model (A) as the new (H), declare a new target (A) for the desired quantification of the effectiveness of the employed control measures (e.g., a mitigation to a slightly subcritical case of *β*<sub>A</sub> = 0.95, *α*<sub>A</sub> = 0.8), and start to observe the new incidence numbers until the new target (A) has been reached. This can be interpreted as online-detection of a distributional change; a related comprehensive new framework for the use of divergences (even much beyond power divergences) for distributional change detection can be found e.g., in the recent work of Kißlinger & Stummer [118]. A completely different, SIR-model-based approach for the detection of change points in the spread of COVID-19 is given in Dehning et al. [119]. Moreover, other surveillance methods can also be found e.g., in the corresponding overview of Frisen [120] and the Swedish epidemics outbreak investigations of Frisen & Andersson & Schiöler [121].

One can refine the above-mentioned sequential procedure via two (instead of one) appropriate thresholds *c*<sub>1</sub> < *c*<sub>2</sub> and the pair (*τ̆*, *δ*<sub>*τ̆*</sub>), with the stopping time *τ̆* := inf{*n* ∈ N : *Z<sub>n</sub>* ∉ [*c*<sub>1</sub>, *c*<sub>2</sub>]} as well as the corresponding decision rule

$$\delta\_{\breve{\tau}} := \begin{cases} d\_{\mathcal{A}}, & \text{if } Z\_{\breve{\tau}} > c\_{2}, \\ d\_{\mathcal{H}}, & \text{if } Z\_{\breve{\tau}} < c\_{1}. \end{cases}$$

An exact optimized treatment of the two above-mentioned sequential procedures, and of their connection to Hellinger integrals (and power divergences) of Galton-Watson processes with immigration, is beyond the scope of this paper.
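A minimal sketch of the two-threshold sequential procedure (all parameters and the example paths are hypothetical; the comparison is done in log-space to avoid overflow of *Z<sub>n</sub>*):

```python
import math

def log_poi_pmf(mean, k):
    return k*math.log(mean) - mean - math.lgamma(k + 1)

def monitor(xs, betaH, alphaH, betaA, alphaA, c1, c2):
    """Track the log-likelihood ratio log Z_n along an observed incidence
    path xs = (X_0, X_1, ...) and stop at the first n with Z_n outside
    [c1, c2]; returns (stopping index, decision), or (None, None) if the
    observation window ends without a decision."""
    logZ = 0.0
    for n in range(1, len(xs)):
        mH = betaH*xs[n-1] + alphaH
        mA = betaA*xs[n-1] + alphaA
        logZ += log_poi_pmf(mA, xs[n]) - log_poi_pmf(mH, xs[n])
        if logZ > math.log(c2):
            return n, "d_A"
        if logZ < math.log(c1):
            return n, "d_H"
    return None, None

# an A-looking path triggers d_A immediately; an H-looking one triggers d_H
assert monitor([3, 16, 40], 0.5, 0.2, 2.0, 10.0, 1e-4, 1e4) == (1, "d_A")
assert monitor([3, 1], 0.5, 0.2, 2.0, 10.0, 1e-4, 1e4) == (1, "d_H")
```

The single-threshold rule of the previous paragraphs corresponds to taking *c*<sub>1</sub> arbitrarily small (so that *d*<sub>H</sub> is never actively declared) and *c*<sub>2</sub> = *p*<sub>H</sub><sup>prior</sup>/*p*<sub>A</sub><sup>prior</sup>.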

As a side remark, let us mention that our above-mentioned suggested method of Bayesian decision making with Hellinger integrals of GWIs differs completely from the very recent work of Brauner et al. [122] who use a Bayesian hierarchical model for the concrete, very comprehensive study on the effectiveness and burden of non-pharmaceutical interventions against COVID-19 transmission.

The power divergences *I<sub>λ</sub>*(*P*<sub>A,*n*</sub>||*P*<sub>H,*n*</sub>) (*λ* ∈ R) can also be employed in other ways within Bayesian decision making, of a more statistical nature. Namely, by adapting the general lines of Österreicher & Vajda [123] (see also Liese & Vajda [10], as well as diffusion-process applications in Stummer [5,31,32]) to our context of Galton-Watson processes with immigration, we can proceed as follows. For the sake of comfortable notation, we first attach the value *θ* := 1 to the GWI model (A) (which has prior probability *p*<sub>A</sub><sup>prior</sup> ∈ ]0, 1[) and *θ* := 0 to (H) (which has prior probability 1 − *p*<sub>A</sub><sup>prior</sup>). Suppose we

want to decide, in an optimal Bayesian way, which *degree of evidence* deg ∈ [0, 1] we should attribute (according to a pregiven *loss function* LO) to the model (A). In order to achieve this goal, we choose a nonnegatively-valued loss function LO(*θ*,deg) defined on {0, 1} × [0, 1], of two types which will be specified below. The risk at stage 0 (i.e., prior to the GWI-path observations X*n*), from the optimal decision about the degree of evidence deg concerning the decision parameter *θ*, is defined as

$$\mathcal{BR}\_{\mathcal{LO}}\left(p\_{\mathcal{A}}^{\mathrm{prior}}\right) := \min\_{\mathrm{deg} \in [0,1]} \left\{ \left(1 - p\_{\mathcal{A}}^{\mathrm{prior}}\right) \cdot \mathcal{LO}(0,\mathrm{deg}) + p\_{\mathcal{A}}^{\mathrm{prior}} \cdot \mathcal{LO}(1,\mathrm{deg}) \right\},$$

which can be thus interpreted as a *minimal prior expected loss* (the minimum will always exist). The corresponding risk *posterior* to the GWI-path observations X*n*, from the optimal decision about the degree of evidence deg concerning the parameter *θ*, is given by

$$\mathcal{BR}\_{\mathcal{LO}}^{\mathrm{post}}\left(p\_{\mathcal{A}}^{\mathrm{prior}}\right) := \int\_{\Omega\_{n}} \mathcal{BR}\_{\mathcal{LO}}\left(p\_{\mathcal{A}}^{\mathrm{post}}(\mathcal{X}\_{n})\right) \left(p\_{\mathcal{A}}^{\mathrm{prior}}\, \mathrm{d}P\_{\mathcal{A},n} + (1 - p\_{\mathcal{A}}^{\mathrm{prior}})\, \mathrm{d}P\_{\mathcal{H},n}\right),$$

which is achieved by the optimal decision rule (about the degree of evidence)

$$\mathrm{deg}^{*}(\mathcal{X}\_{n}) := \arg\min\_{\mathrm{deg} \in [0,1]} \left\{ \left(1 - p\_{\mathcal{A}}^{\mathrm{post}}(\mathcal{X}\_{n})\right) \cdot \mathcal{LO}(0,\mathrm{deg}) + p\_{\mathcal{A}}^{\mathrm{post}}(\mathcal{X}\_{n}) \cdot \mathcal{LO}(1,\mathrm{deg}) \right\}.$$

The corresponding *statistical information measure* (in the sense of De Groot [124])

$$\Delta \mathcal{B} \mathcal{R}\_{\mathcal{L}\mathcal{O}} \left( p\_{\mathcal{A}}^{\text{prior}} \right) := \mathcal{B} \mathcal{R}\_{\mathcal{L}\mathcal{O}} \left( p\_{\mathcal{A}}^{\text{prior}} \right) - \mathcal{B} \mathcal{R}\_{\mathcal{L}\mathcal{O}}^{\text{post}} \left( p\_{\mathcal{A}}^{\text{prior}} \right) \ge 0$$

represents the *reduction of the decision risk* about the degree of evidence deg concerning the parameter *θ*, that can be attained by observing the GWI-path X<sub>*n*</sub> until stage *n*. For the first-type loss function $\widetilde{\mathcal{LO}}(\theta, \mathrm{deg}) := \mathrm{deg} - (2\,\mathrm{deg} - 1)\cdot \mathbf{1}\_{\{1\}}(\theta)$, defined on {0, 1} × [0, 1] with the help of the indicator function **1**<sub>*A*</sub>(·) on the set *A*, one can show that

$$\mathfrak{D}^\*(X\_n) := \begin{cases} 0, & \text{if } p^{\mathrm{post}}\_{\mathcal{A}}(X\_n) \in \big[0, \tfrac{1}{2}\big[\,, \\ 1, & \text{if } p^{\mathrm{post}}\_{\mathcal{A}}(X\_n) \in \big]\tfrac{1}{2}, 1\big]\,, \\ \text{any number in } [0, 1], & \text{if } p^{\mathrm{post}}\_{\mathcal{A}}(X\_n) = \tfrac{1}{2}, \end{cases}$$

as well as the representation formula

$$I\_{\lambda} \left( P\_{\mathcal{A},n} || P\_{\mathcal{H},n} \right) = \int\_{0}^{1} \Delta \mathcal{BR}\_{\widetilde{\mathcal{LO}}} \left( p\_{\mathcal{A}}^{\mathrm{prior}} \right) \cdot \left( 1 - p\_{\mathcal{A}}^{\mathrm{prior}} \right)^{\lambda - 2} \cdot \left( p\_{\mathcal{A}}^{\mathrm{prior}} \right)^{-1 - \lambda} \mathrm{d}p\_{\mathcal{A}}^{\mathrm{prior}} \,, \qquad \lambda \in \mathbb{R} \,, \tag{21}$$

(cf. Österreicher & Vajda [123], Liese & Vajda [10], adapted to our GWI context); in other words, the power divergence $I\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ can be regarded as a *weighted-average statistical information measure* (*weighted-average decision risk reduction*). One can also use other weights of $p\_{\mathcal{A}}^{\mathrm{prior}}$ in order to obtain bounds of $I\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ (analogously to Stummer [5]).
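To make the threshold structure of the first-type loss concrete, here is a small numerical sketch of our own (not taken from the references; the function names are illustrative): it grid-searches the minimal prior expected loss for $\widetilde{\mathcal{LO}}$ and confirms that it equals $\min(p, 1-p)$, in line with the threshold form of the optimal decision rule above.

```python
import numpy as np

# First-type loss LO~(theta, deg) = deg - (2*deg - 1) * 1_{theta = 1}
def loss_tilde(theta, deg):
    return deg - (2.0 * deg - 1.0) * (1.0 if theta == 1 else 0.0)

def bayes_risk_prior(p_prior, loss, grid=np.linspace(0.0, 1.0, 2001)):
    """Minimal prior expected loss  min_deg {(1-p)*LO(0,deg) + p*LO(1,deg)}."""
    expected = (1.0 - p_prior) * loss(0, grid) + p_prior * loss(1, grid)
    return float(expected.min())

# For LO~ the expected loss is p + deg*(1 - 2p), so the optimal degree of
# evidence is the threshold rule deg = 1_{p > 1/2}, with minimal risk min(p, 1-p)
for p in (0.1, 0.3, 0.5, 0.8):
    assert abs(bayes_risk_prior(p, loss_tilde) - min(p, 1.0 - p)) < 1e-9
```

The grid search is of course redundant for this piecewise-linear loss; it is only meant to mirror the defining minimization.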

For the second-type loss function
$$\mathcal{LO}\_{\lambda,\chi}(\theta,\mathrm{deg}) := \frac{\lambda^{\theta-1}\,\mathrm{deg}^{\lambda-\theta}\,\chi^{\lambda}\,(1-\chi)^{1-\lambda}}{(1-\lambda)^{\theta}\,(1-\mathrm{deg})^{\lambda-\theta}}\,,$$
defined on $\{0,1\} \times [0,1]$ with parameters $\lambda \in\, ]0,1[$ and $\chi \in\, ]0,1[$, one can derive the optimal decision rule

$$\mathfrak{D}^\*(X\_n) \,=\, p\_{\mathcal{A}}^{\mathrm{post}}(X\_n)$$

as well as the representation formula as a *limit statistical information measure* (*limit decision risk reduction*)

$$I\_{\lambda} \left( P\_{\mathcal{A},n} || P\_{\mathcal{H},n} \right) \ = \lim\_{\chi \to p\_{\mathcal{A}}^{\mathrm{prior}}} \Delta \mathcal{BR}\_{\mathcal{LO}\_{\lambda,\chi}} \left( p\_{\mathcal{A}}^{\mathrm{prior}} \right) \ =: \Delta \mathcal{BR}\_{\mathcal{LO}\_{\lambda,p\_{\mathcal{A}}^{\mathrm{prior}}}} \left( p\_{\mathcal{A}}^{\mathrm{prior}} \right) \tag{22}$$

(cf. Österreicher & Vajda [123], Stummer [5], adapted to our GWI context).

As an alternative to the above-mentioned Bayesian-decision-making applications of Hellinger integrals $H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$, let us now briefly discuss the use of the latter for the corresponding *Neyman-Pearson testing* (NPT) framework with randomized tests $T\_n : \Omega\_n \to [0,1]$ of the hypothesis $P\_{\mathcal{H}}$ against the alternative $P\_{\mathcal{A}}$, based on the GWI-generation-size sample-path observations $X\_n := \{X\_l : l \in \{0, 1, \ldots, n\}\}$. In contrast to (17) and (18), a Neyman-Pearson test minimizes–over $T\_n$–the type II error probability $\int\_{\Omega\_n} (1 - T\_n)\, \mathrm{d}P\_{\mathcal{A},n}$ in the class of tests for which the type I error probability $\int\_{\Omega\_n} T\_n\, \mathrm{d}P\_{\mathcal{H},n}$ is at most $\varsigma \in\, ]0,1[$. The corresponding minimal type II error probability

$$\mathcal{E}\_{\varsigma} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) := \inf\_{T\_i :\ \int\_{\Omega\_i} T\_i \, \mathrm{d} P\_{\mathcal{H},i} \le \varsigma} \ \int\_{\Omega\_i} (1 - T\_i) \, \mathrm{d} P\_{\mathcal{A},i}$$

can for all *ς* ∈]0, 1[, *λ* ∈]0, 1[, *i* ∈ I be bounded from above by

$$\mathcal{E}\_{\varsigma} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \leq \mathcal{E}\_{\varsigma}^{U} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) := \min \left\{ \left( 1 - \lambda \right) \cdot \left( \frac{\lambda}{\varsigma} \right)^{\lambda/(1-\lambda)} \cdot \left( H\_{\lambda} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \right)^{1/(1-\lambda)}, \, 1 \right\}, \tag{23}$$

and for all $\lambda > 1$, $i \in I$ it can be bounded from below by

$$\mathcal{E}\_{\varsigma} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \; \geq \; \mathcal{E}\_{\varsigma}^{L} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \; := \; (1 - \varsigma)^{\lambda/(\lambda - 1)} \cdot \left( H\_{\lambda} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \right)^{1/(1 - \lambda)}, \tag{24}$$

which is an adaptation of a general result of Krafft & Plachky [125]; see also Liese & Vajda [1] as well as Stummer & Vajda [15]. Hence, by combining (23) and (24) with the exact values respectively upper bounds of the Hellinger integrals $H\_{1-\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ from the following sections, we obtain for our context of Galton-Watson processes with Poisson offspring and Poisson immigration (including the non-immigration case) some upper bounds of $\mathcal{E}\_{\varsigma}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$, which can also be immediately rewritten as lower bounds for the power $1 - \mathcal{E}\_{\varsigma}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ of a most powerful test at level $\varsigma$. In contrast to such finite-time-horizon results, for the (to our context) incompatible setup of Galton-Watson processes with Poisson offspring but nonstochastic immigration of constant value 1, the asymptotic rates of decrease as $n \to \infty$ of the unconstrained type II error probabilities as well as of the type I error probabilities were studied in Linkov & Lunyova [53] by a different approach, also employing Hellinger integrals. Some other types of Neyman-Pearson testing investigations for Galton-Watson processes, different from ours, can be found e.g., in Basawa & Scott [126], Feigin [127], Sweeting [128], Basawa & Scott [61], and the references therein.
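The bounds (23) and (24) are straightforward to evaluate once a Hellinger-integral value (or bound) is available; the following sketch (our own wrapper, with illustrative function names) transcribes them directly:

```python
import math

def type2_upper_bound(hellinger, sigma, lam):
    """Bound (23): valid for lam in ]0,1[ and level sigma in ]0,1[,
    with `hellinger` an exact value or an upper bound of H_lam."""
    assert 0.0 < lam < 1.0 and 0.0 < sigma < 1.0
    b = (1.0 - lam) * (lam / sigma) ** (lam / (1.0 - lam)) \
        * hellinger ** (1.0 / (1.0 - lam))
    return min(b, 1.0)          # cutoff: probabilities never exceed 1

def type2_lower_bound(hellinger, sigma, lam):
    """Bound (24): valid for lam > 1."""
    assert lam > 1.0 and 0.0 < sigma < 1.0
    return (1.0 - sigma) ** (lam / (lam - 1.0)) \
        * hellinger ** (1.0 / (1.0 - lam))

# example: a small Hellinger integral forces a small type II error bound,
# here with lam = 0.5 so that (23) reads 0.5 * (0.01)^2 = 5e-05
assert abs(type2_upper_bound(0.01, 0.5, 0.5) - 5e-05) < 1e-12
```

In practice, the `hellinger` argument would be supplied by the exact values or recursive bounds derived in Section 3.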

#### *2.6. Asymptotical Distinguishability*

The next two concepts deal with two general families $(P\_{\mathcal{A},i})\_{i \in I}$ and $(P\_{\mathcal{H},i})\_{i \in I}$ of probability measures on the measurable spaces $(\Omega\_i, \mathcal{F}\_i)\_{i \in I}$, where the index set $I$ is either $\mathbb{N}\_0$ or $\mathbb{R}\_+$. For them, the following two general types of asymptotical distinguishability are well known (see e.g., LeCam [109], Liese & Vajda [1], Jacod & Shiryaev [24], Linkov [129], and the references therein).

**Definition 1.** *The family $(P\_{\mathcal{A},i})\_{i \in I}$ is contiguous to the family $(P\_{\mathcal{H},i})\_{i \in I}$ – in symbols, $(P\_{\mathcal{A},i}) \vartriangleleft (P\_{\mathcal{H},i})$ – if for all sets $A\_i \in \mathcal{F}\_i$ with $\lim\_{i \to \infty} P\_{\mathcal{H},i}(A\_i) = 0$ there holds $\lim\_{i \to \infty} P\_{\mathcal{A},i}(A\_i) = 0$.*

**Definition 2.** *Families of measures $(P\_{\mathcal{A},i})\_{i \in I}$ and $(P\_{\mathcal{H},i})\_{i \in I}$ are called entirely separated (completely asymptotically distinguishable) – in symbols, $(P\_{\mathcal{A},i}) \vartriangle (P\_{\mathcal{H},i})$ – if there exist a sequence $i\_m \uparrow \infty$ as $m \uparrow \infty$ and, for each $m \in \mathbb{N}\_0$, an $A\_{i\_m} \in \mathcal{F}\_{i\_m}$ such that $\lim\_{m \to \infty} P\_{\mathcal{A},i\_m}(A\_{i\_m}) = 1$ and $\lim\_{m \to \infty} P\_{\mathcal{H},i\_m}(A\_{i\_m}) = 0$.*

It is clear that the notion of contiguity is the attempt to carry the concept of absolute continuity over to families of measures. Loosely speaking, $(P\_{\mathcal{A},i})$ is contiguous to $(P\_{\mathcal{H},i})$ if the limit $\lim\_{i \to \infty}(P\_{\mathcal{A},i})$ (existence preconditioned) is absolutely continuous with respect to the limit $\lim\_{i \to \infty}(P\_{\mathcal{H},i})$. However, for the definition of contiguity, we do not need to require the probability measures to converge to limiting probability measures. On the other hand, entire separation is the generalization of singularity to families of measures.

The corresponding negations will be denoted by $\not\vartriangleleft$ and $\not\vartriangle$. One can easily check that a family $(P\_{\mathcal{A},i})$ cannot be simultaneously contiguous to and entirely separated from a family $(P\_{\mathcal{H},i})$. In fact, as shown in Linkov [129], the relation between the families $(P\_{\mathcal{A},i})$ and $(P\_{\mathcal{H},i})$ can be uniquely classified into the following *distinguishability types*:

(a) $(P\_{\mathcal{A},i}) \vartriangleleft\vartriangleright (P\_{\mathcal{H},i})$ ;


As demonstrated in the above-mentioned references for a general context, one can conclude the type of distinguishability from the time-evolution of Hellinger integrals. Indeed, the following assertions can be found e.g., in Linkov [129], where part (c) was established in Liese & Vajda [1] and (f), (g) in Vajda [3].

**Proposition 1.** *The following assertions are equivalent:*

$$(a) \quad (P\_{\mathcal{A},i}) \,\vartriangle\, (P\_{\mathcal{H},i})\,;$$


$$(f) \quad \text{there exists a } \lambda \in\, ]0, 1[\,: \quad \limsup\_{i \to \infty} I\_{\lambda}(P\_{\mathcal{A},i}||P\_{\mathcal{H},i}) \ = \frac{1}{\lambda \cdot (1 - \lambda)}\,;$$

$$(g) \quad \limsup\_{i \to \infty} I\_{\lambda}(P\_{\mathcal{A},i} || P\_{\mathcal{H},i}) \ = \frac{1}{\lambda \cdot (1 - \lambda)} \,, \quad \text{for all } \lambda \in\, ]0, 1[\,.$$

In combination with the discussion after Definition 2, one can thus interpret the $\lambda$-order Hellinger integral $H\_{\lambda}(P\_{\mathcal{A},i}||P\_{\mathcal{H},i})$ as a "measure" for the distinctness of the two families $(P\_{\mathcal{A},i})$ and $(P\_{\mathcal{H},i})$ up to a fixed finite time horizon $i \in I$.

Furthermore, for the contiguity we obtain the equivalence (see e.g., Liese & Vajda [1], Linkov [129])

$$\begin{aligned} & (P\_{\mathcal{A},i}) \,\vartriangleleft\, (P\_{\mathcal{H},i}) \\ \iff & \qquad \liminf\_{\lambda \nearrow 1} \left\{ \liminf\_{i \to \infty} H\_{\lambda} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \right\} = 1 \\ \iff & \qquad \limsup\_{\lambda \nearrow 1} \left\{ \limsup\_{i \to \infty} \lambda \cdot \left( 1 - \lambda \right) \cdot I\_{\lambda} \left( P\_{\mathcal{A},i} || P\_{\mathcal{H},i} \right) \right\} = 0. \end{aligned} \tag{26}$$

All the above-mentioned general results can be applied to our context of two competing Poissonian Galton-Watson processes with immigration (GWI), (H) and (A) (reflected by the two different laws $P\_{\mathcal{H}}$ resp. $P\_{\mathcal{A}}$ with parameter pairs $(\beta\_{\mathcal{H}}, \alpha\_{\mathcal{H}})$ resp. $(\beta\_{\mathcal{A}}, \alpha\_{\mathcal{A}})$), by taking $P\_{\mathcal{A},i} := P\_{\mathcal{A}}|\_{\mathcal{F}\_i}$ and $P\_{\mathcal{H},i} := P\_{\mathcal{H}}|\_{\mathcal{F}\_i}$. Recall from the preceding subsections (by identifying $i$ with $n$) that the latter two describe the stochastic dynamics of the respective GWI within the restricted time-/stage-frame $\{0, 1, \ldots, i\}$.

In the following, we study in detail the evolution of Hellinger integrals between two competing models of Galton-Watson processes with immigration, which turns out to be quite extensive.

#### **3. Detailed Recursive Analyses of Hellinger Integrals**

#### *3.1. A First Basic Result*

In terms of our notations (PS1) to (PS3), a typical situation for the applications we have in mind is that one particular constellation $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}) \in \mathcal{P}$ (e.g., obtained from theoretical or previous statistical investigations) is fixed, whereas – in contrast – the parameter $\lambda \in \mathbb{R}\setminus\{0,1\}$ for the Hellinger integral or the power divergence may be chosen freely, e.g., depending on which (transform of a) dissimilarity measure one decides to choose for further analysis. At this point, let us emphasize that *in general* we will not make assumptions of the form $\beta\_{\bullet} \lessgtr 1$, i.e., upon the type of extinction-concerning criticality.

To start with our investigations, in order to justify for all *n* ∈ N<sup>0</sup>

$$Z\_n := \frac{\mathbf{d}P\_{\mathcal{A},n}}{\mathbf{d}P\_{\mathcal{H},n}} \qquad \text{ (cf. (13)),}$$

(14) and (15) (as well as $I\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ for $\lambda \in \mathbb{R}$ respectively $R\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n})$ for $\lambda \in \mathbb{R}\setminus\{0,1\}$), we first mention the following straightforward facts: (i) if $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}) \in \mathcal{P}\_{NI}$, then $P\_{\mathcal{A},n}$ and $P\_{\mathcal{H},n}$ are equivalent (i.e., $P\_{\mathcal{A},n} \sim P\_{\mathcal{H},n}$), as well as (ii) if $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}) \in \mathcal{P}\_{SP}$, then $P\_{\mathcal{A},n}$ and $P\_{\mathcal{H},n}$ are equivalent, too. Moreover, by recalling $Z\_0 = 1$ and using the "rate functions" $f\_{\bullet}(x) = \beta\_{\bullet}\, x + \alpha\_{\bullet}$ ($x \in [0, \infty[$), a version of (13) can be easily determined by calculating for each $\vec{x} := (x\_0, x\_1, x\_2, \ldots) \in \Omega := \mathbb{N} \times \mathbb{N}\_0 \times \mathbb{N}\_0 \times \cdots$

$$Z\_{n}(\vec{x}) = \prod\_{k=1}^{n} Z\_{n,k}(\vec{x}) \qquad \text{with } Z\_{n,k}(\vec{x}) := \exp\left\{ - \left( f\_{\mathcal{A}}(x\_{k-1}) - f\_{\mathcal{H}}(x\_{k-1}) \right) \right\} \left[ \frac{f\_{\mathcal{A}}(x\_{k-1})}{f\_{\mathcal{H}}(x\_{k-1})} \right]^{x\_k},$$

where for the last term we use the convention $\left(\frac{0}{0}\right)^{x} = 1$ for all $x \in \mathbb{N}\_0$. Furthermore, we define for each $\vec{x} \in \Omega$

$$Z\_{n,k}^{(\lambda)}(\vec{x}) := \exp\left\{-\left(\lambda f\_{\mathcal{A}}(\mathbf{x}\_{k-1}) + (1-\lambda)f\_{\mathcal{H}}(\mathbf{x}\_{k-1})\right)\right\} \frac{\left[\left(f\_{\mathcal{A}}(\mathbf{x}\_{k-1})\right)^{\lambda}\left(f\_{\mathcal{H}}(\mathbf{x}\_{k-1})\right)^{1-\lambda}\right]^{x\_k}}{\mathbf{x}\_k!} \tag{27}$$

with the convention $\frac{0^{0}}{0!} = 1$ for the last term. Accordingly, one obtains from (14) the Hellinger integral $H\_{\lambda}(P\_{\mathcal{A},0}||P\_{\mathcal{H},0}) = 1$, as well as for all $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in \mathcal{P} \times (\mathbb{R}\setminus\{0,1\})$

$$H\_{\lambda} \left( P\_{\mathcal{A},1} || P\_{\mathcal{H},1} \right) = \exp \left\{ \left( f\_{\mathcal{A}}(\mathbf{x}\_{0}) \right)^{\lambda} \left( f\_{\mathcal{H}}(\mathbf{x}\_{0}) \right)^{(1-\lambda)} - \left( \lambda f\_{\mathcal{A}}(\mathbf{x}\_{0}) + (1-\lambda) f\_{\mathcal{H}}(\mathbf{x}\_{0}) \right) \right\} \tag{28}$$

for *x*<sup>0</sup> = *X*<sup>0</sup> ∈ N, and for all *n* ∈ N\{1}

$$\begin{split} H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) &= E\_{P\_{\mathcal{H},n}}\left[ (Z\_{n})^{\lambda} \right] = \sum\_{x\_{1}=0}^{\infty} \cdots \sum\_{x\_{n}=0}^{\infty} \prod\_{k=1}^{n} Z\_{n,k}^{(\lambda)}(\vec{x}) \\ &= \sum\_{x\_{1}=0}^{\infty} \cdots \sum\_{x\_{n-1}=0}^{\infty} \prod\_{k=1}^{n-1} Z\_{n,k}^{(\lambda)}(\vec{x}) \cdot e^{-\left(\lambda f\_{\mathcal{A}}(x\_{n-1}) + (1-\lambda) f\_{\mathcal{H}}(x\_{n-1})\right)} \sum\_{x\_{n}=0}^{\infty} \frac{\left[ \left(f\_{\mathcal{A}}(x\_{n-1})\right)^{\lambda} \left(f\_{\mathcal{H}}(x\_{n-1})\right)^{1-\lambda} \right]^{x\_{n}}}{x\_{n}!} \\ &= \sum\_{x\_{1}=0}^{\infty} \cdots \sum\_{x\_{n-1}=0}^{\infty} \prod\_{k=1}^{n-1} Z\_{n,k}^{(\lambda)}(\vec{x}) \cdot \exp\left\{ \left(f\_{\mathcal{A}}(x\_{n-1})\right)^{\lambda} \left(f\_{\mathcal{H}}(x\_{n-1})\right)^{1-\lambda} - \left(\lambda f\_{\mathcal{A}}(x\_{n-1}) + (1-\lambda) f\_{\mathcal{H}}(x\_{n-1})\right) \right\}. \end{split} \tag{29}$$
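For intuition, the first equality in (29), $H\_{\lambda} = E\_{P\_{\mathcal{H},n}}[(Z\_n)^{\lambda}]$, can be checked by simulation. The following sketch is our own (not part of the paper's derivations); it assumes the Poisson transition law $X\_k \mid X\_{k-1} \sim \mathrm{Poisson}(\beta X\_{k-1} + \alpha)$ and strictly positive rate functions, samples paths under the hypothesis law, and averages $(Z\_n)^{\lambda}$:

```python
import math, random

rng = random.Random(12345)

def poisson(lam):
    """Inverse-transform sampling of a Poisson(lam) variate."""
    u, p, k = rng.random(), math.exp(-lam), 0
    c = p
    while u > c and k < 1000:          # cap guards against float stalls
        k += 1
        p *= lam / k
        c += p
    return k

def simulate_gwi(beta, alpha, x0, n):
    """Path (X_0, ..., X_n) with X_k | X_{k-1} ~ Poisson(beta*X_{k-1} + alpha)."""
    path = [x0]
    for _ in range(n):
        path.append(poisson(beta * path[-1] + alpha))
    return path

def likelihood_ratio(path, bA, aA, bH, aH):
    """Z_n = prod_k exp{-(fA - fH)(x_{k-1})} * ((fA/fH)(x_{k-1}))^{x_k},
    assuming strictly positive rate functions."""
    z = 1.0
    for k in range(1, len(path)):
        fA = bA * path[k - 1] + aA
        fH = bH * path[k - 1] + aH
        z *= math.exp(-(fA - fH)) * (fA / fH) ** path[k]
    return z

# Monte-Carlo version of the first equality in (29): H_lam ~ mean of (Z_n)^lam
# under the hypothesis law P_H (illustrative parameter values)
bA, aA, bH, aH, lam, x0, n = 0.5, 0.25, 2.0, 1.0, 0.5, 2, 4
est = sum(likelihood_ratio(simulate_gwi(bH, aH, x0, n), bA, aA, bH, aH) ** lam
          for _ in range(5000)) / 5000
assert 0.0 < est < 1.0    # consistent with H_lam <= 1 for lam in ]0,1[
```

The estimate is crude (the integrand is heavily skewed), but it respects the generally valid bound $H\_{\lambda} \le 1$ for $\lambda \in\, ]0,1[$ from (9).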

From (29), one can see that a crucial role for the exact calculation (respectively the derivation of bounds) of the Hellinger integral is played by the functions defined for *x* ∈ [0, ∞[

$$\phi\_{\lambda}(x) := \, \phi(x, \beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \, := \, \varphi\_{\lambda}(x) - f\_{\lambda}(x) \,, \tag{30}$$

$$\varphi\_{\lambda}(x) := \varphi(x, \beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) := \left(f\_{\mathcal{A}}(x)\right)^{\lambda} \left(f\_{\mathcal{H}}(x)\right)^{1-\lambda} \qquad \text{and} \tag{31}$$

$$f\_{\lambda}(x) := f(x, \beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) := \lambda f\_{\mathcal{A}}(x) + (1 - \lambda) f\_{\mathcal{H}}(x) = \alpha\_{\lambda} + \beta\_{\lambda}\, x \,, \tag{32}$$

where we have used the *λ*-*weighted-averages*

$$\alpha\_{\lambda} := \alpha(\alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) := \lambda \cdot \alpha\_{\mathcal{A}} + (1 - \lambda) \cdot \alpha\_{\mathcal{H}} \quad \text{and} \quad \beta\_{\lambda} := \beta(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \lambda) := \lambda \cdot \beta\_{\mathcal{A}} + (1 - \lambda) \cdot \beta\_{\mathcal{H}}\,.$$

Since $\lambda$ plays a special role, henceforth we typically use it as an index and often omit $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}})$. According to Lemma A1 in the Appendix A.1, it follows that for $\lambda \in\, ]0,1[$ (respectively $\lambda \in \mathbb{R}\setminus[0,1]$) one gets $\phi\_{\lambda}(x) \le 0$ (respectively $\phi\_{\lambda}(x) \ge 0$) for all $x \in [0, \infty[$. Furthermore, in both cases there holds $\phi\_{\lambda}(x) = 0$ iff $f\_{\mathcal{A}}(x) = f\_{\mathcal{H}}(x)$, i.e., for $x = x^{\*} := \frac{\alpha\_{\mathcal{A}} - \alpha\_{\mathcal{H}}}{\beta\_{\mathcal{H}} - \beta\_{\mathcal{A}}} \ge 0$. This is consistent with the corresponding generally valid upper and lower bounds (cf. (9) and (11)): $0 < H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) \le 1$ for $\lambda \in\, ]0,1[$, and $1 \le H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) \le \infty$ for $\lambda \in \mathbb{R}\setminus[0,1]$.

As a first indication of our proposed method, let us start by illuminating the simplest case $\lambda \in \mathbb{R}\setminus\{0,1\}$ and $\gamma := \alpha\_{\mathcal{H}}\beta\_{\mathcal{A}} - \alpha\_{\mathcal{A}}\beta\_{\mathcal{H}} = 0$. This means that $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}) \in \mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1}$, where $\mathcal{P}\_{SP,1}$ is the set of all (componentwise) strictly positive $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}})$ with $\beta\_{\mathcal{A}} \ne \beta\_{\mathcal{H}}$, $\alpha\_{\mathcal{A}} \ne \alpha\_{\mathcal{H}}$ and $\frac{\beta\_{\mathcal{A}}}{\beta\_{\mathcal{H}}} = \frac{\alpha\_{\mathcal{A}}}{\alpha\_{\mathcal{H}}} \ne 1$ ("the equal-fraction case"). In this situation, *all* three functions (30) to (32) are linear. Indeed,

$$\varphi\_{\lambda}(x) = p\_{\lambda}^{E} + q\_{\lambda}^{E}\, x \tag{33}$$

with $p\_{\lambda}^{E} := \alpha\_{\mathcal{A}}^{\lambda}\, \alpha\_{\mathcal{H}}^{1-\lambda}$ and $q\_{\lambda}^{E} := \beta\_{\mathcal{A}}^{\lambda}\, \beta\_{\mathcal{H}}^{1-\lambda}$ (where the index $E$ stands for exact linearity). Clearly, $q\_{\lambda}^{E} > 0$ on $\mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1}$, as well as $p\_{\lambda}^{E} > 0$ on $\mathcal{P}\_{SP,1}$ and $p\_{\lambda}^{E} = 0$ on $\mathcal{P}\_{NI}$. Furthermore,

$$\phi\_{\lambda}(x) := r\_{\lambda}^{E} + s\_{\lambda}^{E}\, x$$

with $r\_{\lambda}^{E} := p\_{\lambda}^{E} - \alpha\_{\lambda} = \alpha\_{\mathcal{A}}^{\lambda}\alpha\_{\mathcal{H}}^{1-\lambda} - (\lambda\alpha\_{\mathcal{A}} + (1-\lambda)\alpha\_{\mathcal{H}})$ and $s\_{\lambda}^{E} := q\_{\lambda}^{E} - \beta\_{\lambda} = \beta\_{\mathcal{A}}^{\lambda}\beta\_{\mathcal{H}}^{1-\lambda} - (\lambda\beta\_{\mathcal{A}} + (1-\lambda)\beta\_{\mathcal{H}})$. Due to Lemma A1, one knows that on $\mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1}$ one gets $s\_{\lambda}^{E} < 0$ for $\lambda \in\, ]0,1[$ and $s\_{\lambda}^{E} > 0$ for $\lambda \in \mathbb{R}\setminus[0,1]$. Furthermore, on $\mathcal{P}\_{SP,1}$ one gets $r\_{\lambda}^{E} < 0$ (resp. $r\_{\lambda}^{E} > 0$) for $\lambda \in\, ]0,1[$ (resp. $\lambda \in \mathbb{R}\setminus[0,1]$), whereas on $\mathcal{P}\_{NI}$, the no-immigration setup, one gets $r\_{\lambda}^{E} = 0$ for all $\lambda \in \mathbb{R}\setminus\{0,1\}$.

As will be seen later on, such linearity properties are useful for the recursive handling of the Hellinger integrals. However, only on the parameter set $\mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1}$ are the functions $\varphi\_{\lambda}$ and $\phi\_{\lambda}$ linear. Hence, in the general case $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in \mathcal{P} \times (\mathbb{R}\setminus\{0,1\})$ we aim for linear lower and upper bounds

$$\varphi\_{\lambda}^{L}(x) := p\_{\lambda}^{L} + q\_{\lambda}^{L}\, x \;\le\; \varphi\_{\lambda}(x) \;\le\; \varphi\_{\lambda}^{U}(x) := p\_{\lambda}^{U} + q\_{\lambda}^{U}\, x\,, \tag{34}$$

*x* ∈ [0, ∞[ (ultimately, *x* ∈ N0), which by (30) and (31) leads to

$$\phi\_{\lambda}(x) \begin{cases} \leq & \phi\_{\lambda}^{U}(x) := r\_{\lambda}^{U} + s\_{\lambda}^{U} \cdot x := \left(p\_{\lambda}^{U} - \alpha\_{\lambda}\right) + \left(q\_{\lambda}^{U} - \beta\_{\lambda}\right) \cdot x\,, \\\\ \geq & \phi\_{\lambda}^{L}(x) := r\_{\lambda}^{L} + s\_{\lambda}^{L} \cdot x := \left(p\_{\lambda}^{L} - \alpha\_{\lambda}\right) + \left(q\_{\lambda}^{L} - \beta\_{\lambda}\right) \cdot x\,, \end{cases} \tag{35}$$

$x \in [0, \infty[$ (ultimately, $x \in \mathbb{N}\_0$). Of course, the involved slopes and intercepts should satisfy reasonable restrictions. Later on, we shall impose further restrictions on the involved slopes and intercepts in order to guarantee nice properties of the general Hellinger integral bounds given in Theorem 1 below; for instance, in consistency with the nonnegativity of $\varphi\_{\lambda}$ we could require $p\_{\lambda}^{U} \ge p\_{\lambda}^{L} \ge 0$ and $q\_{\lambda}^{U} \ge q\_{\lambda}^{L} \ge 0$, which nontrivially implies that these bounds possess certain monotonicity properties. For the formulation of our first assertions on Hellinger integrals, we make use of the following notation:

**Definition 3.** *For all $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in \mathcal{P} \times (\mathbb{R}\setminus\{0,1\})$ and all $p, q \in \mathbb{R}$, let us define the sequences $\big(a\_n^{(q)}\big)\_{n\in\mathbb{N}\_0}$ and $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}\_0}$ recursively by*

$$a\_0^{(q)} := 0 \,; \qquad a\_n^{(q)} := \xi\_{\lambda}^{(q)} \left( a\_{n-1}^{(q)} \right) := q \cdot e^{a\_{n-1}^{(q)}} - \beta\_{\lambda}\,, \quad n \in \mathbb{N}\,, \tag{36}$$

$$b\_0^{(p,q)} := 0 \,; \qquad b\_n^{(p,q)} := \ p \cdot e^{a\_{n-1}^{(q)}} - \alpha\_{\lambda}\,, \quad n \in \mathbb{N}\,. \tag{37}$$

Notice the interrelations $a\_1^{(q\_{\lambda}^{A})} = s\_{\lambda}^{A}$ and $b\_1^{(p\_{\lambda}^{A}, q\_{\lambda}^{A})} = r\_{\lambda}^{A}$ for $A \in \{E, L, U\}$. Clearly, for all $q \in \mathbb{R}\setminus\{0\}$ and $p \in \mathbb{R}$ one has the linear interrelation

$$b\_n^{(p,q)} = \frac{p}{q}\, a\_n^{(q)} + \frac{p}{q}\, \beta\_{\lambda} - \alpha\_{\lambda}\,, \quad n \in \mathbb{N}\,. \tag{38}$$
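The recursions (36), (37) and the interrelation (38) are cheap to iterate numerically; the following sketch (ours, with illustrative parameter values) implements them and verifies (38):

```python
import math

def a_seq(q, beta_lam, n):
    """(36):  a_0 = 0,  a_k = xi(a_{k-1}) = q*exp(a_{k-1}) - beta_lam."""
    a = [0.0]
    for _ in range(n):
        a.append(q * math.exp(a[-1]) - beta_lam)
    return a

def b_seq(p, q, alpha_lam, beta_lam, n):
    """(37):  b_0 = 0,  b_k = p*exp(a_{k-1}) - alpha_lam."""
    a = a_seq(q, beta_lam, n)
    return [0.0] + [p * math.exp(a[k - 1]) - alpha_lam for k in range(1, n + 1)]

# check of the linear interrelation (38) for some illustrative values
p, q, alpha_lam, beta_lam, n = 0.7, 0.9, 0.4, 1.25, 8
a = a_seq(q, beta_lam, n)
b = b_seq(p, q, alpha_lam, beta_lam, n)
for k in range(1, n + 1):
    assert abs(b[k] - (p / q * a[k] + p / q * beta_lam - alpha_lam)) < 1e-12
```

The check simply re-expresses $p\, e^{a\_{k-1}} = \frac{p}{q}\,(a\_k + \beta\_{\lambda})$, which is exactly how (38) follows from (36) and (37).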

Accordingly, we obtain fundamental Hellinger integral evaluations:

#### **Theorem 1.**

*(a) For all $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in (\mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1}) \times (\mathbb{R}\setminus\{0,1\})$, all initial population sizes $X\_0 \in \mathbb{N}$ and all observation horizons $n \in \mathbb{N}$, one can recursively compute the exact value*

$$H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) = \exp\left\{a\_n^{(q\_{\lambda}^{\mathbb{E}})}X\_0 + \frac{\alpha\_{\mathcal{A}}}{\beta\_{\mathcal{A}}} \sum\_{k=1}^n a\_k^{(q\_{\lambda}^{\mathbb{E}})} \right\} =: V\_{\lambda, \mathcal{X}\_0, n} \tag{39}$$

*where $\frac{\alpha\_{\mathcal{A}}}{\beta\_{\mathcal{A}}}$ can be equivalently replaced by $\frac{\alpha\_{\mathcal{H}}}{\beta\_{\mathcal{H}}}$. Recall that $q\_{\lambda}^{E} := \beta\_{\mathcal{A}}^{\lambda} \beta\_{\mathcal{H}}^{1-\lambda}$. Notice that on $\mathcal{P}\_{NI} \times (\mathbb{R}\setminus\{0,1\})$ the formula (39) simplifies significantly, since $\alpha\_{\mathcal{A}} = \alpha\_{\mathcal{H}} = 0$.*

*(b) For all $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in (\mathcal{P}\_{SP}\setminus\mathcal{P}\_{SP,1}) \times (\mathbb{R}\setminus\{0,1\})$, all coefficients $p\_{\lambda}^{L}, p\_{\lambda}^{U}, q\_{\lambda}^{L}, q\_{\lambda}^{U} \in \mathbb{R}$ which satisfy (35) for all $x \in \mathbb{N}\_0$ (and thus in particular $p\_{\lambda}^{L} \le p\_{\lambda}^{U}$, $q\_{\lambda}^{L} \le q\_{\lambda}^{U}$), all initial population sizes $X\_0 \in \mathbb{N}$ and all observation horizons $n \in \mathbb{N}$, one gets the following recursive (i.e., recursively computable) bounds for the Hellinger integrals:*

$$\text{for } \lambda \in\, ]0, 1[\,: \quad B^{L}\_{\lambda, X\_0, n} := \widetilde{B}^{(p^{L}\_{\lambda}, q^{L}\_{\lambda})}\_{\lambda, X\_0, n} \;<\; H\_{\lambda}(P\_{\mathcal{A},n} || P\_{\mathcal{H},n}) \;\leq\; \min \left\{ \widetilde{B}^{(p^{U}\_{\lambda}, q^{U}\_{\lambda})}\_{\lambda, X\_0, n},\, 1 \right\} =: B^{U}\_{\lambda, X\_0, n}\,, \tag{40}$$

$$\text{for } \lambda \in \mathbb{R} \backslash [0, 1]\,: \quad B^{L}\_{\lambda, X\_0, n} := \max \left\{ \widetilde{B}^{(p^{L}\_{\lambda}, q^{L}\_{\lambda})}\_{\lambda, X\_0, n},\, 1 \right\} \;\leq\; H\_{\lambda}(P\_{\mathcal{A},n} || P\_{\mathcal{H},n}) \;<\; \widetilde{B}^{(p^{U}\_{\lambda}, q^{U}\_{\lambda})}\_{\lambda, X\_0, n} =: B^{U}\_{\lambda, X\_0, n}\,, \tag{41}$$

*where for general λ* ∈ R\{0, 1}*, p* ∈ R, *q* ∈ R\{0} *we use the definitions*

$$\widetilde{B}\_{\lambda, X\_0, n}^{(p, q)} := \exp \left\{ a\_n^{(q)} \cdot X\_0 + \sum\_{k=1}^n b\_k^{(p, q)} \right\} = \exp \left\{ a\_n^{(q)} \cdot X\_0 + \frac{p}{q} \sum\_{k=1}^n a\_k^{(q)} + n \cdot \left( \frac{p}{q}\, \beta\_{\lambda} - \alpha\_{\lambda} \right) \right\}, \tag{42}$$

*as well as*

$$\widetilde{B}^{(p,0)}\_{\lambda, \mathbf{X}\_0, n} := \exp \left\{ -\beta\_\lambda \cdot \mathbf{X}\_0 + \left( p \cdot e^{-\beta\_\lambda} - \alpha\_\lambda \right) \cdot n \right\}.$$
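As a plausibility check of part (a), the following sketch of ours evaluates (39) and compares it, for $n = 1$, with the defining sum $H\_{\lambda}(P\_{\mathcal{A},1}||P\_{\mathcal{H},1}) = \sum\_{x} p\_{\mathcal{A}}(x)^{\lambda}\, p\_{\mathcal{H}}(x)^{1-\lambda}$ over two Poisson laws with means $f\_{\mathcal{A}}(X\_0)$ and $f\_{\mathcal{H}}(X\_0)$; the parameter values are illustrative members of the equal-fraction set:

```python
import math

def hellinger_exact(lam, bA, bH, aA, aH, x0, n):
    """Exact H_lambda from (39); valid when aH*bA == aA*bH, i.e., on
    P_NI (no immigration) or the 'equal-fraction' set P_SP,1."""
    qE = bA ** lam * bH ** (1.0 - lam)
    beta_lam = lam * bA + (1.0 - lam) * bH
    a, s = 0.0, 0.0
    for _ in range(n):
        a = qE * math.exp(a) - beta_lam      # recursion (36) with q = q^E
        s += a
    ratio = aA / bA if bA > 0 else 0.0
    return math.exp(a * x0 + ratio * s)

def poisson_pmf(mean, k):
    # log-space evaluation avoids factorial overflow
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

# illustrative member of P_SP,1: beta_A/beta_H = alpha_A/alpha_H = 1/4
bA, bH, aA, aH, x0, lam = 0.5, 2.0, 0.25, 1.0, 3, 0.5
fA, fH = bA * x0 + aA, bH * x0 + aH          # one-step Poisson means
direct = sum(poisson_pmf(fA, k) ** lam * poisson_pmf(fH, k) ** (1.0 - lam)
             for k in range(100))
assert abs(hellinger_exact(lam, bA, bH, aA, aH, x0, 1) - direct) < 1e-8
```

For these values both routes give $\exp\{\sqrt{f\_{\mathcal{A}}(X\_0) f\_{\mathcal{H}}(X\_0)} - \tfrac{1}{2}(f\_{\mathcal{A}}(X\_0)+f\_{\mathcal{H}}(X\_0))\} = e^{-0.875}$.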

#### **Remark 1.**


*replacing the parameters $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda)$ with $(\overleftrightarrow{\beta}\_{\mathcal{A}}, \overleftrightarrow{\beta}\_{\mathcal{H}}, \overleftrightarrow{\alpha}\_{\mathcal{A}}, \overleftrightarrow{\alpha}\_{\mathcal{H}}, \overleftrightarrow{\lambda})$. Then, there holds $\overleftrightarrow{f}\_{\overleftrightarrow{\lambda}}(x) = f\_{\lambda}(x)$, $\overleftrightarrow{\varphi}\_{\overleftrightarrow{\lambda}}(x) = \varphi\_{\lambda}(x)$ and $\overleftrightarrow{\phi}\_{\overleftrightarrow{\lambda}}(x) = \phi\_{\lambda}(x)$, and the set of (lower- and upper-bound) parameters $p\_{\lambda}^{L}, q\_{\lambda}^{L}, p\_{\lambda}^{U}, q\_{\lambda}^{U}$ satisfying (35) does not change under this transformation.*


**Proof of Theorem 1.** Let us fix $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}) \in \mathcal{P}$ as well as $x\_0 := X\_0 \in \mathbb{N}$, and start with arbitrary $\lambda \in\, ]0,1[$. We first prove the upper bound $B^{U}\_{\lambda, X\_0, n}$ of part (b). Correspondingly, we suppose that the coefficients $p\_{\lambda}^{U}, q\_{\lambda}^{U}$ satisfy (35) for all $x \in \mathbb{N}\_0$. From (28), (30), (31), (32) and (35) one immediately gets $B^{U}\_{\lambda, X\_0, 1}$ in terms of the first sequence element $a\_1^{(q\_{\lambda}^{U})}$ (cf. (36)). With the help of (29), for all observation horizons $n \in \mathbb{N}\setminus\{1\}$ we get (with the obvious shortcut for $n = 2$)

$$\begin{aligned} H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) &= \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-1}=0}^{\infty} \prod\_{k=1}^{n-1} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ \varphi\_{\lambda}(x\_{n-1}) - f\_{\lambda}(x\_{n-1}) \right\} \\ &< \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-1}=0}^{\infty} \prod\_{k=1}^{n-1} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ \left(p\_{\lambda}^{U} - \alpha\_{\lambda}\right) + \left(q\_{\lambda}^{U} - \beta\_{\lambda}\right) x\_{n-1} \right\} \\ &= \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-1}=0}^{\infty} \prod\_{k=1}^{n-1} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ b\_1^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} + a\_1^{(q\_{\lambda}^{U})} x\_{n-1} \right\} \\ &= \exp\left\{ b\_1^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} \right\} \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-2}=0}^{\infty} \prod\_{k=1}^{n-2} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ \exp\left(a\_1^{(q\_{\lambda}^{U})}\right) \varphi\_{\lambda}(x\_{n-2}) - f\_{\lambda}(x\_{n-2}) \right\} \\ &< \exp\left\{ b\_1^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} \right\} \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-2}=0}^{\infty} \prod\_{k=1}^{n-2} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ \left( \exp\left(a\_1^{(q\_{\lambda}^{U})}\right) p\_{\lambda}^{U} - \alpha\_{\lambda} \right) + \left( \exp\left(a\_1^{(q\_{\lambda}^{U})}\right) q\_{\lambda}^{U} - \beta\_{\lambda} \right) \cdot x\_{n-2} \right\} \\ &= \exp\left\{ b\_1^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} \right\} \sum\_{x\_1=0}^{\infty} \cdots \sum\_{x\_{n-2}=0}^{\infty} \prod\_{k=1}^{n-2} Z^{(\lambda)}\_{n,k}(\vec{x}) \cdot \exp\left\{ b\_2^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} + a\_2^{(q\_{\lambda}^{U})} x\_{n-2} \right\} \\ &< \cdots < \exp\left\{ a\_n^{(q\_{\lambda}^{U})} x\_0 + \sum\_{k=1}^{n} b\_k^{(p\_{\lambda}^{U}, q\_{\lambda}^{U})} \right\}. \end{aligned} \tag{43}$$

Notice that for the strictness of the above inequalities we have used the fact that $\phi\_{\lambda}(x) < \phi^{U}\_{\lambda}(x)$ for some (in fact, all but at most two) $x \in \mathbb{N}\_0$ (cf. Properties 3 (P19) below). Since for some admissible choices of $p\_{\lambda}^{U}, q\_{\lambda}^{U}$ and some $n \in \mathbb{N}$ the last term in (43) can become larger than 1, one needs to take into account the cutoff point 1 arising from (9). The lower bound $B^{L}\_{\lambda, X\_0, n}$ of part (b), as well as the exact value of part (a), follow from (29) in an analogous manner by employing $p\_{\lambda}^{L}, q\_{\lambda}^{L}$ and $p\_{\lambda}^{E}, q\_{\lambda}^{E}$, respectively. Furthermore, we use the fact that for $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda) \in (\mathcal{P}\_{NI} \cup \mathcal{P}\_{SP,1})\,\times\,]0,1[$ one gets from (38) the relation $b\_n^{(p\_{\lambda}^{E}, q\_{\lambda}^{E})} = \frac{\alpha\_{\mathcal{A}}}{\beta\_{\mathcal{A}}}\, a\_n^{(q\_{\lambda}^{E})}$. For the sake of brevity, the corresponding straightforward details are omitted here. Although we take the minimum of the upper bound derived in (43) and 1, the inequality $B^{L}\_{\lambda, X\_0, n} < B^{U}\_{\lambda, X\_0, n}$ is nevertheless valid: the reason is that, for constituting a lower bound, the parameters $p\_{\lambda}^{L}, q\_{\lambda}^{L}$ must fulfill either the conditions $p\_{\lambda}^{L} - \alpha\_{\lambda} < 0$ and $q\_{\lambda}^{L} - \beta\_{\lambda} \le 0$, or $p\_{\lambda}^{L} - \alpha\_{\lambda} \le 0$ and $q\_{\lambda}^{L} - \beta\_{\lambda} < 0$ (or both), which guarantees that $B^{L}\_{\lambda, X\_0, n} < 1$. The proof for all $\lambda \in \mathbb{R}\setminus[0,1]$ works out completely analogously, by taking into account the generally valid lower bound $H\_{\lambda}(P\_{\mathcal{A},n}||P\_{\mathcal{H},n}) \ge 1$ (cf. (11)). $\square$

#### *3.2. Some Useful Facts for Deeper Analyses*

Theorem 1(b) and Remark 1(a) indicate the crucial role of the expression $\widetilde{B}^{(p,q)}\_{\lambda, X\_0, n}$ and show that the choice of the quantities $p, q$ depends on the underlying (e.g., fixed) offspring-immigration parameter constellation $(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}})$ as well as on the (e.g., selectable) value of $\lambda$, i.e., $p\_{\lambda}^{A} = p^{A}(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda)$ and $q\_{\lambda}^{A} = q^{A}(\beta\_{\mathcal{A}}, \beta\_{\mathcal{H}}, \alpha\_{\mathcal{A}}, \alpha\_{\mathcal{H}}, \lambda)$ with $A \in \{E, L, U\}$. In order to study the desired time behaviour $n \mapsto \widetilde{B}^{(\cdot,\cdot)}\_{\lambda, X\_0, n}$ of the Hellinger-integral bounds resp. exact values, one therefore faces a six-dimensional (and thus highly non-obvious) detailed analysis, including the search for criteria (in addition to (35)) on good/optimal choices of $p\_{\lambda}^{L}, q\_{\lambda}^{L}, p\_{\lambda}^{U}, q\_{\lambda}^{U}$. Since these criteria will (almost) always imply the nonnegativity of $p\_{\lambda}^{A}, q\_{\lambda}^{A}$ ($A \in \{L, U\}$) and $p\_{\lambda}^{E} \ge 0$, $q\_{\lambda}^{E} > 0$ (cf. Remark 1(a)), let us first present some fundamental properties of the underlying crucial sequences $\big(a\_n^{(q)}\big)\_{n\in\mathbb{N}}$ and $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}}$ for *general* $p \ge 0$, $q \ge 0$.

**Properties 1.** *For all λ* ∈ R *the following holds:*

*(P1) If $0 < q < \beta\_{\lambda}$, then the sequence $\big(a\_n^{(q)}\big)\_{n\in\mathbb{N}}$ is strictly negative, strictly decreasing and converges to the unique negative solution $x\_0^{(q)} \in\, ]-\beta\_{\lambda}, q - \beta\_{\lambda}[$ of the equation*

$$\xi\_{\lambda}^{(q)}(x) = q \cdot e^{x} - \beta\_{\lambda} = x \,. \tag{44}$$

- *(P3a) If additionally $q \le \min\{1, e^{\beta\_{\lambda}-1}\}$, then the sequence $\big(a\_n^{(q)}\big)\_{n\in\mathbb{N}}$ converges to the smallest positive solution $x\_0^{(q)} \in\, ]0, -\log q]$ of Equation (44).*
- *(P3b) If additionally $q > \min\{1, e^{\beta\_{\lambda}-1}\}$, then the sequence $\big(a\_n^{(q)}\big)\_{n\in\mathbb{N}}$ diverges to $\infty$ faster than exponentially (i.e., there do not exist constants $c\_1, c\_2 \in \mathbb{R}$ such that $a\_n^{(q)} \le e^{c\_1 + c\_2 n}$ for all $n \in \mathbb{N}$).*
- *(P5a) If additionally $p < \alpha\_{\lambda}$, then $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}}$ is strictly negative for all $n \in \mathbb{N}$.*
- *(P5b) If additionally $p = \alpha\_{\lambda}$, then $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}}$ is strictly negative for all $n \in \mathbb{N}\setminus\{1\}$.*
- *(P5c) If additionally $p > \alpha\_{\lambda}$, then $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}}$ is strictly positive for some (and possibly for all) $n \in \mathbb{N}$.*
- *(P6) If $0 < q = \beta\_{\lambda}$, then $b\_n^{(p,q)} \equiv p - \alpha\_{\lambda}$.*
- *(P7) If $p > 0$ and $q > \max\{0, \beta\_{\lambda}\}$, then the sequence $\big(b\_n^{(p,q)}\big)\_{n\in\mathbb{N}}$ is strictly increasing.*


*Moreover, in our investigations we will repeatedly make use of the function $\xi\_{\lambda}^{(q)}(\cdot)$ from the definition (36) of $a\_n^{(q)}$ (see also (44)), which has the following properties:*

*(P9) For $q \in\, ]0, \infty[$ and all $\lambda \in \mathbb{R}\setminus\{0,1\}$, the function $\xi\_{\lambda}^{(q)}(\cdot)$ is strictly increasing, strictly convex and smooth, and there holds*

$$(\text{P9a}) \qquad \qquad \qquad \qquad \xi\_{\lambda}^{(q)}(0) \begin{cases} < 0, & \text{if} & q < \beta\_{\lambda}, \\ = 0, & \text{if} & q = \beta\_{\lambda}, \\ > 0, & \text{if} & q > \beta\_{\lambda}. \end{cases}$$

$$(P\mathfrak{B}b) \qquad \lim\_{\mathbf{x}\to-\infty} \mathfrak{f}\_{\lambda}^{(q)}(\mathbf{x}) = -\mathfrak{f}\_{\lambda} \, \mathsf{A} \qquad \text{and} \qquad \lim\_{\mathbf{x}\to\infty} \mathfrak{f}\_{\lambda}^{(q)}(\mathbf{x}) = \infty \dots$$

The proof of these properties is provided in Appendix A.1. From Properties 1 (P1) to (P4) we can see that the behaviour of the sequence $\big(a_n^{(q)}\big)_{n\in\mathbb{N}}$ can basically be classified into four different types; besides the case (P2) where $a_n^{(q)}$ is *constant*, the sequence can be either (i) *strictly decreasing and convergent* (e.g., for the NI case $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (0.5, 2, 0, 0, 0.5)$ leading to $\beta_\lambda = \lambda\beta_A + (1-\lambda)\beta_H = 1.25$ and to $q := q_\lambda^{E} = \beta_A^{\lambda}\beta_H^{1-\lambda} = 1$, cf. (33) resp. Theorem 1(a)), or (ii) *strictly increasing and convergent* (e.g., for $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (0.5, 2, 0, 0, 1.5)$ leading to $\beta_\lambda = -0.25$, $q := q_\lambda^{E} = 0.25$), or (iii) *strictly increasing and divergent* (e.g., for $(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda) = (0.5, 2, 0, 0, 2.7)$ leading to $\beta_\lambda = -2.05$, $q := q_\lambda^{E} \approx 0.047366$). Within our running-example epidemiological context of Section 2.3, this corresponds to a "potentially dangerous" infectious-disease-transmission situation (H) (with supercritical reproduction number $\beta_H = 2$), whereas (A) describes a "mild" situation (with "low" subcritical $\beta_A = 0.5$).
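The two convergent types (i) and (ii) above can be illustrated numerically. The following sketch assumes the recursion $a_n^{(q)} = \xi_\lambda^{(q)}\big(a_{n-1}^{(q)}\big)$ with starting value $a_0^{(q)} = 0$ (an assumed normalization consistent with (44) and (P9a), not a verbatim reproduction of the definition (36)), and checks the monotonicity claims of (P1) and (P3a) for the two parameter constellations quoted in the text:

```python
import math

def xi(x, q, beta_lam):
    """xi_lambda^{(q)}(x) = q * exp(x) - beta_lambda, cf. Equation (44)."""
    return q * math.exp(x) - beta_lam

def iterate_a(q, beta_lam, n_max=200):
    """Iterate a_n = xi(a_{n-1}) from a_0 = 0 (assumed normalization, cf. (36))."""
    a, seq = 0.0, []
    for _ in range(n_max):
        a = xi(a, q, beta_lam)
        seq.append(a)
    return seq

# Case (i): lambda = 0.5 -> beta_lam = 1.25, q = 1  (regime (P1): 0 < q < beta_lam)
seq_i = iterate_a(q=1.0, beta_lam=1.25)
x0_i = seq_i[-1]
assert all(a < 0 for a in seq_i)                              # strictly negative
assert all(b < a for a, b in zip(seq_i[:25], seq_i[1:26]))    # strictly decreasing
assert -1.25 < x0_i < 1.0 - 1.25                              # limit in ]-beta_lam, q - beta_lam[
assert abs(xi(x0_i, 1.0, 1.25) - x0_i) < 1e-9                 # fixed point of (44)

# Case (ii): lambda = 1.5 -> beta_lam = -0.25, q = 0.25  (regime (P3a))
assert 0.25 <= min(1.0, math.exp(-0.25 - 1.0))                # q <= min{1, e^{beta_lam - 1}}
seq_ii = iterate_a(q=0.25, beta_lam=-0.25)
x0_ii = seq_ii[-1]
assert all(b > a for a, b in zip(seq_ii[:25], seq_ii[1:26]))  # strictly increasing
assert 0.0 < x0_ii <= -math.log(0.25)                         # limit in ]0, -log q]
assert abs(xi(x0_ii, 0.25, -0.25) - x0_ii) < 1e-9

print(round(x0_i, 4), round(x0_ii, 4))
```

In case (i) the iteration settles at the unique negative root of (44), in case (ii) at its smallest positive root; since $\xi_\lambda^{(q)}$ is strictly increasing and convex (P9), the iterates approach these fixed points monotonically.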

As already mentioned before, the sequences $\big(a_n^{(q)}\big)_{n\in\mathbb{N}}$ and $\big(b_n^{(p,q)}\big)_{n\in\mathbb{N}}$ –whose behaviours for general $p \geq 0$ and $q \geq 0$ were described by Properties 1– have to be evaluated at setup-dependent choices $p = p_\lambda = p(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$ and $q = q_\lambda = q(\beta_A, \beta_H, \alpha_A, \alpha_H, \lambda)$. Hence, for fixed $(\beta_A, \beta_H, \alpha_A, \alpha_H)$, one of the questions –which arises in the course of the desired investigations of the time-behaviour of the Hellinger integral bounds (resp. exact values)– is for which $\lambda \in \mathbb{R}$ the sequence $\big(a_n^{(q_\lambda)}\big)_{n\in\mathbb{N}}$ converges. In the following, we illuminate this for the important special case $q_\lambda = \beta_A^{\lambda}\beta_H^{1-\lambda}$. Suppose first that $\beta_A \neq \beta_H$. Property (P1) implies that for $\lambda \in\, ]0,1[$ one has $\lim_{n\to\infty} a_n^{(q_\lambda)} = x_0^{(q_\lambda)} \in\, ]-\beta_\lambda,\, q_\lambda - \beta_\lambda[$, and Lemma A1 states that $q_\lambda - \beta_\lambda < 0$. For $\lambda \in \mathbb{R}\setminus[0,1]$, there holds $q_\lambda > \max\{0, \beta_\lambda\}$, and from (P3) one can see that $\big(a_n^{(q_\lambda)}\big)_{n\in\mathbb{N}}$ does not converge to $x_0^{(q_\lambda)}$ in general, but only for $q_\lambda \leq \min\{1, e^{\beta_\lambda - 1}\}$, which constitutes an implicit condition on $\lambda$. This can be made explicit with the help of the auxiliary variables

$$
\lambda_- := \lambda_-(\beta_A, \beta_H) := \begin{cases}
\inf\Big\{\lambda \leq 0 \,:\, \beta_A^{\lambda}\beta_H^{1-\lambda} \leq \min\big\{1,\, \exp\{\lambda\beta_A + (1-\lambda)\beta_H - 1\}\big\}\Big\}, & \text{in case that the set is nonempty}, \\
0, & \text{else},
\end{cases}
$$

$$
\lambda_+ := \lambda_+(\beta_A, \beta_H) := \begin{cases}
\sup\Big\{\lambda \geq 1 \,:\, \beta_A^{\lambda}\beta_H^{1-\lambda} \leq \min\big\{1,\, \exp\{\lambda\beta_A + (1-\lambda)\beta_H - 1\}\big\}\Big\}, & \text{in case that the set is nonempty}, \\
1, & \text{else}.
\end{cases}
$$

For the constellation $\beta_A = \beta_H > 0$ we clearly obtain $q_\lambda = \beta_A^{\lambda}\beta_H^{1-\lambda} = \beta_A = \beta_H = \beta_\lambda$. Hence, (P2) implies that the sequence $\big(a_n^{(q_\lambda)}\big)_{n\in\mathbb{N}}$ converges *for all* $\lambda \in \mathbb{R}\setminus\{0,1\}$, and we can set $\lambda_- := -\infty$ as well as $\lambda_+ := \infty$. Incorporating this and adapting a result of Linkov & Lunyova [53] on $\lambda_-(v_1, v_2)$, $\lambda_+(v_1, v_2)$ for $\beta_A \neq \beta_H$, we end up with

**Lemma 1.** *(a) For all $\beta_A > 0$, $\beta_H > 0$ with $\beta_A \neq \beta_H$ there holds*

$$
\lambda_- = \lambda_-(\beta_A, \beta_H) = \begin{cases}
0, & \text{if } \beta_H \geq 1, \\
\breve{\lambda}, & \text{if } \beta_H < 1 \text{ and } \beta_A \notin [\beta_H,\, \beta_H\, z(\beta_H)], \\
-\infty, & \text{if } \beta_H < 1 \text{ and } \beta_A \in [\beta_H,\, \beta_H\, z(\beta_H)],
\end{cases}
$$

$$
\lambda_+ = \lambda_+(\beta_A, \beta_H) = \begin{cases}
1, & \text{if } \beta_A \geq 1, \\
\breve{\lambda}, & \text{if } \beta_A < 1 \text{ and } \beta_H \notin [\beta_A,\, \beta_A\, z(\beta_A)], \\
\infty, & \text{if } \beta_A < 1 \text{ and } \beta_H \in [\beta_A,\, \beta_A\, z(\beta_A)],
\end{cases}
$$

*where*

$$
\breve{\lambda} := \breve{\lambda}(\beta_A, \beta_H) := \frac{\beta_H - 1 - \log(\beta_H)}{\beta_H - \beta_A + \log\!\big(\frac{\beta_A}{\beta_H}\big)}
\;\begin{cases}
< 0, & \text{if } \beta_H < 1 \text{ and } \beta_A \notin [\beta_H,\, \beta_H\, z(\beta_H)], \\
> 1, & \text{if } \beta_A < 1 \text{ and } \beta_H \notin [\beta_A,\, \beta_A\, z(\beta_A)].
\end{cases}
$$

*Here, for fixed $\beta \in\, ]0, \infty[\,\setminus\{1\}$ we denote by $z(\beta)$ the unique solution of the equation $\log(x) - \beta(x - 1) = 0$, $x \in\, ]0, \infty[\,\setminus\{1\}$. For $\beta = 1$, $z(\beta) = 1$ denotes the unique solution of $\log(x) - (x - 1) = 0$, $x \in\, ]0, \infty[$. (b) For all $\beta_A = \beta_H > 0$ one gets $\lambda_- = \lambda_-(\beta_A, \beta_H) = -\infty$ as well as $\lambda_+ = \lambda_+(\beta_A, \beta_H) = \infty$. Notice that the relationship $\breve{\lambda}(\beta_A, \beta_H) = 1 - \breve{\lambda}(\beta_H, \beta_A)$ is consistent with the skew symmetry* (8)*.*

A corresponding proof is given in Appendix A.1.
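The explicit quantities in Lemma 1 lend themselves to a numerical cross-check. The following sketch (our own verification code, not part of the original derivation) computes $z(\beta)$ by bisection, confirms that the running example $(\beta_A, \beta_H) = (0.5, 2)$ falls into the branch $\lambda_+ = \breve{\lambda}$, verifies that $\breve{\lambda}$ is exactly the crossover of the implicit condition $q_\lambda \leq \min\{1, e^{\beta_\lambda - 1}\}$ from (P3a), and checks the skew symmetry:

```python
import math

def z(beta, tol=1e-12):
    """Unique solution x != 1 of log(x) - beta*(x - 1) = 0 (beta != 1), by bisection;
    for beta < 1 the root lies in ]1, inf[, for beta > 1 in ]0, 1[."""
    lo, hi = (1.0 + 1e-9, 1e6) if beta < 1.0 else (1e-12, 1.0 - 1e-9)
    f = lambda x: math.log(x) - beta * (x - 1.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def lam_breve(bA, bH):
    """Explicit crossover point from Lemma 1 (assumes bA != bH)."""
    return (bH - 1.0 - math.log(bH)) / (bH - bA + math.log(bA / bH))

def implicit_ok(lam, bA, bH):
    """Implicit condition q_lam <= min{1, e^{beta_lam - 1}}, cf. (P3a)."""
    q = bA**lam * bH**(1.0 - lam)
    beta_lam = lam * bA + (1.0 - lam) * bH
    return q <= min(1.0, math.exp(beta_lam - 1.0))

bA, bH = 0.5, 2.0                       # running example of Section 2.3
assert bH >= 1.0                        # first branch: lambda_- = 0
assert not (bA <= bH <= bA * z(bA))     # beta_H outside [beta_A, beta_A z(beta_A)]
lam_plus = lam_breve(bA, bH)            # hence lambda_+ = lam_breve
assert 2.69 < lam_plus < 2.70
assert implicit_ok(lam_plus - 1e-3, bA, bH)        # just below: condition holds
assert not implicit_ok(lam_plus + 1e-3, bA, bH)    # just above: condition fails
# skew symmetry, consistent with (8)
assert abs(lam_breve(bA, bH) - (1.0 - lam_breve(bH, bA))) < 1e-12

print(round(lam_plus, 4), round(z(bA), 3))
```

Note that $\lambda = 2.7$, used for the divergent case (iii) earlier, lies just above this crossover $\lambda_+ = \breve{\lambda}(0.5, 2) \approx 2.6987$, which is why the corresponding $q_\lambda \approx 0.047366$ only marginally exceeds the threshold $e^{\beta_\lambda - 1} \approx 0.047359$.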

With these auxiliary basic facts in hand, let us now work out our detailed investigations of the time-behaviour $n \mapsto H_\lambda(P_{A,n} \,\|\, P_{H,n})$, where we start with the exactly treatable case (a) in Theorem 1.
