*Article* **A Well-Posed Fractional Order Cholera Model with Saturated Incidence Rate**

**Isa Abdullahi Baba 1,2 , Usa Wannasingha Humphries 2,\* and Fathalla A. Rihan 3,4**

<sup>1</sup> Department of Mathematics, Bayero University, Kano 700241, Nigeria


**Abstract:** A fractional-order cholera model in the Caputo sense is constructed. The model is an extension of the Susceptible–Infected–Recovered (SIR) epidemic model. The transmission dynamics of the disease are studied by incorporating the saturated incidence rate into the model. This is particularly important since assuming that the incidence rate for a large number of infected individuals is the same as that for a small number of infected individuals does not make much sense. The positivity, boundedness, existence, and uniqueness of the solution of the model are also studied. Equilibrium solutions are computed, and their stability analyses are shown to depend on a threshold quantity, the basic reproduction ratio (*R*<sub>0</sub>). It is clearly shown that if *R*<sub>0</sub> < 1, the disease-free equilibrium is locally asymptotically stable, whereas if *R*<sub>0</sub> > 1, the endemic equilibrium exists and is locally asymptotically stable. Numerical simulations are carried out to support the analytic results and to show the significance of the fractional order from the biological point of view. Furthermore, the significance of awareness is studied in the numerical section.

**Keywords:** mathematical model; fractional order; Caputo; cholera; well-posedness; saturated incidence rate

### **1. Introduction**

Cholera is an acute diarrheal disease that can lead to death within a short period of time if treatment measures are not taken. The disease is estimated to cause about 21,000 to 143,000 deaths out of the 1,300,000 to 4,000,000 cholera cases annually, which represents about 1.62% to 3.58% of the reported cases [1]. People who live in slums and refugee camps due to natural disasters, social conflicts, climate change, and economic meltdowns are the most affected. Such camps often have poor-quality drinking water, which serves as a cholera-causing factor [2]. Symptoms of cholera include vomiting, profuse rice–water stool, sunken eyes, cramps, shock, and severe dehydration. *Vibrio cholerae* carriers are people exposed to an incomplete cholera-causing dose; consequently, the disease may not manifest any symptoms in their bodies [3]. Acute cholera leads to death within a short period that varies from hours up to three days. Exposed individuals have a 50% chance of being infected with the disease if the concentration of *Vibrio cholerae* is 10<sup>5</sup> cells per milliliter. Consuming as little as 1 liter of untreated water per day may cause the disease [4].

Up to 75% of *Vibrio cholerae* carriers do not show any symptoms of the disease, yet they can spread it through their feces [5]. This poses a considerable risk of cholera outbreaks. A cholera infection falls into one of three classes: asymptomatic, mild, or severe. In a population of infected individuals, 80% have mild or moderate symptoms and only 20% develop severe watery diarrhea [6].

**Citation:** Baba, I.A.; Humphries, U.W.; Rihan, F.A. A Well-Posed Fractional Order Cholera Model with Saturated Incidence Rate. *Entropy* **2023**, *25*, 360. https://doi.org/10.3390/e25020360

Academic Editors: Pavel Kraikivski and António M. Lopes

Received: 1 December 2022 Revised: 11 January 2023 Accepted: 16 January 2023 Published: 15 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Mathematical models help in studying the dynamics of a given infectious disease [7–9]. The investigation of cholera models using the SIR epidemic model introduced in 1927 [10] provides a deep understanding of the transmission mechanisms of the disease, while other models use time-dependent coefficients [11]. This is why many researchers have put effort into constructing and analyzing mathematical models of cholera epidemics. Cholera dynamical models involve the study of disease transmission among humans and of the *V. cholerae* concentration in contaminated water. Most of the time, the direct and indirect transmission pathways of cholera give rise to a basic framework for investigating the dynamics of the disease [12–18].

Due to their hereditary properties and memory-description abilities, many researchers prefer to use fractional-order derivatives and integrals as tools in mathematical modeling. Nowadays, fractional differential equations are used to study biological phenomena by developing related mathematical models [19,20]. This is due to the fact that fractional calculus can explain the retention and heritage properties of many materials more accurately than its corresponding integer-order analog [21,22]. In this paper, we use the Caputo fractional-order derivative to model the dynamics of cholera in a homogeneous setting.

The mechanism of transmission of any transmissible disease is controlled by a certain function of the infected subpopulation called the incidence rate. In epidemic models, the most frequently used incidence rates are the standard incidence rate *βSI*/*N* and the bilinear incidence rate *βSI*, where *β* is the contact rate, *S* stands for the susceptible population, *I* stands for the infected population, and *N* is the total population. Assuming that the incidence rate for a large number of infected individuals is the same as that for a small number of infected individuals does not make much sense; hence, there is a need for a more realistic incidence rate of the form *g*(*I*)*S* [23], where *g* is a nonnegative function that tends to a saturation level and satisfies *g*(0) = 0. To incorporate the effect of the behavioral changes of susceptible individuals, a more general incidence rate of the form *βSI<sup>m</sup>*/(1 + *δI<sup>n</sup>*), where *m* and *n* are positive constants and *δ* is a nonnegative constant, was proposed [24]. These types of incidence rates, which allow for the possibility of introducing psychological effects, are called saturated incidence rates; *δ* and 1 + *δI<sup>n</sup>* determine the amount of the psychological effect and the inhibition effect, respectively. As the number of infective individuals increases, the rate of infection spread may decrease due to public awareness, potentially leading to contact reduction [25].

Many mathematical models of cholera transmission exist in the literature, and they study the dynamics of the disease. For example, Leo [26] developed an ML reference cholera model that can be used to overcome the existing complexities of modeling the disease. His results indicate, with an average accuracy of 87%, that the developed measures can integrate a large number of datasets, including environmental factors, for the timely prediction of cholera epidemics in Tanzania. Daudel et al. [27] constructed and studied a compartmental malaria model; their results show that greater implementation of strategies combining awareness programs and therapeutic treatments increases the efficacy of control measures. In [28], a stochastic norovirus epidemic model with a time delay and random perturbations was explored. In [29], a mathematical model for cholera considering vaccination effects was proposed. In [30], Capasso and Paveri-Fontana suggested a mathematical model for the 1973 cholera epidemic in the European Mediterranean region. In 2017, the transmission dynamics of cholera in Yemen were investigated by Nishiura et al. [31]. Lastly, a model containing optimal intervention strategies for cholera control was formulated in [32].

Luchko and Yamamoto [33] proposed a new differential operator with a general kernel function. The flexibility in choosing the kernel provides a basis for a broad range of applications [34]. Changing the kernel in the general derivative leads to the discovery of various asymptotic behaviors; hence, the hidden features of real-world systems are observed more accurately than in the classical sense. However, both the properties and the applications of this operator must be studied in practical situations, and some theorems need to be stated and proved in order to study models using this operator.

For cholera, the saturated incidence rate is more reasonable than the bilinear incidence rate. This is because it includes the behavioral change and crowding effect of the cholera-infective individuals and prevents the unboundedness of the contact rate through a suitable choice of parameters. Motivated by the above discussions, we construct a novel mathematical model that studies the transmission dynamics of cholera in the Caputo sense. The model is novel because, to the best of our knowledge, no previous study has analyzed a mathematical model with a saturated incidence rate and a fractional derivative for the cholera disease in detail. The main contributions of this research and the new achievements obtained within this manuscript are summarized as follows:

	- A numerical scheme is developed to carry out numerical simulations.

Hence, in this paper, we study a fractional-order cholera model with a saturated incidence rate. The main contribution of the paper is the introduction of a more reasonable incidence rate of the form *βSI<sup>m</sup>*/(1 + *δI<sup>n</sup>*), which avoids the unrealistic assumption that the incidence rate for a large number of infected individuals is the same as that for a small number of infected individuals. It is also the aim of the paper to consider the effect of awareness in the control of cholera and to study all the well-posedness properties of the model.

The paper is arranged as follows: Section 1 gives an introduction, Section 2 gives important definitions and preliminaries, Section 3 gives model formulation, Section 4 gives the well-posedness properties of the model, and Section 5 gives the numerical simulation results to support the analytic results. In Section 6, a discussion and conclusion are given.

### **2. Preliminary Definitions and Theorems**

**Definition 1** [35]**.** *The Caputo fractional derivative of order α* ∈ (*n* − 1, *n*] *of f*(*x*) *is defined as*,

$${}\_a^{C}D\_x^{\alpha} f(x) = \frac{1}{\Gamma(n - \alpha)} \int\_a^x (x - t)^{n - \alpha - 1} f^{(n)}(t)\,dt, \qquad n = [\alpha] + 1.$$
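As a quick numerical sanity check (not part of the original paper), the definition above can be discretized with the standard L1 finite-difference scheme. The sketch below compares the L1 approximation for *f*(*x*) = *x* with the known closed form *D<sup>α</sup>x* = *x*<sup>1−*α*</sup>/Γ(2 − *α*); since the L1 scheme interpolates *f* piecewise-linearly, it is exact for linear functions.

```python
import math

def caputo_l1(f, t, alpha, n=1000):
    """Approximate the Caputo derivative of order alpha in (0, 1) at time t
    using the L1 scheme on a uniform grid of n subintervals."""
    h = t / n
    coeff = h ** (-alpha) / math.gamma(2 - alpha)
    total = 0.0
    for j in range(n):
        # weights b_j = (j+1)^(1-alpha) - j^(1-alpha)
        b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)
        # first-order difference f(t_{n-j}) - f(t_{n-j-1})
        total += b * (f((n - j) * h) - f((n - j - 1) * h))
    return coeff * total

alpha, t = 0.5, 1.0
approx = caputo_l1(lambda x: x, t, alpha)
exact = t ** (1 - alpha) / math.gamma(2 - alpha)
print(abs(approx - exact) < 1e-8)  # True: the scheme is exact for f(x) = x
```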

**Definition 2** [36]**.** *The linearity of the fractional derivative*.

Let *f*, *g* be continuous functions and *b*, *d* be scalars, then

$${}^{RL}\_{a}D\_{x}^{\alpha}[bf(\mathbf{x}) + dg(\mathbf{x})] = b\_{a}^{RL}D\_{x}^{\alpha}f(\mathbf{x}) + d\_{a}^{RL}D\_{x}^{\alpha}g(\mathbf{x}),$$

$${}^{C}\_{a}D\_{x}^{\alpha}[bf(\mathbf{x}) + dg(\mathbf{x})] = b\_{a}^{C}D\_{x}^{\alpha}f(\mathbf{x}) + d\_{a}^{C}D\_{x}^{\alpha}g(\mathbf{x}).$$

**Definition 3** [37]**.** *Contraction*.

An operator *f* : *X* → *X* that maps a metric space into itself is said to be contractive if there exists 0 < *q* < 1 such that

$$d(f(x), f(y)) \le q\,d(x, y), \quad \forall x, y \in X.$$

**Theorem 1** [37]**.** *Picard–Banach fixed point*.

Any contractive operator that maps a complete metric space into itself has a unique fixed point. Furthermore, if *f* : *X* → *X* is a contractive operator on a complete metric space and *a* is its fixed point, *f*(*a*) = *a*, then any iterative sequence

$$x\_0, \quad x\_1 = f(x\_0), \quad x\_2 = f(x\_1), \ldots, \quad x\_{n+1} = f(x\_n), \ldots$$

converges to *a*.

In other words, *a* is a solution or equilibrium for a continuous dynamical system and a fixed point for a discrete dynamical system.
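As an illustrative sketch of the Picard–Banach iteration (not from the paper): the affine map *f*(*x*) = 0.5*x* + 1 is a contraction on R with *q* = 0.5, and the iterates converge to its unique fixed point *x* = 2.

```python
def picard_iterate(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until successive iterates agree to within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# f(x) = 0.5x + 1 satisfies |f(x) - f(y)| = 0.5|x - y|, so q = 0.5 < 1;
# the unique fixed point solves x = 0.5x + 1, i.e., x = 2.
fixed = picard_iterate(lambda x: 0.5 * x + 1, x0=0.0)
print(round(fixed, 10))  # 2.0
```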

**Theorem 2** [36]**.** *An equilibrium solution x*∗ *of a system of fractional-order differential equations is locally asymptotically stable if all the eigenvalues λ<sub>i</sub> of the Jacobian matrix ∂f/∂x evaluated at the equilibrium point satisfy*

$$|\arg(\lambda\_i)| > \frac{\alpha \pi}{2}, \qquad 0 < \alpha < 1.$$
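Theorem 2 (Matignon's condition) is easy to check numerically for a given Jacobian spectrum. The sketch below is illustrative and uses hypothetical eigenvalues; it also shows that lowering *α* enlarges the stability sector.

```python
import cmath
import math

def is_locally_stable(eigenvalues, alpha):
    """Matignon's condition: stable iff |arg(lambda)| > alpha*pi/2 for every eigenvalue."""
    return all(abs(cmath.phase(lam)) > alpha * math.pi / 2 for lam in eigenvalues)

# Hypothetical Jacobian eigenvalues 0.1 +/- 1j: |arg| = atan(10) ~ 1.471 rad.
eigs = [0.1 + 1j, 0.1 - 1j]
print(is_locally_stable(eigs, alpha=1.0))  # False: threshold pi/2 ~ 1.571
print(is_locally_stable(eigs, alpha=0.9))  # True: threshold 0.45*pi ~ 1.414
```

This is the biological significance of the fractional order: an equilibrium that is unstable in the integer-order model can be stable for a sufficiently small *α*.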

**Theorem 3** [36]**.** *Let x*(*t*) ∈ R<sup>+</sup> *be a continuous and differentiable function. Then, for any time instant t* ≥ *t*<sub>0</sub> *and α* ∈ (0, 1),

$$\, \_0^\mathbb{C} D\_t^a[\mathbf{x}(t) - \mathbf{x}^\* - \mathbf{x}^\* \ln \left( \frac{\mathbf{x}(t)}{\mathbf{x}^\*} \right)] \le \left( 1 - \frac{\mathbf{x}(t)}{\mathbf{x}^\*} \right) \, \_0^\mathbb{C} D\_t^a \mathbf{x}(t), \qquad \mathbf{x}^\* \in \mathbb{R}^+.$$

### **3. Model Formulation**

The model consists of the following classes: susceptible *S*, latently infected individuals *E*, infectious individuals *I*, and those that recovered from the infection *R*. We also define *N*(*t*) = *S*(*t*) + *E*(*t*) + *I*(*t*) + *R*(*t*). Assuming homogeneous mixing of the population, the model is given as:

$$\begin{aligned} {}\_0^{C}D\_t^{\alpha} S(t) &= \Lambda^{\alpha} - \frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \gamma^{\alpha}E(t) + \xi^{\alpha}I(t) - \mu^{\alpha}S(t), \\ {}\_0^{C}D\_t^{\alpha} E(t) &= p\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} - (\gamma^{\alpha} + \eta^{\alpha} + \phi^{\alpha} + \mu^{\alpha})E(t), \\ {}\_0^{C}D\_t^{\alpha} I(t) &= (1 - p)\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \eta^{\alpha}E(t) - (\xi^{\alpha} + \delta^{\alpha} + q^{\alpha} + \mu^{\alpha})I(t), \\ {}\_0^{C}D\_t^{\alpha} R(t) &= \delta^{\alpha}I(t) - \mu^{\alpha}R(t), \end{aligned} \tag{1}$$

subject to the initial conditions;

$$S(0) > 0, \ E(0) \ge 0, \ I(0) \ge 0, \ R(0) \ge 0.$$

where the function *f*(*I*) is given as;

$$f(I) = 1 + I^2.$$

We then modify the fractional operator via an auxiliary parameter *Y* > 0 to avoid dimensional mismatching [38] to obtain

$$\begin{aligned} Y^{\alpha-1}\,{}\_0^{C}D\_t^{\alpha} S(t) &= \Lambda^{\alpha} - \frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \gamma^{\alpha}E(t) + \xi^{\alpha}I(t) - \mu^{\alpha}S(t), \\ Y^{\alpha-1}\,{}\_0^{C}D\_t^{\alpha} E(t) &= p\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} - (\gamma^{\alpha} + \eta^{\alpha} + \phi^{\alpha} + \mu^{\alpha})E(t), \\ Y^{\alpha-1}\,{}\_0^{C}D\_t^{\alpha} I(t) &= (1 - p)\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \eta^{\alpha}E(t) - (\xi^{\alpha} + \delta^{\alpha} + q^{\alpha} + \mu^{\alpha})I(t), \\ Y^{\alpha-1}\,{}\_0^{C}D\_t^{\alpha} R(t) &= \delta^{\alpha}I(t) - \mu^{\alpha}R(t), \end{aligned} \tag{2}$$

subject to the initial conditions;

$$S(0) > 0, \ E(0) \ge 0, \ I(0) \ge 0, \ R(0) \ge 0.$$

The meanings of the parameters in the model are given in Table 1.
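The simulations of Section 5 are not reproduced here, but the qualitative behavior of system (1) can be sketched with an explicit fractional Euler scheme obtained from the Volterra form of the model (Section 4.2). The code below is a minimal sketch, not the authors' scheme, and all parameter values are hypothetical placeholders rather than the entries of Table 1.

```python
import math

# Hypothetical parameter values (placeholders, not the entries of Table 1);
# each rate enters as rate**alpha, as in system (1).
alpha = 0.9
Lam, beta, k, gamma, xi, mu = 0.5, 0.4, 0.2, 0.1, 0.05, 0.02
eta, phi, p, delta, q = 0.15, 0.01, 0.6, 0.1, 0.01

def rhs(S, E, I, R):
    """Right-hand side of system (1) with the saturated incidence f(I) = 1 + I^2."""
    inc = beta**alpha * (1 - k**alpha) * S * I / (1 + I**2)
    dS = Lam**alpha - inc + gamma**alpha * E + xi**alpha * I - mu**alpha * S
    dE = p * inc - (gamma**alpha + eta**alpha + phi**alpha + mu**alpha) * E
    dI = (1 - p) * inc + eta**alpha * E - (xi**alpha + delta**alpha + q**alpha + mu**alpha) * I
    dR = delta**alpha * I - mu**alpha * R
    return dS, dE, dI, dR

def simulate(x0, h=0.05, steps=400):
    """Explicit fractional Euler scheme from the Volterra form:
    x_{n+1} = x_0 + h^a/Gamma(a+1) * sum_{j<=n} [(n+1-j)^a - (n-j)^a] g(x_j)."""
    hist, g_hist = [x0], [rhs(*x0)]
    c = h**alpha / math.gamma(alpha + 1)
    for n in range(steps):
        new = tuple(
            x0[i] + c * sum(((n + 1 - j)**alpha - (n - j)**alpha) * g_hist[j][i]
                            for j in range(n + 1))
            for i in range(4))
        hist.append(new)
        g_hist.append(rhs(*new))
    return hist

traj = simulate((10.0, 1.0, 0.5, 0.0))
S, E, I, R = traj[-1]
print(all(v >= 0 for v in (S, E, I, R)))  # positivity, consistent with Section 4.1
```

Note the growing memory sum over the whole history: this is the hereditary property of the Caputo derivative mentioned in the introduction.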



### **4. Well-Posedness of the Model**

In this section, the mathematical properties of the model are explored. This consists of the positivity and boundedness, the existence and uniqueness, the computation of equilibrium solutions and basic reproduction ratio, and the stability analysis of the solutions.

### *4.1. Positivity and Boundedness*

The positivity of solutions means that the population thrives, while boundedness means that the population growth is restricted naturally due to limited resources.

To show positivity, consider Equation (1),

$$\begin{array}{c} \, \_0^\mathbb{C}D\_t^\alpha S(t)|\_{S=0} = \Lambda^\alpha + \gamma^\alpha E(t) + \xi^\alpha I(t) > 0, \\ \, \_0^\mathbb{C}D\_t^\alpha E(t)|\_{E=0} = p\frac{\beta^\alpha (1 - k^\alpha) S(t) I(t)}{f(I)} \ge 0, \\ \, \_0^\mathbb{C}D\_t^\alpha I(t)|\_{I=0} = \eta^\alpha E(t) \ge 0, \\ \, \_0^\mathbb{C}D\_t^\alpha R(t)|\_{R=0} = \delta^\alpha I(t) \ge 0. \end{array}$$

Hence, we can observe that the solution of (1) is non-negative. For the boundedness, observe that,

$$N(t) = S(t) + E(t) + I(t) + R(t).$$

Then, by Definition 3, we have

$${}^{C}\_{0}D\_{t}^{a}N(t) = {}^{C}\_{0}D\_{t}^{a}S(t) + {}^{C}\_{0}D\_{t}^{a}E(t) + {}^{C}\_{0}D\_{t}^{a}I(t) + {}^{C}\_{0}D\_{t}^{a}R(t).$$

Then,

$$\begin{aligned} \ \_0^C D\_t^\alpha N(t) &= \Lambda^\alpha - \mu^\alpha N(t) - (\phi^\alpha E(t) + q^\alpha I(t)),\\ \ \_0^C D\_t^\alpha N(t) &\le \Lambda^\alpha - \mu^\alpha N(t). \end{aligned}$$

We apply the Laplace transform method to solve this Gronwall-like inequality with the initial condition *N*(*t*<sub>0</sub>) ≥ 0 and find

$$L\left\{{}\_{0}^{C}D\_{t}^{\alpha}N(t) + \mu^{\alpha}N\right\} \leq L\{\Lambda^{\alpha}\}.$$

By the linearity of the Laplace transform, we obtain

$$L\left\{{}\_{0}^{C}D\_{t}^{\alpha}N(t)\right\} + \mu^{\alpha}L\{N(t)\} \leq L\{\Lambda^{\alpha}\},$$

we find,

$$S^{\alpha}L\{N(t)\} - \sum\_{k=0}^{n-1} S^{\alpha-k-1}N^{k}(t\_{0}) + \mu^{\alpha}L\{N(t)\} \le \frac{\Lambda^{\alpha}}{S}.$$

Simplifying, we obtain

$$L\{N(t)\} \le \Lambda^{\alpha}\left(\frac{1}{S} - \frac{1}{S}\frac{1}{\left(1 + \frac{\mu^{\alpha}}{S^{\alpha}}\right)}\right) + \sum\_{k=0}^{n-1} \frac{1}{S^{k+1}} \frac{1}{\left(1 + \frac{\mu^{\alpha}}{S^{\alpha}}\right)} N^{k}(t\_0).$$

Using the Taylor series expansion, we have

$$\frac{1}{\left(1+\frac{\mu^{\alpha}}{S^{\alpha}}\right)} = \sum\_{n=0}^{\infty} \left(\frac{-\mu^{\alpha}}{S^{\alpha}}\right)^{n}.$$

Therefore,

$$L\{N(t)\} \le \Lambda^a \left(\frac{1}{S} - \frac{1}{S} \sum\_{n=0}^{\infty} \left(\frac{-\mu^a}{S^a}\right)^n\right) + \sum\_{k=0}^{n-1} \frac{1}{S^{k+1}} N^k(t\_0) \sum\_{n=0}^{\infty} \left(\frac{-\mu^a}{S^a}\right)^n.$$

Taking the Laplace inverse, we find

$$N(t) \le \Lambda^{\alpha}\left[1 - \sum\_{n=0}^{\infty} \frac{(-\mu^{\alpha} t^{\alpha})^{n}}{\Gamma(\alpha n + 1)}\right] + \sum\_{k=0}^{n-1} \sum\_{m=0}^{\infty} \frac{(-\mu^{\alpha} t^{\alpha})^{m}}{\Gamma(\alpha m + k + 1)} t^{k} N^{k}(t\_{0}).$$

Substituting the Mittag–Leffler function, we have

$$N(t) \le \Lambda^{\alpha}[1 - E\_{\alpha}(-\mu^{\alpha}t^{\alpha})] + \sum\_{k=0}^{n-1} E\_{\alpha,k+1}(-\mu^{\alpha}t^{\alpha})t^{k}N^{k}(t\_0),$$

where *E<sub>α</sub>*(−*µ<sup>α</sup>t<sup>α</sup>*) and *E<sub>α,k+1</sub>*(−*µ<sup>α</sup>t<sup>α</sup>*) are Mittag–Leffler functions, whose defining series converge for any argument; hence, the solution of the model is bounded. Thus, we define

$$\omega = \left\{ (S(t), E(t), I(t), R(t)) \in \mathbb{R}\_+^4 : S(t), E(t), I(t), R(t) \le \Lambda^{\alpha}[1 - E\_{\alpha}(-\mu^{\alpha}t^{\alpha})] + \sum\_{k=0}^{n-1} E\_{\alpha,k+1}(-\mu^{\alpha}t^{\alpha})t^{k}N^{k}(t\_0) \right\};$$

hence, all solutions of (1) commencing in *ω* stay in *ω* for all *t* ≥ 0.
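The Mittag–Leffler series used in the bound above can be evaluated by direct truncation. This sketch (illustrative, not from the paper) implements the one-parameter series *E<sub>α</sub>*(*z*) = Σ *z<sup>n</sup>*/Γ(*αn* + 1); for *α* = 1 it reduces to the exponential, recovering the familiar integer-order decay bound.

```python
import math

def mittag_leffler(z, alpha, terms=100):
    """Truncated series E_alpha(z) = sum_{n=0}^{terms-1} z^n / Gamma(alpha*n + 1)."""
    return sum(z**n / math.gamma(alpha * n + 1) for n in range(terms))

# For alpha = 1 the series is the exponential: E_1(z) = exp(z).
print(abs(mittag_leffler(-0.5, 1.0) - math.exp(-0.5)) < 1e-12)  # True

# For 0 < alpha < 1 and a negative argument the value stays in (0, 1),
# which is what keeps the bound on N(t) finite for all t >= 0.
print(0.0 < mittag_leffler(-1.0, 0.9) < 1.0)  # True
```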

### *4.2. Existence and Uniqueness*

In this section, we study the existence and uniqueness properties of the solution of Equation (1). First, we consider the following theorem to show Lipschitz continuity.

**Theorem 4.** *The kernels of Equation (1) satisfy the Lipschitz continuity for L<sup>i</sup>* ≥ 0, *i* = 1, 2, . . . , 4.

**Proof.** Let the kernel be defined as,

$$\begin{aligned} g\_1(t, S) &= \Lambda^{\alpha} - \frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \gamma^{\alpha}E(t) + \xi^{\alpha}I(t) - \mu^{\alpha}S(t), \\ g\_2(t, E) &= p\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} - (\gamma^{\alpha} + \eta^{\alpha} + \phi^{\alpha} + \mu^{\alpha})E(t), \\ g\_3(t, I) &= (1 - p)\frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} + \eta^{\alpha}E(t) - (\xi^{\alpha} + \delta^{\alpha} + q^{\alpha} + \mu^{\alpha})I(t), \\ g\_4(t, R) &= \delta^{\alpha}I(t) - \mu^{\alpha}R(t). \end{aligned} \tag{3}$$

Now,

$$\begin{split} \|g\_1(t, S) - g\_1(t, S^\*)\| &= \left\|\left(\frac{\beta^{\alpha}(1 - k^{\alpha})I(t)}{f(I)} + \mu^{\alpha}\right)(S - S^\*)\right\| \\ &\leq \left(|\mu^{\alpha}| + \beta^{\alpha}(1 - k^{\alpha}) \max\_{t \in [0,\, h^\*]} \left|\frac{I(t)}{f(I)}\right|\right) \|S - S^\*\| \\ &\leq L\_1 \|S - S^\*\|, \qquad L\_1 = |\mu^{\alpha}| + \beta^{\alpha}(1 - k^{\alpha}) \max\_{t \in [0,\, h^\*]} \left|\frac{I(t)}{f(I)}\right|. \end{split}$$

Hence,

$$\|g\_1(t, S) - g\_1(t, S^\*)\| \le L\_1 \|S - S^\*\|. \tag{4}$$

In a similar way, we obtain

$$\begin{array}{l} \|g\_2(t, E) - g\_2(t, E^\*)\| \le L\_2 \|E - E^\*\|, \\ \|g\_3(t, I) - g\_3(t, I^\*)\| \le L\_3 \|I - I^\*\|, \\ \|g\_4(t, R) - g\_4(t, R^\*)\| \le L\_4 \|R - R^\*\|, \end{array} \tag{5}$$

where

$$L\_2 = \left|\gamma^{\alpha} + \eta^{\alpha} + \phi^{\alpha} + \mu^{\alpha}\right|, \quad L\_3 = \left|\xi^{\alpha} + \delta^{\alpha} + q^{\alpha} + \mu^{\alpha}\right| + \beta^{\alpha}(1 - k^{\alpha})(1 - p) \max\_{t \in [0,\, h^\*]} \left|\frac{I(t)}{f(I)}\right|,$$

and *L*<sup>4</sup> = |*µ α* |.

The following Lemma converts the system to Volterra integral equations.

**Lemma 1.** *The continuous system (1) can be transformed to equivalent Volterra integral equations*.

**Proof.** Consider

$${}\_0^{C}D\_t^{\alpha}S(t) = g\_1(t, S(t)).$$

On integrating fractionally, we find

$$\begin{array}{c} {}\_0 I\_t^{\alpha}\left[{}\_0^{C}D\_t^{\alpha}S(t)\right] = {}\_0 I\_t^{\alpha}[g\_1(t, S(t))], \\ S(t) - S(0) = \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} g\_1(\tau, S(\tau))\,d\tau, \\ S(t) = S(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} g\_1(\tau, S(\tau))\,d\tau. \end{array} \tag{6}$$

Similarly,

$$\begin{array}{l} E(t) = E(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t-\tau)^{\alpha-1} g\_2(\tau, E(\tau))\,d\tau, \\ I(t) = I(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t-\tau)^{\alpha-1} g\_3(\tau, I(\tau))\,d\tau, \\ R(t) = R(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t-\tau)^{\alpha-1} g\_4(\tau, R(\tau))\,d\tau. \end{array} \tag{7}$$

The following theorem provides the existence of the unique solution.

**Theorem 5.** *Let* 0 < *α* < 1, *I* = [0, *h*∗] ⊆ R*, and J* = {*S* : |*S*(*t*) − *S*(0)| ≤ *k*<sub>1</sub>}*. Let g*<sub>1</sub> : *I* × *J* → R *be a continuous bounded function; that is, there exists M*<sub>1</sub> > 0 *such that* |*g*<sub>1</sub>(*t*, *S*)| ≤ *M*<sub>1</sub>*. Assume that g*<sub>1</sub> *satisfies the Lipschitz condition. If L*<sub>1</sub>*k*<sub>1</sub> < *M*<sub>1</sub>*, then there exists a unique S* ∈ *C*[0, *h*∗]*, where*

$$h^\* = \min\left\{h, \left(\frac{k\_1 \Gamma(\alpha + 1)}{M\_1}\right)^{\frac{1}{\alpha}}\right\},$$

*that satisfies Equation (6).*

**Proof.** Let

$$T = \{S \in C[0, h^\*] : \|S(t) - S(0)\| \le k\_1\}.$$

Since *T* is a closed subset of the complete metric space *C*[0, *h*∗], *T* is itself a complete metric space. Recall that

$$S(t) = S(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} g\_1(\tau, S(\tau))\,d\tau. \tag{8}$$

Define operator *F* in *T* by,

$$F(S(t)) = S(0) + \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} g\_1(\tau, S(\tau))\,d\tau. \tag{9}$$

To show that *F* satisfies the hypotheses of Theorem 1, we first show that *F* maps *T* into itself:

$$\begin{split} |F(S(t)) - S(0)| &= \left| \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} g\_1(\tau, S(\tau))\,d\tau \right| \\ &\leq \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} |g\_1(\tau, S(\tau))|\,d\tau \\ &\leq \frac{M\_1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1}\,d\tau = \frac{M\_1}{\Gamma(\alpha + 1)} t^{\alpha} \\ &\leq \frac{M\_1}{\Gamma(\alpha + 1)} (h^\*)^{\alpha} \leq \frac{M\_1}{\Gamma(\alpha + 1)} \cdot \frac{k\_1 \Gamma(\alpha + 1)}{M\_1} = k\_1. \end{split}$$

Hence,

$$|F(S(t)) - S(0)| \le k\_1. \tag{10}$$

Similarly,

$$\begin{array}{l} |F(E(t)) - E(0)| \le k\_2, \\ |F(I(t)) - I(0)| \le k\_3, \\ |F(R(t)) - R(0)| \le k\_4. \end{array} \tag{11}$$

Therefore, *F* maps *T* into itself. Secondly, to show that *F* is a contraction, we have

$$F(\mathcal{S}) - F(\mathcal{S}^\*) = \mathcal{S}(0) - \mathcal{S}^\*(0) + \frac{1}{\Gamma(\mathfrak{a})} \int\_0^t (t - \tau)^{\mathfrak{a} - 1} [\mathfrak{g}\_1(\tau, \mathcal{S}(\tau)) - \mathfrak{g}\_1(\tau, \mathcal{S}^\*(\tau))] d\tau.$$

Since *S*(0) = *S*∗(0),

$$\begin{split} |F(S) - F(S^\*)| &= \left| \frac{1}{\Gamma(a)} \int\_0^t (t - \tau)^{a - 1} [g\_1(\tau, S(\tau)) - g\_1(\tau, S^\*(\tau))] d\tau \right| \\ &\leq \frac{1}{\Gamma(a)} \int\_0^t (t - \tau)^{a - 1} |g\_1(\tau, S(\tau)) - g\_1(\tau, S^\*(\tau))| d\tau. \end{split}$$

$$\begin{split} &\leq \frac{1}{\Gamma(\alpha)} \int\_0^t (t - \tau)^{\alpha - 1} L\_1 \|S - S^\*\|\,d\tau \\ &= \frac{L\_1}{\Gamma(\alpha)} \|S - S^\*\| \int\_0^t (t - \tau)^{\alpha - 1}\,d\tau = \frac{L\_1}{\Gamma(\alpha + 1)} \|S - S^\*\|\, t^{\alpha} \\ &\leq \frac{L\_1}{\Gamma(\alpha + 1)} \|S - S^\*\| (h^\*)^{\alpha} \leq \frac{L\_1}{\Gamma(\alpha + 1)} \|S - S^\*\| \cdot \frac{k\_1 \Gamma(\alpha + 1)}{M\_1} = \frac{L\_1 k\_1}{M\_1} \|S - S^\*\|. \end{split}$$

Hence,

$$\|F(S) - F(S^\*)\| \le \frac{L\_1 k\_1}{M\_1} \|S - S^\*\|. \tag{12}$$

Since, by hypothesis, *L*<sub>1</sub>*k*<sub>1</sub>/*M*<sub>1</sub> < 1, *F* is contractive and has a unique fixed point. In a similar way, we obtain

$$\begin{aligned} \|F(E) - F(E^\*)\| &\le \frac{L\_2 k\_2}{M\_2} \|E - E^\*\|, \\ \|F(I) - F(I^\*)\| &\le \frac{L\_3 k\_3}{M\_3} \|I - I^\*\|, \\ \|F(R) - F(R^\*)\| &\le \frac{L\_4 k\_4}{M\_4} \|R - R^\*\|. \end{aligned} \tag{13}$$

Thus, Equation (1) has a unique solution.

### *4.3. Existence of Equilibrium Solutions*

Since *R* = *N* − (*S* + *E* + *I*), we can consider the first three equations in Equation (1) for this analysis.

Setting the first three equations in Equation (1) to zero and solving simultaneously, we find the following equilibrium solutions;

I. Disease-free equilibrium (*E*0) is given as;

$$E\_0 = \left( S^0, 0, 0 \right) = \left( \frac{\Lambda^a}{\mu^a}, 0, 0 \right).$$

II. Endemic equilibrium (*E*1) is given as;

$$E\_1 = \left(S^1, E^1, I^1\right),$$

where

$$\begin{aligned} S^1 &= \frac{A\_1 A\_2 f(I^1)}{\beta^{\alpha}(1 - k^{\alpha})[(1 - p)A\_1 + \eta^{\alpha} p]}, \\ E^1 &= \frac{p A\_2 I^1}{(1 - p)A\_1 + \eta^{\alpha} p}, \end{aligned}$$

and *I* 1 can be obtained by solving,

$$\Lambda^{\alpha} - \frac{\beta^{\alpha}(1 - k^{\alpha})SI}{f(I)} - \mu^{\alpha}S + \gamma^{\alpha}E + \xi^{\alpha}I = 0,$$

where

$$A\_1 = \gamma^{\alpha} + \eta^{\alpha} + \phi^{\alpha} + \mu^{\alpha}, \quad \text{and} \quad A\_2 = \xi^{\alpha} + \delta^{\alpha} + q^{\alpha} + \mu^{\alpha}.$$

Simplifying, we find

$$G(I) = \Lambda^{\alpha} - I\left[\frac{A\_1 A\_2}{(1 - p)A\_1 + \eta^{\alpha} p} - \left(\frac{\gamma^{\alpha} p A\_2}{(1 - p)A\_1 + \eta^{\alpha} p} + \xi^{\alpha}\right)\right] - \frac{\mu^{\alpha} A\_1 A\_2 f(I)}{\beta^{\alpha}(1 - k^{\alpha})[(1 - p)A\_1 + \eta^{\alpha} p]} = 0.$$

Since,

$$\frac{A\_1 A\_2}{(1 - p)A\_1 + \eta^{\alpha} p} - \left(\frac{\gamma^{\alpha} p A\_2}{(1 - p)A\_1 + \eta^{\alpha} p} + \xi^{\alpha}\right) > 0, \quad \text{and} \quad f' \ge 0,$$

then *G* is a decreasing function of *I*.

In addition,

$$G(I) < \Lambda^a - I \left[ \frac{A\_1 A\_2}{(1-p)A\_1 + \eta^a p} - \left( \frac{\gamma^a p A\_2}{(1-p)A\_1 + \eta^a p} + \xi^a \right) \right],$$

then,

$$\lim\_{I \to \infty} G(I) = -\infty.$$

Since *f*(0) = 1, clearly

$$G(0) = \Lambda^{\alpha} - \frac{\mu^{\alpha} A\_1 A\_2}{\beta^{\alpha}(1 - k^{\alpha})[(1 - p)A\_1 + \eta^{\alpha} p]} = \Lambda^{\alpha}\left[1 - \frac{\mu^{\alpha} A\_1 A\_2}{\beta^{\alpha}(1 - k^{\alpha})\Lambda^{\alpha}[(1 - p)A\_1 + \eta^{\alpha} p]}\right].$$

Thus, a unique positive zero of *G* exists if and only if *G*(0) > 0, i.e., if

$$\frac{\mu^{\alpha} A\_1 A\_2}{\beta^{\alpha}(1 - k^{\alpha})\Lambda^{\alpha}[(1 - p)A\_1 + \eta^{\alpha} p]} < 1.$$

We shall establish, in the subsequent section, that

$$\frac{\beta^{\alpha}(1 - k^{\alpha})\Lambda^{\alpha}[(1 - p)A\_1 + \eta^{\alpha} p]}{\mu^{\alpha} A\_1 A\_2} = R\_0,$$

where *R*<sub>0</sub> is the basic reproduction ratio; hence, the endemic equilibrium exists if and only if *R*<sub>0</sub> > 1.
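Since *G* is continuous and decreasing with *G*(0) > 0 and *G*(*I*) → −∞ as *I* → ∞, the endemic level *I*<sup>1</sup> can be bracketed and located by bisection. The sketch below assembles *G* from the equilibrium relations with *f*(*I*) = 1 + *I*<sup>2</sup>; all parameter values (and the derived constants *A*<sub>1</sub>, *A*<sub>2</sub>) are hypothetical.

```python
# Hypothetical parameter values, each already raised to the power alpha.
Lam, beta_eff = 0.536, 0.335      # beta_eff = beta^a * (1 - k^a)
gamma, xi, mu, p, eta = 0.126, 0.0676, 0.0296, 0.6, 0.181
A1, A2 = 0.352, 0.239             # A1 = gamma+eta+phi+mu, A2 = xi+delta+q+mu
D = (1 - p) * A1 + eta * p        # common denominator (1-p)A1 + eta*p

def G(I, f=lambda I: 1 + I**2):
    """G(I) from Section 4.3; its positive zero is the endemic infection level I^1."""
    return (Lam
            - I * (A1 * A2 / D - (gamma * p * A2 / D + xi))
            - mu * A1 * A2 * f(I) / (beta_eff * D))

def bisect(fun, lo, hi, tol=1e-10):
    """Plain bisection; assumes fun(lo) and fun(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if fun(lo) * fun(mid) > 0 else (lo, mid)
    return 0.5 * (lo + hi)

I_star = bisect(G, 0.0, 10.0)
print(G(0.0) > 0 and abs(G(I_star)) < 1e-8)  # True: a unique endemic root exists
```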

### *4.4. Basic Reproduction Ratio*

This is defined as the number of secondary cases caused by a single infected individual in a population of susceptibles [37]. Here, we use the idea of the next-generation matrix as in [39]. Let

$$Y = \begin{bmatrix} p \frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} \\ (1 - p) \frac{\beta^{\alpha}(1 - k^{\alpha})S(t)I(t)}{f(I)} \end{bmatrix}, \quad \text{and} \quad Z = \begin{bmatrix} A\_1 E \\ -\eta^{\alpha} E + A\_2 I \end{bmatrix},$$

where *Y* represents the new infections and *Z* represents the remaining terms in Equation (1). Then,

$$Y(E\_0) = \begin{bmatrix} 0 & p\beta^{\alpha}(1 - k^{\alpha})S^0 \\ 0 & (1 - p)\beta^{\alpha}(1 - k^{\alpha})S^0 \end{bmatrix}, \quad \text{and} \quad Z(E\_0) = \begin{bmatrix} A\_1 & 0 \\ -\eta^{\alpha} & A\_2 \end{bmatrix},$$

where *Y*(*E*<sub>0</sub>) is the Jacobian of *Y* at *E*<sub>0</sub> and *Z*(*E*<sub>0</sub>) is the Jacobian of *Z* at *E*<sub>0</sub>. Therefore,

$$X = Y(E\_0)Z(E\_0)^{-1} = \frac{1}{A\_1 A\_2}\begin{bmatrix} \eta^{\alpha} p \beta^{\alpha}(1 - k^{\alpha})S^0 & A\_1 p \beta^{\alpha}(1 - k^{\alpha})S^0 \\ \eta^{\alpha}(1 - p)\beta^{\alpha}(1 - k^{\alpha})S^0 & A\_1(1 - p)\beta^{\alpha}(1 - k^{\alpha})S^0 \end{bmatrix}.$$

Note: *X* is the next-generation matrix, and the dominant eigenvalue of *X* is the basic reproduction ratio (*R*<sub>0</sub>). Since the two rows of *X* are proportional, *X* has rank one and its dominant eigenvalue equals its trace, so

$$R\_0 = \frac{\beta^{\alpha}(1 - k^{\alpha})S^0[\eta^{\alpha} p + (1 - p)A\_1]}{A\_1 A\_2} = \frac{\beta^{\alpha}(1 - k^{\alpha})\Lambda^{\alpha}[(1 - p)A\_1 + \eta^{\alpha} p]}{\mu^{\alpha} A\_1 A\_2}.$$

### *4.5. Stability Analysis of the Equilibria*

In this section, we carry out a local stability analysis of both the disease-free and endemic equilibrium points. First, consider the following Jacobian matrix from Equation (1):

$$J = \begin{bmatrix} -\frac{\beta^{\alpha}(1 - k^{\alpha})I}{f(I)} - \mu^{\alpha} & \gamma^{\alpha} & -\frac{\beta^{\alpha}(1 - k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) + \xi^{\alpha} \\ \frac{p\beta^{\alpha}(1 - k^{\alpha})I}{f(I)} & -A\_1 & \frac{p\beta^{\alpha}(1 - k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) \\ \frac{(1 - p)\beta^{\alpha}(1 - k^{\alpha})I}{f(I)} & \eta^{\alpha} & \frac{(1 - p)\beta^{\alpha}(1 - k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) - A\_2 \end{bmatrix}. \tag{14}$$

**Theorem 6.** *The disease-free equilibrium* (*E*<sub>0</sub>) *is locally asymptotically stable if R*<sub>0</sub> < 1.

**Proof.** Consider Equation (14) at *E*<sub>0</sub>. Since *f*(0) = 1 and *f*′(0) = 0, we have

$$J(E\_0) = \begin{bmatrix} -\mu^{\alpha} & \gamma^{\alpha} & -\beta^{\alpha}(1 - k^{\alpha})S^0 + \xi^{\alpha} \\ 0 & -A\_1 & p\beta^{\alpha}(1 - k^{\alpha})S^0 \\ 0 & \eta^{\alpha} & (1 - p)\beta^{\alpha}(1 - k^{\alpha})S^0 - A\_2 \end{bmatrix}.$$

One eigenvalue is *λ*<sub>1</sub> = −*µ<sup>α</sup>* < 0; the remaining eigenvalues are those of the lower-right 2 × 2 block. In echelon form, we find that the determinant of this block is

$$A\_1 A\_2 - \beta^{\alpha}(1 - k^{\alpha})S^0[(1 - p)A\_1 + \eta^{\alpha} p] = A\_1 A\_2 (1 - R\_0),$$

which is positive when *R*<sub>0</sub> < 1. Moreover, *R*<sub>0</sub> < 1 implies (1 − *p*)*β<sup>α</sup>*(1 − *k<sup>α</sup>*)*S*<sup>0</sup> < *A*<sub>2</sub>, so the trace of the block is negative. Hence, both of its eigenvalues have negative real parts, all eigenvalues of *J*(*E*<sub>0</sub>) satisfy |arg(*λ<sub>i</sub>*)| > *απ*/2, and by Theorem 2, *E*<sub>0</sub> is locally asymptotically stable.

 

(1 −

−<sup>1</sup>

(1 −

)

0

(1 − )

0 −<sup>1</sup> (1 −

(1 − )

0 −<sup>1</sup> (1 −

) 0

(1 −

−µ 

0

0 −<sup>1</sup> (1 −

(1 − )

<sup>0</sup> − 1<sup>2</sup> +

<sup>0</sup> −<sup>2</sup>

 

(1 −

].

−<sup>1</sup>

].

) <sup>0</sup> + 

(1 −

(1 −

<sup>0</sup> − 1<sup>2</sup> +

0 −<sup>1</sup> (1 −

(1 − )

(1 −

(1 −

) 0

<sup>0</sup> − 1<sup>2</sup> +

0 −<sup>1</sup> (1 −

(1 − )

) <sup>0</sup> + 

, then we have

(1 −

(1 −

)

) <sup>0</sup> + 

, then we have

(1 −

) 0

(1 −

(1 −

(1 −

)

)

<sup>0</sup> −<sup>2</sup>

()

) 0

) 0

) 0

(1 − )

)

]. (15)

*) is locally asymptotically stable if* <sup>0</sup> < 1.

*) is locally asymptotically stable if* <sup>0</sup> < 1.

(1 −

) <sup>0</sup> + 

) 0

)

) 0

<sup>0</sup> − <sup>1</sup><sup>2</sup> +

) 0

) <sup>0</sup> + 

<sup>0</sup> − 1<sup>2</sup> +

].

)

) 0

) 0

)

) 0

<sup>0</sup> −<sup>2</sup>

]. (15)

].

].

<sup>0</sup> −<sup>2</sup>

(1 −

(1 −

) 0

) 0

]. (15)

]. (15)

]. (15)

]. (15)

(1 −

(1 −

(1 −

) <sup>0</sup> + 

(1 −

) <sup>0</sup> + 

].

<sup>0</sup> −<sup>2</sup>

()

()

(1 −

()

(1 −

(1 −

()

)+ 

(1 −

(1 −

Note: is the next generation matrix and the dominant eigenvalue of is the basic

)+ 

)

) +

()

(1 −

()

(1 −

)− <sup>2</sup> ] 

(1 − ′ () () )

(1− ′ () ()

()

(1 −

(1 − ′ () ()

(1− ′ () ()

(1 − ′ () () )

In this section, we carry out a local stability analysis of both disease-free and en-

)− <sup>2</sup> ] 

In this section, we carry out a local stability analysis of both disease-free and en-

)

)

*) is locally asymptotically stable if* <sup>0</sup> < 1.

 (1 −

)

)

) +

)

*) is locally asymptotically stable if* <sup>0</sup> < 1.

−

 (1 −

) 0

) <sup>0</sup> + 

(1 − )

)

(1 − ′ () ()

. (14)

µ <sup>2</sup> (1 −)(1 −

)+ 

)+ 

)− <sup>2</sup> ] 

(1 − ′ () ()

)

)

(1 − ′ () () )

(1 − ′ () () )

(1− ′ () ()

(1 − ′ () ()

(1− ′ () ()

)

)

)

)− <sup>2</sup> ] 

) .

) .

. (14)

. (14)

)+ 

)+ 

)− <sup>2</sup> ] 

)− <sup>2</sup> ] 

. (14)

. (14)

. (14)

µ <sup>2</sup> (1 −)(1 −

()

()

First, consider the following Jacobian matrix from Equation (1);

).

(1 −

<sup>0</sup> =

(1 −

<sup>0</sup> =

()

 

)

−<sup>1</sup>

()

 

−<sup>1</sup>

()

(1 −

(1 −

()

(1 −

(1 −

(1 −

()

(1 −

()

()

(1 −

(1 −

(1 −

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

0

()

0

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

**Proof**. Consider Equation (14) at <sup>0</sup>

)

(1 − )

(<sup>0</sup>

0

0

−µ 

)

(1 − )

(<sup>0</sup>

0 −<sup>1</sup> (1 −

) = [

<sup>1</sup>

<sup>1</sup>

0 −<sup>1</sup> (1 −

) <sup>0</sup> + 

)

− µ 

> ) 0

, then we have

)

)

) <sup>0</sup> + 

(1 − )

0 −<sup>1</sup> (1 −

<sup>0</sup> − 1<sup>2</sup> +

(1 −

) = [

−µ 

) 0

(1 −

(1 −

) <sup>0</sup> + 

) <sup>0</sup> + 

)

) ()<sup>−</sup> <sup>µ</sup>

, then we have

()

−

)

) +

(1 −

) .

µ <sup>2</sup> (1 −)(1 −

µ <sup>2</sup> (1 −)(1 −

(1 −

) .

)

(1 −

(1 −

(1 −)

(1 −)

 

)

)

(1 −

Note: is the next generation matrix and the dominant eigenvalue of is the basic

(1 −

(1 −

(1 −

Note: is the next generation matrix and the dominant eigenvalue of is the basic

) is the Jacobian of Ƴ at 0and (<sup>0</sup>

µ 1<sup>2</sup>

(1 −

where (<sup>0</sup>

)

− µ 

demic equilibrium points.

(1 −

=

[ − (1 −

)

− µ 

*4.5. Stability Analysis of the Equilibria*

)

)

)

(1 − )

[ − (1 −

()

()

(1 −

(1 −

()

()

(1 − )

(1 − )

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

**Proof**. Consider Equation (14) at <sup>0</sup>

**Proof**. Consider Equation (14) at <sup>0</sup>

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

In echelon form, we find

−µ 

−µ 

0

) = [

In echelon form, we find

(<sup>0</sup>

) = [

(<sup>0</sup>

()

(1 −

)

demic equilibrium points.

()

=

(<sup>0</sup>

) = [

(<sup>0</sup>

0

(<sup>0</sup>

) = [

−µ 

**Proof**. Consider Equation (14) at <sup>0</sup>

<sup>1</sup>

) = [

−µ 

In echelon form, we find

In echelon form, we find

<sup>1</sup>

) = [

(<sup>0</sup>

−µ 

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

0

**Proof**. Consider Equation (14) at <sup>0</sup>

0

(<sup>0</sup>

(1 − )

−µ 

0

=

(<sup>0</sup>

0 −<sup>1</sup> (1 −

(1 − )

0

(<sup>0</sup>

0 −<sup>1</sup> (1 −

) = [

[ − (1 −

µ 1<sup>2</sup>

(1 −

where (<sup>0</sup>

).

First, consider the following Jacobian matrix from Equation (1);

reproduction ratio (<sup>0</sup>

Here,

<sup>0</sup> =

Therefore,

First, consider the following Jacobian matrix from Equation (1);

Here,

*4.5. Stability Analysis of the Equilibria*

−<sup>1</sup>

)

−<sup>1</sup>

 

()

()

**Theorem 6**.*The disease-free equilibrium (*<sup>0</sup>

(1 −

(1 − )

()

 

(1 −

()

(1 −

demic equilibrium points.

()

()

).

).

<sup>0</sup> =

reproduction ratio (<sup>0</sup>

<sup>0</sup> =

Here,

reproduction ratio (<sup>0</sup>

Here,

*4.5. Stability Analysis of the Equilibria*

*4.5. Stability Analysis of the Equilibria*

demic equilibrium points.

demic equilibrium points.

=

[ − (1 −

=

[ − (1 −

Note: *X* is the next generation matrix and the dominant eigenvalue of *X* is the basic reproduction ratio (*R*0).

Here,

$$R_0 = \frac{\eta^{\alpha}p\beta^{\alpha}(1-k^{\alpha})\Lambda^{\alpha}}{\mu^{\alpha}A_1A_2} + \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})\Lambda^{\alpha}}{\mu^{\alpha}A_2}.$$
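As a quick numerical sanity check, the dominant eigenvalue of *X* can be computed directly: since *X* has rank one, its spectral radius equals its trace. The sketch below uses arbitrary placeholder values for *η<sup>α</sup>*, *p*, *β<sup>α</sup>*(1 − *k<sup>α</sup>*), *S*<sup>0</sup>, *A*1, and *A*<sup>2</sup> (illustrative assumptions, not the paper's parameters):

```python
import numpy as np

# Placeholder values (illustrative assumptions, not the paper's calibration).
eta, p, bks, S0 = 0.2, 0.4, 0.05, 300.0  # bks stands for beta^alpha * (1 - k^alpha)
A1, A2 = 0.23, 0.14

# New-infection matrix Y and transition matrix Z evaluated at the
# disease-free equilibrium, as in the derivation above.
Y = np.array([[0.0, p * bks * S0],
              [0.0, (1 - p) * bks * S0]])
Z = np.array([[A1, 0.0],
              [-eta, A2]])

X = Y @ np.linalg.inv(Z)              # next-generation matrix
rho = max(abs(np.linalg.eigvals(X)))  # dominant eigenvalue (spectral radius)

# X has rank one, so its spectral radius is simply its trace.
assert np.isclose(rho, np.trace(X))
```

With *S*<sup>0</sup> = Λ*<sup>α</sup>*/*µ<sup>α</sup>*, trace(*X*) expands into the sum of the two terms *η<sup>α</sup>pβ<sup>α</sup>*(1 − *k<sup>α</sup>*)*S*<sup>0</sup>/(*A*1*A*2) and (1 − *p*)*β<sup>α</sup>*(1 − *k<sup>α</sup>*)*S*<sup>0</sup>/*A*2.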

### *4.5. Stability Analysis of the Equilibria*

In this section, we carry out a local stability analysis of both disease-free and endemic equilibrium points.

First, consider the following Jacobian matrix obtained from Equation (1):

$$J = \begin{bmatrix} -\frac{\beta^{\alpha}(1-k^{\alpha})I}{f(I)} - \mu^{\alpha} & \gamma^{\alpha} & -\frac{\beta^{\alpha}(1-k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) + \xi^{\alpha} \\ \frac{p\beta^{\alpha}(1-k^{\alpha})I}{f(I)} & -A_1 & \frac{p\beta^{\alpha}(1-k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) \\ \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})I}{f(I)} & \eta^{\alpha} & \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})S}{f(I)}\left(1 - \frac{I f'(I)}{f(I)}\right) - A_2 \end{bmatrix}. \tag{14}$$

**Theorem 6.** *The disease-free equilibrium (E*0*) is locally asymptotically stable if R*<sup>0</sup> < 1.

**Proof.** Consider Equation (14) at *E*0, then we have

$$J(E_0) = \begin{bmatrix} -\mu^{\alpha} & \gamma^{\alpha} & \beta^{\alpha}(1-k^{\alpha})S^0 + \xi^{\alpha} \\ 0 & -A_1 & p\beta^{\alpha}(1-k^{\alpha})S^0 \\ 0 & \eta^{\alpha} & (1-p)\beta^{\alpha}(1-k^{\alpha})S^0 - A_2 \end{bmatrix}.$$

In echelon form, we find

$$J(E_0) = \begin{bmatrix} -\mu^{\alpha} & \gamma^{\alpha} & \beta^{\alpha}(1-k^{\alpha})S^0 + \xi^{\alpha} \\ 0 & -A_1 & p\beta^{\alpha}(1-k^{\alpha})S^0 \\ 0 & 0 & A_1(1-p)\beta^{\alpha}(1-k^{\alpha})S^0 - A_1A_2 + \eta^{\alpha}p\beta^{\alpha}(1-k^{\alpha})S^0 \end{bmatrix}. \tag{15}$$

The eigenvalues of Equation (15) are:

$$
\lambda_1 = -\mu^{\alpha} < 0,\quad \lambda_2 = -A_1 < 0,\quad \text{and}\ \lambda_3 = A_1A_2(R_0 - 1) < 0\ \text{if}\ R_0 < 1.
$$

Hence, *E*<sup>0</sup> is locally asymptotically stable if *R*<sup>0</sup> < 1.
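The conclusion of Theorem 6 can be spot-checked numerically: for any parameter choice with *R*<sup>0</sup> < 1, every eigenvalue of *J*(*E*0) should lie in the open left half-plane. All numeric values below are illustrative assumptions (not the paper's calibration), chosen only so that *R*<sup>0</sup> < 1:

```python
import numpy as np

# Illustrative, assumed values only.
mu, gamma, xi, eta = 0.02, 0.01, 0.02, 0.2
p, A1, A2 = 0.4, 0.23, 0.14
x = 0.1  # x stands for beta^alpha * (1 - k^alpha) * S^0

# Basic reproduction ratio for these values (here R0 < 1).
R0 = eta * p * x / (A1 * A2) + (1 - p) * x / A2
assert R0 < 1

# Jacobian at the disease-free equilibrium, with the same structure as J(E0).
J0 = np.array([[-mu, gamma, x + xi],
               [0.0, -A1, p * x],
               [0.0, eta, (1 - p) * x - A2]])

# Local asymptotic stability: all eigenvalues have negative real parts.
assert max(np.linalg.eigvals(J0).real) < 0
```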

**Theorem 7.** *The endemic equilibrium (E*1*) is locally asymptotically stable if R*<sup>0</sup> > 1.

**Proof.** Consider Equation (14) at *E*1, then we find

$$J(E_1) = \begin{bmatrix} -\frac{\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)} - \mu^{\alpha} & \gamma^{\alpha} & -\frac{\beta^{\alpha}(1-k^{\alpha})S^1}{f(I^1)}\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right) + \xi^{\alpha} \\ \frac{p\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)} & -A_1 & \frac{p\beta^{\alpha}(1-k^{\alpha})S^1}{f(I^1)}\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right) \\ \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)} & \eta^{\alpha} & \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})S^1}{f(I^1)}\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right) - A_2 \end{bmatrix}.$$

After computing the echelon form,

$$\begin{aligned} \mathrm{trace}(J(E_1)) = {} & -\big(\mu^{\alpha}A_1 + p(\mu^{\alpha}+\phi^{\alpha})\big)\frac{\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)}\big((1-p)A_1 + p\eta^{\alpha}\big) \\ & - \frac{(1-p)\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)}\big(\mu^{\alpha}+q^{\alpha}+\delta^{\alpha}\big) \\ & - A_2\mu^{\alpha}\left[\big((1-p)A_1 + p\eta^{\alpha}\big) - (1-p)A_1A_2\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right)\right], \end{aligned}$$

and

$$\begin{aligned} \det(J(E_1)) = {} & \left(\mu^{\alpha}A_1 + \frac{p\beta^{\alpha}(1-k^{\alpha})(\mu^{\alpha}+\phi^{\alpha})I^1}{f(I^1)}\right)\Big[(\mu^{\alpha}+q^{\alpha}+\delta^{\alpha})(1-p)\frac{\beta^{\alpha}(1-k^{\alpha})I^1}{f(I^1)} \\ & + A_2\mu^{\alpha}\big((1-p)A_1 + p\eta^{\alpha}\big) - \mu^{\alpha}(1-p)A_1A_2\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right)\Big] \\ & + \big(\mu^{\alpha}(1-p) + \mu^{\alpha}\big)\left[\frac{\beta^{\alpha}\mu^{\alpha}A_1A_2\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right)(1-k^{\alpha})I^1}{\big((1-p)A_1 + p\eta^{\alpha}\big) + f(I^1)\,p(\mu^{\alpha}+q^{\alpha}+\delta^{\alpha})}\right]. \end{aligned}$$

Clearly, trace(*J*(*E*1)) < 0 and det(*J*(*E*1)) > 0 if $\big((1-p)A_1 + p\eta^{\alpha}\big) - \mu^{\alpha}(1-p)A_1A_2\left(1 - \frac{I^1 f'(I^1)}{f(I^1)}\right) > 0$, and this condition is equivalent to *R*<sup>0</sup> > 1. Hence, *E*<sup>1</sup> is locally asymptotically stable if *R*<sup>0</sup> > 1.

### **5. Numerical Simulations**

This section is devoted to testing the performance of the proposed fractional-order model (1) under the Caputo differential operator using an explicit numerical technique, the Adams–Bashforth–Moulton method, also known as the fractional predictor–corrector method, introduced and analyzed for its convergence and error bounds in [40,41]; *s* = min(1 + *α*, 2) is the order of accuracy of the numerical technique. It is also worth mentioning that, unlike the well-established numerical techniques for classical initial value problems, the present literature on numerical techniques for fractional-order differential equations is not yet as rich.

In this research, the parameter values used are from [28]: Λ = 6, *β* = 10, *µ* = 0.02, *γ* = 0.01, *ξ* = 0.02, *φ* = 0.01, *q* = 0.02, *η* = 0.2, *δ* = 0.1, *p* ∈ [0, 1], *k* = 0.2, and *α* ∈ (0, 1]. Consider an equispaced mesh *t<sup>i</sup>* = *i*∆*t*, *i* = 0, 1, . . . , *M*, where *M* is a positive integer and ∆*t* = *T*/*M*, with *T* the upper limit of the closed interval of integration [0, *T*]. This setting leads to the following structure for the predictor part required by the numerical technique under consideration:

$$\begin{aligned} S^s_{n+1} &= S(0) + \sum_{i=1}^{n} b_{\alpha,i,n+1}\, g_1(t_i, S_i, E_i, I_i, R_i),\\ E^s_{n+1} &= E(0) + \sum_{i=1}^{n} b_{\alpha,i,n+1}\, g_2(t_i, S_i, E_i, I_i, R_i),\\ I^s_{n+1} &= I(0) + \sum_{i=1}^{n} b_{\alpha,i,n+1}\, g_3(t_i, S_i, E_i, I_i, R_i),\\ R^s_{n+1} &= R(0) + \sum_{i=1}^{n} b_{\alpha,i,n+1}\, g_4(t_i, S_i, E_i, I_i, R_i). \end{aligned} \tag{16}$$

Similarly, for the corrector, we have

$$\begin{aligned} S^u_{n+1} &= S(0) + a_{\alpha,n+1,n+1}\, g_1\big(t_{n+1}, S^s_{n+1}, E^s_{n+1}, I^s_{n+1}, R^s_{n+1}\big) + \sum_{i=1}^{n} a_{\alpha,i,n+1}\, g_1(t_i, S_i, E_i, I_i, R_i),\\ E^u_{n+1} &= E(0) + a_{\alpha,n+1,n+1}\, g_2\big(t_{n+1}, S^s_{n+1}, E^s_{n+1}, I^s_{n+1}, R^s_{n+1}\big) + \sum_{i=1}^{n} a_{\alpha,i,n+1}\, g_2(t_i, S_i, E_i, I_i, R_i),\\ I^u_{n+1} &= I(0) + a_{\alpha,n+1,n+1}\, g_3\big(t_{n+1}, S^s_{n+1}, E^s_{n+1}, I^s_{n+1}, R^s_{n+1}\big) + \sum_{i=1}^{n} a_{\alpha,i,n+1}\, g_3(t_i, S_i, E_i, I_i, R_i),\\ R^u_{n+1} &= R(0) + a_{\alpha,n+1,n+1}\, g_4\big(t_{n+1}, S^s_{n+1}, E^s_{n+1}, I^s_{n+1}, R^s_{n+1}\big) + \sum_{i=1}^{n} a_{\alpha,i,n+1}\, g_4(t_i, S_i, E_i, I_i, R_i). \end{aligned} \tag{17}$$

where

$$a_{\alpha,n+1,n+1} = \frac{(\Delta t)^{\alpha}}{\Gamma(\alpha+2)}, \text{ and } b_{\alpha,i,n+1} = \frac{(\Delta t)^{\alpha}}{\Gamma(\alpha+1)}\left[(n-i+1)^{\alpha} - (n-i)^{\alpha}\right],$$

where the superscript *s* denotes the predictor and *u* denotes the corrector.

Additionally,

$$a_{\alpha,i,n+1} = \frac{(\Delta t)^{\alpha}}{\Gamma(\alpha+2)} \begin{cases} n^{\alpha+1} - (n-\alpha)(n+1)^{\alpha}, & i = 0,\\ (n-i+2)^{\alpha+1} - 2(n-i+1)^{\alpha+1} + (n-i)^{\alpha+1}, & 1 \le i \le n,\\ 1, & i = n+1. \end{cases}$$
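To make the predictor–corrector recipe concrete, the following minimal sketch applies the same scheme to a single scalar Caputo equation *D<sup>α</sup>y* = *g*(*t*, *y*) (the model applies the analogous update with *g*1, . . . , *g*4). The weights follow the standard presentation of the fractional Adams–Bashforth–Moulton method in the literature; this is an illustration under those assumptions, not the authors' code:

```python
import math

def abm_caputo(g, y0, alpha, T, N):
    """Fractional Adams-Bashforth-Moulton (predictor-corrector) scheme for
    the scalar Caputo equation D^alpha y(t) = g(t, y), 0 < alpha <= 1."""
    h = T / N
    t = [i * h for i in range(N + 1)]
    y = [y0]
    for n in range(N):
        # Predictor: fractional Adams-Bashforth (product rectangle rule).
        pred = y0 + sum(
            (h**alpha / alpha) * ((n + 1 - j)**alpha - (n - j)**alpha)
            * g(t[j], y[j]) for j in range(n + 1)) / math.gamma(alpha)

        # Corrector weights a_{j,n+1} (product trapezoidal rule).
        def a(j):
            if j == 0:
                return n**(alpha + 1) - (n - alpha) * (n + 1)**alpha
            return ((n - j + 2)**(alpha + 1) + (n - j)**(alpha + 1)
                    - 2 * (n - j + 1)**(alpha + 1))

        y.append(y0 + (h**alpha / math.gamma(alpha + 2)) * (
            g(t[n + 1], pred)
            + sum(a(j) * g(t[j], y[j]) for j in range(n + 1))))
    return t, y

# Sanity check: for alpha = 1 the scheme reduces to the classical Euler
# predictor / trapezoidal corrector, so D y = -y, y(0) = 1 gives exp(-t).
t, y = abm_caputo(lambda t, y: -y, 1.0, 1.0, 2.0, 200)
assert abs(y[-1] - math.exp(-2.0)) < 1e-3
```

Each step reuses the full history of *g* values, so the cost grows quadratically with the number of mesh points; this is the price of the memory inherent in the Caputo operator.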

The behavior of the Susceptible *S*(*t*), Exposed *E*(*t*), Infected *I*(*t*), and Recovered *R*(*t*) populations can be seen in Figure 1a–d, where the required simulations have been carried out for T = 100 days while varying the values of the fractional-order parameter *α*. For the Susceptible population, it is observed that the population decays at a faster rate for higher values of the fractional-order parameter. On the contrary, higher values of the fractional-order parameter *α* lead to an increase in the Exposed, Infected, and Recovered populations.

**Figure 1.** (**a**) The dynamics of the Susceptible population for varying fractional order *α*. (**b**) The dynamics of the Exposed population for varying fractional order *α*. (**c**) The dynamics of the Infected population for varying fractional order *α*. (**d**) The dynamics of the Recovered population for varying fractional order *α*.

The behavior of the Exposed *E*(*t*) and Infected *I*(*t*) classes can be seen in Figure 2a,b, where the required simulations have been carried out for T = 100 days while varying the values of the awareness parameter *k*. For both the Exposed and Infected classes, it can be observed that a higher value of the awareness parameter *k* leads to a decrease in the Exposed and Infected populations.


**Figure 2.** (**a**) The dynamics of the Exposed classes by increasing awareness parameter *k*. (**b**) The dynamics of the Infected classes by increasing awareness parameter *k*.

### **6. Summary and Conclusions**

In this paper, a fractional order cholera model in the Caputo sense is constructed. The transmission dynamics of the disease are studied by incorporating the saturated incidence rate into the model. Various well-posedness properties such as positivity, boundedness, existence, and uniqueness of the solution are also studied. Equilibrium solutions were computed, and their stability analysis was shown to depend on a threshold quantity, the basic reproduction ratio (*R*0). It was clearly shown that if *R*<sup>0</sup> < 1, the disease-free equilibrium is locally asymptotically stable, whereas if *R*<sup>0</sup> > 1, the endemic equilibrium exists and is locally asymptotically stable. The model is found to be well-posed and more realistic than many related models in the literature due to the consideration of the saturated incidence rate.

Numerical simulations were carried out to support the analytic results, to show the significance of the fractional order from the biological point of view, and to show the significance of awareness programs in curtailing the spread of cholera in a population. From the results obtained, it is observed that the Susceptible population decays at a faster rate for higher values of the fractional-order parameter. On the contrary, higher values of the fractional-order parameter *α* lead to an increase in the Exposed, Infected, and Recovered populations. In addition, it is observed that for both the Exposed and Infected classes, an increase in the value of the awareness parameter *k* leads to a decrease in the Exposed and Infected populations, respectively.

In conclusion, this paper studied a fractional order model in the Caputo sense with a saturated incidence rate. The entire well-posedness properties of the model were studied in detail. The contribution of awareness to controlling cholera was studied numerically. From the findings of this research, one can see there is a need to consider a saturated incidence rate in studying epidemic diseases such as cholera. Additionally, there is a need to study well-posedness properties before making deductions from any mathematical model. From the findings of this model in particular, there is a need for relevant agencies to mount awareness programs in order to curtail the spread of cholera in a given population. Using a fractional order to construct a mathematical model is also of paramount importance due to its hereditary properties and memory-description ability. A limitation of this work is that an agent-based version of the model proposed in this paper remains to be considered; such an approach would help provide more decision-making tools for the effective prevention and control of cholera and for public health interventions.

**Author Contributions:** Conceptualization, I.A.B. and U.W.H.; methodology, I.A.B.; software, I.A.B. and F.A.R.; validation, I.A.B., U.W.H. and F.A.R.; formal analysis, I.A.B. and U.W.H.; investigation, F.A.R.; resources, I.A.B.; data curation, F.A.R.; writing—original draft preparation, I.A.B., U.W.H. and F.A.R.; writing—review and editing, I.A.B. and F.A.R.; visualization, F.A.R.; supervision, U.W.H.; project administration, U.W.H.; funding acquisition, U.W.H. and F.A.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This research was supported by King Mongkut's University of Science and Technology Thonburi's Postdoctoral Fellowship.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **A Numerical Study of the Dynamics of Vector-Born Viral Plant Disorders Using a Hybrid Artificial Neural Network Approach**

**Hosam Alhakami 1,\* , Muhammad Umar <sup>2</sup> , Muhammad Sulaiman 2,\* , Wajdi Alhakami <sup>3</sup> and Abdullah Baz <sup>4</sup>**


**Abstract:** Most plant viral infections are vector-borne. There is a latent period of disease inside the vector after it obtains the virus from an infected plant. Thus, after interacting with an infected vector, the plant demonstrates an incubation time before becoming diseased. This paper analyzes a mathematical model for persistent vector-borne viral plant disease dynamics. A backpropagated neural network based on the Levenberg–Marquardt algorithm (NN-BLMA) is used to study approximate solutions for fluctuations in natural plant mortality and vector mortality rates. A state-of-the-art numerical technique is utilized to generate reference data for obtaining surrogate solutions for multiple cases through NN-BLMA. Curve fitting, regression analysis, error histograms, and convergence analysis are used to assess the accuracy of the calculated solutions. It is evident from our simulations that NN-BLMA is accurate and reliable.

**Keywords:** mathematical modeling; artificial neural networks; numerical solutions; delay differential equations; optimization techniques; machine learning; Levenberg–Marquardt algorithm

### **1. Introduction**

Plant disease epidemiology studies how diseases affect plant populations and how to combat plant diseases. Using spatial and temporal plant epidemiology models can provide useful statistical and mathematical data about disease transmission. In the mid-20th century, plant epidemiological models became prominent [1]. Examples of actual uses of this type of model include cassava mosaic disease [2], pine wilt disease [3], and potato late blight [4]. Later, new methods for studying nonlinear dynamics and numerical simulations helped solve complex ecological problems [5,6]. This accelerated the creation of more realistic and complicated plant disease models.

An essential part of the plant epidemiological system is modeling the interactions between infected and healthy plant populations, either directly or via a vector. Infected vectors feed on healthy plants, infecting them. Similarly, non-infected vectors become infected by diseased plants. The vector-borne plant disease is classified as persistent, semipersistent, or non-persistent based on the infectious agent's residence period in the vector [7,8]. The vector ingests viruses while feeding on infected plant sap in persistent transmission. The salivary glands then release the viruses into the plant tissue as they penetrate the digestive system. The persistent mode of transmission differs from the other two because it takes a long time for a vector to become infected with the virus and become infectious [7,9]. In the case of vectors, this time lag is referred to as the latent phase of infection.

**Citation:** Alhakami, H.; Umar, M.; Sulaiman, M.; Alhakami, W.; Baz, A. A Numerical Study of the Dynamics of Vector-Born Viral Plant Disorders Using a Hybrid Artificial Neural Network Approach. *Entropy* **2022**, *24*, 1511. https://doi.org/10.3390/e24111511

Academic Editors: Pavel Kraikivski and Cristóbal López

Received: 14 August 2022 Accepted: 18 October 2022 Published: 22 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The latent period in plants is similar to the time it takes for a healthy plant to become infected following infection [10]. The incubation period (or incubation time) is the time it takes for symptoms to manifest following infection [1]. Depending on the plant species, the incubation period varies [11]. Incubation durations for beet mosaic virus (BMV), African cassava mosaic virus (ACMV), tobacco mosaic virus (TMV) and bean golden mosaic virus (BGMV) are 7–15 days [12], 3–5 weeks [13], 5 h [14], and 5–6 days [15], respectively. The incubation and latent periods in plants are distinct. However, the expression of disease symptoms correlates with disease transmission [16]. Furthermore, determining the latent period is challenging, whereas observing disease signs is straightforward. Therefore, our model development and analysis consider the incubation period.

Among the most frequent vector-borne viral diseases affecting crops, leaf curl disease and mosaic disease are two of the most common. The whitefly (*Bemisia* sp.), which transmits several viral infections to Jatropha, cassava, tomato, tobacco, cotton, and other plants, is a hemipteran vector. Most of the disease is systematically spread by whiteflies, meaning that a latent period is frequently observed [17]. Unfortunately, information on the latent and incubation time of infection for various persistently transmitted diseases is lacking in the literature. Due to the variety of viral agents and host plant species, both delays have varying effects on disease severity, and these effects also differ between whitefly species and host plants. These delays may vary due to genetic complexity, climate fluctuation, phenotypic heterogeneity, and plasticity [18]. The plant incubation period is usually longer than the latent period in vectors. For example, ACMV has a 6-hour latent period and a 3–5 week incubation period [13].

Ordinary differential equation (ODE) models cannot account for the incubation or latent period. However, models based on delay differential equations (DDEs) allow these lags to be integrated into the system. A DDE model can represent a system's dynamics when its evolution depends on prior events. When time-lag responses exist, delays are one of the most powerful mathematical modeling tools [19]. DDE models are more sophisticated than ODE models but more realistic. Prey–predator mathematical models with delay differential equations are commonly employed [20,21]. Delays can reveal dynamic phenomena such as instability, oscillations, and bifurcation.
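As a minimal illustration of how a constant time lag enters a simulation, the sketch below integrates the classical delayed logistic (Hutchinson) equation with fixed-step Euler and a stored history buffer. The equation and every parameter value here are assumed placeholders for illustration, not the plant-disease model of this paper:

```python
# Delayed logistic equation y'(t) = r*y(t)*(1 - y(t - tau)/K) with a
# constant history on [-tau, 0], integrated by fixed-step Euler.
def solve_dde(r=0.5, K=1.0, tau=1.0, y_hist=0.1, T=20.0, h=0.01):
    n_tau = int(round(tau / h))   # delay measured in whole steps
    y = [y_hist] * (n_tau + 1)    # stored history supplies y(t - tau)
    for _ in range(int(round(T / h))):
        y_now = y[-1]
        y_lag = y[-1 - n_tau]     # look up the lagged state
        y.append(y_now + h * r * y_now * (1.0 - y_lag / K))
    return y

y = solve_dde()
# With r*tau = 0.5 below the Hopf threshold pi/2, the equilibrium y = K is
# stable, so the trajectory settles near the carrying capacity.
assert abs(y[-1] - 1.0) < 0.1
```

The same buffering idea underlies production DDE solvers: the delay only changes which past value feeds the right-hand side, which is why DDE models can capture latent and incubation periods that ODE models cannot.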

Van der Plank [1] used DDEs to model delays in plant epidemics. Cooke [22] proposed a model with an incubation-time state variable for vector-borne diseases. Wang et al. [23] discussed the thermal characteristics and interactions of wheat starch and gluten. Zhang [24] added the plant incubation period to the Meng and Li [25] plant disease model, causing modifications in the model's dynamics. Munyasya et al. [26] proposed an integrated on-site and off-site rainwater-harvesting system that enhances rainfed maize output for better climate change adaptation. Buonomo and Cerasuolo [27] presented and analyzed a soil-borne plant disease dynamics model. Miao [28] assessed the accuracy of space-for-time substitution for predicting vegetation status after shrub restoration.

An ODE model of the impact of replanting and roguing on eliminating plant disease latency comprises a compartment for latently diseased plant populations [29]. The model does not consider any vector compartment, but it includes classes of latently infected, healthy, post-infection, and infectious plants. Holt et al. [2] proposed a model with infected plants, healthy vectors, and susceptible vectors but no delays. The vector-borne plant disease model [30] was modified by Jackson and Chen [31] by delaying plant incubation and vector latent periods. The threshold value for delay-induced destabilization was determined by observing changes in system solution dynamics. Li et al. used an updated model [31] to analyze Hopf bifurcation, which included incubation and latent period characteristics [32].

Banerjee and Takeuchi [33] identified several critical elements of the dynamics that could lead to false findings. A long delay can stabilize or destabilize a system (Buonomo and Cerasuolo [27]). Transcritical bifurcations, periodic oscillations, and stability switches can be revealed if the parameters of vector-borne plant disease models change [2,27,34]. The analysis of the undelayed model cannot be ignored [31,32].

The mathematical model (1), with parameters given in Table 1 [2,35], was previously analyzed by Basir et al. [35] to study persistent vector-borne viral plant disease dynamics and the effect of both the latent period and the incubation delay on the dynamics of the disease. That model was analyzed numerically using a gradient-based numerical technique. Numerous studies have claimed that gradient-based techniques, such as RK-4, take up much more computer time than soft computing methods of comparable accuracy and that it is difficult to produce accurate global estimates of the truncation error [36,37]. For instance, at each step of the RK-4 method, the derivative must be evaluated *n* times, where '*n*' is the order of accuracy of the method; this is a significant drawback of gradient-based algorithms [38]. Moreover, RK-4 can diverge for complex systems [39]. Failure in the case of a singularity is another hurdle in using these gradient-based numerical techniques. Keeping these disadvantages in mind, the authors aimed to suggest an alternative gradient-free approach that can handle problems such as model (1) with accuracy and reliability. The key features of this study are outlined as follows:



**Table 1.** Parameters' description and their numerical values.

### **2. Problem Formulation**

This section develops a mathematical model for persistent vector-borne viral plant disease dynamics. The model considers plant and vector populations without explicitly including the mosaic virus. *H*(*t*) denotes healthy plants, *I*(*t*) infected plants, *Q*(*t*) the uninfected whitefly population, and *W*(*t*) the infected whitefly population.

Due to restricted plantation space and natural resources, logistic growth with rate *r* and carrying capacity *K* is assumed for healthy plants [2]. A healthy plant becomes infected when it comes into contact with an infected vector. *λ* is the transmission rate, and *λHW* is the number of susceptible plants moving from the susceptible compartment to the infected compartment.

An insect pest, such as a whitefly, shifts its host in response to changing biological and environmental conditions. Whiteflies generally move between crop fields [40,41] and breed in the fields. The Holling type III survival curve describes their life course because of the high death rate they experience early on [41]. Whiteflies (both adults and nymphs) can transmit the disease.

Crops are typically planted and reaped at specific times of the year, with most crops reaped a few months after they are planted. A few vectors travel from nearby or distant patches and reproduce in the vegetation. The vector population grows both through migration from other patches and through reproduction in the same patch or vegetative area. For the same reason, seasonal fluctuations in vector populations are ignored [35].

An open system is considered in this model. Let Π be the combined rate of vector birth and migration into the system. No vertical virus transmission is allowed, and a vector cannot infect another vector. Viruses neither destroy nor defend vectors. A vector retains the virus and does not recover; however, the infective insects do not get sick from the virus [31]. Let the mortality rates of plants and vectors be *µ* and *d*, respectively. Infection-related plant death is expected to exceed average plant mortality. *m*<sub>1</sub> is the infection-related mortality of infected plants; thus, the overall mortality rate of infected plants is *m* = *µ* + *m*<sub>1</sub>. Let *β* be the conversion rate between uninfected vectors (i.e., *Q*) and infected plants (i.e., *I*), so *βQI* is the number of uninfected vectors *Q* entering the infected vector compartment *W*.

In truth, both plant and vector infection take time. Let *τ*<sub>1</sub> ∈ ℝ<sup>+</sup> be the healthy plant's incubation time following successful infection. At time *t*, the disease transmission is given by the expression *λ*e<sup>−*mτ*<sub>1</sub></sup>*H*(*t* − *τ*<sub>1</sub>)*W*(*t* − *τ*<sub>1</sub>), where *λ* and *m* are the positive constants described previously. The term e<sup>−*mτ*<sub>1</sub></sup> denotes the probability of a healthy plant surviving through the incubation interval [*t* − *τ*<sub>1</sub>, *t*]; i.e., the expression counts the susceptible plants that came into contact with an infected vector at time *t* − *τ*<sub>1</sub> and survived up to time *t* to become infected plants.

Similarly, the latent period in a vector is *τ*<sub>2</sub> ∈ ℝ<sup>+</sup>. At time *t*, the expression *β*e<sup>−*dτ*<sub>2</sub></sup>*Q*(*t* − *τ*<sub>2</sub>)*I*(*t* − *τ*<sub>2</sub>) describes the transmission of infection, where e<sup>−*dτ*<sub>2</sub></sup> is the vector's survival probability across the latent interval [*t* − *τ*<sub>2</sub>, *t*]: it counts the uninfected vectors that met an infected plant at time *t* − *τ*<sub>2</sub> and survived until time *t* to become infected [35]. Based on the given assumptions, the mathematical model is

$$\begin{aligned} \frac{dH}{dt} &= rH\left[1 - \frac{H+I}{K}\right] - \lambda H W, \\ \frac{dI}{dt} &= \lambda e^{-m\tau\_1} H(t - \tau\_1) W(t - \tau\_1) - mI, \\ \frac{dQ}{dt} &= \Pi - \beta Q I - dQ, \\ \frac{dW}{dt} &= \beta e^{-d\tau\_2} Q(t - \tau\_2) I(t - \tau\_2) - dW. \end{aligned} \tag{1}$$

The initial biological conditions are

*H*(*t*) > 0, *I*(*t*) > 0, *Q*(*t*) > 0, *W*(*t*) > 0 for *t* ∈ [−*τ*, 0], where *τ* = max{*τ*<sub>1</sub>, *τ*<sub>2</sub>}. The parameters used in model (1) are assigned numerical values for solving the model numerically; Table 1 gives their descriptions and values.
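Model (1), with its delayed terms, can be integrated numerically by a simple fixed-step Euler scheme with history buffers (a minimal alternative to the packaged solvers used later in the paper). The parameter and initial values below are illustrative placeholders, not the values of Table 1:

```python
import numpy as np

# Fixed-step Euler scheme for the delayed model (1).
# All parameter values are illustrative placeholders, not Table 1's values.
r, K, lam, m, beta, Pi, d = 0.05, 1.0, 0.2, 0.1, 0.2, 0.1, 0.2
tau1, tau2 = 2.0, 1.0            # incubation and latent delays
dt, T = 0.01, 100.0
n = int(T / dt)
n1, n2 = int(tau1 / dt), int(tau2 / dt)

H = np.empty(n + 1); I = np.empty(n + 1)
Q = np.empty(n + 1); W = np.empty(n + 1)
H[0], I[0], Q[0], W[0] = 0.5, 0.1, 0.4, 0.1   # constant history on [-tau, 0]

def lag(x, k, shift):
    # state value at time t - tau (constant history before t = 0)
    return x[k - shift] if k >= shift else x[0]

for k in range(n):
    dH = r * H[k] * (1 - (H[k] + I[k]) / K) - lam * H[k] * W[k]
    dI = lam * np.exp(-m * tau1) * lag(H, k, n1) * lag(W, k, n1) - m * I[k]
    dQ = Pi - beta * Q[k] * I[k] - d * Q[k]
    dW = beta * np.exp(-d * tau2) * lag(Q, k, n2) * lag(I, k, n2) - d * W[k]
    H[k + 1] = H[k] + dt * dH
    I[k + 1] = I[k] + dt * dI
    Q[k + 1] = Q[k] + dt * dQ
    W[k + 1] = W[k] + dt * dW
```

The point of the sketch is only how the delays *τ*<sub>1</sub> and *τ*<sub>2</sub> enter through the lagged states; a smaller step size or a higher-order scheme improves accuracy.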

### **3. Design Methodology**

This section describes a machine learning approach based on artificial neural networks (ANNs), focusing on their supervised learning mechanisms, and applies it to the study of the model for persistent vector-borne viral plant disease dynamics.

### *3.1. Artificial Neural Network (ANN)*

An artificial neural network is a network of interconnected core components, known as neurons, each of which receives several inputs and generates a single output; each neuron represents a mapping. A neuron's output is obtained by applying the activation function to the weighted sum of its inputs.

### *3.2. Activation Function*

To introduce nonlinear properties, an activation function is used in an ANN. In a neural network, *X<sub>i</sub>* and *W<sub>i</sub>* stand for the inputs and weights, and *f*(*X<sub>i</sub>*) is the function applied to the inputs before the result is sent toward the network's output. This output can then be used as an input for any additional layers or as the final output [42–44].

The number of hidden units can be optimized using a multilayer perceptron (MLP); the connection weights and biases are optimized as well. A standard MLP with one hidden layer is constructed as follows:

$$H\_j = \sum\_{i=1}^{n} W\_{ij} X\_i + b\_j, \tag{2}$$

*X<sub>i</sub>* represents the inputs, and *W<sub>ij</sub>* and *b<sub>j</sub>* represent the connection weights and bias vectors, respectively. Here, a log-sigmoid function is used as the activation function in the feed-forward neural network model, as given below.

$$f\_j(x) = \frac{1}{1 + e^{-H\_j}}. \tag{3}$$

The MLP, also known as a feed-forward neural network (FNN), has a hidden layer between the input and output layers. The number shown below the hidden layer represents the number of neurons used inside the network. Figure 1 shows the architecture of the artificial neural network controller.

**Figure 1.** Architecture of an artificial neural network controller.
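Equations (2) and (3) combine into the forward pass of a one-hidden-layer network. The sketch below uses arbitrary random weights and layer sizes, purely for illustration:

```python
import numpy as np

def logsig(h):
    # log-sigmoid activation, as in Equation (3)
    return 1.0 / (1.0 + np.exp(-h))

def mlp_forward(x, W1, b1, W2, b2):
    # Hidden layer: H_j = sum_i W_ij * x_i + b_j (Equation (2)),
    # passed through the log-sigmoid, then a linear output layer.
    h = logsig(W1 @ x + b1)
    return W2 @ h + b2

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 10           # e.g. time t as input, 10 hidden neurons
W1 = rng.normal(size=(n_hidden, n_in))
b1 = rng.normal(size=n_hidden)
W2 = rng.normal(size=(1, n_hidden))
b2 = rng.normal(size=1)
y = mlp_forward(np.array([0.5]), W1, b1, W2, b2)
```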

The feed-forward neural network is trained with a backpropagated Levenberg-Marquardt (LM) method. The LM algorithm, which is built into many software packages, can locate local minima.

NN-BLMA is implemented in two phases. Figure 2 depicts the algorithm's complete workflow, including all of its steps.
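A minimal Levenberg-Marquardt loop, shown here with a forward-difference Jacobian on a toy exponential fit, illustrates the damping mechanism; this is a sketch of the idea only, not the toolbox implementation relied on in the paper:

```python
import numpy as np

def lm_fit(residual, p0, n_iter=50, mu=1e-3):
    # Minimal Levenberg-Marquardt loop; 'mu' is the damping factor
    # reported in the training plots. Jacobian via forward differences.
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r0 = residual(p)
        J = np.empty((r0.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p); dp[j] = 1e-7
            J[:, j] = (residual(p + dp) - r0) / 1e-7
        A = J.T @ J + mu * np.eye(p.size)
        step = np.linalg.solve(A, -(J.T @ r0))
        if np.sum(residual(p + step) ** 2) < np.sum(r0 ** 2):
            p, mu = p + step, mu * 0.1   # accept step: reduce damping
        else:
            mu *= 10.0                   # reject step: increase damping
    return p

# Toy example: fit y = a * exp(b * t) to synthetic, noise-free data.
t = np.linspace(0, 1, 20)
y = 2.0 * np.exp(-1.5 * t)
p_hat = lm_fit(lambda p: p[0] * np.exp(p[1] * t) - y, [1.0, 0.0])
```

When a step reduces the sum of squared residuals, the damping shrinks and the method behaves like Gauss-Newton; otherwise, the damping grows and the step shortens toward gradient descent.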


**Table 2.** The NN-BLMA parameters settings for implementation.


**Figure 2.** Working mechanism of the NN-BLMA for solving the nonlinear model of vector-borne viral plant disease dynamics.

The novel NN-BLMA machine learning approach is easy to apply, handles nonlinear problems, and is a gradient-free technique that converges faster than other machine learning techniques [45–48].

### **Algorithm 1** Pseudocode of NN-BLMA:


**Ending of NN-BLMA**

### **4. Numerical Experimentation and Discussion**

To study the design algorithm's performance and efficiency, we discuss various cases of the nonlinear model of vector-borne viral plant disease dynamics. The cases are based on variation in two parameters: the plants' natural mortality rate, *µ*, and the vector mortality rate, *d*. In the first case, we set the same numerical value for both parameters. In case two, the parameter *µ* is slightly decreased and the parameter *d* slightly increased, while in case three, *µ* is increased and *d* decreased compared with the first case. Figure 3 illustrates the mathematical model and the details of the cases for vector-borne viral plant disease dynamics.

**Figure 3.** Vector-borne viral plant disease dynamics' model with its different cases.

The design technique partitions the sample data into 60% for training, 20% for validation, and 20% for testing. The performance graph of the design technique shows its mean squared error (MSE). Figures 4–6 depict the best validation performance: the error decreases over the training epochs but may increase on the validation data set as the network begins to overfit the training data. Training is halted after six consecutive rises in the validation error, and the best performance is taken from the epoch with the lowest validation error. The case 1 performance values are 2.9721 × 10<sup>−9</sup>, 7.1129 × 10<sup>−9</sup>, 3.0066 × 10<sup>−8</sup>, and 2.8222 × 10<sup>−5</sup>. Similarly, the case 2 performance values are 1.3057 × 10<sup>−9</sup>, 3.6923 × 10<sup>−11</sup>, 1.17878 × 10<sup>−9</sup>, and 1.9703 × 10<sup>−4</sup>, and the case 3 values are 9.9788 × 10<sup>−11</sup>, 3.1230 × 10<sup>−9</sup>, 2.4709 × 10<sup>−8</sup>, and 2.7474 × 10<sup>−4</sup>, respectively.
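The validation-based stopping rule described above can be sketched as follows; the validation-error sequence below is invented for illustration:

```python
import numpy as np

def train_with_early_stopping(val_errors, max_fail=6):
    # Return the epoch with the lowest validation MSE, stopping after
    # 'max_fail' consecutive rises in the validation error.
    best_epoch, best_err, fails = 0, np.inf, 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, fails = epoch, err, 0
        else:
            fails += 1
            if fails >= max_fail:
                break
    return best_epoch, best_err

# Validation error drops, then rises: training stops after six failures
# and epoch 2 (error 0.2) is kept; the final 0.1 is never reached.
val = [0.5, 0.3, 0.2, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30, 0.1]
epoch, err = train_with_early_stopping(val)
```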

**Figure 4.** *Cont*.

**Figure 4.** NN-BLMA MSE for healthy and infected plants, and infected and uninfected whitefly of case 1. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 5.** NN-BLMA MSE for healthy and infected plants, and infected and uninfected whitefly of case 2. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 6.** *Cont*.

**Figure 6.** NN-BLMA MSE for healthy and infected plants, and infected and uninfected whitefly of case 3. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

The statistical performance of all the cases in terms of gradient, mu, and validation failures is illustrated in Figures 7–9. The gradient values for case 1 are 8.2149 × 10<sup>−8</sup>, 2.4163 × 10<sup>−6</sup>, 2.3721 × 10<sup>−4</sup>, and 0.2785, whereas the values for case 2 and case 3 are 9.2809 × 10<sup>−8</sup>, 1.5761 × 10<sup>−6</sup>, 1.4908 × 10<sup>−4</sup>, 27.6472, and 1.1132 × 10<sup>−8</sup>, 4.0741 × 10<sup>−6</sup>, 3.0463 × 10<sup>−4</sup>, 0.49652, respectively. The mu values for all the cases lie in the range 10<sup>−4</sup> to 10<sup>−13</sup>. The regression plot shows the network output against the target for the training, validation, and test sets. For a perfect match, the data must fall on the 45-degree line where the network outputs and targets are equal; in that case, the regression plot gives *R* = 1. Figures 10–12 show the regression analysis of all the cases. From the figures, the regression values are 1 for all cases, indicating a perfect match between the network outputs and the targets.
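The regression value *R* reported in these plots is the correlation between network outputs and targets; a quick sketch with synthetic, near-perfect outputs:

```python
import numpy as np

def regression_r(outputs, targets):
    # Pearson correlation between network outputs and targets.
    # R = 1 means every point lies on the 45-degree line of the plot.
    return np.corrcoef(outputs, targets)[0, 1]

targets = np.linspace(0.0, 1.0, 50)
# Synthetic outputs with tiny noise, so R is essentially 1.
outputs = targets + 1e-6 * np.random.default_rng(1).normal(size=50)
r_val = regression_r(outputs, targets)
```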

**Figure 7.** Value of gradient, mu and validation check of NN-BLMA for case 1. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 8.** Value of gradient, mu and validation check of NN-BLMA for case 2. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 9.** Value of gradient, mu and validation check of NN-BLMA for case 3. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 10.** Analysis of regression of the design NN-BLM algorithm for case 1. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 11.** Analysis of regression of the design NN-BLM algorithm for case 2. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 12.** Analysis of regression of the design NN-BLM algorithm for case 3. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

The tables below summarize the data provided by the computing system, showing the best performance values in training, testing, and validation. Table 3 displays the best performance data for case 1, while the best performance data for case 2 and case 3 are displayed in Tables 4 and 5, respectively. These tables also show the hidden neuron count, the number of iterations, and the time spent.

**Table 3.** Performance values of the design NN-BLMA, and time spent by the computing system to obtain solutions for case 1.



**Table 4.** Performance values of the design NN-BLMA, and time spent by the computing system to obtain solutions for case 2.

**Table 5.** Performance values of the design NN-BLMA, and time spent by the computing system to obtain solutions for case 3.


The histograms of the errors between targets and outputs after training the neural network are shown in Figures 13–15. Different colored bars show the errors on the training, validation, and testing data. The bars containing most of the points lie very close to the zero-error line, which means the targets and outputs are well matched with the fewest errors, demonstrating the accuracy of the design technique. The error values for case 1 lie in the ranges 10<sup>−3</sup> to 10<sup>−4</sup>, 10<sup>−4</sup> to 10<sup>−6</sup>, 10<sup>−4</sup> to 10<sup>−6</sup>, and 10<sup>−2</sup> to 10<sup>−3</sup>. For case 2 and case 3, the error values lie in the ranges 10<sup>−4</sup> to 10<sup>−5</sup>, 10<sup>−5</sup> to 10<sup>−7</sup>, 10<sup>−4</sup> to 10<sup>−5</sup>, 10<sup>−3</sup> to 10<sup>−4</sup>, 10<sup>−2</sup> to 10<sup>−3</sup>, 10<sup>−4</sup> to 10<sup>−6</sup>, 10<sup>−4</sup> to 10<sup>−6</sup>, 10<sup>−3</sup> to 10<sup>−5</sup>, and 10<sup>−2</sup> to 10<sup>−3</sup>, respectively.

**Figure 13.** *Cont*.

**Figure 13.** Analysis of the error histogram in terms of the target data and the approximate solutions for case 1. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 14.** Analysis of the error histogram in terms of the target data and the approximate solutions for case 2. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

**Figure 15.** *Cont*.

**Figure 15.** Analysis of the error histogram in terms of the target data and the approximate solutions for case 3. (**a**) *H*(*t*). (**b**) *I*(*t*). (**c**) *Q*(*t*). (**d**) *W*(*t*).

Further, Figure 16 compares the numerical solution of the model obtained by the "NDSolve" package in Mathematica (targets) with the solution obtained by executing NN-BLMA (outputs). The solid lines show the numerical solution from "NDSolve", while the circles show the NN-BLMA solution. The NN-BLMA solutions fall exactly on the target solution curves, which demonstrates the accuracy of the design technique. These figures also indicate the model's variation due to some of its parameters. It is evident from the figures that healthy plants and uninfected whiteflies rise when there is an increase in the plant mortality rate and a drop in the vector mortality rate. In contrast, a drop in the plant mortality rate and an increase in the vector mortality rate lead to a rise in infected plants and infected whiteflies. The tables below compare the statistical data given by the "NDSolve" package in Mathematica with the outputs of NN-BLMA. Table 6 presents the comparative analysis of both solutions for case 1, while the comparisons for case 2 and case 3 are presented in Tables 7 and 8, respectively.

**Figure 16.** Numerical solutions' comparison of NN-BLMA with the solution obtained with other numerical methods. (**a**) Healthy plants *H*(*t*). (**b**) Infected plants *I*(*t*). (**c**) Uninfected whiteflies *Q*(*t*). (**d**) Infected whiteflies *W*(*t*).


**Table 6.** Comparative analysis of numerical solution with the solutions obtained from NN-BLMA for case 1.

**Table 7.** Comparative analysis of numerical solution with the solutions obtained from NN-BLMA for case 2.


**Table 8.** Comparative analysis of numerical solution with the solutions obtained from NN-BLMA for case 3.


### **5. Conclusions**

In this paper, we analyzed a mathematical model for persistent vector-borne viral plant disease dynamics. The model includes equations for healthy and infected plants and for uninfected and infected whiteflies. The parameter set selected for the numerical simulations corresponds to mosaic disease in cassava. To see the impact of variation in the mortality parameters on the model, we constructed different cases in which both the plant and vector mortality parameters vary. The reference data (targets) for NN-BLMA were generated by solving the model numerically for all the cases in Mathematica. The designed technique uses the targets to train, test, and validate the ANN and to assess the impact of variation in the plants' natural mortality rate and the vectors' mortality rate. The key points concluded from the study are given below.


**Author Contributions:** All authors contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** The study was funded by the Deanship of Scientific Research at Umm Al-Qura University, Makkah, Saudi Arabia (Grant Code: 20UQU0067DSR); and Taif University Researchers Supporting Project (TURSP- 2020/107), Taif University, Taif, Saudi Arabia.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data supporting this study's findings are available upon reasonable request from the corresponding author. The MATLAB code and a data file generated by the NDSolve Mathematica package are available at the author's GitHub account: https://github.com/sulaiman513/AWKUM-Optimization-Lab, accessed on 1 September 2022.

**Acknowledgments:** The authors would like to thank the Deanship of Scientific Research at Umm Al-Qura University for supporting this work by Grant Code: (20UQU0067DSR); and Taif University Researchers Supporting Project (TURSP- 2020/107), Taif University, Taif, Saudi Arabia.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Nomenclature**


### **References**


**Samaneh Gholami \* and Silvana Ilie**

Department of Mathematics, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada; silvana@torontomu.ca

**\*** Correspondence: samaneh.gholami@torontomu.ca

**Abstract:** Stochastic modeling of biochemical processes at the cellular level has been the subject of intense research in recent years. The Chemical Master Equation is a broadly utilized stochastic discrete model of such processes. Numerous important biochemical systems consist of many species subject to many reactions. As a result, their mathematical models depend on many parameters. In applications, some of the model parameters may be unknown, so their values need to be estimated from the experimental data. However, the problem of parameter value inference can be quite challenging, especially in the stochastic setting. To estimate accurately the values of a subset of parameters, the system should be sensitive with respect to variations in each of these parameters and they should not be correlated. In this paper, we propose a technique for detecting collinearity among models' parameters and we apply this method for selecting subsets of parameters that can be estimated from the available data. The analysis relies on finite-difference sensitivity estimations and the singular value decomposition of the sensitivity matrix. We illustrated the advantages of the proposed method by successfully testing it on several models of biochemical systems of practical interest.

**Keywords:** stochastic simulation algorithm; stochastic biochemical systems; sensitivity analysis; finite-difference methods; parameter subset selection; estimability analysis

**Citation:** Gholami, S.; Ilie, S. Quantifying Parameter Interdependence in Stochastic Discrete Models of Biochemical Systems. *Entropy* **2023**, *25*, 1168. https://doi.org/10.3390/e25081168

Academic Editor: Pavel Kraikivski

Received: 3 July 2023 Revised: 31 July 2023 Accepted: 3 August 2023 Published: 5 August 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

### **1. Introduction**

Mathematical and computational modeling have become widespread in the study of complex dynamical systems, particularly in investigating cellular processes and biochemical networks [1]. Frequently, mathematical modeling of chemical reaction systems relies on deterministic differential equations and mass action kinetics. However, biochemical systems in the cell are intrinsically noisy [2,3], and thus stochastic models must be employed to account for the random fluctuations observed experimentally, especially when some species have low molecular counts [4,5]. One of the most popular stochastic discrete models of biochemically reacting systems is the Chemical Master Equation [6,7]. This model is utilized to describe the dynamics of systems for which molecular populations of some species are low or noise is significant. It assumes that the system state is a Markov process [6]. It is generally impracticable to solve this model analytically, except for very simple systems.

Gillespie developed the Stochastic Simulation Algorithm (SSA) [8,9], a Monte Carlo technique for simulating statistically exact realizations of the stochastic process whose distribution is governed by the Chemical Master Equation. The random time change representation of the stochastic process depicting the system state was introduced in [10]. Based on this representation, Rathinam et al. [11] designed an exact Monte Carlo method for the Chemical Master Equation, the Random Time Change algorithm. Other simulation strategies for stochastic models of biochemically reacting systems were presented in the literature (for references see, e.g., [12–15]).

The biochemical networks arising in applications may be quite complex, involving many reactions and/or species, which means that their mathematical models have many parameters. Some of the values of a model's kinetic parameters may not be known [16,17], and they may need to be estimated from the available data. Also, certain parameters have a substantial influence on the system's output; thus, it is essential to study the system's behavior when these parameters are perturbed. While stochastic discrete models of biochemical systems capture the inherent randomness observed in cellular processes, they pose challenges with regard to parameter estimation and identification. Hence, developing efficient and accurate methods for identifying and estimating their parameters would be a key advance in studying these models.

Practical identifiability (or estimability) analysis aims to establish if the parameters can be accurately and reliably estimated from the available data [18]. In this context, identifiable parameters are those which can be determined with high confidence from the observed behavior of the system; otherwise, the parameters are unidentifiable. Using practical identifiability, one can select subsets of parameters that significantly impact the behavior of the system. If the parameters in such a subset are not interdependent, then they are identifiable. These parameters can be accurately estimated when sufficient and quality data is available, and their accurate estimation is crucial for building the model. Also, these parameters may provide insight into the key underlying mechanisms of the biochemical system. Furthermore, the identifiability analysis helps select the unidentifiable parameters, which have a negligible impact on the model behavior and can be eliminated, thus guiding model reduction. There exist numerous studies of identifiability analysis for deterministic models, such as the reaction rate equations [19–26]. Nonetheless, much less work has been dedicated to parameter estimability of stochastic models of biological processes.

One important method for practical identifiability is to utilize sensitivity analysis. Local sensitivity analysis assesses the change in the system's behavior caused by a small variation in the value of a certain parameter. Insignificant changes in the system dynamics indicate that the specific parameter is not important, and thus it is not identifiable. Also, a parameter is not identifiable if it is correlated with other parameters, such that a variation in its value can be compensated by suitable adjustments in other parameters. For stochastic models, finite-difference methods can be used to estimate the sensitivity of the expected value of the given function of the system state. In the class of finite-difference sensitivity estimators for the Chemical Master Equation, those employing exact Monte Carlo simulation methods are the Coupled Finite Difference method of Anderson [27], the Common Reaction Path scheme (based on the Random Time Change algorithm) and the Common Random Number strategy (utilizing the SSA) of Rathinam et al. [11]. These estimators utilize coupled perturbed and unperturbed trajectories to approximate sensitivities. The coupling lowers the variance of the estimator so that the method requires fewer realizations to achieve the same accuracy of the estimation. Due to this, the computational time of the algorithm is reduced, for a prescribed accuracy. Of the three strategies, the Coupled Finite Difference algorithm has the lowest variance of the estimator [28]. These schemes perform best for non-stiff models. For stiff problems, finite-difference techniques can be applied with various coupled tau-leaping strategies to increase the efficiency of the simulation [29].
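The Common Random Number idea can be sketched with a finite-difference sensitivity estimate for a simple birth-death process; the model, parameter values, and function names below are illustrative, not taken from the paper:

```python
import numpy as np

def ssa_final(c, x0, t_end, rng):
    # SSA for a birth-death process: 0 ->(c) S, S ->(d) 0, with d = 0.1
    # fixed for this illustration. Returns the state at time t_end.
    x, t, d = x0, 0.0, 0.1
    while True:
        a = np.array([c, d * x])
        a0 = a.sum()
        tau = rng.exponential(1.0 / a0)
        if t + tau > t_end:
            return x
        t += tau
        x += 1 if rng.random() * a0 < a[0] else -1

def crn_sensitivity(c, h, n_paths, t_end=10.0, x0=0):
    # Common Random Numbers: the perturbed and unperturbed paths reuse the
    # same random stream (same seed), which lowers the estimator's variance.
    est = 0.0
    for i in range(n_paths):
        xp = ssa_final(c + h, x0, t_end, np.random.default_rng(i))
        xm = ssa_final(c, x0, t_end, np.random.default_rng(i))
        est += (xp - xm) / h
    return est / n_paths

# For this linear system, dE[X(t)]/dc = (1 - exp(-d t))/d ~ 6.3 at t = 10.
s = crn_sensitivity(c=5.0, h=0.5, n_paths=200)
```

Rerunning with independent seeds for the two paths (instead of the shared seed) shows a markedly larger spread in `s` over repetitions, which is precisely the variance reduction the coupling provides.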

In this work, we consider the problem of practical parameter identifiability for stochastic discrete biochemical networks modeled with the Chemical Master Equation. This is a critical problem, and a direct extension of the techniques developed for ordinary differential equations to stochastic discrete models is not possible. Our contribution is to generalize a method by Gábor et al. [30] for finding the largest sets of identifiable parameters in models of biochemical systems from the continuous deterministic setting to the stochastic discrete models of well-stirred biochemical systems, which is a difficult task. The proposed method identifies the subsets of parameters that are independent and significant for the model's behavior, based on the existing data, and thus are identifiable. We utilize local sensitivity estimations to study parameter estimability. For approximating sensitivities, we apply finite-difference techniques, namely the Coupled Finite Difference [27], the Common Reaction Path, and the Common Random Number methods [11]. We make use of the normalized sensitivity matrix to develop several identifiability metrics, which adapt existing techniques for the reaction rate equations [19,20] to the more challenging Chemical Master Equation model. In addition, we apply the singular value decomposition of the non-dimensional sensitivity matrix to determine its rank. This analysis helps gain insight into the interrelations between the parameters. Furthermore, the proposed methodology can be employed to decide which parameters can be reliably estimated from the available data for the Chemical Master Equation, and it may assist experimental design for more accurate parameter approximations. It is worth noting that, in general, the expected value of the system state governed by the Chemical Master Equation does not satisfy the deterministic reaction rate equations when some reactions are of second or higher order [14].
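The SVD-based rank test on a sensitivity matrix can be sketched as follows; the matrix here is synthetic, with one column deliberately made collinear to mimic interdependent parameters:

```python
import numpy as np

def estimable_rank(S, tol=1e-6):
    # Numerical rank of a (normalized) sensitivity matrix via SVD.
    # Singular values below tol * s_max flag collinear parameter directions.
    sv = np.linalg.svd(S, compute_uv=False)
    return int(np.sum(sv > tol * sv[0])), sv

# Toy sensitivity matrix with 20 observations and 3 parameters:
# column 2 = 2 * column 0, so only 2 parameters are independently estimable.
rng = np.random.default_rng(0)
S = rng.normal(size=(20, 3))
S[:, 2] = 2.0 * S[:, 0]
rank, sv = estimable_rank(S)
```

The singular vectors associated with the near-zero singular values indicate which parameter combinations are interdependent and therefore cannot be estimated separately from the data.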

This paper is structured as follows. Section 2 is dedicated to the background on stochastic discrete models for well-stirred biochemical networks and their simulation methods, parametric sensitivity schemes for stochastic and deterministic models, and practical identifiability techniques, including the new algorithm for selecting subsets of identifiable parameters. The proposed algorithm is tested on various stochastic models arising in applications in Section 3. Section 4 presents a summary of our results.

### **2. Materials and Methods**

### *2.1. Background*

Suppose a system has *N* biochemical species, denoted by *S*1, *S*2, . . . , *SN*, that undergo *M* distinct chemical reactions. It is maintained at a constant temperature, in a constant volume. Provided that the biochemical network is well-stirred, it may be represented by a state vector,

$$X(t) = [X\_1(t), X\_2(t), \dots, X\_N(t)]^T$$

where *X*(*t*) has entries *X<sub>i</sub>*(*t*), the amount of *S<sub>i</sub>* molecules in the system at time *t*. A reaction *R<sub>j</sub>* produces a variation in the system state, which is given by the state change vector *ν<sub>j</sub>* ∈ ℝ<sup>*N*</sup>,

$$\boldsymbol{\nu}\_{j} = [\nu\_{1j}, \nu\_{2j}, \dots, \nu\_{Nj}]^T,$$

where *ν<sub>ij</sub>* is the perturbation in the molecular amount of *S<sub>i</sub>* after the reaction fires. If one reaction *R<sub>j</sub>* happens during the time interval [*t*, *t* + ∆*t*], then the resulting state is *X*(*t* + ∆*t*) = *X*(*t*) + *ν<sub>j</sub>*. The matrix having *ν<sub>j</sub>* as its *j*-th column is called the stoichiometric matrix. Also associated with the reaction *R<sub>j</sub>*, we can define the propensity function *a<sub>j</sub>* by *a<sub>j</sub>*(*x*)*dt* = the probability that a single reaction *R<sub>j</sub>* occurs in [*t*, *t* + *dt*), given that the system state at time *t* is *x*. The form of the propensity function *a<sub>j</sub>* is determined by the type of reaction. For a first-order reaction *S<sub>m</sub>* → products with rate constant *c<sub>j</sub>*, the propensity is *a<sub>j</sub>*(*X*(*t*)) = *c<sub>j</sub>X<sub>m</sub>*(*t*). For a second-order reaction *S<sub>m</sub>* + *S<sub>n</sub>* → products, the propensity is *a<sub>j</sub>*(*X*(*t*)) = *c<sub>j</sub>X<sub>m</sub>*(*t*)*X<sub>n</sub>*(*t*) if *m* ≠ *n*, and *a<sub>j</sub>*(*X*(*t*)) = (1/2)*c<sub>j</sub>X<sub>m</sub>*(*t*)(*X<sub>m</sub>*(*t*) − 1) if *m* = *n*.
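These propensity forms translate directly into code; the three-reaction network below is only an illustration:

```python
import numpy as np

def propensities(x, c):
    # Propensities for an illustrative two-species network (S1, S2):
    # R1: S1 -> products        a1 = c1 * X1              (first order)
    # R2: S1 + S2 -> products   a2 = c2 * X1 * X2         (second order, m != n)
    # R3: S1 + S1 -> products   a3 = c3 * X1 * (X1-1) / 2 (second order, m == n)
    x1, x2 = x
    return np.array([c[0] * x1,
                     c[1] * x1 * x2,
                     0.5 * c[2] * x1 * (x1 - 1)])

# With X = (10, 4) and c = (1.0, 0.5, 0.2):
a = propensities((10, 4), (1.0, 0.5, 0.2))   # [10.0, 20.0, 9.0]
```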

### 2.1.1. Chemical Master Equation

To study the behavior of the well-stirred biochemical system, we need to determine *P*(*x*, *t*|*x*0, *t*0), the probability of the system state being *X*(*t*) = *x* at time *t*, if at *t*<sup>0</sup> it was *X*(*t*0) = *x*0. This probability satisfies the Chemical Master Equation [6,7]

$$\frac{d}{dt}P(\mathbf{x},t|\mathbf{x}\_0,t\_0) = \sum\_{j=1}^{M} \left[ a\_j(\mathbf{x}-\boldsymbol{\nu}\_j)P(\mathbf{x}-\boldsymbol{\nu}\_j,t|\mathbf{x}\_0,t\_0) - a\_j(\mathbf{x})P(\mathbf{x},t|\mathbf{x}\_0,t\_0) \right].\tag{1}$$

This is a stochastic discrete model. It is a linear system of ordinary differential equations, with one equation describing the probability of the system being in each particular state *x*. The biochemical system state *X*(*t*) is a Markov process that is discrete in space and continuous in time. The space of all possible states is typically quite large, and in such cases the Chemical Master Equation is of very high dimension. Therefore, it is challenging to solve it directly, except for some simple systems.

As an alternative to solving the Chemical Master Equation directly, it is possible to generate correct trajectories one by one. Gillespie [8,9] proposed a Monte Carlo strategy to compute such trajectories, which are in exact agreement with the probability distribution associated with the discrete stochastic model (1). The strategy, also referred to as the Stochastic Simulation Algorithm (SSA), has been broadly employed for solving stochastic models in Systems Biology [3,14,31]. The SSA is described below.

Gillespie's Algorithm

1. Initialize the time *t* ← *t*<sub>0</sub> and the system state *x* ← *x*<sub>0</sub>.
2. Evaluate the propensities *a<sub>r</sub>*(*x*), for 1 ≤ *r* ≤ *M*, and their sum *a*<sub>0</sub>(*x*) = ∑<sup>*M*</sup><sub>*r*=1</sub> *a<sub>r</sub>*(*x*).
3. Generate two independent uniform (0, 1) random numbers *η*<sub>1</sub> and *η*<sub>2</sub>, and compute
   (a) *τ* ← −(ln *η*<sub>1</sub>)/*a*<sub>0</sub>(*x*), the waiting time until the next reaction;
   (b) *j* ← the smallest integer such that ∑<sup>*j*</sup><sub>*r*=1</sub> *a<sub>r</sub>*(*x*) > *η*<sub>2</sub>*a*<sub>0</sub>(*x*), the index of the next reaction.
4. Update the state *x* ← *x* + *ν<sub>j</sub>* and the time *t* ← *t* + *τ*.
5. If *t* < *T*, return to step 2; otherwise, stop.
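The SSA can be sketched compactly in Python. The code below is a toy illustration of ours (the pure-decay example and all names are assumptions, not from the paper):

```python
import numpy as np

def ssa(x0, nu, propensities, c, t_final, rng):
    """Gillespie's Stochastic Simulation Algorithm (a minimal sketch).

    x0           : initial state, length N
    nu           : stoichiometric matrix of shape (N, M); column j is nu_j
    propensities : function (x, c) -> array of the M propensities a_j(x)
    """
    t, x = 0.0, np.asarray(x0, dtype=float).copy()
    times, states = [t], [x.copy()]
    while t < t_final:
        a = propensities(x, c)
        a0 = a.sum()
        if a0 <= 0.0:                 # no reaction can fire any more
            break
        eta1, eta2 = rng.random(2)
        tau = -np.log(eta1) / a0      # waiting time until the next reaction
        j = int(np.searchsorted(np.cumsum(a), eta2 * a0))  # next reaction index
        t += tau
        x += nu[:, j]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)

# Toy example: pure decay S1 -> 0 with rate constant c = 1
rng = np.random.default_rng(0)
nu = np.array([[-1.0]])
times, states = ssa([100.0], nu, lambda x, c: np.array([c * x[0]]), 1.0, 10.0, rng)
```

Each iteration fires exactly one reaction, so the trajectory is piecewise constant and every jump changes the state by one stoichiometric column.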


The Random Time Change (RTC) algorithm [11], based on the Random Time Change representation [10], is another exact Monte Carlo simulation strategy for the Chemical Master Equation. We refer the reader to [11] for details on this algorithm.

### 2.1.2. Chemical Langevin Equation

An intermediate model between the Chemical Master Equation and the reaction rate equation is the Chemical Langevin Equation [32]. This is a system of stochastic differential equations of size equal to the number of reacting species. The Chemical Langevin Equation is a reduction of the Chemical Master Equation model, obtained by assuming that there exists a macroscopically infinitesimal time step *δt* such that, over *δt*, every reaction occurs many times while its propensity function does not vary significantly. Under these assumptions, the system state is governed by

$$dX(t,c) = \sum\_{j=1}^{M} \nu\_j a\_j(X(t,c),c)dt + \sum\_{j=1}^{M} \nu\_j \sqrt{a\_j(X(t,c),c)}dW\_j(t) \tag{2}$$

where *W<sub>j</sub>*, for *j* = 1, . . . , *M*, are independent Wiener processes. The state *X*(*t*) may then be approximated by a Markov process that is continuous in space. Equation (2) is the Chemical Langevin Equation.
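Equation (2) can be simulated with a fixed-step Euler–Maruyama discretization. The sketch below is our own toy illustration for a single decay reaction (all names are assumptions; the propensity clamp is a practical guard, since the square root requires *a<sub>j</sub>* ≥ 0):

```python
import numpy as np

def cle_euler_maruyama(x0, nu, propensities, c, t_final, dt, rng):
    """Euler-Maruyama discretization of the Chemical Langevin Equation (2)."""
    n_steps = int(t_final / dt)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        a = np.maximum(propensities(x, c), 0.0)   # clamp: sqrt needs a_j >= 0
        dW = rng.normal(0.0, np.sqrt(dt), size=a.shape)
        # drift term: sum_j nu_j a_j dt; diffusion term: sum_j nu_j sqrt(a_j) dW_j
        x = x + nu @ (a * dt) + nu @ (np.sqrt(a) * dW)
        path.append(x.copy())
    return np.array(path)

# Toy decay S1 -> 0 with a_1(x) = c * x, starting from 1000 molecules
rng = np.random.default_rng(1)
nu = np.array([[-1.0]])
path = cle_euler_maruyama([1000.0], nu, lambda x, c: np.array([c * x[0]]),
                          1.0, 1.0, 1e-3, rng)
```

For this example the mean of *X*(1) is close to 1000 *e*<sup>−1</sup>, with fluctuations around it produced by the diffusion term.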

### 2.1.3. Reaction Rate Equation

A coarser level of resolution in modeling biochemically reacting networks is provided by the continuous deterministic model of chemical kinetics. This model, known as the reaction rate equations, is valid under the assumption of the thermodynamic limit. In the thermodynamic limit, the molecular amounts of all species and the system volume tend to infinity, while the concentrations of the species remain constant. Hence, the stochastic terms in the Chemical Langevin Equation are much smaller than the deterministic terms. As a result, the Chemical Langevin Equation model reduces to the reaction rate equations in the thermodynamic limit. This condition is satisfied when all *S<sub>i</sub>* molecular counts are very large. The reaction rate equations (RRE) are of the form

$$\frac{dX(t,c)}{dt} = \sum\_{j=1}^{M} \nu\_j a\_j(X(t,c), c) \,. \tag{3}$$

Equation (3) is a set of ordinary differential equations, with one equation for each biochemical species. In the event that all reactions in the system are of order at most one, the reaction rate equation can be obtained from the Chemical Master Equation (1), by considering the expected value of the system state. However, in general, the evolution of the mean trajectory in the Chemical Master Equation model does not obey the continuous deterministic model. Then, the RRE does not properly depict the true behavior of the biochemical network. In fact, there are numerous cellular networks for which noise significantly influences the system dynamics [12,31,33].
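A minimal sketch of integrating the RRE (3), again for a hypothetical single decay reaction (explicit Euler is our simplifying assumption; a stiff solver would be preferable for realistic networks):

```python
import numpy as np

def rre_rhs(x, c, nu, propensities):
    """Right-hand side of the reaction rate equations (3)."""
    return nu @ propensities(x, c)

def integrate_rre(x0, c, nu, propensities, t_final, dt):
    """Explicit Euler integration of the RRE (a sketch, not a production solver)."""
    x = np.array(x0, dtype=float)
    for _ in range(int(t_final / dt)):
        x = x + dt * rre_rhs(x, c, nu, propensities)
    return x

# Decay S1 -> 0: dX/dt = -c X, with exact solution X(t) = X(0) exp(-c t)
nu = np.array([[-1.0]])
x1 = integrate_rre([100.0], 1.0, nu, lambda x, c: np.array([c * x[0]]), 1.0, 1e-4)
print(x1)   # close to 100 * exp(-1) ≈ 36.79
```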

### *2.2. Parametric Correlations*

Sensitivity analysis plays a central role in constructing models [24]. It assesses how changes in parameters cause variations in a model's output. If a negligible adjustment in a parameter leads to significant alterations in the outcomes, we consider the model to be sensitive to that specific parameter. Precise estimations are not necessary for parameters with low sensitivity. Conversely, parameters with high associated sensitivity become key control points for the behavior of the system. In what follows, we shall focus on the sensitivity analysis of system outputs with respect to rate parameters.

### 2.2.1. Parametric Sensitivity for the Chemical Master Equation

Let *f* be a function of interest of the system state and *c* a model parameter. In the stochastic setting, the local sensitivity with respect to *c* is defined as *∂*E[*f*(*X*(*t*, *c*))]/*∂c*, where E(·) denotes the expected value. Popular methods for estimating local sensitivities with respect to the model parameters for the Chemical Master Equation often rely on finite-difference schemes, with Monte Carlo methods generating the perturbed and unperturbed trajectories. With a forward finite-difference scheme, one estimates *∂*E[*f*(*X*(*t*, *c*))]/*∂c* ≈ {E[*f*(*X*(*t*, *c* + *θ*))] − E[*f*(*X*(*t*, *c*))]}/*θ*, where *θ* is a small perturbation of the parameter of interest, *c*. To approximate the sensitivity efficiently by Monte Carlo methods, the trajectories for *X*(*t*, *c* + *θ*) and *X*(*t*, *c*) are generated using common random numbers. Among such methods are the Common Random Number (CRN) and Common Reaction Path (CRP) algorithms [11], and the Coupled Finite-Difference (CFD) algorithm [27].

### 2.2.2. Common Random Number

The Common Random Number method, presented in [11], is a finite-difference numerical method for estimating parametric sensitivities in the stochastic discrete model (1). It reuses random numbers to generate the perturbed and unperturbed paths. In doing so, it reduces the variance of the sensitivity estimator, and thus it is more efficient than a strategy based on independent random numbers. At the *r*-th iteration, it computes two SSA trajectories, the perturbed path *X*<sub>[*r*]</sub>(*t*, *c* + *θ*) and the unperturbed path *X*<sub>[*r*]</sub>(*t*, *c*), each employing the same stream of uniform (0, 1) random numbers. The coupling achieved by the CRN technique is usually weaker than that of the CRP and CFD schemes [27]. The sensitivity of the *r*-th path is approximated by

$$Z\_{[r]}(t,\mathcal{c}) = \frac{f(X\_{[r]}(t,\mathcal{c}+\theta)) - f(X\_{[r]}(t,\mathcal{c}))}{\theta} \,\,\,\,\tag{4}$$

while an estimate of the sensitivity is obtained from the sample mean (∑<sup>*R*</sup><sub>*r*=1</sub> *Z*<sub>[*r*]</sub>(*t*, *c*))/*R*, where *R* is the number of paired trajectories simulated.
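The CRN estimator can be sketched for a toy decay reaction where the exact sensitivity is known analytically (E[*X*(*t*)] = *x*<sub>0</sub>*e*<sup>−*ct*</sup>, so *∂*E[*X*(*t*)]/*∂c* = −*t x*<sub>0</sub>*e*<sup>−*ct*</sup>). The coupling via a shared seed is our illustrative stand-in for reusing the same random number stream; all names are assumptions:

```python
import numpy as np

def ssa_final_state(x0, c, t_final, rng):
    """SSA for the toy decay reaction S1 -> 0 with propensity a(x) = c*x;
    returns X(t_final)."""
    t, x = 0.0, float(x0)
    while x > 0.0:
        tau = -np.log(rng.random()) / (c * x)
        if t + tau > t_final:
            break
        t, x = t + tau, x - 1.0
    return x

def crn_sensitivity(x0, c, theta, t_final, n_paths, seed=0):
    """Common Random Number estimate of d/dc E[X(t_final)] via (4): each
    perturbed/unperturbed pair of paths reuses the same random number
    stream (here, the same seed)."""
    z = 0.0
    for r in range(n_paths):
        x_pert = ssa_final_state(x0, c + theta, t_final, np.random.default_rng(seed + r))
        x_unp  = ssa_final_state(x0, c,         t_final, np.random.default_rng(seed + r))
        z += (x_pert - x_unp) / theta
    return z / n_paths

# theta = 5% of c; the exact sensitivity here is -0.5 * 50 * exp(-0.5), about -15.2
est = crn_sensitivity(50.0, 1.0, 0.05, 0.5, 2000)
```

Because both paths in a pair consume the identical stream, their difference has far smaller variance than for independently simulated pairs.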

### 2.2.3. Common Reaction Path

The Common Reaction Path technique is also a finite-difference sensitivity estimator for the Chemical Master Equation [11]. The CRP strategy applies the RTC algorithm to simulate sample paths. In this method, coupling of the processes involves some independent unit-rate Poisson processes, {*Yj*}1≤*j*≤*M*. The coupling of the perturbed—*X*(·, *c* + *θ*) and unperturbed—*X*(·, *c*) processes is achieved using the random time change representation

$$\begin{aligned} X(t,c) &= \mathbf{x}\_0 + \sum\_{j=1}^{M} \nu\_j Y\_j \left( \int\_0^t a\_j(\mathbf{X}(s,c), c) ds \right) \\ X(t,c+\theta) &= \mathbf{x}\_0 + \sum\_{j=1}^{M} \nu\_j Y\_j \left( \int\_0^t a\_j(\mathbf{X}(s,c+\theta), c+\theta) ds \right) \end{aligned} \tag{5}$$

The *r*-th iteration of the CRP algorithm generates the paired trajectories *X*[*r*] (*t*, *c* + *θ*) and *X*[*r*] (*t*, *c*) with the RTC algorithm, each using the same *M* independent streams of unit-rate exponential random numbers. As before, the sensitivity of the *r*-th trajectory is estimated by (4). This coupling has been shown to be typically stronger than that of the CRN method, leading to a lower variance of the estimation [11,27].

### 2.2.4. Coupled Finite-Difference

Another finite-difference sensitivity estimator for the stochastic discrete model is the Coupled Finite-Difference scheme [27]. The CFD method relies on the random time change representation of the unperturbed and perturbed processes

$$\begin{split} \mathbf{X}(t,c) &= \quad \mathbf{x}\_0 + \sum\_{j=1}^{M} \nu\_j Y\_j^{(1)} \left( \int\_0^t \min(a\_j(\mathbf{X}(s,c),c), a\_j(\mathbf{X}(s,c+\theta),c+\theta)) \, ds \right) \\ &+ \quad \sum\_{j=1}^{M} \nu\_j Y\_j^{(2)} \left( \int\_0^t [a\_j(\mathbf{X}(s,c),c) - \min(a\_j(\mathbf{X}(s,c),c), a\_j(\mathbf{X}(s,c+\theta),c+\theta))] \, ds \right) \\ \mathbf{X}(t,c+\theta) &= \quad \mathbf{x}\_0 + \sum\_{j=1}^{M} \nu\_j Y\_j^{(1)} \left( \int\_0^t \min(a\_j(\mathbf{X}(s,c),c), a\_j(\mathbf{X}(s,c+\theta),c+\theta)) \, ds \right) \\ &+ \quad \sum\_{j=1}^{M} \nu\_j Y\_j^{(3)} \left( \int\_0^t [a\_j(\mathbf{X}(s,c+\theta),c+\theta) - \min(a\_j(\mathbf{X}(s,c),c), a\_j(\mathbf{X}(s,c+\theta),c+\theta))] \, ds \right) \end{split} \tag{6}$$

where {*Y*<sub>*j*</sub><sup>(1)</sup>}<sub>1≤*j*≤*M*</sub>, {*Y*<sub>*j*</sub><sup>(2)</sup>}<sub>1≤*j*≤*M*</sub>, and {*Y*<sub>*j*</sub><sup>(3)</sup>}<sub>1≤*j*≤*M*</sub> are independent unit-rate Poisson processes. Furthermore, the CFD strategy uses a version of the Next Reaction Method to compute the coupled perturbed and unperturbed trajectories, *X*<sub>[*r*]</sub>(*t*, *c* + *θ*) and *X*<sub>[*r*]</sub>(*t*, *c*), and (4) to approximate the local sensitivity of the *r*-th path. Among the finite-difference sensitivity estimators with exact underlying simulation techniques for the CME, the CFD performs the best, followed by the CRP and the CRN [27,28]. Indeed, the CFD achieves the smallest variance of the sensitivity estimator of the three methods described above [28]. As a consequence, for the same number of trajectories simulated, we shall consider in our investigations the CFD sensitivity approximations to be the most accurate and reliable.

### 2.2.5. Parametric Sensitivity for the Chemical Langevin Equations

Glasserman [34] developed a technique for computing pathwise parametric sensitivities for certain problems modeled by stochastic differential equations. This method was applied to the Chemical Langevin Equation (CLE) model in [33]. For computing the sensitivity of each path, we differentiate Equation (2) with respect to parameter *c* and obtain

$$\begin{split} d(\frac{\partial X}{\partial c}) &= \sum\_{j=1}^{M} \nu\_{j} \left[ \frac{\partial a\_{j}(X)}{\partial X} \frac{\partial X}{\partial c} + \frac{\partial a\_{j}(X)}{\partial c} \right](t) dt \\ &+ \sum\_{j=1}^{M} \nu\_{j} \left[ \frac{1}{2\sqrt{a\_{j}(X)}} \left( \frac{\partial a\_{j}(X)}{\partial X} \frac{\partial X}{\partial c} + \frac{\partial a\_{j}(X)}{\partial c} \right) \right](t) dW\_{j} \,. \end{split} \tag{7}$$

Solving the coupled system of Equations (2) and (7) for (*X*, *∂X*/*∂c*) determines the pathwise sensitivities. At time *t* = 0, the local sensitivities with respect to the rate parameters are zero. The Chemical Langevin Equation is, in general, valid when all molecular amounts are sufficiently large. Effective simulation strategies for this model require adaptive time-stepping methods [35,36].

### 2.2.6. Parametric Sensitivity for the Reaction Rate Equations

In the deterministic scenario, the behavior of the biochemical system is governed by the reaction rate Equation (3). To find the local sensitivity for this model, the derivative with respect to the desired kinetic parameter is applied to Equation (3), yielding

$$\frac{d}{dt}\mathcal{S} = \sum\_{j=1}^{M} \nu\_{j} \left( \frac{\partial a\_{j}(X(t,c),\mathbf{c})}{\partial c} + \sum\_{i=1}^{N} \frac{\partial a\_{j}(X(t,c),\mathbf{c})}{\partial X\_{i}} \mathcal{S}\_{i} \right) . \tag{8}$$

Here, S = *∂X*(*t*, *c*)/*∂c* is the sensitivity with respect to parameter *c*. The sensitivity is computed by solving for (*X*, S) the system of ordinary differential Equations (3) and (8), with the initial conditions *X*(0, *c*) = *x*<sup>0</sup> and S(0) = 0. The deterministic model is applicable when all reacting molecular populations are very large. Nonetheless, when low molecular counts of some species exist or noise plays a significant role, this approach may fail in accurately capturing the characteristics of the biochemical system. Then, deterministic techniques for sensitivity-based identifiability analysis are not valid.
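The coupled system (3) and (8) can be sketched for a hypothetical single decay reaction *S*<sub>1</sub> → 0 with *a*(*x*) = *cx*, where *∂a*/*∂c* = *X* and *∂a*/*∂X* = *c*, so the sensitivity equation reads *dS*/*dt* = −(*X* + *cS*) and the exact solution is *S*(*t*) = −*t x*<sub>0</sub>*e*<sup>−*ct*</sup>. The explicit Euler integrator is our simplifying assumption:

```python
import numpy as np

def decay_sensitivity(x0, c, t_final, dt):
    """Solve the coupled system (3) and (8) for the decay reaction S1 -> 0
    with a(x) = c*x (explicit Euler; a sketch, not a production integrator).
    Returns (X(t_final), S(t_final)) with S = dX/dc, S(0) = 0."""
    x, s = float(x0), 0.0
    for _ in range(int(t_final / dt)):
        dx = -c * x                  # RRE (3)
        ds = -(x + c * s)            # sensitivity equation (8): nu*(da/dc + (da/dX)*S)
        x, s = x + dt * dx, s + dt * ds
    return x, s

x, s = decay_sensitivity(100.0, 1.0, 1.0, 1e-4)
# exact values: X(1) = 100 e^{-1} ≈ 36.79 and S(1) = -100 e^{-1} ≈ -36.79
```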

### *2.3. Practical Identifiability Analysis*

When a model's performance is investigated, it is important to evaluate the accuracy of the parameter values. Still, poor or noisy data, interdependence of parameters, or weak dependence of the system dynamics on certain parameters may hinder the accurate estimation of parameter values. As a result, it is possible for these values to change significantly, without influencing the model's output. Consequently, the concept of identifiability is essential for the analysis of a mathematical model [19,24].

Identifiability can be classified into two main categories: structural identifiability and practical identifiability. For a structurally identifiable model, there exists a unique parameterization for any specified output of the model (see, e.g., [21,26]). On the other hand, practical identifiability involves detecting non-identifiable parameters by fitting the model to data that closely resemble the available observations (see, e.g., [18,19,22,25] for analyses of deterministic models). For this type of identifiability, it is helpful to study the parametric sensitivity of the model. In this work, we use sensitivity-based identifiability for the Chemical Master Equation. We determine identifiability and collinearity indexes by generalizing methods for deterministic models [19] to the more challenging case of stochastic discrete biochemical systems.

### 2.3.1. Sensitivity-Based Identifiability Analysis

Several identifiability strategies for deterministic models exist in the literature. One such approach by Brun et al. [19] is based on local sensitivity analysis of deterministic models. Sensitivity analysis quantifies the impact of parameter variations on the system's dynamics.

Below, we review some techniques for identifiability analysis of deterministic models relying on local parametric sensitivity. These techniques can be applied to the reaction rate Equation (3). Denote by

$$S\_{ik}(X, t, \mathcal{c}) = \frac{\partial X\_i(t, \mathcal{c})}{\partial \mathcal{c}\_k} \tag{9}$$

the local sensitivity of the molecular amount *X<sub>i</sub>*(*t*, *c*) at time *t*, with respect to the kinetic parameter *c<sub>k</sub>*. For time *t*, the parametric sensitivity matrix is *S*(*X*, *t*, *c*) = *∂X*(*t*, *c*)/*∂c* = {*S<sub>ik</sub>*(*X*, *t*, *c*)}<sub>1≤*i*≤*N*, 1≤*k*≤*M*</sub>. In addition, the non-dimensional sensitivity coefficient corresponding to the *i*-th species and the parameter *c<sub>k</sub>* at time *t* is

$$s\_{ik}(t) = \frac{c\_k}{X\_i(t, c)} \frac{\partial X\_i(t, c)}{\partial c\_k} \,. \tag{10}$$

Here, *c* = [*c*1, . . . , *cM*] is the vector of kinetic parameters associated to reactions {*Rj*}1≤*j*≤*M*. Furthermore, let *t*<sup>1</sup> < *t*<sup>2</sup> < · · · < *t<sup>L</sup>* be a sequence of time-points spanning the integration interval [0, *T*]. Ideally, some of these time-points should be inside the interval corresponding to the biochemical network's transient behavior, when applicable. Also, consider the concatenated non-dimensional sensitivity matrix, for all the time-points in the grid, and apply the normalization (10) for each entry,

$$s(X,c) = \begin{pmatrix} s\_{11}(t\_1) & \cdots & s\_{1M}(t\_1) \\ \vdots & \ddots & \vdots \\ s\_{N1}(t\_L) & \cdots & s\_{NM}(t\_L) \end{pmatrix} \tag{11}$$

To rank the parameters of the model, we utilize the non-dimensional sensitivity matrix of size (*NL*) × *M* from (11). The *k*-th column in this matrix contains the sensitivities with respect to *c<sub>k</sub>*, the rate parameter of reaction *R<sub>k</sub>*. The norm of each column *s<sub>k</sub>*(*X*, *c*) = [*s*<sub>1*k*</sub>(*t*<sub>1</sub>), . . . , *s*<sub>*Nk*</sub>(*t*<sub>1</sub>), . . . , *s*<sub>1*k*</sub>(*t<sub>L</sub>*), . . . , *s*<sub>*Nk*</sub>(*t<sub>L</sub>*)]<sup>*T*</sup> serves as a measure of the significance of parameter *c<sub>k</sub>* for the dynamics of the system: a higher norm indicates that altering that parameter value has a substantial impact on the system state. Parameters can thus be arranged in order of their significance. The following sensitivity measure is employed for evaluating the significance of the parameters, based on the sensitivity matrix (adapted after [19])

$$
\delta\_k^{msqr} = \sqrt{\frac{1}{n} \sum\_{i=1}^n s\_{ik}^2} \,\,\,\tag{12}
$$

The larger the measure *δ<sub>k</sub><sup>msqr</sup>*, the more significant the parameter *c<sub>k</sub>* is (for 1 ≤ *k* ≤ *M*).
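The measure (12) amounts to a root-mean-square over each column of the sensitivity matrix. A short sketch (the 4 × 3 matrix is a hypothetical example of ours):

```python
import numpy as np

def delta_msqr(s):
    """Sensitivity measure (12) for each parameter: the root mean square
    of the corresponding column of the non-dimensional sensitivity matrix."""
    n = s.shape[0]
    return np.sqrt((s ** 2).sum(axis=0) / n)

# hypothetical sensitivity matrix: rows = species/time-points, columns = parameters
s = np.array([[1.0, 0.1, 0.0],
              [1.0, 0.1, 0.0],
              [1.0, 0.1, 0.0],
              [1.0, 0.1, 0.0]])
d = delta_msqr(s)
ranking = np.argsort(-d)   # most influential parameter first
print(d)         # [1.  0.1 0. ]
print(ranking)   # [0 1 2]
```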

### 2.3.2. Parameter Collinearity

Extensive research has been conducted on collinearity in various problems. Brun et al. [19] introduce a strategy for identifying parameter relationships based on collinearity analysis in the deterministic framework, presenting a novel approach to explore the connections between parameters. Recall that the columns of a matrix *B* with *M* columns are linearly dependent (or collinear) if there exists a non-zero vector *z* = [*z*<sub>1</sub>, . . . , *z<sub>M</sub>*]<sup>*T*</sup> such that *Bz* = 0; they are nearly linearly dependent (or near collinear) if *Bz* ≈ 0 for some such vector with ‖*z*‖<sub>2</sub> = 1.

Now, take the normalized sensitivity matrix *S*˜, having as the *m*-th column the vector

$$\tilde{s}\_m(X,c) = \frac{s\_m(X,c)}{||s\_m(X,c)||\_2}$$

for 1 ≤ *m* ≤ *M*. Normalizing these vectors first prevents biases due to differences in the absolute values of the local sensitivities of the various parameters. A large norm ‖*s<sub>m</sub>*‖<sub>2</sub> indicates that a small variation in parameter *c<sub>m</sub>* can significantly impact the system's behavior; thus, this parameter is important. For this parameter to be identifiable, it should not be correlated with other parameters.

Let us consider any subset *K* of *k* parameters (*k* ≤ *M*) from the set of parameters {*c*<sub>1</sub>, *c*<sub>2</sub>, . . . , *c<sub>M</sub>*} and the corresponding sub-matrix *S̃<sub>K</sub>*(*X*, *c*) of the normalized sensitivity matrix, whose columns are the *k* sensitivity vectors. A measure of collinearity of the subset *K* of parameters, with corresponding matrix *S̃<sub>K</sub>*, is given by

$$CI\_K = \frac{1}{\min\_{||z||\_2=1} \|\tilde{S}\_K z\|\_2} = \frac{1}{\sqrt{\lambda\_k}} \tag{13}$$

where *λ<sub>k</sub>* is the minimum eigenvalue of the matrix *S̃<sub>K</sub><sup>T</sup>S̃<sub>K</sub>* and ‖ · ‖<sub>2</sub> is the 2-norm of a vector. The measure (13) is known as the collinearity index of the subset *K* [19,30]. The closer the columns of the matrix *S̃<sub>K</sub>* are to a linearly dependent set of vectors, the smaller min<sub>‖*z*‖<sub>2</sub>=1</sub> ‖*S̃<sub>K</sub>z*‖<sub>2</sub> becomes. Thus, a large collinearity index *CI<sub>K</sub>* indicates a high level of collinearity among the parameters in the set. This implies that changes in the model dynamics due to small perturbations in one parameter of the almost collinear set may be compensated by suitable variations in the other parameters of the set. As a consequence, if a set of parameters is collinear, it is not identifiable. According to [19], a subset of parameters is considered identifiable if the associated collinearity index satisfies *CI<sub>K</sub>* < 20. With this observation, it is possible to uncover the subsets of model parameters that can be identified, as well as those that cannot. The collinearity index may be computed for all subsets *K* of the parameter space, to determine the parameter subsets that are not collinear. Note that when a group of parameters has a high collinearity index, any set containing it as a subset also has a high collinearity index.
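The collinearity index (13) reduces to an eigenvalue computation on the Gram matrix of the normalized columns. A sketch with two hypothetical 2 × 2 examples of ours:

```python
import numpy as np

def collinearity_index(s_K):
    """Collinearity index (13) of a parameter subset: the columns of s_K are
    the sensitivity vectors of the parameters in the subset."""
    s_tilde = s_K / np.linalg.norm(s_K, axis=0)            # unit 2-norm columns
    lam_min = np.linalg.eigvalsh(s_tilde.T @ s_tilde)[0]   # smallest eigenvalue
    return 1.0 / np.sqrt(max(lam_min, 1e-300))             # guard against lam_min = 0

# Orthogonal columns: CI = 1 (an identifiable pair)
s1 = np.array([[1.0, 0.0], [0.0, 1.0]])
# Nearly collinear columns: CI is large (>> 20, subset not identifiable)
s2 = np.array([[1.0, 1.0], [0.0, 1e-3]])
print(collinearity_index(s1))  # 1.0
print(collinearity_index(s2))  # much larger than 20
```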

Another technique to assess the identifiability of the model parameters is to use the singular value decomposition (SVD) of a matrix. In general, the SVD [37,38] of an *n* × *M* matrix *s* is

$$\mathbf{s} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T \tag{14}$$

where **U** is an *n* × *n* unitary matrix, **V** is an *M* × *M* unitary matrix, and **Σ** is an *n* × *M* non-negative diagonal matrix with diagonal entries

$$
\sigma\_1 \ge \sigma\_2 \ge \dots \ge \sigma\_r > \sigma\_{r+1} = \dots = \sigma\_M = 0 \,.
$$

The values {*σ<sub>m</sub>*<sup>2</sup>}<sub>1≤*m*≤*M*</sub> are the eigenvalues of the matrix *s*<sup>*T*</sup>*s*. The index *r* is the rank of the matrix *s*, the largest number of linearly independent columns of this matrix. Numerically, singular values below a specified small tolerance are considered practically zero. In this work, we use the singular value decomposition of the matrix *s* to determine its rank. This rank is a reliable measure of the number of rate parameters that are not collinear. Furthermore, singular values that are zero or very close to zero show that the group of all reaction rate parameters of the model is collinear. In that case, some model parameters cannot be estimated from the available data.
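A sketch of the numerical-rank computation via the SVD (the 3 × 3 matrix is a hypothetical example of ours, with the third column equal to the sum of the first two):

```python
import numpy as np

def numerical_rank(s, tol=1e-8):
    """Number of non-collinear parameter directions: the count of singular
    values of s above tol relative to the largest singular value."""
    sv = np.linalg.svd(s, compute_uv=False)   # singular values, descending
    return int(np.sum(sv > tol * sv[0]))

# three columns, but the third is the sum of the first two -> rank 2
s = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
print(numerical_rank(s))   # 2
```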

Brun et al. [20] also introduced a determinant measure

$$\rho\_k = \det(S\_K^T S\_K)^{1/2k} \tag{15}$$

to find the appropriate number of parameters to estimate.

The metrics considered above can be utilized to determine the identifiability of parameter sets as follows. The sensitivity measure *δ<sub>k</sub><sup>msqr</sup>* is used to evaluate the importance of each parameter *c<sub>k</sub>*. The collinearity index measures whether the parameters in a set *K* are independent, namely whenever *CI<sub>K</sub>* < 20. In the case that both conditions are satisfied, (a) the parameters in the subset *K* are not collinear and (b) each parameter in the group is important, the parameters in *K* are identifiable. Finally, the determinant measure *ρ<sub>K</sub>* can be employed to compare the identifiability of various groups of parameters.

### 2.3.3. Method for Selecting Subsets of Identifiable Parameters

The practical identifiability methods presented above were developed for continuous deterministic models [19,20], and are thus applicable to the reaction rate equation model. However, this model may fail to faithfully represent the behavior of biochemical systems that involve low molecular counts of some species. Consequently, new methodologies are required for the parameter identifiability of stochastic discrete models of biochemical systems. In this work, we develop novel strategies for determining sets of identifiable parameters for the Chemical Master Equation. We generalize the work of Gábor et al. [30] on identifying subsets of identifiable parameters in deterministic models to address the much more challenging case of stochastic discrete models of well-stirred biochemical systems. This generalization is essential, as stochasticity plays a significant role in accurately modeling real-world biological systems, and our approach allows for an in-depth study of more complex biochemical networks encountered in applications.

The measures presented above were designed for deterministic models. We aim to adapt these measures to systems modeled by the Chemical Master Equation. For this model, the sensitivity coefficients are computed as

$$\mathcal{S}\_{ik}(\mathbb{E}[X],t) = \frac{\partial}{\partial c\_k} \mathbb{E}[X\_i(t,c)].$$

Then, we shall compute the sensitivity matrix for the CME according to

$$S(t) = \frac{\partial \mathbb{E}[X(t, c)]}{\partial c} = \begin{pmatrix} \frac{\partial}{\partial c\_1} \mathbb{E}(X\_1(t, c)) & \cdots & \frac{\partial}{\partial c\_M} \mathbb{E}(X\_1(t, c)) \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial c\_1} \mathbb{E}(X\_N(t, c)) & \cdots & \frac{\partial}{\partial c\_M} \mathbb{E}(X\_N(t, c)) \end{pmatrix} . \tag{16}$$

Take a sequence of time-points 0 = *t*<sub>1</sub> < *t*<sub>2</sub> < . . . < *t<sub>L</sub>* = *T*, relevant to the biochemical system under consideration. The fully normalized (non-dimensional) sensitivity coefficient of the *i*-th species with respect to the parameter *c<sub>k</sub>* at time *t*<sub>ℓ</sub> is

$$s\_{ik}(t\_\ell) = \frac{c\_k}{\mathbb{E}[X\_i(t\_\ell, c)]} \frac{\partial}{\partial c\_k} \mathbb{E}[X\_i(t\_\ell, c)] \quad \text{for } 1 \le i \le N, \ 1 \le k \le M. \tag{17}$$

The concatenated non-dimensional sensitivity matrix over these discrete time-points with entries (17) is

$$s(\mathbb{E}[X], c) = \begin{pmatrix} s\_{11}(t\_1) & \cdots & s\_{1M}(t\_1) \\ \vdots & \ddots & \vdots \\ s\_{N1}(t\_L) & \cdots & s\_{NM}(t\_L) \end{pmatrix} . \tag{18}$$

Normalizing the ℓ-th column of matrix (18), namely *s*<sub>ℓ</sub>(E[*X*], *c*), gives

$$\tilde{s}\_{\ell}(\mathbb{E}[X], \mathcal{c}) = \frac{s\_{\ell}(\mathbb{E}[X], \mathcal{c})}{||s\_{\ell}(\mathbb{E}[X], \mathcal{c})||\_2} \,. \tag{19}$$

Finally, the normalized sensitivity matrix *S̃* has *s̃*<sub>ℓ</sub>(E[*X*], *c*) as its ℓ-th column. For the Chemical Master Equation, the sensitivity measure *δ<sub>k</sub><sup>msqr</sup>* and the collinearity index *CI<sub>K</sub>* are computed using (12) and (13), respectively, for the sensitivity matrix of the expected value E[*X*] rather than of the system state *X*, as was the case for the reaction rate equation.

Moreover, we will employ the finite-difference methods described above to estimate parametric sensitivities. Recall that a finite-difference estimate of the sensitivity with respect to parameter *c<sup>k</sup>* , over *R* coupled perturbed and unperturbed paths, is

$$\frac{\partial}{\partial c\_k} \mathbb{E}[X(t, c)] \approx Z\_R = \frac{1}{R} \sum\_{r=1}^{R} \frac{X\_{[r]}(t, c\_k + \theta) - X\_{[r]}(t, c\_k)}{\theta}.$$

While we compute the coupled trajectories using the CFD, CRP, or CRN strategies, our method can be applied to other finite-difference sensitivity estimators [29].

The measure (12) can be calculated to rank parameters from most to least influential. Small values of *δ<sup>msqr</sup>* correspond to parameters with a small influence on the model. We retain those parameters with a value of *δ<sup>msqr</sup>* larger than 0.2 [39]. Given the resulting ranked list, we compute the collinearity indices for this list. This method can be applied to models of moderate size.

Algorithm 1 calculates the normalized sensitivity matrix as follows. A grid of *L* time-points ranging from 0 to *T* is selected. We choose equally spaced time steps, such that data are collected from all important regions of the integration interval; this depends on the particular model, and an adaptive time-stepping procedure could be used instead. Then, the sensitivity matrices *S*(*t*<sub>ℓ</sub>) from Equation (16) are approximated with a specific finite-difference sensitivity estimator. Afterwards, we compute the concatenated non-dimensional sensitivity matrix *s*. We normalize each column of *s* individually to ensure consistency and comparability: each column *s<sub>k</sub>* is divided by its 2-norm, yielding a matrix *S̃* whose *k*-th column is *s̃<sub>k</sub>* = *s<sub>k</sub>*/‖*s<sub>k</sub>*‖<sub>2</sub>. Also, for each parameter *c<sub>k</sub>*, we compute the sensitivity measure *δ<sub>k</sub><sup>msqr</sup>* from Equation (12), using the entries of the *k*-th columns of the sensitivity matrices *S*(*t*<sub>ℓ</sub>).

**Algorithm 1** Computing the Normalized Sensitivity Matrix

**Initialize**: Time grid 0 = *t*<sub>1</sub> < *t*<sub>2</sub> < . . . < *t<sub>L</sub>* = *T*.
**Input**: Estimates of the sensitivity matrices *S*(*t*<sub>ℓ</sub>) from (16).
Compute the concatenated non-dimensional sensitivity matrix *s* from (18), with entries (17).
**for** *k* = 1 to *M* **do**
&nbsp;&nbsp;normalize *s̃<sub>k</sub>* = *s<sub>k</sub>*/‖*s<sub>k</sub>*‖<sub>2</sub>, where *s<sub>k</sub>* is the *k*-th column of *s* and ‖ · ‖<sub>2</sub> is the 2-norm
**end for**
Compute the normalized matrix *S̃* = {*s̃<sub>k</sub>*}<sub>1≤*k*≤*M*</sub>.
**for** *k* = 1 to *M* **do**
&nbsp;&nbsp;compute the sensitivity measure *δ<sub>k</sub><sup>msqr</sup>* according to (12) for parameter *c<sub>k</sub>*
**end for**
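The steps of Algorithm 1 can be sketched in Python as follows (a minimal sketch of ours; the toy input with *L* = 1, *N* = 2, *M* = 2 is a hypothetical example):

```python
import numpy as np

def normalized_sensitivity_matrix(S_list, X_mean_list, c):
    """Algorithm 1 (sketch): build the concatenated non-dimensional sensitivity
    matrix (18) from estimated sensitivity matrices S(t_l) of (16), normalize
    each column per (19), and compute the delta_msqr measure (12).

    S_list      : list of L arrays of shape (N, M), the S(t_l)
    X_mean_list : list of L arrays of shape (N,), the E[X(t_l, c)]
    c           : array of the M rate parameters
    """
    blocks = []
    for S_t, x_mean in zip(S_list, X_mean_list):
        # entries (17): s_ik = (c_k / E[X_i]) * dE[X_i]/dc_k
        blocks.append(S_t * c[np.newaxis, :] / x_mean[:, np.newaxis])
    s = np.vstack(blocks)                       # (N*L) x M matrix (18)
    S_tilde = s / np.linalg.norm(s, axis=0)     # column-normalized matrix
    delta = np.sqrt((s ** 2).mean(axis=0))      # measure (12), one per parameter
    return S_tilde, delta

# toy input with L = 1 time-point, N = 2 species, M = 2 parameters
S_list = [np.array([[2.0, 0.0], [0.0, 1.0]])]
X_mean_list = [np.array([1.0, 1.0])]
c = np.array([1.0, 1.0])
S_tilde, delta = normalized_sensitivity_matrix(S_list, X_mean_list, c)
```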

In Algorithm 2, we introduce a method for the selection of identifiable parameter subsets based on sensitivity measures and collinearity indices. This procedure extends and refines a methodology by Gábor et al. [30] from the deterministic to the more difficult case of stochastic biochemical networks. The goal of Algorithm 2 is to iteratively assess the practical identifiability of subsets of model parameters. A threshold value is set for the collinearity indices, which measure the level of collinearity between parameter groups. The threshold value determines the acceptable level of collinearity. With a normalized sensitivity matrix obtained from Algorithm 1 as input, the following steps are considered. The parameters are ranked according to their sensitivity measure, those with a sensitivity measure below a critical value (chosen here as 0.2) are considered unimportant and may be discarded. If the ranked list of parameters is of moderate size, combinations of parameters are generated. For each combination, the algorithm computes the corresponding collinearity index. This involves calculating the collinearity indices for pairs, triples, etc. These indices quantify the degree of collinearity between the parameters of a certain group. When the computed collinearity index for a parameter subset is below the threshold value, that subset of parameters is deemed identifiable. By applying this algorithm, a subset of parameters with low collinearity and high identifiability can be selected. This allows for the reduction in model complexity and for the accurate and reliable estimation of the most important parameters, from the input data.
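The selection procedure of Algorithm 2 can be sketched as follows. This is our own illustrative implementation under the thresholds stated above (sensitivity cutoff 0.2, collinearity cutoff 20); the 2 × 3 input matrix and parameter names are hypothetical:

```python
import numpy as np
from itertools import combinations

def identifiable_subsets(S_tilde, delta, names, delta_min=0.2, ci_max=20.0):
    """Algorithm 2 (sketch): discard parameters with delta_msqr < delta_min,
    then return every subset of the retained parameters whose collinearity
    index (13) is below ci_max."""
    keep = [k for k in range(len(names)) if delta[k] >= delta_min]
    result = []
    for size in range(2, len(keep) + 1):
        for K in combinations(keep, size):
            sub = S_tilde[:, list(K)]
            lam_min = np.linalg.eigvalsh(sub.T @ sub)[0]   # smallest eigenvalue
            ci = 1.0 / np.sqrt(max(lam_min, 1e-300))
            if ci < ci_max:
                result.append(tuple(names[k] for k in K))
    return result

# hypothetical example: c1 and c2 independent and influential,
# c3 nearly collinear with c1
S_tilde = np.array([[1.0, 0.0, 0.999],
                    [0.0, 1.0, 0.0447]])
S_tilde = S_tilde / np.linalg.norm(S_tilde, axis=0)
delta = np.array([0.9, 0.8, 0.7])
subsets = identifiable_subsets(S_tilde, delta, ["c1", "c2", "c3"])
```

Here the pair (c1, c3) and any superset of it are rejected as collinear, while the remaining pairs pass the threshold.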


### **3. Results**

In this section, we apply our method to select subsets of practically identifiable parameters in the Chemical Master Equation on three realistic models. We observe that the collinearity indices play a significant role in finding the subsets of estimable parameters, using local stochastic sensitivities. The parametric sensitivities of the stochastic discrete model of well-stirred biochemical systems are approximated by finite-difference schemes, namely the Common Random Number, Common Reaction Path, and Coupled Finite-Difference techniques. By applying a perturbation to each parameter in these finite-difference techniques, we can assess the sensitivity of the model outputs to changes in the model's parameters. The choice of perturbation size for finite-difference approximations is essential for obtaining accurate and reliable results while minimizing computational effort. The specific perturbation sizes, representing 1%, 2%, or 5% of the parameter value, are chosen based on a trade-off between accuracy and numerical stability. In addition, we find the parameters with high sensitivities; those with low sensitivity have a reduced impact on the model outputs and cannot be estimated accurately. In the stochastic context, we consider the SVD of the normalized sensitivity matrix to determine its rank, which gives the number of model parameters that are not collinear.

For validation of the methods introduced above, we compare the results obtained with the Chemical Master Equation, with those derived with the Chemical Langevin Equation and those for the reaction rate equations, on two models of biochemically reacting systems. Still, we emphasize the importance of considering stochastic discrete models of biochemical networks to accurately describe the dynamics of these systems, particularly when some molecular populations are small or noise is driving the system behavior. The parametric sensitivities estimated for the reaction rate equations or the Chemical Langevin Equations may not yield accurate estimability results, in general. For each model, we generated 10,000 coupled trajectories to approximate the parametric sensitivities of the Chemical Master Equation by finite-difference schemes. The CFD strategy is considered to be more accurate and reliable than the CRN and the CRP methods [28]. The case studies tested are an infectious disease network [40], the Michaelis–Menten system and a genetic toggle-switch model [11].

### *3.1. Infectious Disease Model*

An infectious disease model [40] considers two species: *S*1—the infected particles and *S*2—the particles which can be infected. These species, which may depict molecules, cells, or humans, participate in five reactions. The first two reactions represent the death of species *S*1 and *S*2, respectively, while the third and fourth reactions describe the birth or production of particles of the *S*1 and *S*2 type. The two species interact through the fifth reaction, in which an infected particle *S*1 infects a particle *S*2. The initial conditions are *S*1(0) = 20 and *S*2(0) = 40. The system is studied on the time interval [0, 10]. For our simulations, 10,000 trajectories were generated to estimate the solution of the Chemical Master Equation.

Table 1 provides information on the reaction channels of the biochemical system and the values of their rate parameters. It includes the reaction channels denoted by *R*1, *R*2, *R*3, *R*4, and *R*5. Each reaction is described by its reactants and products. The last column lists the parameter values corresponding to the rates at which the reactions occur. These parameter values are specified for the stochastic model expressed in molecular numbers, rather than for the deterministic reaction rate equations expressed in concentrations. A sample trajectory of the number of the infected *S*1 particles and of the susceptible *S*2 particles as functions of time, computed using Gillespie's algorithm, is given in Figure 1.
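A minimal sketch of one stochastic realization of this network with Gillespie's algorithm follows. The reaction structure (two deaths, two births, one infection) is taken from the description above; the rate constants in `c` are hypothetical placeholders, since the actual values are those listed in Table 1.

```python
import numpy as np

# Hypothetical rate constants for illustration only; the actual values
# are given in Table 1 of the paper.
c = np.array([0.1, 0.1, 1.0, 1.0, 0.01])
# State-change vectors for R1..R5 acting on (X1, X2): deaths of S1 and S2,
# births of S1 and S2, and the infection S1 + S2 -> 2 S1.
nu = np.array([[-1, 0], [0, -1], [1, 0], [0, 1], [1, -1]])

def propensities(x, c):
    x1, x2 = x
    return np.array([c[0] * x1, c[1] * x2, c[2], c[3], c[4] * x1 * x2])

def gillespie(x0, c, t_end, rng):
    """One realization of Gillespie's stochastic simulation algorithm."""
    t, x = 0.0, np.array(x0, dtype=float)
    while True:
        a = propensities(x, c)
        a0 = a.sum()
        if a0 == 0.0:
            break
        t += rng.exponential(1.0 / a0)    # waiting time to the next reaction
        if t >= t_end:
            break
        j = rng.choice(len(a), p=a / a0)  # index of the reaction that fires
        x += nu[j]
    return x

rng = np.random.default_rng(1)
print(gillespie([20, 40], c, 10.0, rng))  # state (X1, X2) at t = 10
```

Averaging many such trajectories, as done with the 10,000 runs mentioned above, approximates the expected value of the state under the Chemical Master Equation.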


**Table 1.** Infectious disease model: the list of reactions and the corresponding rate parameter values.

**Figure 1.** Infectious disease model: the evolution in time of the number of molecules of the species *S*1—infected individuals and *S*2—individuals which can be infected, generated with Gillespie's algorithm, on the interval [0, 10].

The finite-difference sensitivity estimations are calculated with 10,000 trajectories using the CFD, the CRN, and the CRP strategies, with a perturbation of 5% of the parameter value. The path-wise sensitivities for the Chemical Langevin Equation are computed over 10,000 trajectories, with the Euler–Maruyama scheme applied to Equations (2) and (7), and are utilized to estimate the sensitivities of the expected value of the state vector. The parametric sensitivities are also approximated for the reaction rate equations. These estimations are used to calculate the collinearity indices for all parameter combinations, for the Chemical Master Equation, the Chemical Langevin Equation, and the RRE models. The results are presented in Tables 2–6. The sensitivity measures are reported in Table 2, showing that *c*2 is the least significant among all the parameters.
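The Common Random Number idea can be illustrated on a simpler birth–death process (not one of the paper's models): the nominal and perturbed systems are simulated with the same random seed, so the noise in their difference quotient largely cancels. For this process E[X(t)] = k(1 − e^(−t)), so the exact sensitivity of the mean with respect to k at t = 2 is 1 − e^(−2) ≈ 0.86.

```python
import numpy as np

def birth_death_final(k, t_end, seed):
    """SSA for the birth-death process 0 ->(k) X, X ->(1) 0, started at 0;
    returns X(t_end). Reusing the same seed gives the CRN coupling."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, 0
    while True:
        a_birth, a_death = k, 1.0 * x
        a0 = a_birth + a_death
        t += rng.exponential(1.0 / a0)
        if t >= t_end:
            return x
        x += 1 if rng.random() < a_birth / a0 else -1

k, t_end = 5.0, 2.0
h = 0.05 * k     # a 5% perturbation, as in the text
n = 2000
# CRN estimator: nominal and perturbed runs share the random number stream.
est = np.mean([(birth_death_final(k + h, t_end, s)
                - birth_death_final(k, t_end, s)) / h for s in range(n)])
print(est)       # exact sensitivity: 1 - exp(-2) ~ 0.8647
```

With independent streams the same estimator would need far more trajectories to reach a comparable variance, which is the motivation for the CRN, CRP, and CFD couplings.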


**Table 2.** Infectious disease model: comparison of *δ msqr* for a 5% perturbation.
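For orientation, the *δ msqr* importance measure reported in Table 2 is, in its usual definition, the root mean square of a parameter's column of the scaled sensitivity matrix; a small value flags a parameter with little influence on the outputs. A sketch with a made-up 3 × 3 sensitivity matrix:

```python
import numpy as np

def delta_msqr(S):
    """delta^msqr importance measure: the root mean square of each
    parameter's column of the scaled sensitivity matrix S."""
    return np.sqrt(np.mean(S**2, axis=0))

# Made-up sensitivity matrix: three outputs (rows), three parameters (columns).
S = np.array([[0.9, 0.01, 0.5],
              [1.1, 0.02, 0.4],
              [0.8, 0.01, 0.6]])
d = delta_msqr(S)
print(d)   # the second parameter has by far the smallest measure
```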

Tables 3–6 reveal that the collinearity indices for the reaction rate equation and the Chemical Langevin Equation models are more consistent with the collinearity indices for the Chemical Master Equation computed with the CFD sensitivity estimator than with those computed with the CRN and the CRP estimators. Notably, the pair subset {*c*1, *c*3} has the highest collinearity index; however, it is relatively low for the CRP and the CRN schemes in comparison with the other estimations. This is due to the lower accuracy of the CRP and the CRN schemes compared to the CFD technique. Among the pair subsets, {*c*1, *c*3}, among the triple subsets, {*c*3, *c*4, *c*5}, and among the quadruple subsets, {*c*2, *c*3, *c*4, *c*5} have high collinearity indices relative to the other subsets.

**Table 3.** Infectious disease model: collinearity indices for pair subsets. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 5% perturbation.


There is no pair subset with a high collinearity index (>20) (Table 3), but there is a parameter subset of size 3 with a collinearity index greater than 20 (Table 4). In fact, the parameter subset {*c*3, *c*4, *c*5} is not identifiable with the Coupled Finite Difference sensitivity estimator, the Chemical Langevin Equation, or the deterministic sensitivities. However, the Common Random Number and the Common Reaction Path sensitivities show different results. In Table 5, two parameter subsets of size 4 show a collinearity index greater than 20 with the deterministic, stochastic continuous, and CFD sensitivity estimations. All subsets containing the parameters {*c*3, *c*4, *c*5} are collinear, which is in agreement with the results in Table 4. This indicates that these parameter subsets are poorly identifiable. Consequently, the sensitivity-based estimability analyses performed on the RRE, the CLE, and the CME models agree, thus validating the proposed method for the more general discrete stochastic model. The Common Random Number and the Common Reaction Path techniques could not provide an accurate assessment of the identifiability of the various subsets with only 10,000 realizations, and are thus less reliable.


**Table 4.** Infectious disease model: collinearity indices for triple subsets. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 5% perturbation.

**Table 5.** Infectious disease model: collinearity indices for quadruple subsets. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 5% perturbation.


**Table 6.** Infectious disease model: collinearity indices for the set of all kinetic parameters. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 5% perturbation. The singular values for the CFD, the CLE, and the RRE sensitivity estimations show that the number of parameters that are not collinear is four.


### *3.2. Michaelis–Menten Model*

The second model we analyze is the Michaelis–Menten biochemical system, which involves four species—a substrate *S*1, an enzyme *S*2, a complex *S*3 and a product *S*4—and three reactions. We denote by *Y<sup>i</sup>* the number of molecules of the species *S<sup>i</sup>*. With this notation, the initial conditions for the number of molecules are *Y*1(0) = [5 × 10<sup>−7</sup> *n<sup>A</sup> vol*], *Y*2(0) = [2 × 10<sup>−7</sup> *n<sup>A</sup> vol*], and *Y*3(0) = *Y*4(0) = 0, where *n<sup>A</sup>* = 6.023 × 10<sup>23</sup> is Avogadro's number and *vol* = 10<sup>−15</sup> denotes the volume of the system. The reactions and the values of the rate parameters are included in Table 7. This model is integrated on the interval [0, 50]. Figure 2 depicts a realization of the system state, simulated with Gillespie's algorithm.
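The conversion from the stated concentrations to the initial molecule counts is a one-line computation, shown here with the constants quoted above:

```python
# Converting the Michaelis-Menten initial concentrations (mol/L) to
# molecule counts via Y = [concentration * nA * vol]:
nA = 6.023e23      # Avogadro's number, as used in the text
vol = 1e-15        # system volume in liters
Y1_0 = int(5e-7 * nA * vol)   # substrate S1
Y2_0 = int(2e-7 * nA * vol)   # enzyme S2
print(Y1_0, Y2_0)  # 301 120
```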

**Figure 2.** Michaelis–Menten model: the evolution in time of the number of molecules of a substrate, an enzyme, a complex and a product, generated with Gillespie's algorithm, on the interval [0, 50].

**Table 7.** Michaelis–Menten model: the list of reactions and the corresponding rate parameter values.


We start by approximating the parametric sensitivities for the Chemical Master Equation. The finite-difference sensitivity estimations obtained with the CFD, the CRP, and the CRN algorithms use perturbations of 1% and 5% of the value of the parameter of interest. The sensitivity measures provided in Table 8 indicate that *c*2 may not be estimated as accurately as the other parameters. The collinearity indices obtained with a 1% perturbation for each sensitivity estimator are reported in Table 9 for pairs of parameters, while the indices for the set of all parameters are recorded in Table 10. For each subset, the results for the stochastic Michaelis–Menten model show low collinearity indices, below 20. The choice of the finite-difference sensitivity estimator does not significantly affect the parameter identifiability. The stochastic discrete modeling approach to identifiability analysis yields parameter subsets that are not collinear for the Michaelis–Menten system. Additionally, the tables include the RRE identifiability metrics to validate the CME estimability results. The collinearity indices for a 5% perturbation can be found in Appendix A, and they are consistent with the results obtained using a 1% perturbation.



**Table 9.** Michaelis–Menten model: collinearity indices for pair subsets. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 1% perturbation.

**Table 10.** Michaelis–Menten model: collinearity indices for the triple subset. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN, and CRP algorithms and a 1% perturbation.


### *3.3. Genetic Toggle Switch Model*

The last biochemical system investigated is the genetic toggle switch [11,28]. Multistable stochastic switches arise in modeling key biological processes. The model considers two gene pairs, whose interaction creates a bistable switch, as each gene negatively regulates the synthesis of the other gene. Due to the presence of noise, the system can transition between the states represented by an abundance of one species and an almost total absence of the other. In this genetic switch system, the two species *U* and *V* take part in four reactions. Table 11 specifies the reaction channels and their propensities. We examine the system using the following parameter values [11]

$$
\alpha_1 = 50, \; \beta = 2.5, \; \alpha_2 = 16, \; \gamma = 1, \tag{20}
$$

and the initial conditions *XV*(0) = *XU*(0) = 0. Figure 3 displays a sample path for the molecular numbers of the two species, simulated with Gillespie's algorithm (left) along with the standard deviation of the CFD, CRP, and CRN sensitivity estimators as functions of time (right).

**Figure 3.** Genetic toggle switch model: (**Left**): the evolution in time of the number of molecules of the species *U* and *V*, generated with Gillespie's algorithm, on the interval [0, 50]. (**Right**): standard deviations of the three estimators, CFD, CRP, and CRN.


**Table 11.** Genetic toggle switch model: the list of reactions and their propensity functions.
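The propensity functions of Table 11 can be sketched as follows. This assumes the standard Hill-type toggle-switch form of [11]—mutual repression plus linear degradation—and reads Equation (20) as α1 = 50; the exact expressions are those given in the table.

```python
import numpy as np

# Parameter values of Equation (20), with alpha1 read as 50.
alpha1, beta, alpha2, gamma = 50.0, 2.5, 16.0, 1.0

def propensities(u, v):
    """Propensities for (synthesis of U, degradation of U,
    synthesis of V, degradation of V), in the standard Hill form."""
    return np.array([
        alpha1 / (1.0 + v**beta),    # synthesis of U, repressed by V
        1.0 * u,                     # degradation of U
        alpha2 / (1.0 + u**gamma),   # synthesis of V, repressed by U
        1.0 * v,                     # degradation of V
    ])

print(propensities(0, 0))  # at the empty state only the synthesis terms fire
```

The two synthesis terms are what make the switch bistable: a large copy number of either species suppresses production of the other, and noise occasionally flips the system between the two branches.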

The reaction rate equation model cannot capture the stochastic transitions between the states, and thus the deterministic tools for analyzing this system are not applicable. We perform an estimability analysis of the Chemical Master Equation model for the genetic toggle switch on the interval [0, 50]. To assess how variations in the parameter values affect the dynamics of the system, we approximate the local sensitivities with respect to the parameters whose values are given by (20). We simulate 10,000 coupled sample paths with the CFD and the CRP methods. The finite-difference sensitivity estimators are applied with a perturbation *θ* = 10<sup>−4</sup> for each parameter value. The sensitivity measures are provided in Table 12; those calculated using the CFD method show that all parameters have *δ msqr* > 0.2 and are thus sufficiently significant, while the RRE sensitivity measures indicate that the parameters *β* and *γ* are insignificant.

**Table 12.** Genetic toggle switch model: comparison of *δ msqr* .


Employing the local sensitivity approximations, we compute the collinearity indices for all the subsets of the parameter set {*α*1, *α*2, *β*, *γ*}. Tables 13–15 record the collinearity indices for the pair, triple and quadruple subsets, respectively. No subset of parameters exhibits collinearity based on the CFD, the CRP, and the CRN sensitivity estimations. We conclude that all four parameters are identifiable for the stochastic discrete model. These results are confirmed by the singular values computed with the CFD sensitivity estimator, which are [32.21; 29; 12.18; 4]. Different values of the parameters for this model may yield different results for estimability in the stochastic genetic toggle-switch system.
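As a quick numerical check, under the convention that a subset is poorly identifiable when its collinearity index 1/σ_min exceeds roughly 20, the singular values quoted above confirm that all four parameters are estimable:

```python
import numpy as np

# Singular values of the CFD-estimated sensitivity matrix quoted in the text.
sigma = np.array([32.21, 29.0, 12.18, 4.0])

# Collinearity index of the full parameter set: the reciprocal of the
# smallest singular value; values above ~20 would flag a collinear set.
full_set_index = 1.0 / sigma.min()
n_not_collinear = int(np.sum(1.0 / sigma < 20.0))
print(full_set_index, n_not_collinear)   # 0.25 4
```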

**Table 13.** Genetic toggle switch model: collinearity indices for pair subsets.


The CME sensitivities with respect to the parameters are estimated over 10,000 trajectories with the CFD and CRP methods and a perturbation *θ* = 10<sup>−4</sup>. \*: Collinearity index does not exist.


**Table 14.** Genetic toggle switch model: collinearity indices for triple subsets.

The CME sensitivities with respect to the parameters are estimated over 10,000 trajectories with the CFD and CRP methods and a perturbation *θ* = 10<sup>−4</sup>. \*: Collinearity index does not exist.



**Table 15.** Genetic toggle switch model: collinearity indices for the quadruple subset.

The CME sensitivities with respect to the parameters are estimated over 10,000 trajectories with the CFD and CRP methods and a perturbation *θ* = 10<sup>−4</sup>. \*: Collinearity index does not exist.

### **4. Discussion**

Stochastic models of well-stirred biochemical processes provide a valuable framework for capturing the inherent variability at the cellular level when some molecular species are present in low amounts. The Chemical Master Equation is a frequently adopted stochastic discrete model for such processes. By contrast, deterministic approaches are often not suitable for modeling cellular systems, as they fail to capture the intrinsic randomness observed experimentally. Many models of realistic biochemical processes depend on a fairly large number of parameters. The values of some of these parameters may be unknown and have to be estimated. Parameter estimation is a critical step in modeling biochemical systems. However, determining appropriate parameter values for stochastic discrete models of biochemical networks poses many challenges. It is essential to determine the key parameters which are identifiable from the experimental data, as well as those that cannot be reliably estimated. For a subset of parameters to be practically identifiable, each parameter in the subset should contribute significantly to the system dynamics, and the parameters in the subset should not be correlated with one another.

In this work, we propose a method for detecting collinearity in subsets of parameters for the stochastic discrete model of the Chemical Master Equation, with the goal of finding the parameter sets that exert the greatest influence on the biochemical system state. In addition, we introduce a technique for determining the largest identifiable parameter sets for stochastic biochemical systems, by extending methods from deterministic to stochastic models. Our analysis is based on estimating the local sensitivities of the system state with respect to the model's parameters. This is achieved by utilizing finite-difference approximations of the parametric sensitivities, specifically the Coupled Finite Difference, the Common Reaction Path, and the Common Random Number schemes. Furthermore, we examine the role of the singular value decomposition of the sensitivity matrix in identifying parameters that are not collinear in stochastic models of biochemical systems. On the one hand, we showed that our practical identifiability method is accurate by comparing the results obtained in the deterministic and stochastic scenarios on two biochemical systems of practical importance, for which the deterministic model accurately describes the evolution of the expected value of the stochastic system state. Excellent agreement among the various approaches was obtained for these biochemical networks. On the other hand, we wish to emphasize that, in general, a stochastic strategy for selecting identifiable parameter sets should be preferred, as it relies on more accurate and reliable estimations of the parametric sensitivities for the widely applicable model of the Chemical Master Equation, compared to the deterministic reaction rate equations. The advantages of our approach over the deterministic one were illustrated by the tests performed on a third model, a genetic toggle switch system exhibiting an interesting multistable behavior.
For this model, our stochastic identifiability strategies display excellent performance, while the deterministic techniques show their limitations, by not being able to assess the estimability of the model parameters.

We expect the method to perform best on stochastic biochemical models with a moderate number of reaction rate parameters. Specifying identifiable parameter subsets with the tools provided above may be used to refine models, improve predictions, and study the underlying biological processes under consideration.

**Author Contributions:** Conceptualization, S.G. and S.I.; methodology, S.G. and S.I.; software, S.G.; validation, S.I.; investigation, S.G. and S.I.; writing—original draft preparation, S.G. and S.I.; writing review and editing, S.I.; visualization, S.G. and S.I.; supervision, S.I.; funding acquisition, S.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by a grant from the National Sciences and Engineering Research Council of Canada (NSERC)—Grant No. RGPIN-2020-05469, and Toronto Metropolitan University.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** No new data were created.

**Acknowledgments:** The authors wish to thank the anonymous referees for their suggestions which helped improve the presentation and Monjur Morshed for discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Appendix A**

**Table A1.** Michaelis–Menten model: comparison of *δ msqr* for a 5% perturbation.


**Table A2.** Michaelis-Menten model: collinearity indices for pair subsets. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN and CRP algorithms and a 5% perturbation.


**Table A3.** Michaelis–Menten model: collinearity indices for the triple subset. The CME sensitivities are estimated over 10,000 trajectories with the CFD, CRN and CRP algorithms and a 5% perturbation.


### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
