1. Introduction
The ravages of the COVID-19 pandemic have deepened the need for mathematical and statistical tools to understand the dynamics of epidemics across the world. Simple mathematical models of infectious diseases are useful for providing insight into epidemic trajectories and disease dynamics [
1,
2,
3]. However, applications should target complex but parsimonious models which make realistic assumptions and let the observed data drive estimations.
There are two common approaches to epidemiological modeling: phenomenological models and mechanistic models (e.g., compartmental models). On the one hand, phenomenological models use an empirical approach based on growth curve fitting (e.g., by nonlinear least squares [
4] or by maximum likelihood [
5]) to describe the temporal progression of case counts (e.g., daily confirmed positive cases). In this regard, the logistic bell curve has been widely used for various epidemic data, but it lacks flexibility for epidemics whose data exhibit asymmetry or varying growth patterns [
4,
6,
7]. To allow more flexibility, Tovissodé et al. [
5] considered the generic growth curve of Turner et al. [
8] with application to COVID-19 data. This approach concedes the simple logistic curve when it is supported by the observed data, but offers the possibility to fit various flexible growth models such as the generalized logistic model [
9,
10], the hyperlogistic model [
8,
11], the hyper-Gompertz [
8] and the Gompertz curves [
12,
13]. However, to be realistic, models for epidemic data should be able to account for the potential effect of containment measures when implemented after an epidemic outbreak. In a target population undergoing an epidemic wave, the number of infective individuals may be assumed to follow an exponential growth in the early epidemic phase where no containment measures were implemented or the implemented measures were not yet effective [
14]. In this case, the variation of the number of infective individuals is expected to shift to a sub-exponential growth resulting from negative feedbacks due to a decrease in the probability that an infectious individual meets a susceptible individual [
6] or effects of the containment measures, if any. The major advantage of the phenomenological modeling approach is its simplicity while allowing the estimation of various quantities of interest to understand an epidemic, e.g., the “epidemic latency period” defined as the delay between the appearance of the first infectious case in the population (“patient zero”) and the outbreak [
14], the epidemic peak time and size, and the forecast of future incidence. The main limitation of phenomenological models is the inability to inform on the transmission process (new infections) and the removal processes (recovery and death) of an epidemic. As a result, phenomenological modeling lacks the ability to assess the effects of control interventions.
On the other hand, and contrary to phenomenological models, mechanistic models structure the population under study into different epidemiological states [
4] and allow assessing the effects of control interventions on the population and disease dynamics. For instance, the effect of various control measures (e.g., contact limitation, detection and diagnosis) on COVID-19 transmission has been assessed using the Susceptibles–Exposed–Infectives–Recovered (SEIR) model and its variants [
15,
16,
17]. However, because only a few epidemiological states can be observed, mechanistic models often face an identifiability issue in the estimation of model parameters [
18,
19,
20]. In addition, there are generally no closed-form solutions to the differential equations describing the considered epidemiological states. As a consequence, the estimation of compartmental models often relies on numerical approximations which make fitting procedures (e.g., nonlinear least squares or Bayesian estimation) computationally intensive and may introduce high-order errors in both estimates and forecasts [
21]. Moreover, some quantities of high interest to understand epidemic outbreaks, such as the epidemic latency period, are readily available from a growth model but hard to derive under compartmental models.
This study proposes a hybrid framework to combine the advantages of phenomenological and mechanistic models while circumventing some of the limits of the two approaches. We focus on epidemic waves managed with at least an isolation measure for all identified infectives, as was the case for the COVID-19 pandemic nearly worldwide. The objective of this work is to provide a quantitative framework in which epidemiologists can identify, from a large family of models, the parsimonious model that explains patterns in an observed dataset, and then assess hypotheses on the potential course of related but unobservable processes of interest. Specifically, we first modeled confirmed positive cases using a combination of the exponential growth curve for the initial epidemic phase and the generic growth curve [
8] after this initial phase. This development allows the estimation of the duration of the exponential growth phase and the theoretical time and size of the peak of new positive cases. Secondly, we modeled removal (recovery and death) from identified positive cases as binary processes using two logistic regression models to monitor the evolution and the peak (time and size) of the actives among detected cases. Finally, to provide an overall view for a target epidemic, we integrated the growth curve and the logistic regression removal rates into a mechanistic SIQR model frame [
22] in which the population is structured in Susceptibles, Infectives, Quarantined (identified active cases) and Recovered individuals. The result is a mechanistic model in which the sizes of the different states (compartments) have closed-form expressions. This allows inference on various epidemiological parameters such as the delay between the appearance of the first infectious case in the population (“patient zero”) and the outbreak (“epidemic latency period”), the reproduction number, the unobservable new infections per unit time as well as the proportion of the target population immunized against the pathogen of the target disease.
In addition to the estimates (with quantified uncertainty) for common epidemiological parameters, the proposed hybrid modeling framework extracts, from the observed data and demographic rates, the evolution along the epidemic course of the key parameter summarizing the dynamics of an epidemic: the reproduction number. The changes in this parameter can thus be compared with the control measures promoted or enforced by public health authorities and governments. For illustrative purposes, we used the developed modeling framework: (i) to model COVID-19 case reporting data (daily PCR-confirmed positives, recoveries and deaths) from Western Africa (28 February to 31 August 2020); and (ii) to evaluate the transmission pattern of the disease in the region during the considered period. The results were used to discuss the effectiveness of some containment measures implemented by governments across the region.
2. The Hybrid Modeling Framework
In this section, we describe the three sub-models integrated into the proposed modeling framework, namely, the growth model, the logistic removal rates and the Susceptible–Infective–Quarantined–Recovered (SIQR) mechanistic model.
2.1. Mixture of Growth Models for Detected Cases
We assume that the cumulative number
of reported cases, as a function of time
t, has the form
where
is the duration from outbreak to the end of the exponential growth phase,
is the generic growth model [
8] with
,
is a constant such that the ultimate epidemic size (detected) is
,
is the “intrinsic” growth rate constant for the sub-exponential growth phase,
is a growth acceleration parameter,
(
) is a shape parameter controlling the skewness of the growth curve during the sub-exponential epidemic phase (see
Appendix A.1 for restriction related details) and
is a constant of integration determined by the initial conditions of the epidemic. The generic growth curve
specified for
encompasses many special or limiting cases including the Bertalanffy–Richards (
), hyper-Gompertz (
while
with
constant), Gompertz (
,
while
), hyper-logistic (
) and logistic (
and
) growth models [
8] (see
Appendix A.1 for details). The parameter
in (
1) is the exponential growth rate for the early epidemic phase and
determines the growth rate at
. The constants
and
are set such that the first derivative
and the second derivative
of
with respect to
t are smooth at
(i.e., at the end of the exponential growth phase). Specifically,
where
and
;
and
are, respectively, the first and second derivatives of
(see
Appendix A.1 for details); and (4) follows from setting
. Furthermore, the real constant
in (
1) ensures that
does not jump at
. In other words,
is given by
(with
) which by (4) simplifies to
In (
1), the time (in e.g., days, weeks or months) of the first identified cases corresponds to
. In other words, to match (
1) to the observed data,
is identified with the number of cases reported in the time interval
,
is the number of cases reported in the time interval
, etc. If
and
, the curve
converges to an exponential growth curve with rate
. However, this scenario can be ruled out since the size of any target population is finite and so is
. In practice, the exponential growth is prevented by negative feedbacks which decrease the probability that an infectious individual and a susceptible individual meet and have an adequate contact (i.e., contact sufficient for transmission). For instance, the growth of the infectives is naturally and continuously lowered by the increasing fraction of the population made up of individuals who have recovered and become less susceptible (temporarily or permanently immune) to the infection [
6]. To prevent the exponential growth of the infectives, control measures such as quarantining and lockdown reduce the probability of contact between susceptible and infectious individuals, whereas some other measures such as social distancing and wearing a face mask reduce the likelihood of transmission whenever contacts happen.
The specification of the growth model in (
1) to an epidemic thus implies that the growth rate
, i.e., the number of new cases reported per unit time given by
with
defined in
Appendix A.1, will peak and then fall toward zero cases per unit time. The peak occurs at a time
when the growth acceleration
given by,
with
defined in
Appendix A.1, vanishes. The expressions of the time (
) and the size (
) of the peak are available in
Appendix A.2 for the general situation (
and
), as well as for limiting cases.
The number of detected cases is the basic data reported during an epidemic. Once this has been modeled, various epidemic related quantities can be inferred upon introduction of disease related parameters (e.g., detection of infectives, recoveries and deaths) and demographic parameters (e.g., natural mortality, births and immigration).
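The peak condition described above (the growth acceleration vanishes at the peak time) can be illustrated on the logistic special case only. The sketch below assumes a logistic parameterization C(t) = K / (1 + exp(-omega (t - tau))); the names K, omega and tau and all numerical values are hypothetical and are not the notation or the estimates of this paper. For this assumed curve the number of new cases per unit time peaks at t = tau with size K omega / 4, which the code compares with a simple grid search.

```python
import math

def logistic_incidence(t, K, omega, tau):
    """New cases per unit time: first derivative of an assumed logistic curve
    C(t) = K / (1 + exp(-omega * (t - tau)))."""
    e = math.exp(-omega * (t - tau))
    return K * omega * e / (1.0 + e) ** 2

def grid_peak(rate, t_min, t_max, step):
    """Locate the maximum of `rate` on a regular time grid (crude but transparent)."""
    n_steps = int(round((t_max - t_min) / step))
    times = [t_min + k * step for k in range(n_steps + 1)]
    t_best = max(times, key=rate)
    return t_best, rate(t_best)

# Hypothetical values chosen for illustration only.
K, omega, tau = 100000.0, 0.05, 140.0
t_peak, size_peak = grid_peak(
    lambda t: logistic_incidence(t, K, omega, tau), 0.0, 400.0, 0.5
)
closed_form_size = K * omega / 4.0  # peak size of the logistic incidence
```

For the generic growth curve, the peak time and size follow from the expressions referenced in Appendix A.2 rather than from this simplified closed form.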
2.2. Infectives, Epidemic Latency Period and Active Cases
Since only a fraction of infectives are identified at a time
t, the number
of infective individuals in a target population is obtained using (
6) as
[
5], which reads
where
is the number of infectives at the outbreak (
) and
is the detection rate assumed constant along the epidemic course (after the outbreak). Note that the number of infectives before the outbreak (
) is obtained by back extrapolation as
, i.e., considering an exponential growth before the outbreak [
14].
We refer to the time from the appearance of the first infectious case in the population (“patient zero”) to the outbreak as the “epidemic latency period”. An estimate of the duration
of this period is obtained by setting
[
14]. By (
8), the duration of the epidemic latency period is estimated by
, which on using (4) simplifies to
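As a rough numerical illustration of the back extrapolation, the sketch below assumes that the number of infectives before the outbreak follows i0 * exp(rho * t) for t <= 0 and that the epidemic latency period ends where this curve equals one infective (patient zero). The names i0 and rho and the values used are hypothetical, and the simplified expression of the paper, which also involves the detection rate and relation (4), is not reproduced here.

```python
import math

def infectives_before_outbreak(t, i0, rho):
    """Back-extrapolated infectives for t <= 0 under assumed exponential growth."""
    return i0 * math.exp(rho * t)

def latency_period(i0, rho):
    """Duration tau such that i0 * exp(-rho * tau) = 1, i.e. tau = log(i0) / rho."""
    if i0 < 1.0 or rho <= 0.0:
        raise ValueError("requires i0 >= 1 and rho > 0")
    return math.log(i0) / rho

# Hypothetical values: 50 infectives at the outbreak, growth rate 0.12 per day.
i0, rho = 50.0, 0.12
tau_latency = latency_period(i0, rho)
```

A larger number of infectives at the outbreak or a slower exponential growth both lengthen the estimated latency period under this assumption.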
The number of detected and active cases, i.e., individuals tested positive and in isolation at a hospital or at home at time
t, is denoted
following Hethcote et al. [
22] for “Quarantined” state, although we refer to
as “Actives”. Given the detected cases
in (
1),
satisfies
where
is the recovery rate and
is the death rate (natural and disease-related mortality) of actives. Indeed, following Tovissodé et al. [
5], we allow the removal rates
and
from
to be time varying. This is appropriate when recovery and death data are available in addition to the reported positive cases per unit time. The two rates have here the logistic forms
The number of active cases is then given by (see
Appendix B for details)
where
is available from Equation (
A3) and represents the number of persistent cases from previous epidemic waves (isolated actives) at the outbreak of the target epidemic wave (e.g.,
for a new disease-related epidemic) and
is defined as
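The paper gives a closed-form expression for the active cases (Appendix B). As an independent numerical cross-check, one could integrate the balance "new detected cases minus removals" directly. The sketch below is such a check under stated assumptions: the removal rates are taken as logistic functions of time of the form 1 / (1 + exp(-(a + b t))), the incidence is an assumed logistic one, and all names and values are hypothetical rather than taken from the paper.

```python
import math

def logistic_rate(t, a, b):
    """Assumed logistic form of a time-varying removal rate."""
    return 1.0 / (1.0 + math.exp(-(a + b * t)))

def simulate_actives(new_cases, rec, dth, q0, t_end, dt):
    """Explicit Euler integration of
    dQ/dt = new_cases(t) - (recovery(t) + death(t)) * Q(t).
    `rec` and `dth` are (a, b) pairs for the assumed logistic rates."""
    n_steps = int(round(t_end / dt))
    q = q0
    path = [(0.0, q)]
    for k in range(n_steps):
        t = k * dt
        removal = logistic_rate(t, *rec) + logistic_rate(t, *dth)
        q += dt * (new_cases(t) - removal * q)
        path.append(((k + 1) * dt, q))
    return path

def example_incidence(t):
    """Hypothetical logistic incidence (new detected cases per day)."""
    e = math.exp(-0.05 * (t - 140.0))
    return 100000.0 * 0.05 * e / (1.0 + e) ** 2

path = simulate_actives(example_incidence, rec=(-4.0, 0.01), dth=(-6.0, 0.0),
                        q0=0.0, t_end=300.0, dt=0.1)
t_peak_q, q_peak = max(path, key=lambda point: point[1])  # peak of the actives
```

The Euler scheme is chosen for transparency, not accuracy; the closed-form solution of the paper avoids this discretization error altogether.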
2.3. Overall Epidemic Dynamics
The dynamics of an epidemic, as expressed by the variations of the infectives
, is determined by the combination of the transmission rate (new infections) and the average residence time, i.e., the average duration from infection to isolation, recovery or death. The core parameter to summarize these dynamics is, at moment
t, the reproduction number denoted
, which is indeed crucial for quantifying the intensity of control measures required to control an epidemic [
7].
The reproduction number is defined as the average number of secondary cases generated by a primary case. With a view to derive
under the growth model in (
1), we first consider an overall picture of the target population in order to clarify the sources (transmission and removal) of the variations of
as given in (
8).
2.3.1. The SIQR Model
Following the authors of [
5,
14], we consider the Susceptible–Infectious–Quarantined–Recovered (SIQR) model of Hethcote et al. [
22] to obtain a picture of the different states of individuals in a target population. We use the “quarantine-adjusted incidence” version [
22] of this model since the underlying transmission mechanism explicitly recognizes the isolation of detected cases. In this framework, letting
denote, at time
t, the size of the target population (assumed finite but large),
satisfies
where
is the size of the class of susceptible individuals,
is the class of infectives,
is the size of the class of detected active cases and
is the size of the class of individuals who recovered (both detected and not detected). We assume that the infection has zero latent period (susceptible individuals become infectious as soon as they become infected). The individuals in the classes
R are assumed permanently immune within the period of time considered. It is also assumed that known active cases (in the class
Q) do not mix with other classes and do not infect the susceptibles (i.e., the transmission rate from
Q-class individuals is considered negligible). The corresponding SIQR model is described by the following set of nonlinear differential equations [
22]
where
is the recruitment rate of susceptibles (births and immigration);
is the total number of adequate contacts (i.e., contacts sufficient for transmission) per unit time;
is the per capita natural mortality rate;
and
are the recovery rates from actives
and infectives
respectively;
and
are the death rates (natural and disease-related) for actives
and infectives
respectively; and
is the detection rate which is null (
) for
and equals
for
. Note that (18) is the same as (
10) for
. Unlike in [
22], we allow the transmission rate
to be time varying as a consequence of the form of the number of infectives
already available in (
8). The transfer diagram for this SIQR model is shown in
Figure 1.
The system (
16)–(19) always has the disease-free equilibrium
, i.e., in the absence of the disease, the population size
approaches the carrying capacity
. Further discussion of the equilibria of the system is given in
Appendix C.1. The availability of the number of infectives in Equation (
8) makes it possible to solve the system (
16)–(19). Indeed, from (17), the transmission rate, i.e., the number of adequate contacts per unit time (for
) is given by
From (
20), and using the same approach considered to find the number
of active cases in Equation (
13) from the number
of infectives in Equation (
8), the expressions of the number
of susceptibles, the number
of recovered individuals and the total number of persons infected during an epidemic wave can be obtained (see
Appendix C.2 for details).
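To visualize the compartment flows, the sketch below simulates an SIQR system with quarantine-adjusted incidence, where new infections per unit time are beta * S * I / (N - Q). It is an assumption-laden illustration: the equations are written in the standard form attributed to Hethcote et al. as we understand it, the transmission rate is held constant (whereas the framework above lets it vary in time), the detection rate is constant from time zero, and every parameter name and value is hypothetical.

```python
def siqr_step(state, p, dt):
    """One explicit Euler step of an assumed SIQR system.
    Death rates of infectives and actives include natural mortality."""
    s, i, q, r = state
    n = s + i + q + r
    new_inf = p["beta"] * s * i / (n - q)  # quarantine-adjusted incidence
    ds = p["recruit"] - new_inf - p["mu"] * s
    di = new_inf - (p["detect"] + p["rec_i"] + p["death_i"]) * i
    dq = p["detect"] * i - (p["rec_q"] + p["death_q"]) * q
    dr = p["rec_i"] * i + p["rec_q"] * q - p["mu"] * r
    return (s + dt * ds, i + dt * di, q + dt * dq, r + dt * dr)

def simulate_siqr(state, p, t_end, dt):
    """Return the state (S, I, Q, R) reached at time t_end."""
    for _ in range(int(round(t_end / dt))):
        state = siqr_step(state, p, dt)
    return state

# Hypothetical parameter values (per day), for illustration only.
params = {"beta": 0.3, "recruit": 30.0, "mu": 0.00003, "detect": 0.05,
          "rec_i": 0.1, "death_i": 0.001, "rec_q": 0.1, "death_q": 0.002}
final_state = simulate_siqr((1.0e6 - 10.0, 10.0, 0.0, 0.0), params, 100.0, 0.1)
```

With no infectives, the susceptible class in this sketch relaxes to recruit / mu, which matches the disease-free equilibrium described above.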
2.3.2. The Effective Reproduction Number
From the definition of the effective reproduction number as the average number of secondary cases generated by a primary case, the threshold
corresponds to the product of the transmission rate
and the average residence time
in the class of infectives, i.e.,
This effective reproduction number is sometimes referred to as a “quarantine” reproduction number [
22] or simply a “control” reproduction number to acknowledge the influence of isolation of identified infectives, and other control measures, if any [
15]. The basic reproduction number defined as the average number of secondary infections produced when one primary infectious individual enters a completely susceptible population (
,
,
,
), is here given by
. This expression is simplified, assuming
for the sake of beauty [
23] and mostly because
is large (recall this is a model assumption), as
During the epidemic latency period (
) where the growth is exponential (
) and the detection rate is
, the time-varying reproduction number is given by
From the outbreak, the time-varying effective reproduction number during the remainder of the exponential phase has the same form
It appears from (
22) and (
23) that
during the whole exponential growth phase as expected. During the sub-exponential growth phase, the time-varying effective reproduction number is given by
where
(see
Appendix A.1).
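The definition used in this subsection (transmission rate multiplied by the average residence time among infectives) can be sketched numerically. The code below assumes that infectives leave their class by detection, recovery or death at constant rates, so that the residence time is the inverse of the total exit rate, and that nearly everyone outside quarantine is susceptible during the exponential phase. These simplifications, and all names and values, are assumptions for illustration and may differ from the expressions in (22)–(24).

```python
def residence_time(detect, recovery, death):
    """Average time spent in the infective class under constant exit rates."""
    return 1.0 / (detect + recovery + death)

def reproduction_number(beta, detect, recovery, death):
    """Transmission rate multiplied by the average residence time."""
    return beta * residence_time(detect, recovery, death)

def exponential_phase_reproduction(rho, detect, recovery, death):
    """If I grows as exp(rho * t) and S / (N - Q) is close to 1, then
    dI/dt = (beta - exit) * I = rho * I, so beta = rho + exit."""
    exit_rate = detect + recovery + death
    return (rho + exit_rate) / exit_rate

# Hypothetical rates per day.
r_without_isolation = reproduction_number(0.3, 0.0, 0.1, 0.01)
r_with_isolation = reproduction_number(0.3, 0.05, 0.1, 0.01)
r_exponential = exponential_phase_reproduction(0.1, 0.05, 0.1, 0.01)
```

Under these assumptions, any positive exponential growth rate gives a reproduction number above one, and adding detection with isolation lowers the reproduction number for a fixed transmission rate.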
2.3.3. Epidemic Peak
The peak of new infections occurs when the second derivative of the total number of infected persons (since the beginning of the epidemic) vanishes. This peak time denoted
satisfies
and is the solution of (see details in
Appendix C.3)
which can be solved for
t using a numerical root finding routine such as the R [
24] function
uniroot or the Matlab [
25] function
fzero. Afterwards, the peak size
(the maximum number of new infections per unit time) is obtained by inserting
in (
A14).
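The root-finding step can be sketched in Python with a hand-written bisection standing in for routines such as uniroot or fzero. Because the peak equation itself is given in Appendix C.3, the function solved below is only a stand-in: the second derivative of an assumed Gompertz curve C(t) = K exp(-b exp(-c t)), whose root ln(b) / c is known in closed form and serves as a check. The names K, b, c and their values are hypothetical.

```python
import math

def bisect_root(f, lo, hi, tol=1e-10, max_iter=200):
    """Bisection on [lo, hi]; requires f(lo) and f(hi) to have opposite signs."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid                 # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid    # root lies in [mid, hi]
    return 0.5 * (lo + hi)

def gompertz_acceleration(t, K, b, c):
    """Second derivative of the assumed curve K * exp(-b * exp(-c * t))."""
    rate = K * b * c * math.exp(-c * t) * math.exp(-b * math.exp(-c * t))
    return rate * c * (b * math.exp(-c * t) - 1.0)

# Hypothetical values for illustration only.
K, b, c = 50000.0, 20.0, 0.04
t_peak = bisect_root(lambda t: gompertz_acceleration(t, K, b, c), 0.0, 300.0)
```

Bisection is slower than the methods behind uniroot or fzero, but it needs only a sign change over the search interval, which is easy to verify by plotting the function first.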
2.4. Long-Term Epidemic Dynamics
The specification of the growth model in (
1) to an epidemic implicitly assumes that the number of infectives in (
8) peaks at time
and then approaches zero. The decay of the infectives after the peak can happen at various rates, depending on the growth pattern (determined by contacts between the infectives and the susceptibles or intermediate hosts), the response of the infected individual’s organism (natural or induced with medicine or a vaccine) to the disease (recovery and death process) and the testing efforts (detection followed by isolation). There are actually two alternative paths from a disease-related state (i.e.,
) toward the unique (disease-free) equilibrium
: transmissions either stop (
reaches zero) or continue for a long time at a rate which cannot sustain an epidemic (
). These two scenarios are discussed further in
Appendix C.4.
2.5. Statistical Model and Inference
To allow likelihood inference in the growth models in (
1) using observed epidemiological data, we follow Tovissodé et al. [
5] and assign to new reported cases
(
) a log-normal distribution with probability density function (pdf)
where
is a dispersion parameter (standard deviation at logarithmic scale). This specification yields the mean
and the variance
while allowing null values of
. In addition, the numbers of new recoveries
and new deaths
from known active cases
(
) are modeled using logistic regression models with probability mass functions (pmf)
where
and
. The parameter vector indexing the pdf in (
26) and the conditional pmf in (
27) and (28) is
when the generic growth curve is considered for the sub-exponential growth phase. If a special case of the generic growth curve is desired, the corresponding restricted parameters must be withdrawn from
. For instance, the use of a hyper-logistic growth curve (
) implies
. Given
, the conditional log-likelihood of an observed series
with
, as a function of the parameter
is
The conditional maximum likelihood estimate of the parameter vector can be obtained using an optimization algorithm to maximize the log-likelihood function ℓ. Note that ℓ is the sum of three separable components (one for the new reported cases, one for the recoveries and one for the deaths), each depending on its own sub-vector of the parameter vector. The maximum likelihood estimates of these three sub-vectors can therefore be obtained by maximizing the corresponding components of ℓ independently.
Since both the binomial and the log-normal distributions belong to the exponential family, we consider the common deviance statistic used in Generalized Linear Models [
26] for checking the goodness-of-fit of the log-normal model associated to
and the binomial models associated to
and
. For the selection of the parsimonious model agreeing with the observed data, we consider the likelihood ratio statistic [
27]. Further details on the deviance statistic and the likelihood ratio test are given in
Appendix D.
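As an illustration of model selection by the likelihood ratio statistic, the sketch below compares a full model with a restricted special case, using twice the difference of the maximized log-likelihoods and an asymptotic chi-squared reference distribution. Only one or two restricted parameters are handled, because those tail probabilities have simple closed forms in the Python standard library. The log-likelihood values are hypothetical, and the exact procedure of Appendix D may differ.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable, for df = 1 or 2 only."""
    if x < 0.0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise NotImplementedError("only df = 1 or 2 are covered in this sketch")

def likelihood_ratio_test(loglik_full, loglik_restricted, df):
    """Return the statistic 2 * (l_full - l_restricted) and its asymptotic p-value."""
    stat = max(2.0 * (loglik_full - loglik_restricted), 0.0)  # guard against rounding
    return stat, chi2_sf(stat, df)

# Hypothetical maximized log-likelihoods: generic curve versus a special case
# obtained by fixing one parameter.
stat, p_value = likelihood_ratio_test(-1520.3, -1523.9, df=1)
```

A small p-value suggests that the restriction is not supported by the data, so the more general curve would be retained; otherwise the simpler special case is the parsimonious choice.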
4. Discussion
The importance of mathematical models in understanding and predicting the course of an epidemic outbreak and in assessing the impacts of public health control measures has been well documented in the current context of the COVID-19 pandemic [
15,
35,
36,
37]. Whereas phenomenological modeling is limited in the scope of inference, compartmental modeling faces identifiability issues and is usually computationally intensive [
38]. This study proposes a hybrid modeling framework which combines phenomenological and mechanistic modeling approaches to assess the dynamics of epidemic outbreaks while circumventing some of the limitations of each approach. We illustrate our description of the different epidemiological aspects that the hybrid modeling framework deals with using COVID-19 data from West Africa (28 February to 31 August 2020). It is worth noting that the heterogeneity of the West African region in terms of testing and reporting policies, especially for the first epidemic wave, is an important limitation for this application. This is systematically true for any regional assessment of the pandemic [
15]. Our analysis aims to provide an overall view of the dynamics of the pandemic in West Africa. However, the data from each country may be analyzed separately to obtain finer country-specific results (for some countries, these may significantly deviate from the overall trend).
The proposed modeling framework uses a combination of the exponential growth model for the initial dynamics of the epidemic and a generic growth curve [
8] to capture the observed patterns in the number of detected positive individuals. This phenomenological model is flexible, includes many special cases and thus allows selecting the effective parsimonious model fitting the observed data based on likelihood ratio tests [
27,
39] or information criteria such as the Akaike’s Information Criterion [
40]. The effectiveness of this approach to phenomenological modeling has been demonstrated on COVID-19 data [
5]. Our application to COVID-19 data from West Africa showed, however, that the logistic regression of recoveries and deaths in the identified positive individuals against time can lack fit, as measured by an asymptotic
χ² test on the residual deviance statistic. Nevertheless, these fits can be improved by adding explanatory variables (different from time, but related to available health facilities) in the logistic regression models. The deterministic SIQR model [
22] considered for mechanistic modeling explicitly acknowledges the isolation of the detected positive individuals. It does not, however, include an exposed (E) state as in the SEIQR model [
41]. The use of the SEIQR model may provide better insights on the effectiveness of control measures since most of the measures first impact the exposure of susceptible individuals. In general, the proposed modeling approach can be extended by considering more complex models such as the SEIQR and the SIDARTHE model [
42] instead of the SIQR model considered herein.
Among the quantities of interest provided by the hybrid modeling framework is the epidemic latency period
(the time from the appearance of the first infectious case in the population to the outbreak). For the West African region, the results indicate that the first imported COVID-19 case(s) likely entered the region around 28 January–7 February 2020. To the best of our knowledge, this is the first estimate of this duration in the region. This epidemic latency period is much shorter than the 40 days estimated for Italy [
14]. This is in line with the relatively late arrival of the virus in the region, compared to the Asian and European continents, and the prevention and detection measures anticipated by many West African governments [
31]. We obtained a basic reproduction number (
) higher than the estimate (
) obtained by [
15]. Our estimate is, however, closer to country-specific estimates obtained for Nigeria (
) [
43] and Ghana (
) [
44].
During the early phase of the epidemic after the outbreak in West Africa, the detection and isolation of a fraction of infected individuals reduced the reproduction number from
to a control reproduction number of
, i.e., about 5.26% decrease. We estimated the duration of this phase characterized by an exponential growth to be about one month after the outbreak. This implies that the control measures implemented by West African governments to limit the transmission of the disease were not effective on average before April 2020. Indeed, apart from measures taken to limit the importation of new positive individuals (travel bans), many actions to limit the local propagation of the disease were first implemented in late March 2020 [
31] (e.g., curfew set up on 21 March in Burkina-Faso, on 23 March in Côte d’Ivoire, Mauritius and Senegal and on 26 March in Mali; city lockdown on 22 March in Ghana and on 29 March in Nigeria; isolation of the capital from the rest of the country in Côte d’Ivoire on 25 March 2020; and
cordon sanitaire set up to isolate the south from the rest of the country on 30 March 2020 in Benin). Our results indicate that these measures started to impact the dynamics of the epidemic from early April 2020. However, the measures may have affected the transmission dynamics earlier, since they mainly limited the exposure of susceptible individuals to the disease.
After the exponential growth phase, the sub-exponential growth pattern allowed the epidemic to peak. The estimated peak time for the detected positive cases was around 15 July 2020, and close to the observed peak time (24 July 2020). This estimated date has a delay of about eight days with respect to the estimated peak time of new infections (
days). This estimate is higher than the estimate (
days) obtained by [
30]. These contrasting results may be related to the more realistic SIQR model considered in this work as compared to the simpler SIR model used by Honfo et al. [
30] who ignored the quarantine-adjustment of the disease incidence [
22]. On the contrary, the estimated maximum number of new infections (
) agrees with the estimate (
new infections) obtained by Honfo et al. [
30].
Our results show that the time-varying effective reproduction number has decayed over April–August 2020, reaching 1 on about 15 July 2020 and 0.66 at the end of the considered period (31 August 2020). Based on the modeled dynamics, the effective reproduction number likely reached its minimum value 0.61 around 29 September 2020. However, the reproduction number likely increased again to approach in the long run. Overall, the various measures decided and enforced by different West African governments, against the first COVID-19 epidemic wave in the region, were able to contain the propagation of the disease (importation of new cases and local transmission) in five months.
However, the COVID-19 pandemic will remain an important issue for a long time, and local regions endemic to the pathogen will likely appear in the long run. This is so because of the following factors: the re-opening of borders and airports in the region to limit the related economic feedback [
45,
46]; the relaxation of measures such as the ban of sport, political, cultural and religious gatherings [
31,
47]; and the natural evolution of the SARS-CoV-2 virus [
48,
49,
50,
51]. The limited resources and capacity of Sub-Saharan African countries in general [
52,
53,
54] to immunize their population through vaccination will compound this threat in the region.