*Article* **Analysis of HIV/AIDS Epidemic and Socioeconomic Factors in Sub-Saharan Africa**

## **Shuman Sun, Zhiming Li, Huiguo Zhang , Haijun Jiang and Xijian Hu \***

College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China; Sunshuman2222@163.com (S.S.); zmli@xju.edu.cn (Z.L.); zhang\_huiguo@163.com (H.Z.); jianghaixju@163.com (H.J.)

**\*** Correspondence: xijianhu@xju.edu.cn

Received: 21 September 2020; Accepted: 26 October 2020; Published: 29 October 2020

**Abstract:** Sub-Saharan Africa has been the epicenter of the outbreak since the spread of acquired immunodeficiency syndrome (AIDS) began to be prevalent. This article proposes several regression models to investigate the relationships between the HIV/AIDS epidemic and socioeconomic factors (the gross domestic product per capita, and population density) in ten countries of Sub-Saharan Africa, for 2011–2016. The maximum likelihood method was used to estimate the unknown parameters of these models along with the Newton–Raphson procedure and Fisher scoring algorithm. Comparing these regression models, there exist significant spatiotemporal non-stationarity and auto-correlations between the HIV/AIDS epidemic and two socioeconomic factors. Based on the empirical results, we suggest that the geographically and temporally weighted Poisson autoregressive (GTWPAR) model is more suitable than other models, and has the better fitting results.

**Keywords:** HIV/AIDS epidemic; regression model; Newton–Raphson procedure; Fisher scoring algorithm; time series

#### **1. Introduction**

Acquired immunodeficiency syndrome (AIDS) is a malignant infectious disease with a high fatality rate caused by human immunodeficiency virus (HIV). The HIV/AIDS epidemic has been one of the greatest global public health and social development problems since 1981, particularly in Sub-Saharan Africa. As of 31 December 2016, over 30 million people had died from the disease [1]. More than 70% of the 35 million people are infected with the HIV/AIDS disease in Sub-Saharan Africa. Thus, the HIV/AIDS epidemic of Sub-Saharan Africa has attracted extensive attention from researchers around the world [2–4].

In earlier studies, Janet et al. [5] and Hallman et al. [6] demonstrated the relationship between the disease and socioeconomic status. Chris et al. [7] indicated socioeconomic factors to explain this disease outperformed cultural ones in South Africa. Mathematical models always play an important role in evaluating the trends of the HIV/AIDS epidemic [8]. For example, regression models have been widely used in the study of the relationship between this disease and influencing factors. Shiboski et al. [9] considered a generalized linear model to obtain the statistical analysis of the HIV/AIDS disease. A mixed-effects linear regression model was used to analyze the correlation between national population and antenatal care [10]. Laurence et al. [11] applied a spatial regression model to show that the epidemic had substantial geographic variance across Sub-Saharan Africa.

This paper proposes several regressive models to investigate the relationships between the HIV/AIDS epidemic, the gross domestic product (GDP) per capita and the population density in ten countries of Sub-Saharan Africa. The Poisson regression model is introduced in Section 2.1. Sections 2.2 and 2.3

describe two spatial models, respectively. A spatiotemporal autoregressive model is proposed in Section 2.4. The maximum likelihood method is used to obtain the iterative formulas of coefficient estimations in Section 3. The main results are shown in Section 4, followed by discussion in Section 5.

#### **2. Methodologies**

#### *2.1. Poisson Regression Model*

Regression models are a set of statistical processes for estimating the relationships between response and explanatory variables. The classical model is a linear regression. Nelder and Wedderburn [12] extended the linear model to a generalized linear regression for solving the discrete data problem. This kind of models are very important in ecology, medicine and economics [13–15]. Suppose that *Y* = (*Y*1,*Y*2, ... ,*Yn*) is the response variable, where *Yi*(*i* = 1, . . . , *n*) are independent. The density function is

$$f(y\_i; \theta\_i, \phi\_i) = \exp\left(\frac{y\_i \theta\_i - b(\theta\_i)}{a(\phi\_i)} + c(y\_i, \phi\_i)\right),$$

where *a*(·), *b*(·), *c*(·, ·) are known functions, and *θi*, *φ<sup>i</sup>* are unknown parameters for *i* = 1, 2, ... , *n*. Denote *μ<sup>i</sup>* = *E*(*Yi*), and *g*(*μi*) = ln(*μi*) is a link function. Let *Xij* be explanatory variables for the *i*th observation in the *j*th variable. Then, the Poisson regression (PR) model is given by

$$\log(\mu\_i) \triangleq \eta\_i = \sum\_{j=1}^p \pounds\_j X\_{ij\prime} \tag{1}$$

where *i* = 1, 2, . . . , *n*, and *βj*(*j* = 1, 2, . . . , *p*) are unknown parameters.

#### *2.2. Geographically Weighted Poisson Regression Model*

With in-depth study, regression models have been frequently applied in epidemiology and health geography for trying to investigate the persistent geographical variations in disease [16]. Based on the generalized linear regression, Brunsdon et al. [17] proposed the geographically weighted regression model to analyze the spatial non-stationary processes of discrete data. The disease maps arising from this process are considered through the establishment of the geographically weighted Poisson regression (GWPR) model [18–20] below

$$\log(\mu\_i) \triangleq \eta\_i = \sum\_{j=1}^p \beta\_j (u\_{i\prime} v\_{i\prime}) X\_{ij\prime} \tag{2}$$

where (*ui*, *vi*)(*i* = 1, 2, ... , *n*) are the geographical locations, and *βj*(*ui*, *vi*)(*j* = 1, 2, ... , *p*) are unknown parameters at the position (*ui*, *vi*).

#### *2.3. Geographically Weighted Poisson Autoregressive Model*

Another issue deserving of special attention is whether there exists an interaction between different regions in terms of spatial data. Previous studies [21–24] showed that spatial data has not only spatial non-stationarity but also correlation. Zhang [25] proposed the geographically weighted Poisson autoregressive (GWPAR) model as follows:

$$\log(\mu\_i) \triangleq \eta\_i = \rho \sum\_{k=1}^n c\_{ik} \eta\_k + \sum\_{j=1}^p \beta\_j (\mu\_{i\prime} \upsilon\_i) X\_{ij\prime} \tag{3}$$

where *ρ* is a scalar autoregressive parameter, and *cik*(*i*, *k* = 1, 2, ... , *n*) is the adjacency relation between the *i*th and *k*th locations. Let *c<sup>i</sup>* be the number of regions adjacent to the *i*th position. If the *k*th position is next to the *i*th's, then *cik* = 1/*c<sup>i</sup>* . Otherwise, *cik* = 0.

#### *2.4. Geographically and Temporally Weighted Poisson Autoregressive Model*

Recently, many spatiotemporal models have been proposed to describe the spatiotemporal variations in the relationships of response and explanatory variables [26,27]. Concerning the modeling of spatiotemporal data, there are two important properties: non-stationarity and auto-correlation. The non-stationarity indicates that there exists more than one linear relation between response and explanatory variables. It can be used to identify where interesting relationships are likely to occur or where detailed investigation is necessary in the study areas [28]. Spatiotemporal auto-correlation is an important factor to determine the temporal correlations of observations [29]. These two problems always appeared together [30]. A geographically and temporally weighted autoregressive (GTWPAR) model can be applied to account for non-stationary and auto-correlated effects simultaneously.

Let *Y* be the response variable, and *Yik*(*i* = 1, 2, ... , *nk*, *k* = 1, 2, ... , *T*) be the independent variables of *Y* in the *i*th position and the *k*th time. The density function can be defined as follows:

$$f(y\_{ik}; \theta\_{ik}, \phi\_{ik}) = \exp\left(\frac{y\_{ik}\theta\_{ik} - b(\theta\_{ik})}{a(\phi\_{ik})} + \mathfrak{c}(y\_{ik}, \phi\_{ik})\right),$$

where the parameters are similar to Section 2.1. Denote *μik* = *E*(*Yik*), and *g*(*μik*) = ln(*μik*). Let *Xijk*(*j* = 1, 2, . . . , *p*) be the *j*th explanatory variable. The GTWPAR model is expressed by

$$\log(\mu\_{ik}) \triangleq \eta\_{ik} = \rho \sum\_{m=1}^{T} \sum\_{l=1}^{n\_k} c\_{lm}^{(\bar{k})} \eta\_{lm} + \sum\_{j=1}^{p} \sum\_{k=1}^{T} \beta\_{jk} (\mu\_{i\bar{k}\lambda}, \upsilon\_{i\bar{k}\lambda}, t\_k) \chi\_{i\bar{j}\lambda}, \tag{4}$$

where {*βjk*(*uik*, *vik*, *tk*)} is a set of unknown parameters at the *i*th position in the *k*th time, and *c* (*ik*) *lm* is the adjacent relation between the location (*uik*, *vik*, *tk*) and (*ulm*, *vlm*, *tm*). Following the work of [31], the spatiotemporal distance between the locations (*uik*, *vik*, *tk*) and (*ulm*, *vlm*, *tm*) can be defined as

$$d\_{lm}^{(ik)} = \sqrt{\lambda[(\mu\_{ik} - \mu\_{lm})^2 + (\upsilon\_{ik} - \upsilon\_{lm})^2] + \mu(t\_k - t\_m)^2},$$

where *μ* and *λ* are used to balance spatiotemporal distances. Suppose that

$$
\mathcal{c}\_{lm}^{(ik)} = \begin{cases}
1/c^{ik}, & 0 < d\_{lm}^{(ik)} < d\_{\prime i} \\
0, & \text{otherwise},
\end{cases}
$$

where *d* is a constant and satisfies min{*d* (*ik*) *lm* } < *d* < max{*d* (*ik*) *lm* }.

Next, we rewrite the model (4) in a matrix form

$$
\eta = \rho \mathbf{C} \eta + \mathbf{B}' \mathbf{X}',
$$

where *η* = (*η*11, ··· , *ηn*11, *η*12, ··· , *ηn*22, ··· , *η*1*T*, ··· , *ηnTT*) , *C* = (*c* (*ik*) *lm* ), *X* = (*Xijk*) and **B** = (*βjk*(*uik*, *vik*, *tk*)). For convenience, define *η<sup>K</sup>* as the *K*th element of *η*; *CIK* and *XIK* are the *I*th row and the *K*th column of the matrices *C* and *X*, respectively. The detailed expressions of *C*, *X* and **B** are given in Appendix A.1.

**Remark 1.** *For the GTWPAR model (4), if ρ* = 0 *and βjk*(*uik*, *vik*, *tk*) *is independent of the spatiotemporal effect, the model is a PR model. If ρ* = 0 *and βjk*(*uik*, *vik*, *tk*) *is dependent on spatial effect but independent of temporal effect, the model becomes GWPR model. If ρ* = 0 *and βjk*(*uik*, *vik*, *tk*) *is independent of temporal effect, it is the GWPAR model. Thus, PR, GWPR and GWPAR models are the special cases of the GTWPAR model.*

#### **3. Coefficient Estimation**

In this section, we only provide the estimation method of the GTWPAR model since the PR, GWPR and GWPAR models are its special cases (Remark 1). Let (*uik*, *vik*, *tk*)(*i* = 1, 2, ... , *nk*, *k* = 1, 2, ... , *T*) be any point in the studied spatiotemporal region. We fix a point (*u*00, *v*00, *t*0) and assume that *βjk*(*uik*, *vik*, *tk*) ≈ *βj*0(*u*00, *v*00, *t*0)(*j* = 1, 2, . . . , *p*). Then, the model (4) can be rewritten by

$$\eta\_{ik} = \mathcal{g}(\mu\_{ik}) = \rho \sum\_{m=1}^{T} \sum\_{l=1}^{n\_k} c\_{lm}^{(ik)} \eta\_{lm} + \sum\_{j=1}^{p} \sum\_{k=1}^{T} \mathcal{\beta}\_{j0}(\mu\_{00}, v\_{00}, t\_0) X\_{ijk} \tag{5}$$

Denote *β*(*u*00, *v*00, *t*0)=(*β*10, ... , *βp*0) , **<sup>X</sup>** = diag(*Xi*.) and *Xi*· = (*Xi*1, ... , *Xip*). The corresponding matrix form can be represented as *η* = *ρCη* + *β* (*u*00, *v*00, *t*0)**X** .

#### *3.1. Estimation of Parameter Vector β*

For the fixed point (*u*00, *v*00, *t*0), we define a spatiotemporal distance *d* (0) *ik* from this point to (*uik*, *vik*, *tk*) as *d* (0) *ik* <sup>=</sup> *<sup>λ</sup>*[(*u*<sup>00</sup> <sup>−</sup> *uik*)<sup>2</sup> + (*v*<sup>00</sup> <sup>−</sup> *vik*)2] + *<sup>μ</sup>*(*t*<sup>0</sup> <sup>−</sup> *tk*)2. The Gauss kernel function of these two points can be written by

$$\begin{split} w\_{ik}(\boldsymbol{\mu\_{00}}, \boldsymbol{\upsilon\_{00}}, t\_{0}) &= \quad \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2} \left(\frac{d\_{ik}^{(0)}}{h\_{ST}}\right)^{2}\right\} \\ &= \quad \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2} \frac{\lambda\left[(\boldsymbol{\mu\_{00}} - \boldsymbol{\mu\_{ik}})^{2} + (\boldsymbol{\upsilon\_{00}} - \boldsymbol{\upsilon\_{ik}})^{2}\right] + \mu(t\_{0} - t\_{k})^{2}}{h\_{ST}^{2}}\right\} \\ &= \quad \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2} \left(\frac{(\boldsymbol{\mu\_{00}} - \boldsymbol{\mu\_{ik}})^{2} + (\boldsymbol{\upsilon\_{00}} - \boldsymbol{\upsilon\_{ik}})^{2}}{h\_{S}^{2}} + \frac{(t\_{0} - t\_{k})^{2}}{\tau h\_{S}^{2}}\right)\right\}, \end{split}$$

where *hST* and *hS* are the space-time bandwidth and space bandwidth, respectively. Meanwhile, we have *h*2 *ST* = *<sup>λ</sup>h*<sup>2</sup> *<sup>S</sup>*, and *τ* = *λ*/*μ* is a spatiotemporal factor. Without loss of generality, let *λ* = 1. Then, the weighted maximum likelihood of *Yik*(*i* = 1, 2, . . . , *nk*, *k* = 1, 2, . . . , *T*) at the point (*u*00, *v*00, *t*0) is

$$L(\boldsymbol{\beta}\_{10\prime}, \boldsymbol{\beta}\_{20\prime}, \dots, \boldsymbol{\beta}\_{p0}) = \prod\_{k=1}^{T} \prod\_{i=1}^{n\_k} f(\boldsymbol{y}\_{ik\prime} \boldsymbol{\theta}\_{ik\prime} \boldsymbol{\phi}\_{ik}) w\_{ik}(\boldsymbol{u}\_{00\prime}, \boldsymbol{v}\_{00\prime}, t\_0),$$

where *f*(*yik*; *θik*, *φik*) is the density function. The log-likelihood can be obtained as follows:

$$\mathbf{L}\_1(\boldsymbol{\beta}(\boldsymbol{\mu\_{00}}, \boldsymbol{\nu\_{00}}, t\_0)) = \sum\_{k=1}^{T} \sum\_{l=1}^{n\_k} \left( \frac{y\_{lk}\theta\_{lk} - b(\theta\_{lk})}{a(\phi\_{lk})} + c(y\_{lk}, \phi\_{lk}) \right) w\_{lk}(\boldsymbol{\mu\_{00}}, \boldsymbol{\nu\_{00}}, t\_0) \dots$$

Note that *c*(*yik*, *φik*) = − ln(*yik*!), *b*(*θik*) = *μik* = exp(*θik*), and *a*(*φik*) = *φik* = 1. Thus, *E*(*Yik*) = *b* (*θik*) = exp(*θik*) = *μik*, *Var*(*Yik*) = *b*(*θik*)*a*(*φik*) = exp(*θik*) = *μik*. Differentiating **L**<sup>1</sup> with respect to *β*(*u*00, *v*00, *t*0) yields

$$\frac{\partial \mathbf{L}\_1}{\partial \beta\_{r0}} = \sum\_{k=1}^{T} \sum\_{i=1}^{n\_k} \left( \frac{y\_{ik} - \mu\_{ik}}{a\_{ik}\phi} \frac{\partial \theta\_{lk}}{\partial \beta\_{r0}} \right) w\_{ik} (u\_{00\prime} v\_{00\prime} t\_0) = 0,\tag{6}$$

where *βr*<sup>0</sup> = *βr*(*u*00, *v*00, *t*0)(*r* = 1, 2, ··· , *p*), and

$$\frac{\partial \theta\_{ik}}{\partial \beta\_{r0}} = \left(\frac{\partial \mu\_{ik}}{\partial \theta\_{ik}}\right)^{-1} \frac{\partial \mu\_{ik}}{\partial \mathcal{S}(\mu\_{ik})} \frac{\partial \mathcal{g}(\mu\_{ik})}{\partial \beta\_{r0}} = \frac{1}{b''(\theta\_{ik})} \frac{1}{\mathcal{g}'(\mu\_{ik})} \frac{\partial \eta\_{ik}}{\partial \beta\_{r0}}.$$

For convenience, let *N* = ∑*<sup>N</sup> <sup>k</sup>*=<sup>1</sup> *nk* and *W* = (*wik*(*u*00, *v*00, *t*0))*N*×*N*. Denote *A* = (*IN* − *<sup>ρ</sup>C*)−1, *<sup>Y</sup>* = (*Y*11, ··· ,*Yn*11, ··· ,*Y*1*T*, ··· ,*YnTT*) , *μ* = (*μ*11, ··· , *μn*11, ··· , *μ*1*T*, ··· , *μnTT*) , *θ* = (*θ*11, ··· , *θn*11, ··· , *θ*1*T*, ··· , *θnTT*) , *φ* = (*φ*11, ··· , *φn*11, ··· , *φ*1*T*, ··· , *φnTT*) . Suppose that *YK*, *μK*, *θ<sup>K</sup>* and *φ<sup>K</sup>* are the *K*th elements of *Y*, *μ*, *θ* and *φ*, respectively. Then, we take the derivative of the model (5) with respect to *βr*0, and obtain

$$\frac{\partial \eta\_l}{\partial \beta\_{r0}} = \sum\_{h=1}^{N} A\_{lh} X\_{hr} = A\_l X\_{r\prime} \quad l = 1, 2, \dots, N.$$

The calculation process is given in Appendix A.2. Thus, the Equation (6) can be rewritten as

$$\frac{\partial \mathbf{L}\_1}{\partial \beta\_{r0}} = \frac{1}{\Phi} \sum\_{l=1}^{N} T\_l A\_{l\cdot\cdot} X\_{\cdot r} (\mathbf{Y}\_l - \mu\_l) \mathbf{g}^{\prime}(\mu\_l) \mathcal{W}\_l(\mu\_{00\prime}, \upsilon\_{00\prime}, t\_0) = 0.$$

However, there is not a close-form solution for *β*(*u*00, *v*00, *t*0). The Newton–Raphson procedure and Fisher scoring algorithm are used to get the estimation of *β*. The iterative formula is expressed as

$$\begin{array}{rcl}\beta^{(m+1)}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0})&=&\beta^{(m)}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0})+\mathbf{I}^{-1}(\beta^{(m)}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0}))S(\beta^{(m)}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0})) \\&=&((\boldsymbol{A}^{(m)}\boldsymbol{X})^{\prime}\boldsymbol{T}^{(m)}\boldsymbol{W}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0})(\boldsymbol{A}^{(m)}\boldsymbol{X}))^{-1} \\&\times(\boldsymbol{A}^{(m)}\boldsymbol{X})^{\prime}\boldsymbol{T}^{(m)}\boldsymbol{W}(\boldsymbol{u}\_{00},\boldsymbol{v}\_{00},t\_{0})\boldsymbol{Z}^{(m)}),\end{array}\tag{7}$$

where the Fisher information matrix **I**(*β*) = *E*(*I*(*β*)), and

$$S(\beta^{(m)}(u\_{00}, v\_{00}, t\_0)) = \left(\frac{\partial \mathbf{L}\_1}{\partial \beta\_{10}}, \frac{\partial \mathbf{L}\_1}{\partial \beta\_{20}}, \dots, \frac{\partial \mathbf{L}\_1}{\partial \beta\_{p0}}\right)'$$

is the scalar vector. The detail process is provided in Appendix A.2. For the fixed point (*uik*, *vik*, *tk*)(*i* = 1, 2, . . . , *nk*; *k* = 1, 2, . . . , *T*), *β*ˆ *jk*(*uik*, *vik*, *tk*) can be obtained by (7).

**Remark 2.** *The estimations β*ˆ(*uik*, *vik*, *tk*)(*i*, *l* = 1, 2, ... , *nk*, *k*, *m* = 1, 2, ... , *T*) *are related to the temporal and spatial effects in the GTWPAR model. If m* = *k, wik*(*ulm*, *vlm*, *tm*) = 0 *and c* (*ik*) *lm* <sup>=</sup> <sup>0</sup>*, then <sup>β</sup>*ˆ(*uik*, *vik*, *tk*) = *<sup>β</sup>*ˆ(*ui*, *vi*) *correspond to the parameter estimations of the GWPAR model. If wik*(*ulm*, *vlm*, *tm*) = 0(*m* = *k*) *and C* = **0***, they are the estimations of the GWPR model. If W* = **0** *and C* = **0***, then β*ˆ(*uik*, *vik*, *tk*) = *β*ˆ *are the global estimation values of the PR model.*

#### *3.2. Estimation of Parameter ρ*

Based on the density function, the log-likelihood function of *ρ* is

$$\mathbf{L}\_2(\rho) = \sum\_{k=1}^{T} \sum\_{i=1}^{n\_k} \left( \frac{y\_{ik}\theta\_{ik} - b(\theta\_{ik})}{a(\phi\_{ik})} + c(y\_{ik}, \phi\_{ik}) \right).$$

Differentiating **L**2(*ρ*) with respect to *ρ*, we have

$$\frac{\partial \mathbf{L}\_2}{\partial \rho} = \sum\_{k=1}^{T} \sum\_{i=1}^{n\_k} \left( \frac{y\_{ik} - \mu\_{ik}}{a\_{ik} \phi} \frac{\partial \theta\_{ik}}{\partial \rho} \right) = 0,\tag{8}$$

where *<sup>d</sup>θik <sup>d</sup><sup>ρ</sup>* <sup>=</sup> <sup>1</sup> *b*(*θik* )*g*(*μik* ) *dηik <sup>d</sup><sup>ρ</sup>* . Then, we take the derivative of the model (5) with respect to *ρ* as follows:

$$\frac{d\eta\_l}{d\rho} = \frac{d\lg(\mu\_l)}{d\rho} = \sum\_{h=1}^N A\_h \mathcal{C}\_h \eta\_h.$$

The detail calculation is given in Appendix A.3. Then, the Equation (8) can be rewritten in the following nonlinear form

$$\frac{d\mathbf{L}\_2}{d\rho} = \sum\_{l=1}^{N} \frac{(\mathbf{Y}\_l - \mu\_l)\sum\_{l=1}^{N} A\_{l\cdot} \mathbf{C}\_{\cdot l\cdot} \eta\_{l\cdot}}{a\_l \phi V(\mu\_l) \mathbf{g}'(\mu\_l)} = 0.1$$

According to the Newton–Raphson procedure and Fisher scoring algorithm, the iterative formula of *ρ*ˆ(*m*+1) is

$$\begin{array}{rcl} \boldsymbol{\beta}^{(m+1)} &=& \boldsymbol{\beta}^{(m)} + \boldsymbol{\mathcal{Z}}^{-1}(\boldsymbol{\beta}^{(m)}) \mathbf{S}(\boldsymbol{\beta}^{(m)}) \\ &=& \boldsymbol{\beta}^{(m)} + ((\boldsymbol{\mathcal{A}}^{(m)} \mathbf{C} \boldsymbol{\eta}^{(m)})' \boldsymbol{T}^{(m)} (\boldsymbol{\mathcal{A}}^{(m)} \mathbf{C} \boldsymbol{\eta}^{(m)}))^{-1} \\ & \times (\boldsymbol{\mathcal{A}}^{(m)} \mathbf{C} \boldsymbol{\eta}^{(m)})' \boldsymbol{T}^{(m)} (\boldsymbol{\mathcal{Z}}^{(m)} - \boldsymbol{\eta}^{(m)}), \end{array} \tag{9}$$

where the scalar vector *S*(*ρ*ˆ(*m*)) = <sup>1</sup> *<sup>φ</sup>* (*ACη*) *T*(*Z* − *η*) and the Fisher information matrix I(*ρ*) = 1 *<sup>φ</sup>* (*ACη*) *<sup>T</sup>*(*ACη*). The calculation process of the scalar vector *<sup>S</sup>*(*ρ*ˆ(*m*)) and the information matrix <sup>I</sup> is given in Appendix A.3.

#### **4. Main Results**

In this section, we apply the PR, GWPR, GWPAR and GTWPAR models to analyze the relationships between the HIV/AIDS epidemic, the GDP per capita and population density in ten countries of Sub-Saharan Africa from 2011 to 2016. The ten countries are Angola, Botswana, Lesotho, Malawi, Mozambique, Namibia, South Africa, Swaziland, Zimbabwe and Zambia. The parameters of these four models are estimated by the Newton–Raphson procedure and Fisher scoring algorithm. The coefficient of determination R2, the corrected Akaike information criterion (AICc), the deviation (D) and mean-square error (MSE) are used to compare the performances of the four models [18].

#### *4.1. The HIV/AIDS Epidemic Models*

The data of HIV/AIDS incidence, GDP per capita and population density were derived from http: //data.cnki.net/InternationalData/Report. Readers should note that authorization is required to access the database on this website. Figure 1 describes the HIV/AIDS incidence in ten countries from 2011 to 2016. It shows that the incidence varies significantly in different regions. Angola has a minimum incidence

of less than 5%, while Botswana and Swaziland have higher incidences of more than 20% every years. Therefore, it may be necessary to consider the temporal and spatial factors in analyzing the HIV/AIDS epidemic.

**Figure 1.** Spatiotemporal HIV/AIDS incidence of ten countries, 2011–2016.

The distributions of HIV/AIDS cases, GDP per capita and population density are displayed in Figure 2. The Pearson correlation coefficients between these cases and GDP per capita and population density are 0.2739 and −0.1179, respectively. Meanwhile, the two socioeconomic factors have different effects on the HIV/AIDS cases at the spatiotemporal locations. These reflect a spatiotemporal non-stationarity between the cases and two factors in ten countries from 2011 to 2016. Table 1 lists the *p*-values of the first-order autocorrelation of HIV/AIDS cases in the different years of the same region or the different regions of the same year. Each region has a significant spatial autocorrelation (*p*-value < 0.01) each year. Lesotho and South Africa had temporal autocorrelation during 2011 to 2016. Thus, the spatial and temporal autocorrelation should not be ignored.

**Figure 2.** Distributions of HIV/AIDS cases, GDP per capita and population density of ten countries, 2011–2016.


**Table 1.** *p*-values of the spatial and temporal autocorrelation analysis.

Next, we standardized the two socioeconomic factors. The multiplex collinear test [32] was performed by the condition number *<sup>k</sup>* <sup>=</sup> <sup>√</sup>*λ*max/*λ*min <sup>=</sup> 1.804(<sup>≤</sup> <sup>15</sup>) (*<sup>λ</sup>* is the eigenvalue of explanatory variable matrix). If *k* > 15, then the data have collinearity. Otherwise, there is no collinearity. Thus, there is no collinearity between the two factors. Let *μik*, *rik* and *Pik* be the annual HIV/AIDS cases (Unit: 1/1000 people), incidence (Unit: 1/100) and total population (unit: 100,000 people) in the *k*th year of the *i*th region, respectively. Denote *g*(*μik*) = *ηik* = ln *μik* = ln *rik* + ln *Pik*(*μik* = *rikPik*, *i* = 1, 2, ... , 10 and *k* = 1, 2, ... , 6). Let *Xi*<sup>1</sup>*<sup>k</sup>* and *Xi*<sup>2</sup>*<sup>k</sup>* be the GDP per capita and population density in the *i*th region at the *k*th year, respectively. The PR model is written by

$$\lg(\mu\_{ik}) = \beta\_0 + \beta\_1 X\_{i1k} + \beta\_2 X\_{i2k}, \quad i = 1, 2, \dots, 10, k = 1, 2, \dots, 6,\tag{10}$$

where *βj*(*j* = 0, 1, 2) are unknown constants. The GWPR model is introduced as

$$\mathcal{S}\_{\mathcal{S}}(\mu\_{ik}) = \beta\_0(u\_{ik}, v\_{ik}) + \beta\_1(u\_{ik}, v\_{ik})X\_{i1k} + \beta\_2(u\_{ik}, v\_{ik})X\_{i2k}, \quad i = 1, 2, \dots, 10,\tag{11}$$

where *k* is a fixed constant taken from {1, 2, ... , 6}, and *βj*(*uik*, *vik*) are unknown spatial parameters for the *i*th country (*uik*, *vik*) in the *k*th year. Let *ρ* be a scalar autoregressive parameter, and *cil* be a constant that represents an adjacency relation. The GWPAR model is

$$\log(\mu\_{i\bar{k}}) = \rho \sum\_{l=1}^{n} c\_{l\bar{l}} \eta\_{i\bar{k}} + \beta \mathbf{o}(u\_{i\bar{k}\succ i} \upsilon\_{i\bar{k}}) + \beta \mathbf{1}(u\_{i\bar{k}\succ i} \upsilon\_{i\bar{k}}) X\_{i1\bar{k}} + \beta \mathbf{2}(u\_{i\bar{k}\succ i} \upsilon\_{i\bar{k}}) X\_{i2\bar{k}} \tag{12}$$

where *n* = 10, *k* is a fixed constant, and *βj*(*uik*, *vik*) are defined as above. Let *c* (*ik*) *lm* be a spatiotemporal adjacency relation, and *βjk*(*uik*, *vik*, *tk*)(*k* = 1, 2, ... , 6) be unknown spatiotemporal parameters in the *i*th country (*uik*, *vik*) in the *k*th year. The GTWPAR model is established as follows:

$$\begin{aligned} \text{g}(\mu\_{ik}) &= \rho \sum\_{m=1}^{T} \sum\_{l=1}^{n\_k} c\_{lm}^{(ik)} \eta\_{lm} + \beta\_{0k} (\mu\_{ik}, \upsilon\_{ik}, t\_k) \\ &+ \beta\_{1k} (\mu\_{ik}, \upsilon\_{ik}, t\_k) X\_{i1k} + \beta\_{2k} (\mu\_{ik}, \upsilon\_{ik}, t\_k) X\_{i2k} \end{aligned} \tag{13}$$

where *T* = 6; *nk* = 10 for every *k* years; and *ρ* is defined as above.

Algorithms I, II, III and IV of PR, GWPR, GWPAR and GTWPAR models are provided in Appendix A.4, respectively.

#### *4.2. Statistical Analysis*

For the PR model, we get the estimated values of unknown parameters by Algorithm I. Then, the best space bandwidth is chosen by the cross-validation method. Following Huang et al. [28], the range [0.09, 2.49] of the space bandwidth is selected according to the minimum and maximum distance of the

geographical positions. In the GWPR model, the best space bandwidth is *h* = 0.62, 0.59, 0.62, 0.61, 0.60, 0.60, and the estimations of coefficient functions are given by Algorithm II. The optimal space bandwidth of the GWPAR model is selected as *h* = 1.2895, 1.1316, 1.1316, 1.0526, 1.0526, 1.0526. Based on Algorithm III, we can get the estimations of coefficient functions and the scalar autoregressive parameter *<sup>ρ</sup>* <sup>=</sup> 0.267, 0.269, 0.263, 0.264, 0.264, 0.264. For the GTWPAR model, we chose *hs* <sup>=</sup> 1.1316, 0.9737, 0.8947, 0.9211, 0.7842, 0.8789 and *τ* = 0.1, where *τ*(> 0) is a balanced parameter. The coefficient estimations and scalar autoregressive parameter *<sup>ρ</sup>* <sup>=</sup> 0.126 can be obtained by Algorithm IV. The quantile and mean values of coefficient estimations and response variables are shown in Table 2. We note that the GWPR, GWPAR and GTWPAR models can reflect the non-stationarity property of the influencing factors; the PR model cannot. Moreover, the GTWPAR model has a better performance than other models by comparing the true and fitted values.


**Table 2.** The quantile and mean values of coefficient estimations and response variables.

The average estimated coefficients are visualized in Figure 3. For the PR model, the GDP per capita and population density had the same effect on the HIV/AIDS epidemic for ten countries in six years. However, there exist significant spatial non-stationarity and auto-correlation for different countries under the GWPR, GWPAR and GTWPAR models. Figure 4 shows the spatial distribution of the average MSE of their response variables. The lighter the color, the smaller the average error is. Thus, the GWPAR and GTWPAR models have the better fitting results.

*Entropy* **2020**, *22*, 1230

**Figure 3.** The spatial distribution of the average coefficient estimations in four models.

**Figure 4.** The average MSE of response variables.

These four indicators can effectively compare the performances of the proposed models (Table 3). The calculation formulas of R2, AICc, D and MSE are given in Appendix A.5. The coefficient of determination R2 gradually increases from 12.91% of the PR model to 99.57% of the GTWPAR model. The MSE, AICc and D values of the GTWPAR model are smaller than those of other models. Therefore, the GTWPAR model is more suitable to investigate the spatiotemporal HIV/AIDS epidemic.


**Table 3.** The comparison of the four models.

Based on the GTWPAR model, the mean values and 95% confidence intervals of the coefficient estimations are shown in Figure 5. The mean estimations are represented by the dot, and the 95% confidence intervals are given by the upper and lower lines. Note that the GDP per capita in Botswana, Namibia and South Africa has a positive effect on the HIV/AIDS cases. Six other countries (except Lesotho) had the opposite results. The population density for five countries had a positive effect on the HIV/AIDS cases—Angola, Botswana, Namibia, South Africa and Zambia. The population density of other five countries had the negative effect. Moreover, the impact of the GDP per capita on HIV/AIDS epidemic had

a strong spatiotemporal non-stationarity in Lesotho, Malawi and Zimbabwe, while the population density had a strong spatiotemporal non-stationarity in Angola.

**Figure 5.** The mean values and 95% confidence intervals of coefficient estimations.

#### **5. Conclusions**

In this paper, we propose four regression models, including the PR, GWPR, GWPAR and GTWPAR, to investigate the non-stationary and auto-correlation properties. The relationships between the HIV/AIDS epidemic, GDP per capita and population density were analyzed in ten countries of Sub-Saharan Africa from 2011 to 2016. The unknown parameters of these models can be estimated by the Newton–Raphson procedure and Fisher scoring algorithm.

The PR model is a classical generalized model, which considers the global relationships between the response and explanatory variables. The GWPR and GWPAR models have been introduced to determine the spatial non-stationarity or auto-correlation. The GTWPAR model proposed by this article can be used to investigate not only spatiotemporal non-stationary but also auto-correlation. Thus, the PR, GWPR and GWPAR models are several special cases of the GTWPAR model (see Remark 1 and Remark 2). The performances of these models were evaluated by analyzing the correlations between the HIV/AIDS epidemic and two socioeconomic factors. The parameter estimations of the models can be obtained by Algorithms I, II, III and IV in Appendix A.4.

The results show that the impacts of GDP per capita and population density on HIV /AIDS cases had significant spatiotemporal non-stationarity and auto-correlation. The GWPR, GWPAR and GTWPAR models can reflect the strong spatial or spatiotemporal non-stationarity. The auto-correlation can be reflected in the GWPAR and GTWPAR models. Compared with other models, the GTWPAR model is more effective in terms of four comparison indicators. Thus, we suggest that the GTWPAR model can be used to analyze the spatiotemporal characteristics of the HIV/AIDS epidemic and the influences of the GDP per capita and population density.

Further work also exists in our study. For example, we observed that the effects of the GDP per capita for Lesotho, Malawi and Zimbabwe and the population density for Angola on HIV/AIDS had strong spatiotemporal non-stationarity. These may be the result of local environmental or political factors. Whether the fitting results of these regions will perform better if explanatory variables such as local unique environmental or political factors are added needs to be further investigated.

**Author Contributions:** Conceptualization, Z.L., H.Z. and X.H.; methodology, S.S., H.Z. and X.H.; software, S.S.; supervision, Z.L. and H.J.; visualization, Z.L.; writing—original draft, S.S.; writing—review & editing, S.S., Z.L. and H.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (XJEDU2017M001, 2018Q011), and the National Natural Science Foundation of China (U1703237, 11661076, 12061070).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Detailed Processes**

*Appendix A.1. The Expressions of C, X and B*

In model *η* = *ρCη* + **B** *X* , the expressions of *C*, *X* and *B* are

*C* = ⎛ ⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ *c* (11) <sup>11</sup> ··· *c* (*n*11) <sup>11</sup> ··· *c* (1*T*) <sup>11</sup> ··· *c* (*nTT*) 11 . . . . . . . . . . . . . . . . . . . . . *c* (11) (*n*11) ··· *<sup>c</sup>* (*n*11) (*n*11) ··· *<sup>c</sup>* (1*T*) (*n*11) ··· *<sup>c</sup>* (*nTT*) (*n*11) . . . . . . . . . . . . . . . . . . . . . *c* (11) <sup>1</sup>*<sup>T</sup>* ··· *c* (*n*11) <sup>1</sup>*<sup>T</sup>* ··· *c* (1*T*) <sup>1</sup>*<sup>T</sup>* ··· *c* (*nTT*) 1*T* . . . . . . . . . . . . . . . . . . . . . *c* (11) (*nTT*) ··· *<sup>c</sup>* (*n*11) (*nTT*) ··· *<sup>c</sup>* (1*T*) (*nTT*) ··· *<sup>c</sup>* (*nTT*) (*nTT*) ⎞ ⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠ , *X* = ⎛ ⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ *X*<sup>111</sup> *X*<sup>121</sup> ··· *X*1*p*<sup>1</sup> . . . . . . . . . . . . *Xn*<sup>111</sup> *Xn*<sup>121</sup> ··· *Xn*<sup>1</sup> *<sup>p</sup>*<sup>1</sup> . . . . . . . . . . . . *X*11*<sup>T</sup> X*12*<sup>T</sup>* ··· *X*1*pT* . . . . . . . . . . . . *XnT*<sup>1</sup>*<sup>T</sup> XnT*<sup>2</sup>*<sup>T</sup>* ··· *XnT pT* ⎞ ⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎟⎠ ,

where *CIK*, *XIK* are respectively the *I*th row and the *K*th column of the matrices *C* and *X*. Moreover,

**B** = (*β*11(*u*11, *v*11, *t*1), ··· , *βp*1(*u*11, *v*11, *t*1), ··· , *β*11(*un*11, *vn*11, *t*1), ··· , *βp*1(*un*11, *vn*11, *t*1), ··· , *β*1*T*(*u*1*T*, *v*1*T*, *tT*), ··· , *βpT*(*u*1*T*, *v*1*T*, *tT*), ··· , *β*1*T*(*unT*1, *vnT*1, *t*1), ··· , *βpT*(*unT*1, *vnT*1, *t*1)) .

*Appendix A.2. Formula and Information Matrix of β*(*u*, *v*, *t*)

(1) For the matrix form of model (5), we obtain

$$\begin{cases} \begin{aligned} &(1 - \rho \mathbb{C}\_{11})\eta\_1 - \rho \mathbb{C}\_{12}\eta\_2 - \cdots - \rho \mathbb{C}\_{1N}\eta\_N = \sum\_{j=1}^p \beta\_{j0} \mathbb{X}\_{1j}, \\ & - \rho \mathbb{C}\_{21}\eta\_1 + (1 - \rho \mathbb{C}\_{22})\eta\_2 - \cdots - \rho \mathbb{C}\_{2N}\eta\_N = \sum\_{j=1}^p \beta\_{j0} \mathbb{X}\_{2j}, \\ & \vdots \\ & \vdots \\ & - \rho \mathbb{C}\_{N1}\eta\_1 - \rho \mathbb{C}\_{N2}\eta\_2 - \cdots + (1 - \rho \mathbb{C}\_{NN})\eta\_N = \sum\_{j=1}^p \beta\_{j0} \mathbb{X}\_{Nj}. \end{aligned} \end{cases}$$

Differentiating the above equations with *βr*<sup>0</sup> yields

$$\begin{cases} (1 - \rho \mathbf{C}\_{11}) \frac{\partial \eta\_{1}}{\partial \beta\_{r0}} - \rho \mathbf{C}\_{12} \frac{\partial \eta\_{2}}{\partial \beta\_{r0}} - \dots - \rho \mathbf{C}\_{1N} \frac{\partial \eta\_{N}}{\partial \beta\_{r0}} = X\_{1r}, \\ -\rho \mathbf{C}\_{21} \frac{\partial \eta\_{1}}{\partial \beta\_{r0}} + (1 - \rho \mathbf{C}\_{22}) \frac{\partial \eta\_{2}}{\partial \beta\_{r0}} - \dots - \rho \mathbf{C}\_{2N} \frac{\partial \eta\_{N}}{\partial \beta\_{r0}} = X\_{2r}, \\ \vdots \\ -\rho \mathbf{C}\_{N1} \frac{\partial \eta\_{1}}{\partial \beta\_{r0}} - \rho \mathbf{C}\_{N2} \frac{\partial \eta\_{2}}{\partial \beta\_{r0}} - \dots + (1 - \rho \mathbf{C}\_{NN}) \frac{\partial \eta\_{N}}{\partial \beta\_{r0}} = X\_{Nr}. \end{cases}$$

Then,

$$
\begin{pmatrix}
1 - \rho \mathbb{C}\_{11} & -\rho \mathbb{C}\_{12} & \cdots & -\rho \mathbb{C}\_{1N} \\
\vdots & \vdots & \vdots & \vdots \\
\end{pmatrix}
\begin{pmatrix}
\frac{\partial \eta\_{1}}{\partial \mathbb{B}\_{r0}} \\
\frac{\partial \eta\_{2}}{\partial \mathbb{B}\_{r0}} \\
\vdots \\
\frac{\partial \eta\_{N}}{\partial \mathbb{B}\_{r0}}
\end{pmatrix} = 
\begin{pmatrix}
X\_{1r} \\
X\_{2r} \\
\vdots \\
X\_{Nr}
\end{pmatrix}.
$$

Denote *∂η ∂βr*<sup>0</sup> = ( *∂η*<sup>1</sup> *∂βr*<sup>0</sup> , *∂η*<sup>2</sup> *∂βr*<sup>0</sup> ,..., *∂η<sup>N</sup> ∂βr*<sup>0</sup> ) and

$$A = \begin{pmatrix} 1 - \rho \mathbb{C}\_{11} & -\rho \mathbb{C}\_{12} & \cdots & -\rho \mathbb{C}\_{1N} \\ -\rho \mathbb{C}\_{21} & 1 - \rho \mathbb{C}\_{22} & \cdots & -\rho \mathbb{C}\_{2N} \\ \vdots & \vdots & \vdots & \vdots \\ -\rho \mathbb{C}\_{N1} & -\rho \mathbb{C}\_{N2} & \cdots & 1 - \rho \mathbb{C}\_{NN} \end{pmatrix}^{-1},$$

$$X\_{I} = (X\_{1r}, X\_{2r}, \dots, X\_{Nr})', A\_{I} = (A\_{I1}, A\_{I2}, \dots, A\_{IN})'.$$

Thus, *∂η ∂βr*<sup>0</sup> = *AX*·*r*, that is

$$\frac{\partial \eta\_l}{\partial \beta\_{r0}} = \sum\_{h=1}^{N} A\_{lh} X\_{hr} = A\_l X\_{,r\_l} \quad l = 1, 2, \dots, N.$$

(2) The element *Irb* of *I*(*β*) satisfies

*Irb*(*β*) = <sup>−</sup> *<sup>∂</sup>*2**L**<sup>1</sup> *∂βb*0*∂βr*<sup>0</sup> <sup>=</sup> <sup>−</sup> *<sup>∂</sup> ∂βb*<sup>0</sup> *<sup>N</sup>* ∑ *l*=1 *Yl* − *μ<sup>l</sup> alφ Al*·*X*·*<sup>r</sup> V*(*μl*)*g*(*μl*) *Wl*(*u*00, *v*00, *t*0) = − *N* ∑ *l*=1 *Yl* − *μ<sup>l</sup> alφ ∂ ∂βb*<sup>0</sup> *Al*·*X*·*<sup>r</sup> V*(*μl*)*g*(*μl*) *Wl*(*u*00, *v*00, *t*0) − *N* ∑ *l*=1 *Al*·*X*·*<sup>r</sup> V*(*μl*)*g*(*μl*) *∂ ∂βb*<sup>0</sup> *Yl* − *μ<sup>l</sup> alφ Wl*(*u*00, *v*00, *t*0) = − *N* ∑ *l*=1 *Yl* − *μ<sup>l</sup> alφ ∂ ∂βb*<sup>0</sup> *Al*·*X*·*<sup>r</sup> V*(*μl*)*g*(*μl*) *Wl*(*u*00, *v*00, *t*0) + *N* ∑ *l*=1 *Al*·*X*·*rAl*·*X*·*<sup>b</sup> alφV*(*μl*)(*g*(*μl*))<sup>2</sup> *Wl*(*u*00, *<sup>v</sup>*00, *<sup>t</sup>*0).

The Fisher information matrix is

$$\begin{split} \mathbf{I}(\boldsymbol{\beta}) = \mathbf{E}(I(\boldsymbol{\beta})) = \mathbf{E}((I\_{rb}(\boldsymbol{\beta}))\_{p \times p}) &= \sum\_{l=1}^{N} \frac{A\_{l} \mathbf{X}\_{r} A\_{l} \mathbf{X}\_{\cdot b}}{a\_{l} \boldsymbol{\Phi} V(\mu\_{l}) (\mathbf{g}^{r}(\mu\_{l}))^{2}} \mathbf{W}\_{l}(\mu\_{00}, \boldsymbol{\nu}\_{00}, t\_{0}) \\ &= \frac{1}{\boldsymbol{\Phi}} \sum\_{l=1}^{N} T\_{l} A\_{l} \mathbf{X}\_{r} A\_{l} \mathbf{X}\_{b} \mathbf{W}\_{l}(\mu\_{00}, \boldsymbol{\nu}\_{00}, t\_{0}). \end{split}$$

*Appendix A.3. Formula and Information Matrix of ρ*

(1) Differentiating (4) with *ρ* yields

$$\begin{cases} (1 - \rho \mathbb{C}\_{11}) \frac{d\eta\_{1}}{d\rho} - \rho \mathbb{C}\_{12} \frac{d\eta\_{2}}{d\rho} - \dots - \rho \mathbb{C}\_{1N} \frac{d\eta\_{N}}{d\rho} = \mathbb{C}\_{11}\eta\_{1} + \mathbb{C}\_{12}\eta\_{2} + \dots + \mathbb{C}\_{1N}\eta\_{N}, \\ -\rho \mathbb{C}\_{21} \frac{d\eta\_{1}}{d\rho} + (1 - \rho \mathbb{C}\_{22}) \frac{d\eta\_{2}}{d\rho} - \dots - \rho \mathbb{C}\_{2N} \frac{d\eta\_{N}}{d\rho} = \mathbb{C}\_{21}\eta\_{1} + \mathbb{C}\_{22}\eta\_{2} + \dots + \mathbb{C}\_{2N}\eta\_{N}, \\ \vdots \\ -\rho \mathbb{C}\_{N1} \frac{d\eta\_{1}}{d\rho} - \rho \mathbb{C}\_{N2} \frac{d\eta\_{2}}{d\rho} - \dots + (1 - \rho \mathbb{C}\_{NN}) \frac{d\eta\_{N}}{d\rho} = \mathbb{C}\_{N1}\eta\_{1} + \mathbb{C}\_{N2}\eta\_{2} + \dots + \mathbb{C}\_{NN}\eta\_{N}. \end{cases}$$

That is to say

$$
\begin{pmatrix}
1 - \rho \mathbb{C}\_{11} & -\rho \mathbb{C}\_{12} & \cdots & -\rho \mathbb{C}\_{1N} \\
\vdots & \vdots & \vdots & \vdots \\
\end{pmatrix}
\begin{pmatrix}
\frac{d\eta\_{1}}{d\rho} \\
\frac{d\eta\_{2}}{d\rho} \\
\vdots \\
\frac{d\eta\_{N}}{d\rho}
\end{pmatrix} = \begin{pmatrix}
\mathbb{C}\_{11} & \mathbb{C}\_{12} & \cdots & \mathbb{C}\_{1N} \\
\mathbb{C}\_{21} & \mathbb{C}\_{22} & \cdots & \mathbb{C}\_{2N} \\
\vdots & \vdots & \vdots & \vdots \\
\mathbb{C}\_{N1} & \mathbb{C}\_{N2} & \cdots & \mathbb{C}\_{NN}
\end{pmatrix} \begin{pmatrix}
\eta\_{1} \\
\eta\_{2} \\
\vdots \\
\eta\_{N}
\end{pmatrix}.
$$

Let *<sup>d</sup><sup>η</sup> <sup>d</sup><sup>ρ</sup>* = ( *<sup>d</sup>η*<sup>1</sup> *<sup>d</sup><sup>ρ</sup>* ,..., *<sup>d</sup>η<sup>N</sup> <sup>d</sup><sup>ρ</sup>* ) . Then, *<sup>d</sup><sup>η</sup> <sup>d</sup><sup>ρ</sup>* = *ACη*, where

$$\frac{d\eta\_l}{d\rho} = \frac{d\mathcal{g}(\mu\_l)}{d\rho} = \sum\_{h=1}^N A\_l \mathbb{C}\_{\text{-}h} \eta\_{h\prime} \quad l = 1, 2, \cdots, N.$$

(2) The scalar vector of *ρ* is

$$\begin{aligned} S(\rho) &= \quad \frac{d\mathbf{L}\_2}{d\rho} = \frac{1}{\Phi} \sum\_{l=1}^N T\_l \left( \sum\_{h=1}^N A\_{l\cdot} \mathbb{C}\_{\cdot h} \eta\_h \right) (Y\_l - \mu\_l) \mathbf{g}'(\mu\_l), \\ &= \quad \frac{1}{\Phi} (AC\eta)' T(Z - \eta). \end{aligned}$$

The information matrix is

$$\begin{array}{rcl} I(\rho) &=& -\frac{\partial^2 \mathcal{L}\_2}{\partial \rho^2} = -\frac{d}{d\rho} \sum\_{l=1}^N \left( \frac{\mathcal{Y}\_l - \mu\_l}{a\_l \phi} \right) \left( \frac{\sum\_{h=1}^N A\_l \mathcal{C}\_{\cdot h} \eta\_h}{V(\mu\_l) \mathcal{G}'(\mu\_l)} \right) \\ &=& -\sum\_{l=1}^N \left( \frac{\mathcal{Y}\_l - \mu\_l}{a\_l \phi} \right) \frac{d}{d\rho} \left( \frac{\sum\_{h=1}^N A\_l \mathcal{C}\_{\cdot h} \eta\_h}{V(\mu\_l) \mathcal{G}'(\mu\_l)} \right) \\ & - \sum\_{l=1}^N \left( \frac{\sum\_{h=1}^N A\_l \mathcal{C}\_{\cdot h} \eta\_h}{V(\mu\_l) \mathcal{G}'(\mu\_l)} \right) \frac{d}{d\rho} \left( \frac{\mathcal{Y}\_l - \mu\_l}{a\_l \phi} \right) . \end{array}$$

where

$$\frac{d\mu\_l}{d\rho} = \frac{d\mu\_l}{d\lg(\mu\_l)} \frac{d\lg(\mu\_l)}{d\rho} = \frac{1}{\lg'(\mu\_l)} \sum\_{h=1}^N A\_{l^\cdot} \mathbb{C}\_{h\eta\eta\_h}.$$

Thus,

$$I(\rho) = -\sum\_{l=1}^{N} \left( \frac{\Upsilon\_l - \mu\_l}{a\_l \phi} \right) \frac{d}{d\rho} \left( \frac{\sum\_{h=1}^{N} A\_{l\cdot} \mathcal{C}\_{\cdot h} \eta\_h}{V(\mu\_l) \mathcal{g}'(\mu\_l)} \right) + \sum\_{l=1}^{N} \frac{(\sum\_{h=1}^{N} A\_{l\cdot} \mathcal{C}\_{\cdot h} \eta\_h)^2}{a\_l \phi V(\mu\_l) (\mathcal{g}'(\mu\_l))^2}.$$

The Fisher information matrix is

$$\begin{aligned} \mathcal{Z}(\rho) &= \quad E(I(\rho)) = \sum\_{l=1}^{N} \frac{(\sum\_{h=1}^{N} A\_l \mathbb{C}\_{-h} \eta\_h)^2}{a\_l \Phi V(\mu\_l) (\mathcal{g}'(\mu\_l))^2} \\ &= \quad \frac{1}{\Phi} \sum\_{l=1}^{N} T\_l \left( \sum\_{h=1}^{N} A\_l \mathbb{C}\_{-h} \eta\_h \right)^2 = \frac{1}{\Phi} (AC\eta)' T(AC\eta) .\end{aligned}$$

*Appendix A.4. Algorithms of Coefficient Estimation*

We provide four algorithms to estimate the unknown coefficients of PR, GWPR, GWPAR and GTWPAR models in Section 4.

**Algorithm I:** Estimate the unknown parameters in the PR model. Take the initial values *<sup>g</sup>*(*μ*(0) *ik* ) = *η*(0) *ik* <sup>=</sup> ln *<sup>μ</sup>*(0) *ik* , *yik* <sup>=</sup> *<sup>μ</sup>*(0) *ik* , *<sup>Z</sup>*(0) *ik* <sup>=</sup> *<sup>η</sup>*(0) *ik* + *g* (*μ*(0) *ik* )(*yik* <sup>−</sup> *<sup>μ</sup>*(0) *ik* ), and *w*(0) *ik* <sup>=</sup> <sup>1</sup> *aikV*(*μ*(0) *ik* )(*g*(*μ*(0) , *i* = 1, 2, . . . , 10, *k* = 1, 2, . . . , 6.

The iterative formula of *β*ˆ(*m*+1) is

$$\beta^{(m+1)} = \left(X^\prime \mathcal{W}^{(m)} X\right)^{-1} X^\prime \mathcal{W}^{(m)} Z^{(m)}.$$

Repeat the above step until convergence yields. The estimated value *β*ˆ = *β*ˆ(*m*) can be obtained.

*ik* ))<sup>2</sup>

**Algorithm II:** Estimate the unknown coefficients in the GWPR model. (Note that *k* is a fixed constant taken from {1, 2, ... , 6} and the following steps should be repeated six times independently). Take the initial values *η*(0) *ik* <sup>=</sup> *<sup>g</sup>*(*μ*(0) *ik* ), *yik* <sup>=</sup> *<sup>μ</sup>*(0) *ik* , *<sup>Z</sup>*(0) *ik* <sup>=</sup> *<sup>η</sup>*(0) *ik* + *g* (*μ*(0) *ik* )(*yik* <sup>−</sup> *<sup>μ</sup>*(0) *ik* ), and

$$t\_{ik}^{(0)} = \frac{1}{a\_{ik} V(\mu\_{ik}^{(0)}) (g'(\mu\_{ik}^{(0)}))^2}, i = 1, 2, \dots, 10.$$

Let *Z*(0) = (*Z*(0) <sup>1</sup>*<sup>k</sup>* , *<sup>Z</sup>*(0) <sup>2</sup>*<sup>k</sup>* , ··· , *<sup>Z</sup>*(0) <sup>10</sup>*k*) and *<sup>T</sup>*(0) <sup>=</sup> diag(*T*(0) <sup>1</sup>*<sup>k</sup>* , *<sup>T</sup>*(0) <sup>2</sup>*<sup>k</sup>* , ··· , *<sup>T</sup>*(0) <sup>10</sup>*<sup>k</sup>* ). The iterative formula of *<sup>β</sup>*ˆ(*m*+1) at the location (*u*0*k*, *v*0*k*) is

$$\beta^{(m+1)}(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) \quad = \, \left( A'(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) T^{(m)} \mathcal{W}(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) A(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) \right)^{-1} A'(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) T^{(m)} \mathcal{W}(\mathfrak{u}\_{0k}, \mathfrak{v}\_{0k}) Z^{(m)},$$

where *W*(*u*0*k*, *v*0*k*) = diag(*w*1(*u*0*k*, *v*0*k*), *w*2(*u*0*k*, *v*0*k*), ··· , *w*10(*u*0*k*, *v*0*k*)) and

$$w\_i(u\_{0k}, v\_{0k}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{d\_{ik}^{(0)}}{h}\right)^2\right).$$

Repeat the above step until convergence. When (*u*0*k*, *v*0*k*) takes all the locations (*uik*, *vik*), we will get the estimated value *β*ˆ = *β*ˆ(*m*) in a fixed the *k*th year.

**Algorithm III:** Estimate unknown coefficients in the GWPAR model. (Note that *k* is a fixed constant and the following steps should be repeated six times as in Algorithm II). Take the initial value *β*(0) <sup>1</sup> , *β*(0) <sup>2</sup> from Algorithm II, and *<sup>ρ</sup>*(0) is the absolute value of spatial Moran's I <sup>=</sup> 0.4480. The initial values *<sup>η</sup>*(0) = (*<sup>I</sup>* <sup>−</sup> *<sup>ρ</sup>*(0)*C*)−1*Xβ*(0), *<sup>μ</sup>*(0) *ik* <sup>=</sup> *<sup>g</sup>*−1(*η*(0) *ik* ), and

$$Z\_{ik}^{(0)} = \lg(\mu\_{ik}^{(0)}) + \lg'(\mu\_{ik}^{(0)})(y\_{ik} - \mu\_{ik}^{(0)}),\\t\_{ik}^{(0)} = \frac{1}{a\_{ik}V(\mu\_{ik}^{(0)})(\mathcal{g'}(\mu\_{ik}^{(0)}))^2}, i = 1, 2, \cdots, 10.$$

The (*m* + 1)th iterative estimation *β*ˆ(*m*+1)(*u*0*k*, *v*0*k*) and *ρ*ˆ(*m*+1) is

$$\begin{array}{rcl}\beta^{(m+1)}(\boldsymbol{u}\_{0k},\boldsymbol{v}\_{0k})&=& ( (\boldsymbol{A}^{(m)}\boldsymbol{X})^{\prime}\boldsymbol{T}^{(m)}\boldsymbol{\mathcal{W}}(\boldsymbol{A}^{(m)}\boldsymbol{X}) )^{-1}(\boldsymbol{A}^{(m)}\boldsymbol{X})^{\prime}\boldsymbol{T}^{(m)}\boldsymbol{\mathcal{W}}\boldsymbol{Z}^{(m)},\\\beta^{(m+1)}&=& \boldsymbol{\rho}^{(m)} + ((\boldsymbol{A}^{(m)}\boldsymbol{C}\boldsymbol{\eta}^{(m)})^{\prime}\boldsymbol{T}^{(m)}(\boldsymbol{A}^{(m)}\boldsymbol{C}\boldsymbol{\eta}^{(m)}))^{-1}(\boldsymbol{A}^{(m)}\boldsymbol{C}\boldsymbol{\eta}^{(m)})^{\prime}\boldsymbol{T}^{(m)}(\boldsymbol{Z}^{(m)} - \boldsymbol{\eta}^{(m)}).\end{array}$$

If (*u*0*k*, *v*0*k*) takes all the locations (*uik*, *vik*), the estimate *β*ˆ(*m*+1) can be given. When all estimated values arrive to converge, we will get *β*ˆ = *β*ˆ(*m*)(*uik*, *vik*) and *ρ*ˆ = *ρ*ˆ(*m*) in a fixed the *k*th year.

**Algorithm IV:** Estimate the unknown coefficients in the GTWPAR model. Take the initial values *β*(0) <sup>1</sup>*<sup>k</sup>* (*uik*, *vik*, *tk*), *<sup>β</sup>*(0) <sup>2</sup>*<sup>k</sup>* (*uik*, *vik*, *tk*), *<sup>i</sup>* <sup>=</sup> 1, 2, ... , 10, *<sup>k</sup>* <sup>=</sup> 1, 2, ... , 6 from Algorithm III, and *<sup>ρ</sup>*(0) is the absolute value of spatiotemporal Moran's I<sup>=</sup> 0.2143. The initial value vector *<sup>η</sup>*(0) = (*<sup>I</sup>* <sup>−</sup> *<sup>ρ</sup>*(0)*C*)−1*Xβ*(0), *<sup>μ</sup>*(0) *ik* = *g*−1(*η*(0) *ik* ), *<sup>Z</sup>*(0) = (*Z*(0) <sup>1</sup> , *<sup>Z</sup>*(0) <sup>2</sup> , ··· , *<sup>Z</sup>*(0) <sup>60</sup> ) , *T*(0) = diag(*T*(0) <sup>1</sup> , *<sup>T</sup>*(0) <sup>2</sup> ,..., *<sup>T</sup>*(0) <sup>60</sup> ) and

$$Z\_{ik}^{(0)} = \lg(\mu\_{ik}^{(0)}) + \lg'(\mu\_{ik}^{(0)})(y\_{ik} - \mu\_{ik}^{(0)}),\ T\_{ik}^{(0)} = \frac{1}{a\_{ik}V(\mu\_{ik}^{(0)})(\mathcal{g'}(\mu\_{ik}^{(0)}))^2}.$$

The (*m* + 1)th iterative estimations *β*ˆ(*u*00, *v*00, *t*0) and *ρ*ˆ are

$$\begin{array}{rcl}\hat{\beta}^{(\mathfrak{m}+1)}(\mathfrak{u}\_{00},\mathfrak{v}\_{00},t\_{0})&=&((A^{(\mathfrak{m})}X)'T^{(0)}W(A^{(\mathfrak{m})}X))^{-1}\\&\times(A^{(\mathfrak{m})}X)'T^{(\mathfrak{m})}WZ^{(\mathfrak{m})},\\\rho^{(\mathfrak{m}+1)}&=&\rho^{(\mathfrak{m})}+((A^{(\mathfrak{m})}\mathbb{C}\eta^{(\mathfrak{m})})'T^{(\mathfrak{m})}(A^{(\mathfrak{m})}\mathbb{C}\eta^{(\mathfrak{m})}))^{-1}\\&\times(A^{(\mathfrak{m})}\mathbb{C}\eta^{(\mathfrak{m})})'T^{(\mathfrak{m})}(Z^{(\mathfrak{m})}-\eta^{(\mathfrak{m})}).\end{array}$$

where *W* = {*wik*(*u*00, *v*00, *t*0)} and

$$w\_{lk}(\mu\_{00}, v\_{00}, t\_0) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{(\mu\_{00} - \mu\_{lk})^2 + (v\_{00} - v\_{lk})^2}{h\_S^2} + \frac{(t\_0 - t\_k)^2}{\tau h\_S^2} \right) \right\}.$$

A detailed definition is given in Section 3.1. If (*u*00, *v*00, *t*0) takes all the locations (*uik*, *vik*, *tk*) and all estimations converge, we will get *β*ˆ = *β*ˆ(*m*)(*uik*, *vik*, *tk*) and *ρ*ˆ = *ρ*ˆ(*m*).

It is worth noting that we use the parameter estimates of the previous model as the initial values of the next model to reduce the number of iterations and improve the operational efficiency. For example, the estimations of the GWPR model are selected as the initial values of the GWPAR model.

*Appendix A.5. The Model Comparison Indicators (See Table 3)*

(1) The coefficient of determination is defined by

$$\mathbb{R}^2 = 1 - \frac{\sum (\eta - \hat{\eta})^2}{\sum (\eta - \bar{\eta})^2},$$

where *η* is a set of vectors {*ηik*}, *η*ˆ and *η*¯ are the parameter estimate and the mean value of *η*, respectively. (2) Deviation can be defined as

$$\mathcal{D} = \sum (y \ln(\frac{\hat{\mu}}{y}) + (y - \hat{\mu})),$$

where *y* is a set of response variables, and *μ*ˆ is the estimation of *μ*=E(y).

(3) The corrected Akaike information criterion is

$$\text{AICc} = D + 2P + 2\frac{P(P+1)}{N-P-1}r$$

where *D*, *P* and *N* are the deviation, the number of parameters and the number of samples, respectively. (4) Mean-square error is given by

$$\text{MSE} = \frac{1}{N - P} \sum (\vec{\eta} - \eta)^2 \vec{\eta}$$

where the parameter settings are the same as above.

#### **References**



© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
