A Comparison of Bayesian Spatial Models for HIV Mapping in South Africa

Ayalew, Kassahun Abere; Manda, Samuel; Cai, Bo

doi:10.3390/ijerph182111215

by Kassahun Abere Ayalew ¹,*, Samuel Manda ¹,²,³ and Bo Cai ⁴

¹ School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3209, South Africa
² Biostatistics Unit, South African Medical Research Council, Pretoria 0001, South Africa
³ Department of Statistics, University of Pretoria, Pretoria 0028, South Africa
⁴ Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, Columbia, SC 29208, USA
* Author to whom correspondence should be addressed.

Int. J. Environ. Res. Public Health 2021, 18(21), 11215; https://doi.org/10.3390/ijerph182111215

Submission received: 9 August 2021 / Revised: 8 October 2021 / Accepted: 9 October 2021 / Published: 26 October 2021

(This article belongs to the Special Issue Advanced Spatial-Temporal Statistics and Applications for Disease Mapping, Spatial Dependence and Capacity Building in Biostatistics in Sub-Saharan Africa Countries)

Abstract

Despite making significant progress in tackling its HIV epidemic, South Africa, with 7.7 million people living with HIV, still has the biggest HIV epidemic in the world. The Government, in collaboration with developmental partners and agencies, has been strengthening its responses to the HIV epidemic to better target the delivery of HIV care, treatment strategies and prevention services. Population-based household HIV surveys have, over time, contributed to the country’s efforts in monitoring and understanding the magnitude and heterogeneity of the HIV epidemic. Local-level monitoring of progress made against HIV and AIDS is increasingly needed for decision making. Previous studies have provided evidence of substantial subnational variation in the HIV epidemic. Using HIV prevalence data from the 2016 South African Demographic and Health Survey, we compare three spatial smoothing models, namely, the intrinsically conditionally autoregressive normal, Laplace and skew-t (ICAR-normal, ICAR-Laplace and ICAR-skew-t), in the estimation of HIV prevalence across 52 districts in South Africa. The parameters of the resulting models are estimated using Bayesian approaches. The skewness parameter for the ICAR-skew-t model was not statistically significant, suggesting the absence of skewness in the HIV prevalence data. Based on the deviance information criterion (DIC), the ICAR-normal and ICAR-Laplace models had DIC values of 291.3 and 315, respectively, which were lower than that of the ICAR-skew-t model (348.1). However, based on the model adequacy criterion using the conditional predictive ordinates (CPO), the ICAR-skew-t model had the lowest negative log-CPO score, that is, the best predictive performance. Thus, the ICAR-skew-t was the best spatial smoothing model for the estimation of HIV prevalence in our study.

Keywords: Bayesian; disease mapping; skew-t distribution; ICAR-normal; ICAR-Laplace; spatial random effects; spatial model

1. Introduction

Governments in sub-Saharan Africa (SSA), in collaboration with non-governmental organizations and the private sector, design national strategic plans and policies, allocate resources and implement programs in the fight against the HIV/AIDS epidemic [1,2]. Such efforts are designed to reduce HIV-related infection, morbidity and mortality. In addition to tracking the HIV epidemic at the national level, most governments in the region have implemented a decentralized approach to governance and service provision. This creates a need for reliable local (district)-level HIV statistics to support decision making regarding the delivery of HIV care, treatment and prevention services [3,4]. Most of the countries in SSA rely on data obtained from national HIV surveys for monitoring the level of the HIV epidemic and the subsequent responses. However, the national HIV surveys are mostly powered to produce reliable HIV estimates at the national and provincial levels. Crude HIV estimates at the small-area level can be extreme because of small numbers, and their variances are unstable [5,6,7]. Consequently, HIV prevention and treatment programs tailored to small areas could be based on unreliable evidence [8].

As a result, modelling approaches are used for generating local-level estimates from survey data that are originally meant to provide reliable estimates at national and provincial levels [9,10]. The most widely used approach is spatial smoothing, in which spatial components are incorporated in the model as random effects. Spatial models produce disease rates with improved accuracy for small areas with sparse observations by incorporating information from local, spatially contiguous areas. The structured random effect in spatial models represents clustering of diseases over geographical areas, or unobserved environmental or frailty factors that are spatially correlated but are not included as covariates in a model [11,12,13]. Structured spatial random effects (which capture local effects) are mostly modelled using the intrinsic conditional autoregressive normal (ICAR-normal) model (Besag et al. [13], Carlin and Banerjee [14]). The ICAR-normal model offers greater flexibility for modelling spatial correlation than a linear mixed effects model with only a global random effect. However, a normal distribution on the structured spatial effect could be restrictive, since the normality assumption may be misspecified [15]. Misspecification of the distribution of the random effects may result in biased estimates of disease rates [16,17]. The usual approach is to transform the data to normality, for example by taking the logarithm of the rates. However, if an appropriate theoretical model were available, transformation could be avoided, as it is difficult to interpret results from transformed data. In addition, the transformation could result in the loss of information [17].

A few approaches have been proposed to reduce the impact of a normal distribution assumption for spatial random components. For example, Lunn et al. [18] and Manda [19] proposed a double exponential and a mixture of ICAR-normal and ICAR-double exponential, respectively, to better capture possible wider tails for the spatial random effects. Kim and Mallick [20] and Azzalini and Capitanio [21] considered a skew-normal spatial model for point-referenced data. However, structured spatial skewed random fields suffer from identifiability problems (since the skewness parameter may be unknown) [22], whereas model parameters must be uniquely determined [23]. To solve these identifiability problems, Zhang and El-Shaarawi [24] defined a skewed stationary Gaussian process for the spatial random effect based on the work by Azzalini and Capitanio [21]. In addition, Allard and Naveau [25] and Zareifard and Jafari Khaledi [26] introduced a skew-normal spatial random field based on Domínguez-Molina et al. [27] and Palacios and Steel [28], respectively, for point-referenced data. Other skewed spatial distributions are the skew-normal by Rantini et al. [29] and Fernández and Steel [30].

Our aim in this study is to model district-level HIV prevalence in South Africa using spatial smoothing methods. There is ample evidence of substantial small-area variation in the distribution of HIV prevalence in sub-Saharan Africa [31,32]. Similar evidence has also been found in South Africa by Kim et al. [33] and Gutreuter et al. [34]. The distribution of the district HIV prevalence could be skewed and non-normal. Thus, we estimated the spatial distribution of HIV prevalence among the districts in South Africa using the ICAR-normal [13], ICAR-skew-t (Nathoo and Ghosh [35]) and ICAR-Laplace [18] models, using the 2016 South African Demographic and Health Survey data. The next section presents the description of the spatial models used and the HIV data. Section 3 contains the results obtained from fitting the models to the data. We discuss the results in Section 4 and conclude in Section 5.

2. Methods and Data Source

2.1. Skew-t Spatial Random Effects Distribution

Let $Y_i$ be the number of HIV positive individuals out of a sample of size $n_i$ in district $i$ $(i = 1, \dots, 52)$. Both $Y_i$ and $n_i$ are adjusted to account for the survey design to become the effective number of HIV cases, $Y_i^{*}$, and the effective sample size, $n_i^{*}$ [35,36,37,38]. A three-stage Bayesian hierarchical spatial smoothing model for a binary HIV outcome uses a binomial distribution at stage one as

$$Y_i^{*} \mid p_i \sim \mathrm{Binomial}(n_i^{*}, p_i), \quad i = 1, \dots, 52$$

where $p_i$ is the proportion (prevalence) of HIV in district $i$ and is modelled at the second stage by a logit link function using a set of district-level predictor variables, $X_i$, and both unstructured and spatially structured random effects, as introduced by Besag et al. [13]:

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + X_i \beta + u_i + v_i$$

where $\beta_0$ is the intercept; $\beta$ is a vector of regression coefficients for the predictor variables in $X_i$; $u_i$ is the unstructured random component and it is assumed to follow a normal distribution, $u_i \sim N(0, \sigma_u^2)$; $v_i$ is the structured spatial random component for district $i$.
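As a minimal sketch of the second-stage equation above, the following Python snippet maps a linear predictor back to a prevalence through the inverse logit. The function names and the numerical values are illustrative assumptions only and are not estimates from this study.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def district_prevalence(beta0, beta, x, u, v):
    """Prevalence p_i implied by logit(p_i) = beta0 + x.beta + u_i + v_i."""
    eta = beta0 + sum(b * xi for b, xi in zip(beta, x)) + u + v
    return inv_logit(eta)


# Hypothetical inputs (not taken from the paper): one covariate,
# a small unstructured effect u and a positive structured effect v.
p = district_prevalence(beta0=-1.5, beta=[0.8], x=[0.3], u=0.05, v=0.20)
print(round(p, 3))
```

A positive structured effect raises the log-odds of infection in a district relative to what its covariates alone would predict.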

The structured spatial random effects could be modelled using an intrinsic conditional autoregressive normal (ICAR-normal) prior (Besag et al. [13], Knorr-Held and Best [12] and Carlin and Banerjee [14]) as

$$v_i \mid v_{-i} \sim \mathrm{ICARN}(\mu_v, \sigma_v^2) = N\left(\frac{\sum_{j \sim i} v_j}{m_i}, \frac{\sigma_v^2}{m_i}\right)$$

where $j \sim i$ denotes the districts $j$ that are neighbours of district $i$ and $m_i$ is the number of neighbours of district $i$. Lunn et al. [18] suggested an alternative model based on a Laplace/double exponential distribution (ICAR-Laplace), which is given as

$$v_i \mid v_{-i} \sim \mathrm{ICARL}(\mu_v, \sigma_v^2).$$
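To make the ICAR-normal conditional concrete, the sketch below computes the conditional mean (the average of the neighbouring effects) and the conditional variance (the variance parameter divided by the number of neighbours) for one district. The four-district adjacency structure and the effect values are hypothetical and serve only as an illustration.

```python
def icar_conditional(i, values, neighbours, sigma2):
    """Return (mean, variance) of v_i given the effects of its neighbours."""
    nbrs = neighbours[i]
    m_i = len(nbrs)
    if m_i == 0:
        raise ValueError("district has no neighbours; the conditional is undefined")
    mean = sum(values[j] for j in nbrs) / m_i
    return mean, sigma2 / m_i


# Hypothetical map of four districts (assumption, not the South African map).
neighbours = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
values = [0.2, -0.1, 0.4, 0.0]

mean, var = icar_conditional(1, values, neighbours, sigma2=0.9)
print(mean, var)
```

Districts with many neighbours are smoothed more strongly, because their conditional variance shrinks as the number of neighbours grows.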

However, in situations where the distribution of HIV prevalence data could be non-normal and asymmetric, alternative spatial smoothing models that are robust and flexible could fit the data better. As a result, Nathoo and Ghosh [35] suggested the skew-t (ICAR-skew-t) spatial smoothing model, defined as

$$v_i \mid v_{-i} \sim ST_v\left(\frac{\sum_{j \sim i} v_j}{m_i}, \frac{\sigma_v^2}{m_i}, \delta_v\right)$$

For easy implementation in most Bayesian statistical software, Sahu et al. [39] presented a suitable representation of the skew-t distribution with $k$ degrees of freedom. Suppose $y \sim \text{skew-}t(k)$; then it could be expressed as $y = \eta^{-\frac{1}{2}} (\Delta |X_0| + X)$, where $X_0 \sim N(0, 1)$, $X \sim N(\mu, \sigma^2)$, $\Delta$ is the skewness parameter and $\eta \sim \mathrm{gamma}\left(\frac{k}{2}, \frac{k}{2}\right)$. The hierarchical set-up of this stochastic representation can be given as $Y \mid w \sim N\left(\mu + \Delta w, \frac{\Sigma}{\eta}\right)$, where $|X_0| = w \sim N(0, I_k)\, I(w > 0)$. Thus, the ICAR-skew-t for the structured spatial random effect can be expressed as

$$v_i \sim N\left(\frac{\sum_{j \sim i} s_j}{m_i} + \delta_v w_i, \frac{\sigma_s^2}{\eta\, m_i}\right)$$

where $w_i \sim N(0, I)\, I(w_i > 0)$, $s_i \mid S_{-i} \sim N\left(\frac{\sum_{j \sim i} s_j}{m_i}, \frac{\sigma_s^2}{m_i}\right)$, and $\sigma_s^2$ and $\delta_v$ are the variance of $s_i$ and the skewness parameter, respectively. The hierarchical representation of the ICAR-skew-t model is shown in Appendix A.
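The stochastic representation attributed to Sahu et al. above can be simulated directly, which may help in checking the direction of the skewness. The following is a univariate sketch under the assumption that gamma(k/2, k/2) is parameterized by shape and rate; the function name, seed and parameter values are illustrative and not part of the original analysis.

```python
import math
import random


def sample_skew_t(k, delta, mu=0.0, sigma=1.0, rng=random):
    """Draw y = eta^(-1/2) * (delta * |X0| + X) for a univariate skew-t(k)."""
    # random.gammavariate takes (shape, scale); rate k/2 means scale 2/k.
    eta = rng.gammavariate(k / 2.0, 2.0 / k)
    w = abs(rng.gauss(0.0, 1.0))   # half-normal |X0|
    x = rng.gauss(mu, sigma)       # symmetric normal part
    return (delta * w + x) / math.sqrt(eta)


rng = random.Random(2021)  # arbitrary seed for reproducibility
symmetric = [sample_skew_t(10, 0.0, rng=rng) for _ in range(5000)]
skewed = [sample_skew_t(10, 3.0, rng=rng) for _ in range(5000)]
print(sum(symmetric) / len(symmetric), sum(skewed) / len(skewed))
```

With the skewness parameter set to zero the draws are symmetric around the location, while a positive value shifts mass to the right tail.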

2.2. Methods for Comparing Competing Models

In this study, we used the deviance information criterion (DIC) and conditional predictive ordinates (CPO) for comparing models. The deviance information criterion was developed by Spiegelhalter et al. [40] as a method for comparing models in a Bayesian framework. It combines a measure of a model’s goodness of fit or adequacy with a penalty for model complexity, measured as the effective number of parameters. Let $\theta$ and $y = (y_1, \dots, y_n)$ be the model parameter and the data, respectively; then DIC is expressed as

$$DIC = \bar{D} + p_D = 2 \bar{D} - D(\bar{\theta})$$

where

$$\bar{D} = E_{\theta \mid y}[D(\theta)] = E_{\theta \mid y}[-2 \log p(y \mid \theta)]$$

is the posterior mean deviance, which measures the goodness of fit or adequacy, and

$$p_D = \bar{D} - D(\bar{\theta}) = E_{\theta \mid y}[D(\theta)] - D(E_{\theta \mid y}[\theta]) = E_{\theta \mid y}[-2 \log p(y \mid \theta)] - [-2 \log p(y \mid \bar{\theta}(y))]$$

is a measure of the effective number of parameters and measures model complexity; larger values of $p_D$ suggest higher complexity of the model. It is also defined as the difference between the posterior mean of the deviance and the deviance at the posterior means of the parameters of interest; in other words, it is considered as the expected excess of the true residuals over the estimated residuals in the data conditional on the parameter $\theta$ [16]. Let $\theta^{1}, \dots, \theta^{K}$ be parameter draws from a converged Markov chain; then $\bar{D}$ is estimated as $\frac{1}{K} \sum_{k=1}^{K} D(\theta^{k})$ and $D(\bar{\theta}) = D\left(\frac{1}{K} \sum_{k=1}^{K} \theta^{k}\right)$.
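The DIC estimate described above can be sketched from stored MCMC draws as follows. This is an illustration under stated assumptions rather than the OpenBUGS computation: the likelihood is binomial, the parameter at which the plug-in deviance is evaluated is the posterior mean of the prevalences themselves, the binomial coefficient (a constant across models for the same data) is omitted, and the counts and draws are made up.

```python
import math


def binom_deviance(y, n, p):
    """-2 * binomial log-likelihood, omitting the constant binomial coefficient."""
    return -2.0 * sum(
        yi * math.log(pi) + (ni - yi) * math.log(1.0 - pi)
        for yi, ni, pi in zip(y, n, p)
    )


def dic(y, n, draws):
    """Return (DIC, D_bar, p_D); draws[k][i] is the k-th draw of p_i."""
    K = len(draws)
    d_bar = sum(binom_deviance(y, n, p) for p in draws) / K   # posterior mean deviance
    p_mean = [sum(col) / K for col in zip(*draws)]            # posterior mean of each p_i
    d_hat = binom_deviance(y, n, p_mean)                      # deviance at the posterior mean
    p_d = d_bar - d_hat                                       # effective number of parameters
    return d_bar + p_d, d_bar, p_d


# Hypothetical data for two districts and three posterior draws.
y, n = [3, 5], [10, 10]
draws = [[0.25, 0.45], [0.30, 0.50], [0.35, 0.55]]
print(dic(y, n, draws))
```

Because the binomial deviance is convex in each prevalence, the complexity term computed this way is non-negative, and it grows as the posterior draws become more dispersed.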

The CPO is a leave-one-out cross validation approach that measures the posterior probability of observing $y_i$ when the model is fitted to all data excluding $y_i$, and it measures the predictive ability of the fitted model. Let $Y = (Y_1, Y_2, \dots, Y_n)$ be the $n \times 1$ data vector and $Y_{-i}$ be the data vector without $y_i$. Then, the conditional predictive ordinate for observation $y_i$ is given as

$$CPO_i = f(y_i \mid y_{-i}) = \int f(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta = \left\{ E_{\theta \mid y}\left[\frac{1}{f(y_i \mid \theta)}\right] \right\}^{-1}$$

where $\theta$ is the parameter vector, $y_i$ is the $i$th observation and $y_{-i}$ is the observed data set except $y_i$. Thus, one can estimate the value of the inverse of $CPO_i$ by averaging the inverse probability function evaluated at $y_i$ for each $\theta^{k}$ produced from the posterior density. The $CPO_i$ values can be easily determined from the standard MCMC output as

$$CPO_i = \left[\frac{1}{K} \sum_{k=1}^{K} \frac{1}{f(y_i \mid \theta^{k})}\right]^{-1}$$

which is the harmonic mean of the probability density function evaluated at $y_i$ for each $\theta^{k}$, where $K$ is the number of iterations. For discrete data, the comparison of $CPO_i$ with the relative frequency determined from data without $y_i$ ($y_{-i}$) enables the assessment of the predictive capacity of the fitted model. In order to compare two or more competing models, the overall CPO value of each model is assessed, given as $CPO = \prod_{i} CPO_i$. A model with a higher CPO value has better predictive performance than the other models; hence, this model is preferred. The CPO value is mostly close to zero; thus, the negative of the sum of the log of the $CPO_i$ is used, as indicated by Cai et al. [41], and is given by $LS_{cv} = -\sum_{i=1}^{n} \log CPO_i$. Thus, a model with the lowest $LS_{cv}$ value is the best model in terms of its predictive capacity.
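The harmonic-mean estimate of the CPO and the resulting log score can be sketched from posterior draws as below. The snippet assumes a binomial likelihood with integer counts (the effective counts used in the paper need not be integers) and uses invented data; the function names are illustrative.

```python
import math


def binom_pmf(y, n, p):
    """Binomial probability of y successes in n trials."""
    return math.comb(n, y) * p**y * (1.0 - p) ** (n - y)


def cpo(y_i, n_i, p_draws):
    """Harmonic mean of the likelihood of y_i over posterior draws of p_i."""
    K = len(p_draws)
    inverse_mean = sum(1.0 / binom_pmf(y_i, n_i, p) for p in p_draws) / K
    return 1.0 / inverse_mean


def ls_cv(y, n, draws):
    """Negative sum of log CPO_i over observations; draws[k][i] is a draw of p_i."""
    return -sum(
        math.log(cpo(y[i], n[i], [d[i] for d in draws])) for i in range(len(y))
    )


# Hypothetical data for two districts and three posterior draws.
y, n = [3, 5], [10, 10]
draws = [[0.25, 0.45], [0.30, 0.50], [0.35, 0.55]]
print(ls_cv(y, n, draws))
```

Since a larger CPO for each observation indicates better prediction, the model with the smallest value of this score would be preferred when several models are compared on the same data.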

2.3. Implementation

The model parameters were estimated using a Bayesian approach via Markov chain Monte Carlo (MCMC) as implemented in OpenBUGS [42]. The prior distributions for the regression coefficients and the unstructured random component were the same for all three models. The prior distribution for the intercept was $\beta_0 \sim$ uniform on $(-\infty, \infty)$ and the prior for the regression coefficients was $\beta_q \sim N(0, 0.00001)$, where $q = 1, 2, 3, 4$; the variance parameters $\sigma_u^2$ and $\sigma_v^2$ were given inverse gamma prior distributions with shape and scale parameters set at 20 and 2000, respectively. The skewness parameter for the ICAR-skew-t model was assigned a $\delta_v \sim N(0, 0.01)$ prior. We conducted a sensitivity analysis to determine the impact of the hyper-parameters of the priors on the outcome variable; for this, we chose commonly used hyper-parameters, such as $IG(1000, 1000)$, $IG(10, 10)$, $IG(1, 10)$ and $IG(2, 2000)$. Since prior distributions with larger variances are considered in the model, the estimates from this analysis are expected to be relatively robust. Moran’s I test was conducted on the model residuals to determine the presence of spatial correlation [43]. We ran 100,000 iterations for each model to make inferences. We determined the number of initial iterations that needed to be discarded by assessing the history plots of each model and for each parameter. Similarly, we also investigated the autocorrelation plots of each model and each parameter to determine the thinning intervals to avoid correlation problems in the generated chains.
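For readers unfamiliar with the Moran’s I statistic applied to the residuals, a minimal sketch is given below. It assumes binary adjacency weights (1 for neighbouring districts, 0 otherwise), which the text does not specify, and it computes only the statistic, not its significance test; the chain of four districts and the residual values are hypothetical.

```python
def morans_i(values, neighbours):
    """Moran's I with binary adjacency weights (w_ij = 1 if j neighbours i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    s0 = sum(len(nbrs) for nbrs in neighbours.values())   # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbours.items() for j in nbrs)
    total = sum(d * d for d in dev)
    return (n / s0) * cross / total


# Hypothetical chain of four districts: 0 - 1 - 2 - 3.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(morans_i([1.0, 2.0, 3.0, 4.0], chain))    # smooth trend, positive I
print(morans_i([1.0, -1.0, 1.0, -1.0], chain))  # alternating values, negative I
```

Positive values indicate that neighbouring districts have similar residuals (spatial clustering), while negative values indicate that neighbours tend to differ.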

2.4. Data

The data analyzed were obtained from the 2016 South African Demographic and Health Survey (SADHS 2016). The SADHS 2016 was conducted for evaluating the country’s health programs by monitoring key milestones such as mortality, fertility, maternal and child health, nutrition, HIV, gender-based violence, etc. The data for measuring these indicators were collected by asking respondents relevant sociodemographic and behavioral characteristic questions and by collecting biological specimens. The SADHS 2016 survey employed a multistage stratified cluster sampling design to select households and/or respondents for the sample. All women between the age of 15 and 49 and men between the ages of 15 and 59 were included in the survey. Interview data were collected from a total of 8514 women and 3618 men and 6912 individuals were tested for HIV seropositivity. More information about SADHS 2016 can be obtained from the full study report [44].

The observed district-level HIV prevalence was computed by taking the survey design into account. The effective sample size in each district was determined by dividing the observed sample size in the district by the design effect [36]; the effective number of HIV cases is thus the product of the effective sample size and the weighted prevalence. The number of HIV tests conducted in the survey varied substantially by district, from 8 to 455 tests, with a median of 111 tests. Some districts had no HIV positive individuals in the sample. To each of these districts, we assigned the average of data simulated from a normal distribution with mean equal to the average of the log of prevalence in the neighboring districts and variance equal to the variance of the log of the prevalence $p_i$ calculated from all the neighboring districts divided by the number of neighbors. The resulting prevalence is shown in Figure 1b; the map in Figure 1a shows the raw data not adjusted for zero positive cases. A skewness test was conducted on the prevalence, with and without adjusting for zero HIV prevalence, but no significant skewness was found.
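The design-effect adjustment described above reduces to two arithmetic steps, sketched here with invented numbers. The function name and the example values are assumptions for illustration and do not come from the SADHS 2016 data.

```python
def effective_counts(n_observed, design_effect, weighted_prevalence):
    """Return (effective sample size, effective number of HIV cases)."""
    n_eff = n_observed / design_effect      # shrink n by the design effect
    y_eff = n_eff * weighted_prevalence     # cases implied by the weighted prevalence
    return n_eff, y_eff


# Hypothetical district: 120 tests, design effect 1.5, weighted prevalence 25%.
n_eff, y_eff = effective_counts(120, 1.5, 0.25)
print(n_eff, y_eff)
```

A design effect above one shrinks the sample size, so the binomial likelihood treats the clustered survey data as carrying less information than a simple random sample of the same size.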

The covariates included in the models are the multidimensional poverty index constructed using the 2016 community survey data [45], HIV prevalence among pregnant women obtained from the 2017 National Antenatal Sentinel Survey report [46], population density and male condom distribution coverage obtained from the 2017 district health barometer report [47]. Previous studies indicate that these factors are associated with HIV prevalence ecologically as well as individually [3,48].

3. Results

The skewness parameters for ICAR-skew-t were not significant, perhaps suggesting that the spatial component is lighter tailed (see Table 1). The models with the lowest $LS_{cv}$ and DIC values were deemed to be the best in predictive performance and goodness of fit, respectively. As can be seen in Table 1, the model with the lowest $LS_{cv}$ (170.5) is the ICAR-skew-t model, followed by the ICAR-normal model ($LS_{cv}$ = 172.4). The ICAR-normal model and the ICAR-Laplace model have the lowest (291.3) and second lowest (315) DIC values, respectively. The difference in the DIC values between these models is more than five, suggesting that there is a substantial difference between the two models in terms of goodness of fit to the data, according to Spiegelhalter et al. [40]; however, a study by De la Cruz and Branco [49] indicated that DIC is not appropriate for this type of complex model. Thus, based on the $LS_{cv}$ values, the ICAR-skew-t model was the best in terms of its predictive capacity compared to the other two models used in this study.

As a sensitivity analysis, we ran the analysis using different sets of hyper-parameters for the priors of the precision parameters. The mean difference in the values of the outcome variables across the different choices of hyper-parameter values was observed only at the third digit after the decimal point, which suggests the absence of a significant impact on the outcome variable. The Moran’s I test statistic was significant (p-value = 0.000001), suggesting that the residuals were spatially clustered. As shown in Table 1, district-level antenatal clinic (ANC) HIV prevalence is a strong predictor of district-level HIV prevalence determined from the 2016 SADHS data, whereas the other covariates were not statistically significant.

Figure 2e shows the prevalence of HIV by district in South Africa estimated using the ICAR-skew-t spatial model (best model). According to the estimates from this model, most of the districts with high levels of HIV prevalence are located in the southeastern parts of the country, while low levels of HIV prevalence are in the southwestern parts. This pattern is the same for all the maps produced using estimates from different models with or without covariates. Maps (a), (c) and (e) are estimates of the ICAR-normal, ICAR-Laplace and skew-t models with covariates, respectively; the spatial pattern of HIV prevalence is the same for these models, except for the estimate from the ICAR-normal model for one district in the northwestern part. Maps (b), (d) and (f) are estimates of the ICAR-normal, ICAR-Laplace and skew-t models without covariates, and the pattern of HIV prevalence by district is the same for the estimates determined using these models. One notable difference between the estimates with and without covariates is that the level of HIV prevalence is lower for estimates with covariates than for those without covariates in two districts in the western part.

4. Discussion

HIV is a leading cause of disease burden in sub-Saharan Africa. In an era of decentralized governance and service provision, designing effective HIV intervention programs and monitoring strategies at local administrative levels requires reliable estimates of local variation in HIV burden. Our study compared three spatial smoothing models, namely, the intrinsically conditionally autoregressive normal, Laplace and skew-t (ICAR-normal, ICAR-Laplace and ICAR-skew-t), in the estimation of HIV prevalence across 52 districts in South Africa. We analyzed HIV prevalence data from the 2016 South African Demographic and Health Survey. The models were fitted using the Markov chain Monte Carlo method in OpenBUGS, a freely available Bayesian statistical package. We found that the ICAR-skew-t distribution was the best spatial smoothing model for the estimation of HIV prevalence in our study.

We found that the districts with high levels of HIV prevalence were in the southeastern parts of the country, while low levels of HIV prevalence corresponded to the southwestern parts. Our findings are similar to those by Gutreuter et al. [34] and Woldesenbet et al. [46]. The estimates of HIV prevalence by district in South Africa could help governmental and non-governmental organizations, as well as the private sector, to know the level of the epidemic at lower administrative levels, and thus to prioritize and plan appropriate public health programs tailored to each community and to evaluate the combined impact of national and local public health programs.

A major weakness of our study could be that there were no HIV data in some of the sparsely populated districts; hence, we simulated data from neighboring districts to estimate prevalence of HIV in such districts; thus, the estimates for these districts may not be reliable and should be interpreted with caution. In addition, a limited number of predictors was included in the model; hence, some important predictors of district-level HIV prevalence might be missing.

5. Conclusions

In conclusion, alternative spatial distributions to ICAR-normal should be considered for modeling spatial disease outcomes. The spatial random effects could be skewed or non-normal, and misspecification of the distribution of random effects could lead to biased estimates. This could distort estimates of disease burden and adversely affect the policies derived from them. In our study, we found that the intrinsic conditional autoregressive skew-t (ICAR-skew-t) model was the best in predicting district-level HIV prevalence compared to the ICAR-normal and ICAR-Laplace spatial models, based on an analysis of the 2016 South African Demographic and Health Survey (2016 SADHS) data. District antenatal clinic HIV prevalence was the most influential predictor of the district-level 2016 SADHS HIV prevalence.

Author Contributions

Conceptualization, S.M.; methodology, K.A.A., S.M. and B.C.; software, K.A.A.; formal analysis, K.A.A.; original draft preparation and revisions, K.A.A.; review and editing, S.M. and B.C.; critical insight, S.M. and B.C. All authors have read and agreed to the published version of the manuscript.

Funding

Samuel Manda was supported by the South Africa Medical Research Council.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study are available from the Demographic and Health Survey (DHS) website https://dhsprogram.com/Data/ (accessed on 1 August 2021) upon request from the MEASURE DHS program team.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Appendix A

The hierarchical representation of the disease mapping model presented in Section 2.1, assuming that the spatial random components follow a skew-t distribution, is given as follows. Let $Y = (Y_1, Y_2, \dots, Y_n)$ be a vector of one-dimensional random variables, each with a binomial distribution, and

$$
\begin{aligned}
\mathrm{logit}(p_i) &= \beta_0 + X_i \beta + u_i + v_i \\
u_i &\sim N(0, \sigma_u^2) \\
v_i \mid S_{-i}, \sigma_v^2, \delta_v, w_i &\sim N\left(\frac{\sum_{j \sim i} s_j}{m_i} + \delta_v w_i, \frac{\sigma_s^2}{\eta\, m_i}\right) \\
s_i \mid S_{-i} &\sim N\left(\frac{\sum_{j \sim i} s_j}{m_i}, \frac{\sigma_s^2}{m_i}\right) \\
w_i &\sim N(0, I)\, I(w_i > 0) \\
\eta &\sim \mathrm{gamma}\left(\frac{k}{2}, \frac{k}{2}\right) \\
\beta_i &\sim N(\beta_0, \Lambda), \quad i = 0, 1, 2, \dots, k
\end{aligned}
$$

where, in the last line, $k$ is the number of covariates, and

$$
\begin{aligned}
\sigma_v^2 &\sim IG(\Omega, v) \\
\delta_v &\sim N(0, \Gamma) \\
\sigma_s^2 &\sim IG(\Omega, u) \\
k &\sim Exp(k_0)\, I(k > 2)
\end{aligned}
$$

where $p_i$ is the weighted prevalence corresponding to $Y_i$, $i = 1, 2, \dots, 52$; $\sigma_u^2$ and $\sigma_v^2$ are the variances of the heterogeneous (unstructured) and the spatial random components, respectively; $I(w_i > 0)$ is an indicator function; $IG$ is inverse gamma; and $Exp$ is exponential.

Based on the likelihood and the above prior specifications, the posterior distribution of all the parameters, assuming conditional independence between the response variable and the hyper-parameters, is given as

$$
\begin{aligned}
p(\mu, \beta, u, v, \sigma_u^2, \sigma_v^2, \delta_v, w, k, \eta, s \mid y) &\propto L(y \mid \beta, u, v, \sigma_s^2, \sigma_v^2, \delta_v, w, s)\, P(\beta, u, v, \sigma_s^2, \sigma_v^2, \delta_v, w, k, \eta) \\
&= \prod_{i} p(y_i \mid \mu_i) \prod_{j} \big(p(\beta_j \mid \Lambda)\, p(\Lambda)\big)\, p(u \mid \sigma_s^2)\, p(\sigma_s^2)\, p(v \mid \sigma_v^2)\, p(\sigma_v^2)\, p(s \mid \sigma_s^2)\, p(w)\, p(\delta_v)\, p(k)\, p(\eta)
\end{aligned}
$$

References

UNAIDS. 2016–2021 Strategy on the Fast-Track to end AIDS; UNAIDS: Geneva, Switzerland, 2015; Available online: https://www.unaids.org/sites/default/files/media_asset/20151027_UNAIDS_PCB37_15_18_EN_rev1.pdf (accessed on 1 August 2021).
PEPFAR. PEPFAR 2021 Country and Regional Operational Plan (COP/ROP) Guidance for all PEPFAR Countries; PEPFAR: Washington, WA, USA, 2021. Available online: https://www.state.gov/wp-content/uploads/2020/12/PEPFAR-COP21-Guidance-Final.pdf (accessed on 1 August 2021).
Manda, S.; Masenyetse, L.; Cai, B.; Meyer, R. Mapping HIV prevalence using population and antenatal sentinel-based HIV surveys: A multi-stage approach. Popul. Health Metrics. 2015, 13, 22. [Google Scholar] [CrossRef]
Larmarange, J. Evaluation of geospatial methods to generate subnational HIV prevalence estimates for local level planning. AIDS 2016, 30, 1467–1474. [Google Scholar]
Tanser, F.; Bärnighausen, T.; Cooke, G.; Newell, M.-L. Localized spatial clustering of HIV infections in a widely disseminated rural South African epidemic. Int. J. Epidemiol. 2009, 38, 1008–1016. [Google Scholar] [CrossRef] [PubMed]
Niragire, F.; Achia, T.; Lyambabaje, A.; Ntaganira, J. Bayesian Mapping of HIV Infection among Women of Reproductive Age in Rwanda. PLoS ONE 2015, 10, e0119944. [Google Scholar] [CrossRef] [PubMed][Green Version]
Chimoyi, L.A.; Musenge, E. Spatial analysis of factors associated with HIV infection among young people in Uganda, 2011. BMC Public Health 2014, 14, 555. [Google Scholar] [CrossRef] [PubMed]
Houlihan, C.F.; Mutevedzi, P.C.; Lessells, R.J.; Cooke, G.S.; Tanser, F.C.; Newell, M.-L. The tuberculosis challenge in a rural South African HIV programme. BMC Infect. Dis. 2010, 10, 23–29. [Google Scholar] [CrossRef] [PubMed]
Johnson, G.D. Small area mapping of prostate cancer incidence in New York State (USA) using fully Bayesian hierarchical modelling. Int. J. Health Geogr. 2004, 3, 29. [Google Scholar] [CrossRef][Green Version]
Leyland, A.H.; Langford, I.H.; Rasbash, J.; Goldstein, H. Multivariate spatial models for event data. Stat. Med. 2000, 19, 2469–2478. [Google Scholar] [CrossRef]
Lawson, A.B.; Browne, W.J.; Rodeiro, C.L.V. Diease Mapping with WinBUGS and MLwiN; Wiley & Sons: Chichester, UK, 2003. [Google Scholar]
Knorr-Held, L.; Best, N.G. A shared component model for detecting joint and selective clustering of two diseases. J. R. Stat. Soc. Ser. A Stat. Soc. 2001, 164, 73–85. [Google Scholar] [CrossRef]
Besag, J.; York, J.; Mollié, A. Bayesian image restoration, with two applications in spatial statistics. Ann. Inst. Stat. Math. 1991, 43, 1–20. [Google Scholar] [CrossRef]
Carlin, B.; Banerjee, S. Hierarchical Multivariate CAR Models for Spatio-Temporally Correlated Survival Data. Bayesian Stat. 2003, 7, 45–63. [Google Scholar]
Arellano-Valle, R.; Bolfarine, H.; Lachos, V. Bayesian Inference for Skew-normal Linear Mixed Models. J. Appl. Stat. 2007, 34, 663–682. [Google Scholar] [CrossRef]
Ghosh, P.; Branco, M.D.; Chakraborty, H. Bivariate random effect model using skew-normal distribution with application to HIV-RNA. Stat. Med. 2007, 26, 1255–1267. [Google Scholar] [CrossRef]
Verbeke, G.; Lesaffre, E. A Linear Mixed-Effects Model with Heterogeneity in the Random-Effects Population. J. Am. Stat. Assoc. 1996, 91, 217–221. [Google Scholar] [CrossRef]
Lunn, D.; Jackson, C.; Best, N.; Thomas, A.; Spiegelhalter, D. The BUGS Book: A Practical Introduction to Bayesian Analysis; CRC: Boca Raton, FL, USA, 2012. [Google Scholar]
Manda, S.O.M. Macro Determinants of Geographical Variation in Childhood Survival in South Africa Using Flexible Spatial Mixture Models. In Demographic Methods and Population Analysis; Kandala, N.-B., Ghilagaber, G., Eds.; Springer: Dordrecht, The Netherlands, 2014. [Google Scholar]
Kim, H.-M.; Mallick, B.K. A Bayesian prediction using the skew Gaussian distribution. J. Stat. Plan. Inference 2004, 120, 85–101. [Google Scholar] [CrossRef]
Azzalini, A.; Capitanio, A. Statistical applications of the multivariate skew normal distribution. J. R. Stat. Soc. Ser. B Stat. Methodol. 1999, 61, 579–602. [Google Scholar] [CrossRef]
Genton, M.; Zhang, H. Identifiability problems in some non-Gaussian spatial random fields. Chil. J. Stat. 2012, 3, 171–179. [Google Scholar]
Gelfand, A.E.; Sahu, S.K. Identifiability, Improper Priors, and Gibbs Sampling for Generalized Linear Models. J. Am. Stat. Assoc. 1999, 94, 247–253. [Google Scholar] [CrossRef]
Zhang, H.; El-Shaarawi, A. On spatial skew-Gaussian processes and applications. Environmetrics 2009, 21, 33–47. [Google Scholar]
Allard, D.; Naveau, P. A New Spatial Skew-Normal Random Field Model. Commun. Stat. Theory Methods 2007, 36, 1821–1834. [Google Scholar] [CrossRef]
Zareifard, H.; Khaledi, M.J. Non-Gaussian modeling of spatial data using scale mixing of a unified skew Gaussian process. J. Multivar. Anal. 2013, 114, 16–28. [Google Scholar] [CrossRef]
Domínguez-Molina, J.; González-Farías, G.; Gupta, A. The Multivariate Closed Skew Normal Distribution; Technical Report; Department of Mathematics and Statistics, Bowling Green State University: Bowling Green, OH, USA, 2003. [Google Scholar]
Palacios, M.B.; Steel, M.F.J. Non-Gaussian Bayesian Geostatistical Modeling. J. Am. Stat. Assoc. 2006, 101, 604–618. [Google Scholar] [CrossRef]
Rantini, D.; Iriawan, N.; Irhamah, I. Fernandez–Steel Skew Normal Conditional Autoregressive (FSSN CAR) Model in Stan for Spatial Data. Symmetry 2021, 13, 545. [Google Scholar] [CrossRef]
Fernández, C.; Steel, M.F.J. On Bayesian Modeling of Fat Tails and Skewness. J. Am. Stat. Assoc. 1998, 93, 359–371. [Google Scholar]
Dwyer-Lindgren, L.; Cork, M.A.; Sligar, A.; Steuben, K.M.; Wilson, K.F.; Provost, N.R.; Mayala, B.K.; Vander Heide, J.D.; Collison, M.L.; Hall, J.B.; et al. Mapping HIV prevalence in sub-Saharan Africa between 2000 and 2017. Nature 2019, 570, 189–193. [Google Scholar] [CrossRef]
Cuadros, D.F.; Abu-Raddad, L.J. Spatial variability in HIV prevalence declines in several countries in sub-Saharan Africa. Health Place 2014, 28, 45–49. [Google Scholar] [CrossRef] [PubMed]
Kim, H.; Tanser, F.; Tomita, A.; Vandormael, A.; Cuadros, D.F. Beyond HIV prevalence: Identifying people living with HIV within underserved areas in South Africa. BMJ Glob. Health 2021, 6, e004089. [Google Scholar] [CrossRef]
Gutreuter, S.; Igumbor, E.; Wabiri, N.; Desai, M.; Durand, L. Improving estimates of district HIV prevalence and burden in South Africa using small area estimation techniques. PLoS ONE 2019, 14, e0212445. [Google Scholar] [CrossRef]
Nathoo, F.S.; Ghosh, P. Skew-elliptical spatial random effect modeling for areal data with application to mapping health utilization rates. Stat. Med. 2013, 32, 290–306. [Google Scholar] [CrossRef]
Kish, L. Methods for Design Effects. J. Off. Stat. 1995, 11, 55–77. [Google Scholar]
Chen, C.; Wakefield, J.; Lumley, T. The use of sampling weights in Bayesian hierarchical models for small area estimation. Spat. Spatio-Temporal Epidemiol. 2014, 11, 33–43. [Google Scholar] [CrossRef]
Vandendijck, Y.; Faes, C.; Kirby, R.; Lawson, A.; Hens, N. Model-based inference for small area estimation with sampling weights. Spat. Stat. 2016, 18, 455–473. [Google Scholar] [CrossRef] [PubMed]
Sahu, S.K.; Dey, D.K.; Branco, M.D. A new class of multivariate skew distributions with applications to Bayesian regression models. Can. J. Stat. 2003, 31, 129–150. [Google Scholar] [CrossRef]
Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; van der Linde, A. Bayesian measures of model complexity and fit (with discussion). J. R. Stat. Soc. Ser. B 2002, 64, 583–639. [Google Scholar] [CrossRef]
Cai, B.; Lawson, A.B.; Hossain, M.; Choi, J.; Kirby, R.S.; Liu, J. Bayesian semiparametric model with spatially-temporally varying coefficients selection. Stat. Med. 2013, 32, 3670–3685. [Google Scholar] [CrossRef]
Thomas, A.; Best, N.; Lunn, D. WinBUGS User Manual: Version 1.4. 2001. Available online: https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/manual14.pdf (accessed on 1 August 2021).
Moran, P.A.P. Notes on Continuous Stochastic Phenomena. Biometrika 1950, 37, 17–23. [Google Scholar] [CrossRef] [PubMed]
National Department of Health. South Africa Demographic and Health Survey 2016. Available online: https://dhsprogram.com/pubs/pdf/FR337/FR337.pdf (accessed on 1 August 2021).
Fransman, T.; Yu, D. Multidimensional poverty in South Africa in 2001–2016. Dev. S. Afr. 2019, 36, 50–79. [Google Scholar] [CrossRef]
Woldesenbet, S.A.; Kufa, T.; Lombard, C.; Manda, S.; Ayalew, K.; Cheyip, M.; Puren, A. The 2017 National Antenatal Sentinel HIV Survey Key Findings; National Institute for Communicable Diseases: Pretoria, South Africa, 2019.
Massyn, N.; Padarath, A.; Peer, N.; Day, C. District Health Barometer 2016/17; Health Systems Trust: Durban, South Africa, 2017.
Van Schalkwyk, C.; Dorrington, R.E.; Seatlhodi, T.; Velasquez, C.; Feizzadeh, A.; Johnson, L.F. Modelling of HIV prevention and treatment progress in five South African metropolitan districts. Sci. Rep. 2021, 11, 5652. [Google Scholar] [CrossRef]
De la Cruz, R.; Branco, M.D. Bayesian analysis for nonlinear regression model under skewed errors, with application in growth curves. Biom. J. 2009, 51, 588–609. [Google Scholar] [CrossRef]

Figure 1. Map of HIV prevalence by district in South Africa before (a) and after (b) adjusting the data for zero positive tests in some districts.

Figure 2. Estimated HIV prevalence by district in South Africa with covariates (first row a,c,e) and without covariates (second row b,d,f).

Table 1. Comparison of the fitted models using DIC and CPO.

Covariates	ICAR-Normal	ICAR-Laplace	ICAR-Skew-t
Intercept	−2.473 (−3.288, −1.65)	−2.542 (−3.321, −1.743)	−2.538 (−3.625, −1.469)
Population density	−0.0001 (−0.0003, 0.0002)	−0.0001 (−0.0003, 0.0002)	0.0001 (−0.0003, 0.0002)
Male condom distribution	−0.0070 (−0.0183, 0.0069)	−0.0064 (−0.0178, 0.0039)	−0.0069 (−0.0177, 0.0032)
Multidimensional poverty index	0.81056 (−2.826, 4.7939)	0.593 (−3.139, 4.357)	0.8934 (−2.915, 4.71)
ANC HIV prevalence	3.778 (1.673, 5.7058)	3.974 (2.074, 5.897)	3.831 (1.7, 5.931)
$\sigma_{v}^{2}$	0.0061 (0.0006, 0.6596)	0.0059 (0.0006, 0.9225)	0.0088 (0.0009, 0.4719)
$\sigma_{u}^{2}$	0.0066 (0.0007, 0.2281)	0.0106 (0.0011, 0.2434)	0.0031 (0.0004, 0.1688)
$\delta_{u}$			0.05 (−0.6, 0.62)
DIC	291.3	315	348.1
$LS_{cv}$	172.4	174	170.5
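
As a reading aid for the last two rows of Table 1, the following minimal sketch ranks the three models under each criterion. It assumes that smaller values are preferred for both DIC and the cross-validated log score, which matches how the abstract interprets them; the helper name `rank_models` and the dictionary layout are illustrative choices and are not part of the original analysis.

```python
# Values copied from the DIC and LS_cv rows of Table 1.
# Assumption: for both criteria, a smaller value indicates a better model.
criteria = {
    "DIC": {"ICAR-Normal": 291.3, "ICAR-Laplace": 315.0, "ICAR-Skew-t": 348.1},
    "LS_cv": {"ICAR-Normal": 172.4, "ICAR-Laplace": 174.0, "ICAR-Skew-t": 170.5},
}


def rank_models(scores):
    """Return model names ordered from the smallest to the largest score."""
    return sorted(scores, key=scores.get)


# One ranking per criterion, best model first.
rankings = {name: rank_models(scores) for name, scores in criteria.items()}

for name, order in rankings.items():
    print(f"{name}: " + " < ".join(order))
```

The two criteria disagree: DIC places ICAR-Normal first and ICAR-Skew-t last, whereas the cross-validated log score places ICAR-Skew-t first, which is the basis for the model choice stated in the abstract.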

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
