1. Introduction
Since the 1950s, Bayesian inference has been widely applied in almost every field. In the Bayesian framework, the posterior distribution $\pi(\theta \mid y)$ of the model parameters $\theta$ given the data $y$ is of interest. The Markov Chain Monte Carlo (MCMC [1]) algorithm can compute this posterior distribution effectively, but with large samples or complex models it suffers from long computation times and slow convergence. To address these problems, Rue et al. (2009) [2] proposed the integrated nested Laplace approximation (INLA), an algorithm that combines the Laplace approximation with modern numerical integration in the Bayesian framework. INLA substantially reduces the high computational cost of the MCMC algorithm while maintaining fitting accuracy. Although many existing methods can approximate the marginal likelihood [3,4,5,6], INLA still offers high estimation accuracy and high computational speed, and its capacity for parallel computation is especially important for spatial or spatio-temporal latent Gaussian models, while smaller models also enjoy good speedup.
In addition, the INLA algorithm assumes that geostatistical data are a single realization of a continuous latent Matérn Gaussian field (GF). The second-order smooth isotropic Matérn GF is a solution of a stochastic partial differential equation (SPDE [7]) with Matérn covariance function (Matérn (1960) [8]). This covariance depends on the separation distance between spatial points, so the solution of the SPDE reflects, to some extent, the autocorrelation between spatial points. The SPDE approach establishes a connection between the continuous Gaussian random field (GRF) and the discrete Gaussian Markov random field (GMRF [9]); because the precision matrix of a GMRF is sparse, it allows fast Bayesian inference, and so the INLA algorithm can be combined with the SPDE approach. INLA-SPDE has been widely applied in various fields, including air pollution (Cameletti et al. (2013) [10]), infectious diseases (Moraga et al. (2021) [11]), species (Moraga (2021) [12]), wildfires (Zhang et al. (2023) [13]), and earthquakes (Wilson et al. (2020) [14]). Detailed applications of INLA-SPDE in the R-INLA software can be found in Lindgren et al. (2015) [15] and Krainski et al. (2018) [16].
However, most current studies employing INLA-SPDE for spatial effects focus primarily on spatial heterogeneity or the correlation between spatial points, which is not sufficient. The spatial dependence between different subjects is also an important aspect to consider. Anselin (1988) [17] introduced the concept of spatial econometrics, integrating the effects of region, location and spatially related effects into the model. Subsequently, LeSage and Pace (2009) [18] provided an overview of spatial econometric models and their applications. For Bayesian inference, they used the MCMC algorithm to estimate the posterior distribution of the model parameters. Although this technique provides a feasible Bayesian model-fitting method, it remains computationally heavy. Bivand et al. (2014 [19], 2015 [20]) described how to use INLA and Bayesian model averaging (BMA [21]) to fit some spatial econometric models. Their focus was on the specification of the response and error terms commonly used in econometric models, but these models could not be directly implemented in the software at that time. Thus, they fitted several conditional models with R-INLA and then combined these models with BMA to fit spatial econometric models. In a later study, Gómez-Rubio et al. (2019) [22] proposed different methods for applying INLA to fit spatial econometric models and perform multivariate inference on the posterior. Gómez-Rubio et al. (2020) [23] used the INLA-BMA algorithm to fit spatial econometric models. Jiaqi Teng (2021) [24] applied the MCMCINLA method to fit a spatial lag model, one type of spatial econometric model. Gómez-Rubio et al. (2021) [25] described a novel class of latent models for fitting a diverse array of spatial econometric models using R-INLA.
Based on the latent model proposed by Gómez-Rubio et al. (2021) [25], this paper proposes a new latent Bayesian spatial model under INLA-SPDE, which incorporates both the spatial dependence of different subjects and spatial random effects, thus accounting more comprehensively for the influence of spatial effects on geostatistical data. To analyze the application scenarios and limitations of our proposed model, we simulate it under large samples, small samples and different spatial autocorrelation parameters. We find that the proposed model gives more accurate parameter estimates with large samples and strong spatial autocorrelation. Then, tuberculosis (TB) incidence data in China are used as empirical data to further illustrate the effectiveness of the model. The results show that the estimation of the fixed-effects parameters remains highly accurate, and that the significance of some of the fixed effects changes after accounting for spatial dependence. Therefore, this latent model can serve as a reference for future applications in geostatistics.
The structure of this paper is as follows. Section 2 presents the background of our proposed model and its construction; Section 3 provides a proof of principle demonstrating that our model can be applied under the INLA framework; Section 4 conducts numerical simulations of the proposed model to verify its correctness and to identify its applicability and limitations; Section 5 describes the source and pre-processing of the TB incidence data in China and presents the INLA-SPDE implementation and empirical analysis based on these data; finally, Section 6 summarizes and discusses the entire paper.
3. Proof of GMRF Structure
In order to use INLA-SPDE to fit model (6), it is essential that the model conforms to the INLA framework, in particular that it exhibits a GMRF structure with a sparse precision matrix. Therefore, this section demonstrates the GMRF structure of our proposed model.
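Before the proof, we split the main model (6) into two pieces, $x = (I_n - \rho W)^{-1}(X\beta + \varepsilon)$ and $\xi(s)$, respectively, to simplify the subsequent proof. For the first block, assume that $\beta$ has a Gaussian prior with precision matrix $Q$ and zero mean, with Gaussian errors of precision $\tau_{\varepsilon}$, i.e., $\beta \sim N(0, Q^{-1})$ and $\varepsilon \sim N(0, \tau_{\varepsilon}^{-1} I_n)$. According to Bayes' theorem, the joint distribution of $x$ and $\beta$ required by INLA can be obtained as follows:

$$\pi(x, \beta \mid \tau_{\varepsilon}, \rho) = \pi(x \mid \beta, \tau_{\varepsilon}, \rho)\,\pi(\beta).$$

Next, since the joint distribution is Gaussian, the conditional distribution is also Gaussian, with

$$x \mid \beta, \tau_{\varepsilon}, \rho \sim N\left((I_n - \rho W)^{-1} X\beta,\; T^{-1}\right), \qquad T = \tau_{\varepsilon}\,(I_n - \rho W)^{\top}(I_n - \rho W).$$

The precision matrix $T$ is symmetric and sparse. Thus, the joint distribution of $x$ and $\beta$ is

$$\pi(x, \beta \mid \tau_{\varepsilon}, \rho) \propto \exp\left\{-\frac{1}{2}\,(x^{\top}, \beta^{\top})\,P\,(x^{\top}, \beta^{\top})^{\top}\right\},$$

where $P$ is the precision matrix of $(x^{\top}, \beta^{\top})^{\top}$ given the hyperparameters $\tau_{\varepsilon}$ and $\rho$,

$$P = \begin{pmatrix} T & -\tau_{\varepsilon}\,(I_n - \rho W)^{\top} X \\ -\tau_{\varepsilon}\,X^{\top}(I_n - \rho W) & Q + \tau_{\varepsilon}\,X^{\top} X \end{pmatrix}.$$

This shows that $(x^{\top}, \beta^{\top})^{\top}$ follows a normal distribution with zero mean and precision matrix $P$, so the first block has a GMRF structure. Because $P$ is symmetric and highly sparse, INLA can use fast GMRF computations for this block. Details can be found in Rue et al. (2005) [9].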
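The block structure of the joint precision matrix can be checked numerically. The following is a minimal sketch, not the implementation used in this paper: it assumes the spatial lag form $x = (I - \rho W)^{-1}(X\beta + \varepsilon)$ of Gómez-Rubio et al. (2021) [25], with error precision `tau`, prior precision `Q` for $\beta$, and a hypothetical four-site adjacency matrix `W`. All names and numerical values are illustrative. It builds the joint precision matrix of $(x, \beta)$ and compares its quadratic form with the one obtained directly from the conditional and prior densities.

```python
# Sketch only: joint precision of (x, beta) for the spatial lag form
#   x = (I - rho*W)^{-1} (X beta + eps),  eps ~ N(0, I/tau),  beta ~ N(0, Q^{-1}).
# W, X, Q, rho and tau below are hypothetical illustrative values.

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def joint_precision(W, X, Q, rho, tau):
    """P = [[T, -tau*B'X], [-tau*X'B, Q + tau*X'X]], B = I - rho*W, T = tau*B'B."""
    n, p = len(X), len(X[0])
    B = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)]
         for i in range(n)]
    Bt, Xt = transpose(B), transpose(X)
    T = [[tau * v for v in row] for row in matmul(Bt, B)]
    off = [[-tau * v for v in row] for row in matmul(Bt, X)]  # n x p block
    XtX = matmul(Xt, X)
    S = [[Q[i][j] + tau * XtX[i][j] for j in range(p)] for i in range(p)]
    off_t = transpose(off)
    return [T[i] + off[i] for i in range(n)] + [off_t[i] + S[i] for i in range(p)]

def quad_form(P, z):
    return sum(z[i] * P[i][j] * z[j]
               for i in range(len(z)) for j in range(len(z)))

def direct_quad_form(W, X, Q, rho, tau, x, beta):
    """tau * ||(I - rho*W) x - X beta||^2 + beta' Q beta, computed without P."""
    n = len(x)
    resid = [x[i] - rho * sum(W[i][j] * x[j] for j in range(n))
             - sum(X[i][k] * beta[k] for k in range(len(beta)))
             for i in range(n)]
    return tau * sum(r * r for r in resid) + quad_form(Q, beta)

# Row-standardised adjacency for four sites on a line (hypothetical).
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 0.5], [1.0, 1.3]]  # intercept + one covariate
Q = [[0.001, 0.0], [0.0, 0.001]]                       # vague prior precision
rho, tau = 0.4, 2.0

P = joint_precision(W, X, Q, rho, tau)
x, beta = [0.3, -0.7, 1.1, 0.2], [0.5, -0.25]
gap = abs(quad_form(P, x + beta)
          - direct_quad_form(W, X, Q, rho, tau, x, beta))
is_symmetric = all(abs(P[i][j] - P[j][i]) < 1e-12
                   for i in range(6) for j in range(6))
print(is_symmetric, gap < 1e-9)
```

In this sketch the upper-left block inherits its sparsity from `W`: sites that share no neighbour have a zero entry, while the rows and columns belonging to $\beta$ are dense but few in number.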
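The spatial random effect $\xi(s)$ is the exact and stationary solution of the linear fractional SPDE

$$(\kappa^2 - \Delta)^{\alpha/2}\,\bigl(\tau\,\xi(s)\bigr) = \mathcal{W}(s),$$

where $\Delta$ is the Laplace operator, $\alpha$ controls the smoothness, $\tau$ controls the variance, $\kappa$ is the scaling parameter, and $\mathcal{W}(s)$ is Gaussian spatial white noise. $\xi(s)$ can also be expressed using the finite element method by means of basis functions on a triangulation defined over the domain:

$$\xi(s) = \sum_{g=1}^{G} \varphi_g(s)\,\tilde{\xi}_g.$$

Here $G$ is the number of vertices of the triangulation and $\{\varphi_g\}$ is a set of deterministic basis functions that are locally supported and piecewise linear on the triangulation, with $\varphi_g$ equal to 1 at vertex $g$ and 0 at the other vertices. The weights $\tilde{\xi}_g$ follow a normal distribution with mean 0. By using Neumann boundary conditions, the precision matrix $Q$ of the Gaussian weight vector $\tilde{\xi} = (\tilde{\xi}_1, \ldots, \tilde{\xi}_G)^{\top}$ is obtained when $\alpha = 2$:

$$Q = \tau^2\left(\kappa^4 C + 2\kappa^2 G + G\,C^{-1} G\right).$$

The elements of the diagonal matrix $C$ and the sparse matrix $G$ are $C_{ii} = \langle \varphi_i, 1 \rangle$ and $G_{ij} = \langle \nabla\varphi_i, \nabla\varphi_j \rangle$ ($\nabla$ denotes the gradient). The elements of the precision matrix $Q$ depend on $\kappa$ and $\tau$; because $C$ is diagonal and $G$ is sparse, $Q$ is a sparse matrix and $\tilde{\xi}$ is a GMRF with distribution $N(0, Q^{-1})$, which meets the requirements of the INLA algorithm. Details can be found in Lindgren et al. (2011) [7].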
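The sparsity of the SPDE precision matrix can be illustrated with a small numerical sketch. The code below is not the R-INLA implementation; it assumes the standard $\alpha = 2$ form $Q = \tau^2(\kappa^4 C + 2\kappa^2 G + G C^{-1} G)$ from Lindgren et al. (2011) [7] and, for brevity, replaces the two-dimensional triangulation with a uniform one-dimensional mesh of piecewise-linear basis functions. The mesh size and the values of `kappa` and `tau` are hypothetical.

```python
# Sketch only: SPDE precision Q = tau^2 (kappa^4 C + 2 kappa^2 G + G C^{-1} G), alpha = 2.
# Assumption: a uniform 1D mesh with piecewise-linear (hat) basis functions stands in
# for the 2D triangulation; mesh size, kappa and tau are illustrative values.

def fem_matrices(n_vertices, h):
    """Lumped mass vector c (diagonal of C) and stiffness matrix G on a uniform 1D mesh."""
    n = n_vertices
    c = [h] * n
    c[0] = c[-1] = h / 2.0            # boundary hat functions cover half an element
    G = [[0.0] * n for _ in range(n)]
    for e in range(n - 1):            # element-wise assembly, natural (Neumann) boundary
        G[e][e] += 1.0 / h
        G[e + 1][e + 1] += 1.0 / h
        G[e][e + 1] -= 1.0 / h
        G[e + 1][e] -= 1.0 / h
    return c, G

def spde_precision(c, G, kappa, tau):
    """Assemble Q for alpha = 2, using the fact that C is diagonal."""
    n = len(c)
    GCG = [[sum(G[i][k] * G[k][j] / c[k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return [[tau ** 2 * ((kappa ** 4 * c[i] if i == j else 0.0)
                         + 2.0 * kappa ** 2 * G[i][j] + GCG[i][j])
             for j in range(n)] for i in range(n)]

n = 8
c, G = fem_matrices(n_vertices=n, h=0.5)
Q_spde = spde_precision(c, G, kappa=1.5, tau=0.8)

# Largest |i - j| with a non-zero entry: G is tridiagonal, so G C^{-1} G is pentadiagonal.
bandwidth = max(abs(i - j) for i in range(n) for j in range(n)
                if Q_spde[i][j] != 0.0)
q_symmetric = all(abs(Q_spde[i][j] - Q_spde[j][i]) < 1e-12
                  for i in range(n) for j in range(n))
print(bandwidth, q_symmetric)
```

Each row of the resulting matrix has at most five non-zero entries regardless of the number of vertices. On a triangulation the pattern is less regular, but the same mechanism (diagonal $C$, sparse $G$) keeps $Q$ sparse.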
In summary, all these components of the main model exhibit a GMRF structure with a sparse precision matrix and satisfy the criteria of the INLA algorithm.
6. Discussion
When processing and analyzing geostatistical data or other spatial problems, the effects of spatial dependence and spatial heterogeneity cannot be ignored. In this paper, a new latent Bayesian spatial model is proposed to better account for the spatial dependence of subjects and spatial random effects. We simulate the proposed model under several scenarios to establish its applicability and limitations, and find that it yields highly accurate parameter estimates with large samples and strong spatial autocorrelation. We then applied our model to the TB incidence data in China under INLA-SPDE and found that the prevalence of TB has strong spatial dependence and non-uniform spatial random effects in mainland China.
Since this paper focuses on the effects of the spatial dependence of subjects and of spatial heterogeneity on the response, the non-significant covariate effects in the results are not investigated further, and the spatial lag of the covariates and the spatial autocorrelation of the error terms are not considered in detail. Future work can build on this study by exploring how INLA-SPDE can incorporate smooth covariate effects in the latent classes within a Bayesian framework to further analyze the non-significant effects found here, and by investigating the model's relationship with other autocorrelation effects.
Finally, one limitation of our study is that the model includes only the spatial structure of the data and does not consider its temporal structure. Data with a spatio-temporal structure would allow INLA-SPDE to be applied to spatio-temporal modeling, including covariates that vary in space and time as well as random effects for the spatio-temporal residual variation. This would allow us to estimate spatio-temporal mixed effects on diseases such as TB. However, spatio-temporal econometric modeling is itself a complex problem that requires further investigation.