Article

Variable Selection for the Spatial Autoregressive Model with Autoregressive Disturbances

1 School of Mathematics and Information Science, Nanchang Normal University, Nanchang 330032, China
2 College of Mathematics and Statistics, Fujian Normal University, Fuzhou 350117, China
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(12), 1448; https://doi.org/10.3390/math9121448
Submission received: 8 May 2021 / Revised: 3 June 2021 / Accepted: 17 June 2021 / Published: 21 June 2021
(This article belongs to the Special Issue Multivariate Statistics: Theory and Its Applications)

Abstract: Along with the rapid development of geographic information systems, high-dimensional spatially heterogeneous data have emerged, bringing theoretical and computational challenges to statistical modeling and analysis. As a result, effective dimensionality reduction and spatial effect recognition have become very important. This paper focuses on variable selection in the spatial autoregressive model with autoregressive disturbances (SARAR), which contains a more comprehensive spatial effect. The variable selection procedure is built on the so-called penalized quasi-likelihood approach. Under suitable regularity conditions, we obtain the rate of convergence and the asymptotic normality of the estimators. The theoretical results ensure that the proposed method can effectively identify spatial effects in the dependent variable, find spatial heterogeneity in the error terms, reduce the dimension, and estimate the unknown parameters simultaneously. Based on a step-by-step transformation, a feasible iterative algorithm is developed to realize spatial effect identification, variable selection, and parameter estimation. In finite samples, Monte Carlo studies and a real data analysis demonstrate that the proposed penalized method performs well and is consistent with the theoretical results.

1. Introduction

Spatial econometric models are mainly used to deal with spatially dependent data in applications. Spatial dependence across cross-sectional units may take the form of spatial autocorrelation in the dependent variable or in the disturbance term. The first form of dependence is usually captured by a spatial autoregressive (SAR) model and the second by a spatial error model (SEM). In fact, both spatial dependencies may be reflected in a spatial autoregressive model with autoregressive disturbances (SARAR). These models were first introduced by Cliff and Ord [1] and have since attracted wide attention; see, e.g., the research by Kelejian and Prucha [2], Lee [3], and Arraiz et al. [4], and the books by Anselin [5] and Cressie [6].
In practice, explanatory variables need to be chosen from a large number of candidates during the initial data analysis. How to select the significant variables to keep in the final model is therefore very important for further analysis, and variable selection has received increasing attention in statistical modeling and inference. However, the study of variable selection in spatial econometric models is not as well developed as that in classical linear models, owing to the complexity caused by spatial dependence. The main goal of our analysis is to fill some of the gaps in this area. We focus on a variable selection method for the SARAR model based on a penalized quasi-likelihood approach and investigate its oracle property. Furthermore, a feasible algorithm is given for implementing the procedure.
Methods of variable selection for classical linear models have developed rapidly since the Akaike information criterion (AIC) was proposed by Akaike [7]. Similar methods based on information criteria have since progressed remarkably, such as the Bayesian information criterion (BIC) [8], the risk inflation criterion (RIC) [9], etc. Using these criteria, best subset selection was the standard way to select covariates for a long time. Although practically useful, its common drawbacks are a lack of stability and the accumulation of stochastic errors from each stage of variable selection, as noted by Liang and Li [10]. Moreover, it may require a comparison of all possible submodels, a combinatorial problem with NP-complexity [11]. In order to overcome these drawbacks, penalized methods of variable selection have been proposed in recent years, including the least absolute shrinkage and selection operator (LASSO) [12], the smoothly clipped absolute deviation (SCAD) penalty [13], the elastic net (ENet) [14], the adaptive LASSO [15], the minimax concave penalty (MCP) [16], and so on. These methods can select significant variables and estimate unknown parameters simultaneously. Fan and Li [13] established the oracle property, in the sense that the penalized estimator behaves the same as the ordinary least squares estimator computed as if the true linear submodel were known, which can be used to assess the efficiency of the penalized estimator. In the Bayesian framework, developments include Mitchell and Beauchamp [17], Raftery et al. [18], Jiang [19], etc. Other related methods can be found in Chen et al. [20] and Steel [21].
Along with the rapid development of geographic information systems (GIS), variable selection for spatial econometric models has become a new concern over the last decade or so. Based on Bayesian ideas, LeSage and Parent [22] developed the Bayesian model averaging (BMA) technique for the SAR model and the SEM. Extensions include LeSage and Fischer [23] and Cuaresma et al. [24,25]. In order to avoid the complex calculation of marginal likelihoods in BMA, Piribauer [26] used a stochastic search variable selection (SSVS) prior to address the identification of the SAR model. Generally, it is challenging to extend penalized methods to data that are dependent either over time or across space, as variable selection then involves not only regression coefficients but also autocorrelation coefficients [27]. In recent years, Liu et al. [28] gave an efficient variable selection procedure for the SAR model and obtained its large sample properties by a penalized quasi-likelihood method. Using the SCAD penalty and instrumental variables, Xie et al. [29] considered variable selection in the SAR model with a diverging number of parameters. They showed that the SCAD penalty in the SAR model also has the nice oracle property known from the classical linear model.
High dimensional spatial data may lead to complex and multiple spatial dependencies. However, the existing methods are constrained by dimension and spatial heterogeneity, which brings great challenges to the application of traditional spatial econometric models. Although dimension reduction by eliminating redundant information through variable selection in classical linear models is well developed, and variable selection in spatial lag models has been studied, it is still difficult to handle variable selection with spatial heterogeneity in the error terms. Because the SARAR model contains both a spatial effect in the dependent variable and a spatial error term, it can reflect spatial effect information and describe spatial heterogeneity relatively comprehensively. Moreover, if the spatial effect in the errors is ignored, the model may be misidentified, estimation and prediction accuracy may deteriorate, and applied research may be misled. In light of the above considerations and the excellent performance of penalized methods, we study variable selection for spatial cross-sectional data based on the SARAR model. The main contributions are as follows: (1) For high-dimensional spatially heterogeneous data, a penalized quasi-likelihood method is proposed to solve the problem of dimensionality reduction of explanatory variables and the identification of two kinds of spatial effects. (2) Using the idea of step-by-step transformation, a new iterative numerical algorithm is proposed to avoid the influence of spatial heterogeneity. (3) Simulation and case studies provide guidance for practitioners in related fields. (4) The proposed method can serve as a useful reference for the study of variable selection in semi-parametric and nonparametric spatial regression models.
The remainder of this paper is organized as follows. Section 2 presents a penalized quasi-likelihood method for the SARAR model. Section 3 introduces a feasible algorithm implementing the variable selection procedure. Section 4 provides a Monte Carlo study to investigate the finite sample performance. Section 5 illustrates the proposed method through an application to the Boston housing data. A summary and discussion are given in Section 6. Appendix A and Appendix B contain the assumptions and the proofs of the theorems.

2. Model and Variable Selection

2.1. The SARAR Model

The SARAR model can be specified as:
$$Y_n = \rho_1 W_{1n} Y_n + X_n \beta + U_n, \qquad U_n = \rho_2 W_{2n} U_n + E_n, \tag{1}$$
where $Y_n$ denotes an $n \times 1$ vector of observations on the dependent variable, $X_n$ is an $n \times k$ matrix of observations on $k$ exogenous explanatory variables, $W_{1n}$ and $W_{2n}$ are known $n \times n$ spatial weight matrices, $\beta$ is a $k$-dimensional parameter vector of regression coefficients, $\rho_1$ and $\rho_2$ are scalar spatial autoregressive coefficients with $|\rho_1| < 1$ and $|\rho_2| < 1$, $U_n$ is an $n \times 1$ vector of regression disturbances, $W_{1n} Y_n$ and $W_{2n} U_n$ are the spatial lag term and the spatial error lag term, respectively, and $E_n = (e_1, \ldots, e_n)^T$ is an $n$-dimensional vector of i.i.d. innovations with zero mean and finite variance $\sigma^2$. Note that this model is also known as the Cliff–Ord model or the SARAR(1,1) model. The SAR model and the SEM correspond to $\rho_2 = 0$ and $\rho_1 = 0$, respectively.
Let $\theta_0 = (\sigma_0^2, \rho_{10}, \rho_{20}, \beta_0^T)^T = (\theta_{1,0}, \theta_{2,0}, \ldots, \theta_{k+3,0})^T$ be the true value of $\theta = (\sigma^2, \rho_1, \rho_2, \beta^T)^T = (\theta_1, \theta_2, \ldots, \theta_{k+3})^T$. Denote $S_{1n}(\rho_1) = I_n - \rho_1 W_{1n}$, $S_{2n}(\rho_2) = I_n - \rho_2 W_{2n}$, and $E_n(\gamma) = S_{2n}(\rho_2)\left[S_{1n}(\rho_1) Y_n - X_n \beta\right]$, where $\gamma = (\rho_1, \rho_2, \beta^T)^T$. According to the idea of quasi-maximum likelihood estimation [3], we can write the log-quasi-likelihood function of the model (1) as
$$\ln L_n(\theta) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 + \ln|S_{1n}(\rho_1)| + \ln|S_{2n}(\rho_2)| - \frac{1}{2\sigma^2} E_n^T(\gamma) E_n(\gamma), \tag{2}$$
where $L_n(\theta)$ is the quasi-likelihood function of the model (1).
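As a concrete illustration, the following R sketch (ours, not the authors' code) evaluates the log-quasi-likelihood (2) at a given parameter vector; the inputs Y, X, W1, and W2 follow the notation of model (1).

# Sketch: evaluate the log-quasi-likelihood (2) of the SARAR model.
# theta = (sigma2, rho1, rho2, beta); W1, W2 are n x n weight matrices.
sarar_loglik <- function(theta, Y, X, W1, W2) {
  n <- length(Y); k <- ncol(X)
  sigma2 <- theta[1]; rho1 <- theta[2]; rho2 <- theta[3]
  beta <- theta[4:(k + 3)]
  S1 <- diag(n) - rho1 * W1                  # S_1n(rho1)
  S2 <- diag(n) - rho2 * W2                  # S_2n(rho2)
  E  <- S2 %*% (S1 %*% Y - X %*% beta)       # E_n(gamma)
  as.numeric(-n / 2 * log(2 * pi) - n / 2 * log(sigma2) +
             determinant(S1)$modulus + determinant(S2)$modulus -
             crossprod(E) / (2 * sigma2))    # determinant() returns the log-determinant
}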

2.2. Penalized Method

Spatial econometric research has shown that it is inappropriate to apply the ordinary least squares (OLS) method directly to SAR models. In the case of the SARAR model, the OLS estimators of the spatial autoregressive coefficients are biased and inconsistent. Therefore, the penalized least squares method cannot be used directly for variable selection in this model. Considering the good performance of quasi-maximum likelihood estimation in the SARAR model, the penalized quasi-likelihood method deserves priority. We start with a penalized quasi-likelihood function for the model (1) defined as:
$$J(\theta) = -\ln L_n(\theta) + n\sum_{j=2}^{k+3} p_{\lambda_j}(|\theta_j|), \tag{3}$$
where $p_\lambda(\cdot)$ is the SCAD penalty function of Fan and Li [13], defined through its derivative:
$$p'_\lambda(\vartheta) = \lambda\left\{I(\vartheta \le \lambda) + \frac{(a\lambda - \vartheta)_+}{(a-1)\lambda}\, I(\vartheta > \lambda)\right\} \quad \text{for } \vartheta > 0 \text{ and some } a > 2.$$
For comparison, we also introduce the following two popular penalty functions.
  • HARD thresholding penalty function:
    $$p_\lambda(|\vartheta|) = \lambda^2 - (|\vartheta| - \lambda)^2\, I(|\vartheta| < \lambda).$$
  • $L_1$ penalty function:
    $$p_\lambda(|\vartheta|) = \lambda|\vartheta|.$$
In fact, the $L_1$ penalty function corresponds to the LASSO [12]. The AIC and BIC correspond to the penalty functions $p_\lambda(|\vartheta|) = n^{-1} I(|\vartheta| \ne 0)$ and $p_\lambda(|\vartheta|) = n^{-1}\log(n)\, I(|\vartheta| \ne 0)$, respectively, because $\sum_j I(|\vartheta_j| \ne 0)$ gives the size of the selected submodel.
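For reference, here is a minimal R sketch of these penalty functions; the SCAD is coded through its derivative $p'_\lambda$, which is the quantity the algorithm in Section 3 actually uses, and the default $a = 3.7$ anticipates the choice made there.

# SCAD derivative p'_lambda(t) for t >= 0, as in Fan and Li [13].
p_scad_deriv <- function(t, lambda, a = 3.7) {
  t <- abs(t)
  lambda * (t <= lambda) + pmax(a * lambda - t, 0) / (a - 1) * (t > lambda)
}
# HARD thresholding penalty.
p_hard <- function(t, lambda) lambda^2 - (abs(t) - lambda)^2 * (abs(t) < lambda)
# L1 (LASSO) penalty.
p_l1 <- function(t, lambda) lambda * abs(t)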
In classical linear models, Fan and Li [13] proposed that a good variable selection method should possess the following three properties:
(1) Unbiasedness: the resulting estimator is nearly unbiased when the true unknown parameter is large, to avoid unnecessary modeling bias;
(2) Sparsity: the resulting estimator automatically sets small estimated coefficients to zero, to reduce model complexity;
(3) Continuity: the resulting estimator is continuous in the data, to avoid instability in model prediction.
Under some regularity conditions, they showed that variable selection via the SCAD penalty function possesses the above three properties, whereas the other penalty functions listed above may not satisfy all three simultaneously. See Fan and Li [13] and Wang and Zhu [30] for more details.

2.3. Main Results

Note that the non-zero elements of $\theta_0$ may be arranged in arbitrary positions. Re-labeling $\theta_0$ puts the non-zero elements together at the front and separates them from the zero elements, which is convenient for a concise statement of the theorems and proofs. Therefore, denote $\theta_0 = (\theta_{10}^T, \theta_{20}^T)^T$, where we assume that $\theta_{10}$ is a vector containing the $s$ nonzero elements and $\theta_{20} = 0$ is a $(k+3-s)$-dimensional zero vector. Let $\hat\theta = (\hat\theta_1^T, \hat\theta_2^T)^T$ be the penalized quasi-likelihood estimator of $\theta$. The theorems stated below give some satisfactory large sample properties.
Theorem 1.
Suppose that $\sqrt{n}\, a_n = o(1)$, $b_n = o(1)$, and the assumptions in Appendix A hold. Then there is a local minimizer $\hat\theta$ of $J(\theta)$ such that:
$$\|\hat\theta - \theta_0\| = O_p\!\left(n^{-1/2}\right),$$
where $a_n = \max_{2 \le j \le s}\{|p'_{\lambda_{jn}}(|\theta_{j,0}|)|\}$ and $b_n = \max_{2 \le j \le s}\{|p''_{\lambda_{jn}}(|\theta_{j,0}|)|\}$.
For the SCAD penalty function, the derivative $p'_{\lambda_{jn}}(\cdot)$ exists at any non-zero point. Theorem 1 shows that, by choosing appropriate regularization parameters $\lambda_{jn}$, there is a local minimizer of $J(\theta)$ that is a $\sqrt{n}$-consistent penalized quasi-likelihood estimator.
Theorem 2.
Suppose that the assumptions in Appendix A hold, $\lim_{n\to\infty}\lambda_{jn} = 0$, $\lim_{n\to\infty}\sqrt{n}\,\lambda_{jn} = \infty$, and $p_{\lambda_{jn}}(\delta)$ satisfies $\liminf_{n\to\infty}\liminf_{\delta\to 0^+} p'_{\lambda_{jn}}(\delta)/\lambda_{jn} > 0$. Then, with probability approaching one, the $\sqrt{n}$-consistent local minimizer $\hat\theta = (\hat\theta_1^T, \hat\theta_2^T)^T$ in Theorem 1 must satisfy:
  • (i) Sparsity: $\hat\theta_2 = 0$;
  • (ii) Asymptotic normality:
    $$\sqrt{n}\left(\Sigma_{n1}(\theta_{10}) + \Lambda\right)\left\{\hat\theta_1 - \theta_{10} + \left(\Sigma_{n1}(\theta_{10}) + \Lambda\right)^{-1} d\right\} \xrightarrow{d} N\!\left(0,\ \Sigma_1(\theta_{10}) + \Omega_1(\theta_{10})\right),$$
    where $\Sigma_{n1}(\theta_{10})$, $\Sigma_1(\theta_{10})$, and $\Omega_1(\theta_{10})$ denote the upper-left $s \times s$ submatrices of $\Sigma_n(\theta_0)$, $\Sigma(\theta_0) = \lim_{n\to\infty}\Sigma_n(\theta_0)$, and $\Omega(\theta_0) = \lim_{n\to\infty}\Omega_n(\theta_0)$, respectively, with $\Sigma_n(\theta_0)$ and $\Omega_n(\theta_0)$ given in the notation, $\Lambda = \operatorname{diag}\{v_1, \ldots, v_s\}$, and $d = (d_1, \ldots, d_s)^T$ as in Appendix B.
Theorem 2 shows that the proposed method can identify the SAR model ($\rho_1 \ne 0$, $\rho_2 = 0$), the SEM ($\rho_1 = 0$, $\rho_2 \ne 0$), and the SARAR model ($\rho_1 \ne 0$, $\rho_2 \ne 0$), select explanatory variables, and estimate unknown parameters simultaneously. Similar to the analysis of Fan and Li [13], if $\lambda_{jn} \to 0$ as $n \to \infty$, then $a_n \to 0$ for both the SCAD and HARD thresholding penalty functions. Moreover, $\Lambda \to 0$ and $d \to 0$ as $n \to \infty$. Thus, under the regularity conditions, the corresponding oracle property of the penalized quasi-likelihood estimators is obtained. That is, the penalized quasi-likelihood estimators of the nonzero parameters perform asymptotically as well as the ordinary quasi-likelihood estimators computed when the correct submodel is known. However, for the LASSO penalty function, some of the conditions in Theorem 2 cannot be satisfied.

3. Algorithm Design and Implementation

In this section, we consider the implementation of the proposed procedures. Since the penalized quasi-likelihood function $J(\theta)$ is nonconcave, it is challenging to find the global optimum. Liu et al. [28] noted that existing algorithms, such as the local quadratic approximation (LQA) algorithm [13] and the local linear approximation (LLA) algorithm [31], cannot be applied directly to the SAR model. Similarly, those algorithms do not give the correct minimizer of $J(\theta)$ for the SARAR model either. Hence, we design the following iterative algorithm.
Initialization:
$$\theta^{(0)} = \left(\sigma^{2(0)}, \rho_1^{(0)}, \rho_2^{(0)}, \beta^{(0)T}\right)^T. \tag{4}$$
Iteration:
$$\beta^{(p+1)} = \arg\min_{\beta \in \mathbb{R}^k} l_1(\beta), \quad l_1(\beta) = J\!\left(\sigma^{2(p)}, \rho_1^{(p)}, \rho_2^{(p)}, \beta\right), \tag{5}$$
$$\left(\rho_1^{(p+1)}, \rho_2^{(p+1)}\right) = \arg\min_{\rho_1, \rho_2 \in (-1,1)} l_2(\rho_1, \rho_2), \quad l_2(\rho_1, \rho_2) = J\!\left(\sigma^{2(p)}, \rho_1, \rho_2, \beta^{(p+1)}\right), \tag{6}$$
$$\sigma^{2(p+1)} = \arg\min_{\sigma^2 \in (0,\infty)} l_3(\sigma^2), \quad l_3(\sigma^2) = J\!\left(\sigma^2, \rho_1^{(p+1)}, \rho_2^{(p+1)}, \beta^{(p+1)}\right). \tag{7}$$
Iterate (5) to (7) until $\|\hat\theta^{(q+1)} - \hat\theta^{(q)}\| < \varepsilon$, where $\hat\theta^{(q)} = (\hat\sigma^{2(q)}, \hat\rho_1^{(q)}, \hat\rho_2^{(q)}, \hat\beta^{(q)T})^T$ and $\varepsilon$ is a given tolerance; in the following simulation, we set $\varepsilon = 10^{-4}$. Denote the final estimates of $\sigma^2, \rho_1, \rho_2, \beta$ by $\hat\sigma^2, \hat\rho_1, \hat\rho_2, \hat\beta$; then $\hat\theta = (\hat\sigma^2, \hat\rho_1, \hat\rho_2, \hat\beta^T)^T$.
In (4), the initial value of $\theta$ is the quasi-maximum likelihood estimate based on the log-quasi-likelihood function of the model (1). In (5), note that if both autoregressive coefficients $\rho_1$ and $\rho_2$ were known in the SARAR model (1), the model could be transformed into the linear model $Y_n^* = X_n^* \beta + E_n$, where $Y_n^* = S_{2n}(\rho_2) S_{1n}(\rho_1) Y_n$ and $X_n^* = S_{2n}(\rho_2) X_n$. Therefore, the LQA algorithm can be used to complete this step as in classical linear models. In (6), the optimization of the bivariate function can be solved by the Nelder–Mead method [32]. In (7), by setting the partial derivative to zero, the unique minimum point is:
$$\sigma^{2(p+1)} = \frac{1}{n} E_n^T\!\left(\gamma^{(p+1)}\right) E_n\!\left(\gamma^{(p+1)}\right),$$
where $\gamma^{(p+1)} = (\rho_1^{(p+1)}, \rho_2^{(p+1)}, \beta^{(p+1)T})^T$. Figure 1 presents a flowchart of the proposed algorithm.
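The following R skeleton (a sketch under assumed helper names, not the authors' implementation) mirrors steps (4)–(7); qmle_init, penalized_beta_step (the LQA update of beta), and pql_objective (the penalized objective J evaluated at given components) are hypothetical helpers.

# Skeleton of the iterative algorithm (4)-(7); helper functions are assumed.
fit_sarar_pql <- function(Y, X, W1, W2, lambda, eps = 1e-4, max_iter = 200) {
  n <- length(Y)
  theta <- qmle_init(Y, X, W1, W2)        # (4): QMLE initial value (assumed helper)
  for (q in seq_len(max_iter)) {
    old <- unlist(theta)
    # (5): LQA update of beta on the transformed linear model (assumed helper)
    theta$beta <- penalized_beta_step(Y, X, W1, W2, theta, lambda)
    # (6): Nelder-Mead update of (rho1, rho2) [32]
    fit <- optim(c(theta$rho1, theta$rho2),
                 function(r) pql_objective(Y, X, W1, W2, theta$sigma2,
                                           r[1], r[2], theta$beta, lambda),
                 method = "Nelder-Mead")
    theta$rho1 <- fit$par[1]; theta$rho2 <- fit$par[2]
    # (7): closed-form update of sigma^2
    S1 <- diag(n) - theta$rho1 * W1
    S2 <- diag(n) - theta$rho2 * W2
    E  <- S2 %*% (S1 %*% Y - X %*% theta$beta)
    theta$sigma2 <- as.numeric(crossprod(E)) / n
    if (sqrt(sum((unlist(theta) - old)^2)) < eps) break   # stopping rule
  }
  theta
}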
To implement the above algorithm, the tuning parameters need to be chosen. For the SCAD penalty function, we set $a = 3.7$ as recommended by Fan and Li [13]. Moreover, it is desirable to use a data-driven method to estimate the tuning parameters $\lambda_2, \ldots, \lambda_{k+3}$. Wang et al. [33] proved that the optimal tuning parameter in the SCAD penalty can be determined by BIC for linear regression models. Thus, we can select $\lambda = (\lambda_2, \ldots, \lambda_{k+3})^T$ by the following Bayesian information criterion:
$$\mathrm{BIC}_\lambda = -2\ln L_n(\hat\theta) + \alpha_\lambda \log n, \tag{8}$$
where $\alpha_\lambda = \sum_{j=1}^{k+3} I(\hat\theta_j \ne 0)$. Then $\lambda$ is set to $\hat\lambda = \arg\min_\lambda \mathrm{BIC}_\lambda$.
In fact, minimizing the BIC over a $(k+2)$-dimensional space is an unduly onerous task for large $k$. To save computation time, one may use the same tuning parameter for all penalty functions. However, our experiments (not reported, to save space) show that the spatial regression coefficient $\rho_2$ is then easily compressed to 0 even when the sample size is moderate. Intuitively, we should use different tuning parameters for the spatial regression coefficients $\rho_j$ ($j = 1, 2$) and the regression coefficients $\beta_j$ ($j = 1, \ldots, k$), because the range of $\rho_j$ ($j = 1, 2$) is known before estimation but the ranges of $\beta_j$ ($j = 1, \ldots, k$) are not. Thus, we set $\lambda_2 = \lambda_3$ and $\lambda_4 = \cdots = \lambda_{k+3}$ to optimize the results. It should be pointed out that the consistency of the BIC criterion can be proved under more stringent conditions, such as a bounded derivative of the quasi-likelihood function and a bounded $\alpha_\lambda$. Proving consistency under milder conditions is very difficult and is left for further study.
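A sketch of the resulting tuning-parameter search follows: a two-dimensional grid over $(\lambda_2 = \lambda_3, \lambda_4 = \cdots = \lambda_{k+3})$ minimizing $\mathrm{BIC}_\lambda$, built on the sketches above; the grid values are arbitrary illustrations.

# Sketch: choose (lambda_rho, lambda_beta) by minimizing BIC (8).
select_lambda_bic <- function(Y, X, W1, W2,
                              grid_rho = seq(0.01, 0.5, length.out = 10),
                              grid_beta = seq(0.01, 0.5, length.out = 10)) {
  best <- list(bic = Inf)
  for (lr in grid_rho) for (lb in grid_beta) {
    lambda <- c(lr, lr, rep(lb, ncol(X)))   # lambda_2 = lambda_3; lambda_4 = ... = lambda_{k+3}
    fit <- fit_sarar_pql(Y, X, W1, W2, lambda)
    theta_hat <- c(fit$sigma2, fit$rho1, fit$rho2, fit$beta)
    bic <- -2 * sarar_loglik(theta_hat, Y, X, W1, W2) +
      sum(theta_hat != 0) * log(length(Y))  # alpha_lambda * log(n)
    if (bic < best$bic) best <- list(bic = bic, lambda = lambda, fit = fit)
  }
  best
}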

4. Numerical Simulation

In this section, we conduct Monte Carlo experiments to evaluate the finite sample performance of the proposed variable selection method for the SARAR model, using R code.

4.1. Simulation Sampling

The sample data are generated by model (1). We consider eight explanatory variables following an 8-dimensional normal distribution with zero mean and covariance matrix $(\sigma_{ij})$, where $\sigma_{ij} = 0.5^{|i-j|}$. The spatial autoregressive coefficients are set to $(\rho_1, \rho_2) = (0.7, 0.7)$, $(0.7, 0.3)$, $(0.7, 0)$, $(0, 0.7)$, and $(0, 0)$. For simplicity, let $W_{1n} = W_{2n} = I_R \otimes B_m$, where $B_m = (m-1)^{-1}(l_m l_m^T - I_m)$, $\otimes$ is the Kronecker product, and $l_m$ is an $m$-dimensional column vector of ones [3,34]; this is called the Case spatial weight matrix. To observe the influence of different spatial weight matrices, the Rook spatial weight matrix is also introduced, in which $w_{ij}$ is set to 1 when two regions share a common boundary and to 0 otherwise. For the Case spatial weight matrix, we take $m = 3$ and different values of $R$, where $R = 10, 20, 60$, so the corresponding sample sizes are $n = 30, 60, 180$. For the Rook spatial weight matrix, we use a grid of square areas and generate it according to whether the edges are adjacent; to ensure that the region is square, $n$ is taken to be a perfect square, $n = 36, 64, 196$. The regression coefficients are set to $\beta = (3, 2, 0, 0, 1, 0, 0, 0)^T$. The innovation $e_i$ follows a normal distribution with mean 0 and variance $\sigma^2 = 1$ or $1.5$.
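This design can be reproduced with a short R sketch (ours, for illustration), which builds the Case weight matrix and draws one sample from model (1).

# Sketch: generate one sample from model (1) with the Case weight matrix.
gen_sarar_data <- function(R = 20, m = 3, rho1 = 0.7, rho2 = 0.3, sigma2 = 1) {
  n <- R * m
  Bm <- (matrix(1, m, m) - diag(m)) / (m - 1)     # B_m = (l_m l_m^T - I_m)/(m - 1)
  W  <- diag(R) %x% Bm                            # W_1n = W_2n = I_R (x) B_m
  Sigma <- 0.5^abs(outer(1:8, 1:8, "-"))          # sigma_ij = 0.5^|i-j|
  X <- matrix(rnorm(n * 8), n, 8) %*% chol(Sigma) # correlated covariates
  beta <- c(3, 2, 0, 0, 1, 0, 0, 0)
  E <- rnorm(n, sd = sqrt(sigma2))
  U <- solve(diag(n) - rho2 * W, E)               # U_n = S_2n^{-1} E_n
  Y <- solve(diag(n) - rho1 * W, X %*% beta + U)  # Y_n = S_1n^{-1}(X_n beta + U_n)
  list(Y = drop(Y), X = X, W1 = W, W2 = W)
}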

4.2. Simulation Results

For each case, we run 100 replications. The average number of zero coefficients that are correctly identified is denoted by "C". The label "I" indicates the average number of non-zero coefficients incorrectly shrunk to zero. To measure the estimation accuracy of $\theta$, we compare the medians of the squared error (SE), as in Liang and Li [10], which is defined as:
$$\mathrm{SE} = \|\hat\theta_n - \theta_0\|^2 = \sum_{i=1}^{k+3}\left(\hat\theta_i - \theta_{i,0}\right)^2,$$
where $\hat\theta_n = (\hat\theta_1, \hat\theta_2, \ldots, \hat\theta_{k+3})^T$ is the estimate of $\theta_0$. In Table 1, Table 2 and Table 3, "Oracle" denotes the results of variable selection when the zero parameters are known in advance. Moreover, the other penalty functions, HARD and LASSO, are also used in the penalized quasi-likelihood function for comparison.
Table 1, Table 2 and Table 3 clearly show similar performances of variable selection under the two different spatial weight matrices. In other words, the proposed method is not sensitive to the choice of spatial weight matrix. As expected, all penalty functions reduce their mSE (the median of SE) and approach the Oracle results as the sample size increases. In most cases, the SCAD penalty produces the lowest mSE, the HARD penalty is slightly larger than the SCAD penalty, and the LASSO penalty produces the largest mSE. Moreover, if there are spatial effects in both the spatial lag term and the spatial error lag term ($\rho_1 \ne 0$ and $\rho_2 \ne 0$), the mSE is often relatively large; if only one of the two terms carries a spatial effect ($\rho_1 \ne 0$, $\rho_2 = 0$, or $\rho_1 = 0$, $\rho_2 \ne 0$), the mSE is usually smaller, and it is smallest when there is no spatial effect ($\rho_1 = 0$ and $\rho_2 = 0$). However, as in most existing variable selection results, the mSE becomes less accurate in all cases when the variance $\sigma^2$ of the innovations grows. In terms of C and I, the average number of correctly identified zero coefficients approaches the true value, and the average number of incorrectly identified zero coefficients approaches 0, as the sample size $n$ increases. These simulation results accord with the theoretical analysis. The SCAD and HARD penalties perform well in terms of C, with little difference between them in most cases; they converge rapidly to the true number of zeros, whereas the LASSO penalty converges slowly, which may imply that both SCAD and HARD tend to give smaller models than LASSO. In small samples, the LASSO penalty has the lowest value of I in most cases, but these differences quickly disappear in large samples for all penalties. These results are similar to those obtained by Fan and Li [13]. It is worth noting that when $\rho_2$ is small, it is easily compressed to 0, which produces a larger error rate I in small samples.
Table 4 shows the results of ignoring the spatial effects and using the LQA algorithm [13] in the same setting as Table 1. In terms of I, when both spatial effects are present ($\rho_1 \ne 0$ and $\rho_2 \ne 0$), the number of incorrect zeros in Table 1 is much lower than in Table 4. When only one spatial effect exists ($\rho_1 \ne 0$, $\rho_2 = 0$, or $\rho_1 = 0$, $\rho_2 \ne 0$), the number of incorrect zeros in Table 4 decreases slightly compared with the first case but is still larger than in Table 1. When there is no spatial effect ($\rho_1 = 0$ and $\rho_2 = 0$), the results of our algorithm are close to those of the LQA algorithm. Meanwhile, turning to C, our algorithm identifies more true zeros than the LQA algorithm whenever the spatial effect exists ($\rho_1 \ne 0$). Although the values of C and I under the LQA algorithm slowly approach the correct values as the sample size increases for the SCAD and HARD penalties, the mSE, which reflects the parameter estimation errors, remains extremely large. This agrees with intuition: when both spatial effects are ignored, the LQA algorithm is applied to the wrong model and easily produces large estimation biases. Moreover, the LQA algorithm is affected by the initial estimate. In the simulation, the initial estimate is the quasi-maximum likelihood estimate, which equals the least squares estimate (involving the observations of the dependent variable $Y_n$). If the strong spatial effect in $\rho_1$ is ignored, the observations of the dependent variable seriously violate the requirements for unbiased estimation, which leads to a large bias in the initial estimate; through the iterations, the accumulated error of the final estimate becomes extraordinary. However, when the spatial effects disappear, both algorithms perform similarly well, which indicates that the proposed algorithm performs satisfactorily in finite samples whether or not spatial effects are present.
Considering the complexity of the asymptotic covariance matrix of $\theta$, we use the traditional bootstrap method, in which the number of resampled observations is 100, to obtain the standard deviations of the parameter estimates. The parameter vector $\theta$ is estimated by our algorithm. SD denotes the median absolute deviation of the 100 estimated coefficients over the 100 simulations, which can be regarded as an estimate of the true standard deviation of $\hat\theta$. Using the bootstrap, we calculate the median of the estimated standard deviations, denoted SDm, and estimate its standard deviation by the median absolute deviation, denoted SDmad.
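The paper does not spell out the exact resampling scheme, so the following residual-bootstrap sketch is only one plausible reading: it resamples the fitted innovations, regenerates $Y_n$ from the fitted model, and refits with fit_sarar_pql from Section 3.

# Sketch: bootstrap spread of the penalized estimates (illustrative only).
boot_sd <- function(Y, X, W1, W2, lambda, B = 100) {
  fit <- fit_sarar_pql(Y, X, W1, W2, lambda)
  n <- length(Y)
  S1 <- diag(n) - fit$rho1 * W1
  S2 <- diag(n) - fit$rho2 * W2
  E  <- drop(S2 %*% (S1 %*% Y - X %*% fit$beta))   # fitted innovations
  est <- replicate(B, {
    Ystar <- solve(S1, X %*% fit$beta + solve(S2, sample(E, n, replace = TRUE)))
    f <- fit_sarar_pql(drop(Ystar), X, W1, W2, lambda)
    c(f$sigma2, f$rho1, f$rho2, f$beta)
  })
  apply(est, 1, mad)   # median absolute deviation per coefficient (cf. SDmad)
}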
Table 5 provides the numerical simulation results for the nonzero coefficients under $\rho_1 = 0.7$, $\rho_2 = 0.3$, $\sigma^2 = 1$, and $n = 30, 60$ with the Case spatial weight matrix. The simulation results show that the bootstrap estimate of the standard deviation becomes increasingly accurate as the sample size $n$ increases. In most cases, the SD, SDm, and SDmad obtained with the SCAD and HARD penalties are smaller than those obtained with the LASSO penalty, which suggests that the LASSO penalty is not as stable as the SCAD and HARD penalties. Furthermore, when $\sigma^2$ increases away from 1, the estimate of the standard deviation becomes less accurate, although these results are not presented. In a word, the LASSO penalty generally lags behind the SCAD and HARD penalties in terms of estimation accuracy. To save space, the other cases, such as $\rho_1 = 0.7$, $\rho_2 = 0.7$ or $\sigma^2 = 1.5$, which have similar results, are omitted.

5. Data Example

Now, we consider a real example to illustrate the application and performance of the proposed variable selection method in the SARAR model.

5.1. The Sample Data

We consider the Boston housing data set, which was originally given by Harrison and Rubinfeld [35] and has been used by many authors, for example, Pace and Gilley [36,37], and so on. The data set contains 506 census tracts with 14 nonconstant variables and can be found in the spdep library of R. Similar to the analysis of Harrison and Rubinfeld [35], the dependent variable is set to $\log(\mathrm{MEDV})$ and the explanatory variables are $\mathrm{RM}^2$, AGE, $\log(\mathrm{DIS})$, $\log(\mathrm{RAD})$, TAX, PTRATIO, $(\mathrm{B} - 0.63)^2$, $\log(\mathrm{LSTAT})$, CRIM, ZN, INDUS, CHAS, and $\mathrm{NOX}^2$. Table 6 gives the interpretation of all abbreviated variables. For the subsequent analysis, the data are centered and standardized. The spatial weight matrix is constructed with rook contiguity: the weight is 1 if two different areas share a common boundary, and 0 otherwise. The matrix is then row-normalized, as is usually done in practice.

5.2. Spatial Dependence Test

In spatial data analysis, Moran's I statistic is usually used to test for spatial dependence. Table 7 shows the value of Moran's I for the Boston housing data: it is 0.7644 with a p-value below $2.2 \times 10^{-16}$, which implies that MEDV has a strong spatial correlation. It is well known that Moran's I reflects the degree of spatial autocorrelation but cannot effectively identify a specific spatial autoregressive model when different spatial correlations may be present. Fortunately, the popular Lagrange multiplier diagnostics can complete this specification for several different spatial autoregressive models; this test avoids optimizing a nonlinear function and is easy to implement. Using the spdep package in R, we can obtain the desired identification results. From Table 7, it is obvious that the p-value in each case is very small, which implies that the Boston housing data can be modeled by spatial models. However, the values of the test statistics and p-values suggest that the SARAR model is the best choice among these spatial models for fitting the Boston housing data. Moreover, previous studies have used multiple hypothesis tests to judge spatial effects, select explanatory variables, and then determine the model, and it is difficult to prove the theoretical properties of such procedures. With the proposed variable selection method, the SARAR model can not only identify different spatial effects and select explanatory variables simultaneously, but also comes with theoretical guarantees. Therefore, we use the SARAR model for variable selection on these data.
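These tests are available in the spdep package; the sketch below shows the calls involved, using the boston data shipped with spdep/spData and the boston.soi neighbour list as a stand-in for the paper's rook-contiguity weights (which are built from the tract polygons), with an abbreviated regressor list.

# Sketch: Moran's I and Lagrange multiplier diagnostics with spdep.
library(spdep)
data(boston)                               # boston.c data frame, boston.soi neighbours
lw <- nb2listw(boston.soi, style = "W")    # row-normalized spatial weights
moran.test(log(boston.c$MEDV), lw)         # Moran's I for the dependent variable
fit0 <- lm(log(MEDV) ~ CRIM + TAX + PTRATIO + log(LSTAT), data = boston.c)
lm.LMtests(fit0, lw, test = c("LMerr", "LMlag", "SARMA"))  # LM diagnostics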

5.3. Model Selection and Estimation

For the SARAR model, the results are reported in Table 8, where the quasi-maximum likelihood estimate (QMLE) and the penalized quasi-likelihood estimates (PQLE) via the SCAD, HARD, and LASSO penalties are listed to assess the performance of variable selection.
The QMLE indicates that four variables have a relatively small impact on MEDV: ZN, INDUS, CHAS, and AGE. These variables also show a small effect on MEDV in other studies, such as Harrison and Rubinfeld [35] and Pace and Gilley [36]. Moreover, the variables with positive effects include ZN, INDUS, $\mathrm{RM}^2$, $\log(\mathrm{RAD})$, and $(\mathrm{B} - 0.63)^2$, while the others have negative effects. As expected, the parameter estimates obtained by the penalized method are close to the QMLE, and the nonzero estimates keep the same signs. Moreover, the four insignificant variables (ZN, INDUS, CHAS, and AGE) are penalized to zero under all the penalty functions, so these penalties produce the same selection results in this setting. However, the BIC values in Table 8 show that the SCAD and HARD penalties are preferable to the LASSO penalty. Interestingly, although the spatial correlation coefficients $\rho_1$ and $\rho_2$ are also penalized, they do not shrink to zero and are similar to the QMLE. From the perspective of model specification, we can say that the penalized method recognizes the spatial autoregressive relationship.
For comparison, the Boston housing data are also fitted by a classical linear regression model, and the related results are presented in Table 9. The QMLE shows three unimportant variables: ZN, INDUS, and AGE. Moreover, the variables with positive effects include ZN, INDUS, CHAS, $\mathrm{RM}^2$, AGE, $\log(\mathrm{RAD})$, and $(\mathrm{B} - 0.63)^2$, while the others have negative effects. In addition, all penalties produce the same selection results in this model; according to the QMLE, they select the important variables and shrink the unimportant ones to zero. Based on the BIC, the SCAD and HARD penalties also outperform the LASSO penalty in this setting.
Although both models yield similar selection results, the differences between them are quite obvious. For the QMLE, the estimated coefficient of AGE is negative in the SARAR model, a plausible result, but it is positive in the linear model, which seems implausible. For the PQLE, CHAS disappears in the SARAR model while it remains relatively important in the linear model. Furthermore, the meaning of the parameter estimates in the two models is also distinctly different: the interpretation of parameter estimates in the SARAR model is richer and more complicated than in the linear model because of the spatial autocorrelation [38]. As expected, the BIC of the SARAR model is far smaller than that of the classical linear model, which indicates that the SARAR model fits these data better.

6. Summary and Discussion

In theory, the proposed penalized quasi-likelihood method can identify two kinds of spatial effects, select significant explanatory variables, and estimate unknown parameters simultaneously. The penalized estimators have consistency, sparsity, and asymptotic normality, which show that estimating the coefficients of the significant variables without knowing which coefficients are zero is asymptotically as good as estimating them with the zero coefficients known. In application, the proposed method behaves consistently with the theoretical results: it effectively penalizes the coefficients of insignificant variables to zero, identifies the appropriate spatial regression model, and improves the interpretability of the results through the reduction of the variable dimension.
From the theoretical and applied results, it can be seen that the proposed method can effectively achieve variable selection and identify spatial effects. At the same time, due to the complexity and time consumption of high-dimensional matrix inversion, we also find that the optimization efficiency of the penalized quasi-likelihood function still has room for improvement. Therefore, this method is suitable for a medium sample size with the variable dimension not exceeding the sample size. When the sample size is large enough, a penalized GMM method can be considered to improve the computation speed. Once the dimension of the variables exceeds the sample size, our proposed method is no longer applicable; even so, it can serve as a basis for future research, such as new feature selection methods for spatial data.
In conclusion, it would be worthwhile to extend this approach to other high dimensional parametric regression models, such as spatial Durbin models and dynamic panel data models, and to (super) high dimensional nonparametric or semi-parametric spatial regression models, such as varying-coefficient, single index, and additive spatial regression models. These topics are left for further research.

Author Contributions

Conceptualization, J.C.; methodology, X.L.; software, X.L.; validation, J.C.; formal analysis, J.C.; investigation, J.C.; resources, J.C.; data curation, X.L.; writing—original draft preparation, X.L.; writing—review and editing, J.C.; visualization, J.C.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. Both authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the NSF of Fujian Province, China (2020J01170), Fujian Normal University Innovation Team Foundation, China (IRTL1704), Key Science and Technology Projects of Jiangxi Provincial Department of Education, China (GJJ202603), and Nanchang Normal University PhD Research Foundation, China (NSBSJJ2020006).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Assumptions

The following regularity conditions are needed for the large sample properties of the penalized quasi-likelihood estimator.
Assumption A1.
The $e_i$, $i = 1, \ldots, n$, are independent and identically distributed with $E(e_i) = 0$ and $\mathrm{var}(e_i) = \sigma^2$. The moment $E|e_1|^{4+v}$ exists for some $v > 0$.
Assumption A2.
The elements of $W_{1n}$ satisfy $w_{1n,ij} = O(1/h_n)$ and $w_{1n,ii} = 0$, and those of $W_{2n}$ satisfy $w_{2n,ij} = O(1/h_n)$ and $w_{2n,ii} = 0$, where $i, j = 1, 2, \ldots, n$, and $h_n/n \to 0$ as $n \to \infty$.
Assumption A3.
The matrices $S_{1n}$ and $S_{2n}$ are nonsingular.
Assumption A4.
The sequences of matrices $W_{1n}$, $W_{2n}$, $S_{1n}^{-1}$, and $S_{2n}^{-1}$ are uniformly bounded in both row and column sums [39].
Assumption A5.
$\lim_{n\to\infty} n^{-1} X_n^T X_n$ exists and is nonsingular. The elements of $X_n$ are uniformly bounded constants for all $n$.
Assumption A6.
The row and column sums of $S_{in}^{-1}(\rho_i)$ are uniformly bounded, uniformly in $\rho_i$ in a closed subset $\Lambda$ of $(-1, 1)$, and the true $\rho_{i0}$ is an interior point of $\Lambda$, $i = 1, 2$.
Assumption A7.
As $n \to \infty$, $n^{-1}(X_n, G_{1n}X_n\beta_0)^T(X_n, G_{1n}X_n\beta_0)$ and $n^{-1}(X_n, G_{2n}X_n\beta_0)^T(X_n, G_{2n}X_n\beta_0)$ exist and are nonsingular.
Assumption A8.
$\lim_{n\to\infty}\Sigma_n(\theta_0)$ and $\lim_{n\to\infty}\Omega_n(\theta_0)$ exist.
Assumption A9.
The third derivatives $\partial^3 \ln L_n(\theta)/\partial\theta_j\,\partial\theta_l\,\partial\theta_m$ exist for all $\theta$ in an open set $\Theta$ that contains the true parameter point $\theta_0$. Furthermore, there are functions $M_{jlm}$ such that $\left|n^{-1}\,\partial^3\ln L_n(\theta)/\partial\theta_j\,\partial\theta_l\,\partial\theta_m\right| \le M_{jlm}$ for all $\theta \in \Theta$, where $E(M_{jlm}) < \infty$ for all $j, l, m$.
Assumption A1 provides an essential condition for the use of the central limit theorem in Kelejian and Prucha [40]. Assumption A2 describes the relation between the spatial weight matrices and the sample size $n$; if $\{h_n\}$ is a bounded sequence, Assumption A2 is easily satisfied, and the Case model [34], in which $h_n$ may diverge to infinity, also satisfies Assumption A2. Assumption A3 guarantees the existence of the mean and variance of the dependent variable. Assumption A4 implies that the variance of $Y_n$ is bounded as $n$ goes to infinity; similar conditions have been adopted in Kelejian and Prucha [40] and Lee [3]. Assumption A5 excludes multicollinearity among the regressors $X_n$; assuming that the regressors are uniformly bounded is convenient for analysis and can be replaced by stochastic regressors with certain finite moment conditions [3]. Assumption A6 deals with the nonlinearity of $\ln|S_{1n}(\rho_1)|$ and $\ln|S_{2n}(\rho_2)|$ in the log-quasi-likelihood function. Assumption A7 means that $G_{kn} X_n \beta_0$ and $X_n$ are not asymptotically multicollinear for $k = 1, 2$; it is an identification condition for $\theta_0$. Assumptions A8 and A9 are used for the Taylor expansion of the log-quasi-likelihood function and the asymptotic normality of the estimator.

Appendix B. Proofs of Theorems 1 and 2

The following lemmas are used in the proofs of Theorems 1 and 2.
Lemma A1.
Under Assumptions A1–A7, we have:
$$\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\theta} = O_p(1).$$
Lemma A2.
Under Assumptions A1–A8, we have:
$$-\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\,\partial\theta^T} = E\!\left[-\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta\,\partial\theta^T}\right] + o_p(1) = \Sigma_n(\theta_0) + o_p(1).$$
Lemma A3.
Suppose that $\liminf_{n\to\infty}\liminf_{\delta\to 0^+} p'_{\lambda_n}(\delta)/\lambda_n > 0$, $\lim_{n\to\infty}\lambda_n = 0$, $\lim_{n\to\infty}\sqrt{n}\,\lambda_n = \infty$, and Assumptions A1–A9 hold. Then, with probability approaching one,
$$J\!\left\{\left(\theta_1^T, 0^T\right)^T\right\} = \min_{\|\theta_2\| \le C n^{-1/2}} J\!\left\{\left(\theta_1^T, \theta_2^T\right)^T\right\},$$
where $\theta_1$ satisfies $\|\theta_1 - \theta_{10}\| = O_p(n^{-1/2})$ and $C$ is a constant.
Proof of Lemma A1.
It follows from a straightforward calculation that:
$$\frac{\partial \ln L_n(\theta_0)}{\partial\theta} = \begin{pmatrix} \dfrac{1}{2\sigma_0^4}\left(E_n^T E_n - n\sigma_0^2\right) \\[4pt] \dfrac{1}{\sigma_0^2}\, E_n^T S_{2n} W_{1n} Y_n - \operatorname{tr}(G_{1n}) \\[4pt] \dfrac{1}{\sigma_0^2}\left(E_n^T G_{2n} E_n - \sigma_0^2 \operatorname{tr}(G_{2n})\right) \\[4pt] \dfrac{1}{\sigma_0^2}\left(S_{2n} X_n\right)^T E_n \end{pmatrix}. \tag{A1}$$
By (A1) and some operational properties of the related matrices in [3], we have:
$$\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\beta} = \frac{1}{\sqrt{n}\,\sigma_0^2}\left(S_{2n} X_n\right)^T E_n = O_p(1).$$
Note that:
$$\operatorname{var}\!\left(\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\sigma^2}\right) = \frac{1}{4n\sigma_0^8}\operatorname{var}\!\left(E_n^T E_n\right) = O(1),$$
$$\operatorname{var}\!\left(\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\rho_1}\right) \le \frac{2}{n\sigma_0^2}\left(S_{2n} G_{1n} X_n \beta_0\right)^T\left(S_{2n} G_{1n} X_n \beta_0\right) + \frac{2}{n\sigma_0^4}\operatorname{var}\!\left(E_n^T S_{2n} G_{1n} S_{2n}^{-1} E_n\right) = O(1),$$
$$\operatorname{var}\!\left(\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\rho_2}\right) = \frac{1}{n\sigma_0^4}\operatorname{var}\!\left(E_n^T G_{2n} E_n\right) = O(1).$$
By the Chebyshev inequality, we obtain:
$$\frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\sigma^2} = O_p(1), \quad \frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\rho_1} = O_p(1), \quad \frac{1}{\sqrt{n}}\frac{\partial\ln L_n(\theta_0)}{\partial\rho_2} = O_p(1).$$
 □
Proof of Lemma A2.
Note that:
$$\begin{aligned}
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial(\sigma^2)^2} &= \frac{1}{2\sigma_0^4} - \frac{1}{n\sigma_0^6}E_n^T E_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\,\partial\beta^T} &= -\frac{1}{n\sigma_0^2}\left(S_{2n}X_n\right)^T S_{2n}X_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\rho_1^2} &= -\frac{1}{n\sigma_0^2}\left[\sigma_0^2\operatorname{tr}\!\left(G_{1n}^2\right) + \left(S_{2n}W_{1n}Y_n\right)^T S_{2n}W_{1n}Y_n\right], \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\rho_2^2} &= -\frac{1}{n\sigma_0^2}\left[\sigma_0^2\operatorname{tr}\!\left(G_{2n}^2\right) + \left(G_{2n}E_n\right)^T G_{2n}E_n\right], \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\,\partial\rho_1} &= -\frac{1}{n\sigma_0^2}\left(S_{2n}X_n\right)^T S_{2n}W_{1n}Y_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\,\partial\rho_2} &= -\frac{1}{n\sigma_0^2}\left[\left(S_{2n}X_n\right)^T G_{2n} + \left(W_{2n}X_n\right)^T\right]E_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\beta\,\partial\sigma^2} &= -\frac{1}{n\sigma_0^4}\left(S_{2n}X_n\right)^T E_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\rho_1\,\partial\rho_2} &= -\frac{1}{n\sigma_0^2}\left(S_{2n}^{-1}E_n\right)^T\left(S_{2n}^T W_{2n} + W_{2n}^T S_{2n}\right)W_{1n}Y_n, \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\rho_1\,\partial\sigma^2} &= -\frac{1}{2n\sigma_0^4}\left[\left(S_{2n}W_{1n}Y_n\right)^T E_n + E_n^T S_{2n}W_{1n}Y_n\right], \\
\frac{1}{n}\frac{\partial^2\ln L_n(\theta_0)}{\partial\rho_2\,\partial\sigma^2} &= -\frac{1}{2n\sigma_0^4}E_n^T G_{2n}^s E_n,
\end{aligned}$$
where $G_{2n}^s = G_{2n} + G_{2n}^T$.
Then, similar to the proof of Theorem 3.2 in [3], we can obtain Lemma A2. □
Proof of Theorem 1.
Let $z_n = n^{-1/2} + a_n$. As demonstrated by Fan and Li [13], it suffices to prove that for any given $\eta > 0$, there is a positive constant $C$ such that:
$$P\!\left\{\inf_{\|u\| = C} J(\theta_0 + z_n u) > J(\theta_0)\right\} \ge 1 - \eta. \tag{A2}$$
(A2) shows that, with probability at least $1 - \eta$, there is a local minimizer of the continuous function $J(\theta)$ in the bounded closed domain $\{\theta_0 + z_n u : \|u\| \le C\}$. Consequently, there exists a local minimizer $\hat\theta$ such that $\|\hat\theta - \theta_0\| = O_p(z_n)$.
By $p_{\lambda_{jn}}(0) = 0$, $z_n = o(1)$, and the Taylor expansion, we have:
$$\frac{1}{n}\left[J(\theta_0 + z_n u) - J(\theta_0)\right] \ge J_1 + J_2 + J_3,$$
where
$$J_1 = -\frac{z_n}{n}\left(\frac{\partial\ln L_n(\theta_0)}{\partial\theta}\right)^T u, \quad J_2 = \frac{1}{2} u^T \Sigma_n(\theta_0)\, u\, z_n^2\left(1 + o_p(1)\right), \quad J_3 = \sum_{j=1}^{s}\left[z_n d_j u_j + z_n^2 v_j u_j^2\left(1 + o(1)\right)\right],$$
with $d_j = p'_{\lambda_{jn}}(|\theta_{j,0}|)\operatorname{sgn}(\theta_{j,0})$ and $v_j$ determined by $p''_{\lambda_{jn}}(|\theta_{j,0}|)$.
From Lemma A1, $b_n = o(1)$, and $O_p(n^{-1/2} z_n) = O_p(z_n^2)$, it follows that $J_1 = \|u\| \cdot O_p(z_n^2)$, $J_2 = \|u\|^2 \cdot O_p(z_n^2)$, and $J_3$ is bounded by $\|u\| \cdot O(z_n^2) + \|u\|^2 \cdot o_p(z_n^2)$. Thus, $J_1$ and $J_3$ can be dominated by $J_2$ uniformly by taking $\|u\| = C$ sufficiently large as $n \to \infty$. Hence, (A2) holds. Note that $z_n = O(n^{-1/2})$. This completes the proof of Theorem 1. □
Proof of Lemma A3.
It suffices to prove that, for any $\theta_1$ satisfying $\|\theta_1 - \theta_{10}\| = O_p(n^{-1/2})$, any $\theta_2$ with $\|\theta_2\| \le C n^{-1/2}$, and any $j = s+1, \ldots, k+3$, with probability tending to 1 as $n \to \infty$, $\partial J(\theta)/\partial\theta_j$ and $\theta_j$ have the same sign for nonzero $\theta_j \in (-C n^{-1/2}, C n^{-1/2})$.
For $\theta_j \ne 0$ and $j = s+1, \ldots, k+3$,
$$\frac{\partial J(\theta)}{\partial\theta_j} = -\frac{\partial\ln L_n(\theta)}{\partial\theta_j} + n\, p'_{\lambda_{jn}}(|\theta_j|)\operatorname{sgn}(\theta_j).$$
By the Taylor expansion, we have:
$$\frac{\partial\ln L_n(\theta)}{\partial\theta_j} = \frac{\partial\ln L_n(\theta_0)}{\partial\theta_j} + \sum_{l=1}^{k+3}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta_j\,\partial\theta_l}\left(\theta_l - \theta_{l,0}\right) + \frac{1}{2}\sum_{l=1}^{k+3}\sum_{m=1}^{k+3}\frac{\partial^3\ln L_n(\theta^*)}{\partial\theta_j\,\partial\theta_l\,\partial\theta_m}\left(\theta_l - \theta_{l,0}\right)\left(\theta_m - \theta_{m,0}\right),$$
where $\theta^*$ lies between $\theta$ and $\theta_0$. Under $\|\theta_1 - \theta_{10}\| = O_p(n^{-1/2})$, $\|\theta_2\| \le C n^{-1/2}$, and Assumption A9, we can obtain from Lemmas A1 and A2 that $n^{-1}\,\partial\ln L_n(\theta)/\partial\theta_j$ is of order $O_p(n^{-1/2})$. Thus,
$$\frac{\partial J(\theta)}{\partial\theta_j} = n\lambda_{jn}\left\{\lambda_{jn}^{-1}\, p'_{\lambda_{jn}}(|\theta_j|)\operatorname{sgn}(\theta_j) + O_p\!\left(n^{-1/2}\lambda_{jn}^{-1}\right)\right\}.$$
Note that $\liminf_{n\to\infty}\liminf_{\delta\to 0^+}\lambda_{jn}^{-1}\, p'_{\lambda_{jn}}(\delta) > 0$ and $\lim_{n\to\infty} n^{-1/2}\lambda_{jn}^{-1} = 0$. Hence, for sufficiently large $n$, the sign of the derivative is the same as that of $\theta_j$. This shows that the minimum is attained at $\theta_2 = 0$. Lemma A3 is proven. □
Proof of Theorem 2.
Lemma A3 shows that part (i) holds. Next, we give the proof of part (ii). By Theorem 1, there is a $\sqrt{n}$-consistent local minimizer $\hat\theta_1$ of $J\{(\theta_1^T, 0^T)^T\}$, which satisfies:
$$\left.\frac{\partial J(\theta)}{\partial\theta_j}\right|_{\theta = (\hat\theta_1^T, 0^T)^T} = 0 \quad \text{for } j = 1, \ldots, s. \tag{A3}$$
Note that $\theta_1 = \sigma^2$. By the Taylor expansion, we have:
$$\begin{aligned}
\frac{\partial J(\theta)}{\partial\theta_j} = {}& -\frac{\partial\ln L_n(\theta)}{\partial\theta_j} + n\, p'_{\lambda_{jn}}(|\theta_j|)\operatorname{sgn}(\theta_j)\, I(j \ne 1) \\
= {}& -\left\{\frac{\partial\ln L_n(\theta_0)}{\partial\theta_j} + \sum_{l=1}^{s}\left[\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta_j\,\partial\theta_l} + o_p(1)\right]\left(\theta_l - \theta_{l,0}\right)\right\} \\
& + n\left\{p'_{\lambda_{jn}}(|\theta_{j,0}|)\operatorname{sgn}(\theta_{j,0}) + \left[p''_{\lambda_{jn}}(|\theta_{j,0}|) + o_p(1)\right]\left(\theta_j - \theta_{j,0}\right)\right\} I(j \ne 1),
\end{aligned} \tag{A4}$$
where $I(j \ne 1)$ is an indicator function.
Moreover, it follows from (A3) and (A4) that:
$$\frac{\partial\ln L_n(\theta_0)}{\partial\theta_j} = -\sum_{l=1}^{s}\frac{\partial^2\ln L_n(\theta_0)}{\partial\theta_j\,\partial\theta_l}\left(\hat\theta_l - \theta_{l,0}\right) + n v_j\left(\hat\theta_j - \theta_{j,0}\right) + n d_j + o_p\!\left(\sqrt{n}\right),$$
where $d_j = p'_{\lambda_{jn}}(|\theta_{j,0}|)\operatorname{sgn}(\theta_{j,0})\, I(j \ne 1)$ and $v_j = p''_{\lambda_{jn}}(|\theta_{j,0}|)\, I(j \ne 1)$.
Note that (A1) can be written as:
$$\frac{\partial\ln L_n(\theta_0)}{\partial\theta} = \begin{pmatrix} 0 \\ \dfrac{1}{\sigma_0^2}\left(S_{2n} G_{1n} X_n \beta_0\right)^T E_n \\ 0 \\ \dfrac{1}{\sigma_0^2}\left(S_{2n} X_n\right)^T E_n \end{pmatrix} + \begin{pmatrix} \dfrac{1}{2\sigma_0^4}\left(E_n^T E_n - n\sigma_0^2\right) \\ \dfrac{1}{\sigma_0^2}\left(E_n^T S_{2n} G_{1n} S_{2n}^{-1} E_n - \sigma_0^2\operatorname{tr}(G_{1n})\right) \\ \dfrac{1}{\sigma_0^2}\left(E_n^T G_{2n} E_n - \sigma_0^2\operatorname{tr}(G_{2n})\right) \\ 0 \end{pmatrix}. \tag{A5}$$
Then, by (A5), Slutsky's theorem, and the central limit theorem for linear-quadratic forms [40], we can obtain:
$$\sqrt{n}\left(\Sigma_{n1}(\theta_{10}) + \Lambda\right)\left\{\hat\theta_1 - \theta_{10} + \left(\Sigma_{n1}(\theta_{10}) + \Lambda\right)^{-1} d\right\} \xrightarrow{d} N\!\left(0,\ \Sigma_1(\theta_{10}) + \Omega_1(\theta_{10})\right).$$
This completes the proof. □

References

  1. Cliff, A.D.; Ord, J.K. Spatial Autocorrelation; Pion Ltd.: London, UK, 1973.
  2. Kelejian, H.H.; Prucha, I.R. A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J. Real Estate Financ. 1998, 17, 99–121.
  3. Lee, L.F. Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 2004, 72, 1899–1925.
  4. Arraiz, I.; Drukker, D.M.; Kelejian, H.H.; Prucha, I.R. A spatial Cliff-Ord-type model with heteroskedastic innovations: Small and large sample results. J. Regional Sci. 2010, 50, 592–614.
  5. Anselin, L. Spatial Econometrics: Methods and Models; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
  6. Cressie, N. Statistics for Spatial Data; John Wiley and Sons: New York, NY, USA, 1993.
  7. Akaike, H. Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 1973, 60, 255–265.
  8. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  9. Foster, D.P.; George, E.I. The risk inflation criterion for multiple regression. Ann. Stat. 1994, 22, 1947–1975.
  10. Liang, H.; Li, R. Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 2009, 104, 234–248.
  11. Huo, X.; Ni, X. When do stepwise algorithms meet subset selection criteria? Ann. Stat. 2007, 35, 870–887.
  12. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B 1996, 58, 267–288.
  13. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  14. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Statist. Soc. B 2005, 67, 301–320.
  15. Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
  16. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  17. Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1032.
  18. Raftery, A.E.; Madigan, D.; Hoeting, J.A. Bayesian model averaging for linear regression models. J. Am. Stat. Assoc. 1997, 92, 179–191.
  19. Jiang, W.X. Bayesian variable selection for high dimensional generalized linear models: Convergence rates for the fitted densities. Ann. Stat. 2007, 35, 1487–1511.
  20. Chen, Y.; Du, P.; Wang, Y. Variable selection in linear models. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 6, 1–9.
  21. Steel, M.F. Model averaging and its use in economics. J. Econ. Lit. 2020, 58, 644–719.
  22. LeSage, J.P.; Parent, O. Bayesian model averaging for spatial econometric models. Geogr. Anal. 2007, 39, 241–267.
  23. LeSage, J.P.; Fischer, M. Spatial growth regressions, model specification, estimation, and interpretation. Spat. Econ. Anal. 2008, 3, 275–304.
  24. Cuaresma, J.C.; Doppelhofer, G.; Feldkircher, M. The determinants of economic growth in European regions. Reg. Stud. 2014, 48, 44–67.
  25. Cuaresma, J.C.; Doppelhofer, G.; Huber, F.; Piribauer, P. Human capital accumulation and long-term income growth projections for European regions. J. Regional Sci. 2018, 58, 81–99.
  26. Piribauer, P. Heterogeneity in spatial growth clusters. Empir. Econ. 2016, 51, 659–680.
  27. Zhu, J.; Huang, H.; Reyes, P.E. On selection of spatial linear models for lattice data. J. R. Statist. Soc. B 2010, 72, 389–402.
  28. Liu, X.; Chen, J.; Cheng, S. A penalized quasi-maximum likelihood method for variable selection in the spatial autoregressive model. Spat. Stat. 2018, 25, 86–104.
  29. Xie, T.; Cao, R.; Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat. Pap. 2020, 61, 1125–1145.
  30. Wang, H.; Zhu, J. Variable selection in spatial regression via penalized least squares. Can. J. Stat. 2009, 37, 607–624.
  31. Zou, H.; Li, R. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 2008, 36, 1509–1533.
  32. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313.
  33. Wang, H.; Li, R.; Tsai, C.L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 2007, 94, 553–568.
  34. Case, A.C. Spatial patterns in household demand. Econometrica 1991, 59, 953–965.
  35. Harrison, D.H.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102.
  36. Pace, R.K.; Gilley, O.W. Using the spatial configuration of the data to improve estimation. J. Real Estate Financ. 1997, 14, 333–340.
  37. Tang, Q. Robust estimation for functional coefficient regression models with spatial data. Statistics 2014, 48, 388–404.
  38. LeSage, J.; Pace, R. Introduction to Spatial Econometrics; CRC Press: Boca Raton, FL, USA, 2009.
  39. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
  40. Kelejian, H.H.; Prucha, I.R. On the asymptotic distribution of the Moran I test statistic with applications. J. Econom. 2001, 104, 219–257.
Figure 1. The algorithm flowchart.
Table 1. Simulation results of variable selection with the Case spatial weight matrix (σ² = 1).

                     n = 30                       n = 60                       n = 180
Method       C       I       mSE          C       I       mSE          C       I       mSE
ρ1 = 0.7, ρ2 = 0.7
SCAD         3.8300  0.1100  0.4329       4.6000  0.0100  0.0941       5.0000  0.0000  0.0272
HARD         4.2900  0.1100  0.4961       4.6200  0.0100  0.1048       4.9700  0.0000  0.0276
LASSO        2.7300  0.1600  0.7452       3.0400  0.0600  0.2209       3.4800  0.0000  0.0635
Oracle       5.0000  0.0000  0.0909       5.0000  0.0000  0.0300       5.0000  0.0000  0.0090
ρ1 = 0.7, ρ2 = 0.3
SCAD         3.8900  0.2500  0.4627       4.7300  0.0500  0.1218       5.0000  0.0000  0.0284
HARD         4.2800  0.2900  0.4861       4.6800  0.0700  0.1208       4.9100  0.0000  0.0332
LASSO        2.7800  0.3200  0.5947       3.3700  0.0600  0.1729       3.7800  0.0000  0.0546
Oracle       5.0000  0.0000  0.1111       5.0000  0.0000  0.0348       5.0000  0.0000  0.0100
ρ1 = 0.7, ρ2 = 0.0
SCAD         5.1500  0.0200  0.3423       5.7000  0.0000  0.0983       5.9600  0.0000  0.0273
HARD         5.4000  0.0200  0.4259       5.5700  0.0000  0.1088       5.9400  0.0000  0.0276
LASSO        4.1900  0.0100  0.4503       4.3800  0.0000  0.1585       4.6000  0.0000  0.0598
Oracle       6.0000  0.0000  0.0461       6.0000  0.0000  0.0175       6.0000  0.0000  0.0045
ρ1 = 0.0, ρ2 = 0.7
SCAD         5.1000  0.1600  0.4964       5.6800  0.0200  0.1268       5.9100  0.0000  0.0206
HARD         5.1500  0.1700  0.4975       5.7000  0.0300  0.1288       5.9300  0.0000  0.0271
LASSO        4.0000  0.1000  0.5633       4.1600  0.0100  0.1472       4.7300  0.0000  0.0487
Oracle       6.0000  0.0000  0.0761       6.0000  0.0000  0.0260       6.0000  0.0000  0.0071
ρ1 = 0.0, ρ2 = 0.0
SCAD         5.9800  0.0200  0.3283       6.4600  0.0000  0.0951       6.9200  0.0000  0.0264
HARD         6.3700  0.0200  0.3164       6.6300  0.0000  0.1010       6.9400  0.0000  0.0265
LASSO        4.8200  0.0000  0.4337       4.9600  0.0000  0.1572       5.2500  0.0000  0.0561
Oracle       7.0000  0.0000  0.0340       7.0000  0.0000  0.0163       7.0000  0.0000  0.0040
Table 2. Simulation results of variable selection with the Case spatial weight matrix (σ² = 1.5).

                     n = 30                       n = 60                       n = 180
Method       C       I       mSE          C       I       mSE          C       I       mSE
ρ1 = 0.7, ρ2 = 0.7
SCAD         3.9500  0.1300  0.8425       4.7100  0.0200  0.1536       5.0000  0.0000  0.0461
HARD         4.3100  0.1300  0.8495       4.6200  0.0200  0.1666       4.9600  0.0000  0.0471
LASSO        2.9100  0.3400  1.6368       2.9900  0.1400  0.4662       3.5100  0.0000  0.1328
Oracle       5.0000  0.0000  0.1838       5.0000  0.0000  0.0548       5.0000  0.0000  0.0168
ρ1 = 0.7, ρ2 = 0.3
SCAD         4.0800  0.4000  0.8675       4.7300  0.0400  0.1829       5.0000  0.0000  0.0462
HARD         4.2200  0.3800  0.8529       4.7100  0.0800  0.1902       4.9000  0.0000  0.0536
LASSO        2.6700  0.2700  0.9862       3.2100  0.0700  0.2822       3.9700  0.0000  0.0987
Oracle       5.0000  0.0000  0.1709       5.0000  0.0000  0.0666       5.0000  0.0000  0.0188
ρ1 = 0.7, ρ2 = 0.0
SCAD         5.1700  0.0700  0.7204       5.6700  0.0200  0.1654       5.9700  0.0000  0.0462
HARD         5.2900  0.0900  0.8144       5.6000  0.0300  0.1816       5.9300  0.0000  0.0491
LASSO        4.1300  0.0200  0.7540       4.2700  0.0000  0.2520       4.7800  0.0000  0.1036
Oracle       6.0000  0.0000  0.1110       6.0000  0.0000  0.0394       6.0000  0.0000  0.0103
ρ1 = 0.0, ρ2 = 0.7
SCAD         4.9600  0.2000  0.7713       5.6900  0.0300  0.1865       5.9000  0.0000  0.0408
HARD         4.9900  0.2100  0.8786       5.6800  0.0400  0.1888       5.9200  0.0000  0.0460
LASSO        3.5500  0.0900  0.8639       4.1300  0.0100  0.2551       4.9900  0.0000  0.0862
Oracle       6.0000  0.0000  0.1497       6.0000  0.0000  0.0541       6.0000  0.0000  0.0140
ρ1 = 0.0, ρ2 = 0.0
SCAD         5.9700  0.0700  0.6812       6.5000  0.0200  0.1697       6.9100  0.0000  0.0446
HARD         6.3000  0.0700  0.6552       6.6100  0.0000  0.1714       6.9300  0.0000  0.0449
LASSO        4.8200  0.0200  0.6933       4.9200  0.0000  0.2337       5.5200  0.0000  0.1020
Oracle       7.0000  0.0000  0.0765       7.0000  0.0000  0.0367       7.0000  0.0000  0.0100
Table 3. Simulation results of variable selection with the Rook spatial weight matrix (σ² = 1).

                     n = 36                       n = 64                       n = 196
Method       C       I       mSE          C       I       mSE          C       I       mSE
ρ1 = 0.7, ρ2 = 0.7
SCAD         4.4300  0.1000  0.2802       4.6200  0.0000  0.1142       4.9900  0.0000  0.0290
HARD         4.5600  0.1200  0.2998       4.6200  0.0200  0.1316       4.9600  0.0000  0.0343
LASSO        3.2300  0.1000  0.4090       3.3500  0.0100  0.1982       3.8400  0.0000  0.0573
Oracle       5.0000  0.0000  0.2415       5.0000  0.0000  0.1121       5.0000  0.0000  0.0206
ρ1 = 0.7, ρ2 = 0.3
SCAD         4.3400  0.2400  0.2987       4.5500  0.1100  0.1290       4.9800  0.0000  0.0370
HARD         4.3900  0.2800  0.3229       4.5900  0.1500  0.1495       4.9600  0.0000  0.0375
LASSO        3.2800  0.2700  0.3923       3.2900  0.1200  0.1935       3.8600  0.0000  0.0591
Oracle       5.0000  0.0000  0.2584       5.0000  0.0000  0.1254       5.0000  0.0000  0.0368
ρ1 = 0.7, ρ2 = 0.0
SCAD         5.5100  0.0000  0.2082       5.6800  0.0000  0.0939       5.9800  0.0000  0.0264
HARD         5.5900  0.0000  0.2430       5.6700  0.0000  0.0959       5.9600  0.0000  0.0264
LASSO        4.3200  0.0000  0.3124       4.3500  0.0000  0.1748       4.8500  0.0000  0.0489
Oracle       6.0000  0.0000  0.0293       6.0000  0.0000  0.0173       6.0000  0.0000  0.0053
ρ1 = 0.0, ρ2 = 0.7
SCAD         5.1200  0.1100  0.3238       5.5200  0.0000  0.1168       5.9000  0.0000  0.0300
HARD         5.2000  0.1600  0.3712       5.5300  0.0100  0.1240       5.9200  0.0000  0.0307
LASSO        4.0100  0.1700  0.4305       4.3500  0.0200  0.1922       4.7100  0.0000  0.0717
Oracle       6.0000  0.0000  0.0616       6.0000  0.0000  0.0396       6.0000  0.0000  0.0121
ρ1 = 0.0, ρ2 = 0.0
SCAD         6.1900  0.0000  0.1688       6.5600  0.0000  0.0955       6.9300  0.0000  0.0242
HARD         6.5300  0.0100  0.1852       6.5600  0.0000  0.1106       6.8800  0.0000  0.0259
LASSO        5.0700  0.0000  0.3146       5.2900  0.0000  0.1681       5.7100  0.0000  0.0435
Oracle       7.0000  0.0000  0.0226       7.0000  0.0000  0.0140       7.0000  0.0000  0.0044
Table 4. Simulation results of variable selection when the spatial effects are ignored (LQA algorithm, Case spatial weight matrix, σ² = 1).

                     n = 30                       n = 60                       n = 180
Method       C       I       mSE          C       I       mSE          C       I       mSE
ρ1 = 0.7, ρ2 = 0.7
SCAD         4.1500  1.1900  2894.0       4.5000  0.9900  4159.8       4.7800  0.4500  5025.6
HARD         1.9300  0.3700  2153.9       2.7700  0.4200  3697.3       4.1700  0.2300  4870.4
LASSO        0.2500  0.1500  2110.4       0.0000  0.0000  3449.5       0.0000  0.0000  4762.6
ρ1 = 0.7, ρ2 = 0.3
SCAD         4.2300  0.5700  73.698       4.4700  0.3400  105.62       4.7800  0.0400  122.92
HARD         3.9200  0.4300  70.815       4.3700  0.3400  101.09       4.7400  0.0500  122.92
LASSO        1.7600  0.2100  79.049       0.4900  0.1100  99.844       0.0000  0.0000  117.50
ρ1 = 0.7, ρ2 = 0.0
SCAD         4.2600  0.4400  39.324       4.5500  0.1900  51.2130      4.7700  0.0100  52.666
HARD         4.0500  0.4100  37.080       4.5400  0.2000  50.641       4.8600  0.0100  52.667
LASSO        1.9700  0.1900  40.984       1.1200  0.0600  48.713       0.0000  0.0000  49.750
ρ1 = 0.0, ρ2 = 0.7
SCAD         3.9200  0.1000  0.7908       4.6600  0.0000  0.5977       4.8600  0.0000  0.5517
HARD         4.3100  0.0700  0.8395       4.7200  0.0000  0.6243       4.8900  0.0000  0.5618
LASSO        2.4900  0.0200  0.7809       2.8300  0.0000  0.7512       3.1800  0.0000  0.6659
ρ1 = 0.0, ρ2 = 0.0
SCAD         3.5500  0.0200  0.3353       4.6600  0.0000  0.1115       4.9900  0.0000  0.0260
HARD         4.1400  0.0200  0.4382       4.6600  0.0000  0.1125       4.9000  0.0000  0.0276
LASSO        2.4900  0.0000  0.4980       2.8600  0.0000  0.1672       3.3100  0.0000  0.0533
Table 5. Standard deviations of estimates of the nonzero regression coefficients.

              n = 30                        n = 60
Method   SD       SDm      SDmad      SD       SDm      SDmad
SCAD
σ²       0.2006   0.1634   0.0325     0.1091   0.1282   0.0154
ρ1       0.0226   0.0227   0.0032     0.0169   0.0131   0.0012
ρ2       0.1106   0.1851   0.0236     0.0565   0.0895   0.0078
β1       0.1225   0.1299   0.0152     0.1110   0.0886   0.0079
β2       0.1495   0.1374   0.0172     0.1161   0.0890   0.0085
β5       0.2074   0.1466   0.0205     0.1111   0.0871   0.0092
HARD
σ²       0.1788   0.1498   0.0325     0.1022   0.1251   0.0157
ρ1       0.0222   0.0233   0.0031     0.0165   0.0130   0.0012
ρ2       0.0850   0.1156   0.0134     0.0542   0.0684   0.0062
β1       0.1287   0.1297   0.0172     0.1105   0.0874   0.0097
β2       0.1816   0.1342   0.0170     0.1157   0.0877   0.0081
β5       0.1930   0.1340   0.0207     0.1114   0.0867   0.0084
LASSO
σ²       0.2104   0.1609   0.0350     0.1236   0.1354   0.0173
ρ1       0.0257   0.0252   0.0027     0.0190   0.0137   0.0013
ρ2       0.1242   0.1782   0.0227     0.0686   0.0895   0.0063
β1       0.1830   0.1532   0.0246     0.0974   0.0994   0.0108
β2       0.1954   0.1570   0.0264     0.1135   0.0961   0.0094
β5       0.1921   0.1485   0.0265     0.1161   0.0943   0.0107
Table 6. Variables used in the analysis.

Variable   Description
MEDV       The median value of owner-occupied homes. Source: 1970 U.S. Census.
CRIM       Crime rate by town. Source: FBI (1970).
ZN         Proportion of a town's residential land zoned for lots greater than 25,000 square feet. Source: Metropolitan Area Planning Commission (1972).
INDUS      Proportion of nonretail business acres per town. Source: Harrison and Rubinfeld (1978).
CHAS       Charles River dummy: = 1 if tract bounds the Charles River; = 0 otherwise. Source: 1970 U.S. Census.
NOX        Nitrogen oxide concentration in pphm (annual average concentration in parts per hundred million). Source: TASSIM.
RM         Average number of rooms in owner units. Source: 1970 U.S. Census.
AGE        Proportion of owner units built prior to 1940. Source: 1970 U.S. Census.
DIS        Weighted distances to five employment centres in the Boston region. Source: Harrison and Rubinfeld (1978).
RAD        Index of accessibility to radial highways, calculated on a town basis. Source: MIT Boston Project.
TAX        Full value property tax rate ($/$10,000). Source: Massachusetts Taxpayers Foundation (1970).
PTRATIO    The number of students divided by the number of teachers in the town school district. Source: Massachusetts Dept. of Education (1971–1972).
B          Black proportion of population. Source: 1970 U.S. Census.
LSTAT      Proportion of population that is lower status = ½ × (proportion of adults without some high school education and proportion of male workers classified as laborers). Source: 1970 U.S. Census.
Table 7. Moran's I test and Lagrange multiplier diagnostics for spatial dependence.

Terms     Values of Test Statistics   p-Values
Moran I   0.7644                      <2.2e−16
LMerr     186.57                      <2.2e−16
LMlag     190.71                      <2.2e−16
SARMA     228.32                      <2.2e−16

Note: LMerr represents the test results for the SEM; LMlag represents the test results for the SAR model; and SARMA represents the test results for the SARAR model.
Table 8. Parameter estimates using quasi-maximum likelihood and penalized estimates via SCAD, HARD, and LASSO under the SARAR model (blank cells are coefficients penalized to zero).

Terms        QMLE      SCAD      HARD      LASSO
CRIM         −0.1405   −0.1240   −0.1410   −0.1346
ZN            0.0221
INDUS         0.0280
CHAS         −0.0058
NOX²         −0.1037   −0.0223   −0.1046   −0.0560
RM²           0.1721    0.1657    0.1643    0.1624
AGE          −0.0372
log(DIS)     −0.2082   −0.1127   −0.1851   −0.1415
log(RAD)      0.1750    0.1124    0.1680    0.1039
TAX          −0.1958   −0.1583   −0.1757   −0.1137
PTRATIO      −0.1030   −0.0816   −0.1071   −0.0830
(B−0.63)²     0.0865    0.0713    0.0827    0.0652
log(LSTAT)   −0.3998   −0.4317   −0.4167   −0.3900
ρ1            0.2805    0.2695    0.2776    0.3691
ρ2            0.4145    0.4444    0.4107    0.2430
σ²            0.1182    0.1197    0.1190    0.1230
BIC         489.81    474.18    467.36    477.00
Table 9. Parameter estimates using quasi-maximum likelihood and penalized estimates via SCAD, HARD, and LASSO under a classical linear model (blank cells are coefficients penalized to zero).

Terms        QMLE      SCAD      HARD      LASSO
CRIM         −0.2537   −0.2541   −0.2539   −0.2420
ZN            0.0047
INDUS         0.0051
CHAS          0.0573    0.0565    0.0578    0.0554
NOX²         −0.2178   −0.2157   −0.2158   −0.1852
RM²           0.1367    0.1383    0.1383    0.1418
AGE           0.0085
log(DIS)     −0.2529   −0.2570   −0.2567   −0.2211
log(RAD)      0.2035    0.2013    0.2011    0.1569
TAX          −0.1744   −0.1708   −0.1704   −0.1373
PTRATIO      −0.1663   −0.1667   −0.1666   −0.1575
(B−0.63)²     0.0692    0.0689    0.0693    0.0665
log(LSTAT)   −0.5496   −0.5467   −0.5464   −0.5440
σ²            0.1951    0.1952    0.1952    0.1961
BIC         696.27    677.68    677.67    680.24