Article

Robust Variable Selection with Exponential Squared Loss for the Spatial Error Model

College of Science, China University of Petroleum, Qingdao 266580, China
*
Author to whom correspondence should be addressed.
Submission received: 19 November 2023 / Revised: 15 December 2023 / Accepted: 17 December 2023 / Published: 20 December 2023

Abstract

With the widespread application of spatial data in fields such as econometrics and geographic information science, methods that enhance the robustness of estimation and variable selection for spatial econometric models have become a central focus of research. In the context of the spatial error model (SEM), this paper introduces a variable selection method based on the exponential squared loss and the adaptive lasso penalty. Because the resulting objective is non-convex and non-differentiable, convex programming cannot be applied directly. We develop a block coordinate descent algorithm, decompose the exponential squared component into the difference of two convex functions, and combine the CCCP algorithm with parabolic interpolation to solve the optimization problem. Numerical simulations demonstrate that neglecting the spatial effects of the error terms reduces the accuracy of selecting zero coefficients in the SEM. The proposed method remains robust when the observations contain noise and when the spatial weight matrix is inaccurate. Finally, we apply the model to the Boston housing dataset.

1. Introduction

With the widespread application of spatial effects data in various fields such as spatial econometrics, geography, and epidemiology, spatial regression models used for handling such data have been extensively studied by many scholars. Spatial effects include spatial correlation (dependence) and spatial heterogeneity. To reflect different forms of spatial correlation, Anselin (1988) [1] categorized spatial econometric models into spatial autoregressive models (SARs), spatial Durbin models (SDMs), and spatial error models (SEMs). Among them, the spatial error model is represented as $Y = \rho W Y + X\beta + U$, $U = \omega M U + V$. The spatial error model (SEM) considers the mutual influence between spatially adjacent regions. The fundamental idea is to incorporate spatial autocorrelation as an error term into the regression model, addressing the limitation of traditional regression models in handling spatial autocorrelation issues. In the SEM, the error terms are no longer independently and identically distributed but exhibit spatial autocorrelation with the error terms of neighboring regions.
With the rapid increase in data dimensionality, the issue of variable selection in spatial error models has attracted widespread attention. In the classic field of linear regression, there have been numerous studies on variable selection. Penalty methods are commonly used for model variable selection, and several penalty functions have been proposed, such as the least absolute shrinkage and selection operator (LASSO, Tibshirani, 1996 [2]), the smoothly clipped absolute deviation (SCAD, Fan and Li, 2001 [3]), and the adaptive lasso (Zou, 2006 [4]). Due to the spatial dependence inherent in the SEM, the aforementioned penalty methods cannot be directly applied to variable selection in the spatial error model.
Due to observation noise and inaccurate spatial weight matrices, traditional variable selection methods can become unstable and inaccurate in parameter estimation, which can also misguide variable selection and result in the erroneous choice of variables that do not reflect the actual situation. Therefore, efforts have been made to adopt more robust approaches. Many studies employ the Huber loss function (Huber, 2004 [5]); however, the Huber loss may not be sufficiently robust in the presence of extreme outliers, and it often leads to high computational complexity when dealing with high-dimensional data. Wang et al. (2013) [6] introduced a class of robust estimators with the exponential squared loss function $\phi_\gamma(t) = 1 - \exp(-t^2/\gamma)$. For large values of $\gamma$, the exponential squared estimates are similar to the least squares estimates; when $\gamma$ is very small, observations with large absolute residuals $t$ produce an empirical loss close to 1, so outliers have little impact on the estimates. In this sense, the method exhibits stronger robustness than other robust methods, including Huber estimates (Huber, 2004 [5]), quantile regression estimates (Koenker and Bassett, 1978 [7]), and composite quantile regression estimates (Zou and Yuan, 2008 [8]). Wang et al. (2013) [6] also introduced a method for selecting the parameter $\gamma$.
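To make the role of $\gamma$ concrete, the following minimal sketch (Python; the residual values are arbitrary illustrations, not data from this paper) evaluates $\phi_\gamma(t) = 1 - \exp(-t^2/\gamma)$ and shows that every observation contributes at most 1 to the empirical loss, which is what bounds the influence of outliers.

```python
import numpy as np

def exp_squared_loss(t, gamma):
    """Exponential squared loss: phi_gamma(t) = 1 - exp(-t^2 / gamma)."""
    return 1.0 - np.exp(-t**2 / gamma)

residuals = np.array([0.1, 0.5, 1.0, 5.0, 50.0])   # last entry mimics an outlier
for gamma in (0.5, 10.0):                           # small gamma: robust; large gamma: close to least squares
    print(gamma, exp_squared_loss(residuals, gamma).round(3))
```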
We focus on the variable selection problem of the spatial error model (SEM). However, currently, there is not much exploration regarding robust variable selection methods for spatial error models. Liu et al. (2020) [9] employed penalized pseudo-likelihood and SCAD penalties for variable selection in the spatial error model. Through Monte Carlo simulations, they demonstrated that this method exhibits good variable selection performance based on normal data. Dogan (2023) [10] used modified harmonic mean estimates to estimate the marginal likelihood function of the cross-sectional spatial error model, thereby improving the estimation performance of spatial error model parameters. However, their methods are still susceptible to the influence of outliers. Therefore, exploring more robust variable selection methods is imperative. Considering the robustness of the exponential square loss function, we employ an adaptive lasso penalty method in conjunction with the exponential square loss to penalize all unknown parameters in the loss function except for the variance of the random disturbance term. Song et al. (2021) [11] applied this method to the SAR model and achieved satisfactory results. In this paper, we apply the same method to the SEM model. We construct the following optimization model:
$$\min_{\beta\in\mathbb{R}^p,\ \rho\in[0,1],\ \omega\in[0,1]} L(\beta,\rho,\omega) = \frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde{Y}_i - X_i\beta - \omega\tilde{U}_i\right) + \lambda\sum_{j=1}^{p}P(\beta_j), \qquad (1)$$
where $\rho$, $\beta$, and $\omega$ are the parameters to be estimated, $\tilde{Y} = WY$, and $\tilde{U} = MU$. $\lambda$ is a non-negative regularization weight, $\sum_{j=1}^{p}P(\beta_j)$ is a penalty term, and $\phi_\gamma(\cdot)$ is the exponential squared loss mentioned earlier.
Due to the non-convex nature of the exponential squared loss, the empirical loss term is inherently a structured non-convex function of the three variable blocks $\beta\in\mathbb{R}^p$, $\rho\in[0,1]$, and $\omega\in[0,1]$. Furthermore, since many penalty functions (the lasso or adaptive lasso) are non-differentiable, (1) is a non-convex, non-differentiable optimization problem. In this paper, we propose a robust variable selection method for the spatial error model based on the exponential squared loss function and the adaptive lasso penalty. This method selects variables while simultaneously estimating the regression coefficients. The main contributions of this paper are as follows.
(1)
We established a robust variable selection method for SEM, which employs the exponential squared loss and demonstrates good robustness in the presence of outliers in the observations and inaccurate estimation of the spatial weight matrix.
(2)
We designed a BCD algorithm to solve the optimization problem in SEM. We decomposed the exponential square loss component into the difference of convex functions and built a CCCP program for solving the BCD algorithm subproblems. We utilized an accelerated FISTA algorithm to solve the optimization problem with the adaptive lasso penalty term. We also provide the computational complexity of the BCD algorithm.
(3)
We verified the robustness and effectiveness of this method through numerical simulation experiments. The simulation results indicate that neglecting the spatial effects of error terms leads to a decrease in variable selection accuracy. When there is noise in the observations and inaccuracies in the spatial weight matrix, the method proposed in this paper outperforms the comparison methods in correctly identifying zero coefficients, non-zero coefficients, and median squared error (MedSE).
The organizational structure of this paper is as follows: In Section 2, we introduce the SEM model, construct the loss function with exponential squared loss and adaptive lasso regularization, and provide methods for selecting model hyperparameters and estimating variance. In Section 3, we design an efficient algorithm to optimize the loss function, accomplishing variable selection and parameter estimation. Section 4 includes numerical simulation experiments to assess the variable selection and estimation performance of the proposed method. In Section 5, we apply this method to the Boston Housing dataset. Finally, in Section 6, we summarize the entire paper.

2. Model Estimation and Variable Selection

2.1. Spatial Error Model

We consider the following SEM model with covariates
$$Y = \rho W Y + X\beta + U, \qquad U = \omega M U + V, \qquad (2)$$
where $Y_i \in \mathbb{R}$, $Y = (Y_1, \ldots, Y_n)^T$ is an $n$-dimensional response vector, $X = (X_1, \ldots, X_n)^T \in \mathbb{R}^{n\times p}$ is the design matrix, and $U = (U_1, \ldots, U_n)^T \in \mathbb{R}^{n\times 1}$ is the error vector. $W$ and $M$ are $n\times n$ spatial weight matrices. In practical applications, $M$ is often constructed as a spatial rook matrix. The spatial autocorrelation coefficients $\rho$ and $\omega$ are scalars; $\omega$ measures the impact of the errors of neighboring areas on the local observations. A larger $\omega$ indicates a stronger spatial dependence effect in the sample observations, while a smaller $\omega$ implies a weaker effect. A high-order spatial error model with $p$ lag terms and $q$ disturbance terms is typically represented as follows:
$$Y = \sum_{r=1}^{p}\rho_r W_r Y + X\beta + U, \qquad U = \sum_{l=1}^{q}\omega_l M_l U + V. \qquad (3)$$
This study considers a first-order spatial error model. $I_n \in \mathbb{R}^{n\times n}$ is the $n$-dimensional identity matrix. It is typically assumed that the $V_i$ are independently and identically distributed, i.e., $V_i \sim N(0, \sigma^2)$. Therefore, $U$ and $Y$ can be represented as follows:
$$Y = (I_n - \rho W)^{-1}(X\beta + U), \qquad (4)$$
$$U = (I_n - \omega M)^{-1}V, \qquad (5)$$
where it is necessary to ensure that $I_n - \rho W$ and $I_n - \omega M$ are invertible. According to Banerjee et al. (2014) [12], under certain normalization operations the maximum singular value of the matrix $W$ is 1. Therefore, the condition $|\rho| < 1$ ensures the invertibility of $(I_n - \rho W)$. We impose the constraints $|\rho| < 1$ and $|\omega| < 1$. Furthermore, we neglect the endogeneity induced by spatial dependence (Ma et al., 2020) [13].
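The reduced forms (4) and (5) suggest a direct way to simulate from the model. The sketch below (Python/NumPy; the toy weight matrix and parameter values are placeholders rather than the simulation design of Section 4) generates $Y$ by solving two linear systems instead of forming the inverses explicitly.

```python
import numpy as np

def simulate_sem(X, beta, W, M, rho, omega, sigma, rng):
    """Draw one sample from Y = (I - rho W)^{-1}(X beta + U), U = (I - omega M)^{-1} V."""
    n = X.shape[0]
    V = rng.normal(0.0, sigma, size=n)                        # i.i.d. disturbances V_i ~ N(0, sigma^2)
    U = np.linalg.solve(np.eye(n) - omega * M, V)             # reduced form (5)
    Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + U)    # reduced form (4)
    return Y, U

rng = np.random.default_rng(0)
n, p = 60, 5
X = rng.normal(size=(n, p))
W = np.roll(np.eye(n), 1, axis=1)                 # toy neighbor matrix (placeholder)
W = (W + W.T) / 2.0
W = W / W.sum(axis=1, keepdims=True)              # row normalization
M = W.copy()
Y, U = simulate_sem(X, np.array([3.0, 2.0, 1.6, 0.0, 0.0]), W, M,
                    rho=0.5, omega=0.6, sigma=1.0, rng=rng)
```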

2.2. Variable Selection Methods

This section considers variable selection for model (2). Following Fan and Li (2001) [3], it is typically assumed that the regression coefficient vector β is sparse. A natural approach is to use penalty methods for model handling, which can simultaneously select important variables and estimate unknown parameters. The constructed penalty method can be represented as follows:
$$\min_{\beta\in\mathbb{R}^p,\ \rho\in[0,1],\ \omega\in[0,1]} L(\beta,\rho,\omega) = \frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde{Y}_i - X_i\beta - \omega\tilde{U}_i\right) + \lambda\sum_{j=1}^{p}P(\beta_j). \qquad (6)$$
When considering the choice of penalty, this paper opts for the adaptive lasso. Let $\hat\beta$ be a consistent estimator of $\beta$. Following the recommendation of Zou (2006) [4], we set $r = 1$ and define the weight vector $\eta \in \mathbb{R}^p$ with $\eta_j = 1/|\hat\beta_j|^{r}$, $j = 1, \ldots, p$. The adaptive lasso penalty is then
$$\sum_{j=1}^{p}P(\beta_j) = \sum_{j=1}^{p}\eta_j|\beta_j|. \qquad (7)$$
The penalty-robust regression formula with an exponential square loss and adaptive lasso penalty is redefined as
$$\min_{\beta\in\mathbb{R}^p,\ \rho\in[0,1],\ \omega\in[0,1]} L(\beta,\rho,\omega) = \frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde{Y}_i - X_i\beta - \omega\tilde{U}_i\right) + \lambda\sum_{j=1}^{p}\eta_j|\beta_j|, \qquad (8)$$
where $\tilde{Y} = WY$, $\tilde{U} = MU$, and $\phi_\gamma(\cdot)$ is the exponential squared loss mentioned earlier, $\phi_\gamma(t) = 1 - \exp(-t^2/\gamma)$. $\lambda$ is a positive regularization coefficient. The selection of the tuning parameter $\gamma$ in the exponential squared function is discussed in the next section.

2.3. The Selection of γ

The parameter $\gamma$ controls the robustness and estimation efficiency of the proposed variable selection method. Wang et al. (2013) [6] introduced a method for selecting $\gamma$: it first identifies a set of tuning parameters such that the proposed penalized estimates have an asymptotic breakdown point of 0.5, and then selects the parameter by the principle of maximum efficiency. The process is as follows:
Step 1. Initialize the parameters as $\hat\beta = \beta^{(0)}$, $\hat\rho = \rho^{(0)}$, $\hat\omega = \omega^{(0)}$, where $\rho^{(0)} = \omega^{(0)} = 0.5$. Rewrite the model as $Y^* = X\beta + V$, where $U^* = \omega M U$ and $Y^* = Y - \rho W Y - U^*$.
Step 2. Let $D_n = \{(X_1, Y_1^*), \ldots, (X_n, Y_n^*)\}$ be the sample set, and calculate $r_i(\hat\beta) = Y_i^* - X_i\hat\beta$, $i = 1, \ldots, n$, and $S_n = 1.4826 \times \mathrm{median}_i\left|r_i(\hat\beta) - \mathrm{median}_j\left(r_j(\hat\beta)\right)\right|$. Then, the set of pseudo-outliers is
$$D_m = \left\{(X_i, Y_i): |r_i(\hat\beta)| \ge 2.5 S_n\right\}, \quad m = \#\left\{1 \le i \le n: |r_i(\hat\beta)| \ge 2.5 S_n\right\}, \quad D_{n-m} = D_n \setminus D_m.$$
Step 3. Construct $\hat V(\gamma) = \{\hat I(\hat\beta)\}^{-1}\tilde\Sigma_2\{\hat I(\hat\beta)\}^{-1}$, where
$$\hat I(\hat\beta) = \frac{2}{\gamma}\left\{\frac{1}{n}\sum_{i=1}^{n}\exp\left(-r_i^2(\hat\beta)/\gamma\right)\left(\frac{2r_i^2(\hat\beta)}{\gamma} - 1\right)\right\}\cdot\frac{1}{n}\sum_{i=1}^{n}X_iX_i^T,$$
$$\tilde\Sigma_2 = \mathrm{Cov}\left(\exp\left(-r_1^2(\hat\beta)/\gamma\right)\frac{2r_1(\hat\beta)}{\gamma}X_1, \ldots, \exp\left(-r_n^2(\hat\beta)/\gamma\right)\frac{2r_n(\hat\beta)}{\gamma}X_n\right).$$
Next, let $\gamma_n$ be the minimizer of $\det(\hat V(\gamma))$ over the set $G = \{\gamma: \zeta(\gamma) \in (0,1]\}$, where $\zeta(\cdot)$ is defined as in Wang et al. (2013) [6].
Step 4. The optimal solution of the optimization problem $\min_{\beta\in\mathbb{R}^p,\ \rho\in[0,1],\ \omega\in[0,1]} L(\beta,\rho,\omega) = \frac{1}{n}\sum_{i=1}^{n}\phi_\gamma\left(Y_i - \rho\tilde{Y}_i - X_i\beta - \omega\tilde{U}_i\right)$ updates $\hat\rho$, $\hat\beta$, and $\hat\omega$, where $\tilde{Y} = WY$ and $\tilde{U} = MU$. Terminate if the result converges; otherwise, return to Step 2.
Considering the potential difficulties arising from the significant computational burden, we do not use cross validation to select γ and the penalty coefficient λ . In practical applications, we can find a threshold γ 1 for which ζ ( γ 1 ) = 1 . The values of γ typically fall within the interval [ 5 γ 1 , 30 γ 1 ] . In the initial steps mentioned above, an initial estimate β ( 0 ) is required. In numerical simulations, we use an estimate with the LAD loss as the initial estimate.
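Step 2 only requires residuals and a robust scale estimate. A minimal sketch of the scale $S_n$ and the pseudo-outlier split (Python; $\hat\beta$ and $Y^*$ are assumed to be given):

```python
import numpy as np

def pseudo_outliers(Y_star, X, beta_hat):
    """Split the sample into pseudo-outliers D_m and the remainder, as in Step 2."""
    r = Y_star - X @ beta_hat                              # residuals r_i(beta_hat)
    S_n = 1.4826 * np.median(np.abs(r - np.median(r)))     # robust (MAD-based) scale estimate
    outlier_mask = np.abs(r) >= 2.5 * S_n                  # pseudo-outlier indicator
    return outlier_mask, S_n
```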

2.4. The Selection of λ and η j

When considering the selection of λ and η j , it is common to unify the two parameters: λ j = λ · η j . Various methods such as AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), and cross validation can be used to choose these parameters.
To ensure the consistency of the variable selection and reduce computational complexity, we follow the method described in Wang et al. (2007) [14] by selecting the regularization parameter through the minimization of the BIC objective function. The objective function is as follows:
$$\sum_{i=1}^{n}\left[1 - \exp\left(-\left(Y_i - \rho\tilde{Y}_i - X_i\beta - \omega\tilde{U}_i\right)^2/\gamma_n\right)\right] + n\sum_{j=1}^{p}\lambda_j|\beta_j| - \sum_{j=1}^{p}\log(0.5\,n\lambda_j)\log(n). \qquad (9)$$
From this, we can deduce that $\lambda_j = \frac{\log(n)}{n|\beta_j|}$. In practical applications, we may not know the values of $\beta_j$, but we can easily obtain $\tilde\beta_j$ through the exponential squared loss. In Section 2.3, we obtained the estimate for $\gamma$. It is important to note that with this simple selection, the following conditions should be satisfied: $\sqrt{n}\,\hat\lambda_j \to 0$ for $j \le p_0$ and $\sqrt{n}\,\hat\lambda_j \to \infty$ for $j > p_0$, where $p_0$ is the number of non-zero components of $\beta$. This ensures the consistency of the variable selection in the final estimate.
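A hedged sketch of this choice (Python; $\tilde\beta$ denotes the preliminary exponential-squared-loss estimate, and the small numerical floor is an implementation detail not stated in the text):

```python
import numpy as np

def bic_style_lambdas(beta_tilde, n, eps=1e-8):
    """lambda_j = log(n) / (n * |beta_tilde_j|), with a small floor for numerical safety."""
    return np.log(n) / (n * np.maximum(np.abs(beta_tilde), eps))
```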

2.5. Estimation of Noise Variance

We estimate the variance of the random disturbance term. Let $G = (I_n - \rho W)^{-1}$ and $F = (I_n - \omega M)^{-1}$; then the estimated variance of the random disturbance term $V$ is given by
$$\hat\sigma^2 = \frac{1}{n}(Y - GX\beta)^T\left[(GF)(GF)^T\right]^{-1}(Y - GX\beta), \qquad (10)$$
where $\beta$, $\rho$, and $\omega$ are obtained by solving Equation (8). It follows from the constraints on $\rho$ and $\omega$ that both $G$ and $F$ are non-singular. Let $u = GX\beta$ and $v = GF$. Then, the estimate $\hat\sigma^2$ defined in Equation (10) can be calculated as
$$\hat\sigma^2 = \frac{1}{n}\left\|v^{-1}(Y - u)\right\|_2^2. \qquad (11)$$
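Because $G$ and $F$ are defined through matrix inverses, in practice $\hat\sigma^2$ is best computed by solving linear systems. A minimal sketch (Python; the parameter estimates are assumed to be given):

```python
import numpy as np

def noise_variance(Y, X, W, M, beta, rho, omega):
    """sigma^2_hat = (1/n) * || (GF)^{-1} (Y - G X beta) ||_2^2, without forming inverses."""
    n = len(Y)
    A = np.eye(n) - rho * W                 # G = A^{-1}
    B = np.eye(n) - omega * M               # F = B^{-1}
    u = np.linalg.solve(A, X @ beta)        # u = G X beta
    resid = B @ (A @ (Y - u))               # (GF)^{-1}(Y - u) = F^{-1} G^{-1} (Y - u) = B A (Y - u)
    return resid @ resid / n
```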

3. Model Solving Algorithm

This section outlines the algorithm for solving Equation (8). In this optimization problem, the objective function can be divided into three variable blocks, $\beta\in\mathbb{R}^p$, $\rho\in[0,1]$, and $\omega\in[0,1]$, so a block coordinate descent algorithm can be used. However, the objective function in Equation (8) is non-convex and non-smooth, and convergence is difficult to guarantee with block coordinate descent alone. Therefore, we apply the convex–concave procedure (CCCP) by decomposing the exponential squared loss and use the CCCP algorithm to solve the subproblem for $\beta$. For the lasso and adaptive lasso penalty terms, we use the iterative shrinkage-thresholding algorithm (ISTA); this paper employs its accelerated variant, FISTA.

3.1. Block Coordinate Descent Algorithm

In Algorithm 1, we provide the framework of a block coordinate descent algorithm for the alternating iterative solution of β , ω , ρ .
The next tasks are to solve subproblems (14)–(16). Subproblems (14) and (15) can be transformed into single-variable function optimization problems with the other two parameters fixed. The range for achieving the optimal solution of the objective function is [0, 1]. Therefore, the parabolic interpolation-based golden section method can be employed for solving them. For specific algorithm details, please refer to Forsythe et al. (1977) [15]. In the next section, we will discuss the solution to (16) in detail.
Algorithm 1 BCD algorithm framework.
1. Set the initial iteration point $\beta^0 \in \mathbb{R}^p$, $\rho^0 \in [0,1]$, $\omega^0 \in [0,1]$;
2. Repeat for $k = 0, 1, 2, \ldots$;
3. Solve the subproblem for $\rho$ with initial point $\rho^k$:
$$\rho^{k+1} \leftarrow \arg\min_{\rho \in [0,1]} L\left(\beta^k, \rho, \omega^k\right); \qquad (14)$$
4. Solve the subproblem for $\omega$ with initial point $\omega^k$:
$$\omega^{k+1} \leftarrow \arg\min_{\omega \in [0,1]} L\left(\beta^k, \rho^{k+1}, \omega\right); \qquad (15)$$
5. Solve the subproblem for $\beta$ with initial point $\beta^k$:
$$\beta^{k+1} \leftarrow \arg\min_{\beta \in \mathbb{R}^p} L\left(\beta, \rho^{k+1}, \omega^{k+1}\right), \qquad (16)$$
to obtain a solution $\beta^{k+1}$ satisfying $L(\beta^k, \rho^{k+1}, \omega^{k+1}) - L(\beta^{k+1}, \rho^{k+1}, \omega^{k+1}) \ge 0$ and such that $\beta^{k+1}$ is a stationary point of $L(\cdot, \rho^{k+1}, \omega^{k+1})$;
6. Iterate until convergence.
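A compact sketch of the framework in Algorithm 1 (Python; `objective` stands for the penalized loss (8), `solve_beta_subproblem` stands for the CCCP/FISTA routine of Section 3.2, and bounded scalar minimization plays the role of the parabolic-interpolation/golden-section step for the $\rho$ and $\omega$ subproblems):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bcd(objective, solve_beta_subproblem, beta0, rho0, omega0, max_iter=50, tol=1e-6):
    """Block coordinate descent over (rho, omega, beta), following Algorithm 1."""
    beta, rho, omega = beta0.copy(), rho0, omega0
    for _ in range(max_iter):
        beta_old, rho_old, omega_old = beta.copy(), rho, omega
        # rho-subproblem (14): one-dimensional minimization on [0, 1]
        rho = minimize_scalar(lambda r: objective(beta, r, omega),
                              bounds=(0.0, 1.0), method="bounded").x
        # omega-subproblem (15): one-dimensional minimization on [0, 1]
        omega = minimize_scalar(lambda w: objective(beta, rho, w),
                                bounds=(0.0, 1.0), method="bounded").x
        # beta-subproblem (16): non-convex, handled by the CCCP + FISTA routine
        beta = solve_beta_subproblem(beta, rho, omega)
        if max(abs(rho - rho_old), abs(omega - omega_old),
               np.max(np.abs(beta - beta_old))) < tol:
            break
    return beta, rho, omega
```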

3.2. Solving Subproblem (16)

Given that the optimization problem is composed of the exponential square loss term and the lasso or adaptive lasso penalty term, it can be observed that when ρ and ω are fixed, the lasso or adaptive lasso is convex. The exponential square loss can be decomposed into the difference of two convex functions. Therefore, subproblem (16) is a DC (difference of convex) program, and it can be solved using the appropriate algorithms. When decomposing the exponential square loss function into a DC form, we made the following attempts.
Proposition 1. 
Let $\phi_\gamma(t) = 1 - \exp(-t^2/\gamma)$ and assume $\phi_\gamma(t) = f(t) - g(t)$, where both $f(t)$ and $g(t)$ are convex functions. Guided by the non-negativity of the second derivative, we can take $g''(t) = \frac{4}{\gamma^2}t^2$ and set:
$$g(t) = \frac{1}{3\gamma^2}t^4, \qquad (17)$$
$$f(t) = \phi_\gamma(t) + g(t) = 1 - \exp(-t^2/\gamma) + \frac{1}{3\gamma^2}t^4. \qquad (18)$$
It can be proven that both $f(t)$ and $g(t)$ are convex functions, completing the DC decomposition of the exponential squared part $\phi_\gamma(t) = f(t) - g(t)$. We can then define the following two functions:
$$J_{vex}(\beta) = \frac{1}{n}\sum_{i=1}^{n} f\left(Y_i - \rho^{k+1}W_iY - X_i\beta - \omega^{k+1}M_iU\right) + \lambda\sum_{j=1}^{p}\eta_j|\beta_j|, \qquad (19)$$
$$J_{cav}(\beta) = -\frac{1}{n}\sum_{i=1}^{n} g\left(Y_i - \rho^{k+1}W_iY - X_i\beta - \omega^{k+1}M_iU\right), \qquad (20)$$
where $f(\cdot)$ and $g(\cdot)$ are defined in (17) and (18), $W_i$ and $M_i$ denote the $i$-th rows of the spatial weight matrices $W$ and $M$, and $\lambda\sum_{j=1}^{p}\eta_j|\beta_j|$ is the convex penalty with respect to $\beta$. Then, $J_{vex}(\cdot)$ and $J_{cav}(\cdot)$ are convex and concave functions, respectively. Subproblem (16) can be rewritten as
$$\min_{\beta\in\mathbb{R}^p} L\left(\beta, \rho^{k+1}, \omega^{k+1}\right) = J_{vex}(\beta) + J_{cav}(\beta). \qquad (21)$$
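To make the decomposition concrete, a small sketch (Python; names are illustrative) of $f$, $g$, and the gradient of the concave part that the CCCP linearizes at each iterate:

```python
import numpy as np

def g(t, gamma):
    """Convex function subtracted off: g(t) = t^4 / (3 * gamma^2)."""
    return t**4 / (3.0 * gamma**2)

def f(t, gamma):
    """Convex part: f(t) = phi_gamma(t) + g(t) = 1 - exp(-t^2/gamma) + t^4/(3 gamma^2)."""
    return 1.0 - np.exp(-t**2 / gamma) + g(t, gamma)

def grad_J_cav(beta, Y, X, W, M, U, rho, omega, gamma):
    """Gradient of J_cav(beta) = -(1/n) * sum_i g(r_i), used in the CCCP linearization (22)."""
    r = Y - rho * (W @ Y) - X @ beta - omega * (M @ U)   # residuals r_i
    gprime = 4.0 * r**3 / (3.0 * gamma**2)               # g'(r_i)
    # d/d beta of -(1/n) sum g(r_i) = (1/n) X^T g'(r), since d r_i / d beta = -X_i
    return X.T @ gprime / len(Y)
```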
For this convex–concave procedure, we can use Algorithm 2 for solving it as described in Yuille et al. (2003) [16].
Algorithm 2 Convex–concave procedure algorithm.
1. Set an initial point β 0 and initialize k = 0 .
2. Repeat
3. $\beta^{k+1} = \arg\min_{\beta}\left\{ J_{vex}(\beta) + \nabla J_{cav}(\beta^k)^{T}\beta \right\} \qquad (22)$
4. Update the iteration counter: k = k + 1
5. Until convergence of β k
The CCCP algorithm decreases the objective monotonically and approaches a stationary solution by alternately linearizing the concave part and optimizing the resulting convex surrogate. Therefore, the objective function in (21) can be minimized iteratively by solving (22). We next consider which algorithm to use for solving (22) so as to improve the efficiency of Algorithm 2.
We note that the first term in (22) consists of the convex function $\frac{1}{n}\sum_{i=1}^{n}f(\cdot)$ and the convex penalty $\lambda\sum_{j=1}^{p}\eta_j|\beta_j|$, while the second term is a linear function of $\beta$. We can therefore rewrite (22) as
$$\min_{\beta\in\mathbb{R}^p}\ \psi(\beta) + \lambda\sum_{j=1}^{p}\eta_j|\beta_j|, \qquad (23)$$
where $\psi(\beta)$ is a continuously differentiable convex function defined as $\psi(\beta) = \frac{1}{n}\sum_{i=1}^{n}f\left(Y_i - \rho^{k+1}W_iY - X_i\beta - \omega^{k+1}M_iU\right) + \nabla J_{cav}(\beta^k)^{T}\beta$. Beck and Teboulle (2009) [17] introduced the iterative shrinkage-thresholding algorithm (ISTA) to solve optimization problems with lasso penalties. Song et al. (2021) [11] proved that this algorithm can be applied to models with adaptive lasso penalties. Therefore, ISTA can be used to solve problems structured like (23). This paper chooses the accelerated version of ISTA, known as FISTA, for the optimization. The convergence properties of FISTA were established by Beck and Teboulle (2009) [17]. Algorithm 3 provides an overview of the FISTA algorithm.
With this, we complete the algorithm design for solving (16). In addition, we provide the computational complexity of the BCD program and an introduction to the machine specifications used in subsequent experiments in Appendix B.
Algorithm 3 FISTA algorithm with backtracking step for solving (22).
Require: $\lambda > 0$ and the adaptive lasso weights $\eta_j$
Ensure: solution $\beta^*$
1: Step 0. Select $L_0 > 0$, $\eta > 1$, $\beta_0 \in \mathbb{R}^p$. Let $\xi_1 = \beta_0$, $t_1 = 1$.
2: Step $k$ ($k \ge 1$).
3:  Determine the smallest non-negative integer $i_k$ such that, with $\bar{L} = \eta^{i_k} L_{k-1}$,
$$F\left(\Theta_{\bar{L}}(\xi_k)\right) \le Q_{\bar{L}}\left(\Theta_{\bar{L}}(\xi_k), \xi_k\right).$$
4: Let $L_k = \eta^{i_k} L_{k-1}$ and calculate:
5: $\beta_k = \Theta_{L_k}(\xi_k)$,
6: $t_{k+1} = \frac{1}{2}\left(1 + \sqrt{1 + 4 t_k^2}\right)$,
7: $\xi_{k+1} = \beta_k + \frac{t_k - 1}{t_{k+1}}\left(\beta_k - \beta_{k-1}\right)$.
8: Output $\beta^* := \beta_k$.
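Within each step of Algorithm 3, the operator $\Theta_L$ is a gradient step on the smooth part $\psi$ followed by the weighted soft-thresholding associated with the adaptive lasso penalty. A hedged sketch of this scheme (Python; a fixed step size is used instead of the backtracking search, so it is a simplification of Algorithm 3):

```python
import numpy as np

def soft_threshold(z, thresh):
    """Component-wise soft-thresholding: the proximal operator of the weighted l1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def fista(grad_psi, beta0, step, lam, eta, n_iter=200):
    """Simplified FISTA (fixed step) for min_beta psi(beta) + lam * sum_j eta_j |beta_j|."""
    beta = beta0.copy()
    xi, t = beta0.copy(), 1.0
    for _ in range(n_iter):
        beta_new = soft_threshold(xi - step * grad_psi(xi), step * lam * eta)  # Theta_L(xi)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t**2))
        xi = beta_new + (t - 1.0) / t_new * (beta_new - beta)                  # momentum extrapolation
        beta, t = beta_new, t_new
    return beta
```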

4. Numerical Simulations

In this section, we conducted several sets of numerical simulations to validate the impact of spatial errors on the estimation and variable selection of the SEM model. We also compared the performance of the proposed variable selection method with other methods in various scenarios, including small and large sample sizes, the presence of many non-significant covariates, noisy response variable observations, and inaccurate spatial weight matrix.

4.1. Generation of Simulated Data

The data generation process is based on model (2). The covariates follow a normal distribution with mean zero and covariance matrix $(\sigma_{ij})$, where $\sigma_{ij} = 0.5^{|i-j|}$. The design matrix $X_n$ is thus an $n \times (q+3)$ matrix. Let the sample size $n \in \{30, 150, 300\}$ and the number of non-significant covariates $q \in \{5, 20, 100, 200\}$.
The spatial autoregressive coefficient $\rho$ follows a uniform distribution on the interval $[\rho_1 - 0.1, \rho_1 + 0.1]$, where $\rho_1 \in \{0.8, 0.5, 0.2\}$. To verify whether spatial dependence among the response variables affects the model estimation and variable selection, an additional experiment with $\rho = 0$ is included; in this case, the SEM reduces to a linear error model. The spatial dependence coefficient in the error term, $\omega \in \{0, 0.6\}$, is used to examine the influence of spatial effects in the error terms on the SEM estimation and variable selection. The coefficients of the covariates are set as a sparse vector $\beta = (\beta_1, \beta_2, \beta_3, \mathbf{0}_q)$, where $q$ is the number of non-significant covariates. The values of $(\beta_1, \beta_2, \beta_3)$ are sampled from a normal distribution with means $(3, 2, 1.6)$ and covariance $0.01 \times I_3$, where $I_3$ is the $3\times 3$ identity matrix.
Let the spatial weight matrix $W_n = I_R \otimes B_m$, where $B_m = \frac{1}{m-1}\left(\mathbf{1}_m\mathbf{1}_m^T - I_m\right)$, $\otimes$ denotes the Kronecker product, $\mathbf{1}_m$ is an $m$-dimensional column vector of ones, and $I_m$ is the $m\times m$ identity matrix. In this study, we let $m = 3$. The error term's spatial weight matrix $M$ is constructed as a spatial rook matrix, considering only adjacent elements in the horizontal and vertical directions: adjacent positions are set to 1, and all other positions are set to 0. The response variable $Y$ and the error term $U$ are generated according to (4) and (5).
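As an illustration, a small sketch (Python; it assumes the block structure $n = R\cdot m$ implied by the Kronecker construction above) of building this weight matrix:

```python
import numpy as np

def block_weight_matrix(n, m=3):
    """W_n = I_R kron B_m with B_m = (1_m 1_m^T - I_m) / (m - 1), assuming n = R * m."""
    R = n // m
    B_m = (np.ones((m, m)) - np.eye(m)) / (m - 1)
    return np.kron(np.eye(R), B_m)
```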
The independent random disturbance terms satisfy $V \sim N(0, \sigma^2 I_n)$. The parameter $\sigma^2$ is uniformly distributed in the interval $[\sigma_1 - 0.1, \sigma_1 + 0.1]$, where $\sigma_1 = 1$. When considering noise in the response variables, the disturbance term follows a mixture of normal distributions; specifically, $V \sim (1 - \delta_1)\cdot N(0, 1) + \delta_1\cdot N(10, 6^2)$, where $\delta_1 = 0.01$.
We also constructed an inaccurate spatial weight matrix $W$ by randomly removing 30% of the non-zero values in each row of $W$ and randomly adding 50% more non-zero elements in each row of $W$. The resulting inaccurate $W$ was then normalized and used in the SEM.
Each scenario was based on 100 simulations. To gauge the model’s superiority, we compared it with square loss and absolute loss. For accurate performance evaluation, we employed the median squared error (MedSE, Liang and Li, 2009) [18] as the metric for accuracy comparison.

4.2. Unregularized Estimation on Gaussian Noise Data

In this section, the unregularized estimation of independent variable coefficients, spatial weight coefficients, and noise variance is performed using the SEM model on Gaussian noise data for both q = 5 and q > 5 .
When considering the spatial effect of the error term ( ω = 0.6 ), the model’s estimation results are shown in Figure 1. (1) All three loss functions yield estimates of β 1 , β 2 , β 3 , and σ 2 that closely align with their true values ( β 1 , β 2 , β 3 have means of 3, 2, 1.6, and the mean of σ 2 is 1) as seen in Table A1. As the sample size increases, the estimates gradually converge to the true values. Notably, the square loss function provides the most accurate estimates. (2) When not considering the spatial effect of the response variable ( ρ = 0 ), the SEM model also produces reasonably accurate estimates. However, ignoring the spatial effect of the error term ( ω = 0 ) significantly reduces the accuracy of the SEM estimates. This is especially apparent in the estimation of the noise variance, and the estimates for β and ρ also become less accurate, resulting in an increase in MedSE. (3) According to Figure 2, by comparing the median squared error (MedSE), it is evident that the SEM model estimates based on the squared loss function perform the best in terms of estimation performance.
The results in Figure 3 illustrate the estimation performance of the spatial error model (SEM) on normal data when there are numerous non-significant covariates. It is evident that the accuracy of the parameter estimates for the three loss functions is much lower than in the results shown in Figure 1. Specifically, the median values of the parameter estimates shift and the number of outliers increases, with the exponential squared loss function yielding the fewest outlying estimates. Examining the detailed data in Table A2, it is observed that as the sample size increases, the parameter estimates gradually approach the true values and MedSE shows a decreasing trend. However, as anticipated, the results are still unsatisfactory.

4.3. Unregularized Estimation When the Observed Values of Y Have Outliers

In this section, we use the SEM model to estimate the data with outliers in the response variable without regularization. (1) Similar to Table A1, all three loss functions provide reasonably accurate estimates for β and σ 2 , and as the sample size increases, the estimates get closer to the true values. Furthermore, the exponential loss, in particular, provides better estimates for β and σ 2 than the other two methods. (2) When ignoring the spatial effect of the model’s error term, the SEM estimates for the coefficients and noise variance are not accurate, and MedSE increases significantly compared to when ω 0 . (3) By comparing Figure 1a with Figure 4a, it can be observed that the exponential square loss demonstrates better resistance to the influence of noise in the observed values. Additionally, through Figure 4b, similar results to Figure 2 can be observed, further highlighting the robustness advantage of the exponential square loss.

4.4. Unregularized Estimation with Noisy Spatial Weight Matrix

In this section, we designed an inaccurate spatial weight matrix and used the SEM model to estimate the coefficients and noise variance. “Remove 30% ” and “Add 50%” refer to randomly removing 30 % of non-zero values in each row of W and adding 50 % non-zero elements in each row, respectively. Table A4 presents the estimation results. All simulated data are generated based on ρ = 0.5 and σ 1 = 1 . Compared to the estimation results on normal data in Table A1, the inaccurate W leads to increased MedSE values and a decrease in the estimation performance for all three loss functions. To observe the variations in MedSE for each loss function, we plotted Figure 5. (1) When some non-zero weights in the matrix W are removed, the MedSE for all three loss functions decreases with an increase in the sample size. Among the three loss functions, the MedSE for the exponential square loss shows the most significant decrease, regardless of whether weights are added or removed. (2) When half of the non-zero weights in each row of W are added, all three methods have higher MedSE values compared to removing some non-zero weights from W. (3) Through the comparison of the estimated values and MedSE, we find that the exponential square loss shows more robustness when combating the impact of inaccurate spatial weight matrices.

4.5. Variable Selection with Regularizer on Gaussian Noise Data

In this section, we perform variable selection on the generated Gaussian noise data (including $q = 5$ and $q > 5$) using the SEM with three different loss functions and the penalty method (lasso or adaptive lasso). "Correct" represents the average number of correctly selected zero coefficients, and "Incorrect" represents the average number of non-zero coefficients incorrectly identified as zero. "Exp + $l_1$", "S + $l_1$", and "Lad + $l_1$" denote SEM models with exponential squared loss, square loss, and absolute loss, respectively, using the lasso penalty. "Exp + $\tilde{l}_1$", "S + $\tilde{l}_1$", and "Lad + $\tilde{l}_1$" denote the corresponding models using the adaptive lasso penalty. The sample size for the variable selection simulation experiments is $n \in \{30, 300\}$.
We visualized the results of variable selection for the model when considering the spatial effect of the error term and created Figure 6. (1) By observing Figure 6a, it is evident that the correctness of selecting non-zero regression coefficients for the three loss functions is relatively low when the sample size is small. Additionally, the loss functions incorporating the adaptive lasso penalty achieve higher accuracy in variable selection compared to those with the lasso penalty alone. As the sample size increases, all methods can almost correctly select all variables. A comparison reveals that the exponential square loss, combined with the adaptive lasso penalty, achieves higher accuracy. (2) The results in Figure 6b indicate that ignoring the spatial effect of the error term introduces significant disturbance to variable selection. The correctness of variable selection for all methods drops below half, and even with an increase in sample size, the improvement in variable selection is limited. However, through comparison, it is found that the exponential square loss exhibits the best robustness. (3) Combining with Table A5, it is observed that among the three loss functions, the SEM model with the exponential loss function has lower MedSE compared to the other two loss functions.
Based on the variable selection structure when the number of non-significant covariates approaches the sample size, we plotted Figure 7. Due to the significant differences in variable selection results for different values of q (30 and 300), it is challenging to display them well in the graph. Therefore, we compared the variable selection differences for each method with or without considering the spatial effect of the error term. From Figure 7, it is evident that not considering the spatial effect of the error term significantly reduces the correct selection of variables. Table A6 presents the detailed results. Comparing with the results in Table A5, it is observed that when the sample size is small, using both lasso and adaptive lasso penalties with all three loss functions may lead to a slightly higher number of incorrect identifications of zero coefficients. However, as the sample size increases, the error rate decreases to zero. Importantly, the combination of the exponential square loss function with lasso and adaptive lasso penalties performs better in variable selection, ensuring smaller MedSE even when identifying more zero coefficients correctly. Similarly, ignoring the spatial effect of the error term leads to an increase in the error rate in variable selection for the SEM model, accompanied by an increase in MedSE.

4.6. Variable Selection with Regularizer on Outlier Data

In order to compare the estimation and variable selection performance of various variable selection methods when data contain outliers, in this section, we conducted numerical simulations with response variable observations containing outliers and an inaccurate spatial weight matrix.
Table 1 presents the variable selection results of the SEM model when response variable observations contain outliers. We included data for sample sizes of 30 and 300, with δ = 0.01 . It can be observed that in almost all test cases, the SEM model with the exponential square loss and lasso or adaptive lasso penalties identifies more true zero coefficients and has a lower MedSE. Compared to the variable selection results for normal data in Table A5, the superiority of the exponential square loss with lasso and adaptive lasso penalties is more pronounced in the presence of outliers. It achieves higher accuracy in identifying zero coefficients and has lower MedSE. This indicates that the exponential square loss exhibits stronger robustness when outliers are present in Y. By observing the results for ω = 0 and ω = 0.6 , we can see that, similar to the results in Table A5 and Table A6, ignoring the spatial dependency in the error term often leads to a reduction in the number of correctly identified zero coefficients in the SEM model, decreased accuracy in identification, and an increase in MedSE.
Table 2 presents the variable selection results in the case of inaccurate W. Because in practical applications it can be challenging to estimate the spatial weight matrix, this simulation provides valuable insights. In all test scenarios, the SEM model with the exponential square loss and lasso or adaptive lasso penalties achieves higher accuracy in identifying zero coefficients. In cases where 30 % of the non-zero values in W are removed and 50 % of non-zero values are added to W, the superiority of the proposed method becomes more apparent. It can correctly identify more zero coefficients while keeping MedSE smaller. In the presence of an inaccurate weight matrix, the impact of the spatial effects on the variable selection is similar to the results in Table A5 and Table A6, as well as Table 1. When ω = 0 , the SEM model incorrectly identifies a larger number of zero coefficients, but when ω = 0.6 , the number of incorrectly identified zero coefficients is reduced to zero. These results indicate that the proposed SEM variable selection method with the exponential square loss and lasso or adaptive lasso penalties is robust and suitable for variable selection in spatially dependent data, even when the weight matrix W cannot be accurately estimated.

5. Empirical Data Verification

In this section, we apply the SEM variable selection method proposed in this paper to an empirical dataset to verify the performance of parameter estimation and variable selection. The purpose of this experiment is to validate whether the proposed method can correctly select covariates that are useful for the response variable, following Burnham and Anderson (2002) [19]. We choose the BIC (Bayesian Information Criterion) as the criterion to assess the performance of the method. This paper uses the Boston Housing dataset, which was initially created by Harrison and Rubinfeld (1978) [20]. The data come from the real estate market in the Boston area in the 1970s and include information about housing prices in Massachusetts, USA. This dataset can be obtained from the "spdep" library in R.
The Boston Housing dataset consists of 506 samples, each with 13 features and one response variable, making a total of 14 columns. Table 3 describes the meanings of the various features in this dataset. The main objective is to study the relationship between housing prices and other variables and to select the most important ones. The response variable is the logarithm of MEDV, and the predictor variables include the logarithms of DIS, RAD, and LSTAT, as well as the squares of RM and NOX. The other variables considered are CRIM, ZN, INDUS, CHAS, AGE, TAX, PTRATIO, and B-1000.
The spatial weight matrix $W$ can be calculated from the Euclidean distance based on latitude and longitude coordinates. We set a distance threshold $d_0$; when the distance $d_{ij}$ between two regions is greater than $d_0$, $w_{ij} = 0$. The weights $w_{ij}$ can be represented as:
$$d_{ij} = \sqrt{(LATT_i - LATT_j)^2 + (LONG_i - LONG_j)^2}, \qquad (24)$$
$$D_{ij} = \max\left(1 - \frac{d_{ij}}{d_0},\ 0\right), \quad i \ne j, \qquad (25)$$
$$w_{ij} = \begin{cases} 0, & i = j, \\ D_{ij}\Big/\sum_{j=1,\, j\ne i}^{n} D_{ij}, & i \ne j. \end{cases} \qquad (26)$$
Then, normalize each row of the spatial weight matrix W and incorporate it into the SEM model.
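A minimal sketch of the distance-based weight construction (24)–(26) followed by row normalization (Python; `lat`, `lon`, and the threshold `d0` are placeholders):

```python
import numpy as np

def distance_weights(lat, lon, d0):
    """Row-normalized spatial weights: w_ij proportional to max(1 - d_ij/d0, 0), zero diagonal."""
    coords = np.column_stack([lat, lon])
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))  # pairwise d_ij
    D = np.maximum(1.0 - d / d0, 0.0)
    np.fill_diagonal(D, 0.0)                                                    # w_ii = 0
    row_sums = D.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0                                               # guard isolated units
    return D / row_sums
```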
Table 4 presents the estimates and variable selection results for the SEM with the three loss functions, with lasso or adaptive lasso penalties, and with no penalty. The results indicate that the SEM with all three loss functions provides estimates for $\rho$ and $\sigma^2$ of around 0.5 and 0.2, respectively. For variable selection, we consider a variable important when the absolute value of its coefficient estimate is greater than 0.05, and unimportant when the absolute value of its coefficient estimate is less than 0.005. It can be observed that, in the absence of regularization terms, all three loss functions estimate the coefficients of NOX, DIS, RAD, and LSTAT to be greater than 0.05 in absolute value, classifying them as important variables. Moreover, all models suggest a negative correlation between NOX, DIS, LSTAT, and MEDV, indicating that higher nitrogen oxide concentrations, greater distances to employment centers, and a higher percentage of lower-income population are associated with lower housing prices. On the other hand, RAD is positively correlated with MEDV, implying that a higher highway accessibility index leads to higher housing prices in the area. However, INDUS, CHAS, and AGE are deemed unimportant variables, as their coefficient estimates have absolute values less than 0.005. ZN, TAX, and B-1000 have coefficient estimates close to zero. These findings suggest that the proposed method with the exponential squared loss is effective in variable selection.
To visually compare the variable selection performance with penalty terms for the three loss functions, we marked the variable selection results of the SEM model with “ + ” and “−” symbols. We set a threshold of 0.005, and variables with coefficients greater than 0.005 were marked with “ + ”, while those with coefficients less than −0.005 were marked with “−”. As shown in Table 5, the SEM models with the exponential square loss and adaptive lasso selected fewer variables. Furthermore, compared to the other two loss functions (with and without lasso penalties), the SEM model using adaptive lasso and exponential square loss has the lowest BIC index. This clearly demonstrates the superiority of the variable selection method proposed in this paper.

6. Conclusions and Discussions

The paper proposes a robust variable selection method for the spatial error model with the exponential squared loss and the adaptive lasso penalty. This method can simultaneously select variables and estimate unknown coefficients. The main conclusions of this study are as follows:
  • The penalized exponential squared loss effectively selects non-zero coefficients of covariates. When there is noise in the observations and an inaccurate spatial weight matrix, the proposed method shows significant resistance to their impact, demonstrating good robustness.
  • The block coordinate descent (BCD) algorithm proposed in this work is effective in optimizing the penalized exponential squared loss function.
  • In numerical simulation experiments and empirical applications, variable selection results with lasso and adaptive lasso penalties were compared, and adaptive lasso consistently outperformed in various scenarios.
  • It is noteworthy that ignoring the spatial effects of error terms (i.e., $\omega = 0$) severely reduces the accuracy of this variable selection method. However, for general error models (when $\rho = 0$, $\omega \ne 0$), the proposed method remains applicable.
In the field of spatial econometrics, there are many other spatial regression models, as well as numerous robust loss functions and penalties. Building on the foundation of this study, we plan to explore more related issues in the future.

Author Contributions

Conceptualization, Y.S. and F.Z.; methodology, S.M.; software, S.M.; validation, Y.S.; formal analysis, Y.H.; investigation, F.Z.; resources, S.M.; writing—original draft preparation, S.M.; writing—review and editing, S.M., Y.H., Y.S. and F.Z.; supervision, Y.S.; project administration, S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A), National Key Research and Development Program (2021YFA1000102) of China, and Shandong Provincial Natural Science Foundation of China (ZR2021MA025, ZR2021MA028).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are cited within the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A provides specific data results for multiple scenarios in the numerical simulation experiments of Section 4 in this paper.
Table A1. Unregularized estimation on normal data ( q = 5 ).
         |     n = 30, q = 5     |    n = 150, q = 5     |    n = 300, q = 5
         |  Exp   Square  LAD    |  Exp   Square  LAD    |  Exp   Square  LAD
ρ = 0.8, ω = 0.6
β1       | 4.171  3.072  3.098   | 3.254  3.123  3.084   | 3.078  3.146  3.074
β2       | 2.259  2.012  2.312   | 1.974  2.109  2.171   | 2.138  2.086  2.061
β3       | 1.118  1.659  1.834   | 1.672  1.678  1.702   | 1.579  1.617  1.708
ρ^       | 0.742  0.786  0.734   | 0.797  0.796  0.788   | 0.792  0.798  0.795
σ^2      | 1.300  0.804  1.808   | 1.164  1.040  1.105   | 1.121  1.081  1.094
MedSE    | 1.703  0.764  1.695   | 0.224  0.371  0.458   | 0.236  0.245  0.297
ρ = 0.5, ω = 0.6
β1       | 3.114  2.939  2.898   | 3.094  2.987  2.911   | 2.935  3.042  3.024
β2       | 2.011  2.157  1.964   | 1.891  2.020  2.059   | 2.052  2.003  2.022
β3       | 1.646  1.622  1.739   | 1.607  1.578  1.580   | 1.542  1.565  1.629
ρ^       | 0.507  0.502  0.500   | 0.510  0.510  0.500   | 0.509  0.509  0.506
σ^2      | 0.557  0.735  0.947   | 1.098  0.981  0.966   | 1.066  1.030  1.045
MedSE    | 0.398  0.703  0.964   | 0.176  0.290  0.357   | 0.140  0.188  0.244
ρ = 0, ω = 0.6
β1       | 2.910  2.957  2.922   | 3.082  3.000  2.912   | 2.942  3.029  3.027
β2       | 2.004  2.156  1.931   | 1.891  2.050  2.063   | 2.056  2.012  2.047
β3       | 1.885  1.632  1.711   | 1.609  1.594  1.620   | 1.544  1.572  1.612
ρ^       | 0.058  0.016  0.054   | 0.006  0.002  0.007   | 0.000  0.000  0.007
σ^2      | 0.581  0.740  0.938   | 1.107  0.979  0.974   | 1.070  1.031  1.047
MedSE    | 0.517  0.648  0.962   | 0.185  0.293  0.372   | 0.139  0.185  0.242
ρ = 0.8, ω = 0
β1       | 4.294  2.237  2.721   | 2.528  2.567  1.751   | 1.212  1.820  1.942
β2       | 0.096  2.414  1.975   | 1.233  1.381  1.673   | −0.259  1.257  1.873
β3       | 2.146  1.226  1.898   | 0.244  1.457  1.447   | 1.089  1.216  0.794
ρ^       | 0.911  0.964  0.929   | 0.987  0.991  0.999   | 1.000  0.995  1.000
σ^2      | 13.450  7.520  5.206  | 55.573  15.911  31.349  | 32.997  67.428  15.859
MedSE    | 3.484  4.041  8.717   | 2.598  3.896  5.733   | 4.231  5.101  6.523
ρ = 0.5, ω = 0
β1       | 3.314  2.475  2.978   | 2.623  2.747  2.168   | 1.433  2.119  1.942
β2       | 0.282  2.319  1.865   | 1.418  1.429  1.963   | −0.407  1.558  1.298
β3       | 2.693  1.259  1.840   | 0.275  1.595  1.636   | 1.209  1.394  1.018
ρ^       | 0.860  0.921  0.832   | 0.955  0.953  0.996   | 0.984  0.964  1.000
σ^2      | 12.508  7.199  4.851  | 6.237  20.207  3.899  | 18.063  40.884  25.945
MedSE    | 3.163  3.674  7.766   | 2.449  4.086  5.785   | 4.624  5.503  6.422
ρ = 0, ω = 0
β1       | 3.943  2.692  3.217   | 3.176  3.330  2.444   | 1.827  2.874  1.896
β2       | 0.362  2.673  1.921   | 1.623  1.718  2.273   | −0.573  1.892  1.911
β3       | 2.994  1.392  1.646   | 0.314  1.997  1.623   | 1.401  1.618  1.010
ρ^       | 0.595  0.695  0.500   | 0.785  0.795  0.883   | 0.866  0.812  0.931
σ^2      | 17.035  37.543  42.054  | 28.235  29.141  58.824  | 13.277  12.483  13.397
MedSE    | 3.593  4.421  7.071   | 2.809  4.954  6.589   | 5.417  6.489  9.667
Table A2. Unregularized estimation on high-dimensional data ( q > 5 ).
         |    n = 30, q = 20     |   n = 150, q = 100    |   n = 300, q = 200
         |  Exp   Square  LAD    |  Exp   Square  LAD    |  Exp   Square  LAD
ρ = 0.8, ω = 0.6
β1       | 1.099  3.608  4.362   | 3.558  3.666  4.079   | 4.763  3.673  4.484
β2       | 1.237  2.772  2.246   | 4.340  2.275  2.726   | 1.996  2.440  2.662
β3       | −0.108  1.765  2.791  | 1.618  1.829  2.114   | 1.889  1.959  2.162
ρ^       | 0.510  0.598  0.500   | 0.532  0.679  0.500   | 0.545  0.674  0.500
σ^2      | 45.621  1.288  6.188  | 8.776  1.492  11.372  | 7.246  1.937  10.486
MedSE    | 11.084  7.086  12.913 | 9.651  4.764  9.521   | 9.006  4.919  9.523
ρ = 0.5, ω = 0.6
β1       | 1.088  2.958  3.044   | 3.127  3.033  3.005   | 3.086  2.962  3.037
β2       | 1.500  1.958  1.849   | 2.023  1.999  1.959   | 2.173  1.984  1.976
β3       | 1.112  1.753  1.695   | 1.312  1.618  1.658   | 1.225  1.615  1.596
ρ^       | 0.505  0.502  0.500   | 0.502  0.505  0.500   | 0.509  0.504  0.500
σ^2      | 19.757  0.183  0.557  | 0.587  0.335  0.723   | 0.410  0.353  0.704
MedSE    | 6.721  2.618  2.710   | 2.043  1.985  2.540   | 2.176  1.930  2.357
ρ = 0.2, ω = 0.6
β1       | 2.530  2.940  3.104   | 3.187  2.946  2.956   | 2.851  2.925  2.858
β2       | 1.599  1.928  1.729   | 1.665  2.029  1.839   | 2.298  1.928  1.960
β3       | 1.964  1.804  1.844   | 1.250  1.546  1.485   | 1.246  1.568  1.517
ρ^       | 0.492  0.401  0.500   | 0.430  0.321  0.500   | 0.408  0.321  0.500
σ^2      | 0.229  0.239  1.066   | 0.986  0.383  1.295   | 0.507  0.404  1.469
MedSE    | 5.446  3.522  4.221   | 2.856  2.264  3.505   | 2.477  2.228  3.432
ρ = 0, ω = 0.6
β1       | 0.731  2.994  2.911   | 3.556  3.025  3.083   | 2.995  2.977  2.896
β2       | 1.535  1.908  1.939   | 1.641  2.073  1.861   | 2.320  1.990  2.067
β3       | 2.216  1.834  2.009   | 1.102  1.575  1.596   | 1.257  1.607  1.486
ρ^       | 0.476  0.164  0.500   | 0.336  0.109  0.500   | 0.148  0.083  0.500
σ^2      | 3.716  0.259  1.777   | 2.420  0.371  2.517   | 0.471  0.389  2.854
MedSE    | 9.766  3.359  5.954   | 4.412  2.107  4.720   | 2.360  2.082  4.605
ρ = 0.8, ω = 0
β1       | −4.722  2.551  6.874  | −2.831  3.651  7.391  | −0.018  2.533  4.660
β2       | 8.293  2.333  4.512   | −2.847  2.325  2.775  | 0.048  2.883  0.733
β3       | 3.117  1.760  2.022   | 4.418  0.425  6.648   | −0.110  1.959  5.006
ρ^       | 0.508  0.821  0.500   | 0.717  0.898  0.500   | 0.833  0.903  0.500
σ^2      | 27.293  20.282  24.512  | 17.575  19.964  22.157  | 9.306  13.666  16.381
MedSE    | 39.686  27.898  67.610  | 35.056  35.448  31.470  | 14.067  13.665  25.497
ρ = 0.5, ω = 0
β1       | −0.634  2.694  3.102  | −2.223  3.472  4.273  | −0.018  2.608  2.457
β2       | 3.128  1.900  3.563   | −3.418  1.873  0.452  | 0.048  2.129  1.651
β3       | 1.783  2.053  2.287   | 7.501  0.888  4.250   | −0.110  1.498  3.339
ρ^       | 0.506  0.767  0.500   | 0.674  0.858  0.500   | 0.604  0.868  0.500
σ^2      | 32.226  29.173  27.807  | 12.572  14.280  15.795  | 9.519  14.606  13.914
MedSE    | 15.091  19.440  28.257  | 23.328  24.803  57.961  | 4.067  19.192  10.382
ρ = 0.2, ω = 0
β1       | 0.333  2.403  3.076   | −1.754  3.674  3.566  | −0.018  2.733  3.423
β2       | 2.118  1.904  3.280   | −3.263  2.101  0.280  | 0.048  2.335  1.651
β3       | 1.454  2.556  2.786   | 8.025  1.190  3.249   | −0.110  1.360  3.278
ρ^       | 0.499  0.655  0.500   | 0.603  0.801  0.500   | 0.390  0.805  0.500
σ^2      | 20.537  17.192  33.633  | 21.775  23.257  26.938  | 25.876  15.694  84.560
MedSE    | 10.570  18.943  18.371  | 24.953  26.615  14.682  | 14.067  14.201  18.278
ρ = 0, ω = 0
β1       | 0.271  2.498  3.162   | −1.888  3.922  3.214  | −0.018  2.932  4.047
β2       | 1.934  1.877  1.642   | −2.505  2.193  0.938  | 0.048  2.427  1.655
β3       | 1.054  2.561  1.787   | 3.895  1.224  3.179   | −0.110  1.403  1.953
ρ^       | 0.497  0.546  0.500   | 0.545  0.690  0.500   | 0.382  0.714  0.500
σ^2      | 25.288  21.138  30.855  | 25.724  62.474  21.793  | 52.331  16.704  7.310
MedSE    | 10.854  19.911  17.329  | 14.779  18.196  22.821  | 14.067  11.641  18.816
Table A3. Unregularized estimation when the observed values of y have outliers.
         |     n = 30, q = 5     |    n = 150, q = 5     |    n = 300, q = 5
         |  Exp   Square  LAD    |  Exp   Square  LAD    |  Exp   Square  LAD
ρ = 0.8, ω = 0.6
β1       | 4.053  3.151  3.030   | 3.321  3.111  3.131   | 3.048  3.145  3.161
β2       | 2.184  2.032  2.203   | 1.929  2.105  2.104   | 2.111  2.083  2.102
β3       | 1.146  1.726  1.790   | 1.661  1.701  1.642   | 1.592  1.700  1.710
ρ^       | 0.730  0.768  0.736   | 0.781  0.784  0.779   | 0.779  0.786  0.775
σ^2      | 1.228  0.953  1.509   | 1.362  1.126  1.265   | 1.214  1.174  1.225
MedSE    | 1.605  0.945  1.272   | 0.280  0.394  0.537   | 0.228  0.270  0.329
ρ = 0.5, ω = 0.6
β1       | 3.261  2.997  3.053   | 3.126  2.942  3.029   | 2.926  2.960  2.978
β2       | 2.001  1.957  2.074   | 1.877  2.023  2.010   | 2.042  1.997  1.997
β3       | 1.531  1.569  1.641   | 1.601  1.627  1.504   | 1.557  1.632  1.596
ρ^       | 0.513  0.520  0.500   | 0.525  0.512  0.510   | 0.529  0.524  0.531
σ^2      | 0.751  0.837  0.975   | 1.328  1.072  1.170   | 1.184  1.119  1.185
MedSE    | 4.456  4.861  4.920   | 3.173  3.293  4.445   | 1.163  2.204  3.278
ρ = 0.8, ω = 0
β1       | 3.103  2.376  3.016   | 2.991  2.348  2.567   | 1.889  2.577  1.983
β2       | 2.795  1.830  1.158   | 1.848  1.209  2.053   | 1.386  1.406  2.538
β3       | 0.701  1.062  0.921   | 0.067  1.317  0.839   | 1.237  1.594  1.130
ρ^       | 0.821  0.983  0.922   | 0.988  0.991  0.993   | 0.999  0.993  1.000
σ^2      | 22.181  24.467  26.215  | 12.463  20.053  24.074  | 4.611  3.030  4.237
MedSE    | 5.652  3.988  8.597   | 3.345  3.917  6.431   | 1.427  3.925  6.045
Table A4. Unregularized estimation with noisy W.
         |     n = 30, q = 5     |    n = 150, q = 5     |    n = 300, q = 5
         |  Exp   Square  LAD    |  Exp   Square  LAD    |  Exp   Square  LAD
Remove 30%, ρ = 0.5, ω = 0.6
β1       | 2.354  2.881  3.018   | 3.018  3.122  3.127   | 3.037  3.090  3.127
β2       | 2.987  2.073  1.746   | 1.912  2.077  1.971   | 2.202  2.009  2.021
β3       | 1.239  1.620  1.547   | 1.791  1.698  1.552   | 1.529  1.693  1.656
ρ^       | 0.461  0.436  0.499   | 0.417  0.387  0.410   | 0.385  0.399  0.396
σ^2      | 1.457  1.266  1.811   | 1.340  1.313  1.362   | 1.367  1.377  1.332
MedSE    | 1.473  0.984  1.323   | 0.417  0.447  0.575   | 0.272  0.353  0.386
Remove 30%, ρ = 0.5, ω = 0
β1       | 2.950  2.223  2.784   | 2.058  2.485  2.845   | 1.364  2.763  2.652
β2       | 1.032  2.676  2.846   | 2.162  1.694  1.471   | 0.513  1.640  1.773
β3       | 1.381  1.138  0.034   | −0.095  1.204  2.104  | 1.350  1.173  1.009
ρ^       | 0.880  0.860  0.744   | 0.893  0.927  0.907   | 0.938  0.933  0.933
σ^2      | 16.184  21.415  28.473  | 19.800  33.276  36.677  | 15.618  23.684  29.837
MedSE    | 3.155  4.693  9.478   | 2.925  5.896  8.407   | 3.934  7.624  9.886
Add 50%, ρ = 0.5, ω = 0.6
β1       | 4.936  3.286  3.396   | 3.649  3.266  3.365   | 3.334  3.379  3.279
β2       | 1.816  2.287  2.220   | 2.120  2.220  2.092   | 2.103  2.167  2.210
β3       | 0.613  1.907  1.653   | 1.990  1.756  1.779   | 1.729  1.772  1.896
ρ^       | 0.381  0.425  0.500   | 0.364  0.415  0.443   | 0.408  0.419  0.430
σ^2      | 2.532  1.972  2.801   | 1.963  1.540  1.640   | 1.436  1.511  1.589
MedSE    | 2.799  1.662  2.170   | 0.824  0.852  0.927   | 0.535  0.637  0.673
Table A5. Variable selection with regularizer on normal data ( q = 5 ).
          |                    n = 30, q = 5                     |                    n = 300, q = 5
          | Exp+l1  Exp+l̃1  S+l1  S+l̃1  Lad+l1  Lad+l̃1          | Exp+l1  Exp+l̃1  S+l1  S+l̃1  Lad+l1  Lad+l̃1
ρ = 0.8, ω = 0.6
Correct   | 1.80  2.17  2.20  2.10  1.57  1.77                   | 5.00  5.00  4.90  4.87  4.87  4.57
Incorrect | 0.00  0.00  0.00  0.00  0.03  0.00                   | 0.00  0.00  0.00  0.00  0.00  0.00
MedSE     | 1.50  1.48  1.01  0.85  1.41  1.43                   | 0.23  0.22  0.27  0.25  0.30  0.33
ρ = 0.5, ω = 0.6
Correct   | 4.17  4.87  2.90  2.40  2.57  2.70                   | 5.00  5.00  5.00  4.90  4.80  4.83
Incorrect | 0.00  0.00  0.00  0.00  0.00  0.00                   | 0.00  0.00  0.00  0.00  0.00  0.00
MedSE     | 0.38  0.25  0.60  0.71  0.82  0.74                   | 0.13  0.11  0.24  0.22  0.24  0.28
ρ = 0, ω = 0.6
Correct   | 4.03  5.00  3.10  2.37  1.83  2.17                   | 5.00  5.00  5.00  4.90  4.80  4.83
Incorrect | 0.00  0.00  0.00  0.00  0.00  0.00                   | 0.00  0.00  0.00  0.00  0.00  0.00
MedSE     | 0.44  0.16  0.60  0.77  1.00  0.98                   | 0.13  0.11  0.23  0.20  0.28  0.26
ρ = 0.8, ω = 0
Correct   | 1.57  3.03  0.60  0.40  0.40  0.67                   | 3.07  3.00  0.47  0.33  0.47  0.47
Incorrect | 0.67  1.00  0.10  0.13  0.07  0.27                   | 1.73  1.67  0.07  0.13  0.27  0.27
MedSE     | 3.39  3.72  3.65  3.92  6.13  5.26                   | 3.89  3.90  5.36  5.92  5.54  6.81
ρ = 0.5, ω = 0
Correct   | 0.90  3.00  0.50  0.20  0.37  0.77                   | 2.10  2.00  0.43  0.20  0.57  0.67
Incorrect | 0.00  0.00  0.10  0.10  0.10  0.17                   | 1.07  1.00  0.03  0.00  0.17  0.27
MedSE     | 2.44  2.71  3.87  4.13  4.31  4.51                   | 4.19  4.34  5.77  6.32  5.44  6.21
ρ = 0, ω = 0
Correct   | 1.00  2.97  0.50  0.33  0.63  0.87                   | 1.97  2.00  0.40  0.33  0.30  0.40
Incorrect | 0.00  0.00  0.13  0.17  0.10  0.17                   | 0.97  1.00  0.10  0.03  0.13  0.17
MedSE     | 2.63  3.21  4.36  4.50  4.36  4.67                   | 4.65  4.94  6.77  7.52  7.04  7.67
Table A6. Variable selection with regularizer on high-dimensional data ( q > 5 ).
          |                    n = 30, q = 20                    |                    n = 300, q = 200
          | Exp+l1  Exp+l̃1  S+l1  S+l̃1  Lad+l1  Lad+l̃1          | Exp+l1  Exp+l̃1  S+l1  S+l̃1  Lad+l1  Lad+l̃1
ρ = 0.8, ω = 0.6
Correct   | 3.4  4.9  1.2  2.7  4.5  5.8                         | 94.5  135.0  89.1  89.2  96.6  103.0
Incorrect | 0.2  0.5  0.0  0.1  0.0  0.1                         | 0.0  0.0  0.0  0.0  0.0  0.0
MedSE     | 7.4  7.2  9.0  7.0  3.4  3.8                         | 4.3  2.8  4.9  5.1  4.2  4.3
ρ = 0.5, ω = 0.6
Correct   | 5.2  8.3  4.8  5.3  10.3  11.6                       | 178.0  198.0  170.0  166.0  188.0  191.0
Incorrect | 0.2  0.2  0.0  0.0  0.0  0.0                         | 0.0  0.0  0.0  0.0  0.0  0.0
MedSE     | 4.8  2.2  3.1  2.5  1.5  1.4                         | 1.7  0.9  2.0  2.1  1.5  1.4
ρ = 0, ω = 0.6
Correct   | 4.8  6.1  4.2  4.3  6.9  8.2                         | 180.2  200.0  165.3  158.4  153.8  156.6
Incorrect | 0.1  0.1  0.1  0.0  0.0  0.0                         | 0.0  0.0  0.0  0.0  0.0  0.0
MedSE     | 5.6  4.3  3.6  3.5  2.1  2.1                         | 1.8  0.9  2.1  2.2  2.4  2.3
ρ = 0.8, ω = 0
Correct   | 5.5  8.0  0.7  0.6  1.1  2.4                         | 20.0  20.0  7.7  6.6  4.1  10.1
Incorrect | 1.1  1.6  0.0  0.0  0.1  0.2                         | 3.0  3.0  0.1  0.0  0.1  0.0
MedSE     | 18.9  20.4  23.7  25.9  18.4  13.7                   | 4.1  4.1  94.6  80.2  145.0  155.2
ρ = 0.5, ω = 0
Correct   | 3.6  4.7  0.8  0.9  2.1  3.4                         | 19.6  20.0  9.8  10.5  9.6  13.7
Incorrect | 0.2  0.8  0.1  0.0  0.1  0.2                         | 2.9  3.0  0.1  0.1  0.0  0.1
MedSE     | 11.3  12.3  14.8  18.9  8.3  7.8                     | 4.1  4.1  6.3  5.3  6.2  5.6
ρ = 0, ω = 0
Correct   | 3.8  4.8  0.9  1.5  2.6  3.8                         | 19.6  20.0  9.2  8.7  12.6  14.8
Incorrect | 0.1  0.5  0.3  0.1  0.1  0.2                         | 2.9  3.0  0.1  0.0  0.1  0.1
MedSE     | 8.6  8.9  14.8  18.9  6.4  6.8                       | 4.1  4.1  6.4  5.9  4.3  3.6

Appendix B

Appendix B provides the derivation of the computational complexity of the BCD algorithm designed in Section 3, along with the specifications of the machine used to execute the program designed in this paper. This includes relevant hardware configurations, the operating system, and the programming software employed.
According to Beck and Teboulle (2009) [17], the FISTA algorithm converges to the optimal solution at a rate of $O(1/k^2)$ when solving problem (22), where $k$ is the iteration step. Since the termination conditions of Algorithm 1 are $\left\|[\beta^k, \rho^k, \omega^k]^T - [\beta^{k+1}, \rho^{k+1}, \omega^{k+1}]^T\right\| \le \epsilon_1$ or $\left|L(\beta^k, \rho^k, \omega^k) - L(\beta^{k+1}, \rho^{k+1}, \omega^{k+1})\right| \le \epsilon_2$, obtaining an $\epsilon$-optimal solution requires $O(1/\sqrt{\epsilon})$ FISTA iterations, where each iteration computes the gradient $\nabla\psi(\beta)$ of the smooth part of (23). As $O(np)$ operations are needed to compute $\nabla\psi(\beta)$, the computation required for an $\epsilon$-optimal solution of subproblem (22) is $O(np/\sqrt{\epsilon})$. We assume that the BCD algorithm converges within a fixed number of iterations, and that in each iteration the CCCP algorithm terminates after at most $ite_{CCCP}$ iterations. The total computational complexity of the algorithm is then $O(ite_{CCCP}\cdot np/\sqrt{\epsilon})$.
The computer used in this experiment is equipped with an Intel Core i7 processor, 16 GB of RAM, and a 1 TB hard drive, running on a Windows 10 workstation. We implemented the designed algorithm using MATLAB R2020a and subsequently visualized the results.

References

  1. Anselin, L. Spatial Econometrics: Methods and Models; Springer Science and Business Media: Berlin/Heidelberg, Germany, 1988. [Google Scholar]
  2. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
  3. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  4. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar] [CrossRef]
  5. Huber, P.J. Robust Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2004. [Google Scholar]
  6. Wang, X.; Jiang, Y.; Huang, M.; Zhang, H. Robust variable selection with exponential squared loss. J. Am. Stat. Assoc. 2013, 108, 632–643. [Google Scholar] [CrossRef] [PubMed]
  7. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econom. J. Econom. Soc. 1978, 46, 33–50. [Google Scholar] [CrossRef]
  8. Zou, H.; Yuan, M. Composite quantile regression and the oracle model selection theory. Ann. Statist. 2008, 36, 1108–1126. [Google Scholar] [CrossRef]
  9. Liu, X.; Ma, H.; Deng, S. Variable Selection for Spatial Error Models. J. Yanbian Univ. Nat. Sci. Ed. 2020, 46, 15–19. [Google Scholar]
  10. Doğan, O. Modified harmonic mean method for spatial autoregressive models. Econ. Lett. 2023, 223, 110978. [Google Scholar] [CrossRef]
  11. Song, Y.; Liang, X.; Zhu, Y.; Lin, L. Robust variable selection with exponential squared loss for the spatial autoregressive model. Comput. Stat. Data Anal. 2021, 155, 107094. [Google Scholar] [CrossRef]
  12. Banerjee, S.; Carlin, B.P.; Gelfand, A.E. Hierarchical Modeling and Analysis for Spatial Data; CRC Press: Boca Raton, FL, USA, 2014. [Google Scholar]
  13. Ma, Y.; Pan, R.; Zou, T.; Wang, H. A naive least squares method for spatial autoregression with covariates. Stat. Sin. 2020, 30, 653–672. [Google Scholar] [CrossRef]
  14. Wang, H.; Li, G.; Jiang, G. Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J. Bus. Econ. Stat. 2007, 25, 347–355. [Google Scholar] [CrossRef]
  15. Forsythe, G.E. Computer Methods for Mathematical Computations; Prentice-Hall: Hoboken, NJ, USA, 1977. [Google Scholar]
  16. Yuille, A.L.; Rangarajan, A. The concave-convex procedure. Neural Comput. 2003, 15, 915–936. [Google Scholar] [CrossRef] [PubMed]
  17. Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
  18. Liang, H.; Li, R. Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 2009, 104, 234–248. [Google Scholar] [CrossRef] [PubMed]
  19. Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: Berlin/Heidelberg, Germany, 2004; Volume 2. [Google Scholar]
  20. Harrison, D., Jr.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102. [Google Scholar] [CrossRef]
Figure 1. Unregularized Estimation on Gaussian Noise Data ( q = 5 ). (a) Estimates of the four unknown parameters in the SEM model based on the exponential squared loss function. (b) Estimates for the four unknown parameters based on the squared loss function. (c) Estimates for the unknown parameters based on the absolute value loss function.
Figure 2. The MedSE of the three unregularized loss function estimates.
Figure 3. Unregularized estimation on Gaussian noise data ( q > 5 ). (a) Estimates of the four unknown parameters in the SEM model based on the exponential squared loss function. (b) Estimates for the four unknown parameters based on the squared loss function. (c) Estimates for the unknown parameters based on the absolute value loss function.
Figure 4. Unregularized estimation when the observed values of y have outliers. (a) The estimation results of the unregularized exponential squared loss function. (b) The MedSE of the estimates for the three loss functions.
Figure 5. The variation of MedSE with changes in sample size.
Figure 6. The results of regularization variable selection based on normal data ( q = 5 ). (a) Variable selection results for each method when ω = 0.6 . (b) Variable selection results for each method when ω = 0 .
Figure 7. The results of regularization variable selection based on normal data ( q > 5 ). (a) Variable selection results for each method when q = 20 . (b) Variable selection results for each method when q = 200 .
Table 1. Regularized variable selection when the observed values of y contain outliers.
n = 30, q = 5 (first six columns); n = 300, q = 5 (last six columns)
δ = 0.01 | Exp + l1 | Exp + l ˜ 1 | S + l1 | S + l ˜ 1 | Lad + l1 | Lad + l ˜ 1 | Exp + l1 | Exp + l ˜ 1 | S + l1 | S + l ˜ 1 | Lad + l1 | Lad + l ˜ 1
ρ = 0.8 , ω = 0
Correct | 1.80 | 2.03 | 1.70 | 2.37 | 1.43 | 2.13 | 5.00 | 5.00 | 4.90 | 4.93 | 4.53 | 4.67
Incorrect | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
MedSE | 1.39 | 1.47 | 0.91 | 0.93 | 1.59 | 1.08 | 0.22 | 0.21 | 0.27 | 0.31 | 0.35 | 0.34
ρ = 0.5 , ω = 0
Correct | 0.33 | 0.23 | 0.50 | 0.53 | 0.43 | 0.57 | 3.27 | 4.07 | 0.67 | 0.47 | 0.37 | 0.83
Incorrect | 0.00 | 0.20 | 0.13 | 0.07 | 0.07 | 0.23 | 0.10 | 0.10 | 0.17 | 0.13 | 0.03 | 0.13
MedSE | 4.85 | 5.50 | 4.56 | 4.51 | 7.06 | 5.11 | 1.40 | 1.32 | 6.35 | 6.80 | 9.09 | 5.16
ρ = 0.8 , ω = 0.6
Correct | 3.67 | 4.03 | 2.40 | 2.53 | 2.77 | 2.93 | 4.97 | 5.00 | 4.97 | 4.93 | 4.63 | 4.60
Incorrect | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
MedSE | 0.44 | 0.33 | 0.76 | 0.73 | 0.83 | 0.82 | 0.16 | 0.14 | 0.20 | 0.23 | 0.30 | 0.28
Table 2. Regularized variable selection with a noisy spatial weights matrix W.
n = 30, q = 5 (first six columns); n = 300, q = 5 (last six columns)
Exp + l1 | Exp + l ˜ 1 | S + l1 | S + l ˜ 1 | Lad + l1 | Lad + l ˜ 1 | Exp + l1 | Exp + l ˜ 1 | S + l1 | S + l ˜ 1 | Lad + l1 | Lad + l ˜ 1
Remove 30%
ω = 0.6
Correct | 1.50 | 2.77 | 1.63 | 1.63 | 1.77 | 1.87 | 4.93 | 5.00 | 4.60 | 4.20 | 3.87 | 4.20
Incorrect | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
MedSE | 1.39 | 1.27 | 1.20 | 1.07 | 1.17 | 1.19 | 0.27 | 0.24 | 0.29 | 0.37 | 0.43 | 0.40
ω = 0
Correct | 0.40 | 2.20 | 0.77 | 0.53 | 0.47 | 0.67 | 1.13 | 3.00 | 0.43 | 0.53 | 0.30 | 0.40
Incorrect | 0.00 | 0.00 | 0.03 | 0.17 | 0.07 | 0.17 | 0.10 | 0.77 | 0.07 | 0.07 | 0.27 | 0.13
MedSE | 2.48 | 2.62 | 6.32 | 5.06 | 5.23 | 4.81 | 3.40 | 3.53 | 6.89 | 4.11 | 8.48 | 6.47
Add 50%
ω = 0.6
Correct | 0.40 | 0.73 | 1.37 | 1.20 | 1.03 | 1.87 | 4.63 | 5.00 | 3.47 | 3.17 | 3.20 | 3.53
Incorrect | 0.00 | 0.03 | 0.00 | 0.03 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
MedSE | 2.72 | 2.59 | 1.76 | 1.61 | 1.58 | 1.39 | 0.52 | 0.49 | 0.68 | 0.64 | 0.71 | 0.66
Table 3. Variable description.
Variable | Description
DIS | Weighted distance to five employment centers in Boston
RAD | Index of accessibility to radial highways
LSTAT | Percentage of lower-status population
RM | Average number of rooms per dwelling
NOX | Nitric oxide concentration
CRIM | Per capita crime rate by town
ZN | Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS | Proportion of non-retail business acres per town
CHAS | 1 if the tract bounds the Charles River, 0 otherwise
AGE | Proportion of owner-occupied units built before 1940
TAX | Full-value property tax rate per 10,000 dollars
PTRATIO | Pupil-teacher ratio by town
B-1000 | 1000 ( B k − 0.63 ) 2 , where B k is the proportion of Black residents by town
MEDV | Median value of owner-occupied homes
Table 4. Variable selection for the Boston dataset.
EXP (columns 1–3); Square (columns 4–6); LAD (columns 7–9)
Variable | E + l1 | E + l ˜ 1 | E + null | S + l1 | S + l ˜ 1 | S + null | L + l1 | L + l ˜ 1 | L + null
CRIM | −0.0066 | −7.8600 × 10 −4 | −0.0066 | −0.0059 | −0.0059 | −0.0070 | −0.0035 | −0.0034 | −0.0063
ZN | 3.1200 × 10 −4 | 0 | 3.3400 × 10 −4 | −4.3300 × 10 −4 | −4.0800 × 10 −4 | 3.8900 × 10 −4 | −8.0800 × 10 −4 | −8.7400 × 10 −4 | 1.5200 × 10 −4
INDUS | 0.0015 | 0 | 0.0015 | 0.0017 | 0.0017 | 0.0013 | 0.0010 | 0.0010 | 0.0018
CHAS | 0 | 0 | −0.0017 | 0.0025 | 0.0025 | 0.0048 | 1.0700 × 10 −4 | 1.1700 × 10 −4 | −0.0182
NOX | −0.1908 | 0 | −0.1745 | −2.5800 × 10 −4 | −2.9800 × 10 −4 | −0.2578 | 4.5000 × 10 −5 | 5.2000 × 10 −5 | −0.1514
RM | 0.0079 | 0.0076 | 0.0079 | 0.0146 | 0.0145 | 0.0067 | 0.0200 | 0.0199 | 0.0123
AGE | −4.5400 × 10 −4 | 0 | −4.3100 × 10 −4 | −0.0012 | −0.0012 | −2.8700 × 10 −4 | −0.0014 | −0.0014 | −0.0012
DIS | −0.1411 | −0.0105 | −0.1396 | −0.0107 | −0.0107 | −0.1571 | −5.8600 × 10 −4 | −6.3800 × 10 −4 | −0.1419
RAD | 0.0635 | 0 | 0.0639 | 0.0103 | 0.0103 | 0.0701 | 3.6200 × 10 −4 | 3.9800 × 10 −4 | 0.0484
TAX | −3.5600 × 10 −4 | −1.1700 × 10 −4 | −3.6200 × 10 −4 | −3.8000 × 10 −5 | −4.1000 × 10 −5 | −3.6400 × 10 −4 | −1.0600 × 10 −4 | −1.3500 × 10 −4 | −2.8700 × 10 −4
PTRATIO | −0.0108 | 0 | −0.0106 | −0.0171 | −0.0166 | −0.0115 | −0.0032 | −0.0032 | −0.0081
B-1000 | 3.2800 × 10 −4 | 2.1900 × 10 −4 | 3.2900 × 10 −4 | 3.9800 × 10 −4 | 4.0200 × 10 −4 | 2.8200 × 10 −4 | 6.0200 × 10 −4 | 5.6500 × 10 −4 | 4.3600 × 10 −4
LSTAT | −0.2090 | −0.1818 | −0.2110 | −0.0249 | −0.0249 | −0.2279 | −0.0014 | −0.0014 | −0.1503
ρ ^ | 0.5173 | 0.6172 | 0.5174 | 0.6177 | 0.6190 | 0.5190 | 0.5164 | 0.5139 | 0.5000
σ 2 ^ | 0.0192 | 0.0223 | 0.0192 | 0.0247 | 0.0247 | 0.0192 | 0.0294 | 0.0294 | 0.0206
BIC | −567.8321 | −491.5053 | −567.6874 | −446.1253 | −446.9024 | −564.2289 | −355.5156 | −356.8915 | −540.9918
Table 5. Variable selection for the Boston dataset after marking.
EXP (columns 1–3); Square (columns 4–6); LAD (columns 7–9)
Variable | E + l1 | E + l ˜ 1 | E + null | S + l1 | S + l ˜ 1 | S + null | L + l1 | L + l ˜ 1 | L + null
CRIM
ZN
INDUS | + |  | + | + | + | + | + | + | +
CHAS |  |  |  | + | + | + |  |  | 
NOX
RM | + | + | + | + | + | + | + | + | +
AGE
DIS
RAD | + |  | + | + | + | + |  |  | +
TAX
PTRATIO
B-1000 |  |  |  |  |  |  | + |  | 
LSTAT
count + | 3 | 1 | 3 | 4 | 4 | 4 | 3 | 2 | 3
count − | 5 | 3 | 6 | 5 | 5 | 5 | 5 | 6 | 7
count | 8 | 4 | 9 | 9 | 9 | 9 | 8 | 8 | 10
BIC | −567.83205 | −491.50529 | −567.68740 | −446.12530 | −446.90237 | −564.22889 | −355.5156 | −356.89146 | −540.99178