Article

Parameter Estimation in Spatial Autoregressive Models with Missing Data and Measurement Errors

College of Science, China University of Petroleum, Qingdao 266580, China
*
Author to whom correspondence should be addressed.
Axioms 2024, 13(5), 315; https://doi.org/10.3390/axioms13050315
Submission received: 23 January 2024 / Revised: 15 April 2024 / Accepted: 24 April 2024 / Published: 10 May 2024
(This article belongs to the Special Issue Mathematical and Statistical Methods and Their Applications)

Abstract

This study addresses the problem of parameter estimation in spatial autoregressive models with missing data and measurement errors in covariates. Specifically, a corrected likelihood estimation approach is employed to rectify the bias in the log-maximum likelihood function induced by measurement errors. Additionally, a combination of inverse probability weighting (IPW) and mean imputation is utilized to mitigate the bias caused by missing data. Under several mild conditions, it is demonstrated that the proposed estimators are consistent and possess oracle properties. The efficacy of the proposed parameter estimation process is assessed through Monte Carlo simulation studies. Finally, the applicability of the proposed method is further substantiated using the Boston Housing Dataset.

1. Introduction

Both classic linear regression models and spatial autoregressive models are used to study the linear relationship between a response variable and multiple explanatory variables. The former usually assumes that the observed values of the explained variable are independent of each other. However, in fields such as economics, biology, and meteorology, the collected data often exhibit certain spatial dependencies. Ignoring these dependencies in statistical inference can lead to significantly biased results (see Luo [1]). The latter model, in contrast, considers spatial dependencies, positing that a region’s response variable is not only related to its explanatory variables but also associated with those of neighboring regions (see Chen [2]).
Both classic linear regression models and spatial autoregressive models typically assume that (i) the values of explanatory variables are always observable or measurable (assuming no incomplete observation), and (ii) the observations or measurements are error-free (assuming no measurement errors in explanatory variables) (see Bai et al. [3]). However, these assumptions may be violated in many scientific studies and practical applications. It is well known in statistical analysis that ignoring measurement errors and missing observations in explanatory variables can lead to serious biases in estimation and large standard errors, resulting in incorrect inference on the estimated regression coefficients. Therefore, studying parameter estimation methods for spatial autoregressive models with measurement errors and missing data in explanatory variables is of great importance.
This paper mainly studies the parameter estimation issues in spatial autoregressive models with measurement errors and missing data in explanatory variables.
Many economic datasets are related to spatial locations, such as studies on Gross Domestic Product, tourism, and research and development across various provinces nationwide (see Li [4]). Spatial data introduce spatial location information (or mutual distances) to cross-sectional or panel data. Spatial data are generally recognized to have locational attributes (see Anselin [5]), assuming that variables with closer distances are more closely related. Tobler’s First Law of Geography states that everything is related to everything else, but nearby things are more related than distant things (see Tobler [6]).
Spatial econometrics was first proposed by Jean Paelinck in May 1974 at the Netherlands Statistical Conference, aiming to provide methodological foundations for econometric models of urban and regional economics. Spatial econometrics primarily deals with addressing spatial interactions (spatial autocorrelation) and spatial structures (spatial heterogeneity) in cross-sectional and panel data regression models (see Anselin [7]). The issues studied in spatial econometrics include (1) model setting; (2) parameter estimation; (3) model setting testing; and (4) spatial prediction (see Anselin [8]). Spatial autoregressive models, which incorporate spatial effects into classic regression models, are important for studying spatial autocorrelation in data.
Research on spatial autoregressive models began early. In the 1970s, Cliff and Ord [9] introduced spatial effects into traditional linear regression models, constructing spatial autoregressive models. However, because spatial dependence induces endogeneity, ordinary least-squares parameter estimates are biased and inconsistent. To address this, researchers have proposed other estimation methods that reduce or eliminate the bias caused by spatial effects and yield consistent parameter estimates. Anselin [5] first applied the concentrated likelihood function method to obtain the maximum likelihood estimator (MLE) of the model. However, MLE requires solving complex likelihood functions and is computationally expensive. Kelejian and Prucha [10] proposed the Generalized Method of Moments (GMM) for spatial autoregressive models, which is simpler to compute than MLE regardless of sample size, and also provided the asymptotic properties of this estimator for large and small samples. The fundamental idea behind the GMM is to estimate model parameters using the moment conditions within the model. Lesage [11] used Bayesian methods based on Gibbs sampling to address parameter estimation in spatial autoregressive models with heteroskedasticity. Lee [12] employed the GMM and Two-Stage Least Squares (2SLS) for spatial autoregressive models, deriving the optimal GMM and proving its consistency and asymptotic normality; 2SLS is a commonly employed method for addressing endogeneity through instrumental variables in regression analysis. When the disturbances are normally distributed, the optimal GMM and ML estimates have the same limiting distribution. Lee and Liu [13] extended the GMM in mixed spatial autoregressive models to higher-order mixed spatial autoregressive models. 
Their research showed that the GMM has computational advantages over the usual ML, and the proposed GMM estimates were proven to be consistent and asymptotically normal. Wei et al. [14] proposed a semi-parametric partially linear varying coefficient spatial autoregressive model and introduced a quasi-maximum likelihood method based on local linear methods to estimate model parameters.
The development of spatial econometrics in China started late, focusing primarily on empirical studies. For example, Ma and Zhang [15] used provincial panel data in China to analyze the impact of economic development and energy structure on haze pollution, finding a positive spatial correlation in inter-provincial haze pollution, with provinces having a higher proportion of coal consumption in their energy structure experiencing more severe haze pollution. Wang et al. [16] used Chinese city panel data to analyze the impact of high-speed rail on economic growth, finding that the opening of high-speed rail strengthens the positive spatial dependence among Chinese cities’ GDPs and has a positive spatial spillover effect on economic growth. Cheng and Dong [17] used panel data from countries along the “Belt and Road” to construct a spatial econometric model to analyze the spatial effects of trade facilitation on China’s industrial goods exports, finding a positive spatial spillover effect. In 2010, China saw its first textbook on spatial econometrics, authored by Shen [18], but this book focused more on modeling and simulation, with less emphasis on theoretical derivation and proof. Overall, domestic research on spatial autoregressive models has achieved certain results but generally exhibits a situation with more empirical studies and fewer theoretical methods.
Spatial data with measurement errors and missing data are commonly found in scientific research and practical applications. Although various techniques can reduce errors and missing data, the measurement errors and missing data sometimes reach a level that cannot be ignored. Therefore, studying spatial autoregressive models with measurement errors and missing data in explanatory variables becomes very important.
Due to spatial dependencies, the assumption of independence among explanatory variables in spatial data no longer holds. Under such circumstances, if one ignores the measurement errors and spatial dependencies present in the data and still employs traditional estimation methods (such as the least-squares method), this can lead to significant biases in the estimation results (see Li [4]). In their study of spatial data linear mixed models with measurement errors, Yi et al. [19] found that ignoring measurement errors leads to reduced regression coefficients and increased variance. To address this issue, they proposed a structural modeling method that obtains model parameters through maximum likelihood estimation while considering measurement errors and uses the EM algorithm for iterative optimization of parameters. They proved that the proposed method’s parameter estimates have good asymptotic properties, i.e., the maximum likelihood estimates are consistent and satisfy asymptotic normality. Huque et al. [20] explored the sensitivity of parameter estimation in spatial regression models when explanatory variables have measurement errors. When errors exist, parameter estimates of the model exhibit attenuation bias. They proved the bias expression of the estimator when ignoring measurement errors, showing that the bias is related to the degree of spatial correlation between explanatory variables and residuals. They also proposed two strategies for obtaining consistent parameter estimates: (1) using an estimated attenuation factor for subsequent correction and (2) linearly transforming the error-prone explanatory variables. Through simulation studies, they assessed the finite sample performance of these two methods. The results showed that both methods can provide consistent parameter estimates, but the transformation method performs better. They also illustrated this method using ischemic heart disease data. 
Zhang and Zhu [21] proved that for spatial autoregressive models, when the explained variables have measurement errors, whether or not these errors are related to the model’s disturbance terms, the commonly used maximum likelihood estimation is inconsistent. When the null hypothesis is rejected, using 2SLS can yield consistent parameter estimates. He and Hu [22] introduced measurement errors of independent variables into the classic spatial autoregressive model, establishing a univariate spatial autoregressive measurement error (USARME) model, and proposed a parameter estimation method for this model. Their research showed that if one does not consider the measurement errors of independent variables and directly uses the ordinary spatial autoregressive model, the estimated parameters exhibit significant bias. As measurement errors increase, the parameter estimation performance of the ordinary spatial autoregressive model becomes very poor, while the USARME model still achieves good estimation results. The feasibility and reliability of the proposed parameter estimation method were verified through numerical simulations. Luo [1] proposed a three-stage least-squares (3SLS) estimation method that simultaneously uses Berkson and classical types of instrumental variables. Under mild conditions, they derived the asymptotic normality of the estimators proposed for each type of instrumental variable.
In practical applications, for various reasons, some observations in datasets may be missing. For example, in pharmacological studies, due to the side effects of some drugs, some patients are unable to continue treatment and drop out mid-course, leading to missing data. Simply ignoring these missing values not only reduces the efficiency of the study but may also introduce systematic bias (see You [23]).
Missing data, classified according to the missing mechanism, can roughly be divided into three categories: Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). Among these, data that are missing at random are characterized by a missingness distribution that depends only on observed data and not on unobserved data (see Cheng [24]). This article mainly studies this type of missing data, that is, the case in which missingness depends only on observable (exogenous) explanatory variables.
Typical methods for dealing with missing data include imputation and inverse probability weighting. Yang et al. [25] proposed a missing data imputation method based on spatial clustering and spatial autoregressive models. This method first uses the DBSCAN algorithm to cluster the dataset, then establishes a spatial autoregressive model within each cluster to impute missing data. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm designed to organize points in a dataset into clusters separated by areas of varying density, with the capability to identify outliers or noise points. Experimental results with meteorological data showed that compared to cluster kernel function-based imputation and K-nearest neighbors (KNN) imputation, this method achieves more accurate imputation results. Wang and Lee [26] considered the situation of randomly missing explanatory variables in spatial autoregressive models. They proposed nonlinear least-squares estimation and the Generalized Method of Moments for the model. Additionally, they proposed an inferential two-stage least-squares estimation method. These estimation methods were analyzed and compared, revealing that the generalized nonlinear least-squares, best-generalized two-stage least-squares, and optimal moment estimation methods have the same asymptotic variance. Monte Carlo simulations showed that these methods provide consistent estimates even in the presence of unknown heteroskedastic disturbances and are more robust compared to the EM algorithm. Luo [1] studied the situation of missing response variable data in spatial autoregressive models. They proposed a two-stage least-squares estimation method based on IPW and missing data imputation (2II-2SLS). They proved the consistency and asymptotic normality of this estimator and studied its finite sample performance through simulations. 
The results showed that the performance of this estimator is superior to that of the EM estimator and the maximum likelihood estimator and that the choice of initial values has little impact on its performance.
The above-mentioned papers have greatly enhanced our understanding of parameter estimation in spatial autoregressive models with missing data and measurement errors. However, current research mainly focuses on dealing with single types of data issues in spatial autoregressive models, such as considering only missing data or only measurement errors. In contrast, there is a relative lack of research on the more complex situation where both measurement errors and missing data occur simultaneously in spatial autoregressive models. This paper investigates the parameter estimation problem in spatial autoregressive models with both measurement errors and missing data, proposing a method for parameter estimation. The main contributions of this article are as follows:
(1)
We establish a parameter estimation method for spatial autoregressive models with missing data and measurement errors, which uses a combination of corrected likelihood estimation and IPW with mean imputation to eliminate biases caused by missing data and measurement errors.
(2)
We apply the proposed method to revise and optimize traditional spatial autoregressive models. Based on this, the log-likelihood function of the modified model is presented, and explicit mathematical expressions and analyses are provided for some key parameters, offering deeper insights into the theoretical foundation and practical application of the model.
(3)
Under some mild conditions, we prove that the proposed estimators are consistent and possess oracle properties. Additionally, we conduct extensive numerical studies, which indicate that our method outperforms the alternatives considered in terms of parameter estimation.
The rest of this paper is organized as follows. In Section 2, the parameter estimation of spatial autoregression with missing data and measurement errors is considered, presenting the theoretical properties of the oracle estimator and proving its consistency and asymptotic normality. Section 3 conducts numerical comparisons and simulation studies. Section 4 illustrates the application of the proposed method through real data analysis. The proof of the technical results is provided in Appendix A.

2. Parameter Estimation in Spatial Autoregressive Models with Measurement Errors and Missing Data

2.1. Spatial Autoregressive Models

We consider the following spatial autoregressive model:
$$Y_n = X_n\beta + \lambda W_n Y_n + \varepsilon_n \tag{1}$$
where $Y_n$ is an $n \times 1$ vector of the observed values of the dependent variable; $X_n$ is an $n \times k$ matrix of the observed values of $k$ exogenous covariates; $\lambda$ is a scalar spatial autoregressive coefficient with $|\lambda| < 1$; $W_n$ is a known $n \times n$ spatial weight matrix; $\varepsilon_n$ is an $n$-dimensional vector of regression disturbances, independently and identically distributed with mean 0 and finite variance $\sigma^2$; and $\beta$ is a $k$-dimensional vector of the regression coefficients. Let $\theta_0 = (\sigma_0^2, \lambda_0, \beta_0^T)^T = (\theta_{1,0}, \theta_{2,0}, \ldots, \theta_{k+2,0})^T$ be the true parameter point. Let $S_n(\lambda) = I_n - \lambda W_n$ and $\varepsilon_n(\delta) = Y_n - X_n\beta - \lambda W_n Y_n$, where $\delta = (\lambda, \beta^T)^T$. Then, following the approach of Lee [27], the log-likelihood function of the model is given by:
$$\begin{aligned}
\ln L_n(\theta) &= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\,\varepsilon_n^T(\delta)\,\varepsilon_n(\delta) \\
&= -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\left(S_n(\lambda)Y_n - X_n\beta\right)^T\left(S_n(\lambda)Y_n - X_n\beta\right)
\end{aligned} \tag{2}$$
where $\theta = (\sigma^2, \lambda, \beta^T)^T = (\theta_1, \theta_2, \ldots, \theta_{k+2})^T$ and $\ln L_n(\theta)$ is the log-likelihood function of model (1). Let $\varepsilon_n = (e_1, e_2, \ldots, e_n)^T$, $S_n = S_n(\lambda_0)$, and $G_n = W_n S_n^{-1}$. Additionally, to ensure the large-sample properties of the quasi-maximum likelihood estimator (QMLE), some basic assumptions are listed as follows:
Assumption 1. 
The elements $e_i$, $i = 1, \ldots, n$, of $\varepsilon_n$ are independently and identically distributed, with mean $E(e_i) = 0$ and variance $\mathrm{Var}(e_i) = \sigma^2$. For some $\gamma > 0$, the moment $E|e_i|^{4+\gamma}$ exists.
Assumption 2. 
For all $i, j$, the elements $w_{n,ij}$ of $W_n$ are at most of the order of $h_n^{-1}$, denoted as $O(1/h_n)$, where the rate sequence $\{h_n\}$ can be bounded or divergent. As a normalization, $w_{n,ii} = 0$ for all $i$.
Assumption 3. 
As $n \to \infty$, $n^{-1} h_n \to 0$.
Assumption 4. 
The matrix $S_n$ is non-singular.
Assumption 5. 
The matrix sequences $\{W_n\}$ and $\{S_n^{-1}\}$ are uniformly bounded in terms of row and column sums.
Assumption 6. 
For all $n$, the elements of $X_n$ are uniformly bounded constants, and $\lim_{n\to\infty} n^{-1} X_n^T X_n$ exists and is non-singular.
Assumption 7. 
For all $\lambda$ in the compact parameter space $\Lambda$, $S_n^{-1}(\lambda)$ is uniformly bounded in terms of row and column sums. The true $\lambda_0$ lies in the interior of $\Lambda$.
Assumption 8. 
$\lim_{n\to\infty} n^{-1} (X_n, G_n X_n \beta_0)^T (X_n, G_n X_n \beta_0)$ exists and is a non-singular matrix.
Assumption 9. 
$\lim_{n\to\infty} E\left(n^{-1}\,\dfrac{\partial^2 \ln L_n(\theta_0)}{\partial\theta\,\partial\theta^T}\right)$ exists.
Assumption 10. 
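For all $\theta$ in the open set $H$ containing the true parameter point $\theta_0$, the third-order derivative $\dfrac{\partial^3 \ln L_n(\theta)}{\partial\theta_j\,\partial\theta_l\,\partial\theta_m}$ exists. Additionally, there exists a function $M_{jlm}$ such that for all $\theta \in H$, $\left| n^{-1}\,\dfrac{\partial^3 \ln L_n(\theta)}{\partial\theta_j\,\partial\theta_l\,\partial\theta_m} \right| \le M_{jlm}$, where $E(M_{jlm}) < \infty$ for all $j, l, m$.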
Assumptions 1–9 are similar to those provided by Lee [27], which are sufficient conditions for global identification and for the consistency and asymptotic normality of the QMLE for model (1). Assumption 1 is needed to apply the central limit theorem of Kelejian and Prucha [28]. Assumptions 2 and 3 characterize the weight matrix for sample size $n$. If $\{h_n\}$ is a bounded sequence, then Assumptions 2 and 3 are satisfied. In Case's model, Assumptions 2 and 3 still hold, although $h_n$ may diverge (see Case [29]). Assumption 4 is used to ensure the existence of the means and variances of the independent variables. Assumption 5 implies that when $n$ tends to infinity, the variance of $Y_n$ is bounded (see Kelejian and Prucha [28] and Lee [27]). Assumption 6 excludes multicollinearity among the regressors in $X_n$. For convenience, we assume the regressors are uniformly bounded. If not, they can be replaced with random regressors under certain finite moment conditions (see Lee [27]). Assumption 7 is needed to handle the nonlinearity of the term $\ln|S_n(\lambda)|$ in the log-likelihood function. Assumptions 8 and 9 are used for the asymptotic normality of the QMLE. Assumption 10 is similar to condition (C) in Fan and Li [30] and plays an important role in the Taylor expansion of related functions.
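As a numerical illustration of the log-likelihood in Equation (2), the following minimal sketch evaluates it with NumPy. The function name `sar_loglik`, the ring-shaped weight matrix, and the toy parameter values are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def sar_loglik(theta, Y, X, W):
    """Gaussian log-likelihood of Y = X beta + lam * W Y + eps.

    theta is ordered (sigma2, lam, beta_1, ..., beta_k), as in the text.
    """
    theta = np.asarray(theta, dtype=float)
    sigma2, lam, beta = theta[0], theta[1], theta[2:]
    n = Y.shape[0]
    S = np.eye(n) - lam * W                 # S_n(lam) = I_n - lam * W_n
    _, logabsdet = np.linalg.slogdet(S)     # ln|det S_n(lam)|
    resid = S @ Y - X @ beta                # eps_n(delta)
    return (-0.5 * n * np.log(2.0 * np.pi) - 0.5 * n * np.log(sigma2)
            + logabsdet - resid @ resid / (2.0 * sigma2))

# Toy data on a ring: each unit has two neighbours with weight 1/2 (hypothetical W).
rng = np.random.default_rng(0)
n, beta0, lam0 = 30, np.array([1.0, -0.5]), 0.4
X = rng.normal(size=(n, 2))
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + rng.normal(size=n))

ll_true = sar_loglik(np.r_[1.0, lam0, beta0], Y, X, W)
ll_no_spatial = sar_loglik(np.r_[1.0, 0.0, beta0], Y, X, W)
```

With `lam = 0` the determinant term vanishes and the expression reduces to the ordinary linear-regression Gaussian log-likelihood, which gives a simple sanity check.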

2.2. Spatial Autoregressive Model with Missing Data and Measurement Errors

When a subset of covariates has missing values, we consider model (3). Let $X_i^{(o)} \in \mathbb{R}^s$ be the vector of covariates that are always observed and $X_i^{(m)} \in \mathbb{R}^k$ be the vector of covariates that may contain some missing components. For each observation, the indicator $Q_i$ denotes whether $X_i^{(m)}$ is fully observed, i.e., if $X_i^{(m)}$ is fully observed, then $Q_i = 1$; otherwise, $Q_i = 0$. Let $v_i = X_i^{(o)T} \in \mathbb{R}^s$. As mentioned earlier, the data in this study are missing at random, and the missingness is not endogenous. This means that the probability of missing observations may depend on variables that are always observed rather than on variables that may have missing data. Formally:
$$\pi_{i0} = \Pr(Q_i = 1 \mid X_i) = \Pr(Q_i = 1 \mid X_i^{(o)}) = \Pr(Q_i = 1 \mid v_i) \tag{3}$$
In the covariate data, we assume that the dimension of the covariates $X_i^{(m)} \in \mathbb{R}^k$ with missing data and the dimension of the covariates $X_i^{(o)} \in \mathbb{R}^s$ involved in the missingness model are fixed.
Missing data in covariates can be handled with the IPW method. The idea of IPW is to offset potential biases due to missing data by assigning different weights to complete observations, which mitigates the bias in the estimates and yields more accurate statistical inference. The probability $\pi_{i0}$ is usually parameterized and modeled through logistic or probit regression. Here, we assume it is generated by the following logistic regression model:
$$\tilde\pi_i = \frac{\exp(\xi_0 + v_i^T \xi_1)}{1 + \exp(\xi_0 + v_i^T \xi_1)} \tag{4}$$
For simplicity, $\pi_{i0}$ denotes the true probability of observation $i$ having complete data, and $\tilde\pi_i$ is the probability calculated based on the logistic function. Let $Q = \mathrm{diag}\left(\dfrac{Q_1}{\tilde\pi_1}, \dfrac{Q_2}{\tilde\pi_2}, \ldots, \dfrac{Q_n}{\tilde\pi_n}\right)$. The weighted spatial autoregressive log-likelihood function is defined as:
$$\ln \tilde L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\left(S_n(\lambda)Y_n - X_n\beta\right)^T Q \left(S_n(\lambda)Y_n - X_n\beta\right) \tag{5}$$
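The construction of the weight matrix $Q$ can be sketched as follows. This is a minimal illustration, assuming the logistic model in Equation (4) is fitted by Newton-Raphson on the always-observed covariates; the helper names `fit_logistic` and `ipw_matrix` and the simulated data are hypothetical and not part of the paper.

```python
import numpy as np

def fit_logistic(A, q, n_iter=25):
    """Newton-Raphson fit of Pr(Q=1 | v) = expit(A @ xi); A includes an intercept column."""
    xi = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ xi))
        grad = A.T @ (q - p)                          # score of the logistic log-likelihood
        hess = (A * (p * (1.0 - p))[:, None]).T @ A   # observed information
        xi = xi + np.linalg.solve(hess, grad)
    return xi

def ipw_matrix(V, q):
    """Return Q = diag(Q_i / pi_tilde_i) and the fitted probabilities pi_tilde."""
    A = np.column_stack([np.ones(len(q)), V])         # intercept plus always-observed covariates
    xi = fit_logistic(A, q)
    pi_tilde = 1.0 / (1.0 + np.exp(-A @ xi))
    return np.diag(q / pi_tilde), pi_tilde

# Toy example: one always-observed covariate drives the missingness.
rng = np.random.default_rng(0)
n = 2000
V = rng.normal(size=(n, 1))
true_pi = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * V[:, 0])))
q = (rng.uniform(size=n) < true_pi).astype(float)     # Q_i = 1 if fully observed
Q, pi_tilde = ipw_matrix(V, q)
```

Incomplete observations receive weight zero, while complete observations with a low estimated probability of being observed are weighted up, so the weighted sample stands in for the full sample.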
When $X_n$ has measurement errors, consider the classic additive measurement error model:
$$Z_n = X_n + U_n \tag{6}$$
In Equation (6), $X_n$ cannot be directly observed, but $Z_n$ can be, where $Z_n$ is an $n \times k$ matrix, $U_n$ is the error term with $U_n \sim N(0, \Sigma)$, and $\varepsilon_n$ is independent of $U_n$. Additionally, we assume that $U_n$ is independent of the covariate $X_n$.
Li [4] proposed a corrected likelihood estimation to solve the spatial autoregressive model with measurement errors. In this case, IPW and corrected likelihood estimation are applied to spatial autoregressive models with measurement errors and missing data.
The corrected likelihood method was initially proposed by Nakamura [31] to address the impact of measurement errors on parameter estimation, enabling parameter estimation without additional assumptions. The specific method is as follows:
For the classic linear regression model with measurement errors:
$$Y = X\beta + \varepsilon, \qquad Z = X + U \tag{7}$$
where $Y$ is the dependent variable, $X$ is the unobservable value of the explanatory variable, $\beta$ is the parameter vector, $\varepsilon$ is the residual vector, $Z$ is the observable vector, and $U$ is the measurement error.
Let $L(\beta, X, Y)$, $U(\beta, X, Y)$, $J(\beta, X, Y)$, and $I(\beta, X, Y)$ denote, respectively, the log-likelihood function, score function, observed information, and Fisher information of model (7) given Z and Y, with $E^+$ being the mathematical expectation with respect to the variable $Y$. Without measurement errors in variables, the following equations hold:
$$E^+\{U(\beta, X, Y)\} = 0 \tag{8}$$
$$E^+\{J(\beta, X, Y)\} = I(\beta, X, Y) \tag{9}$$
With measurement errors, if the values of $Z$ are simply substituted for $X$, Equations (8) and (9) do not always hold.
Thus, Nakamura’s corrected likelihood method is used to handle this model, choosing the corrected log-likelihood function $L^*(\beta, Z, Y)$ to satisfy:
$$E^*\{L^*(\beta, Z, Y)\} = L(\beta, X, Y) \tag{10}$$
where $E^*$ is the conditional expectation with respect to $Z$ given $Y$ and $X$. Let $U^*(\beta, Z, Y) = \dfrac{\partial L^*(\beta, Z, Y)}{\partial\beta}$ and $J^*(\beta, Z, Y) = -\dfrac{\partial U^*(\beta, Z, Y)}{\partial\beta}$ represent the corrected score function and corrected observed information, respectively. If $E^*$ and $\dfrac{\partial}{\partial\beta}$ are interchangeable, then:
$$E^*\{U^*(\beta, Z, Y)\} = U(\beta, X, Y) \tag{11}$$
$$E^*\{J^*(\beta, Z, Y)\} = I(\beta, X, Y) \tag{12}$$
If an estimate $\beta^*$ satisfies $U^*(\beta^*, Z, Y) = 0$, then $\beta^*$ is called a corrected likelihood estimate. Let $E = E^+ E^*$. Then,
$$E\{U^*(\beta, Z, Y)\} = E^+ E^*\{U^*(\beta, Z, Y)\} = E^+\{U(\beta, X, Y)\} = 0 \tag{13}$$
This shows that the corrected score function is unbiased.
Property 1. 
Let $F$ be an open convex subset of the parameter space containing $\beta$. If $L^*(\beta, Z, Y)$ and $L(\beta, X, Y)$ are differentiable on $F$, $\sum_k k^{-2}\,\mathrm{var}\{L^*(\beta, Z_k, Y_k)\} < \infty$, $\beta$ is identifiable, and the $Y_k$ are mutually independent, then $U^*(\beta, Z, Y) = 0$ has a root that is consistent with probability one as $n \to \infty$.
Property 2. 
If $\beta^*$ and $\beta_X$ are consistent roots of $U^*(\beta, Z, Y) = 0$ and $U(\beta, X, Y) = 0$, respectively, and if $U(\beta, X, Y)$ and $U^*(\beta, Z, Y)$ meet some regularity conditions, then, conditional on $X$ and $Y$, $(\beta^* - \beta_X)$ is asymptotically normal with mean 0 and variance $I^+(\beta, X)^{-1}\, E^+\!\left[\mathrm{var}\{U^*(\beta, Z, Y)\}\right] I^+(\beta, X)^{-1}$ as $n \to \infty$.
For model (7), the log-likelihood function is
$$L(\beta, X, Y) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(Y_n - X_n\beta)^T(Y_n - X_n\beta) \tag{14}$$
Define
$$L^*(\beta, Z, Y) = -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\left\{(Y_n - Z_n\beta)^T(Y_n - Z_n\beta) - \beta^T\Sigma\beta\right\} \tag{15}$$
Then,
$$E^*\{L^*(\beta, Z, Y)\} = L(\beta, X, Y) \tag{16}$$
Thus, $L^*(\beta, Z, Y)$ is the corrected likelihood function for model (7).
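The effect of the correction term $\beta^T\Sigma\beta$ can be illustrated on simulated data for the linear model (7). The sketch below compares the naive least-squares estimate, which regresses $Y$ on the error-prone $Z$, with the root of the corrected score. It assumes that the per-observation measurement-error covariance (called `Sigma_u` here) is known and that the matrix $\Sigma$ in the corrected likelihood stands for the total $E(U^T U) = n\,\Sigma_u$; this scaling convention and all numerical values are assumptions of the example, not statements from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta_true = np.array([1.0, -2.0])
Sigma_u = np.diag([0.5, 0.5])      # per-observation measurement-error covariance (assumed known)

X = rng.normal(size=(n, 2))        # true covariates, not observed in practice
Y = X @ beta_true + rng.normal(scale=0.5, size=n)
Z = X + rng.multivariate_normal(np.zeros(2), Sigma_u, size=n)   # observed surrogate

# Naive estimate: treats Z as if it were X and is attenuated toward zero.
beta_naive = np.linalg.solve(Z.T @ Z, Z.T @ Y)

# Corrected estimate: root of the corrected score, (Z'Z - n * Sigma_u) beta = Z'Y.
beta_corr = np.linalg.solve(Z.T @ Z - n * Sigma_u, Z.T @ Y)
```

In this setup each covariate has unit variance and error variance 0.5, so the naive coefficients shrink by roughly the factor $1/(1 + 0.5)$, while the corrected estimate removes the extra $n\,\Sigma_u$ that the measurement error adds to $Z^T Z$.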
By applying Nakamura’s corrected likelihood method, the corrected likelihood function for the spatial autoregressive model with missing data and measurement errors can be obtained (here, denote $\ln \hat L_n(\theta) = L^*(\lambda, \beta, Z, Y)$):
$$\ln \hat L_n(\theta) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 + \ln|S_n(\lambda)| - \frac{1}{2\sigma^2}\left\{\left(S_n(\lambda)Y_n - Z_n\beta\right)^T Q \left(S_n(\lambda)Y_n - Z_n\beta\right) - \beta^T\Sigma\beta\right\} \tag{17}$$
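Next, we solve for the parameters. Differentiating Equation (17) with respect to $\beta$ and $\sigma^2$ and setting the derivatives to zero, we obtain the following set of equations:
$$\begin{cases}
\dfrac{\partial \ln \hat L_n(\theta)}{\partial \beta} = 0 \\[2ex]
\dfrac{\partial \ln \hat L_n(\theta)}{\partial \sigma^2} = 0
\end{cases} \tag{18}$$
That is,
$$\begin{cases}
\dfrac{1}{2\sigma^2}\left\{2 Z_n^T Q \left(S_n(\lambda)Y_n - Z_n\beta\right) + 2\Sigma\beta\right\} = 0 \\[2ex]
-\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\left\{\left(S_n(\lambda)Y_n - Z_n\beta\right)^T Q \left(S_n(\lambda)Y_n - Z_n\beta\right) - \beta^T\Sigma\beta\right\} = 0
\end{cases} \tag{19}$$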
Given $\lambda$, the corrected likelihood estimate for $\beta$ is:
$$\hat\beta(\lambda) = \left(Z_n^T Q Z_n - \Sigma\right)^{-1} Z_n^T Q\, S_n(\lambda) Y_n \tag{20}$$
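For completeness, the closed form for $\hat\beta(\lambda)$ follows from the score equation for $\beta$ in a few lines. The steps below are a worked sketch that assumes $Z_n^T Q Z_n - \Sigma$ is invertible:

```latex
\begin{aligned}
\frac{\partial \ln \hat L_n(\theta)}{\partial \beta} = 0
&\iff Z_n^T Q \left(S_n(\lambda) Y_n - Z_n \beta\right) + \Sigma \beta = 0 \\
&\iff \left(Z_n^T Q Z_n - \Sigma\right) \beta = Z_n^T Q\, S_n(\lambda) Y_n \\
&\iff \hat\beta(\lambda) = \left(Z_n^T Q Z_n - \Sigma\right)^{-1} Z_n^T Q\, S_n(\lambda) Y_n .
\end{aligned}
```

Compared with weighted least squares on $Z_n$, the only change is the subtraction of $\Sigma$ from the weighted Gram matrix, which removes the inflation caused by the measurement error.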
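Similarly, the corrected likelihood estimate for $\sigma^2$ is:
$$\hat\sigma^2(\lambda) = \frac{1}{n}\left\{\left(S_n(\lambda)Y_n - Z_n\hat\beta(\lambda)\right)^T Q \left(S_n(\lambda)Y_n - Z_n\hat\beta(\lambda)\right) - \hat\beta(\lambda)^T\Sigma\,\hat\beta(\lambda)\right\} = \frac{1}{n}\, Y_n^T S_n^T(\lambda)\, M_n\, S_n(\lambda) Y_n \tag{21}$$
where $M_n = P_z - Q Z_n \left\{\left(Z_n^T Q Z_n - \Sigma\right)^{-1}\right\}^T \Sigma \left(Z_n^T Q Z_n - \Sigma\right)^{-1} Z_n^T Q$ and $P_z = Q - Q Z_n \left(Z_n^T Q Z_n - \Sigma\right)^{-1} Z_n^T Q$. Substituting Equations (20) and (21) into Equation (17), the concentrated likelihood function for $\lambda$ is:
$$\ln \hat L_n(\lambda) = -\frac{n}{2}\left(\ln(2\pi) + 1\right) - \frac{n}{2}\ln\hat\sigma^2(\lambda) + \ln|S_n(\lambda)| \tag{22}$$
The corrected likelihood estimate of $\lambda$ is then found by maximizing Equation (22).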
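The estimation steps above (profile out $\beta$ and $\sigma^2$ for a given $\lambda$, then maximize the concentrated likelihood over $\lambda$) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' code: the maximization uses a plain grid search, $\hat\sigma^2(\lambda)$ is computed from the first expression in Equation (21), the toy run has no missing data (all weights equal to one), and $\Sigma$ is taken to be $n$ times a known per-observation error covariance. The function name `concentrated_fit` is hypothetical.

```python
import numpy as np

def concentrated_fit(Y, Z, W, q_weights, Sigma, grid=None):
    """Grid-search maximiser of the concentrated corrected likelihood in lambda."""
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    n = len(Y)
    ZtQ = (Z * q_weights[:, None]).T            # Z' Q for diagonal Q
    A = ZtQ @ Z - Sigma                         # corrected Gram matrix Z'QZ - Sigma
    best = None
    for lam in grid:
        S = np.eye(n) - lam * W
        SY = S @ Y
        beta = np.linalg.solve(A, ZtQ @ SY)     # beta_hat(lambda)
        r = SY - Z @ beta
        sigma2 = ((q_weights * r) @ r - beta @ Sigma @ beta) / n   # sigma2_hat(lambda)
        if sigma2 <= 0:                         # the correction can overshoot in small samples
            continue
        _, logabsdet = np.linalg.slogdet(S)
        ll = -0.5 * n * (np.log(2 * np.pi) + 1) - 0.5 * n * np.log(sigma2) + logabsdet
        if best is None or ll > best[0]:
            best = (ll, lam, beta, sigma2)
    return best

# Toy run on a ring-shaped W (hypothetical), with measurement error but no missingness.
rng = np.random.default_rng(2)
n, lam0, beta0 = 300, 0.5, np.array([1.0, -1.0])
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
X = rng.normal(size=(n, 2))
Sigma_u = 0.1 * np.eye(2)
Z = X + rng.multivariate_normal(np.zeros(2), Sigma_u, size=n)
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + rng.normal(size=n))
ll_hat, lam_hat, beta_hat, sigma2_hat = concentrated_fit(Y, Z, W, np.ones(n), n * Sigma_u)
```

A grid search is used only because it is easy to verify; a one-dimensional bounded optimizer would serve the same purpose. With missing data, `q_weights` would hold the diagonal of $Q$.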
Theorem 1 (Oracle Properties). 
Suppose the regularity conditions in Assumptions A1–A5 in Appendix A hold; these are largely consistent with the earlier Assumptions 1–8. Then the following results hold:
1. $\theta_0$ is globally identifiable, and $\hat\theta_n$ is a consistent estimate of $\theta_0$.
2. $\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{D} N(0, \Sigma_\theta^{-1})$, where $\Sigma_\theta = -\lim_{n\to\infty} E\left(\dfrac{1}{n}\,\dfrac{\partial^2 \ln \hat L_n(\theta)}{\partial\theta\,\partial\theta^T}\right)$.

3. Simulation

The finite-sample performance of the proposed method was evaluated through Monte Carlo simulation and compared with that of alternative approaches. We simulated 500 datasets.
The sample sizes $n$ for each dataset were set to 100, 150, 200, and 250, respectively. The threshold $m$ of the weight matrix represents the number of non-zero elements in each row of the matrix and was set to 10, 15, 20, and 25, respectively. The spatial autoregressive coefficient $\lambda$ was set to 0.5 and −0.5. Covariates and random errors were generated as follows: covariates $X_n = (X_1, X_2, \ldots, X_p)$ were generated from a $p$-dimensional normal distribution with mean 0 and variance 1. In the simulation, we set $\beta = (1, 0, 5, 0, 3)$. The generation mechanism for $Y_n$ is
$$Y_n = (I_n - \lambda W_n)^{-1}(X_n\beta + \varepsilon_n) \tag{23}$$
We assumed that the error $\varepsilon$ in the spatial autoregressive model followed a normal distribution with variances $e^2$ of $0.5^2$, $1^2$, and $1.5^2$. In the simulation, we assumed that $X_2$ and $X_4$ might have missing values. The missingness model considered is
$$\mathrm{Logit}\{\Pr(R_i = 1)\} = 0.5 + 0.5 X_{i1} - 1.5 X_{i3} + 0.5 X_{i5} \tag{24}$$
The measurement error model was set as $Z = X + U$, where the measurement error $U$ follows a multivariate normal distribution $N(0, \Sigma)$ with mean 0 and variance $1^2$.
The following four estimation scenarios were compared:
(I) Both measurement errors and missing data are considered;
(II) Measurement errors are ignored;
(III) Missing data are ignored (by dropping observations with missing covariates);
(IV) Both measurement errors and missing data are ignored.
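One replication of the data-generating process described above can be sketched as follows. The paper specifies only that each row of the weight matrix has $m$ non-zero elements; the construction used here (the $m$ nearest neighbours of randomly placed points, row-normalized) is an assumption, as are the function name `simulate`, the choice to add measurement error to every covariate, and the use of `NaN` to mark missing entries.

```python
import numpy as np

def simulate(n=100, m=10, lam=0.5, e=1.0, seed=0):
    rng = np.random.default_rng(seed)
    beta = np.array([1.0, 0.0, 5.0, 0.0, 3.0])          # beta as printed in the text

    # Hypothetical W: m nearest neighbours of random planar points, each with weight 1/m.
    coords = rng.uniform(size=(n, 2))
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                          # a unit is not its own neighbour
    nearest = np.argsort(D, axis=1)[:, :m]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0 / m

    X = rng.normal(size=(n, 5))                          # mean 0, variance 1
    eps = rng.normal(scale=e, size=n)                    # error standard deviation e
    Y = np.linalg.solve(np.eye(n) - lam * W, X @ beta + eps)

    # Missingness depends only on X1, X3, X5; R = 1 means fully observed.
    logit = 0.5 + 0.5 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * X[:, 4]
    R = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    Z = X + rng.normal(scale=1.0, size=(n, 5))           # Z = X + U, error variance 1^2
    Z_obs = Z.copy()
    Z_obs[R == 0, 1] = np.nan                            # X2 may be missing
    Z_obs[R == 0, 3] = np.nan                            # X4 may be missing
    return Y, Z_obs, W, R

Y, Z_obs, W, R = simulate(n=100, m=10, lam=0.5, e=1.0, seed=0)
```

Repeating this for 500 seeds and each combination of $n$, $m$, $\lambda$, and $e$ gives the replications over which the four scenarios are compared.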
Tables 1–3 present the median squared errors (MeSE) of the estimates of $\lambda$, $\beta$, and $\sigma^2$, computed as $\|\hat\lambda - \lambda\|^2$, $\|\hat\beta - \beta\|^2$, and $\|\hat\sigma^2 - \sigma^2\|^2$, respectively. As observed from Tables 1–3, the estimates provided by the proposed method are overall significantly superior to those obtained by directly ignoring missing data or both missing data and measurement errors, with smaller squared errors. Compared to ignoring missing data, the proposed method’s correction effect on estimating $\beta$ and $\lambda$ is not very pronounced, but the correction for $\sigma^2$ is relatively significant. Overall, the proposed method more significantly corrects the biases caused by missing data. However, when the values of $n$ and $e$ are relatively low, the correction effect for measurement errors is not particularly satisfactory. Moreover, ignoring both missing data and measurement errors leads to larger squared errors due to the severe loss of information. In summary, correcting for missing data and measurement errors is both necessary and effective.
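The MeSE criterion can be computed from the replicated estimates as in the following minimal sketch; the function name `mese` and the small example values are illustrative only.

```python
import numpy as np

def mese(estimates, truth):
    """Median over replications of the squared Euclidean error ||estimate - truth||^2."""
    est = np.asarray(estimates, dtype=float)
    est = est.reshape(est.shape[0], -1)            # one row per replication
    tr = np.asarray(truth, dtype=float).reshape(1, -1)
    sq_err = ((est - tr) ** 2).sum(axis=1)
    return float(np.median(sq_err))

# Scalar parameter (e.g. lambda) over three replications.
mese_lambda = mese([0.4, 0.5, 0.7], 0.5)
# Vector parameter (e.g. beta) over three replications.
mese_beta = mese([[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]], [1.0, 0.0])
```

Using the median rather than the mean makes the summary less sensitive to a few replications with very large errors.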

4. Real Data Example

In this section, we present a real example to illustrate the performance of the parameter estimation procedure for spatial autoregressive models with missing data and measurement errors proposed in this paper.
We consider the famous 1970 Boston Housing Dataset, which contains information on 506 different houses in different locations in the Boston Standard Metropolitan Statistical Area. This dataset has been used by many authors and can be found in the spdep library in R. It was first analyzed by Harrison and Rubinfeld [32]. Sun et al. [33] and Du et al. [34] explored the spatial dependence of these data through partially linear varying coefficient autoregressive and partially linear additive autoregressive models, respectively. Liu [35] used Moran's I statistic to test the spatial dependence of the dataset. Therefore, the data serve our analysis purposes well. Table 4 provides specific descriptions of the variables in the dataset.
In the actual data analysis, following the practice of Harrison and Rubinfeld [32], we consider the logarithm of the median value of owner-occupied homes (MEDV) in census tracts as the dependent variable and the other variables as the independent variables. Among these, the weighted distance to five Boston employment centers (DIS), the index of accessibility to radial highways (RAD), the percentage of the population classified as lower income (LSTAT), and the average number of rooms per dwelling (RM) are log-transformed, while the nitrogen oxide concentration (NOX) is squared. For ease of analysis, all variables are mean-centered to have a sample mean of zero.
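The transformations described above can be sketched as follows. This is an illustration under stated assumptions: the inputs are plain arrays holding the named Boston columns, the function name is hypothetical, and the column order (DIS, RAD, LSTAT, NOX, RM) follows the model stated later in this section.

```python
import numpy as np


def prepare_boston_variables(medv, dis, rad, lstat, rm, nox):
    """Log/square transforms as described in the text, followed by mean-centering."""
    y = np.log(np.asarray(medv, dtype=float))
    X = np.column_stack([
        np.log(np.asarray(dis, dtype=float)),    # X1: log DIS
        np.log(np.asarray(rad, dtype=float)),    # X2: log RAD
        np.log(np.asarray(lstat, dtype=float)),  # X3: log LSTAT
        np.asarray(nox, dtype=float) ** 2,       # X4: NOX squared
        np.log(np.asarray(rm, dtype=float)),     # X5: log RM
    ])
    # Center every variable so that its sample mean is zero
    return y - y.mean(), X - X.mean(axis=0)


# Toy values only, to show the call; these are not rows of the real dataset
y_c, X_c = prepare_boston_variables(
    medv=[10, 20, 40], dis=[1, 2, 4], rad=[1, 2, 4],
    lstat=[5, 10, 20], rm=[4, 6, 8], nox=[0.4, 0.5, 0.6],
)
```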
Spatial weight matrices generally consist of two types of information. One is determined using latitude and longitude coordinates, and the other is determined using the relative locations of regions (see Liu [35]).
Our approach is similar to that of Pace and Gilley [36]. We first define an initial matrix $W$ whose weight between two houses $i$ and $j$ is
$$W_{ij} = \max\left\{ 1 - \frac{d_{ij}}{d_0},\ 0 \right\},$$
where $d_{ij}$ is the Euclidean distance calculated based on the latitude and longitude coordinates of the two houses. We set the threshold distance $d_0$ to 0.05. Additionally, in practice, the spatial weight matrix is row-normalized.
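A minimal sketch of this construction is given below. It rests on assumptions that are not stated in this section: coordinates are supplied as an (n, 2) array, the diagonal is set to zero (in line with the normalization $w_{n,ii} = 0$ in Assumption A2), and a house with no neighbor within $d_0$ keeps an all-zero row. The function name is hypothetical.

```python
import numpy as np


def spatial_weights(coords, d0=0.05):
    """Row-normalized weights W_ij = max(1 - d_ij / d0, 0) from point coordinates."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))   # pairwise Euclidean distances
    W = np.maximum(1.0 - dist / d0, 0.0)
    np.fill_diagonal(W, 0.0)                   # assumption: no self-neighbors
    row_sums = W.sum(axis=1, keepdims=True)
    # Row-normalize; rows without any neighbor within d0 stay zero (assumption)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)


# Two nearby points and one isolated point (toy coordinates)
W_demo = spatial_weights([[0.0, 0.0], [0.0, 0.01], [1.0, 1.0]])
```

In the toy call, the first two points are each other's only neighbor and the third point has none.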
For the above dataset, we consider the following model:
$$Y = \lambda W Y + X_1 \beta_1 + X_2 \beta_2 + X_3 \beta_3 + X_4 \beta_4 + X_5 \beta_5 + \varepsilon,$$
where the response variable $Y$ is the median value of house prices (MEDV, log-transformed as described above), $X_1$ is the weighted distance (DIS), $X_2$ is the index of accessibility to radial highways (RAD), $X_3$ is the percentage of the population classified as lower income (LSTAT), $X_4$ is the nitrogen oxide concentration (NOX), and $X_5$ is the average number of rooms per dwelling (RM).
In the simulation, we assume that $X_2$ and $X_4$ may have missing values. The missingness model considered is:
$$\operatorname{Logit}\{\Pr(R_i = 1)\} = 1 + 2 X_{i1} - 3 X_{i3} + X_{i5}.$$
The measurement error model is set as $Z = X + U$, where the measurement error $U$ follows a multivariate normal distribution with mean 0 and covariance matrix $\Sigma$, the variance of each component being equal to the sample variance $\sigma_i^2$ of the corresponding variable.
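The two data-degradation steps can be sketched as follows. This is an illustration under several assumptions that are not confirmed by the text: the linear predictor is taken as $1 + 2X_{i1} - 3X_{i3} + X_{i5}$ (the sign of the $X_{i3}$ term is an assumption), $R_i = 1$ is read as "observed", the error components are drawn independently (a diagonal $\Sigma$), and missingness is applied to the error-prone columns. The function name is hypothetical.

```python
import numpy as np


def simulate_missing_and_error(X, rng):
    """Add N(0, s_j^2) measurement error to X, then mask columns X2 and X4."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    # Assumed missingness model: logit Pr(R_i = 1) = 1 + 2*X_i1 - 3*X_i3 + X_i5
    eta = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 2] + X[:, 4]
    prob_observed = 1.0 / (1.0 + np.exp(-eta))
    R = rng.random(n) < prob_observed          # True means the row is observed
    # Z = X + U, with each error variance equal to the column's sample variance
    sample_sd = X.std(axis=0, ddof=1)
    Z = X + rng.normal(0.0, sample_sd, size=(n, p))
    Z[~R, 1] = np.nan                          # X2 missing where R_i = 0
    Z[~R, 3] = np.nan                          # X4 missing where R_i = 0
    return Z, R, prob_observed


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))             # stand-in covariates, not the real data
Z_demo, R_demo, p_demo = simulate_missing_and_error(X_demo, rng)
```

The returned probabilities are the quantities an inverse probability weighting step would need, although here they are the true rather than estimated values.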
Table 5 displays the median squared errors (MeSEs) of the estimators of $\lambda$, $\beta$, and $\sigma^2$, specifically $\|\hat{\lambda} - \lambda\|^2$, $\|\hat{\beta}_i - \beta_i\|^2$, and $\|\hat{\sigma}^2 - \sigma^2\|^2$. It can be observed that the performance of the proposed method for the Boston Housing Dataset is largely consistent with the results of the numerical simulations. Although the correction effect for some components of $\beta$ is not particularly pronounced, the overall correction effect is still satisfactory, and in particular, the corrections for $\sigma^2$ and $\lambda$ are very effective.

5. Conclusions

We have developed a robust method for simultaneously handling missing data and measurement errors in covariates of spatial autoregressive models. Clearly, traditional statistical methods may lead to biased estimates when covariates have missing data and measurement errors. Our method uses IPW and corrected likelihood methods to address this issue. We have studied the theoretical properties of the proposed method and investigated its performance in parameter estimation through Monte Carlo simulations, comparing it to scenarios where measurement errors, missing data, or both are ignored. The simulation studies demonstrate that our method outperforms traditional direct extensions for parameter estimation in spatial autoregressive models.

Author Contributions

Software, Validation, Investigation, Resources, Data Curation, Writing—Review & Editing, and Funding Acquisition were handled by T.L. He was mainly responsible for implementing the concepts, programming and running simulations, testing the code, collecting and curating data, as well as editing the manuscript and securing funding. Conceptualization, Methodology, Formal Analysis, Writing—Original Draft Preparation, Writing—Review & Editing were performed by Z.Z. He was primarily responsible for the initial concept and design, theoretical derivation of formulas, and writing and editing the theoretical parts of the manuscript. Supervision and Project Administration were carried out by the corresponding author Y.S., who was in charge of overseeing the entire project and managing its administration. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Fundamental Research Funds for the Central Universities (No. 23CX03012A) and the National Key Research and Development Program (2021YFA1000102) of China.

Data Availability Statement

All datasets used in this study, including those analyzed and those generated during the research process, are publicly available and can be accessed through the following link: https://drive.google.com/drive/folders/1egSNneFuDZe-iUbn69OPkWlDAxEzpP6q (accessed on 25 April 2024).

Conflicts of Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Appendix A

Using a combination of IPW and mean imputation, consider an ideal “pseudo-complete dataset” $\tilde{Z}_n$ for the covariates, satisfying
$$(S_n(\lambda) Y_n - Z_n \beta)^T Q\, (S_n(\lambda) Y_n - Z_n \beta) = (S_n(\lambda) Y_n - \tilde{Z}_n \beta)^T (S_n(\lambda) Y_n - \tilde{Z}_n \beta). \tag{A1}$$
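From Equations (1), (4) and (A1), we obtain
$$Y_n = \lambda W_n Y_n + \tilde{Z}_n \beta + \varepsilon_n - U_n \beta \triangleq \lambda W_n Y_n + \tilde{Z}_n \beta + \varepsilon_n,$$
where, with a slight abuse of notation, $\varepsilon_n$ on the right-hand side denotes the composite error $\varepsilon_n - U_n \beta$.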
Thus, Equation (16) can be simplified as:
$$Y_n = S_n^{-1} (\tilde{Z}_n \beta_0 + \varepsilon_n),$$
where $S_n = S_n(\lambda_0)$.
Let $G_n = W_n S_n^{-1}$. Then, $I_n + \lambda_0 G_n = S_n^{-1}$, and Equation (A1) can be expressed as:
$$Y_n = \lambda_0 G_n \tilde{Z}_n \beta_0 + \tilde{Z}_n \beta_0 + S_n^{-1} \varepsilon_n.$$
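The identity $I_n + \lambda_0 G_n = S_n^{-1}$ and the resulting expansion of $Y_n$ can be checked numerically. The sketch below assumes the usual SAR definition $S_n(\lambda) = I_n - \lambda W_n$ and uses a small random row-normalized matrix with made-up parameter values; none of the numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam0 = 6, 0.4

# Random row-normalized weight matrix with zero diagonal (toy example)
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

S = np.eye(n) - lam0 * W          # assumed definition S_n = I_n - lambda_0 W_n
S_inv = np.linalg.inv(S)
G = W @ S_inv                     # G_n = W_n S_n^{-1}

# Largest entrywise gap between S_n^{-1} and I_n + lambda_0 G_n
identity_gap = np.max(np.abs(S_inv - (np.eye(n) + lam0 * G)))

# Reduced form versus the expanded form of Y_n
Z = rng.normal(size=(n, 2))
beta0 = np.array([1.0, -0.5])
eps = rng.normal(size=n)
Y_reduced = S_inv @ (Z @ beta0 + eps)
Y_expanded = lam0 * G @ Z @ beta0 + Z @ beta0 + S_inv @ eps
```

Both forms of $Y_n$ agree up to floating-point error because $S_n S_n^{-1} = I_n$ implies $S_n^{-1} = I_n + \lambda_0 W_n S_n^{-1}$.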
To establish the asymptotic properties of the estimator, the following regularity conditions are required:
Assumption A1. 
In $\varepsilon_n = \{e_i\}$ and $U_n = \{u_i\}$, the elements $e_i, u_i$, $i = 1, \ldots, n$, are independently and identically distributed with mean $E(e_i) = E(u_i) = 0$ and variance $\operatorname{Var}(e_i) = \sigma^2$, $\operatorname{Var}(u_i) = \Sigma$. For some $\gamma > 0$, the moments $E(|e_i|^{4+\gamma})$ and $E(|u_i|^{4+\gamma})$ exist.
Assumption A2. 
For all $i, j$, the elements $w_{n,ij}$ of $W_n$ are at most of the order $h_n^{-1}$, denoted as $O(1/h_n)$, where the rate sequence $\{h_n\}$ can be bounded or divergent. As a normalization, $w_{n,ii} = 0$ for all $i$.
Assumption A3. 
The matrix sequences $\{W_n\}$ and $\{S_n^{-1}\}$ are uniformly bounded in terms of row and column sums.
Assumption A4. 
For all $n$, the elements of $\tilde{Z}_n$ are uniformly bounded constants, and $\lim_{n \to \infty} n^{-1} \tilde{Z}_n^T \tilde{Z}_n$ exists and is non-singular.
Assumption A5. 
For all $\lambda$ in the compact parameter space $\Lambda$, $\{S_n^{-1}(\lambda)\}$ are uniformly bounded in terms of row and column sums. The true $\lambda_0$ is inside $\Lambda$.
Assumption A6. 
$\lim_{n \to \infty} n^{-1} (\tilde{Z}_n, G_n \tilde{Z}_n \beta_0)^T (\tilde{Z}_n, G_n \tilde{Z}_n \beta_0)$ exists and is a non-singular matrix.
Theorem A1. 
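Under Assumptions 1–8, $\theta_0$ is globally identifiable, and $\hat{\theta}_0$ is a consistent estimator of $\theta_0$.
Define $P_n(\lambda) = \max_{\beta, \sigma^2} E(\ln \hat{L}_n(\theta))$. In this maximization problem, the optimal solutions for $\beta$ and $\sigma^2$ are
$$\beta(\lambda) = \left( \tilde{Z}_n^T \tilde{Z}_n - \Sigma \right)^{-1} \tilde{Z}_n^T S_n(\lambda) Y_n,$$
$$\sigma^2(\lambda) = \frac{1}{n} E\left[ (S_n(\lambda) Y_n - \tilde{Z}_n \beta)^T (S_n(\lambda) Y_n - \tilde{Z}_n \beta) - \beta^T \Sigma \beta \right] = \frac{1}{n} \left[ (\lambda_0 - \lambda)^2 (G_n \tilde{Z}_n \beta_0)^T M_n (G_n \tilde{Z}_n \beta_0) + \sigma_0^2 \operatorname{tr}\left( (S_n^T)^{-1} S_n^T(\lambda) S_n(\lambda) S_n^{-1} \right) \right].$$
Then,
$$P_n(\lambda) = -\frac{n}{2} \left( \ln(2\pi) + 1 \right) - \frac{n}{2} \ln \sigma^2(\lambda) + \ln |S_n(\lambda)|.$$
The value of $\lambda_0$ can be obtained by maximizing $P_n(\lambda)/n$.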
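To prove that $\hat{\theta}_0$ is a consistent estimator of $\theta_0$, it suffices to show the following:
1. $\frac{1}{n} \left( \ln \hat{L}_n(\theta) - P_n(\lambda) \right)$ converges uniformly to 0 in $\Lambda$;
2. For any $\omega > 0$, $\lim_{n \to \infty} \max_{\lambda \in \bar{N}_\omega(\lambda_0)} \frac{1}{n} \left( P_n(\lambda) - P_n(\lambda_0) \right) < 0$, where $\bar{N}_\omega(\lambda_0)$ is the complement in $\Lambda$ of a neighborhood of $\lambda_0$ with diameter $\omega$.
(a)
Since
$$\frac{1}{n} \left( \ln \hat{L}_n(\lambda) - P_n(\lambda) \right) = -\frac{1}{2} \left( \ln \hat{\sigma}^2(\lambda) - \ln \sigma^2(\lambda) \right),$$
$\sigma^2(\lambda)$ and $\hat{\sigma}^2(\lambda)$ can be written as
$$\sigma^2(\lambda) = \frac{(\lambda_0 - \lambda)^2 (G_n \tilde{Z}_n \beta_0)^T M_n (G_n \tilde{Z}_n \beta_0)}{n} + \sigma_n^2(\lambda),$$
$$\hat{\sigma}^2(\lambda) = \frac{1}{n} Y_n^T S_n^T(\lambda) M_n S_n(\lambda) Y_n = \frac{1}{n} (\lambda_0 - \lambda)^2 (G_n \tilde{Z}_n \beta_0)^T M_n (G_n \tilde{Z}_n \beta_0) + 2 (\lambda_0 - \lambda) H_{1n}(\lambda) + H_{2n}(\lambda),$$
where
$$\sigma_n^2(\lambda) = \frac{\sigma_0^2}{n} \operatorname{tr}\left( (S_n^T)^{-1} S_n^T(\lambda) S_n(\lambda) S_n^{-1} \right),$$
$$H_{1n}(\lambda) = \frac{1}{n} (G_n \tilde{Z}_n \beta_0)^T M_n S_n(\lambda) S_n^{-1} \varepsilon_n, \qquad H_{2n}(\lambda) = \frac{1}{n} \varepsilon_n^T (S_n^{-1})^T S_n^T(\lambda) M_n S_n(\lambda) S_n^{-1} \varepsilon_n.$$
It can be shown that on $\Lambda$,
$$H_{1n}(\lambda) = o_p(1), \qquad H_{2n}(\lambda) - \sigma_n^2(\lambda) = o_p(1).$$
Therefore,
$$\hat{\sigma}^2(\lambda) - \sigma^2(\lambda) = o_p(1).$$
Hence,
$$\sup_{\lambda \in \Lambda} \left| \frac{1}{n} \left( \ln \hat{L}_n(\theta) - P_n(\lambda) \right) \right| = o_p(1).$$
(b)
$$\frac{1}{n} \left( P_n(\lambda) - P_n(\lambda_0) \right) = \frac{1}{n} \left( P_{p,n}(\lambda) - P_{p,n}(\lambda_0) \right) - \frac{1}{2} \left( \ln \sigma^2(\lambda) - \ln \sigma_n^2(\lambda) \right),$$
where $P_{p,n}(\lambda) = -\frac{n}{2} \left( \ln(2\pi) + 1 \right) - \frac{n}{2} \ln \sigma_n^2(\lambda) + \ln |S_n(\lambda)|$.
$P_n(\lambda)/n$ is uniformly continuous on $\Lambda$. By Jensen’s inequality, for all $\lambda$,
$$\frac{1}{n} \left( P_{p,n}(\lambda) - P_{p,n}(\lambda_0) \right) \le 0,$$
and, in addition,
$$\sigma^2(\lambda) \ge \sigma_n^2(\lambda),$$
so that $\frac{1}{n} \left( P_n(\lambda) - P_n(\lambda_0) \right) \le 0$.
If condition 2 does not hold, then there exists a sequence $\{\lambda_n\} \subset \Lambda$ with $\lim_{n \to \infty} \lambda_n = \lambda_+ \ne \lambda_0$ such that
$$\lim_{n \to \infty} \frac{1}{n} \left( P_n(\lambda_n) - P_n(\lambda_0) \right) = 0.$$
This can only occur if
$$\lim_{n \to \infty} \left( \sigma^2(\lambda_+) - \sigma_n^2(\lambda_+) \right) = 0$$
and
$$\lim_{n \to \infty} \frac{1}{n} \left( P_{p,n}(\lambda_+) - P_{p,n}(\lambda_0) \right) = 0$$
simultaneously hold, but the former equation contradicts
$$\lim_{n \to \infty} \frac{(G_n \tilde{Z}_n \beta_0)^T M_n (G_n \tilde{Z}_n \beta_0)}{n} \ne 0,$$
because $\sigma^2(\lambda_+) - \sigma_n^2(\lambda_+) = (\lambda_0 - \lambda_+)^2 (G_n \tilde{Z}_n \beta_0)^T M_n (G_n \tilde{Z}_n \beta_0)/n$ with $\lambda_+ \ne \lambda_0$.
Hence, consistency is proven.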
Theorem A2. 
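The asymptotic distribution of the CMLE $\hat{\theta}_n$ can be derived from the Taylor expansion of $\frac{\partial \ln \hat{L}_n(\hat{\theta}_n)}{\partial \theta} = 0$ at $\theta_0$, where the first-order derivatives of the log-likelihood function at $\theta_0$ are
$$\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \beta} = \frac{1}{\sqrt{n}\, \sigma_0^2} \left( \tilde{Z}_n^T Q \varepsilon_n + n \Sigma \beta_0 \right),$$
$$\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \sigma^2} = \frac{1}{2 \sqrt{n}\, \sigma_0^4} \left( \varepsilon_n^T Q \varepsilon_n - n \beta_0^T \Sigma \beta_0 - n \sigma_0^2 \right),$$
$$\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \lambda} = \frac{1}{\sqrt{n}\, \sigma_0^2} (G_n \tilde{Z}_n \beta_0)^T \varepsilon_n + \frac{1}{\sqrt{n}\, \sigma_0^2} \left( \varepsilon_n^T G_n \varepsilon_n - \sigma_0^2 \operatorname{tr}(G_n) \right).$$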
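The asymptotic distribution of these expressions can be obtained through the central limit theorem. If $\{h_n\}$ is a bounded sequence, Kelejian and Prucha’s central limit theorem can be used. If $\{h_n\}$ is unbounded, i.e., $\lim_{n \to \infty} h_n = \infty$, then under Assumption A5, $\frac{1}{\sqrt{n}\, \sigma_0^2} (G_n \tilde{Z}_n \beta_0)^T \varepsilon_n$ will dominate $\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}(\theta_0)}{\partial \lambda}$. This is because
$$\operatorname{var}\left( \frac{1}{\sqrt{n}} \varepsilon_n^T G_n \varepsilon_n \right) = O\left( \frac{1}{h_n} \right), \qquad \frac{1}{\sqrt{n}} \left( \varepsilon_n^T G_n \varepsilon_n - \sigma_0^2 \operatorname{tr}(G_n) \right) = o_p(1).$$
However,
$$\frac{1}{\sqrt{n}} (G_n \tilde{Z}_n \beta_0)^T \varepsilon_n = O_p(1).$$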
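In this case, Kolmogorov’s central limit theorem can be used.
The variance matrix of $\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \theta}$ is
$$E\left( \frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}(\theta_0)}{\partial \theta} \cdot \frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}(\theta_0)}{\partial \theta^T} \right) = -E\left( \frac{1}{n} \frac{\partial^2 \ln \hat{L}(\theta_0)}{\partial \theta\, \partial \theta^T} \right),$$
where
$$-E\left( \frac{1}{n} \frac{\partial^2 \ln \hat{L}(\theta_0)}{\partial \theta\, \partial \theta^T} \right) = \begin{pmatrix} \frac{1}{n \sigma_0^2} (\tilde{Z}_n^T \tilde{Z}_n - 2 n \Sigma) & \frac{1}{n \sigma_0^2} \tilde{Z}_n^T (G_n \tilde{Z}_n \beta_0) & \frac{1}{\sigma_0^4} \Sigma \beta_0 \\ \frac{1}{n \sigma_0^2} (G_n \tilde{Z}_n \beta_0)^T \tilde{Z}_n & \frac{1}{n \sigma_0^2} (G_n \tilde{Z}_n \beta_0)^T (G_n \tilde{Z}_n \beta_0) + \frac{1}{n} \operatorname{tr}(G_n^T G_n) & \frac{1}{n \sigma_0^2} \operatorname{tr}(G_n) \\ \frac{1}{\sigma_0^4} (\Sigma \beta_0)^T & \frac{1}{n \sigma_0^2} \operatorname{tr}(G_n) & \frac{1}{2 \sigma_0^4} \end{pmatrix}.$$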
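In $\frac{\partial^2 \ln \hat{L}(\theta_0)}{\partial \theta\, \partial \theta^T}$, $\lambda$, $\beta$, and $\frac{1}{\sigma_0^2}$ appear either as linear or quadratic moments, and
$$\frac{\partial^2 \ln \hat{L}(\theta)}{\partial \lambda^2} = -\operatorname{tr}\left( [W_n S_n^{-1}(\lambda)]^2 \right) - \frac{Y_n^T W_n^T W_n Y_n}{\sigma^2}.$$
Let $G_n(\lambda) = W_n S_n^{-1}(\lambda)$. By the mean value theorem,
$$\operatorname{tr}(G_n^2(\tilde{\lambda}_n)) = \operatorname{tr}(G_n^2) + 2 \operatorname{tr}(G_n^3(\tilde{\lambda}_n)) (\tilde{\lambda}_n - \lambda_0).$$
Assumption A3 ensures that within a neighborhood of $\lambda_0$, the row and column sums of $G_n(\tilde{\lambda}_n)$ are uniformly bounded. Since $\operatorname{tr}(G_n^3(\tilde{\lambda}_n)) = O\left( \frac{n}{h_n} \right)$ and
$$Y_n^T W_n^T W_n Y_n = O_p\left( \frac{n}{h_n} \right),$$
it follows that
$$\frac{1}{n} \left( \frac{\partial^2 \ln \hat{L}(\tilde{\theta}_n)}{\partial \lambda^2} - \frac{\partial^2 \ln \hat{L}(\theta_0)}{\partial \lambda^2} \right) = -\frac{2 \operatorname{tr}(G_n^3(\tilde{\lambda}_n))}{n} (\tilde{\lambda}_n - \lambda_0) + \left( \frac{1}{\sigma_0^2} - \frac{1}{\tilde{\sigma}_n^2} \right) \frac{Y_n^T W_n^T W_n Y_n}{n} = o_p(1).$$
The second-order derivatives of the other terms can be similarly deduced. Since both $\frac{\tilde{Z}_n^T G_n \varepsilon_n}{n}$ and $\frac{1}{n} \left( \varepsilon_n^T G_n \varepsilon_n - (\sigma_0^2 + \beta_0^T \Sigma \beta_0) \operatorname{tr}(G_n) \right)$ are $o_p(1)$, it follows that
$$\frac{1}{n} \left( \frac{\partial^2 \ln \hat{L}(\tilde{\theta}_n)}{\partial \theta\, \partial \theta^T} - \frac{\partial^2 \ln \hat{L}(\theta_0)}{\partial \theta\, \partial \theta^T} \right) \xrightarrow{P} 0.$$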
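Since $\frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \theta}$ is either a linear or quadratic function of $\varepsilon_n$, and the higher-order moments of $\varepsilon_n$ exist according to Assumption A1, the central limit theorem by Kelejian and Prucha implies
$$\frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}(\theta_0)}{\partial \theta} \xrightarrow{D} N(0, \Sigma_\theta).$$
Assumption A5 ensures that $\Sigma_\theta$ is non-singular, and the asymptotic distribution of $\hat{\theta}_n$ can be obtained from
$$\sqrt{n} (\hat{\theta}_n - \theta_0) = -\left( \frac{1}{n} \frac{\partial^2 \ln \hat{L}_n(\tilde{\theta}_n)}{\partial \theta\, \partial \theta^T} \right)^{-1} \frac{1}{\sqrt{n}} \frac{\partial \ln \hat{L}_n(\theta_0)}{\partial \theta},$$
where $\tilde{\theta}_n$ lies between $\hat{\theta}_n$ and $\theta_0$, and $\hat{\theta}_n$ converges in probability to $\theta_0$.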

References

1. Luo, G. Statistical Inference in Spatial Autoregressive Models with Complex Data; Beijing University of Technology: Beijing, China, 2023.
2. Chen, Q. Advanced Econometrics and Stata Application; Higher Education Press: Beijing, China, 2010.
3. Bai, Y.; Tian, M.; Tang, M.-L.; Lee, W.-Y. Variable selection for ultra-high dimensional quantile regression with missing data and measurement error. Stat. Methods Med. Res. 2021, 30, 129–150.
4. Li, W. Parameter Estimation of Spatial Autoregressive Models with Measurement Error; Yunnan University: Kunming, China, 2020.
5. Anselin, L. SpaceStat Tutorial: A Workbook for Using SpaceStat in the Analysis of Spatial Data; West Virginia University: Urbana, IL, USA, 1992.
6. Tobler, R.W. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ. Geogr. 2016, 46, 234–240.
7. Anselin, L. Spatial Econometrics: Methods and Models. J. Am. Stat. Assoc. 1990, 85, 160.
8. Anselin, L. Thirty Years of Spatial Econometrics. Pap. Reg. Sci. 2010, 89, 3–25.
9. Cox, T.F. Spatial Processes: Models and Applications. J. R. Stat. Soc. Ser. A 1984, 147, 515.
10. Prucha, K.I.R. A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model. Int. Econ. Rev. 2010, 40, 509–533.
11. Lesage, J.P. Bayesian Estimation of Spatial Autoregressive Models. Int. Reg. Sci. Rev. 1997, 20, 113–129.
12. Lee, L.F. GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J. Econ. 2007, 137, 489–514.
13. Lee, L.F.; Liu, X. Efficient GMM estimation of high order spatial autoregressive models. Econ. Theory 2010, 26, 187–230.
14. Wei, C.; Guo, S.; Zhai, S. Statistical inference of partially linear varying coefficient spatial autoregressive models. Econ. Model. 2017, 64, 553–559.
15. Ma, M.; Zhang, X. Spatial Effects of China’s Haze Pollution and the Impact of Economy and Energy Structure. China Ind. Econ. 2014, 19–31.
16. Wang, Y.F.; Ni, P.F. Economic Growth Spillovers and Regional Spatial Optimization Under the Influence of High-Speed Rail. China Ind. Econ. 2016, 21–36.
17. Cheng, Y.; Dong, C. Study on the Spatial Effects of Trade Facilitation on China’s Industrial Manufactured Goods Export. Quant. Econ. Tech. Econ. Res. 2021, 38, 98–115.
18. Shen, T.; Feng, D.; Sun, T. Spatial Econometrics; Peking University Press: Beijing, China, 2010.
19. Yi, L.; Tang, H.; Lin, X. Spatial Linear Mixed Models with Covariate Measurement Errors. Stat. Sin. 2009, 19, 1077.
20. Huque, M.H.; Bondell, H.D.; Ryan, L. On the impact of covariate measurement error on spatial regression modelling. Environmetrics 2015, 25, 560–570.
21. Zhang, Z.; Zhu, P. Estimation and Testing of Spatial Autoregressive Models with Measurement Error. Stat. Res. 2010, 27, 103–109.
22. He, X.; Hu, X. Parameter Estimation of Univariate Spatial Autoregressive Measurement Error Models. Sci. China Math. 2020, 50, 613–628.
23. You, P. Estimation of Semiparametric Spatial Autoregressive Models with Random Missing; Yunnan University: Kunming, China, 2018.
24. Cheng, D. Modeling and Analysis of Residents’ Travel Behavior Based on Multiple Differences; Dalian Jiaotong University: Dalian, China, 2019.
25. Yang, Z.; Yu, J.; Chen, J. A Missing Data Imputation Method Based on Clustering and Spatial Autoregressive Model. Intelligent Information Technology Application Association. In Proceedings of the 2011 International Conference on Ecological Protection of Lakes-Wetlands-Watershed and Application of 3S Technology (EPLWW3S 2011 V2), Nanchang, China, 25–26 June 2011; pp. 554–557.
26. Wang, W.; Lee, L. Estimation of spatial autoregressive models with randomly missing data in the dependent variable. Econ. J. 2013, 16, 73–102.
27. Lee, L.F. Asymptotic Distributions of Quasi-Maximum Likelihood Estimators for Spatial Autoregressive Models. Econometrica 2004, 72, 1899–1925.
28. Kelejian, H.H.; Prucha, I.R. On the Asymptotic Distribution of the Moran I Test Statistic with Applications. J. Econom. 2001.
29. Case, A.C. Spatial Patterns in Household Demand. Econometrica 1991, 59, 953–965.
30. Li, F.R. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Publ. Am. Stat. Assoc. 2001, 96, 1348–1360.
31. Nakamura, T. Corrected Score Function for Errors-in-Variables Models: Methodology and Application to Generalized Linear Models. Biometrika 1990, 77, 127–137.
32. Harrison, D., Jr.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978.
33. Sun, Y.; Yan, H.; Zhang, W. A semiparametric spatial dynamic model. Ann. Stats. 2014, 42, 700–727.
34. Du, J.; Sun, X.; Cao, R. Statistical inference for partially linear additive spatial autoregressive models. Spat. Stat. 2018, 52–67.
35. Liu, X.; Chen, J.; Cheng, S. A Penalized Quasi-Maximum Likelihood Method for Variable Selection in the Spatial Autoregressive Model. Spat. Stat. 2018, 25, 86–104.
36. Pace, R.K.; Gilley, O.W. Using the Spatial Configuration of the Data to Improve Estimation. J. Real Estate Financ. Econ. 1997, 14, 333–340.
Table 1. MeSEs for $\lambda$.
| e | λ | n=100, m=10 | n=100, m=15 | n=100, m=20 | n=150, m=10 | n=150, m=15 | n=150, m=20 | n=200, m=15 | n=200, m=20 | n=200, m=25 | n=250, m=15 | n=250, m=20 | n=250, m=25 |
Incorporating measurement errors and missing data
| 0.5 | 0.5 | 1.627E+01 | 9.324E-01 | 5.839E+01 | 4.618E-02 | 4.189E-02 | 4.618E-02 | 3.850E-02 | 3.492E-02 | 3.850E-02 | 2.462E-02 | 3.809E-02 | 2.610E-02 |
| 0.5 | −0.5 | 8.240E+00 | 2.289E+00 | 3.887E+00 | 4.440E-02 | 4.288E-02 | 4.440E-02 | 3.938E-02 | 2.922E-02 | 3.938E-02 | 2.435E-02 | 3.501E-02 | 2.692E-02 |
| 1 | 0.5 | 2.184E-01 | 2.354E-01 | 2.102E-01 | 1.552E-01 | 1.353E-01 | 1.555E-01 | 1.487E-01 | 1.248E-01 | 1.185E-01 | 1.078E-01 | 1.304E-01 | 1.073E-01 |
| 1 | −0.5 | 2.081E-01 | 2.065E-01 | 2.159E-01 | 1.480E-01 | 1.437E-01 | 1.562E-01 | 1.524E-01 | 1.312E-01 | 1.207E-01 | 1.138E-01 | 1.281E-01 | 1.112E-01 |
| 1.5 | 0.5 | 4.731E-01 | 4.785E-01 | 4.959E-01 | 3.910E-01 | 3.290E-01 | 3.192E-01 | 3.803E-01 | 2.695E-01 | 2.539E-01 | 2.395E-01 | 2.561E-01 | 2.375E-01 |
| 1.5 | −0.5 | 4.632E-01 | 5.174E-01 | 4.847E-01 | 3.844E-01 | 3.290E-01 | 3.267E-01 | 3.723E-01 | 2.694E-01 | 2.819E-01 | 2.261E-01 | 2.668E-01 | 2.542E-01 |
Ignoring missing data
| 0.5 | 0.5 | 1.320E+02 | 1.603E+02 | 1.368E+03 | 9.288E+02 | 2.599E+02 | 8.833E+01 | 8.905E+02 | 8.535E+01 | 9.025E+01 | 7.097E+01 | 6.560E+01 | 1.963E+02 |
| 0.5 | −0.5 | 1.403E+02 | 1.288E+02 | 6.390E+02 | 1.014E+03 | 2.685E+02 | 8.620E+01 | 9.785E+02 | 1.797E+02 | 1.074E+02 | 7.247E+01 | 6.491E+01 | 1.774E+02 |
| 1 | 0.5 | 3.652E+03 | 6.698E+01 | 1.661E+03 | 7.315E+02 | 2.568E+02 | 9.291E+01 | 6.350E+02 | 1.839E+02 | 1.098E+02 | 7.515E+01 | 7.648E+01 | 1.835E+02 |
| 1 | −0.5 | 3.840E+03 | 7.060E+01 | 1.649E+03 | 1.035E+03 | 3.026E+02 | 8.620E+01 | 5.711E+02 | 1.797E+02 | 1.081E+02 | 7.477E+01 | 7.326E+01 | 1.934E+02 |
| 1.5 | 0.5 | 3.776E+03 | 6.636E+01 | 2.115E+03 | 9.394E+02 | 2.887E+02 | 9.215E+01 | 7.766E+02 | 1.727E+02 | 1.091E+02 | 7.751E+01 | 7.263E+01 | 1.949E+02 |
| 1.5 | −0.5 | 3.692E+03 | 6.373E+01 | 1.862E+03 | 8.851E+02 | 2.952E+02 | 8.631E+01 | 7.435E+02 | 1.987E+02 | 1.167E+02 | 7.406E+01 | 6.734E+01 | 1.884E+02 |
Ignoring measurement errors
| 0.5 | 0.5 | 5.054E-02 | 5.442E-02 | 5.727E-02 | 3.850E-02 | 3.523E-02 | 3.850E-02 | 3.140E-02 | 3.492E-02 | 2.867E-02 | 2.482E-02 | 3.501E-02 | 2.432E-02 |
| 0.5 | −0.5 | 5.584E-02 | 4.883E-02 | 4.941E-02 | 3.938E-02 | 3.660E-02 | 3.938E-02 | 2.713E-02 | 2.922E-02 | 2.754E-02 | 2.403E-02 | 3.501E-02 | 2.692E-02 |
| 1 | 0.5 | 2.101E-01 | 2.141E-01 | 2.100E-01 | 1.334E-01 | 1.487E-01 | 1.332E-01 | 1.199E-01 | 1.248E-01 | 1.185E-01 | 1.074E-01 | 1.314E-01 | 1.095E-01 |
| 1 | −0.5 | 2.021E-01 | 2.038E-01 | 2.149E-01 | 1.428E-01 | 1.524E-01 | 1.490E-01 | 1.064E-01 | 1.312E-01 | 1.207E-01 | 1.109E-01 | 1.254E-01 | 1.102E-01 |
| 1.5 | 0.5 | 4.449E-01 | 4.604E-01 | 4.668E-01 | 3.803E-01 | 3.180E-01 | 3.142E-01 | 2.530E-01 | 2.695E-01 | 2.539E-01 | 2.441E-01 | 2.509E-01 | 2.375E-01 |
| 1.5 | −0.5 | 4.664E-01 | 4.983E-01 | 4.969E-01 | 3.723E-01 | 3.233E-01 | 3.121E-01 | 2.527E-01 | 2.694E-01 | 2.819E-01 | 2.309E-01 | 2.637E-01 | 2.568E-01 |
Ignoring measurement errors and missing data
| 0.5 | 0.5 | 1.227E+02 | 1.543E+02 | 1.312E+03 | 8.905E+02 | 2.516E+02 | 8.535E+01 | 6.129E+02 | 1.699E+02 | 9.582E+01 | 6.998E+01 | 6.491E+01 | 1.921E+02 |
| 0.5 | −0.5 | 1.299E+02 | 1.220E+02 | 6.024E+02 | 9.785E+02 | 2.573E+02 | 8.620E+01 | 6.237E+02 | 1.942E+02 | 1.035E+02 | 7.140E+01 | 6.491E+01 | 1.746E+02 |
| 1 | 0.5 | 3.468E+03 | 6.335E+01 | 1.557E+03 | 7.074E+02 | 2.468E+02 | 8.876E+01 | 6.189E+02 | 1.827E+02 | 1.078E+02 | 7.358E+01 | 7.555E+01 | 1.799E+02 |
| 1 | −0.5 | 3.493E+03 | 6.724E+01 | 1.557E+03 | 1.008E+03 | 2.891E+02 | 8.326E+01 | 5.566E+02 | 1.772E+02 | 1.062E+02 | 7.408E+01 | 7.166E+01 | 1.914E+02 |
| 1.5 | 0.5 | 3.524E+03 | 6.336E+01 | 2.019E+03 | 8.924E+02 | 2.822E+02 | 8.807E+01 | 7.539E+02 | 1.677E+02 | 1.078E+02 | 7.674E+01 | 7.112E+01 | 1.918E+02 |
| 1.5 | −0.5 | 3.375E+03 | 6.215E+01 | 1.734E+03 | 8.151E+02 | 2.867E+02 | 8.371E+01 | 7.322E+02 | 1.935E+02 | 1.143E+02 | 7.346E+01 | 6.638E+01 | 1.847E+02 |
Table 2. MeSEs for $\beta$.
| e | λ | n=100, m=10 | n=100, m=15 | n=100, m=20 | n=150, m=10 | n=150, m=15 | n=150, m=20 | n=200, m=15 | n=200, m=20 | n=200, m=25 | n=250, m=15 | n=250, m=20 | n=250, m=25 |
Incorporating measurement errors and missing data
| 0.5 | 0.5 | 1.801E-02 | 8.973E-04 | 6.942E-03 | 1.063E-07 | 2.664E-07 | 1.269E-06 | 5.714E-08 | 2.531E-07 | 6.536E-07 | 4.296E-07 | 2.436E-06 | 2.780E-07 |
| 0.5 | −0.5 | 2.281E-02 | 2.045E-03 | 1.019E-03 | 1.119E-07 | 3.246E-07 | 1.563E-06 | 5.643E-08 | 1.796E-07 | 5.257E-07 | 4.646E-07 | 2.419E-06 | 2.300E-07 |
| 1 | 0.5 | 1.784E-07 | 1.216E-05 | 5.359E-07 | 2.187E-07 | 9.488E-07 | 3.077E-06 | 1.683E-07 | 7.737E-07 | 2.334E-06 | 1.877E-06 | 3.835E-06 | 9.066E-07 |
| 1 | −0.5 | 1.822E-07 | 1.164E-05 | 4.690E-07 | 2.171E-07 | 9.527E-07 | 3.141E-06 | 1.917E-07 | 8.920E-07 | 1.713E-06 | 1.825E-06 | 4.568E-06 | 1.024E-06 |
| 1.5 | 0.5 | 4.202E-07 | 2.680E-05 | 1.040E-06 | 3.780E-07 | 1.599E-06 | 3.857E-06 | 4.070E-07 | 1.672E-06 | 4.307E-06 | 4.402E-06 | 5.968E-06 | 2.078E-06 |
| 1.5 | −0.5 | 4.165E-07 | 2.693E-05 | 8.912E-07 | 3.953E-07 | 1.725E-06 | 6.557E-06 | 4.159E-07 | 1.475E-06 | 4.581E-06 | 4.567E-07 | 5.814E-06 | 2.216E-06 |
Ignoring missing data
| 0.5 | 0.5 | 1.027E-06 | 1.165E-06 | 1.946E-07 | 6.254E-08 | 1.896E-07 | 9.387E-07 | 5.386E-08 | 2.548E-07 | 6.431E-07 | 4.499E-07 | 2.257E-06 | 2.740E-07 |
| 0.5 | −0.5 | 1.329E-06 | 1.328E-06 | 2.800E-07 | 6.503E-08 | 2.004E-07 | 9.951E-07 | 5.913E-08 | 1.693E-07 | 5.375E-07 | 4.490E-07 | 2.253E-06 | 2.388E-07 |
| 1 | 0.5 | 1.679E-07 | 1.181E-05 | 5.124E-07 | 2.171E-07 | 9.079E-07 | 3.141E-06 | 1.828E-07 | 7.743E-07 | 2.231E-06 | 2.041E-06 | 3.937E-06 | 9.629E-07 |
| 1 | −0.5 | 1.866E-07 | 1.037E-05 | 4.678E-07 | 2.425E-07 | 9.527E-07 | 3.857E-06 | 1.939E-07 | 8.920E-07 | 1.685E-06 | 1.822E-06 | 3.700E-06 | 1.017E-06 |
| 1.5 | 0.5 | 4.402E-07 | 2.404E-05 | 1.753E-06 | 3.953E-07 | 1.725E-06 | 6.790E-06 | 4.070E-07 | 1.654E-06 | 3.690E-06 | 4.567E-07 | 5.917E-06 | 1.985E-06 |
| 1.5 | −0.5 | 4.567E-07 | 2.572E-05 | 9.034E-07 | 4.730E-07 | 1.725E-06 | 6.706E-06 | 4.159E-07 | 1.475E-06 | 4.581E-06 | 4.402E-07 | 5.814E-06 | 2.232E-06 |
Ignoring measurement errors
| 0.5 | 0.5 | 1.801E-02 | 8.973E-04 | 6.942E-03 | 1.063E-07 | 2.664E-07 | 1.269E-06 | 5.714E-08 | 2.531E-07 | 6.536E-07 | 4.296E-07 | 2.436E-06 | 2.780E-07 |
| 0.5 | −0.5 | 2.281E-02 | 2.045E-03 | 1.019E-03 | 1.119E-07 | 3.246E-07 | 1.563E-06 | 5.643E-08 | 1.796E-07 | 5.257E-07 | 4.646E-07 | 2.419E-06 | 2.300E-07 |
| 1 | 0.5 | 1.784E-07 | 1.216E-05 | 5.359E-07 | 2.187E-07 | 9.488E-07 | 3.077E-06 | 1.683E-07 | 7.737E-07 | 2.334E-06 | 1.877E-06 | 3.835E-06 | 9.066E-07 |
| 1 | −0.5 | 1.822E-07 | 1.164E-05 | 4.690E-07 | 2.171E-07 | 9.527E-07 | 3.141E-06 | 1.917E-07 | 8.920E-07 | 1.713E-06 | 1.825E-06 | 4.568E-06 | 1.024E-06 |
| 1.5 | 0.5 | 4.202E-07 | 2.680E-05 | 1.040E-06 | 3.780E-07 | 1.599E-06 | 3.857E-06 | 4.070E-07 | 1.672E-06 | 4.307E-06 | 4.402E-06 | 5.968E-06 | 2.078E-06 |
| 1.5 | −0.5 | 4.165E-07 | 2.693E-05 | 8.912E-07 | 3.953E-07 | 1.725E-06 | 6.557E-06 | 4.159E-07 | 1.475E-06 | 4.581E-06 | 4.567E-07 | 5.814E-06 | 2.216E-06 |
Ignoring measurement errors and missing data
| 0.5 | 0.5 | 1.801E-02 | 8.973E-04 | 6.942E-03 | 1.063E-07 | 2.664E-07 | 1.269E-06 | 5.714E-08 | 2.531E-07 | 6.536E-07 | 4.296E-07 | 2.436E-06 | 2.780E-07 |
| 0.5 | −0.5 | 2.281E-02 | 2.045E-03 | 1.019E-03 | 1.119E-07 | 3.246E-07 | 1.563E-06 | 5.643E-08 | 1.796E-07 | 5.257E-07 | 4.646E-07 | 2.419E-06 | 2.300E-07 |
| 1 | 0.5 | 1.784E-07 | 1.216E-05 | 5.359E-07 | 2.187E-07 | 9.488E-07 | 3.077E-06 | 1.683E-07 | 7.737E-07 | 2.334E-06 | 1.877E-06 | 3.835E-06 | 9.066E-07 |
| 1 | −0.5 | 1.822E-07 | 1.164E-05 | 4.690E-07 | 2.171E-07 | 9.527E-07 | 3.141E-06 | 1.917E-07 | 8.920E-07 | 1.713E-06 | 1.825E-06 | 4.568E-06 | 1.024E-06 |
| 1.5 | 0.5 | 4.202E-07 | 2.680E-05 | 1.040E-06 | 3.780E-07 | 1.599E-06 | 3.857E-06 | 4.070E-07 | 1.672E-06 | 4.307E-06 | 4.402E-06 | 5.968E-06 | 2.078E-06 |
| 1.5 | −0.5 | 4.165E-07 | 2.693E-05 | 8.912E-07 | 3.953E-07 | 1.725E-06 | 6.557E-06 | 4.159E-07 | 1.475E-06 | 4.581E-06 | 4.567E-07 | 5.814E-06 | 2.216E-06 |
Table 3. MeSEs for $\sigma^2$.
| e | λ | n=100, m=10 | n=100, m=15 | n=100, m=20 | n=150, m=10 | n=150, m=15 | n=150, m=20 | n=200, m=15 | n=200, m=20 | n=200, m=25 | n=250, m=15 | n=250, m=20 | n=250, m=25 |
Incorporating measurement errors and missing data
| 0.5 | 0.5 | 8.838E+03 | 6.371E+00 | 1.498E+05 | 2.544E-02 | 2.725E-02 | 2.938E-02 | 1.402E-02 | 1.326E-02 | 1.413E-02 | 1.255E-02 | 5.831E-02 | 1.436E-02 |
| 0.5 | −0.5 | 1.971E+03 | 3.881E+01 | 4.972E+02 | 2.515E-02 | 2.724E-02 | 2.161E-02 | 1.504E-02 | 1.285E-02 | 1.285E-02 | 1.087E-02 | 4.399E-02 | 1.340E-02 |
| 1 | 0.5 | 2.497E-01 | 2.111E-01 | 2.422E-01 | 2.286E-01 | 2.451E-01 | 2.327E-01 | 2.655E-01 | 3.116E-01 | 2.646E-01 | 5.731E-01 | 7.831E-01 | 5.634E-01 |
| 1 | −0.5 | 2.428E-01 | 2.424E-01 | 2.191E-01 | 2.618E-01 | 2.378E-01 | 2.618E-01 | 2.571E-01 | 3.228E-01 | 2.817E-01 | 5.984E-01 | 1.015E+00 | 5.455E-01 |
| 1.5 | 0.5 | 1.039E+00 | 8.811E-01 | 8.929E-01 | 1.870E+00 | 1.237E+00 | 1.388E+00 | 1.751E+00 | 1.950E+00 | 1.752E+00 | 3.281E+00 | 4.377E+00 | 2.882E+00 |
| 1.5 | −0.5 | 7.428E-01 | 1.214E+00 | 1.008E+00 | 2.276E+00 | 1.156E+00 | 1.197E+00 | 1.409E+00 | 2.525E+00 | 2.000E+00 | 3.183E+00 | 5.228E+00 | 3.731E+00 |
Ignoring missing data
| 0.5 | 0.5 | 6.195E+05 | 7.860E+05 | 9.643E+07 | 9.921E+07 | 1.059E+07 | 6.249E+05 | 1.097E+08 | 6.918E+06 | 1.673E+06 | 1.590E+06 | 1.887E+06 | 9.845E+06 |
| 0.5 | −0.5 | 7.591E+05 | 6.743E+05 | 1.409E+07 | 1.534E+08 | 1.149E+07 | 8.153E+05 | 8.139E+06 | 1.796E+06 | 8.070E+06 | 1.241E+06 | 1.750E+06 | 8.070E+06 |
| 1 | 0.5 | 5.165E+08 | 8.849E+04 | 1.396E+08 | 8.536E+07 | 1.048E+07 | 8.269E+05 | 1.222E+08 | 7.143E+06 | 1.647E+06 | 1.489E+06 | 1.885E+06 | 8.864E+06 |
| 1 | −0.5 | 8.202E+08 | 8.025E+04 | 1.175E+08 | 1.300E+08 | 9.951E+06 | 7.452E+05 | 1.033E+08 | 7.292E+06 | 1.689E+06 | 1.381E+06 | 1.962E+06 | 1.062E+07 |
| 1.5 | 0.5 | 5.896E+08 | 9.098E+04 | 1.807E+08 | 1.364E+08 | 1.323E+07 | 7.982E+05 | 1.827E+08 | 8.376E+06 | 1.992E+06 | 1.701E+06 | 2.093E+06 | 9.215E+06 |
| 1.5 | −0.5 | 5.829E+08 | 8.085E+04 | 1.362E+08 | 1.450E+08 | 1.148E+07 | 7.943E+05 | 1.349E+08 | 8.539E+06 | 1.874E+06 | 1.478E+06 | 2.036E+06 | 1.101E+07 |
Ignoring measurement errors
| 0.5 | 0.5 | 1.552E-02 | 1.409E-02 | 1.635E-02 | 2.452E-02 | 1.870E-02 | 1.839E-02 | 3.074E-02 | 3.737E-02 | 3.452E-02 | 3.967E-02 | 1.438E-01 | 5.327E-02 |
| 0.5 | −0.5 | 1.556E-02 | 1.611E-02 | 1.420E-02 | 1.966E-02 | 1.673E-02 | 1.157E-01 | 3.432E-02 | 3.214E-02 | 5.016E-02 | 4.095E-02 | 1.157E-01 | 5.016E-02 |
| 1 | 0.5 | 1.575E-01 | 1.784E-01 | 1.942E-01 | 3.701E-01 | 4.468E-01 | 3.903E-01 | 4.643E-01 | 5.033E-01 | 4.763E-01 | 7.663E-01 | 1.166E+00 | 7.863E-01 |
| 1 | −0.5 | 2.203E-01 | 1.996E-01 | 2.201E-01 | 4.989E-01 | 3.800E-01 | 3.196E-01 | 4.460E-01 | 5.350E-01 | 5.115E-01 | 8.282E-01 | 1.174E+00 | 7.716E-01 |
| 1.5 | 0.5 | 1.071E+00 | 9.059E-01 | 1.040E+00 | 2.490E+00 | 1.723E+00 | 1.843E+00 | 2.196E+00 | 2.450E+00 | 2.661E+00 | 3.956E+00 | 4.853E+00 | 3.374E+00 |
| 1.5 | −0.5 | 8.711E-01 | 1.130E+00 | 1.075E+00 | 2.974E+00 | 1.652E+00 | 1.563E+00 | 1.849E+00 | 3.028E+00 | 2.583E+00 | 3.761E+00 | 5.851E+00 | 4.455E+00 |
Ignoring measurement errors and missing data
| 0.5 | 0.5 | 6.249E+05 | 7.933E+05 | 9.505E+07 | 1.007E+08 | 1.088E+07 | 6.285E+05 | 1.143E+08 | 8.101E+06 | 1.670E+06 | 1.592E+06 | 1.891E+06 | 1.008E+07 |
| 0.5 | −0.5 | 8.039E+05 | 6.779E+05 | 1.376E+07 | 1.524E+08 | 1.148E+07 | 8.325E+05 | 8.101E+06 | 1.797E+06 | 8.078E+06 | 1.262E+06 | 1.754E+06 | 8.078E+06 |
| 1 | 0.5 | 5.243E+08 | 8.745E+04 | 1.387E+08 | 8.553E+07 | 1.051E+07 | 8.309E+05 | 1.223E+08 | 7.152E+06 | 1.655E+06 | 1.490E+06 | 1.887E+06 | 8.866E+06 |
| 1 | −0.5 | 8.338E+08 | 8.011E+04 | 1.187E+08 | 1.306E+08 | 1.015E+07 | 7.443E+05 | 1.036E+08 | 7.452E+06 | 1.691E+06 | 1.382E+06 | 1.965E+06 | 1.069E+07 |
| 1.5 | 0.5 | 5.856E+08 | 8.961E+04 | 1.826E+08 | 1.371E+08 | 1.389E+07 | 7.982E+05 | 1.828E+08 | 8.344E+06 | 1.997E+06 | 1.726E+06 | 2.096E+06 | 9.126E+06 |
| 1.5 | −0.5 | 5.829E+08 | 8.209E+04 | 1.369E+08 | 1.456E+08 | 1.156E+07 | 7.943E+05 | 1.357E+08 | 8.559E+06 | 1.843E+06 | 1.474E+06 | 2.039E+06 | 1.089E+07 |
Table 4. Descriptions of the variables in the Boston Housing Dataset.
| Attribute | Explanation | Remarks |
| CRIM | Per capita crime rate by town | |
| ZN | Proportion of residential land zoned for lots over 25,000 sq. ft. | Residential land proportion |
| INDUS | Proportion of non-retail business acres per town | Non-retail business proportion |
| CHAS | Charles River dummy variable | Charles River variable for regression analysis |
| NOX | Nitrogen oxide concentration (ppm) | Environmental indicator |
| RM | Average number of rooms per dwelling | Number of rooms in residential units |
| AGE | Proportion of owner-occupied units built prior to 1940 | Pre-1940s-constructed units proportion |
| DIS | Weighted distances to five Boston employment centers | Distance to employment hubs |
| RAD | Index of accessibility to radial highways | Highway accessibility index |
| TAX | Full-value property tax rate per 10,000 | Property tax rate |
| PTRATIO | Pupil–teacher ratio by town | Pupil-teacher ratio |
| B | $1000(Bk - 0.63)^2$, where $Bk$ is the proportion of blacks by town | Proportion of black population |
| LSTAT | Percentage of the population classified as lower income | Lower-income class proportion |
| MEDV | Median value of owner-occupied homes | Typically, the target variable in an analysis |
Table 5. MeSEs for $\lambda$, $\beta$, and $\sigma^2$.
| | λ | β1 | β2 | β3 | β4 | β5 | σ² |
| Real parameters | 4.446E-01 | −1.770E-01 | −3.425E-02 | −4.261E-01 | −1.249E-01 | 4.102E-01 | 8.168E-03 |
| Incorporating measurement errors and missing data | 8.870E-03 | 1.878E-03 | 8.669E-04 | 9.953E-03 | 1.716E-03 | 2.848E-01 | 9.182E-07 |
| Ignoring missing data | 3.536E-02 | 2.045E-03 | 8.998E-04 | 9.691E-03 | 1.805E-03 | 1.746E-01 | 8.075E-06 |
| Ignoring measurement errors | 1.043E-02 | 1.919E-03 | 8.840E-04 | 6.065E-03 | 2.132E-03 | 2.457E-01 | 8.077E-07 |
| Ignoring measurement errors and missing data | 4.528E-02 | 2.094E-03 | 8.565E-04 | 3.985E-03 | 1.629E-03 | 1.208E-01 | 1.125E-05 |

Share and Cite

MDPI and ACS Style

Li, T.; Zhang, Z.; Song, Y. Parameter Estimation in Spatial Autoregressive Models with Missing Data and Measurement Errors. Axioms 2024, 13, 315. https://doi.org/10.3390/axioms13050315

