4.2. FCNN–GWR—The Combination of Deep Learning and GWR
GWR cannot characterize the nonlinear, complex behavior of prices, and existing deep-learning methods cannot explicitly handle spatial heterogeneity. Therefore, we propose the FCNN–GWR model, which combines deep learning with GWR and can handle both aspects of the problem. The general idea is that the FCNN model provides an acceptable prediction of the house rental price through deep learning, and applying GWR to this value can further optimize it. As the β parameters of GWR capture the spatial heterogeneity and spatial discrepancy of the house rental price, including them in the deep-learning model may explicitly help to optimize the fitted value. We compose a matrix M, which combines the GWR β parameters with the structural, locational, and neighborhood variables, and then carry out deep learning on M, which may yield more accurate predictions. The matrix is M = [β0, β1, β2, …, βm, x1, x2, …, xm], where β0, β1, …, βm are the β parameters of the GWR model in Equation (2), and x1, x2, …, xm are the structural, locational, and neighborhood variables, as in Equation (1).
As has already been verified, GWR has a clear disadvantage in out-of-sample forecasts: its prediction may not be sufficiently reliable when there are not enough samples near the point of interest. Therefore, we adopt the GWR-based prediction only when there is a relatively large number of samples nearby; when there are fewer samples nearby, the plain FCNN prediction is adopted as the final prediction value, which means:

$$y_i = \begin{cases} f_1(\beta_0, \beta_1, \ldots, \beta_m, x_1, x_2, \ldots, x_m), & \text{Condition 1} \\ f_0(x_1, x_2, \ldots, x_m), & \text{Condition 2} \end{cases} \tag{6}$$

where, for the $i$th house, $y_i$ denotes its prediction value, $f_1$ is the prediction of the FCNN model that includes the GWR β parameters, and $f_0$ is the prediction of the ordinary FCNN model; other variables are the same as in Equations (1) and (2). For a given house, Condition 1 means that the number of its nearby house samples (within the GWR bandwidth) is larger than the average level among all houses ($c_i > \frac{1}{n}\sum_{j=1}^{n} c_j$, where $c_i$ represents the number of neighboring samples within the GWR bandwidth for the $i$th house, and $n$ represents the total number of houses in the dataset); Condition 2 means that the number of nearby houses within the bandwidth is smaller than the average level among all house samples. Since the bandwidth is a decisive parameter in GWR, and only samples within the bandwidth play a relatively important role in the calculation, we split the houses into the 2 conditions by the number of samples within the bandwidth. When the number of nearby samples is smaller than the average, the prediction by GWR may not be sufficiently credible. In these cases, FCNN is more reliable, whereas geographical weighting may reduce precision. The negative accuracy increment of GWELM reported in Table 1 of Deng et al. [31] may be attributed to this phenomenon.
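The two conditions can be illustrated with a minimal sketch. It assumes planar coordinates, straight-line (Euclidean) distance, and that a house is not counted as its own neighbor; the function names `count_neighbors` and `satisfies_condition_1` are hypothetical and not taken from the paper. A house whose count equals the average is treated here as Condition 2, which the text leaves unspecified.

```python
import math

def count_neighbors(coords, bandwidth):
    """For each house, count the other houses lying within `bandwidth`.

    Assumption: Euclidean distance on planar coordinates, self excluded.
    """
    counts = []
    for i, (xi, yi) in enumerate(coords):
        c = 0
        for j, (xj, yj) in enumerate(coords):
            if i != j and math.hypot(xi - xj, yi - yj) <= bandwidth:
                c += 1
        counts.append(c)
    return counts

def satisfies_condition_1(counts):
    """True where a house has more neighbors than the average house."""
    mean_count = sum(counts) / len(counts)
    # Ties (count == mean) fall under Condition 2 in this sketch.
    return [c > mean_count for c in counts]

# Toy example: three clustered houses and one isolated house.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
counts = count_neighbors(coords, bandwidth=1.5)
mask = satisfies_condition_1(counts)
```

In this toy layout the three clustered houses fall under Condition 1 and the isolated house under Condition 2.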
Recent studies have shown that the attention mechanism can be effective in neural networks for housing prices [1,2,9,55]. In our research, the house variables [$x_1, x_2, \ldots, x_m$] and the GWR β parameters [$\beta_0, \beta_1, \beta_2, \ldots, \beta_m$] were each assigned an attention block [1] in front of the first fully connected layer. The attention block converts the original input features into attended features in order to identify the important features that influence rental prices. The attention block can be described as a Softmax-activated fully connected layer, and the algorithm is:

$$y = \Phi(h) \odot x$$

where

$$h = w x$$

Here, $\Phi(\cdot)$ is the Softmax function, $\odot$ denotes element-wise multiplication, $x$ is the vector of input features, $y$ is the vector of output features, $h$ is the vector of neurons of this fully connected layer, and $w$ is the weights of the input $x$. $\Phi(h)$ is the soft attention-weighted vector, which signifies the importance of the features of the house variables and GWR β parameters. As a result of the attention block, the variance among the attended features $y_k$ would be substantially larger than the variance among the original features $x_k$, suggesting that the important characteristics for the house rental price are emphasized in the network, which benefits the convergence and performance of the model.
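The attention block described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: it assumes a square weight matrix `w` (so that the layer has one neuron per input feature), no bias term, and element-wise reweighting of the inputs by the Softmax output. The function names are hypothetical.

```python
import math

def softmax(h):
    """Numerically stable Softmax over a list of floats."""
    m = max(h)
    exps = [math.exp(v - m) for v in h]
    total = sum(exps)
    return [e / total for e in exps]

def attention_block(x, w):
    """Return (attended features y, attention weights a).

    x: list of m input features.
    w: m x m weight matrix as a list of rows (assumed square, no bias).
    """
    # Fully connected layer: h_k = sum_j w[k][j] * x[j]
    h = [sum(w_kj * x_j for w_kj, x_j in zip(row, x)) for row in w]
    a = softmax(h)                               # soft attention weights
    y = [a_k * x_k for a_k, x_k in zip(a, x)]    # element-wise reweighting
    return y, a

# Toy example with identity weights, so h equals x.
x = [1.0, 2.0, 3.0]
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y, a = attention_block(x, w)
```

In a trained network `w` would be learned jointly with the following fully connected layers; the identity matrix is used here only to keep the example easy to follow.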
In the FCNN–GWR model, the data are divided into a training set and a test set. The process of FCNN–GWR is shown in Figure 3 and can be described as follows:
Step 1: Train the FCNN model on the training set with the structural, locational, and neighborhood variables of the house.
Step 2: Execute the GWR model on the training set with the structural, locational, and neighborhood variables. Then, the β parameters of GWR can be calculated for each house via GWR fitting.
Step 3: Combine the GWR β parameters with the structural, locational, and neighborhood variables to make up the matrix M. Through deep-learning training on the matrix M, wrapped with the attention blocks, the FCNN model including the GWR β parameters (and the structural, locational, and neighborhood variables) is obtained.
Step 4: Predict the price on the test set with the FCNN model including the GWR β parameters (obtained in Step 3). This prediction value is referred to as f1.
Step 5: Predict the price on the test set with the ordinary FCNN model (obtained in Step 1, with only the structural, locational, and neighborhood variables and without the GWR β parameters). This prediction value is referred to as f0.
Step 6: On the test set, the final predicted results of FCNN–GWR are obtained according to Equation (6): if there are relatively more samples nearby (Condition 1), the final prediction value is f1; if there are relatively fewer samples nearby (Condition 2), the final prediction value is f0.
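The data flow of Steps 3 and 6 can be sketched as follows. The sketch assumes that the two sets of predictions and the Condition 1 flags have already been computed; the helper names `build_m_row` and `combine_predictions` are hypothetical, and the numbers are made up purely for illustration.

```python
def build_m_row(betas, x):
    """Step 3: concatenate [b0, b1, ..., bm] and [x1, ..., xm] for one house."""
    if len(betas) != len(x) + 1:
        raise ValueError("expected one intercept plus one coefficient per variable")
    return list(betas) + list(x)

def combine_predictions(f1, f0, condition_1):
    """Step 6: take f1 where Condition 1 holds, otherwise fall back to f0."""
    return [a if c else b for a, b, c in zip(f1, f0, condition_1)]

# One row of the matrix M for a house with two explanatory variables.
row = build_m_row(betas=[0.5, 1.0, -2.0], x=[3.0, 4.0])

# Illustrative predictions for three test houses (not real results).
f1 = [50.0, 60.0, 70.0]          # FCNN trained on M (Step 4)
f0 = [48.0, 65.0, 71.0]          # ordinary FCNN (Step 5)
condition_1 = [True, False, True]
final = combine_predictions(f1, f0, condition_1)
```

Keeping the fallback as a simple element-wise selection means the two networks can be trained and evaluated independently before being merged.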
Through this method of synthetic training, the FCNN–GWR model not only has the ability to explain the nonlinear complexity of the price but also addresses the spatial heterogeneity explicitly since the method considers the influence of surrounding rental houses. In this paper, FCNN–GWR and other models were used and compared in the study areas to demonstrate the superiority of the proposed model.
4.3. Quantity-Based Locational and Neighborhood Variables
In traditional housing price models, the locational and neighborhood variables include DCBD, Dpark, and so on (Table 2). These factors reflect the location of the house, but only a limited number of variables can be included in order to avoid the multicollinearity problem [56]. Although the main factors of the price can be explained effectively in this way, problems remain. First, these locational and neighborhood variables are distance-based, and expressing the location of a house by distance alone may be somewhat inaccurate, which leads to a loss of precision in the house-pricing model. Seo et al. [16], Li et al. [29], and Bency et al. [10] have provided evidence of this. As shown in Figure 4, there are many POIs of type “school” in this area. When calculating the locational and neighborhood variables of the houses in this area, only the information of the blue points, which are the “nearest” school POIs to the houses, is actually used; the information of the neighboring yellow points is not included simply because they are not the “nearest” ones. In other words, the locational information carried by these yellow points is discarded rather than exploited, which may reduce the accuracy of the price model. Second, because some variables are excluded from the model owing to their similarity to other variables, the model may lose a certain amount of information, even though these variables can also contribute to the housing price to a certain extent.
To solve this, we propose another method to measure the locational characteristics of the house: the quantity-based locational and neighborhood variables. In our view, the number and the combination of the various kinds of POIs surrounding a house can better reflect its locational characteristics. For example, Figure 5a shows a place near a school gate, with very dense POIs around it. It is not reasonable to consider only the selected “nearest” POI, since other neighboring POIs also contribute to the locational characteristics. To account for the influence of the other POIs, a better way is to count every type of POI nearby. In Figure 5a there are 1301 commercial POIs, 68 transportation POIs, 20 stadium POIs, and 123 school POIs. The number of commercial and school POIs is very large, which implies that this place may lie at the intersection of a school and a commercial district. As another example, Figure 5b shows a newly built venue in Wuhan. There are 104 commercial POIs, 53 transportation POIs, 6 stadium POIs, and no school POIs nearby. The number of subway, bus, parking, and other transportation facilities is relatively large, but the number of commercial and school POIs is relatively small, which reflects the character of this place as new infrastructure and a new venue. Therefore, the locational characteristics of a place can be reflected by the counts of the different POI types around it.
The distribution of the different types of POIs near a house can be measured to express the quantity characteristics of POIs described above. Kernel Density Estimation (KDE) [48] is a practical way to measure the number and density of the points near a given place, and it is a robust analytical tool in GIS for model discovery and for spatial statistical and spatiotemporal data mining. In this research, KDE is adopted, and the estimated density values of KDE for the different types of POIs are used as “quantity-based variables” for expressing the characteristics of the rental houses, that is:

$$\eta(s_j) = \frac{1}{N_j h} \sum_{k=1}^{N_j} K\left(\frac{\mathrm{dist}(s_j, x_{j,k})}{h}\right)$$

where $\eta(s_j)$ is the estimated density value of the $j$th type of POIs for a house sample, $N_j$ is the total number of the $j$th type of POIs, $\mathrm{dist}(s_j, x_{j,k})$ is the distance between the location of the house and the location of the $k$th POI in the $j$th type of POIs, $K(\cdot)$ is the penalty function (also called the kernel function in KDE), and $h$ is the bandwidth of the kernel function, which controls its smoothing effect. Taken together, the estimated density values for all types of POIs, [$\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots$], represent the comprehensive locational and neighborhood characteristics of a rental house. The combination [$\eta(s_1), \eta(s_2), \ldots, \eta(s_j), \ldots, \eta(s_N)$] is labeled as “quantity-based locational and neighborhood variables” in this paper.
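A quantity-based variable for one POI type can be sketched as a kernel density estimate evaluated at the house location. The sketch below is illustrative only: it assumes the standard KDE normalization by the number of POIs and the bandwidth, a Gaussian kernel, and straight-line distance as a stand-in for the road network distance used in this study. The function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel evaluated at the scaled distance u."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def poi_density(house, pois, bandwidth, kernel=gaussian_kernel):
    """Estimated density of one POI type at a house location.

    house: (x, y) of the house; pois: list of (x, y) for one POI type.
    Assumption: Euclidean distance replaces road network distance.
    """
    if not pois:
        return 0.0
    total = sum(kernel(math.dist(house, p) / bandwidth) for p in pois)
    return total / (len(pois) * bandwidth)

def quantity_based_variables(house, pois_by_type, bandwidth):
    """One density value per POI type, in a fixed (sorted) type order."""
    return [poi_density(house, pois_by_type[t], bandwidth)
            for t in sorted(pois_by_type)]

# Toy example: schools cluster near the house, stadiums are far away.
house = (0.0, 0.0)
pois_by_type = {
    "school": [(0.1, 0.0), (0.0, 0.2), (0.3, 0.1)],
    "stadium": [(5.0, 5.0)],
}
features = quantity_based_variables(house, pois_by_type, bandwidth=1.0)
```

Unlike a nearest-distance variable, every POI within reach of the kernel contributes to the value, so dense clusters and isolated points of the same type are distinguished.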
From the formula above, the kernel function $K(\cdot)$ and the bandwidth $h$ are the 2 parameters that KDE requires. In this research, several commonly used kernel functions, including the Triangular Kernel, the Gaussian Kernel, and the Laplacian Kernel, were tested to find a suitable kernel and bandwidth for the quantity-based locational and neighborhood variables. We put the structural variables and the “quantity-based locational and neighborhood variables” together to construct the vector $F$, which represents the overall factors of the rental house price:

$$F = [sv_1, sv_2, \ldots, sv_Q, \eta(s_1), \eta(s_2), \ldots, \eta(s_N)]$$

where $sv$ represents a structural variable of the house (in Table 2); $Q$ and $N$ represent the number of the structural variables and of the locational and neighborhood variables, respectively. The different kernel functions and bandwidths were tested and optimized to find the combination for which the factors $F$ achieve the highest $R^2$ in the OLS model of the rental house price:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (r_{i;o} - r_{i;s})^2}{\sum_{i=1}^{n} (r_{i;o} - \bar{r}_o)^2}$$

where $r_{i;o}$ and $r_{i;s}$ are the observed (actual) and simulated (calculated by the model) rental prices (unit: RMB/m²/month) for the $i$th house, $\bar{r}_o$ is the mean of the observed rental prices, and $n$ is the number of rental house samples in each dataset. Testing shows that the Gaussian Kernel performs best for all 4 cities, so it is chosen as the KDE kernel for generating the quantity-based locational and neighborhood variables in this study. After optimization, the KDE bandwidth $h$ is determined as 12,657.4 m, 18,495.5 m, 11,549.4 m, and 14,386.9 m for Wuhan, Beijing, Nanjing, and Xi’an, respectively.
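The kernel and bandwidth search can be sketched as a simple grid search. The kernel formulas below are common textbook forms and are assumptions, since the exact definitions used in this study are not given here. The `score` callable is hypothetical: it stands for "build the quantity-based variables with this kernel and bandwidth, fit OLS, and return R²".

```python
import math

def triangular(u):
    """Triangular kernel: linear decay, zero beyond one bandwidth."""
    return max(0.0, 1.0 - abs(u))

def gaussian(u):
    """Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def laplacian(u):
    """Laplacian (double-exponential) kernel."""
    return 0.5 * math.exp(-abs(u))

def r_squared(observed, simulated):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def select_kernel_and_bandwidth(kernels, bandwidths, score):
    """Return (kernel name, bandwidth, R2) with the highest score.

    score(kernel, h) is a user-supplied callable returning the OLS R2.
    """
    best = None
    for name, kernel in kernels.items():
        for h in bandwidths:
            r2 = score(kernel, h)
            if best is None or r2 > best[2]:
                best = (name, h, r2)
    return best

KERNELS = {"triangular": triangular, "gaussian": gaussian, "laplacian": laplacian}
```

A coarse grid of bandwidths followed by a finer search around the best value is one plausible way to arrive at bandwidths reported to a tenth of a meter, though the optimizer actually used is not stated in the text.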
The quantity-based locational and neighborhood variables are more comprehensive than the traditionally used locational and neighborhood variables in Table 2 and better reflect the multiscale and comprehensive geographical characteristics of the location. For comparison with the “quantity-based” variables, the distances to the 134 types of POIs can also serve as locational and neighborhood variables; they are labeled as “distance-based locational and neighborhood variables” in this paper. In contrast to the distance-based and quantity-based variables, the “traditional locational and neighborhood variables” refer to the locational and neighborhood variables in Table 2, which include 9 frequently used variables. It should be noted that the traditional, distance-based, and quantity-based locational and neighborhood variables all depend on the distance to the POIs. The distances to POIs in this study are measured as road network distances, obtained through GIS network analysis. Each kind of locational and neighborhood variable is then generated as follows: the traditional and distance-based variables are generated from the nearest distance to a certain kind of POI, and the quantity-based variables are generated by the KDE kernel function and bandwidth.
The “distance-based” and “quantity-based” locational and neighborhood variables comprise a large number of complex, similar, and multicollinear factors, which is the situation in which deep-learning methods perform well. Hundreds of locational variables provide a sufficient number of features for learning and make the models more accurate. However, multicollinearity inevitably exists among such a large number of variables. Clearly, if these similar and multicollinear factors are employed in the HPM and GWR models, their individual impacts on the price can no longer be evaluated through the model. The main role of the quantity-based variables is to improve the fitting accuracy. That is, the parameter β in the HPM and β(ui, vi) in the GWR are no longer of economic significance when distance-based and quantity-based locational and neighborhood variables are employed, but the fitting accuracy of the model will be greatly improved. Therefore, the meaning of the parameters of the variables (their impact on the price) will not be discussed in this paper. In addition, when calculating β(ui, vi) in Equation (3) of the GWR model, the matrix inverse should be replaced by the pseudoinverse in case the matrix is singular and no inverse exists.
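The pseudoinverse substitution can be sketched with NumPy. The sketch assumes the standard GWR weighted least-squares estimator, β(ui, vi) = (XᵀWX)⁻¹XᵀWy, a Gaussian distance-decay weight, and Euclidean distance; the function name `gwr_local_beta` and the cutoff value passed to `np.linalg.pinv` are illustrative choices, not taken from the paper.

```python
import numpy as np

def gwr_local_beta(X, y, coords, target, bandwidth):
    """Local GWR coefficients at `target`, robust to collinear columns.

    X: (n, m) explanatory variables; y: (n,) prices; coords: (n, 2).
    Assumptions: Gaussian weights, Euclidean distance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    d = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)          # spatial weights
    xtw = design.T * w                               # equals design.T @ diag(w)
    # pinv returns the ordinary inverse when X'WX is invertible and the
    # minimum-norm least-squares solution when it is singular.
    # The second argument is the relative cutoff for small singular values.
    return np.linalg.pinv(xtw @ design, 1e-10) @ (xtw @ y)

# Toy example: two identical (perfectly collinear) explanatory variables.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0 + 2.0 * v for v in x]
beta = gwr_local_beta([[v, v] for v in x], y, coords, (2.0, 0.0), 2.0)
```

With duplicated columns the individual coefficients are not identifiable, which is consistent with the remark above that they lose economic meaning, but the fitted values remain well defined.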
In this study, the traditional, distance-based, and quantity-based locational and neighborhood variables are each employed in the HPM, GWR, FCNN, and FCNN–GWR models. For the distance-based and quantity-based variables, the meaning of the model parameters is not discussed, as the parameters carry no interpretable meaning in that setting; only the fitting accuracy and predictive ability are discussed.
4.4. Accuracy Assessment of the Models
The basic rental price models in this study include 4 types: HPM, GWR, FCNN, and the proposed FCNN–GWR model. The locational and neighborhood variables of the house include 3 kinds: traditional variables, distance-based variables, and quantity-based variables. For these 4 basic models, all 3 kinds of locational and neighborhood variables will be respectively employed, and the fitting results will be compared. The corresponding experimental groups are labeled as traditional HPM, distance-based HPM, quantity-based HPM, traditional GWR, distance-based GWR, quantity-based GWR, traditional FCNN, distance-based FCNN, quantity-based FCNN, traditional FCNN–GWR, distance-based FCNN–GWR, and quantity-based FCNN–GWR. For the traditional HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are 20 structural variables and 9 traditional locational and neighborhood variables with no multicollinearity. For the distance-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 distance-based locational and neighborhood variables. For the quantity-based HPM, GWR, FCNN, and FCNN–GWR experiments, the explanatory variables are the structural variables and 134 quantity-based locational and neighborhood variables.
In all of the experiments, the data are divided into a 70% training set and a 30% test set. First, each model is fitted or trained on the training set; then, the model is executed on the test set to assess the fitting accuracy and predictive power for unseen samples. To enhance the reliability of the experiment, a shuffle-and-split cross-validation is carried out: the training set and test set are reshuffled 4 times, and the results are averaged so that they are not determined by a single, possibly unrepresentative split.
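The shuffle-and-split procedure can be sketched with the standard library alone. The function names, the fixed seed, and the `evaluate` callable (which stands for "train on these indices, test on those, return one accuracy value") are assumptions made for illustration.

```python
import random

def shuffle_split(n_samples, n_splits=4, test_fraction=0.3, seed=0):
    """Return n_splits independent (train_indices, test_indices) pairs."""
    rng = random.Random(seed)
    n_test = round(n_samples * test_fraction)
    splits = []
    for _ in range(n_splits):
        idx = list(range(n_samples))
        rng.shuffle(idx)                      # new random order for every split
        splits.append((idx[n_test:], idx[:n_test]))
    return splits

def averaged_score(n_samples, evaluate, **kwargs):
    """Average evaluate(train, test) over all shuffled splits."""
    scores = [evaluate(train, test)
              for train, test in shuffle_split(n_samples, **kwargs)]
    return sum(scores) / len(scores)

# Toy example: 10 samples, 4 reshuffled 70/30 splits.
splits = shuffle_split(10)
```

Each split draws a fresh permutation, so a sample may appear in the test set of several splits; this differs from k-fold cross-validation, where the test folds are disjoint.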
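Several accuracy assessment indicators are calculated to appraise the performance of the above models, referring to existing studies [13,31] and commonly adopted indicators, including the Pearson’s correlation coefficient (Pearson $R$), the adjusted coefficient of determination (adj $R^2$), the root mean square error (RMSE) and its percentage (%RMSE), and the mean absolute error (MAE) and its percentage (%MAE):

$$\text{Pearson } R = \frac{\sum_{i=1}^{n} (r_{i;o} - \bar{r}_o)(r_{i;s} - \bar{r}_s)}{\sqrt{\sum_{i=1}^{n} (r_{i;o} - \bar{r}_o)^2 \sum_{i=1}^{n} (r_{i;s} - \bar{r}_s)^2}}$$

$$\text{adj } R^2 = 1 - \frac{(1 - R^2)(n - 1)}{n - p - 1}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (r_{i;s} - r_{i;o})^2}, \qquad \%\text{RMSE} = \frac{\text{RMSE}}{\bar{r}_o} \times 100\%$$

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| r_{i;s} - r_{i;o} \right|, \qquad \%\text{MAE} = \frac{\text{MAE}}{\bar{r}_o} \times 100\%$$

where $r_{i;o}$ and $r_{i;s}$ are the observed and simulated rental prices for the $i$th house (unit: RMB/m²/month), $\bar{r}_o$ and $\bar{r}_s$ are their respective means, $R^2$ is the coefficient of determination, $p$ is the number of explanatory variables, and $n$ is the number of rental houses in each dataset.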
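The indicators can be computed together in one pass, as in the sketch below. It assumes the standard textbook definitions: the percentage versions are taken relative to the mean observed price, and the adjusted R² uses the number of explanatory variables `n_predictors`. The function name and dictionary keys are hypothetical, and the sample numbers are made up.

```python
import math

def accuracy_metrics(observed, simulated, n_predictors):
    """Pearson R, adjusted R2, RMSE, %RMSE, MAE, and %MAE for one model."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_s = sum((s - mean_s) ** 2 for s in simulated)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    r2 = 1.0 - ss_res / ss_o
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - s) for o, s in zip(observed, simulated)) / n
    return {
        "Pearson R": cov / math.sqrt(ss_o * ss_s),
        "R2": r2,
        "adj R2": 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1),
        "RMSE": rmse,
        "%RMSE": 100.0 * rmse / mean_o,   # relative to mean observed price
        "MAE": mae,
        "%MAE": 100.0 * mae / mean_o,
    }

# Illustrative prices in RMB/m2/month (not real data).
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]
metrics = accuracy_metrics(observed, simulated, n_predictors=1)
```

Reporting both the absolute and the percentage errors makes results comparable across cities whose rent levels differ.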