Article

Multi-Type Features Embedded Deep Learning Framework for Residential Building Prediction

1 Hunan Key Laboratory for Service Computing and Novel Software Technology, Hunan University of Science and Technology, Xiangtan 411201, China
2 School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2023, 12(9), 356; https://doi.org/10.3390/ijgi12090356
Submission received: 11 May 2023 / Revised: 27 July 2023 / Accepted: 29 August 2023 / Published: 31 August 2023

Abstract

Building type prediction is a critical task for urban planning and population estimation. The growing availability of multi-source data provides rich semantic information for building type prediction. However, existing residential building prediction methods have problems with feature extraction and fusion from multi-type data and with multi-level interactions between features. To overcome these limitations, we propose a deep learning approach that takes both the internal and external characteristics of buildings into consideration for residential building prediction. The internal features are the shape characteristics of buildings, and the external features include location features and semantic features. The location features include the proximity of the buildings to the nearest road and areas of interest (AOI), and the semantic features are mainly threefold: spatial co-location patterns of points of interest (POI), nighttime light, and land use information of the buildings. A deep learning model, DeepFM, with multi-type features embedded, was deployed to train and predict building types. Comparative and ablation experiments using OpenStreetMap and the nighttime light dataset were carried out. The results showed that our model had significantly higher classification performance compared with other models, with an F1 score of 0.9444, and testified that the external semantic features of buildings significantly enhanced the prediction performance. Moreover, our model showed good performance in transfer learning between different regions. This research not only significantly enhances the accuracy of residential building identification but also offers valuable insights and ideas for related studies.

1. Introduction

Data on building types are essential for urban planning, land use planning, and resource allocation [1]. Apart from traditional field surveying and mapping, there are several methods to obtain building information, including high-resolution satellite imagery [2], OpenStreetMap (OSM), and Google Maps. However, building data suffer from incompleteness and uneven distribution, which is especially obvious in some developing countries, where semantic information such as building types is particularly lacking [3]. Yet semantic information is crucial in applications such as population density and distribution estimation, since these applications correlate directly with where residents live. Therefore, accurately classifying residential buildings holds immense practical significance [4].
Currently, research on the prediction of building types can be categorized into two approaches: image-based and vector-based. Image-based building extraction involves extracting building outlines and semantic content (such as commercial or residential buildings) from remote sensing images through computer vision and artificial intelligence technology [5]. Although convolutional neural networks are used to extract buildings from images and are good at extracting shape and texture features [6], they can hardly extract the socioeconomic attributes of buildings. To solve this problem, researchers have attempted to extract semantic features by integrating social media data with high-resolution images [4,7,8,9]. However, these methods rely on specific social media data and suffer from semantic quality issues. Furthermore, the prediction of building types is closely related to the resolution of remote sensing images, and the difficulty of acquiring high-resolution remote sensing image data limits the applicability of this approach, particularly in developing countries [10]. Therefore, building type prediction from high-resolution images remains a challenging task [11,12].
Building type prediction based on vector data (such as points, lines, and polygons) has received less attention than research based on remote sensing images [4]. With the continuous development of science and technology, volunteered geographic information (VGI) has become an important source of geospatial data [13]. OSM, relying on volunteer contributions and edits, is the most representative VGI project and has become one of the main sources for building type classification [14]. Vector polygons are used in OSM to describe the geometry of buildings, i.e., building footprints. However, due to the non-professional nature of the contributors, it is difficult for them to accurately describe building types with OSM tags, which leads to semantic quality issues [15]. To ensure and improve data quality, it is therefore of great significance to automatically determine building types based not only on information about the geographic objects themselves but also on the characteristics of their surrounding features [16].
In recent years, some scholars have attempted to study building types using internal features of buildings such as building shape [17,18]. However, the information that can be extracted from the intrinsic characteristics of buildings is limited, and it is difficult to achieve good prediction results from these features alone. With the development of the Internet of Things, an increasing amount and variety of data can be acquired very conveniently. To improve the precision of building prediction, it is becoming more important and feasible to fuse multiple data sources and extract the external characteristics of buildings [19,20,21].
According to Tobler’s First Law of Geography [22], everything is related to everything else, but near things are more related to each other. In recent years, scholars explored the use of external feature data such as points of interest (POI) for building function recognition and related research [23,24]. For example, Lin et al. [25] improved the prediction accuracy of urban building functions from remote sensing images by using the importance and density of POIs. Gao et al. [26] extracted urban functional regions from POIs and human activities on location-based social networks, and the model achieved good performance. Additionally, Xu et al. [27] effectively predicted residential building types by modeling urban nighttime population distribution based on Google Earth and land use information. The above research demonstrates the positive effect of improving the performance of building type prediction when taking the surrounding features of buildings into consideration.
In current research on residential building identification, limited consideration is given to the types of features associated with building types. For instance, Sturrock et al. [28] solely employed simple features, such as area and number of sides, to represent the shape of buildings, and they also neglected semantic features. As a result, the recognition accuracy was relatively low. In addition to shape features, Atwal et al. [3] also incorporated semantic features, which were represented using label values directly associated with building types. When the dataset contains a large proportion of residential buildings, high recognition accuracy can be achieved with simple training. However, this method has inherent limitations: in areas where residential buildings account for only a small proportion, the accuracy of residential building identification is expected to decline. Furthermore, it is worth highlighting that the current literature predominantly relies on traditional machine learning models for handling multi-source features, without exploring the potential of deep learning models. This approach may have limitations when dealing with datasets containing diverse feature types, as the expressive capacity of traditional machine learning models is limited.
In contrast, deep learning models offer more powerful capabilities for feature learning and expression. They can automatically learn higher-level and more abstract representations from data, enabling them to capture complex relationships and associations in multi-source features. By leveraging deep learning models, it becomes possible to better exploit the potential associations and nonlinear relationships among features, which can lead to improved accuracy in residential identification. In this paper, we predict residential buildings using the DeepFM deep learning model. Compared with other machine learning models, DeepFM can mine richer feature information and achieve higher recognition accuracy through the interaction of low-order and high-order features.
From the research presented above, it is obvious that most research on building type prediction often ignores important external features related to building types, which results in insufficient model prediction ability. Therefore, accurately predicting building types based on architectural exterior features such as location and semantics is of significant importance. In light of this, we propose a deep learning prediction model for residential buildings, which integrates three types of features, namely, shape, location, and semantics. This model takes both internal and external building features into account and uses the deep learning method DeepFM to achieve interaction between low-order and high-order features. Therefore, it achieves good prediction performance.
The main contributions of this study are outlined as follows:
  • We model POI characteristics using POI spatial co-location patterns. Different POI spatial combination patterns were mined by constructing POI spatial co-location patterns. Indicators reflecting POI combination relationships were designed and calculated to model POI features. By combining POI spatial co-location patterns around buildings with other semantic features such as nighttime light and land use, the prediction precision of our model improved. Experiments showed that semantic features, particularly POI spatial co-location patterns, made a greater contribution to the accuracy of prediction;
  • We propose representing the location characteristics of the building using the distance between the building and the nearest road and areas of interest (AOI). The feature integrated the distances of related residential roads and three types of AOIs surrounding the building to represent the characteristics of residential and non-residential buildings. Experimental results validated the effectiveness of these features in enhancing the accuracy of prediction;
  • We propose the integration of internal and external features from three types of features, i.e., shape, location, and semantics, to characterize residential buildings from a multi-type features perspective. Shape features included 13 distinct shape characteristics of buildings. To enhance the interaction between low-order and high-order features, we introduced a deep learning method named DeepFM for building prediction;
  • We conduct an evaluation of the proposed deep learning model that integrates multi-type features through experiments using the OpenStreetMap and nighttime light remote sensing image datasets. The results showed a significant improvement in the prediction performance for residential building types compared with existing methods, confirming the effectiveness of our proposed model.

2. Study Area and Datasets

The primary research area of this study is London, U.K. (Figure 1). London is the largest city in Europe and the capital of the U.K., covering an area of about 65 km². As of May 2022, the population of London was estimated at about 9.43 million. Furthermore, the quality of OSM data in London is high [29], making it a representative sample in Europe. For the transfer learning experiment in this study, sample data from Beijing, China, and Portland, OR, U.S.A., which have distinct geographical characteristics, were selected as the test datasets.
In the experiments presented in this study, two datasets were used: the OSM dataset and the NPP-VIIRS Monthly DNB Composite nighttime light remote sensing image dataset. OSM is one of the most successful and widely used VGI projects, with free and open-source data available under an open license agreement. This study collected five different types of objects from OSM: 381,641 buildings with type labels, 362,573 roads, 71,359 POIs, 42,279 AOIs, and land use polygons. These data included both raster and vector formats. Specifically, the nighttime light data were in raster format, while other data such as buildings and road networks were in vector format. The vector data consisted of three types: road network data were represented as lines, POI data as points, and buildings as polygons. The OSM dataset used in this study was downloaded from the official OSM website (http://www.openstreetmap.org, accessed on 3 May 2022). The nighttime light data used in this study were obtained from the NPP-VIIRS dataset on 7 January 2022. The data have a spatial resolution of 500 m and were sourced from NOAA.

3. Methodology

The framework for residential building prediction was composed of four steps, namely, data acquisition, data preprocessing and feature extraction, model training, and building type prediction, which are shown in Figure 2.
In the first step, we obtained the data from OSM and NPP-VIIRS according to our experiment regions. The subsequent step involved data preprocessing and feature extraction. During this phase, we used ArcGIS 10.7 to facilitate data processing and matching between buildings and the other datasets. Specifically, for the semantic features, we performed essential preprocessing on the nighttime light data, including resampling, cropping, and outlier removal. Once the nighttime light data were appropriately preprocessed, we classified the nighttime light according to light intensity levels. This classification allowed us to categorize the light data into different brightness levels, which were then matched with the corresponding buildings. Furthermore, we extracted POIs located within a specified distance from buildings; these POIs were then classified based on predetermined classification criteria. Then, for the location features, we measured the distances from every building to the nearest road and AOI by leveraging the proximity analysis tool. After matching buildings with their surrounding POIs, road network, and AOIs with the corresponding algorithms, the spatial distribution characteristics of POIs and the distances to the nearest road and AOI were extracted. We then collected 13 shape features commonly used in such analyses. After the three types of features were collected, the eigenvalues were computed using the corresponding formulas. After removing outliers, the data were normalized. Finally, all features were fused to form a multi-source eigenmatrix, and the deep learning model DeepFM was applied for training and prediction of residential building types.
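As an illustration of this final preprocessing step, the sketch below shows how outlier-free feature tables can be min-max normalized and fused into one multi-source eigenmatrix. It is an assumed implementation with illustrative column names and values, not the study's actual feature tables.

```python
# A minimal sketch (assumed implementation) of fusing the three feature groups:
# min-max normalize each group and concatenate them into one eigenmatrix.
import pandas as pd

shape_feats = pd.DataFrame({"area": [85.0, 420.0, 1300.0], "compacity": [0.71, 0.55, 0.32]})
location_feats = pd.DataFrame({"dist_residential_road": [12.0, 140.0, 35.0]})
semantic_feats = pd.DataFrame({"light_level": [4.0, 2.0, 6.0], "SR_com_edu_pub": [2.1, 0.0, 1.4]})

def min_max(df: pd.DataFrame) -> pd.DataFrame:
    # Scale every column to [0, 1] after outliers have already been removed.
    return (df - df.min()) / (df.max() - df.min())

# Multi-source eigenmatrix fed to the DeepFM model.
features = pd.concat([min_max(shape_feats), min_max(location_feats), min_max(semantic_feats)], axis=1)
print(features)
```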

3.1. Shape Features and Location Features

3.1.1. Shape Features

Buildings with the same function tend to have similar architectural shapes, while those with different functions will have different shapes. For instance, apartments and industrial buildings show notable differences in shape characteristics. Many researchers have proposed to predict building types using geometric features of buildings [18,30,31]. In this study, 13 common classic shape indicators [32,33] were selected, including perimeter, area, rectangularity, mean length, maximum length, minimum length, outlier, compacity, convexity, elongation, right angle, orientation, and granularity. Detailed illustrations and equations of some shape indicators are shown in Figure 3.
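For illustration, the sketch below computes several of these indicators with shapely. The indicator formulations follow common definitions in the building morphology literature and may differ in detail from the exact equations shown in Figure 3.

```python
# A hedged sketch of several of the 13 shape indicators, computed with shapely.
import math
from shapely.geometry import Polygon

def shape_indicators(poly: Polygon) -> dict:
    mrr = poly.minimum_rotated_rectangle            # minimum rotated bounding rectangle
    xs, ys = mrr.exterior.coords.xy
    sides = [math.dist((xs[i], ys[i]), (xs[i + 1], ys[i + 1])) for i in range(4)]
    long_side, short_side = max(sides), min(sides)
    return {
        "area": poly.area,
        "perimeter": poly.length,
        "rectangularity": poly.area / mrr.area,                   # fit to bounding rectangle
        "convexity": poly.area / poly.convex_hull.area,           # fit to convex hull
        "elongation": short_side / long_side,                     # 1 = square-like footprint
        "compacity": 4 * math.pi * poly.area / poly.length ** 2,  # 1 = circle-like footprint
    }

# Example: an L-shaped footprint (coordinates in metres).
footprint = Polygon([(0, 0), (20, 0), (20, 10), (10, 10), (10, 30), (0, 30)])
print(shape_indicators(footprint))
```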

3.1.2. Location Features

In this study, the distances of buildings to the nearest road and the nearest AOI were used as location features. An AOI, as a type of spatial data describing geographic entities [34], refers to an area within the urban environment that attracts public attention. Furthermore, AOIs are of great significance in multiple application fields [35] and are often used in the identification of urban functional areas, as they reveal areas that are highly exposed to the public [36]. Public facilities, such as parks, schools, and sports fields, are examples of AOI types. They are indispensable in people's daily life because they are in high public demand and are used frequently [37]. Therefore, this study selected these three AOI types and calculated the distance from each building to the nearest AOI site as one of the location features.
The distance to the nearest road was considered another important location feature for building prediction, and it is often used as a significant predictor of building types [28,38]. Maximum speed limits and spatial distributions vary widely depending on the type of road; therefore, road types may affect building type prediction. We conducted comparative experiments on different road types from the OSM road network and eventually selected the road types that have the most significant impact on building prediction.
The experiments mainly considered three road types related to residential buildings with different speed limits and distributions: 'footway', 'primary', and 'residential'. Comparing these three road types, as shown in Figure 4, 'primary' and 'footway' had little difference in their impact on prediction, while 'residential' had a greater effect, with higher accuracy, recall, and F1-score than the other two. After investigation, we found that 'residential' roads were primarily distributed within, and closest to, residential building areas, followed by 'footway' and finally 'primary'. Therefore, we chose the distance to the nearest 'residential' road as one of the location features.
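As an illustration of how these two location features can be derived, the sketch below uses GeoPandas nearest joins; the paper itself used the ArcGIS proximity analysis tool, and the file and column names here are assumptions.

```python
# A hedged sketch of the distance-to-nearest-road and distance-to-nearest-AOI features.
import geopandas as gpd

buildings = gpd.read_file("london_buildings.gpkg").to_crs(epsg=27700)  # metric CRS (British National Grid)
roads = gpd.read_file("london_roads.gpkg").to_crs(epsg=27700)
aois = gpd.read_file("london_aois.gpkg").to_crs(epsg=27700)

# Keep only the road type found most informative in the experiments.
residential_roads = roads[roads["highway"] == "residential"]

# Distance from each building footprint to the nearest residential road and AOI.
buildings = gpd.sjoin_nearest(buildings, residential_roads[["geometry"]],
                              distance_col="dist_residential_road").drop(columns="index_right")
buildings = gpd.sjoin_nearest(buildings, aois[["geometry"]],
                              distance_col="dist_nearest_aoi").drop(columns="index_right")
```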

3.2. Semantic Features

3.2.1. Land Use

In this study, the semantic features used for prediction included POI spatial co-location patterns, nighttime light, and land use. The land use information was obtained from the OSM dataset and describes the land at the building location. This information covers the natural and social attributes of the land, such as 'farmyard', 'forest', and 'commercial'. Previous studies demonstrated that land use information can be used to detect residential buildings [39] and predict building types [3]. Therefore, this study selected land use as one of the semantic features. First, to ensure data quality, we removed land use categories with few instances from the land use dataset; six categories were retained: 'retail', 'residential', 'industrial', 'forest', 'farmyard', and 'commercial'. Then, after joining the land use information to the building locations using ArcGIS, six land use indicators were obtained as semantic features for building type prediction.
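For illustration, the assumed encoding below turns the six retained land use categories into per-building indicator features; the spatial join between buildings and land use polygons was performed in ArcGIS, and the values shown are illustrative.

```python
# A minimal sketch (assumed encoding) of the six land use indicators per building.
import pandas as pd

categories = ["retail", "residential", "industrial", "forest", "farmyard", "commercial"]
land_use = pd.DataFrame({"building_id": [101, 102, 103],
                         "landuse": ["residential", "commercial", "farmyard"]})  # illustrative join result

indicators = pd.get_dummies(land_use["landuse"]).reindex(columns=categories, fill_value=0)
indicators.index = land_use["building_id"]
print(indicators)
```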

3.2.2. Nighttime Light

Nighttime light data provide an accurate reflection of the brightness of a city at night and can also serve as a proxy for human activities. Research shows that nighttime light data are correlated with population distribution [40,41,42]. Some researchers have utilized nighttime light data to distinguish between residential and non-residential areas and to estimate residential vacancy rates [43]. Therefore, this study hypothesized that there may be a correlation between nighttime light and residential building distribution, and used it as an important supplementary data source for prediction. In this study, the standard deviation classification method was employed to classify the light brightness into six levels. The corresponding light brightness values were assigned to buildings by spatial mapping.
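A possible implementation of this classification step is sketched below; the placement of the class breaks is an assumption, and the radiance values are illustrative.

```python
# A minimal sketch (assumed break placement) of standard-deviation classification
# of nighttime light brightness into six levels.
import numpy as np

def classify_by_std(values: np.ndarray, n_classes: int = 6) -> np.ndarray:
    mean, std = values.mean(), values.std()
    # Class breaks at mean + k*std, k = -2..2, giving six brightness levels.
    offsets = np.arange(-(n_classes // 2) + 1, n_classes // 2)
    breaks = mean + offsets * std
    return np.digitize(values, breaks) + 1          # levels 1..n_classes

radiance = np.array([0.3, 1.2, 5.6, 12.4, 33.0, 57.8])  # illustrative pixel values
print(classify_by_std(radiance))
```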

3.2.3. POI Spatial Co-Location Patterns

POI can partially reflect surrounding geographic objects of buildings [44]. However, most studies only consider the statistical information of POIs, neglecting the spatial distribution of POIs, which leads to insufficient mining of feature information [45]. To address this issue, Chen et al. [46] proposed that the combination pattern of POIs within a certain range can more accurately reflect the distribution of surrounding buildings.
This study aims to infer building types based on POI features. Based on the above analysis, the combined pattern of POIs can better distinguish residential buildings from non-residential buildings. According to research [47], spatial co-location patterns refer to multi-class feature collections that are frequently located nearby and are used to discover subsets of features with clear spatial associations. Therefore, POI spatial co-location patterns can be used to model the combination pattern of POIs. After the calculation of the participation rate [46] of POI in the spatial co-location patterns, the spatial distribution of POIs can be obtained.
The computation methods of co-location patterns and the participation rate are shown in Definitions 1 and 2. The POI spatial distribution indicator was calculated based on the spatial co-location patterns and participation rate, as shown in Definition 3.
 Definition 1.
Given three types of POIs, I, J, and K, the spatial context relationship context is defined based on the Euclidean distance, and the spatial co-location pattern F is constructed as follows:
F_{\{I,J,K\}} = \bigcup_{i,j,k=1}^{n} \text{context}\Big( \max\big( \text{dist}(p_i, p_j),\ \text{dist}(p_i, p_k),\ \text{dist}(p_j, p_k) \big) \le E \Big)  (1)
where n is the total number of POIs of the three types, $p_i \in I$, $p_j \in J$, and $p_k \in K$; $\text{dist}(p_i, p_j)$ represents the Euclidean distance between $p_i$ and $p_j$; context refers to the spatial distribution relationship of the three POI types I, J, and K that meet the Euclidean distance condition; E is the pre-defined proximity threshold; and $F_{\{I,J,K\}}$ is the spatial co-location pattern.
 Definition 2.
Given three types of POIs, I, J, and K, and a spatial co-location pattern $F_{\{I,J,K\}}$, the participation rate $R(F_{\{I,J,K\}}, I)$ of type I in the spatial co-location pattern $F_{\{I,J,K\}}$ is defined as follows:
R(F_{\{I,J,K\}}, I) = \frac{Y(F_{\{I,J,K\}}, I)}{Y(I)}  (2)
where $Y(F_{\{I,J,K\}}, I)$ denotes the number of distinct objects of type I contained in the spatial co-location pattern $F_{\{I,J,K\}}$, and $Y(I)$ is the total number of objects of type I.
 Definition 3.
The sum of the participation rates of types I, J, and K in the co-location pattern $F_{\{I,J,K\}}$ is denoted as $SR_{\{I,J,K\}}$, which serves as the characteristic value of the POI combination {I, J, K}, reflecting the variance of different POI combinations. $SR_{\{I,J,K\}}$ is defined as follows:
SR_{\{I,J,K\}} = R(F_{\{I,J,K\}}, I) + R(F_{\{I,J,K\}}, J) + R(F_{\{I,J,K\}}, K)  (3)
For example, within a certain range around a building, there are four types of POIs, A, B, C, and D, distributed as shown in Figure 5. According to Definition 1, there are four spatial co-location patterns: {A, B, C}, {A, B, D}, {A, C, D}, and {B, C, D}. However, the instances {A_2, B_2, C_4} and {A_1, C_4, D_3} do not meet the conditions because the Euclidean distance between C_4 and the other POIs is greater than the proximity threshold E.
According to Definition 2, the following participation rates are obtained: R({A, B, C}, A) = 2/3, R({A, B, C}, B) = 3/4, R({A, B, C}, C) = 3/4, R({A, B, D}, A) = 2/3, R({A, B, D}, B) = 3/4, R({A, B, D}, D) = 1, R({A, C, D}, A) = 1/3, R({A, C, D}, C) = 1/2, R({A, C, D}, D) = 1/4, R({B, C, D}, B) = 1/2, R({B, C, D}, C) = 1/2, and R({B, C, D}, D) = 1/4. According to Definition 3, SR({A, B, C}) = 2.17, SR({A, B, D}) = 2.42, SR({A, C, D}) = 1.08, and SR({B, C, D}) = 1.25. In this study, we discovered spatial co-location patterns and their participation rates according to Definitions 1 and 2, and extracted POI spatial distribution information according to Definition 3.
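The sketch below illustrates Definitions 1–3 in code: it enumerates three-type co-location instances whose pairwise distances are all within the proximity threshold E and computes the participation rates. The coordinates are illustrative, not the configuration shown in Figure 5.

```python
# A hedged sketch of POI spatial co-location patterns and participation rates.
from itertools import combinations, product
from math import dist

E = 70.0  # proximity threshold in metres (Section 3.2.4 reports 70 m as the best value)

pois = {  # POI type -> list of (x, y) coordinates within one analysis unit (hypothetical)
    "A": [(0, 0), (30, 40), (150, 150)],
    "B": [(10, 5), (35, 45), (60, 60), (90, 10)],
    "C": [(5, 10), (40, 50), (70, 65), (200, 200)],
}

def colocation_instances(types, pois, threshold):
    """Instances (one POI per type) whose pairwise distances are all <= threshold (Definition 1)."""
    instances = []
    for combo in product(*(list(enumerate(pois[t])) for t in types)):
        points = [p for _, p in combo]
        if max(dist(a, b) for a, b in combinations(points, 2)) <= threshold:
            instances.append(tuple(idx for idx, _ in combo))
    return instances

def participation_rate(instances, types, pois, t):
    """Share of distinct type-t objects appearing in at least one instance (Definition 2)."""
    pos = types.index(t)
    return len({inst[pos] for inst in instances}) / len(pois[t])

types = ("A", "B", "C")
instances = colocation_instances(types, pois, E)
SR = sum(participation_rate(instances, types, pois, t) for t in types)  # Definition 3
print(round(SR, 2))
```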

3.2.4. POI Spatial Feature Information Mining

This study used the analysis unit, i.e., the area within a certain range around a building, as the research unit. The size threshold of the analysis unit is measured from the building boundary rather than from the building centroid. A proximity threshold was used to determine whether POIs within the analysis unit had a spatial relationship; POIs were only grouped if they met the proximity criterion. After conducting multiple experiments, the best results were achieved with a combination of a 200 m threshold for the analysis unit and a 70 m proximity threshold for adjacent POIs.
After referring to the Chinese Standard of Land Use Classification (GB 50137-2011) and taking into account both the semantic information and the social function of POIs, the POIs were classified into six categories, namely, transportation facilities, industrial facilities, commercial services, green space and square, science and education culture, and public services. The specific classification rules are shown in Table 1. Then, the POI co-location patterns were constructed, and the participation rates were calculated to obtain the POI spatial distribution indicators based on the algorithm described above. These indicators reflected the spatial distribution characteristics of POIs around the building; these characteristics were related to the building types and were helpful in distinguishing between different building types. Table 2 presents the POI spatial distribution indicators of some analysis units.

3.3. DeepFM Model

Some traditional machine learning models, such as random forest (RF) [48], decision tree [49], and logistic regression (LR) [50], have been employed in building type prediction. However, these models have limited capability to extract complex features, particularly for learning high-order features carrying nonlinear information, which restricts their predictive performance. This limitation usually necessitates manual feature engineering to assist in extracting effective feature combinations. In recent years, deep learning research has made significant progress in mining high-order feature combination information, which has become a main research focus. The DeepFM [51] model is among the currently popular deep learning models; it mines the deep information of features via feature interaction and omits the manual feature engineering process. Consequently, this study introduces the DeepFM model to automatically learn interactions between low-order and high-order features from multi-type features, with the aim of improving the precision of building type prediction. Figure 6 illustrates the structure of the DeepFM model.
DeepFM consists of an FM [52] component and a DNN [53] component, which carry out low-order and high-order feature interactions, respectively. FM mainly implements second-order feature interactions by calculating the inner product of the latent vectors $V_i$ and $V_j$ of features i and j. Its calculation is shown in Equation (4):
y_{FM} = w_0 + \sum_{i=1}^{n} W_i X_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle V_i, V_j \rangle X_i X_j  (4)
In the equation, n represents the number of features of the samples; $w_0 \in \mathbb{R}$, $W \in \mathbb{R}^n$, and $V \in \mathbb{R}^{n \times k}$; $V_i$ and $V_j$ represent the latent vectors of the i-th and j-th features; and $X_i$ and $X_j$ are the values of the i-th and j-th features of the samples, respectively.
DNN is a forward neural network with multiple hidden layers, used to learn high-level interactive information between features. The training process can be described by Equations (5)–(7):
L_0 = [e_1, e_2, \ldots, e_m]  (5)
L_{k+1} = \sigma(W_k L_k + b_k)  (6)
y_{DNN} = \sigma(W_H L_H + b_H)  (7)
where $L_0$ is the input of the DNN, m is the total number of features, k represents the layer index of the neural network, $e_i$ is the i-th feature vector, $L_k$ is the output of the k-th layer and the input of the (k+1)-th layer, and $W_k$ and $b_k$ are the weights and biases of the k-th layer. H is the number of hidden layers of the forward neural network. The FM and DNN components share the same input and are trained simultaneously, and the final result is as follows:
\hat{y} = \mathrm{Sigmoid}(y_{FM} + y_{DNN})  (8)
Equation (8) represents the final output of the DeepFM model, where $y_{FM}$ is the output of the FM part and $y_{DNN}$ is the output of the deep neural network part. The sigmoid function is used as the activation function.
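For concreteness, a condensed PyTorch sketch of this forward pass is given below: the FM part implements Equation (4), the DNN part Equations (5)–(7), and the two outputs are combined through a sigmoid as in Equation (8). The layer sizes and the field-wise input handling are assumptions rather than the exact configuration used in the experiments.

```python
# A hedged sketch of the DeepFM forward pass (not the exact experimental configuration).
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    def __init__(self, n_fields: int, n_features: int, k: int = 8, hidden: int = 64):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))    # global bias w_0
        self.w = nn.Embedding(n_features, 1)      # first-order weights W_i
        self.v = nn.Embedding(n_features, k)      # latent vectors V_i shared by FM and DNN
        self.dnn = nn.Sequential(
            nn.Linear(n_fields * k, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat_idx: torch.LongTensor, feat_val: torch.Tensor) -> torch.Tensor:
        # feat_idx, feat_val: (batch, n_fields) feature indices and values X_i
        emb = self.v(feat_idx) * feat_val.unsqueeze(-1)                 # (batch, fields, k)
        # FM part, Equation (4): w_0 + sum_i W_i X_i + sum_{i<j} <V_i, V_j> X_i X_j
        first_order = (self.w(feat_idx).squeeze(-1) * feat_val).sum(1)
        second_order = 0.5 * (emb.sum(1).pow(2) - emb.pow(2).sum(1)).sum(1)
        y_fm = self.w0 + first_order + second_order
        # DNN part, Equations (5)-(7): shared embeddings flattened into an MLP
        y_dnn = self.dnn(emb.flatten(1)).squeeze(-1)
        return torch.sigmoid(y_fm + y_dnn)                              # Equation (8)
```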

4. Experiments and Results

4.1. Performance Comparison Experiment

To evaluate the effectiveness of the features and models proposed in this paper, we followed the steps outlined below. First, we replicated the existing residential prediction method described in the literature [28]. Next, we incorporated the features proposed in this paper into that model and trained it using the London dataset. This allowed us to compare the features and models of our paper with those in [28]. To facilitate the comparison with the baseline method, we defined buildings labeled as 'detached', 'house', 'hut', 'residential', and 'teacher housing' as residential buildings and the others as non-residential buildings, following the method proposed in [28]. Furthermore, we compared some typical classification models such as LR [50], RF [48], K-nearest neighbor (KNN) [54], factorization machine (FM) [52], deep neural network (DNN) [53], and wide and deep [55]. The experimental results are shown in Table 3.
According to the comparison results shown in the second and third rows of Table 3, it can be observed that the combination of the features proposed in this study with the model of [28] led to significant improvements in indicators such as accuracy and recall compared with the features used in [28]. This demonstrated that the proposed features had better predictive performance. Furthermore, the experimental results in the first and third rows of Table 3 indicate that, when the features are the same, the prediction performance of the model proposed in this study is significantly higher than that of the model in [28]. The difference in features is one of the main factors affecting the results. The features of [28] mainly focused on the size, shape, and proximity of adjacent structures. In contrast, the features proposed in this study included semantic features such as the POI spatial co-location patterns, nighttime light, and land use information, which considered the correlation between the environment of the building and the building type. Moreover, the POI spatial co-location patterns took into account the combination relationship between POIs around the building. Furthermore, in comparison with the features proposed in this paper, the shape features of [28] were simpler and fewer in number. In terms of location characteristics, the road type was not taken into consideration in [28]. However, the experimental results reported in this paper indicated that road type plays a crucial role in accurately identifying residential buildings. Therefore, the features proposed in this paper mined more effective information, making them more abundant, and enhancing the accuracy and predictive capability of the model. This was supported by the superiority of the features proposed in this paper over those in [28].
Another important factor affecting predictive performance was the prediction model. Although the integrated model used in [28] was better than a single machine learning model, its predictive ability was limited and its time consumption was large. As the number of features and the amount of data increased, the time overhead also increased exponentially. Among the other models compared, KNN, LR, and RF are all traditional machine learning models that have difficulty constructing feature interactions and tend to ignore the relationships between features. The FM model can build second-order feature interactions but cannot learn higher-order feature interactions. The DNN model can learn high-order feature interactions but cannot construct second-order feature interactions. The wide and deep model requires manual feature crossing, which increases the complexity of the model and reduces training efficiency. Therefore, in this study, we proposed using the DeepFM model. This model can achieve low-order and high-order feature interaction, fully integrate and utilize multi-source data features, and mine more effective information to improve the precision and efficiency of the prediction method.
Two examples of residential building prediction performance in urban and suburban London are shown in Figure 7. As shown in Figure 7a, in the central area of the city, some building types were not correctly predicted. The corresponding incorrectly predicted areas are shown on the map in Figure 7b. Although a large number of POIs are distributed around them and the surrounding buildings are densely distributed, these incorrectly predicted buildings have irregular and unusual shapes. Therefore, these irregular and unusual building shapes may be the main reason for the misprediction of building types in urban areas.
In addition, another prediction example relates to suburban buildings, as shown in Figure 7c. The incorrectly predicted buildings are shown on the map in Figure 7d. It can be seen that there are only a small number of buildings on the map and they are sparsely distributed; meanwhile, the number of POIs and roads is also quite small, resulting in difficulty in mining effective semantic and location feature information. Therefore, prediction in suburban areas is a more challenging task compared with urban areas.

4.2. Ablation Experiments

The method proposed in this study combined buildings’ shape, semantics, and location features. In order to demonstrate the influence of various features on building type prediction, this study conducted some ablation experiments on the London dataset. The results are presented in Table 4.
The baseline experiment used only shape features; its prediction F1-score was 0.9132. Compared with the features used in [28], these shape features played a more prominent role in prediction, indicating that they better capture the shape characteristics of residential buildings. The next two experiments added the two location features, i.e., the distance to the nearest road and to the nearest AOI. The experimental results showed that these features improved the prediction performance of the model to some extent. As discussed in the previous section, the road type selected through experimental analysis has a greater impact on prediction, and the distances to the nearest road and AOI, which reflect the location distribution of buildings, are also helpful for building prediction. Our experimental results confirmed this analysis. In the last three experiments, the three semantic features were added. The POI spatial co-location patterns were added first, and the experimental results showed that this feature could significantly improve the precision of prediction, which confirmed our expectations. The result indicated that the POI spatial co-location patterns could effectively mine more semantic information around a building. The second feature was land use. The experimental results indicated that, compared with the POI co-location patterns, land use information had less impact on the improvement of model performance. The third was the nighttime light feature, which confirmed our earlier hypothesis that there is indeed a correlation between nighttime light and residential building distribution. Therefore, from the last three experimental results, we could infer that among the semantic features, POI co-location patterns had the greatest impact on the performance of the model.
Based on the aforementioned experimental results, the following conclusions could be drawn: Firstly, each feature made a certain contribution to the prediction of residential buildings; however, the optimal performance was achieved when all the features were combined. Secondly, the performance of the model improved as more features were taken into consideration. Lastly, the proposed method showed significantly better performance than the other methods: the precision and F1-score reached 92.34% and 94.44%, respectively, when all features were integrated.

4.3. Transfer Learning Experiments

To assess the transfer learning capability of the model, this study used the London dataset for model training. Two test datasets from Beijing and Portland, with significant differences in geography and culture, were used for testing. We employed a pre-training-based transfer method, which offers strong feature extraction ability and good generalization. Moreover, to mitigate potential overfitting, we fine-tuned the model on samples from the different regions and incorporated weight decay into the optimizer to increase regularization. This approach reduces the risk of overfitting by penalizing the weights of the model during optimization, encouraging them to remain small. The experimental results are shown in Table 5.
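A minimal sketch of this fine-tuning step is given below, reusing the DeepFM sketch from Section 3.3; the file name, data loader, and hyperparameters are illustrative assumptions rather than the settings used in the experiments.

```python
# A hedged sketch of fine-tuning a London-pre-trained model on a target city
# with weight decay (L2 regularization) added to the optimizer.
import torch

model = DeepFM(n_fields=25, n_features=10_000)          # DeepFM class from the Section 3.3 sketch
model.load_state_dict(torch.load("deepfm_london.pt"))   # weights pre-trained on the London dataset (assumed file)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-4)  # weight decay penalizes large weights
criterion = torch.nn.BCELoss()

model.train()
for feat_idx, feat_val, label in target_city_loader:    # hypothetical DataLoader over Beijing/Portland samples
    optimizer.zero_grad()
    loss = criterion(model(feat_idx, feat_val), label.float())
    loss.backward()
    optimizer.step()
```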
Due to regional and cultural differences, the prediction results of transfer learning can be influenced by features such as building shapes and POI distribution. In the test datasets, it was observed that the precision and F1-score of the model declined, and the magnitude of the decline varied from city to city. Notably, the prediction effect of the model in Beijing was better, followed by Portland. This experiment highlights the promising transfer learning ability of the model across different regions.
In these three regions, there were noticeable differences in the distribution of POIs. The distribution of POIs in Beijing is relatively dense, with many ancient architectural attractions such as the Forbidden City and the Temple of Heaven. These attractions are interlaced with modern places such as government agencies, universities, and business districts. In Portland, OR, U.S.A., however, POIs are sparsely distributed, with many natural landscapes such as parks. As an international metropolis, London has numerous POIs such as commercial centers, museums, theaters, restaurants, and hotels, and the number and variety of these POIs far exceed those in other cities.
In addition to differences in POIs, building shapes also play an important role. Beijing and London have both traditional styles and modern architectural designs. It is noticeable that there are significant differences between Chinese and Western architectural styles. Chinese architecture usually features symmetrical structures such as courtyard houses, while many building shapes in European cities such as London are irregular. To explore the differences in building shapes, this paper normalized the shape features of the three regions and calculated the average value of each shape indicator. Figure 8 illustrates the average value of some shape indicators. It can be observed that there were significant differences in the building shape indicator values among these cities, indicating that the buildings in these cities have unique shape characteristics.
Moreover, building density is another crucial factor in urban environments. Generally, developed cities have higher building densities than less developed ones. Therefore, cities with different building densities also exhibit differences in the values of location features such as distance to the nearest road and AOI. Based on the analysis above, we can infer that the transfer learning capability of the model in different cities is closely related to these features, which is influenced by multiple factors including the region, culture, and degree of development of the city. As a result, the performance of the pre-training model will also vary in cities with different regions, cultures, and degrees of development. Therefore, to further enhance the transfer learning ability of this model, it is necessary to comprehensively consider the influence of building shape, POI distribution, and other factors.

5. Discussion

In order to enhance the accuracy of residential building recognition, this study introduces a deep learning approach that takes both the internal and external features of buildings into account. The internal features primarily focus on the shape characteristics of buildings, which play a crucial role in building recognition research. However, existing studies have mostly utilized simple shape metrics such as area, perimeter, and number of sides, without considering some more complex shape features. In contrast, our study incorporated 13 different shape features, allowing for a more comprehensive representation of the contour information of the buildings. The experimental results demonstrated the effectiveness of these shape features in accurately characterizing the shape of the buildings.
Regarding the external features, we considered both the location and semantic aspects of the building. In terms of location features, we considered the distance of the buildings to the nearest road and AOI separately. Unlike previous research, we took into account the type of road and conducted experimental analysis to select some road types that have a significant impact on recognition performance. Additionally, we analyzed the distance to the AOI and investigated the influence of different AOI types, which has not been explored in previous studies.
With the rapid growth of big data, researchers can obtain rich semantic information. Specifically, we incorporated nighttime light data as an initial exploration, which provided an intensity value representing the nighttime illumination around each building. The experimental results confirmed the effectiveness of this approach, although they were limited by the resolution of the nighttime light data. In future research, we plan to explore higher-resolution nighttime light data to further improve the accuracy of our approach. Moreover, we incorporated land use information to augment the semantic understanding of the buildings. In addition, we introduced a novel POI co-location pattern to explore the spatial distribution of POIs surrounding buildings. In contrast with previous studies that primarily focused on the frequency features of POIs, our spatial co-location model mined the spatial distribution information of POIs. This approach significantly enhances the accuracy and effectiveness of building recognition. Notably, our research embraces flexibility, allowing for experimentation with various POI processing methods based on different research scenarios. For instance, the fusion of POI frequency features with spatial distribution features could be explored to obtain a more comprehensive and enriched set of semantic information.
In previous studies, traditional machine learning models such as random forest and decision tree have commonly been employed for building recognition training. However, their effectiveness becomes constrained as the number of feature types grows, leading to limited exploration and utilization of the expanding feature space. In our experiments, we compared and evaluated the performance of various machine learning models, including both traditional and deep learning models. The results revealed that deep learning models showed notable advantages over traditional machine learning models in terms of recognition accuracy and overall performance. We selected DeepFM as our experimental model, which was originally developed and successfully applied in the realm of recommender systems. To our knowledge, this is the first application of DeepFM to the domain of building recognition. The rationale behind this selection lies in the inherent capacity of the model to effectively handle complex and diverse feature spaces. Moreover, we endeavored to enhance the DeepFM model by incorporating an attention mechanism, with the anticipation of further elevating its performance. Despite our optimistic expectations, the experimental outcomes did not show a significant improvement in recognition accuracy. In future research, we will continue to explore new improvement methods and seek better model structures and parameter combinations to further improve the accuracy and performance of residential recognition.
In our research, we leveraged a diverse range of data sources, with a significant portion originating from OSM. Most OSM data are contributed by volunteers, and these data do have some quality issues, such as inaccuracy and incompleteness. According to previous research [29], our experimental regions show good data quality. Moreover, we integrated multi-source data to mitigate these uncertainties. During the calculation of shape features, we identified and normalized anomalous calculation results caused by data quality issues. According to our analysis, there are two main ways to mitigate the influence of poor OSM data quality. Firstly, many methods enable a thorough evaluation and verification of data quality by comparing OSM data with authoritative data [56]; however, authoritative data are often difficult to obtain. Evaluation methods from an internal perspective may therefore be a better choice, such as statistical approaches [14] and a trustworthiness-based approach [16]. Secondly, obtaining data from multiple sources to reduce sole dependence on OSM data is another way to address this problem; machine learning methods can be used to fuse and extract information from these data.
In this study, we employed a deep learning model integrated with multi-type features to achieve residential building recognition. This approach not only significantly enhances the accuracy of residential building identification but also offers a new perspective for future research. However, there are some limitations that need to be addressed in the future. For instance, this research mainly focused on utilizing OSM data. Remote sensing image data are widely used for land use information extraction because of their rich information and wide coverage; therefore, image data could be fused into our model to predict residential buildings. Additionally, OSM data quality issues could be addressed through the development of an automatic or semi-automatic evaluation method based on OSM data trustworthiness. Lastly, the model could be improved by using different feature encoding methods and other deep learning methods. These efforts will contribute to the continuous development and advancement of building type identification.

6. Conclusions

Building type prediction is a crucial task for urban planning and population estimation, particularly in terms of residential types. However, current methods for predicting residential building types often suffer from low precision, mainly due to a lack of effective integration of external features. To overcome these limitations, we proposed a deep learning approach that takes both the internal and external characteristics of buildings into consideration to predict residential buildings. The internal features are shape features of buildings, which include 13 distinct shape characteristics. The external features include location features and semantic features. The location features are the proximity of the buildings to the nearest road and AOI, and the semantic features are mainly threefold: the spatial co-location patterns of POIs, nighttime light, and land use information of the buildings. Compared with existing methods, experiments showed that our proposed method achieved better performance with higher precision and F1-score.
We also designed feature ablation experiments to understand the different effects of various features on prediction. Additionally, we conducted transfer learning experiments to evaluate the performance of pre-trained models in cities with different regions and cultures. Finally, we analyzed the difference between urban and suburban prediction effects. Our future work aims to improve the features affected by regional and cultural differences to enhance the precision and recall of prediction.

Author Contributions

Conceptualization, Yijiang Zhao and Xiao Tang; methodology, Yijiang Zhao; software, Xiao Tang; validation, Zhuhua Liao; formal analysis, Yijiang Zhao; investigation, Yizhi Liu; resources, Xiao Tang; data curation, Yijiang Zhao; writing—original draft preparation, Xiao Tang; writing—review and editing, Yijiang Zhao; visualization, Yizhi Liu; supervision, Min Liu; project administration, Jian Lin; funding acquisition, Yijiang Zhao. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (41871320), the Key Scientific Research Foundation of Hunan Provincial Education Department of China (22A0341), and the Hunan Provincial Natural Science Foundation of China (2021JJ30276).

Data Availability Statement

The data are available on request.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Lloyd, C.T.; Sturrock, H.J.W.; Leasure, D.R.; Jochem, W.C.; Lázár, A.N.; Tatem, A.J. Using GIS and Machine Learning to Classify Residential Status of Urban Buildings in Low and Middle Income Settings. Remote Sens. 2020, 12, 3847. [Google Scholar] [CrossRef]
  2. Jin, X.Y.; Davis, C.H. Automated building extraction from high-resolution satellite imagery in urban areas using structural, contextual, and spectral information. EURASIP J. Appl. Signal Process. 2005, 14, 745309. [Google Scholar] [CrossRef]
  3. Atwal, K.S.; Anderson, T.; Pfoser, D.; Züfle, A. Predicting building types using OpenStreetMap. Sci. Rep. 2022, 12, 19976. [Google Scholar] [CrossRef]
  4. Kang, J.; Körner, M.; Wang, Y.; Taubenböck, H.; Zhu, X.X. Building instance classification using street view images. ISPRS J. Photogramm. Remote Sens. 2018, 145, 44–59. [Google Scholar] [CrossRef]
  5. Hu, Q.; Zhen, L.; Mao, Y.; Zhou, X.; Zhou, G. Automated building extraction using satellite remote sensing imagery. Autom. Constr. 2021, 123, 103509. [Google Scholar] [CrossRef]
  6. Shao, Z.; Tang, P.; Wang, Z.; Saleem, N.; Yam, S.; Sommai, C. BRRNet: A Fully Convolutional Neural Network for Automatic Building Extraction From High-Resolution Remote Sensing Images. Remote Sens. 2020, 12, 1050. [Google Scholar] [CrossRef]
  7. Hu, T.; Yang, J.; Li, X.; Gong, P. Mapping Urban Land Use by Using Landsat Images and Open Social Data. Remote Sens. 2016, 8, 151. [Google Scholar] [CrossRef]
  8. Liu, X.; He, J.; Yao, Y.; Zhang, J.; Liang, H.; Wang, H.; Hong, Y. Classifying urban land use by integrating remote sensing and social media data. Int. J. Geogr. Inf. Sci. 2017, 31, 1675–1696. [Google Scholar] [CrossRef]
  9. Zhao, W.; Bo, Y.; Chen, J.; Tiede, D.; Blaschke, T.; Emery, W.J. Exploring semantic elements for urban scene recognition: Deep integration of high-resolution imagery and OpenStreetMap (OSM). ISPRS J. Photogramm. Remote Sens. 2019, 151, 237–250. [Google Scholar] [CrossRef]
  10. Hu, X.; Noskov, A.; Fan, H.; Novack, T.; Li, H.; Gu, F.; Zipf, A. Tagging the main entrances of public buildings based on OpenStreetMap and binary imbalanced learning. Int. J. Geogr. Inf. Sci. 2021, 35, 1773–1801. [Google Scholar] [CrossRef]
  11. Bittner, K.; Adam, F.; Cui, S.; Körner, M.; Reinartz, P. Building footprint extraction from VHR remote sensing images combined with normalized DSMs using fused fully convolutional networks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 2615–2629. [Google Scholar] [CrossRef]
  12. Xie, Y.; Zhu, J.; Cao, Y.; Feng, D.; Hu, M.; Li, W.; Fu, L. Refined extraction of building outlines from high-resolution remote sensing imagery based on a multifeature convolutional neural network and morphological filtering. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1842–1855. [Google Scholar] [CrossRef]
  13. Zhao, Y.; Yang, W.; Liu, Y.; Liao, Z. Discovering transition patterns among OpenStreetMap feature classes based on the Louvain method. Trans. GIS 2022, 26, 236–258. [Google Scholar] [CrossRef]
  14. Fan, H.; Zipf, A.; Fu, Q.; Neis, P. Quality assessment for building footprints data on OpenStreetMap. Int. J. Geogr. Inf. Sci. 2014, 28, 700–719. [Google Scholar] [CrossRef]
  15. Zhao, Y.; Zhou, X.; Li, G.; Xing, H. A spatio-temporal VGI model considering trust-related information. ISPRS Int. J. Geo-Inf. 2016, 5, 10. [Google Scholar] [CrossRef]
  16. Zhao, Y.; Wei, X.; Liu, Y.; Liao, Z. A Reputation Model of OSM Contributor Based on Semantic Similarity of Ontology Concepts. Appl. Sci. 2022, 12, 11363. [Google Scholar] [CrossRef]
  17. Burghardt, D.; Steiniger, S. Usage of principal component analysis in the process of automated generalisation. In Proceedings of the 22nd International Cartographic Conference, Coruña, Spain, 9–16 July 2005; pp. 9–16. [Google Scholar]
  18. Wu, H.; Luo, W.; Lin, A.; Hao, F.; Olteanu-Raimond, A.; Liu, L.; Li, Y. SALT: A multifeature ensemble learning framework for mapping urban functional zones from VGI data and VHR images. Comput. Environ. Urban Syst. 2023, 100, 101921. [Google Scholar] [CrossRef]
  19. Cvetek, D.; Muštra, M.; Jelušić, N.; Tišljarić, L. A survey of methods and technologies for congestion estimation based on multisource data fusion. Appl. Sci. 2021, 11, 2306. [Google Scholar] [CrossRef]
  20. Gong, M.; Zhang, P.; Su, L.; Liu, J. Coupled dictionary learning for change detection from multisource data. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7077–7091. [Google Scholar] [CrossRef]
  21. Yu, L.; Wang, J.; Li, X.; Li, C.; Zhao, Y.; Gong, P. A multi-resolution global land cover dataset through multisource data aggregation. Sci. China Earth Sci. 2014, 57, 2317–2329. [Google Scholar] [CrossRef]
  22. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46 (Suppl. S1), 234–240. [Google Scholar] [CrossRef]
  23. Zhao, B.; He, X.; Liu, B.; Tang, J.; Deng, M.; Liu, H. Detecting Urban Commercial Districts by Fusing Points of Interest and Population Heat Data with Region-Growing Algorithms. ISPRS Int. J. Geo-Inf. 2023, 12, 96. [Google Scholar] [CrossRef]
  24. Qin, Q.; Xu, S.; Du, M.; Li, S. Identifying urban functional zones by capturing multi-spatial distribution patterns of points of interest. Int. J. Digit. Earth 2022, 15, 2468–2494. [Google Scholar] [CrossRef]
  25. Lin, A.; Sun, X.; Wu, H.; Luo, W.; Wang, D.; Zhong, D.; Wang, Z.; Zhao, L.; Zhu, J. Identifying Urban Building Function by Integrating Remote Sensing Imagery and POI Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 8864–8875. [Google Scholar] [CrossRef]
  26. Gao, S.; Janowicz, K.; Couclelis, H. Extracting urban functional regions from points of interest and human activities on location-based social networks. Trans. GIS 2017, 21, 446–467. [Google Scholar] [CrossRef]
  27. Xu, M.; Cao, C.; Jia, P. Mapping Fine-Scale Urban Spatial Population Distribution Based on High-Resolution Stereo Pair Images, Points of Interest, and Land Cover Data. Remote Sens. 2020, 12, 608. [Google Scholar] [CrossRef]
  28. Sturrock, H.; Woolheater, K.; Bennett, A.F.; Andrade-Pacheco, R.; Midekisa, A. Predicting residential structures from open source remotely enumerated data using machine learning. PLoS ONE 2018, 13, e0204399. [Google Scholar] [CrossRef] [PubMed]
  29. Haklay, M. How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703. [Google Scholar]
  30. Bandam, A.; Busari, E.; Syranidou, C.; Linssen, J.; Stolten, D. Classification of Building Types in Germany: A Data-Driven Modeling Approach. Data 2022, 7, 45. [Google Scholar] [CrossRef]
  31. Wurm, M.; Schmitt, A.; Taubenböck, H. Building types classification using shape-based features and linear discriminant functions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 9, 1901–1912. [Google Scholar] [CrossRef]
  32. Biljecki, F.; Chow, Y.S. Global building morphology indicators. Comput. Environ. Urban Syst. 2022, 95, 101809. [Google Scholar] [CrossRef]
  33. Maidaneh Abdi, I.; Le Guilcher, A.; Olteanu-Raimond, A. A regression model of spatial accuracy prediction for OpenStreetMap buildings. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 5, 39–47. [Google Scholar] [CrossRef]
  34. Li, X.; Hu, T.; Gong, P.; Du, S.; Chen, B.; Li, X.; Dai, Q. Mapping Essential Urban Land Use Categories in Beijing with a Fast Area of Interest (AOI)-Based Method. Remote Sens. 2021, 13, 477. [Google Scholar] [CrossRef]
  35. Hu, Y.; Gao, S.; Janowicz, K.; Yu, B.; Li, W.; Prasad, S. Extracting and understanding urban areas of interest using geotagged photos. Comput. Environ. Urban Syst. 2015, 54, 240–254. [Google Scholar] [CrossRef]
  36. Liu, Y.; Singleton, A.; Arribas-Bel, D.; Chen, M. Identifying and understanding road-constrained areas of interest (AOIs) through spatiotemporal taxi GPS data: A case study in New York City. Comput. Environ. Urban Syst. 2021, 86, 101592. [Google Scholar] [CrossRef]
  37. Chen, M.; Arribas-Bel, D.; Singleton, A. Understanding the dynamics of urban areas of interest through volunteered geographic information. J. Geogr. Syst. 2019, 21, 89–109. [Google Scholar] [CrossRef]
  38. Forget, Y.; Linard, C.; Gilbert, M. Supervised classification of built-up areas in sub-Saharan African cities using Landsat imagery and OpenStreetMap. Remote Sens. 2018, 10, 1145. [Google Scholar] [CrossRef]
  39. Meng, X.L.; Currit, N.; Wang, L.; Yang, X.J. Detect residential buildings from lidar and aerial photographs through object-oriented land-use classification. Photogramm. Eng. Remote Sens. 2012, 78, 35–44. [Google Scholar] [CrossRef]
  40. Lu, D.; Wang, Y.; Yang, Q.; Su, K.; Zhang, H.; Li, Y. Modeling spatiotemporal population changes by integrating DMSP-OLS and NPP-VIIRS nighttime light data in Chongqing, China. Remote Sens. 2021, 13, 284. [Google Scholar] [CrossRef]
  41. Ma, T.; Zhou, C.; Pei, T.; Haynie, S.; Fan, J. Responses of Suomi-NPP VIIRS-derived nighttime lights to socioeconomic activity in China’s cities. Remote Sens. Lett. 2014, 5, 165–174. [Google Scholar] [CrossRef]
  42. Wu, B.; Yang, C.; Wu, Q.; Wang, C.; Wu, J.; Yu, B. A building volume adjusted nighttime light index for characterizing the relationship between urban population and nighttime light intensity. Comput. Environ. Urban Syst. 2023, 99, 101911. [Google Scholar] [CrossRef]
  43. Wang, L.; Fan, H.; Wang, Y. An estimation of housing vacancy rate using NPP-VIIRS night-time light data and OpenStreetMap data. Int. J. Remote Sens. 2019, 40, 8566–8588. [Google Scholar] [CrossRef]
  44. Niu, H.; Silva, E.A. Delineating urban functional use from points of interest data with neural network embedding: A case study in Greater London. Comput. Environ. Urban Syst. 2021, 88, 101651. [Google Scholar] [CrossRef]
  45. Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 186–194. [Google Scholar]
  46. Chen, Z.L.; Zhou, L.L.; Yu, W.H.; Wu, L.; Xie, Z. Identification of the urban functional regions considering the potential context of interest points. Acta Geod. Cartogr. Sin. 2020, 49, 907–920. [Google Scholar]
  47. Shekhar, S.; Huang, Y. Discovering spatial co-location patterns: A summary of results. In Proceedings of the International Symposium on Spatial and Temporal Databases, Redondo Beach, CA, USA, 12–15 July 2001; pp. 236–256. [Google Scholar]
  48. Hengl, T.; Nussbaum, M.; Wright, M.N.; Heuvelink, G.B.; Gräler, B. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 2018, 6, e5518. [Google Scholar] [CrossRef] [PubMed]
  49. Kotsiantis, S.B. Decision trees: A recent overview. Artif. Intell. Rev. 2013, 39, 261–283. [Google Scholar] [CrossRef]
  50. Isaac, J.; Harikumar, S. Logistic regression within DBMS. In Proceedings of the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I), Greater Noida, India, 14–17 December 2016; pp. 661–666. [Google Scholar]
  51. Guo, H.; Tang, R.; Ye, Y.; Li, Z.; He, X. DeepFM: A factorization-machine based neural network for CTR prediction. arXiv 2017, arXiv:1703.04247. [Google Scholar]
  52. Rendle, S. Factorization Machines. In Proceedings of the IEEE International Conference on Data Mining, Sydney, NSW, Australia, 13–17 December 2010; pp. 995–1000. [Google Scholar]
  53. Zhang, J.; Zheng, Y.; Qi, D.; Li, R.; Yi, X. DNN-based prediction model for spatio-temporal data. In Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Francisco Bay Area, CA, USA, 31 October–3 November 2016; pp. 1–4. [Google Scholar]
  54. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1774–1785. [Google Scholar] [CrossRef]
  55. Cheng, H.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Ispir, M. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10. [Google Scholar]
  56. Dorn, H.; Törnros, T.; Zipf, A. Quality evaluation of VGI using authoritative data—A comparison with land use data in Southern Germany. ISPRS Int. J. Geo-Inf. 2015, 4, 1657–1671. [Google Scholar] [CrossRef]
Figure 1. The case study area: London, U.K.
Figure 2. Workflow of residential building prediction.
Figure 3. Graph illustration of some shape features: (a) Area; (b) Perimeter; (c) Compacity; (d) Rectangularity; (e) Convexity; (f) Orientation; (g) Elongation; (h) Mean-length.
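For reference, the sketch below shows how a few of the shape features named in Figure 3 could be computed for a building footprint polygon. It is a minimal illustration assuming the shapely library; the exact formulas and normalizations used in the paper may differ, and the function name is hypothetical.

```python
# Minimal sketch (assumption: shapely is available; formulas may differ from the paper).
from shapely.geometry import Polygon
import math

def shape_features(footprint: Polygon) -> dict:
    area = footprint.area
    perimeter = footprint.length
    mbr = footprint.minimum_rotated_rectangle   # minimum bounding rectangle
    hull = footprint.convex_hull
    # Compacity: closeness of the shape to a circle with the same area.
    compacity = 4 * math.pi * area / (perimeter ** 2)
    # Rectangularity: footprint area over its minimum bounding rectangle area.
    rectangularity = area / mbr.area
    # Convexity: footprint area over its convex hull area.
    convexity = area / hull.area
    # Elongation: short side over long side of the minimum bounding rectangle.
    xs, ys = mbr.exterior.coords.xy
    edges = [math.dist((xs[i], ys[i]), (xs[i + 1], ys[i + 1])) for i in range(4)]
    elongation = min(edges) / max(edges)
    return {
        "area": area,
        "perimeter": perimeter,
        "compacity": compacity,
        "rectangularity": rectangularity,
        "convexity": convexity,
        "elongation": elongation,
    }

# Example: a 20 m x 10 m rectangular footprint.
print(shape_features(Polygon([(0, 0), (20, 0), (20, 10), (0, 10)])))
```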
Figure 4. Comparison of road types.
Figure 5. POI spatial co-location patterns of an analysis unit.
Figure 6. Model structure of DeepFM [51].
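The following is a compact PyTorch sketch of the DeepFM architecture shown in Figure 6: a first-order linear term, an FM second-order interaction term, and a deep MLP that share one set of field embeddings [51]. It assumes all building features have already been discretized into categorical field indices; the field sizes, hyperparameters, and names are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    """Simplified DeepFM sketch: linear + FM second-order + deep components
    sharing the same field embeddings. Inputs are assumed to be pre-encoded
    as global categorical feature indices, one per field."""

    def __init__(self, field_dims, embed_dim=8, hidden=(64, 32)):
        super().__init__()
        num_features = sum(field_dims)
        self.num_fields = len(field_dims)
        self.linear = nn.Embedding(num_features, 1)              # first-order weights
        self.embedding = nn.Embedding(num_features, embed_dim)   # shared embeddings
        self.bias = nn.Parameter(torch.zeros(1))
        layers, in_dim = [], self.num_fields * embed_dim
        for h in hidden:
            layers += [nn.Linear(in_dim, h), nn.ReLU()]
            in_dim = h
        layers.append(nn.Linear(in_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):   # x: (batch, num_fields) of global feature indices
        emb = self.embedding(x)                                  # (batch, fields, embed_dim)
        linear_term = self.linear(x).sum(dim=(1, 2)) + self.bias
        # FM second-order term: 0.5 * ((sum of embeddings)^2 - sum of squared embeddings)
        square_of_sum = emb.sum(dim=1) ** 2
        sum_of_square = (emb ** 2).sum(dim=1)
        fm_term = 0.5 * (square_of_sum - sum_of_square).sum(dim=1)
        deep_term = self.mlp(emb.flatten(start_dim=1)).squeeze(1)
        return torch.sigmoid(linear_term + fm_term + deep_term)

# Example with three hypothetical feature fields (e.g., binned shape, location,
# and semantic features); indices are global across the concatenated fields.
model = DeepFM(field_dims=[16, 8, 32])
x = torch.tensor([[3, 17, 30], [5, 20, 40]])
print(model(x))   # predicted residential probability per building
```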
Figure 7. Residential building prediction cases in urban and suburban areas: (a) Urban case of prediction; (b) Urban cases in map; (c) Suburban cases of prediction; (d) Suburban cases in map.
Figure 8. The average value of partial shape indicators.
Table 1. Classification of POIs.

Classification | Secondary Classification | Three-Level Classification
Transportation facilities | Transportation service facilities, car service | Railway station, airport, car rental center
Industrial facility | Industrial land, industrial and mining workshops, warehouses | Land for companies, industrial parks, and warehouses
Business services | Catering, accommodation, shopping, life services | Restaurant, supermarket, cinema, hotel, bank
Green space and square | Parks, attractions | Memorial halls, tourist attractions, parks, squares
Science and education culture | Schools, places of science and education | Institutions of higher learning, primary and secondary schools, libraries
Public service | Government agencies, healthcare services, public facilities | Government agencies, social organizations, hospitals
Table 2. POI spatial distribution indicators of some analysis units.

Analysis Units | SR (Transportation Facilities, Industrial Facilities, Business Services) | SR (Business Services, Green Space and Square, Science and Education Culture) | SR (Business Services, Science and Education Culture, Public Service)
301 | 2.0 | 0.875 | 1.450
302 | 1.372 | 0 | 1.033
303 | 0.625 | 1.375 | 0
Table 3. DeepFM compared with other methods.

Model and Features | Accuracy | Precision | Recall | F1-Score
Features and model of this paper | 0.8994 | 0.9234 | 0.9663 | 0.9444
Features and model of [28] | 0.8395 | 0.8631 | 0.9399 | 0.8999
This paper features + model of [28] | 0.8606 | 0.8824 | 0.9578 | 0.9186
LR | 0.8025 | 0.8208 | 0.9143 | 0.8651
RF | 0.8354 | 0.8914 | 0.9064 | 0.8988
KNN | 0.8260 | 0.8848 | 0.8662 | 0.8754
FM | 0.8595 | 0.8810 | 0.9584 | 0.9181
DNN | 0.8481 | 0.8638 | 0.9614 | 0.9100
Wide and deep | 0.8507 | 0.8758 | 0.9474 | 0.9102
Table 4. Ablation experimental results.

Type Combination | Feature Combination | Accuracy | Precision | Recall | F1-Score
Shape | Shape | 0.8473 | 0.8568 | 0.9774 | 0.9132
Shape + location | Shape + distance from the nearest road | 0.8570 | 0.8607 | 0.9892 | 0.9205
Shape + location | Shape + distance from the nearest road + distance from the nearest AOI | 0.8674 | 0.8775 | 0.9781 | 0.9251
Shape + location + semantic | Shape + location + POI spatial co-location patterns | 0.8924 | 0.9047 | 0.9740 | 0.9381
Shape + location + semantic | Shape + location + POI spatial co-location patterns + land use | 0.8949 | 0.9121 | 0.9711 | 0.9407
Shape + location + semantic | Shape + location + POI spatial co-location patterns + land use + nighttime light | 0.8994 | 0.9234 | 0.9663 | 0.9444
Table 5. Transfer learning experimental results.

Training Dataset | Test Dataset | Accuracy | Precision | Recall | F1-Score
London | Beijing | 0.8732 | 0.8823 | 0.9568 | 0.9180
London | Portland | 0.8249 | 0.8361 | 0.9505 | 0.8896
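
Accuracy, precision, recall, and F1-score in Tables 3–5 are the standard binary-classification metrics for the residential/non-residential task. A minimal scikit-learn sketch of how they could be computed is given below; the 0.5 decision threshold is an assumption rather than a detail taken from the paper.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the four metrics reported in Tables 3-5 for binary
    residential (1) / non-residential (0) predictions. The 0.5 threshold
    is an assumption; the paper's post-processing may differ."""
    y_pred = [int(p >= threshold) for p in y_prob]
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

# Toy example with five buildings.
print(evaluate([1, 0, 1, 1, 0], [0.9, 0.4, 0.8, 0.3, 0.6]))
```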