1. Introduction
Data on building types are essential for urban planning, land use planning, and resource allocation [1]. Apart from traditional field surveying and mapping, building information can be obtained from several sources, including high-resolution satellite imagery [2], OpenStreetMap (OSM), and Google Maps. However, building data are often incomplete and unevenly distributed, particularly in some developing countries, where semantic information such as building types is especially lacking [3]. Such semantic information is crucial in applications such as population density and distribution estimation, since residential buildings correlate directly with where residents live. Therefore, accurately classifying residential buildings holds immense practical significance [4].
Currently, research on the prediction of building types can be categorized into two approaches: image-based and vector-based. Image-based building extraction involves extracting building outlines and semantic content (such as commercial or residential buildings) from remote sensing images through computer vision and artificial intelligence technology [5]. Although convolutional neural networks are good at extracting the shape and texture features of buildings from images [6], they can hardly extract the socioeconomic attributes of buildings. To solve this problem, researchers have attempted to extract semantic features by integrating social media data with high-resolution images [4,7,8,9]. However, these methods rely on specific social media data and have semantic quality issues. Furthermore, prediction performance is closely tied to the resolution of the remote sensing images, and the difficulty of acquiring high-resolution imagery limits the applicability of this approach, particularly in developing countries [10]. Therefore, building type prediction from high-resolution images remains a challenging task [11,12].
Building type prediction based on vector data (such as points, lines, and polygons) has received less attention than research based on remote sensing images [4]. Volunteered geographic information (VGI) has become an important source of geospatial data [13]. OSM, which relies on volunteer contributions and edits, is the most representative VGI project and has become one of the main data sources for building type classification [14]. OSM uses vector polygons to describe the geometry of buildings, i.e., building footprints. However, because contributors are mostly non-professionals, they often cannot describe building types accurately with OSM tags, which leads to semantic quality issues [15]. To ensure and improve data quality, it is therefore important to determine building types automatically, based not only on the geographical objects themselves but also on the characteristics of their surrounding features [16].
In recent years, some scholars have attempted to predict building types from internal features of buildings such as building shape [17,18]. However, the information that can be extracted from the intrinsic characteristics of buildings is limited, and it is difficult to achieve good prediction results. With the development of the Internet of Things, an increasing amount of diverse data can be acquired conveniently. To improve the precision of building prediction, it is becoming both more important and more feasible to fuse multiple data sources and extract the external characteristics of buildings [19,20,21].
According to Tobler’s First Law of Geography [22], everything is related to everything else, but near things are more related than distant things. In recent years, scholars have explored the use of external feature data such as points of interest (POI) for building function recognition and related research [23,24]. For example, Lin et al. [25] improved the prediction accuracy of urban building functions from remote sensing images by using the importance and density of POIs. Gao et al. [26] extracted urban functional regions from POIs and human activities on location-based social networks, and the model achieved good performance. Additionally, Xu et al. [27] effectively predicted residential building types by modeling urban nighttime population distribution based on Google Earth and land use information. These studies demonstrate that taking the surrounding features of buildings into consideration improves building type prediction.
In current research on residential building identification, limited consideration is given to the range of features associated with building types. For instance, Sturrock et al. [28] employed only simple features, such as area and number of sides, to represent the shape of buildings, and did not consider semantic features. As a result, the recognition accuracy was relatively low. In addition to shape features, Atwal et al. [3] incorporated semantic features, which were represented using label values directly associated with building types. When the dataset contains a significant proportion of residential buildings, high recognition accuracy can be achieved through simple training. However, this method has inherent limitations: in areas where residential buildings account for a small proportion of all buildings, the accuracy of residential building identification is expected to decline. Furthermore, the current literature predominantly relies on traditional machine learning models for handling multi-source features, without exploring the potential of deep learning models. This may be limiting for datasets containing diverse feature types, as the expressive capacity of traditional machine learning models is limited.
In contrast, deep learning models offer more powerful capabilities for feature learning and expression. They can automatically learn higher-level and more abstract representations from data, enabling them to capture complex relationships and associations in multi-source features. Leveraging deep learning models therefore makes it possible to better exploit the potential associations and nonlinear relationships among features, which can lead to improved accuracy in residential building identification. In this paper, we predict residential buildings using the DeepFM deep learning model. Compared with other machine learning models, DeepFM can mine richer feature information and attain higher recognition accuracy by modeling both low-order and high-order feature interactions.
From the research presented above, it is obvious that most research on building type prediction often ignores important external features related to building types, which results in insufficient model prediction ability. Therefore, accurately predicting building types based on architectural exterior features such as location and semantics is of significant importance. In light of this, we propose a deep learning prediction model for residential buildings, which integrates three types of features, namely, shape, location, and semantics. This model takes both internal and external building features into account and uses the deep learning method DeepFM to achieve interaction between low-order and high-order features. Therefore, it achieves good prediction performance.
The main contributions of this study are outlined as follows:
We model POI characteristics using POI spatial co-location patterns. Different POI spatial combination patterns were mined by constructing POI spatial co-location patterns. Indicators reflecting POI combination relationships were designed and calculated to model POI features. By combining POI spatial co-location patterns around buildings with other semantic features such as nighttime light and land use, the prediction precision of our model improved. Experiments showed that semantic features, particularly POI spatial co-location patterns, made a greater contribution to the accuracy of prediction;
We propose representing the location characteristics of the building using the distance between the building and the nearest road and areas of interest (AOI). The feature integrated the distances of related residential roads and three types of AOIs surrounding the building to represent the characteristics of residential and non-residential buildings. Experimental results validated the effectiveness of these features in enhancing the accuracy of prediction;
We propose the integration of internal and external features from three types of features, i.e., shape, location, and semantics, to characterize residential buildings from a multi-type features perspective. Shape features included 13 distinct shape characteristics of buildings. To enhance the interaction between low-order and high-order features, we introduced a deep learning method named DeepFM for building prediction;
We evaluate the proposed deep learning model, which integrates multi-type features, through experiments on the OpenStreetMap and nighttime light remote sensing image datasets. The results show a significant improvement in the prediction of residential building types compared with existing methods, confirming the effectiveness of our proposed model.
2. Study Area and Datasets
The primary research area of this study is London, U.K. (Figure 1). London is the largest city in Europe and the capital of the U.K., covering an area of about 65 km². As of May 2022, the population of London was expected to reach about 9.43 million. Furthermore, the quality of OSM data in London is high [29], making it a representative sample in Europe. For the transfer learning experiment in this study, sample data from Beijing, China, and Portland, OR, U.S.A., with distinct geographical characteristics, were selected as the test datasets.
In the experiments presented in this study, two datasets were used: the OSM dataset and the NPP-VIIRS Monthly DNB Composite nighttime light remote sensing image dataset. OSM is one of the most successful and widely used VGI projects, with free and open-source data available under an open license. This study collected five different types of objects from OSM: the experimental data comprised 381,641 buildings with type labels, 362,573 roads, 71,359 POIs, 42,279 AOIs, and land use data. The data included both raster and vector formats. Specifically, the nighttime light data were in raster format, while the other data, such as buildings and road networks, were in vector format. The vector data consisted of three geometry types: road network data were represented as lines, POI data as points, and buildings as polygons. The OSM dataset used in this study was downloaded from the official OSM website (http://www.openstreetmap.org, accessed on 3 May 2022). The nighttime light data were obtained from the NPP-VIIRS dataset on 7 January 2022. The data have a spatial resolution of 500 m and were sourced from the NOAA.
3. Methodology
The framework for residential building prediction comprised four steps, namely, data acquisition, data preprocessing and feature extraction, model training, and building type prediction, as shown in Figure 2.
In the first step, we obtained the data from OSM and NPP-VIIRS for our experiment regions. The second step involved data preprocessing and feature extraction, for which we used ArcGIS 10.7 to process the data and match buildings with the other datasets. For the semantic features, we first preprocessed the nighttime light data by resampling, cropping, and outlier removal. We then classified the nighttime light into brightness levels according to light intensity and matched these levels with the corresponding buildings. Furthermore, we extracted the POIs located within a specified distance of each building and classified them based on predetermined classification criteria. For the location features, we measured the distance from every building to the nearest road and the nearest AOI using the proximity analysis tool. After matching buildings with their surrounding POIs, road network, and AOIs, we extracted the spatial distribution characteristics of POIs and the distances to the nearest road and AOI. We then collected 13 shape features commonly used in such analyses. Once the three types of features were collected, the feature values were computed using the corresponding formulas, outliers were removed, and the data were normalized. Finally, all features were fused to form a multi-source feature matrix, and the DeepFM deep learning model was trained on it and used to predict residential building types.
3.1. Shape Features and Location Features
3.1.1. Shape Features
Buildings with the same function tend to have similar architectural shapes, while those with different functions will have different shapes. For instance, apartments and industrial buildings show notable differences in shape characteristics. Many researchers have proposed to predict building types using geometric features of buildings [18,30,31]. In this study, 13 common classic shape indicators [32,33] were selected, including perimeter, area, rectangularity, mean length, maximum length, minimum length, outlier, compacity, convexity, elongation, right angle, orientation, and granularity. Detailed illustrations and equations of some shape indicators are shown in Figure 3.
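To make a few of these indicators concrete, the following minimal sketch computes area, perimeter, compactness, convexity, and rectangularity for a footprint given as a list of vertices. The formulas are common textbook definitions and are assumptions here, not necessarily the ones in Figure 3: compactness is taken as 4πA/P², convexity as the ratio of footprint area to convex hull area, and rectangularity as the ratio of footprint area to the axis-aligned bounding box area (a minimum-area rotated rectangle is often used instead). The function names are hypothetical.

```python
import math


def polygon_area(pts):
    """Shoelace area of a simple polygon given as [(x, y), ...] (not closed)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(pts):
    """Sum of edge lengths, including the closing edge."""
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shape_features(pts):
    """A subset of shape indicators for one footprint (assumed definitions)."""
    area = polygon_area(pts)
    perimeter = polygon_perimeter(pts)
    hull_area = polygon_area(convex_hull(pts))
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    bbox_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    return {
        "area": area,
        "perimeter": perimeter,
        "compactness": 4.0 * math.pi * area / perimeter ** 2,
        "convexity": area / hull_area,
        "rectangularity": area / bbox_area,  # axis-aligned box: an assumption
    }


# An L-shaped footprint in projected coordinates (metres).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
features = shape_features(l_shape)
```

For the L-shaped example, the concave corner lowers both convexity and rectangularity below 1, which is the kind of signal these indicators are meant to capture.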
3.1.2. Location Features
In this study, the distances from each building to the nearest road and the nearest AOI were used as location features. AOI, as a type of spatial data describing geographic entities [34], refers to areas within the urban environment that attract public attention. AOIs are of great significance in multiple application fields [35] and are often used in the identification of urban functional areas, as they reveal areas that are highly exposed to the public [36]. Public facilities, such as parks, schools, and sports fields, are examples of AOI types. They are indispensable in people’s daily lives because public demand for them and their frequency of use are very high [37]. Therefore, this study selected these three AOI types and calculated the distance from the building to the nearest AOI as one of the location features.
The distance to the nearest road was considered as another important location feature, as it is often used as a significant predictor of building types [28,38]. Maximum speed limits and spatial distributions vary widely with road type, so road type may affect building type prediction. We conducted comparative experiments on different road types from the OSM road network and selected the road type with the most significant impact on building prediction.
The experiments mainly considered three road types related to residential buildings with different speed limits and distributions: ‘footway’, ‘primary’, and ‘residential’. As shown in Figure 4, ‘primary’ and ‘footway’ differed little in their impact on prediction, while ‘residential’ had a greater effect, with higher accuracy, recall, and F1-score than the other two. On investigation, we found that ‘residential’ roads were primarily distributed in residential building areas and lay closest to them, followed by ‘footway’ and finally ‘primary’. Therefore, we chose the distance to the nearest ‘residential’ road as one of the location features.
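The distance feature itself can be illustrated with a short sketch. The study computed it with the ArcGIS proximity analysis tool; the code below is only a stand-in that assumes projected coordinates in metres, represents a building by a single point, and represents each road as a dictionary with hypothetical `type` and `line` keys.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment
        return math.dist(p, a)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_nearest_road(point, roads, road_type="residential"):
    """Smallest distance from point to any road of the given OSM type."""
    best = math.inf
    for road in roads:
        if road["type"] != road_type:
            continue
        line = road["line"]
        for a, b in zip(line, line[1:]):
            best = min(best, point_segment_distance(point, a, b))
    return best


roads = [
    {"type": "residential", "line": [(0.0, 0.0), (10.0, 0.0)]},
    {"type": "primary", "line": [(0.0, 1.0), (10.0, 1.0)]},
]
building_point = (5.0, 3.0)
d_residential = distance_to_nearest_road(building_point, roads, "residential")
d_primary = distance_to_nearest_road(building_point, roads, "primary")
```

Filtering by road type before taking the minimum mirrors the choice described above: the nearest road overall may be a ‘primary’ road, but only the distance to the nearest ‘residential’ road is kept as the feature.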
3.2. Semantic Features
3.2.1. Land Use
In this study, semantic features used for prediction included POI spatial co-location patterns, nighttime light, and land use. The land use information was obtained from the OSM dataset, which includes the land use at each building location. This information covers the natural and social attributes of the land, such as ‘farmyard’, ‘forest’, and ‘commercial’. Previous studies demonstrated that land use information can be used to detect residential buildings [39] and predict building types [3]. Therefore, this study selected land use as one of the semantic features. First, to ensure data quality, we removed land use categories with few instances from the land use dataset. Six categories, namely ‘retail’, ‘residential’, ‘industrial’, ‘forest’, ‘farmyard’, and ‘commercial’, were retained. Then, after joining the land use information to the building coordinates using ArcGIS, six land use indicators were obtained as semantic features for building type prediction.
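One plausible reading of the six land use indicators is a one-hot encoding over the six retained categories. The sketch below assumes that encoding (the text does not state it explicitly) and maps any other land use value to an all-zero vector; the function name is hypothetical.

```python
# The six land use categories retained in the text.
LAND_USE_CATEGORIES = [
    "retail", "residential", "industrial", "forest", "farmyard", "commercial",
]


def land_use_indicators(tag):
    """One-hot vector over the retained categories (assumed encoding)."""
    return [1 if tag == category else 0 for category in LAND_USE_CATEGORIES]


forest_vector = land_use_indicators("forest")
other_vector = land_use_indicators("meadow")  # a category that was not retained
```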
3.2.2. Nighttime Light
Nighttime light data provide an accurate reflection of the brightness of a city at night and can also serve as a feature of human activities. Research shows that nighttime light data are correlated with population distribution [40,41,42]. Some researchers have utilized nighttime light data to distinguish between residential and non-residential areas and estimate residential vacancy rates [43]. Therefore, this study hypothesized that there may be a correlation between nighttime light data and residential building distribution, and used it as an important supplementary data source for prediction. In this study, the standard deviation classification method was employed to classify the light brightness into six levels. The corresponding light brightness values were assigned to buildings by spatial mapping.
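The six-level classification can be sketched as follows. Standard deviation classification places class breaks at multiples of the standard deviation around the mean; the specific breaks used here (mean − 2σ, mean − σ, mean, mean + σ, mean + 2σ) are an assumption, since the text gives only the number of levels, and the function names are hypothetical.

```python
import bisect
import statistics


def std_dev_breaks(values):
    """Five class breaks at mean + k * std for k = -2..2 (assumed interval)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [mu + k * sigma for k in (-2, -1, 0, 1, 2)]


def brightness_level(value, breaks):
    """Map a radiance value to a level from 1 (darkest) to 6 (brightest)."""
    return bisect.bisect_right(breaks, value) + 1


radiance = [0, 10, 20, 30, 40]  # toy pixel values
breaks = std_dev_breaks(radiance)
levels = [brightness_level(v, breaks) for v in radiance]
```

Each building would then take the level of the pixel it falls in, which corresponds to the spatial mapping step described above.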
3.2.3. POI Spatial Co-Location Patterns
POI can partially reflect surrounding geographic objects of buildings [44]. However, most studies only consider the statistical information of POIs, neglecting the spatial distribution of POIs, which leads to insufficient mining of feature information [45]. To address this issue, Chen et al. [46] proposed that the combination pattern of POIs within a certain range can more accurately reflect the distribution of surrounding buildings.
This study aims to infer building types based on POI features. Based on the above analysis, the combined pattern of POIs can better distinguish residential buildings from non-residential buildings. According to research [47], spatial co-location patterns refer to multi-class feature collections that are frequently located nearby and are used to discover subsets of features with clear spatial associations. Therefore, POI spatial co-location patterns can be used to model the combination pattern of POIs. After the calculation of the participation rate [46] of POI in the spatial co-location patterns, the spatial distribution of POIs can be obtained.
The computation methods of co-location patterns and the participation rate are shown in Definitions 1 and 2. The POI spatial distribution indicator was calculated based on the spatial co-location patterns and participation rate, as shown in Definition 3.
Definition 1. Given three types of POIs, I, J, and K, the spatial context relationship context is defined based on the Euclidean distance, and the spatial co-location pattern F is constructed as follows:

$$F = \{I, J, K\}, \qquad \mathrm{context}(I, J, K) = \{(p_a, p_b) \mid d(p_a, p_b) \le E,\ a, b \in \{1, \dots, n\},\ a \ne b\} \tag{1}$$

where n is the total number of POIs of the three types, and $d(p_a, p_b)$ represents the Euclidean distance between POIs $p_a$ and $p_b$. The context refers to the spatial distribution relationship of the three types of POIs, i.e., I, J, and K, that meet the Euclidean distance conditions. E is the pre-defined proximity threshold, and F is the spatial co-location pattern.

Definition 2. Given three types of POIs, I, J, and K, and a spatial co-location pattern F, the participation rate of type I in the spatial co-location pattern F is defined as follows:

$$PR(F, I) = \frac{N_I(F)}{N_I} \tag{2}$$

where $N_I(F)$ denotes the number of different objects of type I contained in the spatial co-location pattern F, and $N_I$ is the total number of objects of type I.

Definition 3. The sum of the participation rates of types I, J, and K in the co-location pattern F is denoted as $P(F)$, which serves as the characteristic value of the POI combination type I, J, and K, reflecting the variance of different POI combinations. $P(F)$ is defined as follows:

$$P(F) = PR(F, I) + PR(F, J) + PR(F, K) \tag{3}$$

For example, within a certain range around a building, there are four types of POIs, A, B, C, and D, distributed as shown in Figure 5. According to Definition 1, there are four candidate spatial co-location patterns, one for each combination of three of the four types. However, two of the candidates shown in the figure do not meet the conditions because the Euclidean distance to the other POIs is greater than the proximity threshold E.
According to Definition 2, the participation rates of the three POI types in the four patterns are (2/3, 3/4, 3/4), (2/3, 3/4, 1), (1/3, 1/2, 1/4), and (1/2, 1/2, 1/4), respectively. According to Definition 3, the corresponding characteristic values are 2.17, 2.42, 1.08, and 1.25. In this study, we discovered spatial co-location patterns and their participation rates according to Definitions 1 and 2, and extracted POI spatial distribution information according to Definition 3.
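Definitions 1 to 3 can be illustrated with a brute-force sketch. It assumes that a pattern instance is a triple with one POI of each type whose three pairwise Euclidean distances are all within the threshold E; this clique-style reading is an assumption, and the function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations, product


def participation_rates(pois, types, threshold):
    """Participation rate of each type in the pattern formed by `types`.

    pois: {type_name: [(x, y), ...]} for the POIs in one analysis unit.
    types: the three type names that make up the pattern.
    """
    participating = {t: set() for t in types}
    index_ranges = [range(len(pois[t])) for t in types]
    for idx in product(*index_ranges):
        pts = [pois[t][i] for t, i in zip(types, idx)]
        # Assumed instance rule: every pair of POIs lies within the threshold.
        if all(math.dist(p, q) <= threshold for p, q in combinations(pts, 2)):
            for t, i in zip(types, idx):
                participating[t].add(i)
    return {
        t: (len(participating[t]) / len(pois[t]) if pois[t] else 0.0)
        for t in types
    }


def pattern_value(pois, types, threshold):
    """Characteristic value of a pattern: the sum of its participation rates."""
    return sum(participation_rates(pois, types, threshold).values())


pois = {
    "A": [(0.0, 0.0), (100.0, 0.0)],
    "B": [(10.0, 0.0)],
    "C": [(0.0, 10.0), (5.0, 5.0), (300.0, 300.0)],
}
rates = participation_rates(pois, ("A", "B", "C"), threshold=70.0)
value = pattern_value(pois, ("A", "B", "C"), threshold=70.0)
```

In this toy unit, one of the two A objects and two of the three C objects take part in at least one instance, so the rates are 1/2, 1, and 2/3. The triple loop is cubic in the number of POIs and is meant only to show the definitions, not to scale.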
3.2.4. POI Spatial Feature Information Mining
This study used the analysis unit as the research unit, which refers to the area within a certain range around a building. The size threshold of the analysis unit is determined by the distance to the building boundary, instead of the building centroid distance. A proximity threshold was used to determine whether POIs within the analysis unit had a spatial relationship. They were only grouped if they met the proximity criterion. After conducting multiple experiments, the best results were achieved with a combination of a 200 m threshold for the analysis unit and a 70 m threshold for adjacent POIs.
With reference to the Chinese Standard of Land Use Classification (GB 50137-2011), and taking into account both the semantic information and the social function of POIs, POIs were classified into six categories, namely, transportation facilities, industrial facilities, commercial services, green space and square, science and education culture, and public services. The specific classification rules are shown in Table 1. Then, the POI co-location patterns were constructed, and the participation rates were calculated to obtain the POI spatial distribution indicators based on the algorithm described above. These indicators reflect the spatial distribution characteristics of POIs around a building, which are related to building type and help distinguish between different building types. Table 2 presents the POI spatial distribution indicators of some analysis units.
3.3. DeepFM Model
Some traditional machine learning models such as random forest (RF) [48], decision tree [49], and logistic regression (LR) [50] have been employed in building type prediction. However, these models have limited capability to extract complex features, particularly in learning high-order features carrying nonlinear information, which restricts their predictive performance. This limitation usually necessitates manual feature engineering to assist in extracting effective feature combinations. In recent years, deep learning research has made significant progress in mining high-order feature combination information, which has become a main research focus. The DeepFM [51] model is among the currently popular deep learning models that mine the deep information of features via feature interaction and omit the feature engineering process. Consequently, this study introduces the DeepFM model to automatically learn interactions between low-order and high-order features from multi-class features, with the aim of improving the precision of building type prediction. Figure 6 illustrates the structure of the DeepFM model.
DeepFM consists of FM [52] and DNN [53] components, which carry out low-order and high-order feature interactions, respectively. FM mainly implements second-order feature interactions by calculating the inner product of the latent vectors $v_i$ and $v_j$ of features $x_i$ and $x_j$. Its calculation is shown as Equation (4).

$$y_{FM} = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle \, x_i x_j \tag{4}$$

In the equation, n represents the number of features of the samples; $w_0$ and $w_i$ are the bias and first-order weights; $v_i$ and $v_j$ represent the latent vectors of the $i$th and $j$th features; and $x_i$ and $x_j$ are the values of the $i$th and $j$th features of the samples, respectively.
DNN is a feed-forward neural network with multiple hidden layers, used to learn high-level interactive information between features. The training process can be described by Equations (5)–(7):

$$a^{(0)} = [e_1, e_2, \dots, e_m] \tag{5}$$

$$a^{(l)} = \sigma\left(W^{(l)} a^{(l-1)} + b^{(l)}\right) \tag{6}$$

$$y_{DNN} = \sigma\left(W^{(H+1)} a^{(H)} + b^{(H+1)}\right) \tag{7}$$

where $a^{(0)}$ is the input of DNN, $m$ is the total number of features, $l$ represents the layer of the neural network, $e_i$ is the $i$th feature vector, $a^{(l)}$ is the output of the $l$th layer and the input of the $(l+1)$th layer, $W^{(l)}$ and $b^{(l)}$ are the weights and biases of the $l$th layer, and $\sigma$ is the activation function. $H$ is the number of hidden layers of the feed-forward neural network. FM and DNN share the input and are trained simultaneously, and their outputs are combined as follows:

$$\hat{y} = \mathrm{sigmoid}\left(y_{FM} + y_{DNN}\right) \tag{8}$$

The equation represents the final output of the DeepFM model, where $y_{FM}$ is the output of the FM part and $y_{DNN}$ is the output of the deep neural network part. The sigmoid function is used as the activation function.
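A minimal forward pass may help connect the FM and DNN parts to the final output. The sketch below uses plain Python lists and hand-picked parameters rather than a trained model; the ReLU hidden activation, the linear DNN output unit, the sharing of the FM latent vectors as DNN input embeddings, and all function names are assumptions made for illustration.

```python
import math


def fm_output(x, w0, w, V):
    """FM part: bias, first-order terms, and pairwise latent-vector interactions."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += inner * x[i] * x[j]
    return linear + pairwise


def dense(a, W, b):
    """Affine layer: W is a list of rows, b a list of biases."""
    return [sum(wij * aj for wij, aj in zip(row, a)) + bi for row, bi in zip(W, b)]


def dnn_output(x, V, layers):
    """DNN part: concatenated embeddings -> ReLU hidden layers -> linear output."""
    a = [xi * v for xi, vec in zip(x, V) for v in vec]  # input a^(0)
    for W, b in layers[:-1]:
        a = [max(0.0, z) for z in dense(a, W, b)]  # ReLU is an assumption
    W, b = layers[-1]
    return dense(a, W, b)[0]


def deepfm_predict(x, w0, w, V, layers):
    """Final output: sigmoid of the sum of the FM and DNN outputs."""
    z = fm_output(x, w0, w, V) + dnn_output(x, V, layers)
    return 1.0 / (1.0 + math.exp(-z))


# Toy parameters: two features, latent dimension 2, one hidden layer of width 2.
x = [1.0, 2.0]
w0, w = 0.1, [0.2, -0.3]
V = [[1.0, 0.0], [0.5, 2.0]]
layers = [
    ([[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, -1.0, 0.0]], [0.0, 0.0]),  # hidden layer
    ([[0.5, 1.0]], [-0.2]),                                        # output layer
]
y_fm = fm_output(x, w0, w, V)
y_dnn = dnn_output(x, V, layers)
probability = deepfm_predict(x, w0, w, V, layers)
```

Both parts read the same latent vectors `V`, which is how the two components share their input; in a trained model these parameters would be learned jointly rather than set by hand.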
4. Experiments and Results
4.1. Performance Comparison Experiment
To evaluate the effectiveness of the features and models proposed in this paper, we followed the steps outlined below. First, we replicated the existing residential prediction method described in the literature [28]. Next, we incorporated the features proposed in this paper into the model and trained it using the London dataset. This allowed us to compare the features and models of our paper with those in [28]. To facilitate the comparison with the baseline method, we defined buildings labeled as ‘detached’, ‘house’, ‘hut’, ‘residential’, and ‘teacher housing’ as residential buildings and the others as non-residential buildings based on the method proposed in [28]. Furthermore, we compared some typical classification models such as LR [50], RF [48], K-nearest neighbor (KNN) [54], factorization machine (FM) [52], deep neural network (DNN) [53], and wide and deep [55]. The experimental results are shown in Table 3.
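The labelling rule and the evaluation metrics used in the comparison can be sketched as follows. The tag set is taken from the text above; the function names and the toy predictions are hypothetical.

```python
# Building tags treated as residential, as listed in the text.
RESIDENTIAL_TAGS = {"detached", "house", "hut", "residential", "teacher housing"}


def is_residential(tag):
    """Binary label: 1 for residential tags, 0 for everything else."""
    return 1 if tag in RESIDENTIAL_TAGS else 0


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


tags = ["house", "detached", "residential", "school", "retail", "hut"]
y_true = [is_residential(t) for t in tags]
y_pred = [1, 1, 0, 0, 1, 1]  # toy predictions
metrics = binary_metrics(y_true, y_pred)
```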
According to the comparison results shown in the second and third rows of Table 3, combining the features proposed in this study with the model of [28] led to significant improvements in indicators such as accuracy and recall compared with the features used in [28]. This demonstrates that the proposed features have better predictive performance. Furthermore, the experimental results in the first and third rows of Table 3 indicate that, when the features are the same, the prediction performance of the model proposed in this study is significantly higher than that of the model in [28]. The difference in features is one of the main factors affecting the results. The features of [28] mainly focused on the size, shape, and proximity of adjacent structures. In contrast, the features proposed in this study included semantic features such as the POI spatial co-location patterns, nighttime light, and land use information, which capture the correlation between the environment of a building and its type. Moreover, the POI spatial co-location patterns took into account the combination relationships between POIs around the building. In addition, the shape features of [28] were simpler and fewer in number than those proposed in this paper. In terms of location characteristics, road type was not taken into consideration in [28], whereas the experimental results reported in this paper indicate that road type plays a crucial role in accurately identifying residential buildings. Therefore, the features proposed in this paper capture more effective information, enhancing the accuracy and predictive capability of the model, which is consistent with their superiority over the features in [28].
Another important factor affecting predictive performance was the prediction model. Although the integrated model used in [28] was better than a single machine learning model, its predictive ability was limited and its time consumption was large. As the number of features and the amount of data increased, the time overhead also increased exponentially. Among the other models compared, KNN, LR, and RF are all traditional machine learning models that have difficulty constructing feature interactions and tend to ignore the relationships between features. The FM model can build second-order feature interactions but cannot learn higher-order feature interactions. The DNN model can learn high-order feature interactions but cannot construct second-order feature interactions. The wide and deep model requires manual feature crossing, which increases the complexity of the model and reduces the training efficiency. Therefore, in this study, we propose to use the DeepFM model. This model can achieve both low-order and high-order feature interactions, fully integrate and utilize multi-source data features, and mine more effective information to improve the precision and efficiency of the prediction method.
Two examples of residential building prediction performance in urban and suburban London are shown in Figure 7. As Figure 7a shows, some building types in the central area of the city were not correctly predicted. The corresponding incorrectly predicted areas are shown on the map in Figure 7b. Although a large number of POIs are distributed around them and the surrounding buildings are dense, these incorrectly predicted buildings have irregular, unusual shapes. Such shapes are therefore likely the main reason for the misprediction of building types in urban areas.
The second prediction example relates to suburban buildings, as shown in Figure 7c. The incorrectly predicted buildings are shown on the map in Figure 7d. The buildings in this area are few and sparsely distributed, and the number of POIs and roads is also quite small, which makes it difficult to mine effective semantic and location feature information. Therefore, prediction in suburban areas is a more challenging task than in urban areas.
4.2. Ablation Experiments
The method proposed in this study combined buildings’ shape, semantics, and location features. In order to demonstrate the influence of various features on building type prediction, this study conducted some ablation experiments on the London dataset. The results are presented in Table 4.
The baseline experiment used only shape features; its F1-score was 0.9132. Compared with the features used in [28], these features had a more prominent role in prediction, indicating that the shape features can better capture the shape characteristics of residential buildings. The next two experiments added the two location features, namely the distance to the nearest road and the distance to the nearest AOI. The experimental results showed that these features improved the prediction performance of the model to some extent. As discussed in Section 3.1.2, the road type selected by experimental analysis has a greater impact on prediction, and the distances to the nearest road and AOI, which reflect the location distribution of buildings, are also helpful for building prediction. Our experimental results confirmed this analysis. In the last three experiments, three semantic features were added. The POI spatial co-location patterns were the first feature added, and the experimental results showed that this feature could significantly improve the precision of prediction, which confirmed our expectations. The result indicated that the POI spatial co-location patterns could effectively mine more semantic information around a building. The second feature was land use. The experimental results indicated that, compared with the POI co-location patterns, land use information had less impact on the improvement of model performance. The third was the nighttime light feature, which confirmed our previous hypothesis that there was indeed a correlation between nighttime light and residential building distribution. From the last three experimental results, we could therefore infer that, among the semantic features, POI co-location patterns had the greatest impact on the performance of the model.
Based on the aforementioned experimental results, the following conclusions could be drawn: Firstly, each feature made a certain contribution to the prediction of residential buildings, however, the optimal performance was achieved when all the features were combined. Secondly, the performance of the model improved as more features were taken into consideration. Lastly, the proposed method in this study showed significantly better performance than other methods. The precision and F1-score reached 92.34% and 94.44%, respectively, when all features were integrated.
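Because precision and F1-score are reported together, the corresponding recall can be recovered from them. The following is a minimal sketch that assumes the standard definition of the F1-score as the harmonic mean of precision and recall; the function names are illustrative and not taken from the study's code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / denom


# Values reported for the full feature set (precision 92.34%, F1 94.44%).
recall = implied_recall(0.9234, 0.9444)
```

Under this assumption, the reported pair implies a recall of roughly 0.966, that is, the full model misses few residential buildings while keeping false positives moderate.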
4.3. Transfer Learning Experiments
To assess the transfer learning capability of the model, we trained it on the London dataset and tested it on two datasets, from Beijing and Portland, that differ significantly from London in geography and culture. We used a pre-training and transfer approach, which offers strong feature extraction and good generalization. To mitigate potential overfitting, we fine-tuned the pre-trained model on data from the target regions and added weight decay to the optimizer to increase regularization. Weight decay reduces the risk of overfitting by penalizing the model weights during optimization, pushing them toward smaller values. The experimental results are shown in
Table 5.
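The effect of weight decay described above can be sketched as a single gradient step. This is an illustration only: it assumes plain stochastic gradient descent with an L2 penalty, and the learning rate and decay coefficient are placeholder values, since the optimizer settings used in the study are not stated here.

```python
def sgd_step(weights, grads, lr=0.01, weight_decay=0.0):
    """One SGD update with L2 weight decay.

    The decay term adds weight_decay * w to each gradient, which is the
    gradient of the penalty (weight_decay / 2) * w**2.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [2.0, -3.0, 0.5]
zero_grads = [0.0, 0.0, 0.0]  # isolate the effect of the decay term

plain = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.0)
decayed = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.5)
```

With a zero data gradient, `plain` leaves the weights unchanged, whereas `decayed` shrinks every weight toward zero by the factor 1 - lr * weight_decay. This shrinkage is what discourages the fine-tuned model from fitting the target-region data too closely.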
Because of regional and cultural differences, transfer learning results can be affected by features such as building shapes and POI distribution. On the test datasets, the precision and F1-score of the model declined, and the size of the decline varied from city to city. Notably, the model performed better in Beijing than in Portland. Overall, this experiment suggests that the model transfers promisingly across regions.
In these three regions, there were noticeable differences in the distribution of POIs. The distribution of POIs in Beijing is relatively dense, with many ancient architectural attractions such as the Forbidden City and the Temple of Heaven. These attractions are interlaced with modern places such as government agencies, universities, and business districts. In Portland, OR, U.S.A., however, POIs are sparsely distributed, with many natural landscapes such as parks. As an international metropolis, London has numerous POIs such as commercial centers, museums, theaters, restaurants, and hotels, and the number and variety of these POIs far exceed those in other cities.
In addition to differences in POIs, building shapes also play an important role. Beijing and London both combine traditional styles with modern architectural designs, but Chinese and Western architectural styles differ significantly. Chinese architecture usually features symmetrical structures such as courtyard houses, while many building shapes in European cities such as London are irregular. To explore the differences in building shapes, we normalized the shape features of the three regions and calculated the average value of each shape indicator.
Figure 8 illustrates the average value of some shape indicators. It can be observed that there were significant differences in the building shape indicator values among these cities, indicating that the buildings in these cities have unique shape characteristics.
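The comparison behind Figure 8 can be sketched as follows. This is a minimal example that assumes min-max normalization over the pooled buildings of all three regions, which is one common choice; the normalization actually used is not specified in this section. The indicator name `compactness` and all values are invented for illustration.

```python
from collections import defaultdict


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def regional_means(records, indicator):
    """records: list of (region, {indicator_name: value}) pairs."""
    normalized = min_max_normalize([feats[indicator] for _, feats in records])
    grouped = defaultdict(list)
    for (region, _), value in zip(records, normalized):
        grouped[region].append(value)
    return {region: sum(vals) / len(vals) for region, vals in grouped.items()}


# Toy values, invented for illustration only.
records = [
    ("London", {"compactness": 0.40}),
    ("London", {"compactness": 0.60}),
    ("Beijing", {"compactness": 0.80}),
    ("Beijing", {"compactness": 0.90}),
    ("Portland", {"compactness": 0.50}),
]
means = regional_means(records, "compactness")
```

Normalizing over the pooled data before averaging keeps the regional means on a common scale, so that differences between cities are directly comparable across indicators.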
Moreover, building density is another crucial factor in urban environments. Generally, developed cities have higher building densities than less developed ones. Cities with different building densities therefore also differ in the values of location features such as the distance to the nearest road and AOI. Based on the analysis above, we infer that the transfer learning capability of the model in different cities is closely related to these features, which are in turn influenced by multiple factors, including the region, culture, and degree of development of the city. As a result, the performance of the pre-trained model will vary across cities that differ in these respects. To further enhance the transfer learning ability of the model, it is therefore necessary to consider the combined influence of building shape, POI distribution, and other factors.
5. Discussion
In order to enhance the accuracy of residential building recognition, this study introduces a deep learning approach that takes both the internal and external features of buildings into account. The internal features primarily focus on the shape characteristics of buildings, which play a crucial role in building recognition research. However, existing studies have mostly utilized simple shape metrics such as area, perimeter, and number of sides, without considering some more complex shape features. In contrast, our study incorporated 13 different shape features, allowing for a more comprehensive representation of the contour information of the buildings. The experimental results demonstrated the effectiveness of these shape features in accurately characterizing the shape of the buildings.
Regarding the external features, we considered both the location and semantic aspects of the building. In terms of location features, we considered the distance of the buildings to the nearest road and AOI separately. Unlike previous research, we took into account the type of road and conducted experimental analysis to select some road types that have a significant impact on recognition performance. Additionally, we analyzed the distance to the AOI and investigated the influence of different AOI types, which has not been explored in previous studies.
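The road-distance feature can be illustrated with a short sketch. It assumes projected (planar) coordinates, roads stored as polylines, and a single representative point per building such as its centroid; none of these choices is confirmed by this section. The road types in the example are ordinary OSM `highway` values picked for illustration, not the types selected in the study.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_road(point, roads, allowed_types):
    """roads: list of (road_type, [(x, y), ...]) polylines."""
    best = math.inf
    for road_type, coords in roads:
        if road_type not in allowed_types:
            continue  # skip road types excluded from the feature
        for a, b in zip(coords, coords[1:]):
            best = min(best, point_segment_distance(point, a, b))
    return best


roads = [
    ("residential", [(0.0, 0.0), (10.0, 0.0)]),
    ("motorway", [(0.0, 1.0), (10.0, 1.0)]),
]
d_selected = distance_to_nearest_road((5.0, 3.0), roads, {"residential"})
d_all = distance_to_nearest_road((5.0, 3.0), roads, {"residential", "motorway"})
```

Restricting `allowed_types` changes the feature value for the same building (3.0 versus 2.0 in this toy case), which is why the choice of road types can affect recognition performance.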
With the rapid growth of big data, researchers can obtain rich semantic data. As an initial exploration, we incorporated nighttime light data, which provide an intensity value representing the nighttime illumination of each building. The experimental results confirmed the effectiveness of this feature, although they were limited by the resolution of the nighttime light data. In future research, we plan to explore higher-resolution nighttime light data to further improve the accuracy of our approach. We also incorporated land use information to augment the semantic description of the buildings. In addition, we introduced a novel POI co-location pattern to explore the spatial distribution of POIs surrounding buildings. In contrast with previous studies, which primarily focused on the frequency features of POIs, our spatial co-location model mines the spatial distribution information of POIs. This approach significantly enhances the accuracy and effectiveness of building recognition. Our framework is also flexible, allowing different POI processing methods to be tested for different research scenarios. For instance, POI frequency features could be fused with spatial distribution features to obtain a richer set of semantic information.
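To make the contrast with frequency features concrete, the sketch below computes a participation index for a pair of POI types, a measure commonly used in spatial co-location mining. It is an assumption that the co-location model in this study relies on a comparable measure; the POI types, coordinates, and radius are hypothetical.

```python
import math


def participation_index(pois, type_a, type_b, radius):
    """pois: list of (poi_type, x, y) in planar coordinates.

    Returns the minimum, over both types, of the fraction of instances
    that have at least one instance of the other type within `radius`.
    """
    a_pts = [(x, y) for t, x, y in pois if t == type_a]
    b_pts = [(x, y) for t, x, y in pois if t == type_b]
    if not a_pts or not b_pts:
        return 0.0

    def ratio(src, dst):
        hits = sum(
            1
            for sx, sy in src
            if any(math.hypot(sx - dx, sy - dy) <= radius for dx, dy in dst)
        )
        return hits / len(src)

    return min(ratio(a_pts, b_pts), ratio(b_pts, a_pts))


# Hypothetical POIs: two of each type, only one pair close together.
pois = [
    ("school", 0.0, 0.0),
    ("school", 100.0, 0.0),
    ("supermarket", 3.0, 4.0),
    ("supermarket", 500.0, 500.0),
]
index = participation_index(pois, "school", "supermarket", radius=10.0)
```

A frequency feature would report two schools and two supermarkets regardless of where they are, whereas the index depends on how often the two types actually occur near each other.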
In previous studies, traditional machine learning models such as random forest and decision tree have commonly been used for building recognition. However, their effectiveness becomes constrained as the number of feature types grows, which limits how well they exploit the expanding feature space. We therefore compared the performance of various machine learning models, including both traditional and deep learning models. The experiments showed that deep learning models had notable advantages over traditional machine learning models in recognition accuracy and overall performance. We selected DeepFM, which was originally developed for and successfully applied in recommender systems, as our experimental model. To our knowledge, this is the first application of DeepFM to building recognition. We chose it because of its inherent capacity to handle complex and diverse feature spaces. We also attempted to enhance DeepFM by incorporating an attention mechanism, expecting a further gain in performance, but the experiments showed no significant improvement in recognition accuracy. In future research, we will continue to explore new improvements and seek better model structures and parameter combinations to further raise the accuracy and performance of residential building recognition.
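The capacity to model feature interactions comes largely from the factorization machine (FM) component of DeepFM, which scores every pair of features through the inner product of their latent vectors. The sketch below shows the standard second-order FM term in its naive form and in the equivalent linear-time form. It is a generic illustration in plain Python with made-up inputs, not the implementation used in this study.

```python
def fm_pairwise_naive(x, v):
    """Sum over i < j of <v[i], v[j]> * x[i] * x[j]."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, v):
    """Same quantity via 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]."""
    k = len(v[0])
    total = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


# Made-up inputs: three features, two latent dimensions per feature.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4]]
naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
```

Both functions return the same value; the second avoids the double loop over feature pairs, which matters when shape, location, and semantic features are encoded into a large, sparse input vector.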
In our research, we drew on a diverse range of data sources, a significant portion of which originated from OSM. Most OSM data are contributed by volunteers, and they do suffer from quality issues such as inaccuracy and incompleteness. According to previous research [
29], our experimental regions have good data quality. Moreover, we integrated multi-source data to mitigate these uncertainties. When calculating shape features, we identified and normalized anomalous results caused by data quality issues. According to our analysis, there are two main ways to mitigate the influence of poor-quality OSM data. The first is to evaluate and verify data quality. Many methods do this by comparing OSM data with authoritative data [
56]. However, authoritative data are often difficult to obtain. Evaluation methods that work from an internal perspective may therefore be a better choice, such as statistical approaches [
14] and a trustworthiness-based approach [
16]. The second is to obtain data from multiple sources to reduce the dependence on OSM data alone. Machine learning methods can then be used to fuse and extract information from these data.
In this study, we employed a deep learning model integrated with multi-class features to achieve residential building recognition. This approach not only significantly enhances the accuracy of residential building identification but also offers a new perspective for future research. However, some limitations remain to be addressed in the future. First, this research mainly relied on OSM data. Remote sensing image data are widely used for land use information extraction because of their rich information and wide coverage, so image data could be fused into our model to predict residential buildings. Second, OSM data quality issues could be addressed by developing an automatic or semi-automatic evaluation method based on OSM data trustworthiness. Lastly, the model could be improved by using different feature encoding methods and other deep learning methods. These efforts will contribute to the continued development of building type identification.