1. Introduction
With the development of cities, the demand for urban functional classification is increasing across industries and applications. Buildings are a fundamental component of a city, and they shape the urban structure and morphology [1,2]. Sun Wenhua et al. confirmed that there is a strong relationship between urban space and building functions [3]. Consequently, building function recognition and analysis are of great significance to urban structure optimization and the rational allocation of development planning [4,5]. Throughout this paper, the term “building function” refers to the use of a building, such as residence or shopping. Accordingly, buildings can be divided into school buildings, residential buildings, commercial buildings and communal facility buildings. Building function recognition contributes to map navigation because buildings convey significant cognitive information and structural knowledge [3,6]. Therefore, obtaining this knowledge is also an important prerequisite for digital mapping. This information can be retained and enhanced as much as possible through data enrichment of the map.
A growing body of research gives insight into urban functional areas [7,8,9]. Methods for building function or urban function classification include unsupervised methods [10], semi-supervised methods [11] and supervised methods [12], which are generally based on multi-source data in different regions [13]. Sun Wenhua et al. introduced time utilization activity analysis with the existing domestic building function standards to address this issue [3]. Remote sensing images and existing urban land use/cover data were combined with a Bayesian network to classify buildings [14]. These classification methods are based on deep learning, which extracts as much feature information as possible (individual and overall spatial feature information) from buildings and building groups, with other multi-source data used for auxiliary analysis. In the era of big data, deep learning plays an important role in scientific research across disciplines [15].
Some recent studies showed the advantages of mining information from different data sources with deep learning methods. However, how to extract context information from spatial vector data and how to train an intelligent model with fewer labeled samples remain urgent problems for urban building function recognition. This research introduces a semi-supervised learning model, the Unified Message Passing model (UniMP), which can be trained with fewer labeled samples. Moreover, UniMP jointly uses a Graph Transformer and label embedding to propagate both feature and label information, so it can extract more context information from the node labels and node features in the graph. In addition, POIs and building footprints are combined to extract the geometric and attribute information of buildings. One-hot encoding was used to describe the attributes of a building from its POIs, and eight geometric features of buildings were defined, including building height, circumscribed circle radius, building orientation, minimum enclosing rectangle area, compactness, complexity, density and shape.
2. Literature Review
With the development of artificial intelligence, deep learning has been used in many geospatial analysis tasks [15,16,17,18,19] because it can learn non-linear relations from the training dataset. Driven by multi-source geospatial big data in the past few years [20,21], a number of deep learning and machine learning methods have been proposed to address the building type classification problem [22]. In order to recognize the building type (detached building, semi-detached building, terraced building, villa, Wilhelminian-style building, etc.) from very coarse 3D city model data, support vector machines (SVMs) were introduced [23,24]. Random forest (RF) was also introduced to classify building roofs into flat, gabled, hipped, mixed-form, pitched/shed and pyramid roofs [25]. The above machine learning methods perform well in building roof type classification; however, none of them is suitable for building function classification.
For a long time, the functional description was characterized by visual features [26] (e.g., spectral, textural and geometrical features). With the extensive application of convolutional neural networks (CNNs) in computer vision, a set of neural networks was used to classify the building function from street images [27,28,29,30]. A fusion model was proposed by Hoffmann et al. [31] to classify building functions from aerial and street view images. Several scholars demonstrated that CNNs could be used for multi-label building function classification with Google Street View images. HierarchyNet, a hierarchical network, was developed to classify global urban buildings into main categories and subcategories [32]. Moreover, satellite imagery has recently been combined with information mined from emerging geospatial big data (e.g., POIs and street view images) to classify scene functions with CNNs [33,34]. Deep learning methods can achieve building function classification, but the accuracy needs to be improved because the contextual information of the building is not considered. Meanwhile, street view images are easy to obtain for buildings along roads but not for all buildings. Therefore, a new method for building function classification is needed.
“Everything is related to everything else, but near things are more related than distant things” is Tobler’s First Law of Geography and a fundamental principle of geospatial analysis. With the development of big data, technology promotes urban development and scientific progress and, at the same time, brings opportunities and challenges to the research field of artificial intelligence [15]. Graph Convolutional Networks (GCNs) [35] provide new insight into land use classification [36] and object detection [37,38] by mining more context information. GCNs have shown excellent performance in many fields, such as computer vision [39], speech recognition [40] and natural language processing [41]. Moreover, they perform well on vector data [42] because they take more neighbor information into account [2,43]. A building function is largely related to its neighbors; for instance, residential buildings are usually built together and away from factory buildings. Therefore, the GCN is suitable for identifying individual building functions. However, general GCNs require large amounts of labeled data for training, and manually labeling a large number of buildings is a great challenge for researchers. Furthermore, when general GCNs are trained for node classification, some neighbor and topological relation information is not fully exploited [44,45,46].
Additionally, it is difficult to classify the building functions in a city with a single data source [47,48]. With the rapid development of information technology, a large amount of text data that includes the spatial location of buildings can be obtained, which provides a data source for the identification and analysis of urban building functions [49]. The effective combination of POIs, remote sensing images, street view data and GIS technology allows city information to be analyzed in detail, which not only realizes the classification of building functions but also improves the accuracy and reliability of the classification [27,34,50]. Recent research has shown the effectiveness of urban function recognition using multiple data sources [13,51]. However, how to use multiple geographic data sources and fewer labeled samples to train a deep learning model that can learn more context information is still a gap in building function recognition.
3. Methodology
In this study, a novel framework was introduced for building function recognition (Figure 1). First, to recognize the function of buildings, the POIs and building footprints were matched by coordinates so that the POIs corresponding to a building could be used to extract the building attributes. In this research, we introduced one-hot encoding to express the feature information of building POIs. Second, eight geometric features of building footprints were calculated, including building height, circumscribed circle radius, building orientation, minimum enclosing rectangle area, compactness, complexity, density and shape. Third, to represent spatial relationships between buildings, we constructed a graph through Delaunay Triangulation (DT) of the center points of the building footprints. Therefore, every building was treated as a node of the graph, and the geometric features and the attributes extracted from POIs were combined and used as the features of the graph nodes. Fourth, we proposed a multi-layer GCN architecture with two convolutional layers and one softmax layer to tackle the building function recognition problem. Then, the building function could be recognized by the trained model (Figure 1).
3.1. Building Impact Factor Calculation
The difference between buildings cannot be shown only by the coordinates and the height of the buildings [52]. Several attempts were made to describe the building more comprehensively. Xu et al. introduced eight geometric feature factors to extract features from the building footprint, including building height, circumscribed circle radius, building orientation, minimum enclosing rectangle area, compactness, complexity, density and shape [1,2]:
Building height was applied as a 3D feature to distinguish the functions of buildings and played a certain role for our research object. For example, office buildings are generally higher than gymnasiums. This paper also introduces the area of the minimum bounding rectangle (MBR) as one of the features of a building, which is often used in GIS to give the approximate location of a geographical element. Radius refers to the radius of the minimum circumscribed circle (MCC) of the building footprint;
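Complexity describes the complexity of the building footprint structure and is calculated as the ratio of the total length of the skeleton lines to the building footprint perimeter:
$$\mathrm{Complexity} = \frac{\sum_{i=1}^{m} l_i}{P} \quad (1)$$
where $l_i$ is the length of the $i$-th skeleton line, $P$ is the perimeter of the building footprint and $m$ refers to the number of skeleton lines in the building footprint;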
Compactness describes how compact and plump a building footprint is and is often used for the comparison and analysis of urban morphology [53]. Different shapes with the same area differ in the compactness of their spatial distribution. The index was formulated based on the moment of inertia [54] as follows:
$$\mathrm{Compactness} = \frac{A^{2}}{2\pi I_g} \quad (2)$$
In Equation (2), $I_g$ refers to the inertia moment, and $A$ represents the area of the footprint. In this work, we suppose that every building footprint consists of infinitesimal area units $da$, so the inertia moment can be defined as follows:
$$I_g = \int_{A} z_g^{2}\, da \quad (3)$$
In Equation (3), $z_g$ denotes the distance from $da$ to the centroid of the building footprint. The value of compactness ranges from 0 to 1. A value of 1 corresponds to a circular footprint. On the contrary, when the value approaches 0, the building footprint is close to a line and is the least compact;
Orientation is defined as the direction of the MBR of the footprint and shows the direction consistency of a building with other buildings. Accordingly, orientation is calculated from the vertex coordinates of the MBR, and the equation was formulated as follows:
where $l_1$ and $l_2$ refer to the edge lengths of the MBR. Correspondingly, $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$ in Equation (4) represent vertex coordinates;
The term “density” in this paper refers to the building density within a certain range. Before calculating this index, a buffer with a maximum radius $R$ was constructed around the footprint centroid of each building. The density is the ratio of the total area of the building footprints in the buffer to the buffer area:
$$\mathrm{Density} = \frac{\sum_{i=1}^{n} A_i}{\pi R^{2}} \quad (5)$$
where $A_i$ describes the area of the $i$-th building in the buffer, and $n$ is the number of buildings within the buffer area. The value of density ranges from 0 to 1;
Building shape was introduced to represent the span ratio of the building footprint. There are three typical types of building shapes: expansive, circular and compact. The calculation formula is as follows:
where $A$ represents the area of the building footprint, and $L$ denotes the longer edge of the MBR.
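As an illustration of how two of these indices could be computed from footprint coordinates, the following minimal sketch evaluates compactness and density in plain Python. It rests on assumptions that are not stated in the text: compactness is taken as $A^{2}/(2\pi I_g)$ with $I_g$ the polar moment about the centroid, a building counts as inside the buffer when its centroid lies within radius $R$, and the function names are hypothetical.

```python
import math


def polygon_area_centroid(pts):
    """Signed area and centroid of a simple polygon (shoelace formula)."""
    a = cx = cy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return a, cx / (6.0 * a), cy / (6.0 * a)


def polar_moment_about_centroid(pts):
    """I_g: integral of the squared distance to the centroid over the footprint."""
    a, cx, cy = polygon_area_centroid(pts)
    ixx = iyy = 0.0
    n = len(pts)
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        ixx += (y0 * y0 + y0 * y1 + y1 * y1) * cross
        iyy += (x0 * x0 + x0 * x1 + x1 * x1) * cross
    # Polar moment about the origin, shifted to the centroid (parallel axis theorem).
    j_origin = (ixx + iyy) / 12.0
    return abs(j_origin - a * (cx * cx + cy * cy))


def compactness(pts):
    """Assumed form A^2 / (2 * pi * I_g); equals 1 for a circle."""
    a, _, _ = polygon_area_centroid(pts)
    return a * a / (2.0 * math.pi * polar_moment_about_centroid(pts))


def density(center, radius, buildings):
    """Sum of footprint areas whose centroid falls in the buffer, over the buffer area.

    buildings: list of ((x, y) centroid, footprint area).
    """
    total = sum(
        area
        for (x, y), area in buildings
        if math.hypot(x - center[0], y - center[1]) <= radius
    )
    return total / (math.pi * radius * radius)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
strip = [(0, 0), (10, 0), (10, 1), (0, 1)]
print(compactness(square), compactness(strip))
```

A square scores close to 1 while an elongated strip scores much lower, which matches the described behaviour of the index.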
3.2. POI Data Processing
The original POI data have only a second-level classification of 14 categories without a major classification, including scientific research and education, residential area, committee, catering, etc. This work drew on the POI classification of AutoNavi Map and grouped the POIs into the following four categories: educational buildings, residential buildings (e.g., residential building, committee, accommodation), commercial buildings (e.g., lottery shops, catering, restaurants, shopping malls, shopping finance buildings) and communal facilities (e.g., other facilities, leisure and entertainment, tourism, government).
In this research, one-hot encoding was introduced to describe the POIs by binary vectors. One-hot encoding is also called one-bit effective coding. Any POI takes one of 4 possible category values, which become 4 binary features after one-hot encoding. These category values are mutually exclusive, and only one is activated at a time. For discrete category features, ordinary coding methods cannot be used in machine learning, and one-hot encoding makes the calculation between features more reasonable. There are several benefits of using one-hot encoding:
- (1) It solves the problem that classifiers have difficulty handling discrete values, such as category data;
- (2) It expands the features to some extent, and the discrete data are converted into sparse data.
In this research, all buildings were considered research objects. If a building contained a POI of the corresponding type, the corresponding position was set to 1; otherwise, it was set to 0. The results were then predicted by the subsequent semi-supervised model according to the feature matrix of the buildings. In order to match the buildings with POIs, a 10 m buffer was built for every building. Then, a one-hot encoding matrix covering the samples in the training data set was built for the subsequent experiments.
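The encoding step described above can be sketched as follows. The category labels and function names are illustrative assumptions; the sketch also assumes that a building matched to several POIs activates one position per category present.

```python
# Four major categories used in this work (string labels are illustrative).
CATEGORIES = ["educational", "residential", "commercial", "communal"]


def one_hot(category):
    """One-hot vector of a single POI category."""
    return [1 if category == c else 0 for c in CATEGORIES]


def building_poi_vector(poi_categories):
    """Binary attribute vector of a building from the POIs matched to it.

    A position is 1 if the building contains at least one POI of that
    category and 0 otherwise.
    """
    vec = [0] * len(CATEGORIES)
    for cat in poi_categories:
        for k, bit in enumerate(one_hot(cat)):
            vec[k] = max(vec[k], bit)
    return vec


print(one_hot("commercial"))
print(building_poi_vector(["residential", "commercial", "residential"]))
```

A building without any matched POI receives the all-zero vector, so its function has to be inferred from the geometric features and its neighbors.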
3.3. Building Function Classification
The semi-supervised classification model UniMP was used for building function classification. UniMP aggregates both feature and label information, which is helpful for graph node classification. It is a multi-layer Graph Transformer that uses label embedding to transform node labels into the same vector space as node features, which helps to extract the context information of buildings and improves the recognition ability of the model. Moreover, multi-head attention was used as the transition matrix for propagating feature vectors so that each node could aggregate more information from its neighbors. The framework of the proposed method mainly includes two parts (Figure 2): (1) the representation of the buildings as a graph network structure; (2) the application of the semi-supervised message passing model to recognize and classify the function of each individual building in a building group. The model masks building labels during training in order to avoid the leakage of node labels in the iterative process, which would otherwise lead to inaccurate or even wrong predictions. It can not only predict node labels according to node features but also integrate the label information of other nodes to predict the label of the current node.
When recognizing the building function with UniMP, the graph structure is crucial for representing the buildings. It is also the key to describing the spatial distribution of buildings. Considering the properties of the edges, an undirected connected graph $G = (V, E)$ was used in this paper, where $V$ and $E$ are the finite sets of nodes and edges, respectively.
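To make the graph representation concrete, the sketch below derives an undirected edge set from Delaunay triangles and checks that the result is connected. It assumes that the triangulation itself has already been computed elsewhere and is given as triples of building indices; the function names are hypothetical.

```python
from collections import deque


def edges_from_triangles(triangles):
    """Undirected edge set from triangles given as triples of node indices."""
    edges = set()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((min(u, v), max(u, v)))  # store each edge once
    return edges


def adjacency(num_nodes, edges):
    """Adjacency sets of an undirected graph."""
    adj = {i: set() for i in range(num_nodes)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def is_connected(adj):
    """Breadth-first search from an arbitrary node; True if all nodes are reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)


triangles = [(0, 1, 2), (1, 2, 3)]  # toy triangulation of four building centroids
edges = edges_from_triangles(triangles)
adj = adjacency(4, edges)
print(sorted(edges), is_connected(adj))
```

Each building centroid becomes a node, and two buildings are neighbors when they share a triangle edge.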
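The UniMP model applies the Graph Transformer and label embedding to learn the node features and node labels from the graph. In the Graph Transformer module, graph multi-head attention (Figure 3) was used to extract more features from multiple dimensions. For given node features $H^{(l)} = \{h_1^{(l)}, h_2^{(l)}, \ldots, h_n^{(l)}\}$, the multi-head attention for each edge from $j$ to $i$ can be calculated as follows:
$$q_{c,i}^{(l)} = W_{c,q}^{(l)} h_i^{(l)} + b_{c,q}^{(l)}, \qquad k_{c,j}^{(l)} = W_{c,k}^{(l)} h_j^{(l)} + b_{c,k}^{(l)},$$
$$\alpha_{c,ij}^{(l)} = \frac{\langle q_{c,i}^{(l)},\, k_{c,j}^{(l)} + e_{c,ij} \rangle}{\sum_{u \in N(i)} \langle q_{c,i}^{(l)},\, k_{c,u}^{(l)} + e_{c,iu} \rangle}$$
where $\langle q, k \rangle = \exp\left(q^{T} k / \sqrt{d}\right)$ represents the exponential scale dot-product function, $N(i)$ is the set of neighbors of node $i$ and $d$ is the hidden size of each head. $h_i^{(l)}$ is the source feature and $h_j^{(l)}$ is the distant feature; $q_{c,i}^{(l)}$ is the query vector and $k_{c,j}^{(l)}$ is the key vector for the $c$-th head attention; $W_{c,q}^{(l)}$, $W_{c,k}^{(l)}$, $b_{c,q}^{(l)}$ and $b_{c,k}^{(l)}$ are the trainable parameters; $e_{c,ij}$ is the encoded form of the edge features $e_{ij}$ of the graph, which are added into the key vector as additional information for each layer.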
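The attention step can be illustrated with a small single-head sketch. It assumes that the query and key vectors have already been produced by the trainable projections and omits the edge features; the function name and the toy vectors are hypothetical. The weights are the exponential scaled dot products normalized over the neighbors of a node.

```python
import math


def attention_weights(q_i, neighbor_keys, d):
    """Normalized exp(q . k / sqrt(d)) over the neighbors of node i.

    q_i: query vector of node i.
    neighbor_keys: dict mapping neighbor id -> key vector.
    d: hidden size of the head.
    """
    scores = {
        j: sum(a * b for a, b in zip(q_i, k)) / math.sqrt(d)
        for j, k in neighbor_keys.items()
    }
    m = max(scores.values())  # subtract the maximum for numerical stability
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


w = attention_weights(
    [1.0, 0.0],
    {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]},
    d=2,
)
print(w)
```

Neighbors whose keys align with the query receive larger weights, and the weights of all neighbors sum to one.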
For each head attention, the source feature $h_i^{(l)}$ and distant feature $h_j^{(l)}$ are transformed into the query vector $q_{c,i}^{(l)}$ and key vector $k_{c,j}^{(l)}$, respectively. In this study, we transformed and propagated the node features of each layer to obtain the node feature information of the next layer. After the graph multi-head attention, the information aggregation from the $l$-th layer to the $(l+1)$-th layer is as follows:
where $\Vert$ represents the concatenation operation over the $C$ attention heads. For the output layer, the average over the heads is used as the output as follows:
The label embedding was used in UniMP to propagate the obtained label information. First, each labeled node was represented by a one-hot vector, and each node without a label was represented by a zero vector. Second, the normalized adjacency matrix was used to propagate the label information and to obtain the label information representation of the $(l+1)$-th layer. Third, the fused node feature information and label information was further propagated, as shown below:
This paper also took full advantage of the node’s label information, further enhancing the characteristic information of the node so that the model can obtain more necessary information. The UniMP framework model predicts the node by the neighbor labeled node, node feature and topological information as follows:
4. Quasi-Experiment and Analysis
4.1. Study Area
Nanjing is situated in the middle and lower reaches of the Yangtze River in eastern China. As the capital of Jiangsu province, it belongs to the first batch of national historical and cultural cities and is an important birthplace of Chinese civilization. The city covers an area of over six thousand square kilometers and governs 11 districts. The specific geographical location is shown in Figure 4. This study took Nanjing as the research area, and the core functional area consists of commercial buildings, residential buildings, educational buildings and communal buildings. In this research, building footprints were obtained from the Shuijingzhu software, and the POIs were downloaded from Gaode. Both of them are geospatial data providers in China. In order to ensure data reliability and validity for this study, the building functions were labeled by at least three participants according to Google Maps or Baidu Maps. If the results marked by the three participants were inconsistent, the area was re-labeled or discarded. All labels were manually marked and verified to ensure the authenticity, reliability and accuracy of the data.
4.2. Experimental Process and Network Setting
In order to match the POIs with buildings, we performed a buffer analysis with a radius of 10 m around each POI, thereby finding the building closest to the POI within the set radius and finally generating a one-to-one matching attribute table between POIs and buildings. Therefore, the attributes of POIs can be used as the features of buildings to recognize their function. Four core fields of each POI were retained: name, address, coordinate and category.
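The matching rule can be sketched as a nearest-neighbor search limited to the 10 m radius. The sketch assumes projected coordinates in metres and measures distance to the building centroid rather than to the footprint polygon, which is a simplification; the function name and the toy records are hypothetical.

```python
import math


def match_pois_to_buildings(pois, buildings, radius=10.0):
    """Assign each POI to its closest building within the radius.

    pois: list of (poi_id, x, y, category).
    buildings: list of (building_id, x, y) centroids.
    Returns a dict poi_id -> building_id; POIs with no building in range are skipped.
    """
    matches = {}
    for poi_id, px, py, _category in pois:
        best_id, best_dist = None, radius
        for bld_id, bx, by in buildings:
            dist = math.hypot(px - bx, py - by)
            if dist <= best_dist:
                best_id, best_dist = bld_id, dist
        if best_id is not None:
            matches[poi_id] = best_id
    return matches


pois = [("p1", 0.0, 0.0, "commercial"), ("p2", 100.0, 100.0, "residential")]
buildings = [("b1", 3.0, 4.0), ("b2", 6.0, 0.0)]
print(match_pois_to_buildings(pois, buildings))
```

For large data sets, a spatial index would replace the inner loop, but the matching rule stays the same.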
In this experiment, the computing environment consisted of an Intel i7-10700k eight-core CPU, the Ubuntu 15.5.0 Linux operating system, 64 GB of memory and two NVIDIA RTX 2080Ti GPUs. The methods in this paper were implemented in Python on the PyTorch platform. Some geometric features of building footprints, such as the area and perimeter, were calculated with ArcGIS. Building footprints were obtained from the Shuijingzhu software. We set the learning rate to 0.001 and used the Adam optimization algorithm to iteratively update the weights of the model. The maximum number of iterations was 2000, and the model performance was tested once every 10 iterations.
4.3. Evaluation Indexes
The F1 score and the accuracy of each building function type are used to assess the quantitative performance. The F1 score is the harmonic mean of precision and recall, and it can be calculated as follows:
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$
Here, $TP$ is the number of true positives for each building function type, and $FP$ and $FN$ represent false positives and false negatives, respectively. These metrics were computed using the sample-based confusion matrices for the dataset [55].
4.4. Results and Analysis
From the description of the study area, we can see that the characteristics of a building group are closely related to the local climate, green landscape and human factors. A building group is usually regarded as the basic unit of a city block. Good experimental results cannot be achieved by using building data alone. We tried to improve the accuracy of recognizing building functions by using building footprints and POI data because they can reflect the real situation of buildings and express urban building units objectively and specifically. The information in this model, including the class name, geographical location and POI category (primary and secondary classification), provides a solid basis for the analysis of buildings as basic urban units in this paper.
From the results (Figure 5), we can see that before 800 iterations, the loss value of the model decreases at a fast and stable rate, while the training accuracy increases quickly. The test accuracy and validation accuracy increased at a relatively rapid rate before 200 iterations. At this stage, the loss value of the model is still declining, and the test accuracy cannot be determined before the loss becomes stable. Finally, when the number of iterations reached about 1600, the loss stabilized with small fluctuations around 0.1. At this point, the training accuracy has reached a stable peak, and the validation and test accuracies have reached their most stable range. The stability and convergence rate of the model were greatly improved, and the test accuracy, which is of most concern, reached over 81%.
Figure 6 displays the classification results of UniMP for building function classification on the validation dataset. From the results, we can see that most buildings are classified into the right category, especially residential buildings and educational buildings. The accuracies for both types were over 80%: 81% for educational buildings and 89% for residential buildings. These results arise because the features of these two categories are distinctive compared with other categories of buildings. For example, most residential buildings and educational buildings have regular shapes. Moreover, the POIs of educational buildings are distinctive, such as “teaching building” and “faculty”. In the research area, some educational buildings were misclassified as residential buildings. The misclassification may be caused by the widespread existence of faculty residences and family residential areas within Chinese university campuses. However, the performance of the designed model was relatively poor in identifying communal facilities. First, there are relatively few training samples of this type of building compared with other types. Second, communal facilities are usually included in or adjacent to commercial and residential buildings within the research area. From the confusion matrix, it can be found that some communal facilities are recognized as residential buildings or commercial buildings.
Based on the building features and POI attributes, the semi-supervised message passing model can predict the building function according to the characteristics of the node. Moreover, the model can learn the information of adjacent nodes to predict the label of the current node. Because multi-source data were used, it was easier for the model to capture more meaningful feature information. The comparison between the predicted results and the actual situation is shown in Figure 7. We can see that the proposed model can recognize the building function in some complex situations, such as areas where school and residential buildings are mixed.
4.5. Ablation Study
In order to analyze the advantages of combining the POIs and building footprint data, this research conducted several ablation experiments. From Figure 8, we can see that after adding the POI feature information, the convergence of the model loss was significantly faster, and the average convergence rate of the loss value over the model iterations was also higher. By using both POI attributes and building footprint features, the loss value was stable at around 0.1. This is an important improvement compared with the results obtained with POI data alone or building footprint data alone. The results proved that combining the POIs and building footprints is useful for building function recognition by semi-supervised methods.
4.6. Comparison to Other Methods
This paper selected several mainstream classification methods for comparison with the proposed method, including Support Vector Machine (SVM) and Random Forest (RF). During the experiment, we used the same datasets for all three models, and accuracy, recall and F1 were used to evaluate the performance of the three models statistically. The results presented in Table 1 demonstrate that the proposed model outperforms the other models because it benefits from learning context features. The UniMP algorithm extracts the features and label information of surrounding nodes in both the training and prediction stages, which makes it easier to extract building information from the surrounding buildings. On the other hand, machine learning models such as SVM and RF cannot learn deep features or the information of surrounding buildings; therefore, they are not suitable for building function recognition. From the results, we can see that UniMP has the best performance for all the evaluation indicators. In detail, the F1 score of UniMP reached 80.56%, an improvement of 23.25% and 12.51% over SVM and RF, respectively. The largest improvement in accuracy was 11.22% (compared with RF), and the largest improvement in recall was 9.58% (compared with SVM). Therefore, the proposed framework for building function recognition can learn more context information. Moreover, the deep learning model can be trained well with fewer labeled samples.
4.7. Discussion
The aggregation of the contextual, geometric and attribute information of buildings is meaningful for building function recognition. In this paper, we proposed a novel semi-supervised framework based on UniMP for building function recognition, and POIs and building footprints were combined to mine the features of building functions. In order to extract the context information of buildings and improve the recognition ability, a multi-layer Graph Transformer was used to transform node labels into the same vector space as node features in UniMP.
With the development of information technology, a large amount of spatial location information of buildings can be obtained; these data sources are helpful for the identification and analysis of urban building functions. However, how to use multiple data sources, especially heterogeneous data, to train a deep learning model is still a gap in building function recognition. This paper proposes encoding the POIs with one-hot encoding; eight geometric features of buildings were designed, including building height, circumscribed circle radius, building orientation, minimum enclosing rectangle area, compactness, complexity, density and shape. The experiment shows that combining the POIs and building footprints can improve the recognition accuracy and the speed of model convergence.
According to the first law of geography, contextual information is important for building function recognition. Moreover, how to train an intelligent model with fewer samples is also a difficult problem for building function recognition. In order to solve these issues, this paper introduced the semi-supervised classification model UniMP. The buildings were organized into a graph by Delaunay triangulation, and the geometric and attribute information of the buildings was treated as the node features of the graph. Experimental results show that UniMP can extract contextual information effectively. Moreover, the semi-supervised learning model obtains good performance compared with other machine learning methods, such as SVM and RF.
5. Conclusions
Buildings are core elements of life in the city. In order to recognize the function types of city buildings, the semi-supervised classification model UniMP was introduced to speed up the cognition and understanding of buildings. In order to mine more information for building function recognition, the building footprints and POI information were combined. For the POIs, one-hot encoding was used to extract the attributes of buildings, while for the building footprints, eight geometric features were defined to describe buildings with different functions, including building height, circumscribed circle radius, building orientation, minimum enclosing rectangle area, compactness, complexity, density and shape. Due to the use of multi-source data, the model captured more meaningful feature information of relevant nodes and thus achieved better results in building function recognition. Moreover, this research treated every building as a node of the graph during model training. Thus, we could not only predict the label according to the node characteristics but also integrate the label information of neighboring nodes. Compared with the results of POI data alone (79.14%) or building footprint data alone (79.97%), the accuracy obtained with both POI attributes and building footprint features is stable at over 81%. The results proved that combining the POIs and building footprints is useful for building function recognition by semi-supervised methods.
In order to extract more context information from the node labels and node features in the graph, a semi-supervised classification model was introduced in this work, which can achieve good performance with limited labeled samples. Although this study effectively classified urban building functions, it has certain limitations because all the buildings are in Chinese architectural styles. In the future, we will try to apply other types of data, including social media data such as microblog check-in data, taxi track information and street view information. It is expected that the model can achieve better results than the current experimental results in subsequent experiments.