1. Introduction
Buildings are the primary sites of urban economic and social activity, making them the fundamental spatial unit for analysing a city’s internal structure [1]. With the acceleration of urbanisation in China, urban population growth has affected resource allocation within cities, and the increasing number of buildings has changed their spatial structures. Classifying the functions of urban buildings helps to understand the internal structure of a city, which in turn supports decision-making on urban resource allocation and spatial structure optimisation [2].
Building functional types are traditionally identified through field surveys or manual interpretation of map data. However, these methods are influenced by the subjective perception of researchers and are time-consuming, inefficient, and costly. With the development of remote sensing and GIS technology, features can now be classified automatically. High-resolution remote sensing images contain rich information on building surfaces, such as morphology and texture, and machine learning and deep learning models can classify buildings based on these physical characteristics. This type of research relies on the physical similarities between buildings [3,4,5,6]. Because a single type of remote sensing image is not expressive enough, some researchers have integrated multiple data sources for building function classification, such as high-resolution remote sensing images, LiDAR data, and nighttime light data [7,8,9]. Remote sensing images capture the physical features of buildings, but they lack semantic information and therefore identify functional types with low accuracy. This limitation can cause errors in recognising buildings with similar physical features but different functional types [10]. Furthermore, although building outlines can be extracted automatically from remote sensing images, the precision of this extraction still needs to be improved.
Automatic building classification has also been attempted using topographic raster maps, cadastres, digital landscape models, and land use data. However, up-to-date map data are difficult to obtain and require significant human and material resources, which limits their application to building function classification [11,12].
The arrival of the geographic big data era has brought massive online geographic information resources, such as social perception data from social media [13], taxi trajectory data [14], mobile phone data [15], street-view image data [16], and point of interest (POI) data [17,18]. These data record a large amount of information about human activities and have been widely used to study the characteristics of urban spatial structure. Social perception data therefore offer a new approach to recognising urban building functions, and some scholars have used them to identify building functions and study their fine-scale spatial layouts [19,20]. Chen used trip and stop data from traffic and public transportation databases to link trips to surrounding buildings and infer their functions [21]. Liu integrated social network data, taxi trajectory data, POI data, and remote sensing imagery to identify mixed-function buildings in a city [22]. Li used taxi trajectory data and real-time Tencent user density data to analyse human activity patterns and their spatial distribution, and improved the precision of building function identification by analysing spatial interactions [23]. Gao proposed a method that combines feature decomposition with k-means clustering to infer building functions from location-based Tencent user density (TUD) data [24]. Hoffmann proposed a content-first filtering method for social media image datasets that extracts geospatial information on ground-level buildings from more than 28 million images and categorises buildings as commercial, residential, or other [25]. Tang used telecommunication traffic data to group urban buildings and then used planning documents and POIs to identify their functions and infer ‘high-frequency’ urban functions [2].
The use of social perception data to recognise building functions has become a research hotspot. However, data from social media, mobile communications and taxi trajectories have low location accuracy; furthermore, these data services are expensive and the data are not easily accessible. POI data from Internet maps are information-rich, timely, and easy to access. They also have high location accuracy and record the types of geospatial entities, which is advantageous for recognising the functions of urban buildings [26]. An increasing number of researchers are therefore using data from web maps to identify building functions [27,28]. Identifying building functions from POI data relies heavily on the density values of POI types within building contours [29]. Fan estimated the functional types of OpenStreetMap building footprints based on urban morphology analysis and accurately differentiated residential, industrial, and accessory buildings [30]. Qu classified the functions of urban buildings using POI data and high-resolution remote sensing imagery [29]. Sturrock used machine learning to analyse the shape, size, and location features of OpenStreetMap buildings to predict whether residential buildings can be sprayed with insecticide to help prevent malaria [31]. Cao classified building functions from POI data by improving the frequency density ratio method commonly used to identify urban functional areas; the approach handles POI-sparse areas by expanding the POI search range and applying inverse distance weighting to the frequency density ratio [32]. Chen et al. collected POI data, building coverage data, and road information from Baidu and Gaode maps and combined them with land use data to develop an NLP-based method for identifying urban building types in Beijing, China. The method improves identification accuracy for buildings that contain no POIs and uses distance metrics to estimate building functional types [33]. Deng et al. proposed a hierarchical data mining model to identify urban building functions that combines Gaode map building footprints, POIs, land survey data, and high-resolution remote sensing imagery to address missing residential POIs [34]. Bandam et al. proposed a machine learning classification algorithm based on OpenStreetMap (OSM) building data that categorises buildings as single-family residential, multi-family residential, or non-residential [35]. Lin et al. proposed a step-by-step identification method for urban building functions based on remote sensing imagery and POI data, which integrates the spatial similarity of buildings and the kernel density of POIs to improve the accuracy and completeness of identification [36].
In summary, using POIs and building outline data from online maps to recognise urban building functions offers accurate building outlines and rich semantic information from POIs. However, building functions must still be recognised where no POI data are available. This study proposes an urban building function identification method based on the spatial relationships between POIs and geographic entities. Using POI data, online map building outline data, OSM road vector data, and GIS spatial analysis, the method classifies urban building functions and infers the functional types of buildings that contain no POIs from the spatial relationships between entities.
Section 2 of this paper describes the study area and data sources, Section 3 details the research methodology, Section 4 presents the experiments and discussion, and Section 5 provides the conclusions.
3. Method
3.1. Overall Design
First, the data were pre-processed by (1) harmonising the coordinate systems of the multiple data types, (2) reclassifying the POIs, and (3) generating road meshes. Second, the spatial relationships between POIs, buildings, and road meshes were analysed, and buildings were divided into those containing and those not containing POIs. For buildings containing POIs, the frequency density ratio of each POI type within the building was calculated to identify the building’s functional type. Third, the distance, area, orientation, and shape similarities between buildings containing and not containing POIs in the same road mesh were calculated, and the entropy weight method was used to determine the weight of each similarity index. Finally, the functional type of each building not containing POIs was inferred from the building containing POIs in the same road mesh with which it had the maximum similarity. The technical process is shown in Figure 2.
3.2. Data Pre-Processing
Gaode POIs include real estate, corporate enterprises (companies, factories), shopping, transportation facilities (coach stations, wharf ports, subway stations, etc.), education and training, finance, hotels, beauty salons, tourist attractions, gourmet food, automobile services, life services (logistics, photo studios, intermediaries, telecommunications business halls, etc.), culture and media (cultural palaces, art museums, news publishers, etc.), leisure and entertainment, medical care, sports and fitness (stadiums, fitness centres, etc.), and government agencies. The first-level classification contains 22 categories, and the second-level classification contains 263. Some POIs, such as certain transportation facilities (parking lots, bus stations, etc.), tourist attractions, and public toilets, can easily interfere with the functional identification of buildings. Because these POIs carry little information and indicate no clear building function, they were excluded from the POI dataset. In addition, duplicate POIs were deleted to improve data quality.
With reference to the Standard for Classification of Land Use Status Quo (GB/T 21010-2017) [37], this paper classifies building functions into four categories: residential (R), office (O), commercial (C), and public services (S), and the existing POI categories were reclassified into these four categories. A building with more than one type of function is a mixed-function building. Mixed-function building types include mixed commercial and office (CO), mixed commercial and public service (CS), mixed office and public service (OS), mixed residential and public service (RS), mixed residential and commercial (RC), mixed residential and office (RO), and complex function (F). The POIs corresponding to each functional category of urban buildings are shown in Table 2.
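The filtering, de-duplication and reclassification steps in this subsection can be sketched in Python. This is a minimal illustration, not the authors' implementation: the simplified category labels, the `CATEGORY_TO_FUNCTION` mapping, the `EXCLUDED` set, and the rule that two records are duplicates when their name, category and rounded coordinates coincide are all hypothetical placeholders; the actual category mapping is the one given in Table 2.

```python
# Hypothetical mapping from simplified POI category labels to building
# function codes (R, O, C, S); the real mapping is defined in Table 2.
CATEGORY_TO_FUNCTION = {
    "real estate": "R",
    "corporate enterprises": "O",
    "shopping": "C",
    "gourmet food": "C",
    "medical care": "S",
    "government agencies": "S",
}

# Hypothetical labels for POIs excluded because they carry little
# information about the function of the building they fall in.
EXCLUDED = {"parking lots", "bus stations", "tourist attractions", "public toilets"}


def preprocess_pois(pois):
    """Filter, de-duplicate and reclassify POIs.

    pois: iterable of (name, category, x, y) tuples.
    Returns a list of (x, y, function_code) tuples.
    """
    seen = set()
    cleaned = []
    for name, category, x, y in pois:
        if category in EXCLUDED:
            continue  # low-information POI: dropped
        function = CATEGORY_TO_FUNCTION.get(category)
        if function is None:
            continue  # category not mapped in this sketch: dropped
        key = (name, category, round(x, 6), round(y, 6))
        if key in seen:
            continue  # duplicate record: keep only the first occurrence
        seen.add(key)
        cleaned.append((x, y, function))
    return cleaned


# Made-up sample records
pois = [
    ("Mall A", "shopping", 108.3661, 22.8170),
    ("Mall A", "shopping", 108.3661, 22.8170),  # duplicate
    ("Toilet 1", "public toilets", 108.3600, 22.8100),
    ("Clinic B", "medical care", 108.3700, 22.8200),
]
print(preprocess_pois(pois))
```

Unmapped categories are simply dropped here; a full pipeline would more likely log them so that the mapping table can be completed.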
3.3. Spatial Relationships between Geographical Entities
Because POI data were unavailable for some buildings, their functions could not be determined from POI data alone. Therefore, we used the spatial relationships among buildings, roads, and POIs to identify the functional types of buildings. Spatial relationships between entities include topological, orientation, metric, and similarity relationships [38,39].
- (1)
Spatial relationship between POIs and buildings. The topological relationship between POIs and buildings falls into two cases. (i) The building contains one or more POIs, which may be of the same or different types; the number and categories of these POIs reflect the functions of the building. (ii) The building contains no POIs. The function of such a building cannot be identified directly from POIs, so further processing is required.
- (2)
Spatial relationship between two buildings. Two spatial entities can be related by spatial similarity. Buildings with the same function tend to be similar in footprint size, shape, distance, and orientation. By analysing the spatial similarity indices between buildings without POIs and buildings with known functions, buildings with the same functional type can be identified.
- (3)
Spatial relationship between roads and buildings. When identifying building functions, it should be noted that buildings on different sides of a road may have completely different functions. Buildings in residential communities or industrial parks are usually relatively far from main roads, whereas commercial centres are in conveniently accessible locations. A road mesh is an area enclosed by urban roads [40], and buildings within the same road mesh have a high degree of functional similarity. The containment relationship between road meshes and buildings can therefore be used to preliminarily screen buildings that are likely to share a function and to assist in analysing their functional types.
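The topological relationship in (1) and the containment relationship in (3) reduce to point-in-polygon tests. The sketch below uses a plain ray-casting test so that it needs no GIS library; it assumes polygons are lists of (x, y) vertices in a common projected coordinate system, and the function names are hypothetical. Assigning each building to the mesh that contains the average of its vertices is a simplification: for strongly concave footprints that point can fall outside the building, and a GIS workflow would typically use a representative interior point instead.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside polygon (list of vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through (x, y)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def split_buildings(buildings, pois):
    """Case (i) vs case (ii): buildings with and without POIs.

    buildings: dict id -> polygon; pois: list of (x, y, function_code).
    Returns (with_pois, without_pois): dict id -> list of codes, list of ids.
    """
    with_pois, without_pois = {}, []
    for bid, poly in buildings.items():
        codes = [f for x, y, f in pois if point_in_polygon(x, y, poly)]
        if codes:
            with_pois[bid] = codes
        else:
            without_pois.append(bid)
    return with_pois, without_pois


def assign_to_meshes(buildings, meshes):
    """Map each building id to the road mesh containing its vertex average."""
    result = {}
    for bid, poly in buildings.items():
        cx = sum(x for x, _ in poly) / len(poly)
        cy = sum(y for _, y in poly) / len(poly)
        result[bid] = next(
            (mid for mid, mesh in meshes.items() if point_in_polygon(cx, cy, mesh)),
            None,  # building not inside any mesh
        )
    return result


buildings = {
    "b1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "b2": [(20, 0), (30, 0), (30, 10), (20, 10)],
}
pois = [(5, 5, "C"), (6, 4, "O")]
meshes = {"m1": [(-1, -1), (40, -1), (40, 20), (-1, 20)]}
print(split_buildings(buildings, pois))
print(assign_to_meshes(buildings, meshes))
```

Here building b1 falls under case (i) with a commercial and an office POI, b2 falls under case (ii), and both lie in the same road mesh, so b2 would later be compared with b1.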
3.4. Functional Type Identification of Buildings Containing POIs
POI data describe the spatial locations and attributes of geographic entities, and each POI corresponds to a point entity. A building may contain multiple POIs; for example, the stores in a shopping mall and the companies in an office building each correspond to separate POIs. For buildings containing POIs, the frequency density ratio method is used to count the distribution of POI types in each building unit and classify the building accordingly, which improves the accuracy of building function identification. The POI frequency density and frequency density ratio are calculated using Formulas (1) and (2).
$$F_i = \frac{n_i}{N_i} \tag{1}$$
$$CR_i = \frac{F_i}{\sum_{i} F_i} \tag{2}$$
where $i$ denotes the POI class, $n_i$ is the number of class $i$ POIs within the building footprint, $N_i$ is the total number of class $i$ POIs in the POI dataset, $F_i$ denotes the frequency density of POI type $i$ within the building footprint as a proportion of all POIs of that type, and $CR_i$ denotes the frequency density of POI type $i$ as a proportion of the summed frequency densities of all POI types within the building footprint.
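Formulas (1) and (2) can be sketched as follows. The function name and input layout are illustrative assumptions; the sketch returns $CR_i$ as a fraction, so the values for one building sum to 1, which matches the 0.5 and 0.2 thresholds used in the following paragraphs.

```python
from collections import Counter


def frequency_density_ratios(building_poi_classes, dataset_totals):
    """Compute CR_i for one building.

    building_poi_classes: list with the class of every POI inside the building.
    dataset_totals: dict class -> N_i, the number of POIs of that class in the
    whole POI dataset.
    Returns dict class -> CR_i for the classes present in the building.
    """
    counts = Counter(building_poi_classes)                  # n_i
    f = {i: counts[i] / dataset_totals[i] for i in counts}   # Formula (1): F_i = n_i / N_i
    total = sum(f.values())
    return {i: f_i / total for i, f_i in f.items()}          # Formula (2): CR_i = F_i / sum of F


totals = {"C": 200, "R": 50}  # made-up dataset totals N_i
print(frequency_density_ratios(["C", "C", "R"], totals))
```

Although this building holds two commercial POIs and only one residential POI, the residential class receives the larger ratio (2/3 against 1/3) because residential POIs are rarer in the dataset. This normalisation by $N_i$ is what distinguishes the frequency density ratio from a raw POI count.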
The CR values are mapped to the functional attributes of the building, and the optimal CR thresholds were determined through repeated trials. Because a POI is only an information point, which may represent a small convenience store, a large shopping mall or a residential building, it cannot represent the area occupied by the corresponding entity; the thresholds must therefore be selected carefully when classifying building functional types. In this paper, buildings are categorised as single function, mixed function, or complex function. The best classification was obtained with an upper CR threshold of 0.5 and a lower threshold of 0.2.
When the frequency density ratio of a single POI type within the building footprint is >0.5, the building is considered single function, and that POI type is taken as the building’s functional type. When two POI types have frequency density ratios >0.5, the building is categorised as mixed function; common mixed-function types are mixed residential and commercial, mixed commercial and office, and mixed office and public service. When more than two POI types have frequency density ratios >0.5, the building is considered complex function.
If all POI types within the building footprint have frequency density ratios <0.5, no single function dominates, and the lower threshold of 0.2 is used to delineate the building’s functions. When two POI types have frequency density ratios >0.2, the building is considered mixed function. When three or more POI types have frequency density ratios >0.2, or all POI types have ratios between 0 and 0.2, the building is considered complex function.
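The threshold rules in the three preceding paragraphs can be written as a small decision function. This is a sketch under stated assumptions: the function codes follow the R, O, C, S and F notation of Section 3.2; mixed codes are built in R, C, O, S order, which reproduces the listed types RC, RO, RS, CO, CS and OS; and the case of exactly one ratio in (0.2, 0.5] is not covered by the rules above, so the sketch assigns that dominant type as a labelled assumption.

```python
ORDER = "RCOS"  # ordering that reproduces the codes RC, RO, RS, CO, CS, OS


def mixed_code(types):
    """Join two function codes into a mixed-function code, e.g. ['O', 'C'] -> 'CO'."""
    return "".join(sorted(types, key=ORDER.index))


def classify_building(cr, upper=0.5, lower=0.2):
    """Assign a function code from CR values (dict code -> CR fraction)."""
    above_upper = [t for t, v in cr.items() if v > upper]
    if len(above_upper) == 1:
        return above_upper[0]             # single function
    # If CR values sum to 1 (Formula (2)), at most one type can exceed 0.5;
    # the next two branches are kept to mirror the rules as stated.
    if len(above_upper) == 2:
        return mixed_code(above_upper)    # mixed function
    if len(above_upper) > 2:
        return "F"                        # complex function
    above_lower = [t for t, v in cr.items() if v > lower]
    if len(above_lower) == 2:
        return mixed_code(above_lower)    # mixed function
    if len(above_lower) >= 3 or not above_lower:
        return "F"                        # complex function
    # Exactly one type in (0.2, 0.5]: not specified by the rules above;
    # this sketch returns that dominant type (assumption).
    return above_lower[0]


print(classify_building({"C": 0.7, "O": 0.3}))
print(classify_building({"R": 0.45, "C": 0.35, "S": 0.2}))
print(classify_building({"R": 0.3, "C": 0.3, "O": 0.4}))
```

The comparisons are strict, as in the text, so a ratio of exactly 0.2 does not count towards the lower threshold.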
3.5. Functional-Type Identification of Buildings Not Containing POIs
The functions of buildings containing POIs were identified using frequency density ratios, whereas the remaining buildings, which contain no POIs, remained unidentified. Their functional types must be inferred from the spatial similarity between identified and unidentified buildings.
3.5.1. Calculating the Spatial Similarity of Buildings
There are potential type-matching relationships between buildings located within the same road mesh, which can be further identified by calculating the characteristic similarity between candidate matching buildings [40]. Spatial similarity is an important basis for determining whether two geographic entities have a type-matching relationship. In this paper, area, distance, shape, and orientation similarity indicators are used to determine the similarity between buildings with identified and unidentified functions within the same road mesh.
- (1)
Distance similarity
The distance similarity is derived from the ratio of the inter-building distance to a distance threshold, which is determined from the Hausdorff distance [41] between the samples. In Formula (3), K denotes the distance threshold and h denotes the shortest distance between two buildings A and B. To calculate h, the building boundaries are converted into point sets A and B, and the minimum Euclidean distance between any point $a \in A$ and any point $b \in B$ is found, as shown in Formula (4):
$$h(A, B) = \min_{a \in A} \min_{b \in B} \lVert a - b \rVert \tag{4}$$
- (2)
Area similarity
The area similarity of two buildings A and B is calculated from the ratio of their areas, as shown in Formula (5).
- (3)
Shape similarity
In this paper, compactness is used to represent the shape characteristics of buildings (Formula (6)). The shape similarity is calculated using the compactness values of two buildings A and B, according to Formula (7), where C, L and F denote the building’s compactness, perimeter and area, respectively.
- (4)
Orientation similarity
The orientation of a building is taken as the angle between the long axis of the minimum bounding rectangle of the building outline and the horizontal axis [42]. The orientation similarity of two buildings is calculated from the angle between their orientations using Formula (8), where $\theta$ denotes the orientation angle.
- (5)
Overall similarity of buildings
The total similarity between two buildings is the weighted sum of the four similarity indicators, calculated using Formula (9). An unidentified building is assigned the functional type of the identified building in the same road mesh with which its total similarity is highest.
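The similarity indicators in (1) to (5) can be sketched in pure Python. Only the minimum vertex-to-vertex distance of Formula (4) and the weighted sum of Formula (9) follow directly from the description above; the exact forms of Formulas (3) and (5) to (8) are not restated in the prose, so the forms used here are assumptions chosen so that every indicator lies in [0, 1] and grows with similarity: $1 - h/K$ for distance, smaller area over larger area, compactness $C = 4\pi F / L^2$ (about 0.785 for a square, consistent with the 0.5 to 0.8 range reported in Section 4.5) with $1 - |C_A - C_B|$ for shape, and $1 - \Delta\theta / 90°$ for orientation, where $\Delta\theta$ is the acute angle between the two orientations. The equal `WEIGHTS` are placeholders for the entropy weights in Table 3, and all function names are hypothetical.

```python
import math

# Polygons are lists of (x, y) vertices in a projected CRS (e.g. metres), not closed.

def polygon_area(poly):
    """Shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
            for i in range(n))
    return abs(s) / 2.0

def polygon_perimeter(poly):
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def orientation(poly):
    """Angle in degrees [0, 180) between the long axis of the minimum-area
    bounding rectangle and the horizontal axis."""
    hull = convex_hull(poly)
    best_area, best_angle = math.inf, 0.0
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        theta = math.atan2(y2 - y1, x2 - x1)  # a side of the minimum rectangle lies on a hull edge
        c, s = math.cos(theta), math.sin(theta)
        along = [x * c + y * s for x, y in hull]
        across = [-x * s + y * c for x, y in hull]
        w, h = max(along) - min(along), max(across) - min(across)
        if w * h < best_area - 1e-9:  # ignore floating-point ties
            best_area = w * h
            best_angle = theta if w >= h else theta + math.pi / 2
    return math.degrees(best_angle) % 180.0

def distance_similarity(a, b, k):
    h = min(math.dist(p, q) for p in a for q in b)  # Formula (4), using vertices only
    return max(0.0, 1.0 - h / k)                    # assumed form of Formula (3)

def area_similarity(a, b):
    fa, fb = polygon_area(a), polygon_area(b)
    return min(fa, fb) / max(fa, fb)                # assumed form of Formula (5)

def compactness(poly):
    return 4 * math.pi * polygon_area(poly) / polygon_perimeter(poly) ** 2  # assumed Formula (6)

def shape_similarity(a, b):
    return 1.0 - abs(compactness(a) - compactness(b))  # assumed Formula (7)

def orientation_similarity(a, b):
    d = abs(orientation(a) - orientation(b)) % 180.0
    return 1.0 - min(d, 180.0 - d) / 90.0           # assumed Formula (8)

WEIGHTS = {"distance": 0.25, "area": 0.25, "shape": 0.25, "orientation": 0.25}  # placeholders

def total_similarity(a, b, k, weights=WEIGHTS):
    sims = {"distance": distance_similarity(a, b, k), "area": area_similarity(a, b),
            "shape": shape_similarity(a, b), "orientation": orientation_similarity(a, b)}
    return sum(weights[name] * value for name, value in sims.items())  # Formula (9)

def infer_function(unknown, identified, k, weights=WEIGHTS):
    """identified: (polygon, function_code) pairs for buildings with known
    functions in the same road mesh; returns the code of the most similar one."""
    if not identified:
        return None
    _, code = max(identified, key=lambda pc: total_similarity(unknown, pc[0], k, weights))
    return code

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
slab = [(20, 0), (40, 0), (40, 10), (20, 10)]
unknown = [(50, 0), (70, 0), (70, 10), (50, 10)]
print(infer_function(unknown, [(square, "C"), (slab, "R")], k=100))
```

With these placeholder weights, the 20 m × 10 m unidentified building is matched to the identically shaped and oriented slab ("R") rather than to the square ("C"). Because the boundaries are reduced to their vertices, h is only approximate for long edges; densifying the outlines before computing Formula (4) would tighten it.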
3.5.2. Calculation of Characteristic Similarity Weights
The entropy weight method was used to determine the weight of each similarity characteristic. This method assigns weights objectively according to the degree of variability of each indicator: the information entropy of each indicator is calculated and then used to derive the indicator’s weight, giving a more objective weight for each similarity characteristic [43]. The entropy weight method involves the following steps:
- (1)
Constructing the original data matrix
In the study area, n samples with m indicators are randomly selected to construct the original data matrix $X = (x_{ij})_{n \times m}$ (i = 1, 2, …, n; j = 1, 2, …, m), where $x_{ij}$ denotes the value of the jth indicator of the ith sample. The sample matrix used in this paper consists of 4 similarity indicators for 71 sample pairs.
- (2)
Normalisation of indicators
To eliminate differences in the units of measurement of the indicators, the indicators are normalised so that all values of the normalised matrix $X' = (x'_{ij})_{n \times m}$ lie within the interval [0, 1]. The normalisation is calculated using Formula (10):
$$x'_{ij} = \frac{x_{ij} - \min_{i} x_{ij}}{\max_{i} x_{ij} - \min_{i} x_{ij}} \tag{10}$$
- (3)
Calculating the indicator proportion $p_{ij}$
The proportion $p_{ij}$ of the ith sample in the jth indicator is calculated as shown in Formula (11):
$$p_{ij} = \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}} \tag{11}$$
- (4)
Calculating the information entropy of the indicator
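The information entropy $e_j$ of the jth indicator is calculated as shown in Formula (12):
$$e_j = -k \sum_{i=1}^{n} p_{ij} \ln p_{ij}, \qquad k = \frac{1}{\ln n} \tag{12}$$
where $k$ is a constant term chosen so that $0 \le e_j \le 1$; the maximum value of $e_j$ is 1 (with $p_{ij} \ln p_{ij}$ taken as 0 when $p_{ij} = 0$).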
- (5)
Calculating the weights of each indicator
The weight $w_j$ of indicator j is calculated as shown in Formula (13), where $d_j$, the redundancy of the information entropy of indicator j, is calculated by Formula (14):
$$w_j = \frac{d_j}{\sum_{j=1}^{m} d_j} \tag{13}$$
$$d_j = 1 - e_j \tag{14}$$
In this study, the entropy weight method was used to obtain the similarity weight of each feature, as shown in Table 3.
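The entropy weight steps above can be sketched as follows. This is a minimal pure-Python version that assumes min-max normalisation for Formula (10) and the convention $0 \ln 0 = 0$; the handling of constant indicators (which receive zero weight) and the sample matrix are illustrative additions, not values behind Table 3.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for an n x m matrix (rows = samples, columns = indicators).

    Returns m weights that sum to 1. Requires n >= 2.
    """
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)                       # constant term in Formula (12)
    d = []
    for j in range(m):
        col = [row[j] for row in matrix]
        lo, hi = min(col), max(col)
        # Formula (10): min-max normalisation; a constant column becomes all zeros
        norm = [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in col]
        total = sum(norm)
        if total == 0:
            d.append(0.0)                       # constant indicator: maximal entropy, no information
            continue
        p = [x / total for x in norm]           # Formula (11)
        e = -k * sum(pij * math.log(pij) for pij in p if pij > 0)  # Formula (12), 0 * ln 0 = 0
        d.append(1.0 - e)                       # Formula (14): redundancy
    s = sum(d)
    if s == 0:
        return [1.0 / m] * m                    # no indicator varies: equal weights
    return [dj / s for dj in d]                 # Formula (13)


# Made-up similarity values (distance, area, shape, orientation) for 4 sample pairs
sample = [
    [0.9, 0.8, 0.95, 1.0],
    [0.4, 0.7, 0.90, 1.0],
    [0.7, 0.2, 0.92, 1.0],
    [0.6, 0.9, 0.91, 1.0],
]
print([round(w, 3) for w in entropy_weights(sample)])
```

The constant orientation column in this made-up sample receives zero weight, which is the intended behaviour of the method: an indicator that does not vary across samples cannot discriminate between them.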
4. Results and Discussion
4.1. Building Function Identification Results
Figure 3 shows the results of the proposed building function identification method as applied to the test area. There are 3163 buildings in the test area, all with identified functions.
Figure 4 is a frequency histogram of the number of buildings of each functional type. Residential buildings were the most numerous in the test area (1963), followed by commercial buildings (268) and public service buildings (217).
4.2. Evaluation of the Building Function Identification Results
The results of the experiment were evaluated by comparison with manual identification by professional investigators. TP is the number of buildings whose automatically identified functional type matches the manually identified type; FP is the number of buildings automatically identified as a particular type but not manually identified as that type; and NP is the number of buildings manually identified as a certain type but not automatically identified as that type. The accuracy rate (precision) P and the recall rate R are used as evaluation metrics and are calculated using Formulas (15) and (16), respectively; the F1 value combines them according to Formula (17):
$$P = \frac{TP}{TP + FP} \tag{15}$$
$$R = \frac{TP}{TP + NP} \tag{16}$$
$$F1 = \frac{2 \times P \times R}{P + R} \tag{17}$$
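Formulas (15) to (17) can be computed per functional type from paired automatic and manual labels. The sketch below is illustrative: the function names and the made-up label pairs are assumptions, and a zero denominator is reported as 0.0.

```python
def confusion_counts(pairs, ftype):
    """pairs: (automatic_label, manual_label) tuples; returns (TP, FP, NP) for ftype."""
    tp = sum(1 for auto, manual in pairs if auto == ftype and manual == ftype)
    fp = sum(1 for auto, manual in pairs if auto == ftype and manual != ftype)
    np_ = sum(1 for auto, manual in pairs if auto != ftype and manual == ftype)
    return tp, fp, np_


def evaluate(tp, fp, np_):
    """Precision P (Formula (15)), recall R (Formula (16)) and F1 (Formula (17))."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + np_) if tp + np_ else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


pairs = [("R", "R"), ("R", "C"), ("C", "R"), ("C", "C"), ("R", "R")]  # made-up labels
print(evaluate(*confusion_counts(pairs, "R")))
```

Computing the counts per type and then averaging, or pooling the counts over all types before applying the formulas, gives different overall figures; the sketch only shows the per-type calculation.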
4.3. Validation of Results
We randomly selected 957 buildings to validate the identification results, and producer accuracy (PA) and user accuracy (UA) were calculated for each building function type. The results are shown in Table 4. Public service buildings and residential buildings have the highest identification precision, with both PA and UA >90%. Mixed commercial and public service buildings and mixed residential and public service buildings have PA and UA <70%, indicating low identification precision. The PA values of residential buildings, mixed residential and commercial buildings, and complex function buildings are lower than their UA values; these types are frequently under-identified, meaning that fewer buildings were assigned to them than actually belong to them. For the other types, PA is higher than UA, indicating that buildings are wrongly assigned to these types more often than buildings of these types are missed.
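The link between PA, UA and under- or over-identification can be checked with a short sketch. PA divides the correctly identified buildings of a type by the number of reference (manually identified) buildings of that type, and UA divides them by the number of buildings assigned to that type; so, provided at least one building of the type is correctly identified, PA < UA exactly when fewer buildings were assigned to the type than actually belong to it. The label pairs and the function name are illustrative.

```python
from collections import Counter


def producer_user_accuracy(pairs):
    """pairs: (identified_label, reference_label) for the validation buildings.

    Returns dict type -> (PA, UA, diagnosis).
    """
    correct = Counter(ref for auto, ref in pairs if auto == ref)
    reference_totals = Counter(ref for _, ref in pairs)
    identified_totals = Counter(auto for auto, _ in pairs)
    result = {}
    for t in set(reference_totals) | set(identified_totals):
        pa = correct[t] / reference_totals[t] if reference_totals[t] else 0.0
        ua = correct[t] / identified_totals[t] if identified_totals[t] else 0.0
        if pa < ua:
            diagnosis = "under-identified"  # fewer assigned than in the reference
        elif pa > ua:
            diagnosis = "over-identified"   # more assigned than in the reference
        else:
            diagnosis = "balanced"
        result[t] = (pa, ua, diagnosis)
    return result


pairs = [("R", "R"), ("R", "R"), ("C", "R"), ("C", "C"), ("C", "C")]  # made-up labels
for t, (pa, ua, diagnosis) in sorted(producer_user_accuracy(pairs).items()):
    print(t, round(pa, 3), round(ua, 3), diagnosis)
```

In this made-up example one residential building is labelled commercial, so the residential type is under-identified (PA 0.667, UA 1.0) and the commercial type is over-identified (PA 1.0, UA 0.667).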
4.4. Accuracy Comparison of the Proposed and Other Methods
The proposed method is compared with the kernel density method and the Thiessen (Voronoi) polygon area share method on the same dataset (Table 5). The Thiessen polygon area share method is applied only to buildings without POIs.
The POI-based kernel density method [29] had the lowest building function identification precision, with an F1 value of only 6.04%. The Thiessen polygon area share method [33] had a higher recall rate of 94.79%, but its accuracy was only 74.17%, indicating that it readily misclassifies buildings. The proposed method had the highest overall identification precision, with accuracy, recall, and F1 values all >90%.
Figure 5 compares the functional identification results of the proposed method and the kernel density method for the Langdong Farmers’ Market in Nanning City. The market mainly contains retail, catering, hotel, chess and card, and entertainment businesses. Figure 5a,b shows that the market contains multiple POIs, most of which are commercial. The proposed method correctly identifies the market buildings as commercial buildings (Figure 5c), whereas the kernel density method [29] incorrectly identifies some of them as residential because of the large number of residential POIs in the surrounding neighbourhood (Figure 5d).
Figure 6 compares building functional identification by the proposed and kernel density methods at the Nanning People’s Hall. The People’s Hall is the meeting place of the People’s Congress, houses the office of the Standing Committee of the People’s Congress, and hosts political, diplomatic, and cultural activities. Figure 6a,b shows that the Nanning People’s Hall building contains two POIs, both of which are public service POIs. The proposed method correctly identifies the building as a public service building (Figure 6c). Because the hall is surrounded by the Nanning International Convention and Exhibition Center, Hangyang City and residential communities, the kernel density method [29] incorrectly identifies it as having mixed residential and public service and mixed office and public service functions (Figure 6d).
For buildings that contain no POIs, the proposed method outperforms the Thiessen polygon area share method. Figure 7 compares the building functional identification results of the proposed method and the Thiessen polygon area share method at the Guangxi People’s Procuratorate, which accepts complaints and appeals to the autonomous region’s people’s procuratorate and handles criminal complaints and state judicial assistance cases under its jurisdiction. The procuratorate comprises Buildings A and B. As shown in Figure 7b, Building A contains a public services POI and is recognised as a public services building, while Building B contains no POI. The proposed entity-based spatial relationship method correctly identifies the function of Building B (Figure 7c), whereas the Thiessen polygon area share method misidentifies it as residential because of the nearby residential neighbourhood (Figure 7d).
Figure 8 compares the identification results for the Nanning Daily Newspaper, which comprises Buildings A, B, C, and D. As shown in Figure 8b, Building A contains a public services POI and is recognised as a public services building, while Buildings B, C, and D contain no POIs. The entity spatial relationship method identifies Buildings B, C, and D as public services buildings (Figure 8c). Building E, a restaurant by the gate of the Nanning Daily Newspaper, contains a commercial POI, and the Thiessen polygon area share method misidentifies Building B as commercial (Figure 8d).
4.5. Spatial Characteristics of Building Functional Types
Buildings with the same functional type tend to have certain spatial similarities in terms of area, shape, orientation, and location. The following is an analysis of several of the spatial characteristics of the buildings in the study area.
Figure 9 shows the distribution of spatial characteristics for each building function type using box plots.
Buildings of different functional types require different amounts of land because of urban planning, land use type, demographic structure, and other factors. Among single-function buildings, residential buildings have the smallest footprints, because urban residential buildings usually have many floors to save land. Office buildings are mainly used by companies and must accommodate office workers and their working environments, so they have larger footprints than residential buildings. Public service buildings are typically owned by government agencies and departments at all levels and serve as workplaces for these agencies and institutions. High-rise designs can reduce the footprint while providing more office space, and smaller agencies may occupy buildings with smaller footprints, so the sizes of public service buildings vary widely. Commercial buildings include large shopping malls, building materials markets, and small stores. Large shopping malls combine sales, logistics, and other functions and therefore need more land, whereas small stores occupy relatively little land. Mixed-function buildings generally have larger footprints than single-function buildings. For example, buildings that combine residential and other functions, such as mixed residential and public service, mixed residential and commercial, and mixed residential and office buildings, have larger footprints than single-function residential buildings. The probable reason is that these buildings need additional space for commercial, office, or public service activities while also providing a residential function, which increases their demand for land.
The shape compactness values of all building types are concentrated in the range 0.5 to 0.8, indicating that most buildings have regular shapes. Single-function buildings have more regular and simpler shapes than mixed-function and complex-function buildings. Residential buildings are mostly high-rise buildings with simple shapes, such as rectangles or L-shapes, to maximise the available living space. Commercial buildings in large shopping centres are designed around the internal layout of stores and malls, and traditional office and public service buildings favour regular shapes, such as rectangles and squares, to create regular office and service spaces. A small number of buildings, such as museums, new types of business buildings, school buildings, and landmark government buildings, adopt irregular designs to express their design character and distinctive cultural connotations. Mixed-function and complex-function buildings have more flexible and complex spatial designs to accommodate different functions, so their shapes are diverse.
Urban buildings have different needs for road access depending on their function. The straight-line distance between most buildings and urban roads is <100 m. Commercial buildings are usually located close to urban roads to facilitate shopping, attract customers, and connect with parking lots, while office and public service buildings are usually located close to urban roads to facilitate commuting. Some residential buildings are located in areas with convenient transportation to ease residents’ travel, whereas others are set back from main roads to provide a more pleasant living environment with green space. The distance between mixed-function buildings and arterial roads differs between residential areas and commercial office areas: in residential areas, mixed residential and public service buildings are generally located farther from urban roads, whereas in commercial office areas, the distances between mixed-function buildings and urban roads vary.
Building orientation differs by function, reflecting different requirements for sunlight, ventilation, landscape, and environment. Most residential buildings have an orientation angle of <25°, so their long axes run approximately east–west and their main facades face north and south; such north–south oriented buildings receive good ventilation and daylight. The orientation of commercial and office buildings may be influenced by the type of business, views, and accessibility, and varies flexibly between north–south and east–west.
4.6. Discussion
Information on building functional types is important for urban planning and management. Building functions may also change over time; for example, a residential building may be converted to commercial use. Accurate and up-to-date identification of building functional types is therefore of great significance for assessing urban space, population [44], facility siting, and fire safety. Identifying the functions of urban buildings requires classifying urban spatial uses at a finer scale than identifying urban functional zones [45].
Most past studies on building function identification have relied on high-resolution remote sensing imagery, land survey data, taxi trajectories, and online popularity (heat) data. However, up-to-date data of these types are often difficult to obtain because of data service costs and confidentiality restrictions. This study proposes a method for accurately identifying urban building functions using only POI, building footprint, and OSM road data, all of which can be obtained relatively easily and free of charge from internet mapping platforms. To address the lack of POIs in some buildings, we assessed the topological relationships between POIs and buildings and between road meshes and buildings, as well as the spatial similarity between buildings. Each building without POIs was then assigned the functional type of the POI-containing building in the same road mesh with which it had the maximum spatial similarity. In a previous study, Qu used the kernel density value of each POI type to determine building types. This avoids the problem that buildings without POIs have no functional-type density values, but it does not take the spatial morphology of the city into account [29]. Because the buildings on the two sides of a road may have completely different functions, a density surface that extends across roads can blur them together, and our experimental results show that this method has low accuracy. Chen constrained the analysis to block boundaries and used the Thiessen (Voronoi) polygon area occupancy ratio to determine the functional types of buildings without POIs; however, only the single factor of distance was considered [33]. In this paper, we combine location, orientation, area, and shape similarity to identify building functional types more accurately. Experiments show that this integrated multi-feature similarity method achieves 10% higher identification accuracy than Chen's method. Compared with the data mining methods proposed in [34], the proposed method achieves higher identification accuracy without requiring high-resolution remote sensing images, land use surveys, or other such data, and therefore has lower data acquisition requirements. Compared with the identification method proposed by Lin et al. [36], which fuses the spatial similarity of buildings with POI density values, our method also considers the spatial relationship between the urban road network and buildings when determining the functions of buildings lacking POIs, which helps to avoid misidentification.
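The assignment rule described above can be illustrated with a small sketch. The per-feature similarity formulas (ratio of areas, difference in compactness, distance scaled by the farthest candidate, orientation difference scaled by 90°) and the equal weights in the example are hypothetical placeholders, not the paper's exact definitions; only the overall rule, choosing the POI-bearing building in the same road mesh with the maximum weighted similarity, follows the text.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Building:
    bid: str
    mesh_id: str                     # road mesh (block enclosed by roads) containing the building
    area: float                      # m^2
    compactness: float               # 0-1, e.g. 4*pi*A/P^2
    orientation: float               # degrees, 0-90
    centroid: Tuple[float, float]    # projected metres
    function: Optional[str] = None   # known from POIs; None if the building has no POIs

def similarities(target, cand, max_dist):
    """Hypothetical per-feature similarities, each in [0, 1]."""
    d = math.dist(target.centroid, cand.centroid)
    return {
        "distance": 1.0 - d / max_dist if max_dist > 0 else 1.0,
        "area": min(target.area, cand.area) / max(target.area, cand.area),
        "shape": 1.0 - abs(target.compactness - cand.compactness),
        "orientation": 1.0 - abs(target.orientation - cand.orientation) / 90.0,
    }

def assign_functions(buildings, weights):
    """Give each building without POIs the function of its most similar
    POI-bearing building in the same road mesh (weighted sum of similarities)."""
    result = {}
    for t in buildings:
        if t.function is not None:
            continue
        cands = [b for b in buildings if b.function is not None and b.mesh_id == t.mesh_id]
        if not cands:
            continue  # no labelled building in this mesh; left unassigned
        max_dist = max(math.dist(t.centroid, c.centroid) for c in cands)
        best = max(
            cands,
            key=lambda c: sum(weights[k] * v for k, v in similarities(t, c, max_dist).items()),
        )
        result[t.bid] = best.function
    return result

buildings = [
    Building("A", "m1", 1200, 0.55, 5, (0, 0), "residential"),
    Building("B", "m1", 3000, 0.75, 40, (80, 0), "commercial"),
    Building("C", "m1", 1150, 0.53, 8, (30, 0)),      # no POIs
    Building("D", "m2", 1000, 0.60, 10, (500, 500)),  # no POIs, no labelled building in its mesh
]
weights = {"distance": 0.25, "area": 0.25, "shape": 0.25, "orientation": 0.25}
print(assign_functions(buildings, weights))  # {'C': 'residential'}
```

Building C is assigned the residential type because it is closer to building A and more similar to it in area, shape, and orientation; building D stays unassigned because the mesh constraint forbids borrowing a function from across a road.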
The experimental results show that residential buildings and public service buildings are identified with high accuracy. Single-function residential buildings serve mainly residential purposes and contain relatively few POIs of few types, so POIs alone can accurately identify their functional type. Because urban residential buildings are spatially concentrated and have similar spatial patterns, the functional types of residential buildings that contain no POIs can also be inferred accurately from the spatial relationships between geographic entities. Single-function public service buildings mainly house public administration and public service organisations in fields such as education, health, and government; the functional types of the POIs they contain are concentrated, so they are also identified with high accuracy. By contrast, mixed residential and public service buildings and mixed commercial and public service buildings are identified with lower accuracy. Mixed residential and public service buildings are mainly residential buildings in which some floors are used for community public management and public services; they contain a high density of public service POIs and are therefore easily misidentified as public service buildings. Mixed commercial and public service buildings are predominantly office buildings that house many organisations, both commercial companies and public service bodies, and their functional type is often difficult to determine from the ratio of POI densities of each type.
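The misidentification of mixed residential and public service buildings can be illustrated with a frequency-density-ratio classifier. The sketch assumes the form commonly used in POI-based function studies, F_i = n_i / N_i (type-i POIs in the building over type-i POIs in the study area) and C_i = F_i / ΣF_i, plus a hypothetical 50% threshold for single-function buildings; the paper's exact thresholds and its rule for comprehensive-function buildings are not given in this section, and the POI counts below are invented.

```python
def frequency_density_ratios(counts, totals):
    """C_i = F_i / sum_j F_j, where F_i = n_i / N_i."""
    freq = {t: n / totals[t] for t, n in counts.items() if n > 0}
    s = sum(freq.values())
    return {t: f / s for t, f in freq.items()} if s > 0 else {}

def classify(counts, totals, single_threshold=0.5):
    """Hypothetical rule: a dominant type (ratio >= threshold) gives a single
    function; otherwise the two largest types form a mixed function."""
    ratios = frequency_density_ratios(counts, totals)
    if not ratios:
        return None  # no POIs: left to the spatial-similarity step
    ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] >= single_threshold:
        return ranked[0][0]
    return "mixed:" + "+".join(sorted(t for t, _ in ranked[:2]))

# Invented city-wide POI totals per type
totals = {"residential": 400, "commercial": 500, "public_service": 100}

# Residential block with a community service centre on its lower floors
print(classify({"residential": 2, "public_service": 3}, totals))  # public_service
# Office building housing companies and public agencies
print(classify({"commercial": 4, "public_service": 1, "residential": 2}, totals))  # mixed:commercial+public_service
```

In the first call, three community-service POIs outweigh two residential POIs because public service POIs are rarer city-wide, so each one receives a larger frequency density; this reproduces the error pattern described above.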
A limitation of this study is that the POIs and building footprints were obtained only from the Gaode map platform, whose data are missing in some areas. In future work, we will integrate data from multiple internet map platforms, including Gaode, Baidu, and Tencent, to further improve the identification of urban building functional types.
5. Conclusions
Identifying building functional types is important for urban planning assessment and fine-grained spatial governance. POI data carry rich social attribute information and are highly timely, and they are widely used in urban functional classification research. Combining POIs with online building footprint data is a popular way to identify building functional types. However, in areas with sparse POI distributions, it is difficult to accurately identify the functions of buildings that contain no POIs. To address this problem, this paper proposes an urban building function identification method based on the spatial relationships between POIs and geographic entities. Its steps can be summarised as follows: (1) the functions of buildings containing POIs are identified by calculating the frequency density ratio of POI types; (2) the spatial topology and similarity between POIs, buildings, and road network entities are combined to calculate the area, shape, distance, and orientation similarities between buildings with and without POIs in the same road mesh; (3) the entropy weight method is used to determine the weight of each similarity; and (4) each building without POIs is assigned the function of the POI-containing building with which it has the maximum weighted similarity.
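Step (3) uses the entropy weight method. The sketch below follows the textbook form of the method: normalise each indicator column to proportions p_ij, compute its entropy e_j = -(1/ln m) Σ_i p_ij ln p_ij, and set each weight w_j proportional to 1 - e_j. Feeding it the similarity values of all candidate building pairs, one column per similarity type, is an assumption about how the paper applies it, and any additional normalisation the authors use is not shown; the similarity values in the example are invented.

```python
import math

def entropy_weights(matrix):
    """Entropy weight method.

    matrix[i][j] is the non-negative value of indicator j for sample i,
    e.g. the j-th similarity of the i-th candidate building pair.
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("need at least two samples")
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        if total <= 0:
            divergences.append(0.0)  # an all-zero indicator carries no information
            continue
        p = [x / total for x in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)  # 0*ln(0) treated as 0
        divergences.append(1.0 - e)
    s = sum(divergences)
    if s <= 0:
        return [1.0 / n] * n  # every indicator uniform: fall back to equal weights
    return [d / s for d in divergences]

# Rows: candidate pairs; columns: distance, area, shape, orientation similarity
sims = [
    [0.40, 0.96, 0.98, 0.97],
    [0.00, 0.38, 0.78, 0.64],
    [0.75, 0.90, 0.95, 0.97],
]
print([round(w, 3) for w in entropy_weights(sims)])
```

An indicator that takes the same value for every pair has entropy 1 and receives zero weight, since it cannot discriminate between candidates; here the widely varying distance similarity receives the largest weight.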
The experiments show that the building function identification precision, recall, and F1 score of the proposed method were 90.28%, 97.52%, and 93.76%, respectively. Compared with existing methods (the kernel density method [29] and the Thiessen polygon area share method [33]), the proposed method is more than 10% more accurate overall. In addition, the POI, building, and road data used in this paper were all obtained from online platforms and are easily accessible, which makes the proposed method easy to generalise and apply. In the future, we will conduct experiments in different regions to further validate the generalisability of the method.
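The three reported figures are mutually consistent if the 90.28% value is read as precision, because F1 is the harmonic mean of precision and recall. A one-line check in Python:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f"{f1_score(0.9028, 0.9752):.2%}")  # 93.76%
```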