Article

Identifying Building Functions from the Spatiotemporal Population Density and the Interactions of People among Buildings

1
School of Geography and Planning, Center of Integrated Geographic Information Analysis, Sun Yat-sen University, Guangzhou 510275, China
2
Guangdong Provincial Key Laboratory of Urbanization and Geo-simulation, Guangzhou 510275, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2019, 8(6), 247; https://doi.org/10.3390/ijgi8060247
Submission received: 17 April 2019 / Revised: 23 May 2019 / Accepted: 27 May 2019 / Published: 29 May 2019

Abstract

Buildings are fundamental components of cities. Understanding the function of buildings is therefore of great importance for urban development and management. Some studies have identified building functions using spatiotemporal data, assuming that buildings with the same function have similar temporal activity patterns. However, these methods have difficulty coping with buildings that share the same function but have heterogeneous activity patterns. To solve this problem, this research proposes a new method to identify building functions from the perspective of the spatial distribution and spatial interactions of human activities. First, taxi data were used to acquire the spatiotemporal interaction characteristics among buildings with different functions. Then, the spatiotemporal population density distribution was adopted to depict the building vitality. Finally, an iterative clustering method was introduced to identify the building functions. The proposed scheme was applied in the Haizhu district of Guangzhou and compared with the traditional method. The results show that the spatial interaction characteristics are more helpful than the temporal variation characteristics and can therefore be used to improve the accuracy of building function identification. A higher accuracy can be achieved by combining the spatiotemporal interactions and building vitality characteristics: the overall accuracy reaches 0.8566, with a Kappa coefficient of 0.8174, both of which are better than the results obtained using a single characteristic only.

1. Introduction

Buildings are fundamental structural elements of the urban physical space and serve many functions with respect to human living, working, and recreation. Associated with the physical space of a building, the various functions form a functional space of the building, which directly affects the movement of people, goods, and information, and further involves the interaction of urban flows. Obtaining the spatial distribution of buildings and identifying their functions can enhance the understanding of various temporal and spatial behavior patterns and assist in analyzing complex urban functional structures as well as their changes. This not only provides key data to support high spatiotemporal resolution population estimates and risk assessment, but also serves as an important basis for urban economic development planning and urban management.
Due to the development of high spatial resolution remote sensing and light detection and ranging (LiDAR) technology, research on building 3D information extraction and reconstruction has made significant advancements [1,2]. However, building classification, especially building function classification, has made much slower progress. The standards for building classification can be summarized from the physical materials and forms and their functions. The existing remote sensing-based building classification studies mainly classify buildings as brick-wood, brick-concrete or multistory-/high-rise buildings based on their physical characteristics, such as the load-bearing structural materials and building sizes. Traditional studies use the spectral, textural and shape features derived from medium- or high-resolution remote sensing images and further combine the landscape attributes to obtain the building type information [3,4,5,6]. LiDAR images were also utilized to identify the related building types by considering their three-dimensional information characteristics [7,8,9]. Some scholars overcame the limitation of using a single remote sensing image and proposed comprehensive building classification schemes based on multisource remote sensing data, including multispectral remote sensing images, LiDAR data, and nighttime lighting data [10,11,12,13,14].
The above-mentioned studies demonstrate the potential of using high spatial resolution remote sensing data and LiDAR images to capture the apparent physical characteristics, such as the building morphology and structure, but also expose the shortcomings in identifying the building function. The main reason is that the apparent physical characteristics of buildings are usually not highly relevant to the building function, especially for buildings with complex shapes, materials, and neighboring land uses. In addition, the apparent forms of buildings are relatively fixed, but their functions may change as people alter the usage of the buildings [15].
The function of a building is often closely related to human activities. Therefore, it can be inferred from the characteristics of human activity. In recent years, multisource spatiotemporal big data, such as mobile phone data, taxi trajectory data, and social media check-in data, have emerged. These data record people’s spatiotemporal activity position information and have shown unique advantages and potential in researching human activities, urban functional regions, regional structures and land uses [16,17,18,19,20,21,22]. Some scholars have also tried to use these human time-series position data to infer the building functions. Chen et al. [23] assumed that the social media activities in buildings with similar functions have similar spatiotemporal patterns and applied a cluster-based method to identify urban building functions. Niu et al. extracted the peak activity characteristics from taxi global positioning system (GPS) trajectory data and real-time Tencent user density data for each type of building by analyzing the building training samples, and then combined such characteristics with the density-based spatial clustering of applications with noise (DBSCAN) and spatial point density methods to infer the functions of the buildings [24]. Moreover, the peak activity characteristics of different functional buildings were also utilized to analyze the mixed functions of buildings by using a probabilistic model [25]. Zhong et al. [15] combined survey data and smart card data to deduce the purpose of people’s daily activities using the probabilistic Bayesian model and then inferred the building functions by linking the daily activities to the buildings. Their method was successfully applied in two areas of Singapore. 
These studies reveal the spatiotemporal characteristics of the human activities related to different building functions (for instance, people mainly flow to office buildings, schools and other workplaces during the day and return to residential buildings at night) and use such insights to identify the building functions. Such research usually assumes that buildings with the same function share similar temporal human activity curves, i.e., the daily variation curve of the number of people in the building. However, similar to the spectral heterogeneity found in remote sensing images, buildings with the same function often have heterogeneous temporal activity curves. Therefore, handling this heterogeneity among the activity curves of same-function buildings becomes the key to further improving the accuracy of building function identification.
In an urban network, the spatially discrete buildings are connected by the spatiotemporal activities of humans [26], and their functions are updated and refined in the network, which further promotes the evolution of the city [27]. That means that the buildings are not isolated in urban space; instead, they interact with each other with different strengths and directions through the connection of crowd activities. Differences between the intensity and direction of the spatial interaction are closely related to the functions of the building. Therefore, considering the spatial interaction among buildings can help overcome the limitations of using only the spatiotemporal population density characteristics, solve the heterogeneous temporal activity curve problem and finally improve the accuracy of identifying the building functions. Unfortunately, detailed spatial interaction is usually ignored in existing studies of identifying the building functions.
Human activities and building functions have mutual effects on each other. Specifically, building functions may determine people’s travel characteristics to a certain extent. Many existing studies were inspired by this and utilized the spatiotemporal variations in human activity density to infer building functions. On the other hand, the characteristics of human movement between buildings may also reflect the spatial distribution of different functional buildings and their connection. However, previous studies have mainly focused on quantifying the spatial interaction at the parcel scale through individual mobility data, and the application is generally limited to urban spatial structure and land use classification.
With the above background, this study explored the application of the spatial interaction at the building scale, and proposed a new scheme for building function identification based on the integration of both the spatial distribution and spatial interaction characteristics. Specifically, the taxi GPS trajectory data were used to construct the spatiotemporal interactions among different functional buildings, and the real-time Tencent user density data were used to depict the spatiotemporal distribution characteristics of the building vitality. The rest of the paper is organized as follows: The study area and data sources are introduced in Section 2; the methods are described in Section 3, and the results are presented in Section 4 followed by a discussion and conclusions.

2. Study Area and Data

The Haizhu district was selected as our study area (Figure 1). As one of the oldest districts in Guangzhou, China, Haizhu covers an area of 102 km² and had a residential population of 1,613,900 in 2015 (Bureau of Statistics of Guangzhou 2015, http://www.gzstats.gov.cn/tjgb/qstjgb/). This district has a complex urban structure and contains many types of buildings, including residential buildings, businesses, schools, hospitals, and urban villages, i.e., villages found in both the outskirts and the downtown areas of major Chinese cities such as Shenzhen and Guangzhou. Haizhu used to be an industrial area but, after continuous development and planning, it has gradually entered a so-called Silicon Valley era. The rapid development of the regional economy and the government planning of urban construction have made the forms and functions of the buildings in the region more diverse.
Three different datasets, including taxi GPS trajectory data, real-time Tencent user density data, and building footprints in Haizhu, were used in this research.
The taxi GPS trajectory data record the position, time and status information of a taxi in real time, and can be processed to obtain the passengers’ time and the geographic locations of the pick-up or drop-off points, which can be used to represent the relationship between the origin and destination to a certain extent. According to the 2014 Guangzhou Transportation Development Annual Report, the daily average number of taxi passengers in Guangzhou was approximately 2.16 million, accounting for 14% of the passengers using public transportation (railway, conventional bus, and taxi). This number indicates that taxis play an important role in the urban traffic of Guangzhou. Therefore, this study uses the taxi GPS trajectory data to extract the spatiotemporal interactions among the buildings from the data generated by taxi passengers.
The taxi data used in the study were provided by Guangdong Ritu Wanfang Science & Technology Co., Ltd., and cover one consecutive week, from January 1 to January 7, 2014, from 6:00 to 23:00 every day. The original trajectory records of each taxi were first sorted by record time; then, the pick-up/drop-off records were selected as those at which the status of the taxi changed from cruising to occupied, or vice versa. Records with pick-up or drop-off locations outside the study area were excluded. After these preprocessing steps, attributes such as the latitude, longitude, and time of each pick-up/drop-off were obtained.
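The preprocessing described above can be illustrated with a minimal sketch. The record layout (taxi ID, timestamp, longitude, latitude, occupied flag) and the rectangular study-area test are assumptions made for illustration only; the actual data schema and the district boundary used in the study are not specified here.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (taxi_id, timestamp_s, lon, lat, occupied)
Record = Tuple[str, int, float, float, bool]
# Extracted event: (taxi_id, timestamp_s, lon, lat, kind)
Event = Tuple[str, int, float, float, str]


def extract_events(records: List[Record]) -> List[Event]:
    """Return pick-up/drop-off events located at taxi status changes."""
    by_taxi: Dict[str, List[Record]] = {}
    for rec in records:
        by_taxi.setdefault(rec[0], []).append(rec)

    events: List[Event] = []
    for taxi_id, recs in by_taxi.items():
        recs.sort(key=lambda r: r[1])  # sort each taxi's records by time
        for prev, cur in zip(recs, recs[1:]):
            if not prev[4] and cur[4]:      # cruising -> occupied
                events.append((taxi_id, cur[1], cur[2], cur[3], "pickup"))
            elif prev[4] and not cur[4]:    # occupied -> cruising
                events.append((taxi_id, cur[1], cur[2], cur[3], "dropoff"))
    return events


def in_study_area(event: Event,
                  bbox: Tuple[float, float, float, float]) -> bool:
    """Assumed simplification: the study area is a lon/lat bounding box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= event[2] <= max_lon and min_lat <= event[3] <= max_lat
```

Events for which `in_study_area` returns `False` would then be dropped before any further analysis.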
The real-time Tencent user density data record the number of smartphone users who use Tencent’s real-time location service products, such as Tencent QQ, WeChat, and Tencent Maps, every hour. According to the Tencent WeChat data report, the number of monthly active users of WeChat reached 549 million in 2015. Thanks to this enormous user base, the Tencent data can serve as a representative indicator of real-time human activities in China. In this study, we therefore implemented a web crawler for ‘Easygo’ and collected the data from June 15 to June 21, 2015, to represent the dynamic spatiotemporal distribution of people in buildings. The data have a spatial resolution of 25 m and a temporal resolution of one hour. Compared with traditional population census data, the Tencent data have a much finer spatiotemporal resolution.
Moreover, based on the Baidu map platform, a total of 20,928 building footprints were obtained as the basic unit in this research. All the datasets were preprocessed by coordinate transformation and uniformly converted to the WGS-1984 geographic coordinate system.

3. Methodology

3.1. Construction of the Spatial-Temporal Interaction Matrix of the Buildings

The spatial interaction of human activities can be depicted on various scales. However, most of the existing studies focus on the subdistrict scale, while few of them have performed their analysis on the scale of the buildings. The interaction among buildings can be represented by the connecting human flow among the buildings. Since it is difficult to obtain people’s trajectory from one building to other buildings, we used the taxi data to construct the spatiotemporal interaction matrix among the buildings.
Usually, a taxi pick-up/drop-off occurs on a road, at a certain distance from the origin or destination building. Therefore, it is first necessary to associate the pick-up/drop-off locations with the buildings. Existing studies have found that the maximum walking distance of taxi passengers is approximately 300 m [28,29,30]; hence, we used this value as the maximum distance between a passenger’s pick-up/drop-off location and the associated building, and set the building buffer radius to 300 m. The buffer is used to calculate the total number of pick-ups/drop-offs for the buildings within the range.
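A minimal sketch of the buffer-based association is given below. It assumes that each building is represented by its centroid in WGS-1984 coordinates and that distances are measured with the haversine formula; the study itself buffers building footprints, so the centroid representation and the helper names are simplifications introduced here.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def buildings_within(point: Tuple[float, float],
                     centroids: Dict[int, Tuple[float, float]],
                     radius_m: float = 300.0) -> List[int]:
    """IDs of buildings whose (assumed) centroid lies within radius_m."""
    lon, lat = point
    return [b for b, (blon, blat) in centroids.items()
            if haversine_m(lon, lat, blon, blat) <= radius_m]
```

A brute-force scan such as this is quadratic in the worst case; for roughly 20,000 buildings and a week of taxi events, a spatial index would normally be used instead.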
By performing the above analysis, all the pick-ups and drop-offs were assigned to their corresponding buildings, whose footprints were used as the research units to construct the spatiotemporal interaction matrix, $B_S^t$, among all the buildings. Assuming that there are a total of $N$ buildings in the study area and that $B_{i,j}^{t}$ represents the outflow from building $i$ to building $j$ at time $t$ ($t \in \{1, 2, \ldots, 24\}$; $i, j \in \{1, 2, \ldots, N\}$), the interaction matrix, $B_S^t$, among the $N$ buildings at time $t$ can be expressed as in Equation (1).

$$
B_S^t = \begin{pmatrix}
B_{1,1}^{t} & B_{1,2}^{t} & \cdots & B_{1,N}^{t} \\
B_{2,1}^{t} & B_{2,2}^{t} & \cdots & B_{2,N}^{t} \\
\vdots & \vdots & \ddots & \vdots \\
B_{N,1}^{t} & B_{N,2}^{t} & \cdots & B_{N,N}^{t}
\end{pmatrix} \tag{1}
$$

For the convenience of the following expressions, we use $B_{i,\cdot}^{t} = [B_{i,1}^{t}, B_{i,2}^{t}, \ldots, B_{i,N}^{t}]$ to represent the outflows from building $i$ to the other buildings at time $t$ and $B_{\cdot,j}^{t} = [B_{1,j}^{t}, B_{2,j}^{t}, \ldots, B_{N,j}^{t}]$ to represent the inflows from the other buildings to building $j$ at time $t$, where the dot marks the index that ranges over the other buildings.

The matrix $B_S^t$ describes the interactions among the buildings in detail. To further express the interaction characteristics among different functional buildings, we constructed the spatiotemporal interaction matrix among different functional buildings based on Equation (1), which is denoted as $B_F^t$. We assumed that there were $K$ types of building functions in the study area. Then, the interaction matrix among $N$ buildings with $K$ functions at time $t$ was constructed as in Equation (2).

$$
B_F^t = \begin{pmatrix}
B_{1,\cdot}^{1,t} & \cdots & B_{1,\cdot}^{K,t} \\
\vdots & \ddots & \vdots \\
B_{N,\cdot}^{1,t} & \cdots & B_{N,\cdot}^{K,t} \\
B_{\cdot,1}^{1,t} & \cdots & B_{\cdot,1}^{K,t} \\
\vdots & \ddots & \vdots \\
B_{\cdot,N}^{1,t} & \cdots & B_{\cdot,N}^{K,t}
\end{pmatrix} \tag{2}
$$

where $i \in \{1, 2, \ldots, N\}$ and $k \in \{1, 2, \ldots, K\}$. $B_{i,\cdot}^{k,t}$ represents the outflows from building $i$ to other buildings with function $k$ at time $t$, while $B_{\cdot,i}^{k,t}$ represents the inflows from other buildings with function $k$ into building $i$ at time $t$.

Considering the interactions of different functional buildings over the 24 h of a day, the spatiotemporal interaction matrix, $B_F$, can be constructed and expressed using Equation (3).

$$
B_F = \begin{pmatrix}
B_{1,\cdot}^{1} & \cdots & B_{1,\cdot}^{K} \\
\vdots & \ddots & \vdots \\
B_{N,\cdot}^{1} & \cdots & B_{N,\cdot}^{K} \\
B_{\cdot,1}^{1} & \cdots & B_{\cdot,1}^{K} \\
\vdots & \ddots & \vdots \\
B_{\cdot,N}^{1} & \cdots & B_{\cdot,N}^{K}
\end{pmatrix} \tag{3}
$$

where $B_{i,\cdot}^{k} = (B_{i,\cdot}^{k,1}, \ldots, B_{i,\cdot}^{k,24})$ and $B_{\cdot,i}^{k} = (B_{\cdot,i}^{k,1}, \ldots, B_{\cdot,i}^{k,24})$.

For the convenience of comparison, we used the z-score normalization algorithm to normalize the $B_{i,\cdot}^{k}$ and $B_{\cdot,i}^{k}$ vectors in Equation (3) according to Equations (4) and (5).

$$
B_{i,\cdot}^{k,\mathrm{norm}} = \left( \frac{B_{i,\cdot}^{k,1} - \mu_{i,\cdot}^{k}}{\sigma_{i,\cdot}^{k}}, \ldots, \frac{B_{i,\cdot}^{k,24} - \mu_{i,\cdot}^{k}}{\sigma_{i,\cdot}^{k}} \right) \tag{4}
$$

$$
B_{\cdot,i}^{k,\mathrm{norm}} = \left( \frac{B_{\cdot,i}^{k,1} - \mu_{\cdot,i}^{k}}{\sigma_{\cdot,i}^{k}}, \ldots, \frac{B_{\cdot,i}^{k,24} - \mu_{\cdot,i}^{k}}{\sigma_{\cdot,i}^{k}} \right) \tag{5}
$$

where $\mu_{i,\cdot}^{k} = \sum_{t=1}^{24} B_{i,\cdot}^{k,t} / 24$, $\sigma_{i,\cdot}^{k} = \sqrt{\sum_{t=1}^{24} \left( B_{i,\cdot}^{k,t} - \mu_{i,\cdot}^{k} \right)^{2} / 23}$, $\mu_{\cdot,i}^{k} = \sum_{t=1}^{24} B_{\cdot,i}^{k,t} / 24$ and $\sigma_{\cdot,i}^{k} = \sqrt{\sum_{t=1}^{24} \left( B_{\cdot,i}^{k,t} - \mu_{\cdot,i}^{k} \right)^{2} / 23}$.
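As an illustration of Equations (2) to (5), the following sketch aggregates building-to-building trips into per-function hourly outflows and inflows and then applies the z-score normalization with the sample standard deviation (divisor 23 for 24 hourly values). The trip tuple layout, the zero-based hour index (0 to 23 instead of 1 to 24), and the handling of constant series are assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

HOURS = 24
Trip = Tuple[int, int, int]  # (origin building, destination building, hour 0..23)


def function_flows(trips: Sequence[Trip], labels: Sequence[int],
                   n_buildings: int, k: int):
    """out[i][f][t]: trips from building i to buildings of function f at hour t.
    inn[i][f][t]: trips into building i from buildings of function f at hour t."""
    out = [[[0.0] * HOURS for _ in range(k)] for _ in range(n_buildings)]
    inn = [[[0.0] * HOURS for _ in range(k)] for _ in range(n_buildings)]
    for origin, dest, hour in trips:
        out[origin][labels[dest]][hour] += 1
        inn[dest][labels[origin]][hour] += 1
    return out, inn


def zscore(series: Sequence[float]) -> List[float]:
    """Z-score with the sample standard deviation (divisor n - 1)."""
    n = len(series)
    mu = sum(series) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in series) / (n - 1))
    if sigma == 0:
        return [0.0] * n  # assumption: a constant series maps to zeros
    return [(x - mu) / sigma for x in series]
```

Each 24-value vector `out[i][f]` or `inn[i][f]` would be passed through `zscore` before being concatenated into the feature vector of building `i`.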

3.2. Spatiotemporal Distribution Characteristics of Building Vitality

Montgomery [31] defined urban vitality as the number of people in and around the street across different times of the day and the process of human activity at different spatial scales, and it generally depicts the extent to which a place feels alive or lively. Yang et al. [32] used the numbers of mobile phone users in a 24-h period as the measuring index for the vitality of urban communities. The real-time Tencent user density data have a fine spatiotemporal resolution and are similar to cell phone data in their potential for characterizing population activity. Therefore, we took the opportunity to characterize the building vitality in the study area, where a higher user density indicates a stronger vitality.
If $B_n^t$ is used to represent the real-time Tencent user density value of building $n$ at time $t$, the spatiotemporal vitality distribution characteristics of all $N$ buildings in the study area, $B_D$, can be expressed as in Equation (6):

$$
B_D = \begin{pmatrix}
B_{1}^{1} & \cdots & B_{1}^{24} \\
\vdots & \ddots & \vdots \\
B_{N}^{1} & \cdots & B_{N}^{24}
\end{pmatrix} \tag{6}
$$
Similarly, z-score normalization was performed for the vitality spatiotemporal distribution characteristics of each building.

3.3. Identifying the Building Functions

The construction of a spatiotemporal interaction matrix among different functional buildings was based on the assumption that the building functions are known. However, in our research, the building functions were unknown and needed to be identified. Since there are two unknown interdependent variables, it is difficult to directly solve the bivariate problem using the conventional clustering methods. Liu et al. [18] met a similar bivariate problem in their study of identifying urban land uses, and they solved it by using the iterative clustering method, which is simple but effective. In this study, we therefore used the iterative clustering method to cluster the buildings, but with different features.
In the iterative clustering method, the parameter K is a predetermined number of clusters, representing the total number of building functions in the study area. We first randomly initialized each building’s function type and then extracted the spatiotemporal interaction of different functional buildings and the spatiotemporal distribution characteristics of building vitality. After that, the buildings were clustered and given a new function type based on the current characteristics. The extraction of the characteristics and the clustering of the buildings were iteratively performed until the convergence condition was reached, i.e., most of the building functions remained unchanged between two consecutive iterations. A detailed flowchart is presented in Figure 2 to illustrate the proposed method, which mainly includes the seven steps that are briefly described as follows:
(1) Defining the convergence threshold
The convergence threshold was used to define the condition for stopping the algorithm. The iterative clustering method converged when the proportion of function-changed buildings between two consecutive iterations was lower than a certain threshold. A small convergence threshold value means that only a small fraction of buildings had a function change.
(2) Determining the optimal number of clusters, K
In this study, we combined the iterative convergence stability analysis with clustering effectiveness evaluation indicators to determine the optimal K value by performing iterative clustering algorithms with different K values. The iterative convergence stability analysis was performed to determine whether the convergence was fast and stable. Theoretically, with an optimal number of clusters, the proportion of function-changed buildings should decrease rapidly and be close to the convergence threshold after a small number of iterations. The Davies-Bouldin index (DB) [33] was adopted to evaluate the clustering effect (as shown in Equation (7)). The DB indicator describes the intraclass divergence of the sample and the distances among the cluster centers [33]. Smaller DB values indicate that there is less similarity among the classes, hence better clustering results.
(3) Randomly initializing the building functions
The iterative clustering method began to run after each building was randomly assigned one of the K building function types.
(4) Extracting the spatiotemporal interaction and distribution characteristics
According to Equation (3), the spatiotemporal interaction characteristics among different functional buildings were calculated with the current building function assignment, so that each building had an updated spatiotemporal interaction characteristic. In addition, the spatiotemporal distribution characteristics of building vitality were extracted using Equation (6).
(5) Clustering buildings
Based on the present spatiotemporal interaction characteristics and the distribution characteristics of building vitality, the k-means clustering algorithm was used to cluster the buildings into K types, and the function of every building was assigned an updated value.
(6) The iterative process
The function of each building was updated in Step (5). Step (4) was then repeated: the new spatiotemporal interaction and distribution characteristics were computed with the updated building functions, and the buildings were clustered again with the new characteristics according to Step (5). In summary, Steps (4) and (5) were repeated until the clustering result met the convergence condition or the maximum number of iterations was reached.
(7) Identifying the building functions
By interpreting the temporal variation characteristics of the flows for every functional building and referring to the Baidu street view map information, the building functions were assigned to each cluster.
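Steps (3) to (6) above can be sketched as the loop below. This is an illustrative outline rather than the implementation used in the study: the `features` callback (which must rebuild the feature vectors from the current labels), the pure-Python k-means, and in particular the choice to start each k-means run from the means of the current label groups (so that cluster indices stay comparable between iterations) are assumptions introduced here.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_means(points, labels, k, rng):
    """Mean vector of each label group; empty groups are re-seeded."""
    centers = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            centers.append(list(rng.choice(points)))
    return centers


def kmeans(points, labels, k, rng, iters=20):
    """Lloyd's k-means started from the means of the current label groups."""
    for _ in range(iters):
        centers = group_means(points, labels, k, rng)
        new = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
               for p in points]
        if new == labels:
            break
        labels = new
    return labels


def iterative_clustering(n_buildings: int, k: int,
                         features: Callable[[List[int]], List[Vector]],
                         threshold: float = 0.001, max_iter: int = 50,
                         seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n_buildings)]   # Step (3)
    for _ in range(max_iter):
        feats = features(labels)                              # Step (4)
        new_labels = kmeans(feats, list(labels), k, rng)      # Step (5)
        changed = sum(a != b for a, b in zip(labels, new_labels)) / n_buildings
        labels = new_labels
        if changed < threshold:                               # Step (6)
            break
    return labels
```

In the setting of this paper, `features(labels)` would concatenate the normalized interaction vectors computed from the current labels with the normalized vitality vector of each building.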
$$
DB(K) = \frac{1}{K} \sum_{i=1}^{K} \max_{j \neq i} \left( \frac{W_i + W_j}{C_{ij}} \right) \tag{7}
$$

where $K$ denotes the number of clusters, $W_i$ is the average distance from all the samples in class $C_i$ to its cluster center and represents the dispersion degree of cluster $C_i$, $W_j$ is the average distance from all the samples in class $C_j$ to the center of class $C_j$, and $C_{ij}$ refers to the distance between the centers of class $C_i$ and class $C_j$.
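For reference, the Davies-Bouldin index of Equation (7) can be computed as in the following sketch, which assumes Euclidean distances, at least two non-empty clusters, and distinct cluster centers.

```python
import math
from typing import List, Sequence


def davies_bouldin(points: Sequence[Sequence[float]],
                   labels: Sequence[int]) -> float:
    """Davies-Bouldin index; smaller values mean better-separated clusters."""
    clusters = sorted(set(labels))
    centers, scatter = {}, {}
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(col) / len(members) for col in zip(*members)]
        centers[c] = center
        # W_c: average distance from the members of cluster c to its center
        scatter[c] = sum(math.dist(p, center) for p in members) / len(members)

    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                     for j in clusters if j != i)
    return total / len(clusters)
```

For two clusters {0, 2} and {10, 12} on a line, both dispersions equal 1 and the centers are 10 apart, so the index is (1 + 1) / 10 = 0.2.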

3.4. Performance Assessment

We divided the study area into grids of 500 m × 500 m and 1 km × 1 km and then calculated the identification rate and accuracy rate for the identified building functions at these two different spatial scales by randomly selecting grid samples. The identification rate is defined as the proportion of the identified buildings to the total number of buildings in the study area, while the accuracy rate is defined as the proportion of correctly identified buildings to the total number of identified buildings.
In addition, the identified building function results of the sample areas at different spatial scales were further used to construct the confusion matrix by referring to the method for evaluating the accuracy of remote sensing image classification. Two indices, namely the overall accuracy (OA) and Kappa coefficient, were calculated for quantitatively evaluating the accuracy of identifying the building function.
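The indices defined in this section can be written down compactly, as in the sketch below. It assumes a square confusion matrix given as a list of rows (rows as reference classes, columns as identified classes); the function names are chosen here for illustration.

```python
from typing import Sequence


def identification_rate(n_identified: int, n_total: int) -> float:
    """Identified buildings divided by all buildings in the study area."""
    return n_identified / n_total


def accuracy_rate(n_correct: int, n_identified: int) -> float:
    """Correctly identified buildings divided by all identified buildings."""
    return n_correct / n_identified


def overall_accuracy(cm: Sequence[Sequence[float]]) -> float:
    """Sum of the diagonal of the confusion matrix divided by its total."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm: Sequence[Sequence[float]]) -> float:
    """Cohen's Kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    p_expected = sum(sum(cm[i]) * sum(row[i] for row in cm)
                     for i in range(len(cm))) / total ** 2
    return (p_observed - p_expected) / (1 - p_expected)
```

For the two-class matrix [[40, 10], [5, 45]], the observed agreement is 0.85 and the chance agreement is 0.5, giving a Kappa of 0.7.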

4. Results

4.1. Clustering Results Based on the Spatiotemporal Interactions and Building Vitality Characteristics

Using the methods described in Section 3, we calculated the spatiotemporal interaction and building vitality characteristics of 20,928 buildings in the study area, based on the preprocessed taxi GPS trajectory data and real-time Tencent user density data. A buffer distance of 300 m and a convergence threshold of 0.1% were used in the experiments. The optimal number of clusters was found to be six through repeated experiments, which are analyzed in detail in Section 5.1. To avoid local optima and ensure the reliability of the results, the k-means clustering algorithm was run 100 times and the experiment was repeated 50 times. The function class of each building was then determined as its most frequent class. The building function identification results derived from three different characteristic combinations, i.e., the spatiotemporal interaction characteristics combined with the building vitality distribution characteristics, and each of these two characteristics used alone, are shown in Figure 3.
Based on the clustering results, we calculated the average population inflow/outflow of each cluster over time, as well as the temporal variation of the population density for each cluster type, which was reflected by the real-time Tencent user density. The red curves in Figure 3 represent the ratios of the building outflow to the total outflow during the same period for the different building types. The blue curves represent the ratios of the inflows over time. The orange curves represent the ratios of different building types’ populations to the total population at a specific time. Note that cluster 6 is defined as an “unclassified building” since there is no taxi data nearby; hence, no further analysis was performed.

4.2. Results of Building Function Identification

Referring to Shenzhen’s standardized guiding technical document for building function classification (SZDS/Z 26-2010), the clusters are labeled with corresponding functions based on their temporal variation characteristics of the population inflow/outflow (Figure 4). The Baidu Map street view was also used to provide additional information on these clusters. Cluster 1 buildings are mostly located in areas used for medical, educational, cultural or recreational services, such as the Guangzhou Red Cross Hospital, Sun Yat-sen University, and Guangzhou International Convention and Exhibition Center, and are therefore labeled “public facilities”. The inflow peaks of cluster 1 are 8 A.M. and 2 P.M., which are consistent with the daily activities of such buildings, e.g., going to school, visiting doctors or going to work. In the meantime, the population density of cluster 1 is the lowest, thanks to the small number of people engaging in such activities. Clusters 2 and 3 have similar patterns in terms of the population change; both having a larger outflow in the morning and a larger inflow in the evening. This is consistent with the daily activities of residential areas, i.e., leaving home in the morning and returning home in the evening.
With the help of Baidu Map’s street view, we labeled cluster 2 as “multistory residential buildings”, which usually have fewer than seven stories and are distributed in high-density communities. Cluster 3 was labeled “high-rise residential buildings”, which are mainly distributed in low-density neighborhoods. The buildings of cluster 4 are mainly distributed along two subway lines, Guangzhou Metro Lines 2 and 8, which cover several central business districts, such as the Jiangnanxi business hub, the Second Workers’ Cultural Palace business hub, and the Kecun business hub. These are areas where people in the Haizhu District go for shopping, dining, accommodation and work, which corresponds well with the inflow and outflow trends of cluster 4. Cluster 4 was therefore labeled “business and service buildings”. For cluster 5, we found that most of the buildings are located in urban villages, such as Shixi Village, Lijiao Village, Xiaozhou Village, and Fenghe Village. The temporal variation of the population density in cluster 5 is also similar to that of clusters 2 and 3. However, the inflow/outflow rate of cluster 5 is the smallest of all the clusters, as shown in Figure 3. There are two possible reasons for this. First, people living in urban villages seldom use taxis for transportation. Second, the roads inside urban villages are usually very narrow, making it impossible or inconvenient for taxis to pass through such areas. The functions of some specific buildings, derived through iterative clustering, are listed in Table 1.

4.3. Performance Assessment and Comparative Analysis

As described in Section 4.2, buildings of different functions in the study area are marked with different colors (Figure 3). Overall, the function of 83.3% of the buildings in the Haizhu District was identified, leaving 16.7% of the buildings, i.e., 3495 buildings, unidentified. To verify the accuracy rate of building function identification, we randomly selected a series of sample areas at two different spatial scales and used satellite imagery, Baidu street view maps, and information from field trips to determine the actual building functions in the sample areas. As shown in Table 2, we compared the function identification results of the buildings in the sample areas with the actual functions. The results show that the method proposed in this study achieved high accuracy rates (between 81.76% and 87.44%) at the two spatial scales. Furthermore, we used a confusion matrix to compare the actual identification rate of each type of building function. Table 3 shows that experiment A, which combines the spatiotemporal interaction characteristics among the buildings with the building vitality characteristics, has an average OA of 85.66% and an average Kappa of 0.8174. As seen in Figure 5, an identification accuracy rate close to or exceeding 80% was achieved for each building type in experiment A. The identification of urban village buildings is the most accurate, while the accuracy for business and service buildings is slightly lower than 80%.
The proposed scheme of identifying the building function is compared with a traditional method proposed by Chen et al. [23], which assumed that the human activities in buildings with similar functions have similar spatiotemporal patterns. Their method consists mainly of three steps. The spatiotemporal distribution characteristics of the human activity in each building are first extracted based on the real-time Tencent user density data; then, the dynamic time warping distance-based k-medoids method is applied to group the buildings; finally, the buildings are labeled with different functions. In this experiment, we used not only the spatiotemporal characteristic of the human activity extracted from the real-time Tencent user density data but also the spatiotemporal density distribution characteristics reflected by the taxi trajectory data, which represents the temporal variation of the pick-ups and drop-offs associated with each building. The corresponding identification results are shown in Figure 3D. Overall, the inferred building functions from the two different methods differ greatly in the PF, HR and UV types. Table 4 shows the function identification accuracy of the traditional method. The accuracy rates vary under different spatial scales and are consistently lower than those of the proposed scheme. Moreover, from the results of the four sample areas, it can be seen clearly that PF buildings are misclassified into UV, and some MR buildings are misidentified as HR. Based on the above analysis, it can be concluded that the proposed scheme for building function identification is superior to the traditional method. The results also indicate that the spatial interaction of the human activities reflected by the taxi trajectory data is more useful in identifying building functions than the human spatiotemporal distribution characteristics reflected by the same data.
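The dynamic time warping (DTW) distance that underlies the k-medoids grouping of the traditional method can be sketched as follows. The version shown is the textbook dynamic-programming formulation with an absolute-difference local cost and no warping window; whether the compared method uses exactly these settings is not stated here, so they should be read as assumptions.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance between two series (no window constraint)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # step both
    return cost[n][m]
```

Because DTW allows one series to be stretched in time, two activity curves with the same shape but shifted or prolonged peaks receive a small distance, which is why it is a common choice for comparing daily activity profiles.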

5. Discussion

5.1. Parameter Sensitivity Analysis

In the iterative clustering algorithm, the number of clusters, K, determines the building function types in the study area. To find the optimal K value, we set the maximum number of iterations to 50 and the convergence threshold to 0.001. By repeating the experiment with different K values (from 4 to 10), we analyzed how the proportion of changes in the building functions varies with the number of iterations. Figure 6A shows the relationships among the three parameters. When the K value is less than six, the curve tends to remain stable as the number of iterations increases. However, when the K value is greater than six, the proportion of changes in the building functions fluctuates. Furthermore, we calculated the Davies-Bouldin (DB) indicator (Figure 6B). The larger DB values occur for K values greater than six, indicating that K values above six are not suitable for building clustering. The DB values for K = 4, K = 5 and K = 6 are quite close. However, taking the optimization goals into consideration, i.e., minimal building function variation, stable iterative convergence and accurate building function identification, six is the optimal value of K.
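The stopping rule and the DB indicator discussed above can be sketched as follows. This is a hedged illustration only: a plain k-means update is used as a stand-in for the iterative clustering method of this paper, the convergence threshold is interpreted here as the proportion of items whose label changes between two iterations, and the function names and the synthetic test data are hypothetical. Lower DB values indicate more compact and better separated clusters.

```python
import numpy as np

def cluster_until_stable(X, k, max_iter=50, tol=0.001, seed=0):
    """k-means-style loop that stops when the share of changed labels drops below tol."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    labels = np.full(len(X), -1)
    for n_iter in range(1, max_iter + 1):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        changed = np.mean(new_labels != labels)  # proportion of changed assignments
        labels = new_labels
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster is empty
                centers[c] = X[labels == c].mean(axis=0)
        if changed < tol:
            break
    return labels, centers, n_iter

def davies_bouldin(X, labels, centers):
    """Davies-Bouldin index over non-empty clusters (needs at least two of them)."""
    X = np.asarray(X, dtype=float)
    used = [c for c in range(len(centers)) if np.any(labels == c)]
    scatter = [np.linalg.norm(X[labels == c] - centers[c], axis=1).mean() for c in used]
    worst = []
    for a, i in enumerate(used):
        ratios = [(scatter[a] + scatter[b]) / np.linalg.norm(centers[i] - centers[j])
                  for b, j in enumerate(used) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))

# Synthetic two-blob data, for demonstration only.
demo_rng = np.random.default_rng(1)
demo_X = np.vstack([demo_rng.normal(0.0, 0.1, (20, 2)),
                    demo_rng.normal(5.0, 0.1, (20, 2))])
for k in (2, 3, 4):
    lab, cen, n_iter = cluster_until_stable(demo_X, k)
    print(k, n_iter, davies_bouldin(demo_X, lab, cen))
```

Repeating the loop for each candidate K and comparing both the label-change curve and the DB value mirrors, in simplified form, the selection procedure described in this section.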

5.2. Advantages of Combining the Spatiotemporal Interaction and Building Vitality Characteristics

Three sets of experiments were carried out to analyze the influence of different building characteristics on function identification; the results are shown in Figure 3, Figure 5 and Table 3. Figure 3 shows the building function identification results based on different characteristic combinations. Table 3 shows the building function identification accuracy, while Figure 5 shows the confusion matrices calculated for the three experiments. Table 3 indicates that the best identification accuracy can be obtained by combining both the spatiotemporal interaction characteristics and the building vitality characteristics. Figure 5 shows that the accuracy rate of building function identification can be improved by incorporating the building vitality characteristics into the spatiotemporal interaction characteristics.
When using only the spatiotemporal interaction characteristics, the rates of correct identification are quite low for public facilities, high-rise residential buildings and business and service buildings. Some public facilities were misidentified as business and service buildings or urban villages. Figure 7A shows the enlarged #1 area in Figure 3A, which is the campus of Sun Yat-sen University. A large proportion of the campus buildings were misidentified as urban villages. This may be caused by the low density of passengers getting on/off taxis on the campus. Compared to using only the spatiotemporal interaction characteristics, incorporating the building vitality characteristics helps to distinguish the public facilities from other building types effectively (Figure 7B). The confusion matrix also improves considerably. The accuracy rate of identifying public facilities increased from 0.7037 to 0.8235, thanks to the differences in their population distribution characteristics (as shown in Figure 4 for cluster 1 and cluster 5). Similarly, the accuracy rate of identifying business and service buildings was also significantly improved (0.5784 vs. 0.7829), although the improved rate is still slightly lower than those of other building types. This is probably caused by the “1st floor commerce” model in some high-rise residential buildings. From Figure 5, we also find that the spatiotemporal interaction characteristics present a great advantage in the identification of urban villages, with the highest accuracy rate reaching more than 0.88. In this case, the incorporation of the building vitality characteristics can only slightly improve the identification accuracy.
When considering only the spatiotemporal distribution characteristics of building vitality, 97.91% of the buildings in the study area can be marked with a functional type. However, the accuracy rate for each function type is rather low when only the building vitality characteristic is used. For example, the residential buildings, both multistore and high-rise, cannot be effectively distinguished. Only 11.11% of the high-rise residential buildings are correctly identified, while 55.6% of them are identified as multistore residential buildings. Compared to other residential buildings, urban villages can be easily identified. The accuracy rate of the business and service buildings, which is only 0.2488, is mainly affected by confusion with the different types of residential buildings, especially the high-rise residential buildings. Generally, the accuracy of identifying the building functions based only on the building vitality characteristics is quite low. This may, however, be partly due to the classification system. We found that the optimal K-cluster value changes to three when using only the building vitality characteristics. The corresponding three building function types are business/service buildings, residential buildings and public facilities. Under this classification system, the OA increases to 0.6321. These results indicate that the method using only the building vitality characteristics has a relatively low accuracy rate in identifying the detailed building functions, although it can identify most of the buildings.
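The effect of a coarser classification system on the OA can be illustrated by merging rows and columns of a confusion matrix. The sketch below uses invented counts, not the values behind Figure 5 or the reported OA of 0.6321; the function names are hypothetical, and grouping urban villages with the two residential types is an assumption made here for illustration.

```python
import numpy as np

def merge_classes(cm, groups):
    """Collapse a confusion matrix by summing the blocks defined by index groups."""
    cm = np.asarray(cm, dtype=float)
    merged = np.zeros((len(groups), len(groups)))
    for a, rows in enumerate(groups):
        for b, cols in enumerate(groups):
            merged[a, b] = cm[np.ix_(rows, cols)].sum()
    return merged

def overall_accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    cm = np.asarray(cm, dtype=float)
    return np.trace(cm) / cm.sum()

# Hypothetical counts; class order: PF, MR, HR, BS, UV (rows: actual, columns: identified).
cm5 = [[6, 2, 1, 1, 0],
       [1, 4, 3, 1, 1],
       [1, 5, 1, 2, 1],
       [1, 3, 3, 3, 0],
       [0, 2, 0, 0, 8]]

# Assumed coarse system: PF | residential (MR, HR, UV) | BS.
cm3 = merge_classes(cm5, [[0], [1, 2, 4], [3]])
print(overall_accuracy(cm5), overall_accuracy(cm3))
```

Confusions among the merged residential types move onto the diagonal, so the OA of the coarse system can only stay equal or increase; in this invented example it rises from 0.44 to 0.68.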
Using both the spatiotemporal interactions among the different functional buildings and the building vitality characteristics, more detailed demographic information can be provided for the identification of the building functions, thus improving the accuracy rates. However, there are still some errors in the identification results, such as the #2 area in Figure 3A, which in fact is an industrial site. This is mainly because we only considered the spatial interactions of people who use taxis, which are not adequate to reveal all the interaction characteristics among all the buildings.

5.3. Limitations of the Proposed Method

In this paper, the proposed scheme achieved a higher accuracy rate in identifying the building functions than the traditional method. Admittedly, there are still some limitations that should be considered in future studies. First, the accuracy of identifying the building functions depends highly on the quantity of taxi GPS trajectory data and real-time Tencent user density data. For example, the function of some buildings cannot be determined due to the lack of taxi data in the neighboring areas; these buildings were categorized as unclassified buildings. Second, taxis are not the most representative transport mode for certain trip purposes. For example, educational trips may not be well represented by taxi trajectories. However, the taxi trajectory data have one great advantage: the pick-up and drop-off locations can be easily related to the source and destination buildings. For other transportation modes, such as buses or metros, this would be rather difficult because there may be a long distance between the source/destination building and the bus or metro station. Third, there are certain peculiarities in regard to both the Tencent data and the study area, which may limit the applicability of the proposed scheme in other countries. According to the statistical reports on Internet development in China, smartphone users account for approximately 59% of the total population in mainland China. For the first-tier cities, i.e., Beijing, Shanghai, Guangzhou and Shenzhen, the ratio is expected to be much higher. The smartphone user group includes a certain percentage of low-income and elderly people due to the availability of inexpensive smartphones and affordable telecom services. Moreover, most smartphone users in China are also users of Tencent apps. These peculiarities make the Tencent data suitable for representing the human spatiotemporal activities in the study area. However, the proposed scheme for identifying the building functions may not apply well in regions with fewer smartphone and social media app users. Fourth, many buildings in cities have more than one function. The proposed scheme can currently identify only one function for each building. Further research is still required to solve this problem.

6. Conclusions

In big data-based urban geography research, the spatial distribution and spatial interactions of human activities are considered two effective means to measure the similarity of urban land uses and urban spatial structures. In this study, we explored the application of the spatial interaction characteristics at the building-level scale and proposed a new method to identify the building functions from the perspective of the spatial distribution and spatial interactions of human activities. The spatiotemporal interaction characteristics among different functional buildings were extracted from taxi trajectory data, while the spatiotemporal distribution characteristics of the population density were extracted from real-time high spatiotemporal resolution Tencent user density data. Combining both characteristics, the iterative clustering method was then introduced to identify the building functions. A case study was carried out in the Haizhu District of Guangzhou, which also includes a comparison with traditional methods and an analysis of the different characteristics’ advantages and disadvantages.
The following conclusions can be drawn from the analysis of the study results: (1) The spatial interaction characteristics extracted from the taxi trajectory data provide critical information from the perspective of the spatial interactions of human activities among different functional buildings, which is more useful than the spatiotemporal density distribution characteristics reflected by the taxi trajectory data. (2) Coupling the spatiotemporal interactions with the distribution characteristics of the building vitality and using the iterative clustering method proves to be an effective way of identifying the building functions. Compared with using only one of the characteristics, the coupling method obtains the highest identification accuracy (OA = 0.8566, Kappa = 0.8174), which indicates that building characteristics from multiple sources can help to identify the building functions more accurately. (3) When used alone, the spatiotemporal interaction characteristics produce more accurate building function classification results than the building vitality characteristics. Identification of the urban village buildings has the highest accuracy rate of 0.8827, while the accuracy rates for the other building types are lower. (4) As many as 97.91% of the buildings can be identified by using only the building vitality characteristics, but with very low accuracy, especially for multistore and high-rise residential buildings. This indicates that the building vitality characteristics are not effective enough to identify the building functions in a more detailed way. (5) The combination of the spatiotemporal interactions and the building vitality characteristics effectively reduces the confusion in identifying the public facilities, high-rise residential buildings and business and service buildings, thus improving the accuracy of identifying the different building function types.
The method proposed in this paper for identifying the building functions has great potential for application. It can address the difficulty of identifying buildings with different functions but similar temporal activity characteristics, and provide an objective way of determining the building characteristics and an easy-to-use iterative clustering method. Future research may focus on obtaining the daily activity characteristics of different groups from multisource spatiotemporal big data, such as transportation smart card data, and the linkage between human activities and building functions. The proposed scheme also requires further improvement to infer the mixed functions of buildings.

Author Contributions

Conceptualization, Li Zhuo and Qingli Shi; Funding acquisition, Li Zhuo; Methodology, Li Zhuo, Qingli Shi and Chenyang Zhang; Supervision, Li Zhuo, Qiuping Li and Haiyan Tao; Writing-Original Draft Preparation, Qingli Shi; Writing-Review & Editing, Li Zhuo, Qiuping Li and Haiyan Tao.

Funding

This research was supported by the National Natural Science Foundation of China (No. 41371499).

Acknowledgments

The authors would like to thank Professor Xiaoping Liu from Sun Yat-sen University and the Ritu Wanfang Science & Technology Co., Ltd. for providing the datasets. Furthermore, the authors would like to thank the editors and anonymous reviewers, whose detailed comments and suggestions have notably helped to improve the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shahzad, M.; Zhu, X.X. Automatic Detection and Reconstruction of 2-D/3-D Building Shapes From Spaceborne TomoSAR Point Clouds. IEEE Trans. Geosci. Remote Sens. 2016, 54, 1292–1310.
  2. Frommholz, D.; Linkiewicz, M.; Meissner, H.; Dahlke, D. Reconstructing Buildings with Discontinuities And Roof Overhangs from Oblique Aerial Imagery. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, XLII-1/W1, 465–471.
  3. Graesser, J.; Cheriyadat, A.; Vatsavai, R.R.; Chandola, V.; Long, J.; Bright, E. Image Based Characterization of Formal and Informal Neighborhoods in an Urban Landscape. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1164–1176.
  4. Ok, A.O. Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts. ISPRS J. Photogramm. Remote Sens. 2013, 86, 21–40.
  5. Wurm, M.; Schmitt, A.; Taubenbock, H. Building Types’ Classification Using Shape-Based Features and Linear Discriminant Functions. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1901–1912.
  6. Du, S.; Zhang, F.; Zhang, X. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach. ISPRS J. Photogramm. Remote Sens. 2015, 105, 107–119.
  7. Belgiu, M.; Tomljenovic, I.; Lampoltshammer, T.; Blaschke, T.; Höfle, B. Ontology-Based Classification of Building Types Detected from Airborne Laser Scanning Data. Remote Sens. 2014, 6, 1347–1366.
  8. Lu, Z.; Im, J.; Rhee, J.; Hodgson, M. Building type classification using spatial and landscape attributes derived from LiDAR remote sensing data. Landsc. Urban Plan. 2014, 130, 134–148.
  9. Tooke, T.R.; VanderLaan, M.; Coops, N.; Christen, A.; Kellett, R. Classification of Residential Building Architectural Typologies Using LiDAR. In Proceedings of the 2011 Joint Urban Remote Sensing Event, Munich, Germany, 11–13 April 2011; pp. 221–224.
  10. Hecht, R.; Meinel, G.; Buchroithner, M. Automatic identification of building types based on topographic databases—A comparison of different data sources. Int. J. Cartogr. 2015, 1, 18–31.
  11. Awrangjeb, M.; Ravanbakhsh, M.; Fraser, C.S. Automatic detection of residential buildings using LIDAR data and multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2010, 65, 457–467.
  12. Sritarapipat, T.; Takeuchi, W. Building classification in Yangon City, Myanmar using Stereo GeoEye images, Landsat image and night-time light data. Remote Sens. Appl. Soc. Environ. 2017, 6, 46–51.
  13. Geiß, C.; Aravena Pelizari, P.; Marconcini, M.; Sengara, W.; Edwards, M.; Lakes, T.; Taubenböck, H. Estimation of seismic building structural types using multi-sensor remote sensing and machine learning techniques. ISPRS J. Photogramm. Remote Sens. 2015, 104, 175–188.
  14. Huang, Y.; Zhuo, L.; Tao, H.; Shi, Q.; Liu, K. A Novel Building Type Classification Scheme Based on Integrated LiDAR and High-Resolution Images. Remote Sens. 2017, 9, 679.
  15. Zhong, C.; Huang, X.; Müller Arisona, S.; Schmitt, G.; Batty, M. Inferring building functions from a probabilistic model using public transportation data. Comput. Environ. Urban Syst. 2014, 48, 124–137.
  16. Shen, Y.; Karimi, K. Urban function connectivity: Characterisation of functional urban streets with social media check-in data. Cities 2016, 55, 9–21.
  17. Gong, L.; Liu, X.; Wu, L.; Liu, Y. Inferring trip purposes and uncovering travel patterns from taxi trajectory data. Cartogr. Geogr. Inf. Sci. 2016, 43, 103–114.
  18. Liu, X.; Kang, C.; Gong, L.; Liu, Y. Incorporating spatial interaction patterns in classifying and understanding urban land use. Int. J. Geogr. Inf. Sci. 2016, 30, 334–350.
  19. Tu, W.; Cao, J.; Yue, Y.; Shaw, S.-L.; Zhou, M.; Wang, Z.; Chang, X.; Xu, Y.; Li, Q. Coupling mobile phone and social media data: A new approach to understanding urban functions and diurnal patterns. Int. J. Geogr. Inf. Sci. 2017, 31, 2331–2358.
  20. Zhou, Y.; Fang, Z.; Zhan, Q.; Huang, Y.; Fu, X. Inferring Social Functions Available in the Metro Station Area from Passengers’ Staying Activities in Smart Card Data. ISPRS Int. J. Geo-Inf. 2017, 6, 394.
  21. Manley, E.; Zhong, C.; Batty, M. Spatiotemporal variation in travel regularity through transit user profiling. Transportation (Amst) 2018, 45, 703–732.
  22. Cuttone, A.; Lehmann, S.; González, M.C. Understanding predictability and exploration in human mobility. EPJ Data Sci. 2018, 7, 2.
  23. Chen, Y.; Liu, X.; Li, X.; Liu, X.; Yao, Y.; Hu, G.; Xu, X.; Pei, F. Delineating urban functional areas with building-level social media data: A dynamic time warping (DTW) distance based k-medoids method. Landsc. Urban Plan. 2017, 160, 48–60.
  24. Niu, N.; Liu, X.; Jin, H.; Ye, X.; Liu, Y.; Li, X.; Chen, Y.; Li, S. Integrating multi-source big data to infer building functions. Int. J. Geogr. Inf. Sci. 2017, 31, 1871–1890.
  25. Liu, X.; Niu, N.; Liu, X.; He, J.; Ou, J.; Jiao, L.; Liu, Y. Characterizing mixed-use buildings based on multi-source big data. Int. J. Geogr. Inf. Sci. 2018, 32, 738–756.
  26. Batty, M. Towards a new science of cities. Build. Res. Inf. 2010, 38, 123–126.
  27. Batty, M. Cities in Disequilibrium. In Non-Equilibrium Social Science and Policy; Johnson, J., Nowak, A., Ormerod, P., Rosewell, B., Zhang, Y.C., Eds.; Springer: Cham, Switzerland, 2017; pp. 81–96.
  28. Liu, Y.; Seah, H.S. Points of interest recommendation from GPS trajectories. Int. J. Geogr. Inf. Sci. 2015, 29, 953–979.
  29. Li, A.; Axhausen, K.W. Trip Purpose Imputation for Taxi Data. In Proceedings of the 18th Swiss Transport Research Conference, Ascona, Switzerland, 16–18 May 2018.
  30. Hu, X.; An, S.; Wang, J. Taxi Driver’s Operation Behavior and Passengers’ Demand Analysis Based on GPS Data. J. Adv. Transp. 2018, 2018, 1–11.
  31. Montgomery, J. Making a city: Urbanity, vitality and urban design. J. Urban Des. 1998, 3, 93–116.
  32. Yue, Y.; Zhuang, Y.; Yeh, A.G.O.; Xie, J.-Y.; Ma, C.-L.; Li, Q.-Q. Measurements of POI-based mixed use and their relationships with neighbourhood vibrancy. Int. J. Geogr. Inf. Sci. 2017, 31, 658–675.
  33. Davies, D.L.; Bouldin, D.W. A Cluster Separation Measure. IEEE Trans. Pattern Anal. Mach. Intell. 1979, PAMI-1, 224–227.
Figure 1. Location of the Haizhu case study area and its building footprints.
Figure 2. Flowchart of the proposed method for identifying the building functions.
Figure 3. Spatial distribution of six building function clusters based on the different characteristics in the proposed scheme. (A) Clustering results derived by combining the spatiotemporal interactions among the buildings and the building vitality characteristics. (B) Clustering results based on spatiotemporal interaction characteristics only. (C) Clustering results based on building vitality characteristics only and (D) Clustering results obtained from the traditional method.
Figure 4. The temporal variation characteristics of the population inflow/outflow and population density in different building function clusters.
Figure 5. Confusion matrix of the building function identification results. (A) Results derived by combining the spatiotemporal interactions among the buildings and the building vitality characteristics. (B) Results based on the spatiotemporal interaction characteristics only. (C) Results based on the building vitality characteristic only.
Figure 6. Determination of the optimal K value. (A) The ratio of the building function changes as the number of iterations increases under different K values; (B) the DB value as the number of iterations increases.
Figure 7. Analysis of the building function identification results in area #1 of Figure 3A: (A) based on the spatiotemporal interaction characteristics only and (B) based on the combination of the spatiotemporal interaction and building vitality characteristics.
Table 1. Building function identification results.
Cluster | Building Functions | Examples
1 | Public Facilities (PF) | Guangdong 2nd Provincial People’s Hospital, Guangzhou Red Cross Hospital, Guangzhou NO.5 Middle School, Sun Yat-sen University, Baogang Stadium, Guangzhou International Convention and Exhibition Center
2 | Multistore Residential Buildings (MR) | Tongqing Community, Desheng Community, Nanyuan Community, Zhoutouzui Community, Houde Community
3 | High-rise Residential Buildings (HR) | Haichenghuayuan, Binjiang Garden, Haizhu Peninsula Garden, Jinyayuan, Chigangdong Community
4 | Business and Service Buildings (BS) | R&F Haizhucheng, Guangzhou Modern Sea Shopping Department, Huaxia Building, Wedding square, Haiyi Shopping Plaza, Acer Building, Chuangzhi Freeport, Tianyi Hotel
5 | Urban Village (UV) | Fenghe Village, Beishan Village, Luntou Village, Xiaozhou Village, Tuhua Village, Hongwei Village, Shixi Village, Lijiao Village
6 | Unclassified Buildings (UB) |
Table 2. The accuracy rate of identifying buildings at different spatial scales from the proposed scheme.
Spatial Scale | Sample Identification | Number of Buildings | Number of Unidentified Buildings | Number of Correctly Identified Buildings | Identification Rate | Accuracy Rate
500 m × 500 m | (sample image) | 170 | 11 | 130 | 0.9352 | 0.8176
500 m × 500 m | (sample image) | 173 | 0 | 147 | 1 | 0.8497
1 km × 1 km | (sample image) | 520 | 13 | 438 | 0.975 | 0.8639
1 km × 1 km | (sample image) | 207 | 0 | 181 | 1 | 0.8744
Table 3. Overall building recognition rate based on different characteristics.
Experiment | Characteristics (B F, B D) | OA | Kappa | Identification Rate
A | Spatiotemporal interaction and building vitality | 0.8566 | 0.8174 | 0.8330
B | Spatiotemporal interaction only | 0.7706 | 0.7094 | 0.8330
C | Building vitality only | 0.3841 | 0.1940 | 0.9791
Table 4. The accuracy rate of identifying buildings with different spatial scales using the traditional method.
Spatial Scale | Sample Identification | Number of Buildings | Number of Unidentified Buildings | Number of Correctly Identified Buildings | Identification Rate | Accuracy Rate
500 m × 500 m | (sample image) | 170 | 11 | 100 | 0.9352 | 0.6289
500 m × 500 m | (sample image) | 173 | 0 | 49 | 1 | 0.2832
1 km × 1 km | (sample image) | 520 | 13 | 210 | 0.975 | 0.4142
1 km × 1 km | (sample image) | 207 | 0 | 34 | 1 | 0.1643

Share and Cite

MDPI and ACS Style

Zhuo, L.; Shi, Q.; Zhang, C.; Li, Q.; Tao, H. Identifying Building Functions from the Spatiotemporal Population Density and the Interactions of People among Buildings. ISPRS Int. J. Geo-Inf. 2019, 8, 247. https://doi.org/10.3390/ijgi8060247
