Grid-Based Essential Urban Land Use Classification: A Data and Model Driven Mapping Framework in Xiamen City

Wang, Xi; Chen, Bin; Li, Xuecao; Zhang, Yuxin; Ling, Xianyao; Wang, Jie; Li, Weimin; Wen, Wu; Gong, Peng

doi:10.3390/rs14236143



by Xi Wang (1,2,3,*), Bin Chen (4), Xuecao Li (5), Yuxin Zhang (2,3), Xianyao Ling (2), Jie Wang (6), Weimin Li (7), Wu Wen (8) and Peng Gong (9)

1 Department of Automation, Tsinghua University, Beijing 100084, China

2 AI for Earth Laboratory, Cross-Strait Research Institute, Tsinghua University, Beijing 100084, China

3 Tsinghua Urban Institute, Tsinghua University, Beijing 100084, China

4 Future Urbanity & Sustainable Environment (FUSE) Lab, Division of Landscape Architecture, Faculty of Architecture, The University of Hong Kong, Hong Kong, China

5 College of Land Science and Technology, China Agricultural University, Beijing 100084, China

6 Peng Cheng Laboratory, Shenzhen 518000, China

7 2861 Data Technology, Tsinghua Science Park, Beijing 100084, China

8 Smart Steps Digital Technology, Joy City Office Building, Xicheng District, Beijing 100032, China

9 Urban Systems Institute, Department of Geography and Department of Earth Sciences, The University of Hong Kong, Hong Kong, China

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(23), 6143; https://doi.org/10.3390/rs14236143

Submission received: 9 November 2022 / Revised: 28 November 2022 / Accepted: 1 December 2022 / Published: 3 December 2022

(This article belongs to the Special Issue Geo-Information in Smart Societies and Environment)


Abstract:

Accurate and timely mapping of essential urban land use categories (EULUC) is vital to understanding urban land use distribution, pattern, and composition. Recent advances in leveraging big open data and machine learning algorithms have demonstrated the possibility of mapping EULUC at large scales in a new, cost-effective way. However, such approaches are still limited by the transferability of samples, models, and classification results across space, particularly across different cities. Given the heterogeneity of environmental and socioeconomic conditions among cities, in-depth studies of data and model adaptation for city-specific EULUC mapping are urgently needed to support policy making and urban renewal planning and management. In addition, the need for timely and detailed processing of data on small land units, with finer data granularity, is becoming increasingly important. To address these challenges, we proposed a City Meta Unit (CMU) data model and classification framework driven by multisource data and artificial intelligence (AI) algorithms. The CMU framework was applied to systematically set up a grid-based data model and to classify urban land use with an improved AI algorithm that applies Moore neighborhood correlations. Specifically, we selected Xiamen, a coastal city in Fujian, China, as a typical testbed to implement the proposed framework and to apply an AI transfer learning technique for grid and parcel land use study. Experimental results with the proposed CMU framework showed that grid-based land use classification achieves overall accuracies of 81.17% and 76.55% for level I (major classes) and level II (minor classes), respectively, which is much higher than parcel-based land use classification (overall accuracies of 72.37% for level I and 68.99% for level II).
We further investigated the relationship between training sample size and classification performance and quantified the contribution of different data sources to urban land use classification. By using existing open social data, the CMU framework makes data collection and processing intelligent and efficient, with finer granularity, saving time and cost. Incorporating the CMU framework with the proposed grid-based model is a new and effective approach to urban land use classification that can be flexibly extended and applied to various cities.

Keywords:

city meta unit; data model; land grid; land parcel; land use; Xiamen

1. Introduction

Rapid urbanization has profoundly changed the built environment and affected residents’ daily lives. Today, about 55% of the global population resides in cities, a share expected to reach 70% by 2050 [1,2,3]. Meanwhile, urban areas consume about 78% of global energy and account for 60% of greenhouse gas emissions. Given pressing urban environmental problems, such as environment-related disease [4], the urban heat island effect [5], air pollution, clean water shortages, the renewable energy crisis, and climate change [6], awareness of sustainable urban planning [7], design, and development has grown worldwide across academia, the professions, and society [8,9,10]. Among these efforts, urban land use maps [11,12] that reflect socioeconomic functions and human activity attributes [13] are crucial for urban planning and management [14]. However, detailed urban land use classifications outlining the distribution, pattern, and composition of different land use types remain limited owing to difficulties in: (i) coordinating financial support and professional manpower for on-site investigation and individual mapping; (ii) differentiating complex urban landscapes into semantic land use types; and (iii) securing spatially and temporally explicit, high-resolution urban datasets.

Urban diversity and social fabric generate beauty in cities [15], yet “policy silos” exist [16]. Integrated policy making therefore requires effective information integration. In addition, timely updated land use maps are required for urban renewal, especially for land policy making, which plays a key role in the provision of housing [7].

Fortunately, remote sensing and satellite technology have greatly enhanced our ability to observe the Earth’s surface on a large scale [17,18] and to monitor its changes through time series [19,20]. Based on multisource remote sensing data [21], a growing body of research on urban land use classification has emerged, which can be categorized into: (i) pixel-based mapping [22]; (ii) object-based mapping [23]; and (iii) scene-based mapping [24]. With the advancement of Internet technology, multidimensional social data are growing exponentially [24,25]. Research using social data has expanded quickly, including studies based on mobile phones [26], taxis [27], Weibo, Jingdongshenghuo, POI data, and grid statistical data [28]. By combining remote sensing data and social data, land use classifications and land use maps are steadily improving and covering broader areas [29,30].

Notably, Gong et al. (2020) reported a new map of essential urban land use categories for the whole of China (EULUC-China) that used 10-m satellite images, OpenStreetMap (OSM), nighttime lights, Points of Interest (POI), and Tencent location-based service data as feature inputs for machine learning based classification [31]. A crowdsourcing mapping approach was implemented by coordinating 68 research scientists from 21 research teams to collect training and validation samples for different cities in China. This work marks the beginning of a new paradigm of collaborative and collective urban land use mapping over large areas. It also provides guidance, from a top-down perspective, for scaling down to city-specific characteristics of urban land use.

Although much progress has been made in urban land use classification, the following needs of policy makers are still not satisfied. (1) Timely, detailed, and smaller land unit-based data analyses are needed for urban planning in large urban areas. Currently, average parcel sizes are fairly large, often exceeding one square kilometer. For smaller land unit scales, intensive manual work is required to generate parcel land units and to collect and process data. (2) Comparisons across cities and regions are needed for socioeconomic, environmental, and biodiversity analyses. To compare information across different regions, city-specific studies are needed, and using similar data models and data sources is an essential prerequisite. (3) Digital simulations and projections of planned urban development require an accumulation of historical data for cities or urban areas. Accumulating such data effectively requires new approaches to model creation and data collection.

Following the EULUC-China study, different research teams performed city-specific investigations of sample sensitivity, feature engineering, method adaptation, and classification schemes in Ningbo [32], Nanjing [33], Lanzhou [34], Shenzhen [35], and Hangzhou [36]. As one of the crowdsourcing teams that conducted the EULUC study for Xiamen, we carried out further studies of the city. Because the area of Xiamen is much smaller than that of other cities in China, we found it useful and necessary to make the parcel units much smaller. Significant effort went into manually selecting 9741 parcels with an average size of about 1 ha. We noted that small-parcel land classification is time-consuming and costly. The amount of information per land unit is proportional to its size, whereas the amount of work needed to collect the same amount of meaningful data is inversely proportional to the unit size; that is, to accurately predict land use for smaller land units, one must collect additional GIS and social data. Meanwhile, the number of land units increases hundreds of times for the same coverage area, making data processing more challenging. As a result, a large amount of data must be collected and processed even for small areas of a city.

From these findings, we identified the following aspects that require further in-depth study: (i) Fine-grained urban land use classification with a smaller minimum classification unit. With sub-meter satellite data and large amounts of social data, studies of small land units become feasible, and more precise land use maps can be generated. This matters for urban study and planning, because larger land units often contain a mixture of land use types. (ii) Grid-based land use classification, which is rarely studied, especially for small classification grids of 1 ha or less. Most urban land use studies are based on parcel units segmented by roads; with small land units, delineating parcels becomes labor-intensive or even impractical. Meanwhile, much social data is already available in grid form, since grid-based data systems are easier to create and expand. (iii) A systematic, land use classification-oriented framework for storing, processing, and synthesizing multisource, high-dimensional data, which is currently limited. Because both satellite and social data are complex and large, the combined data can exceed 100,000 dimensions and draw on millions of data sources spanning many years, which poses challenges for data-driven solutions.

This led us to investigate and systematically set up a CMU data model to collect and pre-process the data. Specifically, we proposed a City Meta Unit (CMU) data model and classification framework driven by multisource data and artificial intelligence (AI) algorithms to address the challenges mentioned above. With the CMU data model, information can be stored systematically over time, forming the basic building block for city and urban renewal simulation, policy making, planning, and development. We chose Xiamen City as our case study. The main contributions of this study are as follows: (1) We established a CMU data and model framework for processing multisource datasets into abstract grid-based feature layers, consisting of multi-level functions: a foundation layer, a summation feature layer, a density index layer, a visualization analysis layer, and an application solution layer; (2) We developed a new grid-based data model approach to urban land use classification, using the city of Xiamen as a testbed, with an improved random forest (RF) algorithm that applies Moore neighborhood correlations; and (3) We analyzed classification performance in parcel-based and grid-based mapping, attempted an AI transfer learning technique for grid and parcel land use prediction, and further investigated the relationship between training samples and classification accuracy [35,37,38]. The CMU framework marks the beginning of a new paradigm for discovering more effective methods and means to address urban renewal and planning needs.

The remainder of this paper is structured as follows: Section 1 describes the background and reviews related work. Section 2 describes the study area and data sources in detail. Section 3 introduces the CMU framework. Section 4 introduces the methodology. Section 5 presents the experimental results and analysis. Section 6 discusses the results and future work. Section 7 presents the conclusions of this study.

2. Study Area and Datasets

2.1. Study Area

Xiamen, a prefecture-level coastal city (24°23′N–24°54′N, 117°53′E–118°26′E) in Fujian province, is located in southeastern China (Figure 1). Xiamen is well known for its mild climate, Minnan culture, and livable environment. It is also one of the most scenic tourist destinations in China, with an area of 1700.61 square kilometers and a population of about 5.16 million. In 2021, Xiamen’s GDP reached 700.39 billion RMB, with per capita GDP above 140 k RMB. Rapid urbanization over the past few decades has brought significant changes to land use patterns in Xiamen, posing increasing challenges to urban planning and to the management of land, water, transportation, industry, energy, and development.

2.2. Datasets

We used multisource datasets in two main groups: (1) high-resolution satellite data from Gaofen-2 and Gaofen-7; and (2) social big data collected from publicly available Internet resources and from projects and companies such as the Tsinghua 2861 DaaS Project, Zhihuizuji, Baidu, and Gaode.

2.2.1. Satellite Spectral and Textural Data

The Gaofen-2 satellite is the first civil optical remote sensing satellite developed in China with a spatial resolution better than 1 m. The Gaofen-7 satellite is a high-resolution Earth observation satellite that achieves sub-meter stereo mapping accuracy. In this study, we collected Gaofen-2 data for 2015–2018 and Gaofen-7 data for 2021.

We used the satellite data to extract spectral and textural features for the CMU summation feature layer, which is described in Section 3.2. Spectral features were calculated from the image bands, and texture features were calculated using the grey-level co-occurrence matrix (GLCM) [39] with the following parameters: a 3 × 3 processing window, a co-occurrence shift of 1 in both the X and Y dimensions, and 64 greyscale quantization levels.
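To make these GLCM settings concrete, here is a minimal pure-Python sketch, not the authors' implementation, that quantizes a 3 × 3 window to 64 grey levels and counts grey-level pairs at a shift of one pixel in X and one in Y. Reading the shift as a single diagonal offset (dx, dy) = (1, 1), assuming 8-bit input, counting pairs symmetrically, and reporting contrast and homogeneity are all assumptions; the text does not say which GLCM statistics were used.

```python
from collections import Counter


def quantize(window, levels=64, max_value=255):
    """Map raw digital numbers (0..max_value) to integer grey levels 0..levels-1."""
    return [[min(levels - 1, v * levels // (max_value + 1)) for v in row] for row in window]


def glcm(q, dx=1, dy=1, symmetric=True):
    """Normalized grey-level co-occurrence matrix as {(i, j): probability}
    for pixel pairs separated by the offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(q), len(q[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(q[r][c], q[r2][c2])] += 1
                if symmetric:
                    counts[(q[r2][c2], q[r][c])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def contrast(p):
    """GLCM contrast: sum of p(i, j) * (i - j)^2."""
    return sum(prob * (i - j) ** 2 for (i, j), prob in p.items())


def homogeneity(p):
    """GLCM homogeneity: sum of p(i, j) / (1 + (i - j)^2)."""
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())


# A 3 x 3 window with a vertical edge: two dark columns and one bright column.
window = [[10, 10, 200], [10, 10, 200], [10, 10, 200]]
p = glcm(quantize(window))
print(contrast(p), homogeneity(p))
```

In this example the dark columns quantize to level 2 and the bright column to level 50; half of the diagonal pairs straddle the edge, so the contrast is 0.5 × 48² = 1152.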

2.2.2. Social Big Data

POI data for 2019 to 2021 were collected from Shuijingzhu and include name, location coordinates, urban function attributes, etc. A total of 437,085 POIs in Xiamen were retained after data cleaning and filtering. We checked the geospatial projection, mapped the POIs into four groups, and then calculated the total number of POIs and the proportion of each group in each grid, as shown in Figure 2a.
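The per-grid POI summary described above can be sketched as follows. This is an illustrative sketch only: it assumes POIs are already in geographic coordinates (longitude, latitude) on the 0.001° grid, and the four group names are hypothetical placeholders because the text does not list them.

```python
import math
from collections import defaultdict

CELL_DEG = 0.001  # fine CMU grid: 0.001 deg x 0.001 deg
# Hypothetical names for the four POI groups; the text does not list them.
GROUPS = ("residential", "commercial", "industrial", "public")


def grid_index(lon, lat, cell=CELL_DEG):
    """(column, row) of the grid cell containing a point; the small epsilon guards
    against quotients that land a hair below an integer due to floating-point rounding."""
    return (math.floor(lon / cell + 1e-9), math.floor(lat / cell + 1e-9))


def summarize_pois(pois, groups=GROUPS):
    """pois: iterable of (lon, lat, group).
    Returns {cell: {"total": n, group: share_of_total, ...}}."""
    counts = defaultdict(lambda: defaultdict(int))
    for lon, lat, group in pois:
        counts[grid_index(lon, lat)][group] += 1
    summary = {}
    for cell, by_group in counts.items():
        total = sum(by_group.values())
        summary[cell] = {"total": total, **{g: by_group[g] / total for g in groups}}
    return summary
```

Groups with no POIs in a cell get a share of 0.0, so every cell carries the same feature columns, which is convenient when the summaries are later stacked into a feature table.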

The Tsinghua 2861 DaaS Project is an Internet-based data collection system. It takes crowdsourced Internet data as input and builds about 9.8 million information grids covering China. Each 2861 index grid cell is about 0.010869° × 0.008983°, and there are 18 indexes in total. We used a mapping algorithm to calculate the corresponding index values from the original index data. For example, Figure 2b shows the 2861 shopping convenience level.

Mobile statistics derived from location-based service records are very useful in city studies. Compared with other types of data, they have the advantage of integrated, full coverage of activities in time and space. The grid size of the mobile data is 0.001° × 0.001°. We used data from the Zhihuizuji company to calculate the number of people who live in, work in, or visit each grid of Xiamen in December 2021, and then projected the numbers of residents, workers, and visitors onto each grid or parcel, as shown in Figure 3.

We also collected the WorldPop population dataset (https://www.worldpop.org/, accessed on 10 January 2021), which provides the estimated number of people residing in each 100 × 100 m grid based on a random forest model and a global database of administrative unit-based census information [40].

Building data for Xiamen were downloaded from Shuijingzhu, whose building data were originally derived from a combination of Baidu Map and Gaode Map. We used the data to calculate the number of buildings, the total coverage area, and the average number of building stories for each parcel or grid, as shown in Figure 4.
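The three building statistics can be computed in a single pass. The sketch below assumes each building has already been assigned to exactly one grid or parcel (for example, by its centroid) and that footprint area and story count are available per building; the unit IDs are hypothetical, and the unweighted mean of stories is an assumption, since the text does not say whether the average was area-weighted.

```python
from collections import defaultdict


def building_stats(buildings):
    """buildings: iterable of (unit_id, footprint_m2, stories), where unit_id is the
    grid or parcel containing the building.
    Returns {unit_id: (count, total_footprint_m2, mean_stories)}."""
    acc = defaultdict(lambda: [0, 0.0, 0])  # [count, footprint sum, stories sum]
    for unit, footprint, stories in buildings:
        a = acc[unit]
        a[0] += 1
        a[1] += footprint
        a[2] += stories
    return {u: (n, area, s / n) for u, (n, area, s) in acc.items()}


# Hypothetical example: two low-rise buildings in grid "g1", one high-rise in "g2".
print(building_stats([("g1", 400.0, 6), ("g1", 250.0, 2), ("g2", 1200.0, 30)]))
```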

Road data were obtained from the OSM platform (http://www.openstreetmap.org, accessed on 10 January 2021). The raw OSM road network comprises 27 road type categories, such as primary, secondary, trunk, and pedestrian. We included nine major road types in this study: primary, primary link, secondary, secondary link, residential, residential link, tertiary, tertiary link, and trunk (Figure 5).

3. CMU Framework

We proposed a City Meta Unit (CMU) framework for data processing with three specific objectives: (1) to enable a scalable and traceable multi-dimensional meta-model framework for collecting, storing, describing, and grouping citywide multisource data; (2) to process the data within this framework by calculating hierarchically grouped information that can be used as feature input for applications; and (3) to make solution-oriented AI algorithms more effective and to implement different AI algorithms as applications within the framework. The diagram of the CMU framework is shown in Figure 6.

3.1. CMU Foundation Layer

A data layer is created in the data model for each data source. For example, satellite images from the Gaofen series, POIs, and human mobility from the location-based service data are sorted and stored. We grouped the data layers by the nature of the data, for example, traffic, population, building, education, and environment. This scheme is called the CMU Foundation Layer (CMU FL).

3.2. CMU Summation Feature Layer

We grouped and summarized the data by geometric unit (grid or parcel); these summaries serve as feature collections for subsequent algorithm processing and application implementation. They also serve as a data abstraction that reduces storage needs and improves application efficiency. Grid size is scalable: a coarser grid can be built by summing the values of smaller grids. In this study, we created both 0.001°N × 0.001°E and 0.01°N × 0.01°E grids. In the temporal dimension, historical data can be accumulated in parallel for model simulation and time-series analysis. For instance, remote sensing data are summarized into spectral and textural statistics, and POI data are summed for the analysis of commercial activity. We called this the CMU Summation Feature Layer (CMU SFL).
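Two properties of the degree-based grids follow from simple arithmetic, sketched below under stated assumptions. First, the ground area of a cell depends on latitude: on a spherical Earth (mean radius 6,371,008.8 m, an approximation), a 0.001° × 0.001° cell at 24.5°N, inside Xiamen’s 24°23′–24°54′N range, covers about 1.1 ha, and a 0.01° cell about 1.1 km². Second, because 0.01° is exactly ten 0.001° steps, a coarse cell index is the fine index integer-divided by 10, so additive features can be summed upward.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def cell_area_m2(lat_deg, cell_deg=0.001):
    """Area of a cell_deg x cell_deg cell whose southern edge is at lat_deg, on a sphere."""
    lat1 = math.radians(lat_deg)
    lat2 = math.radians(lat_deg + cell_deg)
    dlon = math.radians(cell_deg)
    return EARTH_RADIUS_M ** 2 * dlon * (math.sin(lat2) - math.sin(lat1))


def aggregate(fine_values, factor=10):
    """Sum additive values from fine cells (col, row) into cells `factor` times larger."""
    coarse = defaultdict(float)
    for (col, row), value in fine_values.items():
        coarse[(col // factor, row // factor)] += value
    return dict(coarse)


print(round(cell_area_m2(24.5)), round(cell_area_m2(24.5, 0.01)))
print(aggregate({(118080, 24479): 3, (118089, 24470): 2, (118090, 24479): 5}))
```

Summation only applies to additive quantities such as counts or areas; ratios such as densities should be recomputed from the summed numerators and denominators rather than summed directly.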

3.3. CMU Density Index Layer

Based on the CMU SFL, abstracting or grouping certain features is useful for creating a density function or index function layer. A specific feature density function is defined as the area of the feature divided by the area of the grid or parcel. For example, based on the NDVI of multispectral remote sensing data, we can create a greenspace density function to describe the spatial extent and magnitude of greenspace coverage in a grid or parcel. The same function can be applied to other land cover types such as water, roads, and buildings. We also added a weighting factor function because certain features, such as the number of POIs, should be amplified to compensate for parts of a unit that cannot host them, for example, areas occupied by grass (greenspace density). An index function assigns a specific attribute to a grid or parcel. For instance, urban environmental information such as PM2.5 and carbon consumption, as well as nighttime light (NTL) intensity, can be added as spatially explicit indices. We called this the CMU Density Index Layer (CMU DIL).
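The density definition above translates directly into code; the weighting factor does not, because its formula is not given. The sketch below uses the stated density definition and, purely as a hypothetical illustration of the weighting idea, scales a POI count by 1 / (1 − green density) with a cap so that fully green units do not divide by zero.

```python
def feature_density(feature_area, unit_area):
    """Feature density = feature area / grid (or parcel) area, clipped to [0, 1]."""
    if unit_area <= 0:
        raise ValueError("unit_area must be positive")
    return min(1.0, max(0.0, feature_area / unit_area))


def weighted_poi_count(poi_count, green_density, max_weight=10.0):
    """HYPOTHETICAL weighting: scale a POI count by 1 / (1 - green density),
    capped at max_weight, so units largely covered by grass are not under-counted."""
    remaining = 1.0 - green_density
    weight = min(max_weight, 1.0 / remaining) if remaining > 0 else max_weight
    return poi_count * weight


# A grid of 11,250 m2 with 3,375 m2 of greenspace has a green density of 0.3.
print(feature_density(3375.0, 11250.0), weighted_poi_count(12, 0.5))
```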

3.4. CMU Visualization Analysis Layer

We created a Visualization Analysis Layer to present data in two or three dimensions. For example, the number of POIs can be displayed in three dimensions (Figure 7), showing that POIs are far more numerous in the central urban area. We find visualization tools like this very useful when applying AI algorithms, as they relate features to the study grid in a spatially explicit way.

The data collection and preparation for a digital city can be massive, with data dimensions exceeding millions and data sources exceeding hundreds of millions. We therefore developed knowledge graph (KG) tools to describe the ontology of the data model. For example, we used a KG to describe POI information (Figure 8). We called this the CMU Visualization Analysis Layer (CMU VAL).

3.5. CMU Application Solution Layer

After setting up the data model, one can easily use it to study or solve application problems in city planning, traffic control, and renewable energy needs analysis. These applications can be added as part of the framework. We called this the CMU Application Solution Layer (CMU ASL). Section 4 uses Xiamen as a case study and applies the CMU framework to urban land use classification.

4. CMU-Based Xiamen Land Use Study

We proposed a systematic approach to grid and parcel land use classification. Land grids were set up at 0.001°N × 0.001°E and 0.01°N × 0.01°E. Land parcels were generated from the OpenStreetMap road network [41,42]. We used the CMU data model as an application example to study land use in Xiamen City and generate land use maps.

4.1. Proposed Method

In this study, we used a modified EULUC scheme (Table 1) for land use classification because there is insufficient information for analysis at smaller grid sizes. For grid-level analysis, the scheme contains seven Level I land use classes (Residential, Commercial, Industrial, Public management and service, Road, Greenspace, and Water) and 11 Level II land use classes (Residential for both low- and high-rise buildings, Business office, Commercial service, Public & Admin, Road first class, Road second class, Road third class, Greenspace, Water), as shown in Table 1. We named this modified EULUC scheme GULUC (Grid Urban Land Use Classification). For parcel-level analysis, the scheme contains four Level I land use classes (Residential, Commercial, Industrial, Public management and service) and seven Level II land use classes (Residential for both low- and high-rise buildings, Business office, Commercial service, Public & Admin, Greenspace). We named this modified EULUC scheme PULUC (Parcel Urban Land Use Classification).

The implementation of the proposed method is shown in Figure 9. It incorporates both grid-based and parcel-based land use classification. Grid-based classification is a relatively new area of study, because a grid naturally combines different land cover types. We therefore formulated the solution using exclusion–inclusion techniques. First, road, green, and water density functions were used to identify road, greenspace, and water grids, respectively. Second, the identified grids were excluded, and a random forest (RF) algorithm was used to predict the remaining classes: Residential, Public, Commercial, and Industrial. Lastly, a Moore neighborhood algorithm was applied to increase the accuracy of the RF predictions.

4.2. Data Preparation

We used the CMU FL (Section 3.1) and combined the resulting features in Table 2. Parcel data were prepared in the same way as grid data, except that road features were not considered, because OSM roads were used to segment and group the parcels [41,42,43].

All features in Table 2 were obtained from verifiable data sources that our team also verified and that have been widely used in other projects: (1) Gaofen satellite images were processed with remote sensing image calibration, and visual verification was performed for specific sample points; (2) for POI data, multiple POI points were selected and verified visually to ensure that they are consistent with actual sites in the real world, and we also developed a visualization tool to study POI characteristics in each land type; (3) the 2861 index data were originally produced from open social data on the Internet with rigorous data processing, and we verified the data manually in Xiamen; (4) to verify the precision of the mobile data, we compared the distribution and trend of the data with actual activities on the ground by analyzing heat maps created from the raw grid data; (5) the WorldPop population dataset was downloaded from the WorldPop website (the WorldPop team began complementing traditional population sources with dynamic, high-resolution data for mapping human population distributions in 2004), and we cross-checked it against the Zhihuizuji and 2861 data; (6) visual verification of the building data was performed using street view pictures from Baidu and online map information; (7) we confirmed the precision of the road data through field and visual verification based on remote sensing images; and (8) Haihang and local Xiamen teams verified 9741 parcel samples. Grid samples were initially derived through Tabulate Intersection selection and then further verified manually.

In addition to the features in Table 2, we derived road, greenspace, and water density functions for each grid.

(1) Road density function: OSM road data were used to calculate road area. A road width was specified for each road according to its type, and areal road vector data were obtained with the buffer tool in ArcGIS. Buffering all road types produced areal vector data for the roads in Xiamen. The road density of each grid was calculated as the area of roads divided by the area of the grid (Figure 10).

(2) Green density function: We used Gaofen-7 data to construct this density function. The Gaofen-7 data have four bands (red, green, blue, and NIR); we used these bands and the grid vector data to calculate NDVI and the fractional green coverage. Specifically, the green density of each grid was calculated as the greenspace area divided by the grid’s area (Figure 11).

(3) Water density function: Analogously to the NDVI, we calculated the NDWI, and the water density of each grid was calculated as the water area divided by the grid’s area (Figure 12).
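The three density functions above can be approximated without GIS software, as in the sketch below. Several pieces are assumptions: the road widths are hypothetical values for the nine kept OSM types, and road area is approximated as length × width, which, unlike a dissolved ArcGIS buffer, double-counts overlaps at junctions. The text gives no NDWI formula; the sketch uses McFeeters’ green/NIR form, the usual choice when only red, green, blue, and NIR bands are available. The pixel-level NDVI and NDWI cutoffs are hypothetical as well.

```python
# HYPOTHETICAL full road widths (m) for the nine OSM road types kept in this study;
# the text does not list the widths that were actually used.
ROAD_WIDTH_M = {
    "trunk": 30.0, "primary": 24.0, "primary_link": 12.0,
    "secondary": 18.0, "secondary_link": 10.0,
    "tertiary": 12.0, "tertiary_link": 8.0,
    "residential": 8.0, "residential_link": 6.0,
}
NDVI_THRESHOLD = 0.3  # hypothetical pixel-level vegetation cutoff
NDWI_THRESHOLD = 0.0  # hypothetical pixel-level water cutoff


def road_density(segments, grid_area_m2):
    """segments: (road_type, length_inside_grid_m) pairs; other types are ignored.
    Road area is approximated as length x width, so junction overlaps count twice."""
    area = sum(ROAD_WIDTH_M[t] * length for t, length in segments if t in ROAD_WIDTH_M)
    return min(1.0, area / grid_area_m2)


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); 0 when both bands are zero."""
    s = nir + red
    return (nir - red) / s if s else 0.0


def ndwi(green, nir):
    """McFeeters NDWI = (Green - NIR) / (Green + NIR); 0 when both bands are zero."""
    s = green + nir
    return (green - nir) / s if s else 0.0


def cover_fraction(pixels, index_fn, threshold):
    """Share of a grid's pixels whose index exceeds threshold; with equal-area
    pixels this equals (feature area) / (grid area)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if index_fn(p) > threshold) / len(pixels)


px = [{"red": 50, "green": 60, "blue": 40, "nir": 200},
      {"red": 80, "green": 90, "blue": 70, "nir": 100},
      {"red": 30, "green": 120, "blue": 90, "nir": 20},
      {"red": 60, "green": 70, "blue": 50, "nir": 250}]
green_density = cover_fraction(px, lambda p: ndvi(p["nir"], p["red"]), NDVI_THRESHOLD)
water_density = cover_fraction(px, lambda p: ndwi(p["green"], p["nir"]), NDWI_THRESHOLD)
print(green_density, water_density)
```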

We first applied exclusion–inclusion techniques [44]. Based on the road, green, and water density functions created above, we set the thresholds to 30%, 70%, and 75% for road, green, and water, respectively, which on-site and image verification showed to be the best choices. For example, with a threshold of 0.7 on the green density index, a grid whose green density index was above 0.7 was treated as greenspace; otherwise, it was not.
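The exclusion step can be expressed as a simple rule over the three densities, using the thresholds reported above. The text does not state which label wins when a grid exceeds more than one threshold, so the road → greenspace → water precedence below is an assumption, as is the input format.

```python
# Thresholds reported in the text: road 30%, greenspace 70%, water 75%.
# The order of this tuple sets precedence when several thresholds are exceeded (assumption).
THRESHOLDS = (("road", 0.30), ("greenspace", 0.70), ("water", 0.75))


def exclude_or_pass(densities):
    """Return a land cover label if a density exceeds its threshold, else None,
    meaning the grid is passed on to the random forest step."""
    for label, threshold in THRESHOLDS:
        if densities.get(label, 0.0) > threshold:
            return label
    return None


def split_grids(grid_densities):
    """Partition {grid_id: {"road": d, "greenspace": d, "water": d}} into
    excluded labels and the grid IDs that remain for RF classification."""
    excluded, remaining = {}, []
    for gid, dens in grid_densities.items():
        label = exclude_or_pass(dens)
        if label is None:
            remaining.append(gid)
        else:
            excluded[gid] = label
    return excluded, remaining
```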

Applying these thresholds identified 8380 road grids, 23,258 greenspace grids, and 17,292 water grids. Visual inspection showed an overall accuracy of 90.37% for road grids (374 testing samples), 87.39% for greenspace grids (333 testing samples), and 79.28% for water grids (362 testing samples). We applied the same procedure in the parcel land use analysis to exclude green and water parcels.

4.3. Random Forest (RF) for Urban Land Use Classification

We collected 2800 grid training samples and 600 grid testing samples. We also collected 6284 parcel training samples and 699 parcel testing samples provided by the Haihang company, and we arranged for a research team to conduct on-site investigations in Xiamen to verify the sample quality.

First, we ran 13,000 experiments for grids and 7000 for parcels. Each experiment was run 1000 times and used a unique combination of CMU SFL features from different sources, including satellite only, social data only, satellite + POI, satellite + 2861 index, satellite + WorldPop, satellite + Zhihuizuji mobile, all features, etc. Second, we conducted a total of 120,000 experiments, also covering different combinations of CMU SFL features from different data sources, to study the relationship between training sample size and testing accuracy. Third, we tried a new method in which parcel training samples are used to train the model, which is then applied to grid-based land classification, and vice versa. Because such samples are costly to process, this kind of reuse helps extend the study to all cities in China.

4.4. Using Moore Neighborhood to Improve Land Use Prediction

We developed a new method that uses the Moore neighborhood to increase the accuracy of the RF algorithm [45,46]. In the RF voting scheme, when two or more classes receive similar vote probabilities, the error rate on the testing samples is high, as described in detail in Section 5.1. We therefore use the 3 × 3 Moore neighborhood to determine the type of any grid whose confidence level (the vote share of its most confident prediction) is equal to or less than a threshold of 60%. The algorithm is as follows: we select the Moore neighborhood of eight cells (grids) around the uncertain grid. Among these eight cells, we count the grids whose type matches the most confident prediction of the uncertain grid as A, and those matching its second most confident prediction as B. If A > B, the uncertain grid is assigned its most confident prediction; if B > A, it is assigned its second most confident prediction. If A and B are both zero, the most confident prediction wins. Taking Figure 13 as an example, the most confident prediction of the uncertain grid is Industrial, and the second most confident prediction is Public. The Moore neighborhood contains three certain grids, whose confidence is higher than the threshold (here 60%); two of them are Industrial and the other is Public. According to the algorithm, the uncertain grid is therefore classified as Industrial.
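The neighborhood rule above can be sketched as follows. This is an illustration, not the authors' code. Assumptions: RF vote shares are available per grid as {label: share}; only neighbors whose own top vote share exceeds the threshold are counted, as in the Figure 13 example; neighbors are read from the original RF output rather than from already-refined cells; and a nonzero tie (A = B) keeps the most confident prediction, since the text only specifies the A = B = 0 case.

```python
def ranked(probs):
    """Sort a {label: vote_share} dict from most to least confident."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def moore_refine(grid, threshold=0.60):
    """grid: {(col, row): {label: vote_share}} from a random forest.
    Cells whose top vote share is <= threshold are re-decided from their 3 x 3
    Moore neighborhood; returns {(col, row): label}."""
    result = {}
    for (c, r), probs in grid.items():
        order = ranked(probs)
        top1, p1 = order[0]
        if p1 > threshold or len(order) < 2:
            result[(c, r)] = top1  # certain cell: keep the RF prediction
            continue
        top2 = order[1][0]
        a = b = 0
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                if dc == 0 and dr == 0:
                    continue
                nb = grid.get((c + dc, r + dr))
                if nb is None:
                    continue  # outside the study area or excluded grid
                nb_label, nb_p = ranked(nb)[0]
                if nb_p <= threshold:
                    continue  # uncertain neighbors do not vote
                if nb_label == top1:
                    a += 1
                elif nb_label == top2:
                    b += 1
        # A > B keeps the top prediction, B > A switches; ties keep the top prediction.
        result[(c, r)] = top2 if b > a else top1
    return result


# The Figure 13 situation: two certain Industrial neighbors and one certain Public neighbor.
example = {
    (0, 0): {"Industrial": 0.45, "Public": 0.40, "Residential": 0.15},
    (-1, -1): {"Industrial": 0.80, "Public": 0.20},
    (0, -1): {"Industrial": 0.70, "Residential": 0.30},
    (1, -1): {"Public": 0.65, "Industrial": 0.35},
    (-1, 0): {"Residential": 0.50, "Public": 0.50},
}
print(moore_refine(example)[(0, 0)])
```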

5. Results and Analysis

5.1. Grid Experiments and Performance

Different data sources and feature combinations contribute differently to overall accuracy (OA) (Figure 14). (1) We compared classification results derived from satellite data and social data. Satellite data alone achieved 65.97%, indicating the importance of high-resolution images. Social data alone achieved 75.03%, indicating the importance of human activity information in interpreting land use functions; the CMU data model framework was created to use social data more effectively. (2) We quantified the contribution of each data type in detail. Satellite spectral features alone achieved 58.80%, almost the same as satellite texture features alone. WorldPop data contribute only one feature, yet achieved 45.08%, indicating that this single feature carries substantial information. POI data alone achieved 55.87%, surpassing the other social data sources (WorldPop, mobile statistics, and the 2861 index). (3) We tested the satellite data in combination with each social data source. The combination of satellite and 2861 index data achieved 75.86% and, surprisingly, surpassed the other combinations, as shown in Experiment 2, indicating that the 2861 index data are more complementary to satellite data. (4) Combining all satellite and social data, the OA reached 80.67% with a kappa coefficient of 0.7194, which indicates that multiple dimensions of urban land use information are important for differentiating high-level semantic urban land use.

We further analyzed the importance of the features contributing to the performance of the RF model. As shown in Figure 15, the top 5, top 10, and top 15 of all features achieve OAs of 68.91%, 75.47%, and 78.97%, respectively. Satellite features combined with the top five social features achieved 75.90%, and combined with the top 10 social features achieved 79.24%.
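The feature subsets used in this analysis (all features ranked by importance, or satellite features plus the top k social features) can be assembled with a small helper such as the one below. It is a sketch under stated assumptions: the helper names are hypothetical, the importance values in the example are made up for illustration, and in practice the scores could come from a fitted random forest (for example, the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`). Feature names follow Table 2.

```python
# Hypothetical helpers for building "satellite + top-k social" feature subsets.
def top_k_features(importances, k):
    """Return the k feature names with the highest importance.

    `importances` maps feature name -> importance score. Python's sort is
    stable (also with reverse=True), so equal scores keep their input order.
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def satellite_plus_top_social(satellite_features, social_importances, k):
    """Satellite features (always kept) followed by the top-k social features."""
    return list(satellite_features) + top_k_features(social_importances, k)


if __name__ == "__main__":
    # Made-up importance scores for a few Table 2 social features (illustration only).
    social = {
        "pop density": 0.040,
        "consumption level": 0.035,
        "work pop": 0.031,
        "commercial ratio": 0.022,
        "traffic inflow": 0.018,
        "company number": 0.012,
    }
    satellite = ["ndviMEAN", "ndwiMEAN", "b1MEAN"]
    print(satellite_plus_top_social(satellite, social, 2))
    # ['ndviMEAN', 'ndwiMEAN', 'b1MEAN', 'pop density', 'consumption level']
```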

The confusion matrix results are as follows.

For Level I, the RF algorithm achieves an OA of 80.33% with a kappa coefficient of 0.7146 (Table 3).

Applying the Moore neighborhood algorithm to the RF results raises the OA to 81.17% with a kappa coefficient of 0.7253 (Table 4).

For Level II, the OA reaches 76.55% with a kappa coefficient of 0.6847 (Table 5).
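The OA, kappa coefficient, user's accuracy (UA), and producer's accuracy (PA) in the confusion matrix tables follow the standard definitions. The sketch below computes them in plain Python. It treats rows as reference classes and columns as predicted classes, which is the convention that reproduces the UA and PA columns of Table 3; the function name is illustrative.

```python
def accuracy_metrics(matrix):
    """OA, Cohen's kappa, user's and producer's accuracy of a confusion matrix.

    matrix[i][j] counts samples of reference class i predicted as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_totals = [sum(row) for row in matrix]  # reference totals
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    oa = sum(diag) / n
    # Chance agreement expected from the marginal totals.
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (oa - pe) / (1 - pe)
    ua = [d / c if c else 0.0 for d, c in zip(diag, col_totals)]  # correct / predicted
    pa = [d / r if r else 0.0 for d, r in zip(diag, row_totals)]  # correct / reference
    return oa, kappa, ua, pa


# Table 3 counts (Residential, Public, Commercial, Industrial).
TABLE3 = [[204, 25, 14, 22],
          [12, 75, 4, 9],
          [7, 5, 38, 6],
          [10, 4, 0, 165]]

if __name__ == "__main__":
    oa, kappa, ua, pa = accuracy_metrics(TABLE3)
    print(f"OA = {oa:.2%}, kappa = {kappa:.4f}")  # OA = 80.33%, kappa = 0.7146
```

Feeding the Table 4 counts into the same function gives OA = 81.17% and kappa = 0.7253, matching the values reported for the Moore neighborhood refinement.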

We find that Public Management & Services and Industrial grids are easily misclassified as Residential, and vice versa. By examining the RF prediction voting scheme, we find that prediction accuracy improves as the probability of the most confident prediction increases (Figure 16a) or as the difference between the most confident and the second most confident predictions increases (Figure 16b). Figure 16 covers all 600 testing samples: (a) as the most confident prediction increases from 0.3 to 1 (30% to 100%), the correct predictions increase; and (b) as the difference between the most confident and the second most confident predictions increases from 0 to 1, the correct predictions also increase. When the first and second most confident predictions are almost equal, fewer than 50% of the predictions are correct. Therefore, we need to focus on these uncertain ranges of voting confidence to further improve classification accuracy.
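The analysis behind Figure 16 can be repeated for any set of testing samples by ranking each sample's class probabilities, taking the most confident probability and its gap to the second most confident one, and computing the share of correct predictions within bins of either quantity. The sketch below assumes each sample is stored as a hypothetical mapping from class name to RF probability (vote share) together with its reference label; the bin edges and function names are illustrative choices.

```python
def confidence_and_margin(probs):
    """Return (top-1 probability, top-1 minus top-2 probability) for one sample.

    `probs` maps class name -> probability (e.g. RF vote share); a mapping
    with a single class is treated as having a runner-up probability of 0.
    """
    ranked = sorted(probs.values(), reverse=True)
    top1 = ranked[0]
    top2 = ranked[1] if len(ranked) > 1 else 0.0
    return top1, top1 - top2


def accuracy_by_bin(samples, edges, use_margin=False):
    """Share of correct predictions per bin of confidence (or of margin).

    `samples` is a list of (probs, reference_label) pairs; `edges` is an
    increasing list of bin edges. Bins are half-open [lo, hi), except the
    last, which also includes its upper edge. Returns (lo, hi, accuracy,
    count) tuples; accuracy is None for empty bins.
    """
    correct = [0] * (len(edges) - 1)
    total = [0] * (len(edges) - 1)
    for probs, truth in samples:
        top1, margin = confidence_and_margin(probs)
        value = margin if use_margin else top1
        predicted = max(probs, key=probs.get)  # most confident class
        for i in range(len(edges) - 1):
            is_last = i == len(edges) - 2
            if edges[i] <= value < edges[i + 1] or (is_last and value == edges[-1]):
                total[i] += 1
                correct[i] += int(predicted == truth)
                break
    return [
        (edges[i], edges[i + 1], correct[i] / total[i] if total[i] else None, total[i])
        for i in range(len(edges) - 1)
    ]


if __name__ == "__main__":
    # Four made-up testing samples (class -> probability, reference label).
    samples = [
        ({"Residential": 0.9, "Public": 0.1}, "Residential"),
        ({"Residential": 0.45, "Public": 0.40, "Industrial": 0.15}, "Public"),
        ({"Industrial": 0.5, "Public": 0.45, "Residential": 0.05}, "Industrial"),
        ({"Industrial": 0.8, "Commercial": 0.2}, "Industrial"),
    ]
    for row in accuracy_by_bin(samples, [0.0, 0.5, 1.0], use_margin=True):
        print(row)
```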

By setting the confidence threshold to 60% and applying the Moore neighborhood to the uncertain grids, we improved the OA by 0.84%, from 80.33% to 81.17%. We then used the Moore neighborhood to predict grid types with different combinations of data sources; the results are shown in Table 6. The lower the RF accuracy, the larger the improvement, which ranges from about 1% to 2% (1.16% to 1.83%) for the satellite-only and satellite + mobile combinations. When the accuracy exceeds 80%, the improvement becomes limited. Overall, the results demonstrate the algorithm's effectiveness in grid land use prediction.

5.2. Parcel Experiments and Performance

In parcel-based land use classification, we also quantified the contributions of different feature combinations to overall accuracy (Figure 17). The RF algorithm achieved an OA of 69.54% with a kappa coefficient of 0.55. Satellite data alone achieved an OA of 57.68%, and social data alone achieved 57.54%. Interestingly, these two accuracies are very close, indicating that social and satellite data are about equally important to land use classification at the parcel level. The 2861 index data achieved an OA of 64.26%, higher than the other social data, which confirms the value of additional data dimensions.

The confusion matrix results are as follows.

For Level I, the RF algorithm achieves an OA of 72.37% with a kappa coefficient of 0.5841 (Table 7).

For Level II, the OA reaches 68.99% with a kappa coefficient of 0.5240 (Table 8).

For the grid study, a land use map was produced by combining the RF predictions with the road, greenspace, and water grids. For the parcel study, a land use map was produced by combining the RF predictions with the greenspace and water parcels; this map is limited to the selected parcels in the area provided by Haihang. Detailed maps of Xiamen are presented in Figure 18.
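Combining the RF predictions with the road, greenspace, and water grids amounts to an overlay in which grids identified by the exclusion-inclusion step replace (or fill in) the RF labels. The sketch below is a minimal version under assumptions not stated in the text: grids are keyed by hypothetical IDs, and when a grid appears in more than one mask, the priority Road, then Water, then Greenspace is applied. That order is an illustrative choice, not the authors' rule.

```python
# Hypothetical overlay of exclusion-inclusion masks on RF grid labels.
DEFAULT_PRIORITY = ("Road", "Water", "Greenspace")  # assumed order, highest first


def compose_land_use_map(rf_labels, masks, priority=DEFAULT_PRIORITY):
    """Merge RF predictions with mask-derived classes.

    rf_labels: dict grid_id -> class predicted by the RF model.
    masks: dict class name -> set of grid_ids flagged for that class.
    A grid flagged by a mask takes the mask class; when several masks flag
    the same grid, the class listed earliest in `priority` wins. Masks for
    classes not listed in `priority` are ignored. Inputs are not modified.
    """
    merged = dict(rf_labels)
    # Apply the lowest-priority mask first so higher-priority masks overwrite it.
    for cls in reversed(priority):
        for grid_id in masks.get(cls, ()):
            merged[grid_id] = cls
    return merged


if __name__ == "__main__":
    rf = {1: "Residential", 2: "Industrial", 3: "Public"}
    masks = {"Road": {2}, "Greenspace": {2, 4}, "Water": {5}}
    print(compose_land_use_map(rf, masks))
```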

5.3. Sensitivity of Training Sample Size

We compared two scenarios in which the training sample sizes of the different land use types were either proportional, as in the raw data, or balanced. The parcel data used in this experiment are summarized in Table 9, and the results are shown in Figure 19. Based on 120,000 experimental runs with the pool of 6284 parcel training samples, we conclude that feeding all features with a total of 6000 training samples reaches an accuracy of 69.54% in the first scenario and 73.25% in the second scenario. Overfitting occurs in both scenarios as more samples are added.
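The two sampling scenarios can be sketched as a per-class quota followed by random sampling without replacement. The pool sizes below are the training sizes in Table 9. How the authors handled classes with fewer samples than a balanced quota is not stated; this sketch simply caps each quota at the class size (so a balanced draw of 6000 returns fewer than 6000 samples) and assigns rounding remainders to the largest classes. Both choices are assumptions, and the function names are hypothetical.

```python
import random

# Training pool sizes per class, taken from Table 9.
POOL = {"Public": 2723, "Residential": 2273, "Industrial": 707, "Commercial": 581}


def class_quotas(pool, n, balanced):
    """Samples to draw per class for a total budget of n (hypothetical helper).

    balanced=True gives every class the same share; balanced=False keeps the
    class proportions of the pool. Rounding remainders go to the largest
    classes, and each quota is capped at the class size (both assumptions).
    """
    classes = sorted(pool)
    if balanced:
        quotas = {c: n // len(classes) for c in classes}
    else:
        total = sum(pool.values())
        quotas = {c: n * pool[c] // total for c in classes}
    remainder = n - sum(quotas.values())  # always smaller than the number of classes
    for c in sorted(classes, key=pool.get, reverse=True)[:remainder]:
        quotas[c] += 1
    return {c: min(q, pool[c]) for c, q in quotas.items()}


def draw_training_indices(pool, n, balanced, seed=0):
    """Draw sample indices per class without replacement."""
    rng = random.Random(seed)
    quotas = class_quotas(pool, n, balanced)
    return {c: rng.sample(range(pool[c]), q) for c, q in quotas.items()}


if __name__ == "__main__":
    print(class_quotas(POOL, 400, balanced=False))
    print(class_quotas(POOL, 400, balanced=True))
```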

5.4. Grid and Parcel Exchange Experiments

We used grid-based land use classification results to predict parcel-based land use: we overlaid the grids on the land parcels, and the dominant land use among the overlapping grids determined each parcel's land use type. The confusion matrix results are listed in Table 10.
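The grid-to-parcel assignment can be sketched as a weighted majority vote. The text says only that the dominant land use of the overlapping grids determines the parcel type; this sketch assumes dominance is measured by total overlap area (a count of overlapping grids would work the same way with unit weights) and that ties go to the class encountered first. The input format and function names are hypothetical.

```python
from collections import defaultdict


def parcel_label_from_grids(overlaps):
    """Dominant land use of one parcel from its overlapping grids.

    `overlaps` is a list of (grid_class, overlap_area) pairs. Areas are
    summed per class and the class with the largest total wins; ties go to
    the class seen first. Returns None for a parcel with no overlapping grids.
    """
    area = defaultdict(float)
    for cls, overlap in overlaps:
        area[cls] += overlap
    if not area:
        return None
    return max(area, key=area.get)


def label_parcels(parcel_overlaps):
    """Apply the rule to every parcel: dict parcel_id -> list of overlaps."""
    return {pid: parcel_label_from_grids(ov) for pid, ov in parcel_overlaps.items()}


if __name__ == "__main__":
    example = {
        "P1": [("Residential", 300.0), ("Industrial", 500.0), ("Residential", 250.0)],
        "P2": [("Public", 120.0)],
        "P3": [],
    }
    print(label_parcels(example))
```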

We also predicted grid-based land use using an RF model trained on parcel training samples, with 6284 parcel training samples and 600 grid testing samples. The confusion matrix results are listed in Table 11.

6. Discussion

6.1. CMU Data Model and Data Granularity

We find that the combination of remote sensing and social data achieves the best land use classification performance. Meanwhile, satellite data combined with the top 5 and top 10 social features achieved OAs of 75.90% and 79.24%, respectively, close to the 80.67% obtained with all features, which indicates redundancy across data dimensions. Principal component analysis (PCA) could be used to address this redundancy in further data model research [47,48]. The experiments also indicate that, based on the CMU framework, the study can be extended efficiently to other cities using fewer but important data selections. Second, the CMU data model abstraction is multi-dimensional, and sparsity along the geometric dimension is common for both grid and parcel types. For example, some greenspace or rural grids contain few or no POIs. Therefore, it is crucial to study the data pattern using data visualization analysis tools such as CMU VAL. Third, additional social data are needed to fill the sparsity along specific dimensions. When additional social information is difficult to obtain, crowdsourcing methods can be explored for adding CMU data sources [31]. Lastly, from a data science perspective, data granularity and transparent feature structures provide insight into the data, and the CMU framework enables such capabilities.
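As one possible way to follow up on the PCA suggestion, the sketch below standardizes a feature matrix and projects it onto its leading principal components using NumPy's SVD, returning the share of variance each component explains. It illustrates standard PCA and is not part of the authors' workflow; standardizing each feature first is an assumption, made because the CMU features (Table 2) mix counts, ratios, and spectral statistics on very different scales.

```python
import numpy as np


def pca(features, n_components):
    """Standardized PCA via SVD.

    features: array-like of shape (n_samples, n_features).
    Returns (scores, explained_ratio): the samples projected onto the first
    n_components principal axes, and the fraction of total variance that
    each of those components explains.
    """
    x = np.asarray(features, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    x = x / std
    # Rows of vt are the principal axes; s**2 is proportional to their variance.
    _, s, vt = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2
    explained_ratio = variance / variance.sum()
    scores = x @ vt[:n_components].T
    return scores, explained_ratio[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=200)
    # Two perfectly correlated columns plus one independent column: the first
    # component should absorb roughly two thirds of the variance.
    data = np.column_stack([a, 2.0 * a, rng.normal(size=200)])
    scores, ratio = pca(data, 2)
    print(scores.shape, np.round(ratio, 3))
```

A redundant feature pair collapses onto a single component, which is the kind of redundancy the top-k results above point to.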

In addition, the proposed grid-based CMU framework can be flexibly extended to other cities in practice: (1) the digital structure of the CMU data model is reusable for other cities or urban areas; (2) data collection and processing work can be reused because remote sensing and social data are available in grid format from satellite imagery, 2861, Zhihuizuji, Baidu, etc. [43], and area-specific data can also be added; (3) for CMU ASL, either already trained algorithms or training samples for a specific city can be reused as a base for other cities. City-specific characteristics can be addressed by tuning the CMU ASL algorithms and adding relevant training samples. As more cities are added, training samples and CMU ASL solutions will accumulate for improved performance and future use; (4) the new methods of combining the Moore neighborhood with the RF algorithm, and of grid and parcel exchange analysis in land use prediction, can also be applied and generalized to other cities and regions in China or internationally.

6.2. Grid and Parcel Exchange Analysis

Improving prediction accuracy requires large volumes of data collection and costly data processing. The data characteristics, or training features, are the same for both parcel-based and grid-based training samples. Therefore, using grid-based training samples to train the RF model and applying it to parcel-based land use classification, for example, saves time and cost. Such practices are common in transfer learning [49]. In this way, one can reuse work already done to collect training samples, which is labor intensive and costly.

Using grid-based land use classification results to predict parcel land use, we found that the OA reached 71.36% (Table 10), which is comparable to, or even better than, that of the parcel-based trained model itself (69.54% with all features; 72.37% for Level I). Because the grid-based model is more efficient to set up, this finding suggests that a grid-based model can be used for parcel-based land use classification. Conversely, when the parcel-trained RF model was used to predict grid land use, an OA of 71.69% was achieved (Table 11). This is lower than the OA of the grid-trained model, but it verifies the feasibility of this new method.

6.3. Sensitivity Analysis of Training Sample Size

We find that prediction accuracy improves as the training sample size increases, and improves much faster while the sample size is less than 1000. As shown in Figure 19a, it takes 80 samples to reach 70% of the accuracy range (from 0% to 69.54%), 320 samples to reach 80%, and 1200 samples to reach 90%, which is consistent with the stable classification concept of Gong et al. [50]. We verified that these three key sample-size points are the same across different feature combinations in the first scenario of sample size testing. In the second scenario, 40, 80, and 800 samples, respectively, are needed to reach the same levels. This is somewhat surprising, since it suggests that the sample size range needed for a given RF training accuracy can be predicted.
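The sample sizes quoted above can be read off a learning curve by finding the smallest training size whose accuracy reaches a given fraction of the accuracy range, here taken from 0 to the final accuracy (69.54% in the first scenario). The sketch below implements that lookup; the function name is hypothetical, and the curve in the example is made up to exercise the function and is not the measured data behind Figure 19.

```python
def samples_to_reach(curve, fraction, max_accuracy=None):
    """Smallest training size whose accuracy reaches `fraction` of the range.

    curve: iterable of (n_samples, accuracy) points, in any order.
    The accuracy range is taken as 0 to `max_accuracy` (default: the best
    accuracy on the curve), so the target is fraction * max_accuracy.
    Returns None if no point reaches the target.
    """
    points = sorted(curve)
    if max_accuracy is None:
        max_accuracy = max(acc for _, acc in points)
    target = fraction * max_accuracy
    for n, acc in points:
        if acc >= target:
            return n
    return None


if __name__ == "__main__":
    # Made-up learning curve used only to exercise the function.
    curve = [(10, 0.20), (50, 0.45), (100, 0.52), (500, 0.60), (2000, 0.66), (6000, 0.70)]
    for f in (0.7, 0.8, 0.9):
        print(f, samples_to_reach(curve, f))
```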

6.4. Limitations and Future Research

Our study was limited to the urban area of Xiamen city and has not yet covered rural areas or other cities. In addition, we used the RF algorithm in this study because it has proven effective in land cover and land use classification; however, other, potentially more effective AI algorithms are worth exploring. Although we collected and processed a large amount of data and identified the top feature contributions, we concluded that additional data sources are needed to further improve accuracy. To expand the CMU model in volume and dimensionality, we believe that new AI-based and other methods for automatically collecting and processing data define future research needs and digital technology trends.

The CMU framework can be used to address urban housing and renewal challenges by analyzing the quantity, quality, and distribution of need [7], and it provides policymakers with timely updated information [51]. The CMU data model is multidimensional, so additional data can be added and accumulated. As more cities are added, comparisons among similar regions and cities can be made. With accumulated data, future growth projections and carrying capacity studies can be added [52]. Such capabilities are very important for policymaking and urban renewal planning. In addition, the CMU framework can be further used to study city-specific functions, such as traffic control, economic analysis, environmental management, renewable energy planning, etc.

In particular, there are several areas of research interest for the future: (1) the grid-based land use methodology can be extended to cover all cities, and knowledge sharing of CMU ASL AI algorithms and training samples can be studied further, as has been done for global land cover studies [37]; (2) when setting up the CMU data model for multiple cities with millions of data dimensions, automated AI methods become necessary to collect and process data for the model, and such studies are important for future digital and intelligent city research; (3) different AI algorithms, including SVM [53], ANN, and CNN [54], can be added to and/or tested with the CMU data model; (4) applications using AI algorithms can be added to the CMU Application Solution Layer (CMU ASL) for urban renewal development, traffic management, renewable energy planning, etc.; (5) we used exclusion-inclusion techniques [44] to identify green, water, and road grids, and combining these results with the RF predictions yields an overall accuracy of 84.06%. Since roads, greenspace, and water cover large land areas, further analysis can be extended to rural areas and to combinations of CMU ASL algorithms; and (6) as social data sources accumulate over time, city development projections and simulations can be added to the model [25,55,56,57].

7. Conclusions

To address urban land use classification challenges along the development path from the digital city and the intelligent city to the current meta city, we established a CMU framework for city-specific studies. First, it combines remote sensing data with open social data and can serve as a framework for meta-city studies, from data collection to feature summation and abstraction across space and time (historical simulation and future projections). The CMU framework consists of five functional layers: the Foundation Layer (CMU FL), Summation Feature Layer (CMU SFL), Density Index Layer (CMU DIL), Visualization Analysis Layer (CMU VAL), and Application Solution Layer (CMU ASL). Second, we implemented the proposed CMU framework for systematic grid-based land use classification, using the city of Xiamen as a testbed. Third, the large data volume of meta-city analysis poses challenges for data collection and processing. We studied the relationship between grid-based and parcel-based land use classifications and concluded that the two models complement each other and can be reused robustly. Fourth, we accounted for the influence of neighboring grids on each other by applying the Moore neighborhood concept. By integrating Moore neighborhood methods, we can improve RF accuracy by resolving RF prediction uncertainty, which can serve as a good strategy for guiding future land use classification practice. Finally, the proposed grid-based CMU framework can be flexibly extended to other cities in practice, mainly because remote sensing and social data are available from satellite imagery, 2861, Zhihuizuji, Baidu, etc. Such a study should make urban land use analysis and planning more effective by leveraging fast-advancing digital twin technology, since most social data are more conveniently available in grid format.
This study presents a detailed demonstration of a data-rich, model-driven framework for essential urban land use classification, which can be adapted to other cities across the globe.

Author Contributions

Conceptualization, X.W.; methodology, X.W. and P.G.; formal analysis, X.W., B.C., X.L. (Xuecao Li) and X.L. (Xianyao Ling); writing—original draft preparation, X.W., Y.Z. and X.L. (Xianyao Ling); writing—review and editing, B.C. and X.L. (Xuecao Li); software, J.W. and Y.Z.; investigation, X.W., B.C., and X.L. (Xuecao Li); resources, X.W. and P.G.; data curation, W.L., W.W., J.W., Y.Z. and X.L. (Xianyao Ling); visualization, X.W. and Y.Z.; supervision, X.W.; funding acquisition, P.G. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the City Digital Appraisal research project and the High-resolution GIS Study for Urban Management Phase II project established by the Ministry of Housing and Urban-Rural Development of the People’s Republic of China.

Acknowledgments

This research was supported by the Department of Automation, AI for Earth Laboratory, Cross-Strait Research Institute, Tsinghua University. We express our appreciation to Professor Jun Li for his guidance on the data model study, and to our laboratory colleagues Linping Deng, Shengjie Xu, and Yifan Liu for their support with field inspection and software preparation.

Conflicts of Interest

The authors declare no conflict of interest.

References

The World Bank. Available online: https://www.worldbank.org/en/topic/urbandevelopment/overview (accessed on 16 July 2020).
United Nations. World Urbanization Prospects: The 2014 Revision; United Nations: New York, NY, USA, 2015.
Angel, S.; Parent, J.; Civco, D.L.; Blei, A.; Potere, D. The dimensions of global urban expansion: Estimates and projections for all countries, 2000–2050. Prog. Plan. 2011, 75, 53–107.
Gong, P.; Liang, S.; Carlton, E.J.; Jiang, Q.; Wu, J.; Wang, L.; Remais, J.V. Urbanisation and health in China. Lancet 2012, 379, 843–852.
Clinton, N.; Gong, P. MODIS detected surface urban heat islands and sinks: Global locations and controls. Remote Sens. Environ. 2013, 134, 294–304.
Yang, J.; Gong, P.; Fu, R.; Zhang, M.; Chen, J.; Liang, S.; Xu, B.; Shi, J.; Dickinson, R. The role of satellite remote sensing in climate change studies. Nat. Clim. Change 2013, 3, 875–883.
Shahab, S.; Hartmann, T.; Jonkman, A. Strategies of municipal land policies: Housing development in Germany, Belgium, and Netherlands. Eur. Plan. Stud. 2020, 29, 1132–1150.
Zhou, Y.; Li, X.; Asrar, G.R.; Smith, S.J.; Imhoff, M. A global record of annual urban dynamics (1992–2013) from nighttime lights. Remote Sens. Environ. 2018, 219, 206–220.
Wang, L.; Li, C.; Ying, Q.; Cheng, X.; Wang, X.; Li, X.; Hu, L.; Liang, L.; Yu, L.; Huang, H.; et al. China’s urban expansion from 1990 to 2010 determined with satellite remote sensing. Chin. Sci. Bull. 2012, 57, 2802–2812.
Gong, P. Remote sensing of environmental change over China: A review. Chin. Sci. Bull. 2012, 57, 2793–2801.
Chen, B.; Tu, Y.; Song, Y.; Theobald, D.M.; Zhang, T.; Ren, Z.; Li, X.; Yang, J.; Wang, J.; Wang, X.; et al. Mapping essential urban land use categories with open big data: Results for five metropolitan areas in the United States of America. ISPRS J. Photogramm. Remote Sens. 2021, 178, 203–218.
Chen, B.; Xu, B.; Gong, P. Mapping essential urban land use categories (EULUC) using geospatial big data: Progress, challenges, and opportunities. Big Earth Data 2021, 5, 410–441.
Li, X.; Zhou, Y. Urban mapping using DMSP/OLS stable nighttime light: A review. Int. J. Remote Sens. 2017, 38, 6030–6046.
Huang, H.; Chen, Y.; Clinton, N.; Wang, J.; Wang, X.; Liu, C.; Gong, P.; Yang, J.; Bai, Y.; Zheng, Y.; et al. Mapping major land cover dynamics in Beijing using all Landsat images in Google Earth Engine. Remote Sens. Environ. 2017, 202, 166–176.
Hartmann, T.; Jehling, M. From diversity to justice—Unraveling pluralistic rationalities in urban design. Cities 2019, 91, 58–63.
Rayner, J.; Howlett, M. Introduction: Understanding integrated policy strategies and their evolution. Policy Soc. 2009, 28, 99–109.
Yu, L.; Shi, Y.; Gong, P. Land cover mapping and data availability in critical terrestrial ecoregions: A global perspective with Landsat thematic mapper and enhanced thematic mapper plus data. Biol. Conserv. 2015, 190, 34–42.
Gong, P.; Zhang, W.; Yu, L.; Li, C. A new research paradigm for global land cover mapping. Ann. GIS 2016, 22, 87–102.
Yu, L.; Wang, J.; Gong, P. Improving 30 m global land-cover map FROM-GLC with time series MODIS and auxiliary data sets: A segmentation-based approach. Int. J. Remote Sens. 2013, 34, 5851–5867.
Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2012, 34, 2607–2654.
Chen, B.; Huang, B.; Xu, B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J. Photogramm. Remote Sens. 2017, 124, 27–39.
Gong, P.; Marceau, D.J.; Howarth, P.J. A comparison of spatial feature extraction algorithms for land-use classification with SPOT HRV data. Remote Sens. Environ. 1992, 40, 137–151.
Zhang, X.; Chen, G.; Wang, W.; Wang, Q.; Dai, F. Object-based land-cover supervised classification for very-high-resolution UAV images using stacked denoising autoencoders. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3373–3385.
Li, X.; Gong, P.; Liang, L. A 30-year (1984–2013) record of annual urban dynamics of Beijing City derived from Landsat data. Remote Sens. Environ. 2015, 166, 78–90.
Li, X.; Gong, P. Urban growth models: Progress and perspective. Sci. Bull. 2016, 61, 1637–1650.
Chen, B.; Song, Y.; Huang, B.; Xu, B. A novel method to extract urban human settlements by integrating remote sensing and mobile phone locations. Sci. Remote Sens. 2020, 1, 100003.
Liu, Y.; Wang, F.; Xiao, Y.; Gao, S. Urban land uses and traffic ‘source-sink areas’: Evidence from GPS-enabled taxi data in Shanghai. Landsc. Urban Plan. 2012, 106, 73–87.
Jia, Y.; Ge, Y.; Ling, F.; Guo, X.; Wang, J.; Wang, L.; Chen, Y.; Li, X. Urban Land Use Mapping by Combining Remote Sensing Imagery and Mobile Phone Positioning Data. Remote Sens. 2018, 10, 446.
Hu, T.; Yang, J.; Li, X.; Gong, P. Mapping urban land use by using landsat images and open social data. Remote Sens. 2016, 8, 151.
Zhong, Y.; Su, Y.; Wu, S.; Zheng, Z.; Zhao, J.; Ma, A. Open-source data-driven urban land-use mapping integrating point-line-polygon semantic objects: A case study of Chinese cities. Remote Sens. Environ. 2020, 247, 111838.
Gong, P.; Chen, B.; Li, X.; Liu, H.; Wang, J.; Bai, Y.; Chen, J.; Chen, X.; Fang, L.; Feng, S.; et al. Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 2020, 65, 182–187.
Tu, Y.; Chen, B.; Zhang, T.; Xu, B. Regional Mapping of Essential Urban Land Use Categories in China: A Segmentation-Based Approach. Remote Sens. 2020, 12, 1058.
Sun, J.; Wang, H.; Song, Z.; Lu, J.; Meng, P.; Qin, S. Mapping Essential Urban Land Use Categories in Nanjing by Integrating Multisource Big Data. Remote Sens. 2020, 12, 2386.
Zong, L.; He, S.; Lian, J.; Bie, Q.; Wang, X.; Dong, J.; Xie, Y. Detailed Mapping of Urban Land Use Based on Multi-Source Data: A Case Study of Lanzhou. Remote Sens. 2020, 12, 1987.
Su, M.; Guo, R.; Chen, B.; Hong, W.; Wang, J.; Feng, Y.; Xu, B. Sampling Strategy for Detailed Urban Land Use Classification: A Systematic Analysis in Shenzhen. Remote Sens. 2020, 12, 1497.
Mao, W.; Lu, D.; Hou, L.; Liu, X.; Yue, W. Comparison of Machine-Learning Methods for Urban Land-Use Mapping in Hangzhou City, China. Remote Sens. 2020, 12, 2817.
Zhao, Y.; Gong, P.; Yu, L.; Hu, L.; Li, X.; Li, C.; Zhang, H.; Zheng, Y.; Wang, J.; Zhao, Y.; et al. Towards a common validation sample set for global land-cover mapping. Int. J. Remote Sens. 2014, 35, 4795–4814.
Li, C.; Gong, P.; Wang, J.; Zhu, Z.; Biging, G.S.; Yuan, C.; Hu, T.; Zhang, H.; Wang, Q.; Li, X.; et al. The first all-season sample set for mapping global land cover with Landsat-8 data. Sci. Bull. 2017, 62, 508–515.
Rao, D.S.; Prasad, A.V.V.; Nair, T. Application of Texture Characteristics for Urban Feature Extraction from Optical Satellite Images. Int. J. Image Graph. Signal Process. 2014, 7, 16–24.
Open Spatial Demographic Data and Research. Available online: https://www.worldpop.org/ (accessed on 10 January 2021).
Goodchild, M.F. Citizens as sensors: The world of volunteered geography. GeoJournal 2007, 69, 211–221.
Haklay, M. How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environ. Plan. B Plan. Des. 2010, 37, 682–703.
Liu, X.; Long, Y. Automated identification and characterization of parcels with OpenStreetMap and points of interest. Environ. Plan. B Plan. Des. 2015, 43, 341–360.
Li, X.; Gong, P. An “exclusion-inclusion” framework for extracting human settlements in rapidly developing regions of China from Landsat images. Remote Sens. Environ. 2016, 186, 286–296.
Li, X.; Liu, X.; Gong, P. Integrating ensemble-urban cellular automata model with an uncertainty map to improve the performance of a single model. Int. J. Geogr. Inf. Sci. 2015, 29, 762–785.
Chen, J.; Gong, P.; He, C.; Luo, W.; Tamura, M.; Shi, P. Assessment of the Urban Development Plan of Beijing by Using a CA-Based Urban Growth Model. Photogramm. Eng. Remote Sens. 2002, 68, 1063–1072.
Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. A Math. Phys. Eng. Sci. 2016, 374, 20150202.
Uddin, M.P.; Mamun, M.A.; Hossain, M.A. PCA-based Feature Reduction for Hyperspectral Remote Sensing Image Classification. IETE Tech. Rev. 2020, 38, 377–396.
Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 2019, 64, 370–373.
Shach-Pinsly, D.; Bindreiter, S.; Porat, I.; Sussman, S.; Forster, J.; Rinnerthaler, M. Multiparametric Analysis of Urban Environmental Quality for Estimating Neighborhood Renewal Alternatives. Urban Plan. 2021, 6, 172–188.
Graymore, M.L.M.; Sipe, N.G.; Rickson, R.E. Sustaining Human Carrying Capacity: A tool for regional sustainability assessment. Ecol. Econ. 2010, 69, 459–468.
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
Bao, H.; Ming, D.; Guo, Y.; Zhang, K.; Zhou, K.; Du, S. DFCNN-Based Semantic Recognition of Urban Functional Zones by Integrating Remote Sensing Data and POI Data. Remote Sens. 2020, 12, 1088.
Li, X.; Yeh, A.G. Modelling sustainable urban development by the integration of constrained cellular automata and GIS. Int. J. Geogr. Inf. Sci. 2000, 14, 131–152.
Liu, X.; Li, X.; Shi, X.; Wu, S.; Liu, T. Simulating complex urban development using kernel-based non-linear cellular automata. Ecol. Model. 2008, 211, 169–181.
Li, X.; Gong, P.; Yu, L.; Hu, T. A segment derived patch-based logistic cellular automata for urban growth modeling with heuristic rules. Comput. Environ. Urban Syst. 2017, 65, 140–149.

Figure 1. (a) Gaofen-7 image of Xiamen; (b) Xiamen location; and (c) enlarged subset of the red-frame area.

Figure 2. POI and 2861 index data of Xiamen. (a) POI data; and (b) 2861 shopping convenience level.

Figure 3. Mobile data of Xiamen: (a) resident data; (b) working data; and (c) visit data.

Figure 4. Building data of Xiamen: (a) building data in the main urban area; and (b) zoomed-in area of the red frame.

Figure 5. Road data of Xiamen: (a) all roads; (b) primary/secondary/tertiary roads; and (c) residential roads.

Figure 6. Diagram of CMU framework.

Figure 7. Grid-based POI numbers: (a) POIs in the 3D display; and (b) zoomed-in area of the red frame.

Figure 8. (a) Ontology description of CMU data sources; and (b) zoomed-in area of the red frame.

Figure 9. Grid and Parcel Land Use Process.

Figure 10. (a) Road grids; and (b) zoomed-in area of the red frame.

Figure 11. (a) Greenspace grids; and (b) zoomed-in area of the red frame.

Figure 12. (a) Water grids; and (b) zoomed-in area of the red frame.

Figure 13. (a) Moore 3 × 3 neighborhood representation; and (b) zoomed-in area of the red frame.

Figure 14. Accuracy of different feature combinations for Level I (run 1000 times, 2800 training, and 600 testing samples): (a) single data sources; and (b) feature combinations.

Figure 15. Feature importance and contributions for the RF model (run 1000 times, 2800 training and 600 testing samples): (a) importance of all features; (b) importance of social features; and (c) feature combinations.

Figure 16. Prediction probability distribution of the RF model: (a) most confident prediction; and (b) difference between the first and second most confident predictions.

Figure 17. Accuracy of different feature combinations for Level I (run 1000 times, 6000 training, and 699 testing samples).

Figure 18. (a) Land use grid-based urban area map of Xiamen; (b) zoomed-in area of the right red frame in graph (a); (c) zoomed-in area of the red frame in graph (b); (d) parcel-based map for the same area of graph (c); (e) zoomed-in area of the left red frame in graph (a); (f) parcel-based map for the same area of graph (e).

Figure 19. Accuracy for different training sample sizes (run 1000 times each): (a) balanced training samples; (b) from 0 to 400 samples of graph (a); (c) proportional training samples; and (d) from 0 to 400 samples of graph (c).

Table 1. Land Usage Classification (GULUC/PULUC).

Level I	Level II	Descriptions
01 Residential	0101 Low-rise building	Houses and low-rise apartments.
01 Residential	0102 High-rise building	High-rise buildings.
02 Commercial	0201 Business office	Buildings where people work, including office buildings and commercial offices for finance, media, etc.
02 Commercial	0202 Commercial service	Houses and buildings for commercial retail, restaurants, and entertainment.
03 Industrial	0301 Industrial	Land and buildings used for manufacturing, warehouse, mining, etc.
04 Public Management & Service	0401 Administrative, Education, Medical and Sport	Land used for administrative, educational, medical, and sports purposes.
05 Road (grid only)	0501 Road first class; 0502 Road second class; 0503 Road third class	Paved roads, including freeways and major and minor city roads.
06 Greenspace	0601 Greenspace	Woodland, grassland, farmland and other greenspace.
07 Water	0701 Water	Lakes, rivers and other surfaces of water.

Table 2. Summary of features from CMU FL.

Data Sources	Features	Count
Satellite Spectral	ndviMEAN, ndviSTD, ndviVAR, ndwiMEAN, ndwiSTD, ndwiVAR, b1MEAN, b1STD, b1VAR, b2MEAN, b2STD, b2VAR, b3MEAN, b3STD, b3VAR, b4MEAN, b4STD, b4VAR	18
Satellite Texture (grid only)	mean, variance, homogeneity, contrast, dissimilarity, entropy, second moment and correlation calculated by the Grey Level Co-occurrence Matrix (GLCM) of each spectral band	32
POIs	residential ratio, residential total, commercial ratio, commercial total, transportation ratio, transportation total, public ratio, public total, total number, company number	10
2861 index	traffic outflow, traffic inflow, traffic comfort, medical comfort, residence, labor grade, business population, evening peak outflow, evening peak inflow, evening peak speed, morning peak outflow, morning peak inflow, morning peak speed, kindergarten comfort, primary school comfort, consumption level, shopping comfort, community average price	18
WorldPop	pop density	1
Building	height	1
Mobile statistic	work pop, resident pop, visit pop	3

Table 3. Grid confusion matrix for Level I (2800 training & 600 testing samples).

	Residential	Public	Commercial	Industrial	UA	PA
Residential	204	25	14	22	87.55%	76.98%
Public	12	75	4	9	68.81%	75.00%
Commercial	7	5	38	6	67.86%	67.86%
Industrial	10	4	0	165	81.68%	92.18%
OA = 80.33%, Kappa coefficient = 0.7146

Table 4. Grid confusion matrix for Level I (Moore Neighborhood addition).

	Residential	Public	Commercial	Industrial	UA	PA
Residential	207	21	11	26	87.71%	78.11%
Public	15	74	2	9	71.15%	74.00%
Commercial	7	5	38	6	74.51%	67.86%
Industrial	7	4	0	168	80.38%	93.85%
OA = 81.17%, Kappa coefficient = 0.7253

Table 5. Grid confusion matrix for Level II (2100 training & 516 testing samples).

	Low Resident	High Resident	Business	Commercial	Industrial	Adm. etc.	UA	PA
Low Resident	130	0	1	16	24	3	84.42%	74.71%
High Resident	0	60	0	2	0	1	74.07%	95.24%
Business	1	9	7	1	1	0	77.78%	36.84%
Commercial	7	6	1	18	6	0	39.13%	47.37%
Industrial	6	0	0	1	157	4	80.51%	93.45%
Adm. etc.	10	6	0	8	7	23	74.19%	42.59%
OA = 76.55%, Kappa coefficient = 0.6847

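Table 6. Overall accuracy of random forest (RF) classification with and without Moore neighborhood correlations, by feature combination.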

Feature Combination	RF	RF + Moore Neighborhood	Improvement (percentage points)
Satellite	65.50%	67.33%	1.83
Satellite + mobile	71.67%	72.83%	1.16
All features	80.33%	81.17%	0.84
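Adding the Moore neighborhood raises OA by 0.84 to 1.83 percentage points in Table 6, but this section does not define how neighborhood information enters the model. The Moore neighborhood of a grid cell is the eight cells that share an edge or a corner with it. One plausible reading, assumed here, is that each cell receives extra features summarizing those eight neighbors. The sketch below uses the neighbor mean of every feature layer; the summary statistic and the function names are illustrative assumptions, not the authors' method:

```python
def moore_neighbor_mean(grid):
    """Mean of the (up to) 8 Moore neighbours of every cell of a 2-D grid.

    `grid` is a list of equal-length rows; None marks a no-data cell.
    Neighbours outside the grid or equal to None are skipped, so border
    cells average over fewer than 8 values (None if no valid neighbour).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [
                grid[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)  # exclude the centre cell itself
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
                and grid[r + dr][c + dc] is not None
            ]
            out[r][c] = sum(values) / len(values) if values else None
    return out


def add_moore_features(feature_grids):
    """Append a neighbourhood-mean layer for every feature layer.

    `feature_grids` maps feature name -> 2-D grid; the result keeps the
    original layers and adds a '<name>_moore' layer for each of them.
    """
    augmented = dict(feature_grids)
    for name, grid in feature_grids.items():
        augmented[f"{name}_moore"] = moore_neighbor_mean(grid)
    return augmented


print(moore_neighbor_mean([[0, 1, 2], [3, 4, 5], [6, 7, 8]])[1][1])  # 4.0
```

Under this assumption, the original and neighborhood layers would be flattened into one feature row per grid cell and passed to the RF classifier unchanged.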

Table 7. Parcel confusion matrix for Level I (4800 training & 485 testing samples).

	Residential	Public	Commercial	Industrial	UA	PA
Residential	197	24	11	21	84.19%	77.87%
Public	18	55	8	8	63.95%	61.80%
Commercial	12	6	35	12	58.33%	53.85%
Industrial	7	1	6	64	60.95%	82.05%
OA = 72.37%, Kappa coefficient = 0.5841

Table 8. Parcel confusion matrix for Level II (4000 training & 445 testing samples).

	Residential	Business	Commercial	Industrial	Adm. etc.	UA	PA
Residential	188	7	11	28	19	85.07%	74.31%
Business	7	13	2	5	3	48.15%	43.33%
Commercial	4	2	17	7	4	50.00%	50.00%
Industrial	13	3	2	59	2	56.19%	74.68%
Adm. etc.	9	2	2	6	30	51.72%	61.22%
OA = 68.99%, Kappa coefficient = 0.5240

Table 9. Original sample structure.

Category	Training Size	Testing Size
Public	2723	303
Residential	2273	253
Industrial	707	78
Commercial	581	65
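The testing sizes in Table 9 are close to 10% of each class's total sample count (for example, Public 303 of 3026 and Commercial 65 of 646), which is consistent with a per-class (stratified) 90/10 split, although the split procedure is not described in this section. A minimal sketch of such a split, assuming a 10% test fraction and a fixed random seed; the function name is hypothetical:

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split sample indices per class so each class keeps the same test share.

    Returns (train_indices, test_indices); within each class the test
    count is round(class_size * test_fraction).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = (["Public"] * 3026 + ["Residential"] * 2526
          + ["Industrial"] * 785 + ["Commercial"] * 646)
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 6284 699
```

Splitting within each class keeps the minority classes (Industrial, Commercial) represented in the test set in the same proportion as in training.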

Table 10. Confusion matrix of grid and parcel exchange experiment I.

	Residential	Public	Commercial	Industrial	UA	PA
Residential	225	7	6	28	71.66%	84.59%
Public	27	53	4	14	74.65%	54.08%
Commercial	31	3	10	11	47.62%	18.18%
Industrial	31	8	1	138	72.25%	77.53%
OA = 71.36%, Kappa coefficient = 0.6121

Table 11. Confusion matrix of grid and parcel exchange experiment II.

	Residential	Public	Commercial	Industrial	UA	PA
Residential	224	13	1	20	72.44%	87.96%
Public	30	55	0	16	79.73%	59.00%
Commercial	38	3	6	8	87.50%	12.96%
Industrial	31	3	1	151	78.46%	85.00%
OA = 71.69%, Kappa coefficient = 0.6825

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Citation (MDPI and ACS Style)

Wang, X.; Chen, B.; Li, X.; Zhang, Y.; Ling, X.; Wang, J.; Li, W.; Wen, W.; Gong, P. Grid-Based Essential Urban Land Use Classification: A Data and Model Driven Mapping Framework in Xiamen City. Remote Sens. 2022, 14, 6143. https://doi.org/10.3390/rs14236143
