1. Introduction
Urbanization has greatly changed our living environments, and more than half of the global population resides in urban areas [1]. China has undergone the fastest urbanization worldwide over the past three decades, and its artificial impervious area ranked first in 2015 [2]. For better urban planning, spatial governance, and sustainable development of urbanized areas in China, more up-to-date, detailed, and accurate land use classification is critically important.
Thus far, detailed urban land use classification in China has been performed only through field surveys [3,4]. Currently, only a few major cities, such as Shenzhen, Wuhan, and Chongqing, have detailed urban land use classifications at the entire city level [3,5,6,7]. This is an important task for the Third Terrestrial Survey of China [8].
Field surveys are time consuming and laborious, and researchers have long been committed to improving the efficiency of land use classification through remote sensing technology [9,10,11,12,13,14,15,16,17,18,19,20]. Gong and his colleagues were among the earliest researchers to use spatial-context information in addition to spectral data from satellite images to map urban land use categories, and their algorithms have been adopted in mapping global settlement areas [21]. However, because they are limited to measurements of physical properties, the above-mentioned methods, which involve only spectral, texture, and structural features, face challenges in effectively differentiating among residential, industrial, commercial, and service types of land use.
In 2000, Zhang et al. proposed conducting urban land use classification by integrating GIS and remote sensing data [22]. In 2007, Goodchild noted that volunteered geographic information (VGI) can be used as a new data source for urban land use classification [23]. Information from OpenStreetMap (OSM), points of interest (POI), and social data, such as traffic trace data of individuals, taxis, and public transportation, can all be applied to urban land use mapping [24,25,26,27,28,29,30]. VGI can be used as an important supplement to remotely sensed data in the detailed mapping of urban land use [31] and has since become a new focus area of research [32,33,34,35,36,37,38,39,40]. The most influential work was the mapping of essential urban land use categories (EULUC) in all cities in China by 70 researchers from more than 30 organizations [40].
Because it is impossible to determine the classification results simply through visual interpretation of images, the difficulty and workload of sample collection are increasing exponentially, representing a difficult challenge for most researchers. Researchers are in urgent need of sampling strategies as a guide to achieve more effective classification with relatively low labor costs. In the field of traditional land use/land cover, scholars have accumulated a large number of samples over a long time and quantitatively analyzed the impact of the sample number and other conditions on classification accuracy [41,42,43,44,45]. However, detailed urban land use classification is a new research focus; most studies use a limited number of sample units to test experimental classification methods, and no research results regarding the optimal sampling strategies have been reported [31,34,36,40,46,47,48].
In this study, we take advantage of the availability of an urban land use map of Shenzhen city that has been generated through a field survey of the entire city. By converting the map into a parcel-based land use map, we obtain a complete sample set for experiments with various sample sizes. Based on this map, we evaluate the impact of the sample size and the land use mix of samples on the resulting classification accuracy.
3. Experimental Tests and Results
3.1. The Impact of the Sample Size
We set up two experiments. The first tested the influence of the training sample size on classification accuracy. From the complete sample set, 30% of the samples were drawn by stratified random sampling as validation samples, and the remaining samples were used as training samples. The number of training samples was then reduced by 1% at each step, and at each step the random sampling was repeated k times. The second experiment tested the influence of the validation sample size on the accuracy evaluation. From the complete sample set, 35% of the samples were drawn by stratified random sampling as training samples, and the remaining samples were used as validation samples. The number of validation samples was reduced by 1% at each step, and at each step the random sampling was repeated k times. For k = 5, the accuracy of each classification and the average accuracy are shown in Figure 6.
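The design of the first experiment can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the function names and the `evaluate` callback (which would train a classifier and return its accuracy) are hypothetical, and the reduced training sets are drawn here by simple random sampling because the text does not say whether the reduction was stratified.

```python
import random
from collections import defaultdict


def stratified_split(labels, holdout_frac, rng):
    """Return (kept_idx, holdout_idx), drawing holdout_frac from every class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept, holdout = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_hold = round(len(members) * holdout_frac)
        holdout.extend(members[:n_hold])
        kept.extend(members[n_hold:])
    return kept, holdout


def training_size_curve(labels, evaluate, val_frac=0.30, step=0.01, k=5, seed=0):
    """Shrink the training set in `step` increments, resampling k times per size.

    evaluate(train_idx, val_idx) is a caller-supplied (hypothetical) function
    that trains a classifier on train_idx and returns accuracy on val_idx.
    Returns {fraction: (list of k accuracies, mean accuracy)}.
    """
    rng = random.Random(seed)
    train_pool, val_idx = stratified_split(labels, val_frac, rng)
    n_steps = round(1 / step)
    curve = {}
    for i in range(n_steps, 0, -1):
        frac = i / n_steps
        n_train = max(1, round(len(train_pool) * frac))
        # Assumption: plain random subsampling of the training pool.
        runs = [evaluate(rng.sample(train_pool, n_train), val_idx) for _ in range(k)]
        curve[round(frac, 2)] = (runs, sum(runs) / k)
    return curve
```

The second experiment would follow the same pattern with the roles swapped: a fixed 35% training set and a validation set that shrinks step by step.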
We define accuracy as stable when the classification accuracy obtained with the reduced sample set differs by no more than 1% from that obtained with all samples. Experiment One shows that the relationship between the number of samples and accuracy follows the rule of stable classification with limited samples (Gong, Liu, et al., 2019). The classification accuracy remained stable until the number of training samples was reduced to 61% of all training samples (5540, accounting for 40% of all urban parcels). When the number was reduced to 10% (908, approximately 7% of all urban parcels), the classification accuracy began to decline significantly.
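The stability criterion can be expressed as a short check on an accuracy curve. The sketch below is only an illustration: it assumes the 1% tolerance is an absolute difference in accuracy (the text does not say whether it is absolute or relative), and the function name is hypothetical.

```python
def smallest_stable_fraction(mean_accuracy, tolerance=0.01):
    """Return the smallest sample fraction down to which accuracy stays stable.

    mean_accuracy maps a sample fraction (1.0 = all samples) to mean accuracy.
    Fractions are scanned from largest to smallest, and the scan stops at the
    first fraction whose accuracy differs from the full-sample accuracy by
    more than `tolerance` (assumed to be an absolute difference).
    """
    baseline = mean_accuracy[1.0]
    stable = 1.0
    for frac in sorted(mean_accuracy, reverse=True):
        if abs(baseline - mean_accuracy[frac]) > tolerance:
            break
        stable = frac
    return stable
```

Applied to a curve of mean accuracies per sample fraction, the function returns the last fraction before the accuracy leaves the tolerance band, which corresponds to the 61% point reported for Experiment One.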
Experiment Two shows that as the number of validation samples decreases, the range of the accuracy evaluation results increases. Taking the average accuracy as the measure, the accuracy evaluation results were no longer stable once the number of validation samples was reduced to 14% of all validation samples (1178, approximately 9% of all urban parcels).
In summary, to obtain stable and reliable classification results, the training samples should cover at least 40% of the total number of parcels, or no fewer than 5500 parcels. The validation samples should cover at least 10% of the total number of parcels, or no fewer than 1200 parcels. If labor is limited, a cost-effective scheme requires the training samples to cover at least 7% of all parcels, or no fewer than 900 parcels. In this case, the maximum accuracy loss was not greater than 7%.
3.2. Impact of the Sample Purity
In this experiment, the influence of the sample purity on the classification accuracy was tested. In most current studies of urban land use classification, the level of mixed land use in the study area is not high, and the training samples always have high purity [31,39,40]. The mixed-use level of land in Shenzhen is high, and there are many low-purity parcels. Therefore, it is necessary to study whether it is reasonable to select high-purity samples as training samples (Figure 7).
We selected seven categories of 11,034 parcels for the test. The specific categories included urban residential, urban village, business and finance, storage, other commercial, industrial, instructional and research, parks and green space.
Among them, 30% of the parcels were drawn by stratified random sampling as validation samples, and the remaining parcels were used as the mixed-purity [0,100%] sample set. Then, we divided the mixed-purity set into high-purity (≥90%), medium-purity (60–90%), and low-purity (≤60%) sets. Finally, we randomly selected the same number of training samples from each of the above four sets, and the results are shown in Figure 8.
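The construction of the four equally sized training sets can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes each parcel's purity is available as a fraction in [0, 1], the function names are hypothetical, and because the stated bin edges overlap at 60% and 90%, boundary values are assigned to the higher bin here.

```python
import random


def purity_bin(purity):
    """Map a parcel purity in [0, 1] to the bins named in the text.

    Assumption: values exactly at 0.60 or 0.90 fall into the higher bin,
    since the text gives overlapping edges (<=60%, 60-90%, >=90%).
    """
    if purity >= 0.90:
        return "high"
    if purity >= 0.60:
        return "medium"
    return "low"


def equal_size_training_sets(purities, n_train, seed=0):
    """Draw n_train parcel indices from the mixed set and from each purity bin.

    Raises ValueError (from random.sample) if any set has fewer than n_train parcels.
    """
    rng = random.Random(seed)
    pools = {"mixed": list(range(len(purities))), "high": [], "medium": [], "low": []}
    for idx, purity in enumerate(purities):
        pools[purity_bin(purity)].append(idx)
    return {name: rng.sample(pool, n_train) for name, pool in pools.items()}
```

Drawing the same number of samples from every set keeps the sample size fixed, so any difference in accuracy can be attributed to purity alone.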
The experimental results show that, with the same number of training samples, the classification accuracy of the mixed-purity samples was equal to that of the medium-purity samples and higher than that of the high-purity samples. The classification accuracy of the low-purity samples was the lowest. These results show that for a study area with a high level of mixed land use, high-purity samples are not sufficiently representative, which can lead to accuracy loss. The features of the low-purity samples are all mixed; thus, it is difficult for the classifier to learn from them effectively. The medium-purity samples are representative, and selecting medium-purity samples can serve as a principle for sample collection.
3.3. Impact of the Sample Spatial Distribution
In this experiment, the influence of the spatial distribution of samples on accuracy was tested. We divided Shenzhen into three zones: the original special zone, former Bao’an, and former Longgang. The original special zone included Luohu District, Futian District, Nanshan District, and Yantian District. Former Bao’an included the current Bao’an District, Longhua District, and Guangming District. Former Longgang included the current Longgang District, Pingshan District, and Dapeng District. The same numbers of training and validation samples were randomly selected from each of the three zones for a cross experiment: the classifier was trained with the samples from one zone (the original special zone, former Bao’an, or former Longgang), and the accuracy was calculated with the validation samples from each of the three zones (Figure 9).
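The cross experiment amounts to filling a matrix of accuracies, with one row per training zone and one column per validation zone. The following sketch shows that structure only; `fit` and `accuracy` are hypothetical placeholders for whatever classifier and scoring function are used, and nothing here is taken from the authors' implementation.

```python
def cross_zone_accuracy(zones, fit, accuracy):
    """Train on each zone in turn and score the model on every zone.

    zones: dict mapping zone name -> (train_set, validation_set)
    fit(train_set) -> model
    accuracy(model, validation_set) -> float
    Both callables are hypothetical and supplied by the caller.
    Returns matrix[source_zone][target_zone] = accuracy.
    """
    matrix = {}
    for source, (train_set, _) in zones.items():
        model = fit(train_set)
        matrix[source] = {
            target: accuracy(model, val_set)
            for target, (_, val_set) in zones.items()
        }
    return matrix
```

The diagonal entries give the within-zone accuracy, and the off-diagonal entries show how well samples from one zone carry over to another.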
The experimental results show that land use is heterogeneous even among different areas of a single city and that an uneven spatial distribution of samples can cause accuracy loss. In this experiment, the original special zone was the old special economic zone, which has good planning control and orderly land development. Former Bao’an is a labor-intensive industrial agglomeration area with inefficient and extensive land use. Former Longgang is restricted by ecological protection because of its location, and its density is relatively low. The samples from the three zones therefore differ in representativeness, and a classifier trained in one zone is significantly less accurate in the other zones.
In terms of sample transferability, the more diverse the urban land use patterns of a zone, the more transferable its samples. In former Bao’an, Guangming is a relatively less developed area of Shenzhen, whereas the Bao’an Qianhai center is the most important economic center. Multiple development stages therefore coexist in former Bao’an, land use is extremely complex, and the transferability of its samples is strong. Because of its high overall level of urban development, the original special zone has low representativeness and weak transferability.
3.4. Mapping of SULUC in Shenzhen
First, local professional urban land use surveyors were invited to choose training samples from the complete sample set according to their knowledge and experience. They selected 1163 high-purity samples. Four-fold cross-validation was adopted to optimize the land use classifier, and the classifier was applied to the complete sample set for accuracy assessment. The overall accuracy was 62% for the Level I categories and 55% for the Level II categories. Then, we applied the best sampling strategy identified in the above experiments and selected 5028 medium-purity samples as the training samples. Their frequency distribution was similar to that of the complete sample set (Figure 10). Using the same parcels, features, and classifier, the overall accuracy reached 76% for the Level I categories and 71% for the Level II categories (Table 5 and Table 6). The accuracy was thus improved by approximately 15 percentage points under the optimal sampling strategy, as shown in Figure 11.
Regarding Level I categories, major discrepancies were clustered in residential and industrial land: for each of the other land use types, more than 50% of the misclassified parcels were assigned to residential or industrial land. Regarding Level II categories, major discrepancies were clustered in urban residential, industrial, and parks and green space land. For example, urban residential land was primarily misclassified as industrial land, industrial land was primarily misclassified as urban villages, and parks and green space land was primarily misclassified as urban residential, industrial, and road areas.
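Both the overall accuracy and the share of errors absorbed by particular categories can be read off a confusion matrix. The sketch below assumes a nested-dictionary layout (`confusion[true][predicted] = parcel count`) and hypothetical function names; it does not reproduce the values of Table 5 or Table 6.

```python
def overall_accuracy(confusion):
    """confusion[true][pred] = parcel count; returns the fraction on the diagonal."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[c].get(c, 0) for c in confusion)
    return correct / total


def error_share_to(confusion, true_class, absorbing):
    """Share of true_class's misclassified parcels assigned to the `absorbing` classes."""
    errors = {p: n for p, n in confusion[true_class].items() if p != true_class}
    total_errors = sum(errors.values())
    if total_errors == 0:
        return 0.0
    return sum(n for p, n in errors.items() if p in absorbing) / total_errors
```

For example, calling `error_share_to(confusion, "commercial", {"residential", "industrial"})` on such a matrix would give the fraction of misclassified commercial parcels that ended up as residential or industrial land.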
We compared the SULUC mapping results with the land survey in terms of the urban land use structure (Figure 12). Most commercial and public service land was not correctly classified and was largely misclassified as residential or industrial land; correcting this is critical for improving accuracy in the future.
In terms of the feature contribution rate, the most important feature is building height information, followed by POI and Sentinel-2A/B multispectral information (Figure 13). In the MPL data, the contribution rate of the Luojia-1 nighttime light feature is very low, mainly because the original spatial resolution of these data is low and therefore not suitable for high-resolution urban land use classification tasks.
4. Discussion
Mixed land use is a major obstacle to improving classification accuracy. The current results show that misclassifications were far more numerous among low-purity parcels than among high-purity parcels: the lower the purity of a parcel, the worse the classification accuracy (Figure 14). The reasons are as follows:
Because land is highly scarce, commercial, transportation, and public facilities in high-density cities such as Shenzhen often do not occupy parcels of their own. In this case, the features mentioned above may not be sufficiently significant compared with those in other cities.
Three-dimensional utilization of land is increasingly common. For example, a business center generated by urban renewal could have a commercial center on its lower floors and high-quality housing on its upper floors; thus, this center is both commercial and urban residential. Additionally, government agencies could rent space in commercial buildings for offices, in which case the building serves both commercial and public service uses. In such cases, it is unreasonable to assign only one category to a parcel. A possible solution is to assign multiple categories to a parcel through a probability method.
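One simple way to realize such a probability method is to keep every category whose predicted probability exceeds a threshold. The sketch below is an illustration only: the function name and the 0.3 threshold are assumptions, and the probabilities are taken to come from any classifier that outputs a per-category probability for each parcel.

```python
def assign_categories(class_probs, min_share=0.3):
    """Keep every category whose predicted probability reaches min_share.

    class_probs: dict mapping category -> probability (assumed to sum to 1).
    min_share is an illustrative threshold, not a value from the study.
    Falls back to the single most probable category if none reaches min_share.
    Returns categories ordered from most to least probable.
    """
    ranked = sorted(class_probs.items(), key=lambda item: item[1], reverse=True)
    kept = [category for category, prob in ranked if prob >= min_share]
    return kept or [ranked[0][0]]
```

Under this rule, a renewed business center with probabilities of 0.48 for commercial and 0.42 for urban residential would receive both labels, whereas a parcel dominated by one category would keep a single label.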
The methodology of the parcel segmentation and feature extraction can be improved:
The segmentation of parcels is not detailed enough. Road-based segmentation is not suitable for areas of the city where the road network is underdeveloped, and it produces superlarge parcels that contain multiple land use categories. In the future, image segmentation can be introduced to subdivide the superlarge parcels generated by road segmentation.
The POI information collected from commercial companies is biased, resulting in unsatisfactory classification results. In the future, POI information from official electronic maps can be combined with POI information from commercial institutions to enhance the classification accuracy.
Because Shenzhen has a complete set of ground-truth land use samples, it was possible to design a series of experimental tests to investigate the impact of sample quantity and quality on detailed land use classification performance. We further checked the availability of data in different cities around the world. The multispectral and nighttime light remote sensing data used in this paper can be obtained globally, and global road network data are accessible through OpenStreetMap. However, the major challenge of this study was to collect sufficient land use samples. Fortunately, Shenzhen has just conducted an urban land use survey, and we could obtain its complete sample set from the survey results. Similar research can be conducted in other cities in China after the completion of the Third Nationwide Land Survey of China. In other areas, cadastral data could be considered as a source of samples for similar experiments to test whether the conclusions are representative throughout the world.