1. Introduction
Remote sensing images have been widely used in environmental monitoring and thematic mapping because they can provide periodic thematic information at various spatial and temporal scales [1,2,3,4]. Image classification is regarded as one of the most important tasks in thematic mapping using remote sensing images [5,6]. Thematic maps derived from the classification of remote sensing images, such as land-cover and crop/forest type maps, are typically used as inputs for physical and environmental models and thereby affect the model outputs. Therefore, it is important to generate reliable and accurate thematic maps from remote sensing images [5].
Supervised classification is typically performed to derive various types of thematic maps from remote sensing images [7,8]. The quality of supervised classification results is sensitive to many factors, such as the available remote sensing images, classification methodologies, and training samples [9]. In particular, the spatial distribution and accuracy of supervised classification results depend significantly on the quantity and quality of training samples [9,10,11,12]. Therefore, it is critical to collect training samples that provide useful information for correctly determining the decision boundaries between classes of interest.
In general, one of the most significant issues frequently encountered in image classification is the mixed pixel effect. A mixed pixel is a pixel containing more than one land-cover class [13]. This effect is prominent in the classification of mid/low spatial resolution remote sensing images. The conventional pixel-based supervised classification approach assumes that each training pixel represents the spectral signature of a single class (a pure training pixel). When remote sensing images contain many mixed pixels, however, mixed pixels may be selected as training samples even though they fail to provide a representative spectral signature of any single class. Therefore, the mixed pixel effect should be accounted for during classification [14].
To address the mixed pixel problem, spectral unmixing, or spectral mixture analysis, has been widely applied to extract pure pixels from an image of interest [15,16,17,18]. In this approach, pure pixels (also known as endmembers) are first extracted and then used as training samples for supervised classification [13]. In addition to spectral unmixing, other statistical approaches have been applied to collect representative training samples. For example, Kavzoglu [19] selected representative training samples using spectral histogram and boundary analyses with dimension reduction. Conventional descriptive statistics, including the mean and standard deviation, have also been used to extract pure training samples [20].
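Spectral unmixing is commonly built on the linear mixing model, in which a pixel spectrum is approximated as an abundance-weighted sum of endmember spectra. The minimal Python sketch below assumes that model and a hypothetical two-endmember, four-band spectral library, and estimates abundances by least squares with a softly enforced sum-to-one constraint. It is a generic illustration, not the procedure used in the cited studies; it does not enforce non-negativity, and the purity rule `is_pure` with its `purity_threshold` is a hypothetical helper added only to show how a pure pixel could be flagged.

```python
import numpy as np


def unmix_sum_to_one(pixel, endmembers, delta=1e3):
    """Estimate abundances a such that pixel ~= endmembers @ a and sum(a) ~= 1.

    endmembers: (n_bands, n_endmembers) array whose columns are pure spectra.
    The sum-to-one constraint is imposed softly by appending a heavily
    weighted row of ones to the linear system; non-negativity is not enforced.
    """
    n_endmembers = endmembers.shape[1]
    a_matrix = np.vstack([endmembers, delta * np.ones((1, n_endmembers))])
    b_vector = np.concatenate([pixel, [delta]])
    abundances, *_ = np.linalg.lstsq(a_matrix, b_vector, rcond=None)
    return abundances


def is_pure(abundances, purity_threshold=0.9):
    """Hypothetical rule: treat a pixel as pure if one endmember dominates."""
    return bool(np.max(abundances) >= purity_threshold)


# Hypothetical 4-band reflectance spectra of two pure classes (one per column).
endmembers = np.array([[0.05, 0.20],
                       [0.08, 0.25],
                       [0.04, 0.30],
                       [0.45, 0.35]])
mixed_pixel = endmembers @ np.array([0.7, 0.3])  # noise-free 70/30 mixture
estimated = unmix_sum_to_one(mixed_pixel, endmembers)
print(np.round(estimated, 3), is_pure(estimated))
```

Because the synthetic mixture is noise-free, the least-squares solution recovers the 70/30 abundances, and the pixel is flagged as mixed rather than pure.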
Despite these promising results, most of the aforementioned studies have focused mainly on extracting spectrally pure pixels. Because many factors apart from spectral purity need to be considered, collecting and selecting representative training samples remains challenging. The spatial resolution of the images and the complexity of the landscapes to be classified significantly affect training sample selection and, consequently, classification performance. Chen and Stow [21] reported that more training samples are required to classify fine spatial resolution images than coarse ones, and recommended block-based training samples for classifying heterogeneous landscapes. Chen et al. [22] emphasized the impact of landscape heterogeneity on classification performance, in addition to the impurity of training samples (compositional heterogeneity), when coarse spatial resolution images are used for crop classification. These previous studies indicate that the purity of training samples, landscape heterogeneity, and spatial resolution should all be considered when selecting training samples.
Regarding classification methodologies, interest in deep learning for remote sensing image processing has increased owing to its classification accuracy, which is superior to that of conventional machine learning models [23,24]. Among the various deep learning models, convolutional neural networks (CNNs) have been widely applied to the supervised classification of remote sensing images [25,26,27,28,29]. CNN models can be regarded as patch-based classifiers, in that an image of interest is divided into spatial units (patches) comprising multiple pixels [30,31]. Because a patch-based classifier can account for the spatial correlation between neighboring pixels within a patch, it is particularly effective for crop classification, where it can capture specific spatial features such as crop cultivation patterns and the shapes of crop parcels [27,29,32].
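To make the notion of a patch-based classifier concrete, the sketch below extracts the square patch centered on a labelled pixel from a multiband image, which is the kind of input a 2D-CNN receives, and stacks several such patches into a training array. It is a minimal NumPy illustration; the odd patch size, the (height, width, bands) array layout, the reflect padding at image edges, and the helper names `extract_patch` and `build_training_stack` are assumptions made here, not details taken from the cited works.

```python
import numpy as np


def extract_patch(image, row, col, size):
    """Return the size-by-size patch of `image` centered on pixel (row, col).

    image: (height, width, bands) array. `size` must be odd so that the
    labelled pixel lies exactly at the patch center. Pixels near the image
    edge are handled by reflect padding (an assumption of this sketch).
    """
    if size % 2 == 0:
        raise ValueError("patch size must be odd")
    half = size // 2
    padded = np.pad(image, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Padding shifts original pixel (row, col) to (row + half, col + half), so the
    # window starting at (row, col) in padded coordinates is centered on it.
    return padded[row:row + size, col:col + size, :]


def build_training_stack(image, centers, size):
    """Stack patches for a list of (row, col) centers into (n, size, size, bands)."""
    return np.stack([extract_patch(image, r, c, size) for r, c in centers])


image = np.random.default_rng(0).random((20, 20, 4))  # synthetic 4-band image
batch = build_training_stack(image, [(0, 0), (10, 10), (19, 5)], size=9)
print(batch.shape)
```

Edge handling is a design choice: padding keeps every labelled pixel usable, whereas discarding centers whose window leaves the image avoids synthetic pixels at the cost of fewer training samples.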
Patch-based supervised classification requires training patches that comprise a center pixel representing a specific land-cover class and its neighboring pixels. Hence, the effect of the multiple pixels in a training patch should be quantified, because the weights assigned to neighboring pixels in patch-based classification vary according to the impurity or heterogeneity of the class composition within the training patch. Furthermore, as in conventional pixel-based classification, the representativeness of training patches for a specific land-cover class has a significant influence on classification performance [33]. Significant effort has been expended on selecting representative training samples for pixel-based classification. For example, Zhu et al. [34] developed a strategy for selecting representative training samples, including the optimum number of training samples and the best balance of training samples. Variations in the degree of spatial clustering of training samples and the use of explicit spatial information have also been tested for classification with machine learning models [35]. To the best of our knowledge, however, the effect of compositional homogeneity within a training patch on the accuracy of patch-based classifiers such as CNNs has not been fully quantified.
The objective of this study is to quantitatively analyze the effect of the class purity of training patches on the performance of CNN-based classification. The class purity of a training patch refers to the degree of compositional homogeneity of classes within that patch. Training patches with different class purity values and sizes are first generated and then used as inputs for supervised classification using a two-dimensional CNN (2D-CNN). In particular, new quantitative indices are defined to quantify both local and global variations of class homogeneity in the study area, and these indices are then used to analyze the relationship between the class homogeneity of the study area and classification performance. Crop classification experiments are conducted in two study areas that differ significantly in landscape heterogeneity and in the spatial resolution of the input images, in order to quantify and compare the effects of class purity on the performance of patch-based classification.
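The class purity of a training patch, as described above, can be illustrated with a short sketch. For illustration only, it assumes that class purity is the fraction of pixels in a labelled patch that share the class of the center pixel, and that a class purity setting such as CP60 or CP100 selects patches whose purity falls within a chosen range; the exact definitions belong to the Methods section, which is not part of this excerpt, and the function names `class_purity` and `select_training_centers` are hypothetical.

```python
import numpy as np


def class_purity(label_patch):
    """Assumed definition: share of pixels with the same class as the center pixel."""
    h, w = label_patch.shape
    center_class = label_patch[h // 2, w // 2]
    return float(np.mean(label_patch == center_class))


def select_training_centers(labels, size, min_purity, max_purity=1.0):
    """Return (row, col) centers whose size-by-size label patch has a class purity
    in [min_purity, max_purity]. Only centers whose full patch fits inside the
    label map are considered."""
    half = size // 2
    centers = []
    for r in range(half, labels.shape[0] - half):
        for c in range(half, labels.shape[1] - half):
            patch = labels[r - half:r + half + 1, c - half:c + half + 1]
            if min_purity <= class_purity(patch) <= max_purity:
                centers.append((r, c))
    return centers


# Synthetic label map: two parcels (class 0 on the left, class 1 on the right).
labels = np.zeros((6, 6), dtype=int)
labels[:, 3:] = 1
interior = select_training_centers(labels, size=3, min_purity=1.0)  # fully pure patches
boundary = select_training_centers(labels, size=3, min_purity=0.6, max_purity=0.99)
print(len(interior), len(boundary))
```

In this toy map, requiring fully pure patches keeps only parcel-interior centers, while a lower purity band keeps only the centers adjacent to the parcel boundary, which is consistent with how the Results describe the training patches selected for high and low class purity values.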
3. Results
3.1. Comparison of Class Homogeneity of Two Regions
Before the classification results were compared, the class homogeneity of the two study areas was compared to quantify the differences in class composition and landscape heterogeneity, which significantly affect classification accuracy.
Figure 5 presents the variations in GCH and CV values with respect to training patch size. For a fair comparison of the GCH and CV values of the two study regions, only the three patch sizes applied to the Illinois region were considered for the Anbandegi region. The GCH values in Anbandegi were higher than 0.9 for all patch sizes (Figure 5a), signifying a homogeneous distribution of crop classes in Anbandegi. The GCH value decreased as the patch size increased, but the difference in GCH across patch sizes was very small (less than 0.1). Because ultra-high spatial resolution UAV imagery was used for classification, each crop parcel contained a large number of pixels of the same crop type. Consequently, most of the patches were likely to be homogeneous in terms of class composition.
In contrast, the Illinois region exhibited GCH values between 0.5 and 0.8, implying a more heterogeneous class distribution than in Anbandegi. The relationship between patch size and GCH value in the Illinois region was similar to that in Anbandegi: the larger the patch size, the smaller the GCH value. However, the difference in GCH across patch sizes was approximately 0.3, which was larger than in Anbandegi. This indicates that the class composition within a patch in the Illinois region became more complex, or heterogeneous, as the patch size increased. The extent of the crop areas in Illinois was much larger than that in Anbandegi, but each crop parcel comprised only a small number of pixels in the coarse resolution Landsat images. Furthermore, the images included mixed pixels containing boundaries either between crop parcels or between crop and non-crop areas. Hence, many pixels of different classes surrounded the center pixel within a patch, resulting in heterogeneous class compositions within the patches in the Illinois region.
The CV values of both regions increased as the patch size increased (Figure 5b). However, the CV value in Anbandegi was much smaller than that in Illinois, regardless of the patch size. Furthermore, the difference in CV across patch sizes was small in Anbandegi (0.05) compared with that in Illinois (0.11). The small CV value in Anbandegi implies small variations in patch-level LCH across the entire study area. By contrast, the larger CV value in the Illinois region indicates more prominent variations in LCH across the study area. The relatively large CV value for larger patch sizes was caused by a decrease in class homogeneity as more pixels of different classes were included in a large patch. These significant differences in the class homogeneity of the two regions imply that the class purity of the training patches would exert different effects on classification performance in each region.
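The indices discussed above can be sketched as follows. The definitions of LCH, GCH, and CV are given in the Methods section, which is not part of this excerpt, so this sketch assumes that LCH is the share of a pixel's square neighbourhood (the patch) that belongs to the pixel's own class, that GCH is the mean LCH over the study area, and that CV is the standard deviation of LCH divided by its mean. Under these assumptions, the toy two-parcel map shows GCH falling as the patch grows, as in Figure 5, while a uniform map has a GCH of 1 and a CV of 0.

```python
import numpy as np


def local_class_homogeneity(labels, size):
    """Assumed LCH: for each pixel whose full window fits in the map, the share of
    its size-by-size neighbourhood with the same class as the pixel itself.
    Pixels too close to the edge are left as NaN."""
    half = size // 2
    height, width = labels.shape
    lch = np.full((height, width), np.nan)
    for r in range(half, height - half):
        for c in range(half, width - half):
            window = labels[r - half:r + half + 1, c - half:c + half + 1]
            lch[r, c] = np.mean(window == labels[r, c])
    return lch


def gch_and_cv(labels, size):
    """Assumed GCH = mean of LCH; assumed CV = std(LCH) / mean(LCH)."""
    lch = local_class_homogeneity(labels, size)
    values = lch[~np.isnan(lch)]
    gch = float(values.mean())
    cv = float(values.std() / gch)
    return gch, cv


# Synthetic map with two equal parcels (class 0 left, class 1 right).
two_parcels = np.zeros((8, 8), dtype=int)
two_parcels[:, 4:] = 1
for size in (3, 5):
    print(size, gch_and_cv(two_parcels, size))
print(gch_and_cv(np.zeros((8, 8), dtype=int), 5))  # a single uniform parcel
```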
3.2. Classification Results in the Anbandegi Region
Figure 6 presents the variations in overall accuracy with respect to patch size for each class purity value. Classification accuracy increased with class purity, regardless of the patch size. In particular, when the patch size was small (e.g., 5 by 5 and 9 by 9), the difference in overall accuracy between CP60 and CP100 was larger than that for larger patch sizes (approximately 4.3%p and 3.8%p for patch sizes of 5 by 5 and 9 by 9, respectively). When both the class purity and the patch size are small, the training patches are likely to have low LCH values, that is, heterogeneous class compositions. Consequently, the spatial features extracted by the trained model may fail to reflect the homogeneous spectral patterns of most parcels in the study area, yielding poor classification accuracy. In contrast, as the patch size increases, the training patches have higher LCH values and contain more homogeneous pixels, thereby improving classification accuracy.
The classification results generated using a 9 by 9 patch, which achieved the highest classification accuracy for CP100 and also showed a significant difference in overall accuracy across class purity values, were analyzed further. The classification results generated using other patch sizes and class purity values in Anbandegi are presented in Figure S1.
Figure 7 presents one of the four classification results generated using a 9 by 9 training patch, namely the one with the highest overall accuracy. The spatial distributions of the classification results differ locally with class purity. In the case of CP60, misclassification inside potato and fallow parcels was prominent, and highland Kimchi cabbage near the parcel boundaries was misclassified as cabbage. In particular, the potato parcels in the northwestern part were misclassified as highland Kimchi cabbage, and a furrow pattern appeared inside the potato parcels. This misclassification inside the parcels might be due to the use of training patches selected near the parcel boundaries.
By contrast, misclassification in the potato parcels was reduced significantly as the class purity increased. Furthermore, the misclassification of fallow as highland Kimchi cabbage in the southeastern part was alleviated significantly for CP100. However, for CP100, training patches were selected only inside the parcels, so they might not accurately capture the spatial features near parcel boundaries. Consequently, sporadic misclassification of highland Kimchi cabbage as fallow or cabbage was observed near the boundaries of highland Kimchi cabbage parcels. The two cabbage parcels in the western part could not be correctly identified, regardless of the class purity value. This misclassification of cabbage as highland Kimchi cabbage and potato occurred because, unlike the other cabbage parcels in the study area, these parcels were harvested in August.
Table 5 lists the accuracy statistics of one classification result using the 9 by 9 patch shown in Figure 7, which yielded the highest classification accuracy. The accuracy statistics of the classification results using other patch sizes and class purity values in Anbandegi are listed in Table S1. The difference in overall accuracy appeared to result from the significant differences in the producer’s accuracy for potato and fallow across class purity values. For CP100, the producer’s accuracy values of fallow and potato improved significantly, by approximately 24.1%p and 6.5%p, respectively, over those for CP60. However, because the potato and fallow parcels occupied only 30% of the study area, the overall accuracy did not increase substantially. The producer’s accuracy for highland Kimchi cabbage, a major crop in the study area, decreased slightly for CP100, but the user’s accuracy for CP100 improved by approximately 6.2%p compared with that for CP60. The gain in accuracy from using homogeneous training patches yielded more reliable class distributions, as shown in Figure 7. For cabbage, the second major crop in the study area, the producer’s accuracy was approximately 72% and did not change significantly with class purity. This lower accuracy was mainly due to the harvest in some cabbage parcels, as depicted in Figure 7. However, the high class purity value resulted in an increase of 7.3%p in the user’s accuracy of cabbage owing to reduced misclassification of other crops as cabbage inside the crop parcels.
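The accuracy statistics reported in Table 5 follow standard definitions, which the sketch below computes from a confusion matrix. It also makes explicit why a large producer’s accuracy gain for minor classes moves the overall accuracy only slightly: overall accuracy equals the average of the producer’s accuracies weighted by each class’s share of the reference samples, $OA = \sum_k (n_{+k}/N)\,PA_k$. The confusion matrix below is hypothetical (not data from this study), the row/column convention (rows = mapped class, columns = reference class) is an assumption, and `percentage_point_change` is a hypothetical helper that simply expresses a difference of proportions in %p.

```python
import numpy as np


def accuracy_statistics(confusion):
    """Overall, producer's, and user's accuracy from a confusion matrix.

    Convention assumed here: confusion[i, j] counts samples mapped as class i
    whose reference class is j (rows = map, columns = reference).
    """
    confusion = np.asarray(confusion, dtype=float)
    correct = np.diag(confusion)
    overall = correct.sum() / confusion.sum()
    producers = correct / confusion.sum(axis=0)  # per reference class (omission side)
    users = correct / confusion.sum(axis=1)      # per mapped class (commission side)
    return overall, producers, users


def percentage_point_change(new, old):
    """Difference of two proportions expressed in percentage points (%p)."""
    return 100.0 * (np.asarray(new) - np.asarray(old))


# Hypothetical three-class confusion matrix (not data from this study).
confusion = np.array([[50, 5, 0],
                      [10, 30, 5],
                      [0, 5, 45]])
overall, producers, users = accuracy_statistics(confusion)

# Overall accuracy equals the reference-share-weighted mean of producer's accuracies.
reference_share = confusion.sum(axis=0) / confusion.sum()
print(overall, float(np.dot(reference_share, producers)))
```

By the same identity, a hypothetical class holding 10% of the reference samples whose producer’s accuracy rises by 24.1%p would raise the overall accuracy by only about 2.4%p, all else being equal.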
The Anbandegi region is characterized by homogeneous and evenly distributed crop parcels, as shown in Figure 1. These spatial distribution characteristics yielded high class homogeneity within patches and low local variations of LCH. In regions with such homogeneous crop distributions, using training patches with high class purity can significantly improve classification accuracy. Furthermore, using large training patches can reduce the noise patterns that typically appear when high-resolution imagery is classified. Because large training patches include more pixels located inside the crop parcels, class variations within each crop parcel were smaller, and the uniform distributions of the crop parcels were consequently well represented in the classification result. The experimental results for Anbandegi indicate that large training patches with high class purity (that is, training patches containing many pixels located inside crop parcels) are more beneficial for classifying regions with homogeneous crop distributions using high-resolution remote sensing imagery.
3.3. Classification Results in the Illinois Region
In contrast to the Anbandegi region, the overall accuracy in the Illinois region decreased as the class purity value increased (Figure 8). When the patch size was 15 by 15, the overall accuracy for CP100 was 13.7%p lower than that for CP60. This can be attributed to the relatively heterogeneous distribution of crops in the study area, as quantified by the GCH and CV values in Figure 5. When the smallest patch size of 5 by 5 was used, the differences in overall accuracy among class purity values were smaller than those for other patch sizes, but CP100 still yielded the lowest classification accuracy.
Comparing the classification accuracy across training patch sizes, the overall accuracy decreased as the training patch size increased, regardless of the class purity value. The lowest accuracy was obtained when a large patch size was used with CP100. As shown in Figure 5, the Illinois region had a smaller GCH value and a larger CV value than Anbandegi, indicating a relatively heterogeneous distribution of crops in the study area and larger variations of class homogeneity at the patch level. As the patch size increased, more pixels belonging to different classes were contained in each training patch. This increased class heterogeneity resulted in significant misclassification.
Figure 9 presents the classification results generated using the 9 by 9 training patch, which produced a significant difference in overall accuracy across class purity values. The classification results generated using other patch sizes and class purity values in Illinois are presented in Figure S2. The distinctive differences between the classification results for different class purity values are most evident in two subareas, denoted as A and B in Figure 9. For CP100, the extent of the soybean parcels was overestimated and small corn parcels were misclassified as soybean. This occurred because the training patches, selected primarily inside the crop parcels, could not provide the information needed to discriminate between different crop types at parcel boundaries. By contrast, the misclassification of corn parcels surrounded by soybean parcels was reduced significantly when training patches with CP60 were used. When a lower class purity value was applied to select the training patches, patches located at the boundaries between different crop parcels were primarily selected. Consequently, the 2D-CNN model trained using these patches extracted spatial features that were useful for discriminating adjacent crops, thereby achieving higher classification accuracy. However, misclassified pixels were found inside the crop parcels in subarea A of Figure 9, as observed for CP60 in Anbandegi, which is a limitation of using small training patches.
Table 6 summarizes the accuracy statistics of one classification result generated using the 9 by 9 patch shown in Figure 9. The accuracy statistics of the classification results using other patch sizes and class purity values in Illinois are listed in Table S2. As indicated in Figure 8, the overall accuracy for CP60 was higher than that for CP100. This improvement in overall accuracy was attributed to a significant increase of approximately 7.3%p in the producer’s accuracy of corn, one of the major crops in the Illinois region. For CP100, the user’s accuracy of soybean decreased by 4.4%p owing to the misclassification of corn as soybean, as shown in Figure 9.
The Illinois region exhibited a lower GCH value and a higher CV value than the Anbandegi region, as shown in Figure 5. This implies heterogeneous distributions of crop parcels and large variations in class homogeneity within patches across the study area. Consequently, using training patches with a low class purity value allowed crops to be discriminated more accurately: the many training patches collected at the boundaries between crop parcels contributed to the correct identification of adjacent crop parcels. The significant decrease in classification accuracy when using large training patches was due to the inclusion of more subareas with non-uniform class homogeneity within a patch. Based on the results in the Illinois region, training patches with low class purity and a small size are more likely to produce accurate results when classifying regions with heterogeneous crop distributions using mid-resolution remote sensing imagery.