1. Introduction
As a part of the terrestrial ecosystem, the farmland ecosystem plays an important role in global climate change, food production, and ecological assessment [1]. As an increasing number of satellite observations become available, there is an urgent need to obtain accurate crop maps from the massive volume of images and to assess the significance of each crop type in farmland ecosystems. In recent years, several studies have been carried out on crop monitoring and carbon sequestration in different ecosystems. However, these studies have mainly focused on forest and grassland ecosystems [2,3,4]. Owing to strong human intervention, crops in farmland exhibit seasonal and complex characteristics, which makes automatic crop classification and carbon sequestration estimation more challenging. Therefore, a method must be established to extract accurate crop information using remote sensing (RS) technology in order to estimate crop area, disaster loss, and carbon sequestration.
Farmland vegetation is an essential part of the carbon cycle of the terrestrial ecosystem, and different crops play different roles in carbon sequestration [5]. China is a developing agricultural country with a large population, and cropland accounts for about 12.5% of its total area, so monitoring the crop planting area and its impact on the carbon cycle deserves close attention. Emerging satellite technologies with diverse sensors provide reliable large-scale land use and crop observation [6]. In particular, the Sentinel-2A (S-2A) satellite developed by the European Space Agency provides 13-band optical imagery with a high spatial resolution of 10 m. S-2A images have been widely used in global agricultural monitoring, such as crop yield estimation, disaster monitoring, and environmental assessment. In addition, Google Earth Engine (GEE) provides an interactive platform for feature acquisition and geospatial algorithm development for satellite imagery [7,8]. Several studies show that it is convenient to obtain multiple feature images from GEE, which are essential for crop classification in complex agricultural areas [7,9]. Therefore, combining spectral bands, derived features (including vegetation indices and texture features), and crop phenological information has become an important way to overcome within-class diversity and between-class similarity and to obtain high-accuracy crop information.
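As an illustration of the derived features mentioned above, a vegetation index such as the NDVI can be computed directly from the red and near-infrared bands (bands B4 and B8 of Sentinel-2). A minimal NumPy sketch with hypothetical reflectance values, not taken from this study's imagery:

```python
import numpy as np

def ndvi(nir, red, eps=1e-10):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)  # eps avoids division by zero

# Toy 2x2 reflectance patches (hypothetical values, not real S-2A data)
nir = np.array([[0.45, 0.50], [0.40, 0.48]])
red = np.array([[0.05, 0.10], [0.20, 0.08]])
print(ndvi(nir, red))  # dense vegetation approaches 1, bare soil approaches 0
```

In GEE, the same index is typically obtained per image from the band stack; texture features (e.g., from the gray-level co-occurrence matrix) are derived analogously as additional input channels.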
Due to the seasonality of crop planting, the acquisition of ground samples and imagery often has to be completed within a short time. On the one hand, it is difficult to obtain time-series optical images during the crop growth period because of climatic factors. On the other hand, traditional machine learning methods, such as support vector machine (SVM), random forest (RF), and object-oriented classification, rely on large numbers of high-quality ground samples [10]. However, these sample data sets lose their effectiveness across years because of crop seasonality. Furthermore, pixel-based classification results struggle to meet precise application requirements because of the “salt and pepper” phenomenon [8]. For object-oriented classification, it is difficult to determine the optimal segmentation scale given the image resolution, crop complexity, and field fragmentation, which leads to over-segmentation and under-segmentation [11]. Despite the success of traditional machine learning in Earth science, the above problems and limitations have hampered model performance and classification accuracy [12]. To reduce the dependence on, and workload of, ground investigations, it is essential to exploit the transferable learning capability of classification models and to extract crop information from RS images automatically using few labeled data sets.
Deep learning has become a semantic segmentation and target detection tool for solving many challenging problems in computer vision [9,12]. As data-driven models, deep neural networks usually outperform shallow classifiers because of their hierarchical feature transformation of multi-feature RS imagery. The 2012 ImageNet classification challenge and massive labeled data sets were crucial in demonstrating the effectiveness of deep convolutional neural networks (CNNs) [13]. However, it is difficult to obtain large spatiotemporal crop data sets for RS images. End-to-end transferable learning reduces manual feature engineering and improves model performance. Many studies have shown that CNNs and fully convolutional networks (FCNs) are important architectures for the semantic segmentation of RS images and natural scenes [14,15]. Furthermore, FCNs introduced the encoder–decoder paradigm, which restores the feature map to the original image size and overcomes both the loss of detailed information during down-sampling and the fixed-size constraint on the input [16]. Therefore, the development of FCNs facilitates transferable model performance and high-accuracy crop mapping over large regions and across years.
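The encoder–decoder idea can be illustrated with a toy example: pooling in the encoder halves the spatial size (discarding detail), while upsampling in the decoder restores the original resolution, which motivates the skip connections used in UNet-style models. The following is a minimal NumPy sketch of the two operations only, not the actual FCN implementation:

```python
import numpy as np

def max_pool2x2(x):
    """Encoder step: 2x2 max pooling halves the spatial resolution."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x2(x):
    """Decoder step: nearest-neighbor upsampling doubles the resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = np.arange(16, dtype=float).reshape(4, 4)  # a toy 4x4 feature map
encoded = max_pool2x2(x)                      # shape (2, 2): detail is lost
decoded = upsample2x2(encoded)                # shape (4, 4): size restored
print(encoded.shape, decoded.shape)           # (2, 2) (4, 4)
```

In a real FCN or UNet, learned transposed convolutions replace the fixed upsampling, and skip connections from the encoder re-inject the spatial detail that pooling discards.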
To obtain high-accuracy crop maps, current studies mainly focus on feature fusion and FCN-based semantic segmentation methods. Li et al. [17] proposed an improved deep learning method to classify corn and soybean from RS time-series images and achieved a Kappa coefficient of 0.79 and an overall accuracy of 0.86. Giannopoulos et al. [18] extended a UNet-based deep learning model to extract information from Landsat-8 images and achieved higher accuracy than low-order deep learning models. Yang et al. [19] used different semantic segmentation models, including a temporal feature-based segmentation model, long short-term memory (LSTM), and UNet, to map rice from time-series SAR images. Wang et al. [20] proposed a two-stage model fusing DeepLab V3+ and UNet for cucumber leaf disease severity classification with a segmentation accuracy of 93.27%. These studies achieved high classification accuracy and provide a reference for the segmentation and classification of major crops from remote sensing images. However, applying their semantic segmentation models in complex agricultural areas with imbalanced samples still needs further testing. Moreover, Zhou et al. [21] showed that the UNet++ model has advantages in its feature map-generating strategy and in image semantic segmentation with few training samples. Wang et al. [9] used an improved UNet++ architecture to classify 10 categories from Sentinel-2 imagery (including 17 bands of spectral, vegetation index, and texture features) over three years, and found that UNet++ achieved higher segmentation and classification accuracy than UNet and DeepLab V3+. These studies provide methodological support for feature fusion and agricultural information extraction from Sentinel-2 imagery. However, the time-series prediction and classification results of recent research still need to be applied and analyzed further in farmland ecosystems, for example in disaster assessment and crop carbon sequestration. Therefore, it is essential to establish a regional, transferable-learning model that requires few training data sets for RS crop classification and carbon sequestration estimation during the critical crop growth stages.
Chinese agricultural land has experienced dramatic changes in crop area, cropping system, and planting structure in recent decades [22]. These changes can substantially affect crop carbon sequestration in farmland. The widespread and scattered smallholders in China have a profound impact on agricultural production in response to climate change. Tang et al. [23] evaluated the contribution of the farmland ecosystem to carbon sequestration in China’s terrestrial ecosystem and the associated estimation error. The uncertainty of carbon sequestration estimates can be reduced by collecting finer crop area and statistical data for the farmland ecosystem. In terms of agro-ecosystem services, vegetation indices and time-series products have been used to evaluate and predict the impact of current vegetation cover on the farmland ecosystem [24,25], or to quantify spatiotemporal changes, including the assessment of carbon sources/sinks and soil erosion [26]. Zhang et al. [27] provided a novel perspective on gross primary production (GPP) changes induced by land cover change (LCC) and concluded, using the GEE platform, that the LCC-induced reduction in GPP was partially offset by increases in cropland. Wang et al. [28] showed that changes in crop planting area can substantially affect greenhouse gas emissions, and that farmland in China was a carbon source because of the large amount of CH4 emitted by paddy lands. The capability to collect timely, high-accuracy crop information from multispectral imagery using a transferable-learning model and to produce accurate crop maps is therefore crucial for assessing farmland ecosystem services.
In this study, we focus on the automatic classification of crop information from S-2A imagery in complex agricultural areas. As an agricultural region of China, Xinxiang City in Henan Province has representative topography and diversified crop types. This study aims to (1) evaluate the influence of feature-selection schemes on the UNet++ model; (2) evaluate the performance of different deep learning models; (3) compare the prediction classification accuracies of different models based on overall accuracy, user’s accuracy, producer’s accuracy, and F1 scores; and (4) complete crop mapping and carbon sequestration estimation in the farmland ecosystem. Overall, this work aims to offer an improved deep learning procedure for RS-based crop monitoring and carbon sequestration estimation.
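The accuracy measures listed in aim (3) are all derived from a confusion matrix. A small Python sketch with a hypothetical two-class matrix (the counts are illustrative only, not results from this study):

```python
import numpy as np

def accuracy_metrics(cm):
    """Per-class and overall accuracy metrics from a confusion matrix.

    cm[i, j] = number of pixels with reference class i and predicted class j.
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    oa = diag.sum() / cm.sum()    # overall accuracy
    ua = diag / cm.sum(axis=0)    # user's accuracy (precision per class)
    pa = diag / cm.sum(axis=1)    # producer's accuracy (recall per class)
    f1 = 2 * ua * pa / (ua + pa)  # per-class F1 score
    return oa, ua, pa, f1

# Toy 2-class confusion matrix (hypothetical counts)
cm = [[90, 10],
      [20, 80]]
oa, ua, pa, f1 = accuracy_metrics(cm)
print(round(oa, 3))  # 0.85
```

The macro F1 reported later is simply the unweighted mean of the per-class F1 values, which makes it sensitive to poorly classified minority crops in imbalanced areas.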
6. Conclusions
An end-to-end transferable learning model based on the UNet++ architecture is proposed for crop classification in complex agricultural areas. First, based on feature fusion and upsampling of small samples, the UNet++ model yields the best performance and classification accuracy compared with UNet, DeepLab V3+, and PSPNet, with a lower joint loss value of 0.432 and a higher mIoU value of 0.871. The OA and macro F1 values of the UNet++ model in the spatiotemporal transfer experiment are higher than 83% and 58%, respectively. Subsequently, according to the three-year time-series classification results and reclassification rules, the disaster area in 2021 accounted for about 3.48% of the total area and was concentrated in the middle part of the study area. Finally, the total carbon sequestration of the five target crops in 2019, 2020, and 2021 was estimated by integrating statistical data, yielding 2460.56, 2549.16, and 1814.07 thousand tons, respectively. These results can provide data and methodological support for damage assessment and carbon sequestration assessment in farmland ecosystems. The prediction classification accuracy achieved without prior knowledge from ground samples shows that the improved UNet++ model provides better spatiotemporal transfer learning capability than the baseline models. The transferable learning model is suitable for automatically extracting regional crop information from multi-feature imagery to produce near-real-time crop maps. These conclusions can serve as a methodological reference for RS segmentation and classification, and offer insight into crop carbon sequestration in farmland ecosystems.
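The per-crop estimates combined here follow the general pattern, common in farmland carbon studies, of scaling economic yield by crop-specific coefficients. A hedged sketch of that pattern, assuming the widely used form C = Cf · Y · (1 − w) / HI; both the exact formulation and all coefficients below are hypothetical illustrations, not the values used in this study:

```python
# Illustrative crop carbon sequestration estimate.
# Assumed formula: C = Cf * Y * (1 - w) / HI, summed over crop types, where
#   Cf = carbon absorption rate (t C per t dry biomass),
#   Y  = economic yield (t), w = water content of the harvested part,
#   HI = economic (harvest) index.
# All coefficients below are hypothetical, not taken from this study.

def crop_carbon(yield_t, water_content, harvest_index, cf):
    """Carbon sequestered by one crop, in tonnes of carbon."""
    return cf * yield_t * (1 - water_content) / harvest_index

crops = {
    # crop: (yield in t, water content, harvest index, carbon rate)
    "winter wheat": (1_000_000, 0.12, 0.40, 0.485),
    "summer maize": (1_200_000, 0.13, 0.44, 0.471),
}
total = sum(crop_carbon(*params) for params in crops.values())
print(f"total carbon sequestration: {total / 1e3:.1f} thousand tonnes C")
```

In practice, the classified crop areas from the UNet++ maps would be combined with per-unit-area yields from statistical yearbooks to obtain Y for each crop and year.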