Article

Inter-Continental Transfer of Pre-Trained Deep Learning Rice Mapping Model and Its Generalization Ability

1 School of Artificial Intelligence, Hangzhou Dianzi University, Hangzhou 310018, China
2 Institute of Applied Remote Sensing and Information Technology, Zhejiang University, Hangzhou 310058, China
3 Key Laboratory of Agricultural Remote Sensing and Information Systems, Zhejiang University, Hangzhou 310058, China
4 Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
5 Key Laboratory of Agri-Informatics, Ministry of Agriculture and Rural Affairs of China, Beijing 100081, China
6 Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
* Author to whom correspondence should be addressed.
Remote Sens. 2023, 15(9), 2443; https://doi.org/10.3390/rs15092443
Submission received: 28 February 2023 / Revised: 18 April 2023 / Accepted: 5 May 2023 / Published: 6 May 2023

Abstract:
Monitoring of rice planting areas plays an important role in maintaining food security. With its powerful automatic feature extraction capability, crop mapping based on deep learning methods has become one of the most important research directions of crop remote sensing recognition. However, the training of deep learning models often requires a large number of samples, which restricts the application of these models in areas that lack samples. To address this problem, based on time-series Sentinel-1 SAR data, this study pre-trained the temporal feature-based segmentation (TFBS) model with an attention mechanism (attTFBS) using abundant samples from the United States and then performed an inter-continental transfer of the pre-trained model based on a very small number of samples to obtain rice maps in areas lacking samples. The results showed that inter-continental transfer of the rice mapping model was feasible, yielding accurate rice maps in Northeast China (F-score, kappa coefficient, recall, and precision of 0.8502, 0.8439, 0.8345, and 0.8669, respectively). The transferred model exhibited a strong spatiotemporal generalization capability, achieving high accuracy in rice mapping in the three main rice-producing regions of Northeast China. Phenological differences in rice significantly affected the generalization capability of the transferred model; in particular, significant differences in transplanting periods could decrease the generalization capability of the model. Furthermore, the study found that the model transferred based on an extremely limited number of samples could attain a rice recognition accuracy equivalent to that of a model trained from scratch with a substantial number of samples. The proposed method therefore possesses strong practicality: it can dramatically reduce the sample requirements of deep learning-based crop mapping, thereby decreasing costs, increasing efficiency, and facilitating large-scale crop mapping in areas with limited samples.

1. Introduction

Although food security is fundamental to human existence, the world is moving backward in its efforts to end hunger, food insecurity, and malnutrition [1]. There are still 3.1 billion people worldwide who do not have access to a healthy diet. As one of the most common staple foods, rice is extremely important for maintaining food security [2]. Rice monitoring based on remote sensing can efficiently and cost-effectively provide rich information on planting areas, growth, yields, disasters, environments, and resource utilization [3,4].
With in-depth research on rice remote sensing classification methods, a number of large-scale regional rice mapping achievements have emerged [5,6]. Compared to dryland crops, rice, as a paddy field crop, shows remarkable characteristics in both active and passive satellite remote sensing images. Based on this, a series of remote sensing classification methods for rice have been developed. Among these methods, one extracts rice using an expert knowledge-based decision tree or machine learning, based on the significant water body features of rice fields during the transplanting period (e.g., the Land Surface Water Index [LSWI] or Modified Normalized Difference Water Index [MNDWI]) combined with the significant vegetation characteristics of rice fields during the middle and late growth periods (e.g., the Normalized Difference Vegetation Index [NDVI] and Enhanced Vegetation Index [EVI]). Xiao et al. [7,8,9] proposed the phenology- and pixel-based paddy rice mapping (PPPM) method based on MODIS time-series images and produced rice maps with a resolution of 500 m in South China, South Asia, and Southeast Asia. Peng et al. [10] proposed a sophisticated rice mapping method based on the time-series EVI, LSWI, and a variable EVI/LSWI threshold function to identify mixed paddy rice fields from MODIS images. Dong et al. [11] obtained rice maps of Northeast Asia at a resolution of 30 m from Landsat images based on an improved PPPM method, and You et al. [12] achieved rice maps of Northeast China with a resolution of 10 m using the random forest method based on vegetation index and texture features in time-series Sentinel-2 imagery. Han et al. [13] obtained 10 m paddy rice maps for Northeast and Southeast Asia from 2017 to 2019 based on a series of rice features extracted from SAR and optical images. Boschetti et al. [14] proposed the PhenoRice algorithm based on MODIS data and obtained rice maps for Italy, India, the Philippines, and Senegal. Other researchers have used this algorithm to identify rice in Nepal [15], Jiangsu [16], and California [17]. Based on the V-shaped characteristics of rice pixels in Sentinel-1 SAR images before and after transplanting, Zhan et al. [18] constructed recognition rules for rice in multiple experimental areas. Such methods generally have good generalization ability and interpretability and have achieved great success in rice recognition at large regional scales [12,19]. However, whether rice recognition rules are built on expert knowledge or on ordinary machine learning methods, the recognition features must be manually constructed and screened, such that the accuracy of rice mapping may be affected by the experts' subjective experience and ability, thus affecting the stability of model performance [20].
With the rapid development of deep learning technology, crop mapping based on deep learning has gradually become a research hotspot. Compared with traditional methods, crop mapping methods based on deep learning can automatically extract rich and diverse features from massive raw satellite images and labels, which is more conducive to realizing accurate crop mapping over large areas and under complex scenarios [21,22,23,24]. Xu et al. [21] proposed the DeepCropMapping algorithm, which combines long short-term memory and an attention mechanism to achieve high-accuracy identification of corn and soybeans in multiple regions of the United States (US) corn belt; their results showed that the accuracy of the proposed method was superior to that of traditional methods, such as random forests. Zhang et al. [25] compared the capabilities of various deep learning and machine learning models in remote sensing-based rice mapping; the results showed that the performance of the deep learning models was significantly better than that of the traditional machine learning models. Whether based on temporal [21,26,27], spatial [28,29,30], or spatial-temporal fusion features [20,31,32,33], deep learning models show significant application potential in crop mapping based on remote sensing.
Despite progress in crop mapping using deep learning models, their reliance on large amounts of sample data limits their application. This limitation becomes even more apparent when attempting to use deep learning models for accurately identifying crops on a large regional scale in areas with few reliable samples, creating an urgent problem. In response to this problem, some scholars have attempted to examine the spatiotemporal generalization and transfer abilities of deep learning models. Wei et al. [29] divided a research area located in Arkansas, USA, into three subregions from south to north, analyzing the rice recognition ability of the deep learning model trained in one subregion in the other two subregions. Furthermore, Ge et al. [34] used the phenological matching method to transfer a rice mapping model trained with the Arkansas dataset to a research area in Liaoning, Northeast China, with an overall accuracy of more than 87%. However, there is still insufficient research on crop model inter-continental transfer based on a very small number of samples, which cannot answer key questions, such as the required sample size for model inter-continental transfer, the accuracy and spatiotemporal generalization ability of the transferred model, and the influencing factors affecting this ability, thus limiting the application of deep learning models in rapid and accurate crop mapping in areas with a lack of samples. Building on the success of Yang et al. [20] in applying the temporal feature-based segmentation (TFBS) model to rice mapping in the US, we conducted further research to explore the feasibility and effectiveness of inter-continental transfer for rice mapping models in areas with limited samples, which aims to answer the following questions:
(1)
Is it feasible to achieve inter-continental transfer of pre-trained rice mapping models based on extremely limited samples from areas with a lack of samples, thereby reducing the cost of large-scale crop mapping and improving efficiency?
(2)
How does sample size impact the accuracy of the model transfer during the transfer process? Can the transferred model’s accuracy match that of a model trained from scratch with a large number of samples?
(3)
What is the spatiotemporal generalization capability of the transferred model, and which factors influence its spatiotemporal generalization performance?

2. Study Area and Data

2.1. Study Area

The study area consisted of six regions located in the USA and China (Figure 1 shows the six regions, labeled A–F). Region A is located in the Mississippi Alluvial Plain, spanning Arkansas, Missouri, Tennessee, and Mississippi, and is the most important rice-planting area in the USA. Sowing of rice in Region A starts in early April, is concentrated in May, and continues until mid-to-late June, while the harvest period begins in mid-to-late August, is concentrated in September, and continues to mid-to-late October [35]. Region B is located in the Sacramento Valley, California, one of the most important rice-producing areas in the US; its rice planting area accounts for 20.4% of the total rice planting area in the United States. The rice growth period in Region B lasts from April to October. Regions D, E, and F are located in the Songnen, Liaohe, and Sanjiang plains, respectively, the three most important rice-producing regions in Northeast China. Region C is the Shuangcheng District. The study area in Northeast China spans more than 1000 km from both east to west and north to south. The growth period of rice in Northeast China generally lasts from mid-to-late April to the end of September. The main phenological stages of rice in different regions of Northeast China, such as the transplanting and maturity stages, can differ by 20–30 days within a single year [36]. For Region C (Shuangcheng District), the phenological stages of rice may differ by more than 10 days between years [36].

2.2. Sentinel-1 Imagery

Sentinel-1 imagery was employed in this study as the remote sensing data source. Sentinel-1 is a two-satellite constellation developed by the European Space Agency. The dual-polarization C-band SAR instrument carried by the Sentinel-1 satellites is capable of imaging the ground day and night in all weather conditions. The stable and fast revisit cycle, combined with this imaging capability, makes it a reliable data source for crop identification based on time-series remote sensing data. The Sentinel-1 data used in this study were processed using Google Earth Engine. Each Sentinel-1 image includes vertical transmit/vertical receive + vertical transmit/horizontal receive (VV + VH) polarization data preprocessed to the backscatter coefficient [20].
The Sentinel-1 images used in Regions A (Mississippi Alluvial Plain) and B (Sacramento Valley) were acquired in 2019. The images used in Region C (Shuangcheng District) were acquired from 2017 to 2020. The images used in Regions D (Songnen Plain), E (Liaohe Plain), and F (Sanjiang Plain) were acquired in 2017. Considering the phenological period of rice in the study area, all Sentinel-1 images acquired from 1 April to 2 November of each year were used. Although the revisit period of Sentinel-1 is 12 days, missing data make it impossible to obtain Sentinel-1 observations every 12 days in some areas. Therefore, 24-day composite time-series SAR data were generated by averaging all Sentinel-1 images acquired during each 24-day period from 1 April to 2 November, ensuring the consistency of the time-series SAR data and avoiding missing data (Figure 2). The final time-series SAR data contained nine 24-day-average composite SAR images; each image included VV and VH bands. The preprocessed Sentinel-1 images were used as input data for training the deep learning-based crop recognition model and for crop mapping.
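For readers who wish to reproduce the compositing step, the following is a minimal sketch using the Google Earth Engine Python API; the area of interest and year are illustrative placeholders, not the exact footprints used in this study.

```python
# Minimal sketch of the 24-day mean compositing in the Google Earth Engine
# Python API. The AOI and year below are illustrative placeholders.
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([125.5, 45.2, 126.5, 45.7])  # hypothetical AOI

s1 = (ee.ImageCollection('COPERNICUS/S1_GRD')
      .filterBounds(aoi)
      .filterDate('2017-04-01', '2017-11-03')
      .filter(ee.Filter.eq('instrumentMode', 'IW'))
      .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VV'))
      .filter(ee.Filter.listContains('transmitterReceiverPolarisation', 'VH'))
      .select(['VV', 'VH']))

start = ee.Date('2017-04-01')

def mean_24d(i):
    # Average all images falling inside the i-th 24-day window.
    t0 = start.advance(ee.Number(i).multiply(24), 'day')
    return (s1.filterDate(t0, t0.advance(24, 'day')).mean()
            .set('system:time_start', t0.millis()))

# Nine consecutive 24-day windows cover 1 April to 2 November.
composites = ee.ImageCollection(ee.List.sequence(0, 8).map(mean_24d))
stacked = composites.toBands()  # 18-band stack: 9 periods x (VV, VH)
```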

2.3. Ground Survey and Sampling

Field campaigns were carried out in Northeast China (Regions C, D, E, and F) from 2017 to 2022, and a series of ground survey points (Figure 1b–d) were collected, covering the main crop types in Northeast China (rice, maize, and soybean; Figure 1e–h), as well as other land cover types, such as wetlands and forests. Among them, two ground surveys were carried out in Shuangcheng District (Region C) in early June and August 2017, and a large number of ground sample points were collected. These samples were used to produce an accurate rice distribution map of Shuangcheng District to verify the transfer accuracy of the rice classification model.
The field campaigns in Regions D, E, and F were conducted in areas where rice cultivation was concentrated. The main method was to conduct a ground survey along the main roads at regular intervals to collect photos of sample points and information on the nearby crop distribution. In Region D, 401 ground survey points (blue dots in Figure 1b) were collected, most of which were located in Region C. In Region E, ground surveys were mainly carried out in the central area where rice cultivation was concentrated, at a total of 26 locations at intervals of approximately 10–20 km (blue dots in Figure 1c). In Region F, ground surveys were carried out at a total of 76 locations at intervals of approximately 5–10 km (blue dots in Figure 1d). The samples obtained through these field surveys provided the necessary support for rice mapping and accuracy assessment.
Considering the uneven distribution and relatively small number of the collected sample points, we used a random generation method to generate 1500 validation sample points in Regions D, E, and F to better evaluate the classification accuracy. Visual interpretation was used to determine whether these sample points were rice, based on their proximity to ground survey locations or routes. Photos obtained from ground surveys, combined with high-resolution satellite images (Google Earth images) and SAR images, were used to classify sample points located near ground survey locations or routes; for the other sample points, high-resolution satellite images (Google Earth images) and time-series SAR images, combined with Sentinel-2 optical images and existing rice mapping results [12], were employed. As a result, 217 rice validation sample points and 1283 non-rice validation sample points were obtained (Figure 1b–d).

2.4. Crop Type Maps

Crop type maps were primarily used as the label data for training and verifying the deep learning model. In this study, three crop type maps were used according to the situation in the different regions. For the Mississippi Alluvial Plain (Region A) and Sacramento Valley (Region B), Cropland Data Layer (CDL) data, produced by the National Agricultural Statistics Service (NASS), United States Department of Agriculture (USDA), were employed [37]. The CDL data used in this study were from 2019, with a resolution of 30 m.
For the Songnen (Region D), Liaohe (Region E), and Sanjiang (Region F) plains, the 10 m crop type map for 2017 in Northeast China produced and released by You et al. [12] was used. The user accuracy and producer accuracy of this rice mapping product reached 0.87 and 0.92, respectively [38].
For the Shuangcheng area (Region C), Sentinel-1 and Sentinel-2 satellite images were used as the remote sensing data sources; high-resolution satellite images and ground surveys were used to obtain sample points; and the random forest method was used for crop classification. Crop maps of the Shuangcheng area from 2017 to 2020, with a resolution of 10 m, were produced and used in this study. The relevant classification method can be found in [2] and will not be repeated here.

3. Methods

Figure 2 shows the flowchart of this study, which included the following steps.
(1)
Data preparation and preprocessing
The time-series SAR images and corresponding label data used for model training and verification were cut into tiles with a spatial size of 128 × 128 pixels (see the tiling sketch after this list). The time-series SAR image includes 18 bands; each band represents the VV or VH backscatter coefficient of a 24-day period.
(2)
Model pre-training
A rice remote sensing recognition model was trained on the dataset of the Mississippi Alluvial Plain (Region A, year 2019) in the southern United States, with a total of 6608 samples, to provide a pre-trained model for fine-tuning and transfer. Based on the Mississippi Alluvial Plain dataset, the TFBS and attTFBS models were trained and verified simultaneously by 10-fold cross-validation. The accuracies of the two models were compared, and the model with the higher accuracy was used as the base model (Model A) for fine-tuning and transfer (Figure 3). Subsequently, the pre-trained model was directly used for rice mapping in the Shuangcheng District to test whether a fine-tuning technique is needed to perform inter-continental transfer of the model.
(3)
Model inter-continental transfer based on very few samples
Model A was transferred to the Shuangcheng District of Northeast China (Region C, year 2017) by fine-tuning with a very small number of samples, yielding Model C, which was applied for rice classification and accuracy evaluation. The parameters of the first layer (temporal feature extraction layer) and the output layer of the pre-trained model were fine-tuned to achieve the inter-continental transfer of the model. In this study, the classification accuracy was evaluated using indicators including the overall accuracy, F-score, recall, and precision.
(4)
Spatial-temporal generalization ability test and impact factor analysis of the transferred model
Model C was directly used for mapping rice in Shuangcheng (Region C) from 2018 to 2020, thereby testing the temporal generalization ability of the transferred model. In addition, the model was also used to produce the rice distribution maps of the Songnen (Region D), Liaohe (Region E), and Sanjiang (Region F) plains in 2017, testing its spatial generalization ability. The accuracy of the classification results was evaluated using the ground survey sample points and existing rice spatial distribution results, and the factors affecting the temporal and spatial generalization ability were analyzed.
In addition, this study used the t-distributed stochastic neighbor embedding (t-SNE) feature visualization analysis method to analyze the differences between the short-distance transfer of the model (from the Mississippi Alluvial Plain to the Sacramento Valley, Model B) and the inter-continental transfer of the model (from the Mississippi Alluvial Plain to Northeast China, Model C), so as to analyze which network layer parameters need to be adjusted when using the fine-tuning method for model transfer.
(5)
Evaluation of the difference in crop recognition ability between the model transferred based on a small number of samples and a model trained from scratch based on massive samples
Model transfer can greatly reduce the dependence on sample data. However, the practicality of the proposed method may be reduced if there is a significant discrepancy between its rice recognition capability and that of a model trained from scratch using a large number of samples. Therefore, to evaluate the practicality of the proposed method, a deep learning-based rice mapping model was trained from scratch using all available data in Shuangcheng, Northeast China, and its performance was then compared with that of the transferred model in this study.
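As a concrete illustration of step (1) above, the snippet below sketches how a stacked raster could be cut into 128 × 128 tiles. It assumes the image and label rasters are already loaded as NumPy arrays (e.g., with rasterio); all names are illustrative.

```python
# Illustration of step (1): cut a stacked [18, H, W] SAR raster and its
# [H, W] label raster into 128 x 128 tiles. Arrays are assumed to be
# already loaded; all names are illustrative.
import numpy as np

def make_tiles(image: np.ndarray, label: np.ndarray, tile: int = 128):
    bands, h, w = image.shape
    assert bands == 18 and label.shape == (h, w)
    tiles = []
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            tiles.append((image[:, r:r + tile, c:c + tile],
                          label[r:r + tile, c:c + tile]))
    return tiles
```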

3.1. Attention TFBS (attTFBS)

The attTFBS model evolved from the TFBS model by adding an attention mechanism to the temporal feature extraction module. Through the introduction of the attention mechanism, the model can focus on the complicated temporal features that are most helpful for improving classification ability. Specifically, the attTFBS model contains three modules: an attention-based long short-term memory (attLSTM) module used for extracting deep-seated temporal features, a UNET module used for extracting spatial semantic information from the temporal features received from the attLSTM module, and an output module that performs classification based on the features fed by the UNET module (Figure 4).
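To make the data flow concrete, the following is a structural sketch of such a model in PyTorch. It is not the authors' released code; the band ordering and the `unet` module (any standard U-Net mapping 64 input channels to 64 output channels at the same resolution) are assumptions.

```python
# Structural sketch of attTFBS in PyTorch (not the authors' released code).
# Assumptions: the 18 input bands are ordered period-major (VV, VH per
# 24-day period), and `unet` maps 64 channels to 64 channels.
import torch
import torch.nn as nn

class AttTFBS(nn.Module):
    def __init__(self, unet: nn.Module, steps: int = 9, feats: int = 2,
                 hidden: int = 64):
        super().__init__()
        self.steps, self.feats, self.hidden = steps, feats, hidden
        self.lstm = nn.LSTM(feats, hidden, batch_first=True)  # temporal module
        self.att = nn.Linear(hidden, 1)   # per-time-step attention score
        self.unet = unet                  # spatial semantic module
        self.head = nn.Conv2d(hidden, 1, kernel_size=1)  # output module

    def forward(self, x):                 # x: [bs, 18, 128, 128]
        bs, _, h, w = x.shape
        # [bs, 18, H, W] -> [bs*H*W, 9, 2]: one VV/VH sequence per pixel.
        x = x.view(bs, self.steps, self.feats, h, w)
        x = x.permute(0, 3, 4, 1, 2).reshape(-1, self.steps, self.feats)
        hs, _ = self.lstm(x)              # hidden states: [bs*H*W, 9, 64]
        w_att = torch.softmax(self.att(hs), dim=1)  # weights over 9 steps
        feat = (w_att * hs).sum(dim=1)    # attention-weighted sum: [bs*H*W, 64]
        feat = feat.view(bs, h, w, self.hidden).permute(0, 3, 1, 2)
        feat = self.unet(feat)            # spatiotemporal fusion features
        return torch.sigmoid(self.head(feat))  # rice probability: [bs, 1, H, W]
```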

3.1.1. Attention-Based LSTM Module

The attLSTM module consisted of an input layer, LSTM layer, and attention layer. The input layer was used to receive time-series SAR image tiles and convert them into the data format required by the LSTM layer. The LSTM layer was used to extract the deep temporal features contained in the time-series SAR data. The attention layer employed the attention mechanism to screen a large number of temporal features obtained by the LSTM layer such that the model can focus on features with high importance.
The shape of the data fed into the input layer is [bs, 18, 128, 128], which represents the batch size, channels, width, and height. Then, the shape of the input data is converted to [bs × 128 × 128, 9, 2], which corresponds to the number of pixels of the input image tiles, sequence length (also known as time steps), and number of features of each pixel (VV and VH).
The LSTM layer consisted of 64 LSTM units, each of which included nine time steps; each time step corresponded to the backscatter coefficient values of the SAR image for a 24-day period. The shape of the data fed to each LSTM unit was [9, 2]; each time step of the LSTM unit produced a hidden-state value. Thus, the 64 LSTM units produced a hidden-state matrix with a shape of [bs × 128 × 128, 9, 64], which indicates that the number of features in each time period increased from 2 to 64. The specific formulas for each time step (also known as a cell) of the LSTM unit were as follows:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$, (1)
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$, (2)
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$, (3)
$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$, (4)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$, (5)
$h_t = o_t \cdot \tanh(C_t)$, (6)
where $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid and tanh functions, respectively; $f_t$, $i_t$, and $o_t$ are the forget gate, input gate, and output gate, respectively (all determined by the hidden state of the previous time step, $h_{t-1}$, and the current input, $x_t$); $f_t$ determines how much of the cell state input from the previous time step, $C_{t-1}$, is retained in the current state $C_t$; $\tilde{C}_t$ is a candidate state that can be added to the current state $C_t$ and is also determined by the old hidden state $h_{t-1}$ and current input $x_t$; and $i_t$ determines how much of the candidate state $\tilde{C}_t$ will be input into the current state $C_t$. Thus, the current state $C_t$ can be calculated by adding the retained old cell state to the input candidate state (Equation (4)). Finally, $o_t$ determines how much of the current state $C_t$ will be output to the new hidden state of the current time step (Equations (5) and (6)). Here, $W_f$, $W_i$, $W_C$, and $W_o$ and $b_f$, $b_i$, $b_C$, and $b_o$ are the learnable weights and biases of the LSTM layer, respectively. Through the three well-designed gate structures, the LSTM network can mine deep-seated temporal features from raw time-series data and largely avoid the exploding and vanishing gradient problems faced by the vanilla RNN.
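As a didactic illustration, the gate equations above can be written out directly as one LSTM step; this is a sketch mirroring Equations (1)–(6), not the internals of an optimized LSTM implementation.

```python
# One LSTM step written out to mirror Equations (1)-(6); a didactic sketch,
# not the nn.LSTM internals. W_* and b_* are learnable parameters.
import torch

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    z = torch.cat([h_prev, x_t], dim=-1)     # [h_{t-1}, x_t]
    f_t = torch.sigmoid(z @ W_f.T + b_f)     # forget gate, Eq. (1)
    i_t = torch.sigmoid(z @ W_i.T + b_i)     # input gate, Eq. (2)
    c_hat = torch.tanh(z @ W_c.T + b_c)      # candidate state, Eq. (3)
    c_t = f_t * c_prev + i_t * c_hat         # new cell state, Eq. (4)
    o_t = torch.sigmoid(z @ W_o.T + b_o)     # output gate, Eq. (5)
    h_t = o_t * torch.tanh(c_t)              # new hidden state, Eq. (6)
    return h_t, c_t
```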
After the original time-series SAR data flowed through the LSTM network, the number of features was increased from 2 (VV and VH) to 64; a total of 576 features were obtained over the nine periods. Considering that the importance of the features of each period for rice classification differs, an attention layer was employed after the LSTM layer to determine the weight of the features of each period. A weighted summation was used to obtain a comprehensive feature based on the features of the nine periods. Thus, the shape of the output data of the attention layer was [bs × 128 × 128, 64]. The detailed formulas for the attention layer are as follows:
$W_{h_t}^{att} = \mathrm{Softmax}(W_{h_t} \cdot h_t + b_{h_t})$, (7)
$h_t^{att} = W_{h_t}^{att} \cdot h_t$, (8)
$O_{att} = \sum_{t=1}^{9} h_t^{att}$, (9)
where $W_{h_t}$ and $b_{h_t}$ are the weight and bias parameters, respectively, which are updated automatically during model training; $W_{h_t}^{att}$ is the weight value that determines the contribution of its corresponding hidden state, $h_t$, to the output data of the attention layer, $O_{att}$; $\mathrm{Softmax}(\cdot)$ is the softmax function, which makes the sum of $W_{h_t}^{att}$ ($t$ = 1, 2, 3, …, 9) equal to one; and $h_t^{att}$ is the weighted hidden state of time step $t$.
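Expressed in code, the attention layer is a small learned scoring function followed by a softmax over the nine time steps. The sketch below mirrors Equations (7)–(9) under the assumption that the score is produced by a single linear layer.

```python
# Attention layer sketch mirroring Equations (7)-(9); the score function
# is assumed to be a single linear layer.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.score = nn.Linear(hidden, 1)             # W_ht and b_ht

    def forward(self, hs: torch.Tensor) -> torch.Tensor:
        # hs: [N, 9, 64] hidden states from the LSTM layer.
        w_att = torch.softmax(self.score(hs), dim=1)  # Eq. (7): sums to 1 over t
        h_att = w_att * hs                            # Eq. (8): weighted states
        return h_att.sum(dim=1)                       # Eq. (9): O_att, [N, 64]
```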

3.1.2. UNET Module

After mining the deep-seated time-series features using the attLSTM module, a UNET module was used to mine the spatial semantic features between the time-series features of different pixels to obtain the spatiotemporal fusion features for rice recognition. Before the output data of the attLSTM module are input to the UNET module, the scattered pixels must be reassembled into complete image tiles, i.e., the shape of the data must be converted from [bs × 128 × 128, 64] to [bs, 128, 128, 64].
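This pixel-to-tile reassembly is a pure reshape; a minimal sketch is shown below, with the channel-first output layout following the PyTorch convention used in the sketches above.

```python
# Reassemble per-pixel attention outputs into image tiles for the U-Net:
# [bs*size*size, 64] -> [bs, 64, size, size] (channel-first for PyTorch).
import torch

def pixels_to_tiles(o_att: torch.Tensor, bs: int, size: int = 128):
    return o_att.view(bs, size, size, -1).permute(0, 3, 1, 2).contiguous()
```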
UNET is a typical semantic segmentation network that consists of an encoder and decoder, which can mine different levels and types of spatial semantic information in the input image data to achieve end-to-end image classification. The encoder consists of a series of convolutional layers, ReLU (rectified linear unit) activation functions, and max-pooling layers. Its main function is to extract spatial semantic features at different levels, while continuously compressing the image resolution. After the input data passed through the encoder, its spatial resolution was reduced from the original 128 × 128 pixels to 8 × 8 pixels, but the number of feature channels increased from the original 64 to 1024; thus, the data shape was converted to [bs, 8, 8, 1024]. The decoder followed the encoder, and its basic structure was symmetrical to that of the encoder. It was composed of a series of convolution layers, ReLU activation functions, and upsampling layers. The main purpose of the decoder was to gradually improve the spatial resolution of deep-seated features until they were finally restored to the original 128 × 128 pixels to achieve pixel-based classification. Meanwhile, the spatial semantic features obtained at different levels of the encoder were concatenated with the features of the corresponding levels of the decoder via skip connection structures to combine the accurate boundary information and deep abstract semantic features. After the input data passed through the decoder layer, the shape was restored to [bs, 128, 128, 64]. Although the number of output features in the UNET module did not increase compared with the output features of the attLSTM module, the type of features changed from simple temporal features to spatiotemporal fusion features with a stronger crop recognition ability.

3.1.3. Output Module

The output module, also referred to as the classification layer, is composed of a 1 × 1 convolutional layer and a sigmoid layer that identify rice and non-rice pixels from the 64 input spatiotemporal fusion features. The shape of the final output data was [bs, 128, 128, 1], which represented the probability that each pixel in the input image tile was rice. In this study, pixels with a probability greater than 0.5 were classified as rice; the rest were classified as non-rice.
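A minimal sketch of this head and the thresholding step (tensor shapes follow the channel-first convention used in the sketches above):

```python
# Output module sketch: a 1 x 1 convolution and a sigmoid, followed by the
# 0.5 probability threshold described above.
import torch
import torch.nn as nn

head = nn.Conv2d(64, 1, kernel_size=1)
feat = torch.randn(4, 64, 128, 128)   # [bs, 64, 128, 128] fusion features
prob = torch.sigmoid(head(feat))      # [bs, 1, 128, 128] rice probability
rice_mask = prob > 0.5                # True where the pixel is rice
```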

3.2. Model Pre-Training

Model pre-training was conducted using the dataset from the Mississippi Alluvial Plain in 2019. Here, we first compared the accuracy of the attTFBS model with that of the ordinary TFBS model using 10-fold cross-validation to evaluate the rice recognition ability of the attTFBS model. Next, we used 70% of the data as the training set and the remainder as the validation set to train the deep learning model. The trained model was used as the base model for the subsequent fine-tuning of the rice classification models in the other regions. The batch size for training was set to 16, and Adam was selected as the optimizer. Please refer to the code attached to this article for the other model parameters, which will not be repeated here.
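A sketch of the corresponding training loop is shown below. The batch size and optimizer follow the settings above, while the learning rate, epoch count, loss function, and the toy stand-in data are assumptions for illustration; `nn.Identity()` lets the sketch run without a real U-Net implementation.

```python
# Training-loop sketch under the stated settings (batch size 16, Adam).
# Learning rate, epochs, loss, and the toy data are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.randn(32, 18, 128, 128)                 # toy SAR tiles
y = (torch.rand(32, 1, 128, 128) > 0.97).float()  # toy rice labels
loader = DataLoader(TensorDataset(x, y), batch_size=16, shuffle=True)

model = AttTFBS(unet=torch.nn.Identity())         # from the sketch in Sec. 3.1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed lr
criterion = torch.nn.BCELoss()                    # binary rice/non-rice loss

for epoch in range(20):  # Figure 5 suggests ~20 epochs suffice to converge
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
```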
To test whether fine-tuning techniques are necessary for the inter-continental transfer of the pre-trained model, we first directly applied the un-fine-tuned model (Model A) to rice classification in the Shuangcheng District. We then compared the time-series SAR curves of the main crops in the Mississippi Alluvial Plain and Shuangcheng District, as well as the differences in the appearance of rice fields in the time-series SAR images of the two regions. Additionally, we used principal component analysis (PCA) [39] to extract the first three principal components (PCs) of the time-series SAR images of both regions and compared the appearance of rice fields in the pseudo-color images composited from these first three PCs.

3.3. Inter-Continental Model Transfer Based on Fine-Tuning and Small Sample Sizes

Inter-continental model transfer was conducted based on the fine-tuning of the pre-trained model (Model A) and a small sample size. The Shuangcheng District, Songnen Plain, Sanjiang Plain, and Liaohe Plain, which are located in Northeast China, were selected as the target areas for model inter-continental transfer. The dataset for the Shuangcheng District in 2017 was used as the training and validation dataset to fine-tune the model. Considering that the features of rice in Northeast China are significantly different from those in the United States, the pre-trained model was unable to extract effective recognition features of rice in this region (which will be discussed in the next section). We fine-tuned the parameters of both the first layer (temporal feature extraction layer) and the output layer of the pre-trained model with a small sample size; the remaining layers of the pre-trained model were frozen. After the fine-tuning of the model was completed, it was applied to rice mapping in the Shuangcheng District from 2018 to 2020 to test the temporal generalization ability of the model. The fine-tuned model was also applied to the Songnen, Sanjiang, and Liaohe plains to test the spatial generalization ability of the inter-continental transferred model.
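In PyTorch terms, this freeze-and-fine-tune scheme looks like the sketch below. The attribute names follow the structural sketch in Section 3.1, not the authors' released code, and the learning rate is an assumption.

```python
# Freeze-and-fine-tune sketch: only the temporal feature extraction layer
# (LSTM + attention score) and the output head are updated on the small
# target-domain sample set. Attribute names follow the AttTFBS sketch.
for p in model.parameters():
    p.requires_grad = False                    # freeze everything

for layer in (model.lstm, model.att, model.head):
    for p in layer.parameters():
        p.requires_grad = True                 # unfreeze first/output layers

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)  # assumed lr
```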
Additionally, the influence of the number of samples used for fine-tuning on the accuracy of inter-continental model transfer was examined to determine the balance point between the number of samples and the rice mapping accuracy during model fine-tuning. In the process of model fine-tuning and transfer, the number of training samples was increased from 5 to 50 at intervals of 5. For each sample size, training was repeated 10 times; the average value of each accuracy assessment metric over the 10 runs was taken as the final accuracy result, and its standard deviation was calculated to evaluate the stability of model transfer at different sample sizes.

3.4. Dimensionality Reduction and Visualization of Features

The dimension of the final temporal-spatial fusion features input to the classification layer of the attTFBS model was 64. To visualize the extracted features, the t-SNE method was used for dimensionality reduction, converting the 64-dimensional features into 2-D data. The effectiveness of the extracted features for rice classification was analyzed based on the 2-D data obtained using the t-SNE method. The t-SNE method was developed from the classical SNE method [40,41]. The goal of t-SNE is to reduce data with higher feature dimensions to lower dimensions (such as 2-D or 3-D) for analysis or visualization. If two data points are similar in the high-dimensional space (e.g., they belong to the same category), their distance after dimensionality reduction to the low-dimensional space will be relatively close, and vice versa. t-SNE employs the Kullback–Leibler (K–L) divergence to quantify the mismatch between the pairwise similarities of points in the high-dimensional space and those in the low-dimensional space, and it minimizes the sum of the K–L divergences over all point pairs through gradient descent, thereby reducing high-dimensional points to low dimensions while preserving the similarity structure of point pairs.
In this study, 10,000 rice pixels and 10,000 non-rice pixels in each study area were randomly selected from the feature images for feature dimensionality reduction and visual analysis. The pre-trained model and the fine-tuned model (Model C) were used to extract the hidden spatiotemporal features from the time-series SAR images of the Shuangcheng District to compare the effectiveness of the features obtained by each model. At the same time, to further analyze the difference between inter-continental model transfer and short-distance model transfer, this study also used the pre-trained model to extract the features of the Mississippi Alluvial Plain (Region A, which served as the source domain) and the Sacramento Valley (Region B, which is approximately 2800 km from Region A). The t-SNE method was used in all of these areas to perform feature reduction and visualization.
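A minimal sketch of the dimensionality reduction step with scikit-learn is given below; the file names are hypothetical placeholders for the sampled 64-dimensional features and their rice/non-rice labels.

```python
# t-SNE sketch with scikit-learn. `features` holds the sampled [N, 64]
# spatiotemporal fusion features; `labels` marks rice (1) vs. non-rice (0).
# The file names are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.load('shuangcheng_features.npy')   # e.g., [20000, 64]
labels = np.load('shuangcheng_labels.npy')       # e.g., [20000]

emb = TSNE(n_components=2, init='pca', random_state=0).fit_transform(features)
plt.scatter(emb[labels == 0, 0], emb[labels == 0, 1], s=2, label='non-rice')
plt.scatter(emb[labels == 1, 0], emb[labels == 1, 1], s=2, label='rice')
plt.legend()
plt.show()
```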

3.5. Accuracy Assessment

The accuracy evaluation metrics employed in this study included the overall accuracy (OA), F-score, kappa coefficient, recall, and precision. The overall accuracy represents the proportion of correctly classified pixels in the classification results. Kappa measures the agreement between the classification image and the reference image. Recall is the proportion of rice pixels in the reference image that were correctly classified. Precision is the proportion of rice pixels in the classification image that were correctly classified. The F-score is the harmonic mean of precision and recall, which makes it informative even when the classes are imbalanced.
Additionally, Sankey diagrams were used to show the flow direction of the classification results and to visually analyze misclassification. The width of the arrows in the Sankey diagrams is proportional to the number of pixels flowing between classes.
The formulas for the accuracy evaluation metrics were as follows.
$\mathrm{Recall} = \dfrac{\mathrm{Num}(P_{rice}^{c} \cap P_{rice}^{r})}{\mathrm{Num}(P_{rice}^{r})}$, (10)
$\mathrm{Precision} = \dfrac{\mathrm{Num}(P_{rice}^{c} \cap P_{rice}^{r})}{\mathrm{Num}(P_{rice}^{c})}$, (11)
$F\text{-}score = \dfrac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}$, (12)
$\mathrm{OA} = \dfrac{\mathrm{Num}\left((P_{rice}^{c} \cap P_{rice}^{r}) \cup (P_{others}^{c} \cap P_{others}^{r})\right)}{\mathrm{Num}(P_{all})}$, (13)
$P_e = \dfrac{\mathrm{Num}(P_{rice}^{c}) \times \mathrm{Num}(P_{rice}^{r}) + \mathrm{Num}(P_{others}^{c}) \times \mathrm{Num}(P_{others}^{r})}{\mathrm{Num}(P_{all})^2}$, (14)
$\mathrm{Kappa} = \dfrac{\mathrm{OA} - P_e}{1 - P_e}$, (15)
where $\mathrm{Num}(\cdot)$ denotes a function that calculates the number of pixels; $P_{rice}^{c}$ and $P_{rice}^{r}$ refer to the rice pixels in the classification image and the corresponding reference image, respectively; $P_{others}^{c}$ and $P_{others}^{r}$ refer to all non-rice pixels in the classification and reference images, respectively; and $P_{all}$ refers to all the pixels in the classification image.
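For completeness, these metrics can be computed directly from boolean rice masks; a sketch following Equations (10)–(15):

```python
# Accuracy metrics computed from boolean rice masks, following Equations
# (10)-(15); `pred` and `ref` are the classification and reference masks.
import numpy as np

def rice_metrics(pred: np.ndarray, ref: np.ndarray) -> dict:
    n = pred.size
    tp = np.sum(pred & ref)                       # correctly classified rice
    recall = tp / np.sum(ref)                     # Eq. (10)
    precision = tp / np.sum(pred)                 # Eq. (11)
    f_score = 2 * recall * precision / (recall + precision)  # Eq. (12)
    oa = np.sum(pred == ref) / n                  # Eq. (13)
    p_e = (np.sum(pred) * np.sum(ref)
           + np.sum(~pred) * np.sum(~ref)) / n**2  # Eq. (14)
    kappa = (oa - p_e) / (1 - p_e)                # Eq. (15)
    return dict(recall=recall, precision=precision,
                f_score=f_score, oa=oa, kappa=kappa)
```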

4. Results

4.1. Model Pre-Training

Based on the dataset for the Mississippi Alluvial Plain in 2019, the rice classification abilities of the attTFBS and TFBS models were compared by 10-fold cross-validation; the model with the higher accuracy was selected as the pre-trained model for the subsequent model transfer research. Figure 5 shows the trend of the attTFBS model accuracy with the training epoch. After the training epoch reached 20, the loss had largely converged and each accuracy metric reached a high and stable level, indicating that the model had achieved a high accuracy. Figure 6 shows the accuracy comparison between the attTFBS model and the vanilla TFBS model. In the 10-fold cross-validation, the attTFBS model outperformed the TFBS model in all accuracy metrics, while its accuracy stability was comparable to that of the vanilla TFBS model. The F-score, kappa coefficient, recall, and precision of attTFBS were 1.97%, 2.67%, 1.51%, and 1.15% higher than those of the vanilla TFBS model, respectively. This indicates that attTFBS outperformed the vanilla TFBS with a higher rice classification accuracy and is suitable as the pre-trained model for the model transfer research.

4.2. Rice Mapping in Northeast China Based on the Pre-Trained Model without Fine-Tuning

We then used the model (Model A) trained in the Mississippi Alluvial Plain (Region A) directly for rice mapping in the Shuangcheng area of Northeast China (Region C) in 2017; the classification accuracies are shown in Table 1. It is clear from Table 1 that the model without fine-tuning had difficulty mapping rice in the target area: the F-score for rice was only 0.2253, and the kappa coefficient was only 0.1895. It should be pointed out that although the overall accuracy reached 90.31%, the proportion of rice pixels in the total number of pixels in the Shuangcheng area is only 2.87%. Therefore, the overall accuracy cannot represent the true classification accuracy of the model for rice, which can also be seen from the recall and precision values of rice in Table 1. Specifically, the recall value of the pre-trained model for rice in the Shuangcheng area was only 0.4916, indicating that up to 50.84% of rice pixels were omitted; the precision value of rice was only 0.1462, indicating that up to 85.38% of the rice pixels in the classification result were actually pixels of other classes.
To investigate the reasons for this result, we compared the differences in the image data and rice features between the United States and China. Figure 7a,c show pseudo-color images of raw SAR data from the United States (Region A) and Northeast China (Region C), respectively; Figure 7b,d show the corresponding pseudo-color PC images. The white polygons in Figure 7 represent rice fields. Figure 7 shows a significant difference in color between the rice fields in the United States and those in Northeast China, indicating that the rice fields exhibit different temporal features in the images of the two regions. Therefore, we calculated the SAR time-series curves of rice and corn in the two regions, as shown in Figure 8. Figure 8 indicates that the minimum VV value of rice in Shuangcheng appeared on 19 May, significantly later than that in the Mississippi Alluvial Plain (25 April).
In addition, we further plotted the probability density functions (PDFs) and boxplots of the backscattering coefficient values of rice pixels in all nine periods of Region A and Region C (Figure 9 and Figure 10). They indicate that there were significant differences in the distribution of rice backscatter coefficient values between Region A and Region C on 1 April, 25 April, 19 September, and 10 October. Taking 25 April as an example, the peak value of the PDF of the rice pixels in Region A appeared at −14.14, its standard deviation was 3.24 (Figure 9b), its median was −13.94, and the interquartile range (IQR) was 4.6 (Figure 10b); the peak of the PDF of the rice in Region C appeared at −10.99, with a standard deviation of 2.27 (Figure 9b), a median of −11.27, and an IQR of 2.8 (Figure 10b).

4.3. Influence of Sample Size on Model Fine-Tuning and Inter-Continental Transfer Accuracy

Based on the fine-tuning method, the model trained on the Mississippi Alluvial Plain in the United States was transferred to the Shuangcheng area in Northeast China; the influence of sample size on the accuracy of model fine-tuning and inter-continental transfer was evaluated by changing the number of samples in the model transfer training process (Figure 11). When the sample size was 5, the accuracy of model transfer was quite low and unstable; the average F-score, kappa coefficient, recall, and precision of rice were only 0.5684, 0.5616, 0.7594, and 0.5541, respectively, with standard deviations of 0.3496, 0.3474, 0.0992, and 0.3553, respectively. When the sample size was increased to 10, the accuracy of model fine-tuning and transfer reached a relatively high level; the average F-score, kappa coefficient, recall, and precision of rice were 0.8502, 0.8439, 0.8345, and 0.8669, respectively, with standard deviations of 0.0030, 0.0032, 0.0108, and 0.0115, respectively. However, as the number of samples continued to increase, the accuracy of the model transfer increased only slowly, whereas the stability of the model remained basically unchanged. When the sample size was increased to 50, the average F-score, kappa coefficient, recall, and precision of rice were 0.8560, 0.8499, 0.8424, and 0.8704, respectively, with standard deviations of 0.0029, 0.0029, 0.0135, and 0.0107, respectively. Based on these results, the number of samples for model inter-continental transfer was set to 10 to reduce the demand for samples as much as possible while ensuring high model transfer accuracy.

4.4. Rice Mapping of the Shuangcheng District Based on the Fine-Tuned Model and Its Temporal Generalization Ability

Based on the fine-tuned rice classification model, rice mapping in the Shuangcheng area from 2017 to 2020 was conducted (Figure 12). The accuracy of the rice classification results was evaluated against reference maps (Figure 13). The results showed that it is feasible to realize the inter-continental transfer of a pre-trained rice mapping model based on a fine-tuning method and a small sample size, which is of great significance for rice mapping in areas lacking samples and can significantly reduce the cost of deep learning-based rice mapping. The F-score values of the rice maps for 2018, 2019, and 2020 reached 0.8316, 0.8180, and 0.8432, respectively, and the kappa coefficients were 0.8259, 0.8123, and 0.8388, respectively (Figure 13). The high F-score and kappa values indicated that the inter-continentally transferred model based on a small number of samples still had a strong temporal generalization ability, which further expands its applicability. The precision values of the rice maps in 2018, 2019, and 2020 reached 0.9154, 0.9604, and 0.9402, respectively (Figure 13), indicating that pixels of other ground object classes were rarely misclassified as rice when the transferred model was applied over an extended period. Moreover, the recall values of the rice maps in 2018, 2019, and 2020 were 0.7619, 0.7124, and 0.7644, respectively, significantly lower than the precision values, indicating that, compared with the commission error for rice, the transferred model was more prone to rice omission errors (i.e., misclassifying rice pixels as other classes). This was due to slight differences in the rice backscattering coefficient values between years in the Shuangcheng area, which reduced the likelihood of rice pixels being identified and led to lower recall values (Figure 14). For example, the VV value of rice in 2017 decreased from −11.2 (on 25 April) to −14.9 (on 19 May) during the transplanting stage, a decrease of 3.7, indicating that rice pixels in 2017 showed significant irrigation characteristics during the transplanting stage. However, in 2018, 2019, and 2020, the VV value decreased by only 1.8, 2.4, and 2.3, respectively, during the transplanting stage, which was significantly smaller than in 2017, indicating that the irrigation characteristics of the paddy fields were not as significant as those in 2017, thus leading to the omission of rice in the classification (Figure 14). In addition, differences in crop phenology between years, as well as slightly different SAR image acquisition dates, can also affect the VV and VH time-series curves of different years, thereby affecting the temporal generalization ability of the model.

4.5. Spatial Generalization Ability of the Inter-Continental Transferred Rice Mapping Model

The rice mapping model fine-tuned on the 2017 Shuangcheng dataset was further extended to three typical rice-growing areas in Northeast China, i.e., the Songnen, Liaohe, and Sanjiang plains (Figure 15, Figure 16 and Figure 17), to test the spatial generalization ability of the inter-continentally transferred model. The accuracy was validated using the 1500 validation sample points from Northeast China, and the results showed that the overall accuracy of rice mapping reached 96.87%, the kappa coefficient reached 0.8637, and the recall and precision of rice reached 0.8065 and 0.9722, respectively (Table 2). This result indicates that it is feasible to use a small amount of data from a small region to perform inter-regional model transfer and then apply the transferred model to a larger region for rice recognition, which helps to reduce the difficulty of large-scale inter-regional transfer and application of crop recognition models. However, we also observed that the recall value of rice was far lower than the precision value, indicating that the model was more prone to misclassifying rice pixels as non-rice than the reverse. We further compared our rice mapping results with the crop distribution map produced by You et al. [12] to analyze the reason for this phenomenon.
Figure 18 illustrates the accuracy of the rice mapping results as compared to the crop maps created by You et al. [12]. With respect to the overall accuracy of rice classification, as represented by the F-score and Kappa coefficient in the Songnen, Liaohe, and Sanjiang plains, the transferred model demonstrated high rice recognition accuracy in all three regions. The F-score values were 0.8697, 0.8625, and 0.8463, while the Kappa coefficients were 0.8605, 0.8537, and 0.8315, respectively. This indicates that the inter-continental transferred model has a robust ability for spatial generalization. Additionally, the precision values in all three regions (0.9192, 0.9001, and 0.9458) were noticeably higher than the recall values (0.8252, 0.8280, and 0.7657), suggesting that the omission error for rice pixels was more pronounced than the commission error. This phenomenon can also be observed intuitively in the Sankey diagrams of rice classification (Figure 19). The narrow light blue flows in Figure 19 indicate that a significant portion of rice pixels are misclassified as non-rice pixels in all three regions, while only a small number of non-rice pixels are misclassified as rice pixels. A comparison between Figure 11 and Figure 18 reveals that when the model was fine-tuned using data from the Shuangcheng region, the recall and precision of rice were relatively close (0.8345 and 0.8669, respectively); however, when the fine-tuned model was applied to a larger spatial extent for rice recognition, the difference between the recall and precision of rice increased significantly, with the recall of rice being significantly lower than the precision.
Taking the Sanjiang Plain, which had the lowest accuracy, as an example, we analyzed the time-series curves of the misclassified and omitted rice pixels and of the rice pixels in the Shuangcheng area (Figure 20). The results showed that if the lowest point of the VV time-series curve is taken as the transplanting period, the correctly classified rice and the rice in the Shuangcheng area had the same transplanting period, between 19 May and 11 June; the non-rice pixels that were misclassified as rice also had the lowest point of their VV time-series curves in the same period, leading to their misidentification as rice pixels. Additionally, for the rice pixels that were incorrectly classified as non-rice, the transplanting period was between 25 April and 18 May, significantly earlier than that of the rice in the Shuangcheng area (the elliptical area in Figure 20c), thus leading the model to misidentify them as non-rice pixels. This indicates that differences in the transplanting period of rice significantly affect the spatial generalization ability of the model. When the transplanting periods of rice in two regions differ significantly, the omission error for rice pixels will increase, thus reducing the accuracy of rice classification, especially the recall value.

4.6. Feature Visualization Analysis before and after Model Transfer

The t-SNE method was employed in this study to visually analyze the spatiotemporal fusion features extracted by each model, evaluate the effectiveness of the features in the model transfer process, and provide a basis for deciding which layers to adjust during model fine-tuning. Figure 21a shows the t-SNE feature dimensionality reduction results for rice and other classes in the Mississippi Alluvial Plain (Region A) in 2019. The features extracted by the attTFBS model (Model A) could effectively distinguish rice from other ground objects. Figure 21b shows the t-SNE dimensionality reduction results of the features of rice and other types of ground objects in the Sacramento area (Region B) in 2019, extracted using the attTFBS model (Model B) with the same hidden layers as the model used in the Mississippi Alluvial Plain (Model A). Although this area (Region B) is thousands of kilometers from the source area (Region A), the features extracted by the model could still effectively distinguish rice from the other classes. However, when the model (Model A or B) was applied directly to the inter-continental target area (Region C), it could no longer extract effective features for rice classification, which was confirmed by the t-SNE feature dimensionality reduction results (Figure 21c–f). It is therefore not possible to map rice in Region C directly using Model A. A fine-tuned model (Model C) was then used to extract the spatiotemporal fusion features of rice in the Shuangcheng area (Region C) from 2017 to 2020; the t-SNE method was used to reduce their dimensions and visualize them, with the results shown in Figure 21g–j. The features extracted by the fine-tuned model (Model C) exhibited a significantly improved rice identification ability compared with the model whose feature extraction layer parameters were not adjusted (Model A or B), significantly reducing the confusion between rice and non-rice pixels. These results show that the feature extraction layer, as well as the output layer, must be fine-tuned to realize the inter-continental transfer of the pre-trained model.

4.7. Spatiotemporal Fusion Features Compared to the Original Time-Series SAR Data

We also used the t-SNE method for dimension reduction and visualization of the original time-series SAR data (the feature dimension was 18, i.e., 9 periods × 2 backscattering coefficients) in the Shuangcheng area to analyze the separability of the original features for rice and other ground objects. Figure 22 shows the results. There was serious confusion between the rice pixels and pixels of other types of ground objects. This was mainly because the original features were purely temporal; the spatiotemporal relationship between each pixel and its surrounding pixels was not considered. Consequently, it was difficult to identify rice based solely on the original VV and VH time-series data. A comparison with Figure 21 showed that the deep learning model had powerful feature extraction ability, which could mine deep-level spatiotemporal fusion features from time-series remote sensing images, thus providing the possibility for high-precision rice identification.

4.8. Accuracy Comparison between the Fine-Tuned Model and the Retrained Model

We found that the fine-tuned model based on a small sample size could realize inter-continental transfer of the pre-trained model and obtain a high rice classification accuracy. However, the gap between the rice classification accuracy of the fine-tuned model based on a small sample size and that of a model trained from scratch on a large number of samples also affects the application potential of fine-tuning-based model transfer in large-scale rice mapping. Therefore, we compared the rice mapping accuracy of the model fine-tuned with 10 samples against that of the 10-fold cross-validation based on all samples in the Shuangcheng area in 2017; the results are shown in Figure 23. The rice mapping accuracy of the fine-tuned model was very close to that of the model retrained from scratch using all samples. For example, the F-score and kappa coefficient of the retrained model, which represent the overall classification accuracy, were only 0.58% and 0.63% higher, respectively, than those of the fine-tuned model based on a small sample size. These results show that it is completely feasible to transfer a rice mapping model trained in areas with abundant samples to areas lacking samples based on the fine-tuning method and a small number of samples, enabling low-cost, large-scale rice mapping with high application value.

5. Discussion

5.1. Influencing Factors for Model Generalization Ability

In this study, we realized the inter-continental transfer of the model based on the fine-tuning method. The results show that with this method, only a very small number of samples is needed to achieve a rice recognition accuracy similar to that of a model trained with a large number of samples, demonstrating the strength of the proposed method. However, although the current research shows that the transferred model has a strong spatiotemporal generalization ability, research on the factors influencing this ability is still insufficient. Among the factors affecting the spatiotemporal generalization ability of the model, one is the similarity of crop and background features between the source and target domains. When the rice classification features in the target domain are similar to those in the source domain, the accuracy of model transfer tends to be higher (Figure 13 and Figure 14). The results of this study indicate that differences in rice transplanting dates have a significant impact on the generalization ability of the transferred model (Figure 18). Nevertheless, there are still many other factors that affect this similarity and the spatiotemporal generalization ability of the model, including sample factors (the number, distribution, and class proportions of samples), model factors (the type and architecture of the model, among others), image factors (the acquisition time, quality, type, and processing method of remote sensing images, among others), and other factors (inter-annual/regional differences in rice phenology, rice planting mode, crop structure, and fragmentation of cultivated land). The influences of these factors are complex. Identifying the main influencing factors in a targeted and accurate manner, analyzing the mechanisms by which they affect the spatiotemporal generalization ability of the model, and developing suitable methods to improve this ability are of great significance.

5.2. Advantages and Limitations

Large-scale crop mapping is restricted by the difficulty of obtaining sample data and the limited generalization ability of models [38,39]. Transfer learning provides a promising solution to this problem by transferring samples or models across regions or years [34,42,43,44]. However, existing research on the sample size required for model transfer is lacking, and research on the difference between a transferred model and a model trained with a large number of local samples is also insufficient, which restricts large-scale crop mapping based on transfer learning. In response, this study evaluated the impact of different sample sizes on model transfer accuracy and analyzed the spatiotemporal generalization ability of the transferred model together with its possible influencing factors. It was found that the transferred model can achieve an accuracy close to that of models trained with a large number of local samples, providing a solid practical foundation for crop mapping based on transferred models. In addition, only time-series SAR images were used in this study, making it easier to combine transfer learning with applications of large-scale crop mapping.
However, this study still has certain shortcomings. Firstly, although SAR data have a strong recognition ability for rice, their recognition ability for other crops is relatively low, so optical images are required to improve classification accuracy. Secondly, the recognition ability of the crop mapping model remains limited in complex situations, such as differing crop phenology and planting structures. Thirdly, only CDL data were used as label data in this study, which limited the feature extraction ability of the pre-trained model in other regions. With the increasing abundance of label data, a more powerful pre-trained model based on more abundant and more widely distributed crop label data will help improve crop mapping based on transfer learning. Finally, it should be noted that the validation samples and reference rice maps used in Northeast China in this paper, as well as the CDL data from the United States used for model pre-training, are not entirely accurate label data. Because crop mapping based on deep learning requires a large number of samples, current research on large-scale deep learning-based crop mapping has mainly relied on existing high-accuracy crop classification products such as CDL [21,22,34]. Assessing the impact of inaccurate label data on the reliability of deep learning crop mapping models will help improve the credibility of crop mapping results based on deep learning.

6. Conclusions

To address the conflict between the massive sample requirement of deep learning and the scarcity of sample data when building large-scale crop mapping models from remote sensing data, this paper proposes an inter-continental transfer method based on fine-tuning and realizes large-scale, high-resolution rice mapping with a very small number of samples. The following conclusions were drawn:
(1)
A mere 10 samples were used to achieve inter-continental transfer of the pre-trained model in this study, resulting in high-accuracy rice mapping of three typical regions in Northeast China, with an F-score of 0.8502.
(2)
As the sample size continued to increase, the accuracy of the transferred model improved only marginally. With 50 samples, the F-score was 0.8560, almost the same as with 10 samples, indicating that inter-continental transfer of the model does not require many samples.
(3)
Transfer learning based on a small number of samples can achieve an accuracy similar to that of models trained on a large number of samples, indicating that the method proposed in this paper maintains high rice recognition accuracy while effectively reducing the sample requirement.
(4)
The transferred rice mapping model exhibits strong spatiotemporal generalization ability and can be used directly for rice mapping across multiple years and larger areas, which greatly reduces the workload of large-scale rice mapping and indicates the high practical value of the proposed method.
(5)
The visualization of the model features indicates that for close-range transfer of the pre-trained model, it is only necessary to fine-tune the parameters of the output layer, whereas for inter-continental transfer, the pre-trained model can no longer extract effective features for rice recognition, so the parameters of the feature extraction layers must be adjusted simultaneously.
(6)
The phenological differences of rice significantly affect the generalization ability of the transferred model. However, follow-up research must further strengthen our understanding of the factors that affect the spatiotemporal generalization ability of the model to provide stronger theoretical and practical support for its spatiotemporal transfer.

Author Contributions

Conceptualization, L.Y., R.H. and J.H.; methodology, L.Y.; software, L.Y.; validation, L.Y., L.W. and J.D.; formal analysis, L.Y. and J.S.; investigation, L.Y.; resources, L.Y.; data curation, L.Y.; writing—original draft preparation, L.Y.; writing—review and editing, L.Y. and J.Z.; visualization, L.Y.; supervision, J.H.; project administration, L.Y. and J.H.; funding acquisition, L.Y. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 42201412 and 42171314, and the Youth Innovation Science and Technology Support of Universities in Shandong Province, grant number 2021RW004.

Data Availability Statement

The code of the attTFBS model and the dataset are available at https://github.com/younglimpo/attTFBS (accessed on 12 September 2022).

Acknowledgments

The authors would like to thank USDA-NASS for providing the CDL dataset. They also thank the anonymous reviewers for their constructive comments and advice.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. FAO; IFAD; UNICEF; WFP; WHO. The State of Food Security and Nutrition in the World 2022; FAO: Rome, Italy, 2022; Volume 2022. [Google Scholar]
  2. Yang, L.; Wang, L.; Huang, J.; Mansaray, L.R.; Mijiti, R. Monitoring policy-driven crop area adjustments in northeast China using Landsat-8 imagery. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101892. [Google Scholar] [CrossRef]
  3. Zhang, C.; Harrison, P.A.; Pan, X.; Li, H.; Sargent, I.; Atkinson, P.M. Scale Sequence Joint Deep Learning (SS-JDL) for land use and land cover classification. Remote Sens. Environ. 2020, 237, 111593. [Google Scholar] [CrossRef]
  4. Zhang, J.; Tian, Y.; Yan, L.; Wang, B.; Wang, L.; Xu, J.; Wu, K. Diagnosing the symptoms of sheath blight disease on rice stalk with an in-situ hyperspectral imaging technique. Biosyst. Eng. 2021, 209, 94–105. [Google Scholar] [CrossRef]
  5. Dong, J.; Xiao, X. Evolution of regional to global paddy rice mapping methods: A review. ISPRS J. Photogramm. Remote Sens. 2016, 119, 214–227. [Google Scholar] [CrossRef]
  6. Zhao, R.; Li, Y.; Ma, M. Mapping paddy rice with satellite remote sensing: A review. Sustainability 2021, 13, 503. [Google Scholar] [CrossRef]
  7. Xiao, X.; Boles, S.; Frolking, S.; Salas, W.; Moore Iii, B.; Li, C.; He, L.; Zhao, R. Observation of flooding and rice transplanting of paddy rice fields at the site to landscape scales in China using VEGETATION sensor data. Int. J. Remote Sens. 2002, 23, 3009–3022. [Google Scholar] [CrossRef]
  8. Xiao, X.; Boles, S.; Liu, J.; Zhuang, D.; Frolking, S.; Li, C.; Salas, W.; Moore, B., III. Mapping paddy rice agriculture in southern China using multi-temporal MODIS images. Remote Sens. Environ. 2005, 95, 480–492. [Google Scholar] [CrossRef]
  9. Xiao, X.; Boles, S.; Frolking, S.; Li, C.; Babu, J.Y.; Salas, W.; Moore, B., III. Mapping paddy rice agriculture in South and Southeast Asia using multi-temporal MODIS images. Remote Sens. Environ. 2006, 100, 95–113. [Google Scholar] [CrossRef]
  10. Peng, D.; Huete, A.R.; Huang, J.; Wang, F.; Sun, H. Detection and estimation of mixed paddy rice cropping patterns with MODIS data. Int. J. Appl. Earth Obs. Geoinf. 2011, 13, 13–23. [Google Scholar] [CrossRef]
  11. Dong, J.; Xiao, X.; Menarguez, M.A.; Zhang, G.; Qin, Y.; Thau, D.; Biradar, C.; Moore, B., III. Mapping paddy rice planting area in northeastern Asia with Landsat 8 images, phenology-based algorithm and Google Earth Engine. Remote Sens. Environ. 2016, 185, 142–154. [Google Scholar] [CrossRef]
  12. You, N.; Dong, J.; Huang, J.; Du, G.; Zhang, G.; He, Y.; Yang, T.; Di, Y.; Xiao, X. The 10-m crop type maps in Northeast China during 2017–2019. Sci. Data 2021, 8, 41. [Google Scholar] [CrossRef] [PubMed]
  13. Boschetti, M.; Busetto, L.; Manfron, G.; Laborte, A.; Asilo, S.; Pazhanivelan, S.; Nelson, A. PhenoRice: A method for automatic extraction of spatio-temporal information on rice crops using satellite data time series. Remote Sens. Environ. 2017, 194, 347–365. [Google Scholar] [CrossRef]
  14. Busetto, L.; Zwart, S.J.; Boschetti, M. Analysing spatial–temporal changes in rice cultivation practices in the Senegal River Valley using MODIS time-series and the PhenoRice algorithm. Int. J. Appl. Earth Obs. Geoinf. 2019, 75, 15–28. [Google Scholar] [CrossRef]
  15. Luintel, N.; Ma, W.; Ma, Y.; Wang, B.; Xu, J.; Dawadi, B.; Mishra, B. Tracking the dynamics of paddy rice cultivation practice through MODIS time series and PhenoRice algorithm. Agric. For. Meteorol. 2021, 307, 108538. [Google Scholar] [CrossRef]
  16. Liu, L.; Huang, J.; Xiong, Q.; Zhang, H.; Song, P.; Huang, Y.; Dou, Y.; Wang, X. Optimal MODIS data processing for accurate multi-year paddy rice area mapping in China. GISci. Remote Sens. 2020, 57, 687–703. [Google Scholar] [CrossRef]
  17. Wang, H.; Ghosh, A.; Linquist, B.A.; Hijmans, R.J. Satellite-based observations reveal effects of weather variation on rice phenology. Remote Sens. 2020, 12, 1522. [Google Scholar] [CrossRef]
  18. Zhan, P.; Zhu, W.; Li, N. An automated rice mapping method based on flooding signals in synthetic aperture radar time series. Remote Sens. Environ. 2021, 252, 112112. [Google Scholar] [CrossRef]
  19. Pan, B.; Zheng, Y.; Shen, R.; Ye, T.; Zhao, W.; Dong, J.; Ma, H.; Yuan, W. High resolution distribution dataset of double-season paddy rice in china. Remote Sens. 2021, 13, 4609. [Google Scholar] [CrossRef]
  20. Yang, L.; Huang, R.; Huang, J.; Lin, T.; Wang, L.; Mijiti, R.; Wei, P.; Tang, C.; Shao, J.; Li, Q. Semantic Segmentation Based on Temporal Features: Learning of Temporal–Spatial Information from Time-Series SAR Images for Paddy Rice Mapping. IEEE Trans. Geosci. Remote Sens. 2021, 60, 4403216. [Google Scholar] [CrossRef]
  21. Xu, J.; Zhu, Y.; Zhong, R.; Lin, Z.; Xu, J.; Jiang, H.; Huang, J.; Li, H.; Lin, T. DeepCropMapping: A multi-temporal deep learning approach with improved spatial generalizability for dynamic corn and soybean mapping. Remote Sens. Environ. 2020, 247, 111946. [Google Scholar] [CrossRef]
  22. Wei, P.; Chai, D.; Lin, T.; Tang, C.; Du, M.; Huang, J. Large-scale rice mapping under different years based on time-series Sentinel-1 images using deep semantic segmentation model. ISPRS J. Photogramm. Remote Sens. 2021, 174, 198–214. [Google Scholar] [CrossRef]
  23. Xu, J.; Yang, J.; Xiong, X.; Li, H.; Huang, J.; Ting, K.C.; Ying, Y.; Lin, T. Towards interpreting multi-temporal deep learning models in crop mapping. Remote Sens. Environ. 2021, 264, 112599. [Google Scholar] [CrossRef]
  24. Turkoglu, M.O.; D’Aronco, S.; Perich, G.; Liebisch, F.; Streit, C.; Schindler, K.; Wegner, J.D. Crop mapping from image time series: Deep learning with multi-scale label hierarchies. Remote Sens. Environ. 2021, 264, 112603. [Google Scholar] [CrossRef]
  25. Zhang, W.; Liu, H.; Wu, W.; Zhan, L.; Wei, J. Mapping rice paddy based on machine learning with Sentinel-2 multi-temporal data: Model comparison and transferability. Remote Sens. 2020, 12, 1620. [Google Scholar] [CrossRef]
  26. Zhao, H.; Chen, Z.; Jiang, H.; Jing, W.; Sun, L.; Feng, M. Evaluation of three deep learning models for early crop classification using sentinel-1A imagery time series—A case study in Zhanjiang, China. Remote Sens. 2019, 11, 2673. [Google Scholar] [CrossRef]
  27. Reedha, R.; Dericquebourg, E.; Canals, R.; Hafiane, A. Transformer Neural Network for Weed and Crop Classification of High Resolution UAV Images. Remote Sens. 2022, 14, 592. [Google Scholar] [CrossRef]
  28. Du, Z.; Yang, J.; Ou, C.; Zhang, T. Smallholder crop area mapped with a semantic segmentation deep learning method. Remote Sens. 2019, 11, 888. [Google Scholar] [CrossRef]
  29. Wei, P.; Huang, R.; Lin, T.; Huang, J. Rice Mapping in Training Sample Shortage Regions Using a Deep Semantic Segmentation Model Trained on Pseudo-Labels. Remote Sens. 2022, 14, 328. [Google Scholar] [CrossRef]
  30. Ji, S.; Zhang, Z.; Zhang, C.; Wei, S.; Lu, M.; Duan, Y. Learning discriminative spatiotemporal features for precise crop classification from multi-temporal satellite images. Int. J. Remote Sens. 2020, 41, 3162–3174. [Google Scholar] [CrossRef]
  31. Teimouri, N.; Dyrmann, M.; Jørgensen, R.N. A novel spatio-temporal FCN-LSTM network for recognizing various crop types using multi-temporal radar images. Remote Sens. 2019, 11, 990. [Google Scholar] [CrossRef]
  32. Luo, C.; Meng, S.; Hu, X.; Wang, X.; Zhong, Y. Cropnet: Deep spatial-temporal-spectral feature learning network for crop classification from time-series multi-spectral images. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 4187–4190. [Google Scholar]
  33. Li, H.; Hu, W.; Li, W.; Li, J.; Du, Q.; Plaza, A. A³ CLNN: Spatial, spectral and multiscale attention ConvLSTM neural network for multisource remote sensing data classification. IEEE Trans. Neural Netw. Learn. Syst. 2020, 33, 747–761. [Google Scholar] [CrossRef] [PubMed]
  34. Ge, S.; Zhang, J.; Pan, Y.; Yang, Z.; Zhu, S. Transferable deep learning model based on the phenological matching principle for mapping crop extent. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102451. [Google Scholar] [CrossRef]
  35. Shipp, M. (Ed.) Rice Crop Timeline for the Southern States of Arkansas, Louisiana, and Mississippi; NSF Center for Integrated Pest Management: Raleigh, NC, USA, 2005. [Google Scholar]
  36. Luo, Y.; Zhang, Z.; Chen, Y.; Li, Z.; Tao, F. ChinaCropPhen1km: A high-resolution crop phenological dataset for three staple crops in China during 2000–2015 based on leaf area index (LAI) products. Earth Syst. Sci. Data 2020, 12, 197–214. [Google Scholar] [CrossRef]
  37. USDA—National Agricultural Statistics Service. Cropland Data Layer—National Download; USDA: Beltsville, MD, USA, 2022; Volume 2022.
  38. Wei, P.; Chai, D.; Huang, R.; Peng, D.; Lin, T.; Sha, J.; Sun, W.; Huang, J. Rice mapping based on Sentinel-1 images using the coupling of prior knowledge and deep semantic segmentation network: A case study in Northeast China from 2019 to 2021. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102948. [Google Scholar] [CrossRef]
  39. Romero, A.; Gatta, C.; Camps-Valls, G. Unsupervised deep feature extraction for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 2015, 54, 1349–1362. [Google Scholar] [CrossRef]
  40. Hinton, G.E.; Roweis, S. Stochastic neighbor embedding. In Proceedings of the Advances in Neural Information Processing Systems 15 (NIPS 2002), Vancouver, BC, Canada, 9–14 December 2002; Volume 15. [Google Scholar]
  41. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  42. You, N.; Dong, J. Examining earliest identifiable timing of crops using all available Sentinel 1/2 imagery and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 161, 109–123. [Google Scholar] [CrossRef]
  43. Hao, P.; Di, L.; Zhang, C.; Guo, L. Transfer Learning for Crop classification with Cropland Data Layer data (CDL) as training samples. Sci. Total Environ. 2020, 733, 138869. [Google Scholar] [CrossRef]
  44. Wang, S.; Azzari, G.; Lobell, D.B. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens. Environ. 2019, 222, 303–317. [Google Scholar] [CrossRef]
Figure 1. Location of the study areas and samples. (a) Regions A, B, C, D, E and F are located in the Mississippi Alluvial Plain, Sacramento Valley, Shuangcheng District, Songnen Plain, Liaohe Plain, and Sanjiang Plain, respectively. Location of ground survey points and validation samples in the Songnen Plain (b), Liaohe Plain (c), and Sanjiang Plain (d). Photos of ground survey sites for rice (e), maize (f), soybean (g), and wetland (h). The arrows in (a) indicate the direction of model transfer. Other validation samples in (b–d) include non-rice samples such as maize, soybeans, wetlands, etc.
Figure 2. Number of available Sentinel-1 images between 1 April and 2 November in Northeast China in 2017 (a), 2018 (b), 2019 (c), and 2020 (d).
Figure 3. Overall technical process used in this study.
Figure 4. Architecture of the attTFBS model.
Figure 5. Results of the 10-fold cross-validation of the attTFBS model in the Mississippi Alluvial Plain in 2019. The line in each subgraph represents the average value of each accuracy metric in the 10-fold cross-validation. The gray areas along the lines refer to ±1σ from the average. (a) Loss. (b) Overall accuracy. (c) F-score. (d) Kappa. (e) Recall. (f) Precision.
Figure 6. Accuracy comparison of the TFBS and attTFBS models. The error bars in the figure represent ±1σ.
Figure 7. Comparison of Sentinel-1 SAR images in the source (Region A) and target (Region C) regions. (a) Pseudo-color SAR image of Region A (R: 1 April; G: 25 April; B: 19 May). (b) Pseudo-color PC image of Region A (R: PC 1; G: PC 2; B: PC 3). (c) Pseudo-color SAR image of Region C (R: 1 April; G: 25 April; B: 19 May). (d) Pseudo-color PC image of Region C (R: PC 1; G: PC 2; B: PC 3).
Figure 8. Comparison of the Sentinel-1 VV time-series curves of rice and corn in Shuangcheng (Region C) with those in the Mississippi Alluvial Plain (Region A).
Figure 9. Probability density functions (PDFs) of the VV backscattering coefficients of rice pixels in all 9 time periods of SAR images in the Mississippi Alluvial Plain of the United States and the Shuangcheng District of China.
Figure 10. Boxplots of the VV backscattering coefficients of rice pixels in all 9 periods of SAR images of the Mississippi Alluvial Plain and Shuangcheng District. The top and bottom sides of each box are the upper and lower quartiles; the box covers the interquartile range (IQR), where 50% of the data are found. The horizontal line that splits the box in two is the median. The whiskers above and below the box represent the maximum and minimum of the data.
Figure 11. Accuracy of the fine-tuned model with different sample sizes.
Figure 12. Rice mapping results for the Shuangcheng District in 2017 (a), 2018 (b), 2019 (c), and 2020 (d) based on the inter-continental transferred model.
Figure 13. Classification accuracies of the rice maps for Shuangcheng from 2018 to 2020 based on the fine-tuned model.
Figure 14. Time-series curves of the Sentinel-1 VV and VH backscattering coefficients of the main crops in the Shuangcheng area from 2017 to 2020. The blue and red curves represent the mean backscattering coefficients of rice and corn, respectively. The light-colored areas around the curves represent ±1σ from the average.
Figure 15. Comparison between the reference paddy rice map (a) and the paddy rice map produced by the fine-tuned attTFBS model (b) in the Songnen Plain (Region D) in 2017.
Figure 16. Comparison between the reference paddy rice map (a) and the paddy rice map produced by the fine-tuned attTFBS model (b) in the Sanjiang Plain (Region E) in 2017.
Figure 17. Comparison between the reference paddy rice map (a) and the paddy rice map produced by the fine-tuned attTFBS model (b) in the Liaohe Plain (Region F) in 2017.
Figure 18. Accuracies of the rice maps for the Songnen, Liaohe, and Sanjiang Plains.
Figure 19. Sankey maps of the paddy rice maps produced by the fine-tuned attTFBS model in the Songnen Plain (a), Liaohe Plain (b), and Sanjiang Plain (c) in 2017. The width of each flow in the Sankey map is proportional to the number of pixels.
Figure 20. Comparison of the Sentinel-1 VV time-series curves of rice in Shuangcheng (blue lines in (a–c)) with those in the Sanjiang Plain (red lines in (a–c)). The red lines in (a–c) are the VV curves of correctly classified rice pixels, non-rice pixels misclassified as rice, and rice pixels misclassified as non-rice, respectively. The blue and red arrows represent the lowest VV values of rice pixels in the Shuangcheng region and the Sanjiang Plain, respectively, indicating the transplanting period of rice. The light-colored areas around the curves represent ±1σ from the average.
Figure 21. Feature visualization based on the t-SNE method. Each graph was calculated and drawn from 10,000 rice and 10,000 non-rice pixels randomly selected from the corresponding feature map. The farther the separation between rice pixels and other pixels in the t-SNE map, the more effective the features are for rice classification, and vice versa. (a) Mississippi Alluvial Plain in 2019; (b) Sacramento Valley in 2019; (c–f) before transfer in Shuangcheng in 2017, 2018, 2019, and 2020, respectively; (g–j) after transfer in Shuangcheng in 2017, 2018, 2019, and 2020, respectively.
Figure 22. Feature visualization based on the t-SNE method. The input data of the t-SNE are the time-series SAR data in Shuangcheng from 2017 to 2020. The farther the separation between rice pixels and other pixels in the t-SNE map, the more effective the features are for rice classification, and vice versa.
Figure 23. Accuracy comparison between the fine-tuned models and retrained models. The fine-tuned models were based on 10 randomly selected samples in Shuangcheng in 2017. The retrained models were based on 10-fold cross-validation of the Shuangcheng 2017 dataset.
Table 1. Confusion matrix of the rice map in Shuangcheng in 2017. The rice map was produced by Model A without fine-tuning.

Classification \ Reference    Paddy Rice (Pixels)    Others (Pixels)    Precision
Paddy rice (Pixels)           116,108                678,174            0.1462
Others (Pixels)               120,063                7,325,548          0.9839
Recall                        0.4916                 0.9153             —
Overall accuracy: 90.31%; kappa coefficient: 0.1895; F-score: 0.2253.
Table 2. Confusion matrix of the rice map in Northeast China in 2017.

Classification \ Truth        Paddy Rice (Points)    Others (Points)    Precision
Paddy rice (Points)           175                    5                  0.9722
Others (Points)               42                     1278               0.9682
Recall                        0.8065                 0.9961             —
Overall accuracy: 96.87%; kappa coefficient: 0.8637; F-score: 0.8816.