A “Region-Specific Model Adaptation (RSMA)”-Based Training Data Method for Large-Scale Land Cover Mapping

Li, Congcong; Xian, George; Jin, Suming

doi:10.3390/rs16193717

by Congcong Li ¹,*, George Xian ² and Suming Jin ²

¹ ASRC Federal Data Solutions, Contractor to the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center, Work Performed under USGS Contract 140G0124D0001, Sioux Falls, SD 57198, USA

² U.S. Geological Survey, Earth Resources and Observation Science (EROS) Center, 47914 252nd St., Sioux Falls, SD 57198, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(19), 3717; https://doi.org/10.3390/rs16193717 (registering DOI)

Submission received: 23 July 2024 / Revised: 25 September 2024 / Accepted: 29 September 2024 / Published: 6 October 2024

(This article belongs to the Special Issue Advances of Remote Sensing in Land Cover and Land Use Mapping)

Abstract:

An accurate and historical land cover monitoring dataset for Alaska could provide fundamental information for a range of studies, such as conservation habitats, biogeochemical cycles, and climate systems, in this distinctive region. This research addresses challenges associated with the extraction of training data for timely and accurate land cover classifications in Alaska over longer time periods (e.g., greater than 10 years). Specifically, we designed the “Region-Specific Model Adaptation (RSMA)” method for training data. The method integrates land cover information from the National Land Cover Database (NLCD), LANDFIRE’s Existing Vegetation Type (EVT), and the National Wetlands Inventory (NWI) and machine learning techniques to generate robust training samples based on the Anderson Level II classification legend. The assumption of the method is that spectral signatures vary across regions because of diverse land surface compositions; however, despite these variations, there are consistent, collective land cover characteristics that span the entire region. Building upon this assumption, this research utilized the classification power of deep learning algorithms and the generalization ability of RSMA to construct a model for the RSMA method. Additionally, we interpreted existing vegetation plot information for land cover labels as validation data to reduce inconsistency in the human interpretation. Our validation results indicate that the RSMA method improved the quality of the training data derived solely from the NLCD by approximately 30% for the overall accuracy. The validation assessment also demonstrates that the RSMA method can generate reliable training data on large scales in regions that lack sufficient reliable data.

Keywords:

fine-tuning; CNN; U-net; training data; validation data; Alaska

1. Introduction

Understanding land cover and land cover changes (LCCs) in Alaska is important for understanding the global biogeochemical cycle [1,2]. Unique land cover types like tundra and glaciers are found in Alaska [3,4]. Tundra and glaciers contribute to the environmental dynamics of the state and the world and provide vital services. Forests, tundra, wetlands, and permafrost store large amounts of carbon dioxide, functioning as carbon sinks and helping to control global temperatures [5]. Changes in land surface, like deforestation [6] and thawing permafrost [7], can affect hydrological patterns [8], alter wildlife habitats [9], release stored carbon [4], and exacerbate climate change [10] on different scales. As a result, precise land cover monitoring is needed to comprehend and support conservation initiatives, as well as climate change research, in Alaska.

Two land cover and vegetation disturbance products provide full coverage of Alaska, as follows: the National Land Cover Database (NLCD) [3] and LANDFIRE’s Existing Vegetation Type (EVT) [11]. Additionally, approximately 63% of the nation’s wetlands are located in Alaska [12]. Two notable wetland products were identified that provide additional wetland information. One is the National Wetlands Inventory (NWI) wetland extent [13], which only covers certain regions of Alaska and has not been updated for a long period of time [14]. The second is a comprehensive vegetated wetland map of Alaska [14] with a spatial resolution of 50 m. The map was produced using variables from the Phased-Array L-band SAR (PALSAR) from Advanced Land Observing Satellite (ALOS) data for 2007 and other ancillary data as classification features. Despite these resources, more timely and accurate land cover and change maps are needed for Alaska.

Training data, features, and classification algorithms are three important factors in land cover classification [15]. Over the past decades, classification algorithms evolved from conventional classification to deep learning algorithms, improving the classification accuracy [16,17]. Classification features, like texture [18], contextual [19,20], and time-series features [21], are used to further enhance the land cover classification process. Sufficient and accurate extraction of training data continues to be the bottleneck in the creation of timely and reliable land cover maps that facilitate progress in classification techniques and feature extraction [22]. Currently, extraction of accurate training data remains largely dependent on labor-intensive collection methods [23,24]. Human interpretation is often considered a reliable method, but it is time-consuming and difficult to implement, especially in large and diverse geographic regions like Alaska. Even with human interpretation, identifying vegetated wetland without periodically occurring standing water is challenging when the interpretation is based on Landsat or other online map images. Moreover, the unique characteristics of the Alaskan landscape and the lack of expert knowledge and data (e.g., historical high-resolution images) [25] complicate efforts to collect accurate and reliable training data for the region. Additionally, research that focuses on the automatic extraction of training data is limited, and new, innovative approaches could help improve the classification process.

In previous studies, three training data methods were developed and used to collect training samples automatically. First, the method of collecting samples from existing reliable products and sources was adopted by LCMAP Collection 1.0 products [26,27]; however, the accuracy is limited by the quality of the sources used. Second is the method of generating consensus samples from various published products, which improves the quality of training data, or from different epochs of land cover products to obtain time-consistent samples. This method solves the problem of a lack of suitable land cover products (e.g., incompatible classification scheme and product mapping uncertainties) from which to extract training data for the mapping year. Li et al. [28] designed a series of rules to integrate several published products and collect consensus samples for land-cover mapping of Hawaii. However, the mapping accuracy is determined by the leveraged spatial coverage and quality of the training samples. Generating consensus samples provides training samples that partially cover the mapping region, which can reduce the representativeness of training samples for land cover types with high intraclass spectral variability or rare classes. Third, Li et al. [22] developed an automatic phenology learning (APL) method to generate reliable training data with full spatial coverage of the mapping region without human intervention. The APL method was developed based on the assumption that the time-series patterns for the same land cover type on a local scale are the same or similar. Thus, the APL method measures the time-series similarity between standard phenology patterns and unknown sample units using the time-weighted dynamic time warping (twDTW) method [29] and labels the unknown sample units automatically based on analysis and other land cover products. 
Nevertheless, the limited availability of Landsat observations due to the removal of clouds, ice, and snow and the short “leaf-on” season constrains the construction of reliable phenological curves in applying the APL method. Given these challenges, the APL method may not accurately represent the Alaska region.

Validating a training data method ensures the reliability of the method in real-world applications for end users. Traditionally, in large-scale land-cover mapping, the generation of validation samples has relied heavily on human interpretation to ensure their distribution and quantity, which introduces subjectivity and potential biases [23,24,25,30]. The inconsistency among validation samples interpreted by different experts poses additional challenges in evaluating the quality of land cover maps. Therefore, we developed an alternative way of interpreting the Alaska Vegetation Plots Database (AKVEG) [31] to collect validation data that aims to minimize reliance on human interpretation and reduce subjectivity in the validation process.

In this study, we present a novel approach involving machine learning for training data to bridge the knowledge and information gaps across regions in Alaska. Our research objectives were the following:

(1): Design a framework that integrates machine learning with multiple scientific datasets to minimize errors and uncertainties in the training samples, which are often sourced from a single dataset source.
(2): Evaluate the efficacy of the training data method through experiments with two different sampling strategies and two different deep learning classifiers in the meta-learning framework.
(3): Convert the online vegetation plot information into distinct land cover types (including wetland) for validating the developed training data method.

2. Method

We designed a “Region-Specific Model Adaptation (RSMA)”-based training data method (Figure 1) to generate training data for land-cover mapping over a large geographical region. The assumption of the method is that despite variations in spectral signatures for the same vegetation type collected from different regions within a large area, there are still shared characteristics that remain consistent across the entire region. The method involves the integration of a meta-training process [32] and fine-tuning [33] and can be divided into the following several steps:

Meta-training process: The process constructed a basic deep-learning model using a set of consensus sample units obtained from large geographical regions within the same area. These consensus sample units were identified based on their agreement across two or more existing land cover products. Pixels with high confidence labels were used to build the machine learning model.

Fine-tuning process: The pretrained basic model was transferred to a specific target Landsat “Analysis Ready Data” (ARD) tile within the same geographical region. The model’s parameters were updated with a different set of consensus sample units in the target tile. This adapted the model to the local characteristics of land cover in that region.

Prediction process: The adapted model was utilized to predict the land cover probabilities of undefined sample units (any location without an agreed land cover label) within the same geographical region. This enables the model to generate predictions for areas where consensus training data are not available.

Optimal label selection: For every unlabeled training sample unit located outside the common region, the approach identified the best land cover label based on the predicted probabilities using predetermined criteria.

2.1. Dataset

In the RSMA training data method, a common layer was developed to extract consensus sample units for building the model. The common layer was produced by integrating available land cover products in Alaska, including the NLCD for the year 2011, LF EVT for the year 2010, the PALSAR wetland product for the year 2007, and the NWI. Specifically for wetlands, regions classified as woody wetland or emergent herbaceous wetland in the NLCD, LF EVT, and PALSAR wetland products were preserved in the common layer. These regions were considered “reliable wetland areas”. In contrast, the areas that were classified as wetland by at least one of the four products (NLCD, LF EVT, PALSAR, and NWI) were identified and grouped as “potential-wetland (pWet) masks”. These areas showed a high probability of being wetlands and were considered “potential wetland regions” and used for further processing in the methodology.

For non-wetland types, which include water, forest, shrubland, herbaceous, and barren regions, pixels on which the NLCD and LF EVT products agreed at the Anderson Level I classification legend [34] were kept in the common layer. This common layer did not contain any potential wetland pixels from the pWet masks. The pixels in the common layer were labeled by the NLCD product according to the Anderson Level II classification legend [34]. Additionally, developed and planted/cultivated pixels from the NLCD remained in the common layer as training data for classification but were not involved in the RSMA training data method. The final “consensus” sample units were then collected from this common layer. As a result, the generated common layer and final training data should represent the year 2011.
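The consensus rules above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, LF EVT is assumed to have been crosswalked to NLCD-style codes, the PALSAR and NWI products are reduced to wetland flags, the Anderson Level I class is taken as the tens digit of the NLCD code, and the handling of developed or cultivated pixels that fall inside a pWet mask is a guess.

```python
from typing import Optional

WETLAND_LEVEL1 = 9            # NLCD codes 90 (woody) and 95 (emergent herbaceous)
NLCD_ONLY_LEVEL1 = {2, 8}     # developed (21-24) and planted/cultivated (81, 82)


def level1(code: int) -> int:
    """Anderson Level I group, assumed to be the tens digit of an NLCD-style code."""
    return code // 10


def is_potential_wetland(nlcd: int, evt: int, palsar_wet: bool, nwi_wet: bool) -> bool:
    """pWet mask: at least one of the four products calls the pixel wetland."""
    return (level1(nlcd) == WETLAND_LEVEL1 or level1(evt) == WETLAND_LEVEL1
            or palsar_wet or nwi_wet)


def consensus_label(nlcd: int, evt: int, palsar_wet: bool, nwi_wet: bool) -> Optional[int]:
    """Return the NLCD Level II label if the pixel belongs to the common layer, else None."""
    # Reliable wetland: NLCD, LF EVT, and PALSAR all indicate wetland.
    if level1(nlcd) == WETLAND_LEVEL1 and level1(evt) == WETLAND_LEVEL1 and palsar_wet:
        return nlcd
    # Any other pixel inside the pWet mask is left for the model to label (assumption:
    # this also applies to developed and cultivated pixels).
    if is_potential_wetland(nlcd, evt, palsar_wet, nwi_wet):
        return None
    # Developed and planted/cultivated pixels are taken from the NLCD alone.
    if level1(nlcd) in NLCD_ONLY_LEVEL1:
        return nlcd
    # Non-wetland types: keep the pixel when NLCD and LF EVT agree at Level I.
    if level1(nlcd) == level1(evt):
        return nlcd
    return None
```

Pixels that return `None` correspond to the undefined sample units whose labels are later chosen from the model probabilities.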

The following procedure was designed to generate training data for a Landsat ARD tile. We randomly selected 20,000 sample units for each land cover category from the common layer as meta-training data from the surrounding 3 × 3 Landsat ARD tiles (9 tiles in total, hereafter referred to as “the surrounding broad region”). Based on previous research, a 3 × 3 Landsat ARD region size is adequate for capturing the spectral variation in land cover categories while avoiding unnecessary complexity [26]. The number of sample units was determined to balance the various spectral signatures and computation times, after testing 10,000 and 40,000 sample units per class. We used these selected random sample units in the region to develop a comprehensive model for the meta-training process. An equal-number sampling method can help reduce bias toward categories with larger sample sizes and improve the model’s generalization abilities for other regions while ensuring the representativeness of each land cover category in the training data.

We examined two sampling methods for constructing the fine-tuning dataset in the target Landsat ARD tile. First, we used a random sampling approach to gather a total of 100,000 sample units. Second, we used an equal-number random sampling method for each category [35] to collect an equivalent of 100,000 sample units in total. The sample sets were used to refine the model from the surrounding broad region to the target region in the fine-tuning procedure.
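The two fine-tuning sampling strategies can be contrasted with a short sketch. It assumes each sample unit is a hypothetical `(location_id, label)` tuple drawn from the common layer; the function names and the handling of classes with too few units are illustrative assumptions.

```python
import random
from collections import defaultdict


def simple_random_sample(units, total, seed=0):
    """Draw `total` units at random, so class proportions follow the landscape."""
    rng = random.Random(seed)
    return rng.sample(units, min(total, len(units)))


def equal_number_sample(units, total, seed=0):
    """Draw the same number of units per class, up to `total` units overall."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for unit in units:
        by_class[unit[1]].append(unit)
    per_class = total // len(by_class)
    picked = []
    for label in sorted(by_class):
        members = by_class[label]
        # Assumption: a class with fewer units than the quota contributes all of them.
        picked.extend(rng.sample(members, min(per_class, len(members))))
    return picked


if __name__ == "__main__":
    # Toy tile dominated by shrub/scrub (52) with a rare woody wetland class (90).
    units = [(i, 52) for i in range(900)] + [(i, 90) for i in range(900, 1000)]
    balanced = equal_number_sample(units, 100)
    print(sum(1 for _, label in balanced if label == 90))  # 50 of 100 units are class 90
```

With simple random sampling the rare class receives roughly its share of the landscape, while equal-number sampling forces every class to the same quota.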

After generating the sample locations, we obtained all cloud-free records from the Landsat Collection 2 [36] data spanning 2009–2012, along with the 25 m resolution PALSAR dataset for each sample location. By spanning four years that encompass 2011, the aim is to increase the available cloud-free and snow-free data for constructing machine learning models for the study area. In addition to the original band values, we calculated several indices to enhance the characterization of vegetation, water, and barren areas. These indices are listed in Table 1.

Upon extracting the remotely sensed spectral features for the sample units, we observed variations in the numbers and intervals of the observations within the time series. To standardize the data, we generated semi-monthly data composites using all available clear Landsat data from 2009 to 2012. Taking the near-infrared (NIR) reflectance values as an example (Figure 2), we calculated the maximum, median, and minimum values for the first half and second half of each month. The new time series spanned from the second half of April to the first half of October after removal of the pixels contaminated by clouds, cloud shadows, and snow/ice. To address missing values, we filled them with the mean values of neighboring data points. The resulting reconstructed time series represents 2011 to match the consensus layer.
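The compositing step can be illustrated for a single band at a single sample location. The sketch below assumes that the first half of a month is days 1 to 15, that observations from 2009 to 2012 are pooled into the same 12 half-month bins, and that a missing bin is filled with the mean of the nearest valid bins on either side; the function names are hypothetical.

```python
from datetime import date
from statistics import median

N_STEPS = 12  # second half of April through first half of October


def half_month_index(d: date) -> int:
    """0 = second half of April, 11 = first half of October (out of range otherwise)."""
    return (d.month - 4) * 2 + (1 if d.day > 15 else 0) - 1


def composite(observations):
    """observations: list of (date, value) for clear pixels pooled over all years."""
    bins = [[] for _ in range(N_STEPS)]
    for d, value in observations:
        i = half_month_index(d)
        if 0 <= i < N_STEPS:
            bins[i].append(value)
    return {
        "max": [max(b) if b else None for b in bins],
        "median": [median(b) if b else None for b in bins],
        "min": [min(b) if b else None for b in bins],
    }


def fill_gaps(values):
    """Replace None with the mean of the nearest valid neighbors (one per side)."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            left = next((values[j] for j in range(i - 1, -1, -1)
                         if values[j] is not None), None)
            right = next((values[j] for j in range(i + 1, len(values))
                          if values[j] is not None), None)
            neighbors = [x for x in (left, right) if x is not None]
            filled[i] = sum(neighbors) / len(neighbors) if neighbors else None
    return filled
```

Applying `fill_gaps` to each of the three series returned by `composite` yields complete maximum, median, and minimum time series for the band.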

2.2. Basic Model of the RSMA-Based Training Data Method

In this research, the input features are multi-spectral temporal time series. When processing time series, long short-term memory (LSTM) is often deemed a natural starting point, because it was initially proposed to analyze sequential data. Convolutional neural networks (CNNs) are network architectures of deep learning methods that are commonly used in spatial and spectral domains [46]. However, Zhong, Hu, and Zhou [16] suggested that a one-dimensional CNN model (1D-CNN) can deal with the temporal features and produce better results than LSTM. Therefore, we selected a 1D-CNN model as the basic model of the RSMA. For our CNN architecture, we used two convolutional blocks, each adapted to understand diverse patterns, especially in sequential data. Each convolutional block had a specific pattern, including the use of a 3 × 3 convolutional kernel for local feature capture, a max-pooling layer to further improve the network’s capacity to extract important information, rectified linear activation function (ReLU) for enhanced nonlinearity [47], and batch normalization for stability [48]. Following the convolutional blocks, a fully connected layer was adjusted to the information provided, and a 50% dropout was applied after flattening, ensuring that the model remained adaptable [49]. The process is named the “1D-CNN-RSMA” in this research.
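To make the role of one convolutional block concrete, the following sketch runs a forward pass of a single block in plain Python rather than a deep learning framework. It is an illustration under assumptions: the kernel slides along the time axis only with size 3 and zero padding, the layer order is convolution, ReLU, then max-pooling, and batch normalization and dropout are omitted. All function names are hypothetical.

```python
def conv1d(x, kernels, bias):
    """x: [in_channels][T]; kernels: [out_channels][in_channels][k]; zero 'same' padding."""
    k = len(kernels[0][0])
    pad = k // 2
    n_steps = len(x[0])
    out = []
    for f, weights in enumerate(kernels):
        row = []
        for t in range(n_steps):
            total = bias[f]
            for c, channel in enumerate(x):
                for j in range(k):
                    idx = t + j - pad
                    if 0 <= idx < n_steps:  # positions outside the series count as zero
                        total += weights[c][j] * channel[idx]
            row.append(total)
        out.append(row)
    return out


def relu(x):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool1d(x, size=2):
    """Non-overlapping max-pooling along the time axis."""
    return [[max(row[i:i + size]) for i in range(0, len(row) - size + 1, size)]
            for row in x]


def conv_block(x, kernels, bias):
    """One block: convolution, ReLU, then max-pooling (order is an assumption)."""
    return max_pool1d(relu(conv1d(x, kernels, bias)))


if __name__ == "__main__":
    series = [[1.0, 2.0, 3.0, 4.0]]   # one channel, four time steps
    kernels = [[[1.0, 1.0, 1.0]]]     # one filter that sums each 3-step window
    print(conv_block(series, kernels, [0.0]))  # [[6.0, 9.0]]
```

Stacking two such blocks and flattening the result before a fully connected layer mirrors the structure described above, with each spectral variable supplied as its own input channel.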

Another widely used neural network for land cover classification is UNet [50], which was originally designed for semantic segmentation and adapted for image classification. The classifier followed the encoder–decoder architecture, which is known for its capacity to preserve spatial information while capturing both local and global features [51]. In this study, we designed a UNet classifier specifically for one-dimensional (1D) time-series classification applications. We applied skip connections with the encoder–decoder architecture to capture temporal patterns and hierarchically extract features at different temporal resolutions. The encoder module used three successive 1D convolutional layers with a kernel size of 3. The first layer had 64 filters, the second layer had 128 filters, and the third layer had 256 filters. Convolutional operations were used to link each layer of the encoder network. Batch normalization and ReLU activation functions were then added to reduce overfitting. The feature maps were downsampled using max-pooling layers with a stride of 2 and a convolutional kernel size of 2 × 2. Furthermore, a bottleneck layer with 512 filters was incorporated into the architecture to reduce the dimensionality of the feature maps and to facilitate more efficient learning of deep representations. Moreover, skip connections that store feature maps established direct connections between corresponding encoder and decoder layers, preserving critical temporal details across network layers. In the decoder module, up-convolutional layers were used to upsample the feature maps back to the original temporal resolution. Each up-convolutional layer was then followed by batch normalization and ReLU activation functions to enhance the learning capabilities of the network. The process was named “UNet-RSMA” in this research.
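A quick way to reason about the encoder is to trace how the number of channels and the temporal length change from stage to stage. The sketch below does only that bookkeeping; it assumes zero-padded convolutions that keep the temporal length and unpadded max-pooling that halves it (rounding down). The input length of 36 steps and the 4 input channels are hypothetical values chosen for illustration.

```python
def unet1d_encoder_shapes(n_steps, in_channels, filters=(64, 128, 256), bottleneck=512):
    """Return (stage, channels, temporal_length) tuples for the encoder path."""
    shapes = [("input", in_channels, n_steps)]
    length = n_steps
    for i, n_filters in enumerate(filters, start=1):
        # Kernel size 3 with 'same' padding keeps the temporal length (assumption).
        shapes.append((f"encoder_conv{i}", n_filters, length))
        # Max-pooling with stride 2 and no padding halves the length, rounding down.
        length = length // 2
        shapes.append((f"encoder_pool{i}", n_filters, length))
    shapes.append(("bottleneck", bottleneck, length))
    return shapes


if __name__ == "__main__":
    for stage, channels, length in unet1d_encoder_shapes(n_steps=36, in_channels=4):
        print(f"{stage:15s} channels={channels:4d} length={length}")
```

Because odd lengths are rounded down during pooling, a decoder built on these shapes would need padding or cropping before its skip connections can be combined with the matching encoder feature maps.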

2.3. Implementation of the RSMA-Based Training Data Method

The meta-training process is the same as in the Model-Agnostic Meta-Learning (MAML) method [52], in which the meta-training dataset is partitioned into 80% $T_{support}$ (train) and 20% $T_{query}$ (test). As shown in Figure 3, each variable of the meta-training set was input as an individual channel of the model. The convolutional kernel moved from the beginning of the median values toward the minimum of the values, accommodating the following three distinct patterns: median, maximum, and minimum time series. PALSAR data were integrated as an individual channel in this process. The training procedure was divided into an inner loop and an outer loop. The neural network parameters, $\emptyset$, were iteratively optimized by gradient descent [53] after being randomly initialized. In the inner loop, the basic model (1D-CNN or UNet) was trained with a batch size of 512 sample units from the $T_{support}$ data. The parameters, $\emptyset_{i}$, of the model were updated to each task via gradient descent on the $T_{support}$ data. The outer loop then refined the parameters, $\emptyset_{i}$, based on the performance of model on $T_{query}$ data with a batch size of 512 sample units.
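The inner and outer loops can be illustrated on a toy problem. The sketch below is not the authors' training code: a single scalar weight stands in for the network parameters, the loss is a mean squared error on a linear model, the outer update uses a first-order approximation of MAML, and the weight starts at zero instead of a random value so that the run is reproducible. The learning rates, the number of epochs, and all function names are assumptions.

```python
import random


def split_support_query(samples, support_fraction=0.8, seed=0):
    """Shuffle and split the meta-training samples into support (80%) and query (20%)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * support_fraction)
    return shuffled[:cut], shuffled[cut:]


def batches(data, batch_size=512):
    """Yield consecutive mini-batches of at most `batch_size` samples."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def grad_mse(w, batch):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def meta_train(samples, epochs=20, inner_lr=0.5, outer_lr=0.5, batch_size=512, seed=0):
    support, query = split_support_query(samples, seed=seed)
    query_batches = list(batches(query, batch_size))
    w = 0.0  # fixed start for reproducibility; the text describes random initialization
    for _ in range(epochs):
        for step, support_batch in enumerate(batches(support, batch_size)):
            # Inner loop: adapt a copy of the parameters on a support batch.
            w_adapted = w - inner_lr * grad_mse(w, support_batch)
            # Outer loop: refine the shared parameters from the query loss measured
            # at the adapted parameters (first-order approximation).
            query_batch = query_batches[step % len(query_batches)]
            w = w - outer_lr * grad_mse(w_adapted, query_batch)
    return w


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(2000))]
    print(meta_train(data))  # approaches the true slope of 3.0
```

In a real setting the scalar weight would be replaced by the 1D-CNN or UNet parameters and the gradient by backpropagation through the chosen network.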

During the fine-tuning stage, the parameters of the pretrained basic model were adjusted using the fine-tuning data, which helps reduce differences among datasets from various areas. This step aligned the model with the characteristics of the local region to improve the model’s performance in the target area.

For the prediction stage, the adapted model was applied to the prediction data, which comprise samples outside the common region. The model predicted the probabilities of each class for each sample location, serving as references in the selection of appropriate land cover labels.

Finally, the optimal land cover label from either the NLCD, LF EVT, or PALSAR wetland product was selected for each sample location in the order of the probabilities. We prioritized the labeling process based on confidence levels. Pixels were initially assigned labels corresponding to the Anderson Level II land cover types from the existing products based on whether the probability exceeded 0.1 to select the most detailed and specific land cover category first. Subsequently, labels were assigned based on the Anderson Level I classification legend in a similar manner, i.e., whether the probability exceeded 0.1 based on the order of probabilities. This means that if the probabilities indicate that the categories of the Anderson Level II classification legend are not confident enough, the method falls back on broader categories based on the Anderson Level I classification legend, ensuring all pixels are labeled with a reasonable category. Finally, the remaining unlabeled pixels are assigned the land cover type with the maximum probabilities, ensuring the entire dataset is labeled.
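The label selection cascade can be sketched as a small function. This is one possible reading of the rules, with several assumptions: the candidate labels from the existing products are expressed as NLCD-style Level II codes, the Level I probability of a group is the sum of its Level II probabilities, the Level I group is the tens digit of the code, and the function returns which rule fired together with the chosen code. The names are hypothetical.

```python
def select_label(probs, candidates, threshold=0.1):
    """probs: {level2_code: probability}; candidates: labels from the existing products."""
    # Candidate labels ordered by predicted probability, highest first.
    ordered = sorted({c for c in candidates if c is not None},
                     key=lambda c: (-probs.get(c, 0.0), c))
    # Step 1: most detailed label (Anderson Level II) whose probability exceeds 0.1.
    for code in ordered:
        if probs.get(code, 0.0) > threshold:
            return ("level2", code)
    # Step 2: fall back on the broader Anderson Level I categories.
    level1_probs = {}
    for code, p in probs.items():
        level1_probs[code // 10] = level1_probs.get(code // 10, 0.0) + p
    groups = sorted({c // 10 for c in ordered},
                    key=lambda g: (-level1_probs.get(g, 0.0), g))
    for group in groups:
        if level1_probs.get(group, 0.0) > threshold:
            return ("level1", group)
    # Step 3: no product label is supported, so use the most probable class.
    return ("argmax", max(probs, key=probs.get))


if __name__ == "__main__":
    probabilities = {41: 0.05, 42: 0.6, 52: 0.3, 90: 0.05}
    # Hypothetical product labels: NLCD says 41, LF EVT says 52, PALSAR has no wetland call.
    print(select_label(probabilities, [41, 52, None]))  # ('level2', 52)
```

In the example, the model favors evergreen forest (42), but because neither product proposes it, the best supported product label (52) is selected instead.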

2.4. Method Assessment

The AKVEG is maintained by the Alaska Center for Conservation Science and includes over 14,000 vegetation plots from multiple projects conducted across Alaska. Most of the projects were conducted by the Alaska Center for Conservation Science, Bureau of Land Management, and the US Fish and Wildlife Service (USFWS). Macander et al. [54] also used the database to validate plant functional type cover.

For each plot, the survey date and the coverage of individual vegetation species were recorded. The survey date was considered as the reference date for the validation of land cover maps. Concerning the vegetation species, our initial step involved determining the life-form for each species by cross-referencing the accepted name with all names or accepted names in the checklist, which was supplied by the database provider.

Furthermore, we checked the wetland indicator status for each individual species by matching the accepted name with the scientific name on the national wetland plant list. This indicator status served as a measure of the likelihood of a particular species occurring in wetlands. Whereas statuses such as facultative upland (FACU) and obligate upland (UPL) indicate a high chance of being present in non-wetland areas, statuses such as obligate wetland (OBL) and facultative wetland (FACW) suggest a high likelihood of occurring in wetlands.

Subsequently, we created a summary table for each plot that included the derived cover estimation for each life-form group and wetland status group. Additionally, we categorized life-form into deciduous forest, evergreen forest, mixed forest, graminoid, forb, dwarf scrub and shrub/scrub, and wetland status into confident wetland and confident non-wetland. Next, we calculated the total cover estimation for each life-form group and wetland status group. The created summary table is a repository of prior knowledge and provides percentage indicators for the plots. We established appropriate rules to correctly interpret each plot’s class label based on the derived vegetation coverage and land cover definitions. The interpretation rules were applied sequentially, from top to bottom. We first verified whether the plots met the percentage requirements for wetland, followed by forest, then shrub, and, finally, grass based on the definitions outlined in the NLCD legend [55]. All interpretations underwent validation with remote sensing images to ensure geometric alignment.
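The sequential interpretation rules can be sketched as an ordered chain of checks on the per-plot summary table. The thresholds below are assumptions: the 20% and 80% shares of total vegetation cover and the 75% deciduous or evergreen share follow the general wording of the NLCD legend, while the 50% wetland threshold on combined OBL and FACW cover is purely hypothetical. The dictionary keys and the function name are illustrative, and the example plot is made up.

```python
def interpret_plot(cover, wetland_cover, wetland_threshold=50.0):
    """cover: percent cover per life-form group; wetland_cover: percent cover of OBL + FACW species."""
    deciduous = cover.get("deciduous", 0.0)
    evergreen = cover.get("evergreen", 0.0)
    trees = deciduous + evergreen
    dwarf = cover.get("dwarf_shrub", 0.0)
    shrubs = cover.get("shrub", 0.0) + dwarf
    herbs = cover.get("graminoid", 0.0) + cover.get("forb", 0.0)
    total = trees + shrubs + herbs
    if total == 0:
        return "unvegetated"
    # Rule 1: wetland (hypothetical threshold on wetland-indicator cover).
    if wetland_cover > wetland_threshold:
        woody_share = (trees + shrubs) / total
        return "woody wetlands" if woody_share > 0.2 else "emergent herbaceous wetlands"
    # Rule 2: forest, when trees exceed 20% of total vegetation cover.
    if trees > 0 and trees / total > 0.2:
        if deciduous / trees > 0.75:
            return "deciduous forest"
        if evergreen / trees > 0.75:
            return "evergreen forest"
        return "mixed forest"
    # Rule 3: shrub, when shrubs exceed 20% of total vegetation cover.
    if shrubs / total > 0.2:
        return "dwarf scrub" if dwarf > shrubs - dwarf else "shrub/scrub"
    # Rule 4: grass, when herbaceous cover exceeds 80% of total vegetation cover.
    if herbs / total > 0.8:
        return "herbaceous"
    return "uninterpreted"


if __name__ == "__main__":
    plot = {"evergreen": 30.0, "deciduous": 5.0, "shrub": 40.0, "forb": 10.0}
    print(interpret_plot(plot, wetland_cover=2.0))  # evergreen forest
```

Because the rules are applied from top to bottom, a plot with substantial tree cover is labeled forest even when its shrub cover is higher, as in the example.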

The study area was chosen by considering the plot survey methods and the number of plots of each project in the database. The following two projects were selected: North Slope Land Cover, conducted by North Slope Science Initiative [56], and Vegetation Monitoring in Interior Refuges, conducted by USFWS [54]. These projects were chosen because they covered a variety of Alaskan landscapes, such as the Interior Alaska region, which is mostly made up of forest and shrub areas, and the northern coastal region, which is mostly made up of wetland, water, and grass areas. These projects were also conducted with ground survey techniques and a semi-quantitative visual estimation method [54]. Based on the locations of the plot data, two Landsat ARD tiles, H04V01 and H06V04, were selected for testing the method. These two tiles capture the diversity of Alaskan land cover and represent the geographic conditions typical of the Alaskan landscape.

3. Results

3.1. Validation Samples

In this research, the AKVEG ground survey plots were used to assist in the interpretations. For example, 10 distinct vegetation types were identified within survey plot KAN226 (66.139274°N, 151.279178°W). As shown in Table 2, the plant life-forms and wetland indicator statuses were ascertained by cross-referencing the vegetation names with existing databases. The total vegetation coverage within the plot was calculated to be 97.1%. Among the identified vegetation types, deciduous trees constituted the highest canopy, composing around 36% of the plot area and 37% of the total vegetation cover. Therefore, deciduous trees were the dominant life-form vegetation cover within the plot. A coverage of above 20% matched the NLCD system’s classification criteria for deciduous forests. Therefore, the plot was identified as a deciduous forest. In addition, the wetland indicator statuses indicated that the area was mainly covered by facultative upland (FACU) vegetation, which composed 37% of the total plot, and facultative (FAC) vegetation, which constituted 58% of the total plot. Therefore, the plot was considered non-wetland. We interpreted 111 ground plots as validation sample units (Figure 4). The overall accuracy (OA), producer’s accuracy (PA), and user’s accuracy (UA) were calculated for evaluation.
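For reference, the three accuracy measures can be computed from paired reference and predicted labels as in the sketch below. The function name is hypothetical and the labels in the example are made up for illustration; they are not the validation data of this study.

```python
def accuracy_metrics(reference, predicted):
    """Return overall accuracy and per-class producer's and user's accuracy, in percent."""
    pairs = list(zip(reference, predicted))
    classes = sorted(set(reference) | set(predicted))
    overall = 100.0 * sum(r == p for r, p in pairs) / len(pairs)
    producers, users = {}, {}
    for c in classes:
        correct = sum(r == c and p == c for r, p in pairs)
        n_reference = sum(r == c for r in reference)   # column total of the error matrix
        n_predicted = sum(p == c for p in predicted)   # row total of the error matrix
        # PA = 100 - omission error; UA = 100 - commission error.
        producers[c] = 100.0 * correct / n_reference if n_reference else None
        users[c] = 100.0 * correct / n_predicted if n_predicted else None
    return overall, producers, users


if __name__ == "__main__":
    reference = [41, 41, 52, 52, 90]   # made-up plot interpretations
    predicted = [41, 52, 52, 52, 95]   # made-up training labels
    oa, pa, ua = accuracy_metrics(reference, predicted)
    print(oa, pa[41], ua[41])  # 60.0 50.0 100.0
```

A class that never appears in the reference data has no producer's accuracy, and a class that is never predicted has no user's accuracy, so both are reported as `None`.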

3.2. Validation Results

Using the interpreted validation sample units, we completed an evaluation of the RSMA-based training data method. We compared the land cover labels from the NLCD product (Table 3) with labels produced by UNet-RSMA using a simple random sampling method in the fine-tuning phase (Table 4). Substantial increases in PA were observed in the woody wetlands (90) and emergent herbaceous wetlands (95) categories, which are predominant land cover types in the Alaskan region [12]. The increases in PA for the forest classes were primarily found across the broader forest category based on the Anderson Level I classification legend, with only slight improvements for the Level II classification legend. The distinction between the barren and vegetation classes also increased. Although there were no increases in PA for the dwarf shrub (51) and sedge/herbaceous (72) categories, the commission errors (100 − UA) were greatly reduced, indicating a decrease in confusion with other categories. The accuracies of shrub/scrub (52) also improved with the method. The OA increased from 29.7% with the NLCD to 60.4% with the new RSMA-based training data method based on the Anderson Level II classification legend.

In the RSMA method, the pretrained global model must be adjusted to the target region to facilitate adaptation of the model to local landscapes. For the UNet-RSMA, we calculated the PA, UA, and OA for the pretrained model after the meta-training process, as shown in Table 5. Comparing the results with those in Table 4, increases in both PA and UA were found for various land cover categories, including dwarf scrub (51), shrub/scrub (52), sedge/herbaceous (72), and emergent herbaceous wetlands (95), following the model transfer (i.e., fine-tuning) process. The OA increased by 18.1% through the model’s fine-tuning. More accurate predictions were produced as a result of this adaptation process by incorporating the diversity and shared characteristics of land cover across the large geographical region.

We explored the sampling method used for fine-tuning the pretrained UNet model derived by the meta-training process for the UNet-RSMA. Our results revealed that land cover labels obtained through the simple random sampling method reported higher OAs than those acquired through the equal-number random sampling method for each category (refer to Table 6 and Table 4). This observation indicates that the simple random sampling method accurately represented the distribution and characteristics of land cover classes in the given region. When comparing accuracies, the equal-number random sampling method for each category produced higher PA only for the woody wetlands (90) category, which was a rare class in both test regions. The advantage of simple random sampling might arise from its ability to focus the model updates on major classes, thereby increasing the overall accuracy. Additionally, when compared with the labels derived by the pretrained model from the meta-training process in the UNet-RSMA (Table 5), dwarf scrub (51), sedge/herbaceous (72), woody wetlands (90), and emergent herbaceous wetlands (95) show higher accuracy in both the PA and UA using the equal-number random sampling method for each category in the fine-tuning process of the UNet-RSMA (Table 6). The OA also increased by 6.3% compared with Table 5. These results demonstrate that fine-tuning the model can improve the accuracy of model predictions.

We conducted a comparative analysis of two basic models, the 1D-CNN and UNet classifier, in the meta-training process to assess their performances as a basic model in the RSMA-based training data method. The overall accuracies achieved by the UNet-RSMA were 60.4% for the Anderson Level II classes and 69.4% for the Anderson Level I classes. On the other hand, the 1D-CNN-RSMA-based training data method achieved an overall accuracy of 57.7% for the Anderson Level II classes (Table 7) and 73.0% for the Anderson Level I classes. Further analysis revealed the differences in the model performances based on the complexity of the land cover legend. For the land cover classifications with a simple legend with distinguishable differences, such as the Level I classification legend, simpler architecture models were found to be better. Conversely, in scenarios involving complex legends with intricate relationships in time-series data, more complex models were superior to others.

4. Discussion

In large-scale land cover classification, training data drawn from the full spatial coverage of a mapping region help capture the representative spectral characteristics of each land cover type. In this research, we developed an RSMA-based training data method (UNet-RSMA and 1D-CNN-RSMA) for training data generation. The method combines the meta-training process of meta-learning with fine-tuning: a model pretrained on a larger region is transferred and adapted to perform on a smaller region within the same geographic area. The method overcomes the scarcity of consensus data in certain regions and produces accurate and consistent training samples at any given location that lacks consensus data. By determining the optimal land cover label for each generated training sample from the probabilities associated with the model’s predictions and the existing products, we obtained reliable training samples.
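One plausible reading of the label-selection step is sketched below. This is an assumption rather than the authors' code: for each sample, the candidate labels offered by the existing products are scored with the model's predicted class probabilities, and the candidate with the highest probability is kept. The array layout, the class list, and the function name are hypothetical.

```python
import numpy as np

# Class codes in the column order of the (hypothetical) probability array.
CLASSES = np.array([31, 41, 42, 43, 51, 52, 72, 90, 95])

def select_labels(probs, candidates):
    """Pick, per sample, the product label with the highest model probability.

    probs:      (n_samples, n_classes) predicted class probabilities
    candidates: (n_samples, n_products) class codes from existing products
    """
    if not np.isin(candidates, CLASSES).all():
        raise ValueError("candidate code not in CLASSES")
    cols = np.searchsorted(CLASSES, candidates)         # code -> column index
    cand_prob = np.take_along_axis(probs, cols, axis=1)  # probability of each candidate
    best = cand_prob.argmax(axis=1)
    rows = np.arange(candidates.shape[0])
    return candidates[rows, best], cand_prob[rows, best]

# Two hypothetical samples with labels from three products.
probs = np.zeros((2, CLASSES.size))
probs[0, [0, 5, 7]] = [0.1, 0.3, 0.6]         # classes 31, 52, 90
probs[1, [1, 2, 5, 8]] = [0.2, 0.5, 0.2, 0.1]  # classes 41, 42, 52, 95
candidates = np.array([[52, 90, 90],
                       [42, 41, 95]])

labels, confidence = select_labels(probs, candidates)
print(labels.tolist(), confidence.tolist())
```

Because the final label is always drawn from the candidate set, a sketch of this kind can never produce a label that none of the products contains.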

However, the RSMA-based training data method has certain limitations. The accuracy of a newly labeled training sample is bounded by the existing land cover products because the final label is selected from those products based on the predicted probabilities. Additionally, the accuracy of the predicted probabilities used as indicators is influenced by various factors, including the distribution, quantity, and quality of the consensus samples and the model parameters. Despite these limitations, the RSMA-based training data method generated training samples with improved land cover labels compared with a single land cover product, which makes it useful for various applications. For instance, the method could be used to produce reliable and accurate training data for the USGS-led annual land cover product, which provides annual land cover maps with the Anderson Level II classification legend for the contiguous United States and Alaska from the 1980s onward.

The framework proposed in this paper can be applied to regions outside the continental United States (CONUS), Alaska, and Hawaii, where datasets like the NLCD, LANDFIRE’s EVT, and NWI are unavailable. In these regions, global land cover datasets can be used as substitutes. Several global land cover [57,58,59], forest [60], and water [61,62] products are available at different scales and for different time periods and provide a foundation for this approach.

For validation data collection, interpreting the AKVEG plots addresses two further challenges: inconsistency among the validation samples and a lack of prior knowledge for interpretation. One remaining issue is the mismatch between the spatial footprint of the database records and the remote sensing pixels. For example, the plots vary in size (5 × 5 m, 10 × 10 m, and 20 × 20 m) and are often smaller than a single Landsat pixel or cover only part of one. When evaluating the vegetation coverage inside a pixel to identify the land cover type, the coverage of the remainder of the pixel often had to be estimated. Furthermore, the pixels surrounding a plot could exhibit spatial heterogeneity, and misalignment between the geolocation of the plot and the image pixel can affect the determination of land cover types based solely on vegetation plot data. Therefore, the interpretation process requires cross-referencing with remote sensing images or high-resolution imagery.
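To give a sense of scale for the footprint mismatch, the short calculation below compares each plot size with a Landsat pixel. It assumes the nominal 30 m pixel of the Landsat TM/ETM+ multispectral bands and a plot that lies entirely inside one pixel; the function name is hypothetical.

```python
PIXEL_SIZE_M = 30.0  # assumed nominal Landsat TM/ETM+ multispectral pixel size

def plot_fraction(side_m, pixel_m=PIXEL_SIZE_M):
    """Share of one pixel's area covered by a square plot fully inside it."""
    return (side_m ** 2) / (pixel_m ** 2)

for side in (5, 10, 20):
    print(f"{side} x {side} m plot: {100 * plot_fraction(side):.1f}% of a pixel")
```

Even the largest plot covers less than half of a pixel under these assumptions, which is why the vegetation cover of the remaining area has to be estimated from imagery.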

5. Conclusions

In this research, we developed a novel method, the RSMA-based training data method, to automatically produce reliable training data at any location for mapping Alaska’s land cover. The method integrates meta-training and fine-tuning techniques to refine the land cover labels relative to the labels from the NLCD products. It demonstrates the potential of the produced training data for generating high-quality land cover maps, especially for regions like Alaska, and it is tailored to address the scarcity of training data where prior knowledge is limited. Furthermore, we examined the effects of the basic machine learning model (1D-CNN vs. UNet) used to train the global model in the meta-training process and of the sampling strategy (equal-number random sampling vs. simple random sampling) used to update the model parameters for a specific region in the fine-tuning process. Based on the results, we have provided insights and guidelines for using the method in different scenarios. For a less complex legend such as the Anderson Level I classification legend, 1D-CNN as the basic model in the RSMA method yielded a higher OA, whereas for a more complex legend such as the Anderson Level II classification legend, UNet yielded a higher OA. Additionally, the OA was highest when simple random sampling was used for transferring the pretrained model.

In addition to automatically generating training data, our study addressed challenges related to the interpretation of validation samples, especially for wetlands. Beyond using manually interpreted samples for validation, we interpreted existing vegetation plots from other vegetation survey projects as validation samples. These plots could also serve as prior landscape knowledge to guide the selection of random samples for validation, thereby mitigating potential biases.

Overall, this research advances the field of land-cover mapping in Alaska by developing an automatic approach to generate training data. In addition to overcoming the lack of prior knowledge, the method provided consistent training data across regions.

Author Contributions

Conceptualization, C.L.; methodology, C.L.; validation, C.L.; formal analysis, C.L.; writing—original draft preparation, C.L.; writing—review and editing, C.L., G.X. and S.J.; supervision, G.X. and S.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the U.S. Geological Survey National Land Imaging (NLI) program under USGS Contract 140G0124D0001.

Data Availability Statement

Landsat-5 and Landsat-7 images are courtesy of the U.S. Geological Survey. This study also used the National Land Cover Database (NLCD; Dewitz, 2019 [63]) and the LANDFIRE Existing Vegetation Type dataset (LANDFIRE, 2010 [64]).

Acknowledgments

We thank T. Adamson and F. Dwomoh for editing the manuscript. We also appreciate all anonymous reviewers and editors for their constructive comments. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. This journal article has been peer reviewed and approved for publication consistent with USGS Fundamental Science Practices (https://pubs.usgs.gov/circ/1367/, accessed on 20 September 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. McGuire, A.D.; Anderson, L.G.; Christensen, T.R.; Dallimore, S.; Guo, L.; Hayes, D.; Heimann, M.; Lorenson, T.; Macdonald, R.; Roulet, N. Sensitivity of the carbon cycle in the Arctic to climate change. Ecol. Monogr. 2009, 79, 523–555.
2. McGuire, A.D.; Apps, M.; Chapin, F.S.; Dargaville, R.; Flannigan, M.D.; Kasischke, E.S.; Kicklighter, D.; Kimball, J.; Kurz, W.; McRae, D.J.; et al. Land Cover Disturbances and Feedbacks to the Climate System in Canada and Alaska. In Land Change Science: Observing, Monitoring and Understanding Trajectories of Change on the Earth’s Surface; Gutman, G., Janetos, A.C., Justice, C.O., Moran, E.F., Mustard, J.F., Rindfuss, R.R., Skole, D., Turner, B.L., Cochrane, M.A., Eds.; Springer: Dordrecht, The Netherlands, 2004; pp. 139–161.
3. Jin, S.; Yang, L.; Zhu, Z.; Homer, C. A land cover change detection and classification protocol for updating Alaska NLCD 2001 to 2011. Remote Sens. Environ. 2017, 195, 44–55.
4. Vynne, C.; Dovichin, E.; Fresco, N.; Dawson, N.; Joshi, A.; Law, B.E.; Lertzman, K.; Rupp, S.; Schmiegelow, F.; Trammell, E.J. The Importance of Alaska for Climate Stabilization, Resilience, and Biodiversity Conservation. Front. For. Glob. Change 2021, 4, 701277.
5. Fisher, J.; Sikka, M.; Oechel, W.; Huntzinger, D.N.; Melton, J.R.; Koven, C.D.; Ahlström, A.; Arain, M.A.; Baker, I.; Chen, J.M.; et al. Carbon cycle uncertainty in the Alaskan Arctic. Biogeosciences 2014, 11, 4271–4288.
6. Wang, J.A.; Sulla-Menashe, D.; Woodcock, C.E.; Sonnentag, O.; Keeling, R.F.; Friedl, M.A. Extensive land cover change across Arctic–Boreal Northwestern North America from disturbance and climate forcing. Glob. Change Biol. 2020, 26, 807–822.
7. Douglas, T.; Hiemstra, C.; Anderson, J.; Barbato, R.; Bjella, K.; Deeb, E.; Gelvin, A.; Nelsen, P.E.; Newman, S.D.; Saari, S.P.; et al. Recent degradation of interior Alaska permafrost mapped with ground surveys, geophysics, deep drilling, and repeat airborne lidar. Cryosphere 2021, 15, 3555–3575.
8. Crumley, R.; Hill, D.; Beamer, J.; Holzenthal, E. Hydrologic Diversity in Glacier Bay Alaska: Spatial Patterns and Temporal Change. In The Cryosphere Discussions; European Geosciences Union: Munich, Germany, 2019; pp. 1–31.
9. Marcot, B.G.; Jorgenson, M.T.; Lawler, J.P.; Handel, C.M.; DeGange, A.R. Projected changes in wildlife habitats in Arctic natural areas of northwest Alaska. Clim. Change 2015, 130, 145–154.
10. Pielke Sr, R.A.; Pitman, A.; Niyogi, D.; Mahmood, R.; McAlpine, C.; Hossain, F.; Goldewijk, K.K.; Nair, U.; Betts, R.; Fall, S.; et al. Land use/land cover changes and climate: Modeling analysis and observational evidence. WIREs Clim. Change 2011, 2, 828–850.
11. Nelson, K.J.; Long, D.G.; Connot, J.A. LANDFIRE 2010—Updates to the National Dataset to Support Improved Fire and Natural Resource Management; Report 2016-1010; USGS: Reston, VA, USA, 2016.
12. Hall, J.V.; Frayer, W.E.; Wilen, B.O. Status of Alaska Wetlands; US Fish & Wildlife Service: Washington, DC, USA, 1994.
13. U.S. Department of Interior (USDI); Fish and Wildlife Service (FWS). National Wetlands Inventory Website; U.S. Department of the Interior, Fish and Wildlife Service: Washington, DC, USA, 2018. Available online: http://www.fws.gov/wetlands/ (accessed on 2 November 2020).
14. Clewley, D.; Whitcomb, J.; Moghaddam, M.; McDonald, K.; Chapman, B.; Bunting, P. Evaluation of ALOS PALSAR Data for High-Resolution Mapping of Vegetated Wetlands in Alaska. Remote Sens. 2015, 7, 7272–7297.
15. Li, C.C.; Wang, J.; Wang, L.; Hu, L.Y.; Pong, P. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat Thematic Mapper imagery. Remote Sens. 2014, 6, 964–983.
16. Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443.
17. Zhang, G.; Roslan, S.N.A.b.; Wang, C.; Quan, L. Research on land cover classification of multi-source remote sensing data based on improved U-net network. Sci. Rep. 2023, 13, 16275.
18. Shakya, A.K.; Ramola, A.; Vidyarthi, A. Landcover Pattern Recognization through Texture Classification Using LANDSAT Data of Dallas; Springer: Singapore, 2020; pp. 283–293.
19. Zhao, W.; Peng, S.; Chen, J.; Peng, R. Contextual-Aware Land Cover Classification With U-Shaped Object Graph Neural Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6510705.
20. Ghimire, B.; Rogan, J.; Miller, J. Contextual land-cover classification: Incorporating spatial dependence in land-cover classification models using random forests and the Getis statistic. Remote Sens. Lett. 2010, 1, 45–54.
21. Zhu, Z.; Woodcock, C.E. Continuous change detection and classification of land cover using all available Landsat data. Remote Sens. Environ. 2014, 144, 152–171.
22. Li, C.; Xian, G.; Zhou, Q.; Pengra, B. A novel automatic phenology learning (APL) method of training sample selection using multiple datasets for time-series land cover mapping. Remote Sens. Environ. 2021, 266, 112670.
23. Li, C.; Gong, P.; Wang, J.; Yuan, C.; Hu, T.; Wang, Q.; Yu, L.; Clinton, N.; Li, M.; Guo, J.; et al. An all-season sample database for improving land-cover mapping of Africa with two classification schemes. Int. J. Remote Sens. 2016, 37, 4623–4647.
24. Li, C.; Gong, P.; Wang, J.; Zhu, Z.; Biging, G.S.; Yuan, C.; Hu, T.; Zhang, H.; Wang, Q.; Li, X.; et al. The first all-season sample set for mapping global land cover with Landsat-8 data. Sci. Bull. 2017, 62, 508–515.
25. Zhao, Y.; Gong, P.; Yu, L.; Hu, L.; Li, X.; Li, C.; Zhang, H.; Zheng, Y.; Wang, J.; Zhao, Y.; et al. Towards a common validation sample set for global land-cover mapping. Int. J. Remote Sens. 2014, 35, 4795–4814.
26. Zhou, Q.; Tollerud, H.; Barber, C.; Smith, K.; Zelenak, D. Training data selection for annual land cover classification for the Land Change Monitoring, Assessment, and Projection (LCMAP) initiative. Remote Sens. 2020, 12, 699.
27. Brown, J.F.; Tollerud, H.J.; Barber, C.P.; Zhou, Q.; Dwyer, J.L.; Vogelmann, J.E.; Loveland, T.R.; Woodcock, C.E.; Stehman, S.V.; Zhu, Z.; et al. Lessons learned implementing an operational continuous United States national land change monitoring capability: The Land Change Monitoring, Assessment, and Projection (LCMAP) approach. Remote Sens. Environ. 2020, 238, 111356.
28. Li, C.; Xian, G.; Wellington, D.; Smith, K.; Horton, J.; Zhou, Q. Development of the LCMAP annual land cover product across Hawaiʻi. Int. J. Appl. Earth Obs. Geoinf. 2022, 113, 103015.
29. Maus, V.; Camara, G.; Cartaxo, R.; Sanchez, A.; Ramos, F.M.; Queiroz, G.R. A Time-Weighted Dynamic Time Warping method for land-use and land-cover mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 3729–3739.
30. Pengra, B.; Stehman, S.V.; Horton, J.A.; Wellington, D.F. Land Change Monitoring, Assessment, and Projection (LCMAP) Version 1.0 Annual Land Cover and Land Cover Change Validation Tables; U.S. Geological Survey Data Release: Reston, VA, USA, 2020.
31. Nawrocki, T.W. Alaska Vegetation Plots Database (AKVEG). Git Repository. 2021. Available online: https://github.com/accs-uaa/vegetation-plots-database (accessed on 20 April 2022).
32. Hospedales, T.; Antoniou, A.; Micaelli, P.; Storkey, A. Meta-Learning in Neural Networks: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 5149–5169.
33. Alshalali, T.; Josyula, D. Fine-Tuning of Pre-Trained Deep Learning Models with Extreme Learning Machine. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 12–14 December 2018; pp. 469–473.
34. Anderson, J.R.; Hardy, E.E.; Roach, J.T.; Witmer, R.E. A Land Use and Land Cover Classification System for Use with Remote Sensor Data, Report 964; US Government Printing Office: Washington, DC, USA, 1976.
35. Cochran, W.G. Sampling Techniques, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1977.
36. Crawford, C.J.; Roy, D.P.; Arab, S.; Barnes, C.; Vermote, E.; Hulley, G.; Gerace, A.; Choate, M.; Engebretson, C.; Micijevic, E.; et al. The 50-year Landsat collection 2 archive. Sci. Remote Sens. 2023, 8, 100103.
37. Tucker, C.J.; Sellers, P.J. Satellite remote sensing of primary production. Int. J. Remote Sens. 1986, 7, 1395–1416.
38. Zha, Y.; Gao, J.; Ni, S. Use of normalized difference built-up index in automatically mapping urban areas from TM imagery. Int. J. Remote Sens. 2003, 24, 583–594.
39. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432.
40. Xu, H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. Int. J. Remote Sens. 2006, 27, 3025–3033.
41. Feyisa, G.L.; Meilby, H.; Fensholt, R.; Proud, S. Automated Water Extraction Index: A new technique for surface water mapping using Landsat imagery. Remote Sens. Environ. 2014, 140, 23–35.
42. Crist, E.P. A TM Tasseled Cap equivalent transformation for reflectance factor data. Remote Sens. Environ. 1985, 17, 301–306.
43. Huang, C.; Wylie, B.; Yang, L.; Homer, C.; Zylstra, G. Derivation of a Tasseled Cap Transformation Based On Landsat 7 At-Satellite Reflectance. Int. J. Remote Sens. 2002, 23, 1741–1748.
44. Miller, J.D.; Yool, S.R. Mapping forest post-fire canopy consumption in several overstory types using multi-temporal Landsat TM and ETM data. Remote Sens. Environ. 2002, 82, 481–496.
45. Diek, S.; Fornallaz, F.; Schaepman, M.E.; Rogier, D.J. Barest Pixel Composite for Agricultural Areas Using Landsat Time Series. Remote Sens. 2017, 9, 1245.
46. Tegegne, A.M. Applications of Convolutional Neural Network for Classification of Land Cover and Groundwater Potentiality Zones. J. Eng. 2022, 2022, 6372089.
47. Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Into Imaging 2018, 9, 611–629.
48. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 448–456.
49. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
50. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
51. Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596.
52. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017, 70, 1126–1135.
53. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
54. Macander, M.J.; Nelson, P.R.; Nawrocki, T.W.; Frost, G.V.; Orndahl, K.M.; Palm, E.C.; Wells, A.F.; Goetz, S.J. Time-series maps reveal widespread change in plant functional type cover across Arctic and boreal Alaska and Yukon. Environ. Res. Lett. 2022, 17, 054042.
55. Selkowitz, D.J.; Stehman, S.V. Thematic accuracy of the National Land Cover Database (NLCD) 2001 land cover for Alaska. Remote Sens. Environ. 2011, 115, 1401–1407.
56. Ducks Unlimited (DU). North Slope Science Initiative Landcover Mapping Summary Report; Ducks Unlimited: Rancho Cordova, CA, USA, 2013.
57. Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S.; et al. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 2013, 34, 2607–2654.
58. Zanaga, D.; Van De Kerchove, R.; Daems, D.; De Keersmaecker, W.; Brockmann, C.; Kirches, G.; Wevers, J.; Cartus, O.; Santoro, M.; Fritz, S.; et al. ESA WorldCover 10 m 2020 v100; Zenodo: Genève, Switzerland, 2021.
59. Friedl, M.A.; Woodcock, C.E.; Olofsson, P.; Zhu, Z.; Loveland, T.; Stanimirova, R.; Arevalo, P.; Bullock, E.; Hu, K.T.; Zhang, Y.; et al. Medium Spatial Resolution Mapping of Global Land Cover and Land Cover Change Across Multiple Decades From Landsat. Front. Remote Sens. 2022, 3, 894571.
60. Hansen, M.C.; Potapov, P.V.; Moore, R.; Hancher, M.; Turubanova, S.A.; Tyukavina, A.; Thau, D.; Stehman, S.V.; Goetz, S.J.; Loveland, T.R.; et al. High-resolution global maps of 21st-century forest cover change. Science 2013, 342, 850–853.
61. Pickens, A.H.; Hansen, M.C.; Hancher, M.; Stehman, S.V.; Tyukavina, A.; Potapov, P.; Marroquin, B.; Sherani, Z. Mapping and sampling to characterize global inland water dynamics from 1999 to 2018 with full Landsat time-series. Remote Sens. Environ. 2020, 243, 111792.
62. Pekel, J.F.; Cottam, A.; Gorelick, N.; Belward, A.S. High-resolution mapping of global surface water and its long-term changes. Nature 2016, 540, 418–422.
63. Dewitz, J. National Land Cover Database (NLCD) 2016 Products (ver. 3.0, November 2023) [National Land Cover Database (NLCD) 2011 Land Cover—Alaska]; U.S. Geological Survey: Reston, VA, USA, 2019.
64. LANDFIRE. Existing Vegetation Type Layer, LANDFIRE 1.2.0, U.S. Department of the Interior, Geological Survey, and U.S. Department of Agriculture. 2010. Available online: http://www.landfire/viewer (accessed on 1 April 2021).

Figure 1. The framework of the “Region-Specific Model Adaptation (RSMA)”-based training data method. The land cover images are NLCD land cover maps. The different colored lines represent various input features.

Figure 2. Temporal trends (A) and semi-monthly NIR composition (B) of the NIR reflectance values (2009–2012). The reflectance values were scaled by a factor of 10,000.

Figure 3. Diagram of the convolution along the time dimension.

Figure 4. Interpreted validation sample units from plots in the North Slope Land Cover and USFWS. The background images are NLCD land cover maps.

Table 1. List of indicators used in the RSMA-based training data method.

Index	Formula	Citation
Normalized Difference Vegetation Index (NDVI)	$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$	[37]
Normalized Difference Built-Up Index (NDBI)	$\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}}$	[38]
Normalized Difference Water Index (NDWI)	$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}}$	[39]
Modified Normalized Difference Water Index (MNDWI)	$\mathrm{MNDWI} = \frac{\mathrm{Green} - \mathrm{SWIR}}{\mathrm{Green} + \mathrm{SWIR}}$	[40]
Automated Water Extraction Index (AWEI)	$\mathrm{AWEI}_{sh} = \mathrm{Blue} + 2.5 \times \mathrm{Green} - 1.5 \times (\mathrm{NIR} + \mathrm{SWIR1}) - 0.25 \times \mathrm{SWIR2}$; $\mathrm{AWEI}_{nsh} = 4 \times (\mathrm{Green} - \mathrm{SWIR1}) - (0.25 \times \mathrm{NIR} + 2.75 \times \mathrm{SWIR2})$	[41]
Wetness	Landsat 5: $0.0315 \times \mathrm{Blue} + 0.2021 \times \mathrm{Green} + 0.3102 \times \mathrm{Red} + 0.1594 \times \mathrm{NIR} - 0.6806 \times \mathrm{SWIR1} - 0.6109 \times \mathrm{SWIR2}$; Landsat 7: $0.2626 \times \mathrm{Blue} + 0.2141 \times \mathrm{Green} + 0.0926 \times \mathrm{Red} + 0.0656 \times \mathrm{NIR} - 0.7629 \times \mathrm{SWIR1} - 0.5388 \times \mathrm{SWIR2}$	[42,43]
Brightness	Landsat 5: $0.2043 \times \mathrm{Blue} + 0.4158 \times \mathrm{Green} + 0.5524 \times \mathrm{Red} + 0.5741 \times \mathrm{NIR} + 0.3124 \times \mathrm{SWIR1} + 0.2303 \times \mathrm{SWIR2}$; Landsat 7: $0.3561 \times \mathrm{Blue} + 0.3972 \times \mathrm{Green} + 0.3904 \times \mathrm{Red} + 0.6966 \times \mathrm{NIR} + 0.2286 \times \mathrm{SWIR1} + 0.1596 \times \mathrm{SWIR2}$	[42,43]
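Normalized Burn Ratio (NBR)	$\mathrm{NBR} = \frac{\mathrm{NIR} - \mathrm{SWIR2}}{\mathrm{NIR} + \mathrm{SWIR2}}$	[3,44]
Bare Soil Index (BSI)	$\mathrm{BSI} = \frac{(\mathrm{Red} + \mathrm{SWIR}) - (\mathrm{NIR} + \mathrm{Blue})}{(\mathrm{Red} + \mathrm{SWIR}) + (\mathrm{NIR} + \mathrm{Blue})}$	[45]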
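
The normalized-difference indices in Table 1 share one functional form, which the following sketch computes with NumPy. It is illustrative only: the function names and the sample reflectance values are hypothetical, and because the table writes SWIR without a band number for NDBI, MNDWI, and BSI, the sketch assumes SWIR1 for those three indices.

```python
import numpy as np

def norm_diff(a, b):
    """(a - b) / (a + b), returning NaN where the denominator is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def spectral_indices(blue, green, red, nir, swir1, swir2):
    """Normalized-difference indices of Table 1 (SWIR taken as SWIR1 by assumption)."""
    blue, red, nir, swir1 = (np.asarray(x, dtype=float) for x in (blue, red, nir, swir1))
    return {
        "NDVI": norm_diff(nir, red),
        "NDBI": norm_diff(swir1, nir),
        "NDWI": norm_diff(green, nir),
        "MNDWI": norm_diff(green, swir1),
        "NBR": norm_diff(nir, swir2),
        "BSI": norm_diff(red + swir1, nir + blue),
    }

# One hypothetical vegetated pixel, reflectance scaled by 10,000.
idx = spectral_indices(blue=[300], green=[500], red=[400],
                       nir=[3000], swir1=[1500], swir2=[800])
for name, value in idx.items():
    print(f"{name}: {value[0]:.3f}")
```

Because each index is a ratio of a difference to a sum, the result does not depend on the reflectance scale factor, so scaled integer reflectance can be passed directly.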

Table 2. An example of vegetation plot records used to interpret land cover types.

ID	Project	Site Code	Date	Accepted Name	Cover (%)	Plant Life-Form	Wetland Indicator Status
34027	USFWS Interior	KAN226	6/27/2013	Betula neoalaskana	36	deciduous tree	FACU
34024	USFWS Interior	KAN226	6/27/2013	Alnus viridis	23	shrub	FAC
34029	USFWS Interior	KAN226	6/27/2013	Empetrum nigrum	12	dwarf shrub	FAC
34031	USFWS Interior	KAN226	6/27/2013	Vaccinium uliginosum	12	dwarf shrub, shrub	FAC
34026	USFWS Interior	KAN226	6/27/2013	Betula nana	6	shrub	FAC
34030	USFWS Interior	KAN226	6/27/2013	Rhododendron tomentosum	4	dwarf shrub, shrub	FACW
34032	USFWS Interior	KAN226	6/27/2013	Vaccinium vitis-idaea	3	dwarf shrub	FAC
34025	USFWS Interior	KAN226	6/27/2013	Arctous alpina	0.5	dwarf shrub	FACU
34028	USFWS Interior	KAN226	6/27/2013	Calamagrostis canadensis	0.5	graminoid	FAC
34033	USFWS Interior	KAN226	6/27/2013	Carex bigelowii	0.1	graminoid	FAC

Table 3. The agreement between the interpreted validation sample units (interpreted plot data from AKVEG) and the NLCD. The OA was 29.7%. (31: Barren; 41: Deciduous Forest; 42: Evergreen Forest; 43: Mixed Forest; 51: Dwarf Scrub; 52: Shrub/Scrub; 71: Grassland/Herbaceous; 72: Sedge/Herbaceous; 90: Woody Wetlands; 95: Emergent Herbaceous Wetlands). When the ground total is 0, the corresponding accuracy was not calculated, indicated by a dash (-). Each grey block highlights the group of classes under the same Anderson Level I legend. The bolded values along the diagonal are the correctly labeled sample units. The same applies to the following tables.

		NLCD 2011 Land Cover
		31	41	42	43	51	52	71	72	90	95	Grand Total	PA (%)
Ground Truth	31	3				1			1			5	60.0
	41		2				4	2	1			9	22.2
	42			2	2	1						5	40.0
	43		1		3	1	1					6	50.0
	51	1				8	1		3			13	61.5
	52					5	13		1			19	68.4
	71							0				0	-
	72	1				2			1			4	25.0
	90			2		2	12		7	0		23	0.0
	95		1	1			2		22		1	27	3.7
	Grand Total	5	4	5	5	20	33	2	36	0	1	111	29.7%
	UA (%)	60.0	50.0	40.0	60.0	40.0	39.4	0.0	2.8	-	100.0		29.7%
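
The accuracy figures in the margins of these agreement matrices can be recomputed from the counts. The sketch below does so for Table 3, with rows as ground truth and columns as NLCD labels; the function name is hypothetical, and classes with a zero total are returned as NaN to mirror the dash used in the tables.

```python
import numpy as np

# Counts transcribed from Table 3 (rows: ground truth, columns: NLCD 2011).
CODES = [31, 41, 42, 43, 51, 52, 71, 72, 90, 95]
TABLE3 = np.array([
    [3, 0, 0, 0, 1, 0, 0, 1, 0, 0],
    [0, 2, 0, 0, 0, 4, 2, 1, 0, 0],
    [0, 0, 2, 2, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 3, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 8, 1, 0, 3, 0, 0],
    [0, 0, 0, 0, 5, 13, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 2, 0, 0, 1, 0, 0],
    [0, 0, 2, 0, 2, 12, 0, 7, 0, 0],
    [0, 1, 1, 0, 0, 2, 0, 22, 0, 1],
])

def accuracy_report(cm):
    """Return OA, per-class PA (row-wise), and per-class UA (column-wise) in percent."""
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    row_tot = cm.sum(axis=1)  # ground-truth totals
    col_tot = cm.sum(axis=0)  # mapped-label totals
    with np.errstate(divide="ignore", invalid="ignore"):
        pa = np.where(row_tot > 0, 100 * diag / row_tot, np.nan)
        ua = np.where(col_tot > 0, 100 * diag / col_tot, np.nan)
    oa = 100 * diag.sum() / cm.sum()
    return oa, pa, ua

oa, pa, ua = accuracy_report(TABLE3)
print(f"OA = {oa:.1f}%")
for code, p, u in zip(CODES, pa, ua):
    print(code, f"PA = {p:.1f}", f"UA = {u:.1f}")
```

With these counts, 33 of the 111 sample units lie on the diagonal, giving the 29.7% OA stated in the caption, and the wetland classes (90 and 95) show the lowest PA.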

Table 4. Agreement between the interpreted validation sample units (interpreted plot data from AKVEG) and land cover labels produced by UNet-RSMA with simple random sampling in the fine-tuning process. The OA was 60.4%. (31: Barren; 41: Deciduous Forest; 42: Evergreen Forest; 43: Mixed Forest; 51: Dwarf Scrub; 52: Shrub/Scrub; 72: Sedge/Herbaceous; 90: Woody Wetlands; 95: Emergent Herbaceous Wetlands). The grand total for 71: Grassland/Herbaceous in both validation sample units and predicted land cover labels was zero, so it was removed from the matrix. The same applies to the following tables.

		UNet-RSMA Based Training Data Method
		31	41	42	43	51	52	72	90	95	Grand Total	PA (%)
Ground Truth	31	4				1					5	80.0
	41		3	1			3			2	9	33.3
	42			5							5	100.0
	43		3		3						6	50.0
	51	1				8	2	2			13	61.5
	52			2			17				19	89.5
	72	1				2		1			4	25.0
	90		1	1	1	1	6	2	8	3	23	34.8
	95		3	2			1	2	1	18	27	66.7
	Grand Total	6	10	11	4	12	29	7	9	23	111	60.4%
	UA (%)	66.7	30.0	45.5	75.0	66.7	58.6	14.3	88.9	78.3		60.4%

Table 5. Agreement between the interpreted validation sample units (interpreted plot data from AKVEG) and land cover labels produced by the pretrained model from the meta-training process in UNet-RSMA. The OA was 42.3%. (31: Barren; 41: Deciduous Forest; 42: Evergreen Forest; 43: Mixed Forest; 51: Dwarf Scrub; 52: Shrub/Scrub; 72: Sedge/Herbaceous; 90: Woody Wetlands; 95: Emergent Herbaceous Wetlands).

		Pretrained Model by Meta-Training Process in UNet-RSMA
		31	41	42	43	51	52	72	90	95	Grand Total	PA (%)
Ground Truth	31	4								1	5	80.0
	41		3		1		2		1	2	9	33.3
	42		1	3	1						5	60.0
	43		3		3						6	50.0
	51	1				5	1	3	1	2	13	38.5
	52	1	2	2			7	2	4	1	19	36.8
	72	2				2		0			4	0.0
	90		1	1	1	2	5		9	4	23	39.1
	95		2		1	2	1	7	1	13	27	48.1
	Grand Total	8	12	6	7	11	16	12	16	23	111	42.3%
	UA (%)	50.0	25.0	50.0	42.9	45.5	43.8	0.0	56.3	56.5		42.3%

Table 6. Agreement between the interpreted validation sample units (interpreted plot data from AKVEG) and land cover labels produced by the UNet-RSMA-based training data method with equal-number random sampling for each category in the fine-tuning process. The OA was 48.6%. (31: Barren; 41: Deciduous Forest; 42: Evergreen Forest; 43: Mixed Forest; 51: Dwarf Scrub; 52: Shrub/Scrub; 71: Grassland/Herbaceous; 72: Sedge/Herbaceous; 90: Woody Wetlands; 95: Emergent Herbaceous Wetlands).

		Transferring Model Using the Equal-Number Random Sampling Method for Each Category
		31	41	42	43	51	52	71	72	90	95	Grand Total	PA (%)
Ground Truth	31	4									1	5	80.0
	41		3		1		2	1			2	9	33.3
	42		1	3	1							5	60.0
	43		3		3							6	50.0
	51	1				6	1		2	1	2	13	46.2
	52	1	2	2			7		2	4	1	19	36.8
	71							0				0	-
	72	2				1			1			4	25.0
	90		1	1	1	2	5		1	10	2	23	43.5
	95		3			2	1		4		17	27	63.0
	Grand Total	8	13	6	6	11	16	1	10	15	25	111	48.6%
	UA (%)	50.0	23.1	50.0	50.0	54.5	43.8	0.0	10.0	66.7	68.0		48.6%

Table 7. Agreement between the interpreted validation sample units (interpreted plot data from AKVEG) and land cover labels produced by the 1D-CNN-RSMA-based training data method. The OA was 57.7%. (31: Barren; 41: Deciduous Forest; 42: Evergreen Forest; 43: Mixed Forest; 51: Dwarf Scrub; 52: Shrub/Scrub; 72: Sedge/Herbaceous; 90: Woody Wetlands; 95: Emergent Herbaceous Wetlands).

		1D-CNN-RSMA
		31	41	42	43	51	52	72	90	95	Grand Total	PA (%)
Ground Truth	31	3				1		1			5	60.0
	41		4				5				9	44.4
	42			3	2						5	60.0
	43		1		4		1				6	66.7
	51					9	2			2	13	69.2
	52			3		4	12				19	63.2
	72	1				2	1	0			4	0.0
	90		1	3	1		4		6	8	23	26.1
	95		2	1			1			23	27	85.2
	Grand Total	4	8	10	7	16	26	1	6	33	111	57.7%
	UA (%)	75.0	50.0	30.0	57.1	56.3	46.2	0.0	100.0	69.7		57.7%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
