Long Time Series High-Quality and High-Consistency Land Cover Mapping Based on Machine Learning Method at Heihe River Basin

Zhong, Bo; Yang, Aixia; Jue, Kunsheng; Wu, Junjun

doi:10.3390/rs13081596


¹ State Key Laboratory of Remote Sensing Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100101, China
² College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(8), 1596; https://doi.org/10.3390/rs13081596

Submission received: 19 March 2021 / Revised: 16 April 2021 / Accepted: 17 April 2021 / Published: 20 April 2021

(This article belongs to the Special Issue Advanced Machine Learning for Time Series Remote Sensing Data Analysis)


Abstract


Long time series of land cover changes (LCCs) are critical for the analysis of long-term climate, environmental, and ecological changes. Although several moderate to fine resolution global land cover datasets have been publicly released and show strong consistency at the global scale, they have large deviations at the regional scale; furthermore, high-quality land cover datasets from before 2000 are not available, and the classification consistency among different datasets is poor. Thus, long time series land cover datasets with high quality and consistency are in great demand but are still unavailable, even at the regional scale. The Landsat series of satellite imagery, acquired by eight successive satellites, dates back to 1972, which makes it possible to produce a long time series land cover dataset. In addition, newly available satellite data can be used to construct dense satellite image time series, so that time series analysis methods such as LCMM can be employed to produce high-quality land cover datasets. Therefore, taking advantage of these two categories of satellite data, we propose a new time series land cover mapping method based on machine learning and apply it to the Heihe River Basin (HRB) for verification. Firstly, the high-quality land cover datasets at HRB from 2011–2015, which were produced with the LCMM method, are used to generate training samples quickly and accurately. Secondly, a strategy for transferring the training samples from 2011–2015 to earlier years is established. Thirdly, a random forest model is trained with the selected samples of each year, and a land cover map is produced for every year. Finally, comprehensive analysis and validation are carried out for evaluation.
In this study, a long time series land cover dataset covering 1986, 1990, 1995, 2000, 2005, 2010, 2011, 2012, 2013, 2014, and 2015 was produced, with an average overall accuracy of about 90%. It is the longest time series land cover dataset at 30 m resolution for HRB, and it has good temporal continuity and stability.

Keywords:

land cover; time series remote sensing; time series transfer learning; samples migration; random forest; machine learning for remote sensing; Heihe River Basin

1. Introduction

Land cover changes (LCCs) result from both human activities and natural evolution, and they have great impacts on the climate system and ecology [1]; LCC is consequently an important factor in studying environmental and climate change. A better understanding of LCC is therefore necessary to provide a reference for evaluating the vulnerability of carbon and water cycles [2,3] and other ecosystem processes related to global or regional change [4]. In particular, long time series of land cover maps derived from sequential remote sensing observations provide more information on land change for the analysis of long-term climate, environmental, and ecological changes.

Because it can cover the globe within a short period and thus capture land cover changes quickly, remote sensing has become widely used for global land cover mapping. Several moderate to fine resolution global land cover datasets have been publicly released [5,6,7,8,9,10,11,12], with spatial resolutions between 1 km and 30 m. They show strong consistency at the global scale but have large deviations at the regional scale. The main problems are as follows: (1) most of these datasets have a relatively coarse resolution (500 m or coarser) and their accuracies are usually lower than 75% [8,13]; among them, GlobeLand30 [14] and FROM-GLC30 [9] have a spatial resolution of 30 m, with overall accuracies of 80% and 72%, respectively; (2) the classification accuracy of these datasets at the regional scale is much lower than the claimed accuracy, especially in heterogeneous areas, which makes it difficult to meet modeling requirements; and (3) all of the datasets cover only the period after 2000 and lack long time series, so they cannot support the analysis of long-term climate, environmental, and ecological changes.

Recently, satellite constellations and wide-swath sensors, such as Sentinel-2, FORMOSAT-2, the Chinese Huanjing-1 (HJ-1), and GF-1, have been launched; they can image the whole globe every few days at high resolution, so time series of remote sensing images are increasingly available. By applying time series analysis to such satellite images, some regional land cover maps with accuracies of over 90% have been produced [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. Furthermore, the authors of [19] developed a Land Cover mapping method using multi-classifiers and multisource remotely sensed imagery (LCMM), which used HJ-1/CCD image time series to produce a finer land cover map at Heihe River Basin (HRB) from 2011–2015. However, such satellite image time series were not available before 2011. Therefore, long time series land cover datasets are in great demand but still unavailable, even at the regional scale.

To produce long time series land cover datasets with high spatial resolution for the analysis of long-term climate, environmental, and ecological changes, the Landsat series of satellite imagery is the only choice. It has been acquired by eight satellites, dates back to 1972, and provides continuous observations of the global land surface. Furthermore, Landsat data are freely available [20], and new computing tools, such as Google Earth Engine (GEE), provide powerful cloud computing capabilities [21]. However, the 16-day revisit period makes it very difficult to construct dense time series, so methods based on time series analysis, such as LCMM, cannot be employed to produce high-quality land cover maps, especially for early years.

In recent years, with increasing computing capability and the development of artificial intelligence, machine learning (ML) algorithms for remote sensing land cover classification, such as random forest (RF), support vector machine (SVM), and neural network (NN), have greatly improved in both efficiency and accuracy. However, their dependence on training samples, which are usually collected manually or interpreted from satellite images, has limited their wide use, especially in large-scale applications. Furthermore, their limited timeliness has prevented their use in emergency cases [22,23]. Consequently, automatic sampling strategies for retrieving high-quality samples have become increasingly important, especially for large-scale remote sensing land cover mapping [8,24]. Because samples from historical land cover maps and classification cases contain prior information and knowledge that help classify current satellite images, transfer learning has been employed to solve these problems [25,26,27,28] and has achieved good results in local-scale applications. However, accuracy degrades when incorrect samples inherited from historical maps are not accounted for. In addition, these methods have not fully exploited historical land cover maps with multi-year coverage [22,29].

Therefore, we propose a new time series land cover mapping method based on machine learning and apply it to HRB for verification. Firstly, the high-quality land cover datasets at HRB from 2011–2015, which were produced with the LCMM method (available at http://westdc.westgis.ac.cn/data/6bbf9a3f-e7d8-4255-9ecb-131e1543316d, accessed on 19 April 2021), are used to generate training samples quickly and accurately. Secondly, a strategy for transferring the training samples from 2011–2015 to earlier years is established. Thirdly, a random forest model is trained with the selected samples of each year, and the land cover maps for earlier years are produced. Finally, comprehensive analysis and validation are carried out for evaluation.

2. Materials and Methods

2.1. Study Area

The study area, HRB, is located in the northeast of the Tibetan Plateau, between 97.1°E–102.0°E and 37.7°N–42.7°N, and covers an area of approximately 143,000 km². Its elevation ranges from 2000 to 5000 m, and it contains highly heterogeneous landscapes, including cold and arid landscapes in the upper reaches, a compound of artificial oasis, riparian ecosystems, wetland, and desert in the middle reaches, and natural oasis and desert in the lower reaches. These complex landscapes make HRB a good test site for verifying the proposed method, which was the first reason for selecting it as the study area (Figure 1).

HRB is a typical inland river basin in China and has long served as an experimental site for integrated watershed studies, land surface measurements, and hydrological observations [30]. Many major research programs have been carried out here, for example, the “Integrated research on the eco-hydrological process of the Heihe River Basin” launched by the National Natural Science Foundation of China [31]. With the support of these projects, many ground experiments and much scientific research have been carried out, collecting a large number of ground measurements, remote sensing data, and land surface parameters, including high-quality land cover datasets. HRB is therefore well investigated, which makes it well suited to this study.

2.2. Data and Preprocessing

In this study, four categories of data were used: TM and OLI images from the Landsat series of satellites; the SRTMGL1_003 digital elevation model from the Shuttle Radar Topography Mission (SRTM); land cover datasets of 2011–2015 produced by the LCMM method [19] from HJ-1/CCD data; and high-resolution images from Google Earth for sample verification and validation. The details of these data are listed in Table 1. Because Landsat/TM images before 1986 are insufficient for land cover mapping, the earliest year in this study is 1986. In total, 3257 scenes of TM/OLI surface reflectance images, including 2084 Landsat5/TM and 1173 Landsat8/OLI scenes, were obtained from GEE. The number of scenes for each year is listed in Table 2: 1986 has the fewest scenes (109) and 2014 has the most (439). In addition, more Landsat8/OLI scenes are available per year than Landsat5/TM scenes. Because of the failure of the Landsat7 scan line corrector, ETM+ data were not used [32]. Figure 2 shows the seasonal reflectance composites for 1986 and 2014; the 2014 composites look better than those of 1986 because more data are available (see statistics in Table 2). In particular, the autumn composite of 1986 contains a lot of cloud-induced noise, which may degrade the final land cover map. This is discussed further in the discussion section.

In order to make a usable surface reflectance composite, the following preprocessing procedures were carried out.

(1): The CFMASK algorithm [33] was used to generate the quality assessment (QA) band, which was then used to remove cloud contamination.
(2): For Landsat 5 images, a negative buffering method [34] was employed to remove bad pixels at scene edges.
(3): The percentile reducer in GEE was used to composite the Landsat images within each season; the 25th percentile was used to better remove noise.
(4): To minimize missing data, a reconstruction algorithm [35] designed to preserve the authenticity and integrity of the data was employed to reconstruct the missing portions.
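The cloud masking and seasonal percentile compositing of steps (1) and (3) can be illustrated with a minimal NumPy sketch. It is not the GEE implementation used in this study: it assumes the Landsat Collection 1 surface reflectance `pixel_qa` bit layout produced by CFMask (bit 3 = cloud shadow, bit 5 = cloud), and the function names `clear_mask` and `seasonal_percentile_composite` are hypothetical. Pixels without any clear observation are left as NaN for a gap-filling step such as (4).

```python
import warnings

import numpy as np

# Assumed bit positions in the Landsat Collection 1 SR "pixel_qa" band (CFMask).
CLOUD_SHADOW_BIT = 3
CLOUD_BIT = 5


def clear_mask(qa):
    """True where neither the cloud nor the cloud-shadow bit is set."""
    bad_bits = (1 << CLOUD_BIT) | (1 << CLOUD_SHADOW_BIT)
    return (qa & bad_bits) == 0


def seasonal_percentile_composite(reflectance, qa, percentile=25.0):
    """Per-pixel percentile composite of one band over one season.

    reflectance: float array (n_images, rows, cols)
    qa:          integer array (n_images, rows, cols) of pixel_qa values
    Cloudy or shadowed observations are replaced by NaN and ignored;
    pixels with no clear observation remain NaN.
    """
    masked = np.where(clear_mask(qa), reflectance, np.nan)
    with warnings.catch_warnings():
        # nanpercentile warns on all-NaN pixels; those simply stay NaN.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanpercentile(masked, percentile, axis=0)
```

A percentile below the median favours darker observations, which plausibly suppresses residual bright cloud pixels that escaped the mask; this is one reading of why the 25% setting helps noise removal, not a statement from the study.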

2.3. The Land Cover Classification System

Based on an analysis of existing classification systems, such as IGBP [7] and GLC2000 [6], the requirements of land process modeling at HRB, the characteristics of HRB, and the capability of the remotely sensed data used in this study, a specific classification system was constructed (Table 3). The criteria for building the classification system are as follows:

(1): The classification system was a combination of the IGBP [7,8], GLC2000 [6], and GlobCover [5] systems;
(2): Based on many ground campaigns at HRB, prior knowledge related to land cover from these campaigns was used to determine the classification system.
(3): In order to make a consistent land cover dataset at HRB, the capability of the Landsat series of data directly determined the classification system.

2.4. Methodology

The objective of this study is to produce the longest time series land cover dataset at HRB (1986–2015) with high accuracy, high spatial resolution, and excellent consistency to support the analysis of long-term climate, environmental, and ecological changes at the regional or river basin scale. The Landsat series of satellite imagery, acquired by eight successive satellites, dates back to 1972 and is therefore the only feasible choice. The data availability statistics in Table 2 and the surface reflectance composites of 1986 and 2014 in Figure 2 further show that Landsat series satellite data are feasible for this work. However, several factors prevent the production of high-quality and highly consistent land cover datasets using Landsat series satellite data alone; they are summarized as follows:

(1): Many comprehensive classification methods using Landsat series satellite data, such as GlobeLand30 [14], have been developed, but the accurate ones usually require substantial manual intervention, which is impractical when producing more than 10 years of land cover maps over an area of more than 140,000 km².
(2): Although methods based on time series analysis, such as LCMM, can produce high-quality land cover datasets [19], the 16-day revisit period of Landsat series satellites does not support constructing dense time series at HRB.
(3): Although machine learning methods, such as FROM-GLC30 [9], have many advantages and have been employed to map land cover from remote sensing imagery, they are usually limited by the number and representativeness of training samples; moreover, when they are applied to produce a long time series land cover dataset, their requirement for consistent input data cannot be met by Landsat series satellite data (see Figure 2).

To solve the above problems, we propose a new time series land cover mapping method based on machine learning that takes advantage of multiple satellite data sources, and we use the well-explored HRB as the experimental site for land cover mapping and validation. The procedure of the proposed method is illustrated in Figure 3, and the main idea is described as follows:

(1): New high-frequency satellite data, such as HJ-1/CCD and Sentinel2/MSI, were first used to construct monthly time series of surface reflectance, which were then analyzed with the LCMM method [19] to produce high-quality land cover datasets at HRB from 2011–2015. This 2011–2015 land cover dataset at HRB had already been produced and publicly released at the Western Data Center of China (http://westdc.westgis.ac.cn/, accessed on 19 April 2021) [19]; because it has been well tested in many applications supported by the HiWATER project [31], it was used directly in this study. This step takes advantage of the newly available satellite data.
(2): Instead of mapping land cover year by year with labor-intensive methods for 10 years, a machine learning method was chosen as the classifier to lower labor and time costs. In machine learning, training samples are always the key to performance, and collecting them manually usually requires a lot of labor; therefore, an automatic sampling strategy was established in this study to retrieve enough accurate training samples from the high-quality land cover dataset of step 1 by jointly using the land cover maps of all five years. The details of the strategy are presented in Section 2.4.2.
(3): Because of the inconsistency of the seasonal surface reflectance composites (Figure 2), the samples from step 2 were suitable only for 2011–2015 and could not be transferred directly to earlier years; therefore, a strategy for transferring the training samples to earlier years was established. The details of the strategy are presented in Section 2.4.3.
(4): Because of its advantages (detailed in Section 2.4.4), the random forest model was trained with the samples selected for each year, and a land cover map was produced for every year. A long time series land cover dataset including 1986, 1990, 1995, 2000, 2005, 2010, 2011, 2013, 2014, and 2015 was produced.
(5): Finally, comprehensive analysis and validation were carried out for evaluation.

2.4.1. Land Cover Dataset at HRB from LCMM Method

LCMM is a comprehensive land cover mapping method that uses multiple classifiers and multisource remotely sensed imagery. Multisource remotely sensed data offer advantages in spatial resolution (very high spatial resolution (VHSR) images from Google Earth), temporal resolution (monthly HJ-1/CCD images), and spectral information (Landsat/TM). Meanwhile, multiple classifiers, including time series analysis, SVM, thresholding, object-based methods, and decision trees, were employed for different classification purposes. LCMM successfully integrated all the classifiers and data, and a land cover dataset at HRB with a high accuracy of over 90% was produced in a simple and efficient way. The dataset had been downloaded 463 times from the data center website as of 12 January 2020 and has been widely used in applications and scientific research, such as vegetation parameter retrieval [36], eco-hydrological modeling [37], and land process modeling [38]. However, because of data availability, only land cover maps for 2011–2015 were produced. In addition, because this land cover dataset is monthly, it must be aggregated into yearly maps according to the classification system in Table 3 before use; the mapping rules for the different classes are also given in Table 3. Furthermore, the accuracy could be improved further for years after 2015, because Sentinel-2/MSI data from ESA and GF1/6-WFV data from China can form higher-frequency time series with better quality.

2.4.2. Automatic Sampling Strategy from the 2011–2015 Land Cover Dataset

Now that high-quality land cover maps at HRB for 2011–2015 are available, training samples for machine learning methods can be retrieved automatically through stratified random sampling of these maps. However, the land cover maps are not 100% accurate; errors in the maps lead to wrong samples, which in turn degrade the quality of the final land cover maps. Therefore, an automatic sampling mechanism needs to be established to further improve sample accuracy. The following sampling rules, which address sample amount, sample refinement, and sample distribution, further guarantee sampling accuracy; Figure 4 illustrates the sampling procedure.

(1): Previous research shows that land cover classification over large areas requires a large number of samples and that training samples work better when allocated in proportion to class areas [39,40]. The authors of [39] indicated that the training sample size should be approximately 0.25% of the study area. HRB covers more than 140,000 km², so the sample size would be close to 400,000 pixels. However, barren land covers more than 80% of HRB, and the number of barren land samples can be greatly reduced without degrading the training; therefore, considering both computational efficiency and classification accuracy, 50,000 samples were selected for training.
(2): Pixels whose class changes within the five years have relatively low confidence, so only pixels with the same class in all five years were used for sampling. This rule restricts the land cover map used for sampling and thus guarantees the accuracy of the final samples.
(3): The restricted land cover map was segmented into objects (land cover features), and the samples were distributed among the objects. For each object, 50% of its samples were randomly placed in the central part of the object and the rest were randomly placed near its boundary. The central part of an object shows the typical characteristics of the corresponding land cover, whereas the boundary is easily confused with neighboring land cover; therefore, this rule improves the classification of boundaries.
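Rules (1) to (3) can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' code: the sample size follows the 0.25% rule of [39] for 30 m pixels; "stable" pixels are those with the same class in every yearly map (rule 2); and, as a stand-in for the object-based split of rule (3), a pixel counts as "central" when its four neighbours share its class, with the 50/50 split applied per class rather than per object. All function names are hypothetical.

```python
import numpy as np


def target_sample_size(area_km2, pixel_size_m=30.0, fraction=0.0025):
    """Sample pixels needed to cover `fraction` of the area (0.25% per [39])."""
    n_pixels = area_km2 * 1e6 / pixel_size_m ** 2
    return n_pixels * fraction


def stable_pixel_mask(yearly_maps):
    """Rule (2): True where a pixel has the same class in every yearly map."""
    return np.all(yearly_maps == yearly_maps[0], axis=0)


def interior_mask(class_map):
    """Simplified 'central part': pixel and its 4 neighbours share one class.

    Pixels on the image edge are treated as boundary pixels.
    """
    interior = np.zeros(class_map.shape, dtype=bool)
    centre = class_map[1:-1, 1:-1]
    interior[1:-1, 1:-1] = (
        (class_map[:-2, 1:-1] == centre)
        & (class_map[2:, 1:-1] == centre)
        & (class_map[1:-1, :-2] == centre)
        & (class_map[1:-1, 2:] == centre)
    )
    return interior


def draw_samples(yearly_maps, n_per_class, rng, central_fraction=0.5):
    """Rule (3), per class: split samples between central and boundary pixels.

    yearly_maps: integer array (n_years, rows, cols) of class codes
    n_per_class: dict {class code: number of samples}
    Returns (rows, cols, labels) of the sampled stable pixels.
    """
    reference = yearly_maps[0]
    candidates = stable_pixel_mask(yearly_maps)
    central = interior_mask(reference)
    picked_r, picked_c, picked_y = [], [], []
    for cls, n in n_per_class.items():
        in_class = candidates & (reference == cls)
        n_central = int(round(n * central_fraction))
        for region, k in ((in_class & central, n_central),
                          (in_class & ~central, n - n_central)):
            r, c = np.nonzero(region)
            k = min(k, r.size)  # cannot draw more pixels than exist
            if k == 0:
                continue
            idx = rng.choice(r.size, size=k, replace=False)
            picked_r.append(r[idx])
            picked_c.append(c[idx])
            picked_y.append(np.full(k, cls))
    if not picked_r:
        empty = np.empty(0, dtype=np.intp)
        return empty, empty, empty
    return (np.concatenate(picked_r), np.concatenate(picked_c),
            np.concatenate(picked_y))
```

For HRB, `target_sample_size(140_000)` gives about 389,000 pixels, in line with the "close to 400,000" figure in rule (1); the study then capped the total at 50,000, mainly by reducing barren land samples.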

Based on the above rules, the samples were retrieved and then used to train the machine learning model (the random forest model was chosen in this study, for the reasons given in Section 2.4.4).

2.4.3. Sample Transferring Strategy for Earlier Years

At first sight, the trained random forest model could be used directly for land cover mapping in earlier years (before 2011); however, the seasonal surface reflectance composites of different years were not as consistent as expected (see Figure 2), so directly applying the trained model degraded the accuracy. Figure 5 gives an example of land cover classification obtained by directly transferring the trained model: the model used for 2010, 2005, and 2000 was the same one trained with the samples collected in Section 2.4.2. The results are clearly not temporally consistent, especially for barren land and forests.

In addition, the large number of samples made manual checking infeasible. Therefore, a sample transferring strategy for earlier years needs to be established that lowers labor and time costs while guaranteeing sample accuracy. The main steps of the sample transfer are as follows:

(1): The trained random forest model was applied to each year’s seasonal surface reflectance composites to produce a land cover map as a reference.
(2): The training samples from Section 2.4.2 were compared with the reference land cover map from step 1, and the unmatched samples were removed from the sample collection.
(3): The surface reflectance composites were automatically checked to remove samples with abnormal surface reflectance caused by, for example, noise, cloud contamination, or cloud shadow.
(4): Finally, if the number of remaining samples was lower than required, new samples were added manually to make up the shortfall. Although some manual work is required, only a few samples need to be added, which greatly reduces labor and time costs compared with fully manual sampling.
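Steps (1) to (4) can be expressed as a small filtering function. The sketch below is a hypothetical illustration, not the authors' implementation: `predict` stands for the random forest trained on the 2011–2015 samples, evaluated on the target year's composites at each sample location, and "abnormal" reflectance is approximated by non-finite values or values outside an assumed valid range of [0, 1]; the study's actual noise, cloud, and shadow checks may differ.

```python
import numpy as np


def transfer_samples(features, labels, predict, min_required,
                     valid_range=(0.0, 1.0)):
    """Filter 2011-2015 training samples for reuse in an earlier year.

    features:     (n_samples, n_bands) reflectance of each sample location,
                  taken from the earlier year's seasonal composites
    labels:       (n_samples,) class labels from the 2011-2015 sampling
    predict:      callable returning predicted labels for `features`
    min_required: number of samples needed for training in that year
    Returns (kept_features, kept_labels, n_to_add_manually).
    """
    # Step (1): reference classification of the earlier year.
    reference = predict(features)
    # Step (2): drop samples whose label disagrees with the reference map.
    agrees = reference == labels
    # Step (3): drop samples with abnormal reflectance (assumed criterion).
    lo, hi = valid_range
    with np.errstate(invalid="ignore"):
        in_range = np.all((features >= lo) & (features <= hi), axis=1)
    keep = agrees & np.isfinite(features).all(axis=1) & in_range
    # Step (4): how many samples still have to be added manually.
    shortfall = max(0, min_required - int(keep.sum()))
    return features[keep], labels[keep], shortfall
```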

Based on the above procedures, the samples for every year before 2011 were collected and were subsequently put into the random forest model for training. Finally, the land cover maps before 2011 were made.

2.4.4. Machine Learning Model Selection

After training samples are collected, many classifiers can be employed for land cover classification, such as the maximum likelihood classifier (MLC) [41,42], the support vector machine (SVM) classifier [43], and the random forest (RF) classifier [44,45,46]. MLC is simple, computationally efficient, and robust, but its accuracy is usually limited by this simplicity. The SVM classifier has been widely used for its outstanding performance in remote sensing classification, especially when few samples are available [47]. Much research has reported that the RF classifier performs well [48]; in addition, it is robust and accurate with high-dimensional data, such as multi-spectral and multi-temporal remote sensing images [44,49,50], which makes it well suited to the multi-dimensional features constructed in this study. Therefore, the RF classifier was employed as the training model, with the following parameters: the number of trees was set to 100, which gave a good balance between classification accuracy and efficiency; the number of variables per split was set to 0 (the default), meaning that the square root of the number of variables is used; the minimum leaf population was 1 (default); and the bag fraction was 0.5 (default).
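For readers working outside GEE, the reported settings can be approximated with scikit-learn's `RandomForestClassifier`. The mapping below is an assumption, not the configuration used in the study: `n_estimators=100` for the number of trees, `max_features="sqrt"` for the default variables per split, `min_samples_leaf=1` for the minimum leaf population, and `bootstrap=True` with `max_samples=0.5` as an analogue of the 0.5 bag fraction (scikit-learn draws its bootstrap samples with replacement, which may differ from GEE's bagging). The synthetic two-class data are purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def make_random_forest(seed=0):
    """Random forest approximating the parameters reported in Section 2.4.4."""
    return RandomForestClassifier(
        n_estimators=100,     # number of trees
        max_features="sqrt",  # variables tried at each split
        min_samples_leaf=1,   # minimum leaf population
        bootstrap=True,
        max_samples=0.5,      # fraction of training samples drawn per tree
        random_state=seed,
    )


# Illustrative use on two well-separated synthetic "spectral" classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.1, 0.02, (50, 4)), rng.normal(0.4, 0.02, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
clf = make_random_forest().fit(X, y)
```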

3. Results and Validation

3.1. Classification Results

Following the procedure in Section 2, the land cover maps for 1986–2015 were produced; they are shown in Figure 6.

3.2. Validation

The procedure for validating the classification results is as follows:

(1): Randomly sample the classification map, stratified by land cover type; the number of samples for each class was determined by its area proportion. The sampling details are shown in Table 4.
(2): Locate the samples precisely on the remotely sensed images, including seasonal composites of Landsat-OLI/TM and VHSR images from Google Earth.
(3): Manually interpret the land cover types of the samples by carefully inspecting the remotely sensed images and VHSR images from Google Earth.
(4): Build a confusion matrix for each year. Table 4 gives the confusion matrix of 2014 as an example. The overall accuracy (OA) of the 2014 classification is 93.68%, and the kappa coefficient is 0.92. Shrubland, which is easily confused with forest and grassland, had the lowest accuracy, with a producer’s accuracy (PA) of only 77.78%. Apart from this, and from the user’s accuracies (UA) of grassland and bare land, which are slightly below 90%, the PAs and UAs of all other classes exceed 90%.
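The accuracy measures in step (4) follow the standard confusion-matrix definitions. The sketch below assumes rows are the mapped classes and columns are the reference (interpreted) classes; with the opposite convention, PA and UA swap. The function name is hypothetical, and the example matrix is made up, not taken from Table 4.

```python
import numpy as np


def accuracy_report(confusion):
    """OA, kappa, producer's and user's accuracy from a confusion matrix.

    confusion[i, j] = number of validation samples mapped as class i
    whose reference (interpreted) class is j.
    """
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    diagonal = np.diag(cm)
    row_totals = cm.sum(axis=1)  # samples per mapped class
    col_totals = cm.sum(axis=0)  # samples per reference class
    overall = diagonal.sum() / n
    expected = (row_totals * col_totals).sum() / n ** 2  # chance agreement
    kappa = (overall - expected) / (1.0 - expected)
    producers = diagonal / col_totals  # 1 - omission error
    users = diagonal / row_totals      # 1 - commission error
    return overall, kappa, producers, users


oa, kappa, pa, ua = accuracy_report([[40, 10], [0, 50]])
# oa ≈ 0.9, kappa ≈ 0.8, pa ≈ [1.0, 0.833], ua ≈ [0.8, 1.0]
```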

The OAs and kappa coefficients for each year are listed in Table 5; almost all OAs are above or close to 90%, with an average OA of 90.32% and an average kappa coefficient of 0.88. The OA in 1986 is 85.2%, reflecting the lack of data for that year. These results show that combining high-precision samples with historical data is an effective way to produce a land cover dataset.

To further verify the land cover dataset of this study, the land cover maps at HRB from GlobeLand30 and from the proposed method were first compared in close-up views for 2000 and 2010, as shown in Figure 7. Based on this visual comparison, the advantages of the proposed method can be summarized as follows:

(1): The classification accuracy of the proposed method is much higher than that of GlobeLand30. For example, the small villages in areas A and C are accurately classified in our map but are completely missed in GlobeLand30.
(2): The temporal consistency is better than that of GlobeLand30. B1 (2000) and D1 (2010) from GlobeLand30 are very different: B1 has a large area of shrub and water, but D1 does not. In contrast, B3 (2000) and D3 (2010) from the proposed method are very consistent.
(3): Waterbodies are better classified with the seasonal composites in our method, whereas the limited data used in GlobeLand30 left these waterbodies undetected.

To better illustrate the temporal consistency of our land cover maps over the long time series, the major land cover classes, such as cropland, built-up areas, and waterbodies, were plotted every five years from 1986–2015. Figure 8 shows two close-up examples of these time series from 1986–2015, in which cropland and built-up areas increase continuously. Furthermore, the total areas at HRB of the major land cover classes, such as built-up and urban areas, snow and ice, forest, and cropland, were calculated every five years since 1986 and plotted in Figure 9. The increase in cropland and built-up areas is consistent with increasing human activities, and the decrease in snow and ice is strongly related to the global warming trend. This time series analysis further supports the effectiveness of the proposed method for long time series land cover mapping.
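The area statistics plotted in Figure 9 amount to counting pixels per class and multiplying by the pixel area. A minimal sketch, assuming 30 m pixels, integer class codes, and hypothetical function names (the class codes in any usage are placeholders, not those of Table 3):

```python
import numpy as np


def class_areas_km2(class_map, pixel_size_m=30.0):
    """Area in km² of every class present in a land cover map."""
    classes, counts = np.unique(class_map, return_counts=True)
    pixel_km2 = pixel_size_m ** 2 / 1e6
    return {int(c): float(n) * pixel_km2 for c, n in zip(classes, counts)}


def area_time_series(maps_by_year, class_code, pixel_size_m=30.0):
    """Area of one class for each year, ordered by year (0.0 if absent)."""
    return {
        year: class_areas_km2(maps_by_year[year], pixel_size_m).get(class_code, 0.0)
        for year in sorted(maps_by_year)
    }
```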

4. Conclusions and Discussions

In this paper, a time series land cover mapping method is proposed to produce a long time series land cover dataset with high accuracy and consistency, especially for earlier years with fewer and lower quality data. The proposed method takes advantage of time series Landsat images and the high-quality land cover datasets from the LCMM method: the LCMM datasets are used to quickly locate accurate training samples, and the RF classifier is trained with the samples collected for each year to produce the final land cover maps. Comprehensive validation shows an average classification accuracy of 90.32% and an average kappa coefficient of 0.88, which is sufficient for most applications and research at HRB. Compared with other land cover datasets, such as LCMM, GlobeLand30 [11], and FROM-GLC30 [9], the proposed method is more suitable for long time series analysis and land process modeling because of the following advantages:

(1): It provides the longest time series land cover dataset at HRB with 30 m spatial resolution, starting from 1986.
(2): It has an average classification accuracy of over 90% and high temporal consistency, making it the best land cover map at HRB among those available.
(3): The automatic strategy for collecting training samples from high-quality land cover maps and transferring them to earlier years makes it efficient and accurate. The proposed method therefore provides a solution for producing high-quality land cover maps of earlier years, even when new, high-quality data are not available.

Although the method was developed for HRB, it can be extended to other regions, which is the next step of our research. However, applying it to other regions would first require producing LCMM land cover maps for those regions, which takes a lot of work; therefore, from a practical perspective, a strategy for transferring the training samples from HRB to other regions, instead of producing new LCMM maps, needs to be explored.

Author Contributions

Conceptualization, B.Z.; Data curation, A.Y. and K.J.; Funding acquisition, B.Z.; Methodology, B.Z. and A.Y.; Project administration, B.Z.; Software, A.Y., and K.J.; Supervision, B.Z.; Validation, A.Y., K.J., and J.W.; Writing—original draft, B.Z.; Writing—review and editing, A.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (grant XDA20100101) and the project titled “Advance Research Project of Civil Space Technology” (grant D040402).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Acknowledgments

The Landsat-TM/OLI data were downloaded from the USGS website (http://landsat.usgs.gov, accessed on 28 November 2020). The HJ-1/CCD data were downloaded from the CRESDA website (http://www.cresda.com, accessed on 28 November 2020). The land cover dataset from 2011–2015 was downloaded from GSFC/NASA. The authors sincerely thank the reviewers who provided helpful comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Haberl, H.; Erb, K.H.; Krausmann, F.; Gaube, V.; Bondeau, A.; Plutzar, C.; Gingrich, S.; Lucht, W.; Fischer-Kowalski, M. Quantifying and mapping the human appropriation of net primary production in earth’s terrestrial ecosystems. Proc. Natl. Acad. Sci. USA 2007, 104, 12942–12947.
Herold, M.; Latham, J.; Di Gregorio, A.; Schmullius, C. Evolving standards in land cover characterization. J. Land Sci. 2006, 1, 157–168.
Olson, J.S.; Watts, J.A.; Allison, L.J. Carbon in Live Vegetation of Major World Ecosystems; Oak Ridge National Laboratory-5862; Environmental Sciences Division Publication: Oak Ridge, TN, USA, 1983.
Poulter, B.; Ciais, P.; Hodson, E.; Lischke, H.; Maignan, F.; Plummer, S.; Zimmermann, N. Plant functional type mapping for earth system models. Geosci. Model Dev. 2011, 4, 993–1010.
Arino, O.; Bicheron, P.; Achard, F.; Latham, J.; Witt, R.; Weber, J.-L. The most detailed portrait of Earth. Eur. Space Agency 2008, 136, 25–31.
Bartholome, E.; Belward, A.S. GLC2000: A new approach to global land cover mapping from Earth observation data. Int. J. Remote Sens. 2005, 26, 1959–1977.
Friedl, M.A.; McIver, D.K.; Hodges, J.C.; Zhang, X.Y.; Muchoney, D.; Strahler, A.H.; Woodcock, C.E.; Gopal, S.; Schneider, A.; Cooper, A. Global land cover mapping from MODIS: Algorithms and early results. Remote Sens. Environ. 2002, 83, 287–302.
Friedl, M.A.; Sulla-Menashe, D.; Tan, B.; Schneider, A.; Ramankutty, N.; Sibley, A.; Huang, X. MODIS Collection 5 global land cover: Algorithm refinements and characterization of new datasets. Remote Sens. Environ. 2010, 114, 168–182.
Gong, P.; Wang, J.; Yu, L.; Zhao, Y.; Zhao, Y.; Liang, L.; Niu, Z.; Huang, X.; Fu, H.; Liu, S. Finer resolution observation and monitoring of global land cover: First mapping results with Landsat TM and ETM + data. Int. J. Remote Sens. 2013, 34, 2607–2654.
Chen, B.; Xu, B.; Zhu, Z.; Yuan, C.; Suen, H.P.; Guo, J.; Xu, N.; Li, W.; Zhao, Y.; Yang, J.; et al. Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 2019, 64, 23–26.
Gong, P.; Chen, B.; Li, X.; Liu, H.; Wang, J.; Bai, Y.; Chen, J.; Chen, X.; Fang, L.; Feng, S.; et al. Mapping essential urban land use categories in China (EULUC-China): Preliminary results for 2018. Sci. Bull. 2020, 65, 182–187.
Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Xie, S.; Mi, J. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth Syst. Sci. Data Discuss. 2020.
Sulla-Menashe, D.; Gray, J.M.; Abercrombie, S.P.; Friedl, M.A. Hierarchical mapping of annual global land cover 2001 to present: The MODIS Collection 6 Land Cover product. Remote Sens. Environ. 2019, 222, 183–194.
Chen, J.; Chen, J.; Liao, A.; Cao, X.; Chen, L.; Chen, X.; He, C.; Han, G.; Peng, S.; Lu, M. Global land cover mapping at 30 m resolution: A POK-based operational approach. ISPRS J. Photogramm. Remote Sens. 2015, 103, 7–27.
Petitjean, F.; Kurtz, C.; Passat, N.; Gançarski, P. Spatio-temporal reasoning for the classification of satellite image time series. Pattern Recognit. Lett. 2012, 33, 1805–1815.
Zhong, B.; Ma, P.; Nie, A.; Yang, A.; Yao, Y.; Lü, W.; Zhang, H.; Liu, Q. Land cover mapping using time series HJ-1/CCD data. Sci. China Earth Sci. 2014, 57, 1790–1799.
Santos, L.A.; Ferreira, K.; Picoli, M.; Camara, G.; Zurita-Milla, R.; Augustijn, E.W. Identifying spatiotemporal patterns in land use and cover samples from satellite image time series. Remote Sens. 2021, 13, 974.
Planque, C.; Lucas, R.; Punalekar, S.; Chognard, S.; Bunting, P.J. National crop mapping using sentinel-1 time series: A knowledge-based descriptive algorithm. Remote Sens. 2021, 13, 846.
Zhong, B.; Yang, A.; Nie, A.; Yao, Y.; Zhang, H.; Wu, S.; Liu, Q. Finer resolution land-cover mapping using multiple classifiers and multisource remotely sensed data in the Heihe River Basin. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2015, 8, 4973–4992.
Woodcock, C.E.; Allen, R.; Anderson, M.; Belward, A.; Bindschadler, R.; Cohen, W.; Gao, F.; Goward, S.N.; Helder, D.; Helmer, E. Free access to Landsat imagery. Science 2008, 320, 1011.
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
Xian, G.; Homer, C. Updating the 2001 national land cover database impervious surface products to 2006 using Landsat imagery change detection methods. Remote Sens. Environ. 2010, 114, 1676–1686.
Chen, X.H.; Chen, J.; Shi, Y.S.; Yamaguchia, Y. An automated approach for updating land cover maps based on integrated change detection and classification methods. ISPRS J. Photogramm. Remote Sens. 2012, 71, 86–95.
Zhao, Y.Y.; Gong, P.; Yu, L.; Hu, L.Y.; Li, X.Y.; Li, C.C.; Zhang, H.Y.; Zheng, Y.M.; Wang, J.; Zhao, Y.C.; et al. Towards a common validation sample set for global land cover mapping. Int. J. Remote Sens. 2014, 35, 4795–4814.
Dan, Z.P.; Sang, N.; Chen, Y.F.; Chen, X. Remote sensing object recognition based on transfer learning. In Proceedings of the 10th IEEE International Conference on Fuzzy Systems and Knowledge Discovery, Shenyang, China, 23–25 July 2013; Volume 1, pp. 930–934.
Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain adaptation via transfer component analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
Daum, H.; Marcu, D. Domain adaptation for statistical classifiers. J. Artif. Intell. Res. 2011, 26, 101–126.
Homer, C.; Huang, C.Q.; Yang, L.M.; Wylie, B.; Coan, M. Development of a 2001 national land-cover database for the US. Photogramm Eng. Remote Sens. 2004, 70, 829–840.
Yinqiao, H.; Youxi, G.; Jiemin, W.; Guoliang, J.; Zhibao, S.; Linsheng, C.; Jiayi, C.; Shouqian, L. Some achievements in scientific research during HEIFE. Plateau Meteorol. 1994, 13, 225–236.
Li, X.; Cheng, G.; Liu, S.; Xiao, Q.; Ma, M.; Jin, R.; Che, T.; Liu, Q.; Wang, W.; Qi, Y. Heihe watershed allied telemetry experimental research (HiWATER): Scientific objectives and experimental design. Bull. Am. Meteorol. Soc. 2013, 94, 1145–1160.
Scaramuzza, P.; Barsi, J. Landsat 7 scan line corrector-off gap-filled product development. In Proceedings of the Pecora 16 “Global Priorities in Land Remote Sensing”, Sioux Falls, SD, USA, 23–27 October 2005.
Zhu, Z.; Woodcock, C.E. Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sens. Environ. 2012, 118, 83–94.
Robinson, N.P.; Allred, B.W.; Jones, M.O.; Moreno, A.; Kimball, J.S.; Naugle, D.E.; Erickson, T.A.; Richardson, A.D. A dynamic Landsat derived normalized difference vegetation index (NDVI) product for the conterminous United States. Remote Sens. 2017, 9, 863.
Skakun, S.V.; Basarab, R.M. Reconstruction of missing data in time-series of optical satellite images using self-organizing Kohonen maps. J. Autom. Inf. Sci. 2014, 46, 19–26.
Mu, X.; Hu, M.; Song, W.; Ruan, G.; Ge, Y.; Wang, J.; Huang, S.; Yan, G. Evaluation of sampling methods for validation of remotely sensed fractional vegetation cover. Remote Sens. 2015, 7, 16164–16182.
Li, X.; Cheng, G.; Ge, Y.; Li, H.; Han, F.; Hu, X.; Tian, W.; Tian, Y.; Pan, X.; Nian, Y. Hydrological cycle in the Heihe River Basin and its implication for water resource management in endorheic basins. J. Geophys. Res. Atmos. 2018, 123, 890–914.
Ding, J.; Zhao, W.; Daryanto, S.; Wang, L.; Fan, H.; Feng, Q.; Wang, Y. The spatial distribution and temporal variation of desert riparian forests and their influencing factors in the downstream Heihe River basin, China. Hydrol. Earth Syst. Sci. 2017, 21, 2405–2419.
Colditz, R. An evaluation of different training sample allocation schemes for discrete and continuous land cover classification using decision tree-based algorithms. Remote Sens. 2015, 7, 9655–9681.
Stumpf, A.; Kerle, N. Object-oriented mapping of landslides using Random Forests. Remote Sens. Environ. 2011, 115, 2564–2577.
Bruzzone, L.; Prieto, D.F. Unsupervised retraining of a maximum likelihood classifier for the analysis of multitemporal remote sensing images. IEEE Trans. Geosci. Remote Sens. 2001, 39, 456–460.
Paola, J.D.; Schowengerdt, R.A. A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. IEEE Trans. Geosci. Remote Sens. 1995, 33, 981–996.
Oommen, T.; Misra, D.; Twarakavi, N.K.; Prakash, A.; Sahoo, B.; Bandopadhyay, S. An objective analysis of support vector machine based classification for remote sensing. Math. Geosci. 2008, 40, 409–424.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Feng, Q.; Liu, J.; Gong, J. UAV remote sensing for urban vegetation mapping using random forest and texture analysis. Remote Sens. 2015, 7, 1074–1094.
Dahinden, C.; Ethz, M. An improved Random Forests approach with application to the performance prediction challenge datasets. Hands Pattern Recognit. Chall. Mach. Learn. 2011, 1, 223–230.
Liu, D.; Kelly, M.; Gong, P. A spatial–temporal approach to monitoring forest disease spread using multi-temporal high spatial resolution imagery. Remote Sens. Environ. 2006, 101, 167–180.
Caruana, R.; Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 161–168.
Pal, M. Random forest classifier for remote sensing classification. Int. J. Remote Sens. 2005, 26, 217–222.
Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104.

Figure 1. The location of HRB (left) and the visualization of geographical characteristics including color composite from remote sensing image (middle) and DEM (right).

Figure 2. The seasonal reflectance image composites of 1986 (bottom) and 2014 (top).

Figure 3. Illustration of the procedures of the proposed method.

Figure 4. Illustration of sampling strategy from land cover maps of 2011–2015.

Figure 5. A classification example at HRB based on machine learning model transferring.

Figure 6. The long time series of land cover maps at HRB based on the proposed method.

Figure 7. Comparison of land cover maps at HRB between the GlobeLand30 and the proposed method in 2000 and 2010.

Figure 8. Two close-up examples of time series land cover from 1986–2015.

Figure 9. The area variations for every five years since 1986 for the major land covers at HRB.

Table 1. Land cover datasets of 2011–2015 from the LCMM method using HJ-1/CCD data and high-resolution images.

Data	Date	Description	Source
TM and OLI from Landsat	1986–2015	Main data used for land cover mapping in this study	Google Earth Engine datasets or USGS
SRTMGL1_003	2000	SRTMGL1 version 3 data, providing terrain information such as elevation and slope to assist classification	Google Earth Engine datasets or USGS
Land cover dataset from LCMM method and HJ-1/CCD	2011–2015	Land cover datasets with high accuracy and consistency, made using HJ-1/CCD data and used to retrieve training samples for the machine learning method quickly and accurately	http://westdc.westgis.ac.cn/data/6bbf9a3f-e7d8-4255-9ecb-131e1543316d, accessed on 19 April 2021
Google Earth high-resolution images	--	High-resolution imagery used for verifying the training samples and for validation	Historical imagery available in Google Earth

Table 2. Data and descriptions in this study.

Year	Data	Number
2015	Landsat 8 OLI SR	433
2014	Landsat 8 OLI SR	439
2013	Landsat 8 OLI SR	301
2011	Landsat 5 TM SR	302
2010	Landsat 5 TM SR	332
2005	Landsat 5 TM SR	364
2000	Landsat 5 TM SR	375
1995	Landsat 5 TM SR	291
1990	Landsat 5 TM SR	311
1986	Landsat 5 TM SR	109

Table 3. Land cover datasets of 2011–2015 from LCMM method using HJ-1/CCD data, and high-resolution images.

Code	Type	Description	Type in LCMM
1	Croplands	Land used for agriculture, horticulture, etc., including corn, wheat, irrigated land, dry land, and other croplands	11 Maize; 12 Spring wheat; 13 Highland barley; 14 Rape; 15 Cotton; 16 Alfalfa; 17 Orchard; 18 Other crops
2	Forests	Land with tree cover greater than 30%, including deciduous forests and evergreen coniferous forests	21 Evergreen coniferous forest; 22 Deciduous broadleaf forest
3	Grasslands	Land with herbaceous cover	31 Grasslands
4	Shrublands	Deciduous and evergreen shrubs with cover greater than 30%	40 Shrublands
5	Wetlands	Land where aquatic herbaceous plants are observable in the image as non-water cover	51 Wetland
6	Waterbodies	Rivers, lakes, and reservoirs/ponds	41 Waterbodies
7	Urban and built-up	Cities, villages, roads, and other man-made objects	61 Urban and built-up
8	Barren land	Bare rock, bare soil, desert, dry salt flats, dry river and lake bottoms, and all other land not covered by vegetation, except unplanted croplands and urban and built-up areas	71 Barren land
9	Snow and ice	Land under perennial snow and ice	81 Snow and ice; 82 Glaciers
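The crosswalk in Table 3 is a many-to-one lookup from LCMM type codes to the nine classes of this study. A minimal sketch in Python, assuming the LCMM map is available as a 2D grid of integer codes; the names `LCMM_TO_CLASS`, `reclassify`, and the fill value `NODATA = 0` are illustrative assumptions, not part of the study.

```python
# Hypothetical crosswalk from the LCMM type codes (last column of Table 3)
# to the nine-class codes used in this study.
LCMM_TO_CLASS = {
    11: 1, 12: 1, 13: 1, 14: 1, 15: 1, 16: 1, 17: 1, 18: 1,  # croplands
    21: 2, 22: 2,  # forests
    31: 3,         # grasslands
    40: 4,         # shrublands
    51: 5,         # wetlands
    41: 6,         # waterbodies
    61: 7,         # urban and built-up
    71: 8,         # barren land
    81: 9, 82: 9,  # snow and ice, glaciers
}

NODATA = 0  # assumed fill value for codes that do not appear in Table 3


def reclassify(lcmm_grid):
    """Map a 2D grid (list of rows) of LCMM codes to nine-class codes."""
    return [[LCMM_TO_CLASS.get(code, NODATA) for code in row] for row in lcmm_grid]


if __name__ == "__main__":
    print(reclassify([[11, 22, 41], [82, 71, 99]]))  # [[1, 2, 6], [9, 8, 0]]
```

A dictionary lookup keeps the many-to-one merge explicit (for example, all eight crop types collapse to class 1), and any unexpected code is flagged with the fill value rather than silently assigned to a class.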

Table 4. Confusion matrix for the 2014 HRB land cover map.

Type	CR ¹	FR ²	GR ³	SR ⁴	WE ⁵	WB ⁶	UB ⁷	BL ⁸	SI ⁹	Total	PA (%) ¹⁰
Cropland	114	1	1	3	0	0	0	3	0	122	93.44
Forest	0	95	11	0	0	0	0	0	0	106	89.62
Grassland	0	0	132	0	1	0	0	4	0	137	96.35
Shrubland	0	0	1	63	0	0	0	17	0	81	77.78
Wetland	0	0	0	0	55	2	0	0	0	57	96.50
Water	1	0	0	0	2	75	0	0	0	78	96.16
UB ⁷	5	0	0	0	0	0	95	2	0	102	93.14
Barren	0	0	3	0	0	0	5	231	3	242	95.45
Snow/ice	0	0	0	0	0	0	0	6	74	80	92.50
Total	120	96	148	66	58	77	100	263	77	1005
UA (%) ¹¹	95.00	98.96	89.19	95.45	94.83	97.40	95.00	87.83	96.10

Overall accuracy = 93.68%, Kappa = 0.92. ¹ CR = cropland, ² FR = forest, ³ GR = grassland, ⁴ SR = shrubland, ⁵ WE = wetland, ⁶ WB = water body, ⁷ UB = urban and built-up, ⁸ BL = bare land, ⁹ SI = snow/ice, ¹⁰ PA = producer’s accuracy, and ¹¹ UA = user’s accuracy.
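The summary statistics of Table 4 follow directly from the cell counts: overall accuracy is the diagonal sum divided by the total number of samples N, and Kappa = (p_o - p_e) / (1 - p_e), where p_o is the overall accuracy and p_e is the sum over classes of (row total × column total) / N². A minimal sketch in plain Python, assuming rows are reference classes and columns are classified classes, consistent with PA being given per row and UA per column; the names `TABLE4_MATRIX` and `accuracy_metrics` are illustrative.

```python
# Cell counts copied from Table 4 (class order CR, FR, GR, SR, WE, WB, UB, BL, SI).
# Assumed layout: rows = reference classes, columns = classified classes.
TABLE4_MATRIX = [
    [114, 1, 1, 3, 0, 0, 0, 3, 0],
    [0, 95, 11, 0, 0, 0, 0, 0, 0],
    [0, 0, 132, 0, 1, 0, 0, 4, 0],
    [0, 0, 1, 63, 0, 0, 0, 17, 0],
    [0, 0, 0, 0, 55, 2, 0, 0, 0],
    [1, 0, 0, 0, 2, 75, 0, 0, 0],
    [5, 0, 0, 0, 0, 0, 95, 2, 0],
    [0, 0, 3, 0, 0, 0, 5, 231, 3],
    [0, 0, 0, 0, 0, 0, 0, 6, 74],
]


def accuracy_metrics(matrix):
    """Return (overall accuracy, Cohen's kappa, producer's accuracies, user's accuracies)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diag = [matrix[i][i] for i in range(k)]

    oa = sum(diag) / n  # observed agreement p_o
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # chance agreement
    kappa = (oa - p_e) / (1 - p_e)

    pa = [d / r for d, r in zip(diag, row_totals)]  # producer's accuracy per row
    ua = [d / c for d, c in zip(diag, col_totals)]  # user's accuracy per column
    return oa, kappa, pa, ua


if __name__ == "__main__":
    oa, kappa, pa, ua = accuracy_metrics(TABLE4_MATRIX)
    print(f"OA = {oa:.2%}, Kappa = {kappa:.3f}")  # OA = 92.94%, Kappa = 0.918
```

Applied to the counts as printed in Table 4, this yields 934 correctly classified samples out of 1005 (OA ≈ 92.94%) and Kappa ≈ 0.918, compared with the reported 93.68% and 0.92.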

Table 5. Accuracy of each year. OA (%) = Overall accuracy (%).

Years	2015	2014	2013	2011	2010	2005	2000	1995	1990	1986
OA (%)	93.5	93.7	91.3	89.6	89.8	89.9	91.1	90.3	88.8	85.2
Kappa	0.924	0.927	0.891	0.879	0.881	0.881	0.896	0.887	0.868	0.827

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

