1. Introduction
Accurate land cover mapping is essential to the performance of weather simulation models such as the Weather Research and Forecasting (WRF) model. Land cover maps supply the surface characteristics used by numerical models and therefore influence processes ranging from evapotranspiration to heat flux [1,2]. These processes, in turn, affect broader climate dynamics, making accurate land cover data critical for reliable meteorological simulations and predictions [3,4].
Changes in land use and land cover reflect the impact of human activities on terrestrial ecosystems, with far-reaching implications for global climate systems and biogeochemical cycles [5,6,7]. As a result, land cover mapping has become a central focus of research related to global environmental changes and sustainable futures. Numerous studies have utilized remote sensing data to produce land cover products with varying spatial and temporal resolutions, from localized regions to global scales [8,9,10,11].
However, inaccuracies in land cover datasets can distort weather simulations, leading to errors in the representation of weather patterns, temperature gradients, and precipitation predictions. This highlights the importance of continuously updating and integrating the WRF model with the most current and detailed land cover information to ensure more accurate simulations and forecasts [12].
The East China region is characterized by complex terrain, dense river networks, numerous lakes, and concentrated megacities, most of which lie along the coast and major rivers. The interactions among urban heat islands, sea and lake breezes, and land breezes are intricate. In coastal cities, sea breezes and urban heat islands can amplify convective precipitation: studies have shown that these interactions can enhance low-level convergence and updraft motions at the sea breeze front, leading to increased convective cloud formation and precipitation [13]. Although the standard WRF regional model offers several underlying surface datasets to choose from, such as AVHRR (1 km)- and MODIS (500 m)-derived land cover products, most of these datasets date back to 2010 or earlier. They lack spatial detail, vary in accuracy, and fail to reflect the land use changes driven by rapid economic development in East China. Recently, there has been a surge in high-resolution global land cover products, with resolutions of 30 m or finer. GlobeLand30, for instance, offers land cover datasets for 2000 to 2020 at a 30 m resolution [10], while ESRI provides a 10 m resolution global land cover dataset derived from Sentinel-2 imagery [14]. However, these global products exhibit varying mapping accuracy across different regions [15], and their classification systems might not align seamlessly with WRF model requirements. Given these challenges, there is a pressing demand for an updated, high-resolution land surface data product tailored to East China’s unique topography and urban dynamics to improve regional modeling.
Regional land cover mapping for topographically and biologically complex ecosystems presents significant challenges, mainly due to phenology-induced errors and high within-class spectral variability [16,17]. Seasonal changes in plant growth and development often introduce errors in the mapping process. The availability of cloud-free satellite imagery within a selected mapping window (e.g., the growing season) may not be ideal for certain study regions, complicating the temporal selection of satellite passes [18]. Moreover, acquiring enough training data points to accurately represent the spectral characteristics of the mapping classes is particularly important for most classification algorithms [19]. It has been emphasized that the quality and comprehensiveness of training data can outweigh the importance of the chosen classification algorithm [20]. With the advent of machine learning, tools such as Random Forest, support vector machines, and various neural network-based methods have become accessible through open-source platforms such as scikit-learn in Python and the caret package in the R environment. However, the main challenge in regional land cover mapping lies not merely in the choice of algorithm but in developing a comprehensive framework that combines quality training data, appropriate imagery selection, and robust classification methods to address the intricate dynamics of complex ecosystems.
In this study, our primary aim was to develop a 30 m regional land cover mapping workflow for East China tailored to support high-resolution (kilometer-scale) WRF simulations and predictions. We sought an in-house, automated land cover mapping system that allows for easy expansion of mapping classes and can rerun the mapping process every three years. Central to our workflow was the emphasis on developing high-quality training data, complemented by a careful selection of remote sensing and ancillary datasets. We also conducted a detailed, pixel-wise accuracy assessment and benchmarked our mapping accuracy against multiple global land cover products. A particularly compelling application of our refined land cover map is its incorporation into the WRF model as surface input. Within the scope of this investigation, we focused on its influence on urban heat wave predictions. This enhanced capability could potentially improve our understanding and prediction of localized climate patterns within East China.
2. Study Area
The East China region stretches along China’s eastern coast from 113°6′E to 122°6′E longitude and 23°4′N to 38°45′N latitude and encompasses six provinces (Shandong, Jiangsu, Zhejiang, Anhui, Jiangxi, and Fujian) and Shanghai (Figure 1). It spans approximately 2160.6 km north–south and 1156.6 km east–west, with a total area of 795,740 square kilometers. The region is characterized by plains, basins, and hills: Jiangsu is dominated by plains (over 70%); Shandong is a mix of plains (55%) and mountains and hills (29%); Anhui has diverse terrain (the Huai River Plain, Jianghuai Plateau, Western Anhui Hills, Yangtze River Area, and Southern Anhui Hills); Shanghai lies on an alluvial plain; Zhejiang is dominated by hills, mountains, and basins (70.4%); Jiangxi comprises hills and mountains interspersed with basins and valleys; and Fujian is mountainous (over 80%). The region’s rapid economic growth has fueled infrastructure development, land use changes, and urbanization, positioning it as one of China’s most dynamic economic and urbanizing areas. Located at the land–sea interface and the transition between northern and southern climates, East China has a climate divided by the Huai River, with a temperate monsoon climate to the north and a subtropical monsoon climate to the south. This setting supports rich landscapes and ecosystems, complex surface features, dense river networks, numerous lakes, and many coastal and riverine cities. Land cover changes here are influenced by both natural and socio-economic factors, with the latter increasingly dominant in recent decades [21].
5. Discussion
The development of a high-resolution (30 m) land cover dataset tailored to East China’s diverse landscape is important for both meteorological modeling and environmental management. Existing national and global land cover datasets often fall short of the spatial, temporal, and thematic accuracy needed to represent regional surface characteristics, which can negatively impact meteorological modeling [22,35,40]. One of the primary motivations for developing this 30 m land cover map product is its application in enhancing the WRF model. Accurate land cover data are crucial for defining surface–atmosphere interactions, influencing processes such as evapotranspiration and heat flux [41,42]. This is particularly important for East China, where rapid urbanization and complex terrain significantly impact local climate dynamics. The primary advantage of developing an in-house land cover product, rather than relying on existing data, is the flexibility it provides in adjusting the classification scheme and update schedule. For example, we included plastic greenhouses as an additional land cover category because of their unique spectral characteristics and significant spatial coverage in the study region. However, current land surface schemes in the WRF model lack descriptions of the physical processes associated with the plastic greenhouse type; for the current WRF simulation, plastic greenhouses were therefore classified under the land cover type ‘bare lands’. In future studies, we plan to represent plastic greenhouses using their specific surface parameters, such as albedo. The category can be used directly once the physical processes related to plastic greenhouses are introduced into the land surface schemes.
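The temporary assignment of plastic greenhouses to ‘bare lands’ described above amounts to a lookup-table remap applied to the classified raster before it is passed to the model. The following is a minimal sketch of that idea, not the workflow used in this study; the class codes, the `REGIONAL_TO_WRF` table, and the `remap_classes` helper are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical class codes for illustration only; the study's actual legend
# and the WRF category indices are not given in this section.
REGIONAL_TO_WRF = {
    1: 1,   # e.g. cropland stays cropland
    2: 2,   # e.g. forest stays forest
    9: 9,   # bare lands stay bare lands
    10: 9,  # plastic greenhouses -> bare lands (no dedicated WRF type yet)
}


def remap_classes(land_cover, mapping, fill_value=0):
    """Return a new array with each class code translated through `mapping`.

    Codes that are absent from `mapping` are set to `fill_value`.
    """
    land_cover = np.asarray(land_cover)
    # Build a lookup table large enough for every code that can occur.
    size = max(max(mapping), int(land_cover.max())) + 1
    lut = np.full(size, fill_value, dtype=land_cover.dtype)
    for source_code, target_code in mapping.items():
        lut[source_code] = target_code
    return lut[land_cover]


grid = np.array([[1, 10], [9, 2]], dtype=np.uint8)
wrf_grid = remap_classes(grid, REGIONAL_TO_WRF)
```

Keeping the mapping in a single table means that, once a dedicated greenhouse type is available in the land surface scheme, only the entry for that class would need to change.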
As in other regional or national-scale land cover mapping tasks, the main challenges for successful land cover classification are associated with phenological variability and within-class spectral diversity [43]. By treating each Landsat image as an independent analytical region for classification, we mitigated errors caused by seasonal changes in vegetation, ensuring robust classification across different times and conditions [44]. The inclusion of a large and diverse training dataset further enhanced the model’s ability to accurately classify various land cover types, as evidenced by high producer’s and user’s accuracies for important categories such as urban and forest lands. We also incorporated ancillary data, such as digital elevation models and nighttime lighting, into our land cover mapping. Among the several machine learning algorithms evaluated, we found Random Forest to be the most appealing because of its performance and ease of implementation, particularly when using R’s caret package with automatic parameter tuning and model selection [45].
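To make the scene-specific approach concrete, the sketch below fits one Random Forest to the training points of a single scene, stacking spectral bands with elevation and nighttime-light values as features. It is an illustrative Python analogue using scikit-learn, not the R caret implementation used in this study; the function name, the parameter grid, and the synthetic training points are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV


def train_scene_classifier(spectral, elevation, night_lights, labels, seed=0):
    """Fit one Random Forest for a single scene (hypothetical helper).

    spectral: (n_samples, n_bands) reflectance at the training points
    elevation, night_lights: (n_samples,) ancillary values at the same points
    labels: (n_samples,) integer class codes
    """
    features = np.column_stack([spectral, elevation, night_lights])
    # A small cross-validated grid search stands in for automatic tuning.
    search = GridSearchCV(
        RandomForestClassifier(n_estimators=100, random_state=seed),
        param_grid={"max_features": [2, 4]},
        cv=3,
    )
    search.fit(features, labels)
    return search.best_estimator_


# Synthetic stand-in for one scene's training points (not real data).
rng = np.random.default_rng(0)
n = 300
labels = rng.integers(0, 3, size=n)
spectral = rng.normal(loc=labels[:, None], scale=0.3, size=(n, 6))
elevation = rng.normal(loc=100.0 * labels, scale=20.0)
night_lights = rng.normal(loc=10.0 * labels, scale=3.0)

model = train_scene_classifier(spectral, elevation, night_lights, labels)
predicted = model.predict(np.column_stack([spectral, elevation, night_lights]))
```

In a per-scene workflow, a function of this kind would be called once for each image with that image's own training points, so that each model only has to learn the phenological state captured in its scene.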
Our regional land cover mapping effort achieved an overall accuracy of 83.2% and a Kappa coefficient of 0.81. A comparative analysis with existing land cover products highlights the advancements made by our dataset. For instance, our product’s overall accuracy surpasses that of earlier datasets such as GLC 2000 (68.6%), IGBP-DIS (66.9%), and UMD (65.0%), as well as more recent products like GlobCover (67.5%) and CCI-LC (74.1%). GlobeLand30, which also offers high-resolution data, achieved an accuracy of 80.0%, slightly lower than that of our dataset. This comparison underscores the continuous improvement in land cover mapping technologies and the importance of incorporating locally specific classifications and high-resolution imagery. These results are expected because we used localized training data and the Random Forest algorithm, which is robust when high-quality training data are available.
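For readers less familiar with these measures, the sketch below shows how overall accuracy, the Kappa coefficient, and the per-class producer's and user's accuracies are conventionally derived from a confusion matrix. The function name and the 2 × 2 example matrix are made up for illustration and do not reproduce the assessment data of this study.

```python
import numpy as np


def accuracy_metrics(confusion):
    """Compute standard accuracy measures from a square confusion matrix.

    confusion[i, j] is the number of samples of reference class i that
    were mapped as class j.
    """
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    row_totals = confusion.sum(axis=1)   # reference samples per class
    col_totals = confusion.sum(axis=0)   # mapped samples per class
    correct = np.diag(confusion)

    overall = correct.sum() / total
    # Agreement expected by chance, from the marginal totals.
    expected = (row_totals * col_totals).sum() / total**2
    kappa = (overall - expected) / (1.0 - expected)

    producers = correct / row_totals     # 1 - omission error
    users = correct / col_totals         # 1 - commission error
    return overall, kappa, producers, users


example = [[45, 5],
           [10, 40]]
overall, kappa, producers, users = accuracy_metrics(example)
```

For this toy matrix the overall accuracy is 0.85 and Kappa is 0.70; Kappa is lower because it discounts the agreement that would be expected by chance given the class totals.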
By providing more detailed and up-to-date land cover information as the model’s lower boundary input, our dataset can support more precise regional weather simulations and predictions with the WRF model. Within the scope of this study, we included only three WRF experiments to highlight daytime and nighttime temperature predictions using different land cover maps as input. In future work, we plan to expand the number of WRF experiments to cover a wider range of meteorological variables and seasons. We intend to analyze precipitation patterns, wind speed, and humidity levels across different seasons to provide a comprehensive assessment of our land cover dataset’s impact on weather modeling. Additionally, we aim to investigate the effects of land cover changes over time on climate variables to better understand long-term environmental trends.
Beyond meteorological applications, the high-resolution land cover dataset has broader implications for environmental policy and sustainable land management. By providing detailed and accurate information on land use changes, such as the expansion of urban areas and the distribution of plastic greenhouses, the dataset enables policymakers and planners to make more informed decisions. The automated workflow developed in this study supports the practicality and scalability of the land cover mapping system, allowing for continuous monitoring and updates. Future refinements and the integration of advanced machine learning algorithms could further enhance accuracy, improving weather forecasts and climate predictions for East China.
6. Conclusions
We developed regional land cover map products for East China to support high-resolution WRF modeling and prediction. A total of 72 Landsat 8 images, combined with DEM and nighttime lighting data, were used to develop a 10-class land cover map. By focusing on the collection of high-quality training points and scene-specific Random Forest (RF) classification, our regional mapping achieved an overall accuracy of 83.2% and a Kappa coefficient of 0.81, exceeding the accuracies of the existing datasets used for comparison. The automated workflow developed for this project supports efficient data processing and future updates. Our three WRF model experiments demonstrated improved daytime and nighttime temperature predictions when the new land cover maps were used.
Future studies will expand the range of WRF experiments to include various meteorological variables and seasons, further validating our dataset’s impact on weather modeling. Additionally, our high-resolution land cover dataset holds significant potential for ecological assessments, environmental policy formulation, and sustainable land management in the East China region, providing a robust foundation for ongoing and future research.