# **Modeling the Adaptations of Agricultural Production to Climate Change**

Edited by Dengpan Xiao and Wenjiao Shi Printed Edition of the Special Issue Published in *Agriculture*

www.mdpi.com/journal/agriculture

Editors

**Dengpan Xiao Wenjiao Shi**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Dengpan Xiao Hebei Normal University China

Wenjiao Shi Institute of Geographic Sciences and Natural Resources Research, CAS China

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Agriculture* (ISSN 2077-0472) (available at: https://www.mdpi.com/journal/agriculture/special_issues/modeling_agriculture_climate).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-6802-7 (Hbk) ISBN 978-3-0365-6803-4 (PDF)**

© 2023 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **About the Editors**

#### **Dengpan Xiao**

Dengpan Xiao, Professor at College of Geography Science, Hebei Normal University. His research interests include climate change and its impacts on agricultural production, the efficient use of agricultural water resources, and the adjustment of agricultural systems and their impacts.

#### **Wenjiao Shi**

Wenjiao Shi, Professor at Institute of Geographic Sciences and Natural Resources Research, CAS. Her research interests include land use/cover change and its environmental effects; the impacts of global change on agriculture and ecology; ecosystem services; and spatial information analysis.

## **Preface to "Modeling the Adaptations of Agricultural Production to Climate Change"**

Globally, climate change and its impacts on agricultural production and food security are a significant public concern. Modeling is a key tool for exploring the impacts of climate change on agriculture and proposing adaptation strategies. Generally, establishing closer links between experiments and statistical and/or eco-physiological crop models may facilitate the necessary methodological advances. It is expected that insights derived from this book will be helpful for relevant decision makers in the areas of agricultural adaptation and food security.

We would like to sincerely thank all the authors who submitted papers to the Special Issue of *Agriculture* entitled "Modeling the Adaptations of Agricultural Production to Climate Change", the reviewers of these papers for their constructive comments and thoughtful suggestions, and the editorial staff of *Agriculture*.

> **Dengpan Xiao and Wenjiao Shi** *Editors*

## *Editorial* **Modeling the Adaptation of Agricultural Production to Climate Change**

**Dengpan Xiao 1,2,\* and Wenjiao Shi 3,\***


Climate change and its impacts on agricultural production and food security are a significant source of public concern around the world. To reduce the negative impacts of climate change on agriculture, maintain crop production levels, and even discover opportunities in agricultural intensification, researchers have made great efforts to assess changes in agricultural climate resources and develop adaptation measures in different growing areas of the world experiencing climate change. Modeling is a key tool for exploring the impacts of climate change on agriculture and proposing adaptation strategies. Currently, the two main fields where further progress is required are a more mechanistic understanding of climate impacts and of management options for adaptation and mitigation, and a focus on cropping systems and integrative multiscale assessments instead of single seasons and crops. Therefore, establishing closer links between experiments and statistical and/or eco-physiological crop models may both facilitate the necessary methodological advances and help achieve the above goals.

With these goals in mind, we organized this Special Issue, "Modeling the Adaptation of Agricultural Production to Climate Change (MAAPCC)". The Special Issue contains a total of 21 papers [1–21], submitted from five countries: China, Japan, Thailand, Iran, and South Africa. It covers a wide range of plants, including not only grain crops such as maize [15], rice [6,10], wheat [2,4,16], and soybean [14], and cash crops such as cotton [7] and sugarcane [3], but also the apple tree [1] and traditional Chinese medicinal plants such as *Rheum nanum* (*R. nanum*) [17]. In terms of the research time scale, the Special Issue not only focused on climate change and its impact in the historical period, but also analyzed the impacts of different future climate scenarios on plant distribution, crop production, and climatic resources in 10 papers [1,6,8,12,14–17,19,20]. Some of these papers adopted the latest climate scenario data, namely the Shared Socioeconomic Pathways (SSPs) from the Coupled Model Intercomparison Project Phase 6 (CMIP6). As for the research methods, these papers not only use traditional statistical analysis methods, but also involve widely used process-based (mechanistic) models, including the Agricultural Production Systems Simulator (APSIM) [2], the Crop Environment Resource Synthesis (CERES)-Wheat [4], CERES-Rice [6], the denitrification-decomposition (DNDC) model [12], an integrated climate–hydrological–economic model [13], CROPWAT [14], and the Environmental Policy Integrated Climate (EPIC) model [15]. In addition, some of the papers used machine learning methods, which are currently popular, for analysis and prediction [2,18].
Overall, the papers in the Special Issue of MAAPCC were grouped into three categories: assessment of climate resources in the context of climate change [5,7,8,14,19,20], assessment of the impact of climate change on crop production [1,3,10,13,15–17], and some methodological studies related to climate change [2,4,6,9,11,12,18,21].

The first category has six papers under the following sub-heading: assessment of climate resources in the context of climate change.

**Citation:** Xiao, D.; Shi, W. Modeling the Adaptation of Agricultural Production to Climate Change. *Agriculture* **2023**, *13*, 414. https://doi.org/10.3390/agriculture13020414

Received: 6 February 2023 Accepted: 9 February 2023 Published: 10 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Generally, evaluating the response of evapotranspiration to climatic change can provide theoretical support for the optimal allocation of regional water resources and for agricultural water management under climate change. In the two papers in this section, the trends and climatic causes of potential evapotranspiration (*ET*<sub>0</sub>) in Heilongjiang Province from 1960 to 2019 [5] and in the Xinjiang Autonomous Region of China from 1957 to 2017 [7] have been quantitatively assessed, and the results suggested that historical climate change has had significant impacts on regional *ET*<sub>0</sub> and will further affect crop water demand and consumption. Moreover, the paper by Li et al. [14] investigated the spatial and temporal distribution of *ET*<sub>0</sub>, crop water requirement (*ET*<sub>C</sub>), irrigation water requirement (*I*<sub>r</sub>), effective precipitation (*P*<sub>e</sub>), and the coupling degree of *ET*<sub>C</sub> and *P*<sub>e</sub> for soybean during the growth period for the future period from 2021 to 2080 in Heilongjiang Province, China.

Assessing the climatic suitability of crops is critical for mitigating and adapting to the negative impacts of climate change on crop production. The paper by Zhao et al. [20] developed a climate suitability model of maize and investigated the climate suitability of summer maize during the past and future periods in the North China Plain. The paper by Nooni et al. [19] investigated the future changes in drought events for four SSPs in the African continent. The projected wetter trends in humid areas may benefit agricultural production and ecological conservation, and the drier trends in non-humid areas may require appropriate drought adaptation strategies and development plans to minimize impacts [19]. The last paper in this section was from Shi et al. [8], which proposed a land use/land cover (LULC) simulation framework from 2000 to 2030 for four different development scenarios in the Xinjiang region. This study stated that both the supply and demand of carbon stock in Xinjiang would increase in 2025 and 2030, with the demand exceeding the supply [8].

The second category is the assessment of the impacts of climate change on crop production.

Seven papers explored the effects of climate change on crop (or plant) distribution and production. In general, climate change plays an important role in the distribution of zones suitable for plant cultivation. In the two papers in this section, the potential distribution of suitable habitats and range shifts of apple trees in the near present and near future (i.e., the 2030s and the 2050s) under two climate scenarios (i.e., SSP126 and SSP585) was simulated using three tools (the maximum entropy model, IDRISI, and ArcGIS) [1], and the potential distribution of *Rheum nanum* (*R. nanum*), a famous traditional Chinese medicinal plant, was modeled for three periods (current, 2050s: 2041–2060, and 2070s: 2061–2080) using MaxEnt and ArcGIS [17]. These two studies may improve our understanding of the effects of climate warming on plant distribution and could be useful for relevant agricultural decision-making. In addition, the paper by Wu et al. [16] determined the planting boundary of winter wheat in north China for the future period based on four critical parameters: percentages of extreme minimum temperature years (POEMTYs), first day of the overwintering period (FD), sowing date (SD), and precipitation before winter (PBW).

Currently, a large number of studies focus on the effects of climate change on crop yields and/or production. The paper by Choruma et al. [15] assessed the effects of future climate change on maize yield in the Eastern Cape Province of South Africa, and indicated a decrease in maize production for two future periods (mid-century (2040–2069) and late century (2070–2099)). The paper by Shayanmehr et al. [13] constructed a new integrated climate–hydrological–economic model to assess the impact of future climate change on water resources and crop production. The findings noted that, in the majority of cases, crop production will decline under the climate scenarios, with rainfed wheat experiencing the greatest decline (approximately 59.95%) [13]. Zhang [10] analyzed the spatiotemporal change in heat stress and its impacts on rice growth in the middle and lower reaches of the Yangtze River, China, and indicated that the change in heat stress is attributed to climate changes and extreme meteorological events. The last paper in this section was from Yao et al. [3], which comprehensively assessed multiple sugarcane agrometeorological disasters with regard to sugarcane yield in Southern China. The results suggested that the yield-reducing effect of flooding on sugarcane was more pronounced than that of drought [3].

The last category is some methodological studies related to climate change.

To meet the challenges of climate change and the increasing food demand, accurate, timely, and dynamic estimation of regional or global crop yield is critical to food trade and policy-making. In the two papers in this section, the winter wheat yield in the North China Plain [2] and in China [18] was accurately predicted by coupling a crop model with machine learning algorithms based on multi-source data. These findings indicated that the prediction model can be used to develop adaptation strategies to mitigate the negative effects of climate change on crop productivity and to provide data support for food security. Moreover, the paper by Zheng et al. [6] used a process-based crop model (CERES-Rice), which was calibrated and validated based on experimental data from the Songnen Plain of China and driven by multiple global climate models (GCMs) from the CMIP6, to predict rice growth period, yield, and light and heat resource utilization efficiency under future climate change conditions. The results showed that optimizing the sowing date could make full use of climate resources to improve rice yield and light and heat resource utilization indexes under future climate conditions [6]. In addition, the CERES-Wheat model was applied to investigate the optimal irrigation amount for high yield, water saving, and the trade-off between high yield and water saving of winter wheat in the North China Plain [4]. Therefore, crop mechanism models play an important role in assessing the effects of climate change on crop production and proposing effective coping strategies. In the future, we need to continuously develop crop models to improve the effectiveness and versatility of their simulation.

In this section, three papers developed models or devices to simulate or observe greenhouse gas emissions from agricultural processes, which have made a significant contribution to climate warming [9,11,12]. In the paper by Salehi et al. [11], two types of data-driven models were proposed to predict biogas production from the anaerobic digestion of spent mushroom compost supplemented with wheat straw as a nutrient source. The paper by Yi et al. [12] evaluated crop yields, nitrous oxide (N<sub>2</sub>O) emission, and soil organic carbon (SOC) in a typical wheat–corn rotation system field on the North China Plain on a 50-year scale using the denitrification-decomposition (DNDC) model, and proposed adaptive strategies for each climate scenario. Moreover, Bandara et al. [9] developed a gas diffusion analysis method for simulating N<sub>2</sub>O surface flux from soil gas measured with a low-cost device in a silicone diffusion cell buried in the soil. The last paper in this section was from Huang et al. [21], which evaluated the accuracy of three reanalysis temperature data systems (the China Meteorological Administration Land Data Assimilation System (CLDAS), the U.S. Global Land Data Assimilation System (GLDAS), and the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Version 5 (ERA5)-Land) across China. The results indicated that the CLDAS product demonstrated a relatively high reliability, which was of great significance for the study of climate change and forcing crop models [21].

In summary, this Special Issue focuses on the quantitative assessment of the impact of climate change on agricultural production based on multi-source model simulation and reveals the role and mechanism of improved management measures in adapting to climate change. It is expected that insights derived from this Special Issue will be helpful for relevant decision-makers in the areas of agricultural adaptation and food security.

**Author Contributions:** Conceptualization, D.X. and W.S.; investigation, D.X. and W.S.; writing—original draft preparation, D.X.; writing—review and editing, D.X. and W.S.; visualization, D.X. and W.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This material is based upon work that is supported by the Hebei Provincial Science Foundation for Distinguished Young Scholars (No. D2022205010), and the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No. 72221002).

**Acknowledgments:** We would like to sincerely thank all of the authors who submitted papers to the Special Issue of *Agriculture* entitled "Modeling the Adaptations of Agricultural Production to Climate Change", the reviewers of these papers for their constructive comments and thoughtful suggestions, and the editorial staff of *Agriculture*.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Multiscale Assessments of Three Reanalysis Temperature Data Systems over China**

**Xiaolong Huang 1,2, Shuai Han 3,\* and Chunxiang Shi <sup>3</sup>**


**Abstract:** Temperature is one of the most important meteorological variables for global climate change and human sustainable development. It plays an important role in agroclimatic regionalization and crop production. To date, temperature data have come from a wide range of sources. A detailed understanding of the reliability and applicability of these data will help us to better carry out research in crop modelling, agricultural ecology and irrigation. In this study, temperature reanalysis products produced by the China Meteorological Administration Land Data Assimilation System (CLDAS), the U.S. Global Land Data Assimilation System (GLDAS) and the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis Version 5 (ERA5)-Land are verified against hourly observations collected from 2265 national automatic weather stations (NAWS) in China for the period 2017–2019. These three reanalysis systems are advanced and currently widely used multi-source data fusion and reanalysis systems. The station observations have gone through data quality control (QC) and are taken as "true values" in the present study. The three reanalysis temperature datasets were spatially interpolated to the station locations at each time using the bi-linear interpolation method. By calculating the statistical metrics, the accuracy of the gridded datasets can be evaluated. The conclusions are as follows. (1) Based on the evaluation of temporal variability and spatial distribution as well as correlation and bias analysis, all three reanalysis products are reasonable in China. (2) Statistically, the CLDAS product has the highest accuracy, with a root mean square error (RMSE) of 0.83 °C. The RMSEs of the other two reanalysis datasets, ERA5-Land and GLDAS, are 2.72 °C and 2.91 °C, respectively. This result indicates that CLDAS performs better than ERA5-Land and GLDAS, while ERA5-Land performs better than GLDAS. (3) The accuracy of the data decreases with increasing elevation, which is common to all three products. This implies that more caution is needed when using the three reanalysis temperature datasets in mountainous regions with complex terrain. The major conclusion of this study is that the CLDAS product demonstrates a relatively high reliability, which is of great significance for the study of climate change and forcing crop models.

**Keywords:** temperature; evaluation; CLDAS; GLDAS; ERA5-Land

#### **1. Introduction**

Climate change and its impact on agricultural regionalization and crop production is one of the most important fields of study around the world. Temperature is an important indicator of the energy balance of the earth's surface and directly affects global climate change. Accurate temperature data can reasonably drive crop models to simulate the impact of climate change on agricultural production, which is a key tool to explore planting management systems and put forward adaptation strategies [1–3]. To date, temperature data come from a wide range of sources. Conventional temperature observations at ground-based weather stations are single-point observations. Although a single-point observation may have high accuracy, its spatial coverage is quite limited, especially in mountainous areas with complex terrain, where the observation stations are often sparsely and unevenly distributed due to multiple constraints, such as topography, environmental conditions and maintenance difficulties. As a result, observations in such areas often have certain limitations in representativeness and applicability [4–6]. Temperature retrievals from remote sensing can provide spatially continuous observations, yet the retrieval accuracy is low [7–9]. Numerical model simulations have certain advantages regarding spatiotemporal resolution. However, the model results are severely affected by various physical parameterization schemes [10,11], which often lead to large uncertainties in the output.

**Citation:** Huang, X.; Han, S.; Shi, C. Multiscale Assessments of Three Reanalysis Temperature Data Systems over China. *Agriculture* **2021**, *11*, 1292. https://doi.org/10.3390/agriculture11121292

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 21 November 2021 Accepted: 17 December 2021 Published: 19 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In recent years, various real-time analysis or reanalysis datasets (hereafter referred to as gridded datasets) on regular grids with high spatial resolution and temporal continuity have been produced by different multi-source data fusion and assimilation systems. A data fusion and assimilation system can take advantage of various data, analyze and process data from different sources, such as direct observations, retrievals and model simulations, and output a gridded dataset. A gridded dataset can cover a large spatial area over a long period, and thus effectively makes up for the lack of observations in areas where observation stations are sparse [12]. These gridded datasets provide basic data support for gridded forecasting, climate analysis and application services in meteorological agencies [13]. They can also be used as input data to drive land surface, hydrological and ecological models to obtain more reliable results [10].

At present, various global and regional reanalysis datasets have been released. These datasets, including atmospheric and surface datasets, are produced by different data assimilation and fusion systems. The atmospheric reanalysis datasets include meteorological variables such as temperature, relative humidity and U and V wind components on various pressure levels. The surface reanalysis datasets are composed of variables such as surface air temperature and soil moisture. To date, the National Centers for Environmental Prediction (NCEP) and National Center for Atmospheric Research (NCAR) reanalysis products [14,15], the ECMWF reanalysis product (ERA5) [16–18], and the Japan Meteorological Agency (JMA) product JRA-55 [19] are the most widely used in the meteorological field. In practical applications, however, the requirement for spatial resolution of the near-surface elements is higher than that for the atmospheric elements in the upper air. Therefore, land surface fusion systems have been developing rapidly in recent years, and the fusion datasets that include near-surface meteorological elements and soil variables have been widely used in weather and climate prediction, water resources management and water cycle studies. At the beginning of the 2000s, the National Aeronautics and Space Administration (NASA) and the National Oceanic and Atmospheric Administration (NOAA) jointly established the Global Land Data Assimilation System (GLDAS) [20]. In 2015, the China Meteorological Administration Land Data Assimilation System (CLDAS-V2.0) was successfully developed [12,22,23]. In 2019, ECMWF released the high-accuracy ERA5-Land gridded surface dataset [21].

Following the continuous improvement of observational systems, assimilation systems and numerical models, the spatial and temporal resolutions of gridded temperature datasets have also increased. These datasets provide a rich data source for studying the mechanisms of regional atmospheric circulation and for climate change studies. However, due to differences in the input data sources, as well as in the fusion models and assimilation systems, the quality of the simulated temperature varies between products [24]. Therefore, the accuracy and applicability of temperature reanalysis datasets have always been a major concern of meteorologists. Many studies have evaluated the applicability of surface air temperature in several reanalysis products, such as ERA-40, JRA-25, NCEP/NCAR and NCEP/DOE. It is found that these gridded reanalysis datasets can, to a certain degree, reflect the spatial and temporal distribution characteristics of the observations [25–31], yet the differences between them demonstrate obvious regional and seasonal changes. Although the near-surface air temperatures from GLDAS, ERA5-Land and CLDAS have each been evaluated over limited areas, a comprehensive and detailed evaluation and comparison of these data over the land areas of China has not been conducted. Note that the evaluation of gridded surface air temperature datasets is an important component of climate change study. Results of the evaluation provide a valuable reference for understanding regional temperature changes and promoting sustainable development.

Based on observations collected at automatic weather stations in China, this study analyzes the accuracy of near-surface air temperature in GLDAS, ERA5-Land and CLDAS gridded datasets over mainland China from different temporal and spatial perspectives. Results of the evaluation will be helpful for researchers to understand the applicability of these gridded datasets in China and provide a reference for the selection of appropriate temperature datasets in the studies of climate change, extreme weather, the Earth's energy and various numerical models. Meanwhile, this study will also help research institutions to further improve the algorithms used for producing these gridded datasets.

#### **2. Data and Methods**

#### *2.1. Data*

Table 1 lists the spatial and temporal resolutions of the datasets used in this study and their coverage areas. It contains in situ data (NAWS) and three gridded datasets (GLDAS, ERA5-Land and CLDAS).


<sup>1</sup> Instantaneous value of temperature at current time.

#### 2.1.1. GLDAS Data

GLDAS evolved from the land information system [32], which is a land surface data assimilation system that consists of multiple land surface models. It is applied to integrate observation-based data and produce surface state variables (such as soil moisture and surface temperature) and flux variables (such as evaporation, latent heat flux and sensible heat flux). GLDAS includes four land surface models [33], i.e., Noah, Mosaic, CLM and VIC. Driven by Princeton University's global meteorological dataset, GLDAS-2 created a more climatologically consistent dataset that covers the period from 1948 to 2010 [34]. The horizontal resolution of this dataset is 0.25° and the temporal resolution is 3 h. It covers the area of (60° S–90° N, 180° W–180° E). In the present study, GLDAS gridded temperature data are downloaded from the NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) (http://disc.sci.gsfc.nasa.gov/hydrology/data-holdings, accessed on 5 September 2021).

#### 2.1.2. ERA5-Land Data

ERA5-Land is a rerun of the land component of the ERA5 climate reanalysis, with a series of improvements that make it better meet application requirements [21,35–37]. In particular, ERA5-Land runs at enhanced resolution (9 km vs. 31 km in ERA5). The temporal frequency of the output is hourly and the fields are masked for all oceans, making them lighter to handle. ERA5-Land is produced by a single simulation that is not coupled to the atmospheric module of the ECMWF Integrated Forecasting System (IFS). The ERA5-Land historical dataset for the period 1950–1980 was released in September 2021, and the dataset for the period since 1981 was initially released to users in 2019 and is being updated in real time. The spatial and temporal resolutions of the dataset are 0.1° and 1 h, respectively. In the present study, the ERA5-Land surface air temperature gridded dataset is downloaded from the ECMWF Copernicus Climate Change Service Climate Data Store (C3S CDS) (https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form, accessed on 5 September 2021).

#### 2.1.3. CLDAS Data

Version 2 of the China Meteorological Administration Land Data Assimilation System (CLDAS-V2.0) was developed by the National Meteorological Information Center of the China Meteorological Administration. It runs four physical parameterization schemes [23] (Noah, CLM3.5, the Common Land Model (CoLM) and Noah-MP) to simulate various land surface variables, such as soil temperature, soil moisture, evapotranspiration and surface heat flux. The CLDAS datasets mainly include the forcing dataset and the land surface dataset. The forcing dataset is produced by fusing observations collected at more than 60,000 automatic weather stations with numerical model predictions and satellite remote sensing data, using multi-grid variational analysis technology, a discrete ordinates radiation model, a hybrid radiation estimation model and a terrain correction algorithm [12,23]. CLDAS provides gridded 2 m air temperature, 2 m humidity, 10 m U and V wind components, ground pressure, ground incident solar radiation, precipitation and other elements. In this study, the hourly gridded temperature dataset of CLDAS-V2.0 is obtained from the National Meteorological Information Center of the China Meteorological Administration (http://data.cma.cn/, accessed on 5 September 2021). The spatial and temporal resolutions of the dataset are 0.05° and 1 h, respectively, and it covers the area of (0° N–60° N, 70° E–140° E).

#### 2.1.4. NAWS Observation Data

NAWS observations are obtained from the CIMISS database at the Sichuan Meteorological Observation Data Center. In total, 2281 national weather stations in mainland China with data integrity above 98% are selected (Figure 1). The observation instruments at these weather stations are regularly calibrated, upgraded, and maintained by professionals in the meteorological field. The data collected at these weather stations have passed the national, provincial, and station quality controls, and all data are marked with QC flags [38]. Because observations from alpine stations are poorly representative of the surrounding plain areas, these stations are excluded. Finally, data collected at the remaining 2265 stations are considered to be the most reliable observational data, which can be used as the benchmark data for the evaluation of CLDAS, ERA5-Land and GLDAS.
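A completeness screen of this kind can be sketched as follows. This is only an illustration under stated assumptions: the source does not define how "data integrity" is computed, so the sketch assumes it is the fraction of expected hourly records that are present and valid over a chosen period. The function names `expected_hours` and `select_stations` and the station identifiers are hypothetical.

```python
from datetime import datetime


def expected_hours(start, end):
    """Number of hourly records expected in the half-open interval [start, end)."""
    return int((end - start).total_seconds() // 3600)


def select_stations(valid_counts, expected, threshold=0.98):
    """Return the IDs of stations whose share of valid hourly records exceeds threshold.

    valid_counts maps a (hypothetical) station ID to the number of hourly
    records that are present and passed quality control.
    """
    return sorted(
        station_id
        for station_id, count in valid_counts.items()
        if count / expected > threshold
    )


# Assumed screening window: 2017-01-01 00:00 up to, but excluding, 2020-01-01 00:00.
n_expected = expected_hours(datetime(2017, 1, 1), datetime(2020, 1, 1))
counts = {"A": 26280, "B": 25000, "C": 25800}  # made-up record counts
kept = select_stations(counts, n_expected)
```

With these made-up counts, station "B" falls below the 98% threshold and is dropped, while "A" and "C" are retained.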

#### *2.2. Data Processing*

In this study, hourly 2 m temperature data collected at 2265 NAWS for the period 2017–2019 are used to evaluate the accuracy of the CLDAS, ERA5-Land and GLDAS gridded datasets. Only those data marked as "correct" by the QC flag are selected to produce the "true value" dataset for the evaluation of the reanalysis datasets. To address possible impacts caused by the displacement of weather stations during the evaluation period, GLDAS, ERA5-Land and CLDAS temperatures are spatially interpolated to the station locations using the bi-linear interpolation method [39], according to the latitude and longitude of each station at each time, to obtain comparative sequences. A total of 19,642,844 samples have been obtained. By calculating the statistical metrics defined in Section 2.3, the accuracy of the gridded datasets can be evaluated.
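The interpolation of a gridded field to a station location can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `bilinear_interpolate`, the row-major grid layout and the origin and spacing arguments are assumptions introduced for this example, and missing values and longitude wrap-around are not handled.

```python
import math


def bilinear_interpolate(grid, lat0, lon0, dlat, dlon, lat, lon):
    """Bilinearly interpolate a regular latitude-longitude grid to one point.

    grid[i][j] is assumed to hold the value at latitude lat0 + i * dlat and
    longitude lon0 + j * dlon (a hypothetical layout chosen for this sketch).
    """
    n_rows, n_cols = len(grid), len(grid[0])
    # Fractional grid indices of the target point.
    fi = (lat - lat0) / dlat
    fj = (lon - lon0) / dlon
    if not (0 <= fi <= n_rows - 1 and 0 <= fj <= n_cols - 1):
        raise ValueError("point lies outside the grid")
    # Lower-left corner of the enclosing cell, kept inside the grid so that
    # points on the last row or column still have a full 2 x 2 stencil.
    i = min(int(math.floor(fi)), n_rows - 2)
    j = min(int(math.floor(fj)), n_cols - 2)
    ty = fi - i  # weight towards row i + 1
    tx = fj - j  # weight towards column j + 1
    # Interpolate along longitude on both rows, then along latitude.
    lower = (1 - tx) * grid[i][j] + tx * grid[i][j + 1]
    upper = (1 - tx) * grid[i + 1][j] + tx * grid[i + 1][j + 1]
    return (1 - ty) * lower + ty * upper


# A single 0.25-degree cell with made-up temperatures (in degrees Celsius).
cell = [[10.0, 12.0], [14.0, 16.0]]
station_value = bilinear_interpolate(cell, 30.0, 100.0, 0.25, 0.25, 30.125, 100.125)
```

At the cell centre the four corner values receive equal weight, so the interpolated value is their mean. Repeating the call with the station coordinates valid at each time step yields the comparative sequence for that station.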

#### *2.3. Metrics Used for Evaluation*

The correlation coefficient (COR), mean error (BIAS) and root mean square error (RMSE) are used to compare CLDAS, GLDAS and ERA5-Land data with NAWS observations. They are defined as follows.

$$\text{COR} = \frac{\sum_{i=1}^{N} (G_i - \bar{G})(O_i - \bar{O})}{\sqrt{\sum_{i=1}^{N} (G_i - \bar{G})^2}\,\sqrt{\sum_{i=1}^{N} (O_i - \bar{O})^2}} \tag{1}$$

$$\text{BIAS} = \frac{1}{N} \sum_{i=1}^{N} (G_i - O_i) \tag{2}$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (G_i - O_i)^2} \tag{3}$$

where $O_i$ is the weather station observation, $G_i$ is the gridded temperature interpolated to the station location, $\bar{O}$ and $\bar{G}$ are the corresponding sample means, and N is the total number of samples used in the evaluation. COR varies within [−1.0, 1.0]; the closer the value is to 1, the better the consistency of the data, and the closer it is to −1, the stronger the inverse relationship. A COR of 0 means that there is no linear relationship between the product and the observations. BIAS reflects the mean deviation of the gridded temperature from the station observations. A negative value indicates that the gridded temperature is underestimated, while a positive value indicates that it is overestimated. The closer the RMSE is to 0, the more accurate the gridded temperature dataset is. Throughout the evaluation period, all metrics are calculated from the accumulated hourly samples.
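As a worked illustration of Equations (1)–(3), the three metrics can be computed from paired samples as in the sketch below. The function name `evaluate` and the four sample values are assumptions for this example, not data from the study.

```python
import math

def evaluate(gridded, observed):
    """Return (COR, BIAS, RMSE) for paired gridded and observed temperatures."""
    n = len(observed)
    if n != len(gridded) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    g_mean = sum(gridded) / n
    o_mean = sum(observed) / n
    # Numerator and the two sums of squares in Equation (1).
    cov = sum((g - g_mean) * (o - o_mean) for g, o in zip(gridded, observed))
    g_ss = sum((g - g_mean) ** 2 for g in gridded)
    o_ss = sum((o - o_mean) ** 2 for o in observed)
    cor = cov / math.sqrt(g_ss * o_ss)
    # Equation (2): mean of gridded minus observed.
    bias = sum(g - o for g, o in zip(gridded, observed)) / n
    # Equation (3): root of the mean squared difference.
    rmse = math.sqrt(sum((g - o) ** 2 for g, o in zip(gridded, observed)) / n)
    return cor, bias, rmse

# Hypothetical samples: the gridded product is uniformly 1 degree too cold.
obs = [10.0, 12.0, 14.0, 16.0]
grd = [9.0, 11.0, 13.0, 15.0]
cor, bias, rmse = evaluate(grd, obs)
```

A uniform cold offset leaves COR at 1 while BIAS is −1 ◦C and RMSE is 1 ◦C, which shows why the three metrics are reported together.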

**Figure 1.** Distribution of National Automatic Weather Stations (NAWS) in China.

#### **3. Results Analysis**

#### *3.1. Evaluation of Overall Accuracy*

3.1.1. Overall Accuracy during the Study Period

For the evaluation period of 2017–2019, the overall accuracy results are listed in Table 2. The average observed temperature at the weather stations in mainland China is 13.93 ◦C, and the averages of CLDAS, ERA5-Land and GLDAS are 13.88 ◦C, 13.22 ◦C and 13.55 ◦C, respectively, which are 0.06 ◦C, 0.71 ◦C and 0.38 ◦C lower than the station observations. Note that the average of CLDAS is very close to that of the observations. In terms of correlation, the highest value of 0.998 is found between CLDAS and observations and the lowest value of 0.970 is found between GLDAS and observations, while that between ERA5-Land and observations lies in between. The biases of the three reanalysis datasets are all negative in mainland China, indicating that temperatures in these gridded datasets are underestimated compared to station observations, and the underestimation is most severe in ERA5-Land with a value of 0.71 ◦C. The RMSEs for CLDAS, ERA5-Land and GLDAS are 0.83 ◦C, 2.72 ◦C and 2.91 ◦C, respectively. Overall, all three metrics indicate that the accuracy of CLDAS is clearly higher than that of the other two gridded datasets for the evaluation period. In terms of correlation and RMSE, the accuracy of ERA5-Land is better than that of GLDAS, although the difference between them is small.

**Table 2.** Statistics of CLDAS, ERA5-Land and GLDAS temperatures for the period 2017–2019.


Temperatures at four times of the day (00, 06, 12 and 18 UTC) on the 15th day of each month from January to December 2019 are used to represent the annual mean temperature in 2019. Scatterplots of temperature from the NAWS observations and the three gridded datasets, together with their linear fits, are displayed in Figure 2. The goodness-of-fit values (R²) are 0.995, 0.95 and 0.945 for CLDAS, ERA5-Land and GLDAS, respectively, which confirms visually that CLDAS has the highest accuracy.
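For readers who wish to reproduce this kind of scatterplot fit, the sketch below computes an ordinary least-squares line and its R². It assumes the observations are placed on the x-axis and the gridded values on the y-axis, which the text does not state; the function name and the four sample pairs are hypothetical.

```python
def linear_fit_r2(x, y):
    """Least-squares fit y = slope * x + intercept and its coefficient of determination."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# Hypothetical station observations (x) and gridded values (y), in degrees Celsius.
obs = [0.0, 10.0, 20.0, 30.0]
grd = [1.0, 9.0, 21.0, 29.0]
slope, intercept, r2 = linear_fit_r2(obs, grd)
```

For a simple linear fit, R² equals the square of the correlation coefficient between the two series.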

**Figure 2.** Fitting deviations of CLDAS (**a**), ERA5-Land (**b**) and GLDAS (**c**) data from NAWS observations.

#### 3.1.2. Evaluation at Individual Stations

Figure 3 displays the spatial distributions of correlation coefficients between the three gridded datasets and observations in mainland China. For most stations, the COR values for CLDAS are higher than those for GLDAS and ERA5-Land. For all three datasets, the COR values decrease from east to west. As shown in Figure 3a for CLDAS, the COR values are greater than 0.99 at most stations, except for a few stations over the Tibetan Plateau, the Hengduan Mountains and other high-elevation areas. For ERA5-Land (Figure 3b), stations with a COR larger than 0.99 are concentrated in Northeast China, North China, and the middle and lower reaches of the Yangzi River. The COR values decrease from 0.98 to 0.95 over inland China and are largely below 0.96 in West China. As shown in Figure 3c for GLDAS, the spatial pattern of COR is similar to that for ERA5-Land, while the COR values are largely smaller than those for ERA5-Land over inland China and the Sichuan Basin. Figure 3d presents the kernel density estimate (KDE) of the distribution of stations over COR values for the three gridded datasets. Note that the greater the number of stations with COR close to 1.0, the better the correlation of the gridded dataset with station observations. Figure 3d clearly indicates that CLDAS is the best among the three datasets, while ERA5-Land is better than GLDAS.
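The station-density curves in panel (d) can be approximated with a Gaussian kernel density estimate. The sketch below is a minimal pure-Python version; the fixed bandwidth and the per-station COR values are assumptions chosen for illustration, since the text does not state which kernel or bandwidth was used.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical per-station COR values for one dataset.
station_cor = [0.990, 0.995, 0.997, 0.998, 0.999]
density = gaussian_kde(station_cor, bandwidth=0.002)
near_peak = density(0.997)
far_tail = density(0.980)
```

A curve whose mass is concentrated near 1.0 corresponds to a dataset that correlates well with observations at most stations.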

**Figure 3.** Spatial distributions of COR ((**a**): CLDAS; (**b**): ERA5-Land; (**c**): GLDAS); (**d**): KDE of stations on COR for CLDAS, ERA5-Land and GLDAS datasets.

Spatial distributions of RMSEs for the three datasets are presented in Figure 4, which indicates that the RMSEs for CLDAS are smaller than those for GLDAS and ERA5-Land at most stations. The RMSE increases from east to west for all three datasets. As shown in Figure 4a for CLDAS, the RMSE values are smaller than 0.5 ◦C at all of the stations, except for those in Xinjiang, Yunnan, the Tibetan Plateau and the high-elevation areas in western Sichuan. The spatial distributions of RMSEs for ERA5-Land (Figure 4b) and GLDAS (Figure 4c) tend to be similar, with the values concentrated in the range 1.0–3.0 ◦C for ERA5-Land and 1.5–4.0 ◦C for GLDAS. The KDE of the distribution of stations over RMSE for the three datasets is presented in Figure 4d, which shows clearly that CLDAS is better than the other two datasets.

Figure 5 displays the spatial distributions of BIAS for the three datasets in mainland China. For CLDAS (Figure 5a), the biases at most stations vary between −1.0 and 1.0 ◦C, with positive values in the east and negative values in the west, where the terrain elevation is relatively high. The spatial distributions of bias for ERA5-Land (Figure 5b) and GLDAS (Figure 5c) are basically consistent. Large positive values occur in the North China Plain and the Taklimakan Desert in Xinjiang, while negative values mostly occur in Fujian and southwest China (except the Sichuan Basin). Figure 5d displays the KDE of the distribution of stations over BIAS for the three datasets. It shows that positive and negative biases each account for about half of the stations for CLDAS, while negative biases prevail for ERA5-Land and the opposite is true for GLDAS.

**Figure 4.** Spatial distributions of RMSE ((**a**): CLDAS; (**b**): ERA5-Land; (**c**): GLDAS); (**d**): KDE of stations on RMSE for CLDAS, ERA5-Land and GLDAS.

**Figure 5.** Spatial distributions of BIAS ((**a**): CLDAS; (**b**): ERA5-Land; (**c**): GLDAS); (**d**): KDE of stations on BIAS for CLDAS, ERA5-Land and GLDAS.

#### *3.2. Evaluation at Various Time Scales*

#### 3.2.1. At Different Times of the Day

Figure 6 displays the diurnal variations of mean temperature and the three evaluation metrics for the three datasets over the period 2017–2019. Multi-year averages of temperature at different times of the day are shown in Figure 6a, which indicates that the three gridded datasets exhibit diurnal temperature variations consistent with the observations. The CORs at different times of the day are displayed in Figure 6b, which shows that the COR for CLDAS changes little, with a value around 0.997. The COR for ERA5-Land first decreases from 0.978 at 00 UTC to 0.97 at 06 UTC, then gradually increases to a peak at 12 UTC, and then slowly decreases again. The COR for GLDAS first increases from 0.974 at 00 UTC to 0.979 at 03 UTC, and then gradually decreases. Figure 6c presents RMSEs at different times of the day. The largest RMSEs occur at 15, 06 and 12 UTC for CLDAS, ERA5-Land and GLDAS, respectively, with values of 0.94 ◦C, 2.86 ◦C and 3.27 ◦C. Biases at different times of the day are displayed in Figure 6d, which shows positive biases for CLDAS at 00, 03 and 06 UTC, with the largest positive bias of 0.22 ◦C at 03 UTC and the largest negative bias of −0.29 ◦C at 12 UTC. GLDAS has positive biases at 00 and 03 UTC, reaching 0.66 ◦C at 00 UTC, and negative biases at all other times, with the largest negative bias of −1.43 ◦C at 12 UTC. ERA5-Land exhibits negative biases at all times of the day, with the largest negative bias of −0.98 ◦C at 09 UTC. Overall, negative biases prevail at most times of the day for all three datasets.

**Figure 6.** Diurnal variations of temperature during 2017–2019. (**a**): Multi-year average temperature, (**b**): COR; (**c**): RMSE; (**d**): BIAS.

#### 3.2.2. Daily Evaluation

Figure 7 presents daily variations of the evaluation metrics during 2017–2019. Daily mean temperatures for the three datasets and observations are shown in Figure 7a, which indicates that the daily temperature variation during the study period for the three datasets is consistent with station observations. Figure 7b presents daily CORs during the study period. The daily COR between CLDAS and station observations changes little, whereas the CORs of ERA5-Land and GLDAS with station observations exhibit large daily variations, with values ranging from 0.891 to 0.977 and from 0.848 to 0.979, respectively. Daily RMSEs of the three datasets are displayed in Figure 7c, which indicates that the RMSEs of CLDAS are clearly smaller than those of the other two datasets. The RMSEs of ERA5-Land and GLDAS are close, yet the RMSEs of ERA5-Land are smaller than those of GLDAS on most days. Daily RMSEs of CLDAS, ERA5-Land and GLDAS vary within 0.61–2.35 ◦C, 1.97–3.80 ◦C and 2.43–3.76 ◦C, respectively. Figure 7d presents daily biases of the three datasets. The biases of CLDAS are mostly negative but very close to 0.0. The biases of ERA5-Land are also negative almost all of the time except for a few days. GLDAS is dominated by negative biases in autumn and winter, while positive biases mainly occur in spring and summer.

**Figure 7.** Daily evaluation of temperature during 2017–2019. (**a**): Daily average temperature, (**b**): COR; (**c**): RMSE; (**d**): BIAS.

#### 3.2.3. Monthly Changes

Figure 8 shows monthly changes of temperature and evaluation metrics for the three datasets over 2017–2019. The curves of monthly average temperature of the three gridded datasets are basically consistent with the observed values (Figure 8a). Monthly CORs are presented in Figure 8b, which shows that the three gridded datasets exhibit similar monthly variation patterns, i.e., the COR gradually decreases from January to June and gradually increases from July to December. A possible reason is that, because China is located in the Northern Hemisphere, the average temperature gradually rises from January to June and the range of hourly temperature variation increases accordingly. The reanalysis temperature products, however, are limited by their spatial resolution and respond to this variation with a certain regional smoothing, which lowers the COR. The same mechanism, acting in reverse, may explain the gradual increase in COR from July to December. The CORs of GLDAS are significantly lower than those of ERA5-Land during April–October, but no obvious differences are found in other months. Figure 8c shows monthly RMSEs. The RMSEs of ERA5-Land overall are smaller than those of GLDAS but are slightly higher in November and February. Monthly biases of CLDAS and ERA5-Land are all negative, while negative biases prevail in GLDAS (Figure 8d), with positive biases only occurring from November 2017 to February 2018 and from October 2018 to February 2019. Among the three gridded datasets, the monthly biases of GLDAS vary the most and those of CLDAS vary the least.

**Figure 8.** Monthly changes of temperature during 2017–2019. (**a**): Monthly average temperature, (**b**): COR; (**c**): RMSE; (**d**): BIAS.

#### 3.2.4. Seasonal Changes

Figure 9 shows seasonal characteristics of the evaluation metrics over the study period. Seasonal average temperatures of the three gridded datasets and station observations are presented in Figure 9a. Seasonal mean temperatures of station observations and CLDAS, ERA5-Land and GLDAS datasets are 15.0 ◦C, 14.96 ◦C, 14.06 ◦C and 14.57 ◦C, respectively, in spring. Summer mean temperatures of the above four datasets are 24.61 ◦C, 24.56 ◦C, 23.87 ◦C and 23.88 ◦C, respectively. Autumn mean temperatures of these datasets are 14.23 ◦C, 14.16 ◦C, 13.63 ◦C and 13.95 ◦C, respectively. Seasonal mean temperatures in winter are 1.64 ◦C, 1.58 ◦C, 1.08 ◦C and 1.55 ◦C for the four datasets, respectively. In all four seasons, the seasonal mean temperature of CLDAS is the closest to the observations, followed by that of GLDAS, and the result of ERA5-Land is the worst. Figure 9b shows the seasonal correlation between the three gridded datasets and station observations. It is found that the correlation is the lowest in summer, the highest in autumn and is higher in winter than in spring. This is a common feature for all three gridded datasets. RMSEs are displayed in Figure 9c, which shows that the RMSEs of the three datasets are the largest in winter with the value of 0.91 ◦C for CLDAS and 3.1 ◦C for both ERA5-Land and GLDAS. Figure 9d presents seasonal biases. Compared to station observations, temperature in all four seasons is underestimated in the three gridded datasets. The largest negative bias of CLDAS is −0.06 ◦C, which appears in autumn. The largest negative bias of ERA5-Land appears in winter with a value of −0.93 ◦C. GLDAS has the largest negative bias of −0.73 ◦C that appears in summer.

**Figure 9.** Seasonal changes of temperature during 2017–2019. (**a**): Seasonal average temperature, (**b**): COR; (**c**): RMSE; (**d**): BIAS.

#### *3.3. Evaluation over Subregions*

#### 3.3.1. Evaluation over Subregions Divided according to Climate Regimes

With reference to previous studies [40,41], China is divided into eight subregions for evaluation based on topographic and climatic characteristics. Figure 10 shows the eight subregions: subregion I (33◦–50◦ N, 72◦–105◦ E), subregion II (25◦–33◦ N, 76◦–95◦ E), subregion III (17◦–25◦ N, 95◦–105◦ E), subregion IV (25◦–33◦ N, 95◦–105◦ E), subregion V (7◦–30◦ N, 105◦–125◦ E), subregion VI (30◦–40◦ N, 105◦–125◦ E), subregion VII (40◦–55◦ N, 118◦–135◦ E), and subregion VIII (40◦–55◦ N, 105◦–118◦ E).
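Assigning stations to these subregions amounts to a bounding-box lookup, sketched below. Because adjacent boxes share edges, the sketch treats each box as closed at its lower bounds and open at its upper bounds and returns the first match; this tie-breaking rule and the approximate city coordinates are assumptions for illustration only.

```python
# (name, lat_min, lat_max, lon_min, lon_max) in degrees, as listed in the text.
SUBREGIONS = [
    ("I", 33, 50, 72, 105),
    ("II", 25, 33, 76, 95),
    ("III", 17, 25, 95, 105),
    ("IV", 25, 33, 95, 105),
    ("V", 7, 30, 105, 125),
    ("VI", 30, 40, 105, 125),
    ("VII", 40, 55, 118, 135),
    ("VIII", 40, 55, 105, 118),
]

def subregion_of(lat, lon):
    """Return the subregion name containing (lat, lon), or None if outside all boxes."""
    for name, lat_min, lat_max, lon_min, lon_max in SUBREGIONS:
        # Half-open intervals so a point on a shared edge matches exactly one box.
        if lat_min <= lat < lat_max and lon_min <= lon < lon_max:
            return name
    return None

# Approximate coordinates, used only to exercise the lookup.
lhasa = subregion_of(29.65, 91.10)
beijing = subregion_of(39.90, 116.40)
harbin = subregion_of(45.80, 126.50)
```

Stations falling outside every box return `None` and would need separate handling in a full evaluation.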

**Figure 10.** Subregions of China according to climate regimes.

Results of evaluation over climate regimes are listed in Table 3, which shows that the RMSEs of ERA5-Land and GLDAS are the largest in subregion II. This subregion is located in the Tibetan Plateau, where negative biases prevail. This is also the region with the largest negative bias. In contrast, the RMSE of CLDAS is the smallest in subregion II, where positive biases appear. The RMSEs of ERA5-Land and GLDAS are smaller in subregions V, VI and VII than in other subregions. Subregions V, VI and VII are located in eastern China, where the terrain is relatively flat. The RMSEs of the three datasets are larger in subregion IV than in other subregions, which is attributed to the fact that subregion IV is located in the transitional zone from the Tibetan Plateau to Sichuan Basin, where the terrain is extremely complex.


**Table 3.** Evaluation results over subregions of different climate regimes.

3.3.2. Evaluation over Administrative Regions

The province is the second-level administrative unit in China, which possesses certain geographical and human attributes. Most operational meteorological services and scientific research projects are conducted according to the territorial principle. Therefore, it is necessary to evaluate the gridded datasets from the perspective of the provinces. For this reason, all of the national automatic weather stations in China are grouped by province, and the evaluation metrics in each province are calculated individually. Results are listed in Table 4, which shows that, except for Tibet and Guizhou for GLDAS and Tibet for ERA5-Land, the CORs of the two gridded datasets with station observations are above 0.90, while the CORs of CLDAS are above 0.99 in all of the provinces. The RMSEs of CLDAS are below 1.0 ◦C in all of the provinces, except for Shanxi and Xinjiang, where the values are 1.012 ◦C and 1.088 ◦C, respectively. The RMSEs of ERA5-Land and GLDAS are below 3.0 ◦C in all of the provinces except for Gansu, Xinjiang, Qinghai, Guizhou, Sichuan and Tibet, and the largest RMSEs of the two datasets both occur in Tibet, with values of 7.86 ◦C and 6.292 ◦C, respectively. The provinces with negative biases in CLDAS, ERA5-Land and GLDAS account for 61%, 81% and 55%, respectively, of the total number of provinces in mainland China. The evaluation results show that the quality of CLDAS, ERA5-Land and GLDAS is significantly better in the eastern provinces than in the western provinces of China. Compared with ERA5-Land and GLDAS, CLDAS is closer to observations in each individual province. ERA5-Land is better than GLDAS in all of the provinces except for Sichuan, Qinghai and Tibet, where the biases of ERA5-Land are slightly larger than those of GLDAS.


**Table 4.** Evaluation results over provinces in mainland China.

#### **4. Discussion**

The present study reveals some important features that differ from previous studies [30,41,42]. For example, the biases of CLDAS, ERA5-Land and GLDAS at night are larger than those in the daytime, and all three datasets have negative biases in the nighttime. Monthly metrics of the three gridded datasets demonstrate certain regularities. From January to June, their correlations with station observations gradually decrease, and the biases increase. From July to December, the correlations gradually increase, and the biases decrease. Seasonal correlations of the three datasets with observations are the lowest in summer and the highest in autumn, while the correlation in winter is higher than that in spring. A similar assessment also found a monthly variation in the GLDAS evaluation results, but with the lowest deviation in August [43], which may be due to the different evaluation periods.

In addition, temperature variation is closely related to geographical location and terrain factors such as altitude and slope. The Integrated Nowcasting through Comprehensive Analysis (INCA) system [44] has been used for fine-grid simulation of temperature over complex terrain and compared with three other interpolation methods, including inverse distance weighting and ordinary Kriging [45]. Station altitude has a great impact on the results of all four gridding methods, and the error increases gradually with the elevation of the verification station. However, that work focused mainly on Zhejiang Province in eastern China [46]. Few reports provide a detailed evaluation with stations classified by terrain across the whole of China.

The topography in China is high in the west and low in the east, showing a staircase-like distribution with multiple terrain patterns and large mountainous areas. The 2265 observation stations used in this study are located at different elevations. The highest station is the Amdo Station in Tibet, with an elevation of 4800 m. The lowest station is the Turpan Station in Xinjiang, western China, with an elevation of −48.7 m. The evaluation at individual stations and over various regions indicates that the accuracy of the three datasets is, to a certain degree, related to topography. This is because surface air temperature in gridded datasets is simulated at each fixed grid cell, where the elevation is the grid-average value. However, the elevation of a weather station may not represent the average elevation of its surrounding area well, which can lead to biases in the gridded dataset. Next, we classify the observational stations by slope and elevation and discuss the influences of these two main terrain features on the accuracy of the gridded datasets.

#### *4.1. Impact of Terrain Elevation on the Accuracy of Gridded Dataset*

According to their elevations, the stations are divided into eight categories, i.e., elevation < 500 m, ≥500–1000 m, ≥1000–1500 m, ≥1500–2000 m, ≥2000–2500 m, ≥2500–3000 m, ≥3000–3500 m and ≥3500 m. Figures 11 and 12 show the RMSE and BIAS characteristics of the three gridded datasets at different elevations. The correlations of ERA5-Land and GLDAS with station observations both show a downward trend with increasing elevation, while their average biases gradually increase, with more severe underestimation, and the spread of biases among individual stations becomes wider. Compared to ERA5-Land and GLDAS, the CLDAS dataset is less affected by elevation.
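The eight elevation categories can be applied to station records as in the following sketch, which bins each station by elevation and averages a per-station bias within each bin. The label strings, function names and sample records are assumptions for illustration; only the class boundaries come from the text.

```python
from bisect import bisect_right

# Lower-inclusive class boundaries in metres, as listed in the text.
EDGES = [500, 1000, 1500, 2000, 2500, 3000, 3500]
LABELS = ["<500", "500-1000", "1000-1500", "1500-2000",
          "2000-2500", "2500-3000", "3000-3500", ">=3500"]

def elevation_class(elevation_m):
    """Map a station elevation to one of the eight categories."""
    # bisect_right puts a value equal to an edge into the upper class (>= edge).
    return LABELS[bisect_right(EDGES, elevation_m)]

def mean_bias_by_class(records):
    """records: iterable of (elevation_m, bias). Returns {class label: mean bias}."""
    sums = {}
    for elevation_m, bias in records:
        key = elevation_class(elevation_m)
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + bias, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical (elevation, bias) pairs.
records = [(100.0, -0.1), (200.0, -0.3), (4000.0, -2.0)]
summary = mean_bias_by_class(records)
```

The same grouping can be reused for RMSE or COR by passing a different per-station statistic.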

Several previous studies have also found that elevation differences between stations and model grids are a major reason for the biases in reanalysis datasets [47–49]. Specifically, weather stations over the Hengduan Mountain in western Sichuan are concentrated in the river valley, where the elevation is greatly different to the surrounding areas. Large cold biases in this area are found in gridded datasets because the station elevations there are lower than the heights of corresponding model grids. For those stations located at the top of mountains, their elevations probably are higher than the heights of model grids at the same place. As a result, warm biases are found at these stations in the gridded datasets. The above discussion indicates that the elevation correction of temperature in the gridded dataset can effectively reduce the biases and improve the applicability of the dataset [31,50,51]. In addition, possible input data errors, model system errors, and interpolation errors (from Gaussian grid to latitude-longitude grid) of the fusion system are also sources of biases.
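One simple form of the elevation correction mentioned above adjusts the gridded temperature by a constant lapse rate times the height difference between the model grid and the station. The sketch below assumes the standard-atmosphere lapse rate of 6.5 ◦C per km, which is not a value taken from this study; the cited works may use different or locally fitted lapse rates.

```python
# Assumed constant lapse rate (standard atmosphere), in degrees Celsius per metre.
LAPSE_RATE = 0.0065

def correct_to_station_elevation(t_grid, z_grid, z_station, lapse_rate=LAPSE_RATE):
    """Adjust a gridded temperature from the grid height to the station height.

    A station below the grid height (z_grid > z_station) is warmer than the grid
    value, so the correction is positive; a station above it gets a negative one.
    """
    return t_grid + lapse_rate * (z_grid - z_station)

# Hypothetical valley station lying 1000 m below the model grid height.
corrected = correct_to_station_elevation(t_grid=5.0, z_grid=3500.0, z_station=2500.0)
```

In this example the 1000 m height difference raises the gridded value by 6.5 ◦C, which illustrates how a valley station can appear to have a large cold bias before correction.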

**Figure 11.** RMSE changes with elevation for ((**a**): CLDAS, (**b**): ERA5-Land, (**c**): GLDAS) datasets.

**Figure 12.** BIAS changes with elevation for ((**a**): CLDAS, (**b**): ERA5-Land, (**c**): GLDAS) datasets.

#### *4.2. Impact of Slope on the Accuracy of Gridded Dataset*

According to the classification of slopes proposed by the International Geographical Union and the Geomorphological Mapping Committee for the application of detailed geomorphological maps [52], the slope grades are divided into: plain (0◦–0.5◦), slight slope (>0.5◦–2◦), gentle slope (>2◦–5◦), slope (>5◦–15◦), steep slope (>15◦–35◦), very steep slope or cliff (>35◦–55◦) and vertical slope (>55◦–90◦). Figures 13 and 14 display the RMSE and BIAS characteristics of the three datasets over different types of slope. The RMSEs and BIASs of ERA5-Land and GLDAS both increase with increasing slope, while the correlations of the two datasets with observations decrease. The underestimation of temperature in the two gridded datasets also gradually intensifies, with a wider spread of biases among individual stations. Compared with these two datasets, CLDAS is less affected by the terrain slope.
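The slope grading can be expressed as a lookup over upper-inclusive class boundaries, as sketched below. The boundaries follow the ranges given in the text, while the shortened label strings and the function name are assumptions for illustration.

```python
from bisect import bisect_left

# Upper-inclusive class boundaries in degrees: 0-0.5, >0.5-2, >2-5, >5-15, >15-35, >35-55, >55-90.
SLOPE_EDGES = [0.5, 2.0, 5.0, 15.0, 35.0, 55.0]
SLOPE_LABELS = ["plain", "slight", "gentle", "slope", "steep", "very steep", "vertical"]

def slope_class(slope_deg):
    """Map a slope angle in degrees to one of the seven grades."""
    if not 0.0 <= slope_deg <= 90.0:
        raise ValueError("slope must be between 0 and 90 degrees")
    # bisect_left keeps a value equal to an edge in the lower class (<= edge).
    return SLOPE_LABELS[bisect_left(SLOPE_EDGES, slope_deg)]

# Hypothetical station slopes in degrees.
grades = [slope_class(s) for s in (0.3, 0.5, 1.0, 10.0, 40.0, 60.0)]
```

Grouping per-station RMSE or BIAS by these grades then follows the same pattern as any other categorical aggregation.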

**Figure 13.** RMSE changes with slope for ((**a**): CLDAS, (**b**): ERA5-Land, (**c**): GLDAS) datasets.

**Figure 14.** BIAS changes with slope for ((**a**): CLDAS, (**b**): ERA5-Land, (**c**): GLDAS) datasets.

#### **5. Conclusions**

In the present study, the gridded temperature datasets (CLDAS, ERA5-Land and GLDAS) that have been widely used in mainland China are evaluated for a three-year period (2017–2019) on multiple time scales, from hours of the day to daily, monthly and seasonal scales. Spatially, the evaluation is conducted at single stations and over various climate regimes and administrative regions. The results indicate that the three gridded datasets can represent the near-surface air temperature in mainland China and realistically reflect the overall characteristics of temperature over major land areas of China. Compared to station observations, temperatures in the three datasets are all underestimated to varying degrees. The underestimation is most severe in ERA5-Land, followed by that in GLDAS. Overall, CLDAS exhibits the highest accuracy in mainland China, ERA5-Land the second highest, and GLDAS the lowest. However, note that the accuracies of ERA5-Land and GLDAS are only slightly different, and the two datasets demonstrate their own advantages and disadvantages in different regions.

In summary, differences in evaluation results can be attributed to various factors, including different resolutions of the gridded datasets, different remapping methods used to match gridded data with station observations, and different evaluation metrics. The present study compares the evaluation results of three gridded temperature datasets from different perspectives and finds that CLDAS has the highest accuracy in mainland China, followed by ERA5-Land, with GLDAS being the worst. However, the CLDAS dataset mainly covers China and the surrounding areas, whereas the ERA5-Land and GLDAS datasets are long-term, global datasets. Therefore, appropriate datasets should be selected according to the application.

**Author Contributions:** Conceptualization, X.H., S.H. and C.S.; methodology, X.H., S.H. and C.S.; validation, S.H.; data curation, S.H. and C.S.; writing—original draft preparation, X.H.; writing—review and editing, X.H., S.H. and C.S.; visualization, X.H.; supervision, S.H.; funding acquisition, S.H. and C.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Program of China (No. 2018YFC1506601), Key Techniques and Data Sets of Land Surface Reanalysis in Qinghai Xizang Plateau (No. NMICJY202106), Study on the Fusion of Precipitation and Soil Moisture with Multi-Source Data (No. 2011DFG23150), the Key Technology Development Project of Weather Forecasting (No. YBGJXM(2020)1A-08) and the Innovative Development Project of the China Meteorological Administration (No. CXFZ2021Z007).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Future Projection for Climate Suitability of Summer Maize in the North China Plain**

**Yanxi Zhao 1,2,3, Dengpan Xiao 1,2,3,\*, Huizi Bai 3, Jianzhao Tang <sup>3</sup> and Deli Liu <sup>4</sup>**


**Abstract:** Climate change has exerted and will continue to exert significant effects on social economy, natural environment, and human life. Research on the climatic suitability of crops is critical for mitigating and adapting to the negative impacts of climate change on crop production. In the study, we developed a climate suitability model of maize and investigated the climate suitability of summer maize during the base period (1981–2010) and two future periods of 2031–2060 (2040s) and 2071–2100 (2080s) in the North China Plain (NCP) based on the BCC-CSM2-MR model (BCC) from the Coupled Model Intercomparison Project Phase 6 (CMIP6) under two Shared Socioeconomic Pathways (SSP), SSP245 and SSP585. The phenological shift of maize under future climate scenarios was simulated by the Agricultural Production Systems Simulator (APSIM). The results showed that the root mean square errors (*RMSE*) between observations and projections for sunshine suitability (*SS*), temperature suitability (*ST*), precipitation suitability (*SP*), and integrated climate suitability (*SZ*) during the whole growth period were 0.069, 0.072, 0.057, and 0.040, respectively. Overall, the BCC projections for climate suitability were in good agreement with the observations in the NCP. During 1981–2010, the *SP*, *ST,* and *SZ* were high in the north of the NCP and low in the south. The *SP*, *ST,* and *SZ* showed a downward trend under all the future climate scenarios in most areas of the NCP, while the *SS* increased. The change range of *SP* and *SS* was 0–0.1 under all the future climate scenarios. The *ST* declined by 0.1–0.2 in the future, except for a decrease of more than 0.3 under the SSP585 scenario in the 2080s. The decrease in *SZ* in the 2040s and 2080s under both SSP scenarios varied from 0 to 0.2. Moreover, the optimum area decreases greatly under future scenarios while the suitable area increases significantly. Adjusting the sowing date (SD) would have essential impacts on climate suitability. To some extent, delaying the SD was beneficial for improving the climate suitability of summer maize in the NCP, especially under the SSP585 scenario in the 2080s. Our findings can not only provide data support for summer maize production to adapt to climate change but also help to propose agricultural management measures to cope with future climate change.

**Keywords:** adaptation; climate change; summer maize; phenology shift; GCM

#### **1. Introduction**

Over the past 100 years, global warming has become increasingly pronounced and is now one of the major issues affecting the sustainable development of human society [1]. Global warming has exerted a significant impact on the natural environment, social economy, and human life; its impact on agricultural production, on which human survival depends, has attracted widespread attention [2–9]. Generally, different crops have different demands for climate resources, and either an excess or a shortage of climate resources is not conducive to the normal growth and development of crops [10,11]. The quantitative variation of key climatic factors (i.e., sunshine hours, temperature, and precipitation) can be transformed into the climate suitability of crop growth and development based on the membership function method in fuzzy mathematics [12,13]. Moreover, crop climate suitability can play a role in predicting final yield [14,15]. The study of climate suitability not only helps to divide regions according to climate conditions into unsuitable, less suitable, suitable and optimum areas for crop growing [16,17], but also supports the rational planning of crop planting areas [17–20]. Moreover, the climate suitability differs among varieties of the same crop. For example, a comparison of the climate suitability of early-, medium- and late-maturity varieties of spring soybean in North China showed that the medium-maturity variety was the most suitable for planting [21]. The climate suitability model is a useful tool to investigate the sensitivity of crops to climatic factors such as temperature, precipitation, and sunshine, analyze the response mechanism of crops to climate change, and optimize the selection of crop varieties [22,23].

**Citation:** Zhao, Y.; Xiao, D.; Bai, H.; Tang, J.; Liu, D. Future Projection for Climate Suitability of Summer Maize in the North China Plain. *Agriculture* **2022**, *12*, 348. https://doi.org/10.3390/agriculture12030348

Academic Editor: Claudia Di Bene

Received: 20 January 2022 Accepted: 27 February 2022 Published: 28 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Climate change exerts an important impact on the crop growth process, and the study of future climate suitability is crucial for taking effective adaptive measures against its adverse effects on crop production. Global climate models (GCMs) are effective tools for exploring the mechanisms of climate change and projecting future climate, and their simulation results provide important data support for studying the impacts of future climate change on agricultural production at different scales [24]. The World Climate Research Program (WCRP) initiated the sixth phase of the Coupled Model Intercomparison Project (CMIP6). In the more than 20 years since CMIP began, CMIP6 has had the largest number of participating models, the most complete scientific experiment design, and the largest amount of model data [25]. Compared with earlier phases such as CMIP5, CMIP6 simulations of the climate system are closer to the observations and have less uncertainty, so the ability to simulate climate change has improved significantly [26,27]. For example, at the global scale, CMIP6 models generally simulate the trend of extreme climate better than CMIP5 models [28]. In this study, the BCC-CSM2-MR model (BCC) from CMIP6 was used to explore future crop climate suitability. In contrast with the previous versions of the BCC model from CMIP5, the physical mechanisms of BCC, such as atmospheric radiation and the deep convection process, were improved to make it more suitable for simulating climate distribution [29–31].

Maize is an important food and feed crop worldwide and has a considerable impact on the agricultural economy [32]. In China, maize is one of the three major food crops and ranks first among miscellaneous grain crops; it is widely distributed in Northeast, North, Northwest, and Southwest China [33]. The North China Plain (NCP) is an important grain production base in China, with maize production accounting for more than 30% of the country's total output [34]. Over the past few decades, climate change has had a significant impact on maize production in the NCP [35]. From 1980 to 2009, the contribution of climate change to maize yield reduction in the NCP was 15–30%, of which the reduction in solar radiation contributed 12–24% and the temperature increase contributed 3–9% [36]. Therefore, it is of great significance to study the climate suitability of maize in the NCP and evaluate the impact of climate resources on agricultural production. The evaluation of agricultural climate suitability helps to cope with the impact of climate change on maize production, use agricultural climate resources rationally, improve the level of agricultural management, and ensure the safety of agricultural production [14]. Tang and Liu [15] analyzed the spatial-temporal characteristics of maize climate suitability during the current and future periods in the NCP based on 30 CMIP5 GCMs. However, the phenology shift of maize in that study was obtained by calculating the active accumulated temperature, an approach that lacks a mechanistic basis. For the historical period, the observed records of maize phenology at the agro-meteorological stations should be used for developing the climate suitability model. In addition, a crop model can simulate crop growth under various environmental conditions and agricultural management measures. Therefore, this study used a crop model to simulate the future phenology of maize. The Agricultural Production Systems Simulator (APSIM) was selected to simulate the phenological shift of maize under future climate scenarios.

In the study, we mainly investigated the climate suitability of maize in the NCP under future climate scenarios on the basis of daily climate data of BCC from CMIP6 and future phenology conditions simulated by the APSIM model. The objectives of the study were (1) to develop the climate suitability model of maize based on the regional climatic conditions and fuzzy mathematics; (2) to analyze the spatial and temporal change characteristics of climate suitability for maize in the NCP under future scenarios; and (3) to evaluate the effect of adjusting sowing date on maize climate suitability.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The NCP (113.7–122.7° E, 32.9–40.5° N) is bounded in the east by the sea, in the west by the Taihang Mountains, in the south by the main stream of the Huaihe River, and in the north by the Yan Mountains, and contains approximately 1.4 × 10<sup>5</sup> km<sup>2</sup> of arable land (Figure 1) [37]. The region has a warm temperate monsoon climate with plenty of light and heat resources [37]. The mean annual temperature across the study area ranged from 9.6 to 16.0 °C over nearly fifty years [38]. The annual precipitation is not evenly distributed, with over 70% of precipitation falling from July through September. The main soil type in the NCP is the loam of Aeolian origin, a soil type deposited by rivers over geological periods. The NCP is an important grain production region in China, where the main cropping system is the winter wheat-summer maize double-cropping system [36]. Summer maize is usually planted in mid-June and harvested in September [39].

**Figure 1.** The spatial distribution of 52 meteorological stations in the North China Plain.

#### *2.2. Climate Data*

Historical daily climate records, including mean temperature (*Tmean*), maximum temperature (*Tmax*), minimum temperature (*Tmin*), precipitation (*Prec*), and sunshine hours (*Sh*) from 1981 to 2010 for 52 meteorological stations across the NCP, were obtained from the China Meteorological Administration (CMA).

Future climate scenario data were obtained from GCMs provided by the World Climate Research Program (WCRP) through the Coupled Model Intercomparison Project phase 6 (CMIP6, https://esgf-node.llnl.gov/search/cmip6/ (accessed on 20 February 2020)). CMIP6 scenarios combine the representative concentration pathways (RCPs) used in the CMIP5 simulations with shared socioeconomic pathways (SSPs) [40]. The SSPs describe alternative evolutions of future society under climate change and/or climate policy. SSPs 1 and 5 envision relatively optimistic trends for human development, with substantial investments in education and health, rapid economic growth, and well-functioning institutions [40]. However, SSP5 assumes an energy-intensive, fossil-based economy, while SSP1 assumes an increasing shift toward sustainable practices. Further details of the SSPs can be found in O'Neill et al. [40]. In our study, we focused on the fossil-fueled development trend (i.e., SSP5) combined with the highest forcing pathway (i.e., RCP8.5, a radiative forcing of 8.5 W m<sup>−2</sup> by 2100), defined as SSP5-8.5 (SSP585). Additionally, the combination of medium social vulnerability and medium radiative forcing, defined as SSP2-4.5 (SSP245), was used for comparison. SSP245 is the updated RCP4.5 scenario, in which the radiative forcing stabilizes at 4.5 W m<sup>−2</sup> in 2100. A GCM named BCC-CSM2-MR (BCC), developed by the Beijing Climate Center, China, was selected under SSP245 and SSP585 for the periods 2031–2060 (2040s) and 2071–2100 (2080s). In the study, the 2040s under SSP245, 2080s under SSP245, 2040s under SSP585, and 2080s under SSP585 were defined as S1, S2, S3, and S4, respectively.

The statistical downscaling method developed by Liu and Zuo [41] was used to generate daily climate data at each station from the monthly data of the BCC-CSM2-MR model (BCC). This method consists of two steps: spatial downscaling and temporal downscaling. The spatial downscaling transformed monthly gridded GCM data into monthly station data using inverse distance-weighted (IDW) interpolation. The formula is as follows:

$$S_i = \sum_{k=1}^{4} \left[ \frac{1}{d_{i,k}^{m}} \left( \sum_{j=1}^{4} \frac{1}{d_{i,j}^{m}} \right)^{-1} P_k \right] \tag{1}$$

where *S<sub>i</sub>* is the downscaled site-specific GCM projection at site *i*, *P<sub>k</sub>* is the GCM projection at cell *k*, *d<sub>i,k</sub>* (*d<sub>i,j</sub>*) is the distance between site *i* and the center of cell *k* (*j*), and *m* is the control parameter.
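As an illustration of Equation (1), the following minimal Python sketch computes the IDW estimate at one site from its four surrounding grid cells. The function name `idw_downscale`, the use of planar Euclidean distance, the default `m = 2`, and the handling of a site that coincides with a cell center are assumptions made for this example; the source does not state these details.

```python
import numpy as np

def idw_downscale(site_xy, cell_xy, cell_values, m=2.0):
    """Inverse-distance-weighted estimate at one site, following Eq. (1).

    site_xy     : (x, y) of the station (hypothetical planar coordinates)
    cell_xy     : coordinates of the surrounding grid-cell centers
    cell_values : GCM values P_k at those cells
    m           : control parameter (the default of 2 is an assumption)
    """
    site_xy = np.asarray(site_xy, dtype=float)
    cell_xy = np.asarray(cell_xy, dtype=float)
    cell_values = np.asarray(cell_values, dtype=float)

    # Distances d_{i,k} between the site and each cell center
    d = np.linalg.norm(cell_xy - site_xy, axis=1)
    if np.any(d == 0):
        # Assumption: a site exactly on a cell center takes that cell's value
        return float(cell_values[np.argmin(d)])

    w = 1.0 / d**m                      # unnormalized weights 1 / d^m
    return float(np.sum(w * cell_values) / np.sum(w))
```

With four equidistant cells the weights are equal, so the estimate reduces to the arithmetic mean of the four cell values.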

In the spatial downscaling process, the qq-mapping bias correction method was applied to correct the bias of the spatial downscaling data to match with the observations. The bias-corrected spatial downscaling data were calculated by

$$x_{k}^{f} = y_{i}^{o} + \frac{y_{i+1}^{o} - y_{i}^{o}}{x_{i+1}^{h} - x_{i}^{h}} \left(x_{k}^{r} - x_{i}^{h}\right), \quad x_{i}^{h} \le x_{k}^{r} \le x_{i+1}^{h} \tag{2}$$

where *x<sub>k</sub><sup>f</sup>* is the bias-corrected spatial downscaling data, *k* = 1, 2, ..., *n*; *y<sub>i</sub><sup>o</sup>* is the monthly observed data in the baseline period; *x<sub>i</sub><sup>h</sup>* is the GCM monthly data in the baseline period; and *x<sub>k</sub><sup>r</sup>* is the future GCM projected data before bias correction.
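Equation (2) is a piecewise-linear mapping between the sorted baseline GCM values and the sorted baseline observations. A minimal sketch, assuming both baseline series have the same length and are paired by rank, can rely on `numpy.interp`. The function name `qq_map` is hypothetical, and the clamping of values that fall outside the baseline range is a property of `numpy.interp`, not something stated in the source.

```python
import numpy as np

def qq_map(x_future, x_hist, y_obs):
    """Piecewise-linear quantile mapping in the spirit of Eq. (2).

    x_future : raw future GCM values x^r
    x_hist   : baseline GCM values x^h
    y_obs    : baseline observations y^o (same length as x_hist)
    """
    xh = np.sort(np.asarray(x_hist, dtype=float))
    yo = np.sort(np.asarray(y_obs, dtype=float))
    if xh.shape != yo.shape:
        raise ValueError("x_hist and y_obs must have the same length")

    # Linear interpolation between consecutive (x^h_i, y^o_i) pairs.
    # Note: np.interp holds the end values constant outside [xh[0], xh[-1]].
    return np.interp(np.asarray(x_future, dtype=float), xh, yo)
```

For example, with baseline GCM values 10, 20, 30 and observations 12, 24, 36, a raw future value of 15 lies halfway between the first two GCM values and is mapped halfway between the first two observations.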

The spatially downscaled monthly climate data for each station were then transformed to daily climate data using a modified stochastic weather generator (WGEN) [42].

#### *2.3. Phenology Data and Future Phenology Simulation*

The observed data of maize phenology, including sowing date (SD), flowering date (FD), and maturity date (MD) for 49 agro-meteorological stations across the NCP during 1981–2010, were obtained from the China Meteorological Administration (CMA). To investigate the climate suitability of maize at different growth periods, the whole growth period (WGP) was divided into two periods: the vegetative growth period from SD to FD (VGP) and the reproductive growth period from FD to MD (RGP). The phenology data of the adjacent agro-meteorological stations were used to calculate climate suitability for the three meteorological stations without phenology data. With future climate change, the phenology of maize across the NCP will also change. To calculate the climate suitability under future climate scenarios more accurately, the APSIM, a comprehensive model developed to simulate biophysical processes in agricultural systems [43,44], was selected to simulate the maize phenology under future climate scenarios. Generally, the APSIM model provides an acceptable prediction of crop productivity under the combined influences of climate change, soil condition, and management measures, and it has been widely used in agricultural research [36,45–48]. The APSIM model had been calibrated and validated against observed phenological data for the selected stations in a previous study [49]. In addition, SD adjustments were considered in the validated APSIM model to simulate the maize phenology shift under future climate scenarios. A total of five sowing dates were set up: the observed SD during the historical period (S\_Base), the observed SD advanced by 30 days (S\_A30) and 15 days (S\_A15), and the observed SD delayed by 15 days (S\_D15) and 30 days (S\_D30). The results can be used to evaluate the impacts of adjusting SD on the phenology and climate suitability of maize.
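The five sowing-date treatments can be written down as day offsets from the observed sowing date. The short sketch below only illustrates that bookkeeping; the helper `shifted_sowing_dates` and the example base date are hypothetical and are not part of the APSIM configuration used in the study.

```python
from datetime import date, timedelta

# Day offsets relative to the observed sowing date (negative = earlier sowing)
SOWING_SHIFTS = {
    "S_A30": -30,
    "S_A15": -15,
    "S_Base": 0,
    "S_D15": 15,
    "S_D30": 30,
}

def shifted_sowing_dates(base_sd):
    """Return the sowing date for each treatment given the observed SD."""
    return {name: base_sd + timedelta(days=shift)
            for name, shift in SOWING_SHIFTS.items()}

# Illustrative base date only (mid-June sowing is typical for the region)
scenarios = shifted_sowing_dates(date(2040, 6, 15))
```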

#### *2.4. Climate Suitability Model*

To characterize the climatic adaptability of maize to key climatic factors (i.e., sunshine hours, temperature, and precipitation) and integrated climatic conditions in the NCP, the climate suitability model was built with reference to related studies [12,50–52].

#### 2.4.1. Sunshine Suitability (*SS*) Model

The sunshine hours had a great influence on the growth and development of crops. The *SS* of maize was calculated as follows [23,52,53]:

$$S_S = \begin{cases} e^{-\left[(S_i - S_0)/r\right]^2}, & S_i < S_0 \\ 1, & S_i \ge S_0 \end{cases} \tag{3}$$

where *S*<sub>0</sub> is the daily sunshine hours when the percentage of daily sunshine hours reaches 70%; *S<sub>i</sub>* is daily sunshine hours (h); *r* is a constant that can be determined according to the climatic conditions across the NCP and relevant studies [52,54]. The values for *r* at different growth periods are shown in Table 1. *SS* at the VGP, RGP, and WGP are referred to as *SS\_VGP*, *SS\_RGP,* and *SS\_WGP*, respectively.
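Equation (3) can be sketched as a vectorized function. The function name `sunshine_suitability` is hypothetical, and the numeric values in the usage line are placeholders chosen for illustration only; they are not the Table 1 parameters.

```python
import numpy as np

def sunshine_suitability(s_i, s0, r):
    """Daily sunshine suitability following Eq. (3).

    s_i : daily sunshine hours (scalar or array)
    s0  : sunshine hours at which 70% of possible sunshine is reached
    r   : growth-period constant (see Table 1 in the source)
    """
    s_i = np.asarray(s_i, dtype=float)
    below = np.exp(-(((s_i - s0) / r) ** 2))   # branch for S_i < S_0
    return np.where(s_i < s0, below, 1.0)      # suitability is 1 once S_i >= S_0

# Placeholder inputs (not the study's parameters): S0 = 9 h, r = 5
example = sunshine_suitability([4.0, 9.0, 11.0], s0=9.0, r=5.0)
```

The suitability rises smoothly toward 1 as daily sunshine approaches *S*<sub>0</sub> and stays at 1 above it.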

**Table 1.** Values of related parameters for calculating the suitability of sunshine, temperature, and precipitation in vegetative growth period (VGP) and reproductive growth period (RGP) of summer maize.


#### 2.4.2. Temperature Suitability (*ST*) Model

*ST* is related to three base point temperatures at different growth periods of crops: the optimal temperature, the lower limit temperature, and the upper limit temperature of the crop life process. At the optimal temperature, crops grow quickly and well, while crops cease to grow and develop above the upper limit temperature or below the lower limit temperature [23]. The *ST* of maize was calculated as follows [23,52,53]:

$$S_T = \frac{\left(T_i - T_1\right)\left(T_2 - T_i\right)^{B}}{\left(T_0 - T_1\right)\left(T_2 - T_0\right)^{B}} \tag{4}$$

with

$$B = \frac{T_2 - T_0}{T_0 - T_1}$$

where *T<sub>i</sub>* is daily mean temperature (°C); *T*<sub>0</sub> is the optimal temperature (°C) at different growth periods; *T*<sub>1</sub> and *T*<sub>2</sub> are the lower limit temperature (°C) and the upper limit temperature (°C) during various growth periods. The specific values of *T*<sub>0</sub>, *T*<sub>1</sub>, and *T*<sub>2</sub> refer to the climatic conditions across the NCP and relevant studies [50,52,55] and are shown in Table 1. The *ST* during the VGP, RGP, and WGP were defined by *ST\_VGP*, *ST\_RGP*, and *ST\_WGP*, respectively.
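A minimal sketch of Equation (4) follows. Setting the suitability to zero outside the interval between the lower and upper limit temperatures is an assumption made here, based on the statement that crops cease to grow beyond those limits. The function name and the cardinal temperatures in the usage line are hypothetical placeholders, not the Table 1 values.

```python
import numpy as np

def temperature_suitability(t_i, t0, t1, t2):
    """Daily temperature suitability following Eq. (4).

    t_i : daily mean temperature (scalar or array, deg C)
    t0  : optimal temperature; t1 : lower limit; t2 : upper limit (t1 < t0 < t2)
    """
    t_i = np.asarray(t_i, dtype=float)
    b = (t2 - t0) / (t0 - t1)

    inside = (t_i > t1) & (t_i < t2)
    # Replace out-of-range values by t0 before exponentiation to avoid
    # raising a negative base to a fractional power; they are zeroed below.
    t_safe = np.where(inside, t_i, t0)
    s = ((t_safe - t1) * (t2 - t_safe) ** b) / ((t0 - t1) * (t2 - t0) ** b)

    # Assumption: suitability is 0 at or beyond the limit temperatures
    return np.where(inside, s, 0.0)

# Placeholder cardinal temperatures (not the study's parameters)
example = temperature_suitability([5.0, 20.0, 25.0, 30.0, 40.0], t0=25.0, t1=10.0, t2=35.0)
```

The curve equals 1 at the optimal temperature and falls toward 0 at both limits, more steeply on the side where the limit is closer to the optimum.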

#### 2.4.3. Precipitation Suitability (*SP*) Model

Crop growth benefits when precipitation matches the physiological water requirement of crops. *SP* was defined as the ratio of precipitation to the physiological water requirement when precipitation is less than the requirement, and as the inverse ratio when precipitation equals or exceeds it during the crop growth period. The *SP* of maize is calculated as follows [51]:

$$S_P = \begin{cases} R/R_0, & R < R_0 \\ R_0/R, & R \ge R_0 \end{cases} \tag{5}$$

where *R* is precipitation (mm); *R*<sub>0</sub> is the physiological water requirement of crops, which can be calculated as follows:

$$R_0 = K_c \cdot ET_0 \tag{6}$$

where *K<sub>c</sub>* is the crop coefficient and *ET*<sub>0</sub> is the reference crop evapotranspiration (mm). The *K<sub>c</sub>* values of maize during various growth stages were determined according to relevant studies [56,57] and are listed in Table 1. The *ET*<sub>0</sub> values are calculated based on the Penman–Monteith formula [57]:

$$ET_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma \frac{900}{T + 273} U_2 (e_s - e_a)}{\Delta + \gamma (1 + 0.34\,U_2)} \tag{7}$$

where Δ is the slope of the saturation vapor pressure–temperature curve (kPa °C<sup>−1</sup>); *T* is the daily mean temperature (°C); *R<sub>n</sub>* represents net radiation (MJ m<sup>−2</sup> d<sup>−1</sup>); *G* is the soil heat flux (MJ m<sup>−2</sup> d<sup>−1</sup>); *γ* is the psychrometric constant (kPa °C<sup>−1</sup>); *U*<sub>2</sub> is the wind speed 2 m above the ground (m s<sup>−1</sup>); *e<sub>s</sub>* and *e<sub>a</sub>* are the saturated vapor pressure and the actual vapor pressure (kPa) at temperature *T*, respectively. The *SP* at the VGP, RGP, and WGP were defined by *SP\_VGP*, *SP\_RGP,* and *SP\_WGP*, respectively.
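Equations (5)–(7) chain together: the Penman–Monteith formula gives the reference evapotranspiration, the crop coefficient scales it to a water requirement, and precipitation is compared with that requirement. The sketch below takes the intermediate meteorological quantities as inputs rather than deriving them. The function names and the numeric inputs are illustrative assumptions, and it is assumed that precipitation and *ET*<sub>0</sub> are totals over the same period.

```python
def reference_et0(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Reference evapotranspiration (mm per day) following Eq. (7)."""
    numerator = (0.408 * delta * (rn - g)
                 + gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator

def precipitation_suitability(r, kc, et0):
    """Precipitation suitability following Eqs. (5) and (6).

    r   : precipitation total over the period (mm)
    kc  : crop coefficient for the period
    et0 : reference evapotranspiration total over the same period (mm)
    """
    r0 = kc * et0                      # physiological water requirement, Eq. (6)
    if r0 <= 0:
        raise ValueError("water requirement must be positive")
    return r / r0 if r < r0 else r0 / r

# Illustrative daily inputs only (not data from the study)
et0_day = reference_et0(delta=0.122, rn=13.28, g=0.14, gamma=0.0666,
                        t_mean=16.9, u2=2.078, es=1.997, ea=1.409)
```

Because both a deficit and a surplus reduce the ratio, *SP* peaks at 1 when precipitation exactly meets the requirement.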

#### 2.4.4. Crop Climate Suitability during Different Crop Growth Stages

The sunshine and temperature suitability during different growth stages are calculated by the arithmetical average method according to the following formula:

$$S_c = \frac{1}{m} \sum_{i=1}^{m} S_{ci} \tag{8}$$

where *S<sub>c</sub>* represents *SS* (*ST*) at different growth stages of maize; *i* is the day index within each growth stage; *S<sub>ci</sub>* is the *SS* (*ST*) on day *i*; *m* is the total number of days in the corresponding growth stage.

#### 2.4.5. Integrated Climate Suitability (*SZ*) Model

Crop growth and development are jointly affected by sunshine, temperature, and precipitation. *SS*, *ST,* and *SP* each reflect the influence of only a single climatic factor on crop growth. Therefore, *SZ* is developed to reflect the combined impact of sunshine, temperature, and precipitation on maize growth and development. The formula is as follows:

$$S_Z = aS_S + bS_T + cS_P \tag{9}$$

Based on related studies and characteristics of crop growth and development in the NCP [50,52,58], the coefficients *a, b,* and *c* are taken as values 0.20, 0.32, and 0.48, respectively in this study. The *SZ* during the VGP, RGP, and WGP were defined by *SZ\_VGP*, *SZ\_RGP*, and *SZ\_WGP*, respectively.
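The stage averaging of Equation (8) and the weighted combination of Equation (9) can be sketched together. The weights are those given in the text (0.20, 0.32, and 0.48); the function names and the daily input values are hypothetical.

```python
import numpy as np

def stage_mean(daily_values):
    """Arithmetic mean of daily suitability over a growth stage, Eq. (8)."""
    return float(np.mean(np.asarray(daily_values, dtype=float)))

def integrated_suitability(s_s, s_t, s_p, a=0.20, b=0.32, c=0.48):
    """Integrated climate suitability S_Z, Eq. (9), with the weights from the text."""
    return a * s_s + b * s_t + c * s_p

# Hypothetical daily sunshine and temperature suitability for a short stage
ss_stage = stage_mean([0.4, 0.5, 0.6])        # mean of daily S_S
st_stage = stage_mean([0.7, 0.8, 0.9])        # mean of daily S_T
sz_stage = integrated_suitability(ss_stage, st_stage, s_p=0.6)
```

Since the three weights sum to 1, *SZ* stays within the 0–1 range whenever the three component suitabilities do, and precipitation carries the largest weight.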

#### *2.5. Spatial Interpolation*

In the study, the inverse distance weighting (IDW) in ArcGIS 10.3 was used to map the spatial distribution characteristics and to analyze spatial variation trends of *SS*, *ST*, *SP*, and *SZ* during the 2040s and 2080s compared with the baseline period (1981–2010) in the NCP.

#### **3. Results**

#### *3.1. Comparison between Observations and Simulations for Climate Suitability during 1981–2010*

Figure 2 shows that the Pearson correlation coefficient (*R*) calculated between observations and projections for *SS*, *ST*, *SP,* and *SZ* at different stages ranged from 0.10 to 0.46. Meanwhile, the root mean square error (*RMSE*) calculated between observations and projections for *SS*, *ST*, *SP,* and *SZ* at different stages was relatively small: the *RMSE* of *SS\_WGP*, *ST\_WGP*, *SP\_WGP,* and *SZ\_WGP* was 0.069, 0.072, 0.057, and 0.040, respectively. The BCC projection showed small uncertainty in interannual variation during 1981–2010 and displayed trends similar to those of the observed climate suitability (Figure 2). These results indicate that the BCC projections for climate suitability were in reasonable agreement with the observations and suitable for the assessment and prediction of climate suitability in the NCP.

**Figure 2.** Observed and simulated maize climate suitability for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) during 1981–2010 across the North China Plain.

#### *3.2. Temporal Changes of Climate Suitability under the Future Climate Scenarios*

The changes of BCC projected *SS*, *ST*, *SP,* and *SZ* in the 2040s and 2080s under SSP245 and SSP585 compared to the baseline period of 1981–2010 are shown in Figure 3.

**Figure 3.** Changes in projected maize climate suitability for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) during the 2040s and 2080s compared to 1981–2010 (S1, S2, S3, and S4 are the SSP245\_2040s, SSP245\_2080s, SSP585\_2040s, and SSP585\_2080s).

The change of BCC projected *SP\_VGP* under both scenarios in the 2040s compared to the baseline period was small (less than 0.01), while the decrease in BCC projected *SP\_VGP* in the 2080s was from 0.05 to 0.06 (Figure 3a). *SP\_RGP* in the 2040s and 2080s showed a uniform downward trend, and the range of decrease was 0.02–0.04 (Figure 3b). On the whole, the decline of *SP\_WGP* in the future compared to the baseline period was even more pronounced. The *SP\_WGP* under both scenarios in the 2040s decreased by 0.03–0.04, and the decline of *SP\_WGP* in the 2080s was up to 0.09–0.11 (Figure 3c).

The increase in BCC projected *SS\_VGP* under both scenarios during the 2040s and 2080s compared to the baseline period was from 0.06 to 0.09 (Figure 3d). The increase in BCC projected *SS\_RGP* was relatively small, less than 0.05 (Figure 3e). The overall rise of *SS\_WGP* for the future periods under SSP245 and SSP585 was 0.04–0.07 (Figure 3f). The change of *SS* in the future across the NCP represented a slight increasing trend.

The decrease in *ST\_VGP*, *ST\_RGP* and *ST\_WGP* during the SSP245\_2040s, SSP245\_2080s and SSP585\_2040s ranged from 0.1 to 0.3, while the decline of *ST\_VGP*, *ST\_RGP* and *ST\_WGP* in the SSP585\_2080s was between 0.3 and 0.5 (Figure 3g–i). The downtrend of the *ST* during the 2040s and 2080s across the NCP was significant, especially under SSP585\_2080s.

The *SZ* was affected by sunshine, temperature, and precipitation together. The variation trend of *SZ* for the future across the NCP was consistent with the *ST*, while the magnitude of decrease for the *SZ* was smaller than the *ST* on account of the *SS* and the *SP*. The decline of *SZ\_VGP*, *SZ\_RGP,* and *SZ\_WGP* during the SSP245\_2040s, SSP245\_2080s, and SSP585\_2040s was less than 0.1, and the *SZ\_VGP*, *SZ\_RGP,* and *SZ\_WGP* in the SSP585\_2080s decreased by 0.1–0.2 (Figure 3j–l).

#### *3.3. Spatial Distribution of Climate Suitability in the Baseline Period (1981–2010) and Future Periods (2040s and 2080s)*

The spatial distribution of the *SS*, *ST*, *SP,* and *SZ* during the baseline period (1981–2010) in the NCP is shown in Figure 4. *SP\_VGP* in the northwest of the NCP was higher than 0.6, while *SP\_VGP* in the southeast of the NCP was lower than 0.6 (Figure 4a). *SP\_RGP* was higher than 0.6 in most areas of the NCP (Figure 4b). The areas with *SP\_WGP* lower than 0.7 were distributed in the southeast of the NCP, and *SP\_WGP* in the northwest of the NCP exceeded 0.7 (Figure 4c). The spatial difference of *SS* across the NCP was relatively small. The values of *SS\_VGP*, *SS\_RGP,* and *SS\_WGP* during 1981–2010 in most areas of the NCP ranged from 0.4 to 0.6 (Figure 4d–f). The *ST* across the NCP was low in the south and high in the north. The ranges of *ST\_VGP*, *ST\_RGP,* and *ST\_WGP* in Hebei and Shandong during 1981–2010 were 0.8–0.9, 0.6–0.7, and 0.7–0.8, respectively, while the values of *ST\_VGP*, *ST\_RGP,* and *ST\_WGP* in the south of the NCP were lower than those in the north (Figure 4g–i). The *SZ* was likewise high in the north and low in the south. *SZ\_VGP* and *SZ\_WGP* in most areas were higher than 0.6, and the values of *SZ\_RGP* ranged from 0.5 to 0.7 (Figure 4j–l).

**Figure 4.** The spatial distribution of maize climate suitability for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) during the baseline period (1981–2010) in the North China Plain.

The spatial change characteristics of climate suitability in the NCP under the SSP245 scenario during the 2040s and 2080s are shown in Figures S1 and S2. Relative to the baseline period, *SP\_VGP* increased by 0–0.1 in the central region of the NCP during the 2040s and 2080s (Figures S1a and S2a). *SP\_RGP* and *SP\_WGP* decreased by 0–0.1 in most areas of the NCP for the two future periods (Figures S1b,c and S2b,c). The *SS* increased by 0–0.1 in most parts of the NCP, except for *SS\_RGP* in the 2040s (Figures S1d–f and S2d–f). In contrast to the *SS*, the *ST* mainly declined by more than 0.1 across the NCP in the 2040s and 2080s, and the decrease in *ST\_RGP* during the 2080s exceeded 0.2 (Figures S1g–i and S2g–i). Similar to the *ST*, the *SZ* decreased in most parts of the NCP. However, the extent of decline for the *SZ* was significantly smaller than that of the *ST* (Figures S1j–l and S2j–l).

The spatial change characteristics of climate suitability in the NCP under the SSP585 scenario in the 2040s and 2080s are presented in Figures 5 and 6. Compared to the baseline period, the spatial change characteristic of *SP* under the SSP585 scenario coincided with that under the SSP245 scenario. In addition, the decline for *SP\_WGP* in the north of the NCP during the 2080s surpassed 0.1 (Figures 5a–c and 6a–c). The *SS* increased by 0–0.1 at most parts of the NCP, while the increase in *SS* in the south of the NCP exceeded 0.1 (Figures 5d–f and 6d–f). The decreasing trend of *ST* under the SSP585 scenario was more significant than that under the SSP245 scenario. The *ST* decreased by 0.1–0.3 during the 2040s across the NCP (Figure 5g–i). Furthermore, the magnitude of decline was over 0.3 during the 2080s (Figure 6g–i). The change characteristic of *SZ* under SSP585 in the 2040s was consistent with that of *SZ* under SSP245 in the 2040s and 2080s, with a decrease of 0–0.1 (Figure 5j–l). The range of decline for the *SZ* under SSP585 in the 2080s was up to 0.1–0.2 (Figure 6j–l).

**Figure 5.** The spatial distribution of the change of maize climate suitability for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) under SSP585 during the 2040s compared to the baseline period (1981–2010) in the North China Plain.

**Figure 6.** The spatial distribution of change of maize climate suitability for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) under SSP585 during the 2080s compared to the baseline period (1981–2010) in the North China Plain.

#### *3.4. Regional Division of Climate Suitability for Maize in the Baseline Period (1981–2010) and Future Periods (2040s and 2080s)*

According to the statistical principle and referring to the expression of possibility in the fourth assessment report of the Intergovernmental Panel on Climate Change (IPCC), the climate suitability of maize planting area was set as four grades: unsuitable area (CS < 0.05, unsuitable area for maize growing), less suitable area (0.05 ≤ CS < 0.33, less suitable area for maize growing), suitable area (0.33 ≤ CS < 0.66, suitable area for maize growing) and optimum area (CS ≥ 0.66, optimum area for maize growing) [16,17].
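The four-grade division can be expressed as a simple threshold lookup. The thresholds (0.05, 0.33, and 0.66) are those stated above; the function name `suitability_grade` and the returned labels are choices made for this sketch.

```python
def suitability_grade(cs):
    """Map a climate suitability value CS (0 to 1) to one of the four grades."""
    if cs < 0.05:
        return "unsuitable"
    if cs < 0.33:
        return "less suitable"
    if cs < 0.66:
        return "suitable"
    return "optimum"

# Example: grade a few hypothetical station values
grades = [suitability_grade(v) for v in (0.04, 0.05, 0.50, 0.66)]
```

Each lower bound is inclusive, so a value exactly at 0.66 falls in the optimum grade.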

The regional division of climate suitability for maize during the baseline period (1981–2010) in the NCP is shown in Figure 7a. As an important maize production base in China, the NCP had a climate that was generally conducive to the growth of maize. The optimum area and the suitable area each accounted for about 50%. The optimum area is distributed in the north of the NCP, and the suitable area is mainly distributed in the south (Figure 7a). This is similar to the research results of He and Zhou [17]. The regional division of climate suitability for maize under four future scenarios in the NCP is shown in Figure 7b–e. With future climate warming, the overall climate suitability in the NCP shows a downward trend. The optimum area will decrease greatly under the four future scenarios in the NCP, while the suitable area will increase significantly (Figure 7b–e).

**Figure 7.** Regional division of climate suitability for maize during the baseline period (1981–2010) (**a**) and four future scenarios (**b**–**e**) in the NCP.

#### *3.5. Effects of Sowing Date Adjustment on the Climate Suitability of Maize in the NCP*

Adjustment of SD has a significant impact on the phenology of maize: the FD and MD of maize changed with the adjustment of SD (Figure S3). Moreover, the climate suitability was closely related to the growth periods, so the climate suitability differed significantly among the various SDs (Figure 8). The *SS*, *ST*, *SP,* and *SZ* at the VGP declined with the delay of SD under all future scenarios, while the climate suitability went up with the advance of SD (Figure 8a,d,g,j). Additionally, the climate suitability for the RGP, except the *SP*, increased with the delay of SD (Figure 8b,e,h,k). Overall, the delay of SD can effectively increase the climate suitability at the WGP, especially in S4 (Figure 8c,f,i,l). The adjustment of SD changed the maize growth period, and climatic resources differed greatly among the growth periods of maize. The temporal changes of the main climate factors (precipitation, sunshine duration, mean temperature, maximum temperature, minimum temperature) from May to October in the baseline period, S1, S2, S3, and S4 are shown in Figure 9. The climate factors varied greatly from month to month, which contributed to the change of climate suitability for different SDs.

**Figure 8.** The climate suitability of maize for the vegetative growth period (**a**,**d**,**g**,**j**), reproductive growth period (**b**,**e**,**h**,**k**) and whole growth period (**c**,**f**,**i**,**l**) under different sowing dates in the NCP during the 2040s and 2080s under SSP245 and SSP585 (S1, S2, S3, and S4 are the SSP245\_2040s, SSP245\_2080s, SSP585\_2040s, and SSP585\_2080s).

**Figure 9.** The temporal change of precipitation (**a**), sunshine duration (**b**), maximum temperature (**c**), minimum temperature (**d**), and mean temperature (**e**) from May to October in the North China Plain during the baseline period, the 2040s and 2080s.

#### **4. Discussion**

Climate change has a significant impact on the growth process and yield formation of maize in China. In recent decades, climate warming has not only brought forward the flowering date and maturity date but also shortened the RGP and WGP [46,59,60]. With future climate change, the phenology of maize will change further [61]. To better study the influence of future climate suitability on maize, the phenology shift of maize should also be taken into account. A previous study analyzed the spatial-temporal characteristics of maize climate suitability in the future period across the NCP, but obtained the phenology of maize under future scenarios by calculating the active accumulated temperature [15]. The process of crop growth is complicated, however, and determining the crop growth period from accumulated temperature alone lacks a mechanistic basis. Consequently, we used the APSIM model to simulate the flowering date and maturity date of maize. The flowering date and maturity date simulated by the APSIM model under future scenarios in our study show a significant advance across the NCP (Figure S4).

Overall, the *ST* across the NCP during the baseline period was higher than 0.5, while the values of *ST\_VGP* and *ST\_WGP* in the north of the NCP exceeded 0.7 (Figure 4g–i). However, the decrease in *ST\_VGP*, *ST\_RGP,* and *ST\_WGP* in the future was remarkable; in particular, the decline of *ST\_VGP*, *ST\_RGP,* and *ST\_WGP* under SSP585\_2080s was between 0.3 and 0.5 (Figure 3g–i). This may be largely associated with rising temperature. As shown in Figure 9c–e, the mean temperature, maximum temperature, and minimum temperature across the NCP will increase markedly under the four future scenarios compared to the baseline period. Warmer temperatures can improve crop growth until a threshold is reached, beyond which yields diminish abruptly [62,63]. Furthermore, the sensitivity of crops to extreme temperature differs among growth stages [64] and is particularly high during the reproductive growth period [65]. Heat stress during the reproductive growth period can affect pollination, reduce male fertility and seed quality, and ultimately lead to the loss of kernel weight and yield [65–67]. Our results indicated that *ST\_RGP* was lower than *ST\_VGP* during 1981–2010 across the NCP (Figure 2g–h), while the decrease in *ST\_RGP* was significantly larger than that in *ST\_VGP* under future scenarios (Figure 3g–h). The risk of high temperature for summer maize during future periods in the NCP will become an important field of climate change-related research [65].

In order to ensure crop yield, some adaptive agricultural management measures should be taken to counteract the adverse effects of climate change, including variety renovation, adjustment of sowing date, improvement of fertilization and irrigation conditions, and so on [68,69]. For example, the renovation of maize varieties delayed the heading date and maturity date and prolonged the whole growth period at more than 90% of stations in China, while appropriate late sowing can also prolong the whole growth period [5]. These adaptive agricultural management measures offset the impacts of climate change on maize to some extent and further ensured maize yield [5]. In this study, we investigated the climate suitability of maize for different sowing dates in the NCP during the 2040s and 2080s under SSP245 and SSP585 (Figure 8). Compared to the observed SD during the historical period, the *SS*, *ST*, *SP,* and *SZ* for the VGP declined with the delay of SD in the future, while those for the RGP, except the precipitation suitability, increased with the delay of SD (Figure 8a,b,d,e,g,h,j,k). On the whole, the delay of SD can effectively increase the climate suitability during the whole growth period of maize (Figure 8c,f,i,l). Adjusting the SD changes the overall growth process of maize: the FD and MD will be postponed when the SD is delayed (Figure S3). The grain filling stage of maize is sensitive to high temperatures. The delay of SD can postpone the grain filling stage to a relatively cool period, which can reduce heat stress on maize to a certain extent [37,70]. Our study indicated the temporal changes of main climate factors from May to October during the present period, the 2040s and 2080s, in the NCP. In the future period, the increase in temperature and sunshine hours in September and October will better meet the growing demand for sunshine and temperature resources in the late growth period of maize and help avoid high-temperature damage.

Based on the global climate model (GCM), we can investigate the climate suitability under future scenarios and further deal with the risks of ecological environment protection and social development from climate change. Compared with CMIP5, CMIP6 models have a certain degree of improvement and development in terms of resolution, physical parameterization, experimental design, and simulation computing capability. The simulation of the climate models from CMIP6 was closer to the observed value, while the uncertainty of simulation was smaller and the accuracy of the simulation was higher [26–28]. Taking the study of extreme climate indices as an example, the climate models from CMIP6 had stronger effects on the extreme temperature indices and extreme precipitation indices than the climate models from CMIP5, which can well reproduce the changing trend of the extreme climate indices [71,72]. In our study, the BCC-CSM2-MR model from CMIP6 processed by the statistically downscaled method developed by Liu and Zuo [41] was used to estimate the spatio-temporal variation characteristics of future climate suitability under different climate scenarios. The statistically downscaled method could reproduce the climate statistics at multiple time scales for historical periods and correct the stationary errors effectively [44], while the statistically downscaled data using the method has been applied in previous research [38,73,74].

The study of climate suitability can be used to cope with the impact of climate change on crop production. Based on the climate suitability model, we can not only make specific divisions of regions [17–20] but also optimize the selection of crop variety [21–23]. We set the regional division of climate suitability for maize in the baseline and future periods according to the statistical principle and referring to the expression of possibility in the IPCC4 (Figure 7a). The regional division of climate suitability during 1981–2010 in the study was close to the results of relevant research [17]. With climate warming, *ST* and *SZ* will decrease significantly under the four future scenarios, which changes the regional division of climate suitability (Figure 7b–e). Under the four future scenarios, most of the optimum area in the NCP will change into suitable area (Figure 7b–e). In the future, the adoption of specific adaptations or mitigation measures against the risk of heatwaves needs to be taken seriously [70]. Moreover, there was a certain correlation between climate suitability and climatic yield [14,15]. Climate suitability, which combines climate information and phenology information, can be used as an improved climate index in yield prediction models to improve prediction performance.

#### **5. Conclusions**

The study of crop climate suitability can enhance the ability to cope with the impact of climate change on crop production. The study developed a climate suitability model of maize and investigated the climate suitability of maize under the historical period and two future periods in the NCP based on the BCC-CSM2-MR model from CMIP6. The APSIM model was used to simulate the phenology data of maize under future climate scenarios to improve accuracy and reliability. The results showed that the BCC projections for climate suitability were in reasonable agreement with the observations. In 1981–2010, the *SP*, *ST*, and *SZ* were high in the north and low in the south. The *SP*, *ST,* and *SZ* decreased under all the future climate scenarios in most areas of the NCP, while the *SS* presented an increasing trend. Therefore, the optimum area decreases greatly under the four future scenarios in the NCP, while the suitable area increases significantly. Moreover, the delay of SD can effectively increase the climate suitability during the whole growth period, especially under the SSP585 scenario in 2071–2100. Thus, the adjustment of SD had essential impacts on the climate suitability, which is advantageous for adapting to climate change and promoting agricultural production in the NCP.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture12030348/s1, Figure S1: The spatial distribution of the change of maize climate suitability for the vegetative growth period (a,d,g,j), reproductive growth period (b,e,h,k) and whole growth period (c,f,i,l) under SSP245 during 2040s compared to the baseline period (1980–2010) in the NCP, Figure S2: The spatial distribution of the change of maize climate suitability for the vegetative growth period (a,d,g,j), reproductive growth period (b,e,h,k) and whole growth period (c,f,i,l) under SSP245 during 2080s compared to the baseline period (1980–2010) in the NCP, Figure S3: The simulated flowering date (a) and maturity date (b) of maize for different sowing dates in the NCP during 2040s and 2080s under SSP245 and SSP585, Figure S4: The simulated changes of flowering date and maturity date across the North China Plain during 2040s and 2080s under SSP245 and SSP585 compared to the baseline (1981–2010).

**Author Contributions:** Conceptualization, D.X.; methodology and data analysis, Y.Z.; writing of original draft preparation, Y.Z.; providing the future climate scenario data, D.L.; writing of review and editing, D.X., H.B., J.T. and D.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** National Natural Science Foundation of China, grant/award number: 41901128.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The climatological data that support the findings of this study during manuscript preparation were available from the corresponding author on reasonable request. The detailed access to the data: The historical records about daily climate data from China's Meteorological Administration (CMA) (http://data.cma.cn/, accessed on 1 February 2020); Future climate scenario data were provided by the World Climate Research Program (WCRP) of Coupled Model Inter-comparison Project phase 6 (CMIP6, https://esgf-node.llnl.gov/search/cmip6/ (accessed on 20 February 2020)).

**Acknowledgments:** We are grateful to the NSW Department of Primary Industries, Australia, for providing the statistically downscaled climate data for the BCC-CSM2-MR model (BCC) from the Coupled Model Intercomparison Project phase 6 (CMIP6).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Projections of Drought Characteristics Based on the CNRM-CM6 Model over Africa**

**Isaac Kwesi Nooni 1,2,3, Daniel Fiifi Tawia Hagan 2,\*, Waheed Ullah 2, Jiao Lu 2, Shijie Li 2, Nana Agyemang Prempeh 4, Gnim Tchalim Gnitou 1 and Kenny Thiam Choy Lim Kam Sian 1,3**


**Abstract:** In a warming climate, drought events are projected to increase in many regions across the world, which would have detrimental impacts on water resources for agricultural activity and human life. Thus, projecting drought changes, especially the frequency of future drought events, is very important for the African continent. This study investigates the future changes in drought events based on the France Centre National de Recherches Météorologiques (CNRM-CM6) model in the Coupled Model Intercomparison Project phase six (CMIP6) datasets for four shared socioeconomic pathways (SSP): SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5; and three time slices: near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099), relative to a historical baseline period (1995–2014). The interannual variability and trends of the self-calibrating Palmer Drought Severity Index (scPDSI) based on the Penman–Monteith methods for measuring potential evapotranspiration (PET) are used to estimate future droughts. The temporal analysis shows that the drought frequency, intensity, and affected area will increase throughout the 21st century. Among the scenarios, SSP3-7.0 and SSP5-8.5 project a larger upward trend in drought characteristics than SSP1-2.6 and SSP2-4.5. The spatial pattern shows drought frequency decreases in humid regions and increases in non-humid regions across Africa. For all SSP scenarios, the projected wetting trend per decade ranges from 0.05 to 0.25, while the drying trend per decade ranges from −0.05 to 0.25. A regional trend analysis revealed key differences in spatial pattern, with varied trend projections of wetter and drier conditions in humid and non-humid regions under all SSP scenarios. Drier conditions are expected to intensify in Southern Africa under all SSP scenarios but are projected to be more intense under SSP3-7.0 and SSP5-8.5. In general, the projected wetter trends in humid areas may favor agricultural production and ecological conservation, and drier trends in non-humid regions may call for the possible adoption of tailor-made drought adaptation strategies and development programmes to minimize impacts.

**Keywords:** CNRM-CM6; PET; climate change; IPCC-AR6; SSP scenarios

#### **1. Introduction**

Under a warming climate, the frequency of droughts is expected to increase in many regions due to the increase in projected temperature (TEMP) [1]. Drought is a complex natural process with adverse effects that ripple through multiple sectors of society, especially water resources for agricultural activities and human livelihood [2]. Droughts may be classified as meteorological, hydrological, agricultural, or socio-economic, based on their physical characteristics (see [2,3] for more details).

**Citation:** Nooni, I.K.; Hagan, D.F.T.; Ullah, W.; Lu, J.; Li, S.; Prempeh, N.A.; Gnitou, G.T.; Lim Kam Sian, K.T.C. Projections of Drought Characteristics Based on the CNRM-CM6 Model over Africa. *Agriculture* **2022**, *12*, 495. https://doi.org/10.3390/agriculture12040495

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 25 February 2022 Accepted: 30 March 2022 Published: 31 March 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

An increase in TEMP is expected to significantly affect hydrological processes [1,4] and may considerably change regional climates, leading to more frequent extreme events (e.g., droughts, heat stress) [1]. Skoulikaris et al. [4], among others, investigated the heat stress on agriculture due to climate change. Drought indices are commonly used to quantify drought events at any spatial–temporal scale. According to the World Meteorological Organisation (WMO), there are over 50 indices designed to compute droughts [2,3]. However, the most widely used and recommended drought indices are the Standardized Precipitation Index (SPI) [5], the Standardized Precipitation Evapotranspiration Index (SPEI) [6], the Palmer Drought Severity Index (PDSI) [7], and the self-calibrating PDSI (scPDSI) [8]. In a historical context, a wide range of studies based on independent observation data sets has examined drought indices and characterized drought parameters, such as the frequency, intensity, and spatial extent over the last three decades [2]. Among the indices mentioned above, the SPEI and the PDSI/scPDSI are the most widely used because their design incorporates two key components of the water cycle—precipitation (i.e., water supply) and evapotranspiration (i.e., water demand)—to represent a drought condition [2,3].

When comparing SPEI and scPDSI, the latter is preferred due to its better physical representation of drought conditions in tropical regions, particularly in non-humid regions [9–12]. The advantage of using scPDSI, for example in a tropical region such as Africa, is its ability to represent drought conditions in non-humid regions (e.g., the Saharan or Kalahari deserts) and humid regions (e.g., the equatorial regions of Africa). In scPDSI, the change in the water balance is based on the difference between precipitation (PRE) and potential evapotranspiration (PET) together with parameters related to the soil/surface characteristics at each geographic region [8]. For example, the response to actual ET in water-limited regions (such as arid or semi-arid climates) is related to PRE changes rather than PET. On the contrary, in an energy-limited region (such as the equatorial region of Africa), PET, rather than PRE, is a driver of actual ET changes [13,14]. Thus, the scPDSI has been suggested by many studies [8,15,16], despite the index not being multi-scalar [15]. However, characterizing drought events at an interannual scale makes scPDSI comparable to SPEI at similar timescales [13], thus improving our understanding of drought events over the past century [16,17] and those documented in [8,16,18,19]. Recently, the scPDSI based on the Penman–Monteith method for calculating PET provided reasonable estimates of drought characteristics over the tropical climates of Africa compared to the Thornthwaite method [20], a pattern which is well documented in historical drought study literature [15,17,21].

Good knowledge of the evolution of drought characteristics in the near and distant future can assist in early and efficient preparation for a drought event. Outputs from global climate models (GCMs) used in the Coupled Model Intercomparison Project (CMIP) framework allow us to understand the evolution of the climate under different emission scenarios [1]. Many studies have delved into drought characteristics using older CMIP versions. The effect of climate warming on drought intensifications (aridity) is also well studied and reported in [1,18,19,22]. The release of the new CMIP datasets [21,23] with improved quality and resolution makes further drought studies [24] of great interest, as updates in CMIP6 large-scale physics and dynamics are expected to introduce differences in how they perform in different climate regions. A typical example is presented by Voldoire et al. [25], where updates of several schemes, such as those in the France Centre National de Recherches Météorologiques (CNRM-CM) model, improved the simulation outputs of tropical climates, which is of great interest to Africa's climate studies.

Also, an understanding of interannual variability and long-term changes in future droughts is further motivated by recent studies [26,27], which demonstrate potential shifts in climate zones under a future global warming scenario. According to these studies, different climate regions are likely to be influenced by a warming climate at the end of the century. This could suggest that in a future climate, an altered energy/water-limited regional response to actual ET will be related to PRE (PET) changes rather than PET (PRE) [14,28]. The new demarcation of the African sub-regions based on climate zones largely puts this study into proper context [29], thus providing meaningful information needed to achieve effective regional drought mitigation strategies under climate warming. However, knowledge of the interannual variability and long-term changes of future droughts based on the CMIP6 data is limited. A review of previous studies showed that a considerable number of studies have used CMIP3 or CMIP5 to study drought in different African regions [30–34].

Given the multiplicity of global datasets and improved data representation, many studies have adopted the multi-model ensemble (MME) technique to study drought events. On the other hand, single model studies have gained significant attention and have advanced in recent times [25,35–40]. Most single model studies compared two different versions of the same model [25,36–40]. Unlike those studies, the focus of the present study is to examine the climatology of drought events and their parameters based on the CNRM-CM6 GCM. This study follows a related study [35] that examined the future ET climatology for different SSP scenarios using the CNRM-CM6 model across climate regions and indicated that ET variability may influence the distribution of extreme events, such as droughts, in both space and time, especially across Africa. This study investigates the temporal variability of future drought characteristics under four emissions scenarios. Moreover, the spatial pattern of drought event frequency and the wetting and drying trends from the CNRM-CM6 model are examined using the scPDSI, with PET computed by the Penman–Monteith (PM) method, to identify droughts. The choice of scPDSI to represent future drought is documented in CMIP5- and CMIP6-based drought analyses [1,24,26].

The remainder of the paper is structured as follows: Section 2 describes the study area, and introduces the data and methods used in the study. Section 3 presents the results of the projected change in drought characteristics and projected trends in wetting and drying conditions. The discussions are presented in Section 4 and the conclusion in Section 5.

#### **2. Data and Methods**

#### *2.1. Study Area*

Africa is located between 32° N and 35° S and 14° W and 52° E (Figure 1). The entire African land area is nearly 30.37 million km<sup>2</sup>, and the equator divides the continent into two, with more states in the Northern than in the Southern Hemisphere. Africa is the second-largest continent after Asia, in land size and population growth. Its vulnerability to climate variability is highly noticeable when extreme events occur, as three-quarters of the continent's Gross Domestic Product is heavily dependent on rain-fed agriculture, which is tied to climate variability [41]. As climate change is expected to reshape the spatiotemporal pattern of climate zones in the future climate [26,27], we present a Köppen–Geiger map overlaid on the latest Intergovernmental Panel on Climate Change (IPCC) regional demarcations for the Africa region [29] (Figure 1). The IPCC regional demarcation for Africa [29] comprises seven regions: the Sahara (SAH), West Africa (WAF), Central Africa (CAF), Northern East Africa (NEAF), Southern East Africa (SEAF), Western South Africa (WSAF), and Eastern South Africa (ESAF) (Figure 1), which are also used in [42,43].

**Figure 1.** Map of Köppen–Geiger climate classification for 2071–2100 [27] overlaid with the updated IPCC sub-regions for Africa climate studies [29]. Abbreviations: (I) SAH: Sahara, (II) WAF: West Africa, (III) CAF: Central Africa; (IV) NEAF: Northern East Africa, (V) SEAF: Southern East Africa; (VI) WSAF: Western South Africa, (VII) ESAF: Eastern South Africa.

#### *2.2. Data*

Following the recommendation from a previous study of the region [35], the present study uses the France Centre National de Recherches Météorologiques (CNRM-CM6) dataset produced by [25,44]. The dataset is jointly developed by the Centre National de Recherches Météorologiques–Groupe d'Étude de l'Atmosphère Météorologique (CNRM-GAME) and the Centre Européen de Recherche et de Formation Avancée en Calcul Scientifique (CERFACS). In CNRM, the atmosphere model is represented by ARPEGE-Climat (v6.3) [45], which incorporates the land surface scheme ISBA-CTRIP [46,47]. Lake areas use the revised FLAKE model, which is incorporated in the SURFEX v8.0 [48] externalized surface system and is fully coupled with the NEMO version 3.6 ocean model [49]. The GELATO sea ice scheme [50], through the OASIS-MCT coupling system [51,52], and the Total Runoff Integrating Pathways (TRIP) river routing scheme [53] were also used. An output system called the XIOS server is added to the system to allow online output processing [54]. Readers are directed to Voldoire et al. [25] for more details of the CNRM-CM6 GCM. The selection of CNRM-CM was partly motivated by past studies on PRE [55,56], ET [35] and TEMP [55]. Further, an evaluation of CNRM-CM5 and CNRM-CM6 by Voldoire et al. [25] highlighted a significantly improved simulation of tropical climates.

The spatial resolution of CNRM-CM6 is 1.4° × 1.4°, and the data cover 1995–2014 for the historical period and 2015–2100 for projections. The study considered projections for three time slices: near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099). We use the first ensemble member, r1i1p1f1 (r1: realization index; i1: initialization index; p1: physics index; and f1: forcing index). The CNRM-CM6 datasets are publicly available at [57]. The projections are studied for four Shared Socio-economic Pathways (SSPs): SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5, representing the low forcing (i.e., sustainability pathway), medium forcing (i.e., middle-of-the-road pathway), medium-to-high forcing (i.e., a medium challenge to mitigation and adaptation pathway), and high-end forcing (i.e., the worst possible scenario) pathways, respectively [58].

#### *2.3. Methods*

#### 2.3.1. Potential Evapotranspiration (PET) Computation Using the Penman–Monteith Model

PET is a key component of the scPDSI. We used the Penman–Monteith (PM) model to compute PET based on the Food and Agriculture Organization (FAO) recommendation [41]. The choice of PM is based on a previous study in the region [20], and the method has been documented in many studies across the globe [15,16,18,19]. The PET computation with the PM model uses relative humidity, wind, temperature, and short- and long-wave radiation. See [59] for more details about the PM approach.
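To make the inputs listed above concrete, the following Python sketch evaluates the daily FAO-56 reference form of the Penman–Monteith equation for a single grid cell. It is an illustration under stated assumptions, not the computation used in this study: the function and parameter names are hypothetical, the soil heat flux is assumed to be negligible at the daily scale, and a fixed surface pressure is assumed for the psychrometric constant.

```python
import math


def saturation_vapor_pressure(temp_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def penman_monteith_pet(
    temp_c: float,            # mean 2 m air temperature (deg C)
    rel_humidity_pct: float,  # mean relative humidity (%)
    wind_2m: float,           # wind speed at 2 m height (m/s)
    net_shortwave: float,     # net short-wave radiation (MJ m-2 day-1)
    net_longwave: float,      # net outgoing long-wave radiation (MJ m-2 day-1)
    soil_heat_flux: float = 0.0,   # assumed negligible for daily steps
    pressure_kpa: float = 101.3,   # assumed sea-level pressure
) -> float:
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith form."""
    es = saturation_vapor_pressure(temp_c)
    ea = es * rel_humidity_pct / 100.0               # actual vapour pressure (kPa)
    delta = 4098.0 * es / (temp_c + 237.3) ** 2      # slope of the es curve (kPa/degC)
    gamma = 0.000665 * pressure_kpa                  # psychrometric constant (kPa/degC)
    net_radiation = net_shortwave - net_longwave     # Rn (MJ m-2 day-1)

    radiation_term = 0.408 * delta * (net_radiation - soil_heat_flux)
    aerodynamic_term = gamma * 900.0 / (temp_c + 273.0) * wind_2m * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * wind_2m))


# Hypothetical warm, moderately dry day (values are illustrative only).
example_pet = penman_monteith_pet(
    temp_c=25.0, rel_humidity_pct=50.0, wind_2m=2.0,
    net_shortwave=18.0, net_longwave=3.0,
)
```

Applied to model output, the same function would be evaluated per grid cell and time step; in practice the height of the wind field and the radiation units of the GCM variables would need to be converted first.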

#### 2.3.2. Self-Calibrated Palmer Drought Severity Index (scPDSI) Model

The scPDSI is widely used to quantify future droughts [60]. We computed the scPDSI with PRE, PET, and available water capacity (AWC) following Wells et al. [8] and Dai [59]. The drought index is computed for the baseline (1995–2014) and future (2015–2100) periods. To compute the projected drought for each time window, the difference between the future time window and the baseline period is estimated and projected from 2015 to 2100 under the four SSP scenarios following [60]. Details of the scPDSI formulation and calculation are found in [59] and other related studies [15,16,18,19,61].

#### 2.3.3. Drought Characteristics

Run theory [62] is used to extract drought events and describe their basic characteristics (i.e., drought frequency (DF), drought intensity (DI), and mean drought-affected area (DA)). For each month, grids with scPDSI values lower than −2 are considered to be in drought.

The mathematical expression below (Equation (1)) is used to calculate drought frequency (*D<sub>F</sub>*), the ratio of the number of drought months to the total number of months.

$$D_F = \frac{n}{N} \tag{1}$$

where *D<sub>F</sub>* denotes the frequency of droughts, *n* denotes the number of drought months, and *N* represents the total number of months.
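As an illustration of the run-theory extraction and Equation (1), the following Python sketch identifies runs of consecutive drought months in the monthly scPDSI series of a single grid cell and computes the drought frequency. The function names and the sample series are hypothetical and are not taken from the study.

```python
from itertools import groupby

DROUGHT_THRESHOLD = -2.0  # scPDSI threshold for a drought month, as stated in the text


def drought_runs(scpdsi_series, threshold=DROUGHT_THRESHOLD):
    """Return (start_index, duration) for each run of consecutive drought months."""
    runs = []
    index = 0
    for is_drought, group in groupby(scpdsi_series, key=lambda v: v < threshold):
        length = len(list(group))
        if is_drought:
            runs.append((index, length))
        index += length
    return runs


def drought_frequency(scpdsi_series, threshold=DROUGHT_THRESHOLD):
    """Equation (1): number of drought months n divided by total months N."""
    total_months = len(scpdsi_series)
    if total_months == 0:
        raise ValueError("scpdsi_series must not be empty")
    drought_months = sum(1 for v in scpdsi_series if v < threshold)
    return drought_months / total_months


# Hypothetical 12-month scPDSI series for one grid cell.
sample_series = [-0.5, -2.3, -2.8, -1.0, 0.4, -3.1, -2.1, -2.6, 1.2, -1.9, -2.2, 0.0]
sample_runs = drought_runs(sample_series)
sample_frequency = drought_frequency(sample_series)
```

Here the sample series contains three drought runs covering six of the twelve months. Multiplying the ratio by the number of months in a year would express the frequency as drought months per year.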

Drought area is the total area of grids affected by droughts. The drought area coverage (*D<sub>A</sub>*) is expressed as (Equation (2)):

$$D_A = \frac{\sum_{i=1}^{n} d_a}{n_a} \tag{2}$$

where *D<sub>A</sub>* represents the drought area coverage, *i* is the month index, *n* is the number of months, *n<sub>a</sub>* is the total number of pixels under drought condition, and *d<sub>a</sub>* denotes the number of pixels with scPDSI < −2 for a specific intensity in month *i*.

Drought intensity is the average drought index of grids experiencing droughts. Drought intensity is computed by averaging the intensity of all drought events on each grid during the reference period and the three future periods (Equation (3)):

$$D_I = \left[\frac{1}{n} \sum_{i=1}^{n} scPDSI_i\right] \tag{3}$$

where *D<sub>I</sub>* represents the drought intensity and *n* denotes the total number of grids with drought conditions in months with *scPDSI* < −2.
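The area and intensity measures can be sketched for a single month as follows. This is one possible reading of the definitions above, not the authors' implementation: the affected area is taken as the sum of the areas of grid cells with scPDSI < −2, the cell area is approximated on a sphere (an assumption introduced here because cells of a regular latitude–longitude grid shrink toward the poles), and the intensity is the mean scPDSI over drought cells as in Equation (3). All function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def cell_area_km2(lat_center_deg, dlat_deg, dlon_deg):
    """Area of a latitude-longitude grid cell on a sphere (km^2)."""
    lat_south = math.radians(lat_center_deg - dlat_deg / 2.0)
    lat_north = math.radians(lat_center_deg + dlat_deg / 2.0)
    return (EARTH_RADIUS_KM ** 2 * math.radians(dlon_deg)
            * (math.sin(lat_north) - math.sin(lat_south)))


def drought_area_and_intensity(cells, threshold=-2.0):
    """Summarize one month of gridded scPDSI.

    cells: iterable of (scpdsi_value, cell_area_km2) pairs.
    Returns (affected_area_km2, mean_intensity); mean_intensity is None
    when no cell is in drought.
    """
    drought_cells = [(value, area) for value, area in cells if value < threshold]
    affected_area = sum(area for _, area in drought_cells)
    if not drought_cells:
        return affected_area, None
    mean_intensity = sum(value for value, _ in drought_cells) / len(drought_cells)
    return affected_area, mean_intensity


# Hypothetical month with three cells, two of which are in drought.
sample_cells = [(-2.5, 100.0), (-1.0, 100.0), (-3.5, 50.0)]
sample_area, sample_intensity = drought_area_and_intensity(sample_cells)
equator_cell = cell_area_km2(0.0, 1.4, 1.4)
```

Averaging the monthly affected areas over a period would give a mean drought-affected area for that period; the normalization in Equation (2) may differ from this simple average.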

#### 2.3.4. Mann–Kendall Test and Theil–Sen's Slope Test

The trends in drought are examined using the Mann–Kendall tau-b nonparametric technique [63,64]. The study also used the Theil–Sen formula to estimate and characterize linear trends [65]. The mathematical formulations of both the Mann–Kendall tau-b nonparametric technique and the Theil–Sen formula are well known in the literature, and the computation procedures are presented in many studies [61,66,67].
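For readers unfamiliar with the two tests, a minimal Python sketch of their standard textbook formulations is given below. It assumes an evenly sampled series without missing values and uses the normal approximation with a tie correction for the Mann–Kendall significance; the function names are hypothetical, and the study's own procedures follow the cited references.

```python
import math
from collections import Counter
from statistics import median


def mann_kendall(values):
    """Return (S, tau_b, Z, two-sided p-value) for a time-ordered series."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three observations are required")
    # S statistic: sum of signs of all pairwise later-minus-earlier differences.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S with a correction for groups of tied values.
    tie_sizes = [count for count in Counter(values).values() if count > 1]
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18.0
    # Continuity-corrected standard normal test statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Kendall tau-b: time steps are distinct, so only value ties are adjusted.
    n_pairs = n * (n - 1) / 2.0
    tied_pairs = sum(t * (t - 1) / 2.0 for t in tie_sizes)
    denominator = math.sqrt(n_pairs * (n_pairs - tied_pairs))
    tau_b = s / denominator if denominator > 0 else 0.0
    return s, tau_b, z, p_value


def theil_sen_slope(values, times=None):
    """Median of all pairwise slopes (units of value per unit of time)."""
    if times is None:
        times = list(range(len(values)))
    slopes = [
        (values[j] - values[i]) / (times[j] - times[i])
        for i in range(len(values) - 1)
        for j in range(i + 1, len(values))
        if times[j] != times[i]
    ]
    return median(slopes)
```

With annual values as input, multiplying the Theil–Sen slope by 10 expresses the trend per decade, which is the unit used for the wetting and drying trends reported in this study.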

#### 2.3.5. Unit of Analysis

The unit of analysis used in this study is based on annual and decadal scales for future climate change analysis. A flowchart of the paper is illustrated in Figure S1. In addition, to better characterize the drought events over the African region, the projected drought changes are analyzed for the seven spatial domains defined by Iturbide et al. [29], which are adopted in the IPCC AR6 [1] and other studies [42,43]. The new demarcation provides a possible scientific basis for describing drought events under different climate zones and proposing tailor-made adaptation and mitigation policies for different regions.

Based on lessons from past studies [16,18,68], this work defines drought episodes as periods in which the monthly drought index is less than −2, following the thresholds shown in Table 1. Thus, we calculate the drought index for each grid and the four SSP scenarios of the CNRM-CM6 data. The drought properties are then spatially averaged for each 20-year period. The projected droughts are computed by subtracting the historical mean (1995–2014) from the projected time series: near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099). All data processing is performed using the Climate Data Operators (CDO).
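The anomaly step described above can be sketched in Python as follows. The study itself performed this processing with CDO; the snippet below is only an illustrative equivalent for a spatially averaged annual series, and the function names and the synthetic input are hypothetical.

```python
from statistics import mean

BASELINE = (1995, 2014)
TIME_SLICES = {
    "near future": (2020, 2039),
    "mid-century": (2050, 2069),
    "end-of-century": (2080, 2099),
}


def period_mean(annual_values, start_year, end_year):
    """Mean of the annual values whose year lies in [start_year, end_year]."""
    selected = [v for year, v in annual_values.items() if start_year <= year <= end_year]
    if not selected:
        raise ValueError(f"no data between {start_year} and {end_year}")
    return mean(selected)


def slice_anomalies(annual_values):
    """Time-slice mean minus baseline mean for each of the three future periods."""
    baseline_mean = period_mean(annual_values, *BASELINE)
    return {
        name: period_mean(annual_values, start, end) - baseline_mean
        for name, (start, end) in TIME_SLICES.items()
    }


# Synthetic series that rises by one unit per year (for illustration only).
synthetic_series = {year: float(year - 1995) for year in range(1995, 2101)}
synthetic_anomalies = slice_anomalies(synthetic_series)
```

For gridded data the same subtraction would be applied cell by cell before mapping, with the baseline taken from the historical run and the time slices from each SSP run.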


**Table 1.** Classifications of droughts based on scPDSI.

#### **3. Results**

#### *3.1. Projected Climatological Changes in Drought Characteristics*

Figure 2 illustrates the projected drought frequency for the four SSP scenarios during 2015–2100 relative to the baseline period (1995–2014). In general, the frequency of future drought events shows an increasing trend for all SSP scenarios. Moreover, the magnitude of the trend increases with an increase in radiative forcing.

SSP5-8.5 (red) and SSP3-7.0 (orange) illustrate an increasing trend throughout the century, with the frequency of drought events estimated to range between 3 and 8 yr<sup>−1</sup>. The magnitude of the frequency of drought events is slightly lower (3–6 yr<sup>−1</sup>) in SSP1-2.6 (deep blue) and SSP2-4.5 (light blue). Overall, all the SSP scenarios show an increasing trend in drought frequency throughout the century.

Figure 3 illustrates the time series of projected drought intensity for the four SSP scenarios. The projected time series presents different drought intensities for the different SSPs. Overall, the projected intensity varies significantly across all SSP scenarios with values ranging from −2.5 to −4 for different periods. These changes in drought intensities are more distinguishable from the mid-century (2050–2069) to the end-of-century (2080–2099), as projected intensity changes from severe (i.e., −2.5 to −3.5) to extreme (i.e., ≥3.5) droughts under SSP3-7.0 (orange) and SSP5-8.5 (red) at the end of the century. In the SSP1-2.6 (deep blue) and SSP2-4.5 (light blue) scenarios, the drought intensities are moderate (i.e., ≤2.5) in the near future (2020–2039), changing to severe in the mid-century (2050–2069), and back to moderate drought intensity at the end of the century (2080–2099).

**Figure 2.** Annual drought frequency averaged over Africa during the baseline (1995–2014; black line) and future (2015–2100) periods. The colored lines represent results under the four SSP: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5.

**Figure 3.** Annual drought intensity averaged over Africa during the baseline (1995–2014, black line) and future (2015–2100) periods. The color lines represent the results under the four SSP: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5.

Figure 4 shows the time series of the projected drought-affected area for the different SSP scenarios during the 21st century. The figure shows mixed upward and downward trends during the different time periods. Quantitatively, the affected areas range from 900 to 1800 (10<sup>4</sup> km<sup>2</sup>) in the near future (2020–2039) under all SSP scenarios, followed by an increase in coverage from 2000 (10<sup>4</sup> km<sup>2</sup>) in the 2060s and a decrease to 1700 (10<sup>4</sup> km<sup>2</sup>) at the end of mid-century (i.e., 2070). The end-of-century period shows upward trends for all SSP scenarios, from 1600 to 2099 (10<sup>4</sup> km<sup>2</sup>). The affected areas for SSP3-7.0 (orange) and SSP5-8.5 (red) are close to each other in magnitude, while the SSP1-2.6 (deep blue) and SSP2-4.5 (light blue) scenarios show a similar range of magnitudes.

**Figure 4.** Annual drought spatial coverage averaged over Africa during the baseline (1995–2014; black line) and future (2015–2100) periods. The color lines represent the results under the four SSP: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5.

#### *3.2. Projected Changes in Drought Frequency*

Figure 5 illustrates the spatial distributions of climatological drought frequency across Africa for the different SSP scenarios during 2020–2099. The change in drought frequency is computed by subtracting the baseline value of each grid from that of the future periods. Overall, the spatial pattern of drought frequency is projected to increase for different periods and SSP scenarios. The spatial trend of drought frequency across Africa complements the temporal trend by indicating the regions of possible increase or decrease in drought frequency.

The analysis of the projected drought frequency across the continent shows striking differences in spatial patterns. Regional differences are observed when considering the different sub-regions (Figure 5). In SAH, high drought frequency values (6 to >10) are observed for each scenario (Figure 5). The drought frequency in this region is projected to increase with increasing radiative forcing over the period. Similar patterns are noted for WSAF and ESAF, with the drought frequency varying from 6 to >10. However, we observed higher frequency values over the arid regions of WSAF than over the semi-arid region of ESAF, except under SSP5-8.5. The similar drought frequency patterns of SAH and WSAF are related to the similarity in the pattern and amount of PRE in these regions.

In contrast, the equatorial region shows different drought frequencies under each scenario. In WAF, the drought frequency is relatively low and ranges from 2 to 4, with the frequency slightly decreasing from lower to higher emissions scenarios. In CAF, drought frequency varies from <2 to 6 and the value decreases from SSP1-2.6 to SSP5-8.5. We observe drought frequency values of <2 in CAF, especially over the Congo Basin. In EAF, we observe slightly mixed results. For example, NEAF presents slightly higher drought frequency values ranging from 2 to >8, while SEAF shows values ranging from 2 to 6. A drought frequency of 4–6 yr<sup>−1</sup> is observed over parts of Sudan and Ethiopia, but it decreases from 6 to 2 over the Somalia and Eritrea region for each scenario. In summary, drought frequency is projected to decrease in CAF and WAF, and slightly in SEAF, for each scenario, while an increase is expected in SAH, WSAF, and ESAF.

**Figure 5.** Geographic distribution of projected annual mean drought frequency (yr<sup>−1</sup>) during 2020–2099 under the four SSP scenarios. (**a**) SSP1-2.6; (**b**) SSP2-4.5; (**c**) SSP3-7.0; and (**d**) SSP5-8.5. The anomaly is calculated as projection minus baseline period. The white background shows areas with no values.

We further investigate the spatial distributions of projected climatological changes in drought frequency in the three time slices (Figure 6). Overall, the spatial pattern shows similar value ranges in drought frequencies across the continent. However, subregional analysis shows an interesting pattern with a distinguishable reduction in the number of drought occurrences in WAF and CAF. WAF shows progressively decreasing values in drought frequency throughout the century. The number of drought occurrences decreases at a much lower rate, with a sharper decrease noted in CAF. In general, in WAF and CAF, drought frequencies are lower under SSP3-7.0 and SSP5-8.5 than under SSP1-2.6 and SSP2-4.5.

In contrast, an increase in the number of drought occurrences is noted in SAH, WSAF, and ESAF, whereas the drought frequency in NEAF and SEAF is relatively similar for each scenario. The drought frequency is likely to increase more in SAH, WSAF, and ESAF under SSP3-7.0 and SSP5-8.5 than under SSP1-2.6 and SSP2-4.5.

**Figure 6.** Geographic distribution of projected mean drought frequency (yr<sup>−1</sup>) for the near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099) under (**a**–**c**) SSP1-2.6, (**d**–**f**) SSP2-4.5, (**g**–**i**) SSP3-7.0, and (**j**–**l**) SSP5-8.5 scenarios. The white background shows areas with no values.

#### *3.3. Projected Wetting and Drying Trends*

This section focuses on the projected scPDSI trends for the different scenarios during 2020–2099. For this purpose, we consider an event as dry when scPDSIPM < −2 and as wet when scPDSIPM > +2. Figure 7 illustrates the spatial distribution of scPDSIPM linear trends based on the Mann–Kendall test. The results are tested at the 5% significance level.
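The trend analysis described above can be illustrated with a minimal sketch for a single pixel. The code below is an assumption-laden illustration rather than the authors' implementation: it applies the basic two-sided Mann–Kendall test (without corrections for ties or serial correlation) and a Theil–Sen slope to a hypothetical annual scPDSI series; the function names and the example values are invented for illustration.

```python
import math
from itertools import combinations
from statistics import median


def mann_kendall(series):
    """Basic two-sided Mann-Kendall test (no tie or autocorrelation correction)."""
    n = len(series)
    # S = number of increasing pairs minus number of decreasing pairs (time-ordered).
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


def theil_sen_slope(series):
    """Median of all pairwise slopes, in index units per time step."""
    slopes = [(series[j] - series[i]) / (j - i)
              for i, j in combinations(range(len(series)), 2)]
    return median(slopes)


# Hypothetical annual scPDSI values for one pixel (not data from the study).
example_series = [0.1, -0.1, 0.0, -0.3, -0.2, -0.5, -0.4, -0.7, -0.6, -0.9]
s, z, p = mann_kendall(example_series)
slope_per_decade = 10 * theil_sen_slope(example_series)
print(f"S={s}, Z={z:.2f}, p={p:.4f}, significant at 5%: {p < 0.05}")
print(f"Theil-Sen slope: {slope_per_decade:.2f} per decade")
```

A negative slope with p < 0.05 would correspond to a dotted (significant) drying pixel in a map such as Figure 7; applying the two functions to every grid cell yields the pixel-wise trend field.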

The spatial patterns clearly illustrate that mixed drying and wetting signals are likely to dominate many parts of Africa throughout the 21st century. The spatially complex trends show that drying conditions are likely to increase from 0.05 to 0.25 decade<sup>−1</sup> under SSP1-2.6 and SSP2-4.5 across Africa, with pockets of wetting conditions scattered across the continent (Figure 7a,b).

In contrast, the spatial patterns in Figure 7c,d clearly show that wetting conditions are likely to increase throughout the 21st century, with distinct variations observed in the equatorial region (i.e., WAF and CAF), where trends under the SSP3-7.0 and SSP5-8.5 scenarios increase from 0.05 to 0.25 decade<sup>−1</sup>. Moreover, a clear distinction is observed in SAH, WSAF, and ESAF, showing that future drying conditions are likely to increase from −0.05 to −0.25 decade<sup>−1</sup> for the SSP3-7.0 and SSP5-8.5 scenarios. It is worth noting that NEAF and SEAF present mixed results of wetting and drying in different scenarios.

The spatial distribution in SEAF shows a slightly larger increase in wetting than in NEAF under all scenarios. In SEAF, a striking pattern is the wetting trends before 30–40° E and the drying trend along 40–45° E in the SSP3-7.0 and SSP5-8.5 scenarios. However, Figure 7a,b shows a distinct spatial pattern between the two scenarios in NEAF and SEAF, with no trends under SSP1-2.6 and wetting trends under SSP2-4.5 along 40–45° E. In WSAF and ESAF, the drying trend gradually increases from the SSP1-2.6 to the SSP5-8.5 scenario. Our analysis shows more pronounced drying conditions in the SSP3-7.0 and SSP5-8.5 scenarios over WSAF and ESAF.

**Figure 7.** Pixel-wise linear trends for scPDSIPM < −2.0 during 2020–2099. (**a**) SSP1-2.6, (**b**) SSP2-4.5, (**c**) SSP3-7.0, and (**d**) SSP5-8.5 scenarios. The values are expressed in changes per decade (the dots denote passing a 5% significance test). The white background shows areas with no values.

Figure 8 illustrates the spatial distribution of linear trends for scPDSIPM for all scenarios during the near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099), based on the Mann–Kendall test at the 5% significance level. In general, a distinguishable spatial pattern of wetting and drying trends is shown for each SSP scenario and time slice. The sub-regional analysis shows different spatial patterns of wet and dry conditions with differences in trend values for the three time slices under all SSP scenarios. SAH shows mixed results of wetting and drying trends in all SSPs. In WAF and CAF, we observe a more distinct drying trend under SSP2-4.5 (Figure 8d) than under SSP1-2.6 (Figure 8a) in the near future (2020–2039), which reverts to a wetting trend in the mid-century (2050–2069) with magnitudes >0.25 decade<sup>−1</sup> (Figure 8b,e), and mixed trends at the end-of-century (2080–2099) (Figure 8c,f). In contrast, NEAF and SEAF are relatively similar, but they have mixed wetting and drying trends in three SSPs, the exception being SSP5-8.5. A strong wetting trend under SSP5-8.5 is observed in the near future and end-of-century in NEAF and SEAF, but with mixed trends in mid-century (2050–2069). WSAF and ESAF, in turn, show a mixed trend in all SSPs, with more pronounced drying conditions in the SSP3-7.0 and SSP5-8.5 scenarios in ESAF than in WSAF at the end of the century.

**Figure 8.** Pixel-wise linear trends for scPDSIPM < −2.0, for the near future (2020–2039), mid-century (2050–2069), and end-of-century (2080–2099) under (**a**–**c**) SSP1-2.6, (**d**–**f**) SSP2-4.5, (**g**–**i**) SSP3-7.0, and (**j**–**l**) SSP5-8.5 scenarios, respectively. The values are expressed in changes per decade (dots denote passing a 5% significance test). The white background shows areas with no values.

#### **4. Discussion**

This study uses the CNRM-CM6 data to investigate future drought characteristics and trends during the 21st century (near future, mid-century, and end-of-century) for four SSP scenarios (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). Using these scenarios provides the scientific community with an opportunity to investigate changes in droughts across the African continent against the backdrop of the continent's vulnerability to weather and climate variability, which is coupled with population growth.

The time series of change in drought intensity between the future and the baseline periods are computed and presented in Figure 3. An increase in drought event intensity is projected under all SSP scenarios and time periods. The extent of the drought-affected areas significantly increases over time, and the upward trend strengthens with increasing SSPs (Figure 4). This indicates that the frequency of future drought events will be higher and drought-event severity will intensify over time, thus increasing the spatial extent of the affected area. Overall, higher emissions scenarios (i.e., SSP3-7.0 and SSP5-8.5) will exhibit a higher frequency of drought events and more severe drought over significantly larger areas than lower emissions scenarios (SSP1-2.6 and SSP2-4.5). This result is consistent with the IPCC AR6 [1].

Geographically, the drought-event frequency shows some mixed results under different climate conditions. In absolute terms, drought events will be more frequent in areas under arid and semi-arid climate conditions in Africa (Figure 5) for different future periods. These results are consistent with a previous study over the region, although that study is based on SPI and SPEI [69]. Moreover, this pattern is comparable to the RCP4.5 and RCP8.5 scenarios from CMIP5 [32]. The spatial pattern shows similar value ranges in drought frequencies across the continent (Figure 6). The WAF region shows progressively decreasing values in drought frequency across all three time slices (Figure 6a–i). The number of drought occurrences decreases at a much lower rate, with a sharper decrease noted in CAF (Figure 6a–i). This result is consistent with previous studies in the WAF and CAF regions [70,71]. In contrast, the increase in the number of droughts in SAH, WSAF, and ESAF (Figure 6a–i) is in agreement with Shongwe et al. [72]. The drought frequency in NEAF and SEAF, in turn, is relatively similar for different time slices and under each SSP-RCP scenario (Figure 6a–i), in agreement with Makula and Zhou [73] and Ayugi et al. [68]. The general spatial pattern shows a likely larger increase in drought frequencies over SAH, WSAF, and ESAF under SSP3-7.0 and SSP5-8.5 than under SSP1-2.6 and SSP2-4.5.

To investigate future trends in wetting and drying conditions, the Mann–Kendall test [64] and the Theil–Sen slope estimator [65] were used. The scPDSI presents significant trends for all scenarios. The trends are mostly negative (indicating likely increases in drying conditions) across Africa. Overall, lower emissions scenarios (SSP1-2.6 and SSP2-4.5, Figure 7a,b) present a larger area of drying trends than higher emissions scenarios (SSP3-7.0 and SSP5-8.5, Figure 7c,d). The regional analysis of projected wetting and drying trends shows spatial pattern differences across the continent. The larger area of negative trends obtained in SAH, WSAF, and ESAF indicates that future drought events may be further intensified in arid regions. Similar drying conditions were observed by Bellprat et al. [74] in WSAF and ESAF, and in SAH [75]. This may be partly attributed to the spatial pattern of future PRE in this region, where, under water-limited conditions, droughts respond more to PRE than to PET, consistent with Munday and Washington [76], or to model uncertainties [77]. Considering the SSP scenarios, lower emissions scenarios (i.e., SSP1-2.6 and SSP2-4.5) are expected to witness more drying conditions than higher emissions scenarios (SSP3-7.0 and SSP5-8.5). WSAF and ESAF point to drying conditions, with trends significant under SSP2-4.5, SSP3-7.0, and SSP5-8.5. This result is consistent with a study by Iyakaremye et al. [78], who projected that SAH and WSAF would warm faster relative to other parts of the continent, with changes in PRE [13,79]. The magnitudes of negative trends under SSP3-7.0 and SSP5-8.5 signal more severe aridification trends in the arid region of WSAF and the semi-arid region of ESAF than in SAH. A likely reason for the striking difference in drying trends in SAH relative to WSAF and ESAF is that the projected increase in PRE in the Northern Hemisphere (NH) is slightly higher than that in the Southern Hemisphere (SH), consistent with Almazroui et al. [55], Lim Kam Sian et al. [56], and Babaousmail et al. [80]. These results indicate that the arid regions in the SH (i.e., WSAF and ESAF) exhibit more pronounced drying conditions than those in the NH (i.e., SAH). Similar results were noted in Lee and Wang [81]. A plausible reason for these results may be the interhemispheric difference in the warming rate documented in Kitoh et al. [82]. This signal has potential implications for this region, as these impacts may affect regional socio-economic stability and ecological security for countries located in SAF.

In WAF and CAF, SSP1-2.6 and SSP2-4.5 (Figure 7a,b) show trends opposite to those under SSP3-7.0 and SSP5-8.5 (Figure 7c,d), and the magnitude of change cannot be ignored. Large changes are expected in the WAF and CAF regions, as large areas of significant positive trends under high emissions and negative trends under lower emissions are observed. The possibility of an increase in negative trends under SSP1-2.6 and SSP2-4.5 (Figure 7a,b) and positive trends under SSP3-7.0 and SSP5-8.5 (Figure 7c,d) provides an interesting result, since, in humid conditions, drought responds to PET rather than PRE. The drying trend under lower emissions and the wetting trend under higher emissions scenarios in a humid environment such as WAF and CAF may be related to a weakening of the land–atmosphere coupling [11,83–85]. Similar results were reported in Dosio et al. [28] using both regional climate models (RCMs) and GCMs over WAF, but for PRE projections. A further study is recommended to examine this phenomenon of low emission scenarios (i.e., SSP1-2.6 and SSP2-4.5) that show a stronger wetting trend with high magnitudes than high emission scenarios (i.e., SSP3-7.0 and SSP5-8.5).

The NEAF and SEAF regions present distinct spatial patterns between the four scenarios (Figure 7a–d). SSP2-4.5 illustrates a positive trend in the NEAF and SEAF region, while SSP1-2.6 exhibits no trend in large areas, with scattered pockets of negative trends. However, the drought conditions under SSP3-7.0 and SSP5-8.5 in NEAF and SEAF generally present both negative and positive trends. The negative trend is related to a decrease in PRE [79,80] and an increase in ET [35] or TEMP [55] in the NEAF and SEAF regions in the future. The drying conditions identified here agree with previous studies in this region, reflecting the complex patterns of PRE and ET trends [86,87]. The negative trends are located in already vulnerable states, such as Kenya and Somalia, with arid conditions and limited adaptation and mitigation capacity. This pattern in NEAF and SEAF is comparable to the results for the SSP2-4.5 and SSP5-8.5 scenarios from CMIP6 in Haile et al. [30]. Overall, the CNRM-CM6 model indicates that the possible future wetting and drying patterns differ among regions across Africa. We urge readers to interpret the results with caution, as they are based on non-bias-adjusted CNRM-CM6 data. Different studies have noted that non-bias-adjusted data may over- or underestimate values in regions with significant variation in local features, such as topography and water bodies [88–90]. Future studies should consider the impact of bias adjustment on historical and projected drought events over Africa. This information may provide insight into the ongoing climate discussion and improve our understanding of drought events over the African region.

In summary, stakeholders have reiterated the need for an evidence-based approach to studying extreme events to inform policymaking at local scales. Many countries in Africa are highly dependent on rain-fed agriculture. The projected wetting and drying trends throughout the century for all SSP scenarios are likely to affect the future agricultural production and ecological stability of humid, arid, and semi-arid climates, as documented in past studies [30,91]. The regional analysis of projected drought climatology shows significant spatial differences. These differences suggest that drought impact may vary with location, and so will a region's capacity to respond to drought events. Future climate adaptation policies should be tailored to specific regional needs. The results add to a wider body of related studies, such as those published by FAO [41], that inform national policymakers of the identified future drought-prone regions so that adaptation policies can be developed across Africa.

#### **5. Conclusions**

This study describes the long-term changes in drought characteristics using the scPDSI based on the CNRM-CM6 model of the CMIP6 datasets. The following conclusions are drawn from the model projections across Africa for all four SSP scenarios.


Within the context of the Paris Agreement, the Agenda 2030 of the United Nations Sustainable Development Goals (SDGs), and the Malabo 2025 declaration, the findings of this study are significant and provide a basis for stakeholders in the region to further explore the changing trends of projected drought episodes and their potential impact on various sectors of society. Readers are urged to interpret the results with caution, as the objective of this study is not to confirm the superiority of the CNRM-CM6 datasets over other CMIP6 datasets or the ensemble approach, but rather to demonstrate their potential use in a local context. A future study will explore the implications of future drought climatology for direct (e.g., water use efficiency and crop yields) and indirect costs in African countries whose economies are tied to climate variability.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture12040495/s1, Figure S1: Methodological Flowchart.

**Author Contributions:** Conceptualization, I.K.N. and D.F.T.H.; methodology, I.K.N. and D.F.T.H.; software, J.L.; validation, D.F.T.H. and W.U.; formal analysis, I.K.N., J.L. and D.F.T.H.; investigation, I.K.N. and D.F.T.H.; resources, S.L. and J.L.; data curation, S.L. and J.L.; writing—original draft preparation, I.K.N.; writing—review and editing, W.U., N.A.P., G.T.G. and K.T.C.L.K.S.; visualization, S.L. and J.L.; supervision, D.F.T.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The CNRM-CM6 datasets used here are publicly available at the France Centre National de Recherches Météorologiques website (https://esg1.umr-cnrm.fr/ (accessed on 20 May 2021)) or CMIP6: https://esgf-node.llnl.gov/search/cmip6 (accessed on 20 May 2021).

**Acknowledgments:** The authors are thankful to the World Climate Research Programme (WCRP)- Working Group on Coupled Modelling (WGCM) for making the CMIP6 data publicly available.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Improving Winter Wheat Yield Forecasting Based on Multi-Source Data and Machine Learning**

**Yuexia Sun 1,2, Shuai Zhang 1,2,\*, Fulu Tao 1,2,3, Rashad Aboelenein <sup>4</sup> and Alia Amer <sup>5</sup>**


**Abstract:** To meet the challenges of climate change, population growth, and an increasing food demand, accurate, timely, and dynamic estimation of regional and global crop yield is critical to food trade and policy-making. In this study, a machine learning method (Random Forest, RF) was used to estimate winter wheat yield in China from 2014 to 2018 by integrating satellite data, climate data, and geographic information. The results show that the yield estimation accuracy of RF is higher than that of the multiple linear regression method. The yield estimation accuracy can be significantly improved by using climate data and geographic information. According to the model results, the estimation accuracy of winter wheat yield increases dramatically and then flattens out over the months; it approached the maximum in March, with R<sup>2</sup> and RMSE reaching 0.87 and 488.59 kg/ha, respectively. This model can therefore achieve better yield forecasting at a large scale two months in advance.

**Keywords:** solar induced chlorophyll fluorescence (SIF); winter wheat; yield forecast; random forest; enhanced vegetation index (EVI)

#### **1. Introduction**

As the world's largest producer and consumer of wheat [1], China faces great challenges of food security. Production of winter wheat, one of China's most important summer grain crops [2], stagnated in 56% of China from 1961 to 2008 [3]. Therefore, timely and accurate winter wheat yield forecasting in China is of great importance for food trade and policymakers. Recently, there has been increasing research on winter wheat yield estimation in which the yield prediction models are based on the physiological and ecological processes of crops. Such models have been developed continuously and include WOFOST [4], DSSAT [5], APSIM [6], STICS [7], and MONICA [8]. They mostly simulate daily crop development, growth, and yield formation, with climate variables used as the main inputs to describe environmental conditions during the period of crop growth. However, the growth state of crops is affected not only by abiotic factors (growth environment) but also by biological factors (such as plant diseases) [9–11]. Therefore, using climate data alone may not be sufficient to estimate yield. Meanwhile, due to the high spatial heterogeneity of crop varieties, farmer management practices, and environments, there is significant uncertainty in the practical application of such models on a large scale [12,13].

**Citation:** Sun, Y.; Zhang, S.; Tao, F.; Aboelenein, R.; Amer, A. Improving Winter Wheat Yield Forecasting Based on Multi-Source Data and Machine Learning. *Agriculture* **2022**, *12*, 571. https://doi.org/10.3390/agriculture12050571

Academic Editor: William A. Payne

Received: 21 March 2022 Accepted: 16 April 2022 Published: 19 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Satellite remote sensing can continuously monitor crop growth across various spectral bands and provide useful additional information for crop yield estimations [14–16]. In the past decades, remote sensing monitoring technology has been successfully applied to crop yield estimations [17,18]. Such research mostly concerned the empirical relationship between observed yield [27] and vegetation indexes (VIs) based on visible and near-infrared radiation data (e.g., NDVI, EVI, GCVI) [23–26], which are important in models of crop yield estimation [19–22]. EVI, which is sensitive to a higher canopy leaf area index and less affected by atmospheric aerosols, is most commonly used in crop yield estimation. However, VIs are based on greenness and are not sensitive to the physiological changes of vegetation caused by meteorological factors such as temperature, vapor pressure, and absorbed radiation. In recent years, extensive studies have shown [28–31] that sun-induced chlorophyll fluorescence (SIF) can directly reflect the photosynthetic activity of crops, responds timely and accurately to environmental stress, and is directly related to biomass [32–35]. Currently, some studies use SIF directly to estimate crop yield and obtain better results at the field scale than with vegetation indexes [29,36–38]. It has been shown that the yield estimation accuracy of crop models based on both climate data and satellite data is generally better than that of models based on either data source alone [39–41]. Among all climate variables, water-supply-related variables (such as precipitation), temperature-related variables (such as maximum temperature), and water-demand-related variables (such as potential evapotranspiration) are also important for crop growth simulation [42]. Recognizing that various satellite products carry both overlapping and complementary information is conducive to yield estimation [43]. However, how to better combine satellite data with other environmental factors in crop yield estimation needs further study [44]. It is also unclear how multi-source data (i.e., climate and satellite data) contribute to the final yield estimation models and how their contributions vary over the growing season.

Furthermore, more and more approaches based on machine learning (ML) or deep learning are being applied in agriculture, for example in crop type classification [45,46], disease prediction [47], crop growth monitoring [48], and yield estimation [49–52]. Frausto-Solis [53] estimated the yield of many kinds of crops from daily minimum and maximum temperature and precipitation data by using a decision tree (DT), while Jeong [54] used climate data to estimate the yields of wheat, corn, and other crops globally and in some regions based on RF. At present, research on the estimation of winter wheat yield in China based on RF is mostly concentrated in the North China Plain, for example in Anhui Province and Henan Province [54–56].

To avoid the randomness of an individual model, this paper used multiple linear regression (MLR), one of the most common forecasting methods, and random forest (RF), a typical machine learning method, to build crop yield estimation models in which the crop growth environment, agricultural policies, and spatial heterogeneity of yield are considered. These models combine spatial information, climate data, and satellite data. This paper aims to understand the relationship between different input elements and crop yield and to compare the effects of different input data combinations and of time series data from different growth periods on model performance. In addition, this paper intends to unravel and quantify the contributions of climate and satellite data over the growing season to crop yield estimation. This study mainly used EVI and SIF to find out whether satellite data can improve climate-based yield estimation methods on a large scale and to explore whether SIF still maintains its advantage in sensitively capturing the photosynthetic activity of crops on a large scale [57,58].

#### **2. Materials and Methods**

#### *2.1. Study Area*

As the target area in this study, the winter wheat planting areas in China (Figure 1) are mainly in the North China Plain and also include the winter wheat planting areas in ten provinces (Inner Mongolia, Shandong, Hubei, Chongqing, Sichuan, Guizhou, Yunnan, Shaanxi, Gansu, and Qinghai) and two autonomous regions (Ningxia and Xinjiang). The study area is thus widely distributed, and the proportion of planting areas in other provinces is relatively small. Winter wheat in China is generally sown at the end of September or at the beginning of October, and it is harvested by the middle of June of the following year [59]. Generally, irrigation and fertilization are available in these areas. In this study, January to June is defined as the major growth period of winter wheat, on which the analysis and the modeling are focused, in view of the effect of cold and frost injury on crops in January [60].

**Figure 1.** Winter wheat growing area in China.

#### *2.2. Data and Preprocessing*

#### 2.2.1. Dataset

The study uses data on crop yield, planting area, satellite data, climate data, and spatial information (Table 1). The multi-source data collected in the study have various temporal and spatial resolutions. Therefore, the raster data were first resampled to a spatial resolution of 1 km, and the climate and satellite data were unified to a monthly interval. The monthly climate and satellite data were then aggregated to the prefecture level using the crop map.
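The two aggregation steps described above (temporal unification to monthly values, then spatial averaging over winter wheat pixels within each prefecture) can be sketched as follows. This is only an illustrative sketch under simplifying assumptions: the data are represented as plain Python tuples rather than rasters, and the function names, prefecture identifiers, and values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_means(records):
    """Average (date, value) records into {(year, month): mean value}."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def prefecture_means(pixels):
    """Average pixel values per prefecture, keeping only winter wheat pixels.

    Each pixel is a tuple (prefecture_id, is_wheat, value).
    """
    buckets = defaultdict(list)
    for prefecture_id, is_wheat, value in pixels:
        if is_wheat:  # the crop map acts as a mask
            buckets[prefecture_id].append(value)
    return {key: mean(values) for key, values in buckets.items()}


# Hypothetical 8-day composites for one pixel (not data from the study).
composites = [
    (date(2016, 3, 5), 0.10),
    (date(2016, 3, 13), 0.12),
    (date(2016, 4, 6), 0.20),
]
print(monthly_means(composites))

# Hypothetical monthly pixel values with a wheat mask.
pixels = [("A", True, 0.2), ("A", False, 0.9), ("A", True, 0.4), ("B", True, 0.5)]
print(prefecture_means(pixels))
```

In a raster workflow the same logic would operate on arrays, with the 1 km crop map used as the mask; prefectures without any wheat pixels simply do not appear in the result.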


Crop yield and planting area: the winter wheat yield of prefecture-level cities from 2014 to 2018 (unit: kg/ha) was collected from local agricultural statistical yearbooks. In previous work, the winter wheat planting distribution of China at 1 km resolution from 2000 to 2015 was identified [61] (https://doi.org/10.6084/m9.figshare.8313530 (accessed on 21 August 2019)). Because the main planting areas of crops remain almost unchanged over a short period, the 2014 data are used in this study to represent the winter wheat planting distribution in China from 2014 to 2018.

Satellite data: the satellite data include EVI (MOD13C2 V6) (https://search.earthdata.nasa.gov (accessed on 1 February 2000)) and the SIF reanalysis dataset (GOSIF) based on the OCO-2 satellite (http://data.globalecology.unh.edu (accessed on 27 November 2019)). Compared with the normalized difference vegetation index (NDVI), EVI is more closely related to biomass and crop yield [62], and it can better represent the leaf and chlorophyll content of the crop canopy. GOSIF is a reanalysis dataset based on SIF data derived from OCO-2, MODIS data, and climate data. Compared with the coarse-resolution SIF data calculated directly from OCO-2, GOSIF has a better spatio-temporal resolution (0.05°, 8 days), continuous global coverage, and longer records.

Climate data: a total of 10 climate variables are collected from the CRU\_TS 4.04 (Climatic Research Unit Timeseries 4.04) series and the CMFD (China Meteorological Forcing Dataset) series (Table 1). The CRU\_TS series is based on the analysis of records from more than 4000 independent weather stations and is gridded at a resolution of 0.5° × 0.5°, including monthly precipitation, daily maximum and minimum temperatures, cloud cover, and other variables covering the terrestrial region of the earth from 1901 to 2020. The CMFD dataset, with a spatial resolution of 0.1° × 0.1°, mainly includes 3-hourly precipitation, surface radiation, wind speed, air specific humidity, and other variables covering 1979 to 2018.

Geographical basic data: crop growth status and growth environment are spatially heterogeneous. Studies have indicated that crop yields in neighboring counties are usually similar in a given year. This spatial autocorrelation can be accounted for by encoding geographical coordinates (lat, lon) in the feature space [62,63]. In this study, all data, including EVI, SIF, climate variables, and geographical coding, were masked by the raster data of winter wheat distribution and aggregated to prefecture-level cities by averaging.

#### 2.2.2. Data Preprocessing

Selecting input variables is an indispensable step before machine learning and linear regression. It not only reduces the input dimension by integrating expert knowledge to select the most appropriate inputs, but also quantifies the correlation between potential independent variables and the dependent variable, which helps to explain the results of machine learning algorithms. Based on previous studies on the relation between climate and crop yield [19–22], ten climate variables were selected for the study. To facilitate variable selection and interpretation, the best variable combinations for yield estimation were chosen without discarding useful information. Firstly, the 10 climate variables were divided into four groups according to prior knowledge: (1) water-supply-related, including precipitation (pre), wet day frequency (wet), and air specific humidity (shum); (2) temperature-related, including near-surface average temperature (tmp), near-surface temperature minimum (tmn), and near-surface temperature maximum (tmx); (3) water-demand-related, including potential evapotranspiration (pet) and vapor pressure (vap); and (4) radiation-related, including surface downward shortwave radiation (srad) and surface downward longwave radiation (lrad). The correlation analysis was carried out on the mean values of the variables over the growing season (January–June) to eliminate the influence of the seasonal cycle. This study selected appropriate independent variable inputs from the climate variables based on the following criteria: (1) selecting the variable that has the maximum absolute correlation with the yield in each group; and (2) selecting further variables whose correlation with the previously selected climate variables in the same group is not greater than 0.5.
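The two selection criteria can be read as a greedy procedure applied within each variable group; the sketch below follows that reading. It is an interpretation, not the authors' code: it assumes Pearson correlation, compares absolute correlations against the 0.5 threshold, and uses hypothetical growing-season means for the temperature-related group.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_in_group(group, yield_values, max_inter_corr=0.5):
    """Greedy selection within one variable group.

    group maps a variable name to its growing-season mean values.
    Criterion 1: start with the variable most correlated with yield.
    Criterion 2: add others only if their absolute correlation with every
    already selected variable does not exceed max_inter_corr.
    """
    ranked = sorted(group,
                    key=lambda name: abs(pearson(group[name], yield_values)),
                    reverse=True)
    selected = [ranked[0]]
    for name in ranked[1:]:
        if all(abs(pearson(group[name], group[kept])) <= max_inter_corr
               for kept in selected):
            selected.append(name)
    return selected


# Hypothetical values for six prefecture-years (not data from the study).
yield_t_ha = [5.0, 5.5, 6.0, 5.5, 6.5, 7.0]
temperature_group = {
    "tmp": [8.0, 8.4, 9.1, 8.9, 9.5, 9.8],
    "tmn": [2.0, 3.0, 4.0, 3.0, 5.0, 6.0],
    "tmx": [14.2, 14.1, 15.0, 15.3, 15.1, 16.0],
}
print(select_in_group(temperature_group, yield_t_ha))
```

Because the three temperature variables in this toy example are strongly correlated with one another, only one of them survives, which is the intended effect of the second criterion.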

#### *2.3. Research Methods*

#### 2.3.1. Multiple Linear Regression

Multiple linear regression (MLR) is one of the most widely used methods of crop yield estimation, and it is easy to use. Based on the principle of ordinary least squares (OLS) and the stepwise regression method, independent variables with significant effects were selected and used to construct the optimal regression model for winter wheat yield estimation from the climate, satellite, and spatial information data. The yield estimation model is given by Equation (1):

$$Y = a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots + a_n x_n + \beta + \varepsilon \tag{1}$$

where *Y* represents the winter wheat yield of prefecture-level cities; *x*<sub>1</sub> ... *x*<sub>*n*</sub> represent the independent variables used to predict *Y*; *a*<sub>1</sub> ... *a*<sub>*n*</sub> represent the partial regression coefficients; *β* is the constant term; and *ε* represents the random error. The stepwise regression model passes the significance test when the linear model as a whole passes the F test and all of its coefficients pass the *t* test.

#### 2.3.2. Random Forest

Random Forest (RF) is an ensemble learning technique that performs classification or regression by combining a group of CART decision trees. Owing to the randomness it introduces, RF is not prone to over-fitting and has good learning stability [56]. In this paper, scikit-learn, an ML library for Python, is used to develop the RF model in three steps: (1) normalizing all the selected variables and the yield, and randomly dividing the whole data set into training data (70%) and test data (30%) [64,65]; (2) for the training data set only, optimizing the key parameters of each model for the highest R<sup>2</sup> and the lowest RMSE by ten-fold cross-validation; and (3) conducting the "leave one year out" experiment from 2014 to 2018, with R<sup>2</sup> and RMSE used to evaluate the performance and generalization of the model. Considering the availability of climate data, satellite data, and spatial information, this study uses the yield data of 187 out of 385 prefecture-level cities in China from 2014 to 2018.

#### *2.4. Experiment Design*

Two groups of experiments were designed (Figure 2) to answer the research questions raised in this paper. The purpose of the first group of experiments was to explore the effect of different input combinations on crop yield estimation models and to assess the potential of SIF in crop yield estimation. There are 11 data input combinations for the experiment, namely: (1) only SIF; (2) only EVI; (3) only climate; (4) SIF combined with spatial information; (5) EVI combined with spatial information; (6) climate combined with spatial information; (7) SIF combined with climate; (8) EVI combined with climate; (9) SIF combined with climate and spatial information; (10) EVI combined with climate and spatial information; and (11) SIF combined with EVI, climate, and spatial information. To assess the practicality of these models, based on the most suitable selected input, we performed hindcasting for each year from 2014 to 2018 in turn to evaluate whether the models generalize to different years; for example, the data for 2014–2017 were used as training data to predict winter wheat yield in 2018. In real applications, future data cannot be used to predict earlier years; however, holding out each year in turn provides more validation samples and improves the understanding of the model's performance. The RMSE (root mean square error) and R<sup>2</sup> (determination coefficient) between the predicted yield and the actual yield of winter wheat were calculated to verify the accuracy of the model.
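For reference, the two accuracy metrics named above can be computed as in the following minimal sketch. It assumes that R<sup>2</sup> is the coefficient of determination, 1 − SS<sub>res</sub>/SS<sub>tot</sub>; the text does not state whether this form or the squared Pearson correlation was used, and the function names are illustrative only.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error between actual and predicted yields."""
    n = len(actual)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

With this definition, a model that always predicts the mean observed yield scores R<sup>2</sup> = 0, and a model worse than that scores below zero.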

**Figure 2.** Experimental flow chart (Related parameter description: Location represents spatial information, including latitude and longitude; *a*<sub>1</sub> ... *a*<sub>*n*</sub> represent the partial regression coefficients; *β* is the constant term; *ε* represents the random error; Jan is short for January; Feb is short for February).

The second group of experiments explored the effect of time series data on yield estimation models and the contribution of climate data and satellite data to crop yield prediction at different growth stages. In this experiment, the location information was not added, since the spatial information of crops was assumed to remain unchanged in the short term. The climate and satellite data were added cumulatively over the growing season (January–June), with all months available up to a given time used to predict winter wheat yield, and the changes in R<sup>2</sup> and RMSE under the two modeling methods were compared to evaluate the change in performance of the winter wheat yield estimation models. The experiments were based on three input combinations (climate only, satellite only, and climate plus satellite). From the results, the added value of climate or satellite data to the estimation model in any period can be determined, and the time at which each method and input combination reaches its best estimation performance can be identified.

#### **3. Results**

#### *3.1. Selection of Climate Variables Combination*

Figure 3 shows the correlation analysis results for the 10 climate variables. The water-supply-related variables (shum, pre, and wet) are all positively correlated with yield, while tmx and tmp among the temperature-related variables are negatively correlated with yield. Among the water-demand-related variables, pet is negatively correlated with yield, while vap is positively correlated with yield. Among the radiation-related variables, srad is negatively correlated with yield, and lrad is positively correlated with yield. To select appropriate variables from each group as inputs for yield estimation, 5 of the 10 climate variables were selected according to the criteria in Section 2.2.2, namely wet, tmx, pet, srad, and vap.

**Figure 3.** Correlations among the 10 climate variables and correlations between each climate variable and yield.

#### *3.2. The Influence of Different Input Data Combinations on the Simulation of the Model*

The results of the first group of experiments (Figure 4) show that the two models share the following characteristics across the different input combinations. Among the single-source inputs, climate data give the best yield estimation performance, possibly because the climate variables better describe the growing environment of crops. There is an obvious spatial pattern of winter wheat yield at the prefecture scale, so adding spatial information helps to improve the prediction accuracy of the model. The best yield simulation is obtained by combining satellite data, climate data, and spatial information (MLR: R<sup>2</sup>~0.68; RF: R<sup>2</sup>~0.95). Notably, on a monthly scale, compared with adding EVI, adding SIF does not significantly improve the yield estimation accuracy. The estimation performance of RF when SIF is combined with other environmental factors is even lower than that obtained with EVI. This result indicates that, at the seasonal and prefecture scales, SIF provides little additional information beyond EVI for crop yield estimation, and it does not show the advantages for yield estimation seen at the small field scale. This may be related to the low signal-to-noise ratio, coarse resolution, and complex extraction algorithm of SIF [38]. The resolution of the SIF dataset used in this study (0.05°) is finer than that of previous datasets (0.5~1°) [38], yet the performance of the model remains the same, indicating that a SIF dataset downscaled by statistical methods alone cannot significantly enhance seasonal-scale crop yield estimation, which is consistent with Lindsey [66].

**Figure 4.** The model performance (predicted R2, RMSE) of two methods using different combinations of inputs for the whole growing season, each number representing a different input combination, which can be expressed as 1: SIF, 2: EVI, 3: Climate, 4: Location, 13: SIF + Climate, 23: EVI + Climate, 14: SIF + Location, 24: EVI + Location, 34: Climate + Location, 134: SIF + Climate + Location, 234: EVI + Climate + Location, 1234: SIF + EVI + Climate + Location.

#### *3.3. Comparison of Yield Estimation Performance of the Model*

The results show that the yield estimation performance of RF is generally higher than that of MLR, which may be because the relationships between crop yield and the input variables are mostly nonlinear, and nonlinear methods capture these relationships better than linear methods. In addition to comparing the performance and generalization of the models, we conducted a "leave-one-year-out" experiment to verify their extrapolation potential: models were built from the input data and crop yields of four of the five years and then validated separately against the yield of the remaining year. The results are shown in Table 2, in which each row represents the model performance for one year. The RMSE and R<sup>2</sup> between the estimated and the actual yield of winter wheat are compared. R<sup>2</sup> and RMSE are fairly stable across years at the prefecture scale, except for 2015. For example, the spatial distribution of the 2014 yield predictions from the two models (Figure 5) shows that RF reflects the spatial differences in winter wheat yield well, especially in the North China Plain, in addition to having a high potential for yield estimation. In 2014, the errors of RF are mainly in Henan Province, while MLR generally underestimates the crop yield in high-yield areas, and its errors are concentrated in Henan, Hebei, and Shandong provinces.
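The "leave-one-year-out" splitting described above can be sketched as a small generator. This is a minimal illustration, assuming each sample is a dictionary carrying a `year` field; the record layout and the function name are hypothetical and not taken from the study.

```python
def leave_one_year_out(records, year_key="year"):
    """Yield (held_out_year, train, test) splits, one per distinct year.

    Each split trains on every record from the other years and tests on
    the records of the held-out year only.
    """
    years = sorted({r[year_key] for r in records})
    for held_out in years:
        train = [r for r in records if r[year_key] != held_out]
        test = [r for r in records if r[year_key] == held_out]
        yield held_out, train, test
```

A model would be refitted on `train` and scored on `test` for each split, which gives one row of validation metrics per year.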


**Table 2.** The validation results of "leave one-year out" experiment.

**Figure 5.** Spatial pattern of winter wheat yield forecast in 2014: (**a**) RF, (**b**) MLR, (**c**) actual yield.

#### *3.4. The Influence of Time Series Data on the Simulation Ability of the Model*

Since SIF has no advantage in large-scale yield estimation, EVI is used as the satellite input in this experiment. The results (Figure 6) illustrate that the two models share the following characteristics: (1) for any particular input, the yield estimation accuracy increases rapidly as more data are acquired, and the growth then slows and gradually saturates at a later stage of the growing season; (2) combining climate data with satellite data significantly improves the performance of the yield estimation model, and climate data play an essential role in the model. However, the trajectories of prediction performance differ significantly between inputs. With only satellite data as inputs, the model generally starts from very poor performance (R<sup>2</sup>~0.1–0.2) and then improves considerably (R<sup>2</sup>~0.4–0.5), while with only climate data or with combined data as inputs, the model starts with relatively good performance (R<sup>2</sup>~0.4–0.6) and improves only slightly during the growing season. To understand more clearly the effect of multi-source data on the model, we assume that climate and satellite data make both independent and overlapping contributions to the yield estimation model, and we quantify the contributions of data from the different sources. For example, the independent contribution of climate data is the difference between the R<sup>2</sup> of the combined model and the R<sup>2</sup> of the satellite-only model. The results (Figure 7) indicate that climate data always play a vital role in the performance of the model. As the growing season advances, the proportion of overlapping information becomes higher, and the contribution of climate data to the model decreases gradually. This suggests that satellite data gradually absorb climate information as time goes by. In addition, models with only climate data or only satellite data generally achieve high simulation performance in May, while models with multi-source data generally get close to their maximum performance in March (RF) and April (MLR). Therefore, combining multi-source data can achieve a high estimation accuracy of the crop yield one or two months in advance.
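The partition of model skill into independent and overlapping contributions can be illustrated with a short sketch. The independent climate term follows the definition given in the text (combined R<sup>2</sup> minus satellite-only R<sup>2</sup>); the satellite term is its mirror image, and the overlap formula is an assumption made here so that the three parts sum to the combined R<sup>2</sup>. The function name and the example numbers are hypothetical.

```python
def partition_r2(r2_climate, r2_satellite, r2_combined):
    """Split the combined R^2 into independent and overlapping parts."""
    independent_climate = r2_combined - r2_satellite
    independent_satellite = r2_combined - r2_climate
    # Assumed: whatever both single-source models explain in common.
    overlap = r2_climate + r2_satellite - r2_combined
    return {
        "climate": independent_climate,
        "satellite": independent_satellite,
        "overlap": overlap,
    }


# Hypothetical values for one month of the growing season.
parts = partition_r2(r2_climate=0.6, r2_satellite=0.4, r2_combined=0.7)
```

Repeating this for each month shows how the overlap share can grow while the independent climate share shrinks.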

**Figure 6.** Cumulative input effect on the model performance. In each set of figures, the left figure represents the changing trend of R2 with the increase of inputs; the figure on the right shows the changing trend of RMSE.

**Figure 7.** Independent and overlapping contributions of satellite and climate data to the model: (**a**) MLR, (**b**) RF.

#### **4. Discussion**

The results show the effects of different input combinations on yield estimation accuracy, and the different yield estimation models give similar results. During the whole crop growing season, climate data always provide important information for crop yield estimation, which is consistent with previous conclusions [20–22]. The performance of the two yield estimation models improved significantly when satellite data were added, which is consistent with the view of Guan et al., who indicate that satellite data can provide additional information on crop growth that differs from climate data [43]. It is worth noting that adding SIF at the regional scale does not significantly improve the yield estimation accuracy of the model in this case. Compared with EVI, SIF has no significant advantage in yield estimation performance and provides little additional information to the yield estimation model. This agrees with previous research on whether SIF has an advantage in capturing crop growth state at the regional scale [38,66], and it may be related to the coarse resolution and complex extraction algorithm of SIF, which introduce more uncertainty. Random Forest predicts crop yield better than MLR. This is consistent with the literature [39], as yield responds nonlinearly to both temperature and precipitation [67,68], so the nonlinear yield estimation model is more in line with the actual situation.

As the growing season progresses, the amount of input satellite and climate data increases, and the changes in yield estimation accuracy are similar across models. In accordance with previous conclusions [38], climate data play an important role in the model in the early stage of crop growth; the satellite data then gradually absorb crop growth information and the yield estimation accuracy increases significantly, reaching its maximum in the late stage of growth. It is worth noting that the time at which the estimation accuracy reaches its maximum varies slightly between estimation models. While in the regression model the accuracy of yield estimation peaks only one month before harvest, RF achieves high yield prediction performance two months in advance.

The winter wheat planting areas are widely distributed in China, with obvious spatial differences. The main planting areas of winter wheat are in the North China Plain, covering about 15,309.1 kha. Most previous studies have not considered Inner Mongolia, Ningxia, Xinjiang, and other regions, which account for approximately 27.4 percent of the total winter wheat planting area in China. This study establishes yield estimation models at the national scale and accounts for the spatial heterogeneity of winter wheat yield by adding basic geographic data. The results show that adding spatial information can improve the yield estimation accuracy of models [3] and helps to establish a unified model on a large scale.

In this paper, RF and MLR are used to build yield estimation models of winter wheat in China, which avoids the randomness of single-model analysis. However, owing to data availability, the research focuses mainly on the prediction of crop yield at the prefecture scale. Furthermore, the spatial resolution of the satellite data used is relatively coarse, which leads to small training samples and limits the ability of machine learning methods [38,64,69]. Newly launched satellites (such as Landsat, Sentinel, and Fluorescence EXplorer) provide a variety of data (such as EVI, SIF, and climate variables) at higher spatiotemporal resolution [70–72], which offers potential for future improvement. In addition, although the machine learning model performs well in yield estimation at the prefecture scale, its process-based explanation is limited, which weakens the traceability and interpretability of the model. How to better combine process-based models with machine learning algorithms to extrapolate more efficiently beyond the training conditions [54,69] and to transfer models spatially can be investigated in the future. This would improve crop yield estimation in areas without sufficient historical yield records, such as Africa [44]. At the same time, some key factors have not been considered in this study, such as biological factors other than those captured by satellite, namely soil characteristics [26,40], which would also help to explain more yield variability. In addition, because data on the spatial distribution of winter wheat are limited, the distribution for 2014 to 2018 was represented by the 2014 data in this paper, which may also lead to errors in the statistics of remote sensing data. For future research, it is suggested that interannual crop distribution data be generated from satellite data to reduce potential errors.

#### **5. Conclusions**

To avoid the randomness of a single model, this study estimates winter wheat yield in China at the prefecture scale using an MLR model and an RF model that combine climate data, satellite data, and spatial information. The effects of different input combinations and time-series data on the performance of the model have been discussed. The main conclusions are as follows:


This study demonstrated a new scalable, simple, and inexpensive framework for estimating winter wheat yields over a wide range of areas based on publicly available data, which is applicable to other crops and geographical environments.

**Author Contributions:** S.Z. led the project and developed the framework; F.T. and S.Z. conceptualized and designed this research strategy; Y.S. carried out the field work and was responsible for data processing and manuscript writing; S.Z., R.A. and A.A. revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was supported by the National Key Research and Development Program of China (Project No. 2016YFD0300201) and the National Science Foundation of China (Project No. 41801078).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the authors.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Maximum Entropy Niche-Based Modeling for Predicting the Potential Suitable Habitats of a Traditional Medicinal Plant (***Rheum nanum***) in Asia under Climate Change Conditions**

**Wei Xu 1,†, Shuaimeng Zhu 2,†, Tianli Yang 3, Jimin Cheng <sup>4</sup> and Jingwei Jin 5,\***


**Abstract:** *Rheum nanum*, a perennial herb, is a famous traditional Chinese medicinal plant that has great value in modern medicine. In order to determine the potential distribution of *R. nanum* in Asia, we developed potential distribution maps for three periods (current, 2050s: 2041–2060, and 2070s: 2061–2080) using MaxEnt and ArcGIS, based on the current and future climate data under two climate scenarios (RCP2.6 and RCP6.0). To predict the potential impacts of global warming, we measured the area of suitable habitats, habitat suitability changes, and habitat core changes. We found that bio16 (i.e., the precipitation of the wettest quarter) and bio1 (i.e., the annual mean temperature) were the most important climate factors that influenced the distribution of *R. nanum*. The areas of high suitable habitats (HH) and middle suitable habitats (MH) in the current period were 156,284.7 ± 0.99 km<sup>2</sup> and 361,875.0 ± 3.61 km<sup>2</sup>, respectively. The areas of HH and MH in 2070RCP6.0 were 27,309.0 ± 0.35 km<sup>2</sup> and 123,750 ± 2.36 km<sup>2</sup>, respectively. The ranges of 82.0–90.3° E, 43.8–46.5° N were the most degraded areas of the 2050s and 2070s, and RCP6.0 had a larger decrease in habitable area than that found in RCP2.6. All the HH cores shifted south, and the shift distance of HH in 2070RCP6.0 was 115.65 km. This study provides a feasible approach for efficiently utilizing low-number occurrences, and presents an important attempt at predicting the potential distribution of species based on a small sample size. This may improve our understanding of the impacts of global warming on plant distribution and could be useful for relevant agricultural decision-making.

**Keywords:** geographic distribution; suitable habitat; *Rheum nanum*; MaxEnt; ArcGIS; range shifts; climate scenario

#### **1. Introduction**

In China, *Rheum* L. plants are traditional medicines distributed in the Greater Khingan Range, the Taihang Mountains region, the Qinling Mountains region, the Daba Mountains region, west of the Yunnan–Guizhou Plateau, and the Qinghai-Tibet Plateau [1–3]. At present, studies on *Rheum* L. plants mainly focus on their functional extracts [4–8]. As one of the most heat-tolerant and drought-tolerant species of *Rheum* L. plants [1], *Rheum nanum* Siev. ex Pall has been found in Mongolia, Kazakhstan, Russia, and China (i.e., Xinjiang, Gansu, and Inner Mongolia), where it mainly inhabits hillsides, valleys, and gravel lands below 1000 m [9]. A recent study showed that chrysophanol, an active constituent of *R. nanum*, has been effective against obesity and may cure certain skin disorders such as acne vulgaris [10]. Emodin, another active constituent of *R. nanum*, has suppressed the growth and the invasion of colorectal cancer [11] and has been used to treat acute kidney injury [12]. Previous studies have shown that precipitation and temperature are the dominant factors affecting the geographic distribution of *Rheum* L. plants [1,3]. *R. nanum* extracts have played an important role in modern medicine; however, to the best of our knowledge, the exact geographic distribution and the suitable habitats of this ancient medicine remain unexplored.

**Citation:** Xu, W.; Zhu, S.; Yang, T.; Cheng, J.; Jin, J. Maximum Entropy Niche-Based Modeling for Predicting the Potential Suitable Habitats of a Traditional Medicinal Plant (*Rheum nanum*) in Asia under Climate Change Conditions. *Agriculture* **2022**, *12*, 610. https://doi.org/10.3390/agriculture12050610

Academic Editor: Nándor Fodor

Received: 15 April 2022 Accepted: 22 April 2022 Published: 26 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

With the development of computer technology, it has become possible to predict the distribution pattern of a species' niche from occurrence data and associated environmental variables [13] with the help of species distribution models (SDMs). According to their dependence on occurrence data, SDMs can be roughly divided into two groups: occurrence-absence data groups (e.g., generalized additive models, generalized linear models, generalized boosting models) and occurrence-data-only groups (e.g., domain model, ecological niche-factor analysis model) [14,15]. The occurrence-data-only models typically have greater advantages, as absence data are usually difficult to collect in the real world, especially for poorly known species [16]. Among various occurrence-data-only SDMs, the maximum entropy model (MaxEnt) [17] has been used extensively because of its strong statistical foundation (i.e., high accuracy and robustness [14,18–21]) and ability to simulate ecological relationships [22]. Until now, MaxEnt has had a wide range of applications with relevant predictions [23–29]. Moreover, MaxEnt performs better for species with a small sample size [16], especially for species with typically few occurrence records [30].

Over the past 100 years, the global average temperature has risen by 0.6 °C [31,32]. By the end of this century, with a maximum increase of 2.6–4.8 °C in global average temperature [33], global precipitation will increase by approximately 31% [34]. Throughout the evolutionary history of diverse species [31,35], climate change has had significant impacts on the spatial distribution patterns of organisms [32,36]. Global warming has also been confirmed to have the ability to change species' survival [35] and conservation situations both spatially and temporally [37,38]. With the growing concern about global warming and its potential effects, there have been many useful studies on ecological modeling and conservation using multiple factors and methods [39]. In sub-Saharan Africa, under future climate scenarios, rice production will experience relevant losses in different regions [40]. For alpine plant communities in mountain ecosystems, elevation-dependent warming presents additional challenges and may result in the extinction of cold-adapted flora species [41,42]. In addition, the herbaceous and woody species in mountain basins and the Taklamakan Desert are more sensitive to climate change than plant species in meadows and steppe regions [36]. For *Stipa purpurea*, a perennial herbaceous plant widespread throughout the Tibetan Plateau, the number of suitable habitats is expected to increase from 1990 to 2050 [27]. Both the herbaceous species and dwarf shrubs will shift upwards in the alpine tundra region of the Changbai Mountains [43]. Hence, it is expected that the possible biogeographic patterns of *R. nanum* could change with global warming. Based on the effects of increasing temperatures on the global distribution of flora species, we developed two hypotheses: (1) high suitable habitats of *R. nanum* will increase and move northward; (2) the higher emission scenario may cause a larger increase in the habitat area of *R. nanum* than the lower emission scenario. The aim of this study was to predict the potential distribution of *R. nanum* in Asia under climate change and to verify our two hypotheses.

#### **2. Materials and Methods**

#### *2.1. Occurrence Data and Study Area*

The study area covered 70.9–136.5° E, 17.8–55.5° N in Asia. We initially selected 58 region names with relevant habitat descriptions (e.g., text, photo, or GPS points) from online resources (http://www.cvh.ac.cn/ accessed on 21 April 2022, http://www.cfh.ac.cn/ accessed on 21 April 2022 and http://www.plantphoto.cn/ accessed on 29 June 2016) and the published literature [1,3]. Around these approximate positions, we conducted a two-and-a-half-year (2016–2018) field survey in north and northwest China and collected 246 occurrence records with GPS. To improve the accuracy of prediction and reduce overfitting [44], SDMs toolbox version 2.4 for ArcGIS 10.2 was used to rarefy the occurrence data. Since the resolution of the environmental variables used in the MaxEnt model was 2.5 arc-minutes (approximately 5 km), to ensure that each variable grid cell covered, at most, one occurrence point, we filtered the occurrence data with a buffer distance of 5 km and finally selected 16 occurrence points (Figure 1) for the MaxEnt model. The occurrence points are usually divided into two types according to their usage: the training sample layer and the test sample layer. The rarefied sample size was too small, and the widely used method of setting "random test percentages" (i.e., 20–30% of occurrence data [45,46]) would further reduce the sample size. Therefore, to overcome the limitations of a small sample size, we set the 16 occurrence points (i.e., a 5 km buffer) as the training sample layer and sought other occurrence points that differed from the training samples as the test sample layer. We rarefied the 246 occurrence records again with a 1 km buffer and obtained 27 points, which covered the 16 occurrence points (i.e., a 5 km buffer) previously selected. After removing the 16 samples from the 27 newly acquired samples (i.e., a 1 km buffer), we obtained 11 sample points that could be set as the test sample layer for the MaxEnt model.
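The two-pass rarefying scheme described above (a 5 km pass for training points, then a 1 km pass whose extra points become the test set) can be illustrated with a simple greedy distance filter. This sketch is not the algorithm implemented in the ArcGIS toolbox used in the study; the function names, the greedy keep-first rule, and the haversine distance on a spherical Earth are all assumptions made for illustration.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (p[0], p[1], q[0], q[1]))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def thin(points, min_km):
    """Greedy thinning: keep a point only if it lies at least min_km from all kept points."""
    kept = []
    for p in points:
        if all(haversine_km(p, k) >= min_km for k in kept):
            kept.append(p)
    return kept


def train_test_from_thinning(points, train_km=5.0, test_km=1.0):
    """Coarse pass gives training points; extra points from the fine pass give test points."""
    train = thin(points, train_km)
    # Seed the fine pass with the training points so that all of them are
    # retained (valid when train_km >= test_km), then keep what it adds.
    rest = [p for p in points if p not in train]
    fine = thin(train + rest, test_km)
    test = fine[len(train):]
    return train, test
```

Seeding the fine pass with the training points mirrors the statement that the 1 km set covers the 5 km set, so the test points are simply the fine-pass points that are not already training points.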

**Figure 1.** Occurrence data of *Rheum nanum* in China.

#### *2.2. Environmental Variables*

Nineteen climate variables (2.5 arc-minutes resolution, Table S1) for the current (1970–2000) and future (2050s: 2041–2060 and 2070s: 2061–2080) periods were downloaded from the WorldClim version 1.4 dataset (https://www.worldclim.org accessed on 21 April 2022). The current climate data are interpolations from the observed data, and the future climate data are projections from the Intergovernmental Panel on Climate Change (IPCC [47]) Fifth Assessment Report (AR5). The future projection data cover four "representative concentration pathways" (RCPs: 2.6, 4.5, 6.0, and 8.5 [48]), which correspond to four possible future radiative forcing levels (2.6, 4.5, 6.0, and 8.5 W/m<sup>2</sup> [46]). The emission scenarios consider the potential impacts of policies on future greenhouse gas emissions and provide more scientific descriptions of possible future climate changes [17,49]. Among them, RCP2.6 is the lowest emission scenario (i.e., the most optimistic case) with ~490 parts per million (ppm) CO<sub>2</sub> concentrations until 2100, RCP4.5 and RCP6.0 are middle of the road (i.e., the more optimistic case, ~650 ppm and 850 ppm, respectively), and RCP8.5 is the highest emission scenario (i.e., the worst case, ~1370 ppm, https://www.carbonbrief.org accessed on 21 April 2022). RCP2.6 is the only case that could generally limit the global mean temperature increase below 2 °C by 2100 [49], because previous studies have indicated that without proper control, the CO<sub>2</sub> concentrations will likely reach 560 ppm (double the pre-industrial level) and 800 ppm by 2060 and 2080, respectively [50]. Hence, in this study, we chose RCP2.6 and RCP6.0 of the Beijing Climate Center–Climate System Model version 1.1 (BCC–CSM 1.1 [51,52]) as the future climate change scenarios.

#### *2.3. Model Processing and Evaluation*

For SDMs, a small number of samples may lead to an imperfect match between the environmental variables and the occurrence data, which makes model evaluation difficult and easily leads to model overfitting [53]. To our knowledge, MaxEnt is one of the few models that perform well in small-sample simulations [16,53]. Over the past two decades, MaxEnt, an occurrence-data-only machine-learning program [54], has been used extensively to predict the potential distribution of species [44,55]. Previous studies indicate that 15 might be the minimum number of occurrence points for ecological niche modeling [16]. For Asian species that have >15 available occurrences, the MaxEnt model may provide a broad spectrum of predictions [16]. Moreover, the jackknife test reflects the contributions and the importance of environmental factors in the MaxEnt model [56–58]. The area under the curve (AUC) value, based on the receiver-operating characteristic (ROC), ranges from 0 to 1 and reflects the accuracy of the MaxEnt model's prediction [14,56,59]. Values of 0.8–0.9 indicate a good prediction, while values of 0.9–1.0 represent an excellent prediction [17,32,54].

We converted the 19 bioclimatic environmental variables (bio1–bio19) to ASCII format using the Toolbox of ArcGIS 10.2, and then imported the files along with the species occurrence data (in ASCII format) into MaxEnt 3.4.1 for pre-selection. Based on the default settings, we set 16 occurrence records (i.e., at 5 km) as the model training dataset and 11 occurrence records (i.e., at 1 km) as the model testing dataset. A correlation analysis (r > 0.8) was conducted to avoid potential over-fitting [60,61] caused by correlated environmental variables, and we obtained two factors (more details in [62]): the annual mean temperature (bio1) and the annual precipitation (bio12). To measure the contribution and importance of the 19 variables, we conducted a pre-selection with 1000 iterations, a jackknife test, and an analysis of the response curves, and used default values for the other settings. According to the pre-selection result (Table S1), the precipitation of the wettest quarter (bio16) and bio1 contributed the most to the model. Previous studies have indicated that, for *Rheum* L. plants, seasonal precipitation and the relevant temperature during the growth period (March to September [9], and March to May for some species in extreme conditions) are the key factors affecting their spatial distribution [1–3]. Hence, after comprehensively considering the results of the correlation analysis, the pre-selection, and the possible environmental needs of *R. nanum*, we finally chose 6 bioclimatic factors (Table 1) as the environmental layers of MaxEnt in this study. To improve the projection accuracy and reduce uncertainty, we launched MaxEnt with the previous settings and ran it 10 times (i.e., 10 "replicates" with basic settings). We calculated the contributions and importance of the 6 factors (Table 1) with Excel, and the obtained raster layers in ASCII format were then imported into ArcGIS for further analysis.
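The correlation screening mentioned above can be sketched as follows. The source does not describe the exact procedure, so this is only an assumed workflow: Pearson's r is computed for every pair of variables sampled at the same sites, and pairs with |r| above 0.8 are flagged as candidates for removal. The function names and sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(layers, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| > threshold.

    `layers` maps a variable name to its values at the sampled sites.
    """
    names = sorted(layers)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(layers[a], layers[b])
            if abs(r) > threshold:
                flagged.append((a, b, r))
    return flagged


# Hypothetical values at five sites, for illustration only.
samples = {
    "bio1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "bio12": [2.0, 4.0, 6.0, 8.0, 10.5],
    "bio16": [5.0, 1.0, 4.0, 2.0, 3.0],
}
for a, b, r in highly_correlated_pairs(samples):
    print(a, b, round(r, 3))
```

Which member of a flagged pair to keep remains a judgment call; in the text, that decision also drew on the pre-selection results and the known ecology of the species.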

**Table 1.** Environmental variables used for MaxEnt modeling.


We defined the logistic threshold of "balance training omission, predicted area, and threshold value" as TH, and divided the resulting floating-point raster maps into four classes using the reclassify tool [54,55]: UH (unsuitable habitat, 0–TH), LH (low suitable habitat, TH–0.3), MH (middle suitable habitat, 0.3–0.6), and HH (high suitable habitat, 0.6–1.0). The area of suitable habitats, as well as the habitat maps showing the increases and decreases in habitat range, was obtained using the raster calculator tool. The cores of habitats and the range shifts of habitats were calculated with the zonal geometry tool (i.e., by calculating the area and centroid of the input raster or feature zone data using a specific zone field, which must be an integer field) and the mean center tool (i.e., by identifying the geographic center of a set of features with a case field and a weight field, which determine the rank and weight value, respectively; the rank represents the suitability class, and the weight represents the floating-point value from MaxEnt). To quantify the level of increase and decrease in habitat suitability, we coded UH, LH, MH, and HH as 1, 2, 3, and 4, respectively. The raster calculator tool of ArcGIS was used to test and compare the changes in habitat suitability between two periods. Changes in habitat suitability of −1, −2, and −3 represent a slight, moderate, and dramatic decrease, respectively, while +1, +2, and +3 represent a slight, moderate, and dramatic increase, respectively. In addition, we used the ratio of the area corresponding to each degree of habitat suitability change to the total area of suitability change (i.e., increase or decrease) to quantify the changes in habitat suitability.
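The reclassification and change-degree steps described above can be sketched on small grids as follows. This is an illustrative stand-in for the ArcGIS reclassify and raster calculator tools, not the authors' workflow: the TH value used here is hypothetical, class boundaries are assumed to be lower-inclusive, and all cells are assumed to have equal area.

```python
from collections import Counter

CLASS_CODES = {"UH": 1, "LH": 2, "MH": 3, "HH": 4}


def classify(value, th):
    """Map a logistic suitability value (0-1) to a class code.

    `th` is the logistic threshold TH and is assumed to be below 0.3.
    """
    if value < th:
        return CLASS_CODES["UH"]
    if value < 0.3:
        return CLASS_CODES["LH"]
    if value < 0.6:
        return CLASS_CODES["MH"]
    return CLASS_CODES["HH"]


def suitability_change(current, future, th):
    """Cell-wise change degree (future class minus current class), -3..+3."""
    return [
        [classify(f, th) - classify(c, th) for c, f in zip(row_c, row_f)]
        for row_c, row_f in zip(current, future)
    ]


def change_ratios(change):
    """Share of each change degree within all decreased (or increased) cells."""
    counts = Counter(v for row in change for v in row if v != 0)
    decreased = sum(n for v, n in counts.items() if v < 0)
    increased = sum(n for v, n in counts.items() if v > 0)
    return {v: n / (decreased if v < 0 else increased) for v, n in counts.items()}


# Hypothetical 2 x 3 suitability grids and a hypothetical TH of 0.05.
current = [[0.70, 0.70, 0.40], [0.02, 0.10, 0.65]]
future = [[0.01, 0.50, 0.40], [0.35, 0.70, 0.20]]
change = suitability_change(current, future, th=0.05)
print(change)
print(change_ratios(change))
```

Each ratio is taken relative to the total decreased or increased area rather than the whole map, which matches the definition given in the last sentence of the paragraph above.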

#### **3. Results**

#### *3.1. Contributions and Importance of Environmental Variables in the MaxEnt Model*

The mean AUC value of the training data was 0.992 ± 0.003 (Figure S1a), indicating that MaxEnt had excellent prediction performance. The Jackknife importance test showed that the mean value of the regularized training gain of the six factors was 3.127 (Figure S1b,c). When the environmental variables were used in isolation, bio16 (i.e., the precipitation of the wettest quarter) had the highest gain of 1.825, bio1 (i.e., the annual mean temperature) had the second highest gain of 1.455, and bio14 (i.e., the precipitation of the driest month) had the lowest gain of 0.291. Moreover, bio1 (2.751) had the largest decrease in gains when it was omitted, followed by bio16 (3.021) and bio15 (3.047) (Figure S1c).

The MaxEnt model results showed that the variable that contributed the most was bio16 (55.9%), followed by bio1 (13.9%) and bio14 (9.7%); the most important factor in the MaxEnt model was bio1 (54.1%), and the second and third most important factors were bio16 (20.0%) and bio19 (i.e., the precipitation of the coldest quarter, 13.4%), respectively.

#### *3.2. Potential Geographic Distribution and Suitable Habitat Area of R. nanum*

Under the RCP2.6 and RCP6.0 scenarios, *R. nanum*'s potential geographic distribution of suitable habitats over the three periods (current, 2050s, and 2070s) modeled by MaxEnt was visualized by ArcGIS (Figure 2). The distribution maps showed that, under current climate conditions, the range of MH was bounded at 81.7–97.3° E, 39.4–48.6° N, and the range of HH was bounded at 82.4–93.4° E, 42.5–47.9° N.

**Figure 2.** Potential suitable habitat maps of the current period, 2050s, and 2070s: (**a**) current; (**b**) 2050RCP2.6; (**c**) 2050RCP6.0; (**d**) 2070RCP2.6; (**e**) 2070RCP6.0. UH, LH, MH, and HH represent unsuitable habitat, low suitable habitat, middle suitable habitat, and high suitable habitat, respectively.

The areas of MH and HH were 361,875.0 ± 3.61 km<sup>2</sup> and 156,284.7 ± 0.99 km<sup>2</sup>, respectively, under current climate conditions (Figure 3). The area of suitable habitats (LH, MH, and HH) under the RCP6.0 scenario was smaller than that under the RCP2.6 scenario in both the 2050s and the 2070s. Under the RCP2.6 and RCP6.0 scenarios, the area of MH in the 2070s would be 225,972.2 ± 3.12 km<sup>2</sup> and 123,750 ± 2.36 km<sup>2</sup>, respectively, while that of HH in the 2070s would be 48,194.4 ± 0.54 km<sup>2</sup> and 27,309.0 ± 0.35 km<sup>2</sup>, respectively. For MH, the largest reduction in area occurred under 2070RCP60 (65.80%), followed by 2050RCP60 (62.15%) and 2070RCP26 (37.56%) (Figure 2 and Table S2). For HH, the largest reduction in area occurred under 2070RCP60 (128,975.7 km<sup>2</sup>, 82.53%), followed by 2050RCP60 (128,368.1 km<sup>2</sup>, 82.14%), 2050RCP26 (110,920.1 km<sup>2</sup>, 70.97%), and 2070RCP26 (108,090.3 km<sup>2</sup>, 69.16%) (Figure 3 and Table S2).
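The reduction percentages above follow directly from the reported mean areas. The short sketch below reproduces the 2070s figures from the current and future areas given in the text; the helper name is hypothetical and the uncertainty terms (±) are ignored.

```python
def area_reduction(current_km2, future_km2):
    """Return the lost area (km2) and the loss as a percentage of the current area."""
    lost = current_km2 - future_km2
    return lost, 100.0 * lost / current_km2


# Mean areas (km2) quoted in the text.
CURRENT = {"MH": 361875.0, "HH": 156284.7}
FUTURE_2070S = {
    ("MH", "RCP2.6"): 225972.2,
    ("MH", "RCP6.0"): 123750.0,
    ("HH", "RCP2.6"): 48194.4,
    ("HH", "RCP6.0"): 27309.0,
}

for (habitat, scenario), future_area in FUTURE_2070S.items():
    lost, pct = area_reduction(CURRENT[habitat], future_area)
    print(f"{habitat} 2070s {scenario}: -{lost:,.1f} km2 ({pct:.2f}%)")
```

For example, the HH loss under RCP6.0 is 156,284.7 − 27,309.0 = 128,975.7 km<sup>2</sup>, or about 82.53% of the current HH area, in agreement with the values reported above.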

#### *3.3. Changes of Potential Suitable Habitats in the Future Distribution Pattern*

Environmental variables have a significant influence on the distribution of *R. nanum*. The increase and decrease maps (Figure 4) describe the changes in its potential suitable habitats across the three periods (current, 2050s, and 2070s). The predicted maps showed that, under the RCP2.6 scenario, the habitat suitability drastically decreased in the ranges of 81.6–89.6° E, 43.6–46.8° N (Figure 4a) and 82.0–90.3° E, 43.8–46.5° N (Figure 4b) in the 2070–current and 2050–current periods, respectively; under the RCP6.0 scenario, the area of drastically decreased habitat suitability in the 2070–current period was at 81.9–91.9° E, 42.1–48.3° N (Figure 4c), while that of the 2050–current period was at 81.9–90.6° E, 43.6–48.1° N (Figure 4d). Under both the RCP2.6 and RCP6.0 scenarios, habitat suitability changed to different degrees: a larger proportion of the change occurred between adjacent levels (e.g., MH and HH), and a smaller proportion occurred between levels further apart (e.g., LH and HH). The share of suitability change at −1 and +1 was 51.84–59.31% and 65.64–76.91%, respectively, while that at −3 and +3 was 14.82–18.63% and 4.55–7.37%, respectively (Figure 4 and Table S3). Across all conditions of the 2050s and the 2070s, compared with the current distribution pattern, RCP6.0 led to a larger ratio of −1 changes than RCP2.6, and to a smaller ratio of +1 changes (Table S3).

**Figure 3.** The area of potential suitable habitats over three periods (current, 2050s, and 2070s) under RCP2.6 and RCP6.0 scenarios. UH, LH, MH, and HH represent unsuitable habitat, low suitable habitat, middle suitable habitat, and high suitable habitat, respectively.

**Figure 4.** The increase and decrease maps of suitable habitats area: (**a**,**b**) RCP2.6 scenario; (**c**,**d**) RCP6.0 scenario; (**a**,**c**) 2070–current; (**b**,**d**) 2050–current. Zero represents no change of suitability, negative numbers represent a decrease in habitat suitability, and positive numbers represent an increase in habitat suitability. The degree of habitat suitability changes increases from −1 to −3 and +1 to +3.

#### *3.4. Range Shifts of Suitable Habitat Cores under Two Climate Scenarios*

Under the RCP2.6 and RCP6.0 scenarios, all cores of *R. nanum*'s UH were located in the area of 98.7–99.6° E, 38.3–39.0° N, on the border of Gansu (GS) and Qinghai (QH), and did not change significantly over the three periods (current, 2050s, and 2070s) (Figure 5). However, the cores of LH, MH, and HH in future periods (i.e., 2050s and 2070s) changed significantly under RCP2.6 and RCP6.0, compared with their current distribution. Notably, under the RCP2.6 and RCP6.0 scenarios, all cores of LH, MH, and HH in the 2050s and the 2070s were located in Xinjiang (XJ), China (Figure 5). Compared to the core of MH in the current period, in the west–east direction, the MH cores of 2050RCP60, 2070RCP26, and 2070RCP60 shifted toward the west, while the MH core of 2050RCP26 moved east; in the north–south direction, the MH cores of 2050RCP60, 2070RCP26, and 2050RCP26 shifted toward the north, with 2070RCP60 being the exception (Figure 5). The shift distances of MH cores between the current distribution and that of 2070RCP60, 2070RCP26, 2050RCP60, and 2050RCP26 were 118.68 km, 63.52 km, 142.73 km, and 148.49 km, respectively. Compared to the core of HH in the current distribution, all cores of HH shifted toward the south under the RCP2.6 and RCP6.0 future scenarios (Figure 5). Relative to the current HH core, the cores of HH shifted by 100.18 km and 115.65 km in 2050RCP60 and 2070RCP60, respectively.
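For readers who wish to reproduce this kind of core-shift measurement outside ArcGIS, the sketch below computes a weighted mean center and the great-circle distance between two centers. It is an approximation under stated assumptions, not the mean center tool itself: longitudes and latitudes are averaged directly, the Earth is treated as a sphere of radius 6371 km, and the cell coordinates and weights are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def weighted_mean_center(cells):
    """Weighted mean (lon, lat) of a list of (lon, lat, weight) tuples."""
    total = sum(w for _, _, w in cells)
    lon = sum(x * w for x, _, w in cells) / total
    lat = sum(y * w for _, y, w in cells) / total
    return lon, lat


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical HH cells (lon, lat, MaxEnt value) for two periods.
current_cells = [(86.0, 45.0, 0.9), (88.0, 46.0, 0.7)]
future_cells = [(85.5, 44.0, 0.8), (87.0, 44.5, 0.7)]

cur = weighted_mean_center(current_cells)
fut = weighted_mean_center(future_cells)
shift_km = haversine_km(cur[0], cur[1], fut[0], fut[1])
direction = "south" if fut[1] < cur[1] else "north"
print(f"core shifted {shift_km:.1f} km toward the {direction}")
```

Distances obtained this way can differ slightly from those measured in a projected coordinate system, so they should be treated as indicative.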

**Figure 5.** Range shifts of four habitats under RCP2.6 and RCP6.0 scenarios. Lines with color show the direction and distance of core change from current (start of line) to specific time (end of arrow) in MH and HH, and the color of the line depicts the specific time. UH: unsuitable habitat; LH: low suitable habitat; MH: middle suitable habitat; HH: high suitable habitat; M: Mongolia; IM: Inner Mongolia; XJ: Xinjiang; GS: Gansu; QH: Qinghai.

#### **4. Discussion**

#### *4.1. Effects of Climate Change on Suitable Habitat Range*

Our simulation of the potential distribution of *R. nanum* in the current period, the 2050s, and the 2070s performed accurately (AUC > 0.9), and the results indicated that *R. nanum*'s distribution was largely influenced by three precipitation factors (i.e., bio14, bio16, and bio19) and one temperature factor (bio1). For plant species, precipitation affects soil moisture and infiltration [15] and plays a key role in determining distribution. Moreover, under the growing pressure of changing environments, an increase in temperature [34] has resulted in a reduction in precipitation in Himachal Pradesh, and has impacted the productivity of agricultural crops [63]. In this study, we compared the range changes of the potential suitable habitat over three periods (i.e., current, 2050s, and 2070s) under two climate scenarios (i.e., RCP2.6 and RCP6.0) to quantify the impacts of temperature increase on the potential distribution of *R. nanum*. Compared to the current distribution, almost every suitable habitat class (LH, MH, and HH) under the RCP2.6 and RCP6.0 scenarios would decrease to different degrees in the future, except for LH (increase of 2.58%) and MH (increase of 2.32%) under the 2050RCP2.6 scenario (Figure 3 and Table S2). We believe that, under the 2050RCP2.6 scenario, the unbalanced changes of suitable habitats (e.g., for MH, the degraded area from HH could not offset the newly added suitable habitats from UH) may be responsible for the increase of LH and MH. The comparison of the potential distribution maps of the 2050s and the 2070s (Figure 4 and Table S2) revealed that, compared to the RCP2.6 scenario, RCP6.0 led to a greater reduction in the area of suitable habitats (LH, MH, and HH), which was contrary to previous studies [27] and our former hypotheses. The ecosystems of arid and semi-arid regions are highly sensitive to global warming [64], and we believe that this may be the cause of this opposite result.

As the optimal habitat of *R. nanum*, the range of HH was predicted to shift south in the future, as compared to its current range. This result is contrary to the direction of movement of other species in previous studies [27,65]. We inferred that species in arid and semi-arid lands may have different survival strategies [36] when facing global warming, as compared to species in other ecosystems [26], and this may have caused the opposite range shifts. Murray et al. [66] suggested that species with a limited distribution always have a narrow ecological tolerance, and that even slight climate changes may affect their distribution patterns. In addition to the latitudinal shift, the cores of HH also shifted to higher altitudes with complex terrain (Figure S2), which was consistent with our original hypothesis. Previous studies have also indicated that species would shift poleward and upward to adapt to climate changes [67–69]. However, poleward migration is essentially the same strategy as high-elevation migration. For species with little elevation change in their typical habitats, range shifts to the north would provide an environment similar to their previous habitat in terms of temperature, and species could also shift to a colder environment by increasing their living elevation by a small amount [65]. Notably, unlike the simple horizontal or elevational shift shown on maps, species' range shifts in nature are typically a combination of the two migration strategies [67].

With a continued increase in greenhouse gas emissions [48], the higher-emission scenario (RCP6.0) had higher rates of temperature increase than the lower-emission scenario (RCP2.6). Toward the end of this century, MH and HH had larger range shifts under RCP6.0 than under RCP2.6. One possible reason is that, as an arid and semi-arid species [36], *R. nanum* is highly sensitive to climate changes (e.g., temperature and precipitation). Notably, the geometric comparison of the suitable habitat range maps of the three periods under the two climate scenarios indicated that the HH ranges of 2050RCP26, 2050RCP60, 2070RCP26, and 2070RCP60 were all included in the present-day HH area (Figure 2). In addition, global temperature and mountain ranges largely drive the range shifts of plant species [68]. Therefore, considering both the range of suitable habitats and their range shifts, we assumed that the irregular shrinkage of suitable habitats may be the main reason behind the range shifts in the future. To gain insight into the essentials of species range shifts, future studies should focus on quantifying and evaluating the importance and contribution of the irregular shrinkage of habitats.

#### *4.2. Conservation of Species in Ecologically Fragile Areas*

The complex terrains of the high mountains in Asia, such as the Kunlun Mountains, the Altai Mountains, the Qilian Mountains, and the Tianshan–Pamir–Hindu Kush–Karakoram mountain ranges, have created a large region with relatively stable arid and semi-arid environments [70]. Previous studies have also shown that the valleys and the rivers of the Hengduan Mountains provided refuge for plants to survive and evolve in the Last Interglacial and the Last Glacial Maximum [71]. For the habitat suitability of *R. nanum*, the drastically increased range was commonly bounded by 74.1–81.4° E and 36.0–37.6° N, while the drastically decreased range was commonly bounded by 81.6–89.6° E and 43.6–46.8° N (Figure 4). We speculate that the mountain regions of Central Asia provide a relatively stable habitat for *R. nanum* to survive, and previous studies have provided some consistent clues [27,70,71]. Under the two climate scenarios of the 2050s and the 2070s, the decreased range of habitat suitability coincides with the major distribution range (MH and HH) of *R. nanum* under current climate conditions (Figures 2a and 4). We support the belief that the ecosystems in arid and semi-arid regions are relatively fragile and sensitive to environmental changes [36], and believe that temperature and corresponding precipitation changes [34,35] may be the dominant factors causing the decrease in *R. nanum*'s habitat suitability in Central Asia. The growing pressure of potential evaporation caused by increasing global warming will accelerate the decrease in soil moisture [72], which may trigger severe droughts [36] with increasing frequency in arid and semi-arid regions. Severe drought stress could lead to low richness patterns of plant species [73], which would likely offer fewer opportunities [74] for the adaptation and distribution of species in this region.

The variations (e.g., seasonality changes) in environmental factors (i.e., temperature and precipitation [41]) have an obvious effect on plants in arid and semi-arid areas. Compared with the herbaceous species distributed in this region, the living strategies (e.g., deeper root systems [36]) of woody species play an important role in ensuring survival. However, they also limit speciation during their life history and may reduce their potential distribution range [75]. In terms of the decrease in habitat suitability and the irregular shrinkage of the habitat range of *R. nanum*, compared to species with similar habitats, we cannot state with certainty whether this is an individualistic response [41] by the species to climate change. When it comes to conservation, for *R. nanum* and other plants with medicinal properties [48], we hold the opinion that appropriate human intervention is necessary. Since the extinction of species due to severe climate change is most likely to occur in sensitive and fragile ecosystems (e.g., the Taklamakan Desert [36]), it is recommended that protection areas [76] be set up in suitable habitats for target species in order to prevent excessive digging [77] and illegal trade [48]. Additionally, to ensure the management and protection of agricultural crops of economic value, regular collection of field germplasm resources and corresponding artificial cultures are also necessary and deserve special attention.

#### *4.3. Dominant Environmental Factors and Limitations in Predicting Species-Distribution Ranges*

Our study succeeded in predicting the geographic distribution of a species living in arid and semi-arid regions. However, our results have several limitations. Firstly, MaxEnt's niche simulation of the target species is based on the premise that the species would be extensively present at sites whose environmental conditions (i.e., temperature and precipitation) are most similar to those of the sites with known occurrence data [59,62,78]. Some geographic barriers (e.g., monsoons, mountains, and rivers) that affect the distribution ranges of species are typically ignored [20]. Secondly, we obtained only a few occurrence records in this study, and the prediction did not consider other relevant environmental factors (e.g., soil category, light, and terrain). For more accurate predictions, future studies should consider as many relevant factors (e.g., abiotic and biotic factors) as possible and integrate them with more adaptable algorithms [53]. Thirdly, we predicted the potential distribution of *R. nanum* using MaxEnt alone. However, previous studies have shown that crosslinked models have a higher prediction accuracy than a single model [68]. In addition, as an important complement to and confirmation of our work, studies of the Last Glacial Maximum [45] and the Mid-Holocene will further refine the theory of species' responses to climate change.

#### **5. Conclusions**

In this study, with the help of ArcGIS and the MaxEnt model, we successfully predicted *R. nanum*'s potential distribution and evaluated suitable habitats in the current period, the 2050s, and the 2070s under the RCP2.6 and RCP6.0 scenarios based upon the relevant environmental factors (i.e., temperature and precipitation). In contrast to the simulation of other species with abundant occurrence data, we manually categorized the selected occurrence data into a test layer and a training layer, which allowed MaxEnt to make effective use of our small sample size. Our results suggest that the potential distribution range of *R. nanum* was 81.7–97.3° E and 39.4–48.6° N in the current period. The key environmental factors that affected the distribution of *R. nanum* were bio1 and bio16. Under the two climate scenarios, the areas of suitable habitats (i.e., LH, MH, and HH) decreased to different degrees in both the 2050s and the 2070s, and RCP6.0 led to larger habitat range reductions than RCP2.6. Moreover, the suitable habitats for *R. nanum* will shift toward the south by different distances in the future. In particular, we found that the irregular shrinkage of suitable habitats may be an overlooked reason for the movement of habitat cores. To prevent the illegal digging and trade of agricultural crops with economic value, it is feasible to establish protection areas and management standards. We believe our study can provide a vital reference for the habitat simulation of species with a small sample size, and may provide support for species conservation in arid and semi-arid regions.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture12050610/s1, Figure S1: The curves of the MaxEnt model (a) and bio-climate factors (b) and Jackknife test results (c); Figure S2: The terrain remote sensing image (300 km) of high suitable habitat cores; Table S1: Contribution of environmental variables used in MaxEnt; Table S2: Area changes of suitable habitats in different climate scenarios; Table S3: The area (10<sup>4</sup> km<sup>2</sup>) and ratio (%) of habitat suitability change at different period interval situations.

**Author Contributions:** W.X. designed the study, collected occurrence data, and wrote the main body of the manuscript; W.X., S.Z. and T.Y. drew the map, tables, and figures; J.J. and J.C. acquired the funding and revised the manuscript and figures. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financially supported by the Key Research and Development Program of Shaanxi Province (2021NY-006), Natural Science Foundation of China (31601987), China Agriculture Research System (CARS-34), Doctoral Fund of Henan Polytechnic University (B2019-4), and Deployment Program of the Chinese Academy of Sciences (KJZD-EW-TZ-G10).

**Acknowledgments:** We thank the national and international organizations that provided data and associated software for this work. The authors are indebted to Yixian Chen, Hanqi Liu, and Kai Jin for their kind help with ArcGIS operation and data processing. Junru Wang assisted in confirming species' Latin names. During the two and a half years of fieldwork, Weiwei Ren and numerous anonymous and kind individuals helped collect the species' field occurrence data. Moreover, we would like to thank Shuo Wang for her suggestions on the color selection of the full-text Figures. Importantly, the spiritual support of Runzhi Mao was the indispensable base for the completion of this work. We also thank the anonymous reviewers who made many important suggestions for improving the paper and others who contributed to the improvement of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **North Expansion of Winter Wheat Planting Area in China under Different Emissions Scenarios**

**Maowei Wu 1, Yang Xu 1, Jingyun Zheng 1,2 and Zhixin Hao 1,2,\***


**Abstract:** Suitable planting areas for winter wheat in north China are expected to shift northward due to climate change; however, increasing extreme events and a deficient water supply are threatening the security of planting systems. Thus, based on predicted climate data for 2021–2050 under the Shared Socioeconomic Pathways (SSP1-2.6, SSP3-7.0, and SSP5-8.5) emission scenarios, as well as historical data from 1961–1990, we use four critical parameters, namely the percentage of extreme minimum temperature years (POEMTY), the first day of the overwintering period (FD), the sowing date (SD), and the precipitation before winter (PBW), to determine the planting boundary of winter wheat. The results show that the frequency of extreme minimum temperature occurrences is expected to decrease in the North winter wheat area, which will result in a northward movement of the western part of the northern boundary by 73, 94, and 114 km on average, in addition to FD delays ranging from 6.0 to 10.5 days. Moreover, agrometeorological conditions in the Huang-Huai winter wheat area are expected to exhibit more pronounced changes than in the rest of the studied areas, especially near the southern boundary, which is expected to retreat northward by approximately 213, 215, and 233 km. The northern boundary is expected to move 90–140 km northward. Therefore, the change in the southern and northern boundaries will lead the potential planting areas of the entire North winter wheat area to increase by 10,700 and 28,000 km<sup>2</sup> on average in the SSP3-7.0 and SSP5-8.5 scenarios, respectively, but to decrease by 38,100 km<sup>2</sup> in the SSP1-2.6 scenario; however, the lack of precipitation remains a limitation for extending planting areas in the future.

**Keywords:** climate change; agriculture; food security; planting boundary; winter wheat

#### **1. Introduction**

Wheat is the third-largest crop in the world, and provides 20% of human dietary protein and caloric intake globally [1]. Its broad adaptability to climatic conditions and its varietal diversity account for its unparalleled cultivation range, from 67° N in Scandinavia and Russia to 45° S in Argentina [2]. As stated by the Food and Agriculture Organization of the United Nations (FAO), the production of wheat increased from 222 million tons in 1961 to 732 million tons in 2013 [3]. Nonetheless, with the continually increasing global mean surface temperature since the Industrial Revolution [4], climate change and increasing extreme climate events disturb the agricultural ecosystem and alter locally suitable agrometeorological conditions, which affects wheat growth. Climate change is expected to substantially expand the suitable regions for winter wheat cultivation in North America northward into Canada, and to extend the fall-sown spring wheat region northward and eastward [5,6]. In northern Europe, suitable areas for winter wheat cultivation have expanded almost to the Arctic Circle (66.5° N) [7]. Crops in southern Europe, such as maize, sunflower, and soya beans, could also expand further north and to higher altitudes [8,9]. Although warmer temperatures benefit wheat cultivation at high latitudes by reducing cold-temperature constraints on agricultural development, typical wheat planting areas in the tropics will be gradually reduced [10]. Therefore, the negative impacts of climate change on global wheat production will likely become a critical issue to address in the future. As predicted by Balkovic et al. [11], global wheat production under current conventional management methods would decrease under all Representative Concentration Pathways (RCPs) by 37–52 Mt and by 54–103 Mt in the 2050s and 2090s, respectively.

**Citation:** Wu, M.; Xu, Y.; Zheng, J.; Hao, Z. North Expansion of Winter Wheat Planting Area in China under Different Emissions Scenarios. *Agriculture* **2022**, *12*, 763. https://doi.org/10.3390/agriculture12060763

Academic Editor: Danilo Scordia

Received: 29 April 2022 Accepted: 25 May 2022 Published: 27 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

China is the largest wheat producer in the world, possessing 11% of the global wheat cultivation area and contributing 17% of global wheat production [12]. Winter wheat accounts for more than 90% of the total wheat yield in China [13]. The climatic indices of the winter wheat cultivation zones in China are based on the low temperatures that winter wheat can tolerate [14,15], as varying degrees of freeze damage during the overwintering stage may have different negative impacts on winter survival rate, crop vigor, and therefore final yields [16,17]. However, the spatial pattern of temperature change in China strongly resembles the global pattern, with warming occurring throughout the country, albeit more noticeably in the north [18]. The northward expansion of winter wheat planting over the past few decades has been largely attributed to the longer growing seasons and decreased temperature-related constraints on crop growth that have resulted from warmer temperatures. Increasing attention has been paid to the changes in winter wheat cultivation distribution and possible planting boundaries in China, and substantial progress in characterizing this phenomenon has been achieved. For instance, the sowing date has been observed to be delayed by 4 days for every 1 °C increase in temperature [19]. The planting boundaries for different winter wheat varieties in China moved significantly northward in 1981–2010 compared to the 1951–1980 period [20]. Moreover, the overall potential planting area for winter wheat increased as well. Strong-winterness varieties of winter wheat showed the largest change in both the movement of the planting boundary and the increase in planting area. Hao et al. analyzed changes in suitable winter wheat planting boundaries along its production areas in China under the RCP4.5 scenario, and predicted a northward shift of the northern winter wheat boundary by 1–2° of latitude [21]. The planting area in 2019 was 1420 km<sup>2</sup> larger than that in 2000, as measured by Landsat image mapping [22].

So far, most previous studies have focused on shifts of the northern boundary, rather than on shifts of both the southern and northern boundaries in the coming decades. Moreover, some studies have been limited to the provincial or regional scale, and thus cannot represent the general impact of climate change on agriculture. Therefore, this study aimed to assess the impacts of climate change on the trends of agrometeorological indices associated with the safe overwintering of wheat in the winter wheat region of China under different Shared Socioeconomic Pathways, namely the low (SSP1-2.6), medium-high (SSP3-7.0), and high (SSP5-8.5) emission scenarios, and to explore potential wheat planting boundaries in the future. The findings of this study could provide a reference for other agricultural planting regions, as well as scientific data for climate change adaptation and responses in food security.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The winter wheat planting region in China is divided into the North winter wheat area and the South winter wheat area, bordered by the Huaihe River, based on the geographical environment, natural conditions, climatic factors, farming system, and wheat varieties [14]. In this study, we focus on the North winter wheat area (I), which is located south of the Great Wall and north of the Huaihe River, and includes Shandong, Henan, Hebei, Shanxi, Shaanxi, southeastern Gansu, and the northern parts of Jiangsu and Anhui Provinces (Figure 1). Crops ripen twice per year, or three times every two years [23,24]. The climatic conditions are suitable for growing winter or strong winter varieties of winter wheat, with an annual mean temperature of 9–15 °C, an extreme minimum temperature of −30.0 to −13.2 °C from north to south, and an annual active accumulated temperature of 2750–4900 °C. Area I can be divided into the Northern winter wheat subregion (Ia) and the Huang-Huai winter wheat subregion (Ib), according to latitude, terrain, and climatic conditions. In subregion Ia, winter wheat is sown from the end of September to early October and harvested from mid- to late June; in subregion Ib, the sowing date is the same as in Ia, but the harvest is earlier, in early June. In addition, since the possible winter wheat planting boundaries might move northward as a result of climate warming, we also plot the Spring wheat area (II) to the north of area I in Figure 1.

**Figure 1.** Map of North winter wheat area; shaded colors indicate the temperature spatial pattern (mean value from 1991−2019), using the daily meteorological data set of basic meteorological elements of China National Surface Weather Station (V3.0).

#### *2.2. Data*

The Shared Socioeconomic Pathways (SSPs) are five distinctly different scenarios determined by an international team of climate scientists, economists and energy systems modelers, and adopted by the Sixth Assessment Report of the Intergovernmental Panel on Climate Change. They describe how global societies, populations, and economies will change in the context of climate change adaptation and mitigation [25]. The SSPs provide a framework to describe alternative socioeconomic developments between and within countries, and represent five scenarios: a sustainable pathway (SSP1), a middle pathway (SSP2), a regional rivalry pathway (SSP3), a divided pathway (SSP4), and a fossil-fueled development pathway (SSP5). For the analysis of socioeconomic and climate systems, this study combines SSPs and Representative Concentration Pathways (RCPs) to form a set of future global change scenarios determined by socioeconomics, emissions, climate response, and anthropogenic forcing of climate systems, which makes future scenarios more reasonable for social development [26]. Three combined SSP-RCP scenarios are selected in this study to project future wheat overwintering indices: (1) a low forcing and sustainability pathway (SSP1-2.6), which represents the combined scenario of a lower challenge of mitigation with low radiative forcing that peaks at 2.6 W/m<sup>2</sup> by 2100; (2) a new forcing scenario (SSP3-7.0), which represents a combination of high social vulnerability and relatively high radiative forcing that stabilizes at 7.0 W/m<sup>2</sup> by 2100; and (3) a high forcing scenario (SSP5-8.5), which represents a highly energy-intensive socioeconomic development pathway whereby radiative forcing reaches 8.5 W/m<sup>2</sup> by 2100.

Climate scenario data for 2021–2051 and historical data for 1961–1991 were provided by the Inter-Sectoral Impact Model Intercomparison Project (ISI–MIP; https://data.isimip.org/search/tree/ISIMIP3b/; accessed on 1 May 2021). The data included five climate model simulation outputs: GFDL-ESM4 (NOAA-GFDL), UKESM1-0-LL (MOHC), MPI-ESM1-2-HR (MPI-M), IPSL-CM6A-LR (IPSL), and MRI-ESM2-0 (MRI), which have been bias-corrected based on the raw data from the five aforementioned Coupled Model Intercomparison Project (CMIP6) models. The monthly mean values of the simulated data were adjusted to match the observed data, in order to preserve the long-term absolute or relative trends of the simulated data. Afterward, these bias-adjusted data were bilinearly interpolated at 0.5° × 0.5° spatial resolution [27]. In this study, the data used were daily precipitation, daily mean temperature, and daily minimum temperature. We used the daily gridded climate data set from 1961 to 1990 as observation data, in order to evaluate the simulation capability of the climate model. The daily gridded climate data set with spatial resolution of 0.5° × 0.5° is converted from observation data, and can be downloaded from the National Meteorological Information Center (http://101.200.76.197/data/detail/dataCode/SURF\_CLI\_CHN\_PRE\_DAY\_GRID\_0.5.html; accessed on 1 January 2021).

The study area includes approximately 1425 grid points. A Taylor diagram is used to evaluate the simulation capability of extreme minimum temperature (≤−22 ◦C) days (EMTD), accumulated temperature (AT), and accumulated precipitation (AP) from 1 September to 31 December over the entire study area (Figure 2). The spatial correlation coefficients of EMTD, AT, and AP between the multi-model ensemble (MME) and the observation data were 0.968, 0.974 and 0.970, respectively, and all coefficients passed the significance test at the 99% confidence level. The respective ratios of standard deviations between MME and observation data were 1.070, 1.036 and 1.018, and the normalized root mean square differences (RMSD) between simulation and observation data for all three climatic indices were less than 0.5. These results indicate that MME can effectively capture the temperature and precipitation characteristics of the study area. Therefore, we used MME to analyze the meteorological conditions related to wheat overwintering under different emissions scenarios relative to 1961–1990.

**Figure 2.** Taylor diagrams of (**a**) EMTD (≤−22 ◦C), (**b**) AT, and (**c**) AP from September 1st to December 31st over the entire study area for the period 1961–1990. The red dotted line corresponds to the 99% confidence level; REF: observation; azimuthal position: spatial correlation coefficient; radial distance: ratio of standard deviation; distance from REF point: normalized root mean square difference.
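The three quantities shown in a Taylor diagram can be sketched in a few lines. The following is a minimal illustration, assuming population (not sample) standard deviations and a centered root mean square difference normalized by the standard deviation of the observations; the function names are hypothetical and are not taken from the study.

```python
import math


def taylor_statistics(model, reference):
    """Return (correlation, std ratio, normalized centered RMSD) for two equal-length series."""
    n = len(reference)
    if n != len(model) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_m = sum(model) / n
    mean_r = sum(reference) / n
    dev_m = [m - mean_m for m in model]
    dev_r = [r - mean_r for r in reference]
    std_m = math.sqrt(sum(d * d for d in dev_m) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    corr = sum(a * b for a, b in zip(dev_m, dev_r)) / (n * std_m * std_r)
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_m, dev_r)) / n)
    return corr, std_m / std_r, crmsd / std_r


def normalized_rmsd_from(corr, std_ratio):
    """Law-of-cosines relation linking the three Taylor statistics."""
    return math.sqrt(1.0 + std_ratio ** 2 - 2.0 * std_ratio * corr)
```

Under these assumptions, the reported correlation of 0.968 and standard deviation ratio of 1.070 for EMTD imply a normalized RMSD of about 0.27 via `normalized_rmsd_from`, which is consistent with the stated value of less than 0.5.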

#### *2.3. Methodology*

In the North winter wheat Area, whether the winter wheat can be grown safely or not is determined by climate conditions during the winter, and thus, four critical indices with significant effects on winter wheat safe planting boundaries are used here: (1) the percentage of extreme minimum temperature years occurrence, (2) the first day of the overwintering period, (3) the sowing date, and (4) the precipitation before overwintering. The percentage of extreme minimum temperature years occurrence determines whether winter wheat can resist freezing injury in severe winter. The first day of the overwintering period and sowing date both account for the accumulated temperature of winter wheat prior to the overwintering stage, as well as its overwintering ability; if the accumulated temperature between sowing and overwintering is too low or high, weak wheat seedlings may encounter difficulties in surviving through winter. The precipitation before overwintering influences the strength of wheat seedlings before entering the overwintering stage. Here, since the sowing dates are influenced by complex factors such as wheat variety and terrain (e.g., mountain area microclimates), we obtain the sowing date through the required accumulative temperature from sowing date to the overwinter period (e.g., 450, 550, 700 ◦C), which can ensure winter wheat seedling survival during the winter. It is also worth noting that the accumulated temperature was calculated until December 31st over the south region of the whole winter wheat area, where no obvious overwintering period is observed. The calculations of the four indices were performed as follows:

(i) Percentage of extreme minimum temperature (≤−22 °C) years occurring (POEMTY) in a given period:

$$\text{POEMTY} = \sum_{k=1}^{ty} I\{t\text{min}_k \le -22\} / ty \tag{1}$$

where *ty* is the total number of years in a study period, *tmin<sub>k</sub>* is the daily minimum temperature for a certain year *k*, and *I* is an indicator function, which is 1 if *tmin<sub>k</sub>* is lower than or equal to −22 °C and 0 otherwise; the specified −22 °C is the lowest temperature that winter wheat can endure safely through the winter [28,29].

(ii) First day of the overwintering period (FD): defined as the first day at which the daily mean temperature (i.e., based on a 5-day moving average) was below 0 ◦C [30].

(iii) Sowing date (SD): the date when the cumulative temperature reaches the required accumulated temperature before winter (Ta), calculating back from the FD. Ta (i.e., 450, 550, 700 ◦C) is positive accumulated temperature calculation for daily average temperature greater than 0 before overwintering, according to the method described by Cui et al. [31].

(iv) Precipitation before overwintering (PBW): total precipitation from the SD to the FD.
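Indices (ii) to (iv) can be sketched for a single grid cell and season as follows. This is a minimal illustration and not the authors' code: it assumes a trailing 5-day window for the moving average, counts only daily means above 0 °C toward Ta, and sums precipitation from the SD (inclusive) up to the FD (exclusive). All function names and the synthetic data are hypothetical.

```python
def first_overwintering_day(tmean, window=5):
    """Index of the first day whose trailing `window`-day mean temperature is below 0 degC.

    Returns None when the moving average never drops below zero
    (no obvious overwintering period).
    """
    for day in range(window - 1, len(tmean)):
        if sum(tmean[day - window + 1:day + 1]) / window < 0.0:
            return day
    return None


def sowing_day(tmean, fd, ta):
    """Walk backward from the day before `fd`, summing positive daily means until `ta` is reached."""
    total = 0.0
    for day in range(fd - 1, -1, -1):
        if tmean[day] > 0.0:
            total += tmean[day]
        if total >= ta:
            return day
    return None  # the record is too short or too cold to reach `ta`


def precipitation_before_winter(precip, sd, fd):
    """Total precipitation from the sowing day (inclusive) to the first overwintering day (exclusive)."""
    return sum(precip[sd:fd])


# Synthetic season: 60 days at 10 degC, then 10 days at -1 degC, with 1 mm of rain every day.
tmean = [10.0] * 60 + [-1.0] * 10
precip = [1.0] * 70
fd = first_overwintering_day(tmean)
sd = sowing_day(tmean, fd, ta=450.0)
pbw = precipitation_before_winter(precip, sd, fd)
```

For grid cells where `first_overwintering_day` returns `None`, the index of 31 December could be passed as `fd`, following the cutoff the text describes for areas without an obvious overwintering period.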

For each grid, the FD and SD are determined with an 80% guarantee rate (i.e., an agricultural climate criterion), and the PBW was calculated via the mean value over the referenced period and forecasting period.
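The text does not spell out how the 80% guarantee rate is evaluated. One common reading, assumed here, is the empirical value that is not exceeded in at least 80% of the years; the direction of the criterion and the function name are assumptions rather than details from the study.

```python
def value_at_guarantee_rate(yearly_values, rate=0.8):
    """Smallest observed value that is not exceeded in at least `rate` of the years."""
    ordered = sorted(yearly_values)
    n = len(ordered)
    for rank, value in enumerate(ordered, start=1):
        if rank >= rate * n - 1e-9:  # tolerance guards against floating-point round-off
            return value
    raise ValueError("yearly_values must not be empty")


# Hypothetical first overwintering days (day of year) for ten seasons at one grid cell.
fd_by_year = [330, 320, 340, 325, 333, 322, 335, 326, 331, 328]
fd_80 = value_at_guarantee_rate(fd_by_year)
```

With ten seasons, the function returns the eighth-smallest date, so the overwintering period has started by that day in eight of the ten years.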

#### **3. Results**

#### *3.1. Percentage of Extreme Minimum Temperature (*−*22* ◦*C) Years Occurrence*

The lowest critical temperature for winter wheat cultivation in the regions near the north boundary (along the Great Wall) was demonstrated to be −22 ◦C [29]. Therefore, POEMTY is among the most important indices to reflect climatic conditions during the winter wheat overwintering period, which directly impacts seedling survival rate. Figure 3 illustrates the spatial distribution of POEMTY for 1961–1990 and 2021–2050 under SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios. The POEMTYs are mainly concentrated in 0–20% and 80–100% intervals, and the 0% areas are marked with white lattices in Figure 3, where no lower than −22 ◦C extreme minimum temperatures occurred. However, values of POEMTY above 50% have cold injury risk for agriculture, and farmers would no longer choose these areas to plant winter wheat. Therefore, colored areas with above 50% of POEMTY were defined as high-risk region for winter wheat growing, and low-risk (0 < POEMTY ≤ 20%) and medium-risk regions (20% < POEMTY ≤ 50%) were also defined, as shown in Figure 3. In particular, the medium-risk region was considered as the potential extended winter wheat area along the northern boundary.
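Equation (1) and the risk classes described above can be sketched as follows. This is an illustrative implementation that assumes one extreme minimum temperature per year as input; the function names are hypothetical, and the class labels follow the thresholds given in the text.

```python
def poemty(annual_min_temps, threshold=-22.0):
    """Percentage of years whose minimum temperature is at or below `threshold` (Equation (1))."""
    years = len(annual_min_temps)
    hits = sum(1 for t in annual_min_temps if t <= threshold)
    return 100.0 * hits / years


def risk_class(poemty_percent):
    """Map a POEMTY value (in percent) to the risk classes used in Figure 3."""
    if poemty_percent == 0:
        return "risk-free"
    if poemty_percent <= 20:
        return "low"
    if poemty_percent <= 50:
        return "medium"
    return "high"


# Hypothetical 30-year record with three years reaching -22 degC or lower.
annual_minima = [-15.0] * 27 + [-22.0, -24.5, -23.1]
share = poemty(annual_minima)
```

In this example three of 30 years qualify, so `share` is 10% and the grid cell falls in the low-risk class.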

**Figure 3.** Spatial distribution of POEMTYs for the 1961–1990 and the 2021–2050 periods. (**a**) Historical period, (**b**) SSP1-2.6, (**c**) SSP3-7.0, and (**d**) SSP5-8.5 scenarios. White lattices indicate that no extreme minimum temperature ≤−22 ◦C occurred. The black solid and dotted lines represent the northern border of winter wheat cultivation in the 1961–1990 and 2021–2050 periods, respectively.

During the historical period 1961–1990 (Figure 3a), the POEMTY in the North winter wheat area (I) was below 20%, which meets the 80% assurance rate of guaranteed minimum temperature. In fact, the potential winter wheat planting area northern boundary was further north than the current boundary in the eastern and western portions of area I. At the end of the 20th century, the experiments of winter wheat northward migration were successfully carried out in Liaoning Province and Inner Mongolia [32].

For 2021–2050, the potential safe overwintering areas for winter wheat are projected to expand under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios compared to the 1961–1990 period (Figure 3b–d), and the high-risk area will move northward as a result of the projected climate warming. Under the SSP1-2.6 scenario, the safe overwintering areas for winter wheat will increase by 11.8% relative to 1961–1990. The risk-free area will increase by 24.7%, and the wheat in 94.5% of the current area I will no longer experience extremely low temperatures. The northern boundary of the potential planting area in northeastern China (i.e., the eastern region) will extend to the western part of Jilin Province, and the boundary in central Inner Mongolia (i.e., the western region) will move slightly north as well. The western part of the northern boundary could move northward by approximately 73 km on average, and the northernmost tip of the eastern region could move 111 km northward. Under the SSP3-7.0 scenario, the low-risk area will increase by 14.7%, and the potential northern planting boundary will move northward by approximately 94 km on average in the western region, and by 152 km at the northernmost tip of the eastern region. Moreover, 99.0% of the current area I will no longer experience extremely low temperatures, and the risk-free area will increase by 32.5%. Under the SSP5-8.5 scenario, the area for safe wheat overwintering will increase by 16.6%, the risk-free area will increase by 34.8%, and the northern boundaries in the western and eastern regions will both move substantially to the north (i.e., by 114 and 158 km, respectively).

#### *3.2. First Day of the Overwintering Period*

During the 1961–1990 reference period (Figure 4a), the FD varied from mid-November to late December in most of the North winter wheat area (I), while moving from north to south along the latitudinal gradient. In the Spring wheat area (II), the FD occurred in early November in the southern part of northeastern China and western Inner Mongolia, and in October in the northern and western parts of northeastern China. However, most of the northern parts of area II were unsuitable for winter wheat, given the low-temperature limitations mentioned in Section 3.1. The FD was from early November to early December in the northern winter wheat subregion (Ia), and concentrated in mid-to-late December in the Huang-Huai winter wheat subregion (Ib). However, some areas along the Huaihe River in the southernmost part of subregion (Ib) did not exhibit noticeable overwintering periods, since they are in the temperate-subtropical transition region, which is represented with dark grey dots in Figure 4a.

**Figure 4.** Spatial distribution of FDs for the 1961–1990 and 2021–2050 periods. (**a**) Historical period, (**b**) SSP1-2.6, (**c**) SSP3-7.0, and (**d**) SSP5-8.5 scenarios. The grid points marked by dark grey dots indicate that there are no obvious overwintering periods.

For the period of 2021–2050, the FD exhibited a relatively consistent spatial distribution under the three SSP-RCP scenarios, albeit with delays towards late November and later (Figure 4b–d). Taking the SSP1-2.6 scenario as an example, the winter wheat in area II will enter the overwintering period as early as early October, and the boundaries of the overwintering periods will move northwardly. Moreover, the largest changes will occur between late October to mid-November in northeastern China, and from early- to mid-November in western Inner Mongolia. In subregion Ia, the FD will range from mid-November to late December from north to south, respectively. Specifically, it is mainly from mid-November to early December in the western high elevations, and from mid-November to late December in the east. Winter wheat will begin to enter the overwintering period in early to mid-December in the western regions, and in late-December to later in most of the eastern plains. There will be no obvious overwintering periods in the south part of subregion Ib, and the areas without obvious overwintering periods will increase and account for approximately 36.7%, 37.0%, and 39.7% in subregion Ib under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively, and will be mainly distributed in some of the southern provinces, such as Shandong, Henan, Jiangsu, and Anhui, which will cause the southern boundaries of area I to move approximately 213, 215, and 233 km to the north (i.e., as determined by the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively). Compared with the reference period, the FD is delayed by 6.0, 6.6, and 7.1 days in area II; by 8.9, 7.7, and 10.4 days in subregion Ia; and by 10.5, 9.0, and 10.3 days in subregion Ib under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively. 
Importantly, the MME projects higher temperatures for more than half of the days from overwintering periods in 63.0% of the grids over area I under the SSP1-2.6 scenario compared to the SSP3-7.0 scenario. Therefore, the changing trends of some overwintering meteorological indices in this region are larger under the SSP1-2.6 scenario, such as the area without obvious overwintering periods and the FD delays.

#### *3.3. Sowing Date*

The accumulated temperature (AT) before entering the overwintering period is an important factor affecting the ability of winter wheat to resist freezing conditions, and the SD in this study was calculated as the date at which the pre-winter positive accumulated temperature reached a certain value. According to previous studies [21], the AT for viable seedlings was approximately 570–720 °C, and the lower limit for safe overwintering of wheat was approximately 420 °C. However, wheat seedlings tended to grow excessively before winter if the AT was too high, which also led to poor cold resistance. In this study, values of 450, 550, and 700 °C were used as references to analyze SD changes in the North winter wheat area (I).

Figure 5 illustrates the spatial distribution features of SD during the 1961–1990 historical period, as well as under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios in 2021–2050 with AT values of 450, 550, and 700 ◦C, respectively. The SD exhibited a gradually delayed spatial distribution pattern from south to north, but occurred earlier if the AT increased for each scenario. In 1961–1990, when the AT was 450 ◦C, the SD began in mid-September and earlier in the spring wheat area (II), in August in the Greater Khingan Mountains and Lesser Khingan Mountains, and in mid-September in the southern part of the northeast plain and western part of Inner Mongolia. The SD was from mid-September to early-October in the Qinling Mountain area, the western Taihang Mountain, the northern part of the North China Plain, and the Liaoning area; mid-October in the lower and middle reaches of the Yellow River; and late October in the southern part of the North China Plain. The SD spatial distributions for values of 550 and 700 ◦C were similar to those for 450 ◦C; however, the SD advanced as AT requirements increased, as illustrated in Figure 5(a2,a3). Additionally, the SDs of different regions changed to varying degrees over the 1961–1990 period, exhibiting average delays of 0.3–0.4 days/decade in area II, 0.8–0.9 days/decade in the Northern winter wheat subregion (Ia), 0.6–0.8 days/decade in the Huang-Huai winter wheat subregion (Ib), and 0.7–0.8 days/decade in the entirety of area I.
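Rates such as 0.7–0.8 days/decade are typically obtained from a linear trend fitted to the yearly dates. The text does not state the fitting method, so the sketch below assumes an ordinary least-squares slope scaled to ten years; the function name and the synthetic series are hypothetical.

```python
def trend_per_decade(years, values):
    """Ordinary least-squares slope of `values` against `years`, expressed per 10 years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 10.0 * sxy / sxx


# Synthetic sowing dates (day of year) that drift later by 0.08 days per year.
years = list(range(1961, 1991))
sowing_doy = [270.0 + 0.08 * (year - 1961) for year in years]
delay_rate = trend_per_decade(years, sowing_doy)
```

A positive result means later sowing; the synthetic series above yields a delay of 0.8 days/decade.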

**Figure 5.** Spatial distribution of SD in the 1961–1990 and 2021–2050 periods for accumulated temperature before overwintering periods at 450, 550, and 700 ◦C. (**a1**–**a3**) historical period, (**b1**–**b3**) SSP1-2.6, (**c1**–**c3**) SSP3-7.0, and (**d1**–**d3**) SSP5-8.5 scenarios.

For the period of 2021–2050, the SDs exhibited relatively consistent spatial distributions under the three emissions scenarios for 450, 550, and 700 °C, and all showed delayed dates compared with 1961–1990. For example, when the AT reaches 450 °C under the SSP1-2.6 scenario, the SDs are delayed, from mid and late August in the reference period, to late August and early September in most areas of the Greater Khingan Mountains and the Lesser Khingan Mountains, and from early-mid September to mid-to-late September in the northern boundary and its surrounding areas. In subregion Ia, the SDs range from mid-September and early October, to late September and early October in the western regions, and concentrate in early and mid-October in the eastern regions. In subregion Ib, the SDs are delayed from late September and early October, to early and mid-October in the areas surrounding Qinling Mountain, and from late October to November in the areas along the Huaihe River. On average, the SD is delayed by 8.1 days in area II, 7.5 days in subregion Ia, and 8.7 days in subregion Ib. For the AT of 550 and 700 °C, the SD spatial distributions are shown in Figure 5(b2,b3), with a delay of 7.7–8.8 days and 7.8–8.9 days for the three sub-districts. The SD would be further delayed under the SSP3-7.0 and SSP5-8.5 scenarios, with an average delay of 7.5 (450 °C) to 8.0 (700 °C) days for SSP3-7.0, and 8.8 (450 °C) to 9.1 (700 °C) days for SSP5-8.5 across the entirety of area I. Furthermore, the SD trends over a given region were similar for different AT requirements under the same emissions scenario over the predicted 2021–2050 period, and the SD delay rate accelerated for all conditions compared to 1961–1990. The SDs under SSP1-2.6, SSP3-7.0, and SSP5-8.5 were respectively delayed by 1.3–1.4, 1.6–1.7, and 2.2–2.4 days/decade on average for the entirety of area I, suggesting that the impact is greater under higher emissions scenarios.

#### *3.4. Precipitation before Winter*

The historical spatial distribution features of PBW in 1961–1990, as well as in 2021–2050 under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 radiative forcing scenarios are illustrated, with AT values of 450, 550, and 700 ◦C (Figure 6). During the 1961–1990 reference period, when the AT was 450 ◦C, PBW reached more than 60 mm in the northern Greater Khingan Mountains, the Lesser Khingan Mountains, Qinling Mountain, and Changbai Mountain, which was followed by 40–60 mm in Taihang Mountain and the Huaihe River area, and 20–40 mm in the Northeast Plain and North China Plain. The PBW in the plateau areas in the west of Taihang Mountain was generally less than 60 mm, and decreased from east to west. Moreover, there was an increase in PBW with higher AT, especially at high altitudes. For example, at 700 ◦C of AT, the PBW was generally above 100 mm in the Greater Khingan Mountains, Lesser Khingan Mountain, Changbai Mountain, and Qinling Mountain area; 60–80 mm in the Taihang Mountain area; 20–80 mm in the Northeast Plain and Northern China Plain; and 80–100 mm in the areas along the Huaihe River.

During the period of 2021–2050, the PBWs in most areas of the North winter wheat area (I) show a decreasing trend relative to the reference period under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios. No substantial differences were observed between the spatial distribution patterns of PBW for different emissions scenarios and AT requirements. When the AT reached 450 °C under the SSP1-2.6 scenario, the regions with more than 60 mm of PBW were largely confined to the Greater Khingan Mountains, Lesser Khingan Mountains, and Changbai Mountain, while the western regions of Inner Mongolia received little precipitation. The western regions of the Northern winter wheat subregion (Ia) exhibited more precipitation than the eastern regions, with 40–60 mm and less than 40 mm of PBW, respectively. Most regions of the Huang-Huai winter wheat subregion (Ib) received only 20–40 mm, but the area near the Qinling Mountain exceeded 40 mm. On average, the PBW decreased by 13.8% in the spring wheat area, 13.7% in subregion (Ia), and 26.7% in subregion (Ib), relative to the reference period. For AT values of 550 and 700 °C, the PBW spatial distributions are illustrated in Figure 6(b2,b3), with 13.1–24.8% and 10.1–22.6% decreases for the three sub-districts, respectively. The PBWs would still decrease, with larger reductions under the SSP5-8.5 scenario than under the SSP1-2.6 and SSP3-7.0 scenarios, which indicates that the water deficiency already present during the historical period will remain intractable [33]; development and management of irrigation facilities may require more attention over this region [34].

**Figure 6.** Spatial distribution of PBW in the 1961–1990 and 2021–2050 periods for accumulated temperature before overwintering periods at 450, 550, and 700 ◦C. (**a1**–**a3**) historical scenario, (**b1**–**b3**) SSP1-2.6, (**c1**–**c3**) SSP3-7.0, and (**d1**–**d3**) SSP5-8.5 scenarios.

#### *3.5. Planting Boundaries under the Different Scenarios*

Based on the analyses in Sections 3.1–3.4, the safe planting areas are illustrated in Figure 7 for each scenario. The western region (west of 115° E) of the potential northern planting boundaries will move northward by approximately 73, 94, and 114 km on average under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively, and the northernmost tip of the eastern part will move northward by 111, 152, and 158 km, respectively. Due to climate change, almost 40% of the Huang-Huai winter wheat subregion (i.e., 36.7%, 37.0%, and 39.7% under the three radiative forcing scenarios) will exhibit no obvious overwintering periods, causing the southern boundaries of the North winter wheat area (I) to retract approximately 213, 215, and 233 km to the north. This indicates that some provinces in southern area I, such as Shandong, Henan, Jiangsu, and Anhui, would become unsuitable for winter wheat cultivation. However, the potential planting areas of the entirety of area I will increase by 10.7 and 28.0 thousand km<sup>2</sup> on average in the SSP3-7.0 and SSP5-8.5 scenarios, respectively, and decrease by 38.1 thousand km<sup>2</sup> in the SSP1-2.6 scenario. It is worth noting that although the radiative forcing of SSP3-7.0 is higher than that of SSP1-2.6, the warming under SSP3-7.0 shows clear regional differences.
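Boundary displacements given in kilometres can be compared with shifts expressed in degrees of latitude through a simple conversion. The sketch below assumes a spherical Earth with a mean radius of 6371 km (about 111 km per degree of latitude); the constant and function names are illustrative and are not taken from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
KM_PER_DEGREE_LAT = math.pi * EARTH_RADIUS_KM / 180.0  # roughly 111.2 km


def km_to_degrees_latitude(distance_km):
    """Convert a north-south distance in km to degrees of latitude."""
    return distance_km / KM_PER_DEGREE_LAT


# Southern-boundary retreat under SSP1-2.6, SSP3-7.0, and SSP5-8.5 (km).
southern_shift_deg = [round(km_to_degrees_latitude(km), 1) for km in (213, 215, 233)]
```

Under this approximation, the 213–233 km retreat of the southern boundary corresponds to roughly 1.9–2.1° of latitude, which agrees with the range quoted in the Conclusions.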

**Figure 7.** The possible planting boundaries for the North winter wheat area for 1961–1990 and 2021–2050 (SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios).

#### **4. Discussion**

Winter wheat is widely planted in China as a result of its broad climatic adaptability, and the distribution of its cultivation zones has become the focus of a growing number of scientists in the search to maintain high and stable yields. Previous studies mainly focused on the northward displacement of the northern planting boundary [19,32,35]. However, we predicted that areas without obvious overwintering periods will expand significantly, reaching the Yellow River within the Huang-Huai winter wheat subregion. This indicates that most of the Huang-Huai winter wheat subregion will no longer be suitable for the currently cultivated winter wheat variety in the future. It is also worth noting that when 'extreme spring cold spells' (ESCSs) occur in northern China, continuous negative temperature anomalies can have a catastrophic impact on wheat yields, resulting in yield losses of 20% or more. Without an overwintering period, winter wheat will likely grow excessively fast during the winter and subsequently encounter difficulties in resisting cold injury in the early spring. We assumed that an ESCS occurs when the daily temperature remains at least 3 °C lower than the climatological daily mean for 5 consecutive days, and analyzed the probability of ESCS in the area with no obvious overwintering periods. We found that the probability of ESCS will increase from 12.5% in 1961–1990 to 20.5–25.5% in the forecast period (i.e., 25.5%, 20.5%, and 21.1% under the three scenarios). Additionally, plant diseases and insect pests may also increase as a result of climate change, which would further reduce winter wheat yields. Climate change also impacts the winter wheat sowing date. Here, we found that the sowing date was delayed by 0.7–0.8 days/decade on average for the entire North winter wheat area in 1961–1990. Similar results were found in some previous studies; for instance, Xiao et al. observed 1.5 days/decade average delays in the North China Plain during the 1981–2009 period [36]; Liu et al. reported a larger value (3.1 days/decade) for the same period [37]. These large variations in sowing date delays were largely due to differences in study areas and data sources. Nonetheless, our study determined that sowing dates will be delayed by 1.1–1.3, 1.9–2.0, and 2.2–2.4 days/decade in 2021–2050 under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively, suggesting that the sowing date delay rate will further increase in the coming three decades.
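The ESCS criterion stated above (daily temperature at least 3 °C below the climatological daily mean for 5 consecutive days) can be sketched as a run-length check. The text does not say how the probability was computed, so the code assumes it is the share of years with at least one event; the function names and the test series are hypothetical.

```python
def has_cold_spell(daily_temp, climatology, anomaly=-3.0, min_days=5):
    """True if the temperature anomaly stays at or below `anomaly` for `min_days` consecutive days."""
    run = 0
    for temp, normal in zip(daily_temp, climatology):
        if temp - normal <= anomaly:
            run += 1
            if run >= min_days:
                return True
        else:
            run = 0  # the spell is broken, so start counting again
    return False


def cold_spell_probability(temps_by_year, climatology):
    """Percentage of years that contain at least one cold spell (assumed definition)."""
    hits = sum(1 for year in temps_by_year if has_cold_spell(year, climatology))
    return 100.0 * hits / len(temps_by_year)


# Hypothetical 10-day spring window with a flat 10 degC climatology.
climatology = [10.0] * 10
spell_year = [10.0, 10.0, 7.0, 7.0, 7.0, 7.0, 7.0, 10.0, 10.0, 10.0]
calm_year = [10.0, 7.0, 7.0, 7.0, 7.0, 10.0, 7.0, 10.0, 10.0, 10.0]
probability = cold_spell_probability([spell_year, calm_year, calm_year, calm_year], climatology)
```

Only `spell_year` contains five consecutive qualifying days, so the example gives a probability of 25%.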

In order to predict the northern winter wheat planting boundary, we mainly focused on whether winter wheat could safely survive winter conditions, and calculated an index based on the minimum temperature that could be tolerated by winter wheat. However, this threshold (−22 °C) was established for the winter wheat varieties of 1980; current varieties are known to tolerate lower temperatures. For example, the strong winter wheat variety "Dongnong winter wheat No.1", which was introduced and bred in Heilongjiang Province, can withstand minimum temperatures of −30 to −35 °C [38]. Considering that the winter wheat varieties planted in most areas of the North winter wheat area (I) do not have such strong resistance to freezing temperatures, this study still conservatively assumed a −22 °C minimum temperature for the northern boundary of winter wheat cultivation.

Climate change plays an important role in the distribution of suitable winter wheat cultivation zones in China. This study analyzed the changes of potential winter wheat planting areas in the North winter wheat area (I) from a strict meteorological perspective. However, our results do not necessarily imply that winter wheat can be grown in any area where the meteorological conditions are favorable, as local soil conditions, productivity levels, and agricultural policies are also critical factors that determine the suitability of a region for winter wheat cultivation. Additionally, farmers may adjust to the local conditions by choosing the correct sowing depth, adjusting the sowing date, improving crop varieties, and expanding irrigation infrastructure, all of which may lead to further variations in the actual planting boundaries for winter wheat relative to the meteorological boundaries. Furthermore, farmers will no longer engage in production activities in areas where winter wheat planting has failed repeatedly. As a result, wheat cultivation in the northern boundary has changed very little since 2000 to reduce or avoid climate risks. Nonetheless, the winter wheat meteorological boundary is still an important reference boundary for the exploration of new winter wheat cultivation areas, as winter wheat would be difficult to produce profitably beyond these boundaries. Therefore, actual changes in winter wheat cultivation areas should be considered comprehensively in combination with regional meteorological conditions and human activities in future studies.

#### **5. Conclusions**

This study predicted the potential changes in the northern and southern winter wheat cultivation boundaries during 2021–2050 under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 radiative forcing scenarios in the North winter wheat area and its Spring wheat area. The findings of this study indicate that the occurrence of extremely low temperature years will decline in the North winter wheat area due to climate change, which will result in an increase in the potential safe overwintering areas for winter wheat cultivation in 2021–2050 by 11.8%, 14.7%, and 16.6% under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively, compared to the 1961–1990 reference period. The north boundary will move 0.8–1.3◦ northward on average, and the south boundary will retract 1.9–2.1◦ in latitude.

SD and FD, the two phenological stages before winter, are projected to be delayed in the forecasted period. Compared with the 1961–1990 reference period, the SD is delayed by 8.2–8.5, 7.5–8.0, and 8.8–9.1 days, and FD is delayed by 9.9, 8.5, and 10.3 days in the entire North winter wheat area under the SSP1-2.6, SSP3-7.0, and SSP5-8.5 scenarios, respectively. Moreover, PBW is projected to experience large decreases across the entire region under the three scenarios, especially in the Huang-Huai winter wheat subregion, with reduction rates above 20%. This increases the likelihood that the southern winter wheat cultivation boundary will recede northward.

**Author Contributions:** M.W. and Y.X. analyzed and processed the data. J.Z. and Z.H. were major contributors to the drafting of the manuscript. Z.H. provided financing throughout the experiment and generated the outlines of the research. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was funded by The Strategic Priority Research Program of Chinese Academy of Sciences (XDA23100403) and National Natural Science Foundation China (42171030).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Simulating the Impacts of Climate Change on Maize Yields Using EPIC: A Case Study in the Eastern Cape Province of South Africa †**

**Dennis Junior Choruma 1,2,\*, Frank Chukwuzuoke Akamagwuna <sup>2</sup> and Nelson Oghenekaro Odume <sup>2</sup>**


**Abstract:** Climate change has been projected to impact negatively on African agricultural systems. However, there is still an insufficient understanding of the possible effects of climate change on crop yields in Africa. In this study, a previously calibrated Environmental Policy Integrated Climate (EPIC) model was used to assess the effects of future climate change on maize (*Zea mays* L.) yield in the Eastern Cape Province of South Africa. The study aimed to compare maize yields obtained from EPIC simulations using baseline (1980–2010) weather data with maize yields obtained from EPIC using statistically downscaled future climate data sets for two future periods (mid-century (2040–2069) and late century (2070–2099)). We used three general circulation models (GCMs): BCC-CSM1.1, GFDL-ESM2M and MIROC-ES under two Representative Concentration Pathways (RCPs), RCP 4.5 and RCP 8.5, to drive the future maize yield simulations. Simulation results showed that for all three GCMs and for both future periods, a decrease in maize production was projected. Maize yield was projected to decrease by as much as 23.8% for MIROC, RCP 8.5, (2070–2099). The temperature was projected to rise by over 50% in winter under RCP 8.5 for both future periods. For both future scenarios, rainfall was projected to decrease in the summer months while increasing in the winter months. Overall, this study provides preliminary evidence that local farmers and the Eastern Cape government can utilise to develop local climate change adaptation strategies.

**Keywords:** climate change; agriculture; crop modelling; yield; future climate scenarios

#### **1. Introduction**

Climate change is anticipated to significantly impact the resilience of agricultural systems in semi-arid developing countries such as South Africa. The Intergovernmental Panel on Climate Change (IPCC) has projected that increases in greenhouse gases, particularly carbon dioxide (CO2), are expected to modify global climate by increasing surface air temperature, altering rainfall patterns and increasing the occurrence of extreme weather events [1]. While the increased temperature may boost the yields of some crops in some regions by increasing the rate of biomass accumulation [2], the negative effects of climate change such as increased rainfall variability and droughts are expected to far outweigh the positive benefits of climate change [3]. Several studies have predicted a decline in agricultural productivity in most parts of Southern Africa due to increased rainfall variability and elevated temperatures [4–6].

Maize (*Zea mays* L.) is a staple food in South Africa and vital for food security in the country [7]. However, climate change threatens agricultural productivity in South Africa and hence food security and the livelihoods of many subsistence farmers who rely on maize production [8,9]. A review by [10] showed that maize yields were projected to decrease by as much as 8–38% under RCP 4.5 and RCP 8.5 scenarios by the end of the 21st century. Several studies have investigated the impacts of climate change on maize production in South Africa. A study by [11] in Southern Africa using a process-based crop model (APSIM) combined with 17 general circulation models (GCMs) predicted a decrease in future maize yields. However, many of these studies focused on the traditional maize growing areas such as KwaZulu-Natal, with limited studies focusing on the Eastern Cape, even though many people in the Eastern Cape rely on maize production for their livelihoods [9]. While the Eastern Cape has been predominantly a livestock producing area due to the semi-arid climate, the government is driving efforts to increase cereal production, especially maize, in an effort to increase the region's food security [9,12,13].

**Citation:** Choruma, D.J.; Akamagwuna, F.C.; Odume, N.O. Simulating the Impacts of Climate Change on Maize Yields Using EPIC: A Case Study in the Eastern Cape Province of South Africa. *Agriculture* **2022**, *12*, 794. https://doi.org/10.3390/agriculture12060794

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 7 May 2022 Accepted: 12 May 2022 Published: 31 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Lately, predicting and evaluating the possible impacts of climate change on crop yields has become important in order to develop effective climate change adaptation strategies in agricultural systems. Early knowledge and understanding of potential climate change effects on crops may help farmers and decision makers to make informed decisions that minimise agricultural production risks and take advantage of opportunities arising from climate change [11]. This knowledge of how the future climate may affect agricultural production is important in semi-arid regions such as South Africa, where water scarcity and increasing frequencies of droughts are already limiting crop production [14] and threatening food security.

One way of predicting and evaluating the effects of future climate conditions on agricultural production is by using crop models. Crop models have gained increasing application in agriculture-related research to enhance crop growth, soil water balance and nutrient management under various climate conditions [15,16]. Crop models have also been used to assess the impacts of climate change on crop production and environmental risks [17,18], and explore potential adaptation strategies [19]. In South Africa, studies have applied crop models in the fields of hydrology and agriculture. For example, Warburton et al. [20] investigated the impacts of climate change on the hydrology of catchments. Abraha and Savage [21] assessed the potential effects of climate change on maize yields in the KwaZulu-Natal area of the country. However, most of these studies in South Africa used global climate data to run the crop models. Global climate data may not always be representative of local climate conditions [22].

When simulating future crop yields, variables such as precipitation and temperature are required as model inputs. General circulation models (GCMs) have been created to use different greenhouse gas scenarios and complex earth–atmosphere interactions to project future climate parameters such as precipitation. GCMs are numerical models that use complex mathematical equations to simulate the earth's atmospheric processes and predict climate [23]. GCMs project climate parameters at a resolution of approximately 250 km<sup>2</sup> [24,25]. While accurate predictions can be made at this resolution at the global scale, the resolution is too coarse at the local scale to support local decision making and planning [26]. To reduce the uncertainty involved with the use of GCMs, data from GCMs are usually downscaled either statistically or dynamically to produce local climate data or regional climate models (RCMs) that reflect local conditions more accurately [27].

In the dynamical downscaling method, a regional climate model (RCM) is nested into the GCM to represent a given boundary forcing. Statistical downscaling methods use empirical relationships between large-scale and fine-scale variables, established from historical data; an example is the quantitative link between the state of the larger-scale climatic environment and local variations. In contrast, dynamical downscaling employs boundary conditions (e.g., surface pressure and wind) and an atmospheric circulation system (principles of physics) to generate high-resolution data sets [28]. However, the dynamical downscaling method is computationally and technically complex and expensive [29], limiting the number of institutions employing the approach. In this regard, coupling local and regional baseline climate data with statistically downscaled GCM outputs provides an invaluable way of reducing uncertainty associated with climate projections. In this study, freely available climate data, statistically downscaled to reflect local weather more accurately, were used for the climate simulations.
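The idea behind statistical downscaling described above can be illustrated with a minimal sketch. It assumes the simplest possible empirical relationship, a one-variable linear regression between a coarse-scale GCM variable and a local station variable over a historical period. It is not the method used to produce the downscaled data in this study, and the function names `fit_linear` and `downscale` are hypothetical.

```python
def fit_linear(large_scale, local):
    """Fit local = slope * large_scale + intercept by ordinary least squares.

    Both inputs are equal-length lists of historical values
    (e.g., monthly mean temperature from a GCM grid cell and from a station).
    """
    n = len(large_scale)
    x_mean = sum(large_scale) / n
    y_mean = sum(local) / n
    sxx = sum((x - x_mean) ** 2 for x in large_scale)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(large_scale, local))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def downscale(gcm_values, slope, intercept):
    """Apply the historical relationship to future coarse-scale GCM values."""
    return [slope * v + intercept for v in gcm_values]
```

In practice, operational downscaling schemes use far richer predictors and relationships; the sketch only shows the two steps of fitting on historical data and applying the fit to projected values.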

In South Africa, research groups such as the Council for Scientific and Industrial Research (CSIR) and the Climate Systems Analysis Group (CSAG) have developed local downscaled future climate data. However, despite the availability of these locally developed, downscaled climate data, few studies have used these downscaled climate data to assess the impacts of future climate change on crop yields in South Africa [30,31]. Therefore, this study aims to compare current and future maize yields under different future climate scenarios. While the focus of this study was not on climate uncertainty, three climate models were compared to reduce the uncertainty of climate change projections associated with different models that could affect crop response.

#### **2. Materials and Methods**

#### *2.1. Background*

This study follows up on our previous study using the EPIC model in the study area. The previous study [32] provides a detailed description of the model calibration and validation using limited data from field trials on maize at the Cradock Research Farm. This present study applies the calibrated and validated EPIC model to simulate future maize yields using future climate data sets. In this study, only a summary of the model performance will be given. A detailed description of the calibration and validation steps can be found in [32] and additional data on model performance can be found in Appendix A.

#### *2.2. Study Area*

Biophysical data for model calibration were collected from the Cradock Research Farm (Figure 1) in the Eastern Cape province of South Africa (32°13′11.09″ S, 25°41′11.86″ E, elevation 849 m). The area is predominantly fine-loamy mollic ustifluvent [33], with elevated quantities of Beaufort sediments (alluvial sand and silt and colluvial materials). A description of the major soil characteristics at the Cradock Research Farm is given in Appendix A, Table A1. Rainfall in the area is bimodal, with winter rainfall on the western side of the province and summer rainfall on the eastern side. The region receives an average rainfall amount of 341 mm. The area is drought-prone, and since 2015, most of the Eastern Cape has experienced droughts resulting in water supply shortages [34].

**Figure 1.** Map of study area indicating the dominant farming towns in the Eastern Cape Province, South Africa. The figure is taken from [32].

The Eastern Cape has been predominantly a livestock production area due to frequent droughts and the semi-arid nature of the region. In addition, the soils are inherently infertile and prone to erosion [9]. However, to improve food security in the region, the government, through programmes such as the Massive Food Production Programme (MFPP), has been on a drive to increase maize production in the area [12]. Maize is a staple food in the area and key to enhancing the region's food security.

Projections by the South African Department of Environmental Affairs [35] predict significant increases in climate variability for the region. Substantial reductions in both annual and daily precipitation have been forecasted for the area [34,35]. The yearly temperature is also anticipated to rise, accompanied by elevated evapotranspiration rates and the likelihood of droughts. An assessment of mid-century (2040–2060) CMIP5 rainfall predictions by Mahlalela et al. [34] estimates a levelling of the annual rainfall cycle over the Eastern Cape, with summer becoming drier and winter becoming wetter. Generally, the Eastern Cape is projected to have elevated temperatures, a higher frequency of extreme rainfall events and drier conditions, especially in summer [35].

#### *2.3. EPIC Model Description*

The EPIC model (version 0810) is an agroecosystem model designed to simulate over 70 crops at the field scale using values characteristic of each crop [36]. Crop yield is estimated based on the biomass accumulated by the plant. Biomass accumulation is affected by model parameters such as planting density (PD), photosynthetically active radiation (PAR), vapour pressure deficit (VPD) and the biomass to energy ratio (WA) [37]. The daily stresses caused by extreme temperature, water and nutrient stress or inappropriate aeration are used to correct the potential daily biomass accumulation to daily actual biomass accumulation. The model also requires weather inputs such as precipitation, minimum and maximum temperature, wind speed and relative humidity. Stresses reduce the biomass accumulation and the harvest index (HI) using the value of the most severe stress experienced by the crop [38]. To better reflect the specific site conditions, values of location-specific variables such as potential heat units (PHU) accumulated, HI and optimum temperature (OT) have to be adjusted according to the area or region in which the model is to be used [39].
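The two mechanisms described above, heat unit accumulation towards PHU and the reduction of potential growth by the most severe stress, can be sketched as follows. This is a simplified illustration and not EPIC source code: the function names are hypothetical, the base temperature is supplied by the caller as an assumption, and stress factors are assumed to be scaled from 0 (complete stress) to 1 (no stress).

```python
def daily_heat_units(t_max_c, t_min_c, t_base_c):
    """Heat units for one day: mean temperature above the crop base temperature."""
    return max(0.0, (t_max_c + t_min_c) / 2.0 - t_base_c)


def heat_unit_index(daily_temps, t_base_c, potential_heat_units):
    """Fraction of the potential heat units (PHU) accumulated so far.

    daily_temps is a list of (t_max_c, t_min_c) tuples, one per day.
    """
    accumulated = sum(daily_heat_units(tx, tn, t_base_c) for tx, tn in daily_temps)
    return accumulated / potential_heat_units


def actual_biomass_increment(potential_increment, stress_factors):
    """Scale the potential daily biomass increment by the most severe stress.

    stress_factors maps a stress name to a factor in [0, 1]; the smallest
    factor represents the most severe stress and is the only one applied.
    """
    return potential_increment * min(stress_factors.values())
```

For instance, with assumed stress factors of 0.8 for water, 0.5 for temperature and 0.9 for nitrogen, only the temperature factor limits growth on that day, so the potential increment is halved.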

#### *2.4. Field Work*

Field trials on maize were conducted by the Agricultural Research Council (ARC) from 1999 to 2003 at the Cradock Farm to assess the yield potential of hybrid maize cultivars within the Eastern Cape Province of South Africa. Data from these trials were used to calibrate and validate the EPIC model. We selected two fields with similar soil characteristics, one for calibration and one for validation. A randomised block design (RBD) [40], with three replications, was used throughout the field trials. The two fields with similarly performing maize hybrids were managed according to the same agricultural management plan developed by the ARC based on local farmers' management practices. The management plan, including planting and harvesting dates, and irrigation and fertiliser application dates, is shown in Appendix A, Table A2. The management practices were performed around the same time each year. Each year, minor changes to the management plan were carried out based on prevailing weather conditions. In the future climate simulations, management practices including planting dates, fertiliser and irrigation levels used during the maize cultivar evaluation trials were used as the baseline management practices being used in the area.

#### *2.5. Model Inputs*

EPIC requires weather inputs such as rainfall, relative humidity, temperature and solar radiation. We obtained weather files for the study area from the AgMERRA [41] climate dataset at 0.5 × 0.5 arc-degree spatial resolution. Soil parameter values including cation exchange capacity, soil texture, bulk density and electrical conductivity were taken from a previous soil analysis at the Cradock Farm. We selected missing soil parameter values (i.e., soil albedo, organic carbon concentration) from the Harmonized World Soil Database (HWSD) [42] based on the expert opinion given by the Cradock Farm Manager (Mr G. Jordaan 2017, pers. comm.).

#### *2.6. EPIC Model Set-Up*

#### 2.6.1. Framework

This study used a modelling framework for the EPIC model developed at the International Institute for Applied Systems Analysis (IIASA) [43]. Raster layers on weather, soil and topography were combined and a modelling scheme applied at 5 × 5 arc-min resolution. A grid was set up for the whole Eastern Cape and then divided into homogeneous grids that had similar site properties such as soil texture, weather and elevation. We then chose the simulation grid containing the Cradock Research Farm and used one soil profile based on the soil characteristics at the farm [43].

The Priestley–Taylor method was used to calculate the potential evapotranspiration (PET). This method was selected because it yielded PET values close to the values reported for the region by [44]. The model was run for 31 years, corresponding to the length of the weather records available, with the first 19 years serving as a warm-up period for equilibrating EPIC's soil erosion functions. Agricultural land management in the model was set up according to the dates in the management plan (Appendix A, Table A2). Irrigation and fertiliser applications were carried out in the model using the manual setting. One soil profile (see Appendix A, Table A1) was used for all the simulations.
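For orientation, the general textbook form of the Priestley–Taylor equation is sketched below. This is an assumption-laden illustration and not EPIC's internal implementation, which may differ in detail: the coefficient of 1.26, the psychrometric constant of 0.066 kPa °C<sup>−1</sup> (a near-sea-level value), the latent heat of 2.45 MJ kg<sup>−1</sup> and the function names are all assumed here.

```python
import math


def saturation_vapour_pressure_slope(t_mean_c):
    """Slope of the saturation vapour pressure curve (kPa per deg C)."""
    es = 0.6108 * math.exp(17.27 * t_mean_c / (t_mean_c + 237.3))
    return 4098.0 * es / (t_mean_c + 237.3) ** 2


def priestley_taylor_pet(t_mean_c, net_radiation, soil_heat_flux=0.0,
                         alpha=1.26, gamma=0.066, latent_heat=2.45):
    """Daily potential evapotranspiration (mm per day).

    net_radiation and soil_heat_flux are in MJ m-2 day-1; alpha, gamma
    (kPa per deg C) and latent_heat (MJ kg-1) are assumed default values.
    """
    delta = saturation_vapour_pressure_slope(t_mean_c)
    available_energy = net_radiation - soil_heat_flux
    pet = alpha * (delta / (delta + gamma)) * available_energy / latent_heat
    # Negative available energy would give negative PET; clamp to zero.
    return max(pet, 0.0)
```

Because the method needs only temperature and radiation, it is often preferred where wind speed and humidity records are sparse.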

#### 2.6.2. Model Calibration

The calibration and validation of the model were performed using grain yield data from two fields at the Cradock Farm that had similar soil types. Other data such as biomass accumulation rates and nutrient leaching were not available for model calibration and validation as the trials were only designed to evaluate cultivar stability and potential yield. Detailed steps of the calibration process are given in [32]. Model calibration used data from one field and model validation used grain yield data collected from the other field. Grain yield data were for the five-year period from 1999 to 2003.

#### 2.6.3. Model Evaluation

We used four indicators, namely root mean square error (*RMSE*), the coefficient of determination (*R*<sup>2</sup>), Nash–Sutcliffe efficiency (*NSE*) and per cent bias (*PBIAS*) to evaluate model efficiency.

$$RMSE = \left[ \frac{1}{n} \sum_{i=1}^{n} (S_i - O_i)^2 \right]^{\frac{1}{2}} \tag{1}$$

$$R^2 = \frac{\left[\sum_{i=1}^{n} (O_i - O_{mean}) (S_i - S_{mean})\right]^2}{\sum_{i=1}^{n} (O_i - O_{mean})^2 \sum_{i=1}^{n} (S_i - S_{mean})^2} \tag{2}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n} \left(O_i - S_i\right)^2}{\sum_{i=1}^{n} \left(O_i - O_{mean}\right)^2}\tag{3}$$

$$PBIAS = \frac{\sum_{i=1}^{n} 100(O_i - S_i)}{\sum_{i=1}^{n} O_i} \tag{4}$$

where *n* represents the sample number, *O*<sub>mean</sub> the observed mean value and *S*<sub>mean</sub> the simulated mean value. *O*<sub>i</sub> and *S*<sub>i</sub> are the observed and simulated values of the *i*th observation (*i* = 1 to *n*), respectively. Regarding the *RMSE*, values close to zero signify a good fit between observed and simulated yields [45]. An *RMSE* of zero indicates that the model predicts the observations with complete accuracy. The coefficient of determination, *R*<sup>2</sup>, has values ranging from 0 to 1, with higher values denoting less error variance [46]. *NSE* varies from negative infinity to 1, with an *NSE* value of 1 representing perfect model fit between observed and simulated values. In contrast, negative *NSE* values indicate that the mean observed value is a better predictor than the simulated value [46]. The *PBIAS* measures the tendency of simulated data to be larger or smaller than the observed data. *PBIAS* has an ideal value of 0, while positive values indicate model underestimation, and negative values indicate model overestimation [47]. Lastly, a *t*-test was used to evaluate differences between simulated and observed mean values. We considered *R*<sup>2</sup> ≥ 0.6, *PBIAS* within ±25% and *NSE* ≥ 0.4 as satisfactory model performance criteria following [48].
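The four indicators and the acceptance criteria can be computed with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and *NSE* is written with the standard squared-deviation denominator. Inputs are equal-length lists of observed and simulated yields.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    n = len(obs)
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)


def r_squared(obs, sim):
    """Coefficient of determination (squared Pearson correlation)."""
    o_mean = sum(obs) / len(obs)
    s_mean = sum(sim) / len(sim)
    cov = sum((o - o_mean) * (s - s_mean) for o, s in zip(obs, sim))
    var_o = sum((o - o_mean) ** 2 for o in obs)
    var_s = sum((s - s_mean) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, below 0 is worse than the mean."""
    o_mean = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - residual / variance


def pbias(obs, sim):
    """Per cent bias: positive means underestimation, negative overestimation."""
    return 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs)


def is_satisfactory(obs, sim):
    """Apply the criteria R2 >= 0.6, |PBIAS| <= 25% and NSE >= 0.4."""
    return (r_squared(obs, sim) >= 0.6
            and abs(pbias(obs, sim)) <= 25.0
            and nse(obs, sim) >= 0.4)
```

With only five yearly yield values per field, all four statistics are sensitive to a single unusual season, so they are best read together rather than individually.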

#### *2.7. Climate Data*

We used statistically downscaled climate input data from three general circulation models available from the Coupled Model Intercomparison Project Phase 5 (CMIP5) [49]. The climate data were downloaded from the Climate Systems Analysis Group's (CSAG) Climate Information Portal (CIP) (http://cip.csag.uct.ac.za, accessed 27 July 2019). The climate data come from two primary sources: the Computing Centre for Water Resources located at the University of KwaZulu-Natal and the South African Weather Services. Prior to uploading to the CIP, the data are collated and checked for quality by the CSAG [50]. Due to inherent uncertainties in individual models, three GCMs were used to encompass a range of global mean temperature and precipitation changes and consider a wide range of plausible future scenarios. The selected GCMs have been applied previously in South Africa and found to represent the region accurately in terms of projection signal (see [51] for example). The driving GCMs chosen for this study were the BCC-CSM1.1, GFDL-ESM2M and MIROC-ES models (Table 1).

**Table 1.** List of driving GCMs and the model abbreviations used in this study.


For future greenhouse gas emission scenarios, two Representative Concentration Pathways, RCP 4.5 and RCP 8.5, for two future 30-year periods, 2040–2069 and 2070–2099, were chosen to compare two different possible climate scenarios depending on the level of greenhouse gas emissions. RCP 4.5 was developed by the GCAM modelling team at the Pacific Northwest National Laboratory's Joint Global Change Research Institute (JGCRI) in the United States. It is a stabilisation scenario in which radiative forcing stabilises shortly after 2100 without overshooting the long-run radiative forcing target level [52,53]. RCP 8.5 was created by IIASA in Austria using the MESSAGE model and the IIASA Integrated Assessment Framework. The RCP 8.5 pathway is characterised by increasing greenhouse gas emissions over time and represents a scenario that results in high greenhouse gas levels [54].

We used the weather data for 31 years from 1980 to 2010 for the Cradock Research Farm obtained from the AgMERRA database [41] as input data for the baseline simulation with EPIC. Weather data included daily maximum and minimum temperature and rainfall. In the field trials, the time from physiological maturity to actual harvest date was not recorded. Due to this lack of information on the actual time from physiological maturity to harvest, changes in the length of the growing season under future climate scenarios were not included in the simulations.

#### *2.8. Data Analysis*

The model output variables analysed were economic yield in tonnes per hectare (t ha<sup>−1</sup>), seasonal irrigation water applied in millimetres (mm), seasonal evapotranspiration in mm, nitrogen (N) leaching as N lost in percolate in kilogrammes of nitrogen per hectare (kg N ha<sup>−1</sup>) and water use efficiency (WUE), computed as yield per unit of water use (yield/(rainfall plus irrigation), in kg ha<sup>−1</sup> mm<sup>−1</sup>). The means of the output variables for the current scenario were compared to the means of the output variables for the future periods. Model variables were analysed using analysis of variance (ANOVA). Prior to ANOVA, Shapiro–Wilk and Levene's tests were used to examine normality and equality of variance, respectively. Tukey's post hoc tests were used to determine which means differed significantly when ANOVA indicated significant differences. An independent samples *t*-test was performed to test for mean differences in the output variables between the two future periods, 2040–2069 and 2070–2099. The ANOVA and *t*-tests were conducted in the Statistical Package for the Social Sciences (SPSS) v21.
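The WUE definition above involves a unit conversion that is easy to overlook, since yield is reported in t ha<sup>−1</sup> while WUE is expressed in kg ha<sup>−1</sup> mm<sup>−1</sup>. A minimal sketch of the calculation follows; the function name and the input values in the test are hypothetical and not taken from the simulations.

```python
def water_use_efficiency(yield_t_ha, rainfall_mm, irrigation_mm):
    """WUE in kg per hectare per mm of water (rainfall plus irrigation).

    Yield is given in tonnes per hectare and converted to kilogrammes.
    """
    water_mm = rainfall_mm + irrigation_mm
    if water_mm <= 0:
        raise ValueError("total water use must be positive")
    return yield_t_ha * 1000.0 / water_mm
```

Under this definition, WUE can fall even when irrigation decreases, provided yield declines proportionally more than total water use.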

#### **3. Results**

#### *3.1. Model Calibration*

Before calibration, the following model performance values were observed: *NSE* = −3.34, *RMSE* = 3.65 and *PBIAS* = 28.55. After calibration the following values were observed: *NSE* = 0.53, *RMSE* = 1.17 and *PBIAS* = 0.31. Table 2 summarises model performance after calibration. For the calibration simulation, the model underestimated yields for all years using default parameters. Adjusting the parameters, Parm 20 (microbial decay rate coefficient), Parm 47 (slow humus transformation rate), Parm 52 (tillage effect on residue decay rate) and WSYF (minimum harvest index) decreased the *RMSE*% from 32.4% to 11.4%, while the *NSE* value increased from negative values to 0.47. Adjusting PHU improved model performance with a PHU value of 2480 producing the smallest *RMSE*% (10.7%) value between observed yields and simulated yields. Further adjustments of PHU from 2480 did not produce any improvement in model performance. After PHU adjustment, model performance came within the range set for satisfactory model calibration (i.e., *R*<sup>2</sup> > 0.6 and *PBIAS* < ±25%). Further calibration of the crop parameters HI and WA was therefore not conducted. The relationship between observed and simulated grain yield is given in Appendix A, Figures A1 and A2.

**Table 2.** Nash–Sutcliffe efficiency (*NSE*), root mean square error (*RMSE*) and per cent bias (*PBIAS*) for calibration and validation [32].


#### *3.2. Validation*

Observed yields ranged from 9 t ha<sup>−1</sup> to 14 t ha<sup>−1</sup>, while simulated yields ranged from 10 t ha<sup>−1</sup> to 12 t ha<sup>−1</sup>. The following model evaluation statistics were observed: *NSE* = 0.61, *RMSE* = 10.18 and *PBIAS* = −0.2. Model performance was within the set criteria and considered satisfactory. Table 2 summarises model performance for the validation simulation. The model overestimated maize yields for three out of the five years used for validation. In the year 2000, there were unusually high observed maize yields (14.01 t ha<sup>−1</sup>), which were underestimated by the model. In 2003, the trials had low observed yields, which were slightly overestimated by the model. No indications were given in the management records on why there were unusually high observed yields in the year 2000; however, in the year 2003, management records indicated that the trial suffered a heavy weed infestation. No statistical differences were revealed by the Student's *t*-test (alpha = 0.05) between the observed and simulated mean grain yields. The relationship between observed and simulated grain yield is shown graphically in Appendix A, Figures A3 and A4.

#### *3.3. Climate Data Analysis*

#### 3.3.1. Temperature and Rainfall

All three GCMs revealed average temperature increases from March to October for both scenarios (Figure 2a,b). For the RCP 4.5 scenario (Figure 2a), the increase in average temperature was lower than for the RCP 8.5 scenario (Figure 2b). The highest monthly average temperature in the RCP 4.5 scenario was 23.7 °C in January and February for the model MIROC and approximately 21 °C for the GFDL and BCC models. The temperature increase was more prominent in the RCP 8.5 scenario and the MIROC model, where average temperatures in June and July were above 10 °C and approximately 6.8 °C higher than the baseline average for the two months. In the months from September to December, the temperatures were similar across all three models.

**Figure 2.** Monthly average temperatures for the two 30-year future periods compared to 31 years of baseline data, (**a**) RCP 4.5 and (**b**) RCP 8.5.

With respect to temperature differences from the baseline (Figure 3), the GCMs that had the highest temperature increase for the RCP 4.5 scenario were the MIROC model, with a monthly percentage difference from the baseline of about 51% in July and the GFDL model with peaks of more than 40% in June and July for the period 2070–2099. RCP 8.5 showed higher temperature differences from the baseline compared to RCP 4.5 for both climate models and future time periods. The highest percentage difference from the baseline in RCP 8.5 was given by the MIROC model, reaching a peak of 71% in July.

Regarding rainfall (Figure 4), an increase in winter rainfall was observed from May to July for both RCPs with higher average rainfall values in RCP 8.5 (Figure 4b). The MIROC model showed a different trend for rainfall from the other models for both the RCP 4.5 (Figure 4a) and RCP 8.5 scenarios (Figure 4b) with higher average monthly rainfall for the months September to December, showing peaks of about 70 mm in November (Figure 4b). The baseline, BCC and GFDL scenarios also showed peaks in November in the RCP 8.5 scenario but with rainfall peaks lower than the MIROC model (Figure 4b).

**Figure 3.** Percentage variations from the baseline of average monthly temperatures for the two thirty-year future periods for all three GCMs under the two RCPs, (**a**) BCC RCP 4.5, (**b**) BCC RCP 8.5, (**c**) GFDL RCP 4.5, (**d**) GFDL RCP 8.5, (**e**) MIROC RCP 4.5 and (**f**) MIROC RCP 8.5.

**Figure 4.** Monthly average rainfall for the two 30-year future periods compared to 31 years of baseline data, (**a**) RCP 4.5 and (**b**) RCP 8.5.

#### 3.3.2. Yield Simulations

Simulation results displayed a similar trend across all three GCMs under both RCPs. There was a reduction in maize yield, WUE and seasonal irrigation requirements, and an increase in N leaching and seasonal evapotranspiration for all GCMs under the two future periods (Table 3).

**Table 3.** Average model output values and mean comparison test for the different scenarios, climate models and future time periods. Different superscript letters on means in the same column indicate significant differences (*p* < 0.05) revealed by a Tukey's post hoc multiple comparison test. Identical superscript letters on means in the same column indicate no significant differences (*p* > 0.05).


\* Indicates a significant difference at α = 0.05 for independent samples t-test. WUE = water use efficiency, Et = evapotranspiration.

Regarding percentage differences between the baseline and future periods, maize yield decreased by up to 23.8% for MIROC, RCP 8.5 (2070–2099). The largest decrease in seasonal irrigation (13.6%) was for GFDL, RCP 8.5 (2040–2069). For WUE, the largest percentage decrease (22.7%) occurred under MIROC, RCP 8.5 (2070–2099). Concerning N leaching, a significant percentage increase of 375.4% occurred under GFDL, RCP 8.5 (2070–2099). Table 4 shows the percentage differences (future–baseline) between the simulated mean baseline values and simulated mean future values for yield, WUE, seasonal irrigation requirements and N leaching.
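The percentage differences are assumed here to be expressed relative to the baseline mean, so that a negative value denotes a decrease in the future period. A minimal sketch with a hypothetical function name:

```python
def percent_difference(baseline_mean, future_mean):
    """Percentage difference (future minus baseline) relative to the baseline mean."""
    return 100.0 * (future_mean - baseline_mean) / baseline_mean
```

For example, a hypothetical baseline mean of 10.0 and a future mean of 8.0 give a difference of −20%. Variables with a small baseline mean, such as N leaching, can produce very large percentage changes from modest absolute changes.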

**Table 4.** Percentage differences (future–baseline) between the simulated mean baseline values and simulated mean future values for yield, WUE, seasonal irrigation requirements and N leaching.


#### 3.3.3. BCC Model

In the second future period, 2070–2099, where the gap from the baseline was more pronounced, maize yield was on average equal to 10.3 t ha<sup>−1</sup> for RCP 8.5 and 11.3 t ha<sup>−1</sup> for RCP 4.5. RCP 8.5 2070–2099 gave the most considerable yield difference from the baseline yield (Figure 5a). The seasonal irrigation amount showed a decreasing trend in the future periods compared to the baseline (Figure 5b). The decrease in seasonal irrigation amount was comparable between RCP 4.5 2040–2069, RCP 4.5 2070–2099 and RCP 8.5 2040–2069, with the three periods having similar seasonal irrigation requirements. RCP 8.5 2070–2099 had the largest seasonal irrigation requirement decrease compared to the baseline scenario, with a seasonal irrigation amount 8% lower than the baseline. Future WUE also showed a decreasing trend from the baseline scenario for all future periods (Figure 5c). The largest decrease in WUE was in RCP 8.5 2070–2099, which was 22.7% lower than the baseline WUE. N leaching increased in all future scenarios except in RCP 4.5 2040–2069, where N leaching slightly decreased compared to the baseline scenario (Figure 5d). RCP 8.5 2070–2099 had the largest increase in N leaching compared to the baseline scenario.

**Figure 5.** EPIC model outputs from the simulations using BCC-ESM climate data. Values are plotted and shown for the two 30-year periods compared to the baseline simulation; (**a**) yield, (**b**) seasonal irrigation, (**c**) water use efficiency (WUE), (**d**) N leaching.

#### 3.3.4. GFDL Model

For the GFDL model, crop yield was similar to the baseline yields but slightly lower (Figure 6a). For RCP 4.5 and RCP 8.5, there were only slight differences in yield in the two future periods for the GFDL scenario. The two future scenarios for RCP 8.5 showed lower yields compared to both RCP 4.5 and the baseline scenario. Seasonal irrigation was similar to the baseline period for the two future periods in RCP 4.5. However, both future periods for RCP 8.5 showed a marked decrease in seasonal irrigation amount compared to the baseline scenario. The largest decrease in seasonal irrigation compared to the baseline scenario was observed for 2040–2069 in RCP 8.5 (Figure 6b). WUE slightly decreased in the future climate scenarios ranging from 15.91 kg ha<sup>−1</sup> mm<sup>−1</sup> in RCP 8.5 2070–2099 to 20.61 kg ha<sup>−1</sup> mm<sup>−1</sup> in RCP 4.5 2040–2069 compared to 20.61 kg ha<sup>−1</sup> mm<sup>−1</sup> in the baseline scenario (Figure 6c). N leaching increased in all future climate periods for all the scenarios compared to the baseline scenario (Figure 6d). RCP 8.5 2070–2099 had the largest increase in N leaching with an average of 91.64 kg N ha<sup>−1</sup>.
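The percentage differences (future–baseline) reported in this section can be reproduced from the simulated means. The following is a minimal sketch, not the authors' code: the function name is hypothetical, and the only data used are the two GFDL WUE values quoted in the paragraph above.

```python
def percent_difference(baseline: float, future: float) -> float:
    """Return the (future - baseline) difference as a percentage of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (future - baseline) / baseline * 100.0


# WUE values quoted in the text for the GFDL runs (kg ha^-1 mm^-1).
wue_baseline = 20.61
wue_rcp85_2070_2099 = 15.91

change = percent_difference(wue_baseline, wue_rcp85_2070_2099)
print(f"WUE change, GFDL RCP 8.5 2070-2099: {change:.1f}%")
```

A negative result indicates a decrease relative to the baseline, which matches the sign convention of a (future–baseline) difference.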

**Figure 6.** EPIC model outputs from the simulations using GFDL climate data. Values are plotted and shown for the two 30-year periods compared to the baseline simulation; (**a**) yield, (**b**) seasonal irrigation, (**c**) water use efficiency (WUE), (**d**) N leaching.

#### 3.3.5. MIROC Model

The MIROC model showed a similar trend of decreasing yield for all the future periods with respect to the baseline period. Maize yield decreased by up to 23% in RCP 8.5 2070–2099 (Figure 7a). Seasonal irrigation also decreased significantly in the future periods for all RCPs. Seasonal irrigation amount decreased by up to 13% in RCP 4.5 2040–2069 and the two time periods for RCP 8.5 compared to the baseline period (Figure 7b). For WUE, the model simulated a slight decrease over time, particularly in RCP 8.5 2070–2099 (Figure 7c). N leaching increased for all future periods compared to the baseline scenario. RCP 8.5 2070–2099 had the most significant increase in N leaching compared to all the other periods for all three models (Figure 7d).

**Figure 7.** EPIC model outputs from the simulations using MIROC climate data. Values are plotted and shown for the two 30-year periods compared to the baseline simulation; (**a**) yield, (**b**) seasonal irrigation, (**c**) water use efficiency (WUE), (**d**) N leaching.

#### **4. Discussion**

#### *4.1. EPIC Model Calibration and Validation*

Notwithstanding the limited data available to calibrate and validate the model in this study, the calibration results revealed satisfactory agreement between observed and simulated yields. In the initial simulation with default parameters, the agreement between observed and simulated crop yields was unsatisfactory, suggesting the need for calibration. After adjustment of site-specific model parameters, the model performance improved, showing the value of calibrating models with parameters that are site-specific. Our results provide further evidence to support previous studies that have demonstrated that adjusting parameters with local-scale data can increase the accuracy of simulations and reduce model uncertainties considerably [55]. For example, Xiong et al. [56] and Angulo et al. [57] demonstrated that fine-tuning PHUs to local conditions could significantly improve model simulation accuracy. In this study, model simulations improved on adjusting the PHU value. The PHUs are closely related to biomass growth and its final yield allotment, indicating the substantial influence of PHU adjustment on simulated crop yields.

Trials conducted in the USA by Williams et al. [58] showed that the PHUs required for maize to reach maturity ranged from 1000 to 2900. In this study, 2480 PHUs brought model performance into the range set for satisfactory model calibration. The ARC in South Africa states that maize typically requires 120 days to mature from the day of planting. However, this period is highly dependent on weather conditions, and 120 days generally applies to the warmer traditional maize growing regions in South Africa such as KwaZulu-Natal [59]. The Cradock area is relatively cooler than the traditional maize growing regions in South Africa, which may account for the higher PHU value found in this study.

Concerning the HI, the default value in the EPIC model is 0.5, which is representative of HI values for improved high yielding maize varieties [60], similar to the varieties used in the field trials for this study. The default HI value of 0.5 used in this study has also been used in other studies, such as [39,61].

In this study, we did not adjust the biomass to energy ratio (WA) since adjusting PHUs improved the model performance considerably to within the range set for satisfactory model calibration. For example, PHU calibration gave an *RMSE* of 1.17 kg ha<sup>−1</sup> and *PBIAS* of 0.31 between observed and simulated yields. The small *RMSE* and *PBIAS* values suggested that no additional WA and HI adjustments were required since the conditions for satisfactory model performance had been met. Regarding WA, we left WA at the default value of 40 kg ha<sup>−1</sup> MJ<sup>−1</sup> m<sup>2</sup>. Similar studies have also used the value of 40 kg ha<sup>−1</sup> MJ<sup>−1</sup> m<sup>2</sup> for WA (see, e.g., [39,62]). The biomass to energy ratio can significantly influence crop yields [63], and [36] explains that WA can substantially alter crop growth and yield rate. Reference [36] further emphasises that WA should be adjusted only as a final resort and based on experimental data.
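For readers unfamiliar with the two goodness-of-fit statistics mentioned above, the sketch below shows one common way to compute them. It is illustrative only: the yield values are hypothetical, and the PBIAS sign convention used here (negative values indicate model overestimation) is an assumption, since the text does not state which convention was applied.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def pbias(observed, simulated):
    """Percent bias. Assumed convention: negative means the model overestimates."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)


# Hypothetical yields (t/ha), NOT data from the field trials.
observed_yield = [8.0, 10.0, 12.0]
simulated_yield = [9.0, 11.0, 12.0]

fit_rmse = rmse(observed_yield, simulated_yield)
fit_pbias = pbias(observed_yield, simulated_yield)
print(f"RMSE = {fit_rmse:.3f} t/ha, PBIAS = {fit_pbias:.2f}%")
```

In this toy case the model overestimates two of the three yields, so PBIAS is negative under the assumed convention, while RMSE reports the typical size of the error in the units of the yield itself.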

The EPIC model potentially overestimates yields, even at low observed yields during calibration and validation (see [32]). Studies conducted by [64,65] also found that the EPIC model tended to overestimate low observed yields. It has been suggested that the overestimation of plant available water at field capacity could potentially lead to the overestimation of yields in the dry years by the EPIC model (see [66]). Thus, Kiniry et al. [66] proposed measuring the maximum depth of water extraction using local cultivars as a solution. However, the solution was not applied as it is beyond the scope of the calibration and validation study. The overestimation observed in this study may be attributed to the influence of weed outbreaks. Agricultural management records used during the field trials note that in 2003 the maize fields were affected by heavy weed outbreaks. At the time of model calibration and validation, the EPIC model had not yet been developed to accurately account for competition from weeds [67]. As such, competition from weeds was not accounted for in the simulations, which may explain why the model overestimated the low yields observed in 2003.

#### *4.2. Climate Change Impacts on Maize Yield*

Model ensemble results predicted a decrease in maize yield for all future scenarios with a more pronounced reduction in RCP 8.5 2070–2099. This decrease can be attributed to an increased temperature that would shorten the growth stage of the maize crop. Increased temperature increases the rate of accumulation of growing degree days, thereby influencing growth duration. Several studies have shown that temperature increases lead to early crop maturing, allowing less time to accumulate biomass and form grain yield [68–70]. The projected decrease in maize yield in this study agrees with other studies in Southern Africa. For example, studies by [71] projected decreases in maize yield in Zimbabwe under irrigated and rain-fed agriculture. In their study, [71] used the CERES model driven by GCMs (specifically the GFDL and the Canadian Climate Centre Model). Walker and Schulze [72] also studied the response of smallholder maize production in Potshini village, KwaZulu-Natal, South Africa, up to the late 21st century climates. The study by [72] projected a decrease in average maize yields of approximately 30% and showed that more efficient management of fertiliser and manure applications would be a viable management strategy to adapt to climate change.
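The link between warming and a shorter growth duration can be illustrated with a simple heat unit accumulation. This is a minimal sketch rather than EPIC's implementation: it assumes daily heat units equal to the mean of maximum and minimum temperature minus a base temperature, an assumed base temperature of 8 °C for maize, and hypothetical constant daily temperatures. Only the PHU target of 2480 is taken from this study.

```python
def days_to_maturity(daily_tmax, daily_tmin, phu, t_base=8.0):
    """Count days until accumulated heat units reach the PHU target.

    Returns None if the target is not reached within the supplied series.
    """
    accumulated = 0.0
    for day, (tmax, tmin) in enumerate(zip(daily_tmax, daily_tmin), start=1):
        # Daily heat units cannot be negative.
        accumulated += max((tmax + tmin) / 2.0 - t_base, 0.0)
        if accumulated >= phu:
            return day
    return None


PHU_TARGET = 2480.0   # calibrated value reported in this study
N_DAYS = 300          # hypothetical length of the temperature series
WARMING = 3.0         # hypothetical uniform warming in degrees C

# Hypothetical constant temperatures (daily mean of 20 degrees C).
base_tmax = [26.0] * N_DAYS
base_tmin = [14.0] * N_DAYS
warm_tmax = [t + WARMING for t in base_tmax]
warm_tmin = [t + WARMING for t in base_tmin]

baseline_days = days_to_maturity(base_tmax, base_tmin, PHU_TARGET)
warmer_days = days_to_maturity(warm_tmax, warm_tmin, PHU_TARGET)
print(baseline_days, warmer_days)
```

Under these assumptions the warmer series reaches the same PHU target several weeks earlier, leaving less time for biomass accumulation and grain formation.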

A study by [73] in Ethiopia for mid-century maize production projected a shortening of the maize maturity period by approximately 9–13% due to elevated temperatures. The reduced maturity period would reduce the amount of time the maize crop was able to capture solar radiation and assimilate carbon dioxide, resulting in a reduction in biomass and yield accumulation [74]. Other studies such as those by [75,76] have reported that photosynthesis is affected by elevated temperatures and low water availability, which in turn can reduce the yield. In this study, projections showed an increase in temperature and decrease in rainfall during the early growing season, leading to a reduction in yield. Rainfall is predicted to be lower in only some months of the growing season, but studies have shown that maize requires both the right amount and the right distribution of rainfall [77,78]. In this study, GCM projections predicted low rainfall in the critical growing months for maize. While there was an increase in rainfall in winter, the maize plant would already have been affected by water stress, hence the reduction in yield.

Rainfall can also influence crop yield as water is key to crop growth and development. In this study, rainfall was predicted to decrease in the early months of the maize growing season. Similar to this study, [79] found a shift in precipitation during the growing season. The shift in precipitation may affect yield as studies have shown maize to be sensitive to moisture amount and distribution [80]. Furthermore, the decrease in rainfall projected has implications for food production as rainfall supplements irrigation in the study area. Rainfall is the ultimate source of irrigation water in the study area. A reduction in rainfall would lead to decreased flows in the Great Fish River, leading to further water shortages in an already water-scarce area. Further water shortages would significantly impact food production in the area as the Great Fish River supplies most of the irrigation water used by conventional farmers in the area. A previous study by [81] showed that rain-fed maize yields in the Eastern Cape are very low without irrigation even when sufficient fertiliser is provided.

Regarding nitrate leaching, all future simulations predicted significant increases in N leaching. Generally, increases in temperature accelerate phenological development, leading to a shorter growing period and less nutrient uptake. The shorter growing period, coupled with the increased rainfall towards the end of the growing season found in this study, can explain the increased leaching for the future period. The increased N leaching found in this study is similar to the findings of [16]. In the study by [16], under future climate scenarios, nitrate leaching was found to increase significantly compared to the baseline scenario. He et al. [16] attributed the increased leaching to future high temperature stress and increased precipitation, explaining that these resulted in low crop N removal and increased drainage. Without matching the amount of fertiliser applied to crop N needs, excess N can be lost to the environment through leaching. This indicates the need to take into consideration the impacts of climate change on N leaching when developing future agricultural land management strategies aimed at maximising the use of N by plants and minimising N losses to the environment.

Considering the predicted impacts of climate change in the study area, farmers may need to obtain financial and technical support to implement on-farm water adaptation strategies such as rainwater harvesting and the use of field water conservation strategies such as mulching. Several studies analysing climate and weather trends in South Africa have shown that average temperatures in the country have increased in the last decades [35,82,83]. A study by [50] on observed and modelled trends for rainfall and temperature for South Africa found significant increases in temperature and rainfall variability in the Eastern Cape. Temperature increases and the decreased rainfall season length predicted in this study suggest that short-term growing maize varieties and drought-tolerant maize varieties may be needed in the Eastern Cape if crop production is to be sustained.

It is worth noting that we did not consider farmers implementing agricultural land management strategies aimed at minimising the effects of climate change in the simulations. This is unlikely to be the case in practice. Agroecosystems are human-managed, and farmers have a variety of possible adaptation options [84,85]. While the study did not show possible yield changes due to the implementation of climate change adaptation measures, the study does provide a clear picture on maize yield and N leaching rates if no climate change adaptation measures are taken. While there is uncertainty associated with climate projections, several studies in sub-Saharan Africa (e.g., [86,87]) have shown that projections of climate impacts appear robust across model ensembles [11].

However, the results of climate impact studies should not be taken in absolute terms but rather as possible pathways for the future of maize production in the Eastern Cape. Decision makers should consider other factors that may influence crop yield. In this study, the combined influence of other factors such as the development of pests and disease on crop yield was assumed to be fully controlled through appropriate management practices. This study's results can be used by farmers and policymakers to plan how to adapt to the projected increases in temperature and decreased rainfall. It is vital to develop adaptation strategies that consider the projected increases in temperature and minimise N leaching. N leaching represents an economic loss to farmers (N fertiliser not utilised by plants) and a potential water pollutant. It is recommended that studies testing the effectiveness of adaptation strategies under current and future climate scenarios using the EPIC model be carried out in the region.

#### **5. Limitations of the Study**

Downscaled climate projections inescapably inherit uncertainties from GCMs. Sources of uncertainty arise from internal variability of the model, the greenhouse gas emission scenario used (RCPs), the statistical downscaling process and imperfections in the GCMs from which the downscaled data were derived. Other sources include using only one crop model (EPIC) to project the impacts of climate change on crop yield. Asseng et al. [88] suggested that ensembles of many crop models could give a better estimate of yield than using one model. However, the use of multiple models was beyond the scope of this study.

The results of climate change effects are prone to many uncertainties resulting from the limited knowledge of underlying geophysical processes of global change (GCM uncertainties) and uncertain future scenarios (emission scenario uncertainties) [19]. Uncertainties in climate projections with respect to climate models can have significant impacts on crop model outputs [89,90]. To reduce uncertainties associated with individual climate models, three different models under two contrasting climate scenarios were selected to capture the full range of changes in temperature and precipitation projected by the models. Reference [91] states that emission scenario uncertainties are less relevant until the middle of the 21st century; hence, the 2040–2069 scenario was chosen as the starting period for future climate simulations.

In this study, carbon dioxide (CO2) fertilisation effects were not considered due to the lack of site-specific annual data on future CO2 levels for the periods used in the scenarios. Klein [89] explains that model equations are all subject to variability and uncertainty. As a result, processes included in simulation models, such as CO2 fertilisation effects, may not always be fully understood or well implemented. For example, Free Air Carbon Enrichment (FACE) experiments indicate productivity increases due to increased CO2 levels but do not address important co-limitations arising from water and nutrient availability [89]. The magnitude of crops' responses to increased CO2 levels is thus uncertain and the subject of current debates among researchers [2,92–94]. Biernath et al. [95] argue that many crop models are currently unable to capture the complex underlying processes associated with CO2 fertilization and are therefore unable to reproduce experimental results.

Additionally, we assumed crop management such as fertilisation to be similar across the future periods, which may not be the case in reality as farmers adapt to changing farming conditions. Additionally, by considering one maize cultivar, we assumed the single cultivar would give similar responses to the impacts of climate change as those of different cultivars.

#### **6. Conclusions**

EPIC simulations predict that climate change will negatively affect maize production and environmental water quality in the Eastern Cape. Maize yields are projected to decrease, accompanied by an increase in N leaching. Mitigating the future impacts of climate change will be vital to enhancing food security in the region. Models such as EPIC can help predict and anticipate the possible effects of climate change on crop production and help plan appropriate agricultural land management responses that contribute to sustainable food production in South Africa. In this regard, this study's results have demonstrated that the EPIC model can be considered a valuable tool for exploring the future impacts of climate change on crop yields and the environment. Future studies using EPIC should test the effectiveness of various crop rotation and intercropping strategies based on farmers' current crop rotation and intercropping strategies.

**Author Contributions:** Conceptualization, D.J.C.; methodology, D.J.C.; software, D.J.C.; validation, D.J.C.; formal analysis, D.J.C.; investigation, D.J.C.; resources, N.O.O.; data curation, D.J.C. and N.O.O.; writing—original draft preparation, D.J.C.; writing—review and editing, D.J.C., F.C.A. and N.O.O.; visualization, D.J.C. and F.C.A.; supervision, D.J.C.; project administration, D.J.C. and N.O.O.; funding acquisition, D.J.C. and N.O.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This article is the outcome of research conducted within the Rhodes University African Studies Centre (RASC), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy–EXC 2052/1–390713894 and the National Research Foundation (NRF) of South Africa under the Southern African Systems Analysis Centre (SASAC) initiative.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A**

Supplementary information on the calibration and validation of the EPIC model.

**Table A1.** Representative soil characteristics of the Cradock Research Farm used as inputs into the Environmental Policy Integrated Climate (EPIC) model (obtained from [32]). Clay, sand, silt, soil organic carbon units are in percentages, whereas bulk density, soil organic carbon and ion exchange capacity are in g cm<sup>−3</sup> m and (cmol (+) kg<sup>−1</sup>), respectively.



**Table A2.** Showing the agricultural management plan used during the study period (table obtained from [32]).

<sup>1</sup> The dates given in the table are not fixed for each year. They indicate the approximate times of year each management activity was carried out during the trial period.

**Figure A1.** Showing the crop yields (observed and simulated) since the model was in the range set for acceptable model calibration for the study period after PHU calibration (figure obtained from [32]).

**Figure A2.** Showing the simulated crop yields on observed maize yields with the calibrated maize crop file (figure obtained from [32]).

**Figure A3.** Simulated yields in the validation simulation using the calibrated model (figure obtained from [32]).

**Figure A4.** Simulated crop yields (t ha<sup>−1</sup>) regression result on observed maize yields (figure obtained from [32]).

#### **References**


## *Article* **Responses of Soybean Water Supply and Requirement to Future Climate Conditions in Heilongjiang Province**

**Na Li 1, Tangzhe Nie 1,2,\*, Yi Tang 1, Dehao Lu 1, Tianyi Wang 3, Zhongxue Zhang 2,3, Peng Chen 4, Tiecheng Li 2,3, Linghui Meng 5, Yang Jiao <sup>1</sup> and Kaiwen Cheng <sup>1</sup>**


**Abstract:** Understanding future changes in water supply and requirement under climate change is of great significance for long-term water resource management and agricultural planning. In this study, daily minimum temperature (*Tmin*), maximum temperature (*Tmax*), solar radiation (*Rad*), and precipitation for 26 meteorological stations under RCP4.5 and RCP8.5 of MIROC5 for the future period 2021–2080 were downscaled by the LARS-WG model, daily average relative humidity (*RH*) was estimated using the method recommended by FAO-56, and reference crop evapotranspiration (*ET*0), crop water requirement (*ETc*), irrigation water requirement (*Ir*), effective precipitation (*Pe*), and coupling degree of *ETc* and *Pe* (*CD*) for soybean during the growth period were calculated by the CROPWAT model in Heilongjiang Province, China. The spatial and temporal distribution of these variables and meteorological factors were analyzed, and the response of soybean water supply and requirement to climate change was explored. The results showed that the average *Tmin*, *Tmax*, and *Rad* under RCP4.5 and RCP8.5 increased by 0.2656 and 0.5368 °C, 0.3509 and 0.5897 °C, and 0.0830 and 0.0465 MJ/m<sup>2</sup>, respectively, while the average *RH* decreased by 0.0920% and 0.0870% per decade from 2021 to 2080. The annual average *ET*0, *ETc*, *Pe*, and *Ir* under RCP4.5 for 2021–2080 were 542.89, 414.35, 354.10, and 102.44 mm, respectively, and they changed by 1.92%, 1.64%, 2.33%, and −2.12%, respectively, under RCP8.5. The ranges of *CD* under RCP4.5 and RCP8.5 were 0.66–0.95 and 0.66–0.96, respectively, with an average value of 0.84 for 2021–2080. Spatially, the *CD* showed a general trend of increasing first and then decreasing from west to east. In addition, *ET*0, *ETc*, and *Pe* increased by 9.55, 7.16, and 8.77 mm per decade, respectively, under RCP8.5, while *Ir* decreased by 0.65 mm per decade. Under RCP4.5 and RCP8.5, *ETc*, *Pe*, and *Ir* showed an overall increasing trend from 2021 to 2080. This study provides a basis for water resources management policy in Heilongjiang Province, China.

**Keywords:** climate change; soybean; CROPWAT; reference crop evapotranspiration (*ET*0); crop water requirement (*ETc*); irrigation water requirement (*Ir*)

**Citation:** Li, N.; Nie, T.; Tang, Y.; Lu, D.; Wang, T.; Zhang, Z.; Chen, P.; Li, T.; Meng, L.; Jiao, Y.; et al. Responses of Soybean Water Supply and Requirement to Future Climate Conditions in Heilongjiang Province. *Agriculture* **2022**, *12*, 1035. https://doi.org/10.3390/agriculture12071035

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 23 May 2022 Accepted: 14 July 2022 Published: 15 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Global climate change, marked mainly by climate warming, has taken place [1]. Undoubtedly, this change has had and will continue to have an important impact on agricultural water resources on which crop growth depends [2]. In addition, climate factors, such as relative humidity (*RH*), solar radiation (*Rad*), and CO2 concentration, have a significant effect on crop water requirements (*ETc*) [3]. Moreover, the uncertainty of temporal and spatial distribution of precipitation (*P*) and *ETc* affects crop irrigation water requirement (*Ir*) [4]. Therefore, analyzing the impact of future climatic changes on crop water supply and requirement becomes necessary [5].

Future climate change is expected to affect water supply and requirement in a number of ways [6]. Climate change mainly affects the transpiration of plants, evaporation of water from the soil and field surface between plants, and *P* in the agricultural water cycle system [7]. *Rad* is the largest source of energy required for soil water vaporization during evapotranspiration, which converts a large amount of liquid water into water vapor. *Rad* absorbed by the atmosphere and heat emitted from the surface increase the atmospheric temperature [8]. The sensible heat around the atmosphere transmits energy to the crop to control the evapotranspiration rate, and the increase in soil surface temperature promotes evaporation [5]. The water vapor pressure difference between the evapotranspiration surface and the atmosphere is the decisive factor for water vapor movement [9]. The increase in *RH* leads to the saturation of air humidity, forming a protective layer on the field surface, thus reducing the evapotranspiration requirement [10]. However, the increase in CO2 concentration will also promote the accumulation of crop dry matter, promote plant growth, and increase transpiration [11]. *P* increases soil water content, replenishes the total effective soil water, improves the plant root water absorption rate, and helps to reduce *Ir* while meeting the needs of crop evapotranspiration [12]. Some researchers found that the *RH* in areas of Zimbabwe would decrease in the future, while the average temperature, *Rad*, and wind speed would increase, resulting in an increase in *ET*0 and *ETc*, and that the decrease in *P* would eventually lead to an increase in *Ir* [13]. In contrast, studies in the North China Plain (NCP) found that *ETc* and *Ir* decreased with increasing temperature, *Rad*, and *P* and a shortened growth period [14]. Across studies from different regions, the relationships among *ETc*, *Pe*, and *Ir* vary under climate change. Therefore, more in-depth studies are needed to assess the impact of future climate change on crop water supply and requirement.

*ETc* constitutes a major component of regional and global hydrological cycles and, therefore, has important implications in the use of agricultural irrigation water, as well as in analyzing the crop water supply and requirement relationship in agricultural ecosystems [15]. There are many methods for calculating *ETc*, such as the empirical estimation method, the Penman–Monteith (P–M) double-crop coefficient method, and the P–M single-crop coefficient method [16]. The empirical formula for estimating *ETc* is simple and convenient; however, it is only suitable for local instead of large-scale areas [17]. When using the P–M double-crop coefficient method to estimate *ETc*, the crop coefficient is divided into a basal crop coefficient (*Kcb*) and a soil evaporation coefficient (*Ke*). This improves the estimation accuracy of *ETc* [18]; however, the estimation of *Ke* is complex and uncertain and needs the support of a large amount of experimental data [19,20]. The parameters required by the P–M single-crop coefficient method are easy to obtain and can be directly substituted into the formula for calculation, and the calculated *ETc* differs little from the measured *ETc*. Generally, the P–M single-crop coefficient method has strong universality in different regions and is considered to be a more efficient, convenient, and accurate method [21,22]. Therefore, most scholars use the P–M single-crop coefficient method recommended by the Food and Agriculture Organization of the United Nations (FAO) to calculate *ETc* [23]. Nie et al. estimated rice *ETc* in Heilongjiang Province using the CROPWAT model based on the P–M single-crop coefficient method [24]; the calculated *ETc* was only 21–30 mm different from the measured *ETc* in the field experiment. In order to test the practicability and rationality of the P–M single-crop coefficient method for calculating *ETc*, Jin et al. calculated wheat *ETc* in the Huaihongxinhe Irrigation District using the P–M single-crop coefficient method and found that the average difference in *ETc* across the years was only 6 mm [25].
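The difference between the single-crop and double-crop coefficient approaches can be made concrete with a short sketch. It follows the general FAO-56 form, in which crop evapotranspiration is the reference evapotranspiration scaled by a crop coefficient, and in the double-crop (dual) form that coefficient is the sum of *Kcb* and *Ke*. The function names and all numeric inputs below are hypothetical and are not taken from this study.

```python
def etc_single(et0_mm: float, kc: float) -> float:
    """Single-crop coefficient approach: ETc = Kc * ET0."""
    return kc * et0_mm


def etc_dual(et0_mm: float, kcb: float, ke: float) -> float:
    """Double-crop (dual) coefficient approach: ETc = (Kcb + Ke) * ET0."""
    return (kcb + ke) * et0_mm


# Hypothetical mid-season day: ET0 in mm/day and illustrative coefficients.
et0 = 5.0
single_result = etc_single(et0, kc=1.15)
dual_result = etc_dual(et0, kcb=1.10, ke=0.05)
print(f"single: {single_result:.2f} mm/day, dual: {dual_result:.2f} mm/day")
```

The two approaches agree whenever *Kc* equals *Kcb* + *Ke*; the practical difficulty noted above is that *Ke* changes from day to day with surface wetting and must itself be estimated.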

Quantitative estimation of temporal and spatial variability of *ETc*, *Pe*, and *Ir* under climate change is helpful to maximize the use of rainwater resources and optimize regional water resource allocation [11,14]. *P* is the main influencing factor of soil moisture content, which provides water for crop evapotranspiration [26]. *Ir* depends on soil moisture content [27]. Therefore, there is a complex relationship between *Pe* and *Ir*, which cannot be fully explained by simple linear equations [28]. In addition, the relationship among *ETc*, *Pe*, and *Ir* is also affected by *P* distribution pattern, crop species, and planting area [11]. In the Jayakwadi command area, India, the *ETc* of major crops and *Pe* increased during the growth period, resulting in less *Ir* under climate change [29]. In the Najafabad plain in Iran, *ETc* increased and *Pe* decreased during the growth period of major crops; therefore, more irrigation water was needed [30].
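One reason the relationship between *Pe* and *Ir* is not linear at the seasonal scale is that surplus rainfall in one period cannot always offset a deficit in another. The sketch below illustrates this under a simplifying assumption that is not stated in the text: *Ir* in each period is taken as the non-negative difference between *ETc* and *Pe*, with no carry-over of soil water. The period values are hypothetical.

```python
def irrigation_requirement(etc_by_period, pe_by_period):
    """Per-period Ir, assumed here to be max(ETc - Pe, 0) with no carry-over."""
    if len(etc_by_period) != len(pe_by_period):
        raise ValueError("inputs must have the same number of periods")
    return [max(etc - pe, 0.0) for etc, pe in zip(etc_by_period, pe_by_period)]


# Hypothetical values (mm) for four consecutive periods of a growing season.
etc_mm = [30.0, 50.0, 60.0, 40.0]
pe_mm = [45.0, 30.0, 35.0, 50.0]

ir_mm = irrigation_requirement(etc_mm, pe_mm)
seasonal_ir = sum(ir_mm)
seasonal_gap = sum(etc_mm) - sum(pe_mm)
print(ir_mm, seasonal_ir, seasonal_gap)
```

Here the seasonal *Ir* is larger than the simple seasonal difference between *ETc* and *Pe*, because rainfall in the wet periods exceeds crop demand and does not reduce the deficit in the dry periods. The same effect means that the timing of *P*, not only its total, shapes *Ir*.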

As one of the largest developing countries in the world, China accounts for 22% of the world's population and 9% of the world's arable land [31]. Heilongjiang Province has the largest arable land area in China and is also an important commercial grain base in China [32]. The soybean sowing area and yield in Heilongjiang Province rank first in China, with a sowing area of 4.279 × 10<sup>6</sup> ha and yield of 7.808 × 10<sup>6</sup> tons as of 2019 [33]. Soybean sowing area increased by an average of 5 × 10<sup>5</sup> ha per year in the last 5 years. The climate distribution in Heilongjiang Province leads to great differences in temporal and spatial distribution of crop water supply and requirement, and agricultural drought occurs frequently in spring and summer [34]. With the increase in soybean planting area and soybean export share, the study of soybean water supply and requirement under future climatic conditions provides important guidance for ensuring soybean production and food security in Heilongjiang Province [35].

The purpose of this study was (1) to clarify the spatial and temporal distribution characteristics of *ET*0, *ETc*, *Pe*, *Ir*, and *CD* during the soybean growth period for 2021–2080 under RCP4.5 and RCP8.5 in Heilongjiang Province, and (2) to reveal the response of soybean water supply and requirement to climate change for 2021–2080 under RCP4.5 and RCP8.5. This study will provide reasonable planning for water allocation and guide the sustainable development of agricultural irrigation water use in Heilongjiang Province.

#### **2. Materials and Methods**

#### *2.1. Study Region and Datasets*

The study area is located in Heilongjiang Province, Northeast China, where 26 meteorological stations are distributed relatively evenly (Figure 1). The area has a cold temperate and temperate continental monsoon climate, with an average annual temperature of 4.52 ◦C, an average annual solar radiation of 13.72 MJ/m<sup>2</sup>, and an average annual *P* of 511 mm. According to the "Heilongjiang Province Crop Variety Cumulative Temperature Zone Plan" [36] and "Heilongjiang Province 2015 Regional Layout Plan for High-Quality and High-Yielding Major Food Crops" [37] issued by the Heilongjiang Provincial Agriculture Committee, the sixth cumulative temperature zone is not suitable for soybean cultivation; therefore, it is not included in this study.

We used the MIROC5 general circulation model (GCM) with a resolution of 1.39◦ × 1.41◦ and selected two representative concentration pathways (RCP4.5 as the lower radiative forcing scenario and RCP8.5 as the high radiative forcing scenario) in view of the socioeconomic conditions and radiative forcing currently faced by humans. The minimum temperature (*Tmin*), maximum temperature (*Tmax*), *Rad*, and *P* data from 26 stations of the China Meteorological Administration (CMA) for 1960–2015 were imported into the LARS-WG stochastic weather generator to generate future climate datasets. The dataset includes daily *Tmin*, *Tmax*, *Rad*, and *P* for 26 meteorological stations under RCP4.5 and RCP8.5 for the future period (2021–2080). The period 2021–2080 was divided into three time periods: the 2030s (2021–2040), 2050s (2041–2060), and 2070s (2061–2080). Under RCP4.5 and RCP8.5, average *RH* was estimated using the method recommended by FAO-56.

**Figure 1.** Study area and distribution of 26 meteorological stations in Heilongjiang Province.

#### *2.2. Division of Soybean Growth Period*

The FAO divides the crop growth period into four stages: initial stage (*Lini*), crop development stage (*Ldev*), mid-season stage (*Lmid*), and late stage (*Llate*); the crop coefficients in each growth stage are *Kcini*, *Kcmid* and *Kcend*. In this study, the whole growth period of soybean was divided into sowing to three-leaf stage (*Lini*), three-leaf stage to flowering stage (*Ldev*), flowering stage to podding stage (*Lmid*), and podding stage to maturity stage (*Llate*). The crop coefficients (*Kc*) were based on the irrigation series "Crop Guide to Crop Water Requirements" published by FAO-56 and corrected using the method recommended by FAO-56 [38,39]. It was assumed that the soybean variety would not change in the future period. According to the observation data of soybean growth period from 1994 to 2005 at 19 agrometeorological observation stations in Heilongjiang Province, the soybean sowing date and the length of each growth stage were determined. The data of the adjacent agrometeorological observation stations in the same temperature accumulation area were selected as the calculation basis, as shown in Table 1.

**Table 1.** Average soybean growth period data in 1994–2005.



#### *2.3. Soil Parameters*

Parameters such as soil type, total available soil moisture, maximum rain infiltration rate, and maximum rooting depth were obtained from the Harmonized World Soil Database (HWSD). To improve the accuracy of the model simulation results, the initial soil moisture depletion and initial available soil moisture were adjusted according to the "10 day dataset of crop growth and development and farmland soil moisture in China" from the China Meteorological Data Network (http://data.cma.cn, accessed on 22 May 2022). The obtained soil data were input into the CROPWAT model, and the initial soil water content for each subsequent year was taken as the value on the last day of the previous year.

#### *2.4. Effective Precipitation (Pe)*

For upland crops, *Pe* refers to the total precipitation that can be stored in the crop root layer to meet the crop's water needs, excluding surface runoff and leakage below the crop root layer. In this study, we used the method recommended by the United States Department of Agriculture (USDA) to calculate *Pe*. The formula was as follows:

$$P_e = \begin{cases} P(125 - 0.6P)/125 & (P \le 83.3 \text{ mm}) \\ 125/3 + 0.1P & (P > 83.3 \text{ mm}) \end{cases}, \tag{1}$$

where *Pe* is the effective precipitation (mm), and *P* is precipitation (mm).
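The piecewise formula in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the CROPWAT implementation; the function name `effective_precipitation` is hypothetical, and the accumulation period to which *P* refers is not stated in the text, so the caller is assumed to pass a precipitation total for whatever period Equation (1) is applied to.

```python
def effective_precipitation(p_mm: float) -> float:
    """Effective precipitation (mm) from precipitation P (mm), following Eq. (1)."""
    if p_mm < 0:
        raise ValueError("precipitation must be non-negative")
    if p_mm <= 83.3:
        # Lower branch: the effective share of rainfall shrinks as P grows.
        return p_mm * (125.0 - 0.6 * p_mm) / 125.0
    # Upper branch: only 10% of additional rainfall is counted as effective.
    return 125.0 / 3.0 + 0.1 * p_mm


# Illustrative inputs (not data from the study).
print(effective_precipitation(50.0))   # 38.0
print(effective_precipitation(100.0))  # about 51.67
```

The two branches meet almost exactly at the 83.3 mm threshold (about 50 mm of *Pe*), so the function is effectively continuous.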

#### *2.5. Crop Water Requirement (ETc)*

Soybean *ETc* was calculated using the CROPWAT model. The altitude, latitude, longitude, *Tmin*, *Tmax*, *Rad*, and *RH* data from each station were loaded into the "climate/*ET*0" module to calculate *ET*0. The sowing date, harvest date, *Kc*, and length of each growing period were loaded into the "crop" module to calculate *ETc*. Soybean *ETc* was calculated using the single-crop coefficient method recommended by FAO-56. *ETc* was calculated from *ET*<sup>0</sup> and *Kc* under standard conditions, where *ET*<sup>0</sup> was considered the key variable for the estimation of *ETc*. Standard conditions mean that there were no limitations to crop growth, including a sufficient supply of water and crops free from diseases and pest infections.

$$ET_c = K_c \times ET_0, \tag{2}$$

where *ET*<sup>0</sup> is the reference crop evapotranspiration (mm), *Kc* is the crop coefficient (dimensionless), and *ETc* is the crop water requirement (mm).

*ET*<sup>0</sup> was calculated using the P–M formula recommended by FAO; thus,

$$ET_0 = \frac{0.408\Delta \times (R_n - G) + \frac{900\gamma \times u_2 \times (e_s - e_a)}{(T + 273)}}{\Delta + \gamma \times (1 + 0.34u_2)}, \tag{3}$$

where *ET*<sup>0</sup> is the reference crop evapotranspiration (mm·day<sup>−1</sup>), Δ is the slope of the vapor pressure curve (kPa·◦C<sup>−1</sup>), *Rn* is the net radiation at the crop surface (MJ·m<sup>−2</sup>·day<sup>−1</sup>), *G* is the soil heat flux density (MJ·m<sup>−2</sup>·day<sup>−1</sup>), *γ* is the psychrometric constant (kPa·◦C<sup>−1</sup>), *T* is the mean daily air temperature at 2 m height (◦C), *u*<sub>2</sub> is the wind speed at 2 m height (m·s<sup>−1</sup>), *es* is the saturation vapor pressure (kPa), *ea* is the actual vapor pressure (kPa), *es* − *ea* is the saturation vapor pressure deficit (kPa), and 900 is a conversion factor.
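Equations (2) and (3) can be illustrated with a short Python sketch. It assumes that all intermediate quantities (Δ, *Rn*, *G*, *γ*, *es*, *ea*) have already been derived from the station data, which CROPWAT does internally; the function names are hypothetical, and the input values and the *Kc* of 1.15 are illustrative only, not taken from this study.

```python
def penman_monteith_et0(delta, rn, g, gamma, t_mean, u2, es, ea):
    """Daily reference evapotranspiration (mm/day) following Eq. (3)."""
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


def crop_water_requirement(kc, et0):
    """Crop water requirement (mm/day) following Eq. (2)."""
    return kc * et0


# Illustrative daily inputs (assumed values, not station data from the study).
et0 = penman_monteith_et0(
    delta=0.122,   # kPa/degC
    rn=13.28,      # MJ/m2/day
    g=0.0,         # MJ/m2/day, often taken as zero at the daily step
    gamma=0.0666,  # kPa/degC
    t_mean=16.9,   # degC
    u2=2.078,      # m/s
    es=1.997,      # kPa
    ea=1.409,      # kPa
)
etc = crop_water_requirement(kc=1.15, et0=et0)  # kc is a hypothetical mid-season value
print(round(et0, 2), round(etc, 2))
```

Summing the daily *ETc* values over the four growth stages, each with its own *Kc*, would give the seasonal totals reported in Section 3.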

#### *2.6. Irrigation Water Requirement (Ir)*

The daily soil water balance equation was used to calculate *Ir*. Irrigation quota should be less than or equal to the root-zone water consumption to avoid deep leakage loss. The calculation formula is as follows:

$$I_{r,i} = D_{r,i-1} + ET_c - D_{r,i} - P_{e,i}, \tag{4}$$

where *Ir*,*i* is the irrigation water requirement on day *i*, *Dr*,*i*−1 is the water consumption of the root zone on day *i* − 1, *ETc* is the crop water requirement, *Dr*,*i* is the water consumption of the root zone on day *i*, and *Pe*,*i* is the *Pe* on day *i*.
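As a rough illustration, the daily balance in Equation (4) can be written as a one-line function. The function name is hypothetical, and clamping negative results to zero (no irrigation needed when rainfall covers the deficit) is an assumption added here for the sketch, not something stated in the text.

```python
def irrigation_requirement(dr_prev, etc, dr_now, pe):
    """Irrigation requirement (mm) on day i following Eq. (4).

    dr_prev: root-zone water consumption at the end of day i-1 (mm)
    etc:     crop water requirement on day i (mm)
    dr_now:  root-zone water consumption at the end of day i (mm)
    pe:      effective precipitation on day i (mm)
    """
    balance = dr_prev + etc - dr_now - pe
    # Assumption: a negative balance means no irrigation is required.
    return max(0.0, balance)


# Illustrative values only.
print(irrigation_requirement(dr_prev=20.0, etc=5.0, dr_now=10.0, pe=3.0))  # 12.0
print(irrigation_requirement(dr_prev=30.0, etc=4.0, dr_now=40.0, pe=2.0))  # 0.0
```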

#### *2.7. Climate Tendency Rate*

The climate tendency rate is the rate of change of each variable per 10 years. A positive climate tendency rate indicates an increasing trend of the corresponding variable, while a negative value indicates a decreasing trend. Using the least-squares method, the changing trend of a variable can be expressed by a linear equation as follows:

$$
Y_t = at + b, \tag{5}
$$

where *Yt* represents the fitted values of each variable, *t* is the corresponding year, and *a* and *b* are regression coefficients; the climate tendency rate is 10 × *a*.
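A minimal least-squares sketch of Equation (5) follows. It uses only the Python standard library; the function name and the synthetic series are hypothetical, and the study itself performed this step in Matlab.

```python
def climate_tendency_rate(years, values):
    """Return (rate per 10 years, slope a, intercept b) of the fit Y_t = a*t + b."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need two or more paired observations")
    mean_t = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, values))
    a = sxy / sxx              # change per year
    b = mean_y - a * mean_t    # intercept
    return 10.0 * a, a, b


# Synthetic series rising by 0.6 mm per year (illustrative, not study data).
years = list(range(2021, 2031))
etc_series = [400.0 + 0.6 * (t - 2021) for t in years]
rate, slope, intercept = climate_tendency_rate(years, etc_series)
print(round(rate, 2))  # 6.0 mm/(10 years)
```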

#### *2.8. Coupling Degree of ETc and Pe (CD)*

During the soybean growth period, the degree to which *Pe* meets *ETc* is called the coupling degree between *ETc* and *Pe*. The calculation equation is as follows:

$$
\lambda_i = \begin{cases} 1 & (P_e \ge ET_c) \\ P_e / ET_c & (P_e < ET_c) \end{cases}. \tag{6}
$$
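Equation (6) translates directly into a small function. The sketch below is illustrative; the function name is hypothetical, and the first example simply plugs in the province-wide average *Pe* and *ETc* reported for RCP4.5 in Section 3, which need not reproduce the average *CD* because the mean of station ratios differs from the ratio of means.

```python
def coupling_degree(pe, etc):
    """Coupling degree between effective precipitation and crop water requirement, Eq. (6)."""
    if etc <= 0:
        raise ValueError("ETc must be positive")
    # CD is capped at 1 when effective precipitation fully covers the requirement.
    return 1.0 if pe >= etc else pe / etc


cd_avg = coupling_degree(pe=354.10, etc=414.35)
print(round(cd_avg, 3))
print(coupling_degree(pe=500.0, etc=400.0))  # 1.0
```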

#### *2.9. Mann–Kendall Trend Test*

The Mann–Kendall trend test, recommended by the World Meteorological Organization, is a nonparametric statistical method used to reveal how a variable changes with time. Positive and negative values of the statistic *Z* indicate increasing and decreasing trends, respectively; absolute values of *Z* greater than 1.64, 2.32, and 2.56 mean that the trend has passed the significance test at the 95%, 99%, and 99.9% levels, respectively [40]. This study used this method to test the changing trend of *ET*0, *ETc*, *Ir*, *Pe*, and *CD* during the soybean growth period.
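For readers unfamiliar with the test, the *Z* statistic can be computed as in the following sketch. It implements the standard Mann–Kendall formulation with a correction for tied values; whether the Matlab routine used in the study applies the same tie correction is not stated, so this is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def mann_kendall_z(values):
    """Standardized Mann-Kendall statistic Z for a time-ordered series."""
    n = len(values)
    if n < 3:
        raise ValueError("series too short for a trend test")
    # S counts later-minus-earlier pairs: +1 for an increase, -1 for a decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    if s == 0:
        return 0.0
    # Variance of S, reduced for groups of tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for count in Counter(values).values():
        if count > 1:
            var_s -= count * (count - 1) * (2 * count + 5)
    var_s /= 18.0
    # Continuity correction: shift S one unit toward zero.
    return (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)


z_up = mann_kendall_z([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
z_down = mann_kendall_z([5, 4, 3, 2, 1])
print(round(z_up, 3), round(z_down, 3))
```

The sign of *Z* gives the direction of the trend, and its absolute value is compared against the chosen critical value.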

#### *2.10. Data Processing*

The reduced-dimension downscaled dataset was processed with the open-source software Codeblocks20.03 [41], which made the data schema acceptable to the CROPWAT model. The CROPWAT8.0 [42] model was used to calculate *ETc*, *Pe*, and *Ir* under future climate conditions at 26 meteorological stations in Heilongjiang Province. Matlab R2019a [43] was used to perform Mann–Kendall trend tests of *ET*0, *ETc*, *Pe*, *Ir*, and their climate tendency rates under future climatic conditions, and the inverse distance weighting (IDW) method in the spatial analysis toolbox of Arcmap 10.2 was used to spatially interpolate and map the results at a resolution of 0.04◦ × 0.04◦. We used SPSS25.0 [44] to perform the correlation analysis of *Tmin*, *Tmax*, *RH*, *Rad*, *ET*0, *ETc*, *Pe*, and *Ir*, as well as the analysis of variance (ANOVA) of *Tmin*, *Tmax*, *Pe*, *RH*, *Rad*, *ET*0, *ETc*, *Ir*, and *CD*.
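The idea behind IDW interpolation can be shown with a short sketch. This is not the Arcmap implementation: the function name is hypothetical, the power of 2 and the use of plain planar distance in coordinate units are assumptions made for illustration, and the station values are invented.

```python
def idw_interpolate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` from (x, y, value) stations."""
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        dist_sq = (x - tx) ** 2 + (y - ty) ** 2
        if dist_sq == 0.0:
            # The target coincides with a station: return its value directly.
            return value
        weight = 1.0 / dist_sq ** (power / 2.0)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two hypothetical stations with ETc values of 10 and 20 (arbitrary units).
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw_interpolate(stations, (1.0, 0.0)))  # 15.0, midway between the stations
print(idw_interpolate(stations, (0.5, 0.0)))  # closer to the first station's value
```

Evaluating such a function on every cell of a 0.04◦ × 0.04◦ grid would produce a map comparable in structure to those in Section 3.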

#### **3. Results**

#### *3.1. Spatial and Temporal Variation of Future Meteorological Factor*

*ET*<sup>0</sup> during the soybean growth period was driven by interacting effects of different climate factors. Therefore, a detailed analysis of changes for each meteorological factor was conducted (Figures 2 and 3). Average *Tmax* under RCP4.5 and RCP8.5 showed a significant increasing trend, *Rad* showed an increasing and then decreasing trend, and average *RH* showed a decreasing and then increasing trend (Figure 2). Under RCP4.5 and RCP8.5, the average *Tmax* started from 25.19 and 25.36 ◦C in the 2030s, respectively, and increased significantly to 26.76 and 28.02 ◦C in the 2070s. Similarly, the average *Rad* increased significantly from 21.04 and 21.15 MJ/m<sup>2</sup> in the 2030s to 21.56 and 21.53 MJ/m<sup>2</sup> in the 2050s, respectively, and then both decreased to 21.42 MJ/m<sup>2</sup> in the 2070s. The average *RH* decreased significantly from 75.07% and 74.92% in the 2030s to 74.36% and 74.42% in the 2050s, before increasing to 74.50% and 74.65% in the 2070s, respectively. Under RCP4.5 and RCP8.5, the highest values of *Tmin* were distributed in the east, and the highest values of *Tmax* were distributed in the south. In addition, the highest *RH* was found in the central part, and the highest *Rad* was found in the western and eastern parts.

Under RCP4.5, the average climate tendency rates of *Tmin*, *Tmax*, *RH*, and *Rad* for 2021–2080 were 0.2656 ◦C/(10 years), 0.3509 ◦C/(10 years), −0.0920%/(10 years), and 0.0830 MJ/m<sup>2</sup>/(10 years), respectively (Figure 3). Under RCP8.5, the average climate tendency rates of *Tmin*, *Tmax*, *RH*, and *Rad* in 2021–2080 were 0.5368 ◦C/(10 years), 0.5897 ◦C/(10 years), −0.0870%/(10 years), and 0.0465 MJ/m<sup>2</sup>/(10 years), respectively. Under RCP4.5 from 2021–2050, *RH* declined most quickly, at a rate of 0.3002%/(10 years), while *Rad* increased most quickly, at a rate of 0.2193 MJ/m<sup>2</sup>/(10 years).

**Figure 2.** Spatial distribution of average minimum temperature (*Tmin*), maximum temperature (*Tmax*), relative humidity (*RH*), and solar radiation (*Rad*) during soybean growth period under RCP4.5 and RCP8.5 in the study area in the 2030s, 2050s, and 2070s.

**Figure 3.** Climate tendency rates of average minimum temperature (*Tmin*), maximum temperature (*Tmax*), relative humidity (*RH*), and solar radiation (*Rad*) during soybean growth period under RCP4.5 and RCP8.5 in the study area in 2021–2050, 2051–2080, and 2021–2080.

#### *3.2. Spatial and Temporal Variation of ET0*

The *ET*<sup>0</sup> values during the soybean growth period from 2021–2080 under RCP4.5 and RCP8.5 are shown in Figure 4. Under RCP4.5, *ET*<sup>0</sup> from 2021–2080 was between 409.34 and 621.47 mm, with an average of 542.89 mm. Under RCP8.5, *ET*<sup>0</sup> was between 492.48 and 642.24 mm, with an average of 553.35 mm. Under RCP4.5 and RCP8.5, *ET*<sup>0</sup> increased first and then decreased from west to east in the study area.

**Figure 4.** Spatial distribution of reference crop evapotranspiration (*ET*0) during the (**a**) 2030s, (**b**) 2050s, and (**c**) 2070s under RCP4.5, and during the (**d**) 2030s, (**e**) 2050s, and (**f**) 2070s under RCP8.5 during the soybean growth period.

The climate tendency rate of *ET*<sup>0</sup> in the soybean growth period from 2021–2080 under RCP4.5 was 3.71–10.18 mm/(10 years). The climate tendency rates of *ET*<sup>0</sup> in 2021–2050, 2051–2080, and 2021–2080 were 12.65 mm/(10 years), 1.93 mm/(10 years), and 7.71 mm/(10 years), respectively (Figure 5a–c). Under RCP8.5, the climate tendency rate of *ET*<sup>0</sup> from 2021–2080 was 7.30–12.07 mm/(10 years), with an average of 9.55 mm/(10 years) (Figure 5d–f). All 26 sites passed the significance test at α = 0.001 under both RCP4.5 and RCP8.5.

**Figure 5.** Climate tendency rates of *ET*<sup>0</sup> in the periods (**a**) 2021–2050, (**b**) 2051–2080, and (**c**) 2021–2080 under RCP4.5, and in the periods (**d**) 2021–2050, (**e**) 2051–2080, and (**f**) 2021–2080 under RCP8.5 during the soybean growth period.

#### *3.3. Spatial and Temporal Variation of ETc*

The spatial distribution of *ETc* and its climate tendency rate during the soybean growth period for 2021–2080 under RCP4.5 and RCP8.5 are shown in Figures 6 and 7. Under RCP4.5, the *ETc* values for 2021–2080 were 356.88–470.45 mm, with an average of 414.35 mm. Under RCP8.5, the average *ETc* values for the 2030s, 2050s, and 2070s were 403.94, 423.39, and 436.07 mm, respectively. Under both RCP4.5 and RCP8.5, *ETc* increased and then decreased from west to east in the study area.

**Figure 6.** Spatial distribution of crop water requirement (*ETc*) during the (**a**) 2030s, (**b**) 2050s, and (**c**) 2070s under RCP4.5, and during the (**d**) 2030s, (**e**) 2050s, and (**f**) 2070s under RCP8.5 during the soybean growth period.

As shown in Figure 7, the climate tendency rates of soybean *ETc* for 2021–2080 were 2.92–8.11 mm/(10 years) and 4.08–9.39 mm/(10 years) under RCP4.5 and RCP8.5 with average values of 6.09 mm/(10 years) and 7.16 mm/(10 years), respectively. The *ETc* climate tendency rate was higher in the western region than that in the eastern region under RCP4.5, whereas it was higher in the eastern region than that in the western region under RCP8.5. All 26 sites passed the significance test at α = 0.001 under both RCP4.5 and RCP8.5. Specifically, soybean *ETc* in Yichun and Suifenhe increased at a rate of more than 11 mm/(10 years) under RCP8.5.

**Figure 7.** Climate tendency rates of *ETc* in the periods (**a**) 2021–2050, (**b**) 2051–2080, and (**c**) 2021–2080 under RCP4.5, and in the periods (**d**) 2021–2050, (**e**) 2051–2080, and (**f**) 2021–2080 under RCP8.5 during the soybean growth period.

#### *3.4. Spatial and Temporal Variation of Pe*

The spatial distribution of *Pe* and its climate tendency rate under RCP4.5 and RCP8.5 during the soybean growth period for 2021–2080 are shown in Figure 8. Under RCP4.5 and RCP8.5, *Pe* values were 268.41–459.18 mm and 269.53–466.94 mm, with an average of 354.10 and 362.36 mm, respectively. Under RCP8.5, the greatest difference in *Pe* was 94.99 mm. Under RCP4.5 and RCP8.5, *Pe* first increased and then decreased from west to east; higher values were mainly distributed in Hailun and Tieli, with an average value greater than 370 mm, while lower values were mainly distributed in Tailai and Huma, with an average value lower than 340 mm.

**Figure 8.** Spatial distribution of effective precipitation (*Pe*) during the (**a**) 2030s, (**b**) 2050s, and (**c**) 2070s under RCP4.5, and during the (**d**) 2030s, (**e**) 2050s, and (**f**) 2070s under RCP8.5 during the soybean growth period.

Under RCP4.5, the climate tendency rate of *Pe* during the soybean growth period from 2021–2080 ranged from −10.81 to 10.11 mm/(10 years), and the average was 1.37 mm/(10 years) (Figure 9). A total of 16 sites showed an upward trend, while 10 sites showed a downward trend. Bei'an, Harbin, Jixi, Suifenhe, and Tieli passed the significance test at α = 0.05, while Hulin, Keshan, and Suihua passed the significance test at α = 0.1. Under RCP8.5, the climate tendency rate of *Pe* for 2021–2080 ranged from −1.16 to 22.28 mm/(10 years), with an average value of 8.77 mm/(10 years). Bei'an, Mudanjiang, Suifenhe, Suihua, and Tonghe passed the significance test at α = 0.001, while Baoqing, Fuyu, Fujin, and Mingshui passed the significance test at α = 0.05.

**Figure 9.** Climate tendency rates of *Pe* in the periods (**a**) 2021–2050, (**b**) 2051–2080, and (**c**) 2021–2080 under RCP4.5, and in the periods (**d**) 2021–2050, (**e**) 2051–2080, and (**f**) 2021–2080 under RCP8.5 during the soybean growth period.

#### *3.5. Spatial and Temporal Variation of CD*

The *CD* values during the soybean growth period under RCP4.5 and RCP8.5 from 2021–2080 are shown in Figure 10. Under RCP4.5 and RCP8.5, the *CD* for 2021–2080 ranged from 0.66–0.95 and 0.66–0.96, with average values of 0.84 in both cases, showing a trend of first increasing and then decreasing in the study area.

**Figure 10.** Spatial distribution of *CD* during the (**a**) 2030s, (**b**) 2050s, and (**c**) 2070s under RCP4.5, and during the (**d**) 2030s, (**e**) 2050s, and (**f**) 2070s under RCP8.5 during the soybean growth period.

The climate tendency rate of *CD* during the soybean growth period from 2021–2080 under RCP4.5 ranged from −0.036 to 0.014/(10 years), with an average value of −0.007/(10 years), showing an overall downward trend (Figure 11). Among the sites, Keshan and Qiqihar passed the significance test at α = 0.001, while Fuyu and Shangzhi passed the significance test at α = 0.05. However, under RCP8.5, the climate tendency rate of *CD* during the soybean growth period ranged from −0.013 to 0.029/(10 years), with an average of 0.006/(10 years), showing an overall increasing trend. The climate tendency rates of *CD* at 19 sites were greater than 0, among which Bei'an passed the significance test at α = 0.001, while Mudanjiang and Tonghe passed the significance test at α = 0.05.

**Figure 11.** Climate tendency rates of *CD* in the periods (**a**) 2021–2050, (**b**) 2051–2080, and (**c**) 2021–2080 under RCP4.5, and in the periods (**d**) 2021–2050, (**e**) 2051–2080, and (**f**) 2021–2080 under RCP8.5 during the soybean growth period.

#### *3.6. Spatial and Temporal Variation of Ir*

The temporal and spatial distributions of *Ir* under RCP4.5 and RCP8.5 during the soybean growth period for 2021–2080 are shown in Figure 12. Under RCP4.5 and RCP8.5, the *Ir* values during 2021–2080 were 58.01–159.84 mm and 60.03–166.19 mm, with average values of 102.44 mm and 100.27 mm, respectively, which showed a trend of first decreasing and then increasing from west to east in the study area. Under RCP8.5, the greatest difference in *Ir* during the 2050s was as high as 43.32 mm, which was higher than that during the 2030s and 2050s (Figure 12d–f).

**Figure 12.** Spatial distribution of irrigation water requirement (*Ir*) during the (**a**) 2030s, (**b**) 2050s, and (**c**) 2070s under RCP4.5, and during the (**d**) 2030s, (**e**) 2050s, and (**f**) 2070s under RCP8.5 during the soybean growth period.

The average climate tendency rates of *Ir* during the soybean growth period under RCP4.5 in 2021–2050, 2051–2080, and 2021–2080 were 14.88, −5.92, and 3.73 mm/(10 years), respectively (Figure 13). Among the 26 sites, Qiqihar showed an increase significant at α = 0.001. Under RCP8.5, the average climate tendency rate of *Ir* for 2021–2080 was −0.067 mm/(10 years), and *Ir* showed an overall downward trend (Figure 13d–f). Over the whole study period, the climate tendency rates of *Ir* at 16 sites were negative, accounting for 61.54% of all sites, among which Bei'an showed a decrease significant at α = 0.001.

**Figure 13.** Climate tendency rates of *Ir* in the periods (**a**) 2021–2050, (**b**) 2051–2080, and (**c**) 2021–2080 under RCP4.5, and in the periods (**d**) 2021–2050, (**e**) 2051–2080, and (**f**) 2021–2080 under RCP8.5 during the soybean growth period.

#### *3.7. Effect of Climate Change on Water Supply and Requirement*

Under RCP4.5 and RCP8.5, soybean *ETc* was significantly positively correlated with *Tmax* and *Rad* and negatively correlated with *RH* (Table 2). Under RCP8.5, *Pe* was significantly negatively correlated with *Tmax* and *Rad*, significantly positively correlated with *Tmin*, and weakly correlated with *RH*. Under RCP4.5, *Ir* was significantly positively correlated with average *Tmin*, *Tmax*, and *Rad*, and significantly negatively correlated with average *RH*. The effects of meteorological factors on soybean *ETc* in the study area under RCP4.5 and RCP8.5 for 2021–2080 and the relationships among *Pe*, *ETc*, and *Ir* are shown in Figure 14. Under RCP4.5 and RCP8.5, soybean *ETc* was significantly correlated with average *Tmin*, *Tmax*, *RH*, and *Rad*. The increase in temperature and *Rad* led to an increase in *ET*0, further increasing *ETc* (Table 2). The correlation coefficient between *ETc* and *Rad* was greater than that between *ETc* and *Tmax*, indicating that the increase in soybean *ETc* was most influenced by *Rad*, followed by *Tmax*. The *CD* tended to decrease and then increase under RCP4.5; however, it tended to increase under RCP8.5 from the 2030s to 2070s. Under RCP4.5, *ETc* increased by 29.45 and 5.55 mm in the 2030s–2050s and 2050s–2070s, respectively, while *Pe* decreased by 12.21 mm in the 2030s–2050s and then increased by 16.78 mm in the 2050s–2070s. The combined effects of *ETc* and *Pe* led to a change in *Ir*, which first increased by 31.01 mm and then decreased by 13.65 mm from the 2030s to 2070s (Figure 14).


**Table 2.** Correlation analysis among *ET*0, *ETc*, *Pe*, *Ir*, and meteorological factors during the soybean growth period.


Note: \* significant correlation at the 0.05 level; \*\* significant correlation at the 0.01 level.

**Figure 14.** Effects of changes in meteorological factors on soybean *ETc* and the relationships among *Pe*, *ETc*, and *Ir* under RCP4.5 and RCP8.5 for 2021–2080. Bars marked with different lowercase letters indicate significant differences between groups (*p* < 0.05), while those marked with the same lowercase letters indicate insignificant differences between groups (*p* > 0.05).

#### **4. Discussion**

*4.1. Soybean ETc and Meteorological Factors*

Soybean *ETc* showed an increasing trend from the 2030s to 2070s under RCP4.5 and RCP8.5 in Heilongjiang Province of China in this study. An upward trend in *ETc* was also observed in previous studies involving different crops under future climate change, including maize in Zimbabwe in the 2020s, 2050s, and 2090s [13], sugarcane in Pakistan in the 2020s, 2050s, and 2080s [45], rice in Kunshan in the 2020s–2080s [7], wheat, maize, and gram in India in the 2020s–2080s [28], wheat in three provinces of northeast China in the 2040s, 2070s, and 2100s [23], and summer maize in Huang-Huai-Hai Plain in 2016–2050 [46].

In the 60-year time series under RCP4.5 and RCP8.5 covered in this study, *Tmin*, *Tmax*, *Rad*, and *RH* were all strongly related to the increase in *ETc*. Yang et al. (2021) found that *Rad*, wind speed, and *P* had the strongest linear correlations with cotton *ETc* in the NCP for 1965–2016, with correlation coefficients of 0.410–0.789, 0.361–0.676, and −0.215 to −0.410, respectively. The correlations of *ETc* with *RH* and average temperature were weak, in the ranges of −0.189 to −0.047 and −0.102 to 0.015, respectively [16]. In contrast, under RCP4.5 and RCP8.5, the rate of decline in *RH* in Heilongjiang Province was almost twice that in the NCP. The decreased *RH* in the air increased the evaporation rate, thus increasing *ETc* [47]. Nageen also reported an increasing trend in *ETc* for sugarcane, which was due to the forecasted temperature rise in Pakistan, while the increased *Pe* could not compensate for the increased *ETc* [45]. In addition, this study found that the increase in sunshine hours provided more radiation and light energy to the soybean [48], thus promoting the opening of stomata and leading to an increase in transpiration [49]; accordingly, *ETc* showed an increasing trend. Li et al. [46] reported that the temperature would continue to rise in the future in the Huang-Huai-Hai Plain, while summer maize evaporation would increase, resulting in increased *ETc*. However, this study focused more on the combined effects of *Tmin*, *Tmax*, *RH*, and *Rad* on the increase in soybean *ETc*.

#### *4.2. Soybean ETc, Pe, and Ir*

In this study, the annual average *ETc* for the soybean growing season under RCP4.5 and RCP8.5 for 2021–2080 was higher than that of soybean in Heilongjiang Province for 1966–2015 reported by Li et al. [30]. The higher *ETc* suggests that soybean in the study area may need more water because of the increase in evapotranspiration under future climate conditions [50]. Oludare et al. (2020) reported that the average temperature and *Rad* increased, while soybean *ETc* increased slightly under RCP4.5 and RCP8.5 for 2021–2099 in the Ogun-Ona River Basin, Nigeria [50]. As in this study, soybean *ETc* in other regions of the world also increased under similar trends in meteorological factors.

Under RCP4.5 and RCP8.5, the *Pe* and *Ir* of soybean were higher than reported by Li et al. [30]. Although a small increase in *Pe* was predicted in the future, more *Ir* was still needed, which probably increased the pressure on agricultural water, as well as drought frequency [51]. The highest *ETc* and *Pe* under RCP4.5 and RCP8.5 in this study were distributed in the south; however, Li et al. (2020) reported that the highest *ETc* and *Pe* were in the west [34]. Due to the increase in *Tmin*, *Tmax*, and *Rad* in the southern region and the decrease in *RH*, higher *ETc* is expected in the future. Moreover, in the historical period, Li et al. [34] did not consider the influence of *Rad* on *ETc*. In addition, *ETc* is also affected by the plant itself, such as plant canopy structure and plant physiology [52]. Under RCP4.5, the climate tendency rate of *ETc* was much greater in the west than that in the east; however, under RCP8.5, the climate tendency rate of *ETc* showed an opposite spatial distribution trend, which differed from the results of Hu et al. [37]. This might be due to the higher values of *Rad* under RCP8.5, resulting in an increase in the climate tendency rate of *ETc* in the east. On the other hand, the meteorological factors came from different meteorological stations and time series [53]. Under future climate change, their increasing or decreasing trends and magnitudes are also very different from the past [54]. This study provides long-term information for soybean water and irrigation requirements in Heilongjiang Province of China under future climate change [55].

#### *4.3. Uncertainties and Limitations of the Study*

This study indicated that there was a strong relationship among temperature, *Pe* variability, *ETc*, and *Ir* under climate change. Two limitations should be taken into account in this study. Firstly, due to political and socioeconomic factors, regional climate programs are unable to accurately predict the path of future greenhouse gas emissions [56]. We only considered RCP4.5 and RCP8.5, in which the concentration of CO2 is fixed; however, in fact, the concentration of CO2 varies with time [2]. In addition, the "Special Report on Emission Scenarios" (SRES) also proposed that two other emission scenarios, A1 (emphasizing economic development) and B2 (emphasizing sustainable development), can also predict the future climate [23]. However, anthropogenic-based climate change scenarios are one-sided scenarios. Under the actual climate in the future, the biological and agricultural technology of soybean planting will certainly advance to reduce the impact of climate change. Secondly, we adjusted the parameters of the CROPWAT crop model for Heilongjiang Province, but there were still some uncertainties in the simulation parameters. For example, *Kc* and crop phenology are expected to change under future climatic change [57]. Therefore, changes in all meteorological factors caused by global warming and the uncertainties and limitations of the model should be considered carefully in further studies.

#### **5. Conclusions**

In 2021–2080, *Tmin*, *Tmax*, and *Rad* increased while *RH* decreased under RCP4.5 and RCP8.5. In particular, the climate tendency rates for *Tmin*, *Tmax*, and *Rad* were higher under RCP8.5. There was little difference in the climate tendency rate of *RH* between RCP4.5 and RCP8.5. Affected by the changes in climate factors in the future, the *ET*0, *ETc*, and *Pe* during the soybean growth period in Heilongjiang Province showed an upward trend under RCP4.5 and RCP8.5. The climate tendency rates of annual *ETc* were 6.09 mm/(10 years) and 7.16 mm/(10 years) under RCP4.5 and RCP8.5, respectively. The climate tendency rate of annual *Ir* was 3.73 mm/(10 years) under RCP4.5, while it was −0.067 mm/(10 years) under RCP8.5. The results showed that soybean throughout Heilongjiang Province would face water shortage stress in the future, especially in the central and western regions. There would be more *P* and less *ETc* in the eastern region. Therefore, we recommend appropriately adjusting the crop structure, changing the planting system, and increasing the soybean planting area in eastern Heilongjiang Province. This study can guide future irrigation system planning and management policy in Heilongjiang Province.

**Author Contributions:** Conceptualization, T.N. and N.L.; methodology, T.N. and N.L.; validation, Y.T., Y.J. and Z.Z.; formal analysis, N.L. and T.N.; writing—original draft preparation, N.L. and T.N.; writing—review and editing, T.W., Z.Z., P.C., T.L., L.M. and K.C.; visualization, N.L., Y.T., D.L. and T.W.; supervision, T.N.; project administration, T.N.; funding acquisition, T.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Basic Scientific Research Fund of Heilongjiang Provincial Universities (grant number: 2021-KYYWF-0019), the Opening Project of Key Laboratory of Efficient Use of Agricultural Water Resources, Ministry of Agriculture and Rural Affairs of the People's Republic of China in Northeast Agricultural University (grant number: AWR2021002), and the National Natural Science Foundation Project of China (grant number: 51779046).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** We thank the Chinese meteorological data sharing service (http://data.cma.cn, accessed on 22 May 2022) for providing the meteorological data. We thank all the members in the Lab of the Pumping, Hydraulic Teaching, and Experimental Center of Heilongjiang University. Lastly, we thank the anonymous reviewers and the editor for their suggestions, which substantially improved the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.



## *Article* **The Impacts of Climate Change on Water Resources and Crop Production in an Arid Region**

**Samira Shayanmehr <sup>1</sup>, Jana Ivanič Porhajašová <sup>2</sup>, Mária Babošová <sup>2</sup>, Mahmood Sabouhi Sabouni <sup>1</sup>, Hosein Mohammadi <sup>1</sup>, Shida Rastegari Henneberry <sup>3</sup> and Naser Shahnoushi Foroushani <sup>1,</sup>\***


**Abstract:** Climate change is one of the most pressing global issues of the twenty-first century. This phenomenon has an increasingly severe impact on water resources and crop production. The main purpose of this study is to evaluate the impact of climate change on water resources, crop production, and agricultural sustainability in an arid environment in Iran. To this end, the study constructs a new integrated climate-hydrological-economic model to assess the impact of future climate change on water resources and crop production. Furthermore, agricultural sustainability is evaluated using the multicriteria decision making (MCDM) technique in the context of climate change. The projections of climate variables show that the minimum and maximum temperatures are expected to increase by about 5.88% and 6.05%, respectively, while precipitation would decrease by approximately 30.68%. The results reveal that water availability will decrease by about 13.79–15.45% under different climate scenarios. Additionally, crop production is projected to decline under the climate scenarios in the majority of cases, with rainfed wheat experiencing the greatest decline (approximately 59.95%). The results of the MCDM model show that climate change can have adverse effects on economic and environmental aspects and, consequently, on the sustainability of the agricultural system of the study area. Our findings can inform policymakers on effective strategies for mitigating the consequences of climate change on water resources and agricultural production in dry regions.

**Keywords:** climate change; crop yield; cultivated area; future climate scenarios; water use

#### **1. Introduction**

Climate change's impact on agricultural production has raised serious global concerns in the twenty-first century [1,2]. Carbon dioxide (CO<sub>2</sub>) emissions have been identified as the primary cause of climate change [3–5]. Human activities, such as the use of fossil fuels, environmental degradation, and land-use changes, have all contributed significantly to rising CO<sub>2</sub> levels in the atmosphere since the industrial revolution [6–9]. The global CO<sub>2</sub> concentration in the atmosphere increased from 288 ppm in 1750 to 415 ppm in 2021 [10]. This has resulted in higher global average temperatures and unpredictability of rainfall patterns [11]. Climate change and variability have far-reaching consequences for natural resources, human communities, and biodiversity [12]. Water resources and agriculture are most affected by climate change because it directly determines the availability of resources in terms of time and space [13]. Some researchers have concluded that climate change has a negative impact on groundwater table recharge, which affects irrigation [12,14]. Others believe that this phenomenon will increase agricultural water demand due to increased evapotranspiration, thus putting more pressure on water resources [12,15]. Aside from influencing the availability of water resources, climate change is expected to reduce crop yield and agricultural efficiency by increasing crop water stress [16]. Climate change has recently had a negative impact on crop production in major agricultural areas; it is also expected to reduce global agricultural production by about 16% by 2030, resulting in widespread food insecurity [17–19]. Therefore, meeting the food needs of the world's rising population has become a major concern around the world [2,20].

**Citation:** Shayanmehr, S.; Porhajašová, J.I.; Babošová, M.; Sabouhi Sabouni, M.; Mohammadi, H.; Rastegari Henneberry, S.; Shahnoushi Foroushani, N. The Impacts of Climate Change on Water Resources and Crop Production in an Arid Region. *Agriculture* **2022**, *12*, 1056. https://doi.org/10.3390/agriculture12071056

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 20 June 2022 Accepted: 17 July 2022 Published: 19 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Climate change is expected to have the greatest impact on agricultural production in the world's arid and semi-arid regions, such as Iran. Iran's average annual rainfall is around 250 mm, which is less than one-third of the global average; therefore, most parts of the country suffer from a lack of water resources for food production [21]. Nevertheless, the adoption of supportive policies to achieve self-sufficiency in meeting domestic food demand in the presence of climate change has led to increased pressure on water resources in the country's arid and semi-arid regions [22,23]. In this regard, the Mashhad plain in northeastern Iran serves as a good example. Climate data analysis reveals that the climate of this area has changed, with decreasing rainfall and increasing temperature (see Figure 1). As shown in the figure, from 1990 to 2016, the total precipitation decreased from 300 mm to 220 mm and the average temperature increased from 11.9 °C to 16.5 °C. Given the importance of this plain in ensuring the country's food security, a thorough understanding of the effects of climate change on water resources and agricultural production in this region is required to adopt accurate and efficient mitigation and adaptation policies.

**Figure 1.** Annual mean temperature and precipitation changes in Mashhad plain.

The literature review indicated that the effects of climate change on water resources and crop production have been studied all over the world. Xiong et al. [24] used climate scenarios of the regional climate model to investigate the consequences of climate change on water availability and cereal production in China in the 2020s and 2040s. The findings of this study revealed that there are insufficient water resources for cereal production, particularly in southern China, due to an increase in nonagricultural water demand and the occurrence of climate change. Palazzoli et al. [16] developed a soil and water assessment tool (SWAT) model to investigate the effects of future climate change on rainfed crop productivity and water resources in Nepal. Based on their results, they predicted significant potential changes in water resources availability (from −26 to +37%) and crop production (rice from −17 to +12%, wheat from −36 to +18% and maize from −17 to +4%). Sinnarong et al. [25] applied an econometric model to estimate the effect of climate change on rice production in Thailand. The results showed that temperature has a negative impact on rice production while precipitation has different regional effects on rice production. Additionally, the findings indicated that rice production under different climatic scenarios would decrease between 4.56% and 33.77%. Mostafa et al. [26] evaluated the impact of climate change on water resources and the agricultural sector of Egypt using climate and irrigation (CROPWAT) models and found the irrigation water requirement for wheat crop would rise by about 6.2% in 2050 and 11.8% in 2100. Furthermore, wheat production would decrease by approximately 8.6% and 11.1% in 2050 and 2100, respectively. Medellín-Azuara et al. [27] estimated the impact of climatic change on crop farming in California using the statewide agricultural production model (SWAP). 
They found that, by 2050, water supply, agricultural land use, and production of most crops will decrease in California due to climate change, such as rising temperature and declining precipitation. Shahvari et al. [28], using the SWAT model, assessed the impact of climate change on water resources and crop yield in Iran for the future. Their results revealed that future climate scenarios will lead to an increase in runoff in spring and autumn and a decrease in summer and winter. In addition, future climate change will reduce the yield of rainfed crops in the region. Lu et al. [29] constructed a new climate-economic model to analyze the effects of climate change on grain production and water resources. The findings of this study showed that irrigation water consumption has increased by about 100 billion m<sup>3</sup> because of climate change in China. This phenomenon also reduced the grain yield in this area by 1000 kg/hm<sup>2</sup>. The existing literature, however, lacks a comprehensive view that covers the meteorological, hydrological, economic, and sustainability aspects of climate change in the agricultural sector together.

The study, therefore, assesses the effect of future climate change on water resources, crop production, and agricultural sustainability in the Mashhad plain under three climate scenarios (RCP 2.6, 4.5, and 8.5). Specifically, the study aims at (1) projecting climate variables using the Long Ashton Research Station Weather Generator (LARS-WG) model alongside HadGEM2-ES outputs and the three RCP scenarios; (2) assessing the impact of future climate change on water resources in the Mashhad plain using a panel data model; (3) estimating the relationship between crop yield and climate variables, including minimum temperature, maximum temperature, and precipitation, using the generalized maximum entropy (GME) technique; (4) investigating the impact of climate change on cropping pattern, crop production, and water consumption in the selected area using a positive mathematical programming (PMP) model; and (5) evaluating agricultural sustainability under the climate scenarios using a multicriteria decision making (MCDM) method. The results of this study are expected to provide policymakers with insights into designing climate change mitigation policies to ensure food security and sustainable production in the region.

The contribution of this study to the literature is twofold. First, to the best of our knowledge, this is the first attempt to apply an integrated climate-hydrological-economic model to evaluate the effect of climate change on water resources, crop production, and cropping pattern in Iran. The second contribution of the study is found in the use of the MCDM approach to investigate the sustainability of agricultural activity at the regional level under different climate scenarios.

The study is structured as follows: Section 2 describes the study area, datasets, and methodology. Section 3 presents the results and discussions, and the last section concludes with the research and policy implications of these findings.

#### **2. Materials and Methods**

This study used a variety of methods to achieve its research goals. The LARS-WG model was used to downscale the climate variables (maximum and minimum temperatures and precipitation), and a regression model was used to forecast the groundwater availability in the Mashhad plain. The sensitivity of yield to climate change was estimated using the GME technique. The cropping pattern was then evaluated under climate change using a PMP model. Finally, economic, social, and environmental indicators were ranked using an integrated MCDM method. The complete structure of the framework is presented in Figure 2.

**Figure 2.** Main steps in the methodological framework.

#### *2.1. Study Area*

The study area, the Mashhad plain, is located in Khorasan Razavi Province in northeastern Iran, between latitudes 35°59′ and 37°03′ N and longitudes 58°22′ and 60°06′ E, covering an area of approximately 9957 km<sup>2</sup>. The plain is bounded on the north by the Hezar Masjid heights, on the northwest by the Atrak river basin, on the south by Binaloud Mountain, and on the southeast by the Jamroud river basin [30]. It has a semi-arid to arid climate, with an average annual rainfall of about 262 mm from 1991 to 2015 [31]. The average monthly temperature in this plain is reported to be between 11.6 °C and 26.7 °C. Furthermore, the average annual evapotranspiration ranges from 236 to 310 mm. The location of the study area is shown in Figure 3. Around 3 million people live in the study basin and rely mostly on groundwater resources for drinking and agricultural cultivation. Climate change and a lack of water resource management in this area have resulted in a 12 m drop in the water table over a 20-year period [32].

**Figure 3.** Map of study area.

#### *2.2. Data Collection*

The observed daily time series of minimum temperature, maximum temperature, and precipitation for the Mashhad synoptic station during the period 1979–2016 were obtained from Iran's Meteorological Organization. To estimate the groundwater model (Equation (1)), data (2000–2016) on piezometric wells and groundwater depth in the Mashhad plain were provided by the Iran Water Resources Management Company. Additionally, the observed monthly temperature and precipitation data (2000–2016) for the synoptic station were obtained from Iran's Meteorological Organization. To estimate the yield response function (Equation (3)), crop yield data (1983–2016) were gathered from the Ministry of Agriculture Jihad of Iran. Figure 4 shows the growing seasons of crops (wheat, barley, alfalfa, potato, corn, tomato, melon, onion, sugar beet, and cucumber) in the Mashhad plain. Data and information on output prices, input costs, technical coefficients, crop yield, and resource availability were gathered through face-to-face interviews with farmers in the 2016–2017 cropping season.

**Figure 4.** The growing season of crops in the study area. Note: WHE: wheat, BAR: barley, ALP: alfalfa, POT: potato, COR: corn, TOM: tomato, MEL: melon, ONI: onion, SUG: sugar beet, and CUC: cucumber.

#### *2.3. Meteorological Model*

The LARS-WG model is a stochastic weather generator that uses statistical downscaling techniques to generate meteorological data [33]. Because of the repeated calculations, it requires less input data and is simpler and more efficient than other models [34,35]. Racsko et al. [36] proposed this model, which Semenov et al. [37] later revised and developed. This model's sixth version (LARS-WG6) was updated in 2018 for downscaling the outputs of the Coupled Model Intercomparison Project Phase 5 (CMIP5) [34]. The HadGEM2-ES global climate model data were used in this study to project climate variables for the 2045 horizon under three climate scenarios, namely RCP 2.6, RCP 4.5, and RCP 8.5. The LARS-WG model was implemented using daily data of maximum and minimum temperature and precipitation from 1979 to 2016.

The performance of the LARS-WG statistical model was evaluated by comparing the simulated and observed maximum and minimum temperatures, as well as precipitation, using the following statistics [2,38,39]: coefficient of determination (R<sup>2</sup>), normalized root mean square error (NRMSE), root mean square error (RMSE), mean absolute deviation (MAD), and mean square error (MSE) (see Table 1).


**Table 1.** The statistical indicators for model validation.

**Note:** N is the number of data points, S<sub>n</sub> is the simulated value, O<sub>n</sub> is the observed value, Ō is the mean of the observed values, and S̄ is the mean of the simulated values.
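The five validation statistics can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the exact formulas of Table 1 are not reproduced here, so the sketch assumes that NRMSE is RMSE divided by the observed mean, that MAD is the mean absolute difference between simulated and observed values, and that R<sup>2</sup> is the squared Pearson correlation. The function name `validation_stats` and the sample values are hypothetical.

```python
import math

def validation_stats(observed, simulated):
    """Return R2, RMSE, NRMSE, MAD and MSE for paired observed/simulated series."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    n = len(observed)
    o_bar = sum(observed) / n
    s_bar = sum(simulated) / n
    errors = [s - o for s, o in zip(simulated, observed)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mad = sum(abs(e) for e in errors) / n
    # Assumption: NRMSE is normalised by the observed mean (other conventions exist).
    nrmse = rmse / o_bar
    # Assumption: R2 is the squared Pearson correlation of the two series.
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    r2 = cov * cov / (var_o * var_s)
    return {"R2": r2, "RMSE": rmse, "NRMSE": nrmse, "MAD": mad, "MSE": mse}

# Hypothetical monthly mean maximum temperatures (deg C), for illustration only.
obs = [8.0, 10.0, 15.0, 22.0, 27.0, 33.0]
sim = [9.0, 10.0, 14.0, 22.0, 28.0, 33.0]
stats = validation_stats(obs, sim)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Values of RMSE, MAD, and MSE close to zero and R<sup>2</sup> close to one would indicate that the generated series reproduces the observations well.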

#### *2.4. Hydrological Model*

This section investigates the effects of climate change on groundwater resources in the Mashhad plain. A panel data model was used to forecast the groundwater depth in the plain. To this end, the following conceptual groundwater model was estimated using panel data regression [41,42], with piezometers as the cross-sectional units, from 2000 to 2016.

$$\ln \mathbf{H}\_{\mathbf{t}} = \alpha\_0 + \alpha\_1 \ln \mathbf{H}\_{\mathbf{t}-1} + \alpha\_2 \ln \mathbf{P}\_{\mathbf{t}-1} + \alpha\_3 \ln \mathbf{T} \text{min}\_{\mathbf{t}-1} + \alpha\_4 \ln \mathbf{T} \text{max}\_{\mathbf{t}-1} \tag{1}$$

where t indicates time, H<sub>t</sub> is the predicted groundwater depth, H<sub>t−1</sub> is the initial (previous-period) groundwater depth, P is total monthly precipitation (mm), Tmin and Tmax are monthly minimum and maximum temperature (°C), respectively, and α<sub>0</sub> to α<sub>4</sub> are the model coefficients to be estimated.

After estimating the change in groundwater depth due to climate change, Equation (2) was used to calculate the corresponding change in groundwater storage [43–46].

$$
\Delta \mathbf{V} = \mathbf{A} \times \mathbf{S}\_{\mathbf{y}} \times \Delta \mathbf{H} \tag{2}
$$

where ΔV is the groundwater storage change (m<sup>3</sup>), A is the geographical area (m<sup>2</sup>), S<sub>y</sub> is the specific yield (dimensionless), and ΔH is the average depth change (m).
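As a rough illustration of how Equations (1) and (2) fit together, the following Python sketch evaluates the log-linear depth forecast and converts a depth change into a storage change. The coefficient values, the specific yield, and the depth change are hypothetical placeholders, not estimates from this study; only the area of about 9957 km<sup>2</sup> is taken from the text. The logarithms require strictly positive inputs, which is assumed here.

```python
import math

def forecast_ln_depth(coeffs, depth_prev, precip_prev, tmin_prev, tmax_prev):
    """Equation (1): ln(H_t) from lagged depth, precipitation and temperatures."""
    a0, a1, a2, a3, a4 = coeffs
    return (a0
            + a1 * math.log(depth_prev)
            + a2 * math.log(precip_prev)
            + a3 * math.log(tmin_prev)
            + a4 * math.log(tmax_prev))

def storage_change(area_m2, specific_yield, delta_depth_m):
    """Equation (2): groundwater storage change in cubic metres."""
    return area_m2 * specific_yield * delta_depth_m

# Hypothetical coefficients alpha_0..alpha_4, for illustration only.
coeffs = (0.05, 0.98, -0.01, 0.005, 0.02)
ln_h = forecast_ln_depth(coeffs, depth_prev=60.0, precip_prev=20.0,
                         tmin_prev=5.0, tmax_prev=20.0)
h_next = math.exp(ln_h)  # back-transform from logs to metres

AREA_M2 = 9957e6          # study area from the text: about 9957 km^2
SPECIFIC_YIELD = 0.05     # hypothetical value
delta_v = storage_change(AREA_M2, SPECIFIC_YIELD, delta_depth_m=0.5)
print(f"forecast depth: {h_next:.2f} m, storage change: {delta_v:.3e} m^3")
```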

#### *2.5. Economic Models*

2.5.1. Generalized Maximum Entropy (GME) Model

In this study, the Cobb–Douglas Yield Response (CDYR) model was used to assess the sensitivity of crop yields to climatic variables, such as minimum and maximum temperatures, as well as precipitation [47,48]. The CDYR model, after taking the logarithm of both sides of the equation, can be presented as follows:

$$\log(\mathbf{Y}\_t) = \beta\_0 + \alpha\_t \log(\text{Tmin}\_t) + \lambda\_t \log(\text{Tmax}\_t) + \eta\_t \log(\mathbf{P}\_t) + \text{vTrend} \tag{3}$$

where t indexes years; Y is the yield of each crop (wheat, barley, alfalfa, potato, corn, tomato, melon, onion, sugar beet, and cucumber); Tmin and Tmax are the average minimum and maximum temperatures; P is total growing season precipitation; and Trend is a time trend with coefficient v. It is worth noting that the impact of additional factors affecting crop production, which were not incorporated in Equation (3), is captured by the residual terms [49].
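Because Equation (3) is linear in logarithms, each climate coefficient can be read as an elasticity: holding the trend fixed, the ratio of scenario yield to baseline yield is the product of the ratios of each climate variable raised to its coefficient. The Python sketch below shows this calculation. The elasticities and climate values are hypothetical and are not results of this study.

```python
def yield_ratio(elasticities, base, scenario):
    """Ratio of scenario yield to baseline yield implied by a log-log model.

    Each argument is a dict keyed by variable name; the trend term is held fixed.
    """
    ratio = 1.0
    for name, elasticity in elasticities.items():
        ratio *= (scenario[name] / base[name]) ** elasticity
    return ratio

# Hypothetical elasticities and climate values, for illustration only.
elasticities = {"tmin": 0.10, "tmax": -0.40, "precip": 0.25}
base = {"tmin": 8.0, "tmax": 22.0, "precip": 250.0}
scenario = {"tmin": 8.5, "tmax": 23.3, "precip": 175.0}

ratio = yield_ratio(elasticities, base, scenario)
print(f"projected yield change: {(ratio - 1.0) * 100.0:.1f}%")
```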

Because access to crop yield data in the research area was limited, we analyzed only 34 years of data (1983–2016). With such a small sample, traditional estimation approaches, such as ordinary least squares (OLS), may produce parameter estimates with excessive variance [49,50]. To address this issue, following Moreno et al. [51], we used the GME estimator. GME approaches are founded on Shannon's entropy information measure and the generalized maximum entropy theory [50,52]. Instead of calculating the mean and variance of the coefficients directly, the GME estimator considers a probability distribution for the coefficients and error terms [52]. Assume that y depends on K independent variables x<sub>k</sub> (k = 1, ..., K):

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \tag{4}$$

where y is a (T × 1) vector of observations for y, X is a (T × K) matrix of observations for the xk variables, β is the (K × 1) vector of explanatory variable coefficients, and ε is a (T × 1) vector of residual terms. To estimate explanatory variable coefficients using GME, firstly we reparametrize the regression model, and then we recast the coefficients and residual in terms of discrete probability distributions.

In this method, each β<sub>k</sub> is treated as a discrete random variable with M support points (M ≥ 2). Let z<sub>k</sub> = [z<sub>k1</sub>, ..., z<sub>kM</sub>]′ be the support points for parameter β<sub>k</sub>, which are symmetric around zero. Additionally, the probability mass function of z<sub>k</sub> is defined as p<sub>k</sub> = [p<sub>k1</sub>, ..., p<sub>kM</sub>]′ such that:

$$\beta_k = E_{p_k}[z_k] = z_k' p_k = \sum_{m=1}^{M} z_{km} p_{km}; \quad \forall k = 1, \dots, K \tag{5}$$

Then, β can be represented as follows:

$$\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_K \end{bmatrix} = \mathbf{Z}\mathbf{p} = \begin{bmatrix} z_1' & 0 & \dots & 0 \\ 0 & z_2' & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & z_K' \end{bmatrix} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_K \end{bmatrix} \tag{6}$$

where Z is a (K × KM) matrix of support values, and p is a (KM × 1) vector of unknown weights.

The unknown error is defined as follows:

$$\varepsilon_g = E_{w_g}[v_g] = v_g' w_g = \sum_{j=1}^{J} v_{gj} w_{gj}; \quad \forall g = 1, \dots, G \tag{7}$$

where g indexes the observations, w<sub>g</sub> = [w<sub>g1</sub>, ..., w<sub>gJ</sub>]′ is a vector of weights, and v′<sub>g</sub> = [v<sub>g1</sub>, ..., v<sub>gJ</sub>] (J ≥ 2) is a set of support points. The error vector is presented as follows:

$$\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_G \end{bmatrix} = \mathbf{V}\mathbf{w} = \begin{bmatrix} v_1' & 0 & \dots & 0 \\ 0 & v_2' & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & v_G' \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_G \end{bmatrix} \tag{8}$$

Now, Equation (4) can be rewritten as follows:

$$\mathbf{y} = \mathbf{X}\mathbf{Z}\mathbf{p} + \mathbf{V}\mathbf{w} \tag{9}$$

In Equation (9), y, X, Z, and V are known, and p and w are unknown vectors that are estimated using GME, which is defined as follows:

$$\text{Max } \mathbf{H}(\mathbf{p}, \mathbf{w}) = -\sum\_{\mathbf{k}=1}^{\text{K}} \sum\_{\mathbf{m}=1}^{\text{M}} \mathbf{p}\_{\mathbf{k}\mathbf{m}} \ln(\mathbf{p}\_{\mathbf{k}\mathbf{m}}) - \sum\_{\mathbf{g}=1}^{\text{G}} \sum\_{\mathbf{j}=1}^{\text{J}} \mathbf{w}\_{\mathbf{g}\mathbf{j}} \ln\left(\mathbf{w}\_{\mathbf{g}\mathbf{j}}\right) \tag{10}$$

and is subject to:

$$\sum_{k=1}^{K} \sum_{m=1}^{M} x_{gk} z_{km} p_{km} + \sum_{j=1}^{J} v_{gj} w_{gj} = y_g; \quad \forall g = 1, \dots, G \tag{11}$$

$$\sum_{m=1}^{M} p_{km} = 1; \quad \forall k = 1, \dots, K \tag{12}$$

$$\sum_{j=1}^{J} w_{gj} = 1; \quad \forall g = 1, \dots, G \tag{13}$$

The GME estimates are obtained by solving the optimization problem (Equation (10)) subject to the constraints (Equations (11)–(13)). Equation (11) is the data-consistency constraint, which requires the estimated probability distributions of the coefficients and the residual terms to reproduce the observations. Equations (12) and (13) are normalization constraints for the probabilities.
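The reparametrization behind GME can be made concrete with a small Python sketch. It does not solve the optimization problem, which would need a constrained nonlinear solver. It only evaluates the entropy objective of Equation (10) and the data-consistency residuals of Equation (11) for a candidate set of probabilities, assuming that each row of X enters the constraint through its elements x<sub>gk</sub>. All function names, supports, and probabilities are hypothetical.

```python
import math

def expected_value(support, probs):
    """A coefficient or error term as a probability-weighted sum of support points."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to one")
    return sum(z * p for z, p in zip(support, probs))

def entropy(prob_vectors):
    """Shannon entropy summed over a list of probability vectors."""
    return -sum(p * math.log(p) for probs in prob_vectors for p in probs if p > 0.0)

def gme_objective(p, w):
    """Objective of Equation (10): coefficient entropy plus error entropy."""
    return entropy(p) + entropy(w)

def data_residuals(y, X, Z, p, V, w):
    """Fitted value plus error minus observation, for each observation g."""
    betas = [expected_value(zk, pk) for zk, pk in zip(Z, p)]
    errors = [expected_value(vg, wg) for vg, wg in zip(V, w)]
    return [sum(x * b for x, b in zip(row, betas)) + e - yg
            for row, e, yg in zip(X, errors, y)]

# Hypothetical example: one coefficient (K = 1), three support points (M = J = 3).
Z = [[-2.0, 0.0, 2.0]]                     # support for beta_1, symmetric around zero
p = [[0.1, 0.2, 0.7]]                      # candidate probabilities -> beta_1 = 1.2
X = [[1.0], [2.0]]                         # two observations
y = [1.2, 2.4]
V = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]   # error supports
w = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]  # uniform -> zero errors

beta_1 = expected_value(Z[0], p[0])
print("beta_1 =", round(beta_1, 6))
print("objective =", round(gme_objective(p, w), 6))
print("residuals =", [round(r, 6) for r in data_residuals(y, X, Z, p, V, w)])
```

Uniform probabilities maximize the entropy and, with a symmetric support, imply a coefficient of zero, so the estimator only moves away from uniform weights as far as the data constraint requires.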

#### 2.5.2. Positive Mathematical Programming (PMP) Model

The present study used an economic modeling system composed of the PMP model to analyze and assess the effects of climate change (the decline in available water and the changes in crop yields) on the cropping pattern of selected crops and farmers' gross revenues in Mashhad plain.

In recent decades, PMP has been widely used to evaluate the effects of climate change on the agricultural sectors [53–58]. The main objective of the PMP is to improve the accuracy of modeling farmers' behavior in the context of an optimization model utilizing observed values from the baseline year [59]. There are two primary reasons for interest in this approach: firstly, in the presence of incomplete and insufficient data, alternative approaches, such as traditional econometrics, are unable to model farmer behavior; secondly, optimization models cannot properly calibrate farm-level models [60]. In the current study, the impacts of changes in climate variables, such as temperature and rainfall on crop pattern, were simulated in the framework of a developed PMP. The study's empirical model comprises a nonlinear objective function as well as constraints, such as water, labor force, fertilizer, and land. Following Röhm and Dabbert [61] and Radmehr and Shayanmehr [59], PMP is constructed in three stages: (1) solve a linear optimization programming model and obtain shadow prices, (2) use a generalized maximum entropy (GME) approach to calibrate crop yield parameters, (3) solve a nonlinear optimization programming model that includes the objective function and constraints (from step one), as well as calibrated yield functions (obtained in the second step). In order to simplify, the nonlinear optimization model developed in the third step is presented in this section as follows:

$$\text{Max TMG} = \sum_{e} \sum_{r} P_e \left( \alpha_{e,r} X_{e,r} - \beta_{e,r} X_{e,r}^2 \right) - \sum_{e} \sum_{r} C_{e,r} X_{e,r} \tag{14}$$

and is subject to:

$$\sum_{r=\text{land}} \sum_{e} \frac{w_e X_{e,r}}{ef} \le b_{r \in \text{water}} \tag{15}$$

$$\sum_{e} a_{e,r} X_{e,r} \le b_{r \in \{\text{land, labor, fertilizer, machinery}\}} \tag{16}$$

$$X_{e,r} \ge 0 \tag{17}$$

In these expressions, e is the set of crops; r is the set of production inputs (land, water, labor, machinery, and fertilizers); TMG is the total gross margin in the region; P<sub>e</sub> is the output price of crop e; X<sub>e,r</sub> is a decision variable that represents the amount of input r used for crop e; α<sub>e,r</sub> and β<sub>e,r</sub> are coefficients of the yield function that are calibrated using the GME approach (for details of the technique, see Paris and Howitt [62]); C<sub>e,r</sub> is the unit cost of input r for crop e; w<sub>e</sub> is the water requirement of crop e; ef is the technical efficiency of irrigation water use; a<sub>e,r</sub> is the technical coefficient of input r for crop e, which shows the amount of input r needed to produce a unit of crop e; and b<sub>r</sub> is the total available amount of input r. Equation (14) is the objective function, which maximizes the total gross margin of production in the irrigated area. Equation (15) is the water constraint: the amount of water allocated to agricultural production must not exceed the total water available for crop production in the region. Equation (16) is the input constraint: the amount of each input allocated to crop production must not exceed its total availability in the region. Finally, the non-negativity constraint (Equation (17)) states that the decision variable (X<sub>e,r</sub>) must be greater than or equal to zero.
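To show why the quadratic term in Equation (14) yields an interior optimum, the Python sketch below works through a deliberately simplified case with one crop and one input. The full model has several crops sharing the resource constraints and would need a nonlinear programming solver, which is not attempted here. For a single term, setting the derivative P(α − 2βX) − C to zero gives X* = (Pα − C)/(2Pβ), which is then clipped to the range allowed by Equations (16) and (17). All parameter values are hypothetical.

```python
def gross_margin(price, alpha, beta, cost, x):
    """One (crop, input) term of Equation (14)."""
    return price * (alpha * x - beta * x * x) - cost * x

def optimal_input(price, alpha, beta, cost, available):
    """Margin-maximising input use for a single crop and a single input.

    The unconstrained optimum solves price * (alpha - 2 * beta * x) - cost = 0;
    it is then clipped to the interval [0, available].
    """
    unconstrained = (price * alpha - cost) / (2.0 * price * beta)
    return min(max(unconstrained, 0.0), available)

# Hypothetical parameters, for illustration only.
PRICE, ALPHA, BETA, COST = 2.0, 10.0, 0.5, 4.0

x_free = optimal_input(PRICE, ALPHA, BETA, COST, available=100.0)
x_scarce = optimal_input(PRICE, ALPHA, BETA, COST, available=5.0)
print("ample input: x =", x_free,
      "margin =", gross_margin(PRICE, ALPHA, BETA, COST, x_free))
print("scarce input: x =", x_scarce,
      "margin =", gross_margin(PRICE, ALPHA, BETA, COST, x_scarce))
```

When the available input falls below the unconstrained optimum, as in a scenario with reduced water availability, the constraint binds and the gross margin declines.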

#### *2.6. Multi-Criteria Decision-Making Approach*

The current study examines the effects of climate change on agricultural sustainability using three types of indicators: social, environmental, and economic indicators. The social index is based on the sub-index of farm employment (FE), while the environmental index includes the sub-indices of nitrogen balance (NB), phosphorus balance (PB), and water consumption (WC). The economic index is introduced using the sub-indices of total gross margin (GM) and profit-to-water consumption ratio (PW) [59,63–65]. MCDM models are applied to many complex decision-making problems [66]. The analytic hierarchy process (AHP) and the technique for order of preference by similarity to ideal solution (TOPSIS) are two MCDM approaches that have been used in numerous studies [63,67–69]. The main advantages of the AHP method are the ability to (i) depict the rationale of human choice; (ii) evaluate the relative performance of alternatives based on a simple algorithm; and (iii) define the selection set flexibly [70–72].

To select and rank indicators, an integrated AHP and TOPSIS method is used. This method consists of the eight steps listed below:

#### **Step 1.** *Build a decision matrix.*

First, a decision matrix is created, which can be expressed as follows:

$$\mathbf{A} = \begin{array}{c|cccccc} & F_1 & F_2 & \dots & F_j & \dots & F_n \\ \hline A_1 & f_{11} & f_{12} & \dots & f_{1j} & \dots & f_{1n} \\ A_2 & f_{21} & f_{22} & \dots & f_{2j} & \dots & f_{2n} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ A_i & f_{i1} & f_{i2} & \dots & f_{ij} & \dots & f_{in} \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ A_m & f_{m1} & f_{m2} & \dots & f_{mj} & \dots & f_{mn} \end{array} \tag{18}$$

where A<sub>i</sub> is the i-th alternative; F<sub>j</sub> is the j-th evaluation indicator; and f<sub>ij</sub> is the performance value of A<sub>i</sub> with respect to F<sub>j</sub>.

**Step 2.** *Construct the normalized decision matrix (r<sub>ij</sub>) using the following formula:*

$$r\_{ij} = \frac{f\_{ij}}{\sqrt{\sum\_{i=1}^{m} f\_{ij}^2}} \qquad \quad j = 1, \dots, n; \ i = 1, \dots, m \tag{19}$$

#### **Step 3.** *Compute the weight (w<sub>j0</sub>) of the indicators.*

The relative importance of various indicators is determined with respect to the objective, and weights of indicators are given based on their importance.

K indicates an n × n pair-wise comparison matrix:

$$\mathbf{K} = \begin{bmatrix} 1 & \mathbf{k}\_{12} & \dots & \mathbf{k}\_{1n} \\ \mathbf{k}\_{21} & 1 & \dots & \mathbf{k}\_{2n} \\ \dots & \dots & \dots & \dots \\ \mathbf{k}\_{n1} & \mathbf{k}\_{n2} & \dots & 1 \end{bmatrix} \tag{20}$$

In the reciprocal matrix K, each entry k<sub>ij</sub> is the relative importance of the i-th indicator compared to the j-th indicator [73]. Higher values of k<sub>ij</sub> therefore indicate a stronger preference for indicator i over indicator j. In the matrix K, k<sub>ij</sub> = 1 when i = j, and k<sub>ji</sub> = 1/k<sub>ij</sub>.

The geometric mean method is employed for normalization and to determine the importance degree of the indicators [74]. If W<sub>i</sub> indicates the importance degree of the i-th indicator, then:

$$\mathbf{W}\_{\mathbf{i}} = \frac{\prod\_{\mathbf{j}=1}^{n} \left(\mathbf{k}\_{\mathbf{i}\mathbf{j}}\right)^{1/n}}{\sum\_{i=1}^{n} \prod\_{\mathbf{j}=1}^{n} \left(\mathbf{k}\_{\mathbf{i}\mathbf{j}}\right)^{1/n}} \tag{21}$$

E indicates an n-dimensional column vector, which defines the sum of the weighted values of the importance degree of indicators. Then:

$$\mathbf{E} = [e_i]_{n \times 1} = \mathbf{K}\mathbf{W}^{\mathrm{T}}, \quad i = 1, 2, \dots, n \tag{22}$$

where

$$\mathbf{K}\mathbf{W}^{\mathrm{T}} = \begin{bmatrix} 1 & k_{12} & \dots & k_{1n} \\ k_{21} & 1 & \dots & k_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ k_{n1} & k_{n2} & \dots & 1 \end{bmatrix} \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_n \end{bmatrix} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \tag{23}$$

Consistency values are defined by the following vector:

EV = [ev<sub>i</sub>]<sub>1×n</sub>, with a typical component ev<sub>i</sub> calculated as ev<sub>i</sub> = e<sub>i</sub>/W<sub>i</sub>, i = 1, 2, ..., n. The consistency index (CI) is calculated from Equation (24):

$$\text{CI} = \left(\frac{\lambda\_{\text{max}} - \mathbf{n}}{\mathbf{n} - 1}\right) \tag{24}$$

λ<sub>max</sub> is the maximum eigenvalue, which can be obtained as follows [74]:

$$\lambda_{\text{max}} = \frac{\sum_{i=1}^{n} \text{ev}_i}{n} \tag{25}$$

The consistency of evaluation in AHP is measured by consistency ratio (CR). Consistency ratio is defined as Equation (26):

$$\text{CR} = \frac{\text{CI}}{\text{RI}} \tag{26}$$

where RI indicates the inconsistency index of a random matrix. If the value of consistency ratio is less than 0.10, the evaluation of the importance of degrees of attributes is acceptable. **Step 4.** *Calculate the weighted normalized decision matrix (zij) using the following formula:*

$$z\_{i\mathbf{j}} = r\_{i\mathbf{j}} \cdot w\_{j0}, \qquad \mathbf{j} = 1, \dots, n; \ \mathbf{i} = 1, \dots, m \tag{27}$$

**Step 5.** *Determine the positive (A*+*) and negative (A*−*) ideal options.*

$$A^{+} = \{z\_1^{+}, z\_2^{+}, \dots, z\_n^{+}\} = \left[ \left( \max z\_{ij} \mid j \in J' \right), \left( \min z\_{ij} \mid j \in J'' \right) \right] \tag{28}$$

$$A^{-} = \{z\_1^{-}, z\_2^{-}, \dots, z\_n^{-}\} = \left[ \left( \min z\_{ij} \mid j \in J' \right), \left( \max z\_{ij} \mid j \in J'' \right) \right] \tag{29}$$

*where J′ and J″ are the indicators with positive and negative polarity, respectively.*

**Step 6.** *Compute the relative distance of each A<sub>i</sub> from A*<sup>+</sup> *and A*<sup>−</sup> [63].

$$D\_i^+ = \sqrt{\sum\_{j=1}^n \left(z\_{ij} - z\_j^+\right)^2}, \quad i = 1, \dots, m \tag{30}$$

$$D\_i^- = \sqrt{\sum\_{j=1}^n \left(z\_{ij} - z\_j^-\right)^2}, \quad i = 1, \dots, m \tag{31}$$

**Step 7.** *Determine the relative closeness (C<sub>i</sub>) to the best alternative* [74].

$$C\_{i} = \frac{D\_{i}^{-}}{D\_{i}^{+} + D\_{i}^{-}}, \quad i = 1, \dots, m; \quad 0 \le C\_{i} \le 1 \tag{32}$$

**Step 8.** *Rank the alternatives.*

The alternative that has the highest value of C<sub>i</sub> is selected as the best option.
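The AHP weighting in Equations (21)–(26) and Steps 4–8 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `ahp_weights` and `topsis_closeness` and the sample matrices are hypothetical, the normalized decision matrix R is assumed to come from the earlier normalization step, and the random index (RI) values are the commonly tabulated ones, which this section does not list.

```python
import math

# Random index (RI) values commonly tabulated for AHP (assumption: this
# section does not list the RI values the authors used).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(K):
    """Return (weights, consistency ratio) for a pairwise comparison matrix K."""
    n = len(K)
    geo_means = [math.prod(row) ** (1.0 / n) for row in K]
    total = sum(geo_means)
    w = [g / total for g in geo_means]                              # Eq. (21)
    e = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]   # Eq. (22)
    ev = [e[i] / w[i] for i in range(n)]                            # consistency values
    lambda_max = sum(ev) / n                                        # Eq. (25)
    ci = (lambda_max - n) / (n - 1)                                 # Eq. (24)
    cr = ci / RANDOM_INDEX[n]                                       # Eq. (26)
    return w, cr


def topsis_closeness(R, w, benefit):
    """Relative closeness C_i of each alternative (one row of R per alternative).

    R: normalized decision matrix (m alternatives x n indicators)
    w: indicator weights
    benefit: True where a larger value is better (J'), False otherwise (J'')
    """
    Z = [[r * wj for r, wj in zip(row, w)] for row in R]                 # Eq. (27)
    columns = list(zip(*Z))
    a_pos = [max(c) if b else min(c) for c, b in zip(columns, benefit)]  # Eq. (28)
    a_neg = [min(c) if b else max(c) for c, b in zip(columns, benefit)]  # Eq. (29)
    closeness = []
    for row in Z:
        d_pos = math.dist(row, a_pos)                                    # Eq. (30)
        d_neg = math.dist(row, a_neg)                                    # Eq. (31)
        # The ratio is undefined (0/0) only if all alternatives are identical.
        closeness.append(d_neg / (d_pos + d_neg))                        # Eq. (32)
    return closeness


# Hypothetical example: three indicators (the last one with negative polarity)
# and two alternatives.
K = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
weights, cr = ahp_weights(K)
R = [[0.6, 0.8, 0.6], [0.8, 0.6, 0.8]]
scores = topsis_closeness(R, weights, benefit=[True, True, False])
best = max(range(len(scores)), key=scores.__getitem__)  # Step 8: highest C_i
print(weights, cr, scores, best)
```

The sample matrix K is perfectly consistent, so its consistency ratio is zero up to rounding; a matrix built from real expert judgments would normally give a positive CR that is then compared with the 0.10 threshold.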

#### **3. Results and Discussion**

#### *3.1. Projecting Climate Variables*

Using the LARS-WG model, maximum and minimum air temperature and precipitation were projected for the Mashhad plain based on data from 1979–2016. To calibrate the LARS-WG model and verify its accuracy, the simulated data were compared with the observed data on a monthly scale, as shown in Figure 5. The monthly averages of maximum air temperature, minimum air temperature, and precipitation show good agreement between the simulated and observed values for all three parameters.

**Figure 5.** Comparison of the observed and LARS-WG-generated monthly minimum and maximum temperature and precipitation for 1979–2016 in the Mashhad station.

The performance of LARS-WG was also assessed using the R<sup>2</sup>, NRMSE, RMSE, MAD, and MSE indicators, as presented in Table 2. According to these indicators, the model successfully downscaled the minimum and maximum temperature as well as precipitation. The high values of R<sup>2</sup> (>0.98) and the low values of RMSE (0.21–2.09), MAD (0.17–1.69), MSE (0.04–4.39), and NRMSE (0.95–9.96) for this period indicate that the simulated precipitation and temperature data are acceptable.
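As a rough illustration of how such goodness-of-fit indicators can be computed, the following Python sketch evaluates them for paired observed and simulated series. The exact definitions behind Table 2 are not restated in this section, so the formulas are assumptions: R<sup>2</sup> is taken as the squared Pearson correlation, MAD as the mean absolute deviation, and NRMSE as the RMSE expressed as a percentage of the observed mean. The function name `fit_metrics` and the sample values are hypothetical.

```python
import math


def fit_metrics(obs, sim):
    """Goodness-of-fit indicators for paired observed and simulated values."""
    n = len(obs)
    errors = [s - o for o, s in zip(obs, sim)]
    mse = sum(err * err for err in errors) / n
    rmse = math.sqrt(mse)
    mad = sum(abs(err) for err in errors) / n
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    r2 = cov * cov / (var_obs * var_sim)  # squared Pearson correlation (assumed)
    nrmse = 100.0 * rmse / mean_obs       # percent of observed mean (assumed)
    return {"R2": r2, "RMSE": rmse, "NRMSE": nrmse, "MAD": mad, "MSE": mse}


# Hypothetical monthly values in which the simulation is uniformly 1 unit high.
metrics = fit_metrics([10.0, 20.0, 30.0], [11.0, 21.0, 31.0])
print(metrics)
```

A constant offset such as the one in this example leaves R<sup>2</sup> at 1 while RMSE, MAD, and MSE are nonzero, which is one reason to report error-based indicators alongside R<sup>2</sup>.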


**Table 2.** Results of LARS performance for the observed and simulated data.

Sources: Research findings.

After evaluating the accuracy of the model, the climate scenarios were generated by downscaling HadGEM2 outputs under RCP 2.6, 4.5, and 8.5 for the 2045 horizon. (We considered a 30-year period (2016–2045) to investigate the effects of climate change on crop production, because climate describes the long-term weather pattern of a particular place, averaged over at least 30 years [75–77].) The percentage change in each climate variable was then calculated relative to the benchmark year (2016). The findings are shown in Table 3. According to the results, the minimum and maximum temperatures are expected to increase by about 5.88% and 6.05%, respectively, while precipitation would decrease by approximately 30.68%.

**Table 3.** Forecasting of temperature and precipitation changes under climate scenarios in 2045 horizon compared to 2016 (benchmark year).


Note: The unit of numbers is percent.

#### *3.2. Evaluating the Impacts of Climate Change on Water Resources*

Groundwater depth under climate change was forecast for the Mashhad plain using a panel data model under three scenarios. The first step in the analysis was to determine whether the variables were stationary, using the Im-Pesaran-Shin (IPS) and ADF-Fisher-type panel unit root tests. As Table 4 shows, the null hypothesis of a unit root is rejected for all variables in the model at the 99% confidence level. The model's variables were therefore all stationary. For panel data, either a random-effects or a fixed-effects model can be used, and the Hausman test was applied to choose between them. As shown in Table 4, the null hypothesis of no correlation between regional effects and independent variables is rejected. As a result, a fixed-effects model with region-specific effects was used.

The results of the panel data model for the sensitivity of groundwater depth to climate variables are presented in Table 5. The findings indicate that maximum and minimum temperatures had a positive impact on groundwater depth in the Mashhad plain from 2000 to 2016. Furthermore, precipitation was negatively and significantly related to groundwater depth in the plain. Therefore, as precipitation increased, the groundwater depth decreased, resulting in more water in the wells, which is consistent with many previous studies, including Shahvari et al. [28] and Izady et al. [41].


**Table 4.** The results of unit root test and Hausman test.

**Note:** \* and \*\*\* show rejection of the unit root hypothesis at the 10% and 1% significance levels, respectively.

**Table 5.** Sensitivity of the groundwater depth to minimum and maximum temperatures and precipitation in the study area.


\* and \*\*\*, respectively, indicate rejection of the unit root hypothesis at the 10% and 1% significance levels.

The percentage changes in groundwater depth and water availability under the climate scenarios compared to the baseline are presented in Table 6. According to the findings, the depth of groundwater in the Mashhad plain is expected to increase by about 13.79% under RCP 2.6, 13.50% under RCP 4.5, and 15.45% under RCP 8.5. Consequently, water availability will decrease by approximately 13.25%, 13.26%, and 14.84% under RCP 2.6, RCP 4.5, and RCP 8.5, respectively.

**Table 6.** Percentage change in groundwater depth and water availability in the study area under climate scenarios compared to the baseline.


Sources: Research findings.

#### *3.3. Assessing the Impacts of Climate Change on Crop Yield*

To assess crop yield sensitivity to temperature and precipitation, the yield response function was estimated using the GME model. The estimated results are displayed in Table 7. The Cobb–Douglas functional form was used in the estimation of the yield response functions; therefore, the estimated parameters in Table 7 are elasticities. Based on these results, increasing maximum and minimum temperatures reduce the yield of many crops. In the case of irrigated wheat, for example, a 1% increase in maximum temperature results in a 1.17% decrease in yield, whereas a 1% increase in minimum temperature increases yield by approximately 0.76%. Precipitation has a negative impact on the yield of some crops because of increased humidity or the potential spread of diseases and pests [78]. The yields of irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, corn, sugar beet, melon, cucumber, and tomato are positively influenced by precipitation.
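To make the elasticity interpretation concrete, the sketch below evaluates a Cobb–Douglas response of the general form Y = A · x<sub>1</sub><sup>b1</sup> · x<sub>2</sub><sup>b2</sup> · ..., in which each exponent is the elasticity of yield with respect to that variable. The function name `yield_change_percent` and the dictionary keys are hypothetical. Only the two irrigated-wheat elasticities quoted above (−1.17 and 0.76) are taken from the text; the precipitation elasticity is left out because its value is not given in this section.

```python
def yield_change_percent(elasticities, percent_changes):
    """Percentage yield change implied by a Cobb-Douglas response function.

    elasticities: variable name -> exponent b_k in Y = A * prod(x_k ** b_k)
    percent_changes: variable name -> percentage change in x_k
    """
    ratio = 1.0
    for name, pct in percent_changes.items():
        # Y_new / Y_old = prod((x_new / x_old) ** b_k); the constant A cancels.
        ratio *= (1.0 + pct / 100.0) ** elasticities[name]
    return 100.0 * (ratio - 1.0)


# Irrigated-wheat elasticities quoted in the text.
wheat = {"tmax": -1.17, "tmin": 0.76}

# A 1% rise in maximum temperature alone lowers yield by roughly 1.17%.
print(round(yield_change_percent(wheat, {"tmax": 1.0}), 2))
# A 1% rise in minimum temperature alone raises yield by roughly 0.76%.
print(round(yield_change_percent(wheat, {"tmin": 1.0}), 2))
```

For small changes the result is close to the elasticity multiplied by the percentage change; for larger changes the exact power form departs from that linear approximation.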


**Table 7.** Estimates of the impact of climatic variables on crop yield using the GME model.

**Note:** Numbers in parenthesis indicate standard error. \*, \*\*, and \*\*\*, respectively, indicate rejection of the unit root hypothesis at the 10%, 5%, and 1% significance levels.

The percentage changes in crop yield in the Mashhad plain under RCP 2.6, RCP 4.5, and RCP 8.5 for the 2045 horizon, as compared to the baseline year, are presented in Table 8. The findings imply that the yields of irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, sugar beet, onion, and cucumber will decrease under all three climate scenarios. Alfalfa is expected to experience a yield decrease of between 19.07% and 25.20% and to emerge as a highly vulnerable crop in 2045. In contrast, climate change will increase the yields of corn, potato, melon, and tomato under all scenarios, with corn yield rising more than that of the other crops. This result is in line with the results of Almaraz et al. [79] and Zhang et al. [80].

**Table 8.** Percentage changes in crop yield under RCP 2.6, RCP 4.5, and RCP 8.5 for 2045, as compared to the base period in the Mashhad plain.


Sources: Research findings.

#### *3.4. Evaluating the Impacts of Climate Change on Crop Production and Cropping Pattern*

In this section, changes in crop yields and water availability under climate change were incorporated into the PMP model to assess the impact of climate change on cropping pattern, crop production, and water consumption. The cropping pattern is defined as the combination of agricultural crops grown in a particular geographical area [81]. The percentage changes in cropland under the three climate scenarios in 2045, as compared to the base year, are shown in Figure 6. The results imply that the RCP 2.6, RCP 4.5, and RCP 8.5 scenarios will decrease the cultivated area of irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, sugar beet, onion, cucumber, and tomato in 2045 in the Mashhad plain. By contrast, the cultivated area of corn, potato, and melon will increase under RCP 4.5 and RCP 8.5 in 2045 relative to the base year. As shown in the figure, the largest reduction in cultivated land occurs for rainfed wheat under RCP 4.5, where the area under cultivation falls by approximately 51.16%.

**Figure 6.** Percentage changes in cropland in the climate scenarios as compared to the baseline in the Mashhad plain.

The percentage changes in crop production under the climate scenarios for 2045, as compared to the baseline, are presented in Table 9. The results show that the production of crops such as irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, sugar beet, onion, and cucumber will decrease under all three climate scenarios. The largest decline (59.95%) will occur for rainfed wheat under the RCP 4.5 scenario.

**Table 9.** Percentage changes in crop production under RCP 2.6, RCP 4.5, and RCP 8.5 for 2045 as compared to the base year in the Mashhad plain.


Sources: Research findings.

In addition, corn, potato, and melon production will increase by about 17.92%, 5.33%, and 7.14%, respectively, under the climate scenarios. This increase in production is due to an increase in yield, in the area under cultivation, or in both, as discussed in the previous section. Under climate change in 2045, the largest increase in crop production in the Mashhad plain will occur for corn under RCP 4.5.

Given the decline in production of most crops that account for a large share of the region's production, it is reasonable to conclude that the occurrence of climate change poses a serious threat to crop production and food security in the region.

Figure 7 shows the water consumption for each crop under current conditions and different climatic scenarios. As expected, under climate scenarios, due to the reduction in water availability and crop yield and, consequently, reduced cropland, water consumption will decrease for crops, such as irrigated wheat (21.40%), irrigated barley (21.88%), alfalfa (27.70%), sugar beet (7.90%), onion (8.71%), cucumber (4.52%), and tomato (4.33%), while there is an increase in water consumption for corn, potato, and melon because of improved yield. Overall, it can be stated that water consumption in climatic scenarios will be reduced by about 13.30%, 13.31%, and 14.90% under RCP 2.6, RCP 4.5, and RCP 8.5, respectively, compared to baseline conditions.

**Figure 7.** The amount of water consumed by each crop in the base year and under different climatic scenarios.

#### *3.5. Assessing the Impacts of Climate Change on Agricultural Sustainability*

In the present study, the effects of climate change on agricultural sustainability in the study area were evaluated using the TOPSIS approach. In the first step, we employed the AHP method to determine the weights of the indicators and sub-indicators through interviews with 15 agricultural experts and specialists. Table 10 shows the weights assigned to each indicator and sub-indicator. According to these results, the economic, environmental, and social indicators rank first, second, and third in importance, with weights of 52%, 33%, and 14%, respectively. Due to the low income of farmers in Iran, improving the profitability of agricultural activities is critical to achieving sustainable development in the agricultural sector [60]. In addition, over the past few decades, the excessive use of chemical fertilizers and the uncontrolled extraction of groundwater resources for food production have caused irreversible environmental damage [16]. This highlights the importance of environmental issues in assessing the sustainability of Iran's agricultural sector. Within the economic indicator, the "GM" and "PW" sub-indicators have weights of 58% and 42%, respectively. Because of Iran's high unemployment rate, the FE index was taken as the sole social indicator, with a weight of 14%. Among the environmental sub-indicators, WC is the most important, followed by NI and PB.


**Table 10.** Selected indicators and weights.

Note: MT is million tomans.

In the second step, using the results obtained from the PMP model, the value of each sustainability indicator under the base conditions and the climate scenarios was calculated; these values form the decision matrix (see Table 11). Table 11 shows that, under the climate scenarios, the values of the economic indicators are lower and those of the environmental indicators are higher than under the baseline conditions. Table 12 presents the normalized decision matrix. Figure 8 depicts the distribution of positive and negative ideal options across the various sustainability indicators. In the final step, the current and climate-scenario conditions were ranked based on the sustainability indicators (see Table 13). According to Table 13, the base condition has the highest C<sub>i</sub> value (0.77), indicating that climate change can be considered a serious threat to agricultural sustainability in this region. These findings are in line with the studies of Karandish et al. [82] and Ghanian et al. [83].



Sources: Research findings.

**Table 12.** Normalized decision matrix.


Sources: Research findings.

**Table 13.** Ranking of alternatives.


Sources: Research findings.

To check the robustness of the results in Table 13, a sensitivity analysis of the economic, social, and environmental criteria weights used in the TOPSIS model was performed. A total of seven experiments were conducted to compare the impact of potential changes in these weights (see Table 14). In Experiment 1, all sub-indicators have the same weight (16.66%); in Experiments 2–7, the weight of one criterion is higher than the weights of the remaining criteria. The results indicate that in all experiments except Experiment 4, the base condition is selected as the best alternative. In other words, given the importance of economic and environmental indicators in Iran's agricultural sector, it is reasonable to expect that if the harmful effects of climate change are not properly managed, the phenomenon will have a negative impact on agricultural sustainability.

**Table 14.** Results of the sensitivity analysis.


#### **4. Conclusions**

The climate simulation model predicted a 6% increase in minimum and maximum temperatures, as well as a 30% decrease in precipitation. Additionally, the results showed that water availability will decrease by between 13% and 15% under the different climate scenarios. Crop yields were found to be negatively affected by increasing maximum and minimum temperatures. Precipitation affects crop yield in different ways, with positive effects on irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, corn, sugar beet, melon, cucumber, and tomato yields and negative effects on potato and onion yields. Overall, future climate change is expected to reduce the yield of irrigated wheat, rainfed wheat, irrigated barley, rainfed barley, alfalfa, sugar beet, onion, and cucumber, while the effects will be reversed for corn, potato, melon, and tomato. The results of the PMP model showed that changes in crop yields and water availability will lead to a reduction in the cultivated area of most crops in 2045, among which rainfed wheat will experience the greatest decrease (51%). The evaluation of the effects of climate change on agricultural sustainability shows that this phenomenon can have adverse economic and environmental effects on the agricultural system of the study area. As a result, it can have a negative impact on agricultural sustainability if not properly managed.

Overall, the findings of this study indicate that water and food security in the region will be severely affected by climate change in the future. Moreover, if policies that support population growth, the uncontrolled extraction of groundwater, and the expansion of urbanization continue in the presence of climate change, more severe and irreversible effects on water and food resources in the study area are expected. These results underscore the necessity of implementing adaptation policies, such as reforming the cropping pattern and production technologies and introducing drought-tolerant varieties, to reduce the detrimental effects of climate change in the region.

**Author Contributions:** Conceptualization, S.S. and N.S.F.; methodology, S.S.; software, S.S.; validation, S.S., M.S.S., H.M., S.R.H., and N.S.F.; formal analysis, S.S.; investigation, S.S., N.S.F., J.I.P., M.B., and H.M.; data curation, S.S. and N.S.F.; writing—original draft preparation, S.S.; writing—review and editing, J.I.P., M.B., M.S.S., H.M., S.R.H., and N.S.F.; supervision, N.S.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** This work was supported by the Ferdowsi University of Mashhad, Iran [No. 57324] and project "Environmental Assessment of Specific Biotopes of Danubian Lowland" [VEGA 1/0604/20].

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Modeling Adaptive Strategies on Maintaining Wheat-Corn Production and Reducing Net Greenhouse Gas Emissions under Climate Change**

**Xiaopei Yi 1,†, Naijie Chang 1,†, Wuhan Ding 1,2, Chi Xu 1, Jing Zhang 3, Jianfeng Zhang <sup>1</sup> and Hu Li 1,\***


Guangdong Higher Education Institutes, College of Resources and Environmental Sciences, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China; zhangjing@zhku.edu.cn

**\*** Correspondence: lihu@caas.cn

† These authors contributed equally to this work.

**Abstract:** Climate change has posed serious challenges to food production and sustainable development. We evaluated crop yields, N2O emissions, and soil organic carbon (SOC) in a typical wheat–corn rotation system field on the North China Plain on a 50-year scale using the Denitrification–Decomposition (DNDC) model and proposed adaptive strategies for each climate scenario. The study showed good consistency between observations and simulations (*R*<sup>2</sup> > 0.95 and nRMSE < 30%). Among the twelve climate scenarios, we explored ten management practices under four climate scenarios (3 °C temperature change: P/T−3 and P/T+3; 30% precipitation change: 0.7P/T and 1.3P/T), which have a significant impact on crop yields and the net greenhouse effect. The results revealed that changing the crop planting time (CP) and using cold-resistant (CR) varieties could reduce the net greenhouse effect by more than 1/4 without sacrificing crop yields under P/T−3. Straw return (SR) minimized the negative impact on yields and the environment under P/T+3. Fertigation (FG) and drought-resistant (DR) varieties reduced the net greenhouse effect by more than 8.34% and maintained yields under 0.7P/T. SR was most beneficial to carbon sequestration, and yields were increased by 3.87% under 1.3P/T. Multiple adaptive strategies should be implemented to balance yields and reduce the environmental burden under future climate change.

**Keywords:** wheat–corn; DNDC; climate change; crop yield; net greenhouse effect

**Citation:** Yi, X.; Chang, N.; Ding, W.; Xu, C.; Zhang, J.; Zhang, J.; Li, H. Modeling Adaptive Strategies on Maintaining Wheat-Corn Production and Reducing Net Greenhouse Gas Emissions under Climate Change. *Agriculture* **2022**, *12*, 1089. https://doi.org/10.3390/agriculture12081089

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 20 June 2022 Accepted: 18 July 2022 Published: 24 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Climate change has been identified as one of the long-term and severe challenges of the twenty-first century [1]. The global average surface temperature increased by 0.85 °C between 1880 and 2012 [2], and, according to a recent study, global warming is occurring much faster than expected, with a temperature increase of 1.5 °C projected by 2030 and of 3 °C by the end of this century [3]. Changes in the spatial pattern of precipitation also showed intensification across most global land areas between 1950 and 2016 [4]. As a result, the frequency and severity of extreme weather, such as strong heat waves, floods, and droughts, have increased significantly [5,6]. Agricultural production activities are sensitive to climatic conditions and therefore face considerable uncertainty, which threatens global food security. Researchers agree that warmer and wetter conditions may accelerate the growth and development of crops in a few areas [7]. However, several studies have highlighted that the climate crisis has a negative impact on agricultural production in most regions, particularly on the North China Plain (NCP) [8], which has a severe drought risk [9–11]. Along with climate change, greenhouse gas (GHG) emissions that contribute to global warming have been increasing. Nitrous oxide (N2O) is a major component of GHG emissions, and its annual growth rate was between 0.24% and 0.31% in response to climate change [2]. In addition, croplands emit about 43% of global total N2O emissions [12]. Therefore, we attempt to explore adaptive strategies in agriculture to mitigate food shortages and climate warming.

Cropland management is one possible strategy for addressing climate change [13]. In recent decades, studies have assessed many management strategies for maintaining crop yield and reducing GHG emissions [14,15]. Tillage is a fundamental agricultural management option, and the adoption of conservation tillage practices such as reduced tillage and no tillage has been reported to reduce carbon dioxide emissions through soil organic carbon (SOC) sequestration [16]. Li et al. [17] reported that reducing nitrogen inputs by 13% and increasing straw retention by 20% promoted the sequestration of SOC and reduced GHG emissions by 13% and 11%, respectively. In addition, adjusting the sowing date and using different varieties increased attainable maize yields and enhanced the heat and water utilization efficiency of crops [18,19], and efficient irrigation practices have also reduced the sensitivity of yield to climate conditions [20]. Although these strategies have been regarded as effective management practices that regulate GHG emissions, multiple practices have not been compared because of the limited number of treatments in field experiments, and there is also a lack of quantitative analysis of their impact on crop yield and the environment under different possible climate scenarios [21,22]. Given the uncertainty of future changes in temperature and precipitation, a model is needed to quantify and compare multiple agricultural strategies when developing optimum options to cope with the foreseeable adverse effects of climate change.

The Denitrification–Decomposition (DNDC) model, a process-oriented plant–soil model, links C sequestration to GHG emissions by simulating soil microbial activities [23,24]. Over the past three decades, the model has been widely validated for applications such as the simulation of greenhouse benefits, crop growth, and soil processes in agricultural fields [23,25,26], including wheat–corn rotation systems on the NCP [27,28]. Wheat and corn are important cereal crops, and wheat–corn rotation is one of the dominant cropping systems on the NCP, providing approximately 68.55% of the wheat and 22.89% of the corn for China [29,30]. In this study, based on the internationally known DNDC model [23] and field monitoring data, we pursue two objectives: (1) to evaluate the impact of temperature and precipitation changes on wheat–corn yields, N2O emissions, and SOC loss at a 50-year scale; and (2) to quantify the relative contributions of different agricultural practices to wheat–corn yields and the net greenhouse effect and to select adaptive strategies according to the performance of 10 strategies under different climate scenarios.

#### **2. Materials and Methods**

#### *2.1. Study Area and Experimental Description*

The study was conducted in Ye County, Henan Province, China (113°22′ E, 33°31′ N). The field site has a continental monsoon climate with a mean annual air temperature of 14.8 °C, a frost-free period of 212 days, and an average annual precipitation of 819 mm. Soil properties before the experiment are presented in Table 1.

**Table 1.** Soil properties of the study fields.


The study was performed in a field with a typical winter wheat–summer corn double-cropping system in this region. The experimental data were obtained from October 2015 to October 2017. There were two treatments: farmers' conventional practice (FP) and optimal practice (PT). Rotary tillage was applied three times to a depth of 15 cm in FP after corn harvest, and then seeds were applied together with fertilizer. No tillage was carried out in PT, and the wheat seeds were broadcast directly and were mixed in the top 5 cm soil layer. The application of N fertilizer in the wheat season was divided into basal fertilizer (FP: 150 kg N·ha<sup>−1</sup>; PT: 120 kg N·ha<sup>−1</sup>) and one additional fertilizer application during the jointing stage (FP: 150 kg N·ha<sup>−1</sup>; PT: 120 kg N·ha<sup>−1</sup>). The N fertilizer in the corn season was applied at rates of 300 kg N·ha<sup>−1</sup> and 240 kg N·ha<sup>−1</sup> in FP and PT, respectively. Wheat was irrigated in November and April of the next year, and corn was irrigated at the trumpet stage.

The yields of wheat and corn, N2O fluxes, SOC (0–10 cm), soil temperature, and soil moisture (0–5 cm) were monitored in this study. For details, please refer to a previous study [31].

#### *2.2. DNDC Model*

The DNDC model consists of two components that describe the processes of C and N transformation in an agricultural system [23]. The first component comprises the soil climate, crop growth, and decomposition sub-models, which predict soil temperature, soil moisture, pH, redox potential (Eh), and substrate concentration driven by ecological factors. The second component simulates C and N transformations mediated by soil microbes and includes the nitrification, denitrification, and fermentation sub-models.

In this study, we used DNDC (version 9.5), calibrated with measured data from the FP treatment, and verified the model using measured data from the PT treatment. The main inputs of DNDC are climate, soil, and management data, and the outputs are crop growth and soil physical and chemical data. Table 2 shows the user-set parameters in our study, which came from monitoring and calibrated values. Meteorological data were collected from the Meteorological Bureau of Ye County, Henan Province. We used a zero-intercept linear regression between simulations and observations to evaluate the performance of the modified DNDC. The slope and coefficient of determination (*R*<sup>2</sup>) of the regression indicate the consistency and correlation between simulations and observations, respectively. Additionally, the normalized root mean square error (nRMSE) was used for quantitative comparisons between simulations and observations [27].
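The zero-intercept regression slope and the nRMSE can be computed as in the following Python sketch. It is a minimal illustration under stated assumptions rather than the authors' procedure: simulated values are regressed on observed values (so a slope below 1 means the model underestimates), nRMSE is expressed as a percentage of the observed mean, and the function names and sample values are hypothetical.

```python
import math


def zero_intercept_slope(obs, sim):
    """Least-squares slope b of sim = b * obs, with the intercept fixed at zero."""
    return sum(o * s for o, s in zip(obs, sim)) / sum(o * o for o in obs)


def nrmse_percent(obs, sim):
    """Root mean square error as a percentage of the observed mean (assumed form)."""
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return 100.0 * rmse / (sum(obs) / n)


# Hypothetical data in which every simulated value is 85% of the observed one.
observed = [2.0, 4.0, 6.0]
simulated = [1.7, 3.4, 5.1]
print(zero_intercept_slope(observed, simulated))
print(nrmse_percent(observed, simulated))
```

Fixing the intercept at zero makes the slope a direct measure of systematic over- or underestimation, which is how the slope is read in the validation results.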

**Table 2.** The DNDC Model Input Parameters for Ye County.


#### *2.3. Scenarios of Future Climate Change*

The latest research claims that global warming is proceeding far faster than expected and that temperatures will increase by 1.5 °C by 2030 and by 3 °C by the end of the century [3]. Further, regional precipitation will fluctuate with the continuous warming. Climate change scenarios are used to predict future GHG emissions from agricultural ecosystems and to assess the vulnerability of agricultural production under a future climate. To accurately evaluate the potential of carbon emission reductions from wheat–corn rotation systems under future climate change, the baseline temperature and precipitation datasets were built based on the mean daily climate values from 1998 to 2017, and seven change scenarios for temperature and precipitation were set up in this study (Table 3). The other input parameters of the DNDC model followed FP. Climate change scenario data were set as DNDC meteorological parameters to assess the 50-year effects of different climate change scenarios on the yield, SOC, and greenhouse gas emissions in a typical wheat–corn rotation system.



Note: T represents the baseline temperature, P represents the baseline precipitation, and the baseline temperature and precipitation are the daily average of the past 20 years (1998–2017).
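A minimal sketch of how such scenario weather series might be derived from the baseline is given below: a temperature scenario shifts every daily value by a fixed offset, and a precipitation scenario scales every daily value by a fixed factor. The function name `apply_scenario`, its argument names, and the sample values are hypothetical. Only the scenario labels (P/T+3, P/T−3, 0.7P/T, 1.3P/T) come from the article, and the actual preparation of the DNDC climate files is not described in this section.

```python
def apply_scenario(daily_temp_c, daily_precip_mm, temp_delta_c=0.0, precip_factor=1.0):
    """Return (temperature, precipitation) series for one climate scenario.

    temp_delta_c: offset added to each baseline daily temperature (e.g. +3 for P/T+3)
    precip_factor: multiplier for each baseline daily precipitation (e.g. 0.7 for 0.7P/T)
    """
    temps = [t + temp_delta_c for t in daily_temp_c]
    precip = [p * precip_factor for p in daily_precip_mm]
    return temps, precip


# Hypothetical two-day baseline (T in degrees Celsius, P in mm).
base_t = [10.0, 12.5]
base_p = [0.0, 4.0]

warm_t, warm_p = apply_scenario(base_t, base_p, temp_delta_c=3.0)   # P/T+3
dry_t, dry_p = apply_scenario(base_t, base_p, precip_factor=0.7)    # 0.7P/T
print(warm_t, warm_p, dry_t, dry_p)
```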

#### *2.4. Scenarios of Adaptive Strategies*

From the above twelve climate scenarios, the four climate scenarios to which the wheat–corn rotation system was most sensitive were selected, and we assessed ten strategies for each of them by comparison with the baseline (FP). Ten management practices were selected: (1) Reduce N fertilizer by 20% (0.8N); (2) Increase N fertilizer by 20% (1.2N); (3) Straw return (SR); (4) No tillage (NT); (5) Fertigation (FG); (6) Change the crop planting time (CP); (7) Change the time of irrigation and fertilization (CI, move topdressing irrigation one week ahead); (8) Increase irrigation by 30% (IR); (9) Cold-resistant (CR) varieties; and (10) Drought-resistant (DR) varieties. Except for the management options mentioned above, other factors (such as climate and soil) remained consistent with those used in FP.

#### *2.5. The Net Greenhouse Effect*

The net greenhouse effect was evaluated using the global warming potential (*GWP*), considering the impacts of climate change on GHG emissions and SOC. The *GWP* over a 100-year horizon was used to represent the combined effect of *N2O*, *CO2*, and CH4 in the atmosphere and their relative effect in causing radiative forcing. Over a 100-year period, the warming effect of a given amount of *N2O* is 273 times [32] that of the same amount of *CO2*, and *N2O* has been estimated to contribute 7.4% to global warming [33]. The annual *CO2* emissions can be expressed by the annual net SOC change (*dSOC*). CH4 was not included because there were no monitoring data on CH4 emissions to verify the model, owing to the low emissions from dryland farmland. Thus, the calculation was as follows:

$$GWP\ (\text{kg CO}_2\cdot\text{ha}^{-1}\cdot\text{a}^{-1}) = 273 \times N_2O - \frac{44}{12} \times dSOC \tag{1}$$

where *N2O* represents the annual emissions in the output results of the DNDC model, and *dSOC* is the net SOC change in the output results of the model, with a positive or negative value indicating a reduction or increase in *CO2* emissions, respectively.
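Equation (1) can be written as a short function. The sketch below is a direct transcription of the equation and applies no further unit conversion, so the N2O input is assumed to be in whatever unit the equation expects. The function name and the input values in the example are hypothetical and are not model outputs from this study.

```python
N2O_GWP100 = 273.0      # 100-year factor for N2O used in Equation (1)
C_TO_CO2 = 44.0 / 12.0  # mass ratio of CO2 to C


def net_gwp(n2o: float, dsoc: float) -> float:
    """Net greenhouse effect following Equation (1).

    n2o  : annual N2O emission from the model output
    dsoc : annual net SOC change (positive = carbon gained by the soil)
    """
    return N2O_GWP100 * n2o - C_TO_CO2 * dsoc


# Hypothetical inputs, for illustration only.
losing_carbon = net_gwp(n2o=2.0, dsoc=-100.0)   # SOC is lost, so GWP rises
gaining_carbon = net_gwp(n2o=2.0, dsoc=1500.0)  # large SOC gain, GWP < 0
print(round(losing_carbon, 2), round(gaining_carbon, 2))
```

The second call shows why a strategy that stores a large amount of carbon can produce a negative net greenhouse effect, that is, a reduction of more than 100% relative to a positive baseline.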

#### **3. Results**

#### *3.1. Validation of the DNDC Model*

We carried out site verification under two agricultural practices to verify the applicability of the DNDC model in the study area. Modeled crop yields were in good agreement with observations for both the calibration (FP during 2015–2016) and validation (FP during 2016–2017 and PT during 2015–2017) datasets (Figure 1). The calculated statistical indices (*R*<sup>2</sup> = 0.96, *p* < 0.001, nRMSE = 22.69%) indicated that the simulated crop yields were significantly correlated with observed values. However, the slope of the zero-intercept linear equation was 0.85, indicating that our simulated values were lower than the observed crop yields.
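The three indices quoted above can be computed as in the following sketch. The definitions used here are assumptions, since this section does not restate the formulas: nRMSE is taken as the root mean square error divided by the observed mean, *R*<sup>2</sup> as the squared Pearson correlation, and the zero-intercept slope as the least-squares fit of simulated on observed values through the origin. The function names are illustrative.

```python
import math
from typing import Sequence


def nrmse_percent(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error normalised by the observed mean, in percent."""
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return 100.0 * rmse / (sum(obs) / n)


def zero_intercept_slope(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Least-squares slope b of sim = b * obs (regression through the origin)."""
    return sum(o * s for o, s in zip(obs, sim)) / sum(o * o for o in obs)


def r_squared(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)
```

Under these definitions, a slope below 1 means that the simulated values are, on average, lower than the observations, which is how the value of 0.85 is interpreted in the text.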

**Figure 1.** Comparison of simulated and observed yields during 2015−2017. FP represents farmers' conventional practice, PT represents optimal practice.

The modeled soil temperature data were also generally comparable with the observed values (Figure 2). The *R*<sup>2</sup> of the linear regression of simulated against observed data was 0.98, and nRMSE was 15.97%, indicating that the model reliably reproduced both the trend and the range of soil temperature variation. However, the DNDC model overestimated the soil temperature in several periods.

**Figure 2.** Comparison of simulated and observed soil temperature (0−5 cm) during 2015−2017.

The simulated annual cumulative N2O emissions under the FP and PT treatments (2.70 and 1.92 kg N ha−1, respectively) were slightly higher than the observed values (Figure 3). However, the nRMSE of 26.51% showed that the simulation deviations were within an acceptable range (nRMSE < 30%) [27].

**Figure 3.** The simulated and observed values of total N2O emissions in 2016. FP represents farmers' conventional practice; PT represents optimal practice.

As Figure 4 shows, the SOC simulated by the DNDC model was close to the corresponding field observations (nRMSE = 4.73%). A significant zero-intercept linear regression (slope = 0.98, *R*<sup>2</sup> = 0.99, *p* < 0.001, n = 8) was obtained to relate the simulations of the SOC to the corresponding observations. Furthermore, different trends of SOC change existed in the two treatments, which illustrated that the DNDC model could capture and distinguish management options well.

**Figure 4.** Comparison of simulated and observed SOC (0−10 cm) during 2015−2017.

#### *3.2. The Impact of Climate Change on Crop Yield and the Net Greenhouse Effect*

#### 3.2.1. The Yield under Climate Change Scenarios

Crop yields under the different temperature scenarios remained relatively stable (Figure 5), with the baseline (P/T) ranging from 6305.54 to 6334.50 kg C·ha<sup>−1</sup>. Crop yields had an obvious increasing (decreasing) trend over time under the increased (decreased) precipitation scenarios. The P/T−3 and P/T+3 scenarios reached the lowest yields in the 50th year, at 5615.03 and 6024.42 kg C·ha<sup>−1</sup>, respectively, corresponding to reductions of 10.95% and 4.46% compared with the baseline. There was no obvious yield reduction under the remaining temperature scenarios.
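The percentage reductions can be checked against the yields quoted above. The sketch assumes that the lower end of the baseline range (6305.54) is the baseline value the comparison refers to, which the text does not state explicitly. The helper name is illustrative.

```python
def percent_change(value: float, baseline: float) -> float:
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


BASELINE_YIELD = 6305.54  # assumed reference value (lower end of baseline range)

for label, yield_50 in [("5615.03", 5615.03), ("6024.42", 6024.42)]:
    change = percent_change(yield_50, BASELINE_YIELD)
    print(f"yield {label}: {change:.2f}% relative to baseline")
```

Under that assumption, the yield of 5615.03 corresponds to a reduction of about 10.95% and the yield of 6024.42 to a reduction of about 4.46%.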

**Figure 5.** The 50-year effects on crop yields under temperature change scenarios (**a**) and precipitation change scenarios (**b**). P/T−3: temperature decreased by 3 °C; P/T−2: temperature decreased by 2 °C; P/T−1: temperature decreased by 1 °C; P/T: temperature unchanged; P/T+1: temperature increased by 1 °C; P/T+2: temperature increased by 2 °C; P/T+3: temperature increased by 3 °C; 0.7P/T: precipitation decreased by 30%; 0.8P/T: precipitation decreased by 20%; 0.9P/T: precipitation decreased by 10%; P/T: precipitation unchanged; 1.1P/T: precipitation increased by 10%; 1.2P/T: precipitation increased by 20%; 1.3P/T: precipitation increased by 30%.

#### 3.2.2. The Net Greenhouse Effect under Climate Change Scenarios

Annual N2O emissions increased in the first ten years and then slowly decreased over time for all climate change scenarios (Figure 6). Figure 6a,b demonstrates a positive relationship between annual N2O emissions and temperature/precipitation. Among the temperature/precipitation treatments, the P/T−3 and 0.7P/T scenarios consistently had the lowest annual N2O emissions while the P/T+3 and 1.3P/T scenarios consistently had the highest annual N2O emissions.


**Figure 6.** The 50-year effects on N2O emissions (**a**,**b**), SOC loss (**c**,**d**), and the global warming potential (**e**,**f**) under temperature change scenarios and precipitation change scenarios. P/T−3: temperature decreased by 3 °C; P/T−2: temperature decreased by 2 °C; P/T−1: temperature decreased by 1 °C; P/T: temperature unchanged; P/T+1: temperature increased by 1 °C; P/T+2: temperature increased by 2 °C; P/T+3: temperature increased by 3 °C; 0.7P/T: precipitation decreased by 30%; 0.8P/T: precipitation decreased by 20%; 0.9P/T: precipitation decreased by 10%; P/T: precipitation unchanged; 1.1P/T: precipitation increased by 10%; 1.2P/T: precipitation increased by 20%; 1.3P/T: precipitation increased by 30%.

Overall, SOC changes in the baseline ranged from −174.68 to −85.43 kg C·ha<sup>−1</sup>. Figure 6c,d illustrates that increasing temperature and increasing precipitation promoted SOC loss. Across all climate change scenarios, the annual dSOC was the lowest under P/T+3 and 1.3P/T, reaching −172.03 kg C·ha<sup>−1</sup> and −128.37 kg C·ha<sup>−1</sup>, respectively, in the 50th year. The highest dSOC occurred under P/T−3 and 0.7P/T, but these values were still below zero, which indicated that SOC was lost.

The net greenhouse effect of the baseline ranged from 853.58 to 1204.50 kg CO2-eq·ha<sup>−1</sup> (Figure 6e,f). The greenhouse effect increased with increasing temperature or precipitation, being the lowest under P/T−3 or 0.7P/T and the highest under P/T+3 or 1.3P/T. Under both the temperature and precipitation change scenarios, it increased sharply in the first ten years and then decreased slowly over the long term. However, the values of the net greenhouse effect were all positive, indicating a net contribution to climate warming.

#### *3.3. Adaptive Strategies in Response to Climate Change*

#### 3.3.1. Adaptation to the P/T−3 and P/T+3 Scenarios

In P/T−3, eight strategies reduced production by more than 9.89%, while two strategies (CP and CR) had little effect on yields (Table 4). For N2O emissions, two strategies (0.8N and NT) resulted in the largest emission reductions of about 30%. CP, CI, IR, CR, and DR reduced N2O emissions by 7.96% to 17.41%, while SR and FG increased N2O emissions by 4.48% and 126.87%, respectively. For SOC changes, all strategies reduced the annual SOC loss, and SR and NT increased SOC by 1516.88 and 126.81 kg C·ha<sup>−1</sup>, respectively. Considering SOC and N2O emissions together, only FG increased the net greenhouse effect (by 52.15%), while the remaining strategies reduced it; in particular, SR and NT reduced the net greenhouse effect by 562.57% and 63.60%, respectively.

**Table 4.** The effects on annual yield, N2O emissions, and SOC loss under the P/T−3 and P/T+3 scenarios.


Note: A negative value represents a percentage decrease relative to the baseline, and a positive value represents a percentage increase relative to the baseline. 0.8N: Decrease N fertilizer by 20%; 1.2N: Increase N fertilizer by 20%; SR: Straw Return; NT: No Tillage; FG: Fertigation; CP: Change the crop planting time; CI: Change the time of irrigation and fertilization; IR: Increase irrigation by 30%; CR: Cold-resistant variety; DR: Drought-resistant variety.

In P/T+3, all strategies reduced the wheat–corn yields and increased the N2O emissions (Table 4). For the annual SOC change, SR increased the soil carbon pool sequestration to 1008.48 kg C-ha−1, and the remaining strategies resulted in carbon loss. Considering SOC and N2O emissions together, only SR and NT reduced the net greenhouse effect by 391.36% and 11.57%, but the yields were reduced by 3.02% and 4.30%, respectively.

#### 3.3.2. Adaptation to the 0.7 P/T and 1.3 P/T Scenarios

Under the 0.7P/T scenario (Table 5), only FG and DR did not reduce the crop yields greatly, while the other strategies caused a yield reduction of at least 5.96%. For N2O emissions, all strategies except 1.2N, SR, and IR reduced emissions, and 0.8N and NT caused a reduction of more than 20%. For the net greenhouse effect (3.11–491.03%), although SR largely reduced the net greenhouse effect, it reduced the yield by 5.96%. FG and DR were able to reduce the net greenhouse effect by 8.34% and 10.91%, respectively, while stabilizing yield.


**Table 5.** The effects on annual yield, N2O emissions, and SOC loss under the 0.7P/T and 1.3P/T scenarios.

Note: A negative value represents a percentage decrease relative to the baseline, and a positive value represents a percentage increase relative to the baseline. 0.8N: Decrease N fertilizer by 20%; 1.2N: Increase N fertilizer by 20%; SR: Straw Return; NT: No Tillage; FG: Fertigation; CP: Change the crop planting time; CI: Change the time of irrigation and fertilization; IR: Increase irrigation by 30%; CR: Cold-resistant variety; DR: Drought-resistant variety.

In 1.3P/T (Table 5), all strategies increased yields by more than 3.87%. Among the ten strategies, only 0.8N and NT reduced N2O emissions, by 16.92% and 6.47%, respectively. Similar to the P/T+3 scenario, SR increased the soil carbon pool sequestration to 1169.4 kg C·ha<sup>−1</sup>, while the remaining strategies reduced the soil carbon pool, but NT still reduced the amount of SOC loss compared to the baseline. Considering SOC and N2O emissions together, only 0.8N, SR, and NT reduced the net greenhouse effect by 2.50%, 461.85%, and 27.81%, respectively, while the rest of the strategies increased the net greenhouse effect (9.26% to 55.51%).

#### **4. Discussion**

#### *4.1. Model Performance*

In this study, the applicability of the DNDC model to a wheat–corn rotation system on the NCP was confirmed using field monitoring data. There is ample evidence that soil temperature influences plant decomposition, soil organic carbon, and N2O emissions by affecting the ratio of N2/N2O [8,34]. Thus, the accurate estimation of soil temperature is vital for a reliable model estimation of crop yields and the net greenhouse effect. The three indexes of soil temperature in our study confirmed that the DNDC model is capable of correctly simulating soil temperature. Figure 4 shows that the SOC under rotary tillage was significantly lower than that under no tillage, which reduces soil disturbance. This showed that DNDC effectively distinguished the differences caused by management in our research. For N2O emissions, some uncertainty remains because the simulations could not be evaluated against observed values in 2017 owing to the absence of N2O emission data. However, the simulated total N2O emissions in 2016 agreed well with the observations (Figure 3). Similarly, previous studies showed that DNDC provided a good simulation of crop yield, N2O emissions, and SOC for different management options in a wheat–corn system [27,35,36].

However, over- or underestimation of the simulated values was still present for both treatments and for the validated indicators. Human errors in the field experiments (sampling, index measurements, and data processing) are inevitable. In addition, some parameters used in the model were taken from default values rather than experimental data, which ignores the variability of external environments [24]. Furthermore, the model includes only some soil indicators, far fewer than exist in the real environment. Therefore, we need to pay attention to the influence of these shortcomings; overall, however, the accuracy of the model simulation was acceptable and provides a reasonable basis for the subsequent analysis.

#### *4.2. Climate Change Impact on Crop Yield and the Net Greenhouse Effect*

The relationship between wheat–corn yields and changes in temperature/precipitation found in our study is meaningful for yield forecasting and agricultural planning. Crop yields under small temperature fluctuations (±1 °C and ±2 °C) were basically consistent with the baseline in this study because such small fluctuations cause relatively little stress to crop growth [37]. However, a 3 °C decrease (increase) would have a significant impact on crop yields because crops, especially wheat, cannot tolerate the resulting change in accumulated temperature [38]. If we only consider climate warming, an average 2.58% of crop yield is lost for every degree of warming at the national level [39]. Among the three main crops, wheat, corn, and rice, the negative impact of rising temperature on wheat is the most evident, followed by corn. However, the response of rice yields was not significant at the subregional level [39,40]. We can see that precipitation was positively correlated with crop yields in this study. Liu et al. [9] also showed that the projected gain of crop yields increased due to increasing precipitation (+15% or 30%). The result confirmed that water shortage is a strong limiting factor for agricultural development on the NCP [41].

We examined N2O emissions from wheat–corn rotations because this cropping system is a hot spot of agricultural GHG emissions [29,42]. In our study, the total amount of N2O emissions showed an increasing trend with rising temperature or precipitation. In fact, soil moisture status and soil temperature are influenced by climate conditions [31,38,43]. The positive feedback of soil temperature to air temperature accelerates N2O emissions through the decomposition of soil organic matter, biological activity of denitrifying bacteria, and indirect dissolution and release of oxygen in water [44]. Precipitation changes the soil moisture and thus has a decisive influence on nitrification and denitrification processes by affecting the metabolic activity of soil microbial cells and nutrient transport [37].

SOC is an important component of soil, and we found that the effect of temperature change on SOC was greater than the effect of precipitation change. This result differs from that of Hursh et al. [45], who reported that soil moisture was the most significant predictor variable. Soil moisture was affected in our simulation because irrigation was set up in the model simulation. SOC decreased with the increase in temperature. The reason is that warming temperature can accelerate the decomposition of SOC [46] and alter the return of plant residues to the soil by affecting plant growth [47]. Although precipitation has a small impact on SOC compared to temperature, more precipitation contributed to the formation of a higher temperature–humidity environment in the soil. The activities of soil microorganisms and mineralization are accelerated under a higher temperature–humidity environment, leading to the decomposition of carbon inputs and ultimately reducing the organic carbon conversion efficiency [48]. It is worth noting that SOC was lost regardless of changes in temperature or precipitation in our study. The results stress the necessity of observing SOC changes when evaluating the impact of strategies on GHG emissions.

#### *4.3. Adaptive Strategies in Response to Climate Change*

The study on the change characteristics of crop yields under various strategies can provide theoretical support for agricultural development under climate change. Our results suggested that CP and CR could effectively counteract the yield reduction caused by decreasing temperature. Changing the planting time of crops can increase the accumulated temperature required during crop growth, and cold-resistant varieties compensate for the cumulative temperature required for crop growth by reducing the demand [9]. As temperatures rise, the water demand of crops increases due to evaporation, which will result in lower crop yields. FG satisfied the water requirement of crops by watering, and DR required less water; thus, the two strategies effectively mitigated a yield reduction.

In our study, the N2O emission in FG was the highest under P/T−3, P/T+3, and 1.3P/T. Fertilization leads to a high risk of N2O emissions due to pulses of excessive N and water in the soil [49]. However, some authors [50] believe that fertigation can reduce N2O emissions under the premise of watering. One possible reason for our result was that the irrigation frequency, the amount of irrigation water, and the fertilization frequency were increased to meet the needs of crop growth during the model simulation in FG. We also found that more N2O was emitted from 1.2N than from 0.8N under the scenarios of increasing temperature and precipitation. The amount of N inputs converted to N2O emissions was higher when total N inputs were higher, especially under high temperature and high humidity [31]. SR indirectly affected N2O emissions by influencing soil carbon, nitrogen availability, and soil aeration after straw application. Kravchenko et al. [42] found that straw application could increase soil N2O emissions. Shan et al. [51] also found that straw return significantly increased N2O emissions and nitrogen in the soil in dryland areas.

The management options that significantly increased SOC were SR and NT, which increased the input of SOC. Plant C input and microbial decomposition are two main processes for SOC storage. Straw return directly returns carbon fixed by crops to the soil; thus, there was a significant positive correlation between the amount of straw returned and the SOC content [52,53]. Lehtinen et al. [54] showed that straw return increased the SOC content by 7% on average. Liao et al. [55] studied farmland in North China and demonstrated that the SOC content increased from 7.8 g·kg<sup>−1</sup> to 11.0 g·kg<sup>−1</sup> from 1981 to 2011 after straw was applied. Lenka et al. [56] also revealed in a long-term field experiment that the return of straw to the field can increase the SOC content in the topsoil (0–10 cm). No tillage is also an effective management option to increase SOC by improving the soil structure and by accelerating the formation of soil aggregates [57,58]. Our results are similar to those of a previous study, in which the SOC sequestration rate was higher under no tillage than under rotary tillage and conventional tillage [59].

Taking into account the yield, N2O emissions, and SOC, we filtered out the appropriate strategies that maintained the crop yields. CP and CR reduced the net greenhouse effect and guaranteed crop yields under P/T−3. The two management options may satisfy the accumulated temperature conditions for crop growth. The N2O emissions and the annual loss of SOC decreased with decreasing temperature, comprehensively resulting in an overall reduction of the net greenhouse effect. However, the yields under each management option did not reach the baseline level in P/T+3, although there were some differences in yield. It may be difficult to cope with the climate warming by using a single management option; hence, a combination of management options should be considered. Under the 0.7P/T scenario, the yields were maintained and the net greenhouse effect was reduced at the same time in FG and DR due to the reduction of N2O emissions and annual SOC loss by ensuring the water requirement of the crops. For 1.3P/T, the simulation results showed that the SOC from straw return was higher and exceeded the GHG emissions brought about by straw return. Thus, the net greenhouse effect was reduced compared with traditional benchmark measures, and crop yield was increased at the same time. Therefore, straw returned to the field is a recommended management option.

#### *4.4. Limitations*

We investigated the effects of ten management practices on crop yields, N2O emissions, and SOC based on different climate scenarios, which can provide some reference for addressing climate change. However, our study is still limited by many factors. The climate change scenarios in our study may differ from future situations, and we need to find more reliable climate scenarios. It is essential to justify and improve the input parameters of DNDC to reduce the uncertainty of the model simulation. There were large differences in the contributions of crop yield and the N2O emissions among the ten strategies in our study. When developing adaptive strategies for sustainable wheat–corn production, more strategies should be explored while simultaneously measuring yields, SOC changes, and GHG emissions.

#### **5. Conclusions**

The DNDC model was evaluated using field monitoring data of the wheat–corn yield, annual cumulative N2O emissions, and SOC under two treatments with different tillage practices and N applications. The model evaluations demonstrated that the simulations were consistent with the observations and captured the SOC change under no tillage and rotary tillage. The temperature scenarios contributed to a crop yield reduction with the greatest impact under scenarios of a 3 °C change, while the net greenhouse effect increased with increasing temperature. Although accompanied by increased crop yields, more precipitation intensified the greenhouse effect, especially under the scenario with a 30% change in precipitation. The results suggested that the application of cold-resistant varieties and a change in the planting time mitigated the decrease in yield caused by decreasing temperature. Drought-resistant varieties minimized the negative impact of decreasing precipitation. N fertilizer reduction effectively mitigated N2O emissions but had little effect on SOC changes under increasing temperature/precipitation scenarios. Through the consideration of yields, N2O emissions, and SOC together, straw return could reduce the net greenhouse effect by increasing SOC while maintaining crop yields under warming temperature situations.

**Author Contributions:** X.Y.: Conceptualization, Formal analysis, Methodology, Investigation, Data Curation; Writing—Original Draft. N.C.: Conceptualization, Formal analysis, Methodology, Investigation, Data Curation; Writing—Original Draft. W.D.: Methodology, Writing—Review & Editing. C.X.: Methodology, Data Curation. J.Z. (Jing Zhang): Resources, Funding acquisition. J.Z. (Jianfeng Zhang): Methodology, Resources. H.L.: Funding acquisition, Data Curation, Supervision. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the earmarked fund for China Agriculture Research System (No. CARS-23-B18).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** We would like to thank our Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences for support in data collection, field visits, and valuable thoughts for the preparation of this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Glossary**


#### **References**


## *Article* **Development of Data-Driven Models to Predict Biogas Production from Spent Mushroom Compost**

**Reza Salehi <sup>1</sup>, Qiuyan Yuan <sup>2</sup> and Sumate Chaiprapat <sup>1,3,</sup>\***


**\*** Correspondence: sumate.ch@psu.ac.th; Tel.: +1-438-889-6591

**Abstract:** In this study, two types of data-driven models were proposed to predict biogas production from anaerobic digestion of spent mushroom compost supplemented with wheat straw as a nutrient source. First, a *k*-nearest neighbours (*k*-NN) model (*k* = 1–10) was constructed. The optimal *k* value was determined using the cross-validation (CV) method. Second, a support vector machine (SVM) model was developed. The linear, quadratic, cubic, and Gaussian models were examined as kernel functions. The kernel scale was set to 6.93, while the box constraint (*C*) was optimized using the CV method. Results demonstrated that *R*<sup>2</sup> for the *k*-NN model (*k* = 2) was 0.9830 at 35 °C and 0.9957 at 55 °C. The Gaussian-based SVM model (*C* = 1200) provided an *R*<sup>2</sup> of 0.9973 at 35 °C and 0.9989 at 55 °C, which are slightly better than those achieved by *k*-NN. The Gaussian-based SVM model produced *RMSE* of 0.598 at 35 °C and 0.4183 at 55 °C, which are 58.4% and 49.5% smaller, respectively, than those produced by the *k*-NN. These findings imply that SVM modeling can be considered a robust technique in predicting biogas production from AD processes because it can be implemented without requiring prior knowledge of biogas production kinetics.

**Keywords:** anaerobic digestion; biogas production; *k*-nearest neighbours; support vector machine

#### **1. Introduction**

The electrical and thermal energy production processes that use non-renewable resources (i.e., fossil fuels such as oil and coal) are becoming less attractive globally. Even though such resources are rich in energy and relatively inexpensive to process, they are limited in supply and will soon be depleted. In addition, the utilization of fossil fuels emits additional greenhouse gases into the atmosphere, which drives climate change [1]. Hence, many research groups have sought to address this growing global concern. One of the most promising and attractive alternative solutions is the use of biogas derived from wastes or renewable feedstock [2,3].

Biogas, a mixture consisting chiefly of methane (CH4) and carbon dioxide (CO2), is the end-product of anaerobic digestion of organic matter (e.g., agricultural residues, livestock manure, food waste, and sewage sludge) [4–8]. Anaerobic digestion is a complex multi-step process that is carried out by a consortium of different microbial species known as anaerobes. Uniquely, they do not need molecular oxygen for their metabolism and growth [9]. The key steps of the anaerobic digestion process, together with the possible applications of biogas and its adverse environmental impacts, are outlined in Figure 1.

**Citation:** Salehi, R.; Yuan, Q.; Chaiprapat, S. Development of Data-Driven Models to Predict Biogas Production from Spent Mushroom Compost. *Agriculture* **2022**, *12*, 1090. https://doi.org/10.3390/agriculture12081090

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 21 June 2022 Accepted: 20 July 2022 Published: 24 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

**Figure 1.** A schematic flowchart showing the simple representation of biogas generation during anaerobic digestion process, along with its applications and environmental impacts [10–14]. Notes: <sup>a</sup> A non-biological process in which the cell walls are physically or chemically broken down to release intracellular substrate. <sup>b</sup> Biogas release to the atmosphere should be avoided because CO2 and CH4, the main constituents of biogas, are contributors to global warming. <sup>c</sup> Biogas combustion should be avoided because it is associated with the release of pollutants (e.g., CO, SO2, and NOX) to the atmosphere; SO2, and NOX can react with moisture in the atmosphere to form sulfuric/nitric acid resulting in acid rain. <sup>d</sup> Solar energy and shale gas, due to being plentiful and cheap, can drive out the biogas application in electricity generation in the near future. <sup>e</sup> bio-methane can be used as a vehicular fuel or can be injected into the natural gas network. <sup>f</sup> bio-methanol, and syngas (a mixture of CO and H2) that can be generated via reforming technology. <sup>g</sup> Biogas can be converted to SCP by the action of methanotrophic bacteria alone, or in combination with autotrophic hydrogen oxidizing bacteria or algae; SCP has excellent potential as an animal feed supplement. Abbreviations: AAs: amino acids; Ac: acetate; Bu: butyrate; LCFAs: long chain fatty acids; MSW: municipal solid waste; Pr: propionate; SCP: single-cell protein; Va: valerate.

The increasing global interest in biogas power plant establishment via anaerobic digestion of various organic materials has resulted in attempts to develop numerous mathematical models to predict and suggest optimal operations. Hill [15] developed a model to describe the digestion of animal wastes, assuming that the five main bacterial groups involved in the overall digestion process (acidogenic bacteria, hydrogenotrophic bacteria, homoacetogenic bacteria, acetoclastic bacteria, and H2 utilizing methane bacteria) are inhibited by a high concentration of fatty acids (FAs). Mosey [16] proposed a model consisting of four reactions (one acidogenic reaction, one acetogenic reaction, and two methanogenic reactions), which also takes into account the role of H2. According to this model, in case of a sudden rise in the organic loading rate, an accumulation of volatile fatty acids (VFAs) is likely to occur; this results in a decrease in pH that inhibits H2 utilizing methanogenic bacteria. In other words, H2 partial pressure is increased, which leads to further accumulation of propionic/butyric acid (CH4 generation is stopped when pH drops below 5.5). Based on Mosey's model, Pullammanappallil et al. [17] introduced a model taking into account the gas phase and acetoclastic inhibition by undissociated FAs. Angelidaki et al. [18] presented a model considering hydrolysis, acidogenesis, acetogenesis, and methanogenesis, which is suitable to describe the behavior of anaerobic digesters fed with manures. This model was developed by incorporating the following assumptions: (i) methanogenesis is inhibited by free NH3, (ii) acetogenesis is inhibited by acetic acid, (iii) acidogenesis is inhibited by total VFAs, and (iv) the degree of NH3 ionization and the maximum specific growth rate of bacteria are pH and temperature dependent.

In all the above-mentioned models, organic material was taken into account as a whole; in other words, they are incapable of dealing with complex feed composition. In this regard, the International Water Association (IWA) task group for mathematical modeling of the anaerobic digestion process developed a model known as Anaerobic Digestion Model No. 1 (more often abbreviated as ADM1), which takes the complex organic substrates into account [19].

Although the kinetic-based mathematical models for describing the anaerobic digestion process can help engineers and asset managers to better plan the management of the biogas plants, it is often criticized that most of them are inherently too complex due to a large number of stoichiometric coefficients and parameters reflecting the kinetic properties of the enzymes and microorganisms that govern the physicochemical and biochemical reactions through anaerobic digestion processes [20]. In addition, these models typically involve physicochemical equilibrium expressions and differential mass balance equations for components in the liquid phase (substrates for acidogenic/acetogenic/methanogenic organisms and their corresponding microbial masses) and in the gas phase (e.g., CH4 and CO2). Hence, these models are often complicated to solve, and many simplifying assumptions must be made to reduce their complexity. However, incorporating simplifying assumptions into the models may not hold in practice. Fedailaine et al. [21] modeled the biokinetics of the anaerobic digestion process involving eight simplifying assumptions, which inevitably limited the application of this model to full-scale anaerobic digesters. In addition, applying assumptions to the models lowers the precision of the models; in other words, an under- or over-estimation of the response of the models will likely occur. For these reasons, developing a simple yet highly predictive model to estimate biogas production from the anaerobic digestion process is highly desired. As such, a different branch of models, called artificial intelligence (AI)-based models (more often known as easy-to-use black-box models) may be recruited. These models have advantages over complex mathematical models because they are constructed on a measured dataset (i.e., input–output data pairs for a given system) without requiring complicated kinetic relationships between the input variables and the corresponding outputs [22,23]. 
In addition, the AI modeling approach has proven to be a robust tool with high generalization power. Holubar et al. [24] used an artificial neural network (ANN) to model an anaerobic digester fed with a mixture of primary (raw) sludge and surplus activated sludge originating from a local municipal wastewater treatment plant. The results showed that ANN is a suitable tool for modeling such a process. Cakmakci [25] applied an adaptive neuro-fuzzy inference system (ANFIS) to predict methane yield in an anaerobic digester fed with pre-thickened raw sludge. According to the findings, there was good agreement between the measured and predicted values. Kusiak and Wei [26] developed several predictive models through data mining algorithms to predict methane production from the anaerobic digesters in the Des Moines Wastewater Reclamation Facility. The results showed that the model built by the ANFIS algorithm offered excellent predictive accuracy with a coefficient of determination (*R*<sup>2</sup>) of 0.99 and a percentage error of 0.08. Nair et al. [27] used ANN to evaluate the effects of the types of substrates (such as food/vegetable waste and yard trimming) and organic loading rate on CH4 production. The training and validation *R*<sup>2</sup> values were greater than 0.88, indicating that the model's learning and generalization power were satisfactory. Dach et al. [28] reported that ANN can be considered an appropriate tool to estimate CH4 from anaerobic digestion of slurry from animal waste and agricultural residues. Tan et al. [29] compared the performance of ANFIS and the ADM1 to predict biogas production from the anaerobic digestion of palm oil mill effluent under thermophilic conditions. The authors reported that ANFIS yielded higher predictive accuracy compared with the results obtained using the ADM1. In another study conducted by Beltramo et al. [30], an ANN model was constructed to predict the biogas production rate from a mesophilic anaerobic digester fed with a mixture of maize, grass silages, and pig/cattle manure. The authors concluded that the ANN modeling approach can be considered a promising alternative to ADM1.

This study aimed to develop, validate, and test two different predictive models based on the AI modeling approach, including *k*-nearest neighbors and support vector machine (referred to hereafter as *k*-NN and SVM, respectively) to predict biogas production from anaerobic digestion of spent mushroom compost (SMC). The independent variables involved include temperature, carbon-to-nitrogen ratio (C/N), and retention time (RT). SMC is a bulky residue from mushroom farms, and the waste generated by the mushroom processing industry. It is an ideal source of general nutrients (e.g., nitrogen and phosphorus) and is rich in organic matter that can be used for producing biogas. It is worth mentioning that the nutritional value and the content of organic matter of SMC depend on the types of cultivated mushroom species.

The predictive performance of these models was separately investigated and eventually compared with each other and with the ANN, ANFIS, and logistic models developed by Najafi and Faizollahzadeh Ardabili [31] by means of two statistical indices: *R*<sup>2</sup> and root mean squared error (*RMSE*). To the best of the authors' knowledge, the application of *k*-NN and SVM modeling approaches to predict biogas production from SMC has not previously been explored.

#### **2. Materials and Methods**

A schematic depicting the workflow of this study is shown in Figure 2.

**Figure 2.** A schematic illustration of the workflow of this study (<sup>a</sup> the one that provides the least validation error); see text for further details.

#### *2.1. Dataset*

The experimental data were taken from the study of Najafi and Faizollahzadeh Ardabili [31]. Briefly, four 2.5 L batch mode anaerobic digesters, each with an effective volume of 1.5 L, were fed with a mixture of SMC and wheat straw (WS) to induce different C/N ratios of 12.2, 20, 30, and 40. The characteristics of SMC and WS are provided in Table 1.



SMC: spent mushroom compost; WS: wheat straw; TS: total solids; VS: volatile solids.

The authors considered the initial TS content of the substrate in the anaerobic digesters as a constant value (8%), and referring to the values of nitrogen and organic carbon for the SMC and WS (Table 1), the contents of SMC and WS in terms of g TS and g VS as a function of C/N ratio were computed as shown in Table 2.

**Table 2.** The content of substrate (SMC and WS) fed to the anaerobic digesters as a function of the C/N ratio examined.


SMC: spent mushroom compost; WS: wheat straw; TS: total solids; VS: volatile solids; C/N: carbon-to-nitrogen.

Each anaerobic digester was inoculated with 10 g of bovine rumen solution with a concentration of 1000 g bovine rumen per liter; the bovine rumen solution was kept at a temperature of 37 °C for five days to assist bacteria in growing more rapidly. The anaerobic digesters were then placed in hot water baths at mesophilic temperature (35 °C) and thermophilic temperature (55 °C). The biogas produced from the reactors was measured by a water displacement method for two weeks. All the tests were conducted with three replications. Table 3 shows the experimental data used in this study.

#### Data Pre-Processing

The dataset shown in Table 3 was used to develop predictive models different from those presented by Najafi and Faizollahzadeh Ardabili [31], so that the two sets of models could be compared. As seen in Table 3, the dataset consists of a total of 112 input–output data pairs (referred to hereafter as observations); the *j*-th observation contains a collection of 4 data points, $\{x\_1^j, x\_2^j, x\_3^j, y^j\}$ for *j* = 1 to 112, where $x\_1$, $x\_2$, and $x\_3$ stand for temperature, C/N ratio, and RT, respectively, while $y$ stands for the cumulative biogas production.

Prior to utilizing the dataset to develop a predictive model, it was randomized using Excel (version 2016, Microsoft Corp., Redmond, WA, USA), and then split into two disjoint subsets for training and testing. Ninety observations, corresponding to approximately 80% of the dataset, were assigned to the training subset, while the remaining 22 observations (approximately 20% of the dataset) were used as the testing subset. The training subset was used to adjust the model parameters in order to minimize the error between the experimental data and the model predictions. Meanwhile, the testing subset was employed to evaluate the accuracy of the trained (developed) model for predicting the output. The training and testing subsets were stored in the workspace of MATLAB® (trial version, R2020a) (MathWorks Inc., Natick, MA, USA) in the form of arrays.
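The randomize-then-split step can be sketched in a few lines of Python. This is only an illustration: the study performed the randomization in Excel, the function name `shuffle_and_split` and the fixed seed are assumptions, and the output values below are placeholder zeros rather than the measurements of Table 3. The grid of inputs (2 temperatures × 4 C/N ratios × 14 retention days) reproduces the 112 observations described above.

```python
import random

def shuffle_and_split(observations, n_train, seed=0):
    """Shuffle the observations and split them into training and testing subsets."""
    rng = random.Random(seed)       # fixed seed (assumption) so the split is reproducible
    shuffled = list(observations)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder observations (temperature, C/N, RT, output); outputs are dummy zeros.
observations = [(temp, cn, rt, 0.0)
                for temp in (35, 55)
                for cn in (12.2, 20, 30, 40)
                for rt in range(1, 15)]

train, test = shuffle_and_split(observations, n_train=90)
print(len(observations), len(train), len(test))  # 112 90 22
```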


**Table 3.** Biogas production in the experimental-anaerobic digester runs [31].

C/N: carbon-to-nitrogen ratio; RT: retention time (d); CBP: cumulative biogas production (mL g VS<sup>−</sup>1); VS: volatile solids; anaerobic digester's volume = 2.5 L; operation mode: batch; feedstock: a mixture of spent mushroom and wheat straw.

#### *2.2. Modeling Approaches*

#### 2.2.1. *k*-NN

The *k*-NN approach was initially proposed by Fix and Hodges [32] and was later expanded by Cover and Hart [33]. It is recognized as one of the top 10 influential data mining algorithms in machine learning research due to its simplicity in implementation and efficacy in terms of prediction performance [34]. The *k*-NN algorithm was initially developed for, and successfully applied to, pattern classification problems, and it was later utilized as a valuable tool for regression purposes. In other words, the *k*-NN algorithm can be used to predict either class labels or continuous variables. Over the past few decades, the *k*-NN algorithm has attracted considerable attention and has been applied in the fields of engineering, science, business, medicine, etc. When using the *k*-NN algorithm, the main challenges are associated with the determination of the number of neighbors (*k*), the distance function, and the weighting function [35]. A brief description of the determination of these hyperparameters is provided in the Supplementary Materials (Section A) [36–38].

In order to demonstrate how the *k*-NN algorithm is used in developing regression models, let us suppose that Figure 3 shows a number of observations (input–output data pairs) indicated as black square points for a particular system (*X* stands for the number of observations, and *Y* stands for its corresponding output). Let the blue square be the query observation whose output is unknown, and suppose that the *k*-NN algorithm uses five nearest neighbors. The black/red solid lines connecting the query data point with other data points represent the distances, which can be computed based on a distance function specified by the user (e.g., Equation (S1)). The output of the query data point can be estimated by applying a weighting function (e.g., Equation (S4)) considering the distances between the query data point and the five nearest neighbors. The computational procedure of the *k*-NN algorithm is depicted in Figure 4. It consists of three steps as follows: Step 1 computes the distances between each observation in the testing subset (called query observations) and every observation in the training subset. Step 2 sorts the distances measured from the smallest to the largest, while in Step 3, an appropriate value is assigned to *k*. Once a weighting function is used, the target output is determined.

In this study, a *k*-NN model was developed based on the experimental data (shown in Table 3) using a script written in the MATLAB environment. The Euclidean distance function (Equation (S1)) was used to determine the distances between each query observation (an observation from the testing subset) and all observations in the training subset. Once all the distances were computed, the *k* neighbors (*k* varied from 1 to 10) with the minimum distances from the query observation were assigned a weight (Equations (S2) and (S3)). Thereafter, the output of the query observation was computed in accordance with Equation (S4).
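The three computational steps described above can be sketched as follows. This is a minimal Python illustration, not the MATLAB script used in the study: Equations (S1)–(S4) are given only in the Supplementary Materials, so the inverse-distance weighting used here, the function names, and the small constant `eps` are assumptions chosen for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(X_train, y_train, query, k, eps=1e-12):
    """Weighted k-NN regression for one query observation."""
    # Step 1: distance from the query to every training observation
    dists = [(euclidean(x, query), y) for x, y in zip(X_train, y_train)]
    # Step 2: sort the distances from the smallest to the largest
    dists.sort(key=lambda pair: pair[0])
    # Step 3: keep the k nearest neighbors and weight their outputs
    nearest = dists[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]  # inverse-distance weights (assumed)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

# Toy one-feature example: the two nearest neighbors of x = 2 are x = 1 and x = 3
X_train = [(0.0,), (1.0,), (3.0,)]
y_train = [0.0, 10.0, 30.0]
print(knn_predict(X_train, y_train, (2.0,), k=2))
```

With equal distances to both neighbors, the prediction is simply the average of their outputs; a neighbor that coincides with the query dominates the weighted sum.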

A five-fold cross-validation (CV) approach was performed in order to obtain an optimal value for *k*. A brief description of an example of a *q*-fold CV is provided in the Supplementary Materials (Section B) [39].
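The selection of *k* by cross-validation can be sketched as below. The sketch makes several assumptions that are not taken from the study: folds are formed by taking every *q*-th observation (the data are assumed to be already randomized), an unweighted average of the neighbors replaces the weighting functions of the Supplementary Materials, and the data are synthetic.

```python
import math

def knn_mean(X_train, y_train, query, k):
    """Plain k-NN regression: average the outputs of the k nearest training points."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], query))
    return sum(y_train[i] for i in order[:k]) / k

def cv_rmse(X, y, k, q=5):
    """Validation RMSE averaged over q folds; fold f holds indices f, f+q, f+2q, ..."""
    fold_rmse = []
    for f in range(q):
        val_idx = set(range(f, len(X), q))
        X_tr = [x for i, x in enumerate(X) if i not in val_idx]
        y_tr = [v for i, v in enumerate(y) if i not in val_idx]
        sq_err = [(y[i] - knn_mean(X_tr, y_tr, X[i], k)) ** 2 for i in sorted(val_idx)]
        fold_rmse.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(fold_rmse) / q

def select_k(X, y, candidates=range(1, 11), q=5):
    """Return the candidate k with the smallest mean validation RMSE, plus all scores."""
    scores = {k: cv_rmse(X, y, k, q) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic, noise-free linear data used only to exercise the procedure
X = [(float(i),) for i in range(30)]
y = [2.0 * i for i in range(30)]
best_k, scores = select_k(X, y)
print(best_k, round(scores[best_k], 3))
```

Each candidate *k* is scored only on observations that were held out of the corresponding training folds, which is what protects the choice of *k* against overfitting.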

After determining the optimal *k* value, the trained model was used to make predictions using the testing dataset, which was unseen throughout the training process.

**Figure 3.** A basic illustration of how *k*-NN algorithm is used in developing regression models. Notes: The solid lines connecting the query data point with other data points represent the distances, which are computed using a distance function specified by the user. The distances from the five nearest neighbors (*k* assumed to be 5), shown as red solid lines, are considered herein to calculate the output of the query data point using a weighting function specified by the user (see Figure 4 for the detailed computational procedure).

**Figure 4.** A graphical representation of the computational steps of *k*-NN algorithm for solving regression problems. Notes: The inputs to the algorithm are *Xtr*, *Ytr*, *Xts*, *Yts* (a column vector with *n*\* elements where all the elements are initially set to zero), and *k*; *Yts* is the target output. The reader is referred to Table 4 for the description of the symbols used in this figure.



<sup>a</sup> A function found in MATLAB® (trial version, R2020a) (MathWorks Inc., Natick, MA, USA) that produces a matrix consisting of *n* rows, each is a copy of the *i*-th row of matrix *Xts;* <sup>b</sup> A function that returns the number of rows in matrix *Xtr;* <sup>c</sup> A function that returns the number of rows in matrix *Xts;* <sup>d</sup> A function that returns the number of columns in matrix *Xtr*; Subscripts "*tr*", and "*ts*" stand for "training" and "testing", respectively.

The *k*-NN model performance was assessed by two commonly used statistical indices: *R*<sup>2</sup> and *RMSE*. *R*<sup>2</sup> represents the goodness-of-fit between the measured (actual) values and their corresponding predicted values, which is defined by Equation (1).

$$R^2 = 1 - \frac{\sum\_{i=1}^{n} \left( Y\_i - Y\_{\text{pred},i} \right)^2}{\sum\_{i=1}^{n} \left( Y\_i - Y\_{\text{avg}} \right)^2} \tag{1}$$

*RMSE*, a measure of the average magnitude of the error, is calculated in accordance with Equation (2):

$$RMSE = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} \left( Y\_i - Y\_{\text{pred},i} \right)^2} \tag{2}$$

where $Y\_i$ is the actual value of the output, and $Y\_{\text{pred},i}$ is the corresponding model prediction for the *i*-th observation; $Y\_{\text{avg}}$ is the average value of $Y\_i$ (*i* = 1, 2, ... , *n*); and *n* is the total number of observations (in the training or testing subset), on which the *R*<sup>2</sup> and *RMSE* are estimated.

It is evident from Equations (1) and (2) that values of *R*<sup>2</sup> closer to one and *RMSE* closer to zero correspond to smaller deviations $(Y\_i - Y\_{\text{pred},i})$. In other words, the model perfectly fits the data when *R*<sup>2</sup> = 1 and *RMSE* = 0.
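As a worked illustration of the two indices, the following Python sketch implements the standard definitions of the coefficient of determination and the root mean squared error. The function names and the toy values are assumptions for illustration and are not data from the study.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_avg = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_avg) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)

measured = [1.0, 2.0, 3.0, 4.0]
print(r_squared(measured, measured), rmse(measured, measured))  # 1.0 0.0
# A constant offset of 1 in every prediction gives RMSE = 1 and lowers R^2
shifted = [2.0, 3.0, 4.0, 5.0]
print(round(r_squared(measured, shifted), 3), rmse(measured, shifted))  # 0.2 1.0
```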

#### 2.2.2. SVM

SVM, a supervised learning technique within the field of computational intelligence, was originally developed at AT&T Bell Laboratories (Holmdel, NJ, USA) by Vapnik [40]. It can be used to solve data classification tasks, which are beyond the scope of this paper, and can be extended to solve regression problems, which are the focus of this paper.

Suppose a certain problem is represented by a dataset $\{(x\_i, y\_i)\}\_{i=1}^{n}$ where $x\_i \in R^d$ is a vector of *d* input features, $y\_i \in R$ is the corresponding scalar output value, and *n* is the total number of data patterns. The goal of SVM is to find a regression function *f*(*x*) that estimates the output value whose deviation from the target (actual) value $y\_i$, for all $x\_i$, is at most epsilon (*ε*). In other words, an error larger than *ε* is not tolerated. In addition, *f*(*x*) should be as flat as possible.

For simplicity, let us first consider the case of a linear SVM regression, which can be expressed in the following form:

$$f(\mathbf{x}) = \langle w, \mathbf{x} \rangle + b \tag{3}$$

where $w \in R^d$ is the weight vector, $b \in R$ is the so-called bias term, and $\langle w, x \rangle$ denotes the dot product between the weight vector *w* and vector *x* that is defined as:

$$
\langle w, x \rangle = \sum\_{j=1}^{d} w\_j x\_j \tag{4}
$$

In order to ensure that *f*(*x*) is as flat as possible, the Euclidean norm of *w*, i.e., $\|w\|$, should be minimized. This can be represented as a convex optimization problem to minimize:

$$J(w) = \frac{1}{2} \|w\|^2 \tag{5}$$

$$\text{subject to } \begin{cases} \forall i: y\_i - \langle w, x\_i \rangle - b \le \varepsilon \\ \forall i: \langle w, x\_i \rangle + b - y\_i \le \varepsilon \end{cases}$$

However, it is necessary to point out that a function $f(\mathbf{x})$ that satisfies these constraints may not exist. Therefore, the slack variables $\xi\_i$ and $\xi\_i^\* \in R$ need to be introduced. Including the slack variables, Equation (5) can be written as follows (also called the primal objective function):

$$J(w) = \frac{1}{2} \|w\|^2 + C \sum\_{i=1}^n (\xi\_i + \xi\_i^\*) \tag{6}$$

$$\text{subject to} \begin{cases} \forall i: y\_i - \langle w, x\_i \rangle - b \le \varepsilon + \xi\_i \\ \forall i: \langle w, x\_i \rangle + b - y\_i \le \varepsilon + \xi\_i^\* \\ \xi\_i, \xi\_i^\* \ge 0 \\ C > 0 \end{cases}$$

where parameter *C* is a user-defined constant, known as box constraint, which determines the trade-off between the flatness of *f*(*x*) and the amount up to which deviations greater than *ε* are acceptable.
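To make the role of *ε*, *C*, and the slack variables concrete, the primal objective can be evaluated for a given linear model as in the Python sketch below. It assumes that each slack variable takes the smallest non-negative value that satisfies its constraint, which is the value it has at the optimum; the function name and the toy numbers are illustrative assumptions.

```python
def primal_objective(w, b, X, y, C, eps):
    """Evaluate J(w) = 0.5 * ||w||^2 + C * sum(slacks) for f(x) = <w, x> + b."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    slack_total = 0.0
    for x_i, y_i in zip(X, y):
        f_xi = sum(wj * xj for wj, xj in zip(w, x_i)) + b
        slack_above = max(0.0, y_i - f_xi - eps)  # target lies above the eps-tube
        slack_below = max(0.0, f_xi - y_i - eps)  # target lies below the eps-tube
        slack_total += slack_above + slack_below
    return flatness + C * slack_total

# Toy data: residuals are 0, +0.5 and -1.0, so the slacks are 0, 0.4 and 0.9
X = [[1.0], [2.0], [3.0]]
y = [1.0, 2.5, 2.0]
print(primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.1))
```

Deviations inside the *ε*-tube cost nothing, whereas deviations beyond it are penalized linearly in proportion to *C*; a larger *C* therefore favors fitting the data over flatness.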

To solve Equation (3), it is possible to use the Lagrangian function and optimal constraints, to obtain a linear SVM regression [41] (see Section C in Supplementary Materials for the detailed computational procedure). In the case of a non-linear relationship between the input variables and the output, the SVM model can be simply constructed by mapping the inputs into a high-dimensional feature space, *F*:

$$
\varphi \,\,:\, R^d \to F \tag{7}
$$

Thus, Equation (S12) can be formulated in the following form (so-called non-linear SVM regression):

$$f(\mathbf{x}) = \sum\_{i=1}^{n} (\alpha\_i - \alpha\_i^\*) K(\mathbf{x}\_i, \mathbf{x}) + b \tag{8}$$

where the term *K*(*xi*, *x*) is defined as the kernel function:

$$K(\mathbf{x}\_i, \mathbf{x}) = \langle \varphi(\mathbf{x}\_i), \varphi(\mathbf{x}) \rangle \tag{9}$$

where $\langle \varphi(x\_i), \varphi(x) \rangle$ is the dot product, in the high-dimensional feature space, of the mapped input vectors $\varphi(x\_i)$ and $\varphi(x)$.
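The non-linear regression function of Equation (8) can be sketched in Python as follows. The Gaussian kernel form used here (squared Euclidean distance divided by the squared kernel scale) is an assumption about how the kernel scale enters, and the support vectors, multipliers, and bias are toy values rather than those of the trained model.

```python
import math

def gaussian_kernel(x_i, x, gamma):
    """Assumed Gaussian kernel: exp(-||x_i - x||^2 / gamma^2), gamma = kernel scale."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x))
    return math.exp(-sq_dist / gamma ** 2)

def svr_predict(x, support_vectors, alpha_diff, b, gamma):
    """f(x) = sum_i (alpha_i - alpha_i^*) * K(x_i, x) + b, as in Equation (8)."""
    return sum(a * gaussian_kernel(sv, x, gamma)
               for a, sv in zip(alpha_diff, support_vectors)) + b

# Toy model with two support vectors in a three-feature space (illustrative values only)
support_vectors = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
alpha_diff = [2.0, -1.0]  # alpha_i - alpha_i^* for each support vector
print(svr_predict((0.0, 0.0, 0.0), support_vectors, alpha_diff, b=0.5, gamma=6.93))
```

Because the kernel depends only on the distance between the query and each support vector, the mapping into the feature space never has to be computed explicitly.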

In order to develop the SVM model, the Regression Learner App in the framework of MATLAB® (trial version, R2020a) (MathWorks Inc., Natick, MA, USA) was used. On the Apps tab, in the Machine Learning and Deep Learning group, the Regression Learner was selected. The training and testing datasets were loaded from the MATLAB workspace, and then a 5-fold CV was chosen as a validation scheme to protect against overfitting.

The key to establishing an SVM model is to specify an appropriate kernel function. In addition, the hyperparameters, i.e., kernel scale (*γ*), *C*, and *ε*, greatly affect the performance of the model and are typically determined by trial and error. For the system under consideration in this study, four types of kernel functions, including linear, quadratic, cubic, and fine/medium/coarse Gaussian, were tested (see Table 5 for the mathematical definition of these kernel functions).


**Table 5.** Mathematical definition of the SVM kernel functions and their kernel scales used in this study.

$\langle x, x\_i \rangle$ denotes the dot product between the vectors $x$ and $x\_i$; $\|x - x\_i\|$ denotes the Euclidean distance between the two feature vectors $x$ and $x\_i$; the values assigned to *γ* are the MATLAB default values; *N* is the number of predictor variables (*N* = 3 for the system under consideration).
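The default Gaussian kernel scales reported in this study for *N* = 3 (0.43, 1.73, and 6.93 for the fine, medium, and coarse presets) are consistent with scales of √*N*/4, √*N*, and 4√*N*. The short Python check below assumes these three expressions; the function name is illustrative.

```python
import math

def gaussian_kernel_scales(n_predictors):
    """Assumed preset kernel scales as a function of the number of predictors N."""
    root = math.sqrt(n_predictors)
    return {"fine": root / 4, "medium": root, "coarse": 4 * root}

scales = gaussian_kernel_scales(3)
print({name: round(value, 2) for name, value in scales.items()})
# {'fine': 0.43, 'medium': 1.73, 'coarse': 6.93}
```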

The value of *ε* was set to 0.001 (the smallest acceptable value in MATLAB R2020a), while the value of parameter *C* was varied in the range of 0.1 to 10,000 (total number of data points = 23) in order to pick the best model with the least validation error (the smaller the validation error, the better the model generalization ability). Each SVM model was trained with the training subset using the sequential minimal optimization (SMO) algorithm, and the model validation error was estimated by means of a 5-fold CV method (the default validation scheme in MATLAB R2020a). The SMO algorithm stopped iterating when the feasibility gap (see Equation (10)) was less than the pre-specified gap tolerance (the gap tolerance was set to 0.001).

$$\text{Feasibility gap } (\Delta) = \frac{J(w) + L(\alpha, \alpha^\*)}{J(w) + 1} \tag{10}$$

where *J*(*w*) and *L*(*α*, *α*∗) denote the primal objective (Equation (6)) and the dual objective (Equation (S10)), respectively.

Once the algorithm met the convergence criterion, in other words, once the model training process was complete, the trained model was used to make predictions on the testing dataset. The SVM model performance was assessed by means of the two aforementioned statistical indices (*R*<sup>2</sup> and *RMSE*; see Equations (1) and (2)).

#### **3. Results and Discussion**

#### *3.1. Evaluation of k-NN Model*

The optimal *k* value of the *k*-NN model was obtained with the aid of a 5-fold CV approach. The optimal *k* value was defined as the value that allows the *k*-NN model to produce the smallest *RMSE* (and the highest *R*<sup>2</sup>) on the validation folds in runs 1–5. Figure 5 displays *R*<sup>2</sup> and *RMSE* values of the validation fold, as a function of the *k* value varying from 1 to 10, for the *k*-NN models 1 and 2; *k*-NN model 1 uses Equation (S2) as the weighting function, whereas *k*-NN model 2 uses Equation (S3) as the weighting function. It can be seen from Figure 5 that the optimal *k* value for both *k*-NN models 1 and 2 was found to be 2; however, model 2 performed better with validation *R*<sup>2</sup> and *RMSE* of 0.964 and 1.957, respectively, compared with the *R*<sup>2</sup> value of 0.925 and *RMSE* value of 2.969 obtained using model 1. Figure 6 shows the prediction accuracy of *k*-NN model 2 (*k* = 2) against the whole dataset under mesophilic condition (35 °C) and thermophilic condition (55 °C) as a scatter plot of the measured and the model-predicted values. As seen in Figure 6, the data points on the plot are well-dispersed around the 45° line (called 100% correlation line or line 1:1) with *R*<sup>2</sup> and *RMSE* values equal to 0.983 and 1.487, respectively, in the case of mesophilic temperature, and 0.996 and 0.829, respectively, in the case of thermophilic temperature. This implies that only 0.4–1.7% of the total variability in the response cannot be explained by the developed *k*-NN model 2.

**Figure 5.** Validation curves for (**A**) *k*-NN model 1, and (**B**) *k*-NN model 2. *k*-NN models 1 and 2 use Equations (S2) and (S3), respectively, as a weighting function (*R*<sup>2</sup>: coefficient of determination; *RMSE*: root mean squared error).

**Figure 6.** The measured and predicted CBP using *k*-NN model 2 at 35 °C and 55 °C. *R*<sup>2</sup>: coefficient of determination; *RMSE*: root mean squared error; VS: volatile solids; CBP: cumulative biogas production; *k* = 2; weighting function: Equation (S3).

#### *3.2. Evaluation of SVM Model*

A 5-fold CV approach was applied to find an appropriate kernel function for the SVM model, and to optimize the parameters *C* and *ε* by means of the SMO algorithm. Figure 7 illustrates the variation in validation *RMSE* as a function of the type of kernel function (linear, quadratic, cubic, fine Gaussian, medium Gaussian, and coarse Gaussian), and *C* value. Parameter *C* varied from 0.1 to 10,000, whereas *ε* was set to 0.001. The MATLAB default value was assigned to *γ* (1.0 for all the linear, quadratic, and cubic functions; 0.43 for the fine Gaussian, 1.73 for the medium Gaussian, and 6.93 for the coarse Gaussian function). As seen in Figure 7, among the different kernel functions that were fitted to the training subset, the coarse Gaussian kernel function yielded the least validation error (*RMSE* of 0.932), which was obtained at a *C* value equal to 1200. The detailed specifications of the trained coarse Gaussian-based SVM model are tabulated in Table 6.

**Table 6.** Detailed specifications of the best trained SVM model <sup>a</sup>.


Symbols: *b*: bias; *w*: weight; *C*: box constraint; *ε*: deviation from the target output value; *γ*: kernel scale; *nsv*: number of support vectors; $\alpha = \alpha\_i - \alpha\_i^\*$, where $\alpha\_i$ and $\alpha\_i^\*$ are the Lagrange multipliers associated with the vector $x\_i$ whose elements are $x\_1$ (temperature), $x\_2$ (C/N), and $x\_3$ (RT). Abbreviations: SVM: support vector machine; SMO: sequential minimal optimization; C/N: carbon-to-nitrogen; RT: retention time. Notes: <sup>a</sup> The SVM model was constructed with coarse Gaussian as the kernel function (for a solved example, refer to Section D in the Supplementary Materials). <sup>b</sup> See Equation (10); the SMO algorithm converges at the iteration at which the feasibility gap becomes smaller than the gap tolerance (MATLAB R2020a). <sup>c</sup> All 90 data patterns are considered as support vectors ($\alpha \neq 0$); see Supplementary Materials (Table S1) for the $\alpha$ values corresponding to the support vectors. <sup>d</sup> The model was implemented in MATLAB R2020a on a Dell laptop with Intel® Core™ i3-2330M CPU @ 2.20 GHz, and 4.00 GB RAM.

The prediction accuracy of the coarse Gaussian-based SVM model against the whole dataset under mesophilic condition (35 °C) and thermophilic condition (55 °C) is visualized in Figure 8. This figure indicates an excellent agreement between the measured and the model-predicted values, with *R*<sup>2</sup> and *RMSE* values equal to 0.997 and 0.598, respectively, in the case of the mesophilic condition, and 0.999 and 0.418, respectively, in the case of the thermophilic condition. This indicates that only 0.1–0.3% of the total variability in the response cannot be explained by the developed coarse Gaussian-based SVM model.

**Figure 8.** The measured and predicted CBP using the coarse Gaussian-based SVM model. *ε* = 0.001, *C* = 1200, and *γ* = 6.93 under mesophilic condition (35 °C) and thermophilic condition (55 °C); *R*<sup>2</sup>: coefficient of determination; *RMSE*: root mean squared error; VS: volatile solids; CBP: cumulative biogas production.

De Clercq et al. [42] proposed *k*-NN-, SVM-, and random forest-based models to predict biogas production from "Hainan BioCNG", an industrial-scale biogas facility located in the south of China, which is capable of treating 750 tons per day of a wide range of agricultural, municipal and industrial bio-wastes, with a maximum daily production of 30,000 m<sup>3</sup> of bio-methane vehicular fuel. Results indicated that the best performance was achieved by the *k*-NN model, offering a prediction accuracy of 0.86 and 0.85 on the training dataset and testing dataset, respectively. The SVM and random forest models had accuracy in the range of 0.95–0.97 on the training dataset; however, both of these models produced a testing accuracy (about 0.50) far lower than the training accuracy, which shows that the SVM and random forest models were noticeably overfitting the training dataset. The authors claimed that one of the possible reasons for the low testing accuracy of the SVM and random forest models was that the dataset used to tune the hyperparameters was too small. Dong and Chen [43] proposed a novel modeling method, which integrated orthogonal experimental design (OED) with SVM, to establish a relationship between the biogas produced from anaerobic digestion of corn stalk (CS) and the pretreatment process parameters, including mass of CS, ultrasonic duration time, alkali pretreatment time, and single-/dual-frequency ultrasound. The anaerobic digester, composed of a 1.0 L bottle with an effective volume of 0.8 L, operated at pH 7–8, a constant temperature of 35 °C, and at an initial TS and C/N ratio of 15 g/L and 20:1, respectively. The results of the validation experiment demonstrated that OED-SVM was an efficient method for optimizing the pretreatment process parameters and predicting biogas production from anaerobic digestion of CS. In the study performed by Yang et al.
[44], two different models, SVM and ANFIS, were developed to estimate biogas production from anaerobic digestion of fruits, vegetables, and food wastes as a function of temperature, pH, VS, biomass type, reactor volume, HRT, organic loading rate, and reactor/feeding type. Findings showed that the proposed SVM model demonstrated a superior capability of predicting biogas production, with *RMSE* and *R*<sup>2</sup> of 0.0111 and 0.998 against 0.0683 and 0.946 for the ANFIS model. Gao et al. [45] performed a multiple linear regression (MLR) analysis to estimate methane production from anaerobic co-digestion of yellow back fungus spent mushroom and different types of livestock manures (e.g., chicken, dairy, and pig manures) at a constant temperature of 35 °C. The feedstock ratio (spent mushroom-to-manure: 10–90 w/w) and TS content (5–15% w) were considered as the independent variables. From the results, a quadratic polynomial model was found to be a suitable regression model fitting the experimental data, with an *R*<sup>2</sup> value greater than 0.95. The authors also showed that the Modified Gompertz model could fit the cumulative methane production data with high accuracy (*R*<sup>2</sup> > 0.98). In another study carried out by Kumar et al. [46], two different computational tools, a feed-forward backpropagation neural network (FFBPNN) with logistic function and response surface methodology (RSM), were used to optimize the performance of an electrochemical-assisted anaerobic digester of 1 L capacity fed with the spent mushroom substrate (i.e., wheat straw-based mushroom left over after cultivation of *Agaricus bisporus* mushroom). Sugar mill wastewater (SMWW) and cow dung were utilized as a supplementary nutrient source and as an inoculum, respectively. The digester temperature (30, 35, and 40 °C), direct electrical current (0, 1.5, and 3 V), and SMWW loading (0, 50, and 100% conc.) were taken as the models' input variables, whereas the biogas production was the output of the models. The modeling results demonstrated that the FFBPNN model showed an excellent ability to estimate biogas production with a prediction accuracy of 99.91%, which was slightly better than that obtained by the quadratic model of RSM (99.79%). Likewise, in terms of the error generated, the FFBPNN model produced a smaller *RMSE* (97.3) compared with that produced by the RSM (117.6).

#### *3.3. Comparison of the Models*

Figure 9 illustrates the measured and predicted values for the cumulative biogas production as a function of RT (1 to 14 days) while different levels of temperature (35 °C and 55 °C) and C/N ratios (12, 20, 30, and 40) were investigated. It is evident from Figure 9 that the predicted lines (generated using the developed *k*-NN and SVM models) closely follow the trend of the experimental data points.

**Figure 9.** Comparison of measured-predicted CBP using *k*-NN model 2 at (**A**) 35 °C and (**B**) 55 °C and using the best-trained SVM model at (**C**) 35 °C and (**D**) 55 °C. CBP: cumulative biogas production.

A performance comparison of the *k*-NN and SVM models is tabulated in Table 7. The results of the ANN, ANFIS, and logistic models developed by Najafi and Faizollahzadeh Ardabili [31] are also included in Table 7; two statistical indices (*R*<sup>2</sup> and *RMSE* between the measured and predicted values) were used in order to make the comparison. It can be observed from Table 7 that the total values of *R*<sup>2</sup> for the developed *k*-NN model under mesophilic digestion (35 °C) and thermophilic digestion (55 °C) were 0.9830 and 0.9957, respectively. These findings indicate that the *k*-NN model performs well in predicting biogas production. In addition to its high predictive performance, the *k*-NN model was straightforward to implement for the problem under consideration because the dataset (composed of 112 observations) and the number of features (i.e., three features) were small. However, it should be noted that in the case of problems that involve several features and a huge dataset, *k*-NN modeling is not a feasible technique because it is computationally expensive in terms of runtime and memory requirement. This is because the *k*-NN algorithm calculates and stores the distance of each observation in the testing dataset from all the observations in the training dataset. The total values of *R*<sup>2</sup> for the SVM model under mesophilic digestion (35 °C) and thermophilic digestion (55 °C) were 0.9973 and 0.9989, respectively, which are slightly better than those obtained using the *k*-NN model (Table 7).

**Table 7.** Comparison of models developed in this study and those developed by Najafi and Faizollahzadeh Ardabili [31].


*k*-NN: *k*-nearest neighbors; SVM: support vector machine; ANN: artificial neural network; ANFIS: adaptive neuro-fuzzy inference system; *R*<sup>2</sup>: coefficient of determination; *RMSE*: root mean squared error; C/N: carbon-to-nitrogen.

In terms of the error produced, the total value of *RMSE* under mesophilic digestion for the SVM model was 0.5980, which is 58.4% smaller than that obtained using the *k*-NN model under the same conditions. In the case of thermophilic digestion, the total value of *RMSE* for the SVM model was 0.4183, which is 49.5% smaller than that obtained using the *k*-NN model under the same conditions. These results imply that the SVM model is a better choice for predicting biogas production. It is worth mentioning that the SVM modeling technique is less computationally demanding than the *k*-NN technique and can effectively handle complex problems involving many features and a massive dataset with high generalization power. However, SVM is very sensitive to its hyperparameters; hence, the hyperparameters must be tuned carefully for any given problem. Parameters that yield an excellent prediction accuracy for problem A may yield a poor prediction accuracy for problem B.
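The two comparison indices and the quoted percentage reductions can be reproduced with a few lines of Python. This is a hedged sketch: the function names are illustrative, and *R*<sup>2</sup> is computed here as 1 − SS<sub>res</sub>/SS<sub>tot</sub>, which is an assumption because the exact definition used in the study is not restated in this section. The *k*-NN *RMSE* values (1.437 and 0.829) are taken from the Conclusions; pairing them with the mesophilic and thermophilic cases is inferred from the stated percentages.

```python
import math


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def percent_reduction(reference, value):
    """Percentage by which `value` is smaller than `reference`."""
    return 100.0 * (reference - value) / reference


# Consistency check of the reported RMSE reductions (SVM relative to k-NN).
mesophilic = round(percent_reduction(1.437, 0.5980), 1)
thermophilic = round(percent_reduction(0.829, 0.4183), 1)
```

Under these assumptions, the two rounded values agree with the 58.4% and 49.5% reductions reported in the text.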

The total values of *R*<sup>2</sup> and *RMSE* at both mesophilic and thermophilic conditions for the SVM model developed in this study were in the range of 0.9973–0.9989 and 0.4183–0.5980, respectively, which are in agreement with the results of Najafi and Faizollahzadeh Ardabili [31], who developed ANN, ANFIS, and logistic models (*R*<sup>2</sup> = 0.9962–0.9996, *RMSE* = 0.1940–0.7800). Overall, it can be concluded that the SVM can be a useful alternative tool with the capability of accurately predicting biogas production under both mesophilic and thermophilic conditions.

#### **4. Conclusions**

In this study, two data-driven modeling techniques, namely *k*-nearest neighbor (*k*-NN) and support vector machine (SVM), were successfully trained, validated, and tested to estimate biogas production from anaerobic digestion of spent mushroom compost. The results show that both the developed *k*-NN and SVM models can estimate biogas production under mesophilic and thermophilic conditions with high prediction accuracy (*R*<sup>2</sup> = 98.3–99.9%). However, the SVM model generated a smaller error (*RMSE* = 0.418–0.598) than the *k*-NN model (0.829–1.437). These findings imply that the SVM model is a versatile and more effective tool for predicting biogas production during anaerobic digestion.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture12081090/s1, Section A: A brief description of the determination of *k*-NN hyperparameters; Section B: An example of a *q*-fold cross-validation (CV) method; Section C: Derivation of a linear SVM regression with the use of the Lagrangian function and optimal constraints; Section D: A solved example of how to use the SVM model developed in this study; Table S1: *α* values for the support vectors; Figure S1: Schematic illustration of the *q*-fold CV approach; Equations (S1)–(S12).

**Author Contributions:** Conceptualization, methodology, software, formal analysis, and writing (original draft preparation), R.S.; review and editing, Q.Y. and S.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Prince of Songkla University and the Ministry of Higher Education, Science, Research and Innovation, Thailand, under the Reinventing University Project, grant number REV64061 and the APC was funded by the same grant number REV64061.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors thank the support from the Department of Civil and Environmental Engineering, and Research and Development Office, Prince of Songkla University, Thailand. We also thank the support from Biogas and Biorefinery Laboratory at the Faculty of Engineering, and PSU Energy Systems Research Institute, Prince of Songkla University, Thailand.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Spatiotemporal Change of Heat Stress and Its Impacts on Rice Growth in the Middle and Lower Reaches of the Yangtze River**

**Shuai Zhang 1,2**


**Abstract:** Heat stress will restrict rice yield in the middle and lower reaches of the Yangtze River. An understanding of the meteorological conditions of heat stress of rice production is important for improving the accuracy of the phenology simulation. Based on the observations of phenology and heat stress of rice agrometeorological stations in this region, as well as meteorological observations and future scenarios, this study analyzed the spatiotemporal change of heat stress and its impacts on rice growth in this region from 1990 to 2009. The results showed that the heat stress frequency of early rice increased in this region from 2000 to 2009, and that of late rice and single-season rice decreased. Moreover, rice phenology will advance under heat stress conditions. The spatiotemporal consistency of the observations and the meteorological index of heat stress shows that the change in heat stress is attributed to climate changes and extreme meteorological events. Under future climate scenarios, it is found that the frequency of heat stress will increase, which will have a serious impact on rice production. The results suggest that positive and effective measures should be taken to adapt to climate change for rice production.

**Keywords:** middle and lower reaches of Yangtze River; rice; climate change; heat stress

#### **1. Introduction**

In recent years, the global climate has shown a significant change characterized by warming. The global surface temperature increased by 0.69–1.08 °C from 1901 to 2012 [1]. Heat stress caused by climate warming is a risk to global food production, particularly in regions where heat stress occurs frequently [2–4], such as East Asia, Southeast Asia, and South Asia.

Rice is one of the most important staple food crops in the world [5,6]. China accounts for a quarter of the world's rice planting area [7,8], and rice production in China has reached one-third of the world's total rice production [9]. Rice is sensitive to temperature changes [10,11]. Heat stress is the main climate variable that has a substantial impact on rice production [12]. Studies have shown that when the temperature rises by 1 °C, rice production is reduced by 10% [13]. An increase in the frequency of extreme heat events will lead to an obvious decrease in rice production [14]. A meta-analysis of the impact of heat stress on rice yield found that the mean yield of rice was reduced by 39.6% due to heat stress worldwide [15].

The response of rice growth to heat stress differs across developmental stages. The sensitivity of rice growth to heat stress during different growing periods was in the order of heading stage and flowering stage > young panicle development stage > filling stage [16]. Heat stress occurring in the reproductive growing period will cause a low pollen germination rate and decreased pollen viability, which result in spikelet sterility [17]. If it occurs in the young panicle development stage, it will inhibit spikelet differentiation and promote spikelet degeneration. Heat stress during the grain-filling period results in faster translocation of photosynthates, which shortens the grain-filling period, reduces grain weight, and increases the number of imperfect grains. Warming stress from heading to maturity will cause spikelet sterility [18,19]. In the vegetative growing period of rice, heat stress will limit photosynthesis and reduce leaf area and tiller number [20]. In addition, the reproductive growing period of rice is sensitive to heat stress, which leads to a decrease in leaf area, plant height, and harvest index and hinders the development of reproductive organs [21]. During the grain-filling period, the total grain-filling duration of rice is shortened by 21.3–37.1% when heat stress occurs [20]. Studies have shown that when heat stress (35 °C) occurs in the first 72 h of seed development, the development of the endosperm and embryo is damaged [22], which seriously affects the development of rice.

**Citation:** Zhang, S. Spatiotemporal Change of Heat Stress and Its Impacts on Rice Growth in the Middle and Lower Reaches of the Yangtze River. *Agriculture* **2022**, *12*, 1097. https://doi.org/10.3390/agriculture12081097

Academic Editor: Josef Eitzinger

Received: 19 June 2022 Accepted: 22 July 2022 Published: 26 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Heat stress often occurs in the Yangtze River Basin in China [3,23]. This region is an important rice planting area in China. Because this area belongs to the transition zone between subtropical and warm temperate climates, the regional climate environment is complex, and disastrous weather is prone to occur. Due to global warming, heat waves, and other extreme events, the frequency of heat stress will increase in this region. Studying the impact of heat stress on rice growth is therefore of great significance for ensuring future food security in China.

This study aims to explore the spatiotemporal change of heat indices from 1981 to 2009, as a proxy for thermal damage to rice production in China, by using observations from agrometeorological stations, and to project the changes in heat damage under future climate scenarios.

#### **2. Materials and Methods**

#### *2.1. Data*

The study area is the middle and lower reaches of the Yangtze River (Figure 1). The details of the agrometeorological stations in the study area are shown in Table 1. Data on rice heat stress from 1991 to 2009 and daily weather data were obtained from the China Meteorological Administration (CMA), which maintains these records. The future climate scenarios were constructed from the Regional Integrated Environmental Model (RIEMS) [24–26].

**Figure 1.** Agrometeorological stations in study area.


**Table 1.** Information of agrometeorological stations in study area.

#### *2.2. Methods*

The occurrence frequency of heat events is usually used to represent heat stress in crop studies [19]. The occurrence of heat events is defined using air temperature measurements according to "The National Agrometeorological Disasters Standard" published by the China Meteorological Administration [27]. The daily mean and maximum temperatures are the key indicators for counting the number of heat events. If a daily mean temperature above 30 °C or a daily maximum temperature above 35 °C is detected for more than three consecutive days, a heat event is counted. We counted the number of heat events in 1991–2000 and 2000–2009, respectively, to explore the spatiotemporal change of heat stress. Based on temperature predictions from RIEMS, we also counted the occurrence of heat stress events from 2021 to 2040 by using the indices described above. The occurrence frequencies in 1981–2000 and 2021–2040 were then compared to indicate the possible future changes of heat stress in the study area.
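The counting rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study. "More than three consecutive days" is read literally as at least four days (the `min_days` parameter can be changed if the standard is read as at least three). Days that qualify under either criterion are merged, so overlapping runs count as a single event. The function names and the sample temperatures are hypothetical.

```python
from typing import List, Sequence


def _days_in_long_runs(flags: Sequence[bool], min_days: int) -> List[bool]:
    """Mark the days that belong to a run of True values lasting >= min_days."""
    marked = [False] * len(flags)
    start = None
    # A trailing False acts as a sentinel that closes a run ending on the last day.
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_days:
                for j in range(start, i):
                    marked[j] = True
            start = None
    return marked


def count_heat_events(tmean: Sequence[float], tmax: Sequence[float],
                      mean_thr: float = 30.0, max_thr: float = 35.0,
                      min_days: int = 4) -> int:
    """Count heat events from daily mean and maximum temperatures (deg C)."""
    if len(tmean) != len(tmax):
        raise ValueError("tmean and tmax must cover the same days")
    hot_mean = _days_in_long_runs([t > mean_thr for t in tmean], min_days)
    hot_max = _days_in_long_runs([t > max_thr for t in tmax], min_days)
    # Assumption: a day is part of an event if either criterion is met.
    in_event = [a or b for a, b in zip(hot_mean, hot_max)]
    events, previous = 0, False
    for day in in_event:
        if day and not previous:  # the first day of a new event
            events += 1
        previous = day
    return events


# Hypothetical 14-day series: one 4-day run by mean temperature, one 5-day
# run by maximum temperature, and a final 3-day run that is too short.
tmean = [31, 31, 31, 31, 25, 28, 28, 28, 28, 28, 25, 31, 31, 31]
tmax = [33, 33, 33, 33, 33, 36, 36, 36, 36, 36, 30, 36, 36, 36]
events_strict = count_heat_events(tmean, tmax)               # runs of >= 4 days
events_loose = count_heat_events(tmean, tmax, min_days=3)    # runs of >= 3 days
```

Applying such a function to each station and decade gives the per-period counts that are compared in the results.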

#### **3. Results**

#### *3.1. Spatiotemporal Change of Heat Stress*

The observations of early rice heat stress from 1990 to 2000 and from 2000 to 2009 showed that heat stress in Hunan occurred at only a few stations, but with the highest frequency of occurrence. Jiangxi Province had the largest number of stations with heat stress. Comparing the two periods of 1990–2000 and 2000–2009, the frequency of heat stress of early rice increased at most stations (Figure 2).

According to the statistics of heat stress in different growing periods of early rice, heat stress of early rice mainly occurs from the tillering stage to the maturity stage. From 1990 to 2000, heat stress occurred mainly at the booting stage, milky ripe stage, and maturity stage at the early rice stations in the study area. From 2000 to 2009, the heat stress at early rice stations in this region mainly occurred in the stage from transplanting to milky ripe, which was earlier than in 1990–2000, and the number of occurrences in each stage increased compared with 1990–2000 (Figure 3).

**Figure 2.** Occurrence frequency of heat stress to early rice in study area from 1991 to 2000 (**a**) and from 2000 to 2009 (**b**) and change of occurrence frequency (**c**).

**Figure 3.** Occurrence frequency of heat stress in different phenological periods at early rice stations in the study area during 1991–2000 and 2000–2009.

For late rice and single-season rice, the stations with a higher frequency of heat stress were mainly concentrated in Hunan Province. Comparing the two periods of 1990–2000 and 2000–2009, the occurrence frequency of heat stress of late rice and single-season rice decreased at most stations (Figure 4).

**Figure 4.** Occurrence frequency of heat stress to single-season rice and late rice in the study area from 1991 to 2000 (**a**) and from 2000 to 2009 (**b**) and change of occurrence frequency (**c**).

Through the statistics of the heat stress of late rice and single-season rice in different phenology stages, it can be found that the heat stress of late rice and single-season rice in the study area mainly occurred from tillering to maturity from 1990 to 2000. From 2000 to 2009, the heat stress of late rice and single-season rice in the middle and lower reaches of the Yangtze River mainly occurred in the stage from transplanting to milky ripe, which was earlier than that in 1990–2000 (Figure 5).

**Figure 5.** Occurrence frequency of heat stress in different phenological periods at late rice and single-season rice stations in study area during 1991–2000 and 2000–2009.

#### *3.2. Simulation of Heat Stress*

Due to the limited observations of national agrometeorological stations, it is difficult to investigate the occurrence of rice heat stress in the study area. Therefore, based on the standard of rice heat stress and meteorological data issued by the China Meteorological Administration, we counted the occurrence of disasters in the corresponding period (Figure 6).

**Figure 6.** *Cont*.

From 1991 to 2000, the frequency of heat stress events was less than 20 in the northern part of the study area, but more than 20 in the southern part. From 2000 to 2009, the occurrence frequency of heat stress in six provinces in this region was more than 20. The frequency of heat stress events generally shows an increasing trend. The average daily summer temperature at most stations was 0.5 °C higher in the later period than in the earlier period. Except for some stations in eastern China, the standard deviation of mean temperature has increased, which implies an increase in extreme heat events (Figure 6).

#### *3.3. Response of Heat Stress of Rice to Climate Change*

Under future climate scenarios, we studied the change of rice heat stress in the study area. Based on temperature data of RIEMS, the occurrence frequency and change of heat stress in the study area during 1981–2000 and 2021–2040 were simulated.

From 1981 to 2000, the occurrence of heat stress in this region was generally low. It was 1–3 times in the south and 4–8 times in the north. From 2021 to 2040, the occurrence frequency of heat stress will increase significantly, and the number of occurrences at most stations could exceed nine. Comparing these two time periods, the frequency of heat stress increased to varying degrees at all sites from 2021 to 2040 (Figure 7).

**Figure 7.** Simulation result of heat stress in study area during 1981–2000 (**a**) and 2021–2040 (**b**), and change of occurrence frequency (**c**).

From the above results, it can be concluded that the occurrence frequency of heat stress will increase under future climate scenarios, which will have a serious impact on rice production. The period from June to August covers the booting, flowering, and maturing stages of rice, during which extreme heat will seriously reduce rice yield. Therefore, the study suggests that heat exposure at the booting and heading stages of rice should be avoided as far as possible by changing varieties or shifting sowing dates, to improve rice yield.

#### **4. Discussion**

In this study, the observations of phenology and heat stress from rice agrometeorological stations and future meteorological predictions were used to investigate the spatiotemporal change of heat stress and its impact on rice growth. The results suggest that the heat stress occurrence frequency of early rice increased in the study area, whereas that of late rice and single-season rice decreased. Moreover, rice phenology will advance under heat stress. These conclusions are consistent with previous studies. For example, Liu et al. suggest that the heat stress of early rice increased in the Yangtze River Basin [17]. They also indicate that heat stress occurs widely in most parts of Hubei, Anhui, Jiangsu, Zhejiang, and Hunan. For the growth of early rice, the average occurrence frequency of heat stress was 7.81 days in this region [17]. From 1980 to 2009, heat stress in single-season rice planting areas increased by 1.49 °C day, whereas the increases for early and late rice were about 0.35 °C per day and 1.57 °C per day, respectively [28]. These previous studies indicate that heat stress has substantial impacts on rice growth. Our study further finds that early rice and late rice had different responses to heat stress. Many previous studies also indicate that relative humidity is an important factor affecting rice growth [29]. In fact, changes in relative humidity are highly dependent on temperature changes. Reducing relative humidity under heat conditions is essential for maintaining spikelet fertility [29]. These results suggest that relative humidity is a non-negligible factor affecting rice growth. In addition, Rehmani et al. suggest that the distributions of heat stress are quite different in this region [19]. Our study also finds that the occurrence frequency of heat stress events is distributed differently in coastal and inland regions (Figure 7). This could be caused by differences in water vapor between these regions. Therefore, it is important to study the combined impacts of relative humidity and temperature changes on rice growth. Our results show an increasing trend in heat stress of rice under future scenarios. These results are similar to those of previous studies. For example, He et al. [14] indicated that the occurrence of heat stress of rice will show an increasing trend and will affect rice yield, based on future predictions for 2016–2100. They also found that the heat events in the study area will increase by up to 185% and 319% for the RCP 4.5 scenario and the RCP 8.5 scenario, respectively, which will further affect the development of rice [14]. The flowering and maturity of rice will advance under the future climate scenario in the study area [30]. Heat stress will increase with average annual rates of 0.13% and 0.09%, respectively, during 2021–2050 and 2071–2100. Based on all these results, this region will face more heat stress in the future.

In this study, the risk assessment of heat stress is mainly based on statistical methods, meteorological indicators, and climate data under future scenarios. Crop models and machine learning methods are now also widely used in meteorological disaster warnings and crop yield assessments. In response to global climate warming, the response function of development rate to temperature has been used to improve the ability of crop models to simulate the heading and maturity of double-season rice in this region. The heading and maturity simulation accuracy of the improved model increased by 26.2% and 22.9% on average, respectively [31]. Improved crop models are also beneficial for improving the prediction of warming risks for rice caused by heat stress. Using a machine learning method, Li et al. explored the relationship between extreme climate and crop yield, which could also be used to monitor extreme heat stress [32]. Therefore, crop models and machine learning methods could be useful tools for accurate prediction of rice yield in the future.

#### **5. Conclusions**

In this study, the spatiotemporal change of heat stress and its impacts on rice growth in the study area from 1990 to 2009 were investigated. The occurrence frequency of heat stress events generally shows an increasing trend from 1990 to 2009. For early rice, heat stress occurred mainly at the booting stage, milky ripe stage, and maturity stage at the early rice stations from 1990 to 2000, whereas it mainly occurred in the stage from transplanting to milky ripe from 2000 to 2009. For late rice and single-season rice, heat stress mainly occurred from the tillering stage to the maturity stage from 1990 to 2000, whereas it mainly occurred in the stage from transplanting to milky ripe from 2000 to 2009, which is earlier than in 1990–2000. The heat stress frequency of early rice increased in the study area from 2000 to 2009, whereas that of single-season rice and late rice decreased. From 2021 to 2040, the occurrence frequency of heat stress could increase significantly. At most stations, heat stress could occur more than nine times, compared with one to eight times from 1981 to 2000.

**Funding:** This study was supported by the National Key Research and Development Program of China (Project No. 2016YFD0300201) and the National Science Foundation of China (Project No. 41801078).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the authors.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **References**


## *Article* **A Gas Diffusion Analysis Method for Simulating Surface Nitrous Oxide Emissions in Soil Gas Concentrations Measurement**

**K. M. T. S. Bandara 1,2,\*, Kazuhito Sakai 1,3,\*, Tamotsu Nakandakari 1,3 and Kozue Yuge 1,4**


**Abstract:** The detection of low gas concentrations from the soil surface demands expensive high-precision devices to estimate nitrous oxide (N2O) flux. As N2O concentrations are higher in the soil atmosphere than at the soil surface, the present study aimed to simulate N2O surface flux (CF) from soil gas measured in a soil-interred silicone diffusion cell using a low-cost device. The methodological steps included the determination of the diffusion coefficient of the silicone membrane (*Dslcn*), the measurement of the temporal variations in the N2O gas in the soil (*Csi*) and on the surface (MF), and the development of a simulation process for predicting CF. Two experiments varying the procedure and periods of soil moisture saturation in each fertilized soil sample were conducted to detect *Csi* and MF. Using *Dslcn* and *Csi*, the variations in the soil gas (*Csoil*) were predicted by solving the diffusion equation using the implicit finite difference analysis method. Similarly, using six soil gas diffusivity models, the CF values were simulated from *Csoil*. For both experiments, statistical tests confirmed the good agreement of CF with MF for soil gas diffusivity models 4 and 5. We suggest that the tested simulation method is appropriate for predicting N2O surface emissions.

**Keywords:** nitrous oxide; soil gas flux; silicone diffusion cell; soil gas diffusivity; passive gas sampling; soil gas diffusion coefficient; soil gas flux simulation

#### **1. Introduction**

Accelerated crop production ensures food security for the global population. High-yielding crop varieties that consume significant amounts of synthetic fertilizers are being cultivated to secure food production needs [1]. Nitrogen-based fertilizers are essential for plants during their growth stages [2]. In addition to synthetic fertilizers, the use of organic manure in the forms of crop residues, animal waste, and biological N-fixing plants is common in plant nutrient supply chains [3,4]. By 2019, the global average atmospheric nitrous oxide (N2O) concentration had increased to 333.2 ppbv [5]. The use of synthetic fertilizers is one of the major causes of this increase [6]. The contribution of N2O to atmospheric warming is 298 times greater than that of carbon dioxide, and it significantly contributes to the depletion of the stratospheric ozone layer [7,8]. Therefore, control of fertilizer application levels is needed to limit N2O emissions from crop fields. Soil can be considered a large bioreactor that produces various materials as output [9,10], especially greenhouse gases (GHGs) [11]. Among the GHGs methane (CH4), carbon dioxide (CO2), and nitrous oxide (N2O), nitrogen-based fertilizers contribute strongly to N2O production via microbial denitrification and nitrification processes in the soil [12]. The results of various experiments convey the impact of environmental and agronomic factors on N2O emissions from uplands [13–16]. Globally, collective strategies are being implemented to achieve the sustainable development goals (SDGs) by 2030 [17]. The assessment of N2O emissions is vital for implementing climate-responsive actions, such as control measures to limit GHG emissions from cropping fields, controlling the use of synthetic fertilizers, and testing alternative soil nutrient enrichment methods.

**Citation:** Bandara, K.M.T.S.; Sakai, K.; Nakandakari, T.; Yuge, K. A Gas Diffusion Analysis Method for Simulating Surface Nitrous Oxide Emissions in Soil Gas Concentrations Measurement. *Agriculture* **2022**, *12*, 1098. https://doi.org/10.3390/agriculture12081098

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 19 June 2022 Accepted: 23 July 2022 Published: 26 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

According to Denmead [18], the main techniques used for measuring methane and nitrous oxide fluxes between inland ecosystems and the atmosphere are enclosure-based and micrometeorological measurements. Butterbach-Bahl et al. have mentioned that the selection of the appropriate technique is mainly based on investment capacity, the demand for the findings, and the type of research question to be analyzed [19]. As mentioned in the work of Maljanen et al. [20], the chamber technique is commonly used to estimate N2O fluxes [21–23]. According to Butterbach-Bahl et al., micrometeorological measurements require homogeneous fields that are free of trees, slopes, and buildings and are under mono-crop conditions, and the related measurement instruments should be high-precision fast-response devices. When using this technique, the cost of onsite measurement at one station has been reported as USD 60,000–80,000 for CO2, and this can increase by USD 30,000–40,000 for each additional gas (CH4, N2O, etc.) if needed. In 2014, Chikowo et al. also mentioned the inappropriateness of micrometeorological techniques in small-scale farming systems that involve intercropping, a soil fertility gradient, and complicated land use patterns [24]. The chamber method can be identified as a system that directly samples gas from the soil surface through the use of permanently installed enclosed chambers. The chamber method suits small-scale cultivated areas because small, simple chambers are established on the soil surface. The chamber method can operate manually or automatically when taking gas samples for gas monitoring [19,25]. Since gas sampling is performed within the chamber, only a small area of the field is represented when calculating the emissions, and the installation of the chamber disturbs the natural environmental conditions of the soil gas diffusion process.

According to the air circulation schedule, there are two types of chambers: dynamic chambers, in which the chamber gas is replaced rapidly while measurements continue, and static chambers, in which scheduled ventilation takes place after the gas has been collected in the chamber for a period of time [26,27]. N2O gas concentrations are lower in the chamber than in the soil atmosphere, since gas diffuses from the soil surface to the atmosphere. Therefore, chamber methods also require high-precision gas-monitoring devices and related accessories, increasing the measurement cost. Consequently, the operating costs of the gas-monitoring process limit the expansion of estimation activities for GHG emissions. It is necessary to develop techniques that use low-cost gas-monitoring devices together with an accurate gas-sampling system to better estimate N2O gas emissions.

Compared to surface emissions, a higher N2O gas concentration level is prevalent in the soil atmosphere [28]. This supports the development of a low-cost gas-monitoring instrument to measure the N2O gas releasing activities in the soil atmosphere if gas samples are collected effectively. Soil gas sampling can be carried out indirectly by inserting a perforated pipe or silicone tube into the soil, a process known as the passive gas sampling method. The soil gas surrounds the tube and equilibrates with the inner gas level via a gas diffusion process [29–31]. Because of its molecular structure, silicone has special characteristics that make it suitable for soil gas sampling, such as water repellence, releasability, cold resistance, and high gas diffusivity. In various studies, silicone tubes have been tested to determine their gas sampling ability [32–34]. The results of previous studies have demonstrated the suitability of silicone tubes [35] and perforated stainless-steel pipes [36] for soil gas sampling in continuous automated sampling methods and manual sampling methods. Nondispersive infrared (NDIR) technology has been successfully applied to develop GHG monitoring devices for CO2, CH4 [37–40], and N2O [28]. Low-cost devices have detection limits in the upper ppm range, whereas high-precision devices (QCL, CRL laser-based, photoacoustic, and FTIR devices) have detection limits at the sub-ppm level. Because higher gas levels are prevalent in the soil, low-cost devices can be used for gas measurements with soil gas diffusion cells.

For the numerical estimation of soil gas flux, the gradient method has gained the most attention. In porous soil, gas flux can be estimated from the measured gas concentrations at each level of the soil profile and the gas diffusivity of the soil, assuming that gas diffusion is the dominant gas transport mechanism in the soil [41]. Since there are high levels of uncertainty when testing the soil gas diffusivity, some research works used approximation approaches based on soil gas diffusivity models. Since flux is more sensitive when its own calculation steps are used, the use of implicit assumptions was suggested [31,42,43]. Therefore, the soil gas diffusivity model-based implicit finite difference simulation method can be used to effectively estimate the N2O flux from measured soil gas levels.
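As a rough illustration of the gradient method, the sketch below applies Fick's first law between two sampling depths and scales the free-air diffusion coefficient with a soil gas diffusivity model. The Millington-Quirk (1961) relation is used only as one widely cited example; it is not claimed to be one of the six models evaluated in this study. The function names, the porosity values, and the free-air diffusion coefficient are placeholders chosen for illustration.

```python
def millington_quirk_ratio(air_porosity: float, total_porosity: float) -> float:
    """Relative gas diffusivity Ds/D0 after Millington and Quirk (1961)."""
    return air_porosity ** (10.0 / 3.0) / total_porosity ** 2


def gradient_flux(c_upper: float, c_lower: float,
                  z_upper: float, z_lower: float,
                  d0: float, air_porosity: float, total_porosity: float) -> float:
    """Upward diffusive flux from Fick's first law between two depths.

    Depths are positive downward. The result is positive when the
    concentration increases with depth, i.e. gas moves toward the surface.
    Units follow the inputs (for example g m-3, m, and m2 s-1 give g m-2 s-1).
    """
    ds = d0 * millington_quirk_ratio(air_porosity, total_porosity)
    gradient = (c_lower - c_upper) / (z_lower - z_upper)
    return ds * gradient


# Placeholder inputs: an order-of-magnitude free-air diffusion coefficient
# and hypothetical porosities and concentrations, not values from the study.
D0_EXAMPLE = 1.4e-5  # m2 s-1
flux = gradient_flux(c_upper=0.0, c_lower=2.0e-3, z_upper=0.0, z_lower=0.05,
                     d0=D0_EXAMPLE, air_porosity=0.25, total_porosity=0.50)
```

Because the estimated flux scales directly with the modeled diffusivity, the choice of diffusivity model can change the result considerably, which is why several models are usually compared.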

In summary, N2O flux assessment in agricultural fields needs to be accelerated to control GHG emissions. The operational costs of high-precision gas-monitoring devices and the accessories required by existing gas-monitoring methodologies are major barriers to achieving the targeted gas emission estimations. In combination, low-cost N2O gas-monitoring devices, passive sampling of the gases in the soil (where gas levels are higher than on the surface), and numerical simulation approaches can be used to predict gas fluxes cost-efficiently. Therefore, this study demonstrated a simulation approach for estimating N2O flux on the soil surface from the soil gas concentration measured using a low-cost measuring device connected to a silicone soil gas sampler.

#### **2. Materials and Methods**

The conceptual framework of the experiment was based on simulating the soil N2O flux from soil N2O levels recorded by a low-cost NDIR device with a diffusion cell (silicone tube) buried in the soil, where gas concentrations are higher than in the atmosphere (Figure 1). The inner N2O gas concentration of the diffusion cell changes with the temporal variations in the soil gas concentration, at a rate governed by the diffusion coefficient. From the gas concentrations recorded in the silicone diffusion cell, the temporal changes in the soil N2O gas concentration, hereafter *Csoil*, were predicted by solving the diffusion equation (Equation (4)) using the implicit finite difference method.

**Figure 1.** The main simulation steps for predicting the N2O flux from measured soil gas levels in the silicone diffusion cell.

Secondly, the N2O flux on the soil surface (CF) was simulated by solving the diffusion equation (implicit finite difference method) using six soil gas diffusivity models. To compare the simulated values, the observed surface flux of N2O gas (MF) was calculated from soil surface emission levels monitored using a high-resolution gas-monitoring device with the chamber method. Accordingly, the overall experiment was based on three main steps: (1) the determination of the diffusion coefficient of the silicone membrane; (2) the concurrent measurement of the N2O gas concentrations in the soil gas and at the soil surface; and (3) the development of a simulation process to determine the CF values. The simulation process is described below.

#### *2.1. Determination of the Diffusion Coefficient of the Silicone Membrane*

The observed N2O concentration in the silicone diffusion cell was the result of soil gas diffusing through the wall of the silicone tube. Therefore, the N2O concentration in soil gas should be estimated by solving the diffusion equation according to the observed N2O concentration in the silicone tube. The diffusion coefficient of the silicone membrane (*Dslcn*) was determined, since it was required to solve the diffusion equation during the simulation process. In the experimental setup, two silicone tubes (each with a length of 59 cm, an internal diameter of 6 mm, and an external diameter of 8 mm) were placed in an enclosed chamber (8000 mL), and the two ends of each tube were connected in series with an air circulation pump and a gas-monitoring device: an FTIR device (system 1; Perkin Elmer Spectrum Two FT-IR spectrometer with a long path gas cell system, 7 m optical path length, volume 500 mL, Infrared Analysis, Inc., Anaheim, USA, model 7.2-V) or a new NDIR device [28] (system 2; volume 320.3 mL). This created two separate diffused gas measurement systems (Figure 2). After adding a known volume (50 mL) of N2O gas (purity level 99.5%) into the gas chamber, the accumulated gas level in each silicone tube was measured by the respective device at 30-min intervals for 12 h. From the data recorded by each device, the gas concentrations in the silicone tubes (system 1 and system 2) and the balanced gas concentration in the gas chamber were approximated at each successive time step by applying the difference equations below. The gas concentrations of system 1 and system 2 were calculated by solving Equation (3), and *Dslcn* was optimized using an R package.

$$q_1 = D_s \frac{C_c - C_1}{\Delta x} A = V_1 \frac{C_1 - C_1^o}{\Delta t} \tag{1}$$

$$q_2 = D_s \frac{C_c - C_2}{\Delta x} A = V_2 \frac{C_2 - C_2^o}{\Delta t} \tag{2}$$

$$V_c \frac{C_c - C_c^o}{\Delta t} = -(q_1 + q_2) \tag{3}$$

**Figure 2.** Experimental setup for determining the diffusion coefficient of the silicone diffusion cell.

*q*<sub>1</sub>: rate of the diffused gas volume from container to system 1 (10<sup>−6</sup> cm<sup>3</sup> s<sup>−1</sup>); *q*<sub>2</sub>: rate of the diffused gas volume from container to system 2 (10<sup>−6</sup> cm<sup>3</sup> s<sup>−1</sup>); *D<sub>s</sub>*: gas diffusion rate in silicone membrane (cm<sup>2</sup> s<sup>−1</sup>); *C<sub>c</sub>*: gas concentration in the container (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *C*<sub>1</sub>: gas concentration in system 1 (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *C*<sub>2</sub>: gas concentration in system 2 (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *C<sub>c</sub><sup>o</sup>*: gas concentration of previous time step in the container (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *C*<sub>1</sub><sup>*o*</sup>: gas concentration of previous time step in system 1 (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *C*<sub>2</sub><sup>*o*</sup>: gas concentration of previous time step in system 2 (10<sup>−6</sup> cm<sup>3</sup>/cm<sup>3</sup>); *V<sub>c</sub>*: volume of the container (cm<sup>3</sup>); *V*<sub>1</sub>: volume of system 1 (cm<sup>3</sup>); *V*<sub>2</sub>: volume of system 2 (cm<sup>3</sup>); Δ*x*: thickness of the silicone tube (cm); Δ*t*: time step (s); *A*: surface area of the silicone tube (cm<sup>2</sup>).
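Equations (1)–(3) are linear in the three unknown concentrations at the new time step, so each step can be advanced by solving a small linear system. The following Python sketch illustrates that idea only; it is not the authors' R implementation. The function names, the use of a mean tube diameter for the membrane area, and the initial chamber concentration (50 mL in 8000 mL, ignoring purity and dilution) are illustrative assumptions.

```python
import math
import numpy as np

def implicit_step(c_prev, volumes, k, dt):
    """Advance [C1, C2, Cc] by one implicit step of Equations (1)-(3).

    k = Ds * A / dx is the membrane conductance (cm^3 s^-1).
    """
    c1o, c2o, cco = c_prev
    v1, v2, vc = volumes
    # Rows: Eq. (1) for system 1, Eq. (2) for system 2, Eq. (3) for the container.
    a = np.array([
        [v1 / dt + k, 0.0, -k],
        [0.0, v2 / dt + k, -k],
        [-k, -k, vc / dt + 2.0 * k],
    ])
    b = np.array([v1 * c1o / dt, v2 * c2o / dt, vc * cco / dt])
    return np.linalg.solve(a, b)

def simulate(ds, area, dx, volumes, c0, dt, n_steps):
    """Return an (n_steps + 1, 3) array of [C1, C2, Cc] over time."""
    k = ds * area / dx
    series = [np.asarray(c0, dtype=float)]
    for _ in range(n_steps):
        series.append(implicit_step(series[-1], volumes, k, dt))
    return np.array(series)

# Illustrative inputs (assumptions, except the volumes and tube size from the text).
volumes = (500.0, 320.3, 8000.0)          # V1, V2, Vc in cm^3
area = math.pi * 0.7 * 59.0               # mean diameter 0.7 cm x length 59 cm
series = simulate(ds=1.1e-8, area=area, dx=0.1, volumes=volumes,
                  c0=(0.0, 0.0, 6250.0),  # concentrations in 10^-6 cm^3/cm^3
                  dt=1800.0, n_steps=24)  # 30-min steps over 12 h
```

Fitting *D<sub>s</sub>* would then amount to repeating `simulate` for candidate values and minimizing the mismatch with the measured series, for example with a least-squares optimizer.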

#### *2.2. The Experimental Setup for Monitoring the N2O Flux and Soil Atmospheric N2O Gas Concentrations*

To simulate the surface flux (CF) and compare it with the measured flux (MF), temporal variations in the soil and atmospheric N2O concentrations had to be monitored. Therefore, a laboratory test was conducted to monitor the gas concentrations in both regions (Figure 3). A 4 kg soil sample (*shimajiry maji*) taken from the research field of the University of the Ryukyus was sieved with a 2 mm sieve (particle density: 2.685 g/cm<sup>3</sup>, bulk density: 0.958 g/cm<sup>3</sup>) and repacked in the testing chamber (c). As an ammonium-based nitrogen source, 3 g of (NH4)2SO4 was mixed with the soil. As shown in Figure 3, the experimental setup mainly consisted of air-drying sections (g,j) connected in series with each gas-monitoring device (h,i) to monitor the gas concentrations in the soil and atmospheric regions. The soil chamber (c) had a hole underneath and was connected to the water drainage system (f) to ensure drainage after the completion of the saturation events. Two water supply systems were used in the soil saturation events: the top water supply system (a) applied water to the top surface of the soil, and the bottom water supply unit (Mariotte cell) (b) provided a controlled water supply from the bottom side of the soil layer.

**Figure 3.** A diagram of the laboratory experimental setup for monitoring variations in the headspace and soil atmospheric N2O gas concentrations: (**a**) top-side water supply unit, (**b**) bottom-side water supply unit (Mariotte cell), (**c**) soil chamber, (**d**) soil region, (**e**) diffusion cell (silicone tube), (**f**) water drainage system, (**g**,**j**) membrane air-drying system, (**h**) low-cost NDIR gas-monitoring device, (**i**) FTIR device, (**k**) data logger connected to soil moisture and temperature sensors, (**l**) headspace of the soil chamber, and (**m**) ventilation system for chamber head space.

A silicone tube diffusion cell (length: 59 cm, internal diameter: 0.6 cm, and wall thickness: 0.1 mm) (e) was buried in the soil 3.5 cm from the top so that the diffused soil N2O gas could accumulate in it through the wall, and the two ends of the tube were connected in series with a low-cost gas-monitoring device (h). Data were recorded at 30-min intervals. The diffused air was circulated within the system using an air pump (AS ONE-EAP-01). The circulated air was sent through a drying section to avoid water accumulation in the system during long periods of operation. The drying sections were built using membrane-type dryers (Suncep SWG-A01-03) (o) consisting of two concentric tubes. The outer tube for dry gas circulation was connected to a silicone moisture absorber (p), and the inner tube was for soil gas circulation.

To monitor the headspace gas levels, an FTIR spectrometer (i) was used. The headspace gas was measured every 1 min over a 30-min closed-chamber period and over a 1-h chamber ventilation period (using ventilation system (m)). The gas flux was calculated from the slope of the gas concentration over the last seven minutes of the closed-chamber period. A sensor connected to a data logger (moisture meter embedded with a thermometer) (k) was placed in the soil container, and data were recorded at 30-min intervals.
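The slope-based flux calculation can be sketched as a linear fit over the end of the closed-chamber record. The Python example below is a minimal illustration: the function name, the headspace volume and soil surface area arguments, and the reading of "last seven minutes" as all samples within seven minutes of the final one are assumptions, since the text does not give these details. The result is a volume-based flux; converting it to a mass flux would additionally need temperature and pressure.

```python
import numpy as np

def chamber_flux(minutes, conc_ppm, headspace_volume_m3, soil_area_m2,
                 window_min=7.0):
    """Flux from the concentration slope at the end of a closed-chamber period.

    Returns slope * V / A in ppm m min^-1 (volume-based, not mass-based).
    """
    t = np.asarray(minutes, dtype=float)
    c = np.asarray(conc_ppm, dtype=float)
    # Keep only samples taken within `window_min` of the last measurement.
    mask = t >= t.max() - window_min
    slope, _intercept = np.polyfit(t[mask], c[mask], 1)  # ppm per minute
    return slope * headspace_volume_m3 / soil_area_m2

# Synthetic 30-min record sampled every minute (hypothetical values).
t = np.arange(0, 31)
c = 0.4 + 0.02 * t
flux = chamber_flux(t, c, headspace_volume_m3=0.002, soil_area_m2=0.02)
```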

To test the gas diffusion at different volumetric water levels, the water supply and drainage events were carried out under different conditions in each experiment. Using the different water supply methods, two experiments were conducted in which saturation took place over different periods of time. The first experiment was based on the bottom-side water supply, in which a Mariotte cell (b) was used to control the water supply drop by drop. The saturation and drainage periods were both two days long. During the saturation period, the bottom edge of the Mariotte cell's air intake tube was maintained at the same level as the soil in the chamber. In the second experiment, the top-side water supply unit (a) was used to control the water supply (rate: 0.05 mL s<sup>−1</sup>) over a two-day saturation period, and the soil was allowed to drain under gravity over the course of three days. The drainage step was open gravity drainage in experiment 1 and flow-controlled gravity drainage (rate: 0.05 mL s<sup>−1</sup>) in experiment 2. For both experiments, the gas concentrations were measured concurrently by each device according to the abovementioned measurement schedules.

#### *2.3. Simulation Steps for Predicted N2O Surface Flux (CF)*

The two simulation steps were followed as indicated in Figure 1. The first step was to simulate *Csoil* according to the measured gas concentrations in a silicone diffusion cell (*Csi*). The second step was to estimate the values for the predicted N2O flux from the soil surface (CF). In the second step, the parameters of the soil diffusivity models were optimized on a trial-and-error basis until the CF values fit with the measured N2O flux (MF) from the soil surface.

#### 2.3.1. Simulation Steps for Predicting the Soil N2O Gas Level from the Measured Gas in Silicone Diffusion Cell

Theoretically, the soil N2O gas can be estimated from the inverse analysis of the diffusion equation on *Csi*. However, the results of the calculation did not converge satisfactorily. Therefore, we estimated the concentration of the N2O gas in the soil (*Csoil*) according to the following steps: (1) *Csoil* was assumed as the soil-side boundary condition for solving the diffusion equation (Equation (4)). For the first calculation, the assumed *Csoil* was obtained by shifting the observed *Csi* a few hours forward. (2) The assumed *Csoil* was used to simulate the gas concentration in the silicone tube (*Cssi*) by solving the diffusion equation according to the implicit finite difference method. (3) The simulated *Cssi* time series was compared with the *Csi* time series. These steps were repeated, adjusting the assumed *Csoil* on a trial-and-error basis, until the *Cssi* matched the *Csi* (Figure 4). It was assumed that the gas measurement was completed soon after the gas mixture entered the silicone diffusion cell.

$$\frac{\partial C_{si}}{\partial t} = D_{slcn} \frac{\partial^2 C_{si}}{\partial x^2} \tag{4}$$

where:

*Csi*: N2O concentration in the silicone diffusion cell (g gas m<sup>−3</sup> diffusion cell air); *Dslcn*: gas diffusion coefficient of the silicone membrane (m<sup>3</sup> gas m<sup>−1</sup> silicone s<sup>−1</sup>); Δ*x*: thickness of the silicone tube (m); Δ*t*: time step (s).

**Figure 4.** The repetitive simulation steps for obtaining the predicted N2O gas levels in soil.
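The trial-and-error loop described above can be sketched as a search over candidate soil series. The Python example below is a simplified illustration, not the authors' procedure: it replaces the finite difference solution of Equation (4) with a well-mixed (lumped) tube updated implicitly, and it only tries candidate soil series built by moving the observed series earlier in time. The function names and the dimensionless exchange ratio `rate` are hypothetical.

```python
import numpy as np

def simulate_tube(c_soil, rate, c_start):
    """Implicit response of a well-mixed tube to a soil-side series.

    rate = k * dt / V is a hypothetical lumped exchange ratio (dimensionless).
    """
    c = np.empty(len(c_soil))
    prev = c_start
    for i, cs in enumerate(c_soil):
        prev = (prev + rate * cs) / (1.0 + rate)  # backward-Euler update
        c[i] = prev
    return c

def best_shift(c_si, rate, max_shift):
    """Try moving the observed series 0..max_shift steps earlier and keep the
    shift whose simulated tube series is closest to the observation (RMSE)."""
    c_si = np.asarray(c_si, dtype=float)
    best = (0, np.inf)
    for shift in range(max_shift + 1):
        # Candidate soil series: observation moved earlier, padded with its last value.
        c_soil = np.concatenate([c_si[shift:], np.full(shift, c_si[-1])])
        c_ssi = simulate_tube(c_soil, rate, c_si[0])
        rmse = float(np.sqrt(np.mean((c_ssi - c_si) ** 2)))
        if rmse < best[1]:
            best = (shift, rmse)
    return best

# Synthetic observed series (hypothetical): a smooth pulse sampled every 30 min.
steps = np.arange(200)
observed = 50.0 * np.exp(-((steps - 80) / 15.0) ** 2)
shift, rmse = best_shift(observed, rate=0.2, max_shift=12)
```

In practice the candidate *Csoil* would also be adjusted in shape, not only in timing, which this sketch does not attempt.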

#### 2.3.2. Steps for Simulating the Predicted N2O Flux from the Soil Surface

Using the gas diffusion coefficients for soil, the predicted N2O flux from the soil surface (CF) was simulated by solving the diffusion equation (Equation (5)) according to the implicit finite difference method. The boundary conditions of the simulation were as follows: the soil side used the simulated *Csoil* values, and the atmospheric side was set to an N2O gas concentration of 0 ppm. Since the gas diffusion coefficient was the main factor influencing the simulation results, six models for calculating the gas diffusion coefficient in soil (*Dsoil*) were tested to find the best fit between the cumulative values of CF and the observed flux (MF). The cumulative values of MF were calculated from the N2O concentrations measured in the upper chamber using an FTIR device.

$$\frac{\partial C_{soil}}{\partial t} = \frac{D_{soil}}{\varepsilon} \frac{\partial^2 C_{soil}}{\partial x^2} \tag{5}$$

where:

*Csoil*: predicted soil N2O gas concentration (g gas m<sup>−3</sup> soil air);

*Dsoil*: N2O gas diffusion coefficient in soil (m<sup>3</sup> soil air m<sup>−1</sup> soil s<sup>−1</sup>);

*ε*: air-filled porosity (m<sup>3</sup> soil-air m<sup>−3</sup> soil);

Δ*x*: thickness of the soil layer above the silicone tube (m);

Δ*t*: time step (s).
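A minimal Python sketch of an implicit (backward-Euler) finite difference solution of Equation (5) is given below, with the soil-side boundary set to *Csoil* and the surface set to zero as described above. It is an illustration under stated assumptions rather than the authors' code: the grid resolution, the zero initial profile, the one-sided gradient used for the surface flux, and all names and example values are hypothetical.

```python
import numpy as np

def surface_flux_series(c_soil, d_soil, eps, depth, dt, n_cells=20):
    """Solve Eq. (5) between the tube depth and the surface, step by step.

    c_soil : soil-side boundary concentration per time step (g m^-3)
    d_soil : diffusion coefficient (m^2 s^-1), scalar or one value per step
    eps    : air-filled porosity, scalar or one value per step
    Returns the flux leaving the surface at each step (g m^-2 s^-1).
    """
    c_soil = np.asarray(c_soil, dtype=float)
    d_soil = np.broadcast_to(np.asarray(d_soil, dtype=float), c_soil.shape)
    eps = np.broadcast_to(np.asarray(eps, dtype=float), c_soil.shape)
    h = depth / n_cells
    n_in = n_cells - 1                    # interior nodes, index 0 nearest the tube
    c = np.zeros(n_in)                    # assumed initial profile: zero everywhere
    flux = np.empty(len(c_soil))
    for n in range(len(c_soil)):
        lam = d_soil[n] / eps[n] * dt / h ** 2
        a = (np.diag(np.full(n_in, 1.0 + 2.0 * lam))
             + np.diag(np.full(n_in - 1, -lam), 1)
             + np.diag(np.full(n_in - 1, -lam), -1))
        rhs = c.copy()
        rhs[0] += lam * c_soil[n]         # soil-side boundary value
        # The surface boundary value is 0, so it adds nothing to rhs[-1].
        c = np.linalg.solve(a, rhs)
        flux[n] = d_soil[n] * c[-1] / h   # one-sided gradient at the surface
    return flux

# Constant soil-side concentration, 3.5 cm of soil above the tube, 30-min steps.
flux = surface_flux_series(np.full(50, 10.0), d_soil=1e-6, eps=0.3,
                           depth=0.035, dt=1800.0)
```

With a constant boundary value the profile tends to a straight line, so the flux should approach `d_soil * c_soil / depth`, which gives a quick sanity check of the scheme.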

Six soil gas diffusivity models based on diffusive gas transport (soil-type-independent and soil-water-characteristic-based models) were used to determine the gas diffusion coefficients of the soil. The models define the relative diffusion coefficient (diffusion coefficient in the soil air (*Dsoil*)/diffusion coefficient in free air (*D*<sub>0</sub>)) as a function of air-filled porosity (*ε*) (m<sup>3</sup> soil-air m<sup>−3</sup> soil). The first model, suggested by Buckingham (1904), is a power function of *ε* (Equation (6)) [44]. Depending on the soil type (sand, loam, or clay), the exponent *n* varies from 1.7 to 2.3. In the current experiment, we set *n* to 2. The models suggested by Millington and Quirk (1960, 1961), given in Equations (7) and (8), take the ratio of a power function of air-filled porosity (*ε*) to a power function of *Sat*, the soil total porosity (m<sup>3</sup> pore space m<sup>−3</sup> soil), and were used as models 2 and 3 [45,46]. Because the presence of water increases tortuosity and thereby affects gas diffusion in soil, some soil gas diffusivity models have been developed for wet soils. Model 4 (WLR–Marshall model) (Equation (9)), which was developed by Moldrup et al., 2000a, according to the Marshall (1959) model and assumes water-induced linear reduction (WLR), was used [47]. A model based on the gas diffusivities at the soil water potential (−100 cm H2O) and the corresponding air-filled porosities (macroporosity) developed by Moldrup et al., 2000b, was applied as the fifth model (Equation (10)) in the simulation [48]. The sixth model (Equation (11)), which was based on the gas diffusivity in unsaturated soil as suggested by Moldrup et al., 1996, was applied [49]. The soil gas diffusivity required for the gas flux simulation was determined with each model using the recorded temporal soil moisture data (volumetric water content, VWC).

$$\text{Model 1: } \frac{D_{soil}}{D_0} = f(\varepsilon), \quad f(\varepsilon) = \varepsilon^n \tag{6}$$

$$\text{Model 2: } f(\varepsilon) = \varepsilon^2 / Sat^{2/3} \tag{7}$$

$$\text{Model 3: } f(\varepsilon) = \varepsilon^{10/3} / Sat^2 \tag{8}$$

$$\text{Model 4: } f(\varepsilon) = \varepsilon^{2.5} / Sat \tag{9}$$

$$\text{Model 5: } f(\varepsilon) = 2 \times \varepsilon^3 + 0.04 \times \varepsilon \tag{10}$$

$$\text{Model 6: } f(\varepsilon) = 0.1 \left\{ 2 \left( \frac{\varepsilon}{Sat} \right)^{\frac{\gamma}{3}} + 0.04 \left( \frac{\varepsilon}{Sat} \right) \right\} \tag{11}$$
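As a hedged illustration, models 1 to 5 can be written as one small Python function of the air-filled porosity. The helper names are hypothetical, and two inputs are assumptions not stated in the text: total porosity is taken as 1 minus bulk density over particle density, and air-filled porosity as total porosity minus volumetric water content. Model 6 is left out because the symbol γ in Equation (11) is not defined in the text.

```python
def total_porosity(bulk_density, particle_density):
    """Assumed relation: Sat = 1 - bulk density / particle density."""
    return 1.0 - bulk_density / particle_density

def air_filled_porosity(sat, vwc):
    """Assumed relation: eps = Sat - volumetric water content, floored at 0."""
    return max(sat - vwc, 0.0)

def relative_diffusivity(eps, sat, model, n=2.0):
    """Dsoil / D0 for models 1-5 (Equations (6)-(10))."""
    if model == 1:
        return eps ** n
    if model == 2:
        return eps ** 2 / sat ** (2.0 / 3.0)
    if model == 3:
        return eps ** (10.0 / 3.0) / sat ** 2
    if model == 4:
        return eps ** 2.5 / sat
    if model == 5:
        return 2.0 * eps ** 3 + 0.04 * eps
    raise ValueError("only models 1-5 are implemented in this sketch")

# Densities from the soil description; the water content is a hypothetical value.
sat = total_porosity(0.958, 2.685)
eps = air_filled_porosity(sat, vwc=0.35)
ratios = {m: relative_diffusivity(eps, sat, m) for m in range(1, 6)}
```

Multiplying each ratio by the free-air diffusion coefficient of N2O at the measured temperature would give *Dsoil* for the flux simulation.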

From the CF values for each model and the MF values, the cumulative fluxes of CF and MF (hereinafter *CFcu* and *MFcu*) were calculated to ease the comparison of the simulated and measured values at each time step. The simulation output comprised six cumulative flux (*CFcu*) lines, one for each model. To test the statistical accuracy of the *CFcu* line of each soil gas diffusivity model against the *MFcu*, the symmetric mean absolute percentage error (*SMAPE*) (Equation (12)) was used. The *SMAPE* is the average, over all time points (*n*), of the ratio between the absolute difference in *CFcu* and *MFcu* and half the sum of the absolute values of *CFcu* and *MFcu*, expressed as a percentage.

$$SMAPE = \frac{100\%}{n} \sum_{t=1}^{n} \frac{|CFcu_t - MFcu_t|}{\frac{(|MFcu_t| + |CFcu_t|)}{2}} \tag{12}$$

where:

*SMAPE*: symmetric mean absolute percentage error (%);

*CFcu<sub>t</sub>*: cumulative calculated flux at time *t*;

*MFcu<sub>t</sub>*: cumulative measured flux at time *t*;

*n*: number of measurement points in the time series.
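Equation (12) translates directly into a few lines of Python. The sketch below also shows one way of accumulating a flux series into cumulative values; the function names and the constant time step are assumptions for illustration.

```python
import numpy as np

def cumulative_flux(flux, dt):
    """Running total of flux * dt (constant time step assumed)."""
    return np.cumsum(np.asarray(flux, dtype=float) * dt)

def smape(cf_cu, mf_cu):
    """Symmetric mean absolute percentage error of Equation (12), in %.

    Time points where both series are exactly zero would divide by zero
    and need to be excluded beforehand.
    """
    cf = np.asarray(cf_cu, dtype=float)
    mf = np.asarray(mf_cu, dtype=float)
    half_sum = (np.abs(mf) + np.abs(cf)) / 2.0
    return 100.0 * float(np.mean(np.abs(cf - mf) / half_sum))

# Hypothetical flux series sampled at a constant step of 2 time units.
cf_cu = cumulative_flux([1.0, 2.0, 3.0], dt=2.0)
mf_cu = cumulative_flux([1.0, 2.5, 2.5], dt=2.0)
error_percent = smape(cf_cu, mf_cu)
```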

To test the model fitness of the *CFcu* to the *MFcu*, "*Willmott's agreement index (d)*" was applied. "*d*" is a dimensionless measure of model accuracy developed by Willmott (1981) that standardizes the degree of model prediction error to a range between 0 and 1. A value of 1 indicates a perfect match between *CFcu* and *MFcu*, and 0 indicates no agreement between them. "*d*" is described by Equation (13). To compare the results of both accuracy tests, heatmaps were developed for the *SMAPE* and *d* values of both experiments 1 and 2.

$$d = 1 - \frac{\sum_{t=1}^{n} \left( CFcu_t - MFcu_t \right)^2}{\sum_{t=1}^{n} \left( \left| CFcu_t - \overline{MFcu} \right| + \left| MFcu_t - \overline{MFcu} \right| \right)^2} \tag{13}$$

where:

*d*: Willmott's agreement index (goodness of fit); *CFcu<sub>t</sub>*: cumulative calculated flux at time *t*; *MFcu<sub>t</sub>*: cumulative measured flux at time *t*; *n*: number of measurement time points *t*.
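Equation (13) can likewise be sketched in Python. The function name is hypothetical, and the overlined term is taken to be the mean of the measured cumulative flux, as in Willmott's original definition.

```python
import numpy as np

def willmott_d(cf_cu, mf_cu):
    """Willmott's agreement index of Equation (13), between 0 and 1."""
    cf = np.asarray(cf_cu, dtype=float)
    mf = np.asarray(mf_cu, dtype=float)
    mf_mean = mf.mean()
    squared_error = np.sum((cf - mf) ** 2)
    potential_error = np.sum((np.abs(cf - mf_mean) + np.abs(mf - mf_mean)) ** 2)
    return 1.0 - float(squared_error / potential_error)

# Small hypothetical series: one simulated point is off by 1 unit.
d_value = willmott_d([1.0, 2.0, 4.0], [1.0, 2.0, 3.0])
```

A constant measured series makes the denominator zero when the simulated series equals it, so such cases need separate handling.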

#### **3. Results and Discussion**

#### *3.1. Diffusion Coefficient of the Silicone Membrane*

The results of the gas diffusion test and simulation process are shown in Figure 5. As the concentration gradient between the gas chamber and the silicone tubes varied, the curves of the observed N2O gas accumulation in the silicone tubes changed from steep to moderate. Because of the different internal volumes of the two gas-monitoring devices (system 1: 500 mL and system 2: 320.3 mL), the two curves differ in steepness, with steeper N2O gas accumulation in system 1 than in system 2. In the simulation output, the temporal variations in the calculated gas concentrations in both systems match the measured values. For the N2O gas, the diffusion coefficient of the silicone membrane (*Dslcn*) was 1.1 × 10<sup>−8</sup> cm<sup>2</sup>/s, obtained by solving the difference equations (Equation (3)).

**Figure 5.** Temporal variations in measured and calculated N2O gas concentrations in the silicone tubes of systems 1 and 2.

#### *3.2. Results of the Variations in N2O Gas Concentration in the Silicone Diffusion Cell and N2O Gas Flux of the Headspace in Experiments 1 and 2*

Considering experiments 1 and 2, the variations in the gas flux from soil surface and the gas concentrations in the silicone diffusion cell show similar patterns during the irrigation and drainage events (Figures 6 and 7).

During the saturation events, higher gas production was observed in both experiments than during the drainage period. As discussed in previous studies [50–53], during saturation events, microbes produce more N2O gas via nitrification and denitrification processes, since the soil moisture regulates the oxygen availability in the soil pores. Zheng et al. [54] also demonstrated a similar N2O gas variation pattern in in situ measurements testing the impacts of soil moisture on gas emissions on cropland. However, in this study, considerably elevated gas levels were observed for a short period during drainage, when the water filling the soil pores was displaced and the soil gas temporarily changed its diffusion direction to downward. This condition, which was determined to be a function of soil gas diffusion under low water-filled pore spaces, disappeared soon after drainage was complete and was clearly shown in both experiments. In experiment 1, the drainage and saturation periods were both two days long, and water was allowed to drain under gravity (a faster draining process than in experiment 2). A much sharper elevation in gas level was observed at the beginning of the drainage period than in experiment 2, where drainage was carried out at a lower rate. The volumetric water content graphs in Figures 6 and 7 also show the two drainage flow rates in experiments 1 and 2.

In both experiments, at peak levels, the N2O gas concentrations measured in the silicone diffusion cell were approximately 10 times higher than the headspace concentrations. This supports the possibility of using low-cost, less precise gas-monitoring devices to monitor soil gas levels instead of the costly surface chamber methods [25], which rely on high-precision gas measurements to determine trace gas levels. The average soil temperatures were 22.6 ± 0.41 °C and 26.52 ± 1.01 °C in experiments 1 and 2, respectively.

**Figure 6.** Temporal variations in the N2O gas concentrations in the silicone diffusion cell and N2O gas flux from the headspace, soil temperature, and moisture levels in experiment 1.

**Figure 7.** Temporal variations in the N2O gas concentrations in the silicone diffusion cell and N2O gas flux from the headspace, soil temperature, and moisture levels in experiment 2.

#### *3.3. Results of the Simulation Steps of the Soil N2O Gas Level Prediction*

For both experiments 1 and 2, the *Csoil* series were shifted to an earlier position (time lag) in the time series compared to the *Csi* series (Figures 8 and 9). Because diffusion through the wall of the silicone diffusion cell takes time, the simulated soil N2O values should theoretically appear earlier in the time series than the corresponding gas levels of *Csi*, and the observed time lag confirms this. The time lag was the result of the repeated simulation steps that were conducted to make the *Cssi* and *Csi* series overlap. Since *Cssi* is simulated from *Csoil*, the accuracy of the simulation process is confirmed by the clear overlap of the *Csi* and *Cssi* series in both experiments. In experiment 1, the high *Cssi* peaks that occurred soon after drainage began were smaller than those observed for *Csi*. We considered that these peaks were caused not only by gas diffusion but also by rapid changes in the soil gas pressure under the forced drainage. Therefore, these peaks could not be reproduced by solving the diffusion equation alone.

**Figure 8.** Temporal variations in measured and simulated N2O gas levels in the silicone diffusion cell with predicted soil gas levels for experiment 1.

**Figure 9.** Temporal variations in measured and simulated N2O gas levels in the silicone diffusion cell with predicted soil gas levels for experiment 2.

According to Figures 10 and 11 for experiments 1 and 2, the highest level of agreement between *CFcu* and *MFcu* is shown by soil gas diffusivity models 4 and 5. The results (Table 1) of the accuracy tests (*SMAPE*: symmetric mean absolute percentage error (%); *d*: Willmott's agreement index) for model fitting with *MFcu* confirm this graphical result. Among the tested soil-water-characteristic-based soil gas diffusivity models, models 4 and 5 demonstrate lower *SMAPE* (%) values (experiment 1: 8.18%, 10.18%; experiment 2: 10.73%, 8.02%) and higher *d* values (experiment 1: 0.9996, 0.9994; experiment 2: 0.9992, 0.9997). The soil-type-independent models (models 1, 2, and 3) showed higher *SMAPE* and lower *d* values in both experiments. Therefore, for the selected soil category and the common simulation approach for predicting soil surface N2O flux from soil gas levels, soil gas diffusivity models 4 and 5 are the most appropriate regardless of whether or not the water supply method is controlled.

**Figure 10.** Experiment 1: temporal progression of the simulated cumulative N2O emissions in each of the soil gas diffusivity models with the observed emissions.

**Figure 11.** Experiment 2: temporal progression of the simulated cumulative N2O emissions for each of the soil gas diffusivity models with the observed emissions.

Considering the results of the two experiments, a simulation approach for predicting N2O soil surface flux according to measured soil N2O gas data was successfully carried out at the laboratory level. However, additional experiments need to be conducted on cultivated land before this method is adopted at the field level.

The tested method, including its hardware arrangement and simulation steps, makes it easier to estimate soil N2O emissions once the gas diffusion coefficients of the silicone membrane and the soil have been determined. Compared to closed-chamber methods, the tested method requires fewer accessories for gas sampling and circulation and uses low-cost measurement devices. As a passive gas sampling unit, the special characteristics (water repellency, structural stability of the wall, and higher gas permeability) of the silicone diffusion cell allow it to be buried in the soil. This keeps the upper surface free of gas-sampling devices and enables natural soil gas diffusion to the air. Moreover, the gas sampler does not require operating power or intensive maintenance, which allows it to be used for long-term monitoring.


**Table 1.** Results of the accuracy tests conducted for *CFcu* with all models with *MFcu*.

The gas sampling technique of the proposed simulation method requires fewer gas sampling structures above the ground, which allows natural gas diffusion from the soil surface. Among current greenhouse gas sampling and analysis techniques, the chamber enclosure method has been widely applied compared to the precise eddy covariance technique [55,56] because of its relatively low cost and the simplicity of its measurement mechanism [57,58]. However, the artifacts of established gas sampling structures (chamber artifacts) introduce errors and higher uncertainty in the final results [59] via the altered diffusion gradient [60] and pressure artifacts [61,62]. Covering the soil with a transparent chamber for long periods results in higher internal temperatures, and the opaque chamber method was proposed to block the impact of light and was tested for CO2 [63] and CH4 [64] gases. For the N2O flux, the DN-based opaque static chamber measurement method has been tested to further improve the accuracy of the opaque chamber [65]. Therefore, the tested method, which uses a silicone diffusion cell buried in the soil, can minimize the artifacts and biases caused by the above-ground gas sampling structures of the chamber methods.

In the simulation steps, there are only two stages of analysis using the implicit finite difference method, which are based on the volumetric water content of the soil and the diffusion coefficients of the silicone membrane and the soil. Moreover, special models and solution methods are not required; a simple solution of the diffusion equation is enough to estimate the flux. Therefore, once these parameters for the soil and gas sampler are determined, the analysis of replicated data from various sampling points can be continued with little effort. As a low-cost method, it is more likely to be adopted at the field level, where it can increase the number of measurement points per field and accelerate the estimation of N2O gas emissions in agricultural fields.

#### **4. Conclusions**

We tested a simulation method for estimating the soil surface flux of N2O gas from measured soil atmospheric gas levels. The methodology consisted of three steps: determining the diffusion coefficient of the silicone membrane, monitoring N2O gas variations in the soil and the gas flux from the soil surface, and carrying out the simulation steps for predicting the surface emissions from the measured soil gas levels. The diffusion coefficient of the silicone membrane for the N2O gas was determined to be 1.1 × 10<sup>−8</sup> cm<sup>2</sup>/s by solving a diffusion equation. In the first stage of simulation, *Dslcn* was used to predict the variation in the soil N2O gas level according to the measured N2O gas concentrations in a silicone diffusion cell by solving the gas diffusion equation via the implicit finite difference method. In the second stage, using soil gas diffusion coefficients from six soil gas diffusivity models, the N2O gas flux from the soil surface was predicted from the predicted soil gas levels. At the laboratory level, we successfully simulated the cumulative values of the predicted soil surface N2O flux and confirmed good agreement with the measured cumulative flux graphically and statistically in both experiments. Of the six tested soil gas diffusivity models, models 4 and 5 demonstrated lower *SMAPE* (%) and higher *d* values for *CFcu* and *MFcu* in both experiments. The results of the two experiments using varying soil saturation methods and durations in the fertilized soil samples confirmed that the simulation methodology was acceptable for predicting the surface flux from measured soil gas levels. The overlapping *CFcu* and *MFcu* curves and the results of the statistical tests (*SMAPE* (%) and *d*) demonstrate how expensive conventional N2O flux estimation methods can be replaced by low-cost gas-monitoring devices for soil gas measurements combined with gas flux simulation steps. Further field-level studies are needed before the simulation method is adopted for use in cultivated cropland.

**Author Contributions:** Conceptualization, methodology, and formal analysis, K.M.T.S.B., K.S., T.N., and K.Y.; investigation and writing—original draft preparation, K.M.T.S.B. and K.S.; writing—review and editing, K.M.T.S.B. and K.S.; supervision, K.S., T.N., and K.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the JSPS KAKENHI Grant-in-Aid for Scientific Research(B), number 21H02307.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data are contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Cropland Expansion Mitigates the Supply and Demand Deficit for Carbon Sequestration Service under Different Scenarios in the Future—The Case of Xinjiang**

**Mingjie Shi 1,2,3, Hongqi Wu 1,2,\*, Pingan Jiang 1,2,\*, Wenjiao Shi 3,4, Mo Zhang 3,4, Lina Zhang 1,2, Haoyu Zhang 5, Xin Fan 6,7, Zhuo Liu 1,2, Kai Zheng 1,2, Tong Dong <sup>8</sup> and Muhammad Fahad Baqa 4,9**


**Abstract:** China's double carbon initiative faces huge challenges, and understanding the carbon sequestration service of terrestrial ecosystems under future interannual regional land use change is important to respond to China's carbon policy effectively. Previous studies have recognized the important impact of land use/land cover (LULC) planning on carbon sequestration in terrestrial ecosystem services (ESs). However, exploring trends in carbon sequestration under sustainable development scenarios that combine economic and ecological development, particularly the mechanisms that balance the supply and demand of carbon sequestration, still requires in-depth exploration in different geographical contexts. In this study, we present the LULC simulation framework from 2000 to 2030 for four different development scenarios in the Xinjiang region, located in an important Belt and Road region, including business as usual (BAU), rapid economic development (RED), ecological land protection (ELP), and sustainable development with both economic and ecological development (SD). Our results suggest that both the supply and demand of carbon stock in Xinjiang will increase in 2025 and 2030, with the demand exceeding the supply. However, our scenario planning mitigates the supply and demand deficit situation for carbon sequestration in the context of future cropland expansion in different scenarios. In summary, our study's findings will enrich the study of carbon sequestration under future scenarios in the Belt and Road region. Xinjiang should pay more attention to the dynamic changes in landscape type structure and its carbon storage supply and demand caused by cultivated land expansion. Among the four scenarios, the spatial difference between carbon storage supply and demand based on the SD scenario is the smallest, which is more in line with the high-quality development of regional ecological security in Xinjiang.

**Keywords:** carbon sequestration; different scenarios; land use; sustainable development; Xinjiang

#### **1. Introduction**

Along with the Chinese government's goal of achieving peak carbon by 2030 and carbon neutrality by 2060, the timely assessment of the terrestrial ecosystem carbon sequestration service has become one of the most important issues in response to the current carbon neutrality policy [1]. As the paramount indicator of ecosystem carbon stock services, terrestrial ecosystem carbon sequestration is critical to the carbon cycle [2–4]. Land use/land cover (LULC) change is one of the major factors influencing carbon sequestration in terrestrial ecosystems, as land use changes affect the material cycling and energy flow of soils and vegetation carbon sequestration by altering the structure and function of the original ecosystem [5]. Most studies only consider the supply of ecosystem services, ignoring the human demand for ecosystem services [6,7]. Therefore, exploring the coupling of supply and demand in terrestrial ecosystem carbon sequestration is crucial to deepening future human, economic, and social knowledge of carbon sources and sinks.

**Citation:** Shi, M.; Wu, H.; Jiang, P.; Shi, W.; Zhang, M.; Zhang, L.; Zhang, H.; Fan, X.; Liu, Z.; Zheng, K.; et al. Cropland Expansion Mitigates the Supply and Demand Deficit for Carbon Sequestration Service under Different Scenarios in the Future—The Case of Xinjiang. *Agriculture* **2022**, *12*, 1182. https://doi.org/10.3390/agriculture12081182

Academic Editor: María Martínez-Mena

Received: 16 June 2022 Accepted: 4 August 2022 Published: 9 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The methods currently used to assess carbon sequestration at national and regional scales fall into three broad categories [4,8,9]. The first is the field survey method, which is primarily an area-weighted average method based on soil profiles [3]. However, results from this type of study may vary across scales due to differences in soil profile size, location, methods, and sampling periods [10]. The second approach is empirical biogeochemical modeling [11,12]. This approach creates much uncertainty in assessing carbon sequestration, mainly due to differences in the mechanisms or structures of different models [10]. The third is remote sensing, in which net primary production (NPP) is calculated and often used to estimate carbon stocks; however, such methods produce very large errors in some arid and semi-arid regions [13,14], and at spatial scales finer than the NPP product, typically <1 km resolution, they do not provide a true per-pixel NPP output [15]. Currently, the combination of land use and terrestrial ecosystem carbon stock models is widely used to estimate carbon sequestration and its future spatial variability, and the application of such methods is an important trend in the development of dynamic carbon stock assessments [12,16]. Among the many models quantifying the carbon sequestration of ecosystem services, machine learning is considered a feasible and reliable method for assessing carbon sequestration, and it has been widely used in carbon stock assessments at national and regional scales to balance overexploitation and environmental protection [8,17]. However, the abovementioned research methods still have some limitations. First, they fail to analyze carbon sequestration under different future scenarios and only assess current carbon sequestration in a single way. Second, they have not explored the coupling between the supply and demand of carbon sequestration in terrestrial ecosystems. Third, they fail to address the deficit in the supply and demand of carbon sinks resulting from the expansion of cropland under different future scenarios. Therefore, it is important to explore the coupling between the supply and demand of carbon sequestration under different future LULC policy scenarios, and to analyze the surplus/deficit of carbon sequestration in the context of cropland expansion, in order to balance supply and demand in a sustainable landscape pattern.

While there is growing recognition of the impacts of rapid LULC change due to urbanization on ecosystem services (ESs), the LULC landscape continues to be transformed in an unsustainable manner [18]. Land management is one of the most important factors influencing land cover, either directly or indirectly, with policy and environmental planning decisions having a significant impact on how land is managed [19]. Moreover, at the landscape level, the current main challenge is to identify alternative best management scenarios for different LULC change scenarios [9]. Numerous studies have shown that the environmental impact could be improved by changing LULC dynamics [20–23]. For example, a study conducted in Hawaii, USA, examined various LULC scenarios, with an increase in the carbon sequestration service of 3458 tons of carbon in each specific scenario [20]. Research in the Willamette Basin of Oregon has shown that different scenarios of LULC can influence the spatial pattern of the carbon sequestration service and that optimized scenarios can increase carbon sequestration in terrestrial ecosystems [21]. Furthermore, a study in Beijing–Tianjin–Hebei, China, planned four different scenarios to explore the maximum area of ESs loss, thus ensuring that the critical ESs are not affected [22]. However, while previous studies have explored carbon sequestration from the perspective of maximizing economic or ecological benefits [21], there is still a paucity of studies that have examined the targeting of sustainable development goals (SDGs) for assessing carbon sequestration under sustainability scenarios that combine economic and ecological development.

The UN Sustainable Development Goals (SDGs) focus on regional development and ecological security. In the context of the SDGs, it is important to understand regional sustainable development planning and to assess local ecological security [24]. To fill the above research gaps, this paper takes the Xinjiang Uyghur Autonomous Region (hereinafter referred to as Xinjiang) as the study area, because this core area of the Silk Road along the Belt and Road can better reveal the spatial distribution characteristics and evolutionary patterns of mountain ecosystems in a temperate arid zone [25,26]. The study uses the gray multi-objective optimization–patch generation land use simulation (GMOP-PLUS) model to simulate the variation in land use landscape patterns under various scenarios and propose a sustainable development scenario that balances economic and ecological development. The study further applies a random forest model to quantify the carbon sequestration of terrestrial ecosystems in Xinjiang under different scenarios from 2000 to 2030 and to explore the coupling between the supply and demand of carbon sequestration. The main objectives of this study are three-fold: (i) to predict spatial–temporal patterns of land use in Xinjiang from 2020 to 2030 by the PLUS model under the business as usual, rapid economic development, ecological land protection, and sustainable development scenario; (ii) to quantify the spatial and temporal variation characteristics of terrestrial ecosystem carbon sequestration under different scenarios in Xinjiang during 2020–2030 using random forest models; and (iii) to elucidate the relationship between the supply and demand of carbon sequestration in Xinjiang, and explore the difference between the supply and demand of LULC on carbon sequestration under different scenarios.

#### **2. Materials and Methods**

#### *2.1. Study Area*

Xinjiang is located inland in northwestern China, between 73°40′–96°18′ E and 34°25′–48°10′ N. It spans 2000 km from east to west and 1650 km from north to south and covers about 1.66 × 10<sup>6</sup> km<sup>2</sup>, about one-sixth of China's land area (Figure 1). The average annual temperature in Xinjiang is 10.5 °C, and there are ca. 2600 h of sunshine per year. The average annual rainfall is 145.5 mm, and the average annual evaporation is 1000–4500 mm.

**Figure 1.** Digital elevation model of the study area.

As the core region of the overland Silk Road Economic Belt, Xinjiang is an important link for political, economic, and cultural exchanges between China and other Belt and Road countries. The Xinjiang government has historically attached importance to the multiple roles played by ecological and environmental protection, enacting and implementing several master land use plans in conjunction with an ecologically sustainable development agenda. Quantifying green spatial patterns and exploring trends in green spatial change in Xinjiang are essential for assessing and mapping the mismatch between supply and demand for ESs and providing guidance for future landscape and urban planning [25].

#### *2.2. Data and Processing*

The datasets used in the study mainly include the following. (1) Five periods of land use data with a spatial resolution of 30 m for 2000, 2005, 2010, 2015, and 2020 from the CAS Data Centre for Resource and Environmental Sciences (http://www.resdc.cn (accessed on 15 June 2022)). All these data were combined with field surveys, visual interpretation, and confusion matrix judgment, allowing the total accuracy of the interpretation to reach 94.3% and the total accuracy of the 25 sub-categories to reach 91.2% [27]. According to the level-1 categories of the national land use classification system, there are six types of land: cropland, forest land, grassland, construction land, bare land, and water. (2) The annual average temperature and precipitation data used to discern suitability conditions for different land types were obtained from the CAS Data Centre (http://www.resdc.cn (accessed on 15 June 2022)). For the latest year of meteorological data, we obtained raster data at a 250 m resolution by spatially interpolating the annual average data for 2020 from meteorological stations. (3) Digital elevation model (DEM) data, used to drive the LULC simulations for natural environmental factors, were obtained from the Geospatial Data Cloud (http://www.gscloud.cn (accessed on 15 June 2022)) at a spatial resolution of 30 m. Soil type raster data came from the dataset of the Food and Agriculture Organization of the United Nations (FAO) (https://www.fao.org (accessed on 15 June 2022)). (4) Socioeconomic data, mainly 1 km gridded data on the spatial distribution of population and gross domestic product (GDP), came from the CAS Data Centre (http://www.resdc.cn (accessed on 15 June 2022)). Vector datasets for assessing the distance to major roads and the distance to secondary roads came from Open Street Map (http://www.openstreetmap.org (accessed on 15 June 2022)), and the vector data for river systems came from the National Geographic Information Resource Service (http://www.webmap.cn (accessed on 15 June 2022)). Urban night lighting data were obtained from the China Research Data Service Platform (https://www.cnrds.com (accessed on 15 June 2022)). (5) The carbon density data of China's terrestrial ecosystems were taken from papers published between 2004 and 2014 and coupled with relevant experimental data from the same time period to generate a complete, systematic database of China's vegetation and soil organic carbon density [28]. In addition, all raster data were resampled to a spatial resolution of 250 m.

#### *2.3. The GMOP-PLUS Model*

#### 2.3.1. PLUS

To better understand, assess, and predict future land use changes, researchers have developed numerous land use simulation models. Such models, however, are usually linear and numerically based and cannot simulate all land use change processes [29]. By contrast, the PLUS model can make use of the rule mining framework of the land expansion analysis strategy (LEAS) to yield a higher simulation accuracy than other models and better portray the landscape patterns of different future scenarios [25,29].

Under the influence of human social activities and regional socio-economic development, both natural environmental and policy factors can drive land use change. Natural environmental factors include temperature and precipitation, among others. The process by which they drive such changes is complex and relatively stable, and the ensuing change is often small in magnitude over a short period. Policy factors that affect land use changes include GDP and population. In this paper, 12 driving factors affecting land change are used to reflect the changes of the regional ecological environment and provide guidance and reference for the future planning of local land use [9,25,29].

To simulate the patch evolution of different scenarios of land use types, a multi-type random patch seeding mechanism based on threshold descent was used in the PLUS model:

$$OP_{i,k}^{1,t} = \begin{cases} P_{i,k}^{1} \times (r \times \mu_k) \times D_k^t & \text{if } \Omega_{i,k}^t = 0 \text{ and } r < P_{i,k}^{1} \\ P_{i,k}^{1} \times \Omega_{i,k}^t \times D_k^t & \text{all others} \end{cases} \tag{1}$$

where *r* is a random value in the range 0–1 and *μ<sub>k</sub>* is the threshold value for generating new patches of land use type *k*; the first case of Equation (1) is the one in which land use type *k* can generate new patches. For the random forest, the number of decision trees is 50, the sampling rate is 0.01, and the number of features used for training is 12 (i.e., the same as the number of drivers) [29].
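The two cases of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the PLUS implementation: the function name is hypothetical, and the reading of the symbols that are not spelled out above (*P* as the growth probability of type *k* at a cell, Ω as its neighborhood effect, and *D* as an adaptive coefficient for type *k* at iteration *t*) follows the usual description of the PLUS model and should be treated as an assumption.

```python
def overall_probability(p, omega, d, mu, r):
    """Overall probability OP for one cell and one land use type k, as in Equation (1).

    p     -- P: growth probability of type k at the cell (assumed meaning)
    omega -- Omega: neighborhood effect of type k at the cell (assumed meaning)
    d     -- D: adaptive coefficient for type k at iteration t (assumed meaning)
    mu    -- mu_k: threshold for generating new patches of type k
    r     -- random value in the range 0-1
    """
    if omega == 0 and r < p:
        # Seeding case: a new patch may start even without neighbors of type k.
        return p * (r * mu) * d
    # All other cases: probability driven by the neighborhood effect.
    return p * omega * d
```

With a zero neighborhood effect, the second case returns zero, so only the seeding case can start a patch of type *k* in a cell that has no neighbors of that type.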

#### 2.3.2. Gray Multi-Objective Optimization (GMOP)

GMOP is a dynamic multi-objective planning method that searches for ways to optimize the use of land given a variety of constraints imposed by different scenarios. It also takes into account the uncertainty of those constraints [30]. Accordingly, it is better able to make accurate models of how land use is spread out in space. The goal of this study was to find a sustainable way to use land with GMOP by using the objective optimization functions, constraints, and parameters that have been suggested by other studies [29,31].

#### 2.3.3. GMOP-PLUS

Having been developed from the GM model and gray theory combined with multiple objectives, GMOP can consider the uncertainty of future LULC occurrence and solve the optimization problem of LULC by handling multiple constraints [32]. Previous studies have shown that the GMOP coupled PLUS model can play a comprehensive and decisive role in directing policy concerning the spatial allocation of land use [29,31]. Hence, the sustainable development scenario projected in this paper goes a step further than those used in previous studies [31,32]. In addition, we used Lingo 12.0 software to predict the spatial quantitative changes to the SD scenario in 2025 and 2030.

In our study, the land use structure of the SD scenario is assumed to maximize both objectives simultaneously (Table 2) [30]: (1) max *E<sub>d</sub>*(*x*), to maximize economic benefits, and (2) max *E<sub>p</sub>*(*x*), to maximize ecological benefits. GMOP's optimization objectives are as follows:

$$E_{d(x)} = \sum_{i=1}^{n} d_i \cdot x_i, \tag{2}$$

$$E_{p(x)} = \sum_{i=1}^{n} p_i \cdot x_i. \tag{3}$$

where *E<sub>d</sub>*(*x*) and *E<sub>p</sub>*(*x*) denote economic and ecological benefits, respectively; *x<sub>i</sub>* denotes the variable for the *i*-th land category (*i* = 1, 2, ..., 6); and *d<sub>i</sub>* and *p<sub>i</sub>* are the coefficients of economic and ecological benefits of the land category per unit area, respectively.

**Table 1.** Constraints on the objective function for the 2025 SD scenario (and likewise for 2030).



**Table 2.** Constraints on the objective function for the 2025 SD scenario (and likewise for 2030).

Achieving an optimal land use structure requires maximizing both objectives:

$$\max\{E_{d(x)}, E_{p(x)}\} \tag{4}$$

#### *2.4. Scenario Setting and Land Use Requirements*

#### 2.4.1. Scenario Setting

The research can be broadly divided into the following three steps. First, data on the LULC and the various drivers were prepared, and transformation rules for the LULC were developed. Second, spatial optimization of future LULC was carried out based on the PLUS model and Markov chain, and four different development scenarios were planned and simulated. Third, we explored the supply and demand balance relationships of the carbon sequestration service in Xinjiang terrestrial ecosystems under the different scenarios (Figure 2).

**Figure 2.** Science–policy framework linking institutional and ecological information.
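The Markov chain step mentioned in the second stage can be illustrated with a short Python sketch. It is a minimal example under stated assumptions: the two land categories, their areas, and the transition probabilities are hypothetical and not taken from the study, and `project_areas` is an illustrative name rather than a PLUS function.

```python
def project_areas(areas, transition):
    """One Markov step: new_area[j] = sum over i of areas[i] * transition[i][j].

    areas      -- current area of each land category
    transition -- transition[i][j] is the share of category i converted to
                  category j during one period; each row must sum to 1
    """
    n = len(areas)
    for row in transition:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the transition matrix must sum to 1")
    return [sum(areas[i] * transition[i][j] for i in range(n)) for j in range(n)]


# Hypothetical two-category example (e.g., cropland and grassland).
areas_now = [100.0, 100.0]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
areas_next = project_areas(areas_now, transition)
```

Because every row of the transition matrix sums to 1, the total area is conserved from one period to the next; applying the function repeatedly projects further periods.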

Four alternative potential land use change scenarios are presented in this study, namely the business as usual (BAU), rapid economic development (RED), ecological land protection (ELP), and sustainable development (SD) scenarios. The principles and objectives of the design scenarios are as follows:


#### 2.4.2. SD Scenario Setting

This study uses the GMOP-PLUS model to simplify the SD scenario, which is designed not only to protect, restore, and promote the sustainability of terrestrial ecosystems but also to account for rapid economic development. We first set up the land use economic value indicators to parameterize the individual land categories in the land use data. Here, *x*<sub>1</sub> = cropland; *x*<sub>2</sub> = forest; *x*<sub>3</sub> = grassland; *x*<sub>4</sub> = water; *x*<sub>5</sub> = construction; and *x*<sub>6</sub> = bare land. The average land economic value (RMB/hm<sup>2</sup>) of each land category was obtained from the Xinjiang Government Work Report and the Xinjiang 2020 Statistical Yearbook [30,37], giving the following economic value indicator formula:

$$E_{red(x)} = 2.8x_1 + 0.22x_2 + 0.16x_3 + 0.08x_4 + 85.52x_5 + 0x_6 \tag{5}$$

For the ecological value index of land use, the ecological value per unit area of each land category (RMB/hm<sup>2</sup>) was obtained from the Xinjiang Government Work Report and previous research results [31,37,38], giving the following ecological value index formula:

$$E_{elp(x)} = 1.31x_1 + 7.83x_2 + 2.57x_3 + 35.80x_4 + 0.0082x_5 + 0.016x_6 \tag{6}$$

To achieve the optimal sustainability scenario, the land use structure needs to maximize both of these indicators so that *Esdgs*(*x*) reaches the optimal ratio:

$$E_{sdgs(x)} = \max\{E_{d(x)}, E_{p(x)}\} = \alpha E_{red(x)} + \beta E_{elp(x)} \tag{7}$$

The optimal adjustment of the land use structure should be designed according to the actual development of the region, considering a variety of structural optimization and adjustment options, so that Xinjiang's development over the next five years enhances economic and ecological benefits simultaneously.
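As a worked illustration of Equations (5)–(7), the following Python sketch evaluates the economic and ecological value indicators for one candidate land use structure and combines them with weights. The coefficients are copied from Equations (5) and (6); the areas and the weights α and β are hypothetical placeholders, since their values are not reported here, and the function names are illustrative only.

```python
# Coefficients from Equations (5) and (6), ordered x1..x6:
# cropland, forest, grassland, water, construction, bare land.
ECONOMIC = [2.8, 0.22, 0.16, 0.08, 85.52, 0.0]
ECOLOGICAL = [1.31, 7.83, 2.57, 35.80, 0.0082, 0.016]


def benefit(coefficients, areas):
    """Linear benefit sum_i c_i * x_i, the form used in Equations (2) and (3)."""
    if len(coefficients) != len(areas):
        raise ValueError("one area per land category is required")
    return sum(c * x for c, x in zip(coefficients, areas))


def sd_objective(areas, alpha, beta):
    """Weighted objective alpha * E_red + beta * E_elp, as in Equation (7)."""
    return alpha * benefit(ECONOMIC, areas) + beta * benefit(ECOLOGICAL, areas)


# Hypothetical areas (hm^2) and equal weights, for illustration only.
areas = [100.0, 50.0, 200.0, 10.0, 5.0, 400.0]
e_red = benefit(ECONOMIC, areas)
e_elp = benefit(ECOLOGICAL, areas)
e_sd = sd_objective(areas, alpha=0.5, beta=0.5)
```

In an optimization setting, `sd_objective` would be maximized over the areas subject to the constraints in Table 2; the sketch only evaluates the objective for a single candidate.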

#### *2.5. Carbon Sequestration Service Supply and Demand*

#### 2.5.1. Carbon Sequestration Service Supply

In this study, we chose the random forest (RF) model to estimate the spatial and temporal dynamics of carbon sequestration in Xinjiang [17,39]. RF is an ensemble of decision tree predictors that uses bootstrap resampling to build a decision tree for each sample [39,40]. For the construction of each tree, samples were selected independently; however, the distribution of all decision trees in the forest is the same, which guarantees the robustness of the model. The advantage of random forest is that it can prevent overfitting, and it is favored for its relatively high overall accuracy and Kappa coefficient, the interpretability of its results, and the accuracy of its spatial predictions of soil carbon sequestration. The RF model was run with the randomForest package in R 4.1.2. In this study, we divided the carbon sequestration of terrestrial ecosystems into three carbon pools: the aboveground biomass carbon pool, the belowground biomass carbon pool, and the soil carbon pool (0–20 cm). RF was used to model these three pools separately, and the results were then added together.

The carbon density data for this study were obtained from an open access database of the Chinese Academy of Sciences [28,41]. This is a publicly available carbon density dataset that includes 3026 soil samples taken from the soil surface layer across China through 2014. These samples were obtained from 1036 published papers and field survey data. The number of points in Xinjiang is 231, which covers its six major land use types. Thus, this database could provide new insights for future carbon sequestration strategies in Xinjiang. Because the data for dead organic carbon is relatively complex and difficult to observe and obtain, only carbon stored aboveground, belowground, and in the soil was considered in this study [28]. The model was calculated as follows:

$$C_{CS} = C_{i-\text{above}} + C_{i-\text{below}} + C_{i-\text{soil}} \tag{8}$$

$$C_{stocks} = C_{0-20} \times AREA_i \tag{9}$$

where *C<sub>CS</sub>* is the carbon density; *C<sub>i−above</sub>* is the carbon density of the aboveground plant biomass, kg/m<sup>2</sup>; *C<sub>i−below</sub>* is the carbon density of belowground biomass of plant roots, kg/m<sup>2</sup>; and *C<sub>i−soil</sub>* refers to the density of soil organic carbon in the soil layer, kg/m<sup>2</sup>. *C<sub>stocks</sub>* is the total carbon sequestration and *AREA<sub>i</sub>* denotes the area of different land use types or soil types.
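Equations (8) and (9) amount to summing three pool densities and multiplying by area. The Python sketch below makes the unit handling explicit. It assumes that the density entering Equation (9) is the summed density of Equation (8), that areas are given in km<sup>2</sup>, and that stocks are reported in Tg; the function names and example values are hypothetical.

```python
KG_PER_TG = 1e9    # 1 Tg = 10^9 kg
M2_PER_KM2 = 1e6   # 1 km^2 = 10^6 m^2


def carbon_density(above, below, soil):
    """Total carbon density (kg/m^2) as the sum of the three pools, Equation (8)."""
    return above + below + soil


def carbon_stock_tg(density_kg_m2, area_km2):
    """Carbon stock in Tg: density times area (Equation (9)) with unit conversion."""
    return density_kg_m2 * area_km2 * M2_PER_KM2 / KG_PER_TG


# Hypothetical land use type: pool densities in kg/m^2, area in km^2.
density = carbon_density(above=0.5, below=1.0, soil=3.5)
stock = carbon_stock_tg(density, area_km2=1000.0)
```

The regional total would then be the sum of `carbon_stock_tg` over all land use types.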

#### 2.5.2. Carbon Sequestration Service Demand

The demand for carbon sequestration service was estimated as the difference between the actual carbon emissions and the allowable carbon dioxide emissions set by local governments, as per Equation (10), consistent with previous research [27]. For spatial mapping, the amount of carbon emissions from industry was split evenly between construction, grassland, woodland, and cropland. The demand for carbon sinks from personal energy was split evenly across construction land:

$$D_{CS} = E_{\text{industry}} + E_{\text{transportation}} + E_{\text{living}} \tag{10}$$

where *D<sub>CS</sub>* is the carbon sequestration demand, and *E<sub>industry</sub>*, *E<sub>transportation</sub>*, and *E<sub>living</sub>* are the carbon emissions from industry, transportation, and personal energy use, respectively. *E<sub>industry</sub>* is the amount of carbon dioxide released by industrial production, whose value comes from the Xinjiang Statistical Yearbook. *E<sub>transportation</sub>* is the carbon emitted by transportation: each car uses about 1564.9 kg of gasoline per year, and one vehicle in the Xinjiang Uyghur Autonomous Region generates 4.67 tons of carbon per year [42]; the number of vehicles can be found in the Xinjiang Statistical Yearbook. *E<sub>living</sub>* is the carbon emitted through each person's energy use; in the Xinjiang Uyghur Autonomous Region, one person is responsible for emitting about 5.84 tons of carbon per year [43]. Based on industrial output, vehicle, and population data from 2000 to 2020 (at 5-year intervals), linear regression was used to estimate industrial output, vehicles, and population in 2025 and 2030.
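The demand calculation combines a linear extrapolation with Equation (10). The following Python sketch shows one way to do this under stated assumptions: the per-vehicle and per-person factors are those quoted above, while the historical series, the function names, and the treatment of industrial emissions as a direct input are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(years, values, target_year):
    """Project a series to target_year with the fitted straight line."""
    slope, intercept = fit_line(years, values)
    return slope * target_year + intercept


TONS_PER_VEHICLE = 4.67  # t C per vehicle per year, as quoted in the text [42]
TONS_PER_PERSON = 5.84   # t C per person per year, as quoted in the text [43]


def carbon_demand(e_industry, vehicles, population):
    """Equation (10): industry + transportation + living emissions (t C)."""
    return e_industry + vehicles * TONS_PER_VEHICLE + population * TONS_PER_PERSON


# Hypothetical vehicle counts at 5-year intervals, projected to 2025.
years = [2000, 2005, 2010, 2015, 2020]
vehicles = [10.0, 12.0, 14.0, 16.0, 18.0]
vehicles_2025 = extrapolate(years, vehicles, 2025)
```

The same `extrapolate` call would be repeated for industrial output and population before the three terms are combined in `carbon_demand`.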

#### *2.6. LULC Accuracy Verification*

We compared the actual LULC data for 2015 and 2020 in the study area with the LULC data for the same years simulated based on the PLUS model, and then calculated the Kappa coefficient and overall accuracy (OA). The closer these two values are to 1, the higher the accuracy of the simulation; values greater than 0.8 indicate that the statistical accuracy of the model is satisfactory [25,29]. In this study, the Kappa coefficients of the simulated LULC for 2015 and 2020 were 0.931 and 0.905, respectively, and the overall accuracy was 0.964 and 0.949, respectively, indicating a high degree of confidence in the simulation results.
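For reference, both accuracy measures can be computed from a confusion matrix of observed versus simulated land use classes. The Python sketch below uses the standard definitions of overall accuracy and Cohen's Kappa; the two-class matrix is a hypothetical example and not data from the study.

```python
def overall_accuracy(matrix):
    """Share of cells on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's Kappa: agreement corrected for agreement expected by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix: rows = observed class, columns = simulated class.
example = [[45, 5],
           [10, 40]]
oa = overall_accuracy(example)
k = kappa(example)
```

Kappa is never higher than the overall accuracy when chance agreement is positive, which is consistent with the reported pairs (0.931 versus 0.964, and 0.905 versus 0.949).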

#### **3. Results and Analysis**

#### *3.1. LULC Simulation under Multi-Scenarios*

We applied the PLUS model to simulate the spatial distribution of land use in the Xinjiang region under different scenarios in 2025 and 2030 and calculated the dynamic rate of land change under the four scenarios for the two periods (Tables 3 and 4). Land use in the Xinjiang region is dominated by bare land, which accounts for about 60.55% of the total study area. The LULC of the region shows different trends across the future scenarios. The BAU scenario continues the trend of urbanization in Xinjiang (Figure 3), with a dynamic land use index for construction land of 0.0045 and 0.0043 in 2025 and 2030, respectively; this indicates that land use change under this scenario is characterized by a slow, natural expansion of construction land. In the RED scenario, Xinjiang's construction land expands further, with land use dynamics of 0.0089 and 0.0145 in 2025 and 2030, respectively, corresponding to about 1.85% and 1.67% of other land types being converted to construction land (Figure 3); this indicates a more pronounced expansion of construction land in the 2025–2030 period. Under this scenario, cropland also expands further, with land use dynamics of 0.0087 and 0.0095 in 2025 and 2030, respectively. Under the ELP scenario, the area of forest and grassland increases somewhat as a result of reforestation and ecological engineering policies, with about 7.28% and 0.57% of other land types converted to grassland in 2025 and 2030, respectively (Figure 3). In the SD scenario, we consider both rapid economic development and the implementation of ecological projects to optimize the economic and ecological benefits. In this case, forest land increases by 554 km<sup>2</sup> and 2089 km<sup>2</sup> in 2025 and 2030, respectively, compared with 2020. Construction land shows a similar expanding trend, with an increase of 413.3 km<sup>2</sup> and 609.8 km<sup>2</sup> in the 2025 and 2030 SD scenarios, respectively, compared with 2020.


**Table 3.** LULC and its dynamic index K (%) in Xinjiang for each of the 2020–2025 scenarios.

**Table 4.** LULC and its dynamic index K (%) in Xinjiang for each of the 2020–2030 scenarios.
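The formula behind the dynamic index K is not stated here. A commonly used definition in the land use literature is the single land use dynamic degree, K = (U<sub>b</sub> − U<sub>a</sub>)/U<sub>a</sub> × 1/T × 100%, where U<sub>a</sub> and U<sub>b</sub> are the areas of a land use type at the start and end of the period and T is its length in years. The Python sketch below assumes that definition; whether the study used exactly this form is not confirmed by the text, and the function name and example areas are hypothetical.

```python
def dynamic_index(area_start, area_end, years):
    """Single land use dynamic degree K, in % per year (assumed definition).

    area_start -- area of the land use type at the start of the period (U_a)
    area_end   -- area of the land use type at the end of the period (U_b)
    years      -- length of the period in years (T)
    """
    return (area_end - area_start) / area_start / years * 100.0


# Hypothetical example: a type growing from 1000 to 1050 km^2 over five years.
k_growth = dynamic_index(1000.0, 1050.0, 5)
k_decline = dynamic_index(1000.0, 900.0, 5)
```

A positive K indicates net expansion of the land use type over the period and a negative K indicates net loss.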


**Figure 3.** Transfer matrix of land use types under different land use scenarios in the Xinjiang region during different periods from 2020 to 2030. Where (**a**) is the land use transfer matrix for the 2020 to 2025 BAU scenario; (**b**) is the land use transfer matrix for the 2020 to 2025 RED scenario; (**c**) is the land use transfer matrix for the 2020 to 2025 ELP scenario; (**d**) is the land use transfer matrix for the 2020 to 2025 SD scenario; (**e**) is the land use transfer matrix for the 2025 BAU to 2030 BAU scenario; (**f**–**h**) and so on.

To explore the spatial and temporal characteristics of different land use types in Xinjiang under four future scenarios, we calculated the area of land use types during 2020–2030. Figure 4 shows the changes in the spatial patterns of cropland, forest land, grassland, and construction land in Xinjiang between 2020 and 2030 under the BAU, RED, ELP, and SD scenarios. Cropland increased significantly under the BAU, RED, and SD scenarios in 2025, by 3339.2 km<sup>2</sup>, 3959.3 km<sup>2</sup>, and 700.4 km<sup>2</sup>, respectively. These increases were mainly concentrated near the urban expansion zone along the northern slopes of the Tianshan Mountains, the Yili River Valley, the Aksu region, and the urban–rural farming belt in the Hotan region. In contrast, forest area decreased by 132.5 km<sup>2</sup> and 204.1 km<sup>2</sup> in 2025 under the BAU and RED scenarios, respectively, mainly in the Altai Mountains, the Yili River Valley, and the valley buffer zone near the Kunlun Mountains. By 2030, grassland under the RED scenario degraded extensively, with a decline of about 11,252 km<sup>2</sup>, mainly in the Altai Mountains in the north, the Tianshan Mountains in the center, and near the Kunlun Mountains in the south of the study area, probably due to rapid urbanization at the expense of some forest and grassland. In addition, construction land shows an increasing trend in all four scenarios in 2030; the only difference is the magnitude of the increase, with the largest increase occurring under the RED scenario, where the area increased by about 681.3 km<sup>2</sup>, mainly in the urban agglomeration on the northern slopes of the Tianshan Mountains, the Aksu region, and the Kashgar region.

**Figure 4.** Changes in the spatiotemporal patterns of ecosystem types in each scenario from 2020 to 2030.

#### *3.2. Spatial and Temporal Changes in the Supply of Carbon Sequestration under Different Scenarios*

We used a random forest technique that incorporates environmental factors to assess LULC-induced changes in the landscape pattern of the carbon sequestration service of terrestrial ecosystems in Xinjiang under different scenarios from 2020 to 2030 (Figure 5). The results show a clear spatial and temporal divergence in carbon sequestration under different scenarios. Under the BAU scenario in 2025, carbon sequestration shows a small increase of about 540 Tg compared to 2020 (over the five-year interval). Under the RED scenario in 2025, carbon sequestration shows a decreasing trend compared to the BAU scenario, with an overall decrease of about 30 Tg, likely due to the continued expansion of construction land driven by greater land use, resulting in the release of carbon from terrestrial ecosystems. In the 2025 SD scenario, carbon sequestration increased by another 370 Tg compared to the BAU scenario. This is because the SD scenario combines ecological and economic development, so the increase in forest and grassland areas leads to an increase in total carbon sequestration. In 2030, both the ELP and RED scenarios show an increase in carbon sequestration compared to the BAU scenario, by 20 Tg and 60 Tg, respectively. In the 2030 SD scenario, carbon sequestration increases significantly due to the pronounced expansion of forest and grassland and the high carbon sequestration capacity of forest land, making this scenario a carbon sink.

**Figure 5.** Spatial distribution characteristics of carbon sequestration service in each scenario during 2020–2030.

#### *3.3. Analysis of the Supply and Demand for Carbon Sequestration under Different Future Scenarios*

Xinjiang's carbon supply under different scenarios during 2020–2030 can hardly meet the current demand for carbon emissions (Figure 6), and the impact of land use on carbon supplies under different scenarios is also significant. In particular, the carbon supply in Xinjiang changes from 9.26 Pg in 2020 to 14.26 Pg under the SD scenario in 2030, while the carbon demand increases from 147.93 Pg in 2020 to 195.79 Pg, equivalent to an increase of about 32.35%. Considering the land use patterns under the different scenarios, the high-value areas of carbon stock are mainly distributed in the Altai Mountains, Tianshan Mountains, and Yili River Valley in the northern part of the study area due to the spatial distribution of forests and grasslands (Figure 6). Areas with low values of carbon sequestration service are distributed around bare land, construction land, and cultivated areas near river valley plains. Because of the high population density and industrialization of construction land, the carbon demand in this area is high, and the high-value areas of carbon demand are all concentrated around construction areas. In 2025 and 2030, in all the different scenarios for carbon sequestration in Xinjiang, the demand for a carbon sequestration service is exacerbated by the increasing expansion of construction areas, but the SD sustainability scenarios planned for this study can partly mitigate the deficit levels of carbon sequestration supply and demand.
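As a quick check, the relative increase in carbon demand quoted above can be reproduced from the two demand totals reported in this paragraph:

$$\frac{195.79 - 147.93}{147.93} = \frac{47.86}{147.93} \approx 0.3235$$

that is, an increase of about 32.35% between 2020 and 2030.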

**Figure 6.** Spatial distribution of carbon stock supply and demand in Xinjiang under four different scenarios during 2020–2030.

#### **4. Discussion**

#### *4.1. Analyses of Future Land Use Change under Different Scenarios*

Based on the concept of sustainable development and the perspective of system evolution, this study proposes a land use evaluation framework for regional sustainable development that is oriented towards future development dynamics [44]. This study also attempts to apply global sustainable development goals at the local scale, which can effectively place regional development at the provincial level in the context of global sustainable development assessment and provide a basis for making decisions to help integrate regional development into the process of globalization. In addition, the United Nations Environment Program recognizes the key role of terrestrial ecosystem services in the SDGs, and in our study, we focus on terrestrial ecosystems, which echoes SDG 15.3 to combat desertification and restore degraded land and soil [35]. Land use planning is important for achieving SDG 15.3, and we should work towards a land-degradation-neutral world via better land use scenario planning. In addition, ecosystem services are central to achieving SDG 15.9 and should be integrated into national and local developmental processes [24,35,36].

To ensure the study's accuracy, we simulated the LULC data for 2015 and 2020 using the PLUS model, with Kappa coefficients of 0.931 and 0.905 and overall accuracies of 0.964 and 0.949, respectively, indicating a high degree of confidence in the simulation results. In addition, the PLUS model was used to simulate land use patterns in Xinjiang from 2020 to 2030 under four different scenarios, with a slight magnitude growth trend for cropland and construction land in the BAU scenario and a sharp expansion pattern in the RED scenario, both consistent with the findings of Fu et al. [33] in Xinjiang. In the ELP scenario, the large expansion of forests and grasslands was concentrated in the alpine forest–grassland and Yili River valley regions of the study area, which is also consistent with the findings of Shi et al. in the same valley [25]. We also found that the SD sustainability scenario accounts for the urbanization process while paying more attention to ecological protection, limiting the uncontrolled growth of urban space, and slowing down or even reversing the rising trend of constructed patches in some areas [35].

#### *4.2. Analysis of the Impact of LULC on Carbon*

This study completes the first high-resolution mapping of terrestrial ecosystem carbon sequestration in Xinjiang under different future scenarios. This fine-scale mapping shows that we can combine the contributions of nature and the needs of people [45]. Moreover, the RF approach applied in this study is superior to other methods for estimating carbon sequestration, and our carbon density raster data are spatially continuous rather than using the same carbon density value fixed for each land use type [20,21].

In addition, to ensure the accuracy of the study, we checked the carbon sequestration outputs for Xinjiang simulated by the RF model against the results of other studies on carbon stocks in Xinjiang. Via soil profiling, Yan et al. estimated the soil carbon stock in the 0 to 100 cm layer in Xinjiang to be about 19.56 Pg [46]. Another study found that the organic carbon in the top 20 cm of soils in Xinjiang accounts for 37.9% of the organic carbon in a 1 m deep soil layer [47]. On this basis, the estimate of Yan et al. corresponds to a carbon stock of about 7.41 Pg in the topsoil layer of Xinjiang, whereas this study estimated a higher carbon stock of terrestrial ecosystems in Xinjiang, at about 9.26 Pg. This discrepancy may arise because we modeled not only the soil carbon sequestration service but also the aboveground and belowground biomass carbon sequestration, which is a plausible explanation for the relatively high results of our study.
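The topsoil figure follows from the two cited values. As a rough check, assuming the 37.9% share applies uniformly to the whole 0 to 100 cm stock:

$$19.56\ \text{Pg} \times 0.379 \approx 7.41\ \text{Pg}$$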

#### *4.3. Analysis of the Supply and Demand of the Carbon Sequestration Service in Different Scenarios*

Studies have shown that LULC is considered a key anthropogenic driver of ecosystem service change at the regional scale [48,49]. Our study used measured datasets of aboveground biomass carbon density, belowground biomass carbon density, and soil carbon density, combined with a random forest model and spatial mapping of raster layers of different future land uses, as factorial environmental variables [17]. In this way, we proposed a new scheme for the spatial simulation of the terrestrial ecosystem carbon sequestration service with the GMOP-PLUS model combined with the random forest method.

According to our findings, the ELP and SD scenarios of this study simulated the implementation of ecological projects, whose primary aim is restoring forests and protecting grasslands, mainly to prevent increased desertification and soil erosion in Xinjiang, and to increase the productivity of vegetation for more carbon sequestration service, objectives closely related to dozens of ecological projects carried out in China in the last half-century [9]. Furthermore, our study shows that the growth of one land use type comes at the cost of the decline of another. For example, in the RED scenario, the growth of built-up land comes at the expense of forest and grassland [23,31,50]. We expect the supply of carbon storage in Xinjiang to increase by about 5.72% in 2025 compared to 2020, but, at the same time, the demand will rise by 21.80%. In 2030, the supply and demand deficit will intensify: Xinjiang's supply of carbon stock in 2030 will increase by approximately 5.61% compared to 2020, yet the demand for it will increase by 32.35% in the meantime. Still, in the SD scenario, some of the carbon sources can be mitigated through the implementation of ecological engineering projects, and the different scenarios are set up to help clarify the relationship between different LULC structures and the carbon stock ES. For example, in the 2025 RED scenario, the increase in carbon sequestration services from cropland expansion is about 12.61 Mg, while in 2030 this trend is reversed and carbon sourcing occurs, with a cumulative net release of 2.42 Mg. Alternatively, in the 2025 SD scenario, cropland expansion is expected to generate a carbon sink of 63.26 Mg, while in 2030 this trend slows down, with a projected net carbon sequestration of 11.24 Mg. This result is likely attributable to the expectation that cropland will reach expansion saturation in 2030 in the scenario simulation setting, so that cropland expansion eventually slows down.
Our findings suggest that different scenarios can help clarify the relationship between different LULC structures and ESs carbon sequestration.

#### *4.4. Limitations and Perspectives*

In this study, future land use patterns under different scenarios were generated through the PLUS model, and carbon stock supply and demand services were assessed for various scenarios. However, there are still some uncertainties and limitations. For example, this study only portrayed four different future scenarios of LULC through policy guidelines; the four alternative scenarios do not represent all possible LULC realities, and more comprehensive scenarios should be explored in subsequent studies. For example, the impact of future climate change on LULC could be considered, among others, to address multi-stakeholder needs for optimal land use policies [25,51].

In addition, with the development of low-carbon technologies and the policy direction of the national dual carbon targets, future carbon demand may not follow the original demand trend, given China's goal of reaching peak carbon emissions by 2030; this could affect the results of the carbon stock demand component. Therefore, future studies should plan the LULC scenarios more rationally and explicitly consider the future carbon stock demand in the context of China's policy. This will help provide better scientific references for future regional decision-making and sustainable development planning.

#### **5. Conclusions**

In this study, we coupled the gray multi-objective optimization (GMOP) and patch generation land use simulation (PLUS) models and proposed a new SD sustainability scenario framework for optimizing the structure of future land use in Xinjiang by using the GMOP-PLUS model. This work also explores the carbon sequestration services of terrestrial ecosystems in key regions from the perspective of land use change, and addresses the disparities arising between the supply and demand of carbon sequestration services in Xinjiang in the future years of 2025 and 2030. The most notable findings to emerge from this study are as follows: (1) the future expansion of arable land in Xinjiang will occur at the expense of some forest and grassland areas, which are particularly prominent in the interlocking zones of river valleys and plains, especially in the Ili Valley, the Altay Mountains, etc.; (2) the supply and demand of carbon stock in Xinjiang will increase in 2025, but the demand is much greater than the supply, and in 2030 this supply and demand imbalance is exacerbated; and (3) Xinjiang, in the context of future cropland expansion, could alleviate the supply and demand deficit situation threatening Xinjiang's carbon stock; this mitigation is most likely under the SD scenario. Nonetheless, some of the carbon sources can be mitigated by the implementation of ecological engineering in our planned SD scenario, and the analysis of the SD scenario and other scenarios can help to clarify the relationship between different LULC structures and carbon sequestration.
Therefore, local governments can increase their efforts to protect ecosystem carbon sequestration services through policies such as returning farmland to forest, reasonable ecological land regulation, and appropriate afforestation activities, in addition to sequestering carbon belowground, while minimizing the loss of ecosystem service functions, to achieve the sustainable development of agroecology in key areas along the Belt and Road.

**Author Contributions:** H.W. and P.J. were responsible for the research design, analysis, and the manuscript's design and its review. M.S. drafted the manuscript and was responsible for data preparation, experiments, and analyses. W.S. was responsible for the research design and reviewing the manuscript. Resources and funding were procured by H.W. and P.J. M.Z., Z.L., K.Z. have performed the data processing work. L.Z., H.Z., X.F. performed the manuscript editing. T.D., M.F.B. performed the manuscript proofreading and retouching. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was financially supported by Major Science and Technology Special Projects in Xinjiang Uygur Autonomous Region, China (Integrated demonstration of high quality, high yield and high efficiency standardized production technology for cotton, No. 2020A01002).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Written informed consent has been obtained from the patient(s) to publish this paper.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** Special thanks go to the College of Resources and Environment, University of Chinese Academy of Sciences, for supporting the implementation of this study. We thank Xiaozhen Wang at the Northwest A&F University for her constructive suggestions. We also thank the anonymous reviewers for their constructive comments on the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Spatial and Temporal Variability of** *ETo* **in Xinjiang Autonomous Region of China during 1957–2017**

**Yanhui Jia 1, Xiaojun Shen 2,3,\*, Ruochen Yi <sup>3</sup> and Ni Song 1,\***

	- <sup>3</sup> College of Water Conservancy Engineering, Tianjin Agricultural University, Tianjin 300392, China
	- **\*** Correspondence: shenxiaojun@tjau.edu.cn (X.S.); songni@caas.cn (N.S.); Tel.: +86-22-2386-8236 (X.S.); +86-373-339-3384 (N.S.)

**Abstract:** This article studies the direct impact of climate change on the temporal variation of reference crop evapotranspiration (*ETo*) in the Xinjiang Autonomous Region of China from 1957 to 2017, which is conducive to formulating irrigation schedules and adaptation countermeasures. The objective of this study is to investigate the impacts of climate change on *ETo* for the cotton growing seasons. The meteorological data were collected from 48 meteorological stations in the region and analyzed using the Mann–Kendall test and linear trend analysis. The results show the following points: (1) the *ETo* decreases from low to high elevations and with increasing northern latitude. (2) The annual mean *ETo* and the average values of *ETo* during the cotton growing seasons exhibited two abrupt changes in the period 1957–2017, the first in 1995 to 1999 and the second in 2006 to 2011. (3) The *ETo* in Xinjiang demonstrates a decreasing trend during 1957–1996; a significant decreasing trend during 1997–2008; and a significant increasing trend during 2009–2017.

**Keywords:** climate change; *ETo*; yield response factor; irrigation water requirement; cotton; Xinjiang

**Citation:** Jia, Y.; Shen, X.; Yi, R.; Song, N. Spatial and Temporal Variability of *ETo* in Xinjiang Autonomous Region of China during 1957–2017. *Agriculture* **2022**, *12*, 1380. https:// doi.org/10.3390/agriculture12091380

Academic Editor: Rabin Bhattarai

Received: 17 July 2022 Accepted: 30 August 2022 Published: 2 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Much of the evidence from the past half century shows that the increase in the global average temperature is mainly due to the increase in greenhouse gas emissions [1–3]. Global warming is a significant and directly observable manifestation of present and future climate change [4,5]. The period 2011–2015 was the hottest on record, about 0.57 °C higher than the reference temperature for the period of 1961–1990 [6]. Climate change is already affecting many aspects of people's lives [7–9], and it will affect crop water demand [10,11] and the spatial and temporal redistribution of water resources in the future [12]. Reference crop evapotranspiration (*ETo*) is an important parameter for the calculation of crop water requirement [13–15]. In order to improve the efficiency of irrigation water use, it is important to study the spatiotemporal variation in *ETo* under climate change.

In the past half century, many studies on regional temperature and rainfall have been conducted in China [16,17]. However, there are relatively few studies on Northwest China against the background of continuous climate warming, and these are mainly focused on precipitation [18,19], air temperature [20], and small pan evaporation [21,22]. These studies showed that the climate in Xinjiang shifted from dry to wet in the middle and late 1980s and that potential evapotranspiration showed a downward trend.

This study investigated the spatial variation of *ETo* in the mountainous, oasis and desert areas of Xinjiang from 1957 to 2017. This research plays an important role in revealing the patterns of *ETo* variation, understanding the evolution of drought and improving the utilization efficiency of irrigation water resources.

<sup>1</sup> Farmland Irrigation Research Institute, Chinese Academy of Agricultural Sciences, Xinxiang 453002, China

#### **2. Materials and Methods**

#### *2.1. Description of Xinjiang in Northwest China*

Xinjiang is located in the inland core of the Eurasian continent, with a longitude ranging from 73.40° E to 96.18° E, a latitude ranging from 34.25° N to 48.10° N and an elevation ranging from −155 m to 8611 m. It has a vast territory, with a total area of about 1.66 million km<sup>2</sup>, accounting for more than 17% of China's land area. The Tianshan, Altay and other mountains surround the Tarim Basin, Junggar Basin and other wide desert basins. The arid intermountain basins and the humid, cold mountain areas are sensitive to climate change [23]. Because Xinjiang is far from the ocean, it has a typical arid continental climate that is mainly controlled by dry inland conditions and only weakly influenced by the Asian temperate monsoon [24]. The annual average temperature is about 8 °C, and the annual average rainfall is less than 150 mm, gradually decreasing from north to south.

#### *2.2. Data Processing*

Continuous, long time series of the observed daily maximum/minimum air temperature (*Tmax/Tmin*), relative humidity (*Hr*), wind speed at 10 m height (*U*10) and global solar radiation (*Rs*) during 1957–2017 were collected from typical weather stations in Xinjiang (Figure 1). The FAO Penman–Monteith method was used to calculate *ETo*, as shown in the following equation [25]:

$$ET\_o = \frac{0.408\Delta (R\_n - G) + \gamma \frac{900}{T\_a + 273} u\_2 (e\_s - e\_a)}{\Delta + \gamma (1 + 0.34 u\_2)} \tag{1}$$

**Figure 1.** Locations of the meteorological observation stations in Xinjiang, Northwest China.

In the above equation, *ETo* is the reference crop evapotranspiration (mm·day<sup>−1</sup>), *R<sub>n</sub>* is the net radiation (MJ·m<sup>−2</sup>·day<sup>−1</sup>), *G* is the soil heat flux density (MJ·m<sup>−2</sup>·day<sup>−1</sup>), *γ* is the psychrometric constant (kPa·°C<sup>−1</sup>), *e<sub>s</sub>* is the saturation vapor pressure (kPa), *e<sub>a</sub>* is the actual vapor pressure (kPa), Δ is the slope of the saturation vapor pressure–temperature curve (kPa·°C<sup>−1</sup>), *T<sub>a</sub>* is the daily mean air temperature (°C) and *u*<sub>2</sub> is the daily mean wind speed at 2 m height (m·s<sup>−1</sup>).
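A minimal Python sketch of Equation (1) is given below for illustration. The function names and the input values are hypothetical and are not taken from the study. Because the wind speed was observed at 10 m while Equation (1) requires the 2 m value, the sketch also includes the logarithmic wind-profile conversion from FAO-56; whether the authors applied exactly this conversion is an assumption.

```python
import math


def wind_speed_2m(u_z, z=10.0):
    """Convert wind speed measured at height z (m) to the 2 m reference height.

    Uses the FAO-56 logarithmic wind profile; applying it here is an
    assumption, since the source does not state the conversion used.
    """
    return u_z * 4.87 / math.log(67.8 * z - 5.42)


def penman_monteith_eto(delta, r_n, g, gamma, t_a, u_2, e_s, e_a):
    """Daily reference evapotranspiration (mm/day) following Equation (1).

    delta : slope of the saturation vapor pressure curve (kPa/degC)
    r_n   : net radiation (MJ m-2 day-1)
    g     : soil heat flux density (MJ m-2 day-1)
    gamma : psychrometric constant (kPa/degC)
    t_a   : daily mean air temperature (degC)
    u_2   : daily mean wind speed at 2 m (m/s)
    e_s   : saturation vapor pressure (kPa)
    e_a   : actual vapor pressure (kPa)
    """
    radiation_term = 0.408 * delta * (r_n - g)
    aerodynamic_term = gamma * 900.0 / (t_a + 273.0) * u_2 * (e_s - e_a)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u_2))


# Illustrative (hypothetical) inputs for a single day
eto = penman_monteith_eto(delta=0.122, r_n=13.28, g=0.14, gamma=0.0666,
                          t_a=16.9, u_2=2.078, e_s=1.997, e_a=1.409)
print(f"ETo = {eto:.2f} mm/day")
```

Daily values computed in this way can then be summed over the year or over the cotton growth stages to obtain the totals analyzed in the Results.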

The original daily temperature, relative humidity, sunshine duration and wind speed data were provided by the National Climate Center of the China Meteorological Administration (NCC-CMA). The 48 meteorological stations selected in this study (Figure 1) have been maintained in accordance with the standards of the National Meteorological Administration of China. These standards require strict quality control of the whole process before the data are released, including extreme-value checks and routine temporal consistency checks. For the oases, we chose meteorological stations in large, medium and small cities located in densely populated areas to reflect the impact of human activities.

#### *2.3. Methods*

#### 2.3.1. Mann–Kendall Method for Nonparametric Test

The nonparametric Mann–Kendall method, used in climate trend analysis and abrupt change detection, was proposed by Mann [26] and further developed by Kendall [27]. The Mann–Kendall method evaluates the trend in a time series of a climate variable and has been widely used in trend analysis because it does not require the data to follow a particular distribution [28].

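Here, the null hypothesis *H*<sub>0</sub> states that the data are a sample of independent and identically distributed random variables (no trend), and the alternative hypothesis *H*<sub>1</sub> of the two-sided test states that a monotonic trend exists. The test statistic *S* is obtained from the following formula: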

$$S = \sum\_{i=1}^{n-1} \sum\_{k=i+1}^{n} \text{sgn}(x\_k - x\_i) \tag{2}$$

where *xk* and *xi* are the sequential data values, *n* is the length of the data set, and

$$\text{sgn}(\theta) = \begin{cases} +1, & \theta > 0 \\ 0, & \theta = 0 \\ -1, & \theta < 0 \end{cases} \tag{3}$$

If the sample size exceeds 10, the statistic *S* is approximately normally distributed. The standardized test statistic *Z<sub>c</sub>* and the expected value and variance of *S* are as follows:

$$Z\_c = \begin{cases} \frac{S - 1}{\sqrt{\text{var}(S)}}, & S > 0; \\ 0, & S = 0; \\ \frac{S + 1}{\sqrt{\text{var}(S)}}, & S < 0. \end{cases} \tag{4}$$

$$E(S) = 0\tag{5}$$

$$var(S) = \left[ n(n-1)(2n+5) - \sum\_{t} t(t-1)(2t+5) \right] / 18 \tag{6}$$

where *t* is the extent of any given tie (the number of equal values in a tied group) and Σ denotes the summation over all ties; when there are no ties, the sum vanishes. In the Mann–Kendall test, another very useful index is the Kendall slope, which measures the magnitude of a monotonic trend and is calculated by the following formula:

$$\beta = \operatorname{Median}\left(\frac{\mathbf{x}\_i - \mathbf{x}\_j}{i - j}\right), \quad j < i \tag{7}$$

In the above equation, 1 ≤ *j* < *i* ≤ *n*. A positive value of *β* indicates an "upward" trend, that is, an increase over time, and a negative value indicates a "downward" trend, that is, a decrease over time.
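The test described in this subsection can be sketched in a few lines of Python. This is an illustrative implementation rather than the authors' code: the function names are hypothetical, and the variance uses the standard tie correction, which reduces to n(n − 1)(2n + 5)/18 when no values are tied.

```python
import math
from statistics import median


def mann_kendall(x):
    """Return (S, var(S), Z) for the Mann-Kendall trend test, Equations (2)-(6)."""
    n = len(x)
    # Equation (2): sum of sgn(x_k - x_i) over all pairs with i < k
    s = sum((x[k] > x[i]) - (x[k] < x[i])
            for i in range(n - 1) for k in range(i + 1, n))
    # Tie correction: t is the number of equal values in each tied group
    counts = {}
    for value in x:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Equation (4): continuity-corrected standardized statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z


def kendall_slope(x):
    """Equation (7): median of all pairwise slopes (x_i - x_j) / (i - j), j < i."""
    n = len(x)
    return median((x[i] - x[j]) / (i - j) for i in range(1, n) for j in range(i))


# Hypothetical example: a strictly increasing 12-value series
series = [float(v) for v in range(1, 13)]
s, var_s, z = mann_kendall(series)
beta = kendall_slope(series)
print(s, round(var_s, 2), round(z, 2), beta)
```

A trend is usually regarded as significant at the 95% confidence level when |Z| exceeds 1.96.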

#### 2.3.2. Mann–Kendall Method for Abrupt Change Test

Under the null hypothesis of no trend, the time series of the variable does not change, and the time series can be regarded as *x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>n</sub>*. For each item *x<sub>i</sub>*, *r<sub>i</sub>* is calculated as the number of preceding items *x<sub>j</sub>* (*j* < *i*) in the sequence whose value is exceeded by *x<sub>i</sub>*. The test statistic is calculated as follows:

$$d\_k = \sum\_{i=1}^{k} r\_i \quad (2 \le k \le n) \tag{8}$$

$$r\_i = \begin{cases} 1, & x\_i > x\_j \\ 0, & x\_i \le x\_j \end{cases} \tag{9}$$

Assuming that the sequence is random and independent, the expected value *E*(*dk*) and the variance var(*dk*) of *dk* can be calculated as follows:

$$\begin{cases} E(d\_k) = k(k-1)/4\\ var(d\_k) = \frac{k(k-1)(2k+5)}{72} \end{cases} \quad (2 \le k \le n) \tag{10}$$

We can, therefore, obtain the statistic *u*(*dk*) with the following equation:

$$
u(d\_k) = (d\_k - E(d\_k)) / \sqrt{var(d\_k)}\tag{11}
$$

The terms of the *u*(*dk*) (1 ≤ *k* ≤ *n*) constitute curve *C*<sub>1</sub>. If the standard normal probability Pr(|*u*| > |*u*(*dk*)|) < *a*, the null hypothesis of no trend is rejected at the significance level *a*. The typical 95% confidence level is applied to the annual total precipitation series. If this method is applied to the reversed series, *u¯*(*dk*) is obtained, as shown in the following equation:

$$\begin{cases} \overline{u}(d\_i) = -u(d\_{i'}) \\ i' = n+1-i \end{cases} \quad (i, i' = 1, 2, 3, \cdots, n) \tag{12}$$

The terms of the *u¯*(*dk*) (1 ≤ *k* ≤ *n*) constitute another curve *C*<sub>2</sub>. If *C*<sub>1</sub> exceeds the confidence line, there is a significant upward or downward trend in the series. If the intersection point of *C*<sub>1</sub> and *C*<sub>2</sub> lies between the two confidence lines, it marks the onset of an abrupt climate change [24].
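The forward and backward curves can be computed as in the following sketch, which assumes NumPy is available. The function names are hypothetical. The rank counts compare each value with the values that precede it, and the backward curve is obtained by running the same statistic on the reversed series, changing its sign and reversing it again, following Equation (12).

```python
import numpy as np


def sequential_u(x):
    """Sequential statistic u(d_k), k = 1..n, following Equations (8)-(11)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # r_i: number of preceding values that x_i exceeds
    r = np.array([np.sum(x[i] > x[:i]) for i in range(n)], dtype=float)
    d = np.cumsum(r)
    k = np.arange(1, n + 1, dtype=float)
    mean = k * (k - 1) / 4.0
    var = k * (k - 1) * (2 * k + 5) / 72.0
    u = np.zeros(n)                      # u(d_1) is set to 0 by convention
    u[1:] = (d[1:] - mean[1:]) / np.sqrt(var[1:])
    return u


def forward_backward_curves(x):
    """Return the curves C1 (forward) and C2 (backward) of the abrupt change test."""
    x = np.asarray(x, dtype=float)
    c1 = sequential_u(x)
    c2 = -sequential_u(x[::-1])[::-1]    # Equation (12)
    return c1, c2


def crossing_indices(c1, c2):
    """Indices i where C1 and C2 cross between positions i and i + 1."""
    diff = c1 - c2
    return [i for i in range(len(diff) - 1) if diff[i] * diff[i + 1] < 0]


# Hypothetical example: a strictly increasing 10-value series
c1, c2 = forward_backward_curves(np.arange(10.0))
print(np.round(c1, 2), np.round(c2, 2), crossing_indices(c1, c2))
```

A crossing is interpreted as an abrupt change only if it lies between the confidence lines (±1.96 at the 95% level); that check is left to the caller in this sketch.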

#### 2.3.3. Pre-Whitening Mann–Kendall Method

Pre-whitening [29] was proposed by von Storch (1995), and it was used to reduce the influence of serial correlation on the MK test. The formula is shown below.

$$Y\_t = X\_t - r \times X\_{t-1} \tag{13}$$

where *Xt* represents the original sequential data values, *Yt* represents the sequential data values without autocorrelation and *r* is the autocorrelation coefficient of the sequence data.
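Equation (13) can be illustrated with the short sketch below. The lag-1 sample autocorrelation is used as the estimate of *r*; this choice of estimator and the function names are assumptions, since the text does not specify how *r* is computed.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Lag-1 sample autocorrelation coefficient of a series."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return float(np.sum(dev[:-1] * dev[1:]) / np.sum(dev * dev))


def prewhiten(x):
    """Equation (13): Y_t = X_t - r * X_(t-1), for t = 2..n.

    Returns the pre-whitened series (one value shorter than x) and r.
    """
    x = np.asarray(x, dtype=float)
    r = lag1_autocorrelation(x)
    return x[1:] - r * x[:-1], r


# Hypothetical example series
y, r = prewhiten([1.0, 2.0, 1.0, 2.0, 1.0, 2.0])
print(np.round(y, 3), round(r, 3))
```

The Mann–Kendall test is then applied to the pre-whitened series instead of the original one.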

#### 2.3.4. Trend-Free Pre-Whitening Mann–Kendall Method

Because a dominant trend in the data series affects the estimation of the autocorrelation coefficient, the trend-free pre-whitening (TFPW) method is adopted to eliminate this influence, which makes the MK test of the data series more accurate. The procedure is as follows [29]:

$$X\_t' = X\_t - \beta \times t \tag{14}$$

$$Y\_t' = X\_t' - r \times X\_{t-1}' \tag{15}$$

$$Y\_t = Y\_t' + \beta \times t \tag{16}$$

where *β* is the Kendall slope, which can be calculated by Formula (7), *X′<sub>t</sub>* is the detrended series, *Y′<sub>t</sub>* is the detrended series without autocorrelation and the *Y<sub>t</sub>* series preserves the true trend without autocorrelation.
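The three steps in Equations (14)–(16) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the time index is taken as *t* = 1, ..., *n*, *r* is estimated as the lag-1 sample autocorrelation of the detrended series, and the function names are hypothetical.

```python
import numpy as np


def kendall_slope(x):
    """Median of all pairwise slopes, as in Formula (7)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    slopes = [(x[i] - x[j]) / (i - j) for i in range(1, n) for j in range(i)]
    return float(np.median(slopes))


def trend_free_prewhiten(x):
    """Apply Equations (14)-(16); returns (Y_t for t = 2..n, beta, r)."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1, dtype=float)
    beta = kendall_slope(x)
    detrended = x - beta * t                                  # Equation (14)
    dev = detrended - detrended.mean()
    r = float(np.sum(dev[:-1] * dev[1:]) / np.sum(dev * dev))  # lag-1 autocorrelation
    whitened = detrended[1:] - r * detrended[:-1]             # Equation (15)
    y = whitened + beta * t[1:]                               # Equation (16)
    return y, beta, r


# Hypothetical example: linear trend of 0.5 per step plus an alternating disturbance
t = np.arange(1, 11)
x = 0.5 * t + 0.1 * (-1.0) ** t
y, beta, r = trend_free_prewhiten(x)
print(np.round(y, 3), round(beta, 3), round(r, 3))
```

Note that a perfectly linear series leaves a constant detrended series, for which the autocorrelation is undefined; real data with noise do not hit this case.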

#### 2.3.5. Two-Phase Linear Regression Model

The simple linear regression model with a change point is given as follows [30]:

$$X\_{i} = \begin{cases} \alpha\_{1} + \beta\_{1} \times t\_{i} + \varepsilon\_{1}, & t\_{\min} \le t\_{i} \le t\_{C}\\ \alpha\_{2} + \beta\_{2} \times t\_{i} + \varepsilon\_{2}, & t\_{C} < t\_{i} \le t\_{\max} \end{cases} \tag{17}$$

where (*Xi*,*ti*) are the observations that correspond to the dependent and the independent variables, *α*1, *α*2, *β*1, *β*<sup>2</sup> are the regression coefficients with the usual interpretations, *ε*1, *ε*<sup>2</sup> are the random error terms for each line of the regression model and the point *tC* is the unknown change point.

*C* (*t<sub>C</sub>* ≤ *C* < *t<sub>C</sub>* + 1) and the other parameter values can be obtained by fitting, and the fitting formula is as follows:

$$X\_i = \alpha\_1 + \beta\_1 \times t\_i + \beta'(t\_i - C) \times IND\_C(t\_i) + \varepsilon \tag{18}$$

$$
\beta' = \beta\_2 - \beta\_1 \tag{19}
$$

$$IND\_C(t\_i) = \begin{cases} 0, & t\_i \le C \\ 1, & t\_i > C \end{cases} \tag{20}$$

Each regression coefficient is obtained by the least squares method, and the residual sum of squares of the equation (*S*) is obtained; then, a significance test is carried out for the abrupt change. The statistic *U* is calculated as follows:

$$
U = \frac{(S\_0 - S)/3}{S/(n-4)} \tag{21}
$$

In this equation, *S*<sub>0</sub> is the residual sum of squares of the linear regression equation without a change point and *S* is the residual sum of squares corresponding to the optimal change point. *U* follows the F(3, *n* − 4) distribution. The change point is accepted at the significance level of 0.05; otherwise, the abrupt change is not considered significant.
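A possible way to fit the two-phase model is sketched below. It assumes a continuous hinge term (*t<sub>i</sub>* − *C*) for *t<sub>i</sub>* > *C* and searches for *C* only among the observed time points; both the grid search and the function name are assumptions made for illustration, since the text does not describe the fitting procedure in detail.

```python
import numpy as np


def two_phase_regression(t, x):
    """Fit a two-phase linear model by grid search over candidate change points.

    Returns a dict with the change point C, the slopes before and after C,
    the residual sums of squares S0 and S, and the statistic U of Equation (21).
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(t)

    # Single straight line without a change point -> S0
    design0 = np.column_stack([np.ones(n), t])
    coef0 = np.linalg.lstsq(design0, x, rcond=None)[0]
    resid0 = x - design0 @ coef0
    s0 = float(resid0 @ resid0)

    best = None
    for c in t[1:-2]:                          # keep a few points in each phase
        hinge = np.where(t > c, t - c, 0.0)    # (t_i - C) * IND_C(t_i)
        design = np.column_stack([np.ones(n), t, hinge])
        coef = np.linalg.lstsq(design, x, rcond=None)[0]
        resid = x - design @ coef
        s = float(resid @ resid)
        if best is None or s < best[0]:
            best = (s, float(c), coef)

    s, c, (alpha1, beta1, beta_diff) = best
    u = ((s0 - s) / 3.0) / (s / (n - 4))
    return {"C": c, "beta1": float(beta1), "beta2": float(beta1 + beta_diff),
            "S0": s0, "S": s, "U": u}


# Hypothetical example: slope +0.5 up to t = 10, slope -0.5 afterwards, small disturbance
t = np.arange(20.0)
x = 1.0 + 0.5 * t - 1.0 * np.clip(t - 10.0, 0.0, None) + 0.01 * (-1.0) ** np.arange(20)
fit = two_phase_regression(t, x)
print(fit)
```

The value of *U* would then be compared with the critical value of the F(3, *n* − 4) distribution at the 0.05 level.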

#### 2.3.6. Dominant Analysis

Dominant analysis, also known as dominance analysis or the advantage analysis method, was proposed by Budescu [31]. In order to analyze the relative importance of different influencing factors, it is first necessary to regress the dependent variable on the individual indicators (influencing factors) and on different combinations of these indicators. The determination coefficient *R*<sup>2</sup> of each regression equation containing these indicators and their combinations is calculated, the *R*<sup>2</sup> values are compared, and the improvement in *R*<sup>2</sup> after adding one indicator or a combination of indicators to the regression equation is then analyzed. The indicators or combinations of indicators that yield the largest improvement dominate the others. This method has the following two advantages: (1) it has model independence, that is, the relative importance of the prediction index is kept constant in each sub-model; (2) according to the relative importance of each prediction index, the total prediction variance in the regression model is decomposed and expressed as a percentage. Accordingly, the meteorological elements can be ranked by their influence on *ETo*.
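The procedure can be illustrated with the following sketch, which computes general dominance weights: for each predictor, the increase in *R*<sup>2</sup> is averaged over all sub-models of each size and then over sizes, so that the weights sum to the *R*<sup>2</sup> of the full model. This is one common summary of a dominance analysis and is offered as an assumption about how the ranking could be obtained; the function names and the synthetic data are hypothetical.

```python
import itertools

import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an ordinary least squares fit with intercept."""
    if X.shape[1] == 0:
        return 0.0
    design = np.column_stack([np.ones(len(y)), X])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    total = np.sum((y - y.mean()) ** 2)
    return 1.0 - float(resid @ resid) / float(total)


def general_dominance(X, y):
    """Average incremental R^2 of each predictor over all sub-models."""
    p = X.shape[1]
    weights = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        size_means = []
        for size in range(p):                  # sub-models with 0..p-1 other predictors
            gains = [r_squared(X[:, list(sub) + [j]], y) - r_squared(X[:, list(sub)], y)
                     for sub in itertools.combinations(others, size)]
            size_means.append(np.mean(gains))
        weights[j] = np.mean(size_means)
    return weights


# Synthetic example: the first predictor matters most, the third not at all
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
weights = general_dominance(X, y)
print(np.round(100 * weights / weights.sum(), 1))   # percentage contributions
```

With meteorological variables as the columns of `X` and *ETo* as `y`, the percentages give the kind of ranking described above.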

#### **3. Results and Discussion**

*3.1. Spatial Patterns of Annual Mean ETo*

The annual mean *ETo* during 1957–2017 ranges from 660.94 mm to 1532.54 mm in Xinjiang. It ranges from 1532.54 mm to 660.64 mm over latitudes from 36.85° N to 45.93° N and from 660.64 mm to 1094.00 mm over latitudes from 43.03° N to 48.05° N (Figure 2a). The average value of *ETo* for the cotton growing season during 1957–2017 ranges from 1108.15 mm to 519.01 mm over latitudes from 36.85° N to 44.96° N and from 519.01 mm to 870.30 mm over latitudes from 43.03° N to 48.05° N (Figure 2a). The average value of *ETo* for the cotton seedling stage during 1957–2017 ranges from 340.67 mm to 153.49 mm in Xinjiang. Over latitudes from 36.85° N to 48.05° N, the average value of *ETo* ranges from 224.82 mm to 105.15 mm for the cotton squaring stage, from 431.79 mm to 209.29 mm for the cotton flowering-boll stage and from 110.87 mm to 50.92 mm for the cotton boll opening stage (Figure 2b). The annual mean *ETo* decreases with increasing northern latitude (Figure 2a,b); the annual mean *ETo* and the *ETo* of the cotton growing season increase with elevation from 34.5 m to 935.0 m and decrease with elevation from 935.0 m to 3090.1 m (Figure 2c,d).

High values of the annual mean *ETo* in Xinjiang are found in deserts and the low values in mountainous areas (Figure 3). The average value of *ETo* is found in different growth periods of the cotton area (Figure 3). The average value of *ETo* in south Xinjiang is greater than that in north Xinjiang (Figure 3).

#### *3.2. ETo Change with Latitude and Elevation*

The relationships of *ETo* with latitude and elevation, which reveal the spatial pattern of *ETo* variation, are shown in Figure 2. The results show that *ETo* is negatively correlated with latitude. The annual average *ETo* decreases by 27.86 mm for each 1° increase in latitude, and the average *ETo* in the cotton growing season decreases by 8.21 mm for each 1° increase in latitude (Figure 2g,h). The results also show that *ETo* has vertical zonality; in other words, *ETo* decreases significantly with rising altitude in Xinjiang (Figure 2) and is negatively correlated with altitude. The annual average *ETo* decreases by 0.07 mm for each 1 m increase in altitude, and the average *ETo* in the cotton growing season decreases by 0.09 mm for each 1 m increase in altitude (Figure 2i,j).
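Gradients of this kind are the slopes of ordinary least squares lines fitted to station values. The sketch below shows the calculation with NumPy; the station latitudes and *ETo* values are synthetic, constructed only so that the fitted slope equals the reported −27.86 mm per degree, and the function name is hypothetical.

```python
import numpy as np


def linear_gradient(x, y):
    """Slope and intercept of the least squares line y = slope * x + intercept."""
    slope, intercept = np.polyfit(np.asarray(x, dtype=float),
                                  np.asarray(y, dtype=float), 1)
    return float(slope), float(intercept)


# Synthetic station data (not from the study)
latitude = np.array([37.0, 39.0, 41.0, 43.0, 45.0, 47.0])
annual_eto = 1500.0 - 27.86 * (latitude - 37.0)
slope, intercept = linear_gradient(latitude, annual_eto)
print(f"ETo changes by {slope:.2f} mm per degree of latitude")
```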

**Figure 2.** *ETo* varies with space. (**a**): Annual average *ETo* of Xinjiang; (**b**): the average *ETo* value of the entire cotton growing season in Xinjiang; (**c**): the average *ETo* value of cotton seedling stage in Xinjiang; (**d**): the average *ETo* value of cotton squaring stage in Xinjiang; (**e**): the average *ETo* value of cotton flowering-boll stage in Xinjiang; (**f**): the average *ETo* value of cotton boll opening stage in Xinjiang; (**g**,**h**): relationship between the *ETo* and latitude in Xinjiang; (**i**,**j**): relationship between the *ETo* and elevation in Xinjiang of Northwest China.

#### *3.3. ETo Change with Precipitation*

The relationship between *ETo* and precipitation in Xinjiang was analyzed using the long-term (1957–2017) meteorological data series for the cotton growing season (Figure 4). *ETo* decreased as growing-season precipitation increased. For each 1 mm increase in precipitation during the cotton growing season, the average *ETo* decreased by 1.40 mm in Xinjiang as a whole (Figure 4a), by 1.57 mm in North Xinjiang (Figure 4b), by 1.42 mm in South Xinjiang (Figure 4c), and by 1.24 mm in East Xinjiang (Figure 4d). The corresponding correlation coefficients between rainfall and *ETo* are −0.7245, −0.8241, −0.6065 and −0.5433. Figure 4 shows that the reference crop evapotranspiration in northern Xinjiang was strongly affected by precipitation during the cotton growing seasons.

**Figure 3.** The change process of the *ETo* in Xinjiang of Northwest China during 1957–2017. (**a**,**b**): Annual average *ETo* over the entire year; (**c**,**d**): average *ETo* value of cotton growing season; (**e**,**f**): average *ETo* value of cotton seedling stage; (**g**,**h**): average *ETo* value of cotton squaring stage; (**i**,**j**): average *ETo* value of cotton flowering-boll stage; (**k**,**l**): average *ETo* value of cotton boll opening stage.

Under climate change, the water consumption of cotton was directly affected by the changes in *ETo* and precipitation during the cotton growing season, which in turn affected the irrigation water demand for cotton in Xinjiang. Taking the Shihezi region in northern Xinjiang as an example, the crop coefficient (*Kc*) for cotton is 0.6 [32]. During the growing seasons from 1957 to 2017, the water consumption of cotton was 124.65–554.58 mm, precipitation ranged from 50.5 mm to 235.4 mm, and the irrigation water requirement for cotton ranged from 234.27 mm to 504.08 mm. The irrigation water requirement decreased by 3.8 mm/10a during the 1950s–1990s and increased by 1.84 mm/a during the 1990s–2010s.

Against the background of climate change, the contradiction between supply and demand of irrigation water for cotton in Xinjiang has become more serious.

**Figure 4.** The relationship of the *ETo* and the precipitation during the cotton growing seasons in Xinjiang. (**a**): Average value of Xinjiang; (**b**): Average value of North Xinjiang; (**c**): Average value of South Xinjiang; (**d**): Average value of East Xinjiang.

#### *3.4. ETo Change Trend*

The autocorrelation test showed that almost all the *ETo* series in Xinjiang are strongly autocorrelated. Pre-whitening (PW) and trend-free pre-whitening (TFPW) were therefore used to pre-process all the *ETo* series. Typical results are shown in Table 1. Both methods effectively reduce the autocorrelation of the *ETo* series, and PW removes more of it than TFPW. However, PW also seriously weakens the trend of the processed series. Therefore, the TFPW method was chosen to pre-process the series, and the Mann–Kendall (M–K) nonparametric test was then carried out. The M–K test showed that *ETo* had a generally downward trend (*p* < 0.01).
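The TFPW step followed by the M–K test can be sketched as below. This is a minimal illustration of the commonly used procedure (remove the Sen's-slope trend, remove the lag-1 autoregressive component, add the trend back, then test), written on the assumption that this is the variant applied here. The function names and the input series are hypothetical, and no tie correction is applied to the variance.

```python
import math
import numpy as np

def sens_slope(x):
    """Median of all pairwise slopes (Sen's slope), per time step."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    slopes = [(x[j] - x[i]) / (j - i) for i in range(n - 1) for j in range(i + 1, n)]
    return float(np.median(slopes))

def lag1_autocorr(x):
    """Lag-1 serial correlation coefficient; 0.0 for a constant series."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = float(np.sum(d * d))
    return float(np.sum(d[:-1] * d[1:]) / denom) if denom > 0 else 0.0

def tfpw(x):
    """Trend-free pre-whitening: detrend, remove the AR(1) part, restore the trend."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    b = sens_slope(x)
    detrended = x - b * t
    r1 = lag1_autocorr(detrended)
    whitened = detrended[1:] - r1 * detrended[:-1]
    return whitened + b * t[1:]          # one value shorter than the input

def mann_kendall(x):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test (no ties)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = int(sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n)))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = 0.0 if s == 0 else (s - np.sign(s)) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    return s, float(z), p

# Synthetic placeholder series (NOT observed ETo): a declining trend plus noise.
rng = np.random.default_rng(0)
years = np.arange(1957, 1997)
series = 1100.0 - 3.4 * (years - 1957) + rng.normal(0.0, 20.0, years.size)
s_stat, z_stat, p_value = mann_kendall(tfpw(series))
print(f"S = {s_stat}, Z = {z_stat:.2f}, p = {p_value:.4f}")
```

A negative `Z` with a small `p` indicates a significant downward trend; restoring the trend before testing is what keeps TFPW from weakening the trend as plain PW does.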

The two-phase linear regression showed that the annual average *ETo* in Xinjiang declined from 1957 to 1996 (Figure 5a) at a rate of 33.96 mm/10a, increased significantly from 1997 to 2008 (Figure 5a) at a rate of 71.41 mm/10a, and declined markedly from 2009 to 2017 (Figure 5b) at a rate of 66.64 mm/10a. These trends are consistent with global and Chinese climate change [23,33].

**Table 1.** Mann–Kendall value and first-order coefficient of autocorrelation for original and two pre-whitening *ETo* series.


**Figure 5.** Abrupt change in annual mean *ETo* in Xinjiang during 1957–2017. (**a**,**b**): Annual mean of *ETo* in Xinjiang; (**c**,**d**): annual mean of *ETo* in mountains; (**e**,**f**): annual mean of *ETo* in oases; (**g**,**h**): annual mean of *ETo* in deserts.

Figure 5 shows that the abrupt-change points of the annual average *ETo* occurred in 1996 and 2008 for Xinjiang as a whole (Figure 5a,b), in 1995 and 2008 for the oases (Figure 5e,f), in 1998 and 2010 for the mountain region (Figure 5c,d), and in 1999 and 2009 for the desert (Figure 5g,h). These differences are related to the unique geographical location and climatic conditions of each landscape. Because the analysis shows that Xinjiang as a whole and the three specific landscapes all have abrupt changes around 1997 and 2009, we divide the 60 years into three periods (before 1997, 1997–2009 and after 2009) and carry out a more detailed analysis of each period.

Among the three landscapes, the mountainous area showed a downward trend from 1957 to 1998 (Figure 5c) at a rate of 7.33 mm/10a, an upward trend from 1999 to 2010 (Figure 5c) at a rate of 45.62 mm/10a, and a downward trend from 2011 to 2017 (Figure 5d) at a rate of 64.71 mm/10a. The oases showed a downward trend from 1957 to 1995 (Figure 5e) at a rate of 3.63 mm/10a, an upward trend from 1996 to 2008 (Figure 5e) at a rate of 69.35 mm/10a, and a downward trend from 2009 to 2017 (Figure 5f) at a rate of 39.57 mm/10a. The desert showed a downward trend from 1957 to 1999 (Figure 5g) at a rate of 61.11 mm/10a, an upward trend from 1997 to 2009 (Figure 5g) at a rate of 123.99 mm/10a, and a downward trend from 2010 to 2017 (Figure 5h) at a rate of 113.05 mm/10a. These differences have two main causes. First, the desert ecosystem has low stability, whereas the mountain ecosystem has high stability. Second, Xinjiang has experienced rapid population growth, along with the fast expansion of urbanization, industrialization, and tourism. Human activity might therefore have caused the differences among the landscapes against the background of global warming and regional climate change.

Figure 6 shows that the abrupt-change points of *ETo* for the cotton growing season occurred in 1997 and 2008 for Xinjiang as a whole (Figure 6a,b), in 1997 and 2006 in eastern Xinjiang (Figure 6g,h), in 1997 and 2007 in southern Xinjiang (Figure 6e,f), and in 1997 and 2009 in northern Xinjiang (Figure 6c,d). These differences are related to the unique geographical location and climatic conditions of each region. Because the analysis shows that Xinjiang as a whole and the three specific regions all have abrupt changes around 1997 and 2008, we divide the 60 years into three periods (before 1997, 1997–2009 and after 2009) and carry out a more detailed analysis of each period.

**Figure 6.** Abrupt change in average values of *ETo* during the growing seasons for cotton in Xinjiang. (**a**,**b**): average value of *ETo* during the growing seasons for cotton in Xinjiang; (**c**,**d**): average value of *ETo* during the growing seasons for cotton in north Xinjiang; (**e**,**f**): average value of *ETo* during the growing seasons for cotton in south Xinjiang; (**g**,**h**): average value of *ETo* during the growing seasons for cotton in east Xinjiang.

The average *ETo* during the cotton growing season in Xinjiang showed a significant downward trend during 1957–1997 (Figure 6a) at a rate of 22.97 mm/10a, a significant upward trend during 1998–2008 (Figure 6a) at a rate of 50.53 mm/10a, and a significant downward trend during 2009–2017 (Figure 6b) at a rate of 62.02 mm/10a. In north Xinjiang, it showed a significant downward trend during 1957–1997 (Figure 6c) at a rate of 19.50 mm/10a, a significant upward trend during 1997–2009 (Figure 6c) at a rate of 27.51 mm/10a, and a significant downward trend during 2010–2017 (Figure 6d) at a rate of 30.78 mm/10a. In south Xinjiang, it showed a significant downward trend during 1957–1997 (Figure 6e) at a rate of 32.95 mm/10a, a significant upward trend during 1998–2007 (Figure 6e) at a rate of 78.01 mm/10a, and a downward trend during 2008–2017 (Figure 6f) at a rate of 71.71 mm/10a. In east Xinjiang, it showed a significant decreasing trend during 1957–1997 (Figure 6g) at a rate of 35.87 mm/10a, a significant increasing trend during 1998–2006 (Figure 6g) at a rate of 81.12 mm/10a, and a decreasing trend during 2007–2017 (Figure 6h) at a rate of 31.88 mm/10a.

The *ETo* trend can be attributed to two main factors. The first is the expansion of irrigation and water conservancy after the founding of the People's Republic of China, which increased the irrigated area in Xinjiang and the regional air humidity. The second is the planting of windbreaks in agricultural areas, which reduced the regional wind speed. Together, these led to the continuous decline in the reference crop water requirement from the 1950s to the 1990s. Under climate change, the temperature has been increasing since the 1990s, which explains why *ETo* during the cotton growing seasons showed an increasing trend from the 1990s to the 2010s.

#### *3.5. Abrupt Change in ETo*

Two-phase linear regression was used to detect abrupt changes in the *ETo* time series, and the abrupt change at each characteristic time point passed the significance test at the 0.05 level (*F*0.05(3,48) = 2.80, *F*0.05(3,16) = 3.24). The Alar station (81.27° E, 40.55° N) was selected as an example to demonstrate the abrupt change in the time series of *ETo* during the cotton growing season (Figure 7). The results showed that abrupt changes occurred in 2001 and 2011. With this detection method, obvious abrupt changes were found in the annual average *ETo* for Xinjiang as a whole, southern Xinjiang, eastern Xinjiang, and the oasis and desert areas. For the average *ETo* of the cotton growing season, obvious abrupt changes were found for Xinjiang as a whole, northern Xinjiang, southern Xinjiang, eastern Xinjiang, and the oasis and desert areas.
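A two-phase regression test of this kind can be sketched as follows. The text does not specify the exact variant, so the sketch assumes a continuous two-phase model (one line with a change of slope at the break year), which gives an *F* statistic with (3, *n* − 4) degrees of freedom, the same form as the critical values quoted above. The function names and the series are hypothetical.

```python
import numpy as np

def sse_of_fit(design, y):
    """Sum of squared residuals of the least-squares fit y ~ design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def two_phase_regression(t, y, min_seg=3):
    """Return (break_time, F) for the best continuous two-phase linear fit.

    Null model: a single straight line.
    Alternative: a line whose slope changes at the candidate break time.
    F = ((SSE0 - SSE1) / 3) / (SSE1 / (n - 4)), compared with F(3, n - 4).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ones = np.ones(n)
    sse0 = sse_of_fit(np.column_stack([ones, t]), y)
    best_break, best_sse = None, np.inf
    for k in range(min_seg, n - min_seg):
        hinge = np.maximum(t - t[k], 0.0)      # zero before the break, linear after
        sse1 = sse_of_fit(np.column_stack([ones, t, hinge]), y)
        if sse1 < best_sse:
            best_break, best_sse = t[k], sse1
    f_stat = ((sse0 - best_sse) / 3.0) / (best_sse / (n - 4))
    return best_break, f_stat

# Synthetic placeholder series (NOT observed ETo): decline until 1996, rise afterwards.
rng = np.random.default_rng(1)
years = np.arange(1957, 2009)                   # n = 52, so the test uses F(3, 48)
values = (1000.0 - 3.4 * (years - 1957)
          + 10.5 * np.maximum(years - 1996, 0)
          + rng.normal(0.0, 1.0, years.size))
break_year, f_stat = two_phase_regression(years, values)
print(f"break year = {int(break_year)}, F = {f_stat:.1f} (critical value 2.80)")
```

To look for a second abrupt change, the same function can be applied again to the sub-series after the first break, which is one reading of why two critical values with different degrees of freedom are quoted.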

**Figure 7.** Abrupt change in average values of *ETo* during the growing seasons for cotton at Alar of Northwest China. (**a**) The first abrupt-change point; (**b**) The second abrupt-change point.

The abrupt-change years detected by the M-K method are shown in Figures 5–7. Over the past 60 years (1957–2017), the first abrupt change occurred in 1995–1999 and the second in 2006–2011. Around these years, vegetation cover changed greatly because of climate change and human activities. In 1987, the climate in Xinjiang shifted from warm and dry to warm and humid, and the vegetation coverage changed [23]. To study the interactions among climate change, land cover and human activities more effectively, the reasons for the abrupt changes in *ETo* in Xinjiang need to be understood.

#### **4. Conclusions**

The temporal characteristics of *ETo* in Xinjiang, China, from 1957 to 2017 were revealed using the M-K nonparametric trend and abrupt-change tests and a two-phase linear regression model. The conclusions are as follows.

1. The annual average *ETo* of the Xinjiang region shows both vertical and latitudinal zonality. It gradually decreases with rising altitude, and it changes significantly from southern Xinjiang to northern Xinjiang. Among the different landscapes, high values of *ETo* occur in desert areas and low values occur in mountainous areas. Large-scale atmospheric circulation, location and altitude are complex factors that influence the spatial distribution of the annual average *ETo* in Xinjiang.

2. During the period from 1957 to 2017, *ETo* showed a downward trend. The first abrupt change occurred in 1995–1999 and the second in 2006–2011. Among the three landscapes, *ETo* decreased fastest in desert areas, followed by oasis areas, and slowest in mountainous areas.

3. The M-K analysis and two-phase linear regression show that 48 sites have experienced abrupt changes in *ETo*. The annual average *ETo* in Xinjiang, China, showed a downward trend from 1957 to 1996, an obvious upward trend from 1997 to 2008, and a significant downward trend from 2009 to 2017.

4. The annual average *ETo* shows obvious abrupt changes for Xinjiang as a whole, southern Xinjiang, eastern Xinjiang, and the oasis and desert areas. For the average *ETo* in the cotton growing season, obvious abrupt changes occur for Xinjiang as a whole, northern Xinjiang, southern Xinjiang, eastern Xinjiang, and the oasis and desert areas. Because human activities have greatly changed the land cover in this area, the relationship between land use and cover and *ETo* change requires further study.

**Author Contributions:** Conceptualization, Y.J., X.S. and N.S.; methodology, X.S., Y.J. and N.S.; investigation, Y.J., X.S. and N.S.; data curation, Y.J. and N.S.; formal analysis, X.S., R.Y., Y.J. and N.S.; writing—original draft preparation, Y.J., X.S. and R.Y.; writing—editing, N.S. and X.S.; supervision, X.S. and N.S.; project administration, Y.J.; funding acquisition, Y.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was jointly supported by the National Natural Science Foundation of China (51790534), and the China Scholarship Council (No. 201703250034).

**Institutional Review Board Statement:** Not applicable; this study did not involve humans or animals.

**Informed Consent Statement:** Not applicable; this study did not involve humans.

**Data Availability Statement:** The meteorological data are openly available in a public repository (the National Meteorological Information Center of China Meteorological Administration).

**Acknowledgments:** Thanks to the National Meteorological Information Center of China Meteorological Administration for offering the meteorological data. This study was supported by the Agricultural Science and Technology Innovation Program (ASTIP) of Chinese Academy of Agricultural Sciences. Special thanks are given to the anonymous reviewers for their constructive comments that improved this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Climate Change Affects the Utilization of Light and Heat Resources in Paddy Field on the Songnen Plain, China**

**Ennan Zheng 1,†, Mengting Qin 1,†, Peng Chen 2, Tianyu Xu 1,\* and Zhongxue Zhang 3,\***


**\*** Correspondence: 2021007@hlju.edu.cn (T.X.); zhangzhongxue@163.com (Z.Z.)

† These authors contributed equally to this work.

**Abstract:** Efficient utilization of light and heat resources is an important part of cleaner production. Exploring how the light and heat resource utilization potential of paddy fields will change under future climate change is essential for making full use of the potential of rice varieties and ensuring high-efficiency, high-yield, and high-quality rice production, yet such studies have seldom been conducted. In our study, a process-based crop model (CERES-Rice) was calibrated and validated based on experiment data from the Songnen Plain of China, and then driven by multiple global climate models (GCMs) from phase six of the Coupled Model Intercomparison Project (CMIP6) to predict rice growth period, yield, and light and heat resources utilization efficiency under future climate change conditions. The results indicated that the rice growth period would be shortened, especially under the high emission scenario (SSP585). Rice yield would increase slightly under the low and medium emission scenarios (SSP126 and SSP245) but decrease significantly under the high emission scenario (SSP585) in the long term (the 2080s) relative to the baseline of 2000–2019. The light and temperature resources utilization (ERT), light utilization efficiency (ER), and heat utilization efficiency (HUE) were selected as the light and heat resources utilization evaluation indexes. Compared with the base period, the mean ERT in the 2040s, 2060s, and 2080s changed by −6.46%, −6.01%, and −6.03% under SSP126, respectively. Under SSP245, the mean ERT changed by −7.89%, −8.41%, and −8.27%, respectively. Under SSP585, the mean ERT changed by −6.88%, −13.69%, and −28.84%, respectively. The ER would increase slightly, except for the 2080s under the high emission scenario. Moreover, the HUE would decrease compared with the base period. The analysis showed that the most significant meteorological factor affecting rice growth was temperature. Furthermore, under future climate conditions, optimizing the sowing date could make full use of climate resources to improve rice yield and light and heat resource utilization indexes, which is of great significance for agricultural cleaner production in the future.

**Keywords:** GCMs; CERES-rice model; climate change; rice yield; light and heat resource utilization

#### **1. Introduction**

Global climate change, which affects agricultural production and human health, is a central issue of constant concern [1]. It has been estimated that global warming will reach 1.5 °C in the near-term [2]. Agriculture is very sensitive to climate change and is also one of the industries most affected by climate change [3,4]. The United Nations Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6) on agriculture extends from crop production systems to food supply systems, and evidence of the adverse effects of climate change on crop production is strengthening [2]. Greenhouse gas emissions have a significant impact on climate warming [5–7]. Anthropogenic warming hampers crop yield increase. Moreover, crop yield decreases as surface O3 concentrations increase, and CH4 emissions exacerbate these adverse effects [8]. Climate change will increase pressure on food production, especially in vulnerable areas [2]. Therefore, for ensuring global food security, assessing the impacts of future climate change on crop growth is one of the most important issues in the 21st century [9].

**Citation:** Zheng, E.; Qin, M.; Chen, P.; Xu, T.; Zhang, Z. Climate Change Affects the Utilization of Light and Heat Resources in Paddy Field on the Songnen Plain, China. *Agriculture* **2022**, *12*, 1648. https://doi.org/10.3390/agriculture12101648

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 10 August 2022 Accepted: 4 October 2022 Published: 9 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

<sup>1</sup> School of Hydraulic and Electric Power, Heilongjiang University, Harbin 150080, China

Rice is a staple food for the global population, and its production is key to global food security. Stable growth of rice production has been an issue in achieving food security, especially in developing countries [10]. A slight decline in rice productivity will have a significant impact on global food security [11]. The Songnen Plain is an important rice producer in China, with its vast area and large inter-annual variability of heat and precipitation. Climate change changes the conditions of light, temperature, and water during the growth and development of crops, affecting the allocation of their heat, radiation, and water resources [12]. The light and heat resources utilization of rice is a decisive factor affecting its yield, and the light and heat resources utilization in a certain time and space range determines the production potential of the agricultural system [13]. Efficient utilization of local light and heat resources is of great importance to provide full play to the potential of crop varieties and ensure high-efficient, high-yield, and high-quality rice production. However, most of the previous studies only focused on crop phenology and yield changes and rarely in terms of light and heat resources utilization. Therefore, it is necessary to analyze the response of light and heat resource utilization efficiency of rice to future climate change on the Songnen Plain and ensure the rice supply.

At this stage, field experiments cannot simulate climate change, such as temperature, precipitation, and solar radiation very well. The decision support system for agrotechnology transfer (DSSAT) can simulate the process of crop growth and yield formation, soil and crop water balance, nutrient and carbon dynamics, etc. Research and application on DSSAT have focused on yield prediction, crop breeding, land use, water and fertility management, climate change, and other areas. It is one of the most widely used crop modeling systems at present [14]. In the context of climate change, the application of crop growth models to study the impacts of climate conditions on crop production status and yield in historical periods has become quite extensive and mature [15]. The use of global climate models (GCMs) or regional climate models (RCMs) to construct future climate change scenarios and then coupled with crop growth models has developed into an important tool to assess the impacts of future climate change on agricultural production [16–18]. Kang et al. used an enhanced soil and water assessment tool (SWAT) model in combination with five global climate models to assess the potential impacts of climate change on the yield of two major crops (i.e., potato and barley) in Western Canada [18]. Rosenzweig et al. used seven global grid crop models (GGCMs) combined with five global climate models to analyze crop responses to climate change, showing strong negative effects of climate change, particularly at higher levels of warming and lower latitudes [17]. Crop model yield projections derived from averaging multiple GCM ensembles and different shared socioeconomic pathway and representative concentration (SSP) scenarios can provide a more reliable assessment of climate change impacts.

In our study, we used a calibrated and validated crop model, the CERES-Rice model, driven by future climate data from six GCMs for phase six of the coupled model intercomparison project (CMIP6) under the scenario of three different shared socioeconomic pathway and representative concentration pathway (SSP), including SSP126, SSP245, and SSP585, to project impacts of future climate change on rice production on the Songnen Plain of China. The objectives of our study were to (1) evaluate the performance of the CERES-Rice model in simulating the rice growth on the Songnen Plain of China, (2) explore the effects of future climate change on rice phenology, yield, and the light and heat resources utilization efficiency in the study area, and (3) investigate the ability to optimize sowing dates to cope with the effects of climate change on rice growth.

#### **2. Materials and Methods**

#### *2.1. Study Site*

The Songnen Plain is located in the central and western part of Northeast China (Figure 1). We selected a representative site on the plain, the Qing'an National Irrigation Experimentation Key Station (46°52′41″ N, 127°30′04″ E), for our study. The site lies in a typical cold black soil area with a cold temperate continental monsoon climate. The annual frost-free period is 128 days, the average annual temperature is 2.5 °C, and the average annual rainfall is 566 mm, concentrated mostly in July and August. The geographical environment and natural resources of the area are favorable for crop growth, and the crop growing period is 156–171 days.

**Figure 1.** Location site.

#### *2.2. Field Experimental Data*

The field trials were conducted for 2 years with the rice cultivar "Suijing No.18". The rice was sown on 17 April at a planting density of 24 plants/m² with a row spacing of 30 cm. Each test plot covered 100 m² (10 × 10 m) and received N (110 kg/hm²), P₂O₅ (45 kg/hm²), and K₂O (80 kg/hm²), split as base fertilizer/tillering fertilizer/heading fertilizer in the ratio 5:3:2. Three irrigation methods were applied: control irrigation (CI), wet irrigation (WI), and flood irrigation (FI) [19,20]. The observation data included phenology dates and yield, which were used to calibrate and validate the CERES-Rice model. The soil parameters of the study site were obtained from the 1:1 million soil data provided by the Nanjing Soil Institute of the Second National Land Survey in the Harmonized World Soil Database (HWSD) and from the empirical data automatically generated by the CERES-Rice model. They include soil profile characteristics and soil physical and chemical properties, i.e., soil color, hydrology group, bulk density, organic carbon, soil texture of each layer (clay, silt, and stones), soil nitrogen content, pH in water, etc.

#### *2.3. Climate Data*

The historical observed meteorological data of the study site were obtained from the National Meteorological Information Center-China Meteorological Data Network (http://data.cma.cn/, accessed on 1 March 2022), including four meteorological indicators: daily maximum and minimum temperature (°C), precipitation (mm), and solar radiation (MJ/m², calculated from sunshine hours using the Ångström-Prescott formula).

Based on the results of an existing study evaluating the CMIP6 models, six GCMs (Table 1) that were more effective in simulating the climate of the study site were selected [21]. Compared with CMIP5, CMIP6 improved projections of climate features, such as extreme temperatures and precipitation [22,23]. Future climate data were derived from the monthly-scale meteorological output of the multi-model ensemble (MME) average under three combined shared socioeconomic pathway and representative concentration pathway scenarios of CMIP6, namely SSP126, SSP245, and SSP585 (https://esgf-node.llnl.gov/projects/cmip6/, accessed on 17 April 2022). The data include four meteorological indicators (daily maximum and minimum temperature, precipitation, and solar radiation) for three future periods: 2031–2050 (2040s), 2051–2070 (2060s), and 2071–2090 (2080s). The monthly climate projections from the GCMs were down-scaled to the study site using an inverse distance-weighted (IDW) interpolation method. Bias correction was then conducted by transforming the resulting monthly site data using functions obtained from analyzing the observed and GCM data for the period of 1961–2000. Daily climate variables were generated from the spatially down-scaled monthly data with the WGEN stochastic weather generator [24].
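The IDW step can be illustrated with a short Python sketch. The power parameter of 2, the use of planar distance in degrees, the function name `idw`, and the grid-cell values are all assumptions made for illustration; the text does not state which settings were used.

```python
import numpy as np

def idw(site_xy, grid_xy, grid_values, power=2.0):
    """Inverse distance-weighted estimate at site_xy from surrounding grid points.

    Distances are planar distances in degrees, a simplification that is
    reasonable only for grid cells close to the site.
    """
    site = np.asarray(site_xy, dtype=float)
    pts = np.asarray(grid_xy, dtype=float)
    vals = np.asarray(grid_values, dtype=float)
    dist = np.hypot(pts[:, 0] - site[0], pts[:, 1] - site[1])
    if np.any(dist == 0.0):                 # site coincides with a grid point
        return float(vals[np.argmin(dist)])
    weights = 1.0 / dist ** power
    return float(np.sum(weights * vals) / np.sum(weights))

# Hypothetical monthly mean temperatures (deg C) at four GCM grid-cell centres
# around the study site (longitude, latitude in decimal degrees).
site = (127.50, 46.88)
cells = [(127.0, 46.5), (128.0, 46.5), (127.0, 47.5), (128.0, 47.5)]
temps = [21.4, 21.0, 20.3, 20.1]
print(f"IDW estimate at the site: {idw(site, cells, temps):.2f} deg C")
```

Nearer cells receive larger weights, so the estimate is always bounded by the smallest and largest surrounding values.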

**Table 1.** Information on the six global climate models (GCMs) selected in our study.


Statistical down-scaling techniques have been widely used to provide daily climate data for driving crop models when assessing the impacts of future climate change on agricultural systems in different countries and regions [25–28]. We used the statistical down-scaling model NWAI-WG provided by Liu and Zuo (2012). The main advantage of this statistical down-scaling approach, especially compared with dynamic down-scaling, is that it can be easily applied to any location where long-term daily historical climate records are available [29].

#### *2.4. Model Simulations*

#### 2.4.1. CERES-Rice Model

The CERES-Rice model used in this study is included in decision support system for agrotechnology transfer (DSSAT) version 4.7.5 [30]. The CERES-Rice model is a processbased crop model driven by daily climate data (daily maximum and minimum temperature, precipitation, solar radiation). It can simulate rice growth, development, leaf area index, and dry matter content. Moreover, it can simulate soil water balance and nitrogen balance. Minimum input data include weather, soil, and crop variety genetic parameters. The CERES-Rice model has been widely tested and applied in many countries, including China [31–33]. In our study, the CERES-Rice model from DSSAT version 4.7.5 (https://get.dssat.net/, accessed on 21 December 2021) was used to simulate rice growth and development.

#### 2.4.2. Model Calibration and Validation

During model calibration and validation, the genetic parameters of the cultivar were adjusted using the trial-and-error method and the generalized likelihood uncertainty estimation (GLUE) tuning tool that comes with the DSSAT system, so that the CERES-Rice simulations of anthesis date, maturity date, and yield were close to the observed values. The parameters were calibrated using the trial data for anthesis, maturity, and yield from 1 year and validated using the trial data from the other year.

#### 2.4.3. Simulation Scenarios

Rice growth was simulated with the CERES-Rice model. Irrigation was set to automatic when required (i.e., when the moisture content was <50% of water capacity at 30 cm depth, the rice was irrigated with 10 mm of water), the harvest time was set at maturity, and the other field management practices were the same as in the field experiment. Rice growth was also simulated under different future sowing dates: 27 March, 3 April, 10 April, 17 April, 24 April, 1 May, and 8 May. The transplanting dates were 35 days after the sowing dates.

#### *2.5. Indicator Calculation Method*

2.5.1. Statistical Indices for Model Evaluation

To evaluate the CERES-Rice model performance in simulating rice growth on the Songnen Plain, we used the following three evaluation metrics: (1) Root mean squared error (*RMSE*), (2) normalized root mean squared error (NRMSE), and (3) determination coefficient (*R*2):

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} (S\_i - O\_i)^2}{n}} \tag{1}$$

$$\text{NRMSE}(\%) = \frac{RMSE}{\overline{O}} \times 100 \tag{2}$$

$$R^2 = \frac{\sum\_{i=1}^{n} \left( S\_i - \overline{S} \right)^2}{\sum\_{i=1}^{n} \left( O\_i - \overline{O} \right)^2} \tag{3}$$

where *Si* is the simulated value; *Oi* is the observed value; *S̄* and *Ō* are the average values of the simulated and observed values, respectively; *n* is the number of samples.
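Equations (1) and (2) translate directly into code. The sketch below is illustrative only: the function names and the simulated and observed yields are hypothetical, not data from the field trials.

```python
import numpy as np

def rmse(sim, obs):
    """Root mean squared error, Equation (1)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def nrmse(sim, obs):
    """Normalized RMSE in percent of the observed mean, Equation (2)."""
    return rmse(sim, obs) / float(np.mean(obs)) * 100.0

# Hypothetical simulated and observed yields (kg/hm2), for illustration only.
simulated = [8000.0, 8500.0, 9000.0]
observed = [8200.0, 8400.0, 9100.0]
print(f"RMSE = {rmse(simulated, observed):.1f} kg/hm2, "
      f"NRMSE = {nrmse(simulated, observed):.2f}%")
```

RMSE keeps the units of the variable being evaluated, whereas NRMSE is unitless, which makes it easier to compare the model's skill for phenology dates and for yield.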

#### 2.5.2. Light and Heat Resources Utilization Evaluation Index

In our study, we computed the following indices to evaluate the light and heat resources utilization efficiency:

(1) Light and temperature resources utilization

$$E\_{RT} = \frac{Y}{Y\_P} \times 100\% \tag{4}$$

$$Y\_P = Y\_1 \cdot f(t) \tag{5}$$

$$Y\_1 = \varepsilon (1 - \alpha)(1 - \beta)(1 - \gamma)(1 - \rho)(1 - \omega) f(L) E \phi (1 - \lambda)^{-1}(1 - \chi)^{-1}H^{-1}\sum Q \times 10{,}000\tag{6}$$

$$f(t) = \begin{cases} 0, & t < t\_{\min} \text{ or } t > t\_{\max} \\ \frac{t - t\_{\min}}{t\_{opt} - t\_{\min}}, & t\_{\min} \le t < t\_{opt} \\ \frac{t\_{\max} - t}{t\_{\max} - t\_{opt}}, & t\_{opt} \le t \le t\_{\max} \end{cases} \tag{7}$$

where E<sub>*RT*</sub> is the light and temperature resources utilization (%); *Y* is the crop yield (kg/hm<sup>2</sup>); *Y*<sub>*P*</sub> is the light and temperature potential productivity (kg/hm<sup>2</sup>), i.e., the upper limit of yield per unit area per unit time determined by local solar radiation and temperature under optimum conditions; and *Y*<sub>1</sub> is the photosynthetic potential productivity (kg/hm<sup>2</sup>), i.e., the upper limit of yield per unit area per unit time determined solely by local solar radiation under optimum conditions. ∑*Q* is the total solar radiation received per unit area (MJ/m<sup>2</sup>); *H* is the dry-weight calorific value of the crop, taken as 16.9 MJ/kg for rice; and *ε* is the ratio of photosynthetically active radiation to total radiation, taken as 0.49. The symbols *α*, *β*, *γ*, *ρ*, *ω*, *ϕ*, *λ*, *χ*, *f*(*L*), and *E* are photosynthetic production potential parameters: *α* is the reflectance of the plant population (0.06); *β* is the light transmittance of a dense plant population (0.08); *γ* is the proportion of light above the light saturation point (0.05); *ρ* is the proportion of radiation intercepted by non-photosynthetic organs of the crop (0.10); *ω* is the respiratory loss rate (0.33); *ϕ* is the quantum efficiency of photosynthesis (0.22); *λ* is the inorganic nutrient content of the plant (0.08); *χ* is the water content (0.14); *f*(*L*) is the correction value for crop leaf area dynamics (0.56); and *E* is the economic coefficient (0.45).
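The chain from Equation (4) to Equation (7) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the parameter values are those quoted above, but the function names, the use of a single representative temperature for the whole period, and the values of `t_opt` and `t_max` in the example are hypothetical.

```python
# Parameter values quoted in the text for rice
EPSILON = 0.49   # share of photosynthetically active radiation in total radiation
ALPHA = 0.06     # reflectance of the plant population
BETA = 0.08      # light transmittance of the plant population
GAMMA = 0.05     # share of light above the light saturation point
RHO = 0.10       # radiation intercepted by non-photosynthetic organs
OMEGA = 0.33     # respiratory loss rate
PHI = 0.22       # quantum efficiency of photosynthesis
LAMBDA = 0.08    # inorganic nutrient content
CHI = 0.14       # water content
F_L = 0.56       # leaf area dynamics correction
E_COEF = 0.45    # economic coefficient
H = 16.9         # dry-weight calorific value, MJ/kg


def photosynthetic_potential(total_radiation):
    """Y1 in kg/hm2 from total solar radiation in MJ/m2 (Equation (6))."""
    losses = (1 - ALPHA) * (1 - BETA) * (1 - GAMMA) * (1 - RHO) * (1 - OMEGA)
    dry_matter = 1.0 / ((1 - LAMBDA) * (1 - CHI) * H)
    return EPSILON * losses * F_L * E_COEF * PHI * dry_matter * total_radiation * 10_000


def temperature_factor(t, t_min, t_opt, t_max):
    """Piecewise-linear temperature correction f(t) (Equation (7))."""
    if t < t_min or t > t_max:
        return 0.0
    if t < t_opt:
        return (t - t_min) / (t_opt - t_min)
    return (t_max - t) / (t_max - t_opt)


def utilization_ert(crop_yield, total_radiation, t, t_min, t_opt, t_max):
    """E_RT in percent (Equations (4) and (5)); crop_yield in kg/hm2."""
    y_p = photosynthetic_potential(total_radiation) * temperature_factor(t, t_min, t_opt, t_max)
    if y_p <= 0:
        raise ValueError("potential productivity is zero at this temperature")
    return crop_yield / y_p * 100


# Hypothetical inputs: t_opt = 30 and t_max = 40 are assumed, not from the study
print(utilization_ert(9000.0, 2500.0, t=19.0, t_min=10.0, t_opt=30.0, t_max=40.0))
```

With these parameter values, each MJ/m<sup>2</sup> of total radiation corresponds to roughly 10 kg/hm<sup>2</sup> of photosynthetic potential productivity before the temperature correction is applied.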

(2) Light utilization efficiency

$$\mathrm{E}_R = \frac{H \times Y}{\sum_{i=SD}^{MD} PAR} \times 100\% \tag{8}$$

where E<sub>*R*</sub> is the light utilization efficiency, i.e., the ratio of the energy stored in the harvested crop per unit of land area over the growth period to the photosynthetically active radiation received by that unit area during the same period; ∑<sub>*i*=*SD*</sub><sup>*MD*</sup> *PAR* is the photosynthetically active radiation during crop growth (sowing to maturity), in MJ/m<sup>2</sup>, i.e., the portion of the solar spectrum that green plants can use for photosynthesis, which accounts for 49% of total solar radiation; *SD* is the sowing date; and *MD* is the harvest date. The total solar radiation can be obtained from Equations (9)–(14):

$$R_s = \left(a_s + b_s \frac{n}{N}\right) R_a \tag{9}$$

$$R_a = \frac{24 \times 60}{\pi} G_{sc}\, d_r \left[\omega_s \sin(\varphi) \sin(\delta) + \cos(\varphi) \cos(\delta) \sin(\omega_s)\right] \tag{10}$$

$$d_r = 1 + 0.033 \cos\left(\frac{2\pi}{365}J\right) \tag{11}$$

$$\delta = 0.409 \sin\left(\frac{2\pi}{365}J - 1.39\right) \tag{12}$$

$$\omega_s = \arccos\left[-\tan(\varphi)\tan(\delta)\right] \tag{13}$$

$$N = \frac{24}{\pi} \omega_s \tag{14}$$

where *R*<sub>*s*</sub> is the solar radiation (MJ/m<sup>2</sup>/d); *a*<sub>*s*</sub> and *b*<sub>*s*</sub> are empirical parameters, where *a*<sub>*s*</sub> indicates the fraction of astronomical radiation reaching the Earth's surface on overcast days and *b*<sub>*s*</sub> indicates the transport properties (aerosol density) of the cloud-free atmosphere, taken as 0.19 and 0.54, respectively; *n* is the actual insolation duration (h); *N* is the maximum possible insolation duration (h); *n*/*N* is the relative insolation duration, also called the insolation percentage; *R*<sub>*a*</sub> is the astronomical radiation (MJ/m<sup>2</sup>/d); *π* is the ratio of a circle's circumference to its diameter; *G*<sub>*sc*</sub> is the solar constant, taken as 0.082 MJ/(m<sup>2</sup>·min); *d*<sub>*r*</sub> is the inverse relative Earth–Sun distance; *ω*<sub>*s*</sub> is the sunset hour angle (rad); *φ* is the latitude (rad); *δ* is the solar declination (rad); and *J* is the day of the year.
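Equations (9)–(14) can be combined into a short routine that estimates daily solar radiation from latitude, day of year, and sunshine duration. The sketch below uses the coefficients quoted above; the function names and the example site (47° N, day 180, 9 h of sunshine) are hypothetical.

```python
import math

G_SC = 0.082           # solar constant, MJ/(m2 min)
A_S, B_S = 0.19, 0.54  # empirical coefficients quoted in the text


def astronomical_radiation(lat_rad, day_of_year):
    """Return (R_a in MJ/m2/d, N in h) from Equations (10)-(14).

    Valid only where the sun rises and sets on the given day
    (the arccos argument must lie within [-1, 1]).
    """
    d_r = 1 + 0.033 * math.cos(2 * math.pi / 365 * day_of_year)
    delta = 0.409 * math.sin(2 * math.pi / 365 * day_of_year - 1.39)
    omega_s = math.acos(-math.tan(lat_rad) * math.tan(delta))
    r_a = (24 * 60 / math.pi) * G_SC * d_r * (
        omega_s * math.sin(lat_rad) * math.sin(delta)
        + math.cos(lat_rad) * math.cos(delta) * math.sin(omega_s)
    )
    n_max = 24 / math.pi * omega_s
    return r_a, n_max


def solar_radiation(lat_rad, day_of_year, sunshine_hours):
    """R_s in MJ/m2/d from Equation (9)."""
    r_a, n_max = astronomical_radiation(lat_rad, day_of_year)
    return (A_S + B_S * sunshine_hours / n_max) * r_a


# Hypothetical site and day, for illustration only
print(solar_radiation(math.radians(47.0), 180, 9.0))
```

Summing the daily values from sowing to maturity and multiplying by 0.49 gives the photosynthetically active radiation used in Equation (8).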

(3) Heat utilization efficiency

$$\mathrm{HUE} = \frac{Y}{\sum_{i=SD}^{MD} \left(T_{i,mean} - t_{\min}\right)} \tag{15}$$

where HUE is the heat utilization efficiency, in kg/(hm<sup>2</sup>·°C·d); *T*<sub>*i*,*mean*</sub> is the daily average temperature on day *i*; and ∑<sub>*i*=*SD*</sub><sup>*MD*</sup>(*T*<sub>*i*,*mean*</sub> − *t*<sub>*min*</sub>) is the effective cumulative temperature during crop growth above the biological lower limit temperature *t*<sub>*min*</sub>, in °C·d. For rice, *t*<sub>*min*</sub> is 10 °C.
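Equations (8) and (15) can be sketched in a few lines. Two choices here are assumptions rather than statements from the text: the yield is converted from kg/hm<sup>2</sup> to kg/m<sup>2</sup> so that the numerator and denominator of Equation (8) share the same area unit, and days colder than *t*<sub>*min*</sub> contribute zero to the cumulative temperature. The function names and sample inputs are hypothetical.

```python
H = 16.9             # dry-weight calorific value of rice, MJ/kg
PAR_FRACTION = 0.49  # share of PAR in total solar radiation


def light_utilization_efficiency(crop_yield, daily_solar_radiation):
    """E_R in percent; crop_yield in kg/hm2, radiation in MJ/m2/d."""
    par_total = PAR_FRACTION * sum(daily_solar_radiation)  # MJ/m2
    yield_per_m2 = crop_yield / 10_000                     # assumed unit conversion
    return H * yield_per_m2 / par_total * 100


def heat_utilization_efficiency(crop_yield, daily_mean_temperature, t_min=10.0):
    """HUE in kg/(hm2 degC d); days below t_min add nothing (assumption)."""
    effective_sum = sum(max(t - t_min, 0.0) for t in daily_mean_temperature)
    if effective_sum == 0:
        raise ValueError("no effective temperature accumulated")
    return crop_yield / effective_sum


# Hypothetical 150-day season, for illustration only
radiation = [17.0] * 150     # MJ/m2/d
temperature = [20.0] * 150   # degC
print(light_utilization_efficiency(9000.0, radiation))
print(heat_utilization_efficiency(9000.0, temperature))
```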

#### *2.6. Data Analysis*

A stepwise multiple linear regression model and correlation analysis were used to quantify the effects of climate factors, including mean temperature (average of maximum and minimum temperatures), precipitation, and solar radiation on future rice yield, the light and temperature resources utilization, light utilization efficiency, and heat utilization efficiency. The Pearson correlation coefficient was used for the correlation analysis. The framework of our study is shown in Figure 2.
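For reference, the Pearson correlation coefficient used in the correlation analysis can be computed as in the sketch below. The function name and the paired values are hypothetical; the stepwise regression itself is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    var_x = sum((a - x_mean) ** 2 for a in x)
    var_y = sum((b - y_mean) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a series with zero variance has no correlation")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical changes in mean temperature (degC) and growth duration (days)
delta_t = [1.1, 1.5, 2.0, 2.8, 3.6]
delta_days = [-5.0, -6.5, -9.0, -15.0, -24.0]
print(pearson_r(delta_t, delta_days))
```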

**Figure 2.** The framework of analysis.

#### **3. Results**

*3.1. Calibration and Validation of the CERES-Rice Model*

The calibrated CERES-Rice model was able to simulate rice phenology and yield reasonably well in the study area. The CERES-Rice model was calibrated and validated using 2 years of field trial data, including three irrigation methods. The calibrated parameters for "Suijing 18" are shown in Table 2, including phenology and growth parameters.


**Table 2.** The calibration parameters for "Suijing 18".

The validation results showed that the simulated dates of anthesis and maturity were consistent with the observed dates (Figure 3a). The error in simulated anthesis and maturity dates was generally within 5 days. The normalized root mean square error (NRMSE) between simulated and observed anthesis and maturity dates was 3%, and the *R*<sup>2</sup> value was 0.968. The simulated and observed yields were also in general agreement (Figure 3b), with NRMSE = 3.4% and *R*<sup>2</sup> close to 1.0. The validation therefore indicated that the CERES-Rice model could effectively simulate rice growth and development on the Songnen Plain of China.

**Figure 3.** Verification results for the simulated and observed phenology and yield in the study field. The dots are simulated and observed values, the red line is the fitted line.

#### *3.2. Projected Climate Change in Rice Growth Period in the Future*

The ensemble average of six GCMs under the SSP126, SSP245, and SSP585 scenarios was used to reduce the uncertainty of future climate change projections. Compared with the baseline of 2000–2019, the ensemble-mean daily maximum temperature in the 2040s, 2060s, and 2080s increased by 1.06, 1.26, and 1.25 °C under SSP126; by 1.31, 1.64, and 1.92 °C under SSP245; and by 1.29, 2.36, and 3.54 °C under SSP585, respectively (Figure 4a). The ensemble-mean daily minimum temperature in the 2040s, 2060s, and 2080s increased by 1.81, 1.28, and 1.28 °C under SSP126; by 1.40, 1.93, and 2.21 °C under SSP245; and by 1.55, 2.95, and 4.38 °C under SSP585, respectively, relative to the baseline (Figure 4b). In general, the ensemble-mean daily temperature showed an increasing trend during the rice growth period in the future. The largest temperature increase occurred under the SSP585 scenario and the smallest under the SSP126 scenario.

**Figure 4.** Projected changes in mean maximum and minimum temperature (◦C), solar radiation (MJ m−2), and annual precipitation (mm) during the rice growth period (April to September) for the 2040s, 2060s, and 2080s compared with 2000–2019 under the SSP126, SSP245, and SSP585 scenarios using six GCMs for the study station on the Songnen Plain of China. The box boundaries indicate the 25th and 75th percentiles; the black line and short line within the box mark the median and mean, respectively; and whiskers below and above the box indicate the 10th and 90th percentiles, respectively.

The daily solar radiation was also projected to increase relative to the baseline under all three SSPs (Figure 4c). In the 2040s, 2060s, and 2080s, the ensemble-mean solar radiation increased by 0.66, 0.86, and 0.87 MJ/m<sup>2</sup> under SSP126; by 0.46, 0.56, and 0.74 MJ/m<sup>2</sup> under SSP245; and by 0.32, 0.57, and 0.80 MJ/m<sup>2</sup> under SSP585, respectively. Solar radiation increased more under SSP126 than under SSP245 and SSP585. The multi-model mean changes in annual precipitation followed a pattern similar to that of the other climatic factors (Figure 4d). In the 2040s, 2060s, and 2080s, precipitation could increase by 40.71, 60.10, and 61.60 mm under SSP126; by 26.46, 58.94, and 64.87 mm under SSP245; and by 54.61, 69.73, and 95.35 mm under SSP585, respectively, relative to the baseline. Overall, the multi-model mean annual precipitation also increased in the future periods, most markedly under the SSP585 scenario.
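The changes reported in this section are ensemble means of the differences between each GCM projection and the baseline. A minimal sketch of that calculation is given below; the function name, the GCM labels, and all numbers are hypothetical placeholders rather than outputs of the six models used in the study.

```python
def ensemble_mean_change(baseline_value, gcm_projections):
    """Return (mean change, per-model changes) relative to a baseline value."""
    if not gcm_projections:
        raise ValueError("at least one GCM projection is required")
    changes = {name: value - baseline_value
               for name, value in gcm_projections.items()}
    mean_change = sum(changes.values()) / len(changes)
    return mean_change, changes


# Hypothetical growing-season mean maximum temperature (degC) for one period
baseline_tmax = 22.0
projections = {
    "GCM-A": 23.1, "GCM-B": 23.4, "GCM-C": 22.9,
    "GCM-D": 23.8, "GCM-E": 23.2, "GCM-F": 23.5,
}
mean_change, per_model = ensemble_mean_change(baseline_tmax, projections)
print(round(mean_change, 2), per_model)
```

Keeping the per-model changes alongside the mean makes it straightforward to draw the box plots in Figure 4, which show the spread across the six GCMs.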

#### *3.3. Impacts of Climate Change on Rice Phenology and Yield*

Climate change could directly affect the rice growth period, including anthesis and maturity, which in turn could affect rice yield. Our simulation results indicated that the rice growth period would be shortened in the future (Figure 5). In the 2040s, 2060s, and 2080s, the multi-model mean changes in anthesis date were −5.2, −4.2, and −5.5 days under SSP126; −6.55, −7.4, and −7.15 days under SSP245; and −7.15, −11.85, and −19.45 days under SSP585, respectively.

**Figure 5.** Simulated change in the rice growth period in the 2040s, 2060s, and 2080s under SSP126, SSP245, and SSP585 scenarios relative to 2000–2019 using six GCMs for Qing'an station on the Songnen Plain of China. The box boundaries indicate the 25th and 75th percentiles; the black line and dot within the box mark the median and mean, respectively; and whiskers below and above the box indicate mean −1.5 SD and mean +1.5 SD, respectively.

The maturity date would change by −5.15, −4.2, and −6.25 days under SSP126; −6.35, −7.55, and −9.4 days under SSP245; and −10.35, −17.05, and −26.8 days under SSP585, respectively. The greatest change in the rice growth period under the future climate scenarios was observed for SSP585, followed by SSP245, and the smallest for SSP126, which is consistent with the multi-model mean changes in temperature. Correlation analysis showed a significant negative correlation between rice maturity and temperature, with the absolute values of the correlation coefficients exceeding 0.9 (Figure 6).

**Figure 6.** Correlations between the projected rice yield, maturity, maximum leaf area index (LAIX), the light and heat resource utilization indices (including ERT, ER, and HUE), and climate indices (including Rad, Tmax, Tmin, and Prec). The gradient of legend color is the function of strength of the correlation; the color and the size of ellipse indicates the strength of the correlation.

Compared with the baseline of 2000–2019, the multi-model mean changes in rice yield in the 2040s, 2060s, and 2080s were +1.90%, +4.68%, and +2.46% under SSP126; +0.25%, +0.48%, and +0.80% under SSP245; and −1.60%, −3.80%, and −24.89% under SSP585, respectively (Figure 7). In general, rice yield responded differently under the different future scenarios, with a slight increase under SSP126 and SSP245 and a significant downward trend under SSP585. The greatest reduction in rice yield occurred in the 2080s under SSP585. We used stepwise multiple linear regression to quantify the relationship between rice yield and changes in climatic variables (mean temperature, precipitation, and solar radiation). The coefficient of determination (*R*<sup>2</sup>) was 0.717, i.e., temperature, precipitation, and solar radiation together explained 71.7% of the change in rice yield. As the regression coefficients in Table 3 show, rice yield was significantly positively correlated with solar radiation and negatively correlated with mean temperature, and it had a slight positive correlation with precipitation.

**Figure 7.** Simulated change in rice yield in the 2040s, 2060s, and 2080s under SSP126, SSP245, and SSP585 scenarios relative to 2000–2019 using six GCMs for Qing'an station on the Songnen Plain. The box boundaries indicate the 25th and 75th percentiles; the black line and dot within the box mark the median and mean, respectively; and whiskers below and above the box indicate mean + 1.5 SD and mean −1.5 SD, respectively.

**Table 3.** Coefficients of regression analysis of the impacts of climate change in the rice growth period (April to September) on changes in rice yield (Y), the light and temperature resources utilization (E<sub>*RT*</sub>), light utilization efficiency (E<sub>*R*</sub>), and heat utilization efficiency (HUE). Shown in the table are ΔY (kg·ha<sup>−1</sup>), ΔE<sub>*RT*</sub> (%), ΔE<sub>*R*</sub> (%), and ΔHUE (kg/(hm<sup>2</sup>·°C·d)) as functions of the change in mean temperature (Δ*T*<sub>*mean*</sub>, °C), precipitation (Δ*Prec*, mm), and solar radiation (ΔRad, MJ·m<sup>−2</sup>).


Note: \*, \*\*, and \*\*\* indicate significance at *p* < 0.05, *p* < 0.01, and *p* < 0.001, respectively.

#### *3.4. Impacts of Climate Change on the Light and Heat Resources Utilization*

Within a certain range, the utilization of light and heat resources by rice determines the production potential of an agricultural system. Research on light and heat resource utilization is therefore important, but it has received little attention to date, so we focused on this issue. In the baseline period (2000–2019), the calculated light and temperature resources utilization (E<sub>*RT*</sub>) reached 89.74%. In the 2040s, 2060s, and 2080s, the calculated E<sub>*RT*</sub> changes were −6.46%, −6.01%, and −6.03% under SSP126; −7.89%, −8.41%, and −8.27% under SSP245; and −6.88%, −13.69%, and −28.84% under SSP585, respectively (Figure 8a). In general, E<sub>*RT*</sub> showed a decreasing trend in the future, and the reduction was most apparent under SSP585. The light utilization efficiency (E<sub>*R*</sub>) was 1.39% in the baseline period; it decreased by 0.13% under SSP585 in the 2080s and increased slightly in the other scenarios and periods (Figure 8b). Compared with the baseline of 2000–2019, the calculated heat utilization efficiency (HUE) changes in the 2040s, 2060s, and 2080s were −0.17, −0.08, and −0.01 kg/(hm<sup>2</sup>·°C·d) under SSP126; −0.37, −0.40, and −0.39 kg/(hm<sup>2</sup>·°C·d) under SSP245; and −0.26, −0.74, and −2.00 kg/(hm<sup>2</sup>·°C·d) under SSP585, respectively (Figure 8c). The changes in HUE thus followed a pattern similar to that of E<sub>*RT*</sub>.

**Figure 8.** Simulated change in the light and temperature resources utilization, the light utilization efficiency, and heat utilization efficiency in the 2040s, 2060s, and 2080s under SSP126, SSP245, and SSP585 scenarios relative to 2000–2019 (baseline) using six GCMs for Qing'an station in the Songnen Plain.

Table 3 also gives the results of the multiple linear regression analysis relating the light and heat resource utilization indices of rice (E<sub>*RT*</sub>, E<sub>*R*</sub>, and HUE) to changes in climatic variables (mean temperature, precipitation, and solar radiation). The coefficients of determination (*R*<sup>2</sup>) were 0.779, 0.331, and 0.697, respectively. All three indices showed a significant negative correlation with mean temperature and no significant correlation with precipitation. E<sub>*R*</sub> had a slight negative correlation with solar radiation, and HUE had a slight positive correlation with solar radiation.

#### *3.5. The Impact of Different Sowing Dates on Rice Yield and Light and Heat Resources Utilization*

To adapt rice growth to future climate change, we adjusted the sowing date to change the efficiency with which rice uses light and heat resources. We simulated the light and temperature resources utilization (E<sub>*RT*</sub>), light utilization efficiency (E<sub>*R*</sub>), and heat utilization efficiency (HUE) for different sowing dates under two typical scenarios, SSP245 and SSP585 (Figure 9). The results showed that early sowing improved E<sub>*RT*</sub> under all emission scenarios, whereas light utilization efficiency responded differently and benefited from an appropriately delayed sowing date. Early sowing increased HUE under the SSP126 and SSP245 scenarios, while the effect of the sowing date on HUE under the SSP585 scenario differed between periods.

**Figure 9.** Simulated change in the light and heat resource utilization indices under different sowing dates in the 2040s, 2060s, and 2080s under SSP126, SSP245, and SSP585 scenarios based on the six GCMs compared with sowing on 17 April 2000–2019 for Qing'an station in the Songnen Plain.

In the context of climate change, adjusting the sowing date could allow rice to use light and heat resources more efficiently during the growth period and thus contribute to rice yield. From the rice yield changes under different sowing dates in the 2040s, 2060s, and 2080s relative to 2000–2019, we found that advancing or delaying sowing could mitigate the negative effects of climate change to some extent but could not fully offset them, especially under the SSP585 scenario (Figure 10). Under the SSP126 and SSP245 scenarios, earlier sowing would increase rice yield in every period, while delayed sowing would lower it. In the 2040s, the optimum sowing date was before 27 March and would increase yield by 3.35% to 4.76%, while in the 2060s and 2080s, the optimum sowing date was around 3 April and would increase yield by 1.30% to 2.50% compared with sowing on 17 April. Under the SSP585 scenario, early sowing in the 2040s and 2060s could increase yield by 3.36% to 3.76% compared with the baseline sowing date of 17 April. In the 2080s, by contrast, an appropriate delay in sowing would have a positive impact on rice yield: sowing on 1 May would increase yield by 8.37% compared with sowing on 17 April.
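The comparison of sowing dates amounts to expressing each simulated yield as a percentage change relative to the 17 April baseline and selecting the largest. The sketch below illustrates this; the weekly offsets, the function names, and the yields are assumptions for illustration and do not come from the CERES-Rice runs.

```python
from datetime import date, timedelta


def candidate_sowing_dates(year, offsets_days=(-21, -14, -7, 0, 7, 14)):
    """Sowing dates shifted around the 17 April baseline (assumed weekly steps)."""
    base = date(year, 4, 17)
    return [base + timedelta(days=d) for d in offsets_days]


def best_sowing_date(yields_by_date, baseline_date):
    """Return (best date, yield change in percent relative to the baseline date)."""
    base_yield = yields_by_date[baseline_date]
    changes = {d: (y - base_yield) / base_yield * 100
               for d, y in yields_by_date.items()}
    best = max(changes, key=changes.get)
    return best, changes[best]


# Hypothetical simulated yields in kg/hm2 for one scenario and period
dates = candidate_sowing_dates(2045)
simulated_yields = dict(zip(dates, [9310.0, 9280.0, 9150.0, 9000.0, 8900.0, 8750.0]))
best, gain = best_sowing_date(simulated_yields, date(2045, 4, 17))
print(best.isoformat(), round(gain, 2))
```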

**Figure 10.** Simulated change in rice yield under different sowing dates in the 2040s, 2060s, and 2080s under SSP126, SSP245, and SSP585 scenarios based on the six GCMs compared with sowing on 17 April 2000–2019 for Qing'an station in the Songnen Plain.

#### **4. Discussion**

#### *4.1. Performance of the CERES-Rice Model*

The performance of a crop model needs to be validated before it can be applied to future simulations. Our calibration results indicated that the DSSAT-Rice model could simulate rice growth well in our study area (Figure 3). The simulated phenological periods and yield agreed with our experimental observations. Similar findings have been reported in previous studies [34–36]. For example, Boonwichai et al. [34] validated the DSSAT-Rice model in the Songkhram River Basin, Thailand, with calibration and validation *R*<sup>2</sup> values of 0.84 and 0.78, respectively, and simulated rice yield similar to the observed yield. Kontgis et al. [36] calibrated the DSSAT-Rice and DSSAT-Maize models using experimental data and observed close agreement between the observed and simulated values for anthesis, maturity, yield, biomass, and N uptake in both crops, with d-values of 0.89 to 0.99, indicating acceptable performance. Overall, our results showed that the calibrated and validated DSSAT-Rice model had an acceptable error and could adequately simulate the effects of climate change on rice growth on the Songnen Plain of China.

#### *4.2. Impacts of Future Climate Change on Rice Growth*

In our study, the future climate variables were averaged over the outputs of six GCMs. We found that under the different future scenarios, the daily maximum temperature increased by 1.06–3.54 °C, the daily minimum temperature increased by 1.28–4.38 °C, the average daily solar radiation increased by 0.32–0.87 MJ/m<sup>2</sup>, and the average annual precipitation increased by 26.46–95.35 mm (Figure 4). These projections are consistent with many previous studies [37–41]. Arunrat et al. [37] used GCMs to predict future climate change in Thailand and found increases in precipitation and in maximum and minimum temperatures under the SSP245 and SSP585 scenarios compared with the historical period. The maximum temperature increases under SSP245 and SSP585 were 0.7–2.2 °C and 0.7–3.9 °C, respectively, and the minimum temperature increases were 0.7–2.1 °C and 0.8–3.8 °C, respectively. Precipitation increased by 2.2–3.9% and 1.8–5.8%, respectively. Tan et al. [38] also showed an increasing trend in both temperature and solar radiation under future climate scenarios. Multi-model ensemble averaging is widely used in climate modeling to reduce the large bias and uncertainty introduced by model parameterization errors, model structure, assumptions, and input variables [42].

The predicted future climate change shortened the time to rice maturity to varying degrees, with the largest change in the 2080s under the SSP585 scenario, where the average shortening was 26.8 days (Figure 5). Many studies have indicated that future climate change would shorten the crop growing period [43–45]. Our simulations projected that mean rice yield would increase by 1.90–4.68% under SSP126 and by 0.25–0.80% under SSP245 in the future periods; under the SSP585 scenario, however, rice yield decreased by 1.60–24.89% (Figure 7). A related study predicted a gradual increase of 3.0–4.3% in rice yield in irrigated areas of Thailand under the SSP245 scenario, and a decrease of 6.0–17.7% in irrigated areas in the medium and long term under the SSP585 scenario [8]. Similar findings were reported for wheat in the Sichuan Basin: wheat yield was significantly higher under the SSP245 scenario and negatively affected under the SSP585 scenario [46]. Temperature and solar radiation are considered the main environmental factors affecting the yield potential of rice [47,48]. Consistent with this, our results showed that rice yield had a significant positive correlation with solar radiation and a significant negative correlation with temperature (Table 3).

Furthermore, we simulated and calculated E<sub>*RT*</sub>, E<sub>*R*</sub>, and HUE for the different scenarios in the future periods (Figure 8). E<sub>*RT*</sub> has rarely been used in previous studies. It treats the light and temperature potential productivity as the light-temperature resources available in a region, with the potential productivity calculated by the step-by-step correction method. E<sub>*R*</sub> and HUE are the more common evaluation metrics in assessments of light and heat resource utilization [49]. Our results showed a relatively significant downward trend in E<sub>*RT*</sub> and HUE under future climate scenarios. The main reason is the change in light-temperature production potential and effective cumulative temperature during the rice growth period caused by future increases in temperature and solar radiation. E<sub>*R*</sub> increased in the low and medium emission scenarios and decreased markedly in the high emission scenario in the 2080s, mainly because of changes in temperature and photosynthetically active radiation during the rice growth period.

#### *4.3. Adaptive Strategies for Rice Production in Response to Climate Change*

In rice production, a suitable sowing date can align the rice growth process with locally suitable water temperature and climate conditions; adjusting the sowing date is considered an effective strategy to slow the development of rice and avoid heat damage [50–52]. In this study, adjusting the sowing date allowed climate resources to be used more fully to increase rice yield, but the negative impact of climate change on rice yield could not be fully offset in the high emission scenario (Figures 9 and 10). More adaptation strategies are therefore needed. Some studies have shown that, with advances in technologies such as breeding and crop management, more diverse rice varieties could provide a greater buffer for rice production against environmental anomalies such as climate change [53].

#### *4.4. Uncertainty and Limitations of This Study*

In this study, we used the DSSAT-Rice model to evaluate changes in rice yield and in the utilization of light and heat resources under future climate change for the SSP126, SSP245, and SSP585 scenarios based on six GCMs. Different climate change projections may lead to different results. To reduce the uncertainty of climate projections, we selected six GCMs with better projection performance in the study area for ensemble averaging. Nevertheless, some uncertainties remain. First, the climate change scenarios and the impacts estimated from them are uncertain because of the inherent systematic biases of GCMs. Second, although the DSSAT-Rice model has been widely used in global climate change assessment studies, it still has uncertainties, and a single crop model may give overconfident results because it ignores uncertainty in crop model structure and parameters. Because existing crop models use different functional relationships to simulate crop growth processes, multi-model ensembles are often considered superior to single-model simulations, and future studies are recommended to use multiple models. Third, there may be some uncertainty related to varietal adaptation, since we assumed no future adaptation and kept other agronomic management, including planting density and cropping pattern, constant in the future. In addition, we evaluated the performance of only one commonly used variety at the study site, without explicitly considering other factors that could harm rice yields, such as pests and diseases, extreme climatic events (droughts and floods), irrigation water availability, and socioeconomic conditions.

#### **5. Conclusions**

In this study, we used six GCMs from CMIP6 in combination with the CERES-Rice crop model to assess the impact of climate change on the utilization potential of light and heat resources of rice on the Songnen Plain. The main results can be summarized as follows: (1) Under the SSPs, the maximum and minimum temperatures, precipitation, and solar radiation increased in the future periods relative to the baseline. (2) The rice growth period was shortened to different degrees under the future climate scenarios, with a slight increase in rice yield under the SSP126 and SSP245 scenarios, while under the SSP585 scenario rice yield was significantly lower than the baseline. E<sub>*RT*</sub> and HUE would decrease in the future, while E<sub>*R*</sub> would increase slightly in all scenarios and periods except the 2080s under SSP585, when it would decrease. Both rice yield and the light and heat resource utilization indices showed a significant negative correlation with temperature, which was the dominant meteorological factor. (3) Under future climate conditions, optimizing the sowing date could make fuller use of climate resources to improve rice yield and the light and heat resource utilization indices.

**Author Contributions:** E.Z. contributed to the design of the experiment, revision of the paper, and evaluated the obtained results; M.Q. contributed to the simulation of the model, the analysis of the data, and the writing of the paper; P.C. revised the paper; T.X. and Z.Z. evaluated the obtained results. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was jointly supported by the Basic Scientific Research Fund of Heilongjiang Provincial Universities (2020-KYYWF-1042). We are grateful to the staff of the National Key Irrigation Experimental Station for their technical assistance.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Characteristics of Potential Evapotranspiration Changes and Its Climatic Causes in Heilongjiang Province from 1960 to 2019**

**Tangzhe Nie 1, Rong Yuan 1, Sihan Liao 1, Zhongxue Zhang 2,3,\*, Zhenping Gong 4, Xi Zhao 5, Peng Chen 6, Tiecheng Li 2,3, Yanyu Lin 7, Chong Du 1, Changlei Dai <sup>1</sup> and Hao Jiang <sup>8</sup>**


**Abstract:** Climate change refers to statistically significant changes in the mean and dispersion values of meteorological factors. Characterizing potential evapotranspiration (*ET*<sub>0</sub>) and its climatic causes will contribute to the estimation of the atmospheric water cycle under climate change. In this study, based on daily meteorological data from 26 meteorological stations in Heilongjiang Province from 1960 to 2019, *ET*<sub>0</sub> was calculated by the Penman–Monteith formula, and the linear regression method and the Mann–Kendall trend test were used to reveal the seasonal and inter-annual trends of *ET*<sub>0</sub>. The sensitivity-contribution rate method was used to clarify the climatic factors affecting *ET*<sub>0</sub>. The results showed that: (1) From 1960 to 2019, the maximum temperature (*T*<sub>max</sub>), minimum temperature (*T*<sub>min</sub>) and average temperature (*T*<sub>mean</sub>) showed an increasing trend, with climate tendency rates of 0.22 °C per decade (10a), 0.49 °C/(10a), and 0.36 °C/(10a), respectively. The relative humidity (*RH*), wind speed (*U*) and net radiation (*R*<sub>*n*</sub>) showed a decreasing trend, with climate tendency rates of −0.42%/(10a), −0.18 m/s/(10a), and −0.08 MJ/m<sup>2</sup>/(10a), respectively. (2) *ET*<sub>0</sub> showed a decreasing trend on seasonal and inter-annual scales. Inter-annually, the average climate tendency rate of *ET*<sub>0</sub> was −8.69 mm/(10a). Seasonally, the lowest climate tendency rate was −6.33 mm/(10a) in spring. (3) *ET*<sub>0</sub> was negatively sensitive to *T*<sub>min</sub> and *RH*, and positively sensitive to *T*<sub>max</sub>, *T*<sub>mean</sub>, *U* and *R*<sub>*n*</sub>; the sensitivity coefficient of *U* was the highest, at 1.22. (4) The contribution rate of *U* to *ET*<sub>0</sub> was the highest on an inter-annual scale as well as in spring and autumn, at −8.96%, −9.79% and −13.14%, respectively, while the largest contributors to *ET*<sub>0</sub> in summer and winter were *R*<sub>*n*</sub> and *T*<sub>min</sub>, with contribution rates of −4.37% and −11.46%, respectively. This study provides an understanding of the response of evapotranspiration to climate change and supports the optimal allocation of regional water resources and agricultural water management under climate change.

**Keywords:** potential evapotranspiration (*ET*0); climate change; Penman–Monteith; sensitivity analysis; contribution rate

**Citation:** Nie, T.; Yuan, R.; Liao, S.; Zhang, Z.; Gong, Z.; Zhao, X.; Chen, P.; Li, T.; Lin, Y.; Du, C.; et al. Characteristics of Potential Evapotranspiration Changes and Its Climatic Causes in Heilongjiang Province from 1960 to 2019. *Agriculture* **2022**, *12*, 2017. https://doi.org/10.3390/agriculture12122017

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 17 October 2022 Accepted: 24 November 2022 Published: 26 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Over the past 100 years (1906–2005), the global average surface temperature has increased by 0.74 °C and is expected to increase by at least 4 °C by 2100 if carbon dioxide emissions are not reduced [1]. Climate change will inevitably lead to changes in the global hydrological cycle [2]. Apart from precipitation and runoff, actual evapotranspiration (*ET*) is the most important climatic factor and a main component of the hydrological cycle [3]. Evapotranspiration is affected by a variety of meteorological factors, such as temperature, relative humidity (*RH*), wind speed (*U*), and net radiation (*R*<sub>*n*</sub>). Characterizing *ET* changes and their climatic causes under climate change will help reveal the water cycle process and its driving mechanism.

Due to the complexity of the *ET* process, it is difficult to measure *ET* directly; with the deepening of vegetation *ET* research, obtaining regional continuous *ET* data by calculation has therefore become particularly important. Potential evapotranspiration (*ET*<sub>0</sub>), the basis for calculating *ET* [4], represents the maximum *ET* that can be achieved on a fixed underlying surface with unlimited water supply under given meteorological conditions. *ET*<sub>0</sub> is widely used in the analysis of climatic dry and wet conditions, the rational use and evaluation of water resources, crop water demand and agricultural production management, and ecological environment research [5]. There are many ways to calculate *ET*<sub>0</sub>, such as the temperature and solar radiation-based Hargreaves method, the equilibrium evaporation-based Priestley–Taylor model, and the FAO Blaney–Criddle formula [6]. However, many scholars choose the Penman–Monteith method recommended by FAO56 to calculate *ET*<sub>0</sub>. The advantage of the Penman–Monteith method is that the standard meteorological data it requires are readily available from routine observation, and all calculation procedures can be standardized according to the available meteorological data and time scales [7].
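To make the method concrete, the sketch below implements the standard daily FAO-56 Penman–Monteith equation for a grass reference surface. It follows the general FAO-56 formulation and is not necessarily identical to the procedure used in this study; the function names, the default air pressure, and the sample inputs are illustrative assumptions.

```python
import math


def saturation_vapour_pressure(t_c):
    """Saturation vapour pressure (kPa) at air temperature t_c (degC)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def penman_monteith_et0(t_max, t_min, e_a, u2, r_n, g=0.0, pressure_kpa=101.3):
    """Daily reference evapotranspiration (mm/d), FAO-56 Penman-Monteith.

    t_max, t_min: daily maximum and minimum air temperature (degC)
    e_a: actual vapour pressure (kPa); u2: wind speed at 2 m (m/s)
    r_n: net radiation (MJ/m2/d); g: soil heat flux (MJ/m2/d)
    """
    t_mean = (t_max + t_min) / 2
    e_s = (saturation_vapour_pressure(t_max) + saturation_vapour_pressure(t_min)) / 2
    # Slope of the saturation vapour pressure curve (kPa/degC)
    delta = 4098 * saturation_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Psychrometric constant (kPa/degC)
    gamma = 0.000665 * pressure_kpa
    numerator = (0.408 * delta * (r_n - g)
                 + gamma * 900 / (t_mean + 273) * u2 * (e_s - e_a))
    denominator = delta + gamma * (1 + 0.34 * u2)
    return numerator / denominator


# Illustrative inputs for a warm, humid day
print(penman_monteith_et0(t_max=34.8, t_min=25.6, e_a=2.85, u2=2.0, r_n=14.33, g=0.14))
```

The first term of the numerator is the radiation component and the second is the aerodynamic component, which is where wind speed and the vapour pressure deficit enter the calculation.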

*ET*<sup>0</sup> is heterogeneous in its temporal and spatial distribution. Jung et al. [8] pointed out that the global average annual *ET* increased at a rate of 7.1 ± 1.0 mm/10a from 1982 to 1997, but since 1998, the global average *ET* has decreased at a rate of 7.9 mm/10a. Roderick et al. [9] found that in the past 50 years, the *ET*<sup>0</sup> in the northern hemisphere decreased at a rate of 2 to 4 mm/a. *ET*<sup>0</sup> also showed an increasing trend in some areas, such as West Africa [10] and Turkey [11]. Based on the daily meteorological data of 552 meteorological stations in China from 1961 to 2015, Wu et al. [12] pointed out that the national annual average *ET*<sup>0</sup> decreased at a rate of 0.52 mm/a, and the *ET*<sup>0</sup> of most stations in arid and humid regions showed a significant decreasing trend; however, neither the increasing nor the decreasing trend of *ET*<sup>0</sup> was significant in semi-arid and semi-humid regions. A decrease or increase of *ET*<sup>0</sup> may affect the hydrological cycle differently. For example, when *ET*<sup>0</sup> decreases, the transport of water vapor in the atmosphere is reduced, resulting in corresponding changes in precipitation patterns. For agriculture, the water use efficiency of crops may be improved by reducing the adverse effects of drought on crops. However, an increase of *ET*<sup>0</sup> will increase water consumption, resulting in increased land water loss and drought, and the atmospheric circulation may be affected, which may lead to strong rainfall, thus changing the distribution of water resources.

The spatio-temporal heterogeneity of *ET*<sup>0</sup> has been attributed to the combined effect of different climatic factors, including maximum temperature (*T*max), minimum temperature (*T*min), average temperature (*T*mean), *RH*, *U* and sunshine hours, while the dominant factors affecting *ET*<sup>0</sup> vary among regions. For example, using the non-parametric Sen's slope method, Bandyopadhyay et al. [13] concluded that the main reason for the decrease of *ET*<sup>0</sup> in India was the significant increase in *RH* and decrease of *U*. Roderick et al. [14] believed that the decrease in solar radiation caused by the increase in cloud cover and aerosol concentration in the southern hemisphere was the main reason for the decrease of *ET*<sup>0</sup> in New Zealand. Through sensitivity analysis, Hossein et al. [15] found that *U* had the greatest impact on *ET*<sup>0</sup> in Iran under arid climate conditions. Guo et al. [16] used the Sobol global sensitivity analysis method to analyze the sensitivity of *ET*<sup>0</sup> in Australia calculated by the Penman–Monteith and Priestley–Taylor models and found that air temperature was the factor to which *ET*<sup>0</sup> was most sensitive. China has seven geographic regions, with complex and diverse climatic characteristics. *ET*<sup>0</sup> has shown a fluctuating downward trend in recent decades in areas such as Henan Province in Central China [17], Anhui Province in East China [18], and the Sichuan Basin [19] and Huaihe basin [20] in Southwest China; the results of the sensitivity-contribution rate method showed that the contribution of *U* to *ET*<sup>0</sup> was higher than that of sunshine hours, indicating that *U* was the dominant meteorological factor in the above areas. On the contrary, in North China, the results of the sensitivity coefficient method showed that sunshine hours were the dominant factor, followed by *U* [21]. In discussing climate attribution, Zhou et al. [22] analyzed the partial correlation between *ET*<sup>0</sup> and meteorological factors and concluded that the increase of *T*max in the Three River Headwaters region located in Northwest China was the main reason for the increasing trend of *ET*<sup>0</sup>; however, the contribution of *U* was the smallest among all meteorological factors, which differed from the meteorological factors affecting the trend of *ET*<sup>0</sup> in other regions.

Heilongjiang Province, located in one of the four major black soil belts in the world, has a land area of 454,600 km<sup>2</sup>, of which 23,900 km<sup>2</sup> is arable land. It is the largest province in terms of grain production in the country. In recent years, Heilongjiang Province has become one of the regions with the largest temperature rise in the country and one of the regions with more serious climate disasters, threatening the safe production of food crops. The impact of floods and droughts on the agricultural economy in Heilongjiang Province accounted for about 89.4% of the total impact of various natural disasters [23]; the reduction of grain production due to floods and droughts accounted for about 12% of the total grain output in the same period [24]. Nie et al. [25] speculated that the climate in Heilongjiang Province would continue to warm in the next 30 and 50 years, that water resources would be scarcer, and that the probability of drought would increase. *ET*<sup>0</sup>, as a key factor characterizing regional dry and wet conditions, is of great significance for guiding water resource management and agricultural production management in Heilongjiang Province. Jiang et al. [26] described the characteristics of *ET*<sup>0</sup> in Heilongjiang Province at the scale of the crop growing season and pointed out that *ET*<sup>0</sup> was most sensitive to *RH*, which provided an important reference index for formulating a reasonable irrigation system for the crop growing season in Heilongjiang Province. However, their study was limited to the growing season of crops, which could not fully reflect the changes of *ET*<sup>0</sup> in Heilongjiang Province on inter-annual and seasonal scales; moreover, the contribution rates of different climate factors were not analyzed. Using multiple regression analysis, Su et al. [27] pointed out that water vapor pressure and *U* had the largest contribution rates to the *ET*<sup>0</sup> change in Heilongjiang Province, which helps in understanding the factors influencing the inter-annual change of *ET*<sup>0</sup>. However, they did not study the impact of more meteorological factors on a shorter time scale. The weather of Heilongjiang Province differs markedly among the four seasons, and *ET*<sup>0</sup> may be affected by different climatic factors on seasonal scales.

This study used a linear regression equation and the Mann–Kendall method to analyze the trend of *ET*<sup>0</sup> in Heilongjiang Province in the past 60 years on the inter-annual and seasonal scales. The sensitivity coefficient and contribution of different climate factors to *ET*<sup>0</sup> were comprehensively analyzed by the sensitivity-contribution rate method, aiming to provide an important basis for the rational and efficient use of water resources and agricultural water management in different regions of Heilongjiang Province under climate change.

#### **2. Materials and Methods**

#### *2.1. Overview of the Study Area and Data Sources*

Heilongjiang Province, located in the east of Eurasia, is the northernmost and highest-latitude province in China, extending from 121°11′ E to 135°05′ E and from 43°26′ N to 53°33′ N. It spans 14 degrees of longitude from east to west and 10 degrees of latitude from north to south, with an average altitude of 481 m. It has a temperate continental monsoon climate and spans the four major river systems of the Heilongjiang River, Wusuli River, Songhua River and Suifenhe River. The average annual temperature of Heilongjiang Province is between −5 and 5 °C, gradually increasing from north to south. The annual precipitation is between 400 and 650 mm, with more in the central mountainous areas, followed by the east, and less in the west and north. The annual sunshine hours in the province are mostly between 2400 and 2800 h, with more in the west than in the east.

The meteorological data in this study were from the daily meteorological data of 26 meteorological stations (Figure 1) in Heilongjiang Province from 1960 to 2019 recorded by the China Meteorological Data Network, including *T*max, *T*min, *T*mean, *RH*, *U*, and sunshine hours; *Rn* was calculated according to the method recommended by the Food and Agriculture Organization of the United Nations (FAO) [7]. According to the meteorological division method, March to May is divided into spring, June to August is summer, September to November is autumn, and December to February is winter [28].
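The seasonal division described above can be expressed as a small helper for grouping daily records. This is only an illustrative sketch: the function names are hypothetical, and assigning January and February to the winter that began in the previous December is an assumed convention that the text does not state.

```python
def meteorological_season(month: int) -> str:
    """Map a calendar month (1-12) to the season defined in the text."""
    if month not in range(1, 13):
        raise ValueError("month must be between 1 and 12")
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"  # December, January and February


def season_year(year: int, month: int) -> int:
    """Year label for seasonal aggregation.

    Assumption (not from the text): January and February are counted
    toward the winter that started in December of the previous year.
    """
    return year - 1 if month in (1, 2) else year
```

With this convention, a record from February 1961 would be grouped with the winter labelled 1960.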

**Figure 1.** Map of the study area and distribution of meteorological stations in Heilongjiang Province.

*2.2. Mann–Kendall Trend and Mutation Test*

For the time series *X* (containing *n* samples), an order column is constructed:

$$S_k = \sum_{i=1}^{k} r_i \quad (k = 2, 3, \dots, n) \tag{1}$$

where

$$r_i = \begin{cases} +1, & x_i > x_j \\ 0, & \text{otherwise} \end{cases} \quad (j = 1, 2, \dots, i) \tag{2}$$

The order column *Sk* is the cumulative count of the cases in which the value at moment *i* is greater than the value at moment *j*.

Under the assumption that the time series is random, define the statistic:

$$UF_k = \frac{S_k - E(S_k)}{\sqrt{Var(S_k)}} \quad (k = 1, 2, \dots, n) \tag{3}$$

where *UF*<sup>1</sup> = 0, and *E*(*Sk*) and *Var*(*Sk*) are the mean and variance of *Sk*, respectively. When *X*1, *X*2, ... , *Xn* are independent of one another and have the same continuous distribution, they can be calculated from the following equation:

$$E(S_k) = \frac{k(k-1)}{4}, \quad Var(S_k) = \frac{k(k-1)(2k+5)}{72} \quad (2 \le k \le n) \tag{4}$$

*UFk* follows a standard normal distribution and is a sequence of statistics calculated in the order of the time series *X* (*X*1, *X*2, ... , *Xn*). Given the significance level α, the critical value *Uα* is obtained from the normal distribution table; if |*UFk*| > *Uα*, there is a significant trend change in the sequence. The above process is then repeated in the reverse order of the time series *X* (*Xn*, *Xn*−1, ... , *X*1), setting *UBk* = −*UFk* (*k* = *n*, *n*−1, . . . , 1), with *UB*<sup>1</sup> = 0.

Generally, if the significance level α = 0.05, then the critical value *Z* = ±1.96. The curves of the two statistical sequences *UF* and *UB* and the two straight lines at ±1.96 are drawn on one plot. If the values of *UF* and *UB* are greater than 0, the series shows an upward trend, and if they are less than 0, it shows a downward trend. When *UF* and *UB* exceed the critical lines, the upward or downward trend is significant, and the range beyond the critical lines is taken as the time period in which the mutation occurred. If the two curves of *UF* and *UB* intersect and the intersection point lies between the critical lines, then the moment corresponding to the intersection point is the time when the mutation begins [29].
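The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab implementation: the function names are hypothetical, the mean and variance of the rank count are written in their standard form as functions of *k*, and the backward sequence is returned in the original time order so that it can be plotted against the forward one.

```python
import math


def _forward_statistic(series):
    """Forward statistic UF_k for k = 1..n, with UF_1 = 0."""
    uf = [0.0]
    s_k = 0
    for i in range(1, len(series)):
        # r_i: number of earlier values that are smaller than the current value
        s_k += sum(1 for j in range(i) if series[i] > series[j])
        k = i + 1
        mean = k * (k - 1) / 4.0
        var = k * (k - 1) * (2 * k + 5) / 72.0
        uf.append((s_k - mean) / math.sqrt(var))
    return uf


def sequential_mann_kendall(series):
    """Return (UF, UB), both aligned with the original time order."""
    series = list(series)
    uf = _forward_statistic(series)
    # UB: same statistic on the reversed series, negated, then flipped back
    ub = [-value for value in _forward_statistic(series[::-1])][::-1]
    return uf, ub


def exceeds_critical(value, critical=1.96):
    """True when a statistic lies outside the +/-1.96 lines (alpha = 0.05)."""
    return abs(value) > critical
```

A candidate mutation year would then be a time step at which the `UF` and `UB` curves cross while both lie between the critical lines.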

#### *2.3. Climate Tendency Rate*

The climate tendency rate was calculated by the least squares method, and the unary linear regression equation between the climatic factor Y<sub>i</sub> and the time X<sub>i</sub> was established:

$$Y_i = A X_i + B \tag{5}$$

where A and B are the regression constants (slope and intercept, respectively), and A × 10 is the climate tendency rate, representing the rate of change of each climatic factor per 10 years (10a). A positive value indicates an increasing trend of the climatic factor over the corresponding time series, while a negative value indicates a decreasing trend.
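As a hedged illustration of Equation (5), the slope can be obtained with an ordinary least-squares fit and multiplied by 10. The function name and the synthetic series in the usage note are assumptions made for this sketch only.

```python
def climate_tendency_rate(years, values):
    """Fit values = A * years + B by least squares.

    Returns (A * 10, B): the climate tendency rate per 10 years and the intercept.
    """
    n = len(years)
    if n != len(values) or n < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx                      # A in Equation (5)
    intercept = mean_y - slope * mean_x    # B in Equation (5)
    return slope * 10.0, intercept
```

For a synthetic temperature series that rises by exactly 0.036 °C per year between 1960 and 2019, the function returns a tendency rate of 0.36 °C/10a.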

#### *2.4. Calculation of ET0*

The daily *ET*<sup>0</sup> of 26 sites in Heilongjiang Province was calculated using the Penman–Monteith model recommended by the FAO. The formula is as follows:

$$ET_0 = \frac{0.408\Delta (R_n - G) + \gamma \frac{900}{T + 273} U_2 (e_s - e_a)}{\Delta + \gamma (1 + 0.34 U_2)} \tag{6}$$

where *ET*<sup>0</sup> is the potential evapotranspiration, mm/d; Δ is the slope of the saturation vapor pressure curve with respect to temperature, kPa/°C; *Rn* is the surface net radiation, MJ/(m<sup>2</sup>·d); *G* is the soil heat flux, MJ/(m<sup>2</sup>·d); *γ* is the psychrometric constant, kPa/°C; *U*<sub>2</sub> is the wind speed at a height of 2 m above the ground, m/s; *es* is the saturated vapor pressure, kPa; *ea* is the actual vapor pressure, kPa; *T* is the mean air temperature, °C.
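Equation (6) can be evaluated directly once its inputs are known. The sketch below is illustrative and is not the CROPWAT implementation used in the study; the two helper functions for the saturation vapor pressure and its slope follow the standard FAO-56 expressions, which are not written out in this article, and all function and parameter names are hypothetical.

```python
import math


def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def slope_vapor_pressure_curve(t_c):
    """Slope of the saturation vapor pressure curve (kPa/deg C), FAO-56 form."""
    return 4098.0 * saturation_vapor_pressure(t_c) / (t_c + 237.3) ** 2


def penman_monteith_et0(delta, gamma, rn, g, t_mean, u2, es, ea):
    """Daily ET0 (mm/d) from Equation (6).

    delta, gamma in kPa/deg C; rn, g in MJ/(m2 d); t_mean in deg C;
    u2 in m/s; es, ea in kPa.
    """
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))
```

With the illustrative inputs Δ = 0.246 kPa/°C, γ = 0.0674 kPa/°C, *Rn* = 14.33 MJ/(m<sup>2</sup>·d), *G* = 0.14 MJ/(m<sup>2</sup>·d), *T* = 30.2 °C, *U*<sub>2</sub> = 2 m/s, *es* = 4.42 kPa and *ea* = 2.85 kPa, the function gives roughly 5.7 mm/d.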

#### *2.5. Sensitivity-Contribution Rate Method Based on Partial Derivatives*

Sensitivity analysis varies each input variable within its value range and examines how strongly these changes affect the output value. The degree of influence is called the sensitivity coefficient [30] and is used to judge how strongly relative changes of the climatic factors affect the changes of *ET*<sup>0</sup>. This paper mainly analyzed the sensitivity and contribution rate of *T*max, *T*min, *T*mean, *RH*, *U* and *Rn* to *ET*<sup>0</sup>. The sensitivity coefficient *Svi* of *ET*<sup>0</sup> to each climatic factor was calculated by the following formula:

$$S_{v_i} = \lim_{\Delta v_i \to 0} \left( \frac{\Delta ET_0 / ET_0}{\Delta v_i / v_i} \right) = \frac{\partial ET_0}{\partial v_i} \cdot \frac{v_i}{ET_0} \tag{7}$$

where *vi* is the climatic factor and *Svi* is the sensitivity coefficient of the climatic factor. The sign of *Svi* reflects the direction of the relationship between *ET*<sup>0</sup> and the climatic factor: a positive sensitivity coefficient indicates that *ET*<sup>0</sup> increases as the climatic factor increases, whereas a negative sensitivity coefficient indicates that *ET*<sup>0</sup> decreases as the climatic factor increases. The absolute value reflects the strength of the impact of the climatic factor on *ET*<sup>0</sup>: the greater the absolute value, the greater the impact, and vice versa.

The contribution rate *Gvi* is equal to *Svi* multiplied by the relative change rate of the meteorological variable (*Rvi*), that is:

$$R_{v_i} = \frac{n \times Trend_{v_i}}{|av_{v_i}|} \times 100\% \tag{8}$$

$$G_{v_i} = R_{v_i} \cdot S_{v_i} \tag{9}$$

where *Rvi* is the relative change rate of *vi* over the study period; n is the total number of years; *Trendvi* is the annual climate tendency rate of *vi*, i.e., the slope of the univariate linear regression equation between *vi* and time; *avvi* is the annual average value of *vi*; and *Gvi* is the contribution rate, whose absolute value reflects the contribution of the relative change of *vi* to the change of *ET*<sup>0</sup>.
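The sensitivity-contribution rate method can be sketched as follows. The sketch approximates the partial derivative in Equation (7) by a central finite difference, which is an assumption of this example (the article does not state how the derivative was evaluated), and all function names are hypothetical.

```python
def sensitivity_coefficient(model, inputs, name, rel_step=1e-4):
    """Dimensionless sensitivity of model output to inputs[name], Equation (7).

    model is any callable taking the climatic factors as keyword arguments.
    The partial derivative is approximated by a central difference.
    """
    value = inputs[name]
    if value == 0:
        raise ValueError("relative sensitivity is undefined for a zero-valued factor")
    step = abs(value) * rel_step
    upper = dict(inputs, **{name: value + step})
    lower = dict(inputs, **{name: value - step})
    derivative = (model(**upper) - model(**lower)) / (2.0 * step)
    return derivative * value / model(**inputs)


def relative_change_rate(n_years, trend_per_year, mean_value):
    """Relative change of a factor over the study period in percent, Equation (8)."""
    return n_years * trend_per_year / abs(mean_value) * 100.0


def contribution_rate(sensitivity, relative_change):
    """Contribution of a factor to the change of ET0 in percent, Equation (9)."""
    return sensitivity * relative_change
```

For instance, a factor with a sensitivity coefficient of 1.2 that declined by 36% over the study period would contribute −43.2% to the change of *ET*<sup>0</sup>; these numbers are hypothetical and serve only to show the arithmetic.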

#### *2.6. Data Processing*

The CROPWAT 8.0 (FAO, Rome, Italy) software was used to calculate the daily *ET*<sup>0</sup> by the Penman–Monteith formula. Matlab 2021a (MathWorks, Natick, MA, USA) was used to calculate the rate of change of *ET*<sup>0</sup>, *T*max, *T*min, *T*mean, *RH*, *U* and *Rn* and to perform the Mann–Kendall test, which was used to analyze the long-term trend and mutation of *ET*<sup>0</sup>. The ArcMap 10.4 (ESRI, Redlands, CA, USA) toolkit was used for spatial interpolation and mapping of each meteorological variable; the spatial interpolation method was Inverse Distance Weighting (IDW), and the resolution was 500 dpi.
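For readers unfamiliar with IDW, the basic idea can be sketched as below. This is not the ArcMap tool used in the study: the function name is hypothetical, the distance exponent of 2 and the use of all stations without a search radius are assumptions, and distances are computed on planar coordinates.

```python
import math


def idw_interpolate(stations, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    stations is an iterable of (station_x, station_y, value) tuples in
    planar coordinates. power = 2 is an assumed exponent.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for station_x, station_y, value in stations:
        distance = math.hypot(x - station_x, y - station_y)
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one station is required")
    return weighted_sum / weight_total
```

Stations closer to the target point receive larger weights, so the interpolated surface passes through the station values and is smoothed between them.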

#### **3. Results**

#### *3.1. Spatial Distribution of the Mean Values of Meteorological Factors*

The spatial distribution of average *T*max, *T*min, *T*mean, *RH*, *U* and *Rn* from 1960 to 2019 is shown in Figure 2. On the inter-annual scale, *T*max, *T*min and *T*mean showed an increasing trend from north to south; their inter-annual ranges were 5.5~11.5 °C, 0.5~7.0 °C and 1.0~5.5 °C, respectively. A higher *RH* was mainly distributed in the central region, while the *RH* in the eastern and western regions was relatively lower. A higher *U* was mainly distributed in the eastern and western regions. Spatially, *Rn* increased from north to south.

On the seasonal scale, *T*max, *T*min and *T*mean also showed an increasing trend from north to south. *RH* was higher in summer and lower in spring. However, *U* was higher in spring and lower in summer, at 3.71 m/s and 2.67 m/s, respectively. *Rn* was larger in summer and smaller in winter; the ranges were 11.5~12.8 MJ/m<sup>2</sup> and 0.30~1.90 MJ/m<sup>2</sup>, respectively.

#### *3.2. Temporal and Spatial Variation of the Climate Tendency Rate of Meteorological Factors*

On both inter-annual and seasonal scales, *T*max, *T*min and *T*mean showed a significant upward trend (*p* < 0.05), except for *T*max in winter (Table 1). In contrast, *RH*, *U* and *Rn* showed a significant downward trend (*p* < 0.05), except for *RH* in spring and summer, *U* in autumn and *Rn* in autumn and winter. The inter-annual climate tendency rates of *T*max, *T*min, *T*mean, *RH*, *U* and *Rn* were 0.22 °C/10a, 0.49 °C/10a, 0.36 °C/10a, −0.42%/10a, −0.18 m/s/10a, and −0.04 MJ/m<sup>2</sup>/10a, respectively.

**Table 1.** Seasonal and inter-annual climate tendency rate of meteorological factors in Heilongjiang Province.


Note: \* indicates a significant level of 0.05.

In terms of spatial distribution on the inter-annual scale, *T*max, *T*min, and *T*mean showed an increasing trend, and their higher climate tendency rates were mainly distributed in the northern region (Figure 3). *RH* showed a decreasing climate tendency rate except for some central and southern regions. *U* showed a decreasing trend except in Jixi, and its climate tendency rate was lower in the western region and higher in the central region. *Rn* showed a decreasing climate tendency rate except in Hulin and Tonghe. On seasonal scales, *T*max, *T*min, and *T*mean showed an increasing trend except for *T*max in Suifenhe and *T*mean in Hulin in summer. The climate tendency rates of *T*max, *T*min and *T*mean were greatest in winter compared with the other three seasons (Figure 3). *RH* showed a decreasing climate tendency rate except in the central regions in spring, summer, and autumn. The climate tendency rate of *U* was lower in the western and eastern regions, and the climate tendency rate of *Rn* showed a decreasing trend except for some central and eastern regions.

**Figure 2.** *Cont*.

**Figure 3.** *Cont*.

#### *3.3. Temporal and Spatial Variation of ET0*

*ET*<sup>0</sup> was highest in summer and lowest in winter, with daily averages of 4.11 mm and 0.35 mm, respectively (Figure 4). *ET*<sup>0</sup> showed decreasing trends on both seasonal and inter-annual scales; the higher *ET*<sup>0</sup> values were mainly distributed in the southwest region. The average climate tendency rates of *ET*<sup>0</sup> in spring, summer, autumn and winter and on the inter-annual scale were −6.33 mm/(10a), −2.72 mm/(10a), −2.58 mm/(10a), −0.65 mm/(10a), and −8.69 mm/(10a), respectively (Figure 4). In the western region, the lower seasonal climate tendency rates of *ET*<sup>0</sup>, especially in spring and summer, led to a more rapid decrease of inter-annual *ET*<sup>0</sup>.

The inter-annual *ET*<sup>0</sup> in Heilongjiang Province showed a significant increasing trend from 1977 to 1983 (*Z* > 1.96), while it showed a significant decreasing trend in 2017–2019 (*Z* < −1.96). The mutation point of *ET*<sup>0</sup> appeared in 2011, when the trend of *ET*<sup>0</sup> changed from increasing to decreasing (Figure 5e). On the seasonal scale, the mutation years ranged from 2006 to 2015 (Figure 5a–d).

#### *3.4. Sensitivity of ET0 to Meteorological Factors*

The sensitivity coefficients of *ET*<sup>0</sup> to *U*, *Rn*, *RH*, *T*max, *T*min and *T*mean are shown in Table 2. On the inter-annual scale, with the other climatic factors held unchanged, a 10% increase in *U*, *Rn*, *T*max or *T*mean increases *ET*<sup>0</sup> by 12.2%, 4.0%, 4.2% or 1.4%, respectively, whereas a 10% increase in *RH* or *T*min decreases *ET*<sup>0</sup> by 11.5% or 1.4%, respectively. The sensitivity order of *ET*<sup>0</sup> change to each climatic factor was *U* > *RH* > *T*max > *Rn* > *T*min = *T*mean.
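The percentages quoted above follow from multiplying each sensitivity coefficient by the relative change of the factor, which is a first-order (linear) reading of the coefficient. The short sketch below reproduces that arithmetic with the inter-annual coefficients reported in the text; the function names are hypothetical.

```python
# Inter-annual sensitivity coefficients of ET0 as reported in the text
INTER_ANNUAL_SENSITIVITY = {
    "U": 1.22,
    "Rn": 0.40,
    "RH": -1.15,
    "Tmax": 0.42,
    "Tmin": -0.14,
    "Tmean": 0.14,
}


def et0_response_percent(sensitivity, factor_change_percent=10.0):
    """First-order change of ET0 (%) for a given relative change of one factor (%)."""
    return sensitivity * factor_change_percent


def rank_by_sensitivity(coefficients):
    """Factor names ordered by the absolute value of their sensitivity coefficient."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
```

Calling `et0_response_percent(INTER_ANNUAL_SENSITIVITY["RH"])` gives −11.5, i.e., an 11.5% decrease of *ET*<sup>0</sup> for a 10% increase of *RH*.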


**Table 2.** Sensitivity coefficients of seasonal and inter-annual *ET*<sup>0</sup> change to climate factors.

On the seasonal scale (Table 2), the changes of *ET*<sup>0</sup> in spring, summer and autumn were all negatively sensitive to *RH*, positively sensitive to *U*, *Rn*, *T*max, and *T*mean, while the changes of *ET*<sup>0</sup> to *RH*, *T*max, *T*min and *T*mean were negatively sensitive in winter. *ET*<sup>0</sup> was most sensitive to *U* in spring and autumn, and most sensitive to *RH* and *T*min in summer and winter, with sensitivity coefficients of 1.35, 1.40, −0.91 and −1.76, respectively.

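| Time scale | *U* | *Rn* | *RH* | *T*max | *T*min | *T*mean |
| --- | --- | --- | --- | --- | --- | --- |
| Autumn | 1.40 | 0.28 | −1.34 | 0.39 | −0.14 | 0.15 |
| Winter | 0.53 | 0.10 | −1.67 | −1.64 | −1.76 | −1.32 |
| Inter-annual | 1.22 | 0.40 | −1.15 | 0.42 | −0.14 | 0.14 |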

Figure 6 shows the spatial distribution of the sensitivity coefficients of *ET*<sup>0</sup>. On the inter-annual scale, the sensitivity coefficients of *ET*<sup>0</sup> to *T*max, *T*min and *T*mean gradually increased from north to south. The sensitivity coefficients of *ET*<sup>0</sup> to *RH* decreased from west to east. A higher sensitivity coefficient to *U* was mainly distributed in the eastern and western regions, while a higher sensitivity coefficient to *Rn* was mainly distributed in the central region.

On the seasonal scale, the sensitivity of *ET*<sup>0</sup> to *T*max, *T*min and *T*mean was higher in the western region in spring and summer, and higher in the southeast and south in autumn and winter. *ET*<sup>0</sup> showed lower sensitivity to *RH* in the central region throughout the four seasons. The sensitivity coefficient of *ET*<sup>0</sup> to *U* was higher in spring and autumn, while the sensitivity coefficient of *ET*<sup>0</sup> to *Rn* was higher in summer.

**Figure 4.** (**Row A**) Spatial distribution and (**Row B**) climate tendency of *ET*<sup>0</sup> from 1960 to 2019.

**Figure 5.** Mann–Kendall (MK) analysis of *ET*<sup>0</sup> in (**a**) spring, (**b**) summer, (**c**) autumn, (**d**) winter and (**e**) inter-annually from 1960 to 2019.

#### *3.5. Dominant Climatic Factors for ET0 Change*

On the inter-annual scale, *U* was the dominant climatic factor for the change of *ET*<sup>0</sup>, followed by *T*max, *T*mean, *RH*, *Rn* and *T*min, the latter five having contribution rates of 6.15%, 5.03%, 4.34%, −1.82%, and −1.41%, respectively (Table 3). The positive contribution rates of *RH*, *T*max and *T*mean to *ET*<sup>0</sup> were not able to offset the negative contribution rates of *U*, *Rn* and *T*min; therefore, *ET*<sup>0</sup> showed a decreasing trend from 1960 to 2019.

**Table 3.** Contribution of seasonal and inter-annual climate factors to *ET*<sup>0</sup> changes from 1960 to 2019.


On the seasonal scale, *U* was the dominant factor for the decrease of *ET*<sup>0</sup> in spring and autumn, followed by *T*max and *RH*; the contribution rates of *U* to the change of *ET*<sup>0</sup> were −9.79% and −13.14%, respectively. In summer, *Rn* was the dominant factor for *ET*<sup>0</sup> change, and the contribution rates of *T*max, *T*min, and *T*mean to *ET*<sup>0</sup> differed little, at 2.57%, 2.1% and 2.09%, respectively. In winter, *Rn* contributed the least to *ET*<sup>0</sup>, and *T*min became the dominant factor.

The spatial distribution of dominant meteorological factors is shown in Figure 7. In spring, the dominant factor for *ET*<sup>0</sup> was *U* in the study area except for Fujin (Figure 7a). In summer, *Rn* was the dominant factor in most regions of the study area, while *T*min and *RH* were the dominant factors in parts of the northern and eastern regions. In autumn, *U* was the dominant factor for 85% of the total 26 sites (Figure 7c); in winter, *T*min was the dominant factor in the northern and eastern regions, while *U* was the dominant factor in the western region. On the inter-annual scale, the dominant factor of all 26 sites was *U* (Figure 7e).


**Figure 6.** Spatial distribution of sensitivity coefficients of *ET*<sup>0</sup> to (**Row A**) *T*max, (**Row B**) *T*min, (**Row C**) *T*mean, (**Row D**) *RH*, (**Row E**) *U*, (**Row F**) *Rn* from 1960 to 2019.

**Figure 7.** Spatial distribution of dominant climatic factors in (**a**) spring, (**b**) summer, (**c**) autumn, (**d**) winter and (**e**) inter-annually in the study area from 1960 to 2019.

#### **4. Discussion**

Under the general warming of the global climate, *ET*<sup>0</sup> has been on the rise in most areas. For example, the annual *ET*<sup>0</sup> for India as a whole increased in the latter half of the 20th century [31], and increases were also reported for the Awash River basin, Ethiopia [32], and for South Korea over the recent 100 years [33], which is consistent with the generally accepted trend that *ET*<sup>0</sup> increases with temperature. However, the phenomenon of the "evaporation paradox" has appeared in many areas around the world [34]; that is, with the continuous increase of temperature, *ET*<sup>0</sup> showed a decreasing trend, as in the Canadian prairie region, the northern region of South America, Thailand, New Zealand [9,35–37], the northwest of India [38], the Lijiang watershed [39], and Alor Setar, Malaysia [40]. In this study, *ET*<sup>0</sup> in Heilongjiang Province from 1960 to 2019 showed a decreasing trend inter-annually and seasonally with increasing temperature, indicating that the "evaporation paradox" phenomenon also existed in our study area, which was consistent with the conclusion of Li et al. [41]. At present, there is still no clear conclusion about the mechanism of the "evaporation paradox"; the existing related studies are mostly qualitative analyses, and the reasons need to be further explored.

In China, the sensitivity coefficient and contribution rate of *ET*<sup>0</sup> to meteorological factors varied among climatic regions. Even within the same climatic region, the sensitivity coefficient and contribution rate of *ET*<sup>0</sup> differed spatially. For example, in the subtropical monsoon climate zone, *ET*<sup>0</sup> in the Yangtze River basin [42], Sichuan basin [43], and Yunnan-Guizhou plateau [44] was most sensitive to *T*max, *RH*, and sunshine hours, respectively. In the temperate continental climate zone, *ET*<sup>0</sup> in the middle and upper reaches of the Yellow River [45] and the Ebinur Lake basin in Xinjiang [46] was most sensitive to the actual water vapor pressure and *U*, respectively, with *U* having the highest contribution rate in both regions. In the plateau mountainous climate region, the meteorological factor with the largest sensitivity coefficient on the Qinghai–Tibet Plateau was the actual water vapor pressure, and sunshine hours had the largest contribution rate to *ET*<sup>0</sup> [47]. Our study area is located in the temperate monsoon climate zone, where *ET*<sup>0</sup> was most sensitive to *U*, which also had the largest contribution rate to *ET*<sup>0</sup>. However, in the Loess Plateau basin, which is located in the same climate zone, *ET*<sup>0</sup> was most sensitive to actual water vapor pressure and *T*mean had the highest contribution rate [48]. In the North China Plain of the temperate monsoon climate zone, sunshine hours had the largest contribution rate to *ET*<sup>0</sup> [21]. In summary, the plateau area has strong solar radiation, long sunshine time, and low temperature, which may cause *ET*<sup>0</sup> to be most affected by sunshine hours; most of the basin areas have a dry climate, less rainfall, and a lack of water resources, causing vegetation to be more sensitive to the water condition; therefore, *ET*<sup>0</sup> might have a larger sensitivity coefficient to *RH*. Heilongjiang Province comprises plain and mountainous areas and is affected by the southeast monsoon in summer and controlled by the northwest monsoon in winter [49]. This might be the reason why *U* contributed the most to *ET*<sup>0</sup> change.

In this study, *ET*<sup>0</sup> in Heilongjiang Province showed a decreasing trend on the inter-annual scale from 1960 to 2019, with *U* as the dominant factor, which was consistent with the trend of *ET*<sup>0</sup> in Jilin Province analyzed by Liu et al. [50] and in Liaoning Province analyzed by Cao et al. [51]. This was probably because the above three provinces are geographically contiguous and have similar climate characteristics. A study by Xu et al. in 2003, based on the CCCma (Canadian Center for Climate Modelling and analysis), CCSR (Center for Climate System Research), CSIRO (Commonwealth Scientific and Industrial Research Organization), GFDL (Geophysical Fluid Dynamics Laboratory) and Hadley climate models, predicted that the temperature in Heilongjiang Province would increase significantly by 2030 and 2050, with a higher increase in winter and a lower increase in summer [52]. Using the Hadley model, Zhang et al. [53] pointed out that the climate of Heilongjiang Province would tend to be warm and humid in the next 41 years. Wang et al. [54] believed that *ET*<sup>0</sup> was not only affected by various meteorological factors, but that interactions between meteorological factors would also interfere with *ET*<sup>0</sup> changes. Moreover, *ET*<sup>0</sup> may also be affected by human activities, including aerosol emissions, air pollution [55] and rice area expansion [56]. Therefore, further exploration of these aspects will be needed for the contribution analysis of *ET*<sup>0</sup> changes in Heilongjiang Province in the future, as well as in other regions of the world.

#### **5. Conclusions**

In this study, *ET*<sup>0</sup> was calculated by the Penman–Monteith formula, and the sensitivity-contribution rate method was used to clarify the climatic factors affecting seasonal and inter-annual changes of *ET*<sup>0</sup>. *T*max, *T*min and *T*mean showed an increasing trend and *RH*, *U* and *Rn* showed a decreasing trend from 1960 to 2019. *ET*<sup>0</sup> showed a decreasing trend with an average climate tendency rate of −8.69 mm/(10a). The annual average *ET*<sup>0</sup> change was negatively sensitive to *T*min and *RH*, and positively sensitive to *T*max, *T*mean, *U* and *Rn*. *U* was the dominant factor for the *ET*<sup>0</sup> decrease in spring and autumn, while *RH* and *T*min were the dominant factors in summer and winter, respectively. On the inter-annual scale, the sensitivity order of *ET*<sup>0</sup> change to each climatic factor was *U* > *RH* > *T*max > *Rn* > *T*min = *T*mean. These results indicated that the dominant factor for *ET*<sup>0</sup> changes in Heilongjiang Province was *U*. Moreover, the dominant factors for *ET*<sup>0</sup> change vary under different climatic and geographical conditions; impacts of human activities on *ET*<sup>0</sup> should also be considered in future studies.

**Author Contributions:** Conceptualization, T.N.; methodology, T.N. and R.Y.; software, R.Y.; validation, R.Y., S.L., X.Z. and Y.L.; formal analysis, T.N. and R.Y.; data curation, P.C. and T.L.; writing—original draft preparation, T.N. and R.Y.; writing—review and editing, Z.G., C.D. (Chong Du), C.D. (Changlei Dai) and H.J.; visualization, R.Y.; supervision, T.N., Z.Z. and Z.G.; funding acquisition, T.N. and Z.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the Opening Project of Key Laboratory of Efficient Use of Agricultural Water Resources, Ministry of Agriculture and Rural Affairs of the People's Republic of China (number: AWR2021002), the Basic Scientific Research Fund of Heilongjiang Provincial Universities (number: 2021-KYYWF-0019), the National Key Research and Development Program of China (2021YFD1500802) and the National Natural Science Foundation Project of China (numbers: 51779046 & 52079028).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We thank the anonymous reviewers and the editors for their suggestions which substantially improved the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Optimal Irrigation under the Constraint of Water Resources for Winter Wheat in the North China Plain**

**Xiaoli Shi 1,2,3,\*, Wenjiao Shi 2,4,\*, Na Dai 1,5 and Minglei Wang 2,4**


**Abstract:** The North China Plain (NCP) has the largest groundwater depletion in the world, and it is also the major production area of winter wheat in China. For sustainable food production and sustainable use of irrigated groundwater, it is necessary to optimize the irrigation amount for winter wheat in the NCP. Previous studies on the optimal irrigation amount have given little consideration to the groundwater constraint, so the theoretical optimal irrigation amount may exceed the regional irrigation availability. Based on the meteorological data, soil data, crop variety data, and field management data from the field experimental stations of Tangshan, Huanghua, Luancheng, Huimin, Nangong, Ganyu, Shangqiu, Zhumadian and Shouxian, we simulated the variation of yield and water use efficiency (WUE) under different irrigation levels by using the CERES-Wheat model, and investigated the optimal irrigation amount for high yield (OIy), water saving (OIWUE), and the trade-off between high yield and water saving (OIt) of winter wheat in the NCP. Based on the water balance theory, we then calculated the irrigation availability, which was taken as the constraint to explore the optimal irrigation amount for winter wheat in the NCP. The results indicated that the OIy ranged from 80 mm to 240 mm, and the OIWUE was 17% to 67% less than OIy, ranging from 0 mm to 200 mm. The OIt was between 80 mm and 240 mm, realizing the co-benefits of high yield and water saving. Finally, we determined the optimal irrigation amount (62–240 mm) under the constraint of irrigation availability. Our results can provide a realistic and scientific reference for the security of both grain production and groundwater use in the NCP.

**Keywords:** optimal irrigation; sustainable irrigation; yield; water use efficiency; North China Plain

#### **1. Introduction**

As one of the major production areas, the North China Plain (NCP) produces 60–80% of the winter wheat in China [1]. Because precipitation meets only 30–45% of the water requirement of winter wheat during the growing season, the increased yield has largely relied on unsustainable overuse of water resources in the NCP, especially groundwater [2]. Local farmers irrigate the winter wheat several times per season, resulting in the largest groundwater depletion in the world. Consequently, a series of eco-environmental problems have occurred, such as land surface subsidence, seawater intrusion, streamflow depletion, and wetland and ecological damage, which seriously affect the material basis of food production in the NCP [3]. Therefore, for sustainable food production and to relieve the pressure of groundwater exploitation, it is necessary to optimize the irrigation amount for winter wheat in the NCP.

**Citation:** Shi, X.; Shi, W.; Dai, N.; Wang, M. Optimal Irrigation under the Constraint of Water Resources for Winter Wheat in the North China Plain. *Agriculture* **2022**, *12*, 2057. https://doi.org/10.3390/agriculture12122057

Academic Editor: David Maxwell Freebairn

Received: 4 November 2022 Accepted: 28 November 2022 Published: 30 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The main indicators of optimal irrigation amount are currently yield and water use efficiency (WUE). High yield is the direct objective of agricultural production and the most commonly used indicator when evaluating the optimal irrigation amount. For example, Boumaza et al. [4] studied the optimal irrigation according to the maximum biomass production. Some studies applied yield as the only indicator to assess the optimal irrigation amount, without considering water utilization during crop growth, which led to high water consumption along with high yield. Thus, some scholars combined yield and WUE to explore the optimal irrigation amount of crops. For example, based on the changes in yield and WUE with irrigation, some researchers investigated the optimal irrigation amount for winter wheat in the Guanzhong Plain and the NCP, respectively [5]. Unlike yield, WUE accounts for the relationship between input water and output yield and thus emphasizes the productivity of water resources, so WUE is recommended as a common index to characterize water saving. However, most previous studies determined the optimal irrigation amount based on the variation patterns of yield and WUE with different irrigation levels, without comparing it with the water availability. Thus, the recommended optimal irrigation amount may be unsustainable if it exceeds the irrigation availability. In general, in addition to yield and WUE, the determination of the optimal irrigation amount should also take into account the water balance and the natural carrying capacity of the available water to truly realize water saving and sustainable food production.

Field experiments and model simulations are the main methods to investigate the optimal irrigation amount for crops. For example, based on a 16-year field experiment using seven irrigation schedules in the winter wheat-summer maize double cropping system, the optimal irrigation was determined at the Luancheng station of the NCP [6]. Field experiments can accurately evaluate the optimal irrigation; however, their results are difficult to apply on a large spatial and temporal scale because of the differences in climate and soil conditions among regions [7]. In recent years, an increasing number of models have been employed in the evaluation of irrigation optimization [5,8]. A crop model can simulate the growth process of crops under different environmental parameter settings, which makes it a useful complement to the traditional field experiment, and it performs well in regional applications [9]. Nevertheless, most current studies still only simulate the relationship between the indicators (yield, WUE) and the irrigation level, without comparing it with the available irrigation water. In fact, the determination of the optimal irrigation amount needs to account for both simulation results and constraints of water availability.

The objectives of the present analysis are: (1) to explore the optimal irrigation amount for high yield, water saving, and the trade-off between high yield and water saving of winter wheat in the NCP; and (2) to calculate the irrigation availability in the NCP based on the water balance theory, and take it as the constraint to investigate the optimal irrigation amount for winter wheat in the NCP.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The NCP is located in the east of China (114°–121° E, 32°–40° N), with a total area of 30 × 10<sup>6</sup> ha. It covers the Beijing and Tianjin Municipalities, Shandong Province, most of Hebei and Henan Provinces, and northern Anhui and Jiangsu Provinces (Figure 1). The prevailing soils in the NCP are formed of fluvial materials from the Yellow River, and are fertile and favorable for cultivation. The climate of the NCP is temperate and monsoonal, with rainy hot summers and dry cold winters. The average annual temperature varies from 14 °C to 15 °C, and the average annual precipitation decreases from the southwest to the northeast, varying from 500 mm to 900 mm. In addition, 70% of the precipitation is concentrated in the growing season of maize, while only 30% of the precipitation falls in the growing season of winter wheat [10]. Water resources are merely 456 m<sup>3</sup> yr<sup>−1</sup> per capita in the NCP, which is below 1/7 of the national average and 1/24 of the world average [11]. As one of the most important agricultural areas of China, the NCP relies heavily on groundwater irrigation for grain production, especially winter wheat. With continuous exploitation, the groundwater is experiencing rapid depletion, at a rate of 1.66–2.76 cm yr<sup>−1</sup> during 2003–2020 in the NCP [12]. A series of shallow groundwater depletion zones have formed in the pre-mountain plains, and relatively independent deep groundwater depletion zones have appeared in the central-coastal plains. In this study, we chose nine representative experimental stations (Tangshan, Huanghua, Luancheng, Huimin, Nangong, Ganyu, Shangqiu, Zhumadian and Shouxian) from the NCP to explore the optimal irrigation.

**Figure 1.** Location of the North China Plain.

#### *2.2. Data*

Meteorological data were obtained from the Chinese Meteorological Data Network (http://data.cma.cn/, accessed on 1 May 2020), including daily maximum temperature, daily minimum temperature, precipitation and sunshine hours from 1981 to 2017 (Figure 1). Daily solar radiation was estimated from sunshine hours using the empirical formula developed by Angstrom [13].
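The conversion from sunshine hours to solar radiation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the FAO-56 expressions for extraterrestrial radiation and day length, and the commonly used default Angstrom coefficients a = 0.25 and b = 0.50 (the coefficients used in this study are not stated here). All function names are hypothetical.

```python
import math


def extraterrestrial_radiation(lat_deg, day_of_year):
    """Return (Ra, N): extraterrestrial radiation in MJ m-2 day-1 and
    maximum possible sunshine duration in hours (FAO-56 expressions)."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    ra = (24.0 * 60.0 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    n_max = 24.0 / math.pi * ws
    return ra, n_max


def solar_radiation(sunshine_hours, lat_deg, day_of_year, a=0.25, b=0.50):
    """Angstrom-type estimate Rs = (a + b * n / N) * Ra in MJ m-2 day-1.
    a and b are assumed defaults, not values reported in the article."""
    ra, n_max = extraterrestrial_radiation(lat_deg, day_of_year)
    ratio = min(max(sunshine_hours / n_max, 0.0), 1.0)  # keep n/N in [0, 1]
    return (a + b * ratio) * ra


if __name__ == "__main__":
    # Illustrative call: 8 h of sunshine at 38 N on day 120 of the year
    print(round(solar_radiation(8.0, 38.0, 120), 2))
```

In practice the coefficients a and b are often calibrated locally against measured radiation, so the defaults above should be treated as placeholders.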

Soil data were obtained from the Chinese Soil Science Database (http://vdb3.soil.csdb.cn/, accessed on 1 May 2020), including soil type, color, soil depth, organic carbon content, soil texture, total nitrogen content, bulk density, and pH (Table 1), where organic carbon content was obtained by multiplying soil organic matter by a conversion factor of 0.58 [14].

**Table 1.** Soil parameters in the North China Plain.


Field management data mainly include observations of phenology and yield components of winter wheat at each station, as well as the field management measures, such as varieties, irrigation, and fertilization. The field management data for 1982–2017 were from experiments conducted at the national agrometeorological stations, which are maintained by the Chinese Meteorological Administration. For the phenology, the dates of sowing (BBCH 00), emergence (BBCH 10), dormancy (start of dormancy), green up (end of dormancy), anthesis (BBCH 61) and maturity (BBCH 89) were recorded [15]. For the yield, the spikelet number, the infertile spikelet rate, the grain number, the thousand grain weight, and the plot grain yield were recorded. For irrigation and fertilization, the dates and quantities were recorded.

Groundwater data were obtained from the Atlas of Groundwater Resources and Environment in China [16]; surface water resource availability and precipitation data were obtained from the Water Resources Reports of provinces and cities in the NCP, as well as the Water Resources Reports of the Haihe River, the Yellow River, and the Huaihe River from 2011 to 2014.

#### *2.3. Methods*

#### 2.3.1. DSSAT Model

DSSAT is one of the most widely used crop models, and CERES-Wheat is its sub-model developed specifically for wheat [17]. The model runs as a module on the DSSAT-CSM (cropping system model) public platform, and uses the meteorological and soil databases and the soil moisture, nitrogen, and carbon balance modules to simulate wheat growth and development, yield formation, and the nitrogen, carbon and water balance processes [17]. The simulation involves light interception and photosynthesis, nutrient absorption and root activity, dry matter distribution, water absorption and transpiration, growth and respiration, leaf area growth, development and organ formation, senescence, field management measures, etc. Four types of data are needed to run the CERES-Wheat model: meteorological data, soil data, crop variety data, and field management data. The model gives thorough consideration to soil water balance processes and physical mechanisms, and it has been validated for the assessment of wheat production [18].

After the data were prepared, we calibrated the model by using field experimental data for the nine stations. After calibration, we validated the model by comparing the simulations with the observed data on phenology and yield during the other years (Table 2).


**Table 2.** Experimental dataset for CERES-Wheat calibration and validation at nine stations.

Then we employed the Normalized Root Mean Square Error (NRMSE) and Consistency Index (D value) to check the agreement between observed and simulated values. The formulas are as follows:

$$\text{NRMSE} = \frac{\sqrt{\sum_{i=1}^{n} (\mathrm{S}_{i} - \mathrm{R}_{i})^{2}/n}}{\overline{\mathrm{R}}} \times 100\% \tag{1}$$

$$\mathrm{D} = 1 - \left[ \frac{\sum_{i=1}^{n} (\mathrm{S}_{i} - \mathrm{R}_{i})^2}{\sum_{i=1}^{n} \left( |\mathrm{S}_{i} - \overline{\mathrm{R}}| + |\mathrm{R}_{i} - \overline{\mathrm{R}}| \right)^2} \right] \tag{2}$$

where Si is the simulated value; Ri is the observed value; R̄ is the average of the observed values; and n is the sample size. The simulation is excellent if the NRMSE is less than 10%, good if the NRMSE is between 10% and 20%, generally accurate if the NRMSE is between 20% and 30%, and poor if the NRMSE is greater than 30% [19]. The D value ranges from 0 to 1; a value closer to 1 indicates better agreement between the observed and simulated values, while a value closer to 0 indicates poor predictability [20].
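As a hedged illustration of Equations (1) and (2), the two statistics can be computed as below. The denominator of D uses the usual index-of-agreement form with |Si − R̄| + |Ri − R̄|. The function names and the sample numbers are hypothetical and are not taken from the validation dataset.

```python
import math


def nrmse(simulated, observed):
    """Normalized root mean square error in percent, Equation (1)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    rmse = math.sqrt(sum((s - r) ** 2 for s, r in zip(simulated, observed)) / n)
    return rmse / mean_obs * 100.0


def d_index(simulated, observed):
    """Consistency index (D value), Equation (2)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((s - r) ** 2 for s, r in zip(simulated, observed))
    denominator = sum(
        (abs(s - mean_obs) + abs(r - mean_obs)) ** 2
        for s, r in zip(simulated, observed)
    )
    return 1.0 - numerator / denominator


if __name__ == "__main__":
    # Made-up anthesis dates (days after sowing), for illustration only
    observed = [176.0, 190.0, 205.0, 223.0]
    simulated = [172.0, 192.0, 204.0, 227.0]
    print(round(nrmse(simulated, observed), 2), round(d_index(simulated, observed), 3))
```

Both functions assume equally long, non-empty sequences and a non-zero observed mean; a production version would validate these inputs.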

After validation, we used the CERES-Wheat model to simulate the yield and WUE under different irrigation levels (from 40 mm to 320 mm, with an interval of 40 mm) for 36 years (1982–2017) at nine representative stations in the NCP, to explore the optimal amount of irrigation for high yield and water saving of the winter wheat. For each station, the irrigation dates and frequency were set according to the observed irrigation management from the experiments conducted at that agrometeorological station. The total amount of each irrigation level was then allocated among the irrigation events according to the percentages in the observed irrigation management.
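The allocation of a total irrigation level across observed irrigation events can be sketched as follows. This is an illustrative reading of the procedure described above, not the authors' script; the event labels and amounts are hypothetical.

```python
# Irrigation levels described in the text: 40 mm to 320 mm in 40 mm steps
IRRIGATION_LEVELS_MM = list(range(40, 321, 40))


def scale_irrigation(observed_events, total_mm):
    """Distribute total_mm over the observed irrigation events in
    proportion to the observed amounts.

    observed_events: list of (label, amount_mm) tuples (hypothetical data).
    """
    observed_total = sum(amount for _, amount in observed_events)
    if observed_total <= 0:
        raise ValueError("observed irrigation total must be positive")
    return [
        (label, total_mm * amount / observed_total)
        for label, amount in observed_events
    ]


if __name__ == "__main__":
    # Hypothetical station record: two irrigation events of 60 mm and 90 mm
    events = [("overwintering", 60.0), ("jointing", 90.0)]
    for level in IRRIGATION_LEVELS_MM:
        print(level, scale_irrigation(events, float(level)))
```

Keeping the observed dates fixed and scaling only the amounts means that each simulated level differs from the observed schedule in total volume alone.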

#### 2.3.2. Optimal Irrigation Amount for High Yield of the Winter Wheat (OIy)

The yield was directly simulated by the CERES-Wheat model, which was set so that yield was affected by irrigation only; nutrients, pests, and other factors were not taken into account. The optimal irrigation amount for high yield was the irrigation amount corresponding to the maximum yield of the winter wheat.

#### 2.3.3. Optimal Irrigation Amount for Water Saving of the Winter Wheat (OIWUE)

WUE was derived from the ratio of yield to evapotranspiration. It was used to evaluate optimum irrigation management to ensure the most efficient use of water resources. The formula is as follows:

$$\text{WUE} = \frac{\text{Y}}{\text{ET}} \tag{3}$$

where WUE is the water use efficiency; Y is the yield of winter wheat; ET is the evapotranspiration. The yield and evapotranspiration were based on the simulation from the CERES-Wheat model. The optimal irrigation amount for water saving was the irrigation value corresponding to the maximum WUE of winter wheat.
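A minimal sketch of Equation (3) and of the selection of OIy and OIWUE is given below. It assumes that yield is expressed in kg ha−1 and evapotranspiration in mm, so that WUE is obtained in kg m−3 (1 mm of water over 1 ha equals 10 m3); it also assumes that ties are resolved in favor of the smaller irrigation amount. Function names and sample values are hypothetical.

```python
def water_use_efficiency(yield_kg_ha, et_mm):
    """WUE = Y / ET in kg m-3, Equation (3).
    1 mm of water over 1 ha corresponds to 10 m3."""
    return yield_kg_ha / (et_mm * 10.0)


def optimal_level(levels_mm, values):
    """Return the irrigation level with the largest value.
    Assumption: on ties, the smallest irrigation level is chosen."""
    best = max(values)
    for level, value in zip(levels_mm, values):
        if value == best:
            return level


if __name__ == "__main__":
    # Made-up simulation output for one station
    levels = [0, 40, 80, 120, 160]
    yields = [4000.0, 5600.0, 7000.0, 7600.0, 7600.0]  # kg ha-1
    et = [250.0, 300.0, 350.0, 400.0, 430.0]            # mm
    wue = [water_use_efficiency(y, e) for y, e in zip(yields, et)]
    print("OIy:", optimal_level(levels, yields))
    print("OIWUE:", optimal_level(levels, wue))
```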

#### 2.3.4. Optimal Irrigation Amount for the Trade-Off between High Yield and Water Saving of the Winter Wheat (OIt)

The optimal amount of irrigation for high yield was not necessarily equal to that for water saving. The trade-off between high yield and water saving should be considered if both objectives are to be achieved simultaneously. Using the method of Zheng et al. [21], yield and WUE at the different irrigation levels at each station were normalized, and the irrigation amount that achieved the maximum sum of normalized yield and normalized WUE was determined as the optimal irrigation amount for the trade-off between high yield and water saving. The formula is as follows:

$$\lim_{\mathrm{I}\to\mathrm{OI_t}} \left( \frac{\mathrm{Y}}{\mathrm{Y_{max}}} + \frac{\mathrm{WUE}}{\mathrm{WUE_{max}}} \right) = 2 \tag{4}$$

where I is the irrigation amount; Y and WUE are the yield and WUE simulated by the CERES-Wheat model under different irrigation gradients; Ymax and WUEmax are the maximum values of yield and WUE over the growing season at each station, respectively; and OIt is the optimal irrigation amount for the trade-off between high yield and water saving of the winter wheat.
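One way to read Equation (4) on a discrete set of irrigation levels is to pick the level whose normalized yield plus normalized WUE is largest, i.e., closest to the upper bound of 2. The sketch below follows that reading; it is an assumption about how the criterion is applied, and the function name and sample values are hypothetical.

```python
def tradeoff_level(levels_mm, yields, wues):
    """Return (OIt, score): the irrigation level maximizing
    Y / Ymax + WUE / WUEmax, and the score reached (at most 2).
    Assumption: on ties, the smallest irrigation level is chosen."""
    y_max = max(yields)
    wue_max = max(wues)
    scores = [y / y_max + w / wue_max for y, w in zip(yields, wues)]
    # max() returns the first index with the largest score
    best = max(range(len(levels_mm)), key=lambda i: scores[i])
    return levels_mm[best], scores[best]


if __name__ == "__main__":
    # Made-up values for one station
    levels = [40, 80, 120]
    yields = [6000.0, 7900.0, 8000.0]  # kg ha-1
    wues = [2.0, 1.98, 1.8]            # kg m-3
    print(tradeoff_level(levels, yields, wues))
```

A score of exactly 2 means that the same irrigation level maximizes both yield and WUE; otherwise the score measures how close the compromise comes to both maxima.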

#### 2.3.5. Optimal Irrigation Amount Constrained by the Irrigation Availability (OI)

The irrigation availability was calculated following Lei et al. [22], with the following equations:

$$\mathrm{W_i} = \mathrm{W_s} + \mathrm{W_g} \times \mathrm{k} - \mathrm{W_d} \tag{5}$$

$$\mathrm{W_d} = \rho \times \mathrm{W_i} \tag{6}$$

where Wi in Equation (5) is the available irrigation amount; Ws is the available surface water; Wg is the amount of exploitable groundwater; k is the proportion of agricultural water to groundwater; Wd is the duplication of surface water and groundwater; ρ is the exploitable coefficient, which is the ratio of exploitable groundwater to the total amount of groundwater; and Wi in Equation (6) is the infiltration. Since deep groundwater has few recharge sources, a long recovery period, and a very slow renewal rate, it can be regarded as a non-renewable resource. Therefore, the deep groundwater is not suitable for irrigation and is not included in the calculation.

For the purpose of water saving and sustainable agricultural production, irrigation of winter wheat should use only the available irrigation water, without over-exploitation of groundwater. Taking irrigation availability as a constraint, the final optimal irrigation amount was determined by comparing the OIt with the available irrigation amount at the corresponding station. If the available irrigation amount is greater than or equal to the OIt, the water resources can support the OIt to achieve high yield and water saving simultaneously, and the OIt is taken as the final optimal irrigation amount. If the available irrigation amount is less than the OIt, the production of winter wheat is irrigation constrained, and the final optimal irrigation amount is determined according to the changes in yield and WUE with irrigation levels, as well as the available irrigation amount.
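The decision rule above can be sketched as follows. In the constrained case the sketch simply caps irrigation at the available amount; this is a simplifying assumption, since the text states that the constrained value is determined from the yield and WUE responses together with the available amount. The station values are the OIt and available irrigation amounts reported in Sections 3.4 and 3.5 for the stations where both are stated explicitly; the function name is hypothetical.

```python
def final_optimal_irrigation(oi_t_mm, available_mm):
    """Return the final optimal irrigation amount (OI) in mm.

    If water availability covers the trade-off optimum, OI equals OIt.
    Otherwise irrigation is capped at the available amount
    (simplifying assumption for the constrained case)."""
    if available_mm >= oi_t_mm:
        return oi_t_mm
    return available_mm


# (OIt, available irrigation) in mm, as reported in Sections 3.4 and 3.5
STATIONS = {
    "Tangshan": (240, 267),
    "Huanghua": (160, 91),
    "Nangong": (240, 62),
    "Ganyu": (120, 273),
    "Shouxian": (80, 300),
}

if __name__ == "__main__":
    for name, (oi_t, available) in STATIONS.items():
        print(name, final_optimal_irrigation(oi_t, available))
```

For these five stations the capped rule reproduces the final optimal irrigation amounts reported in Section 3.5 (240 mm, 91 mm, 62 mm, 120 mm and 80 mm, respectively).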

#### **3. Results**

#### *3.1. Validation of CERES-Wheat Model*

For the anthesis of winter wheat in the NCP, the simulated values (from 172 days to 227 days after sowing) were very close to the observed values (from 176 days to 223 days after sowing), with an NRMSE of 1.51% and a D value of 0.98 (Table 3). Similarly, the simulated maturity (from 209 days to 257 days after sowing) matched the observations well (from 210 days to 257 days after sowing), with an NRMSE of 0.95% and a D value of 0.99. The NRMSE (14.89%) and D value (0.96) of the yield were not as good as those of anthesis and maturity, with the observed yield ranging from 1650 kg ha<sup>−1</sup> to 7395 kg ha<sup>−1</sup> and the simulated yield ranging from 1030 kg ha<sup>−1</sup> to 7577 kg ha<sup>−1</sup> (Table 3); nevertheless, the model still showed agreement between the simulation and the observation. Overall, the stability and accuracy of the calibrated model were confirmed by the above evaluation, especially for the anthesis and maturity. The calibrated model can be applied to simulate the yield and WUE of winter wheat in response to irrigation management in the NCP.

**Table 3.** Validation of simulation on anthesis, maturity, and yield of winter wheat in the North China Plain.


Notes: The NRMSE is the Normalized Root Mean Square Error between the observed values and simulated values; D value is the Consistency Index between the observed values and simulated values; BBCH is the Biologische Bundesanstalt, Bundessortenamt, Chemische Industrie. This code is recommended for phenological observations.

#### *3.2. Optimal Irrigation Amount for High Yield of Winter Wheat (OIy)*

Low yields occurred at all stations under rainfed management; yields without irrigation ranged from 2411 kg ha<sup>−1</sup> to 7679 kg ha<sup>−1</sup> (Figure 2). At each station in the NCP, yield increased with irrigation, indicating that the high yield of winter wheat relied greatly on irrigation in the NCP (Figure 2). However, once irrigation reached a certain level, the yield leveled off: excessive water was not beneficial to further yield increase, in line with the law of diminishing returns. For the maximum yield of winter wheat, Nangong and Tangshan stations required more irrigation (240 mm) than the other stations, and their maximum yields reached 10,917 kg ha<sup>−1</sup> and 9346 kg ha<sup>−1</sup>, respectively. Luancheng, Huimin and Huanghua stations required an equal amount of irrigation (160 mm), and the maximum yields at the three stations were 8669 kg ha<sup>−1</sup>, 8334 kg ha<sup>−1</sup> and 5856 kg ha<sup>−1</sup>, respectively. The amounts of irrigation required at Ganyu and Zhumadian stations were relatively small (120 mm), and the highest yields were 8851 kg ha<sup>−1</sup> and 7333 kg ha<sup>−1</sup>, respectively. Shouxian and Shangqiu stations had the lowest optimal irrigation (80 mm) among the nine stations, and their maximum yields were 8260 kg ha<sup>−1</sup> and 6912 kg ha<sup>−1</sup>, respectively.

**Figure 2.** Changes in winter wheat yield with irrigation levels in the North China Plain. (**a**) Tangshan, (**b**) Huanghua, (**c**) Luancheng, (**d**) Huimin, (**e**) Nangong, (**f**) Ganyu, (**g**) Shangqiu, (**h**) Zhumadian, (**i**) Shouxian.

Spatially, taking the Yellow River as the dividing line, the optimal irrigation amounts at the five northern stations (Tangshan, Huanghua, Luancheng, Huimin, and Nangong stations) were all at least 160 mm (Figure 2). In contrast, the optimal irrigation amounts at the four southern stations (Ganyu, Shangqiu, Zhumadian, and Shouxian stations) ranged from 80 mm to 120 mm. The difference between the southern and northern stations was probably due to the spatial variability of precipitation, which drove the variation in water demand for irrigation.

#### *3.3. Optimal Irrigation Amount for Water Saving of the Winter Wheat (OIWUE)*

For each station in the NCP, similarly to the trend of yield change, the WUE increased gradually with the irrigation level, and then leveled off or even started to decline after reaching a certain threshold (Figure 3). WUE decreased with more irrigation because the increase in irrigation was greater than the increase in yield. For the maximum WUE of winter wheat, Nangong and Tangshan stations demanded the highest irrigation amount in the region (200 mm), and their maximum WUEs were 2.4 kg m<sup>−3</sup> and 2.2 kg m<sup>−3</sup>, respectively. At Huanghua station, the WUE reached its maximum (1.6 kg m<sup>−3</sup>) with 160 mm of irrigation. Luancheng, Huimin and Ganyu stations required 120 mm of irrigation for the optimal WUE, and their maximum WUEs were 2.3 kg m<sup>−3</sup>, 2.3 kg m<sup>−3</sup> and 1.9 kg m<sup>−3</sup>, respectively. Shangqiu and Zhumadian stations required relatively less irrigation (80 mm and 40 mm, respectively), and their maximum WUEs were 2.0 kg m<sup>−3</sup> and 1.9 kg m<sup>−3</sup>, respectively.

**Figure 3.** Changes in water use efficiency of winter wheat with irrigation levels in the North China Plain. (**a**) Tangshan, (**b**) Huanghua, (**c**) Luancheng, (**d**) Huimin, (**e**) Nangong, (**f**) Ganyu, (**g**) Shangqiu, (**h**) Zhumadian, (**i**) Shouxian.

The optimal irrigation amount for water saving (OIWUE) had a similar spatial pattern to the optimal irrigation amount for high yield (OIy): the OIWUE at the northern stations varied from 120 mm to 200 mm, while the OIWUE at the southern stations was between 0 mm and 120 mm. At the southern stations, the abundant precipitation can satisfy the water demand of winter wheat, so the increase in irrigation amount significantly decreased the WUE.

#### *3.4. Optimal Irrigation Amount for the Trade-Off between High Yield and Water Saving (OIt)*

In terms of the relationship among OIy, OIWUE, and OIt at each station, the three types of optimal irrigation amount were equal at Huanghua, Ganyu, and Shangqiu stations; thus, the OIt values at Huanghua, Ganyu, and Shangqiu stations were 160 mm, 120 mm and 80 mm, respectively. With this irrigation management, these stations can achieve both high yield and water saving (Figure 4). For the remaining six stations, the OIy values were all greater than the OIWUE values. After calculation, the OIt values of Tangshan station (240 mm), Luancheng station (160 mm), Huimin station (160 mm), Nangong station (240 mm), and Shouxian station (80 mm) were all equal to their own OIy. Thus, the trade-offs at these stations came at the expense of some WUE. Zhumadian station was a special case: its OIt (80 mm) was between the OIy (120 mm) and the OIWUE (40 mm), so its trade-off was the optimal configuration with losses in both the maximum yield and the maximum WUE (Figure 4).

**Figure 4.** The completion percentage of the maximum yield and water use efficiency (WUE) based on the OIt.

In terms of the spatial pattern, the OIt values at the four southern stations (Ganyu, Shangqiu, Zhumadian and Shouxian stations) were all no greater than 120 mm, while the OIt values at the five northern stations (Tangshan, Huanghua, Luancheng, Huimin and Nangong stations) were all greater than those at the southern stations, varying from 160 mm to 240 mm (Figure 5). The OIt showed a similar spatial pattern to the OIy and OIWUE.

**Figure 5.** The optimal irrigation amount and the available irrigation amount of winter wheat in the North China Plain. Notes: OIy is the optimal irrigation amount for a high yield of winter wheat; OIWUE is the optimal irrigation amount for water saving of winter wheat; OIt is the optimal irrigation amount for the trade-off between high yield and water saving; AI is the available irrigation amount; OI is the final optimal irrigation amount constrained by the available irrigation amount.

Under the irrigation management of OIt, eight stations can maintain 100% of the maximum yield; the exception was Zhumadian station, where the yield loss was only 1.04%. In addition to 100% of the maximum yield, Huanghua, Ganyu, and Shangqiu stations can simultaneously achieve 100% of the maximum WUE. The completion percentages of the maximum WUE at the Nangong and Zhumadian stations were greater than 99%, and the values at Tangshan, Luancheng, and Huimin stations were all higher than 98%. Although Shouxian station had the highest loss percentage of the maximum WUE, it was still less than 2.5%. In conclusion, the OIt contributed to the best yield and WUE of winter wheat at all nine stations in the NCP. It maximizes the co-benefits of high yield and water saving, and is a suitable choice when both yield and efficiency are considered.

#### *3.5. Optimal Irrigation Amount Constrained by the Irrigation Availability (OI)*

Spatially, the irrigation availability in the NCP was generally high in the south and low in the north. Among these stations, Nangong and Huanghua stations had the least available irrigation amount (62 mm and 91 mm, respectively), because they are in the Heilonggang area, which has the largest groundwater depletion in the world. Tangshan station (267 mm) in the coastal plain of eastern Hebei Province, Ganyu station (273 mm) in the low plain of Xu-Huai of Jiangsu Province, and Shouxian station (300 mm) in the plain of northern Anhui Province had sufficient available water resources, all greater than 200 mm. The available irrigation amounts at the remaining stations (Zhumadian, Shangqiu, Luancheng, and Huimin stations) varied from 100 mm to 121 mm (Figure 6).

**Figure 6.** The irrigation availability of winter wheat in the North China Plain.

Compared with the OIt, the irrigation availability was abundant at the southern stations (Ganyu, Shangqiu, Zhumadian, and Shouxian stations) and Tangshan station, so the final optimal irrigation amount (OI) was equal to the OIt at these five stations (Figure 5). That is, the OI of the Tangshan and Ganyu stations was 240 mm and 120 mm, respectively, and Shangqiu, Zhumadian, and Shouxian stations had the same OI (80 mm). In contrast, four stations in the northern region (Huanghua, Luancheng, Huimin, and Nangong stations) are in the area of serious groundwater over-exploitation, and their available irrigation amounts were less than their own OIt. That is, the water resources cannot supply sufficient irrigation for winter wheat to achieve the trade-off between high yield and water saving, so the final optimal irrigation amount is bound by water availability. After calculation, the Nangong and Huanghua stations had the lowest OI, 62 mm and 91 mm, respectively; Luancheng (120 mm) and Huimin (121 mm) stations had similar final optimal irrigation amounts. Due to the differences in groundwater constraints, there was no obvious spatial pattern in the final optimal irrigation amount.

With the final optimal irrigation amount, Tangshan, Ganyu, Shangqiu, and Shouxian stations can achieve 100% of the maximum yield; the completion percentages of Luancheng, Huimin, and Zhumadian stations were all greater than 96%; and the Nangong and Huanghua stations had the highest yield losses, achieving only 69.72% and 81.87% of the maximum yield, respectively (Figure 7). For the WUE, Luancheng, Huimin, Ganyu, and Shangqiu stations can obtain 100% of the maximum WUE; Nangong and Huanghua stations still had the lowest completion percentages (90.46% and 91.58%, respectively); and the completion percentages of the remaining stations were all greater than 97.5%. In general, with this irrigation management, Ganyu and Shangqiu stations can simultaneously achieve 100% of the maximum yield and WUE, while Tangshan, Luancheng, Huimin, Shouxian, and Zhumadian stations had a slight loss in yield or WUE. These seven stations did not lose much yield or efficiency within the irrigation constraints. In contrast, the yield and WUE at Nangong and Huanghua stations were greatly limited by water availability. Consequently, appropriate fallowing or adjustment of cropping systems is recommended to protect the local water ecosystem.

**Figure 7.** The completion percentage of the maximum yield and water use efficiency (WUE) based on the OI.

#### **4. Discussion**

In the NCP, irrigation is of paramount importance to increasing the productivity of winter wheat, and the investigation of the optimal irrigation amount should consider the natural carrying capacity of water resources. In this study, we simulated the variation of yield and WUE with different irrigation levels by using the CERES-Wheat model at nine representative stations in the NCP. Then we determined the optimal irrigation amount for high yield, water saving, and the trade-off between high yield and water saving of winter wheat. Subsequently, based on the equilibrium relationship between irrigation demand and the natural carrying capacity of water resources, we investigated the optimal irrigation amount under the constraint of irrigation availability. The optimization of irrigation strategies is beneficial for wheat production and water conservation in the NCP.

Water is a dominant driver affecting yield. It has been demonstrated that winter wheat yield has a non-linear relationship with irrigation amount and that excessive irrigation does not contribute to a continuous increase in yield [23]; our results also support this. The yield growth may be inhibited by the decrease in soil permeability due to excess water and the lack of oxygen in the root system of winter wheat. Aiming at a high yield, the optimal irrigation amount for winter wheat at nine stations in the NCP ranged from 80 mm to 240 mm. This was consistent with the field experimental results at Xinxiang station, which indicated an optimal irrigation of 175–180 mm for a high yield of winter wheat in the NCP [24].

In addition to yield, we chose WUE to investigate the optimal irrigation amount for winter wheat. The WUE first increased and then decreased as irrigation increased. When winter wheat is water-stressed, WUE increases with irrigation; once irrigation exceeds a certain threshold, winter wheat evapotranspiration no longer changes significantly, and the excess irrigation water leaches into the ground or forms surface runoff. Taking WUE as the criterion, Zhang et al. [25] concluded that the optimal irrigation amount of winter wheat in the NCP ranged from 60 mm to 140 mm. Wang et al. [26] suggested that the optimal irrigation amount at Beijing station was between 192 mm and 245 mm. In the study of Ma et al. [27], the optimal irrigation amount for winter wheat in the NCP was between 60 mm and 300 mm, being 70–210 mm at the Luancheng station. Our findings also suggested that the optimal irrigation amount for winter wheat varied from 0 mm to 200 mm, with 120 mm at the Luancheng station.

The OIt aimed at the joint optimization of yield and WUE. It has been shown that yield and WUE do not reach their maxima at the same irrigation level [28]. In this study, we also concluded that WUE was maximized before yield. Compared to the OIy, the OIWUE was reduced by 17% to 67%, indicating a water-saving trend. After the trade-off, the OIt contributed to the best combination of yield and WUE for winter wheat at all nine stations in the NCP, varying from 80 mm to 240 mm. Similar conclusions can be found in previous studies. Based on the APSIM model, Zheng et al. [21] reported that the OIt ranged from 3 mm to 286 mm. Moreover, the OIt of the Luancheng station in another study (202 mm) was also close to the value in our study (160 mm) [29].

We considered the sustainable use of groundwater as an upper bound for winter wheat irrigation in this study. Results showed that the available water resources for irrigation were high in the south and low in the north, a spatial pattern consistent with that reported by Lei et al. [22]. With the rapid depletion of groundwater in the NCP, agricultural production faces a dilemma between food security and water security. This study showed that the irrigation-constrained areas were mainly concentrated in the groundwater over-exploitation area (especially the Heilonggang area). For the regions suffering from water limitations, agricultural intensification should be reduced, and appropriate fallowing or adjustment of cropping systems should be carried out to alleviate the groundwater crisis. Meanwhile, for the regions with abundant water resources for irrigation, which were mainly located in the northern Anhui Plain and the low Xu-Huai Plain in the southern part of the NCP, the potential production needs to be maximized to offset the yield losses from the water-scarce regions [30].

Adjustment of the cropping systems is an effective measure to save valuable groundwater for sustainability in over-exploited aquifers. A study conducted by Davis et al. [31] showed that the optimization of cropping structure would save 12–14% of irrigation water at a global scale. Accordingly, the Chinese government launched the Seasonal Land Fallowing Policy in the NCP in 2014, designed to mitigate serious groundwater over-exploitation. Analysis showed that the policy reduced groundwater consumption and contributed to real water saving [32]. In addition, other irrigation and management strategies have been adopted to mitigate groundwater depletion, for example, deficit irrigation [8], drip irrigation under plastic mulch [33], micro-sprinkling [34], and adjustment of planting density [35]. One study concluded that combining all of these agricultural management measures could reduce groundwater exploitation intensity by around 74.58% to 96.95%, so that groundwater in the NCP could recover nearly to its original healthy level [11].

In addition to demand-side measures such as promoting water conservation technologies and implementing the relevant water use policies, supply-side measures encourage water users to replace groundwater with surface water delivered by the central South-to-North Water Diversion, to alleviate the water stress and groundwater storage deficit in the NCP. Long et al. [36] found that, within the context of climate variability and policy implications, water diverted to Beijing reduced cumulative groundwater depletion by 3.6 km<sup>3</sup>, accounting for 40% of total groundwater storage restoration between 2006 and 2018; meanwhile, increased precipitation and policy-induced changes contributed about 2.7 km<sup>3</sup> (~30%) and 2.8 km<sup>3</sup> (~30%), respectively, to the groundwater storage recovery. This demonstrates the important role of the South-to-North Water Diversion in groundwater restoration. Overall, further measures need to be explored to save valuable groundwater for the sustainability of irrigated agriculture in the NCP.

Our analysis was subject to considerable uncertainties. Except for irrigation management, all cultivation conditions were assumed to be optimal in our study. In general, climate, soil, cultivation practices, pests, and diseases all influence the optimal irrigation amount of winter wheat. Thus, the optimal irrigation of winter wheat can be further analyzed based on specific considerations of regional differences in cultivation. Furthermore, the optimal irrigation results should be tested in the field to obtain supporting evidence, in order to better guide practice for local farmers. In addition, the optimal irrigation amount can be determined by considering greenhouse gas emissions [37], soil moisture, and carbon footprint [38], in combination with the precipitation pattern, to make more targeted recommendations in the future.

#### **5. Conclusions**

For the NCP, the OIy ranged from 80 mm to 240 mm, and the OIWUE varied from 0 mm to 200 mm. After the trade-off between yield and WUE, the OIt was between 80 mm and 240 mm, which maximized the co-benefit of high yield and water saving.

The available irrigation amount varied from 62 mm to 300 mm in the NCP. Generally, southern stations had higher water availability than northern stations, and stations located in the Heilonggang area (Nangong and Huanghua stations) had the lowest irrigation availability. As the region has the most severe groundwater depletion in the world, the optimal irrigation amount needs to be constrained by water availability. Based on the equilibrium relationship between irrigation and the natural carrying capacity of water resources, we determined the final optimal irrigation amount (62–240 mm) under the irrigation availability constraint. With the final optimal irrigation amount, yield and WUE were greatly affected at the Huanghua and Nangong stations; however, the remaining stations maintained more than 96% of the maximum yield and 97.5% of the maximum WUE, achieving both high productivity and water saving.

For the water-scarce regions, the irrigation availability cannot support the optimization of yield and WUE. It is therefore recommended to adopt moderate fallowing, deficit irrigation, or appropriate cropping system adjustments, as well as to use alternative water resources to ensure water security. Meanwhile, the production potential of the southern part of the NCP should be strengthened, and the global wheat trade market should be appropriately considered for capacity substitution, to maintain the sustainable use of regional water resources and grain production.

**Author Contributions:** Conceptualization, X.S. and W.S.; methodology, X.S. and W.S.; software, N.D. and M.W.; validation, N.D. and M.W.; visualization, N.D. and M.W.; writing—original draft preparation, X.S.; writing—review and editing, W.S.; supervision, X.S. and W.S.; funding acquisition, W.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Program of China (2022YFB3903504), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA20010202 and No. XDA23100202), and the Science Fund for Creative Research Groups of the National Natural Science Foundation of China (Grant No. 72221002).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Assessing Drought, Flood, and High Temperature Disasters during Sugarcane Growth Stages in Southern China**

**Pei Yao <sup>1,†</sup>, Long Qian <sup>2,\*,†</sup>, Zhaolin Wang <sup>1</sup>, Huayue Meng <sup>1</sup> and Xueliang Ju <sup>3,4</sup>**


† These authors contributed equally to this work.

**Abstract:** As a globally important sugarcane-producing region, Southern China (SC) is severely affected by various agrometeorological disasters. This study aimed to comprehensively assess multiple sugarcane agrometeorological disasters with regard to sugarcane yield in SC. The standardized precipitation evapotranspiration index and the heat degree-days were employed to characterize drought, flood, and high temperature (HT) during sugarcane growth stages in three provinces in SC in the period 1970–2020. Moreover, the relationships between sugarcane climatic yield and disaster intensities were investigated. The results indicated that the most recent decade witnessed the most intensive sugarcane agrometeorological disasters; sugarcane drought and HT intensities significantly (*p* < 0.05) increased in one and two provinces, respectively. Central and western SC was most drought-prone, while eastern SC was most flood-prone; sugarcane HT was concentrated in southwestern SC. The mature stage exhibited the greatest monthly intensities of drought and flood; the most HT-prone growth stage varied with provinces. The relationships between drought/flood intensity and sugarcane climatic yield were significant in seven districts; the yield-reducing effect of sugarcane flood was more obvious than that of drought. In conclusion, this study provides references for agrometeorological disaster risk reduction for sugarcane in SC.

**Keywords:** sugarcane; heat stress; SPEI; waterlogging; climate change; growth stage; climatic yield

#### **1. Introduction**

Agriculture is sensitive to climatic environments, and increasing climate change is having profound effects on agricultural crops in various respects. Sugarcane is one of the most important economic crops in the world; to date, its response and adaptation to climate change have been extensively investigated in different regions around the world [1,2]. Among the effects of climate change on sugarcane production, some are considered positive, such as elevated CO<sub>2</sub> concentration and increased air temperature, which could benefit sugarcane yield [3,4]; however, some are negative and can severely restrict sugarcane production. Such negative effects mainly refer to extreme weather events, e.g., drought, flood, and high temperatures [2,5–7]. Due to the long life cycle of sugarcane [8], its growth and yields are objectively affected by drought and flooding stresses [9–12], as well as high temperatures [13,14]. Given this, the most challenging problems induced by climate change for sugarcane production are considered to be extreme meteorological disasters [2]. Hence, from the perspective of the sustainability of sugarcane production, it is meaningful to reveal the characteristics of agrometeorological disasters occurring during sugarcane growth stages.

China is the third largest sugarcane-producing country in the world, following Brazil and India [14,15]. In China, sugarcane-producing areas are concentrated in southern China (SC for short), since the warm climate in SC is suitable for the growth and development of such tropical plants. In SC, the growing season of sugarcane generally lasts from March to December; during this long life cycle, typical meteorological disasters in SC, including drought, flood, and high temperature (HT), frequently occur. Drought and flood are extensively distributed abiotic stresses for most crops; for sugarcane, a series of field experiments have demonstrated that soil water deficit and excessive water (induced by drought and flood) can significantly reduce the growth and yield of sugarcane [16,17]. In addition to flood and drought, HT is also common in summer in SC. Although sugarcane crops are relatively tolerant to HT, the growth and yield of sugarcane are affected when air temperatures are above the upper limit of its optimal temperature range [18–20]. In conclusion, for sugarcane in SC, it is necessary to study the impacts of drought, flood, and HT during sugarcane growth stages [21].

**Citation:** Yao, P.; Qian, L.; Wang, Z.; Meng, H.; Ju, X. Assessing Drought, Flood, and High Temperature Disasters during Sugarcane Growth Stages in Southern China. *Agriculture* **2022**, *12*, 2117. https://doi.org/10.3390/agriculture12122117

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 21 October 2022 Accepted: 7 December 2022 Published: 9 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Currently, at the regional scale, meteorological indices are powerful tools for evaluating the impacts of agrometeorological disasters on crops. Moreover, many studies employing these tools have further examined the relationships between meteorological disaster intensities and crop yield fluctuations. For agricultural drought and flood, various precipitation-based indices, e.g., the renowned SPI and SPEI, have been employed to quantify the water conditions during crop growing seasons; moreover, the drought and flood intensities characterized by meteorological indices have been found to be significantly related to the climatic yield of various crops, including rice, corn, wheat, and cotton in different regions [22–27]. For HT, many temperature-based indices, such as heat degree-days, have been applied to study the impacts of HT on the yields of various crops, e.g., maize, rice, and wheat [28–30]. However, relevant exploration regarding sugarcane crops has been insufficient. For sugarcane in SC, as drought is the most noticeable agrometeorological disaster, a previous study revealed the spatial-temporal variations of drought during sugarcane growth stages in Guangxi province (the largest sugarcane-producing province in SC) by using the SPEI [25]. However, in addition to drought, many other disasters, such as flood and HT, also severely threaten sugarcane yield in SC; so far, comprehensive assessment research accounting for multiple agrometeorological disasters during sugarcane growth stages in SC is still lacking. More importantly, the potential effects of drought, flood, and HT on sugarcane yield fluctuations have not been explored on a regional scale.

Given this background, the primary aims of the present work were to reveal the spatial-temporal characteristics of drought, flood, and HT during sugarcane growth stages in SC and, further, to investigate the relations between the disaster intensities and sugarcane yield fluctuations. The obtained results can provide guidance for guaranteeing a high yield of sugarcane in SC in future climates.

#### **2. Materials and Methods**

#### *2.1. Study Region*

The basic information on sugarcane production in China is illustrated in Figure 1. It is apparent that Southern China (SC) is the dominant sugarcane-producing region in the country. In particular, Guangxi, Guangdong, and Yunnan provinces, i.e., three major provinces in SC lying between 20–29° N and 97–117° E (Figure 2), produce approximately 90% of the total sugarcane yield in China [31]; in recent years, they have produced over one hundred million tons of sugarcane per annum. Thus, this study took these three provinces as the study regions. SC has tropical and subtropical climate characteristics; local water and heat resources are abundant, providing ideal conditions for the growth of tropical crops, such as sugarcane.

#### *2.2. Data Collection*

The study period was 1970–2020. The employed meteorological data, mainly including daily air temperature, precipitation, wind speed, and relative humidity from 1970 to 2020, were collected from the national meteorological state of China (http://data.cma.cn, accessed on 5 August 2021). Up to 81 national level meteorological stations distributed in SC that had consistent meteorological data were employed.

**Figure 1.** Basic information on Chinese sugarcane production over the last decade. (**a**,**b**) refer to sugarcane yield (10 kt) and planting areas (1000 ha), respectively.

**Figure 2.** Description of the study region. The colored area on the left-side subgraph indicates southern China, including Yunnan, Guangxi, and Guangdong provinces. The points on the right-side subgraph indicate the national level meteorological stations that were employed in this work.

The whole growing season of sugarcane can be generally divided into four growth stages, i.e., seedling stage, tillering stage, stem elongation stage, and mature stage [25]. Since monthly indices were calculated to describe the drought and flood conditions at different sugarcane growth stages, the specific months for each growth stage were first determined. Based on a series of existing reports concerning sugarcane in SC [25,32,33], the months of each sugarcane growth stage were determined in each province before computing the corresponding monthly SPEI. As a result, in Guangxi and Guangdong provinces, the seedling stage lasts from March to April, the tillering stage lasts from May to June, the stem elongation stage lasts from June to October, and the mature stage is from November to December. For Yunnan province, the seedling stage is from March to May, the tillering stage is from June to July, the stem elongation stage is from July to November, and the mature stage is in December.

The annual observed yield of sugarcane (kg/ha) in every district in SC was obtained from the provincial statistical materials that were accessible to us, including Guangdong rural year books (1992–2020), Guangxi year books (2001–2020), and Yunnan year books (1991–2020).

#### *2.3. Study Method*

The methodological diagram of this study is shown in Figure 3. First, basic meteorological data were employed to compute the drought and flood index (i.e., SPEI) and the heat index (i.e., HDD) in our study regions. Accordingly, the intensities of drought, flood, and high temperature disasters during four major sugarcane growth stages were quantified, and the spatial and temporal characteristics of these sugarcane agro-meteorological disasters were revealed. In addition, annual sugarcane yield data at different areas were collected; then, they were detrended, and the climatic yields of sugarcane were obtained. Afterwards, the variations in the actual sugarcane yield and the climatic yield were analyzed, respectively. Finally, to explore the potential impacts of agrometeorological disasters on sugarcane yield, the relationships between sugarcane climatic yield and the intensities of sugarcane drought, flood, and high temperature were examined.

**Figure 3.** Methodological diagram.

2.3.1. Meteorological Indices for Drought, Flood, and HT

The SPEI is a widely recognized index appropriate for agricultural drought and flood monitoring [22,23,26]. In essence, the SPEI describes water conditions based on the difference between precipitation and potential evapotranspiration. Depending on actual requirements, the SPEI can be computed at monthly, seasonal, and yearly scales. In the present work, the monthly SPEI was chosen to quantify the water conditions during each sugarcane growth stage.

First, the difference (D) between monthly precipitation (P) and monthly potential evapotranspiration (PET) was computed:

$$D_i = P_i - \mathrm{PET}_i \tag{1}$$

where P<sub>i</sub> and PET<sub>i</sub> were precipitation (mm) and PET (mm) in month i, respectively. In this study, PET was calculated using the Penman–Monteith method, which is recommended by the FAO due to its solid physical basis.

Afterwards, the D<sub>i</sub> series over the study period was fitted with a three-parameter log-logistic probability distribution F(x). Finally, the monthly SPEI values were obtained from the standardized values of F(x). Detailed instructions for the theories and calculating processes of SPEI can be found in original documents [34]. According to a series of previous studies using the SPEI to identify agricultural drought and flood conditions [26,27,35–37], SPEI < −0.5 and SPEI > 0.5 were set as the criteria for identifying the drought and flood months, respectively. Since sugarcane growth stages last for several months, a growth stage often witnesses both sugarcane drought and flood conditions. To individually characterize sugarcane drought and flood intensities during the growth stages, we employed an accumulative index derived from a simple and commonly used waterlogging index named SEW<sub>30</sub> (sum of excess water table within 30 cm soil profile) [38]. SEW<sub>30</sub> accumulates the parts of excessive water tables over a crop growth period; accordingly, our employed accumulative index, called SESPEI (sum of excessive SPEI), accumulates the parts of excessive SPEI (relative to the drought and flood threshold) over a crop growth period. SESPEI was designed for evaluating regional drought and flood intensities during the crop growth stages [22,37].
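The standardization step described above (fitting F(x) and converting it to SPEI) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual form of the three-parameter log-logistic cumulative distribution, F(x) = [1 + (α/(x − γ))^β]^−1, and takes the fitted parameters as given. The function names and the parameter values are hypothetical, and the parameter fitting itself is not shown.

```python
from statistics import NormalDist


def log_logistic_cdf(d, alpha, beta, gamma):
    """CDF of a three-parameter log-logistic distribution.

    alpha is the scale, beta the shape, and gamma the origin parameter.
    """
    if d <= gamma:
        raise ValueError("d must exceed the origin parameter gamma")
    return 1.0 / (1.0 + (alpha / (d - gamma)) ** beta)


def spei_from_balance(d, alpha, beta, gamma):
    """Map a monthly water balance D (mm) to SPEI.

    SPEI is taken as the standard-normal quantile of F(D).
    """
    return NormalDist().inv_cdf(log_logistic_cdf(d, alpha, beta, gamma))


# Hypothetical fitted parameters, for illustration only.
alpha, beta, gamma = 120.0, 4.0, -150.0

# A balance equal to the distribution median maps to SPEI = 0.
median_spei = spei_from_balance(gamma + alpha, alpha, beta, gamma)

# A strongly negative balance maps to a negative (dry) SPEI.
dry_spei = spei_from_balance(-100.0, alpha, beta, gamma)
```

With these conventions, months whose SPEI falls below −0.5 or above 0.5 would be flagged as drought or flood months, respectively.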

(1) For sugarcane drought:

$$\mathrm{SESPEI}_{\mathrm{DR}} = \begin{cases} \sum_{i=1}^{n} \left(|\mathrm{SPEI}_i| - 0.5\right) & \mathrm{SPEI}_i < -0.5 \\ 0 & \mathrm{SPEI}_i \ge -0.5 \end{cases} \tag{2}$$

(2) For sugarcane flood:

$$\mathrm{SESPEI}_{\mathrm{FL}} = \begin{cases} \sum_{i=1}^{n} \left(\mathrm{SPEI}_i - 0.5\right) & \mathrm{SPEI}_i > 0.5 \\ 0 & \mathrm{SPEI}_i \le 0.5 \end{cases} \tag{3}$$

where SESPEI<sub>DR</sub> and SESPEI<sub>FL</sub> indicate drought and flood intensities over the calculation stage, respectively; n is the number of months across the calculation stage; and SPEI<sub>i</sub> indicates the SPEI value in month i. The values −0.5 and 0.5 are the SPEI thresholds for drought and flood, respectively.
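As a minimal sketch of Equations (2) and (3), the two accumulative indices can be computed from a list of monthly SPEI values. Only months beyond the thresholds contribute, so a single growth stage can have both a drought and a flood intensity. The function names and the sample SPEI values are hypothetical.

```python
DROUGHT_THRESHOLD = -0.5  # SPEI below this marks a drought month
FLOOD_THRESHOLD = 0.5     # SPEI above this marks a flood month


def sespei_drought(monthly_spei):
    """Sum of |SPEI| in excess of 0.5 over all drought months (Equation (2))."""
    return sum(abs(s) - 0.5 for s in monthly_spei if s < DROUGHT_THRESHOLD)


def sespei_flood(monthly_spei):
    """Sum of SPEI in excess of 0.5 over all flood months (Equation (3))."""
    return sum(s - FLOOD_THRESHOLD for s in monthly_spei if s > FLOOD_THRESHOLD)


# Hypothetical monthly SPEI values for a five-month growth stage.
stage_spei = [-1.2, -0.3, 0.9, 1.5, -0.6]

drought = sespei_drought(stage_spei)  # contributions from -1.2 and -0.6
flood = sespei_flood(stage_spei)      # contributions from 0.9 and 1.5
```

In this example the month with SPEI = −0.3 contributes to neither index, because it lies between the two thresholds.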

To quantify sugarcane HT, we employed the heat degree-days index (HDD), which is a simple heat index accounting for both heat duration and heat intensity over a given period [29]. In describing the overall impacts of disasters over a given period, the calculation of the HDD is similar to that of the abovementioned SEW<sub>x</sub>; thus, the HDD was calculated as:

$$\mathrm{HDD} = \begin{cases} \sum_{i=1}^{n} \left(T_{\max,i} - T_{h}\right) & T_{\max,i} > T_{h} \\ 0 & T_{\max,i} \le T_{h} \end{cases} \tag{4}$$

where T<sub>max,i</sub> is the daily maximum temperature on day i, T<sub>h</sub> is the threshold temperature for crop HT stress, and n is the number of days of the calculation stage. Since sugarcane is a tropical crop, the upper temperature threshold for its normal growth is higher than that for many other crops, e.g., 30–35 °C [27,39]. Accordingly, in this paper, the T<sub>h</sub> of sugarcane was set to 38 °C based on previous research regarding sugarcane heat stress [18,40].
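Equation (4) can be sketched in the same way. The function below accumulates the daily exceedance of the maximum temperature above the 38 °C threshold given in the text; the function name and the sample temperatures are hypothetical.

```python
def heat_degree_days(daily_tmax, threshold=38.0):
    """Accumulate daily maximum temperature in excess of the threshold (deg C)."""
    return sum(t - threshold for t in daily_tmax if t > threshold)


# Hypothetical daily maximum temperatures (deg C) for five days.
tmax = [36.5, 38.0, 39.2, 40.1, 37.9]

# Only the days at 39.2 and 40.1 deg C exceed the threshold.
hdd = heat_degree_days(tmax)
```

A day exactly at the threshold (38.0 °C here) contributes nothing, consistent with the strict inequality in Equation (4).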

It should be noted that the length of a growth stage influences the accumulative intensities of drought, flood, and HT over that stage. Hence, when comparing the disaster intensities at different growth stages in this study, the accumulative disaster intensity of each growth stage was divided by the number of months in that growth stage. In this way, the disaster intensities (i.e., monthly intensities of disasters) of different growth stages were compared more fairly.
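The normalization described above can be illustrated with a short sketch. The stage lengths follow the months stated in Section 2.2 for Guangxi and Guangdong; the accumulated intensities and the names used are hypothetical.

```python
# Stage lengths in months for Guangxi and Guangdong, per Section 2.2.
STAGE_MONTHS = {
    "seedling": 2,         # March to April
    "tillering": 2,        # May to June
    "stem_elongation": 5,  # June to October
    "mature": 2,           # November to December
}


def monthly_intensity(accumulated, stage):
    """Divide an accumulated stage intensity by the stage length in months."""
    return accumulated / STAGE_MONTHS[stage]


# Hypothetical accumulated drought intensities (SESPEI_DR) for two stages.
elongation = monthly_intensity(2.0, "stem_elongation")
mature = monthly_intensity(1.0, "mature")
```

Here the stem elongation stage has the larger accumulated intensity, but the mature stage has the larger monthly intensity, which is the kind of distinction the normalization is meant to expose.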

#### 2.3.2. Spatial-Temporal Characteristics of Sugarcane Agrometeorological Disasters

The linear trend method was performed to detect the changing tendency of the intensity of different agrometeorological disasters:

$$y = kx + b \tag{5}$$

where y is the examined disaster intensity index (e.g., SESPEI), x is the year within the calculation period, and b is the intercept. k is the regression coefficient, which represents the climate inclination rate of the disaster; k > 0 and k < 0 indicate upward and downward trends, respectively. A significant regression result (*p* < 0.05) indicated that the disaster intensity changed significantly over the years.
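The linear trend method can be sketched with an ordinary least-squares fit, treating the year as the predictor and the disaster intensity as the response, so that ten times the slope gives the change per decade. The series below is synthetic and the function name is hypothetical; the significance test mentioned in the text is not reproduced here.

```python
def linear_trend(years, values):
    """Return (k, b) of the least-squares line values = k * year + b."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


# Synthetic intensity series over 1970-2020 rising by 0.132 per decade.
years = list(range(1970, 2021))
values = [0.5 + 0.0132 * (year - 1970) for year in years]

k, b = linear_trend(years, values)
per_decade = 10 * k  # climate inclination rate expressed per decade
```

Centering both variables on their means before forming the sums keeps the arithmetic well conditioned even though the years are large numbers.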

For the spatial characteristic analysis, the ArcGIS software (version 10.2; ESRI, Redlands, CA, USA) was applied to illustrate the spatial distribution of the intensities of sugarcane drought, flood, and HT during different sugarcane growth stages. The disaster intensity indices, including SESPEI<sub>DR</sub>, SESPEI<sub>FL</sub>, and HDD, were first calculated yearly for each station and then averaged over the study period (1970–2020); afterwards, these results were spatially interpolated for the whole study region for the spatial analysis. The kriging method was used for interpolation, and the spatial resolution of computations was 4.2 km × 4.2 km.

#### 2.3.3. Sugarcane Climatic Yield and Its Relations to Meteorological Disaster Intensities

The time series of crop yield can be primarily divided into two parts, i.e., trend yield and detrended yield (also known as the crop-restricting climatic yield, or climatic yield for short in this study). The trend yield is determined by non-climatic factors, such as advances in agricultural technology and improvements in field management. Currently, there are various methods for detrending crop yield, but it should be noted that these methods cannot fully remove the influence of external factors. For the present work, the detrending method itself is not a research issue. Hence, we selected the quadratic polynomial [22,23,26] to detrend sugarcane yield, since this commonly used method can capture the non-linear trend of the time series of a crop yield. On the other hand, the climatic yield is determined by climatic factors, such as precipitation (relevant to drought and flood disasters) and air temperatures (relevant to high temperature disasters). Hence, the climatic yield of sugarcane (Y<sub>cl</sub>, kg/ha) was calculated as the difference between the actual sugarcane yield (Y<sub>act</sub>, kg/ha, which refers to the abovementioned annual observed yield derived from provincial year books) and the trend yield (Y<sub>tr</sub>, kg/ha):

$$Y_{\mathrm{cl}} = Y_{\mathrm{act}} - Y_{\mathrm{tr}} \tag{6}$$
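The detrending step can be sketched as a quadratic least-squares fit followed by Equation (6). This is an illustration under stated assumptions rather than the authors' code: the trend is fitted by solving the normal equations directly, years are centered for numerical stability, and the yield series, the shock in 2010, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def climatic_yield(years, actual_yield):
    """Return (trend, climatic) lists using a quadratic least-squares trend."""
    mean_year = sum(years) / len(years)
    t = [year - mean_year for year in years]  # centered time axis
    # Normal equations for the polynomial basis 1, t, t^2.
    power_sums = [sum(ti ** p for ti in t) for p in range(5)]
    rhs = [sum(yi * ti ** p for ti, yi in zip(t, actual_yield)) for p in range(3)]
    matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve_linear_system(matrix, rhs)
    trend = [c0 + c1 * ti + c2 * ti * ti for ti in t]
    climatic = [ya - yt for ya, yt in zip(actual_yield, trend)]  # Equation (6)
    return trend, climatic


# Hypothetical yields (kg/ha): a smooth quadratic trend plus one bad year.
years = list(range(2001, 2021))
base = [60000 + 800 * (y - 2001) - 10 * (y - 2001) ** 2 for y in years]
actual = base[:]
actual[years.index(2010)] -= 3000  # hypothetical disaster-induced loss

trend, climatic = climatic_yield(years, actual)
worst_year = years[climatic.index(min(climatic))]
```

In this synthetic case the most negative climatic yield falls in the year with the imposed loss, while the slowly varying component is absorbed by the trend.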

Once the sugarcane climatic yield was obtained, its relations to the intensities of sugarcane drought, flood, and HT were investigated by performing a Pearson correlation analysis. Considering that the final sugarcane yield was affected by the combined impacts of the agrometeorological disasters occurring during various growth stages, we related the sugarcane climatic yield to the accumulative SESPEI<sub>DR</sub>/SESPEI<sub>FL</sub>/HDD over the whole sugarcane growth period. It should be noted that the data were preprocessed to be more representative before the correlation analysis was performed. In particular, both sugarcane drought and flood disasters are common in SC; hence, when investigating the relationships between drought intensity (or flood intensity) and sugarcane climatic yield, it is meaningful to minimize the influence of flood (or drought). Following a previous drought-relevant paper [23], we treated the years with the middle 40 percent of SESPEI<sub>DR</sub>/SESPEI<sub>FL</sub>/HDD values as "near-normal conditions"; the years with the lowest 30 percent of the index values were excluded to reduce the influence of other factors. Taking drought as an example, the years with the top 30 percent and the middle 40 percent of SESPEI<sub>DR</sub> values referred to "dry conditions" and "near-normal conditions", respectively; thus, the remaining years (with the lowest 30 percent of SESPEI<sub>DR</sub>) were excluded from the correlation analysis because they were relatively flood-prone.
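The preprocessing and correlation steps can be sketched as follows: years are ranked by the disaster index, the lowest 30 percent are dropped, and the Pearson coefficient is computed on the remaining years. The data are hypothetical, the function names are illustrative, and the rounding used to decide how many years to drop is an assumption, since the text does not specify how fractional counts or ties were handled.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_lowest_share(index_values, climatic_yields, share=0.30):
    """Keep only the years whose index value is above the lowest `share` of years."""
    order = sorted(range(len(index_values)), key=lambda i: index_values[i])
    n_drop = round(len(index_values) * share)  # assumed rounding rule
    kept = sorted(order[n_drop:])              # restore chronological order
    return ([index_values[i] for i in kept],
            [climatic_yields[i] for i in kept])


# Hypothetical accumulated drought index and climatic yield (kg/ha) for ten years.
sespei_dr = [0.1, 0.0, 0.4, 1.2, 2.0, 0.8, 0.2, 1.6, 0.6, 2.4]
yield_cl = [500, 300, -300, -1300, -1900, -700, 400, -1500, -700, -2500]

kept_index, kept_yield = drop_lowest_share(sespei_dr, yield_cl)
r = pearson_r(kept_index, kept_yield)
```

In this example the three years with the smallest drought index are removed, and the correlation on the remaining seven years is strongly negative, as would be expected if drought reduces the climatic yield.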

#### **3. Results**

#### *3.1. Spatial-Temporal Characteristics of Sugarcane Drought, Flood, and HT*

#### 3.1.1. Temporal Trends

As displayed in Figure 4a, over the past five decades, the intensities of sugarcane drought have increased in Yunnan and Guangdong but decreased in Guangxi; more importantly, a significant (*p* < 0.05) increasing trend of drought intensity was detected in Yunnan (the SESPEI<sub>DR</sub> increased by 0.132 per decade). For sugarcane flood (Figure 4b), the intensities did not show significant trends in any province; additionally, the flood intensity in Guangdong was slightly higher than that in the other provinces. Compared with sugarcane drought and flood, the intensity of sugarcane HT exhibited a more obvious increasing trend (Figure 4c). HT intensities in Yunnan and Guangdong significantly increased; in particular, Yunnan saw an obvious increase in sugarcane HT (the HDD increased by 0.466 per decade). In general, the temporal trends of sugarcane drought and flood were relatively weak, except for a significant increasing trend of drought intensity in Yunnan. In contrast, sugarcane HT showed obvious increasing trends, and it was the agrometeorological disaster that intensified most over the past five decades in SC.

#### 3.1.2. Interdecadal Analysis

The intensities of sugarcane drought, flood, and HT during different decades are displayed in Figure 5. Regarding sugarcane drought, the most recent two decades, i.e., the 2000s and 2010s, witnessed more intensive droughts than before. In the 2000s, sugarcane drought was intensive in Guangdong and Guangxi; additionally, in the 2010s, drought intensity in Yunnan reached a historic high. Generally, the most sugarcane drought-prone decade in SC was the 2000s. For sugarcane flood, the most recent decade (2010s) was apparently the most flood-prone decade for Guangdong and Guangxi. However, in the 2010s, the sugarcane flood intensity in Yunnan was at a historic low; therefore, for Yunnan, the most recent decade was severely affected by drought (Figure 5a) and slightly affected by flood (Figure 5b). Finally, for sugarcane HT, the most recent decade was the most HT-prone decade, which was in accordance with previous temporal trend results (Figure 5c). In particular, Yunnan showed a dramatically increasing tendency of sugarcane HT. In conclusion, taking drought, flood, and HT into overall consideration, the most recent decade was most affected by agrometeorological disasters for sugarcane in SC.

#### 3.1.3. Spatial Characteristic Analysis

As depicted in Figure 6, at the seedling stage, drought-prone areas were concentrated in southwestern SC, i.e., Yunnan. Afterwards, during the following stage (i.e., tillering stage), the high-prone areas gradually expanded to all three provinces, including western SC (Yunnan), central SC (western Guangxi), and eastern SC (eastern Guangdong). During the stem elongation stage, the high-prone areas were central SC (western Guangxi) and eastern SC (central Guangdong). Finally, at the mature stage, the high-prone area became northwestern SC (northern Yunnan). Hence, the drought-prone areas varied greatly with the growth stages. For sugarcane flood (Figure 6b1–b4), the high-prone areas at the initial stage were in western SC (Yunnan), which was generally similar to the drought-prone areas during this period. Nevertheless, during the remaining three growth stages, the distribution of flood-prone areas (Figure 6b2–b4) was totally different from that of drought-prone areas (Figure 6a2–a4). As for sugarcane HT (Figure 6c1–c4), during the first two growth stages, the high-prone areas were concentrated in southern Yunnan (Figure 6c1,c2). Then, at the stem elongation stage (Figure 6c3), the intensity of sugarcane HT reduced but the HT-prone areas became extensive, covering most parts of Yunnan and Guangxi, and northern Guangdong. At the mature growth stage, which corresponded to the local winter season, HT did not occur.

**Figure 4.** Temporal trends of drought (**a**), flood (**b**), and high temperatures (**c**) during sugarcane growth stages in Guangdong, Guangxi, and Yunnan provinces. The sample size of every regression model is 51 (from 1970 to 2020). \* and \*\*\* indicate *p* < 0.05 and *p* < 0.001, respectively. The regression model in Guangxi is significant at *p* < 0.10.

In summary, the spatial distribution of the drought-prone and flood-prone areas varied greatly with the sugarcane growth stages; during critical growth stages, the drought-prone and flood-prone areas were quite different. Additionally, HT-prone areas were consistently concentrated in southern Yunnan during different growth stages.

Figure 7 illustrates the spatial distribution of sugarcane drought, flood, and HT over the entire sugarcane growing season. The spatial distributions of sugarcane drought (Figure 7a) and flood conditions (Figure 7b) were quite different; this sharp difference was also found in Figure 6. The former was concentrated in western and central SC, including Yunnan and western Guangxi; the latter was concentrated in eastern SC, including eastern Guangdong. For sugarcane HT, the most affected areas were concentrated in Yunnan.

#### 3.1.4. Inter-Growth-Stage Distribution of Sugarcane Drought, Flood, and HT

The monthly intensities of disasters at different growth stages are compared in Figure 8. The mature stage was the period most affected by drought and flood (Figure 8a,b) in Guangdong and Guangxi. Meanwhile, in these two provinces, the tillering stage had the lowest monthly intensities of drought and flood. Finally, for sugarcane HT, the greatest monthly intensity in Guangdong and Guangxi was found at the stem elongation stage. In comparison, sugarcane HT was most intensive at the seedling stage in Yunnan (Figure 8c), mainly because a few HT-intensive places in Yunnan (i.e., Huaping station and Yuanjiang station) suffered severe HT during the seedling stage.

**Figure 5.** The intensities of drought (**a**), flood (**b**), and high temperature (**c**) during sugarcane growth stages in Guangdong, Guangxi, and Yunnan provinces. × and — in the boxes represent the mean and median values, respectively.

**Figure 6.** Spatial distribution of the monthly intensities of drought (**a1**–**a4**), flood (**b1**–**b4**), and high temperatures (**c1**–**c4**) during the seedling (**a1**,**b1**,**c1**), tillering (**a2**,**b2**,**c2**), stem elongation (**a3**,**b3**,**c3**), and mature stages (**a4**,**b4**,**c4**) of sugarcane in southern China. The intensities were averaged over 1970–2020 and displayed here for spatial analysis.

#### *3.2. Sugarcane Yield Variations over the Past Few Decades*

Figure 9a displays the variations in sugarcane yield (kg/ha) in the three provinces in SC. Over the past few decades, sugarcane yield in SC has increased significantly (*p* < 0.001). In particular, Guangdong and Guangxi have witnessed rapid increases in sugarcane yield (801.94 and 867.27 kg/ha per year, respectively). These results suggest that sugarcane yield in SC maintained a sustained increase. The sugarcane climatic yield obtained after detrending the sugarcane yield is displayed in Figure 9b. As illustrated by the negative values of the sugarcane climatic yield, the periods witnessing the most severe sugarcane yield losses differed among the three provinces; the most severe sugarcane yield losses in Guangdong, Guangxi, and Yunnan occurred around the years 2017, 2011, and 2006, respectively. In terms of SC as a whole, the years around 2000 were the only period during which all three provinces simultaneously suffered severe losses in sugarcane yield. According to Figure 5b, around 2000, sugarcane flood intensity in SC reached a high level, contributing to severe sugarcane yield losses during that period. During the most recent decade, sugarcane climatic yield varied greatly among the three provinces. In Yunnan, sugarcane climatic yield was always near zero, indicating that the sugarcane yield was only slightly influenced by agrometeorological disasters. In Guangxi, the sugarcane climatic yield reached a historic low in the 2010s and then continued to increase. However, an opposite trend was found in Guangdong; its sugarcane climatic yield was high during 2010–2015, but then decreased sharply and remained at low levels.
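The detrending step mentioned above can be illustrated with a minimal sketch. It assumes a simple linear trend fitted by ordinary least squares, with the climatic yield taken as the residual from that trend; the detrending method actually used for Figure 9b may differ, and the yield series below is hypothetical.

```python
def linear_trend(years, yields):
    """Ordinary least-squares fit of yield against year; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def climatic_yield(years, yields):
    """Residual of each observed yield from the fitted linear trend (kg/ha)."""
    slope, intercept = linear_trend(years, yields)
    return [y - (slope * x + intercept) for x, y in zip(years, yields)]


# Hypothetical series: a trend of 800 kg/ha per year plus year-to-year shocks.
years = list(range(2000, 2006))
shocks = [500.0, -1500.0, 1000.0, 0.0, -500.0, 500.0]
yields = [60000.0 + 800.0 * (x - 2000) + s for x, s in zip(years, shocks)]

slope, _ = linear_trend(years, yields)
residuals = climatic_yield(years, yields)
for year, value in zip(years, residuals):
    print(year, round(value, 1))
```

Negative residuals correspond to the yield losses discussed in the text; with an intercept in the fit, the residuals sum to zero over the fitting period.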

**Figure 7.** Spatial distribution of the monthly intensities of drought (**a**), flood (**b**), and high temperatures (**c**) during the entire sugarcane growing season in southern China. The intensities were averaged over 1970–2020 and displayed here for spatial analysis.

**Figure 8.** The monthly intensities of drought (**a**), flood (**b**), and HT (**c**) during different sugarcane growth stages in southern China. × and — in the boxes represent the mean and median values, respectively.

To analyze the spatial characteristics of sugarcane climatic yield in SC, the negative parts of sugarcane climatic yield, which describe yield losses, were calculated for the 1990s, 2000s, and 2010s (Figure 10a–d). Northwestern SC (including northern Yunnan) and eastern SC (including eastern Guangdong) were consistently low-climatic-yielding regions in different decades; thus, sugarcane yields in these regions were greatly affected by agrometeorological disasters (Figure 10d). In addition, the differences in climatic yield between districts were greater in Guangdong than in the other two provinces. We also mapped the spatial distribution of the actual yield of sugarcane per year (kg/ha/year), and the result is shown in Figure 10e. High-yielding areas were concentrated in eastern SC, mainly including eastern Guangxi and western Guangdong. Moreover, in most high-yielding areas, the corresponding climatic yield was also generally high, implying strong risk-resistant abilities.

**Figure 9.** The actual yield (**a**) and climatic yield (**b**) of sugarcane in Guangdong, Guangxi, and Yunnan provinces. \*\*\* indicates *p* < 0.001.

#### *3.3. The Relationships between Sugarcane Climatic Yield and Agrometeorological Disaster Intensity*

As shown in Figure 11a, for sugarcane drought, the relationships between drought intensity and sugarcane climatic yield were found to be significant in four districts (i.e., Meizhou and Dongguan in Guangdong, and Beihai and Laibin in Guangxi). Two of them (in Dongguan and Laibin) were negative and significant at *p* < 0.01, indicating a yield-reducing effect of sugarcane drought. However, the remaining two significant results were positive (in Meizhou and Beihai), implying positive relationships between sugarcane drought intensity and climatic yield. In Guangxi and Yunnan, negative correlations were more common than positive correlations; however, more positive results existed in Guangdong. Therefore, the effects of drought on sugarcane yield were not simply negative at the regional scale. Differing from sugarcane drought, sugarcane flood intensity was mostly negatively related to sugarcane climatic yield in all three provinces (Figure 11b). Moreover, there were three significant results (Huizhou, Puer, and Beihai; Beihai was significant at *p* < 0.01), all of them negative, which demonstrated the obvious yield-reducing effect of sugarcane flood. For sugarcane HT (Figure 11c), the relationships of HT intensity vs. sugarcane climatic yield were generally weak, and no significant relationships were found.
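The district-level correlation analysis described above can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient, with significance judged by comparing the usual t statistic against a Student t distribution with n − 2 degrees of freedom; the passage does not state which correlation measure was used, and the data below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical district: flood intensity vs. climatic yield (kg/ha) over eight years.
flood_intensity = [0.4, 1.1, 2.3, 0.0, 3.0, 1.8, 0.7, 2.6]
climatic_yield = [900.0, 200.0, -1500.0, 1200.0, -2500.0, -400.0, 600.0, -1800.0]

r = pearson_r(flood_intensity, climatic_yield)
t = t_statistic(r, len(flood_intensity))
print(round(r, 3), round(t, 2))
```

A negative r with a large absolute t value corresponds to the yield-reducing relationships reported for some districts.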

**Figure 10.** Sugarcane climatic yield (kg/ha/year) during the 1990s (**a**), 2000s (**b**), 2010s (**c**), and 2000s–2010s (**d**), as well as actual sugarcane yield during the 2000s–2010s (**e**) in southern China. Negative values of sugarcane climatic yield were used here to describe sugarcane yield losses. Guangxi has no yield data in the 1990s, so only Yunnan and Guangdong were included in Figure 10a.

**Figure 11.** Correlation coefficients between sugarcane climatic yield and the intensities of drought (**a**), flood (**b**), and HT (**c**) in southern China. The black dotted boxes indicate *p* < 0.05. \* and \*\* indicate *p* < 0.05 and *p* < 0.01, respectively.

#### **4. Discussion**

#### *4.1. Spatial-Temporal Characteristics of Sugarcane Drought and Flood in SC*

Drought and flood are widely distributed agrometeorological disasters with severe impacts on agricultural crops. Our results (Figure 4a) indicated that over the past five decades in SC, the only significant trend of sugarcane drought and flood was an upward trend in Yunnan. This result is consistent with a previous report [41], in which a noticeable drying trend was found in Yunnan over the past six decades. In addition, our results (Figure 7a) also pointed out that Yunnan was obviously the most drought-prone region in SC. Following Yunnan, western and central Guangxi were also at a relatively high risk of sugarcane drought. These spatial characteristics generally matched with a drought risk map constructed in a previous drought-relevant report in SC [42].

From the perspective of interdecadal difference, our results (Figure 5a) suggested that the 2000s witnessed more severe sugarcane drought than the other decades in SC. This finding is consistent with a previous study concerning agricultural drought in SC [43]; in that work, agricultural drought was found to be more intensive in the 2000s than in the other decades. Moreover, our results of sugarcane climatic yield (Figure 9b) showed that the 2000s was the only historic period during which all the provinces in SC suffered severe sugarcane yield losses (as demonstrated by low climatic yield), which can be explained by intensive flooding (Figure 5b) during this decade.

In terms of the sugarcane growth stages, the stem elongation stage was found to be the period during which sugarcane drought and flood were most intensive in a year. A similar finding was observed in another study on soil droughts in Guangxi [44]; that report indicated that soil droughts occurred frequently during autumn and winter seasons in Guangxi, which mainly corresponded to the stem elongation stage. However, when the influences of growth stage length were considered, the mature stage had the greatest probability of drought and flood occurrence per month; by contrast, the tillering stage had the minimum probability, which has been reported in a relevant study that focused on sugarcane drought in Guangxi [25].

Since drought and flood were quantified by the same standardized index (i.e., SPEI) in this paper, the SPEI-based drought intensity (SPEI < −0.5) and flood intensity (SPEI > 0.5) for sugarcane were comparable, and the results are displayed in Figure 12. Sugarcane flood intensity was found to be greater than drought intensity during the sugarcane growth stages in Guangdong and Yunnan. In a previous investigation on drought and flood disasters in Yunnan [45], it was concluded that flood frequency was slightly higher than drought frequency in Yunnan over the past 620 years. Furthermore, Guangdong was usually considered as having a very humid climate, but Zhang et al. [46] suggested that the impacts of drought disasters in Guangdong increased and became non-negligible.
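The threshold logic above can be sketched in a few lines. The thresholds (SPEI < −0.5 for drought, SPEI > 0.5 for flood) come from the text; the aggregation rule, which sums the SPEI magnitudes of the months beyond each threshold, is an assumption made for illustration, and the monthly SPEI values are hypothetical.

```python
DROUGHT_THRESHOLD = -0.5  # months with SPEI below this count as drought
FLOOD_THRESHOLD = 0.5     # months with SPEI above this count as flood


def drought_flood_intensity(spei_values):
    """Return (drought, flood) intensities as summed SPEI magnitudes beyond the thresholds."""
    drought = sum(-v for v in spei_values if v < DROUGHT_THRESHOLD)
    flood = sum(v for v in spei_values if v > FLOOD_THRESHOLD)
    return drought, flood


# Hypothetical monthly SPEI values for one growing season (March to December).
monthly_spei = [-1.2, -0.3, 0.8, 1.5, 0.2, -0.7, 0.0, 0.6, -0.5, 0.5]
drought, flood = drought_flood_intensity(monthly_spei)
print(round(drought, 2), round(flood, 2))
```

Because both intensities are expressed in the same standardized units, they can be compared directly, which is the property the paragraph relies on.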

**Figure 12.** The intensities of drought and flood during sugarcane growth stages in SC from 1970 to 2020. × and — in the boxes represent the mean and median values, respectively.

Finally, it should be noted that in the present work, only the months during the sugarcane growth stages (from March to December) were considered in computing sugarcane drought and flood. However, as Wang and Yan [47] pointed out in an investigation concerning drought in SC, although drought events occurred throughout the entire year, January and February were among the most drought-prone months. Therefore, the difference in the calculation months may result in a few of our findings differing from previous investigations regarding drought and flood in SC.

#### *4.2. Spatial-Temporal Characteristics of Sugarcane HT in SC*

In addition to drought and flood, HT is also an extensively distributed agrometeorological disaster. Although sugarcane growth requires warm conditions, HT is regarded as a threat to sugarcane crop in future climates due to global warming [13]. As concluded by Zhang et al. [42], air temperatures in SC exhibited significant increasing trends, and annual air temperatures were expected to increase in the future (2020–2050). Correspondingly, in our results (Figure 4c), the intensity of sugarcane HT increased in all three provinces in SC; moreover, the increasing trend was highly significant (*p* < 0.01) in Yunnan and significant (*p* < 0.05) in Guangdong. We found that the increasing rate of HT intensity was greater in Yunnan than in Guangdong and Guangxi (Figure 4c). This finding was fairly consistent with a relevant study by Zu et al. [21]; they found that the temperature during the sugarcane growing seasons obviously increased during 1970–2014, and Yunnan exhibited the greatest increase. Moreover, this finding was also in accordance with previous literature concerning extreme weather in Yunnan [41], in which the extreme temperatures were found to increase over the past six decades. Finally, in our results (Figure 4c), Guangdong, following Yunnan, also witnessed a significant increase in sugarcane HT intensity; this finding was in line with a conclusion drawn by Yuan et al. [48]: all cities in Guangdong had significant warming trends during 1958–2016.

#### *4.3. Relationships between Sugarcane Climatic Yield and Agrometeorological Disasters*

To date, the relationships between the intensity of drought/flood/HT and the yields of various crops have been widely investigated. These relationships can be used to assess the impacts of agrometeorological disasters on crop yield fluctuation; for many crops, such as rice, cotton, corn, and wheat, significant relationships of disaster intensity vs. yield fluctuation have been established in a wide range of reports [23,26,28,29,49,50]. To the best of our knowledge, such relationships concerning sugarcane have not yet been investigated. Accordingly, in the present work, we examined the relationships between sugarcane climatic yield and the intensities of drought, flood, and HT in SC; however, no significant relationships were obtained at the provincial scale (Figure 11) and only a few districts had significant relationships of drought/flood intensity vs. sugarcane climatic yield. A probable reason is the relatively strong tolerance of sugarcane to waterlogging and drought stresses. For example, short-term waterlogging treatments can significantly reduce cotton and wheat yields [22,51]; however, in most waterlogging stress experiments using sugarcane, long-term waterlogging was performed, and some sugarcane clones can adapt well to short-term flooding [17]. Nevertheless, the present findings provide evidence for the regional yield-reducing effects of sugarcane drought and flood (i.e., significantly negative correlation analysis results in a few districts), as well as a reference for future investigations into the impacts of drought and flood on sugarcane yield. In addition, a previous study [52] took Guangxi as the study area and found that the impact on crops from drought disasters was weaker than that from flood disasters. Consistent with that conclusion, we found that sugarcane flood impacts were more obvious than drought impacts in terms of yield-reducing effects (Figure 11a,b).

The relationships between HT and sugarcane climatic yield were not significant in any district (Figure 11c). Considering that sugarcane is a tropical plant, we adopted 38 °C, rather than the commonly used 30 °C or 35 °C for other crops [27,39], as the threshold temperature of sugarcane HT; however, such air temperatures are not common in SC, resulting in a low level of HDD. In fact, significantly negative impacts of HT on sugarcane yield have been detected in Australia [5] and northeastern Brazil [6]. Even so, we detected a significant increase in HT intensity (Figure 5c). This finding warrants vigilance because the increasing sugarcane HT threats in SC may induce unpredictable consequences. Similar to our findings, an increasing trend of air temperatures has also been detected in the largest sugarcane-producing country, i.e., Brazil [6]. Hence, in the context of global warming, sugarcane-growing regions may be confronted with higher air temperatures and more heat waves.
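The effect of the threshold choice on HDD can be shown with a small sketch. It assumes that HDD accumulates the excess of daily maximum air temperature over the threshold, which is a common definition of heat degree days; the exact formulation used in this study is defined elsewhere in the paper, and the temperatures below are hypothetical.

```python
def heat_degree_days(daily_tmax, threshold=38.0):
    """Sum of daily maximum temperature excess above the threshold (degree-days)."""
    return sum(max(t - threshold, 0.0) for t in daily_tmax)


# Hypothetical daily maximum temperatures (degrees Celsius) during a hot spell.
tmax = [36.5, 38.0, 39.2, 40.1, 37.9]

for threshold in (30.0, 35.0, 38.0):
    print(threshold, round(heat_degree_days(tmax, threshold), 1))
```

With the same temperature series, the 38 °C threshold accumulates far fewer degree-days than the 30 °C or 35 °C thresholds, which is consistent with the low HDD levels noted above.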

#### *4.4. Implications for Future Sugarcane Irrigation and Drainage in SC*

Irrigation and drainage are basic means for reducing the impacts of agricultural drought and flood disasters. Figure 7a,b displays the high-prone regions of sugarcane drought and flood in SC; in the drought-prone regions, including the whole of Yunnan and western Guangxi, timely irrigation is required to protect sugarcane crops from drought stress. In comparison, sugarcane flood was concentrated in eastern Guangdong, where timely field drainage is important for eliminating sugarcane waterlogging stress. More importantly, according to our district-level results on the relationships between the intensities of drought/flood and sugarcane climatic yield (Figure 11), five districts (including Dongguan and Meizhou in Guangdong, Beihai and Laibin in Guangxi, and Puer in Yunnan) deserve special attention regarding irrigation and drainage for sugarcane, because sugarcane yield in these places was significantly and negatively affected by flood/drought over the past few decades. Furthermore, the impacts of sugarcane drought and flood were different. As discussed above, the yield-reducing effects of sugarcane flood were stronger than those of sugarcane drought (Figure 11). More importantly, in terms of near-term disaster characteristics, the interdecadal analysis results (Figure 5b) showed that sugarcane flood intensity in SC reached a historic high in the most recent decade. Therefore, highly efficient drainage in sugarcane fields is of great importance for improving sugarcane yield in SC; additionally, differing from sugarcane flood, sugarcane drought seems to be efficiently relieved by irrigation.

Although HT has no direct associations with irrigation and drainage, it can affect the consequences of sugarcane drought and flood. According to recent reports, the coupling of drought and HT, or of flood and HT, can result in more impacts on crop yields [27,39]. Hence, the occurrence of sugarcane HT will have indirect influences on sugarcane irrigation and drainage when HT becomes more intensive. Our results (Figure 4c) demonstrated that sugarcane HT intensity in SC has increased significantly over the past decades; increasing HT provides breeding grounds for the coupling events of drought/flood and HT. Similarly, Xu et al. [39] deduced that future sugarcane HT would be more severe and threaten sugarcane production in China. In conclusion, when making irrigation and drainage schedules for sugarcane in SC, we should be cautious with the increasing influences from HT.

#### *4.5. Future Perspective and Research Limitations*

The effect of climate change on sugarcane production in future climates is a complex and important issue. On the positive side, it is expected that due to increased air temperature and CO2 concentration, sugarcane yield may increase in future scenarios [3]. However, on the negative side, the most challenging problems, which arise from the increasing risk of extreme meteorological disasters [2], can dramatically restrict sugarcane yield. For SC, many studies have consistently demonstrated that the air temperatures and heat waves in SC will increase in the coming decades [21,41,53]. Although elevated air temperatures may benefit sugarcane growth, according to previous investigations in Brazil and Australia, sugarcane yield can be obviously reduced by HT [5,6]. Hence, HT disasters in SC will probably become more harmful and non-negligible. In addition, consistent with the HT trend, drought intensity in SC is also likely to increase in the future [41]. Considering that water deficit stress can indubitably affect sugarcane yield, we should pay special attention to sugarcane drought disaster reduction in SC.

The present work attempted to make contributions to agrometeorological disaster assessment, but it also faces some limitations which are expected to be overcome in future work. First, the division of sugarcane growth stages in this study was spatially and temporally coarse. The phenology stages of crops likely vary strongly on small geographic scales (e.g., districts). Hence, to obtain more reasonable disaster intensities, it is crucial to include more detailed and precise databases for the crop phenology stage in different districts; phenology simulation models may provide efficient support. In addition, it is crude to determine growth stages on a monthly scale, which renders the drought and flood results insufficiently specific for sugarcane crops. Therefore, in future investigations, more specific timings of growth stages are expected to be included; accordingly, daily scale indices are preferable to monthly scale indices, such as SPEI. Another limitation of this work lies in computing the spatial results of disaster intensities. We employed a traditional method to obtain spatial results, i.e., calculating SPEI based on station-specific weather data and then performing spatial interpolation. However, the results of this approach can be affected by some factors, such as the distribution of weather stations and the employed interpolation method. In comparison, if high-quality data in uniform grid cells are available, one can simply attach them to the targeted areas and obtain more accurate and stable spatial results.
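The sensitivity of station-based mapping to station layout and interpolation method can be illustrated with a minimal sketch. It assumes inverse distance weighting (IDW) on planar coordinates, which is one common interpolation choice; the passage does not name the interpolation method used in this study, and the station coordinates and values below are hypothetical.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at target from (x, y, value) stations."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical stations: planar coordinates (km) and a station-level drought intensity.
stations = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0), (0.0, 4.0, 2.0)]

print(idw(stations, (1.0, 0.0)))
print(idw(stations, (1.0, 0.0), power=1.0))
```

Changing the power parameter or removing a station changes the estimate at the same location, which is the kind of dependence the paragraph describes.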

#### **5. Conclusions**

SC is the dominant sugarcane-producing region in China and also plays an important role in the global sugarcane industry. However, sugarcane crops in SC are severely affected by agrometeorological disasters, mainly including drought, flood, and HT. This work employed commonly used meteorological indices, i.e., SPEI and HDD, to characterize drought, flood, and HT during sugarcane growth stages in SC over the past five decades. Moreover, relationships between sugarcane climatic yield and disaster intensities were also examined. The main results are as follows:

(1) During 1970–2020, the most recent decade witnessed the most severe agrometeorological disasters during sugarcane growth stages in SC. Sugarcane drought intensity significantly increased in Yunnan; in addition, sugarcane HT exhibited increasing trends in all three provinces, and the trends in Yunnan and Guangdong were significant (*p* < 0.05). In addition, in terms of the comparison between sugarcane drought and flood, flood intensity was considered slightly greater than drought intensity in Yunnan and Guangdong.

(2) In terms of the monthly intensity of sugarcane drought and flood disasters (i.e., disaster occurrence probability per month), the mature stage was more affected than the other growth stages. Additionally, the stem elongation stage was the most HT-prone period in Guangdong and Guangxi. However, for Yunnan, the seedling stage was the period most affected by sugarcane HT.

(3) The most drought-prone and flood-prone regions for sugarcane were western SC (i.e., Yunnan and western Guangxi) and eastern SC (i.e., eastern Guangdong), respectively. In addition, the high-prone regions of sugarcane HT were concentrated in southern Yunnan.

(4) Sugarcane yield in northwestern SC (i.e., northern Yunnan) and eastern SC (i.e., eastern Guangdong) was most affected by agrometeorological disasters, resulting in a lower climatic yield than other regions during different decades.

(5) The relationships between flood intensity and sugarcane climatic yield were mostly negative, and significant relationships were found in three districts, which demonstrates the yield-reducing effect of sugarcane flood. In comparison, sugarcane drought intensity had significant relationships with climatic yield in four districts, but two of them were positive. In summary, for sugarcane in SC, the yield-reducing effect of flood was more obvious than that of drought. No significant effect of sugarcane HT on sugarcane climatic yield was detected.

**Author Contributions:** Conceptualization, L.Q.; methodology, L.Q. and P.Y.; software, P.Y. and Z.W.; validation, P.Y. and H.M.; formal analysis, P.Y.; investigation, P.Y., Z.W. and H.M.; resources, L.Q. and X.J.; data curation, L.Q.; writing—original draft preparation, P.Y. and L.Q. (equal contributions to draft preparation); writing—review and editing, L.Q. and X.J.; visualization, P.Y. and H.M.; supervision, L.Q.; project administration, L.Q.; funding acquisition, L.Q. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China, grant number 51909286; the Fundamental Research Funds for the Central Universities, grant number 2021qntd15. The APC was funded by the National Natural Science Foundation of China, grant number 51909286.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


## *Article* **The Prediction of Wheat Yield in the North China Plain by Coupling Crop Model with Machine Learning Algorithms**

**Yanxi Zhao 1,2,3, Dengpan Xiao 1,2,3,\*, Huizi Bai 1,\*, Jianzhao Tang 1, De Li Liu 4,5, Yongqing Qi <sup>6</sup> and Yanjun Shen 6,7**


**Abstract:** Accurate prediction of crop yield is conducive to food security at regional and national scales. To some extent, a yield prediction model that combines a crop mechanism model with a statistical regression model (SRM) can improve the timeliness and robustness of the final yield prediction. In this study, the accumulated biomass (AB) simulated by the Agricultural Production Systems sIMulator (APSIM) model and multiple climate indices (e.g., climate suitability indices and extreme climate indices) were incorporated into an SRM to predict the wheat yield in the North China Plain (NCP). The results showed that the prediction model based on the random forest (RF) algorithm outperformed the prediction models using other regression algorithms. The RF-based prediction of wheat yield at SM (the period from the start of grain filling to the milky stage) achieved a high accuracy (r = 0.86, RMSE = 683 kg ha<sup>−1</sup> and MAE = 498 kg ha<sup>−1</sup>). With the progression of wheat growth, the performance of the yield prediction models improved gradually. The prediction of yield at FS (the period from flowering to the start of grain filling) combined relatively high precision with a longer lead time, so FS can be viewed as the optimum period, providing decent prediction performance with about one month's lead time. In addition, the precision of the predicted yield for the irrigated sites was higher than that for the rainfed sites. The APSIM-simulated AB had an importance of above 30% and ranked first in the prediction model for the last three prediction events, namely the FIF event (the period from floral initiation to flowering), the FS event and the SM event. The climate suitability indices, which ranked highly for every prediction event, played an important role in the prediction model. The winter wheat yield in the NCP was seriously affected by low temperature events before flowering, high temperature events after flowering and water stress. We hope that the prediction model can be used to develop adaptation strategies to mitigate the negative effects of climate change on crop productivity and provide data support for food security.
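For readers less familiar with the accuracy metrics quoted in the abstract, the following minimal sketch shows how RMSE and MAE are conventionally computed from paired observed and predicted yields (r is the usual Pearson correlation coefficient between the two series). The yield values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical wheat yields (kg/ha) at three sites.
observed = [5000.0, 6000.0, 7000.0]
predicted = [5300.0, 5600.0, 7000.0]

print(round(rmse(observed, predicted), 1), round(mae(observed, predicted), 1))
```

RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE for the same set of predictions.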

**Keywords:** yield prediction; machine learning; APSIM model; climate indices; North China Plain

#### **1. Introduction**

**Citation:** Zhao, Y.; Xiao, D.; Bai, H.; Tang, J.; Liu, D.L.; Qi, Y.; Shen, Y. The Prediction of Wheat Yield in the North China Plain by Coupling Crop Model with Machine Learning Algorithms. *Agriculture* **2023**, *13*, 99. https://doi.org/10.3390/agriculture13010099

Academic Editor: Yanqun Zhang

Received: 24 November 2022 Revised: 19 December 2022 Accepted: 26 December 2022 Published: 29 December 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Food security is related to a series of major issues, such as social stability and the sustainable development of the national economy, and is therefore of great concern to the country [1–3]. Increasing food productivity is an important measure to ensure food security. However, the trend of global warming became more severe throughout the 20th century [4,5]. Generally, climate warming can shorten the crop growth period, which negatively influences the formation of crop yield and, ultimately, causes crop failure [6,7]. Yield prediction can provide data support for farmers to adopt appropriate management practices. Wheat is one of the three major grain crops in China, with a wide planting range, large planting area and high yield [8]. Therefore, studies on wheat yield prediction help the government to grasp the grain production status in a timely and accurate manner and to formulate policies scientifically [9,10].

Statistical regression models (SRMs) are built directly on the relationships between selected predictors and target variables [11–14]. For example, Guan et al. [15] used partial least-squares regression (PLSR) to estimate the relationships between crop yield and the predictor variables. In general, models based on statistical regression algorithms are easy to understand and require fewer parameters, so these methods are commonly used in yield prediction worldwide [16–18]. However, with the increasing volume and dimension of observation data, it is a great challenge to fully exploit the information in datasets for effective analysis and utilization. Most current SRMs based on linear regression have problems in application due to the complexity of the crop production system. For example, crop yields exhibit nonlinear responses to extreme climate events, so linear regression models may not perform well under frequent extreme climate conditions [19,20]. Compared with linear regression analysis, the machine learning algorithm (MLA) is an advanced method for yield estimation that can capture nonlinear relations between the dependent and independent variables [21–24]. The MLA can exploit the information in the training data, obtain a higher generalization level, and enhance the robustness and universality of the prediction model [15]. For example, Cai et al. [25] developed a prediction model for wheat in Australia using several machine learning algorithms and found that the support vector machine (SVM) algorithm performed better than the other statistical regression algorithms. Hunt et al. [26] used the random forest (RF) algorithm to evaluate crop yield and achieved a good performance. Nevertheless, MLAs are not mechanistic and cannot fully consider the dynamic process of crop growth.

The crop mechanism model (CM) is a mechanistic simulation program that can dynamically describe the process of crop growth and yield formation under various environmental conditions by importing weather data, a variety of parameters, soil data and so on [19]. With the development of CMs, studies on crop yield estimation have increased gradually. For example, Huang et al. [27], Xiao et al. [28] and Zhang et al. [29] used CMs to estimate the yields of maize, wheat and rice in China, respectively. However, most related studies produced end-of-season yield predictions. The greatest limitation of within-season predictions is the lack of meteorological data from the prediction date to the maturity date [30]. Some studies obtained within-season predictions by coupling the CM with seasonal weather forecasts. Pagani et al. [31] developed a high-resolution integrated prediction system for rice yield at the district level based on the combination of the WARM model, weather forecasts and remote sensing images. However, the real weather conditions may deviate from the weather forecast data, thus increasing the uncertainty of the prediction model [32].

Combining MLA and CM for yield prediction can reduce unnecessary errors. Feng et al. [32] integrated the MLA and the APSIM model to predict the yield of wheat under rainfed conditions in Southeastern Australia, and the hybrid model obtained a decent yield prediction with a lead time of one month before harvest. Nevertheless, there are few studies on using a hybrid model to predict crop yields under irrigated conditions. Furthermore, CMs can simulate the effects of complicated climate conditions on crop growth to a certain extent, but not sufficiently. The quantitative variation of key climatic factors (e.g., temperature and precipitation) can be transformed into the climate suitability of crop growth based on the membership function method in fuzzy mathematics [33,34]. Meanwhile, the extreme climate indices (ECIs) can quantify the destructive effects of extreme climate events on crop growth [35,36]. The climate suitability and ECIs can be included in the hybrid model as predictive indicators to further explore the information reflected by the climate factors and improve the robustness of the hybrid model. However, few studies have used the combination of climate suitability and ECIs as predictive variables.

The North China Plain (NCP) is an important grain production base and occupies an important position in the national grain production in China [8]. In this study, we investigated the yield prediction of wheat in the NCP by using the CM and SRM. The main objectives of the study were (1) to develop the yield prediction model of wheat based on the combination of the multiple growth period-specific variables and SRM, (2) to identify the optimal lead time before maturity of yield prediction with acceptable accuracy, and (3) to evaluate the relative importance of input variables during different growth stages in the yield prediction model.

#### **2. Data and Methods**

#### *2.1. Study Area*

The NCP is delimited by the sea to the east, the Taihang Mountains to the west, the main stream of the Huaihe River to the south, and the Yan Mountains to the north (Figure 1) [37]. The region has a warm temperate monsoon climate with plenty of light and heat resources [37]. The annual precipitation is unevenly distributed, with over 70% falling from July through September. The main soil type in the NCP is the loam of Aeolian origin, a soil type deposited by rivers over geological periods. The NCP is an important grain production region in China, where the main cropping system is the winter wheat–summer maize double cropping system [38]. Winter wheat is usually planted in early or middle October and harvested in early June. We selected 20 agro-meteorological sites distributed across the NCP (Figure 1). Table 1 presents basic information for the 20 study sites, including location, irrigation condition, and wheat phenology and yield.

**Figure 1.** The spatial distribution of 20 agro-meteorological sites across the North China Plain.


**Table 1.** Related information about the 20 investigated sites in the study.

Notes: FDm, MDm and WYm denote the mean flowering date, the mean maturity date, and the mean yield for wheat during the investigated period, respectively. DOY is day of year.

#### *2.2. Climate, Soil and Crop Data*

The historical records about daily climate data, including mean temperature (Tmean), maximum temperature (Tmax), minimum temperature (Tmin), precipitation (Prec), and sunshine hours (Sh) during 2000 to 2010 for 20 agro-meteorological sites across the NCP, were obtained from China's Meteorological Administration (CMA). Soil profile data of all the sites were obtained from the 1:1 million scale soil map of China included in the Harmonized World Soil Database (HWSD) version 1.2 [39]. The climate and soil data were used to run the APSIM model.

Detailed field experimental records for 2000–2010 at the agro-meteorological experiment sites, including phenology (sowing date (SD), flowering date (FD), and maturity date (MD)), grain yield, and management data, were also obtained from CMA. The phenology data were observed by experimenters in the specific fields at the agro-meteorological experiment sites, while the grain yield was the weight of the harvested crop in the specific fields. We used the experimental crop data to calibrate and validate the crop parameters in the APSIM model.

#### *2.3. Methodology*

#### 2.3.1. Agricultural Production Systems SIMulator (APSIM) Simulations

The APSIM model is a comprehensive model developed to simulate biophysical processes in agricultural production systems [40,41]. The APSIM model can provide an acceptable prediction accuracy of crop productivity under the combined influences of climate change, soil condition, and management measures [42,43]. In this study, the APSIM model was implemented to simulate crop phenology, biomass, and grain yield during 2000–2010 at the 20 selected sites.

#### 2.3.2. Climate Indices

In the study, we considered four main growth periods: the period from the end of the juvenile stage to floral initiation (JF), the period from floral initiation to flowering (FIF), the period from flowering to the start of grain filling (FS), and the period from the start of grain filling to the milky stage (SM). We assessed the impacts of 10 extreme climate indices (ECIs) [44,45] and three climate suitability (CS) indices [46] during the different growth periods for wheat (Table 2). The calculation methods of the ECIs are shown in Table 2. The CS indices can further explore the information of the mean climate variables. The climate suitability models were developed following related studies [46].

**Table 2.** The information about the thirteen climate indices (CIs) used in the study.


Note: JF, FIF, FS, and SM denote the periods from end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from the start of grain filling to the milky stage, respectively.
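As a minimal sketch of how a daily series could be grouped into the four growth periods, the Python snippet below slices a daily record at five stage dates. The function names, the use of day indices counted from the start of the record, and the half-open interval convention (the start day belongs to a period, the end day to the next one) are illustrative assumptions rather than details given in the study.

```python
# Hypothetical helper: group a daily series into the four growth periods.
PERIODS = ("JF", "FIF", "FS", "SM")


def split_by_period(daily_values, stage_days):
    """Slice a daily series at five stage days.

    stage_days holds day indices, in chronological order, for: end of the
    juvenile stage, floral initiation, flowering, start of grain filling,
    and the milky stage. Each period is the half-open range [start, end),
    which is an assumption made for this sketch.
    """
    stage_days = list(stage_days)
    if len(stage_days) != 5 or stage_days != sorted(stage_days):
        raise ValueError("expected five stage days in chronological order")
    periods = {}
    for name, start, end in zip(PERIODS, stage_days[:-1], stage_days[1:]):
        periods[name] = list(daily_values[start:end])
    return periods


def period_mean(values):
    """Arithmetic mean of the daily values within one period."""
    return sum(values) / len(values)


# Made-up daily mean temperatures and stage days, for illustration only.
daily_tmean = [float(d) for d in range(10)]
stages = (1, 3, 5, 8, 10)
by_period = split_by_period(daily_tmean, stages)
fs_mean = period_mean(by_period["FS"])  # mean over days 5, 6, and 7
```

The same slicing could feed any per-period index, whether a mean suitability value or a count of extreme days.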

The sunshine suitability (SS) of wheat was calculated as follows [47–49]:

$$\text{SS} = \begin{cases} \mathbf{e}^{-\left[\left(\mathbf{S}\_{\mathbf{i}} - \mathbf{S}\_{\mathbf{0}}\right)/\mathbf{b}\right]^{2}} & \mathbf{S}\_{\mathbf{i}} < \mathbf{S}\_{\mathbf{0}}\\ 1 & \mathbf{S}\_{\mathbf{i}} \ge \mathbf{S}\_{\mathbf{0}} \end{cases} \tag{1}$$

where S0 is the daily sunshine hours when the percentage of the daily sunshine hours reaches 70%, Si is the daily sunshine hours (h), and b is a constant that can be determined according to the climatic conditions across the NCP and relevant studies [49,50]. The values for b at different growth periods are shown in Table 3. The arithmetic mean of the daily SS for a specific growth period is the SS for the corresponding period.
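Equation (1) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, and the values of S0 and b below are made up rather than taken from Table 3.

```python
import math


def sunshine_suitability(s_i, s_0, b):
    """Daily sunshine suitability following Equation (1).

    s_i: daily sunshine hours (h); s_0: threshold sunshine hours (h);
    b: period-specific constant.
    """
    if s_i >= s_0:
        return 1.0
    return math.exp(-(((s_i - s_0) / b) ** 2))


def period_sunshine_suitability(daily_hours, s_0, b):
    """Arithmetic mean of the daily SS values over one growth period."""
    daily_ss = [sunshine_suitability(s, s_0, b) for s in daily_hours]
    return sum(daily_ss) / len(daily_ss)


# Hypothetical inputs: s_0 = 8 h and b = 5 are placeholders, not Table 3 values.
example_ss = period_sunshine_suitability([3.0, 8.0, 10.5], s_0=8.0, b=5.0)
```

Days that reach the threshold score exactly 1, and days below it decay smoothly toward 0 as the shortfall grows relative to b.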

**Table 3.** Values of related parameters for calculating the sunshine suitability (SS), temperature suitability (TS), and precipitation suitability (PS) at four growth periods of wheat.


Note: JF, FIF, FS, and SM denote the periods from the end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from start of the grain filling to the milky stage, respectively.

The temperature suitability (TS) of wheat was calculated as follows [47–49]:

$$\text{TS} = \frac{\left[ (\text{T}\_{\text{i}} - \text{T}\_{1})(\text{T}\_{2} - \text{T}\_{\text{i}})^{\text{B}} \right]}{\left[ (\text{T}\_{0} - \text{T}\_{1})(\text{T}\_{2} - \text{T}\_{0})^{\text{B}} \right]} \tag{2}$$

$$\text{B} = \frac{\left( \text{T}\_2 - \text{T}\_0 \right)}{\left( \text{T}\_0 - \text{T}\_1 \right)} \tag{3}$$

where Ti is the daily mean temperature (◦C), T0 is the optimal temperature (◦C) at different growth periods, T1 is the lower limit temperature (◦C) at different growth periods, and T2 is the upper limit temperature (◦C) at different growth periods. The specific values of T0, T1, and T2 refer to the climatic conditions across the NCP and relevant studies [49,51]. The values for T0, T1, and T2 at different growth periods are listed in Table 3. The arithmetic mean of the daily TS for a specific growth period is the TS for the corresponding period.
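The temperature suitability in Equations (2) and (3) can be illustrated with a short Python sketch. The function name and the cardinal temperatures used below are hypothetical. Returning 0 when the temperature lies at or outside the limits T1 and T2 is an assumption added here so that the power term stays real-valued; the text does not state how such days were handled.

```python
def temperature_suitability(t_i, t_0, t_1, t_2):
    """Daily temperature suitability following Equations (2) and (3).

    t_i: daily mean temperature; t_0: optimal temperature;
    t_1: lower limit; t_2: upper limit (all in degrees Celsius).
    """
    if not t_1 < t_0 < t_2:
        raise ValueError("expected t_1 < t_0 < t_2")
    # Assumption: suitability is 0 at or beyond the limit temperatures.
    if t_i <= t_1 or t_i >= t_2:
        return 0.0
    b = (t_2 - t_0) / (t_0 - t_1)  # Equation (3)
    numerator = (t_i - t_1) * (t_2 - t_i) ** b
    denominator = (t_0 - t_1) * (t_2 - t_0) ** b
    return numerator / denominator  # Equation (2)


# Hypothetical cardinal temperatures (not the Table 3 values).
ts_optimal = temperature_suitability(20.0, t_0=20.0, t_1=5.0, t_2=30.0)
ts_cool = temperature_suitability(12.0, t_0=20.0, t_1=5.0, t_2=30.0)
```

By construction the function peaks at 1 when the daily temperature equals the optimum and falls toward 0 as it approaches either limit.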

The precipitation suitability (PS) of wheat was calculated as follows [52]:

$$\text{PS} = \begin{cases} \text{P}/\text{P}\_0 & \text{P} < \text{P}\_0\\ \text{P}\_0/\text{P} & \text{P} \ge \text{P}\_0 \end{cases} \tag{4}$$

where P is precipitation (mm), and P0 is the physiological water requirement of crops, which can be calculated as follows:

$$\text{P}\_0 = \text{K}\_\text{c} \cdot \text{ET}\_0 \tag{5}$$

where Kc is the crop coefficient, and ET0 is the reference crop evapotranspiration (mm). The Kc values of wheat at different growth stages listed in Table 3 are determined according to the relevant studies [53,54]. The ET0 values of wheat are calculated based on the Penman–Monteith formula [54].
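A minimal Python sketch of the precipitation suitability follows. It assumes that precipitation and the water requirement are totals over the same time window, which the text does not specify, and the function names and numbers are hypothetical. ET0 is treated as a given input here and is not computed from the Penman–Monteith formula.

```python
def water_requirement(kc, et0):
    """Physiological water requirement P0 = Kc * ET0 (mm)."""
    return kc * et0


def precipitation_suitability(p, p0):
    """Precipitation suitability: P/P0 when P < P0, otherwise P0/P.

    p: precipitation (mm); p0: water requirement (mm) over the same
    window. Requiring p0 > 0 is an assumption that avoids division by zero.
    """
    if p0 <= 0:
        raise ValueError("p0 must be positive")
    if p < p0:
        return p / p0
    return p0 / p


# Hypothetical inputs: Kc = 1.1 and ET0 = 100 mm are placeholders.
p0_example = water_requirement(kc=1.1, et0=100.0)
ps_dry = precipitation_suitability(55.0, p0_example)   # about 0.5
ps_wet = precipitation_suitability(220.0, p0_example)  # about 0.5
```

The index is symmetric in the ratio: receiving half the requirement and receiving twice the requirement are penalized equally, and it equals 1 only when precipitation matches the requirement.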

#### 2.3.3. Regression Models

Two machine learning algorithms, i.e., random forest (RF) and light gradient boosting machine (LGB), were selected to predict the wheat yield. RF is an ensemble learning algorithm [26,55] that builds multiple decision trees on random subsets of the training samples and features and combines their predictions. RF offers high accuracy and stability and can effectively process input samples with large data volumes and high-dimensional features. LGB is an implementation of the gradient boosting decision tree, which iteratively trains an ensemble of decision trees to obtain the optimal model [56,57]. The LGB model uses the histogram algorithm to find the best split point, which greatly improves the training speed of the model. At the same time, LGB optimizes the growth strategy of the decision tree and uses the leaf-wise algorithm with depth limitation to create the decision tree, which can reduce the amount of unnecessary computation. In addition, multiple linear regression (MLR) was selected as the benchmark model in this study for comparison with the two machine learning models.

#### 2.3.4. The Framework for the Procedures

The diagram for the procedures in this study is shown in Figure 2. We developed a yield-predicting system based on multi-source environmental data using the APSIM model and regression models (MLR, LGB, and RF). Firstly, the APSIM model was calibrated and validated based on observed phenology data and grain yield data at the selected sites. Then, we ran the calibrated model to obtain the biomass and main growth stages, including the end of the juvenile stage, floral initiation, flowering, start of grain filling, and the milky stage. The main growth stages were used to calculate the 13 CIs. We aggregated the APSIM-accumulated biomass (AB) and climate variables into four groups by growth period. In the study, four prediction events (JF, FIF, FS, and SM) were triggered successively, and predictive indicators were added as crop growth progressed. Therefore, the number of predictive indicators increased from JF to SM. Furthermore, we conducted "leave-one-year-out" experiments [25,58] for 2000–2010 to test the performances of the yield prediction models. Finally, the importance values for the input characteristic variables were analyzed based on the RF model and LGB model.

**Figure 2.** The diagram for the procedures used in this study, where JF, FIF, FS, and SM are the periods from the end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from the start of grain filling to the milky stage, respectively. AB is accumulated biomass; MLR, LGB, and RF are multiple linear regression, light gradient boosting machine, and random forest, respectively.
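Two steps of the procedure lend themselves to a short Python sketch: the predictor set that grows with each prediction event, and the "leave-one-year-out" split. The function names and the predictor labels (such as `AB_JF`) are hypothetical, and the regression step itself is omitted; this illustrates the bookkeeping only, not the authors' implementation.

```python
EVENTS = ("JF", "FIF", "FS", "SM")


def cumulative_predictors(predictors_by_period):
    """Predictors available at each prediction event.

    Each event uses the predictors of its own period plus those of all
    earlier periods, so the set grows from JF to SM.
    """
    available = []
    by_event = {}
    for event in EVENTS:
        available = available + list(predictors_by_period[event])
        by_event[event] = list(available)
    return by_event


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


# Hypothetical predictor labels and sample years, for illustration only.
predictors = {
    "JF": ["AB_JF", "TS_JF"],
    "FIF": ["AB_FIF", "TS_FIF"],
    "FS": ["AB_FS"],
    "SM": ["AB_SM"],
}
by_event = cumulative_predictors(predictors)
sample_years = [2000, 2000, 2001, 2002]
splits = list(leave_one_year_out(sample_years))
```

Holding out a whole year at a time keeps all sites from the test year out of training, so each test mimics predicting an unseen season.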

#### 2.3.5. Model Performance Assessment

The performance of the yield prediction model was validated by calculating the root mean square error (RMSE), Pearson's correlation coefficient (r), and mean absolute error (MAE) between the estimated data and the observed data. The calculation formulas were as follows:

$$\mathbf{r} = \frac{\sum\_{i=1}^{n} \left(\mathbf{O\_{i}} - \overline{\mathbf{O}}\right) \left(\mathbf{S\_{i}} - \overline{\mathbf{S}}\right)}{\sqrt{\sum\_{i=1}^{n} \left(\mathbf{O\_{i}} - \overline{\mathbf{O}}\right)^{2}} \sqrt{\sum\_{i=1}^{n} \left(\mathbf{S\_{i}} - \overline{\mathbf{S}}\right)^{2}}}\tag{6}$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (\mathbf{O}\_{\text{i}} - \mathbf{S}\_{\text{i}})^2} \tag{7}$$

$$\text{MAE} = \frac{\sum\_{i=1}^{n} |\mathbf{O\_i} - \mathbf{S\_i}|}{n} \tag{8}$$

where Oi, Si, $\overline{\text{O}}$, $\overline{\text{S}}$, and n represent the observed data, estimated data, mean value of the observed data, mean value of the estimated data, and the number of samples, respectively.
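The three metrics in Equations (6)–(8) can be written directly in plain Python. The function names and the yield values below are hypothetical and serve only to show the calculation.

```python
import math


def _check(obs, sim):
    """Require two non-empty sequences of equal length."""
    if len(obs) != len(sim) or len(obs) == 0:
        raise ValueError("obs and sim must be non-empty and of equal length")


def pearson_r(obs, sim):
    """Pearson's correlation coefficient, Equation (6)."""
    _check(obs, sim)
    n = len(obs)
    o_bar = sum(obs) / n
    s_bar = sum(sim) / n
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(obs, sim))
    ss_o = sum((o - o_bar) ** 2 for o in obs)
    ss_s = sum((s - s_bar) ** 2 for s in sim)
    if ss_o == 0 or ss_s == 0:
        raise ValueError("r is undefined for a constant series")
    return cov / (math.sqrt(ss_o) * math.sqrt(ss_s))


def rmse(obs, sim):
    """Root mean square error, Equation (7)."""
    _check(obs, sim)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error, Equation (8)."""
    _check(obs, sim)
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


# Made-up observed and estimated yields (kg per hectare).
observed = [5200.0, 6100.0, 4800.0]
estimated = [5000.0, 6400.0, 5000.0]
scores = (pearson_r(observed, estimated), rmse(observed, estimated),
          mae(observed, estimated))
```

RMSE and MAE carry the units of the yield, whereas r is dimensionless and insensitive to a constant bias, which is why the three are reported together.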

#### **3. Results**

#### *3.1. Validation of the APSIM Model*

The comparison of the observed and APSIM-simulated values of the flowering date (FD), maturity date (MD), and yield from 2000 to 2010 at the 20 sites is shown in Figure 3. The simulated FD and MD were in good agreement with the observed values. The r values for the simulated and observed values of FD and MD were 0.78 and 0.82, respectively. The RMSE values between the simulated and observed values of FD and MD were 5.46 d and 4.94 d, respectively. Likewise, the simulated grain yield was consistent with the observed yield, with r of 0.81 and RMSE of 792 kg ha<sup>−1</sup>. Overall, the APSIM model provided an acceptable assessment of the phenology and grain yield of wheat. Therefore, the simulation results from the APSIM model for wheat phenology and grain yield were reliable, and we could use the simulations to develop a hybrid model for predicting the wheat yield.

#### *3.2. The Model Performance and Optimum Leading Time for Yield Prediction*

We developed a hybrid model to predict wheat yield based on the APSIM-simulated AB, climate indices at different growth stages, and regression algorithms. The performances of the three regression models are shown in Figures 4 and 5. In the early stage, the yield prediction accuracy of the three regression models was generally low, with RMSE values above 1000 kg ha<sup>−1</sup> and MAE values of more than 700 kg ha<sup>−1</sup> (Figures 4a,e,i and 5a,c,e). As the wheat growth period progressed, the number of input variables increased, and the performances of the prediction models improved. From JF to SM, the prediction accuracy increased substantially for the three regression models. For the MLR model, r increased from 0.22 to 0.79, RMSE decreased from 1237 kg ha<sup>−1</sup> to 778 kg ha<sup>−1</sup>, and MAE decreased from 957 kg ha<sup>−1</sup> to 619 kg ha<sup>−1</sup> (Figures 4a–d and 5b). Compared with the machine learning models, MLR was less effective in predicting the wheat yield. The machine learning models can capture the nonlinear relationship between the characteristic variables and the yield, and their overall performance was good, especially that of the RF model. For the RF model, r increased from 0.66 to 0.86, RMSE decreased from 1026 kg ha<sup>−1</sup> to 683 kg ha<sup>−1</sup>, and MAE decreased from 756 kg ha<sup>−1</sup> to 498 kg ha<sup>−1</sup> (Figures 4i–l and 5f). The performances of the yield prediction models improved gradually as the crop developed. However, the tradeoff between accuracy and lead time needs to be taken into account. Yield prediction at JF offers a lead time of approximately three months before maturity but performs poorly (r < 0.66). Yield prediction at SM outperformed prediction at the other growth periods, but its lead time decreased to below 15 d. Prediction at FS balanced higher precision against longer lead time. Therefore, FS can be regarded as the optimal period, providing the best balance between prediction performance and lead time (about one month).

**Figure 4.** Comparison of the observed and predicted wheat yields for the period from the end of the juvenile stage to floral initiation (JF) (**a**,**e**,**i**), from floral initiation to flowering (FIF) (**b**,**f**,**j**), from flowering to the start of grain filling (FS) (**c**,**g**,**k**), and from the start of grain filling to the milky stage (SM) (**d**,**h**,**l**) from multiple linear regression (MLR) (**a**–**d**), light gradient boosting machine (LGB) (**e**–**h**), and random forest (RF) (**i**–**l**). Red lines are the linear regression fit. Dashed lines represent the 1:1 lines.

We compared the prediction performance at the study sites under irrigated conditions with that at the study sites under rainfed conditions (Figures 6 and 7). At all growth periods, the errors of the yield predicted by the three regression models were lower for the irrigated sites (MAE ranged from 419 kg ha<sup>−1</sup> to 789 kg ha<sup>−1</sup>) than for the rainfed sites (MAE ranged from 624 kg ha<sup>−1</sup> to 1130 kg ha<sup>−1</sup>) (Figures 6 and 7). The accuracy of the predicted yield for the irrigated sites was therefore higher than that for the rainfed sites. The water shortage caused by drought limited photosynthesis and carbon allocation, which was not conducive to the formation of the crop yield and affected the prediction accuracy [59,60]. However, irrigation reduced the impacts of water stress on the crop yield, which improved the accuracy of the yield prediction under irrigated conditions [61–64]. Nevertheless, the predicted yields for the irrigated sites were underestimated compared to the observed yields, while the predicted yields for the rainfed sites were overestimated (Figures 6a,c,e and 7a,c,e).

**Figure 5.** Time series of observed and predicted wheat yields across the 20 investigated sites based on the four prediction events from multiple linear regression (MLR) (**a**,**b**), light gradient boosting machine (LGB) (**c**,**d**), and random forest (RF) (**e**,**f**). Wheat yields for each year were averaged across the 20 investigated sites. Data were generated from the "leave-one-year-out" cross-validation procedure from the three regression models. JF, FIF, FS, and SM were the periods from the end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from the start of grain filling to the milky stage, respectively.

#### *3.3. Relative Importance of Selected Predictors at Different Growth Stages*

The RF model and LGB model were used to assess the importance of the input characteristic variables in the yield prediction model. The relative importance of the input predictors as determined from the average of the LGB model and RF model for each prediction event is shown in Figure 8. As crop growth progressed, the importance of the APSIM-simulated accumulated biomass (AB) increased rapidly, and the importance of AB at the last three prediction events was over 30% (Figure 8). Among the CIs, the climate suitability indices, such as TS and SS, were most important for the yield prediction at the early prediction event (Figure 8a). The roles of the climate suitability indices in the prediction model should not be ignored, though some extreme climate indices had higher importance than the climate suitability indices in the last three prediction events (Figure 8b–d). In the middle prediction events (FIF and FS), SDII and FD at FIF generally ranked high among the climate indices, which may be because the wheat yield was very sensitive to low-temperature stress and water stress before flowering (Figure 8b,c). However, SDII and HCD at SM ranked first in the late prediction events, suggesting that the impact of heat stress and water stress after flowering on the wheat yield was more significant than that of low-temperature stress and water stress before flowering (Figure 8b–d).

**Figure 6.** Time series of observed and predicted wheat yields across the investigated sites under irrigated conditions based on the four prediction events from multiple linear regression (MLR) (**a**,**b**), light gradient boosting machine (LGB) (**c**,**d**), and random forest (RF) (**e**,**f**). Wheat yields for each year were averaged across the investigated sites under irrigated conditions. Data were generated from the "leave-one-year-out" cross-validation procedure from the three regression models. JF, FIF, FS, and SM were the periods from the end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from the start of grain filling to the milky stage, respectively.

**Figure 7.** Time series of observed and predicted wheat yields across the investigated sites under rainfed conditions based on the four prediction events from multiple linear regression (MLR) (**a**,**b**), light gradient boosting machine (LGB) (**c**,**d**), and random forest (RF) (**e**,**f**). Wheat yields for each year were averaged across the investigated sites under rainfed conditions. Data were generated from the "leave-one-year-out" cross-validation procedure from the three regression models. JF, FIF, FS, and SM were the periods from the end of the juvenile stage to floral initiation, from floral initiation to flowering, from flowering to the start of grain filling, and from the start of grain filling to the milky stage, respectively.

**Figure 8.** Relative importance of the input predictors as determined from the average of LGB (light gradient boosting machine) model and RF (random forest) model for the period from the end of the juvenile stage to floral initiation (JF) (**a**), from floral initiation to flowering (FIF) (**b**), from flowering to the start of grain filling (FS) (**c**), and from the start of grain filling to the milky stage (SM) (**d**). The results are normalized to sum 100% and shown in decreasing order in the figure (The input predictors lower than 2% were not shown in the figure).
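The averaging and normalization described in the caption of Figure 8 can be sketched in plain Python. The order of operations is an assumption made here: each model's raw importances are first rescaled to sum to 100%, then averaged across the two models, ranked, and filtered at the display threshold. The function names and all numbers are hypothetical.

```python
def to_percent(importance):
    """Rescale one model's raw importances so that they sum to 100."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: 100.0 * value / total for name, value in importance.items()}


def averaged_importance(rf_importance, lgb_importance, min_share=2.0):
    """Average two models' percentage importances and rank them.

    Predictors missing from one model count as 0 for that model.
    Predictors whose averaged share is below min_share are dropped from
    the returned ranking, mirroring the display threshold in the caption.
    """
    rf_pct = to_percent(rf_importance)
    lgb_pct = to_percent(lgb_importance)
    names = set(rf_pct) | set(lgb_pct)
    averaged = {n: (rf_pct.get(n, 0.0) + lgb_pct.get(n, 0.0)) / 2.0
                for n in names}
    ranked = sorted(averaged.items(), key=lambda item: item[1], reverse=True)
    return [(name, share) for name, share in ranked if share >= min_share]


# Made-up raw importances on different scales for the two models.
rf_raw = {"AB": 6.0, "TS": 3.0, "SDII": 1.0}
lgb_raw = {"AB": 40.0, "TS": 59.0, "SDII": 1.0}
ranking = averaged_importance(rf_raw, lgb_raw, min_share=6.0)
```

Rescaling before averaging matters because tree ensembles can report importances on different scales (for example, impurity reduction versus split counts), so averaging raw values would let one model dominate.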

#### **4. Discussion**

A crop model can dynamically describe the process of crop growth and development under various environmental conditions [65]. A growing body of studies has used various crop models to investigate the effects of climate change during the past few decades on crop phenology and yield, with the aim of developing adaptive measures (such as adjusting the sowing date and renewing crop varieties) to reduce yield loss [66–68]. However, fewer studies have used crop models for yield prediction because of the limited availability of meteorological data. Some studies combined statistical regression models and crop models to estimate the crop yield. For example, Everingham et al. [69] built a prediction model for sugarcane yield by incorporating the biomass simulated by the crop model and several climate indices into the RF algorithm and obtained a high accuracy. Similarly, Feng et al. [32] predicted wheat yield in Southeastern Australia by combining the APSIM model and the RF model, obtaining a high accuracy (r = 0.87, RMSE = 640 kg ha<sup>−1</sup>). In this study, we developed a hybrid model for wheat yield prediction by coupling a crop model with several statistical regression models. The yield prediction model based on the crop model and the RF algorithm outperformed the models based on the crop model and the other regression algorithms (MLR and LGB), with r of 0.86 and RMSE of 683 kg ha<sup>−1</sup>. The precision obtained in this study was similar to that of the related study [32]. This may be because the RF algorithm has a strong data-processing ability, which improved the accuracy and robustness of the yield prediction [70–72].

Global climate change has a significant impact on the social economy and the natural environment, especially on agricultural production [73–80]. Different crops have different demands for climate resources, and either an excess or a shortage of climate resources is not conducive to crop growth and development [81,82]. Compared with extreme climate events, mean climate conditions generally contributed more to the variations of wheat growth in the NCP [83]. The climate suitability can be used to estimate the sensitivity of crops to climate factors, such as mean temperature, precipitation, and sunshine, and there is a certain correlation between climate suitability and climatic yield [84,85]. Climate suitability can further explore the information reflected by the mean climate conditions, though the crop model can simulate the effects of the mean climate conditions on crop growth to some extent. In this study, the climate suitability indices (TS, SS, and PS) played an important role in predicting the final yield of wheat and generally ranked high in the models for every prediction event (Figure 8). The roles of climate suitability indices in the prediction model should not be ignored.

Extreme temperature events have a negative influence on crop growth and yield formation, which could cause crop yield loss [2,86–88]. The low-temperature events before flowering and high-temperature events after flowering are two major extreme temperature events affecting winter wheat [89,90]. Xiao et al. [91] found that frost duration and intensity were greatest in the NCP, which suffered the largest yield losses due to spring frost events. Warm temperatures can improve crop growth until the temperature reaches a threshold, beyond which yields decline abruptly [92,93]. Around flowering or the grain filling period, extreme high temperature could affect pollination and reduce male fertility and grain yield efficiency, and continuous heat stress would cause a large yield loss [94–97]. Bai et al. [98] found that heat stress after flowering significantly and negatively impacts wheat production in the NCP, and that wheat might be exposed more frequently to extreme high-temperature stress in the future. The findings of this study showed that FD at FIF ranked high for the middle prediction events (FIF and FS), while HCD at SM ranked first among the climate indices for the late prediction events (Figure 8). Low-temperature events before flowering and heat stress after flowering are the main natural disasters affecting wheat growth in the NCP [90,99]. It is of great significance to take appropriate measures to alleviate the negative effects of these disasters on crops.

Drought is also closely correlated to agricultural production [100–103]. In this study, the rank of SDII related to water stress was consistently high for all prediction events, indicating that water stress has a significant impact on wheat yield in the North China Plain. Water stress can affect the coupling mechanism of environmental driving factors and crop yield, while it is difficult to achieve the acceptable yield prediction in the rainfed system [61]. The predicted yield for the sites under rainfed conditions would be overestimated due to the water stress, while irrigation can effectively reduce the effect of drought on the crop yield and increase the accuracy of the crop yield prediction [62,83]. In the study, the predicted yield for the sites in the rainfed system was overestimated, while the MAE of the predicted yield for the sites under the irrigated condition was significantly lower than that of the sites in the rainfed system (Figures 6 and 7). More predictive variables may need to be incorporated into the hybrid model to improve the performance of the model under irrigated conditions.

There are still some uncertainties and limitations in our study. The RF model depends heavily on data. Sufficient data samples are conducive to improving the accuracy and robustness of the model, while a lack of training samples may lead to overfitting and increase the uncertainty of the model [104]. With more yield samples, the data-processing ability of machine learning algorithms can be fully exploited, and the performance of the model can be improved further. Furthermore, the model developed in this study is limited to yield prediction at the site scale and is difficult to apply to a large region. Lobell et al. [105] developed a scalable satellite-based crop yield mapper (SCYM) based on satellite images and crop models, which successfully explained 35% of the maize yield variation and 32% of the soybean yield variation in the study area. In the future, we can incorporate the SCYM model into the hybrid model to predict the crop yield at the regional scale. This is a study direction with great development potential.

#### **5. Conclusions**

Based on the APSIM-simulated AB, climate indices at different growth stages, and statistical regression algorithms, we developed a hybrid model to predict wheat yields in the NCP. The results showed that the prediction models based on machine learning algorithms, especially the RF algorithm, outperformed the prediction model using MLR. The performances of the yield prediction models improved gradually as the crop developed. Prediction at FS balanced higher precision against longer lead time. The FS can be regarded as the optimal period, providing an acceptable performance of yield prediction and a lead time of about one month. Moreover, the accuracy of the predicted yield for the irrigated sites was higher than that for the rainfed sites. The APSIM-simulated AB dominated the last three prediction events, with an importance above 30%. The climate suitability indices played an important role in predicting the wheat yield, with high rankings for every prediction event. Among extreme climate events, the low-temperature events before flowering, high-temperature events after flowering, and water stress were the major events affecting the winter wheat yield.

In general, the hybrid model can be used to predict the wheat yield under both rainfed and irrigated conditions in the NCP. This model is helpful in developing adaptation strategies to alleviate the negative effects of climate change on crop productivity and improve agricultural risk management. Nevertheless, the hybrid model is dependent on the quantity and quality of the data samples. Furthermore, the model developed in this study is limited in its ability to predict yield at the regional scale. In the next study, we can incorporate the SCYM model into the hybrid model to predict crop yield across large regions.

**Author Contributions:** Conceptualization, D.X.; methodology and data analysis, Y.Z.; writing original draft preparation, Y.Z.; and writing—review and editing, D.X., H.B., J.T., D.L.L., Y.Q. and Y.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research and the APC were funded by a grant from the Hebei Provincial Science Foundation for Distinguished Young Scholars (No. D2022205010), the National Natural Science Foundation of China (No. 41901128), the High-level Talents Training and Subsidy Project of Hebei Academy of Science (2022G04), and the Technology Program of Hebei Academy of Sciences (22102).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

## *Article* **Modelling the Geographical Distribution Pattern of Apple Trees on the Loess Plateau, China**

**Wei Xu 1,2, Yuqi Miao 3, Shuaimeng Zhu 4, Jimin Cheng 2,5 and Jingwei Jin 5,\***

	- Xianyang 712100, China

**Abstract:** The Loess Plateau, known for its fragile ecosystems, is one of the traditional apple-producing regions in China. Although management measures are needed to enhance sustainable agriculture in response to the rising pressure of climate change, the geographic distribution of apple trees has not yet been assessed with multiple variables taken into account. In this study, we used three software tools (the maximum entropy model, IDRISI, and ArcGIS) to simulate the potential distribution of suitable habitats and range shifts of apple trees in the near present and near future (i.e., the 2030s and the 2050s) under two climate scenarios (the Shared Socioeconomic Pathways (SSP)1-26 and SSP5-85), while taking a variety of environmental factors into account (e.g., temperature, precipitation, and terrain). After optimization, the class unsuitable habitat (CUH) changed the potential distribution pattern of apple trees on the Loess Plateau. Currently, the areas of lowly suitable habitat (LSH), moderately suitable habitat (MSH), highly suitable habitat (HSH), and CUH are 7.66 × 10<sup>4</sup>, 2.80 × 10<sup>4</sup>, 0.23 × 10<sup>4</sup>, and 18.05 × 10<sup>4</sup> km<sup>2</sup>, respectively. Compared to the centroid estimated under the climate of 1970–2000, the suitability range of apple trees was displaced to the northwest in both the 2030s and the 2050s, with a larger displacement under SSP5-85 (i.e., 63.88~81.30 km) than under SSP1-26 (i.e., 40.05~50.32 km). This study demonstrates the possible changes in the spatial distribution of apple trees on the Loess Plateau in the near future and may provide a strong basis for future policy making.

**Keywords:** suitable habitat; climate scenario; range shift; ArcGIS; MaxEnt; apple trees

#### **1. Introduction**

The Loess Plateau is located in the arid and semi-arid regions of Northwest China. Its world-famous loess deposition, soil erosion, and huge spatial heterogeneity in precipitation have resulted in a unique and fragile plateau ecosystem [1]. In the 20th century, soil degradation and dust storms were further aggravated by the unsustainable land use practices of local farmers and herdsmen (e.g., overgrazing and farmland reclamation) as a result of their intense struggle to survive [2–4]. Since the 1970s, following the implementation of several ecological improvement projects (e.g., the Grain for Green Project and the Natural Forest Project [5]), soil erosion in this region has been significantly reduced [6], and a large amount of farmland has been transformed into forests and grasslands [7]. Thus, the ecological environment has been significantly improved, and the economic forestry and fruit industries have rapidly developed [8,9]. The primary responsibilities of the Chinese government in this century are gradually turning to maintaining the work of soil and water conservation [10], increasing ecosystem biodiversity [11], raising incomes of local inhabitants [12], and developing sustainable ecological agriculture [6]. Global apple consumption is increasing annually [13], and the Loess Plateau has become the largest apple production area in China and even in the world [14]. Because apples are the main income-generating fruit crop in the region, the development of the apple industry is of great significance for reducing poverty [14,15]. The rational planning of cultivation patterns, as the cornerstone of realizing the healthy and stable development of the apple industry [16], plays an important role in achieving the long-term goal of sustainable development in the Loess Plateau. However, few studies have focused on the suitable habitats (SH) of this economic tree in this region, particularly in the context of climate change [17].

**Citation:** Xu, W.; Miao, Y.; Zhu, S.; Cheng, J.; Jin, J. Modelling the Geographical Distribution Pattern of Apple Trees on the Loess Plateau, China. *Agriculture* **2023**, *13*, 291. https://doi.org/10.3390/agriculture13020291

Academic Editors: Dengpan Xiao and Wenjiao Shi

Received: 23 October 2022 Revised: 18 January 2023 Accepted: 23 January 2023 Published: 25 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The continuous pressure of climate change and rapid social development affect the structure and function of global ecosystems [18], and are changing the distribution range of species to a great extent [19]. Over the past 100 years, the global average temperature has risen by about 0.85 °C [20], and is expected to rise by 1.5–2.1 °C by 2050 [21]. To date, CO<sub>2</sub> concentrations have increased from 280 parts per million (ppm, the pre-industrial level) to around 408 ppm, and may reach 560 ppm (double the pre-industrial level) by 2060 without actions to reduce emissions [22]. Due to the adaptation of the natural environment to human activities, land use and cover change (LUCC) has become the most obvious alteration in natural ecological environments [23,24], especially in fragile ecosystems close to the range of human activities [25,26]. However, the LUCC may exacerbate climate change, limit human activities [18], and threaten global biodiversity [21].

Plant pathogens are generally ignored in the research and planning of the SH of economic trees [27] despite the fact that they have a powerful effect on the distribution of their host plants [28]. *Valsa mali*, a necrotrophic ascomycete fungus [29], causes apple valsa canker (AVC). AVC is a serious disease that affects the quality and yield of apples [30,31] and seriously restricts the sustainable development of the apple industry in the Loess Plateau [32,33]. At the same time, relevant policies [34] and other biological factors [29] also have important impacts on the apple cultivation areas [18,19]. However, to the best of our knowledge, existing studies of the geographic distribution of apple trees rely on environmental factors only (e.g., temperature, precipitation, terrain), and no attempts have been made to couple these with land use types and plant pathogens. Without considering these multiple related factors, the persuasiveness and accuracy of simulation results may be seriously affected.

The development of computer technology has promoted a variety of species distribution models (SDMs) [35,36]. These SDMs mainly model and calculate the distribution of species with georeferenced presence/absence data and their interrelated environmental layers (e.g., meteorological data, terrain data, and social data) [37,38]. The maximum entropy model (MaxEnt), a program that relies on continuous/classified environmental variables and associated occurrence data, produces highly precise predictions [38–42]. It should be noted that many environmental factors that are difficult to collect and quantify (e.g., impacts from the related biological species and policies such as land-use planning) are not easily taken into account by MaxEnt [29]. Hence, in order to better mimic the distribution of species, a technique is required to discern the effects of interactions between species and of variables that are difficult to quantify.

In this study, we simulated the SH of apple trees while taking relevant policies and plant pathogens into account in the context of climate change. To reach this goal, we: (1) independently simulated the distribution pattern of an economically important plant species (i.e., *Malus pumila* Mill.) and its pathogen (i.e., *V. mali*); (2) evaluated and selected the limits of ranges of abiotic factors (i.e., LUCC) and biotic factors (i.e., *V. mali*) that influence the distribution pattern of an economically important plant species; (3) optimized the SH of apple trees by integrating the effects of those abiotic and biotic factors. The purpose of this study was to better understand how apple trees on the Loess Plateau will respond to climate change in the near future, and to offer a theoretical foundation for apple cultivation, structural adjustment, and policy-making in the related industries in this region.

#### **2. Materials and Methods**

#### *2.1. Collection of Occurrence Data of Species and Environmental Variables*

In this study, the occurrence data of apple trees and their plant pathogen (*V. mali*) in the area north of the Yangtze River (China) were collected from field surveys, published articles, and online databases (for details see: [17,29]). In total, we collected 260 georeferenced presence-only records of apple trees and 211 georeferenced presence-only records of *V. mali*.

In the Coupled Model Intercomparison Project Phase 6 (CMIP6), 49 different modelling groups from different countries contributed around 100 unique climate models to represent the change in the future climate. In this initiative, the Representative Concentration Pathways used in Phase 5 have been replaced by the new Shared Socioeconomic Pathways (SSP), which correspond to different radiative forcing levels that depend on the emissions of greenhouse gases (SSP1-26: 2.6 W m<sup>−2</sup>, SSP2-45: 4.5 W m<sup>−2</sup>, SSP3-70: 7.0 W m<sup>−2</sup>, and SSP5-85: 8.5 W m<sup>−2</sup>) and, in turn, lead to increasing warming [43,44]. In terms of intergovernmental energy conservation and emissions reduction, SSP1-26 offers the most optimistic scenario for achieving the goal of limiting the temperature rise to 2 °C by 2100 [43], whereas SSP5-85 represents the worst case. In this study, two extreme SSPs (i.e., SSP1-26 and SSP5-85) were chosen to depict the future distribution pattern of *M. pumila* on the Loess Plateau. We chose the climate system model of the Beijing Climate Center (BCC-CSM2-MR) as the source data for this study as it has been widely utilized in previous studies [45,46] in East Asia. We downloaded nineteen bioclimatic layers (i.e., bio1–bio19) with a spatial resolution of 2.5 arc-min from WorldClim (www.worldclim.org/, accessed on 18 May 2020) [47]. These geodatabases include the climatic conditions of the near-present period (1970–2000) and the climatic conditions estimated for the near future (results of simulations for the periods 2021–2040 and 2041–2060). In this article, we use the 2030s and the 2050s to refer to the time periods 2021–2040 and 2041–2060, respectively. Moreover, we downloaded one elevation dataset (1 km) from the RESDC (http://www.resdc.cn/, accessed on 19 May 2020), and three soil texture datasets (clay, sand, and silt, 1 km) and one soil type dataset (1 km) from the FAO (www.fao.org/soils-portal/, accessed on 11 May 2020).
In total, we prepared 27 variables for MaxEnt (Table 1): bioclimatic layers (19: bio1–bio19), terrain data (4: one elevation dataset and its three derived terrain variables: aspect, curvature, and slope), and soil data (4: three soil texture datasets and one soil type dataset).


**Table 1.** The environmental factors used in the corresponding simulation (variables with labels of "+" for *Valsa mali*, "−" for *Malus pumila* Mill., and "\*" for land use).



#### *2.2. The Screening and Pre-Processing of Data*

Model overfitting can be decreased by variable screening [48,49]. We first removed duplicated records [21] from the occurrence data of the species. The occurrence data were then evaluated in compliance with the requirements of the subsequent study simulations. Considering the geographic location of the Loess Plateau, we transformed the environmental layers with a resolution of 2.5 arc-min into the Asia North Albers Equal Area Conic (ANAEAC) projection with a resolution of ~4857 m. We then filtered the species occurrence data with a boundary distance of 5000 m to ensure that each grid cell involved in the model simulation covers at most a single species occurrence point. We obtained 158 points for *V. mali* and 107 points for apple trees (Figure 1) after filtering species occurrence data using SDMtoolbox (version 2.4) in ArcGIS 10.2 (ESRI, Redlands, CA, USA) (for more details, see [17,29]). By using the toolbox of ArcGIS to further analyse the elevation layer, we obtained three more terrain factors: aspect, curvature, and slope. To avoid multicollinearity [50,51], we conducted a correlation analysis [48] on the 27 variables of bioclimatic layers, terrain data, and soil data. We retained the Annual Mean Temperature (bio1) and Annual Precipitation (bio12), and eliminated other bioclimatic variables with a correlation coefficient value greater than 0.8 (Table 1, see details in [29]). Based on the physiological growth requirements of apple trees [50] and the incidence rate trends of *V. mali* [32], we added six extra bioclimatic variables and finally obtained eight bioclimatic variables for MaxEnt (Table 1). This study also made the assumption that the terrain variables will not change in the near future due to the long-term stability of the terrain [48].
We then resampled the remaining environmental variables (i.e., terrain data and soil data) to a spatial resolution of 2.5 arc-min and converted all the layers used in this study into WGS1984 (the geographic coordinate system) and ANAEAC (the projected coordinate system) so that no software needed to perform a coordinate system transformation.
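The two screening steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the SDMtoolbox or ArcGIS procedure used in the study: the function names, the square-cell thinning rule, and the sample values are all hypothetical, and the correlation screen simply keeps the two priority variables (bio1 and bio12) and drops any other variable whose absolute Pearson correlation with an already retained variable exceeds 0.8.

```python
import math


def thin_occurrences(points, cell_size=5000.0):
    """Keep at most one occurrence point per square cell (coordinates in metres).

    Assumption: a simple grid rule stands in for the 5000 m filtering step.
    """
    kept = {}
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        kept.setdefault(cell, (x, y))  # the first point seen in a cell is retained
    return list(kept.values())


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def screen_variables(samples, priority=("bio1", "bio12"), threshold=0.8):
    """Retain priority variables, then add others only if weakly correlated.

    samples maps a variable name to its values at the sampled locations.
    """
    selected = [name for name in priority if name in samples]
    for name, values in samples.items():
        if name in selected:
            continue
        if all(abs(pearson(values, samples[s])) <= threshold for s in selected):
            selected.append(name)
    return selected


# Hypothetical sample values, for illustration only.
samples = {
    "bio1": [1, 2, 3, 4, 5],
    "bio5": [2, 4, 6, 8, 10.5],   # almost perfectly correlated with bio1
    "bio12": [5, 3, 8, 1, 4],
    "slope": [2, 9, 4, 7, 1],
}
retained = screen_variables(samples)
thinned = thin_occurrences([(100, 100), (4000, 4900), (5100, 100), (12000, 7000)])
```

In this sketch the order in which variables are examined determines which member of a correlated pair survives, which is why variables chosen on physiological grounds are listed as priorities first.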

#### *2.3. Model Processing*

#### 2.3.1. Processing with the Binary Maps of MaxEnt and Land Use Data

The self-evaluation facilities of MaxEnt [18,52], including receiver operating characteristic (ROC) curves and the area under these curves (AUC), were used to assess the accuracy of the resulting models. The AUC values range between 0 and 1, with higher AUC values indicating more accurate simulation results [53,54]. When the AUC value is more than 0.8, the result is good; when it is higher than 0.9, the result is excellent [38]. In this study, we selected the automatic mode, set the maximum number of background points to 10,000, and chose a random seed for the MaxEnt simulation. For the occurrence data of *V. mali* and *M. pumila*, we randomly selected 30% of them as test data to assess the accuracy of the model, while the remaining 70% were used to calibrate it. Five bootstrap replications were performed, and the simulation results (exported in ASCII format) were imported into ArcGIS for further analysis.
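To make the evaluation step concrete, the following minimal Python sketch shows a random 70/30 split and a rank-based AUC, i.e., the probability that a randomly chosen presence point receives a higher score than a randomly chosen background point. It is an illustration under stated assumptions rather than MaxEnt's internal implementation; the function names and the scores are hypothetical.

```python
import random


def split_occurrences(records, test_fraction=0.3, seed=0):
    """Randomly split records into (training, test) subsets."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def auc(presence_scores, background_scores):
    """Share of presence/background pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


# 107 hypothetical record identifiers, matching the number of apple tree points.
training, test = split_occurrences(range(107))

# Hypothetical model scores at test presences and at background points.
example_auc = auc([0.9, 0.8, 0.4], [0.1, 0.4, 0.7])
```

An AUC of 0.5 corresponds to a model that ranks presences no better than chance, which is why values above 0.8 and 0.9 are read as good and excellent, respectively.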

**Figure 1.** The occurrence data of *Malus pumila* Mill. and *Valsa mali* in China.

The selection of threshold values could improve the stability of the MaxEnt model [55]. In this study, we first averaged the floating-point values of the acquired simulation results in order to discern between the presence and absence of species in the distribution maps. Then, the floating-point suitability grid of each species was divided into two parts: the unsuitable part and the suitable part. For apple trees (*M. pumila*), the maximized sensitivity and specificity value (maxss, 0.2385) was set as the threshold [16,20,56]. The part with a floating value greater than the threshold was the suitable portion, and the remainder was the unsuitable portion. The suitable portion was then divided at occurrence probability values of 0.4 and 0.6 into three classes: lowly suitable, moderately suitable, and highly suitable habitats. For *V. mali*, the pathogen that causes the AVC, the threshold was changed from its maxss value (0.160) to a new threshold (i.e., 80% of the floating value of the grid map) in order to establish the high-risk habitat of the pathogen (HRHP). In other words, the HRHP only included values larger than 80% of the floating value. Additionally, the centroid of habitats was used as an essential metric to measure the range variations of the SH [57]. Hence, in this study, we also measured the centroids of apple tree habitats under different climate conditions and drew their distribution maps with the help of the ArcGIS toolbox.
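The thresholding of the apple tree suitability values can be written as a small classification function. The sketch below uses the maxss threshold (0.2385) and the class breaks (0.4 and 0.6) stated above; the function name is hypothetical, and the treatment of values lying exactly on 0.4 or 0.6 is an assumption, since the text does not specify it.

```python
MAXSS_THRESHOLD = 0.2385  # maxss value reported for M. pumila


def classify_suitability(value, threshold=MAXSS_THRESHOLD, moderate=0.4, high=0.6):
    """Map a floating-point suitability value to a habitat class.

    Values must exceed the threshold to count as suitable (as in the text);
    assigning exact break values to the higher class is an assumption.
    """
    if value <= threshold:
        return "unsuitable"
    if value < moderate:
        return "lowly suitable"
    if value < high:
        return "moderately suitable"
    return "highly suitable"


# Hypothetical suitability values for five grid cells.
labels = [classify_suitability(v) for v in (0.10, 0.2385, 0.30, 0.50, 0.95)]
```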

Land use data from various sources often have varying resolutions and classification criteria. Based on the land resources categorization method of the Chinese Academy of Sciences, this study reclassified the land use data from the European Space Agency (given in Table S1) into seven categories (i.e., cropland, forest, grassland, mosaic area, bare area, built-up area, and water area). To create plausible land use distribution maps for the near future, we first estimated the land use transfer matrix and transfer probability between 2000 and 2010 using the Markov model of IDRISI 17.0 (Clark Labs, Clark University, Worcester, MA, USA). The CA-Markov model of IDRISI was then used to predict land use in 2030 and 2050 in 10-year iteration steps, using the land use maps of the starting period (years 2000 and 2010) and the newly established land use transfer and probability matrices.
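The Markov part of this procedure can be illustrated with a short Python sketch: a transition probability matrix is estimated by cross-tabulating two land use maps, and class areas are then projected forward in 10-year steps. This is a deliberately reduced illustration, not the IDRISI implementation; it omits the cellular automata step that allocates the projected areas in space, and the function names and the tiny ten-cell maps are hypothetical.

```python
def transition_matrix(map_t0, map_t1, classes):
    """Estimate P(class at t1 | class at t0) from two co-registered maps."""
    counts = {a: {b: 0 for b in classes} for a in classes}
    for a, b in zip(map_t0, map_t1):
        counts[a][b] += 1
    probs = {}
    for a in classes:
        total = sum(counts[a].values())
        # A class absent at t0 is assumed to persist unchanged.
        probs[a] = {
            b: (counts[a][b] / total if total else float(a == b)) for b in classes
        }
    return probs


def project_areas(areas, probs, steps):
    """Apply the transition matrix to the class areas `steps` times."""
    for _ in range(steps):
        areas = {b: sum(areas[a] * probs[a][b] for a in areas) for b in probs}
    return areas


classes = ["cropland", "forest", "built-up"]
# Hypothetical flattened maps for 2000 and 2010 (one entry per grid cell).
map_2000 = ["cropland"] * 4 + ["forest"] * 4 + ["built-up"] * 2
map_2010 = ["cropland", "cropland", "cropland", "built-up",
            "forest", "forest", "forest", "cropland",
            "built-up", "built-up"]

probs = transition_matrix(map_2000, map_2010, classes)
areas_2010 = {c: float(map_2010.count(c)) for c in classes}
areas_2030 = project_areas(areas_2010, probs, steps=2)  # two 10-year steps
```

Because each row of the matrix sums to one, the total area is conserved at every step; only its distribution among the classes changes.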

#### 2.3.2. Building the Mask Layer and Optimizing the Suitable Habitats of Apple Trees

Unreasonable fruit tree management (e.g., pruning [58]) and precipitation are major transmission pathways of *V. mali* [29]. In light of the size of orchards and the effect distance of *V. mali*, experts recommended establishing a buffer space for the HRHP with a distance of 300 m in order to avoid the AVC. With the help of the ArcGIS toolbox, we resampled the floating-point maps to a resolution of about 323 m and screened the HRHP. We then added a high-risk buffer for the HRHP by setting the grid cells adjacent to the HRHP to be HRHP. We reclassified all values inside the high-risk buffer range in the ArcGIS toolbox, regardless of whether their attributes had previously been classified as high-risk (Figure S2).
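Setting the cells adjacent to the HRHP to HRHP amounts to a one-cell dilation of a binary mask; with cells of about 323 m, one cell roughly matches the recommended 300 m buffer. The Python sketch below illustrates this on a small grid. It is not the ArcGIS workflow used in the study, the function name is hypothetical, and treating all eight neighbours (including diagonals) as adjacent is an assumption.

```python
def add_high_risk_buffer(mask):
    """Return a copy of a 0/1 grid in which every cell touching a 1 becomes 1.

    Assumption: adjacency includes the four edge and four diagonal neighbours.
    """
    rows, cols = len(mask), len(mask[0])
    buffered = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        buffered[rr][cc] = 1
    return buffered


# Hypothetical 4 x 4 mask with a single high-risk cell.
hrhp = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
hrhp_buffered = add_high_risk_buffer(hrhp)
```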

In order to optimize the layout of urban and rural structures and promote the verification and rectification of permanent basic farmland (PBF), the Chinese government conducted its third national land survey (from 2017 to 2021) and designated 1.28 × 10<sup>6</sup> km<sup>2</sup> of national cultivated land (including 1.03 × 10<sup>6</sup> km<sup>2</sup> of actual PBF; see http://www.gov.cn/, accessed on 27 March 2021). Lacking current accurate digital distribution maps of PBF, this study initially used the land use map of 2015 as its research object and made the assumption that all farmland on the Loess Plateau was PBF. After that, the PBF of 2015 was transformed into a mask layer with a permanent transfer barrier. We finally produced the digital PBF maps for 2030 and 2050 (Figure S3) by overlapping the mask map with the predicted land use maps of the near future. Due to their unsuitability for the large-scale development of apple orchards, in this study, the built-up areas, water areas, and PBF were all reclassified as unsuitable land use types (ULUT, Figure S3).

To optimize the distribution pattern of apple trees under the two near future climate scenarios, we overlaid the ULUT mask, the HRHP mask, and the distribution map of apple trees. If a region matched at least one of the ULUT and HRHP masks, we defined it as class unsuitable habitat (CUH) for the cultivation of apple trees. Thereafter, the optimized maps were divided into the following five classes: unsuitable habitat (USH), lowly suitable habitat (LSH), moderately suitable habitat (MSH), highly suitable habitat (HSH), and CUH. Although neither the USH nor the CUH was suitable for growing apple trees, there was an important difference between them: the USH was delineated from the outputs of the original model simulation, while the CUH was delineated according to the distribution of the ULUT and HRHP.
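The overlay rule can be expressed compactly: a cell becomes CUH whenever it falls in the ULUT mask or the HRHP mask, and otherwise keeps its class from the original simulation. The following Python sketch applies this rule to small hypothetical grids and tallies class areas; the function names and the cell area are illustrative assumptions, and the reading of the rule as a logical "or" is an interpretation of the text.

```python
def optimize_habitat(suitability, ulut_mask, hrhp_mask):
    """Overlay two 0/1 masks on a grid of class labels (USH, LSH, MSH, HSH)."""
    optimized = []
    for s_row, u_row, h_row in zip(suitability, ulut_mask, hrhp_mask):
        optimized.append(
            ["CUH" if (u or h) else s for s, u, h in zip(s_row, u_row, h_row)]
        )
    return optimized


def class_areas(optimized, cell_area_km2):
    """Sum the area of each class, assuming equal-area grid cells."""
    areas = {}
    for row in optimized:
        for label in row:
            areas[label] = areas.get(label, 0.0) + cell_area_km2
    return areas


# Hypothetical 2 x 3 grids, for illustration only.
suitability = [["USH", "LSH", "MSH"], ["HSH", "MSH", "LSH"]]
ulut = [[0, 1, 0], [0, 0, 0]]
hrhp = [[0, 0, 0], [1, 0, 0]]

optimized = optimize_habitat(suitability, ulut, hrhp)
areas = class_areas(optimized, cell_area_km2=2.0)
```

Comparing `class_areas` before and after the overlay gives the kind of per-class area reduction reported in the Results.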

#### **3. Results**

#### *3.1. Model Robustness and the Independent Distribution Patterns of Apple Trees, ULUT and HRHP*

Throughout the simulations, MaxEnt provided excellent predictions, with average AUC values of the apple trees (i.e., *M. pumila*) and their vital pathogen (i.e., *V. mali*) of 0.946 ± 0.02 and 0.965 ± 0.013 (mean ± SD), respectively. Without considering the effects of the ULUT and HRHP, the spatial distribution patterns of apple trees indicated that the MSH and HSH were mainly distributed in the south and southeast of the Loess Plateau, and the HSH increased in the west under both SSP1-26 and SSP5-85 in the 2030s and the 2050s (Figure S1). In all time periods (i.e., 1970–2000, the 2030s, and the 2050s), *V. mali* was mostly located in the central and southern Loess Plateau (Figure S2). The most noticeable changes in the modelling of future land use were the decrease in forests and the increase in built-up areas from the 1970–2000 period to the 2030s and the 2050s (Figure S3a–c). For apple trees, the ULUT was mainly dispersed in the south and southeast of the Loess Plateau in the three periods, and increased in the north from the 1970–2000 period to both the 2030s and the 2050s (Figure S3d,e).

#### *3.2. Suitable Habitats under the Effects of Multiple Environmental Factors*

The optimized results showed that, in the 1970–2000 period, the areas of USH, LSH, MSH, HSH, and CUH were 36.15 × 10<sup>4</sup>, 7.66 × 10<sup>4</sup>, 2.80 × 10<sup>4</sup>, 0.23 × 10<sup>4</sup>, and 18.05 × 10<sup>4</sup> km<sup>2</sup>, respectively. In the south and southeast of the Loess Plateau, the CUH led to a significant decline in MSH (−5.14 × 10<sup>4</sup> km<sup>2</sup>, ~−64.74%) and HSH (−0.63 × 10<sup>4</sup> km<sup>2</sup>, ~−74.12%) (Figure 2). Under SSP1-26 and SSP5-85, the inclusion of the CUH resulted in an area decline of the SH in various degrees in the 2030s and the 2050s (Figure 2). In the near future, under SSP1-26, the USH, LSH, MSH, and HSH decreased by 8.75~10.58 × 10<sup>4</sup>, 6.20~7.28 × 10<sup>4</sup>, 3.06~3.40 × 10<sup>4</sup>, and 0.04~0.25 × 10<sup>4</sup> km<sup>2</sup>, respectively (Figure 2a,b). In the 2030s and the 2050s, the reduction in the MSH was 50.00% and 45.76%, respectively, whereas it was 50.00% and 59.52% for the HSH. Compared to the MSH and HSH, the LSH was less affected by the CUH, as the proportion of habitat decline in the near future was 39.61% (the 2030s) and 40.68% (the 2050s). Under SSP5-85, the USH, LSH, MSH, and HSH decreased by 8.65~11.07 × 10<sup>4</sup>, 5.73~6.18 × 10<sup>4</sup> (−34.30%~−34.71%), 3.37~4.11 × 10<sup>4</sup> (−32.31%~−42.02%), and 0.21~0.26 × 10<sup>4</sup> km<sup>2</sup> (−36.62%~−53.85%), respectively (Figures 2d,e and 3).

**Figure 2.** The spatial patterns of apple trees in the 1970–2000 (**c**) and near future under two scenarios ((**a**,**b**) SSP1-26, (**d**,**e**) SSP5-85).

According to the optimized results of the distribution of apple trees, the CUH mainly spread in the east, south, southeast, southwest, west, north, and northwest of the Loess Plateau (Figures 2 and S3). Currently, the CUH, covering 1.81 × 10<sup>5</sup> km<sup>2</sup> (Figure 3), is predominantly contiguously distributed in the south, southeast, and southwest of the Loess Plateau, is sporadic in the central area, and is patchy in the west, northwest, north, and northeast; the MSH and HSH are mainly scattered in the southeast, and the LSH is mainly located in the central area (Figure 2). Compared with the 1970–2000 period, the area of the CUH expanded to different degrees in the 2030s and the 2050s under both SSP1-26 (1.07~2.37 × 10<sup>4</sup> km<sup>2</sup>) and SSP5-85 (1.10~2.38 × 10<sup>4</sup> km<sup>2</sup>, Figure 3). The CUH mostly extended in the east, north, and northeast of the Loess Plateau, while the changes in the south, southeast, southwest, and west were relatively slight (Figure S3). In contrast, under the two SSPs, the distribution ranges of the LSH and MSH expanded to different degrees in the south, central, and southwest areas in the near future (Figure 2, Table S2). The HSH area change trend showed a V-shaped curve from the 1970–2000 period to the 2030s and the 2050s, while its area under SSP1-26 was still lower than it was under near present climatic conditions (Table S2).

#### *3.3. Shifts of Centroids in the near Future under Two Climate Scenarios*

The changes in the distribution patterns of SHs resulted in shifts of their centroids. Currently, the suitability centroid of apple trees is located at 109°22′11.79″ E, 36°21′11.92″ N (Figure 4). Under SSP1-26, the shift distances for the suitability centroid in the 2030s and the 2050s were 40.05 km and 50.32 km, respectively. Under SSP5-85, the shift distances were 63.88 km and 81.30 km, respectively (Figure 4). In particular, we noticed that, under both SSP1-26 and SSP5-85, all suitability centroids of apple trees were displaced towards the northwest.
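A centroid shift of this kind can be computed from the centres of the suitable grid cells in a projected coordinate system. The Python sketch below is a hedged illustration rather than the ArcGIS procedure used in the study: it assumes an unweighted mean centre of equal-area cells with coordinates in metres, and the function names and cell coordinates are hypothetical.

```python
import math


def centroid(cells):
    """Unweighted mean centre of (x, y) cell centres in projected metres."""
    xs, ys = zip(*cells)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def shift(from_xy, to_xy):
    """Return (distance in km, bearing in degrees clockwise from grid north)."""
    dx = to_xy[0] - from_xy[0]
    dy = to_xy[1] - from_xy[1]
    distance_km = math.hypot(dx, dy) / 1000.0
    bearing = math.degrees(math.atan2(dx, dy)) % 360.0
    return distance_km, bearing


def compass(bearing):
    """Convert a bearing to one of eight compass directions."""
    names = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return names[int((bearing + 22.5) // 45) % 8]


# Hypothetical suitable cells for the present and for a future scenario.
present = [(0, 0), (10000, 0), (0, 10000), (10000, 10000)]
future = [(x - 30000, y + 40000) for x, y in present]

distance_km, bearing = shift(centroid(present), centroid(future))
direction = compass(bearing)
```

In this made-up example the centroid moves 30 km west and 40 km north, i.e., 50 km towards the northwest; the bearing refers to grid north of the projection, which only approximates true north.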

**Figure 4.** The suitability centroids of apple trees on the Loess Plateau.

#### **4. Discussion**

#### *4.1. Effects of the USH and Range Shifts in Suitable Habitats of Apple Trees*

In this study, the geographic distribution patterns of one economically important tree (i.e., *M. pumila*) and one of its vital pathogens (i.e., *V. mali*) under SSP1-26 and SSP5-85, as well as the corresponding land use patterns, were simulated using MaxEnt and the CA–Markov model. Based on this, more detailed maps of the geographic distribution of apple trees were produced. After considering the CUH (i.e., ULUT and HRHP), the distribution patterns of apple trees on the Loess Plateau became more fragmented as a result of the increase in the number of habitat classes from four to five. Compared with its 1970–2000 range, the SH (i.e., the LSH, MSH, and part of the HSH in the 2050s period under SSP5-85) expanded to different degrees in the south and southwest of the Loess Plateau. This is consistent with the impact of the southeast monsoon on the direction of precipitation inside the Loess Plateau range [59]. Moreover, the geographical distribution of the Qinling Mountains (to the south, spanning from west to east [1]) may have some impact on this. Furthermore, while not negligible, the influence of regional topography on regional climate change (i.e., temperature and precipitation [59]) is often difficult to measure precisely.

Global climate change has brought challenges for the cultivation of apple trees on the Loess Plateau. From the 1970–2000 period to the 2030s and the 2050s, the area of the SH under SSP5-85 increased more than under SSP1-26. Under SSP1-26 and SSP5-85, the ideal cultivation habitat of apple trees on the Loess Plateau will shift northwest in the near future relative to its near present distribution. In addition, because the Loess Plateau is one of the most important apple-producing regions in China [14], a series of adaptation measures will be required to maintain the size and yield of the apple industry. In recent decades, the average elevation of apple orchards in Northern India has shifted upward by about 800 m [15], and apple cultivation in China has moved northward and westward [17]. Across the period from the 2030s to the 2050s, the shift distances of suitability centroids increased between SSP1-26 and SSP5-85. We hypothesize that this may be related to the temperature influence on the growth and development of apple trees: similar to how temperature thresholds influence the development activities of particular pest species [60], an appropriate temperature increase will enhance the distribution of apple trees on the Loess Plateau, but when the temperature change exceeds a certain threshold, the effect of the temperature increase is likely to shift from positive to negative. This is contrary to the practical experience of economically important forest trees seeking the most environmentally similar habitats when facing climate change [15,45].

#### *4.2. Effects of Abiotic and Biological Factors on the CUH*

Compared with its distribution pattern in the 1970–2000 period, the CUH will expand to different degrees in the near future. Furthermore, its expansion was mainly concentrated in the east, northeast, and north of the Loess Plateau, while the changes occurring in the south, southeast, southwest, west, and northwest were minimal. However, considering the delineation of the CUH, there were two potential causes for uncertainty in the simulations. On the one hand, the simulation uncertainty of the ULUT may lead to uncertainty in the CUH. Both natural controlling factors (i.e., temperature, precipitation, terrain, etc.) and socio-economic driving factors have impacts on the LUCC [61], and their effects vary depending on the land use type [62]. This is partly because some land use types (e.g., orchards) have more economic benefits than those with more ecological functions (e.g., forests and grasslands). This study assumed that the terrain factors would remain stable in the near future, and on this basis, the climatic factors, such as temperature and precipitation, could be the dominant natural factors impacting the Loess Plateau ecosystem. Meanwhile, given their significant impacts on the LUCC, socio-economic factors (i.e., policies, regulations, and systems [63]) cannot be ignored. National ecological projects, as well as the creation of nature reserves [64], have significantly changed the land use patterns in the north, northwest, and northeast of China during the last 50 years [7]. These are significant examples of how socio-economic factors have influenced the LUCC [61,63]. Hence, predictions of future land use change based on current regulations, although indispensable at this point, carry considerable uncertainty arising from the policies themselves. On the other hand, the simulation uncertainty of the HRHP may lead to extra uncertainty in the range of the CUH. In the 2030s and the 2050s, the CUH range differs slightly between SSP1-26 and SSP5-85 (Figure 2). To lessen the computational burden of the simulation, the impact of the two SSPs on land use change was not considered separately in this study (i.e., the same land use simulations were used in both SSP1-26 and SSP5-85). Therefore, the difference in the range of the CUH between the two SSPs was caused by their HRHP ranges (Figure S2). For the plant pathogen *V. mali*, details about its hosts (i.e., apple trees, plum trees, etc.), biological factors (such as insect behaviour), and its potential transmission routes (such as seeds and stocks carrying the pathogen [58]) need to be considered in future studies to obtain more convincing simulation results. In addition, further research is required on the response of *V. mali* to temperature and precipitation. An essential factor in the prevention and treatment of the AVC is setting an adequate buffer distance for *V. mali*. Hence, future research should also pay more attention to how relevant environmental factors such as land types, terrain, and other geographic barriers (e.g., seasonal wind and rivers) affect the buffer distance of *V. mali*.

#### *4.3. Strategies to Improve the Accuracy of Simulations*

This study produced an excellent simulation of the potential geographic distribution of apple trees in the near present and near future on the Loess Plateau. However, to increase its accuracy, three sources of uncertainty should be addressed in follow-up studies. First, there are uncertainties within the MaxEnt model itself. Species occurrence data and environmental variables are the fundamental inputs of MaxEnt. Before building the model, we screened the species occurrence data (based on the resolution of the environmental data) and the environmental variables (using principal component analysis and correlation coefficients) separately [65]. However, it remains to be verified whether this is the best strategy for using these occurrence data. Additionally, considering only the correlation and overlap between environmental factors may neglect crucial factors [57] that could affect the distribution patterns of species. To filter environmental variables reasonably and to enhance the stability and precision of the simulated species distribution, future studies may need more extensive assessment strategies (e.g., evaluating the weight of environmental variables in the model [66,67]). Second, there are uncertainties in dividing the SHs. Although maxss has been commonly applied in SDM studies to distinguish between the potential presence and absence of species [56], its stability in various hydrothermal environments remains uncertain. Future studies should focus more on improving the indicators used to assess the presence and absence of species. Third, there are uncertainties in the mask maps. Some natural, inevitable uncertainties exist in the climate scenarios simulated by the climate prediction organisations, which results in uncertainty in the mask maps of the HRHP. Human activities, policies, and climate change all have an impact on the LUCC [19,61]. However, their effects vary from ecosystem to ecosystem, especially in those that are regularly impacted by human activities [26]. The Loess Plateau ecosystem is particularly sensitive due to its peculiar climatic conditions [6,12]. It should be noted that the modelled estimates of near-future land use in this study did not sufficiently account for relevant environmental factors. To increase the accuracy of modelling in this region, future studies may need to investigate the driving forces in more depth [62]. At the moment, LUCC predictions for the Loess Plateau under different climate change scenarios are still lacking, and hence their potential impacts were not considered in this work.
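As an illustration of the thresholding step discussed above, the following minimal Python sketch selects the cut-off that maximizes the sum of sensitivity and specificity. It assumes that "maxss" refers to this criterion; the function name `max_sss_threshold` and the example scores are hypothetical and are not taken from this study.

```python
from typing import Sequence, Tuple


def max_sss_threshold(
    presence_scores: Sequence[float],
    background_scores: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, sensitivity + specificity) for the best cut-off.

    A cell is predicted "present" when its suitability score is >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate cut-off.
    candidates = sorted(set(presence_scores) | set(background_scores))
    best_threshold, best_sum = candidates[0], -1.0
    for t in candidates:
        # Share of presence records correctly predicted as present.
        sensitivity = sum(s >= t for s in presence_scores) / len(presence_scores)
        # Share of background records correctly predicted as absent.
        specificity = sum(s < t for s in background_scores) / len(background_scores)
        total = sensitivity + specificity
        if total > best_sum:
            best_threshold, best_sum = t, total
    return best_threshold, best_sum


# Hypothetical suitability scores, for illustration only.
threshold, score = max_sss_threshold(
    presence_scores=[0.9, 0.8, 0.7, 0.4],
    background_scores=[0.1, 0.2, 0.3, 0.6],
)
```

One possible way to probe the stability concern raised above would be to recompute the threshold on subsets of records drawn from different hydrothermal zones and compare the resulting cut-offs.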

#### **5. Conclusions**

In this study, we developed a model that took multiple environmental factors into account, such as temperature, precipitation, terrain, soil, climate change, and human activities, to simulate the distribution pattern of apple trees on the Loess Plateau in the near present and near future under two Shared Socioeconomic Pathways (i.e., SSP1-26 and SSP5-85). The increase in the SH under SSP5-85 was larger than that under SSP1-26. In the near future, the CUH increased to different degrees in the east, northeast, and north of the Loess Plateau. After optimization, which took the CUH into consideration, the LSH, MSH, and HSH shrank to varying degrees, and their percentage decreases were larger than that of the USH. Under the two SSPs, all suitability centroids shifted to the northwest in the near future relative to the 1970–2000 centroid. As the pressures of climate change increased from SSP1-26 to SSP5-85, the shift distances of the centroids increased in both the 2030s and the 2050s. Under the same climate change pressures, the shift distance increased more in the 2050s than it did in the 2030s.
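The centroid shifts summarized above can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not the procedure used in this study: cells are given as hypothetical (longitude, latitude, weight) tuples, the centroid is a plain weighted mean of coordinates (a reasonable approximation only over a regional extent), and the distance is a great-circle distance on a spherical Earth.

```python
import math
from typing import Iterable, Tuple

Cell = Tuple[float, float, float]  # (longitude_deg, latitude_deg, weight)
Point = Tuple[float, float]        # (longitude_deg, latitude_deg)


def weighted_centroid(cells: Iterable[Cell]) -> Point:
    """Weighted mean longitude and latitude of the suitable cells."""
    cells = list(cells)
    total = sum(w for _, _, w in cells)
    if total <= 0:
        raise ValueError("total weight must be positive")
    lon = sum(x * w for x, _, w in cells) / total
    lat = sum(y * w for _, y, w in cells) / total
    return lon, lat


def shift_distance_km(start: Point, end: Point, radius_km: float = 6371.0) -> float:
    """Great-circle (haversine) distance between two centroids, in km."""
    lon1, lat1 = map(math.radians, start)
    lon2, lat2 = map(math.radians, end)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def shift_direction(start: Point, end: Point) -> str:
    """Eight-point compass label of the initial bearing from start to end."""
    lon1, lat1 = map(math.radians, start)
    lon2, lat2 = map(math.radians, end)
    dlon = lon2 - lon1
    y = math.sin(dlon) * math.cos(lat2)
    x = (math.cos(lat1) * math.sin(lat2)
         - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    bearing = math.degrees(math.atan2(y, x)) % 360.0
    labels = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
    return labels[int((bearing + 22.5) // 45) % 8]


# Hypothetical centroids, for illustration only (not results of this study).
baseline = weighted_centroid([(110.0, 35.0, 1.0), (112.0, 37.0, 1.0)])
future = (baseline[0] - 1.0, baseline[1] + 1.0)
distance = shift_distance_km(baseline, future)
direction = shift_direction(baseline, future)
```

A workflow based on a projected coordinate system would replace the haversine step with a planar distance; the spherical version is used here only to keep the sketch free of external dependencies.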

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture13020291/s1. Figure S1. The spatial distribution patterns of suitable habitats of apple trees on the Loess Plateau in three periods under two climate scenarios (SSP1-26: (a,b); SSP5-85: (d,e)); Figure S2. The spatial patterns of high-risk habitat of *Valsa mali* in the 1970–2000 and near future (the 2030s and 2050s) under two climate scenarios (SSP1-26: (a,b); SSP5-85: (d,e)); Figure S3. The land use patterns (a–c) and the unsuitable land use type for the cultivation of apple trees (d–f) on the Loess Plateau in the 1970–2000 and near future (the 2030s and 2050s); Table S1: The correspondence of the reclassification of the land use and cover changes; Table S2: The area of suitable habitats (before and after optimization) of apple trees on the Loess Plateau ($10^4$ km$^2$). The CUH represents the class unsuitable habitat.

**Author Contributions:** W.X. designed the study, collected occurrence data, drew the map, tables, and figures, and wrote the main body of the manuscript; Y.M. and S.Z. improved the manuscript in many parts and calculated the corresponding area of suitable habitats; J.J. and J.C. acquired the funding and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was financially supported by the Key Research and Development Program of Shaanxi Province (2021NY-006), Natural Science Foundation of China (31601987), China Agriculture Research System (CARS-34), Deployment Program of the Chinese Academy of Sciences (KJZD-EW-TZ-G10) and Doctoral Fund of Henan Polytechnic University (B2019-4).

**Institutional Review Board Statement:** Not applicable.

**Acknowledgments:** We thank the organisations that provided the data and software used in this work. We would especially like to thank Runzhi Mao for his moral support of our lives and this work. We also thank the anonymous reviewers for their suggestions and contributions to improving this manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Agriculture* Editorial Office E-mail: agriculture@mdpi.com www.mdpi.com/journal/agriculture
