1. Introduction
High-precision, high-spatiotemporal-resolution near-surface air temperature data are essential for global change research, environmental monitoring, and disaster early warning [1,2,3,4]. Land Data Assimilation Systems (LDASs), which merge real-time observations with model simulations to generate spatially continuous gridded data, have become a primary source for such information [5].
The critical requirement for reliable temperature data is exemplified in agricultural monitoring, particularly under global warming. With increasing climate instability, the frequency and intensity of extreme heat events have risen significantly, posing a major threat to crop production and food security [6,7,8,9,10]. In China, heatwaves concentrated from May to September [11,12,13] overlap with the heading and flowering stages of rice, a crop highly sensitive to temperatures above 35 °C [14,15,16,17]. Even brief exposure to 33.7 °C during flowering can impair pollen viability [18]. Therefore, accurate temperature data are fundamental to effective heat stress monitoring and mitigation.
The China Meteorological Administration Land Data Assimilation System (CLDAS) provides gridded temperature data covering Asia at hourly temporal and 0.0625° spatial resolutions, offering key support for regional agricultural meteorological disaster monitoring [19,20]. CLDAS is recognized for its higher spatiotemporal resolution and has demonstrated competitive performance against other systems such as the Global Land Data Assimilation System (GLDAS) [21] and the European Centre for Medium-Range Weather Forecasts Reanalysis 5 (ERA5) [22,23,24]. However, CLDAS temperature still exhibits significant systematic biases and random errors, owing to sparse observational stations, complex assimilation algorithms, and land surface heterogeneity [25]. Substantial errors have been documented across various regions. CLDAS daily maximum temperatures for Chongqing had a mean absolute error (MAE) of 1.14 °C [26]. In the Qinling and Daba Mountains of Shaanxi Province, fewer than 30% of the temperature data had an absolute error within 2 °C [27]. Research in Lanzhou and Wuwei demonstrated that errors in daily and hourly temperatures varied systematically with altitude [28]. These errors directly affect the accuracy of rice heat stress monitoring in these regions.
A range of methods has been developed to correct temperature data and improve their quality. While traditional statistical methods, such as linear regression and quantile mapping, are simple and efficient, recent studies have introduced more specialized corrections. Li et al. [26] proposed a quasi-symmetric hybrid sliding training method to correct CLDAS daily maximum temperature, reducing its MAE from 1.14 °C to 0.64 °C. Yang et al. [29] applied a linear correction method to CLDAS temperature data in Guizhou Province, lowering the annual root mean square error (RMSE) from 1.55 °C to 1.23 °C. A key limitation of these methods is their dependence on a dense observational network, because station data are used to spatially interpolate or adjust the original gridded data.
Machine learning algorithms, such as Random Forest and LightGBM, have also been applied to temperature correction. Li et al. [30] used Random Forest to correct the 2-m temperature data from the European Centre for Medium-Range Weather Forecasts (ECMWF), significantly improving the regional forecast accuracy. Chen et al. [31] proposed a spatiotemporally independent Random Forest model for correcting ECMWF temperature data on Hainan Island, which maintained an MAE reduction of 0.2–0.3 °C even with sparse station data. Fang et al. [32] employed LightGBM with multiple features to construct a high-resolution (0.05° × 0.05°) grid correction model, achieving accurate hourly temperature forecasts in Hubei Province. These methods effectively capture the relationship between air temperature errors and multi-source land surface features, which reduces dependency on the spatial distribution of stations and allows for more efficient bias correction. However, the application of machine learning in temperature correction remains limited. On the one hand, a single machine learning model often overfits local noise, leading to poor generalization and unstable results. On the other hand, model performance depends on the quality of the input features: without high-spatiotemporal-resolution dynamic surface information, a model cannot accurately capture the key physical processes that cause temperature errors.
Geospatial artificial intelligence (GeoAI) integrates geospatial data analysis with artificial intelligence (AI) to extract information from satellite imagery and location-based data. It supports data-driven decision-making in agricultural meteorology and enhances the capacity to address specific challenges, including temperature data bias and rice heat stress under climate change [33]. While traditional machine learning methods possess strong predictive capabilities, they often lack explicit consideration of physical mechanisms, which limits model interpretability and generalizability in complex meteorological scenarios. This study developed a GeoAI framework for temperature correction of CLDAS data and proposed a dual-path model that combines physical mechanisms with data-driven methods. This design ensures that the correction process is not only data-efficient but also grounded in clear geophysical principles, enhancing interpretability and robustness. The physics-guided approach characterizes land–atmosphere interaction mechanisms. The data-driven approach employs an ensemble of XGBoost, LightGBM, and Random Forest to capture complex non-linear relationships between features and temperature errors.
Within the physics-guided approach, a feature set that integrated both dynamic surface observations and static topographic attributes was designed to explicitly represent the underlying physical processes. The dynamic surface metrics derived from FY-4 satellites included the daily maximum Land Surface Temperature (LSTmax) and the diurnal Land Surface Temperature range (LSTmax_min). LSTmax captured the peak thermal state based on the energy balance coupling with air temperature, while LSTmax_min reflected extreme temperature trends and surface thermal inertia. The model also integrated Vegetation Indices, such as Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI), which collectively represent vegetation regulation and moisture feedback mechanisms. For static topographic representation, the model utilized a Digital Elevation Model (DEM) to quantify the vertical temperature lapse rate, correcting large-scale systematic biases. It further incorporated local terrain features, e.g., slope, aspect, Topographic Position Index (TPI), and Terrain Ruggedness Index (TRI), to capture fine-scale thermal variations caused by topographic heterogeneity.
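The terrain and surface features described above can be illustrated with a minimal sketch. The source does not state which TPI or TRI formulation was used, so the definitions below (a 3 × 3 neighbourhood, TPI as the cell elevation minus the mean of its eight neighbours, TRI as the square root of the summed squared differences) are assumptions, and the function names are hypothetical.

```python
import math


def terrain_indices(dem, row, col):
    """Return (TPI, TRI) in meters for an interior cell of a 2-D elevation grid.

    Assumed definitions (not taken from the source):
      TPI = cell elevation minus the mean elevation of its 8 neighbours
      TRI = square root of the summed squared differences to the 8 neighbours
    """
    center = dem[row][col]
    neighbours = [
        dem[row + dr][col + dc]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if not (dr == 0 and dc == 0)
    ]
    tpi = center - sum(neighbours) / len(neighbours)
    tri = math.sqrt(sum((center - z) ** 2 for z in neighbours))
    return tpi, tri


def diurnal_lst_features(hourly_lst):
    """Return (LSTmax, LSTmax_min): the daily maximum and the diurnal range."""
    lst_max = max(hourly_lst)
    return lst_max, lst_max - min(hourly_lst)


# Toy 3 x 3 DEM: an isolated peak 80 m above flat surroundings.
dem = [
    [100.0, 100.0, 100.0],
    [100.0, 180.0, 100.0],
    [100.0, 100.0, 100.0],
]
tpi, tri = terrain_indices(dem, 1, 1)
lst_max, lst_range = diurnal_lst_features([290.0, 300.0, 310.0, 295.0])
```

A positive TPI marks a cell that sits above its surroundings (ridge or peak) and a negative one marks a valley, while TRI grows with local relief regardless of sign.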
Within the data-driven approach, the three ensemble learners were integrated to exploit their complementary advantages. XGBoost was used to capture high-order interaction relationships between temperature errors and multi-source features [34,35,36,37]. LightGBM enabled efficient processing of high-dimensional spatiotemporal data [38,39,40,41]. Random Forest helped reduce the overfitting risk caused by extreme error samples [42,43,44,45]. This ensemble strategy significantly enhanced the model’s generalization and computational stability.
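How the three base learners are combined is not specified in this section. As a hedged illustration only, the sketch below assumes that each learner predicts the temperature error (raw CLDAS minus observation) and that the predictions are blended by a weighted average; the function name, the equal default weights, and the sample values are all hypothetical.

```python
def ensemble_correction(raw_temp, predicted_errors, weights=None):
    """Correct one raw temperature using base-learner error predictions.

    raw_temp: raw gridded temperature in deg C.
    predicted_errors: dict mapping learner name -> predicted error
        (raw minus observed) in deg C.
    weights: optional dict of non-negative weights per learner;
        equal weights are assumed when omitted.
    """
    if weights is None:
        weights = {name: 1.0 for name in predicted_errors}
    total_weight = sum(weights[name] for name in predicted_errors)
    blended_error = sum(
        weights[name] * error for name, error in predicted_errors.items()
    ) / total_weight
    # Subtract the blended error estimate from the raw value.
    return raw_temp - blended_error


# Illustrative values only: three learners disagree slightly on the error.
corrected = ensemble_correction(
    36.2, {"xgboost": 1.0, "lightgbm": 0.8, "random_forest": 1.2}
)
```

Averaging damps the influence of any single learner that overfits a local pattern, which is the intuition behind the stability claim; a stacking meta-learner would be an alternative design.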
Through the combination of physical guidance and data-driven approaches, the model ultimately generated high-precision temperature grids with spatial continuity, temporal consistency, and physical interpretability. The corrected data were subsequently applied to monitor rice heat stress across China, which allowed for a reliable monitoring model to be established, demonstrating the practical value of the proposed GeoAI framework in enhancing agricultural meteorological services.
4. Discussion
4.1. Model Interpretability
The variable importance analysis was conducted across the base learners (XGBoost, LightGBM and Random Forest), with results summarized in Table 7. These results indicated that the model’s corrections relied on features with clear physical meanings, and that its ensemble strategy effectively integrated distinct mechanisms for correcting physical errors.
The key features were found to possess direct physical significance. The three variables with the highest average importance were the raw CLDAS temperature (36.53%), the target of correction, along with DEM (11.72%) and TPI (9.13%). These correspond to the initial temperature, the elevation effect, and topographic position, respectively, all core factors that govern near-surface air temperature. This result demonstrates that the model’s correction process is grounded in established geophysical principles.
The base learners exhibited complementary correction strategies. Random Forest was heavily reliant on the raw CLDAS temperature (81.11%) and assigned relatively high importance to LST (10.21%). Its correction approach likely focused more on errors related to the raw temperature and surface energy processes. In contrast, XGBoost and LightGBM significantly reduced the weight given to raw data and instead placed greater emphasis on topographic indicators such as DEM and TPI to identify spatially structured errors. They also showed relatively lower dependence on LST (6.83% and 5.61%, respectively). This difference highlights the limitations of any single model in capturing complex error sources. Random Forest was more associated with biases in original data and surface thermal radiation, while the gradient boosting models were more adept at capturing systematic biases driven by topography.
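The contrast between the learners can be made concrete with a small arithmetic sketch. It assumes that the reported average importance is the unweighted mean over the three base learners, which the text does not state explicitly; under that assumption, the combined share of the raw CLDAS temperature in the two gradient boosting models follows from the reported figures.

```python
# LST importance (%) per base learner, as reported in the text.
lst_importance = {"random_forest": 10.21, "xgboost": 6.83, "lightgbm": 5.61}
mean_lst = sum(lst_importance.values()) / len(lst_importance)

# Raw CLDAS temperature: reported three-learner average and Random Forest share (%).
raw_mean, raw_rf = 36.53, 81.11

# Implied mean share for XGBoost and LightGBM, valid only under the
# unweighted-mean assumption stated above.
implied_gb_mean = (3 * raw_mean - raw_rf) / 2

print(f"mean LST importance: {mean_lst:.2f}%")
print(f"implied raw-temperature share in boosting models: {implied_gb_mean:.2f}%")
```

If the assumption holds, the gradient boosting models place roughly one sixth as much weight on the raw temperature as Random Forest does, which is consistent with their stronger reliance on DEM and TPI.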
The ensemble framework effectively balanced these physical mechanisms by integrating the strengths of each base learner. It combined Random Forest’s focus on raw temperature bias with the gradient boosting models’ sensitivity to topographically structured errors. This integration reduced the over-reliance on the raw input data seen in Random Forest alone, leading to a more robust correction model. The final ensemble not only achieved higher accuracy but also maintained a clear, physically interpretable structure by addressing error sources from both surface thermal processes and terrain effects.
Furthermore, the model’s foundation in physical mechanisms and its interpretable design support strong generalizability. It is applicable not only to monitoring rice heat stress, as validated in this study, but also to assessing extreme heat events. With only minor adjustments, the framework can be adapted to monitor meteorological stress in other staple crops such as wheat and maize. This flexibility, together with the model’s interpretability, increases its practical value for agricultural and climate impact assessments across different applications.
4.2. Model Performance
In this study, the integration of topographic features with dynamic surface features (NDVI, EVI, LSTmax, LSTmax_min) notably improved temperature correction accuracy. For Shanxi (NWR), the overall MAE was reduced to approximately 0.9 °C and RMSE to 1.43 °C. Near the Qinling Mountains, MAE was reduced to 1.02 °C, representing a distinct improvement over the correction for complex terrain reported by Wang [27]. Similarly, in Guizhou, another region with complex topography, the model reduced the RMSE of corrected CLDAS temperature to 0.98 °C, outperforming the 1.23 °C RMSE reported by Yang et al. [29]. These results demonstrate that the proposed model offers better adaptability and accuracy in areas with significant topographic heterogeneity.
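For reference, the accuracy metrics quoted above can be computed as in the following minimal sketch. The function names and sample values are illustrative, and the 2 °C tolerance mirrors the threshold used elsewhere in the text.

```python
import math


def mae(predicted, observed):
    """Mean absolute error between paired samples."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def rmse(predicted, observed):
    """Root mean square error between paired samples."""
    return math.sqrt(
        sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)
    )


def share_within(predicted, observed, tolerance=2.0):
    """Fraction of samples whose absolute error is within the tolerance (deg C)."""
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(observed)


# Illustrative gridded values and station observations in deg C.
grid = [30.0, 32.0, 35.0, 28.0]
station = [31.0, 32.0, 33.0, 31.0]
print(mae(grid, station), rmse(grid, station), share_within(grid, station))
```

Because RMSE squares each error before averaging, it penalizes occasional large misses more heavily than MAE, so a region can show a modest MAE alongside a noticeably higher RMSE.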
However, the corrected CLDAS temperature in the NWR still showed a relatively high RMSE of 1.30 °C compared to other regions. This is closely associated with the region’s complex terrain. As shown in Table 8, there were significant differences in microtopographic relief among China’s major rice-growing regions. The NWR exhibited the most rugged terrain, with the standard deviations of TPI and TRI reaching 197.99 m and 459.23 m, respectively, substantially higher than those of other regions. Because the spatial resolution of CLDAS is 0.0625° (approximately 7 km), elevation differences within a single grid cell can reach hundreds of meters. This resolution is insufficient to capture fine-scale topographic variability, leading to discrepancies between gridded temperature estimates and local conditions, which in turn limits correction accuracy in this region. This finding is consistent with previous research [28], demonstrating that surface heterogeneity remains a key constraint on improving CLDAS temperature products.
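The scale of this sub-grid effect can be estimated with a short calculation. The sketch below assumes roughly 111.32 km per degree of latitude and the standard-atmosphere lapse rate of 6.5 °C per km; actual near-surface lapse rates vary with season, moisture, and terrain, so the result is an order-of-magnitude illustration, not a value from the study.

```python
import math

KM_PER_DEGREE = 111.32       # approximate length of one degree of latitude
LAPSE_RATE_C_PER_KM = 6.5    # standard-atmosphere value, assumed for illustration


def grid_cell_size_km(resolution_deg, latitude_deg=0.0):
    """Return (north-south, east-west) cell size in km at a given latitude."""
    north_south = resolution_deg * KM_PER_DEGREE
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def lapse_rate_offset(elevation_diff_m, lapse_rate=LAPSE_RATE_C_PER_KM):
    """Temperature difference in deg C implied by an elevation difference in meters."""
    return lapse_rate * elevation_diff_m / 1000.0


ns_km, ew_km = grid_cell_size_km(0.0625, latitude_deg=34.0)
offset = lapse_rate_offset(300.0)
print(f"cell size: {ns_km:.2f} km x {ew_km:.2f} km; 300 m offset: {offset:.2f} deg C")
```

Under these assumptions, a 300 m elevation difference inside one cell corresponds to a temperature difference of about 2 °C, comparable to the error tolerance used to judge the product.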
It is noteworthy that even in highly rugged mountainous areas such as the Qinling and Daba Mountains, which share topographic complexity with parts of the NWR, our model still achieved meaningful improvements over the original CLDAS data. In these regions, the proportion of stations with MAE below 2 °C increased significantly from 80% before correction to 93% after applying our ensemble model.
4.3. Effects of Heat Stress on Rice Growth and Yield
Although rice is a thermophilic crop, there are clear temperature thresholds for its growth and development. Previous studies indicate that rice growth rates start to decrease when temperatures regularly rise above 32 °C; at or above 35 °C, severe physiological stress (such as excessive transpiration and photosynthetic inhibition) impedes development and causes a sharp decline in seed-setting rates during crucial stages such as heading and flowering [49,50,51,52]. Additionally, research conducted in China supports a clear association between cumulative heat stress and yield loss [53].
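Threshold-based heat stress indicators of this kind can be sketched as follows. The 35 °C threshold comes from the text, but the three metrics shown (number of hot days, accumulated excess above the threshold, and longest consecutive run) are hypothetical examples and are not necessarily the metrics used in the study.

```python
def heat_stress_metrics(daily_tmax, threshold=35.0):
    """Return (hot_days, accumulated_excess, longest_run) for daily maxima in deg C.

    hot_days: number of days with Tmax at or above the threshold.
    accumulated_excess: sum of (Tmax - threshold) over those days.
    longest_run: length of the longest consecutive hot spell in days.
    """
    hot_days = 0
    accumulated_excess = 0.0
    longest_run = 0
    current_run = 0
    for tmax in daily_tmax:
        if tmax >= threshold:
            hot_days += 1
            accumulated_excess += tmax - threshold
            current_run += 1
            longest_run = max(longest_run, current_run)
        else:
            current_run = 0
    return hot_days, accumulated_excess, longest_run


# Illustrative five-day window during heading and flowering.
metrics = heat_stress_metrics([33.0, 35.5, 36.0, 34.0, 37.0])
```

Accumulated excess separates a few marginally hot days from a short but intense event, which a simple day count cannot do.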
A major methodological challenge in evaluating the regional impact of high temperatures lies in separating their influence from that of other factors. Interannual changes in planting area constitute a crucial confounding variable. While heat stress decreases yield per unit area, an expansion in planted area frequently compensates at the level of total production, masking the actual heat-induced harm. The 2020 case of Hunan illustrates this masking effect: despite numerous recorded heat stress events, a 3.6% expansion in planted area resulted in a 1.05% increase in total yield.
To isolate the direct impact of high-temperature stress on rice production, this study employed partial correlation analysis to control for the effect of changes in planting area. The results indicated that the various heat stress metrics showed stable and significant negative correlations with yield per unit area. This demonstrates that the heat stress metrics capture the direct negative impact on rice production, confirming their validity for assessing yield loss.
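A first-order partial correlation with one control variable can be computed from the three pairwise Pearson coefficients. The sketch below is a generic illustration of that formula on synthetic data; the variable names and values are hypothetical and do not reproduce the study's dataset or results.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic example: the outcome falls with heat and rises with planting area.
heat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
area = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
outcome = [10.0 - 0.5 * h + 0.2 * a for h, a in zip(heat, area)]

print(pearson(heat, outcome), partial_correlation(heat, outcome, area))
```

In this synthetic case the raw correlation understates the heat effect because area rises together with heat, whereas the partial correlation recovers the full negative relationship once area is held fixed.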
The correlation analysis presented above confirms that the model’s outputs reliably reflect yield impacts. However, it is important to note that the response of agricultural production to climate is multifaceted. The consequences of heat stress may also be mitigated or exacerbated by management strategies such as irrigation adjustments, cultivar replacement, and phenological adaptation. Furthermore, due to data limitations, the present validation primarily focused on controlling for the compensatory effect of planting area. The multi-year aggregation approach, while ensuring robustness, may also overlook dynamic annual-scale variations. Future research, based on more granular and temporally explicit datasets, could strengthen the validation by incorporating a wider range of adaptive measures and examining interannual variations in model performance, thereby further refining the model’s utility in monitoring climate impacts on agriculture.