With the continuous acceleration of industrialization, air pollution has become increasingly severe and has, to some extent, contributed to the progression of global climate change. Against this backdrop, accurate temperature forecasting plays a vital role in various fields, including agricultural production, energy scheduling, environmental governance, and public health protection. To improve the accuracy and stability of temperature prediction, this study proposes a hybrid modeling approach that integrates convolutional neural networks (CNNs), Long Short-Term Memory (LSTM) networks, and random forests (RFs). This model fully leverages the strengths of CNNs in extracting local spatial features, the advantages of LSTM in modeling long-term dependencies in time series, and the capabilities of RF in nonlinear modeling and feature selection through ensemble learning. Based on daily temperature, meteorological, and air pollutant observation data from Wuhan during the period 2015–2023, this study conducted multi-scale modeling and seasonal performance evaluations. Pearson correlation analysis and random forest-based feature importance ranking were used to identify two key pollutants (PM2.5 and O3) and two critical meteorological variables (air pressure and visibility) that are strongly associated with temperature variation. A CNN-LSTM model was then constructed using the meteorological variables as input to generate preliminary predictions. These predictions were subsequently combined with the concentrations of the selected pollutants to form a new feature set, which was input into the RF model for secondary regression, thereby enhancing the overall model performance. The main findings are as follows: (1) The six major pollutants exhibit clear seasonal distribution patterns, with generally higher concentrations in winter and lower in summer, while O3 shows the opposite trend. Moreover, the influence of pollutants on temperature demonstrates significant seasonal heterogeneity. (2) The CNN-LSTM-RF hybrid model shows excellent performance in temperature prediction tasks. The predicted values align closely with observed data in the test set, with a low prediction error (RMSE = 0.88, MAE = 0.66) and a high coefficient of determination (R² = 0.99), confirming the model’s accuracy and robustness. (3) In multi-scale forecasting, the model performs well on both daily (short-term) and monthly (mid- to long-term) scales. While daily-scale predictions exhibit higher precision, monthly-scale forecasts effectively capture long-term trends. A paired-sample t-test on annual mean temperature predictions across the two time scales revealed a statistically significant difference at the 95% confidence level (t = −3.5299, p = 0.0242), indicating that time granularity has a notable impact on prediction outcomes and should be carefully selected and optimized based on practical application needs. (4) One-way ANOVA and the non-parametric Kruskal–Wallis test were employed to assess the statistical significance of seasonal differences in daily absolute prediction errors. Results showed significant variation across seasons (ANOVA: F = 2.94, p = 0.032; Kruskal–Wallis: H = 8.82, p = 0.031; both p < 0.05), suggesting that seasonal changes considerably affect the model’s predictive performance. Specifically, the model exhibited the highest RMSE and MAE in spring, indicating poorer fit, whereas performance was best in autumn, with the highest R² value, suggesting a stronger fitting capability.
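The evaluation quantities named in the abstract (RMSE, MAE, R², and the paired-sample t statistic) can be illustrated with a minimal standard-library Python sketch. The function names and the toy values below are assumptions made for illustration only; they are not the study's code or data, and the sketch does not reproduce the reported results.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root mean squared error between observations and predictions."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error between observations and predictions."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for two equal-length series."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Hypothetical toy values, not data from the study.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(rmse(observed, predicted), mae(observed, predicted), r2(observed, predicted))

# Hypothetical annual means predicted at two time scales.
daily_scale = [1.0, 2.0, 3.0, 4.0]
monthly_scale = [0.0, 2.0, 1.0, 3.0]
print(paired_t(daily_scale, monthly_scale))
```

The p-value for the t statistic would come from the Student t distribution with the returned degrees of freedom, which a statistics library would normally supply.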