1. Introduction
Long-term exposure to high concentrations of sulfur dioxide (SO
2) poses serious health risks, including respiratory and cardiovascular diseases [
1]. In response, China implemented clean air initiatives between 2010 and 2017, resulting in a significant reduction in SO
2 emissions, particularly in northern China [
2,
3]. Accurate forecasting of SO
2 concentrations is critical for both effective pollution control and the protection of public health and ecosystems.
Traditional statistical analysis and numerical simulation represent two principal methodologies employed in air pollutant forecasting models. Statistical forecasting methods, either the time-series methods (e.g., Autoregressive Integrated Moving Average, ARIMA), which do not use meteorological inputs, or regression and similar methods (e.g., Multiple Linear Regression, MLR), which are mostly based on multivariate linear relationship between meteorological conditions and ambient air pollution concentrations, were commonly used [
4]. However, they have significant drawbacks in handling nonlinear relationships, overfitting issues, model interpretability, and sensitivity to outliers. Numerical simulation methods recreate the atmospheric transport and chemical transformation of pollutants through sophisticated mathematical modeling. A quintessential example is the Community Multi-Scale Air Quality (CMAQ) model, which adeptly simulates a spectrum of air quality concerns, such as ozone, fine particulate matter, acid rain, and visibility reduction [
5,
6]. While offering profound insights into the dynamics of complex atmospheric systems, these simulations demand considerable computational power. Moreover, they are particularly challenged by nonlinear or indeterminate systems, as they necessitate accurate initial conditions and parameter configurations, which can be onerous to ascertain.
In recent years, artificial intelligence technologies have demonstrated significant advantages in air quality prediction. Early studies primarily utilized single neural network (NN) models, such as Artificial Neural Networks (ANN), Recurrent Neural Networks (RNN), and Elman Neural Networks, to predict SO
2 concentrations [
7,
8,
9,
10,
11]. Among these, the Long Short-Term Memory (LSTM) model has emerged as a leading approach, offering superior predictive performance compared to traditional time series prediction models [
12].
More recently, hybrid deep learning models have been introduced to further enhance prediction accuracy beyond standalone LSTM networks [
1,
13,
14,
15]. For instance, Zhang and Li [
16] proposed a CNN-LSTM multi-model for forecasting Beijing’s air quality index, outperforming the Auto-Regressive Moving Average (ARMA) model, LSTM, and other methods. Qi et al. [
17] integrated Graph Convolutional Networks with LSTM (GC-LSTM) to model and forecast the spatiotemporal variation of PM
2.5 concentrations in the northern part of the North China Plain, surpassing both MLR and LSTM models. While these convolution-based hybrid models improve forecasting accuracy by extracting complex spatial features, they are less effective in handling data with strong non-linearity, extreme irregularity, and multi-scale variability.
To address these challenges, integrating information decomposition methods with deep learning models has proven to be an effective strategy for improving predictive accuracy and overall model performance [
16,
18,
19,
20]. Zhu et al. [
18] introduced a hybrid deep learning model based on Empirical Mode Decomposition (EMD) for PM
2.5 prediction. Liu et al. [
19] proposed a method combining Wavelet Transform with deep learning techniques for NO
2 forecasting in Tianjin City of northern China. While this approach achieved the highest overall prediction accuracy compared to SVR, GRU, and single LSTM models, its performance in capturing outliers was inferior to that of the single LSTM model. Wang et al. [
20] developed a spatiotemporal hybrid deep learning model integrating Variational Mode Decomposition (VMD), Graph Attention Networks (GAT), and Bi-Directional Long Short-Term Memory (BiLSTM) networks for PM
2.5 concentration forecasting in Beijing, China. Their proposed model achieved significantly lower MAE than EMD-GAT-BiLSTM and GAT-BiLSTM, demonstrating superior performance in both short-term and long-term PM
2.5 prediction tasks. Unlike Wang et al. [
20], who focused on spatiotemporal dependencies using GAT, Zeng et al. [
21] emphasize accurate time series forecasting by integrating the Whale Optimization Algorithm (WOA), VMD, and BiLSTM to enhance the accuracy of hourly PM
2.5 concentration predictions. Their model demonstrated strong generalization ability across different time scales. Similarly, Jia et al. [
22] concentrated on a Random Forest (RF)-enhanced Sequence-to-Sequence (Seq2Seq) architecture for predicting daily NO
2 concentrations in coastal cities of northern China. Since the Seq2Seq model employs a multilayered LSTM framework to map input sequences to output sequences across multiple time steps—outperforming LSTM models in previous predictive studies [
23,
24]—this approach effectively prioritizes the temporal dynamics of pollutant variations, improving predictive accuracy, particularly in capturing abrupt concentration changes. However, these studies primarily focus on predicting air quality levels under relatively stable trends, limiting their adaptability to rapid policy shifts or extreme pollution events.
Due to advancements in flue gas desulfurization installation rates and the implementation of the ‘Coal-to-Natural Gas’ policy in northern China, SO
2 concentrations have declined much more sharply than other air pollutants [
18,
24,
25]. This rapid decline in emissions within the data presents considerable challenges for accurate prediction [
18]. As economically developed regions, these areas experience air pollution primarily from anthropogenic sources, including maritime vessels and land-based vehicles, which also interact with natural sources such as marine aerosols, leading to complex atmospheric dynamics [
3]. Additionally, the periodic sea–land breeze circulation significantly influences the transport and dispersion of pollutants in coastal areas, further complicating pollution patterns [
26,
27]. This interplay among environmental policies, meteorological factors, and intensive human activities creates unique challenges for air quality forecasting. Furthermore, air pollution in these regions leads to substantial economic losses, poses serious threats to human health, and hinders regional sustainable development [
28,
29]. Therefore, accurately capturing the non-linearity, extreme irregularity, and multi-scale variability of pollutant trends—especially the rapid decline in SO
2 emissions—requires advanced modeling techniques capable of addressing these complexities.
In this study, we propose a novel hybrid deep learning model for daily SO
2 forecasting in coastal cities of northern China. Our work integrates several effective techniques and offers key contributions: (1) Feature Selection and Interpolation with RF: RF is suitable for feature selection due to its ability to handle high-dimensional data while capturing complex, nonlinear relationships among variables [
25]. Unlike correlation-based filters, which assess individual predictor importance independently, RF considers interactions among predictors, making it more effective in air pollution modeling [
22]. Additionally, compared to XGBoost, RF provides more stable feature importance rankings across different training subsets, ensuring robustness in predictor selection [
26]. These advantages make RF a reliable choice for identifying key variables influencing SO
2 concentrations. (2) Signal Decomposition and Optimal Sequence Length Determination via VMD: VMD is employed to decompose the SO
2 concentration series, which can effectively reduce noise and data fluctuations [
27,
28]. Moreover, by analyzing both high- and low-frequency bands, we capture different scales of temporal variations in SO
2 and determine the optimal sequence length for the prediction model. (3) Predictive Modeling with Seq2Seq: The integration of the Seq2Seq model significantly improves forecasting performance. Unlike traditional time series neural networks that may struggle with capturing long-term dependencies, the Seq2Seq model, with its encoder–decoder framework, excels at handling long-range sequential data and overcoming this challenge [
22]. Overall, by leveraging RF for selecting the most relevant features as input variables, VMD for decomposing non-stationary SO
2 data, and Seq2Seq for modeling complex temporal dependencies, the proposed hybrid deep learning model can effectively capture intricate SO
2 variation patterns, particularly those characterized by a sharp declining trend, in advance.
The structure of this paper is organized as follows:
Section 1,
Section 2 and
Section 3 provide an introduction to the research domain and the dataset, detailing the methodologies of our hybrid model used in this study. They also outline the evaluation metrics applied to assess the model’s performance.
Section 4 presents the outcomes obtained from our hybrid model. The effectiveness of our model is demonstrated through a comparative analysis with other prevalent models in the field. In this section, we also explore the variations of SO
2 concentrations at different time scales based on the decomposition results, as well as the impact of input sequence length and predictors on forecasting accuracy.
Section 5 concludes the paper, summarizing the key findings and reflecting on the broader implications of the study.
4. Results and Discussion
4.1. Long-Term Trend of SO2 from VMD
The long-term trend of SO
2 levels from 2014 to 2020 was assessed through decomposing the original data by the VMD method (
Figure 4). A significant decreasing trend (
p < 0.0001) was observed across all five cities, based on the IMF1 component of the decomposition results (
Figure 4). Weihai and Yantai show a milder decline in SO
2 concentrations, with slopes exceeding −0.01, while the other three cities exhibit a more pronounced long-term decrease (slopes below −0.01), particularly Weifang, where the slope reaches −0.023. These variations in long-term trends align with the relative amplitudes of SO
2 concentrations across the cities: SO
2 levels are lowest in Weihai and Yantai (under 60 μg/m
3), followed by Qingdao and Rizhao (peaking around 70 μg/m
3), with the highest levels observed in Weifang (above 120 μg/m
3). A similar downward trend in SO
2 concentrations in Qingdao was observed by Meng et al. [
30]. Additionally, Zhang et al. [
38] reported that the mean planetary boundary layer SO
2 vertical column densities exhibited a decreasing trend after 2007, followed by a rebound around 2011, and subsequently a continuous decline from 2011 to 2020 in Qingdao, Rizhao, and Yantai. Given that over 90% of China’s SO
2 emissions stem from the combustion of coal, oil, and other fuels [
39], this trend suggests that Shandong Province has made significant progress in reducing air pollution through the implementation of emission control policies.
In addition, the significant decline in SO
2 concentrations during the heating seasons occurred more rapidly than during the non-heating seasons across the five coastal cities (
Figure 4, black dash-dotted lines), which constitutes the primary contribution to the overall decrease in SO
2 levels. Shandong Province has implemented the CAAP since 2013, which promotes replacing coal with gas for heating and using gas instead of oil for transportation. In December 2017, the ‘Plan for Clean Heating in Winter in Northern China (2017–2021)’ was introduced to further reduce reliance on traditional coal heating. These measures likely contributed to the overall downward trend in SO
2 levels from 2014 to 2020, with a marked decrease during the 2018 heating season. Meng et al. [
30] also confirmed the impact of the ‘Coal-to-Natural Gas’ policy on the SO
2 decreasing trend in Qingdao from 2013 to 2019 using the closed interval of the deweathered percentage change method.
The decomposition analysis of logarithm-transformed SO
2 concentrations across five coastal cities from 2014 to 2020 is presented in
Figure 5, along with the corresponding frequency spectrum derived from Hilbert transform. The parameters, including the number of modes k and the optimized τ, are determined for VMD through the minimization of REI (
Table 1). However, it remains challenging to determine the optimal number of components into which the original series should be decomposed [
40]. Decomposing the series into too few components may fail to capture important features in the raw data, while using too many components can increase computational costs during model training. Through experimentation, we observed that the optimal number of decomposition modes can be identified by the distinct aliasing phenomenon of the center frequency in the last component. For instance, in Weihai City, this study found that when k = 10, the frequency spectrum of the 10th mode exhibited noticeable aliasing (
Figure 5(d2)). Upon comparison with the prediction results for k = 9, the model performed better with 10 components, which was ultimately selected as the optimal number. These optimal parameters help ensure that the VMD decomposition is both precise and effective, providing accurate representation and enhancing the performance of subsequent analyses or predictions.
By decomposing complex signals into IMF of different frequencies, VMD effectively separates long-term trends from short-term fluctuations (
Figure 5). A low-frequency IMF, such as IMF1, is critical to capturing downtrends because it reflects longer-term changes related to macro factors such as policy adjustments. Ignoring the low-frequency IMF will cause the model to fail to identify the downward trend, making the analysis results incomplete. The high-frequency IMF focuses on dealing with short-term fluctuations such as daily/weekly fluctuations, which are often related to short-term factors such as meteorological events. High-frequency IMF can isolate these short-term changes and provide a clear signal for analysis. Without the high-frequency IMF, the model would not be able to capture short-term fluctuations, affecting the understanding of the overall characteristics of the data. The frequency ranges of the IMF components for each site are detailed in
Table S2.
4.2. Weekly Cycle as the Input Sequence-Length from VMD
The choice of input sequence-length directly affects the performance of deep learning model [
41,
42]. A too short time step may result in the model failing to capture the input features fully, while a too long-time step may lead to overfitting. To determine the most effective input sequence length for short-term SO
2 forecasting, we examined the frequency bands and Variance Accounted For (VAF) across five coastal cities, excluding the low-frequency band in IMF1 (
Figure 5). In contrast, the sub-seasonal periodicities observed in IMF2 (12–20 days) likely correspond to mesoscale atmospheric and oceanographic processes, including local wind systems, ocean currents, upwelling zones, and sub-seasonal oscillations (e.g., the Madden-–Julian Oscillation and variations in the East Asian Monsoon) [
32,
43,
44]. Additionally, anthropogenic influences, such as fluctuations in industrial emissions and regional transport of pollutants, may contribute to variations at this timescale. These medium-scale periodicities provide deeper insights into pollutant transport mechanisms, extending beyond daily meteorological fluctuations and enhancing the understanding of multi-scale air pollution dynamics.
Table 2 shows that SO
2 in IMF3–IMF10 exhibits periodicities between 2 and 11 days, while IMF2 contains frequencies corresponding to periods ranging from 12 to 20 days.
The high-frequency bands (IMF3–IMF10) align with the synoptic mode of variability (SMV), which governs atmospheric fluctuations on similar timescales [
32,
42,
45]. SMV is influenced by a combination of remote atmospheric teleconnections, sub-seasonal patterns such as atmospheric waves, regional circulations, and local weather responses to synoptic state perturbations [
46]. The high-frequency modes are essential for short-term forecasting and understanding the temporal variability [
47]. Furthermore, the high-frequency IMFs can be linked to well-known short-term meteorological and human activity patterns. For example, in coastal regions, the diurnal cycle of sea breezes can induce recurrent variations in pollutant dispersion, while weekly traffic cycles contribute to regular short-term fluctuations in emissions [
48]. These practical phenomena help explain the observed 2–11 days periodicities, underscoring the relevance of our time-series decomposition in capturing dynamic and transient features of air pollution.
In contrast, the sub-seasonal periodicities observed in IMF2 (12–20 days) likely correspond to mesoscale atmospheric and oceanographic processes, including local wind systems, ocean currents, upwelling zones, and sub-seasonal oscillations (e.g., the Madden–Julian Oscillation and variations in the East Asian Monsoon) [
32,
43,
44]. Additionally, anthropogenic influences, such as fluctuations in industrial emissions and regional transport of pollutants, may contribute to variations at this timescale. These medium-scale periodicities provide deeper insights into pollutant transport mechanisms, extending beyond daily meteorological fluctuations and enhancing the understanding of multi-scale air pollution dynamics.
Table 2 also shows that the high-frequency bands, with periods ranging from 2 to 11 days, account for 18.4% to 33.8% of the total variance in daily SO
2 measurements, while IMF2 periodicities (12–19 days) explain 4.6% to 7.4% of the variance. Based on these findings, we designed six training experiments with input sequence lengths of 1, 3, 5, 6, 7, and 15 days. The results revealed that an input sequence length of 7 days demonstrated good performance for SO
2 predictions, with lower errors (
RMSE and
MAE) and higher correlations (Pearson’s r and R
2) in Weifang, Weihai, and Yantai (
Figure 6). In Qingdao and Rizhao, a 15-day input sequence length provided slightly better prediction performance than 7 days, which still outperformed other input lengths (
Figure 6). These findings suggest that the weekly-cycle variations predominantly influence the accuracy of one-day-ahead SO
2 predictions across the coastal regions of Shandong Peninsula, particularly in the northern part (Weifang, Weihai, and Yantai), while the biweekly scale variations play an equally or even more significant role in prediction accuracy in the southern regions, such as Qingdao and Rizhao. To maintain model consistency and ensure generalization, we selected a 7-day input sequence length for this study.
Research has shown that SO
2 and other air pollutants exhibit notable weekly cycles, with higher concentrations observed on weekdays compared to weekends [
49,
50]. Weekly cycles provide a valuable framework for distinguishing anthropogenic influences from natural causes, as only human activities are likely to drive variations in concentrations, temperatures, or other atmospheric variables on a seven-day cycle [
50,
51]. Our study employed data from the past 7 days to predict SO
2 concentrations for the next day, effectively considering the ‘weekend effect’ and highlighting the anthropogenic sources of SO
2 in northern China’s coastal cities.
4.3. Influence of Anthropogenic Emission on SO2 Prediction by RF
Figure 7 illustrates the FIVs obtained through RF feature extraction across five coastal cities. For each city, features with FIVs exceeding 0.05 were selected as input variables, resulting in total FIVs ranging from 0.74 to 0.79. Among the selected predictors, CO and RH are common across all five cities, while air temperature is included in four cities, excluding Rizhao. Furthermore, NO
2 and PM
10 are identified as significant predictors in three cities.
Compared to meteorological variables, air pollutants such as CO, NO
2, and PM
10 have a more pronounced impact on SO
2 prediction, accounting for over 60% of the variation in SO
2 (
Figure 7). Previous studies have shown that the gaseous pollutants CO and NO
2 share common anthropogenic emission sources with SO
2 [
52,
53]. This relationship makes their concentration changes highly relevant for predicting SO
2 levels in our model, contributing over 50% to the model’s accuracy (
Figure 7).
Additionally, the primary sources of particulate matter, such as PM
10 and PM
2.5, include industrial emissions, traffic exhaust, biomass burning, coal combustion, and other human activities [
54,
55,
56]. As SO
2 is a gaseous pollutant that can chemically react with other gases and particulate matter, it contributes to the formation of sulfate and related compounds [
57,
58,
59,
60]. These reactions help form particulate matter, underscoring a strong connection between particulate matter concentrations and SO
2 levels. Therefore, our findings suggest that emissions from anthropogenic activities are shared contributors to both particulate matter and SO
2 concentrations.
In addition, meteorological conditions, RH and Temp, have an important influence on the model’s prediction accuracy (
Figure 7). This is due to the influence of RH and Temp on chemical reaction rates and the processes of diffusion and dilution in the atmosphere [
57,
60]. Specifically, increases in RH and Temp accelerate chemical reaction rates, leading to higher SO
2 production. Furthermore, elevated RH and temperatures promote faster diffusion and dilution of SO
2 in the atmosphere.
4.4. SO2 Predicting Performance
In this section, the effectiveness of the RF-VMD-Seq2Seq model in predicting SO
2 concentration is experimentally analyzed in the test set. First, the most relevant input variables for each coastal city were selected from a total of 11 variables after applying the RF method, with the sum of FIVs greater than 0.74 (
Table 3). This targeted selection process refines the input set, enhancing the overall accuracy and robustness of the RF-VMD-Seq2Seq model by focusing on the most significant variables affecting predicted concentrations.
After evaluating the model performance on the training set, we applied the RF-VMD-Seq2Seq model to a test set.
Figure 8 presents a time series comparison of predicted SO
2 concentrations versus observed concentrations for five coastal cities over the test set period (15 January–20 September 2020). The predicted SO
2 concentrations from the test set closely match the observed values in each city, with most differences remaining within 1 µg/m
3. Additionally, there was a slight overestimation in predicting the lower concentrations of SO
2 in Qingdao and Weihai, yet the RF-VMD-Seq2Seq model still performed well overall, as shown in
Figure 8. Despite these minor discrepancies, the model demonstrated strong predictive accuracy and effectively captured key trends in SO
2 concentrations across the coastal cities.
However, when analyzing the differences in model performance, we found that the geographical, climatic, and industrial characteristics of different cities have a significant impact on the predicted results. For example, Weifang, as an important industrial town, has concentrated heavy industry activities and large pollutant emissions. Moreover, its continental monsoon climate (hot and rainy in summer and cold and dry in winter) tends to exacerbate pollutant accumulation, thus affecting the model prediction performance. In contrast, Weihai has little industrial activity and pays attention to ecological protection, and its maritime climate (mild, small temperature difference between day and night) facilitates the diffusion of pollutants. In addition, Weihai is a tourist city, and seasonal population movements may cause short-term fluctuations in pollutant emissions and model predictions.
The RF-VMD-Seq2Seq model demonstrates exceptional performance in capturing both fluctuations and outliers in SO
2 concentrations, including extreme events such as the sharp declines observed during the Spring Festival and the COVID-19 pandemic in 2020 (
Figure 8(a1–a5), gray areas and subplots). Detailed metrics are shown in
Table S4. Notably, SO
2 concentrations peaked each year during the Spring Festival from 2014 to 2019, then quickly returned to high levels after the holiday. However, from late January to March 2020 (see red rectangle in
Figure 4), this transition became more prolonged and pronounced due to the combined effects of the Spring Festival and the progressive lockdown measures implemented during the COVID-19 pandemic in China [
61]. The pandemic, particularly in early 2020, led to a significant reduction in industrial and transportation emissions, introducing irregularities in the data. This resulted in a sharp drop in SO
2 concentrations, temporarily deviating from historical seasonal trends. While our model effectively captured the overall downward trend, the abrupt emission reductions and the subsequent gradual recovery posed challenges for short-term forecasting accuracy.
Moreover, the model effectively handles the upper 5% of SO
2 concentrations (mostly exceeding 10 μg/m
3), with predicted values closely matching observed measurements, demonstrating its robustness in capturing extreme pollution events (
Figure 8(b1–b5)). This ability to predict extreme values is critical, as prior studies have established that elevated SO
2 levels—such as a 10 μg/m
3 increase in daily concentrations—are associated with adverse health outcomes, including hypertension, all-cause mortality, and respiratory mortality [
62,
63,
64,
65].
The majority of the top 5% of SO
2 concentrations occur between December and March, coinciding with the heating season. Increased heating demand leads to higher coal consumption, making it a major source of SO
2 emissions, particularly in northern cities where winter concentrations are significantly higher than in non-heating periods [
48,
59]. Additionally, extreme pollution events contribute to high SO
2 levels, often driven by adverse meteorological conditions such as dense fog, low wind speeds, temperature inversions, and air flow convergence. These factors collectively hinder pollutant dispersion, leading to localized SO
2 accumulation [
60].
Accurate prediction of SO2 levels is vital for mitigating public health risks, particularly during unusual events that disrupt normal pollution patterns. The RF-VMD-Seq2Seq model’s integration of variational mode decomposition ensures the detection and analysis of non-stationary trends, while its sequence-to-sequence framework enables the capture of complex temporal dynamics. By effectively addressing both routine fluctuations and extreme variations, this model confirms its capability to provide reliable forecasts, supporting air quality management and health protection efforts.
The density scatter plots of the predicted vs. observed SO
2 concentrations in five coastal cities are illustrated in
Figure 9. Most of the data points are close to the bisector (black dashed lines in
Figure 9), indicating a high level of agreement between predicted and observed values, especially in Weifang City. Under lower SO
2 pollution conditions (less than 10 µg/m
3), there is a slight overestimation in Qingdao and Weihai, while for higher pollution levels, there is a slight underestimation. In Yantai, the model shows a slight overall overestimation.
The variation in model performance across cities likely stems from differences in local meteorological conditions and emission characteristics. For example, Weifang, an industrial hub with significant petrochemical and manufacturing activities, has higher SO
2 emissions due to industrial combustion (
Figure 8). In contrast, the other four cities, which are coastal and driven by tourism with relatively lower industrial emissions, exhibit different pollution dynamics. Additionally, Weifang’s inland location (
Figure 1) results in weaker coastal wind influence, whereas strong marine airflows in coastal cities enhance pollutant dispersion [
48]. This increased variability contributes to more complex pollution fluctuations, slightly reducing model accuracy in these regions.
Statistically, lower errors and higher correlations further support the strong predictive capability of the RF-VMD-Seq2Seq model across five coastal cities (
Figure 9). The
RMSE values, ranging between 0.79 μg/m
3 and 1.35 μg/m
3, along with the
MAE values from 0.60 μg/m
3 to 0.85 μg/m
3 and
MAPE values between 10% and 15%, reflect minimal fluctuations and differences between predicted and observed SO
2 concentrations.
Moreover, Pearson’s r coefficient greater than 0.96 indicates a strong linear relationship between predictions and actual values. Overall, the RF-VMD-Seq2Seq model demonstrates excellent predictive performance with high accuracy on SO2. This is further supported by an R2 value above 0.91, confirming that the input variables used in the model effectively explain most of the data variation. Overall, these results demonstrate the RF-VMD-Seq2Seq model accurately captures SO2 concentrations even under varying pollution conditions, showing its robustness in handling both lower and higher pollution levels.
4.5. The Role of VMD in Predicting Downward Trend of SO2
To more comprehensively assess the role of VMD in enhancing the performance of the RF-VMD-Seq2Seq model for predicting SO2 concentrations in coastal cities of northern China, we conducted a detailed comparison with the RF-Seq2Seq model using EEMD decomposition (RF-EEMD-Seq2Seq) and the RF-Seq2Seq model without decomposition. To ensure a fair, apple-to-apple comparison, all models were trained and tested on the same samples, allowing for a consistent evaluation of their predictive accuracy and robustness.
Figure 10 shows the comparison of RF-VMD-Seq2Seq with RF-EEMD-Seq2Seq and RF-Seq2Seq. The RF-Seq2Seq model performed the worst across all five cities, with errors (
RMSE,
MAE, and
MAPE) increasing by 113% to 496% and correlations decreasing by 33% to 99%, compared to the other two models (detailed metrics are shown in
Table S3). When comparing the RF-EEMD-Seq2Seq model with RF-VMD-Seq2Seq model, we observed a decline in performance in terms of Pearson’s r and R
2, with a reduction ranging from 6% to 10% and from 13% to 40%, respectively. The high accuracy of this method is attributed to the use of the VMD algorithm, which successfully decomposes the original SO
2 concentration sequence into multiple independent components. This decomposition strategy allows each component to be predicted independently, greatly simplifying the complexity of the prediction task.
In detail, the time series of predicted results from RF-EEMD-Seq2Seq and RF-Seq2Seq are shown in
Figure 11. Without data decomposition in the prediction process, the RF-Seq2Seq model failed to capture the variation in SO
2 levels, particularly its trend (
Figure 11). This highlights that the use of signal decomposition methods, such as VMD and EEMD, significantly improves prediction accuracy for downward SO
2 concentrations.
Regarding the comparison of RF-Seq2Seq models integrated with EEMD and VMD, although RF-EEMD-Seq2Seq effectively captured the overall trend of SO
2 concentrations, it struggled with extreme values. Specifically, it exhibited overestimations during periods of heavy pollution in February 2019 across five coastal cities (
Figure 11). Conversely, during the 2020 Spring Festival, the model underestimated extreme high values in Qingdao and Weifang (
Figure 11). Additionally, it consistently underestimated low SO
2 concentrations in Qingdao and Rizhao (
Figure 11). Decomposing non-stationary data into multiple components, each with unique frequencies, is essential for enhancing the accuracy of predictive models by providing deeper insights into underlying data patterns and trends. EMD-based models struggle with mode aliasing and end effects, lack a strong theoretical foundation, and have a more complex sifting process [
66]. These factors can restrict the overall effectiveness of the hybrid prediction approach. VMD effectively mitigates mode aliasing by optimizing the variational problem, ensuring a more accurate decomposition of complex signals. Additionally, it provides flexible control over the decomposition process by allowing adjustments to the number of modes and bandwidth parameters, enabling a more precise extraction of meaningful patterns from the data [
36,
67,
68]. Our study further indicates that the VMD decomposition method is more effective at explaining the variability in the data and may fully capture the complex and downward trends within the SO
2 concentration data.
Additionally, in comparing the performance of Seq2Seq with traditional LSTM models, our findings indicated that RF-Seq2Seq models more accurately capture general trends and disruptions, regardless of whether RF-based feature selection is applied [
22]. This advantage stems from the Seq2Seq architecture’s ability to effectively model complex temporal dependencies, making it particularly suited for handling the dynamic nature of air quality data. Building upon this, we further enhanced SO
2 prediction accuracy in this study by integrating VMD into the RF-Seq2Seq framework. By decomposing complex signals into IMFs, VMD allows the model to extract meaningful patterns across multiple time scales, leading to more precise predictions. Our findings demonstrate that incorporating a signal decomposition step via VMD significantly refines the input data for the Seq2Seq model, ultimately improving predictive accuracy and robustness compared to traditional forecasting methods.
In addition, random seeds were set in all experiments to ensure consistent initialization conditions, thereby minimizing performance fluctuations caused by randomness. This approach guarantees repeatability and ensures that performance differences across multiple runs are due to methodological differences rather than stochastic variations. However, despite fixing random seeds, we observed variations in RF-EEMD-Seq2Seq outputs due to the inherent noise sensitivity and iterative nature of EEMD. In contrast, RF-VMD-Seq2Seq demonstrated consistently better performance across multiple trials, confirming the stability of its improvement. These findings indicate that VMD provides a more reliable decomposition framework, reinforcing the superiority of RF-VMD-Seq2Seq over RF-EEMD-Seq2Seq.
4.6. Practical Implication
4.6.1. Operational Deployment
The proposed RF-VMD-Seq2Seq model demonstrates high operational efficiency, making it suitable for real-time or near-real-time environmental monitoring applications. The model has a fast prediction time, requiring no more than 30 min per forecast, and it operates with low computational resource consumption. It runs efficiently on systems with less than 8 GB of memory, eliminating the need for specialized hardware. This allows seamless integration into existing air quality forecasting platforms without additional infrastructure costs, enhancing both applicability and flexibility.
The model requires continuous air quality monitoring data, including SO2 concentrations, meteorological variables (e.g., temperature, humidity, wind speed), and historical pollutant trends. Integration with real-time air quality monitoring networks can enable automated data ingestion and preprocessing, ensuring timely forecasting updates.
By streamlining the operational workflow and ensuring computational feasibility, this forecasting framework can serve as a practical decision-support tool for air quality management and early warning systems.
4.6.2. Limitation and Improvement Project
In this study, we employed a 75:25 ratio to partition the dataset into training and test sets. This decision was guided by the sequential nature of time series data, ensuring that data were split strictly in chronological order to prevent information leakage. Through multiple experiments, we found that this ratio provided the optimal balance between sufficient training data for model learning and a representative test set for performance evaluation. While we acknowledge potential limitations—such as the influence of time-specific events (e.g., policy changes or the COVID-19 period) and the periodicity of air pollution trends—this split remains suitable given the objectives of our study. More advanced validation techniques, such as time-series cross-validation or rolling-window methods, could further enhance robustness. However, considering computational efficiency and the stability of our results, the selected partitioning method is sufficient to support the validity of our conclusions.
Our findings indicate that anthropogenic gases such as CO and NO2 are strong predictors of SO2 concentrations, highlighting a strong synergy between their emission sources. This suggests that policies aimed at reducing CO and NO2 emissions—such as stricter controls on fossil fuel combustion in power plants, industrial processes, and vehicle emissions—could simultaneously contribute to SO2 reduction. By identifying these co-pollutants as key indicators, local policymakers can develop more integrated air quality management strategies. For instance, enhancing monitoring efforts for CO and NO2 can serve as an early warning system for SO2 pollution episodes. Moreover, multi-pollutant control policies, such as promoting cleaner industrial production and transitioning to low-emission energy sources, could yield more efficient and cost-effective improvements in air quality.
Our approach is particularly effective in regions experiencing policy-driven emission reductions, such as China’s ‘Coal-to-Gas’ transition. However, its generalizability to other countries depends on industrial structures, energy policies, and climatic conditions. In regions where coal remains a dominant energy source without similar policy interventions, the model may need adjustments to account for different emission dynamics. Nevertheless, the methodology—decomposing pollutant time series and integrating meteorological factors—can be adapted to other settings with appropriate modifications. For instance, in countries with seasonal biomass burning or stricter industrial regulations, alternative predictor variables (e.g., wildfire emissions, vehicle restrictions) may be required. Additionally, the model’s temporal resolution and data sources may need to be adjusted based on local monitoring capabilities.
As SO2 emissions continue to decline due to strengthened environmental policies and industrial transformations, the time-series patterns of air pollution may undergo significant changes. This could impact the stability and generalizability of our model. To ensure continued accuracy, periodic retraining of the RF-VMD-Seq2Seq model is necessary, especially when major policy shifts or emission reductions alter historical trends. An annual update strategy, incorporating the most recent data, could help maintain model adaptability. Additionally, implementing an adaptive learning framework that dynamically adjusts to new emission patterns and external factors would enhance long-term forecasting reliability.
The RF-VMD-Seq2Seq architecture can be extended for multi-day-ahead forecasting, but forecast accuracy tends to decline as the lead time increases (
Table S5). For example, under the same framework and parameter settings, we found that when predicting 2-day and 3-day horizons, the errors approximately doubled, and by the 5-day-ahead forecast, the errors tripled compared to the 1-day-ahead prediction in Qingdao. The determination coefficients decreased from over 0.90 to below 0.60, while correlation coefficients dropped from 0.98 to 0.83. This suggests that while the model retains some predictive capability over longer horizons, the accumulation of uncertainty presents challenges. Future improvements could involve optimizing decomposition parameters, integrating additional exogenous variables, or leveraging ensemble learning techniques to enhance multi-day forecasting accuracy.
Currently, our model relies primarily on historical meteorological data for forecasting SO2 concentrations. However, integrating real-time meteorological forecasts from numerical weather prediction models could further enhance predictive accuracy, particularly for extended forecast horizons. Incorporating forecasted temperature, wind speed, humidity, and atmospheric pressure would enable the model to better anticipate pollution dispersion dynamics and potential anomalies. Future research could explore hybrid approaches that fuse historical data with real-time weather forecasts, improving the model’s applicability for proactive air quality management and decision making by city environmental bureaus.
An hourly modeling framework could be beneficial for regions experiencing rapid pollution fluctuations, such as industrial zones or areas affected by traffic congestion. Hourly forecasting would enable early warnings of short-term pollution spikes, enhancing real-time air quality management. However, transitioning to an hourly model presents several challenges. Data availability is a key concern, as high-resolution air quality and meteorological data must be consistently available with minimal gaps to ensure reliable predictions. Computational load is another factor, as hourly forecasting significantly increases data volume and model complexity, requiring enhanced computational resources. Lastly, shorter temporal dependencies must be accounted for, as hourly variations are influenced more by immediate meteorological shifts and transient emissions, necessitating more frequent updates and potentially different decomposition parameters (e.g., finer-scale IMFs). Future research could explore hybrid approaches, integrating real-time sensor data and high-resolution meteorological forecasts to refine sub-daily predictions while balancing computational efficiency.
In future research, we prioritize extending this modeling approach to PM2.5 due to its severe health impacts and complex formation mechanisms involving both primary emissions and secondary chemical reactions. Compared to SO2, PM2.5 exhibits more intricate temporal variability, influenced by meteorological conditions, precursor gases (e.g., SO2, NO2, and VOCs), and regional transport. Applying the decomposition method to PM2.5 may require more IMFs to capture both short-term fluctuations and long-term seasonal trends. Additionally, different periodic cycles—such as weekly human activity patterns and synoptic-scale meteorological shifts—would need to be considered.
5. Conclusions
In this study, the RF-VMD-Seq2Seq model was established based on random forest feature selection, VMD, and Seq2Seq methods to predict SO2 concentration in five cities in the Shandong Peninsula. The results are as follows:
The RF-VMD-Seq2Seq model demonstrates strong performance in predicting SO2 concentrations across the five coastal cities in Shandong Province, with R2 values exceeding 0.91 at nearly all sites. The Pearson’s r was generally above 0.96, while RMSE remained below 1.35 μg/m3, MAE stayed below 0.94 μg/m3, and MAPE consistently remained below 15%. The VMD decomposition method outperformed the EEMD method in capturing the variability in the data, effectively addressing the complex and non-linear trends in SO2 concentration. Without decomposition, prediction accuracy significantly decreased, with errors doubling and correlations dropping by 33–99%. VMD decomposition significantly improves the predictive performance of standard Seq2Seq. The average indexes in Qingdao, Rizhao, Weifang, Weihai, and Yantai were improved as follows: RMSE decreased by 2.78 μg/m3, MAE decreased by 1.78 μg/m3, MAPE decreased by 23.65, R2 increased by 0.73, and Pearson’s r increased by 0.40.
A significant decreasing trend in SO2 concentrations was observed across all five cities based on VMD decomposition. Notably, the decline in SO2 concentrations during the heating seasons occurred more rapidly than during the non-heating seasons across the five coastal cities. The high-frequency bands from VMD decomposition serve as effective references for determining the input sequence length of the RF-VMD-Seq2Seq model. A 7-day input sequence length enables accurate prediction for the following day, effectively capturing the ‘weekend effect’ and emphasizing the influence of anthropogenic sources of SO2 in the coastal cities of northern China.
The dataset was processed using an RF algorithm to calculate importance scores for each input feature, allowing the selection of key features that reflect how the Seq2Seq model interprets the predictors. Overall, gaseous pollutants had a greater influence on SO2 prediction compared to meteorological factors. CO, NO2, and PM10 had a more pronounced impact on SO2 prediction, accounting for over 60% of the variation in SO2. For the meteorological factors, RH and temperature played critical roles due to their influence on atmospheric chemical reactions, wet deposition, and dilution processes.
Although the RF-VMD-Seq2Seq model proposed in this study delivers strong results for one-day-ahead SO2 prediction, its application is currently limited to short-term forecasting. A potential direction for future improvement is the development of a more integrated model that can predict air pollutants over the medium and long term, achieving even better performance.