Article

Improved Prediction of Hourly PM2.5 Concentrations with a Long Short-Term Memory Optimized by Stacking Ensemble Learning and Ant Colony Optimization

1 School of Information Engineering, Nanchang Institute of Technology, Nanchang 330099, China
2 Jiangxi Province Key Laboratory of Smart Water Conservancy, Nanchang 330099, China
* Author to whom correspondence should be addressed.
Toxics 2025, 13(5), 327; https://doi.org/10.3390/toxics13050327
Submission received: 1 March 2025 / Revised: 16 April 2025 / Accepted: 21 April 2025 / Published: 23 April 2025

Abstract

To address the performance degradation of existing PM2.5 prediction models caused by excessive complexity, poor spatiotemporal efficiency, and suboptimal parameter optimization, we employ stacking ensemble learning for feature weighting analysis and integrate the ant colony optimization (ACO) algorithm for model parameter optimization. Combining meteorological and collaborative pollutant data, we establish a model for PM2.5 concentration prediction (the stacking-ACO-LSTM model) whose running time is much shorter than that of a standalone long short-term memory (LSTM) network. The stacking stage effectively selects the feature variables with higher weights, thereby reducing the computational load and improving the predictive power of the model. The model is trained and tested on real-time hourly monitoring data from Nanchang City from 2017 to 2019. The results show that the established stacking-ACO-LSTM model predicts PM2.5 concentration with high accuracy: compared with the standalone LSTM model without the spatiotemporal-efficiency and parameter-optimization improvements, the mean square error (MSE) decreases by about 99.88% and the coefficient of determination (R2) increases by about 2.39%. This study provides a new approach for predicting PM2.5 concentration in cities.

1. Introduction

In recent years, air quality has deteriorated and haze events have become more frequent, posing a growing threat to human health. The main contributor to air pollution is inhalable particulate matter, led by PM2.5. Epidemiological studies demonstrate that prolonged exposure to elevated PM2.5 levels significantly increases the risks of pulmonary diseases, cardiovascular disorders, and malignancies, while also influencing radiative forcing and climatic patterns [1,2]. Precise PM2.5 forecasting enables proactive public health protection and supports evidence-based environmental policy formulation.
The algorithms for PM2.5 concentration prediction are mainly divided into two categories: deterministic and artificial intelligence models. Deterministic models, including WRF [3,4] and CMAQ [5,6], simulate atmospheric processes to predict pollutant concentrations through physical and chemical mechanisms [7,8]. However, deterministic models require substantial computing resources and a background in meteorology, and they suffer from the following shortcomings. The stability of the input PM2.5 samples is insufficient, meaning that the quality and reliability of the data are difficult to guarantee and that considerable noise and error may be introduced. Excessive computational demands significantly compromise prediction efficiency and often preclude real-time application. Furthermore, these models exhibit unstable performance with substantial scenario-dependent variability, and their limited accuracy makes it difficult for the prediction results to effectively guide decision-making and planning. These issues seriously constrain the application and development of traditional prediction methods in the field of PM2.5 concentration prediction. By comparison, artificial intelligence models perform well in air quality prediction, such as support vector machines (SVM) [9], BP neural networks [10], and extreme learning machines (ELM) [11,12]. However, these feedforward models are not adapted to the specific characteristics of a dataset and are less suited to time-series data updated in real time. LSTM networks have therefore gradually attracted the attention of scholars.
LSTM, as an important architecture in deep learning, offers unique advantages for time-series data because it can effectively capture long-term dependencies, which is crucial for PM2.5 concentration data that vary over time. By training an LSTM model on observed air quality data, the model can learn the inherent patterns and trends in the data. For example, Logothetis et al. [13] improved air pollution forecasting performance by combining PM2.5 data with ancillary data, meteorological variables from the Copernicus Atmosphere Monitoring Service operated by ECMWF, and temporal variables related to local emissions. Bedi et al. [14] utilized recurrent neural networks with long short-term memory (LSTM) units to predict Delhi's PM2.5 concentration and showed that models combining PM2.5 data with gaseous pollutants, with or without meteorology, predicted daily and hourly PM2.5 concentrations more accurately than other models. However, LSTM models are highly dependent on the inputs predicted by the numerical model, which reflects a common problem of single-model prediction, namely low prediction accuracy and long training time. To address these issues, ensemble models have gained wide acceptance, as they simultaneously take into account both forecast accuracy and stability [15]. For example, applying only the LSTM model to CMAQ forecasts can yield forecast skill comparable to the operational AirKorea forecasts, which elaborately combine the CMAQ model, AI models, and human forecasters [16]. Lin et al. [17] applied an innovative application strategy to the BLSTM to customize an application-strategy-based LSTM (ASLSTM) for short-term, accurate prediction of PM2.5 concentrations. Combining spatial weighting, empirical mode decomposition (EMD), and an LSTM network, Yu et al. [18] proposed an ensemble model to predict PM2.5 concentration; comparisons with RNN, HPO-RNN, GRU, and standalone LSTM show that the new model surpasses these benchmark models in prediction accuracy.
In addition, LSTM can be combined with other ensemble learning techniques for prediction, such as convolutional neural networks (CNNs) [19,20,21,22,23], graph convolution neural networks (GCNNs) [15,24], ensemble empirical mode decomposition (EEMD) [25,26], attention mechanism [26,27,28], osprey optimization algorithm (OOA) [29,30], hybrid integration (HIG) algorithm [31,32], temporal convolutional network (TCN) [27,33,34,35], transformer [15,36,37], graph sample and aggregation network (GraphSAGE) [15,38,39], bidirectional recurrent gated neural network (BiGRU) [22,40,41], adaptive boosting (AdaBoost) [42,43], genetic algorithm (GA) [44,45,46], k-nearest neighbors (kNN) [46,47,48], random forest (RF) [20,46], support vector regression (SVR) [46,49], and particle swarm optimization (PSO) [46,50,51].
In this study, we initially designed a basic LSTM-based prediction model for preliminary forecasting, selecting all monitoring stations in Nanchang as the baseline. As a representative urban area, Nanchang's air quality data can reflect the impact of various environmental factors and human activities on PM2.5 concentrations. This model was used to predict PM2.5 levels in Nanchang over the coming days, providing a foundation for subsequent optimization. The original LSTM model consumed significant computational resources during prediction and relied solely on a rolling forecast mechanism based on time steps, without extensive hyperparameter optimization. Given these limitations, we prioritized two key aspects when selecting optimization algorithms: reducing computational demands and improving efficiency. While Bayesian optimization was initially considered for overall model tuning, its inefficiency in large-scale parameter searches and high computational requirements made it less suitable. After thorough evaluation, we adopted a stacking ensemble learning approach to analyze feature variable weights, thereby reducing the computational load and enhancing prediction accuracy. Additionally, we introduced the ant colony optimization (ACO) algorithm to optimize the model's hyperparameters, preventing it from settling into local optima. Building on these improvements, we developed an integrated stacking-ACO-LSTM model, which significantly reduces runtime compared with the standalone LSTM network, making it well suited for PM2.5 concentration prediction. The results demonstrate that this model not only performs well in spatiotemporal efficiency and parameter optimization for Nanchang's PM2.5 prediction but also exhibits superior forecasting performance and stronger generalization capability, giving it substantial practical value as a real-world reference.

2. Materials and Methods

2.1. Data Source

The study is based on hourly data for PM2.5, PM10, SO2, NO2, CO, O3, and the AQI in Nanchang City from 1 February 2017 to 30 September 2019, published on the China Air Quality Online Monitoring and Analysis Platform (https://www.aqistudy.cn). Nanchang City was chosen as the research object mainly for the following reasons. Firstly, data availability is high, and the channels for obtaining Nanchang's data are very convenient. Secondly, compared with other prefecture-level cities, Nanchang has more complete air quality monitoring stations, which can comprehensively and accurately collect data on the various pollutants. Thirdly, as the capital city of Jiangxi Province, Nanchang has a large population and active economic activity, all of which significantly affect air quality.
Regarding the relationships among the data: in nature, more than a dozen factors affect PM2.5, such as wind speed, temperature, precipitation, industrial production, emissions from natural sources, and coal combustion. However, within a given area, some influencing factors tend to be nearly constant over a fixed period of time; in southern cities, for example, the average monthly precipitation and wind speed do not vary much. Therefore, to some extent, factors like wind speed, temperature, and precipitation have a relatively small impact on PM2.5 levels within a fixed area. As an industrial city, Nanchang emits waste gases during industrial production, which continuously increases the concentrations of PM10, CO, SO2, O3, and NO2 in the air. In view of this, to conduct the analysis and prediction more efficiently, we selected only these more influential factors for in-depth study. In this way, we can focus on exploring the specific relationships and mechanisms between them and PM2.5, providing more targeted evidence for better understanding and control of the PM2.5 concentration.
Taking the PM2.5 concentration data as an example, there should originally be 23,304 hourly records in total, of which 1456 (approximately 6.25%) are missing; after these are handled, 21,848 records remain. The resulting time series of length 21,848 is divided into training and test sets at a ratio of 8:2. Commonly used methods for filling missing data include interpolation (e.g., linear or spline interpolation) and mean imputation; here, missing values are filled with the average of nearby time points, which preserves the integrity of the data and keeps the imputed values consistent with the overall characteristics of the series. Since the amount of missing data is not large, filling with averages largely preserves the accuracy of the data and has essentially no impact on the final experimental results. Given the large scale of the data in the experiment, all variables were also standardized.
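For clarity, a minimal preprocessing sketch is given below; the rolling-window length and the use of pandas are illustrative assumptions rather than the exact implementation used in this study.

```python
import pandas as pd

def preprocess(df: pd.DataFrame):
    """Mean imputation, z-score standardization, and an 8:2 chronological split."""
    filled = df.copy()
    for col in filled.columns:
        # Fill each gap with the mean of nearby points (a centered 5-hour
        # rolling mean, an assumed window), falling back to the column mean
        # for any values still missing at the series edges.
        local_mean = filled[col].rolling(window=5, center=True, min_periods=1).mean()
        filled[col] = filled[col].fillna(local_mean).fillna(filled[col].mean())

    # Standardize every variable (zero mean, unit variance).
    standardized = (filled - filled.mean()) / filled.std()

    # Chronological 8:2 train/test split, as in the paper.
    split = int(len(standardized) * 0.8)
    return standardized.iloc[:split], standardized.iloc[split:]
```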

2.2. Long Short-Term Memory Network

The long short-term memory (LSTM) network is an improved framework based on the recurrent neural network (RNN) that mainly addresses the RNN's difficulty in handling long-distance dependencies. The basic idea is to augment the original RNN hidden layer, which has only one state sensitive to short-term inputs, with a cell state that preserves long-term memory. LSTM achieves long-term memory through the input gate, forget gate, and output gate. Each gate structure selectively allows information to pass, mainly through a sigmoid neuron and a pointwise multiplication operation. The first step in LSTM is to determine what information to discard from the cell state, which is accomplished through the forget gate. The next step is to determine how much new information to incorporate into the cell state, which is accomplished through the input gate. Finally, the value to output is determined by the output gate. The basic framework of LSTM is shown in Figure 1.
In Figure 1, the shape of the memory cell (namely its vector dimension) is the same as that of the hidden state; it is designed to record additional information from the hidden state and the input. Some literature treats the memory cell as a special kind of hidden state. Specifically, the input gate controls how much information from the current input and the previous hidden state is added to the memory cell; the forget gate controls what is forgotten from the memory cell of the previous moment; and the output gate controls which information in the memory cell is output to the hidden state.
To make the forward-propagation process of the LSTM model easier to understand, we adapt the model structure diagram as shown in Figure 2, where $a_t$ refers to the candidate memory cell $\tilde{C}_t$ at time t.
From this, we can obtain the forward-propagation formula of the LSTM model:
$$\text{Candidate memory cell:}\quad \tilde{C}_t = \tanh\left(X_t W_{xc} + H_{t-1} W_{hc} + b_c\right), \qquad X_t \in \mathbb{R}^{m \times d},\ H_{t-1} \in \mathbb{R}^{m \times h},\ W_{xc} \in \mathbb{R}^{d \times h}$$

$$\text{Input gate:}\quad I_t = \sigma\left(X_t W_{xi} + H_{t-1} W_{hi} + b_i\right), \qquad W_{xi} \in \mathbb{R}^{d \times h},\ W_{hi} \in \mathbb{R}^{h \times h}$$

$$\text{Forget gate:}\quad F_t = \sigma\left(X_t W_{xf} + H_{t-1} W_{hf} + b_f\right), \qquad W_{xf} \in \mathbb{R}^{d \times h},\ W_{hf} \in \mathbb{R}^{h \times h}$$

$$\text{Output gate:}\quad O_t = \sigma\left(X_t W_{xo} + H_{t-1} W_{ho} + b_o\right), \qquad W_{xo} \in \mathbb{R}^{d \times h},\ W_{ho} \in \mathbb{R}^{h \times h}$$

$$\text{Memory cell:}\quad C_t = I_t \odot \tilde{C}_t + F_t \odot C_{t-1}$$

$$\text{Hidden state:}\quad H_t = O_t \odot \tanh(C_t)$$

$$\text{Model output:}\quad \hat{Y}_t = H_t W_{hy} + b_y, \qquad W_{hy} \in \mathbb{R}^{h \times q},\ \hat{Y}_t \in \mathbb{R}^{m \times q}$$

$$\text{Loss function:}\quad L = \frac{1}{T} \sum_{t=1}^{T} \ell\left(\hat{Y}_t, Y_t\right), \qquad L \in \mathbb{R}$$
where m and d are the batch size of mini-batch stochastic gradient descent and the dimension of the input vector, respectively, while h and q are the dimensions of the hidden state and the model output, and $\odot$ denotes pointwise multiplication.
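The forward propagation above can be written compactly in code. The following NumPy sketch implements one LSTM step exactly as in the gate equations; parameter names mirror the weight matrices defined above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X_t, H_prev, C_prev, p):
    """One forward step. X_t: (m, d); H_prev, C_prev: (m, h); p: weight dict."""
    I_t = sigmoid(X_t @ p["W_xi"] + H_prev @ p["W_hi"] + p["b_i"])      # input gate
    F_t = sigmoid(X_t @ p["W_xf"] + H_prev @ p["W_hf"] + p["b_f"])      # forget gate
    O_t = sigmoid(X_t @ p["W_xo"] + H_prev @ p["W_ho"] + p["b_o"])      # output gate
    C_tilde = np.tanh(X_t @ p["W_xc"] + H_prev @ p["W_hc"] + p["b_c"])  # candidate cell
    C_t = I_t * C_tilde + F_t * C_prev   # memory cell update
    H_t = O_t * np.tanh(C_t)             # hidden state
    return H_t, C_t

# Model output at each step: Y_hat_t = H_t @ W_hy + b_y, with shape (m, q).
```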

2.3. Ant Colony Optimization

LSTM is prone to getting stuck in local optima and has a slow convergence speed [52,53]. Therefore, to optimize the hyperparameters of the PM2.5 concentration prediction model, the ant colony optimization (ACO) algorithm, one of the most popular algorithms for hyperparameter optimization, is introduced. ACO is essentially a self-organizing, positive-feedback parallel algorithm with strong robustness.
ACO is a bionic algorithm derived from simulating the path-finding behavior of natural ants [54,55]. As ants move, they deposit pheromones on the paths they traverse, which serve to transmit information. Ants can also sense these pheromones during movement and use them to guide their direction. The collective behavior of a colony composed of many ants therefore exhibits positive information feedback: the more ants that have traveled a certain path, the greater the probability that later ants will choose it. After a series of searches, the colony can finally converge on the shortest path.
At the initial moment of the algorithm, m ants are randomly placed in n cities, and the first element of each ant's taboo list ($tabu_k$) is set to the city where it is currently located. At this time, the amount of pheromone on each path is equal: let $\tau_{ij}(0) = c$ (where c is a relatively small constant). At time t, the probability that ant k transfers from city i to city j is:
$$p_{ij}^{k}(t) = \begin{cases} \dfrac{\left[\tau_{ij}(t)\right]^{\alpha} \cdot \left[\eta_{ij}(t)\right]^{\beta}}{\sum_{s \in J_k(i)} \left[\tau_{is}(t)\right]^{\alpha} \cdot \left[\eta_{is}(t)\right]^{\beta}}, & j \in J_k(i) \\[2ex] 0, & \text{otherwise} \end{cases}$$
where $J_k(i) = \{1, 2, \dots, n\} \setminus tabu_k$ represents the set of cities that ant k is allowed to choose in the next step, and the taboo list $tabu_k$ records the cities that ant k has passed through so far. When all n cities have been added to $tabu_k$, ant k has completed a tour, and the path it has traveled is a feasible solution to the traveling salesman problem (TSP). Here, $\eta_{ij}$ is a heuristic factor representing the expected desirability of an ant's transfer from city i to city j; in ACO it is usually taken as the inverse of the distance between cities i and j. The parameters $\alpha$ and $\beta$ represent the relative importance of the pheromone and the expected heuristic factor, respectively. When all the ants have completed one tour, the pheromone on each path is updated according to the following formula:
$$\tau_{ij}(t + n) = (1 - \rho) \cdot \tau_{ij}(t) + \Delta\tau_{ij}$$
where $\rho$ (0 < $\rho$ < 1) and $1 - \rho$ represent the evaporation coefficient and the persistence coefficient of the pheromone on the path, respectively, and $\Delta\tau_{ij}$ represents the increment of pheromone on edge (i, j) in this iteration, that is:
$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$
Here, $\Delta\tau_{ij}^{k}$ represents the amount of pheromone left on edge (i, j) by the k-th ant in this iteration. If ant k does not pass through edge (i, j), then $\Delta\tau_{ij}^{k} = 0$. It can be expressed as:
$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ passes through edge } (i, j) \text{ in this tour} \\[1ex] 0, & \text{otherwise} \end{cases}$$
where Q is a positive constant and $L_k$ represents the length of the path traveled by the k-th ant in this tour.
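Although ACO is stated above in its classical TSP form, in this study it is used to search discrete hyperparameter values (Section 3.2). The sketch below shows one plausible mapping, with each parameter's candidate list playing the role of cities and pheromone deposited in proportion to Q/L_k, where L_k is taken as the validation loss; the heuristic factor η is omitted for simplicity, and `evaluate` is a placeholder for a routine that trains and scores the LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

def aco_search(space, evaluate, n_ants=30, n_iter=3, alpha=1.0, rho=0.5, Q=1.0):
    # tau_ij(0) = c: uniform initial pheromone on every candidate value.
    pheromone = {k: np.ones(len(v)) for k, v in space.items()}
    best, best_loss = None, np.inf
    for _ in range(n_iter):
        trials = []
        for _ in range(n_ants):
            # Each ant picks one value per parameter with probability
            # proportional to tau^alpha (heuristic factor eta omitted).
            choice = {}
            for k, values in space.items():
                weights = pheromone[k] ** alpha
                choice[k] = int(rng.choice(len(values), p=weights / weights.sum()))
            params = {k: space[k][i] for k, i in choice.items()}
            loss = evaluate(params)                  # L_k for this ant
            trials.append((choice, loss))
            if loss < best_loss:
                best, best_loss = params, loss
        for k in pheromone:                          # evaporation: (1 - rho) * tau
            pheromone[k] *= (1.0 - rho)
        for choice, loss in trials:                  # deposit: Q / L_k
            for k, i in choice.items():
                pheromone[k][i] += Q / max(loss, 1e-9)
    return best, best_loss
```

Values that produce low validation loss accumulate pheromone and are sampled more often in later iterations, reproducing the positive feedback described above.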

2.4. Stacking Ensemble Learning

To some extent, every learning model has limitations: it cannot fully adapt to the data, nor can its accuracy reach 100%. To further improve prediction accuracy, multiple models can be combined so that the advantages of each are fully utilized. This approach is known as ensemble learning. Ensemble learning algorithms can be classified into bagging, boosting, and stacking [56]. While the bagging and boosting algorithms use the principles of voting and weighted averaging, respectively, the stacking algorithm takes a different approach, constructing a new model to retrain the predictions of multiple learners. The basic idea is to train multiple base learners on the training data separately and then use the outputs of these base learners as new features to train a meta-learner, which provides the final prediction.
Specifically, stacking ensemble learning consists of two stages [57]. In the first stage, different base learners are trained on the original training data to obtain a variety of quite different predictions; these base learners can be different types of algorithms, such as decision trees, support vector machines, and neural networks. In the second stage, the predictions of the first-stage base learners are used as new features and combined with the labels of the original training data to form a new training set for the meta-learner, which is usually a comparatively simple model such as linear regression or logistic regression. In the testing stage, the base learners first predict the new test data, and these predictions are then fed into the meta-learner to obtain the final prediction.
It is necessary to discuss in detail its two stages. Suppose we have a training set containing N samples D = {(x1, y1), (x2, y2), …, (xN, yN)}, where xi and yi are the input feature vector and the corresponding label, respectively. We have M base learners L1, L2, …, LM and one meta-learner Lmeta.
The first stage. Each base learner $L_m$ (m = 1, 2, …, M) is trained on the training set D to obtain the model $L_m(D)$. Different base learners can be based on different algorithms; for example, model $L_1$ may be obtained by training a decision tree algorithm and model $L_2$ by training a support vector machine.
For each sample $x_i$, a prediction is made with each base learner to obtain the result $z_{i,m} = L_m(x_i)$, where m = 1, 2, …, M. For a binary classification problem, the base learner might output the probability that the sample belongs to a certain class, while for a regression problem it outputs the specific predicted value.
The second stage. Construct a new training set $D' = \{(z_1, y_1), (z_2, y_2), \dots, (z_N, y_N)\}$, where $z_i = (z_{i,1}, z_{i,2}, \dots, z_{i,M})$. In this step, the predictions of each base learner for each sample are combined into a new feature vector that contains the combined learning information of the different base learners for that sample.
Utilize the new training set $D'$ to train the meta-learner $L_{meta}$ and obtain the model $L_{meta}(D')$. The meta-learner can be a simpler linear model (e.g., linear regression, logistic regression) or a more complex nonlinear one. When training the meta-learner, the algorithm and loss function must be chosen appropriately, e.g., a mean square error loss for regression problems and a cross-entropy loss for classification problems. The parameters of the meta-learner are tuned by gradient descent, which allows it to learn how to optimally combine the predictions of the base learners.
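The two stages can be condensed into a short script. The following sketch uses scikit-learn, with out-of-fold predictions forming the stage-one features and linear regression as the meta-learner; the choice of base learners is illustrative, not the exact configuration used in this paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

def stacking_fit_predict(X_train, y_train, X_test):
    base_learners = [DecisionTreeRegressor(max_depth=8),
                     SVR(C=1.0),
                     RandomForestRegressor(n_estimators=100)]

    # Stage 1: out-of-fold predictions z_{i,m} for each base learner L_m,
    # so the meta-learner never sees predictions made on training folds.
    Z_train = np.column_stack([
        cross_val_predict(m, X_train, y_train, cv=5) for m in base_learners])

    # Refit each base learner on all training data for test-time use.
    Z_test = np.column_stack([
        m.fit(X_train, y_train).predict(X_test) for m in base_learners])

    # Stage 2: a simple meta-learner trained on D' = {(z_i, y_i)}.
    meta = LinearRegression().fit(Z_train, y_train)
    return meta.predict(Z_test)
```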

3. Experiment and Results

3.1. Evaluation Metrics

Figure 3 shows the flowchart of optimizing the LSTM prediction model with ACO and the stacking ensemble learning method. The main purpose is to make the initial values and thresholds of the neural network more reasonable, thereby improving its convergence speed and helping it find the optimal solution. In this model, the LSTM neural network mainly serves as the predictor; the ACO algorithm is employed to set and optimize the parameters of the overall model, and the resulting optimal weights and thresholds are applied to the LSTM network. The main role of the stacking ensemble learning method is to perform feature selection for the overall model. The specific implementation steps are summarized in Figure 3.
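For completeness, the three indicators used to evaluate the models in Section 3.4 (MSE, MAE, and R2) can be computed as follows; scikit-learn's mean_squared_error, mean_absolute_error, and r2_score functions are equivalent.

```python
import numpy as np

def evaluate_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mse = np.mean((y_true - y_pred) ** 2)           # mean square error
    mae = np.mean(np.abs(y_true - y_pred))          # mean absolute error
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                      # coefficient of determination
    return {"MSE": mse, "MAE": mae, "R2": r2}
```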

3.2. Parameter Settings

When constructing the stacking-ACO-LSTM model, the key parameters were carefully set and tuned (Table 1).
Among them, the value range of the number of ants is set as [10, 20, 30, 40, 50]. This parameter determines how many ants participate in the search in each iteration: the more ants there are, the more hyperparameter combinations the algorithm can explore per iteration, but the computational load also increases.
The number of neurons in the LSTM hidden layer is chosen from [16, 32, 64, 128]. The number of neurons significantly affects the model's ability to learn and represent data features: too few, and the model may not fully capture the complex patterns in the data; too many, and it may over-fit and waste computational resources.
As an important means to prevent over-fitting, the value range of dropout is set as [0.2, 0.3, 0.4, 0.5]. Dropout randomly discards some neuron connections during the training process, reducing the co-adaptation degree among neurons and enhancing the generalization ability of the model. An appropriate dropout value can effectively balance the training effect of the model and the risk of over-fitting.
The value range of the number of epochs for model training is [20, 30, 40, 50, 60]. The number of training epochs determines how many times the model learns from the training data. If the number of epochs is too small, the model may not converge sufficiently and fail to learn the inherent patterns in the data. Conversely, if the number of epochs is too large, over-fitting may occur, resulting in the model’s reduced adaptability to new data.
Through the optimization of the stacking ensemble learning method and the ant colony algorithm, the number of ants finally selected in the model is 30, which yields the best prediction accuracy. The number of neurons in the LSTM hidden layer is 32, which best balances model complexity and learning ability. A dropout value of 0.2 is chosen, effectively preventing over-fitting while preserving the model's learning ability. The number of epochs is set to 50, enabling the model to converge fully on the training data without over-fitting. As a result, the stacking-ACO-LSTM model achieves relatively ideal performance.
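The search space of Table 1 and the finally selected configuration can be expressed as plain dictionaries, suitable as input to an optimizer such as the ACO sketch in Section 2.3; the key names are illustrative.

```python
# Candidate values from Table 1.
search_space = {
    "n_ants":   [10, 20, 30, 40, 50],
    "n_units":  [16, 32, 64, 128],    # LSTM hidden-layer neurons
    "dropout":  [0.2, 0.3, 0.4, 0.5],
    "n_epochs": [20, 30, 40, 50, 60],
}

# Configuration finally selected by the optimization (Section 3.2).
selected = {"n_ants": 30, "n_units": 32, "dropout": 0.2, "n_epochs": 50}
```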

3.3. Prediction Methods

This paper adopts the sliding-window prediction method. Its core idea is to slide a fixed-size window over the data sequence, moving one time step (one data point) at a time, so that the original data are transformed into multiple subsequences used for training the model or making predictions. This method effectively captures the dynamic changes of the time series.
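A minimal sketch of this transformation is given below; the 24-hour window length and the position of the PM2.5 column are illustrative assumptions.

```python
import numpy as np

def make_windows(series: np.ndarray, window: int = 24):
    """series: (T, n_features); returns X: (N, window, n_features), y: (N,)."""
    X, y = [], []
    for start in range(len(series) - window):
        X.append(series[start:start + window])    # one window of history
        y.append(series[start + window, 0])       # column 0 assumed to be PM2.5
    return np.stack(X), np.array(y)
```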

3.4. Results

Five models were compared in this experiment: LSTM; LSTM optimized by stacking ensemble learning (stacking-LSTM); LSTM optimized by ACO (ACO-LSTM); a BP neural network optimized by stacking ensemble learning (stacking-BP); and LSTM optimized by both stacking ensemble learning and ACO (stacking-ACO-LSTM). In this way, the gap between the real and predicted values can be seen more clearly; the prediction curves are shown in Figure 4. Three evaluation indicators, the coefficient of determination (R2), mean square error (MSE), and mean absolute error (MAE), are employed to evaluate the five models. The evaluation results are shown in Table 2 below.
As shown in Table 2, three of the optimized models tested on Nanchang City, namely the LSTM optimized by stacking ensemble learning, the BP neural network optimized by stacking ensemble learning, and the LSTM optimized by the ACO algorithm, perform at a moderate level: their MSE, MAE, and R2 are 36.516, 4.480, and 0.939; 36.085, 4.154, and 0.939; and 36.602, 4.590, and 0.938, respectively. Compared with the unoptimized LSTM model, the improvement in performance is not prominent. In view of this, the ACO algorithm is introduced on top of the stacking-optimized LSTM network, yielding the stacking-ACO-LSTM model, whose MSE, MAE, and R2 are 0.058, 0.178, and 0.942, respectively. Compared with the standalone LSTM network, the MSE and MAE decrease by 99.88% and 96.58%, respectively, and the R2 increases by 2.39%. The MSE and MAE results clearly show that the stacking-ACO-LSTM model surpasses the other models, although the improvement in R2 over the other models is not significant.

To further analyze the strengths and weaknesses of these models and evaluate their generalization ability, data from two other prefecture-level cities in Jiangxi Province, Ganzhou City and Jiujiang City, are introduced, and the prediction performance of the models is compared across Nanchang, Ganzhou, and Jiujiang. The combined table shows that, for both Ganzhou and Jiujiang, the optimization effect of the final stacking-ACO-LSTM model in terms of MSE and MAE far exceeds that of the other models, and its R2 is also slightly higher.

Nevertheless, Table 2 shows that the results for Jiujiang City are worse than those for the other two cities. We believe the main reason is the significant fluctuation in Jiujiang's particulate matter data; environmental factors such as weather conditions and air humidity may also have an impact. It should be noted that the core of this paper is the predictive ability of the optimized model: although Jiujiang's correlation index is relatively low, the predictive accuracy within its own optimized model is still at a good level, so this result retains academic value and effectively supports the discussion of the model's effectiveness. In conclusion, considering both the prediction results for Nanchang City and the generalization ability of the model, the final stacking-ACO-LSTM model demonstrates the best predictive ability.
Table 2. Model evaluation results of three cities.
City       Model               MSE       MAE      R2
Nanchang   LSTM                47.586    5.206    0.920
Nanchang   Stacking-LSTM       36.516    4.480    0.939
Nanchang   ACO-LSTM            36.602    4.590    0.938
Nanchang   Stacking-BP         36.085    4.154    0.939
Nanchang   Stacking-ACO-LSTM   0.058     0.178    0.942
Ganzhou    LSTM                49.411    4.976    0.894
Ganzhou    Stacking-LSTM       45.176    4.790    0.904
Ganzhou    ACO-LSTM            41.869    4.659    0.911
Ganzhou    Stacking-BP         46.058    4.881    0.902
Ganzhou    Stacking-ACO-LSTM   0.089     0.215    0.920
Jiujiang   LSTM                204.332   10.221   0.709
Jiujiang   Stacking-LSTM       189.219   9.864    0.731
Jiujiang   ACO-LSTM            193.934   9.804    0.724
Jiujiang   Stacking-BP         196.889   9.845    0.720
Jiujiang   Stacking-ACO-LSTM   0.248     0.357    0.741
In this paper, building on previous research results, five pollutant indicators relevant to PM2.5 in Nanchang were selected, and correlation analyses were conducted before modeling. The results are shown in Figure 5, where PM2.5 concentration is denoted by y. Figure 5 shows that AQI and PM10 are very strongly correlated with PM2.5, while CO, NO2, and SO2 show moderate correlations, and O3 shows the lowest correlation with PM2.5 concentration. Among these indicators, AQI is special: it is an index computed by comprehensively weighting the concentrations of the various air pollutants, so controlling these pollutants is, to a large extent, equivalent to controlling AQI itself. In conclusion, when addressing PM2.5 in haze control, treatment should focus first on PM10: its high correlation with PM2.5 means that effective control of PM10 will greatly facilitate progress on PM2.5. Secondly, the management of CO, NO2, and SO2 should not be overlooked; although their correlations with PM2.5 are weaker than that of PM10, they still have an important impact on air quality and deserve sufficient attention. O3 has an extremely low correlation with PM2.5 concentration and can almost be ignored in this system, but not entirely: in the overall air quality treatment plan, under the premise of achieving the main goals, O3 can still be given a certain degree of attention and management.
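The correlation analysis behind Figure 5 can be reproduced with a few lines of code; the sketch below assumes the column names from Section 2.1, with y denoting PM2.5.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_correlation(df: pd.DataFrame):
    cols = ["y", "AQI", "PM10", "CO", "NO2", "SO2", "O3"]  # y = PM2.5
    corr = df[cols].corr(method="pearson")                 # Pearson correlation matrix
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(cols)))
    ax.set_xticklabels(cols, rotation=45)
    ax.set_yticks(range(len(cols)))
    ax.set_yticklabels(cols)
    fig.colorbar(im, ax=ax)
    plt.show()
```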

4. Conclusions

In the process of using ant colony optimization to tune the model parameters, the choice of the number of ants and the maximum number of iterations is very important, and their values are crucial for the whole model. In the optimal stacking-ACO-LSTM model, the number of ants chosen at the beginning was 20 and the maximum number of iterations was 2, giving a model R2 of 0.940. Since this was probably not the optimal result, the number of ants and the maximum number of iterations were gradually increased. With 30 ants and a maximum of 3 iterations, the model reaches its best performance, with an R2 of 0.942; increasing either value further makes the results worse. Therefore, the model is optimal when the number of ants is 30 and the maximum number of iterations is 3.
The stacking-ACO-LSTM model performs exceptionally well in predicting PM2.5 concentration in Nanchang City. The deep learning LSTM framework plays a crucial role: its advantages in processing time-series data overcome the problems of unstable samples, long running times, and unstable, inaccurate results that affect traditional prediction methods. Meanwhile, the combination of stacking ensemble learning and ant colony optimization further optimizes the model parameters and improves its prediction performance. The predicted values of the combined model are the closest to the real values among all compared models, which makes it highly practical.
Accurate prediction of PM2.5 content can provide a data basis for air quality forecasting. At the same time, measures can be taken in advance to address air pollution based on the results, which can effectively improve urban air quality and offer suggestions for green travel. During the experiment, the initial model lacked parameter optimization measures for the overall model, so the obtained result was probably not optimal; in view of this, the ant colony optimization method is used in this paper to find the optimal hyperparameters of the ensemble model. The particulate substances in the air are numerous, complex, and nonlinear in character. In this paper, the stacking ensemble learning method is used to perform weight analysis on the feature variables, that is, the factors that affect PM2.5. By selecting the feature variables with larger weights, the computational load of model prediction is reduced and the accuracy of the prediction model is improved. The experimental comparison and analysis show that the stacking-ACO-LSTM model proposed in this paper has the highest PM2.5 prediction accuracy among the models considered.

Author Contributions

Conceptualization, Z.L. and X.H.; methodology, Z.L.; software, Z.L.; validation, Z.L. and X.H.; formal analysis, X.H.; investigation, X.H.; resources, Z.L.; data curation, Z.L.; writing—original draft preparation, X.H.; writing—review and editing, Z.L.; visualization, Z.L.; supervision, X.H.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 42261077.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Sequence data and source code that support the findings of this study have been deposited online at https://doi.org/10.5281/zenodo.15151495.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, M.Y.; Wang, K. Short-term effects of PM2.5 components on the respiratory infectious disease: A global perspective. Environ. Geochem. Health 2024, 46, 293. [Google Scholar] [CrossRef] [PubMed]
  2. Chanda, F.; Lin, K.X.; Chaurembo, A.I.; Huang, J.Y.; Zhang, H.J.; Deng, W.H.; Xu, Y.J.; Li, Y.; Fu, L.D.; Cui, H.D.; et al. PM2.5-mediated cardiovascular disease in aging: Cardiometabolic risks, molecular mechanisms and potential interventions. Sci. Total Environ. 2024, 954, 176255. [Google Scholar] [CrossRef] [PubMed]
  3. Duan, W.J.; Wang, X.Q.; Cheng, S.Y.; Wang, R.P. A new scheme of PM2.5 and O3 control strategies with the integration of SOM, GA and WRF-CAMx. J. Environ. Sci. 2024, 138, 249–265. [Google Scholar] [CrossRef] [PubMed]
  4. Cao, Q.F.; Shen, L.; Chen, S.C.; Pui, D.Y. WRF modeling of PM2.5 remediation by SALSCS and its clean air flow over Beijing terrain. Sci. Total Environ. 2018, 626, 134–146. [Google Scholar] [CrossRef]
  5. Shao, T.; Wang, P.; Yu, W.X.; Gao, Y.Q.; Zhu, S.Q.; Zhang, Y.; Hu, D.H.; Zhang, B.J.; Zhang, H.L. Drivers of alleviated PM2.5 and O3 concentrations in China from 2013 to 2020. Resour. Conserv. Recycl. 2023, 197, 107110. [Google Scholar] [CrossRef]
  6. Sulaymon, I.D.; Zhang, Y.; Hopke, P.K.; Ye, F.; Gong, K.; Mao, J.; Hu, J. Modeling PM2.5 during severe atmospheric pollution episode in Lagos, Nigeria: Spatiotemporal variations, source apportionment, and meteorological influences. J. Geophys. Res.-Atmos. 2023, 128, e2022JD038360. [Google Scholar] [CrossRef]
  7. Yuan, Z.Y.; Gao, S.C.; Wang, Y.R.; Li, J.Y.; Hou, C.Z.; Guo, L.J. Prediction of PM2.5 time series by seasonal trend decomposition-based dendritic neuron model. Neural Comput. Appl. 2023, 35, 15397–15413. [Google Scholar] [CrossRef]
  8. Wu, F.M.; Min, P.F.; Jin, Y.; Zhang, K.N.; Liu, H.Y.; Zhao, J.M.; Li, D.A. A novel hybrid model for hourly PM2.5 prediction considering air pollution factors, meteorological parameters and GNSS-ZTD. Environ. Model. Softw. 2023, 167, 105780. [Google Scholar] [CrossRef]
  9. Zhang, J.P.; Chen, Z.G.; Fu, J.; Liu, P. PM2.5 collection efficiency of wire-plate electrostatic precipitator: Prediction of temperature effects using support vector machine model combined with particle swarm optimization algorithm. Environ. Eng. Sci. 2024, 41, 140–148. [Google Scholar] [CrossRef]
  10. Li, L.; Fu, Y.F.; Fung, J.C.H.; Tse, K.T.; Lau, A.K. Development of a back-propagation neural network combined with an adaptive multi-objective particle swarm optimizer algorithm for predicting and optimizing indoor CO2 and PM2.5 concentrations. J. Build. Eng. 2022, 54, 104600. [Google Scholar] [CrossRef]
  11. Masood, A.; Hameed, M.M.; Srivastava, A.; Pham, Q.B.; Ahmad, K.; Razali, S.F.M.; Baowidan, S.A. Improving PM2.5 prediction in New Delhi using a hybrid extreme learning machine coupled with snake optimization algorithm. Sci. Rep. 2023, 13, 21057. [Google Scholar] [CrossRef] [PubMed]
  12. Yin, S.; Liu, H.; Duan, Z. Hourly PM2.5 concentration multi-step forecasting method based on extreme learning machine, boosting algorithm and error correction model. Digit. Signal Process. 2021, 118, 103221. [Google Scholar] [CrossRef]
  13. Logothetis, S.A.; Kosmopoulos, G.; Panagopoulos, O.; Salamalikis, V.; Kazantzidis, A. Forecasting the Exceedances of PM2.5 in an Urban Area. Atmosphere 2024, 15, 594. [Google Scholar] [CrossRef]
  14. Bedi, S.; Katiyar, A.; Krishnan, N.A.; Kota, S.H. Utilizing LSTM models to predict PM2.5 levels during critical episodes in Delhi, the world’s most polluted capital city. Urban Clim. 2024, 53, 101835. [Google Scholar] [CrossRef]
  15. Gao, Z.H.; Mo, X.Y.; Li, H. Prediction of PM2.5 concentration based on deep learning, multi-objective optimization, and ensemble forecast. Sustainability 2024, 16, 4643. [Google Scholar] [CrossRef]
  16. Ho, C.H.; Park, I.; Kim, J.; Lee, J.B. PM2.5 forecast in Korea using the Long Short-Term Memory (LSTM) model. Asia-Pac. J. Atmos. Sci. 2023, 59, 563–576. [Google Scholar] [CrossRef]
  17. Lin, M.D.; Liu, P.Y.; Huang, C.W.; Lin, Y.H. The application of strategy based on LSTM for the short-term prediction of PM2.5 in city. Sci. Total Environ. 2024, 906, 167892. [Google Scholar] [CrossRef]
  18. Yu, Q.; Yuan, H.W.; Liu, Z.L.; Xu, G.M. Spatial weighting EMD-LSTM based approach for short-term PM2.5 prediction research. Atmos. Pollut. Res. 2024, 15, 102256. [Google Scholar] [CrossRef]
  19. Bai, X.S.; Zhang, N.; Cao, X.Y.; Chen, W.Q. Prediction of PM2.5 concentration based on a CNN-LSTM neural network algorithm. PeerJ 2024, 12, e17811. [Google Scholar] [CrossRef]
  20. Cho, E.; Yoon, H.; Cho, Y. Evaluation of the impact of intensive PM2.5 reduction policy in Seoul, South Korea using machine learning. Urban Clim. 2024, 53, 101778. [Google Scholar] [CrossRef]
  21. Kumar, S.; Kumar, V. Multi-view Stacked CNN-BiLSTM (MvS CNN-BiLSTM) for urban PM2.5 concentration prediction of India’s polluted cities. J. Clean. Prod. 2024, 444, 141259. [Google Scholar] [CrossRef]
  22. Wu, X.X.; Zhu, J.; Wen, Q. Short-term prediction of PM2.5 concentration by hybrid neural network based on sequence decomposition. PLoS ONE 2024, 19, e0299603. [Google Scholar] [CrossRef] [PubMed]
  23. Pak, U.; Son, Y.; Kim, K.; Kim, J.; Jang, M.; Kim, K.; Pak, G. Novel particulate matter (PM2.5) forecasting method based on deep learning with suitable spatiotemporal correlation analysis. J. Atmos. Sol.-Terr. Phys. 2024, 264, 106336. [Google Scholar] [CrossRef]
  24. Shen, J.X.; Liu, Q.X.; Feng, X.J. Hourly PM2.5 concentration prediction for dry bulk port clusters considering spatiotemporal correlation: A novel deep learning blending ensemble model. J. Environ. Manag. 2024, 370, 122703. [Google Scholar] [CrossRef]
  25. Fu, M.L.; Le, C.W.; Fan, T.C.; Prakapovich, R.; Manko, D.; Dmytrenko, O.; Lande, D.; Shahid, S.; Yaseen, Z.M. Integration of complete ensemble empirical mode decomposition with deep long short-term memory model for particulate matter concentration prediction. Environ. Sci. Pollut. Res. 2021, 28, 64818–64829. [Google Scholar] [CrossRef]
  26. Liu, Z.H.; Ji, D.; Wang, L.L. PM2.5 concentration prediction based on EEMD-ALSTM. Sci. Rep. 2024, 14, 12636. [Google Scholar] [CrossRef]
  27. Zhu, J.M.; Niu, L.L.; Zheng, P.; Chen, H.Y.; Liu, J.P. A hybrid PM2.5 interval concentration prediction framework based on multi-factor interval decomposition reconstruction strategy and attention mechanism. Atmos. Environ. 2024, 335, 120730. [Google Scholar] [CrossRef]
  28. Pranolo, A.; Zhou, X.F.; Mao, Y.C. A novel bifold-attention-LSTM for analyzing PM2.5 concentration-based multi-station data time series. Int. J. Data Sci. Anal. 2024, 1–18. [Google Scholar] [CrossRef]
  29. Saminathan, S.; Malathy, C. PM2.5 concentration estimation using Bi-LSTM with osprey optimization method. Nat. Environ. Pollut. Technol. 2024, 23, 1631–1638. [Google Scholar] [CrossRef]
  30. Liu, J.R.; Hou, Z.W.; Yin, T.X. Short-term power load forecast using OOA optimized bidirectional long short-term memory network with spectral attention for the frequency domain. Energy Rep. 2024, 12, 4891–4908. [Google Scholar] [CrossRef]
  31. Zhao, L.X.; Li, Z.Y.; Qu, L.L. A novel machine learning-based artificial intelligence method for predicting the air pollution index PM2.5. J. Clean. Prod. 2024, 468, 143042. [Google Scholar] [CrossRef]
  32. Zhang, L.; Xu, L.; Jiang, M.; He, P. A novel hybrid ensemble model for hourly PM2.5 concentration forecasting. Int. J. Environ. Sci. Technol. 2023, 20, 219–230. [Google Scholar] [CrossRef]
  33. Jiang, F.X.; Zhang, C.Y.; Sun, S.L.; Sun, J.Y. Forecasting hourly PM2.5 based on deep temporal convolutional neural network and decomposition method. Appl. Soft Comput. 2021, 113, 107988. [Google Scholar] [CrossRef]
  34. Ren, Y.; Wang, S.Y.; Xia, B.S. Deep learning coupled model based on TCN-LSTM for particulate matter concentration prediction. Atmos. Pollut. Res. 2023, 14, 101703. [Google Scholar] [CrossRef]
  35. Zou, R.K.; Huang, H.Y.; Lu, X.M.; Zeng, F.M.; Ren, C.; Wang, W.Q.; Zhou, L.G.; Dai, X.Y. PD-LL-Transformer: An hourly PM2.5 forecasting method over the Yangtze River Delta Urban Agglomeration, China. Remote Sens. 2024, 16, 1915. [Google Scholar] [CrossRef]
  36. Yu, M.Z.; Masrur, A.; Blaszczak-Boxe, C. Predicting hourly PM2.5 concentrations in wildfire-prone areas using a SpatioTemporal Transformer model. Sci. Total Environ. 2023, 860, 160446. [Google Scholar] [CrossRef]
  37. Cui, B.W.; Liu, M.Y.; Li, S.Q.; Jin, Z.F.; Zeng, Y.; Lin, X.Y. Deep learning methods for atmospheric PM2.5 prediction: A comparative study of transformer and CNN-LSTM-attention. Atmos. Pollut. Res. 2023, 14, 101833. [Google Scholar] [CrossRef]
  38. Liu, X.; Li, W. MGC-LSTM: A deep learning model based on graph convolution of multiple graphs for PM2.5 prediction. Int. J. Environ. Sci. Technol. 2023, 20, 10297–10312. [Google Scholar] [CrossRef]
  39. Zeng, Q.L.; Li, Y.M.; Tao, J.H.; Fan, M.; Chen, L.F.; Wang, L.; Wang, Y.H. Full-coverage estimation of PM2.5 in the Beijing-Tianjin-Hebei region by using a two-stage model. Atmos. Environ. 2023, 309, 119956. [Google Scholar] [CrossRef]
  40. Tong, W.T.; Li, L.X.; Zhou, X.L.; Hamilton, A.; Zhang, K. Deep learning PM2.5 concentrations with bidirectional LSTM RNN. Air Qual. Atmos. Health 2019, 12, 411–423. [Google Scholar] [CrossRef]
  41. Kristiani, E.; Lin, H.; Lin, J.R.; Chuang, Y.H.; Huang, C.Y.; Yang, C.T. Short-term prediction of PM2.5 using LSTM deep learning methods. Sustainability 2022, 14, 2068. [Google Scholar] [CrossRef]
  42. Wang, J.Y.; Wang, D.S.; Zhang, F.S.; Yoo, C.K.; Liu, H.B. Soft sensor for predicting indoor PM2.5 concentration in subway with adaptive boosting deep learning model. J. Hazard. Mater. 2024, 465, 133074. [Google Scholar] [CrossRef] [PubMed]
  43. Li, Z.F.; Gan, K.; Sun, S.L.; Wang, S.Y. A new PM2.5 concentration forecasting system based on AdaBoost-ensemble system with deep learning approach. J. Forecast. 2023, 42, 154–175. [Google Scholar] [CrossRef]
  44. Zaini, N.; Ahmed, A.N.; Ean, L.W.; Chow, M.F.; Malek, M.A. Forecasting of fine particulate matter based on LSTM and optimization algorithm. J. Clean. Prod. 2023, 427, 139233. [Google Scholar] [CrossRef]
  45. Erden, C. Genetic algorithm-based hyperparameter optimization of deep learning models for PM2.5 time-series prediction. Int. J. Environ. Sci. Technol. 2023, 20, 2959–2982. [Google Scholar] [CrossRef]
  46. Utku, A.; Can, Ü.; Kamal, M.; Das, N.; Cifuentes-Faura, J.; Barut, A. A long short-term memory-based hybrid model optimized using a genetic algorithm for particulate matter 2.5 prediction. Atmos. Pollut. Res. 2023, 14, 101836. [Google Scholar] [CrossRef]
  47. Vignesh, P.P.; Jiang, J.H.; Kishore, P. Predicting PM2.5 concentrations across USA using machine learning. Earth Space Sci. 2023, 10, e2023EA002911. [Google Scholar] [CrossRef]
  48. Lee, Y.S.; Choi, E.; Park, M.; Jo, H.; Park, M.; Nam, E.; Kim, D.G.; Yi, S.M.; Kim, J.Y. Feature extraction and prediction of fine particulate matter (PM2.5) chemical constituents using four machine learning models. Expert Syst. Appl. 2023, 221, 119696. [Google Scholar] [CrossRef]
  49. Chen, Y.C.; Li, D.C. Selection of key features for PM2.5 prediction using a wavelet model and RBF-LSTM. Appl. Intell. 2021, 51, 2534–2555. [Google Scholar] [CrossRef]
  50. Wang, S.W.; Li, P.; Ji, H.; Zhan, Y.L.; Li, H.H. Prediction of air particulate matter in Beijing, China, based on the improved particle swarm optimization algorithm and long short-term memory neural network. J. Intell. Fuzzy Syst. 2021, 41, 1869–1885. [Google Scholar] [CrossRef]
  51. Zhang, L.; Liu, J.L.; Feng, Y.H.; Wu, P.; He, P.K. PM2.5 concentration prediction using weighted CEEMDAN and improved LSTM neural network. Environ. Sci. Pollut. Res. 2023, 30, 75104–75115. [Google Scholar] [CrossRef] [PubMed]
  52. Che, Z.Y.; Peng, C.; Yue, C.X. Optimizing LSTM with multi-strategy improved WOA for robust prediction of high-speed machine tests data. Chaos Solitons Fractals 2024, 178, 114394. [Google Scholar] [CrossRef]
  53. Kumar, K.; Haider, M.T.U. Enhanced prediction of intra-day stock market using metaheuristic optimization on RNN-LSTM network. New Gener. Comput. 2021, 39, 231–272. [Google Scholar] [CrossRef]
  54. Merkle, D.; Middendorf, M.; Schmeck, H. Ant colony optimization for resource-constrained project scheduling. IEEE Trans. Evolut. Comput. 2002, 6, 333–346. [Google Scholar] [CrossRef]
  55. Aghelpour, P.; Graf, R.; Tomaszewski, E. Coupling ANFIS with ant colony optimization (ACO) algorithm for 1-, 2-, and 3-days ahead forecasting of daily streamflow, a case study in Poland. Environ. Sci. Pollut. Res. 2023, 30, 56440–56463. [Google Scholar] [CrossRef]
  56. Ribeiro, M.H.D.M.; dos Santos Coelho, L. Ensemble approach based on bagging, boosting and stacking for short-term prediction in agribusiness time series. Appl. Soft Comput. 2020, 86, 105837. [Google Scholar] [CrossRef]
  57. Feng, L.W.; Li, Y.Y.; Wang, Y.M.; Du, Q.Y. Estimating hourly and continuous ground-level PM2.5 concentrations using an ensemble learning algorithm: The ST-stacking model. Atmos. Environ. 2020, 223, 117242. [Google Scholar] [CrossRef]
Figure 1. Basic framework of LSTM.
Figure 2. Forward propagation model.
Figure 3. Flowchart of the model.
Figure 4. Prediction curves of five models.
Figure 5. Correlation heatmap analysis.
Table 1. Parameter search space.

Parameter                                Range of Variation
Number of ants                           [10, 20, 30, 40, 50]
Number of neurons in the LSTM            [16, 32, 64, 128]
Dropout                                  [0.2, 0.3, 0.4, 0.5]
Number of epochs in model training       [20, 30, 40, 50, 60]
