1. Introduction
In recent years, air quality has deteriorated and haze events have become more frequent, posing a growing threat to human health. The main contributor to air pollution is inhalable particulate matter, led by PM2.5. Epidemiological studies demonstrate that prolonged exposure to elevated PM2.5 levels significantly increases the risks of pulmonary diseases, cardiovascular disorders, and malignancies, while also influencing radiative forcing and climatic patterns [1,2]. Precise PM2.5 forecasting enables proactive public health protection and supports evidence-based environmental policy formulation.
Algorithms for PM2.5 concentration prediction fall mainly into two categories: deterministic models and artificial intelligence models. Deterministic models, including WRF [3,4] and CMAQ [5,6], simulate atmospheric processes to predict pollutant concentrations through physical and chemical mechanisms [7,8]. However, deterministic models require substantial computing resources and a background in meteorology, and they suffer from the following shortcomings. The stability of the input PM2.5 samples is insufficient, meaning that the quality and reliability of the data are hard to guarantee and considerable noise and error may be introduced. Excessive computational demands significantly compromise prediction efficiency and often preclude real-time application. Furthermore, these models exhibit unstable performance with substantial scenario-dependent variability, and their limited accuracy makes it difficult for the prediction results to effectively guide decision-making and planning. These issues seriously constrain the application and development of traditional prediction methods in the field of PM2.5 concentration prediction.
By comparison, artificial intelligence models based on deep learning perform well in air quality prediction; examples include support vector machines (SVM) [9], BP neural networks [10], and extreme learning machines (ELM) [11,12]. However, these feedforward models are not tuned to the specific characteristics of the dataset and are poorly suited to time-series data that require real-time updating. As a result, LSTM networks have gradually attracted scholars' attention.
LSTM, as an important architecture in deep learning, displays unique advantages in dealing with time-series data, because it can effectively capture long-term dependencies in such data, which is crucial for PM2.5 concentrations that vary over time. By training an LSTM model on the available air quality data, the model can learn the inherent patterns and trends in the data. LSTM-based approaches utilize PM2.5 data along with ancillary data, meteorological variables from the Copernicus Atmosphere Monitoring Service operated by ECMWF, and temporal variables related to local emissions to improve air pollution forecasting performance. Bedi [13] and Bedi et al. [14] used recurrent neural networks with long short-term memory (LSTM) units to predict Delhi's PM2.5 concentration and showed that the model combining PM2.5 data with gaseous pollutants, as well as the model combining PM2.5 data with gaseous pollutants and meteorology, predicted daily and hourly PM2.5 concentrations more accurately than other models. However, LSTM models are highly dependent on the inputs predicted by the numerical model, which reflects a common problem of single-model prediction, namely low prediction accuracy and long training time. To address these issues, ensemble models have become widely accepted; these novel ensemble forecast models overcome these drawbacks by simultaneously taking into account both forecast accuracy and stability [15]. For example, applying only the LSTM model to the CMAQ forecasts can yield forecast skill levels comparable to the operational AirKorea forecasts that elaborately combine the CMAQ model, AI models, and human forecasters [16]. Lin et al. [17] applied an innovative application-strategy-based LSTM (ASLSTM) built on the BLSTM to achieve short-term, accurate prediction of PM2.5 concentrations. Combining spatial weighting, empirical mode decomposition (EMD), and a long short-term memory (LSTM) network, Yu et al. [18] proposed an ensemble model for PM2.5 concentration prediction, namely CBAM-CNN-BiLSTM. Comparisons against RNN, HPO-RNN, GRU, and standalone LSTM show that the new model surpasses the benchmark models in prediction accuracy.
In addition, LSTM can be combined with other ensemble learning techniques for prediction, such as convolutional neural networks (CNNs) [19,20,21,22,23], graph convolutional neural networks (GCNNs) [15,24], ensemble empirical mode decomposition (EEMD) [25,26], attention mechanisms [26,27,28], the osprey optimization algorithm (OOA) [29,30], the hybrid integration (HIG) algorithm [31,32], temporal convolutional networks (TCNs) [27,33,34,35], transformers [15,36,37], the graph sample and aggregation network (GraphSAGE) [15,38,39], the bidirectional gated recurrent unit (BiGRU) [22,40,41], adaptive boosting (AdaBoost) [42,43], genetic algorithms (GA) [44,45,46], k-nearest neighbors (kNN) [46,47,48], random forest (RF) [20,46], support vector regression (SVR) [46,49], and particle swarm optimization (PSO) [46,50,51].
In this study, we initially designed a basic LSTM-based prediction model for preliminary forecasting, selecting all monitoring stations in Nanchang as the baseline. As a representative urban area, Nanchang's air quality data can reflect the impact of various environmental factors and human activities on PM2.5 concentrations. This model was used to predict PM2.5 levels in Nanchang over the coming days, providing a foundation for subsequent optimization efforts. The original LSTM model consumed significant computational resources during prediction and relied solely on a rolling forecast mechanism based on time steps, without extensive hyperparameter optimization. Given these limitations, we prioritized two key aspects when selecting optimization algorithms: reducing computational demands and improving efficiency. While Bayesian optimization was initially considered for overall model tuning, its inefficiency in large-scale parameter searches and high computational requirements made it less suitable. After thorough evaluation, we adopted a stacking ensemble learning approach to analyze feature variable weights, thereby reducing computational load and enhancing prediction accuracy. Additionally, we introduced the ant colony optimization (ACO) algorithm to optimize the model's hyperparameters, preventing it from settling into local optima. Building on these improvements, we developed an integrated stacking-ACO-LSTM model, which significantly reduced runtime compared to the standalone LSTM network, making it well suited for PM2.5 concentration prediction. The results demonstrate that this model not only performs well in spatiotemporal efficiency and parameter optimization for Nanchang's PM2.5 prediction but also exhibits superior forecasting performance and stronger generalization capability, giving it substantial practical value and real-world reference significance.
2. Materials and Methods
2.1. Data Source
The study is based on hourly data for PM2.5, PM10, SO2, NO2, CO, O3, and the AQI in Nanchang City from 1 February 2017 to 30 September 2019, published on the China Air Quality Online Monitoring and Analysis Platform (https://www.aqistudy.cn). Nanchang City was chosen as the research object mainly for the following reasons. Firstly, data availability is high, and the channels for obtaining Nanchang's data are very convenient. Secondly, compared with other prefecture-level cities, Nanchang has more complete air quality monitoring stations, which can comprehensively and accurately collect data on the various pollutants. Thirdly, as the capital city of Jiangxi Province, Nanchang has a large population and active economic activity, all of which significantly affect air quality.
Relationships among the data: in nature, more than a dozen factors affect PM2.5, including wind speed, temperature, precipitation, industrial production, emissions from natural sources, and coal combustion. Within a given area, however, some of these factors are nearly constant over a fixed period; in southern cities, for example, average monthly precipitation and wind speed vary little. Therefore, within a fixed area, factors such as wind speed, temperature, and precipitation have a relatively small influence on PM2.5 levels. As an industrial city, Nanchang emits waste gases during industrial production, which continuously increase the concentrations of PM10, CO, SO2, O3, and NO2 in the air. Accordingly, to make the analysis and prediction more efficient, we selected only these more influential factors for in-depth study. In this way, we can focus on exploring the specific relationships and mechanisms between them and PM2.5, providing more targeted evidence for better understanding and control of PM2.5 concentrations.
Taking the PM2.5 concentration data as an example, there should originally be 23,304 records, of which 1456 are missing, approximately 6.25% of the total. After removing the missing entries, 21,848 data points remain, and this series of length 21,848 is divided into training and test sets in a ratio of 8:2. For missing data, commonly used filling methods include interpolation (e.g., linear or spline interpolation) and mean filling. Here, gaps are filled with the average of nearby time points, which preserves the integrity of the data and keeps the imputed values consistent with the overall characteristics of the series. Since the overall amount of missing data is small, filling with the mean largely preserves data accuracy and has essentially no impact on the final experimental results. Finally, because of the large scale of the data in the experiment, the data were standardized.
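For concreteness, the preprocessing described above can be sketched as follows. This is a minimal illustration, not the authors' code: the file name and column names are assumed, and the gap-filling averages the nearest valid observations on either side of each missing point.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the hourly records (file and column names are illustrative).
df = pd.read_csv("nanchang_hourly.csv", parse_dates=["time"], index_col="time")

# Fill each gap with the average of nearby time points: here, the mean of
# the nearest valid observations before and after the missing value.
df["pm25"] = df["pm25"].fillna((df["pm25"].ffill() + df["pm25"].bfill()) / 2)

# Chronological 8:2 split of the series into training and test sets.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

# Standardize using training-set statistics only, to avoid leakage.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)
```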
2.2. Long Short-Term Memory Network
Long Short-Term Memory (LSTM) is an improved network framework based on the recurrent neural network (RNN), which mainly solves RNN's difficulty in handling long-distance dependencies. The basic idea is to augment the original RNN hidden layer, which has only one state and is sensitive only to short-term inputs, with a cell state that preserves long-term memory. LSTM achieves long-term state memory through the input gate, forget gate, and output gate. Each gate selectively allows information to pass, mainly through a sigmoid neuron and a pointwise multiplication operation. The first step in LSTM is to determine what information to discard from the cell state, which is accomplished through the forget gate. The next step is to determine how much new information to incorporate into the cell state, which is accomplished through the input gate. Finally, it is necessary to determine what value to output, which is determined by the output gate. The basic framework of LSTM is shown in Figure 1.
In Figure 1, the memory cell has the same shape (i.e., vector dimension) as the hidden state; it is designed to record additional hidden-state and input information, and some literature treats it as a special kind of hidden state. Specifically, the input gate controls how much of the current input and the previous hidden state is added to the memory cell; the forget gate controls what is forgotten from the memory cell of the previous moment; and the output gate controls which information in the memory cell is output to the hidden state.
To make it easier to understand the forward-propagation process of the LSTM model, we adapt the model structure diagram as shown in Figure 2, where $\tilde{C}_t$ refers to the candidate memory cell at time $t$.
From this, we can obtain the forward-propagation formulas of the LSTM model:

$$I_t = \sigma(X_t W_{xi} + H_{t-1} W_{hi} + b_i),$$
$$F_t = \sigma(X_t W_{xf} + H_{t-1} W_{hf} + b_f),$$
$$O_t = \sigma(X_t W_{xo} + H_{t-1} W_{ho} + b_o),$$
$$\tilde{C}_t = \tanh(X_t W_{xc} + H_{t-1} W_{hc} + b_c),$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t,$$
$$H_t = O_t \odot \tanh(C_t),$$
$$Y_t = H_t W_{hq} + b_q,$$

where the input is $X_t \in \mathbb{R}^{m \times d}$; $m$ and $d$ are the batch size of mini-batch stochastic gradient descent and the dimension of the word vector of the input, respectively; and $h$ and $q$ are the vector widths (dimensions) of the hidden state $H_t$ and the model output $Y_t$.
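To make the formulas concrete, the following NumPy sketch implements one forward step; the weight shapes follow the dimensions defined above ($m$, $d$, $h$), and all parameter values are randomly initialized for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X_t, H_prev, C_prev, params):
    """One LSTM forward step following the equations above."""
    (W_xi, W_hi, b_i, W_xf, W_hf, b_f,
     W_xo, W_ho, b_o, W_xc, W_hc, b_c) = params
    I_t = sigmoid(X_t @ W_xi + H_prev @ W_hi + b_i)       # input gate
    F_t = sigmoid(X_t @ W_xf + H_prev @ W_hf + b_f)       # forget gate
    O_t = sigmoid(X_t @ W_xo + H_prev @ W_ho + b_o)       # output gate
    C_tilde = np.tanh(X_t @ W_xc + H_prev @ W_hc + b_c)   # candidate memory cell
    C_t = F_t * C_prev + I_t * C_tilde                    # new memory cell
    H_t = O_t * np.tanh(C_t)                              # new hidden state
    return H_t, C_t

m, d, h = 4, 6, 32   # batch size, input dimension, hidden width
rng = np.random.default_rng(0)
params = [rng.normal(scale=0.01, size=s)
          for s in [(d, h), (h, h), (h,)] * 4]
H, C = np.zeros((m, h)), np.zeros((m, h))
H, C = lstm_step(rng.normal(size=(m, d)), H, C, params)
```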
2.3. Ant Colony Optimization
LSTM is prone to getting stuck in local optima and has a slow convergence speed [52,53]. Therefore, to optimize the hyperparameters of the PM2.5 concentration prediction model, the ant colony optimization (ACO) algorithm, one of the most popular algorithms for hyperparameter optimization, is introduced. ACO is an essentially self-organizing, positive-feedback parallel algorithm with strong robustness.
ACO is a bionic algorithm derived from simulating the path-finding behavior of natural ants [54,55]. As ants move, they deposit pheromones on the paths they traverse, which serve to transmit information. Ants can also sense this pheromone during movement and use it to guide their direction. The collective behavior of a colony of many ants therefore exhibits positive information feedback: the more ants that have traveled a given path, the greater the probability that later ants choose it. After a series of searches, the colony can thus converge on the shortest path.
At the initial moment of the algorithm, $m$ ants are randomly placed in $n$ cities, and the first element of each ant's taboo list ($tabu_k$) is set to the city where it is currently located. At this time, the amount of pheromone on each path is equal: let $\tau_{ij}(0) = c$ (where $c$ is a relatively small constant). At time $t$, the probability that ant $k$ transfers from city $i$ to city $j$ is:

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{\left[\tau_{ij}(t)\right]^{\alpha}\left[\eta_{ij}(t)\right]^{\beta}}{\sum_{s \in allowed_k}\left[\tau_{is}(t)\right]^{\alpha}\left[\eta_{is}(t)\right]^{\beta}}, & j \in allowed_k \\ 0, & \text{otherwise} \end{cases}$$
where $allowed_k$ represents the set of cities that ant $k$ is allowed to choose in the next step, and the taboo list ($tabu_k$) records the cities that ant $k$ has passed through so far. When all $n$ cities have been added to the taboo list ($tabu_k$), ant $k$ has completed a tour, and the path it has traveled is a feasible solution to the traveling salesman problem (TSP). Here, $\eta_{ij}$ is a heuristic factor representing the expected degree of an ant's transfer from city $i$ to city $j$; in ACO, it is usually taken as the inverse of the distance between cities $i$ and $j$. The parameters $\alpha$ and $\beta$ represent the relative importance of the pheromone and the expected heuristic factor, respectively. When all the ants have completed one tour, the pheromone on each path is updated according to the following formula:

$$\tau_{ij}(t+n) = (1-\rho)\,\tau_{ij}(t) + \Delta\tau_{ij},$$
where $\rho$ ($0 < \rho < 1$) and $1-\rho$ represent the evaporation coefficient and the persistence coefficient of the pheromone on the path, respectively, and $\Delta\tau_{ij}$ represents the increment of pheromone on edge $(i,j)$ in this iteration, that is:

$$\Delta\tau_{ij} = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}.$$
Here, $\Delta\tau_{ij}^{k}$ represents the amount of pheromone left on edge $(i,j)$ by the $k$th ant in this iteration; if ant $k$ does not pass through edge $(i,j)$, then $\Delta\tau_{ij}^{k} = 0$. It can be expressed as:

$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ passes through edge } (i,j) \text{ in this tour} \\ 0, & \text{otherwise} \end{cases}$$

where $Q$ is a positive constant and $L_k$ represents the length of the path traveled by the $k$th ant in this tour.
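The transition rule and pheromone update above can be condensed into a short Python sketch for the TSP. This is an illustrative implementation under the assumptions stated in the text ($\eta$ as inverse distance, equal initial pheromone $c$, evaporation followed by deposit), not the exact code used in this study.

```python
import numpy as np

def aco_tsp(dist, n_ants=30, n_iters=100, alpha=1.0, beta=2.0, rho=0.5, Q=100.0):
    """Minimal ant colony optimization for the TSP, following the rules above.
    `dist` is a symmetric distance matrix with zeros on the diagonal."""
    n = len(dist)
    tau = np.full((n, n), 1e-2)          # tau_ij(0) = c, a small constant
    eta = 1.0 / (dist + np.eye(n))       # heuristic: inverse distance (eye avoids /0)
    best_tour, best_len = None, np.inf
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        delta = np.zeros((n, n))
        for _ in range(n_ants):
            tour = [rng.integers(n)]     # taboo list starts at a random city
            while len(tour) < n:
                i = tour[-1]
                allowed = [j for j in range(n) if j not in tour]
                w = np.array([tau[i, j]**alpha * eta[i, j]**beta for j in allowed])
                tour.append(allowed[rng.choice(len(allowed), p=w / w.sum())])
            L = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            if L < best_len:
                best_tour, best_len = tour, L
            for k in range(n):           # each ant deposits Q / L_k on its edges
                i, j = tour[k], tour[(k + 1) % n]
                delta[i, j] += Q / L
                delta[j, i] += Q / L
        tau = (1 - rho) * tau + delta    # evaporation, then pheromone deposit
    return best_tour, best_len
```

In this study, the same search mechanism is applied to hyperparameter combinations rather than cities, with the validation error of the LSTM playing a role analogous to the tour length.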
2.4. Stacking Ensemble Learning
To some extent, every learning model has limitations: none can fully adapt to the data, nor can its accuracy reach 100%. To further improve prediction accuracy, multiple models can be combined so as to fully exploit the advantages of each; this approach is known as ensemble learning. Ensemble learning algorithms can be classified into bagging, boosting, and stacking [56]. While bagging and boosting use the principles of voting and weighted averaging, respectively, stacking takes a different approach by constructing a new model that retrains on the predictions of multiple learners. The basic idea is to train multiple base learners on the training data separately, then use the outputs of these base learners as new features that are input into a meta-learner for training; the meta-learner then provides the final prediction.
Specifically, stacking ensemble learning consists of two stages [57]. In the first stage, different base learners are trained on the original training data to obtain a variety of quite different predictions; these base learners can be different types of algorithms, such as decision trees, support vector machines, or neural networks. In the second stage, the predictions of the first-stage base learners are used as new features and combined with the labels of the original training data to form a new training set on which the meta-learner is trained; the meta-learner is usually a comparatively simple model, such as linear or logistic regression. In the testing stage, the base learners first predict on the new test data, and these predictions are then fed into the meta-learner to obtain the final prediction.
It is worth discussing the two stages in detail. Suppose we have a training set of $N$ samples $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i$ and $y_i$ are the input feature vector and the corresponding label, respectively, along with $M$ base learners $L_1, L_2, \ldots, L_M$ and one meta-learner $L_{meta}$.
- ➀
The first stage. Each base learner $L_m$ ($m = 1, 2, \ldots, M$) is trained on the training set $D$ to obtain the model $h_m = L_m(D)$. Different base learners can be based on different algorithms; for example, model $h_1$ can be obtained by training a decision tree algorithm and model $h_2$ by training a support vector machine.
For each sample $x_i$, a prediction is made using base learner $h_m$ to obtain the prediction result $z_{i,m} = h_m(x_i)$, where $m = 1, 2, \ldots, M$. For a binary classification problem, the base learner might output the probability that the sample belongs to a certain class, while for a regression problem it outputs the specific predicted value.
- ➁
The second stage. Construct a new training set $D' = \{(z_i, y_i)\}_{i=1}^{N}$, where $z_i = (z_{i,1}, z_{i,2}, \ldots, z_{i,M})$. In this step, the predictions of each base learner for each sample are combined to form a new feature vector containing the combined learning information of the different base learners for that sample.
The new training set $D'$ is used to train the meta-learner and obtain the model $h_{meta} = L_{meta}(D')$. The meta-learner can be a simpler linear model (e.g., linear regression or logistic regression) or a more complex nonlinear one. When training the meta-learner, its algorithm and loss function must be chosen: for regression problems, a mean square error loss; for classification problems, a cross-entropy loss; and so on. The parameters of the meta-learner are tuned by gradient descent, which allows it to learn how to optimally combine the predictions of the base learners.
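The two stages map naturally onto scikit-learn's StackingRegressor, shown below as a hedged sketch; the particular base learners, hyperparameters, and the linear meta-learner are illustrative choices, not those of the paper.

```python
from sklearn.ensemble import StackingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

# Stage 1: heterogeneous base learners L1..LM.
# Stage 2: a simple linear meta-learner trained on their predictions.
stack = StackingRegressor(
    estimators=[
        ("tree", DecisionTreeRegressor(max_depth=8)),
        ("svr", SVR(C=1.0)),
        ("rf", RandomForestRegressor(n_estimators=100)),
    ],
    final_estimator=LinearRegression(),
    cv=5,  # out-of-fold predictions z_i,m form the meta-features
)
# Usage: stack.fit(X_train, y_train); y_pred = stack.predict(X_test)
```

Note that scikit-learn builds the meta-features from out-of-fold predictions, which avoids training the meta-learner on predictions the base learners made for samples they were fit on.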
3. Experiment and Results
3.1. Evaluation Metrics
Figure 3 shows the flowchart of optimizing the LSTM prediction model with ACO and the stacking ensemble learning method. The main purpose is to make the initial values and thresholds of the neural network more reasonable, thereby improving the network's convergence speed and helping it find the optimal solution. In this model, the LSTM neural network mainly serves as the predictor. In the ACO optimization part, the ACO algorithm is employed to set and optimize the parameters of the overall model, and the resulting optimal weights and thresholds are applied to the LSTM neural network. The main role of the stacking ensemble learning method is to perform feature selection for the overall model; the specific implementation steps follow the flowchart in Figure 3.
3.2. Parameter Settings
When constructing the stacking-ACO-LSTM model, the key parameters were carefully set and tuned (Table 1).
Among them, the value range of the number of ants is set as [10, 20, 30, 40, 50]. This parameter determines how many ants participate in the search in each iteration: the more ants there are, the more hyperparameter combinations the algorithm can explore per iteration, but the greater the computational load.
The number of neurons in the hidden layer of the LSTM is set with several options: [16, 32, 64, 128]. Different numbers of neurons will significantly affect the model’s ability to learn and represent data features. If the number is too small, the model may not be able to fully capture the complex patterns in the data. On the other hand, if the number is too large, it may lead to over-fitting and a waste of computational resources.
As an important means to prevent over-fitting, the value range of dropout is set as [0.2, 0.3, 0.4, 0.5]. Dropout randomly discards some neuron connections during the training process, reducing the co-adaptation degree among neurons and enhancing the generalization ability of the model. An appropriate dropout value can effectively balance the training effect of the model and the risk of over-fitting.
The value range of the number of epochs for model training is [20, 30, 40, 50, 60]. The number of training epochs determines how many times the model learns from the training data. If the number of epochs is too small, the model may not converge sufficiently and fail to learn the inherent patterns in the data. Conversely, if the number of epochs is too large, over-fitting may occur, resulting in the model’s reduced adaptability to new data.
Through the optimization of the stacking ensemble learning method and the ant colony algorithm, the number of ants finally selected in the model is 30, and this quantity shows the best accuracy in prediction. The number of neurons in the LSTM hidden layer is selected to be 32, which performs excellently in balancing model complexity and learning ability. A dropout value of 0.2 is chosen, which effectively prevents over-fitting while maximizing the model’s learning ability. The number of epochs is set to 50, enabling the model to fully converge on the training data and avoiding over-fitting. As a result, the stacking-ACO-LSTM model achieves relatively ideal performance.
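The search space in Table 1 can be written down directly; the snippet below draws one candidate configuration at random purely to illustrate the space the ACO explores, and records the best configuration reported above.

```python
import random

# Hyperparameter search space (value ranges from Table 1).
search_space = {
    "n_ants":  [10, 20, 30, 40, 50],
    "units":   [16, 32, 64, 128],    # LSTM hidden-layer neurons
    "dropout": [0.2, 0.3, 0.4, 0.5],
    "epochs":  [20, 30, 40, 50, 60],
}

# One candidate configuration (random here; pheromone-guided under ACO).
candidate = {k: random.choice(v) for k, v in search_space.items()}

# Best configuration reported in this section.
best = {"n_ants": 30, "units": 32, "dropout": 0.2, "epochs": 50}
```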
3.3. Prediction Methods
This paper selects the sliding window prediction method. Its core idea is to slide a fixed-size window over the data sequence, moving one time step or one data point each time. In this way, the original data are transformed into multiple subsequences, which can be used for training the model or making predictions. This prediction method can effectively capture the dynamic changes of the time series.
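A minimal sketch of this transformation is given below, assuming a 1-D standardized series; the window length of 24 h is an illustrative choice, not necessarily the one used in the experiments.

```python
import numpy as np

def sliding_windows(series, window=24):
    """Slide a fixed-size window one time step at a time over the series,
    producing (samples, window) inputs and next-step targets."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])   # one subsequence
        y.append(series[i + window])     # the point that follows it
    return np.array(X), np.array(y)

# Example: a toy series of 100 points -> 76 windows of length 24.
X, y = sliding_windows(np.arange(100, dtype=float))
print(X.shape, y.shape)   # (76, 24) (76,)
```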
3.4. Results
In this experiment, five different models were evaluated: LSTM, LSTM based on the stacking ensemble learning method (stacking-LSTM), LSTM based on ACO (ACO-LSTM), a BP neural network based on the stacking ensemble learning method (stacking-BP), and LSTM based on the stacking ensemble learning method with ACO (stacking-ACO-LSTM). In this way, the gap between the real and predicted values can be seen more clearly. The prediction curves are shown in Figure 4. Three evaluation indicators, namely the regression coefficient of determination (R2), mean square error (MSE), and mean absolute error (MAE), are employed to evaluate the five models. The evaluation results are shown in Table 2 below.
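For reference, the three indicators can be computed directly with scikit-learn; y_true and y_pred below are placeholders for the test-set observations and model outputs.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def evaluate(y_true, y_pred):
    """The three evaluation indicators reported in Table 2."""
    return {
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }

print(evaluate(np.array([10.0, 20.0, 30.0]), np.array([11.0, 19.0, 31.0])))
```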
As shown in Table 2, the tested models for Nanchang City include the LSTM network optimized by stacking ensemble learning, the BP neural network optimized by stacking ensemble learning, and the LSTM network optimized by the ACO algorithm. The table shows that these three optimization methods perform at a moderate level: their mean squared error (MSE), mean absolute error (MAE), and coefficient of determination (R2) are 36.516, 4.480, and 0.939; 36.085, 4.154, and 0.939; and 36.602, 4.590, and 0.938, respectively. Compared with the unoptimized LSTM model, the performance improvement is not prominent. In view of this, the ACO algorithm is introduced on top of the stacking-optimized LSTM network to jointly optimize it, yielding the stacking-ACO-LSTM model. The MSE, MAE, and R2 of this model are 0.058, 0.178, and 0.942, respectively. Compared with the model using only the LSTM network, the MSE and MAE have decreased by 99.88% and 96.58%, respectively, and the R2 has increased by 2.39%. From the MSE and MAE results, the stacking-ACO-LSTM model clearly surpasses the other models, although the improvement in its R2 over the other models is not significant. To further analyze the strengths and weaknesses of these models and evaluate their generalization ability, data from two other prefecture-level cities in Jiangxi Province, Ganzhou City and Jiujiang City, are introduced, and the prediction performance of the models is compared across Nanchang, Ganzhou, and Jiujiang. The combined table shows that, for both Ganzhou and Jiujiang, the final stacking-ACO-LSTM model far exceeds the other models in MSE and MAE, and its R2 is also slightly higher. Nevertheless, Table 2 shows that the results for Jiujiang City are worse than those of the other two cities. We believe the main reason the results for Jiujiang are less ideal than those for Nanchang and Ganzhou lies in the significant fluctuations in Jiujiang's particulate matter data; environmental factors such as weather conditions and air humidity may also play a role. It should be noted that the core of this paper is the predictive ability of the optimized model. Although Jiujiang's correlation index is relatively low, its predictive accuracy within its own optimized model is still good, so the result retains academic value and analytical significance and effectively supports the discussion of the model's effectiveness. In conclusion, considering both the predictions for Nanchang City and the generalization ability of the model, the final stacking-ACO-LSTM model demonstrates the best prediction ability.
Table 2.
Model evaluation results of three cities.
| City | Model | MSE | MAE | R2 |
|---|---|---|---|---|
| Nanchang | LSTM | 47.586 | 5.206 | 0.920 |
| | Stacking-LSTM | 36.516 | 4.480 | 0.939 |
| | ACO-LSTM | 36.602 | 4.590 | 0.938 |
| | Stacking-BP | 36.085 | 4.154 | 0.939 |
| | Stacking-ACO-LSTM | 0.058 | 0.178 | 0.942 |
| Ganzhou | LSTM | 49.411 | 4.976 | 0.894 |
| | Stacking-LSTM | 45.176 | 4.790 | 0.904 |
| | ACO-LSTM | 41.869 | 4.659 | 0.911 |
| | Stacking-BP | 46.058 | 4.881 | 0.902 |
| | Stacking-ACO-LSTM | 0.089 | 0.215 | 0.920 |
| Jiujiang | LSTM | 204.332 | 10.221 | 0.709 |
| | Stacking-LSTM | 189.219 | 9.864 | 0.731 |
| | ACO-LSTM | 193.934 | 9.804 | 0.724 |
| | Stacking-BP | 196.889 | 9.845 | 0.720 |
| | Stacking-ACO-LSTM | 0.248 | 0.357 | 0.741 |
In this paper, building on previous research results, five relevant pollutant indicators affecting PM2.5 in Nanchang were selected, and correlation analyses were conducted before modeling. The analysis results are shown in Figure 5, where PM2.5 concentration is represented by y. Figure 5 shows that the correlations of AQI and PM10 with PM2.5 are extremely significant and very close, the correlations of CO, NO2, and SO2 with PM2.5 are moderate, and the correlation between O3 and PM2.5 concentration is the lowest. Among these indicators, AQI is special: it is an index calculated by comprehensively considering the concentrations of the various air pollutants, so controlling these pollutants is, in effect, equivalent to controlling the AQI itself. In conclusion, when addressing PM2.5 in haze control, the treatment focus should first be on PM10, because the high correlation between PM10 and PM2.5 means that effective control of PM10 will greatly facilitate progress in PM2.5 control. Secondly, the management of CO, NO2, and SO2 should not be overlooked; although their correlation with PM2.5 is not as close as that of PM10, they still have an important impact on air quality and deserve sufficient attention in the treatment process. Within this system, O3 has an extremely low correlation with PM2.5 concentration and can almost be ignored as a factor; however, this does not mean that O3 should be completely disregarded. In the overall air quality treatment plan, the actual situation must be considered, and, provided the main goals are achieved, O3 can be given a certain degree of attention and management.
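The correlation screening described above amounts to a pairwise (e.g., Pearson) correlation of each pollutant with PM2.5; a minimal pandas sketch follows, with illustrative file and column names as in the preprocessing sketch of Section 2.1.

```python
import pandas as pd

# Reload the hourly records (file and column names are illustrative).
df = pd.read_csv("nanchang_hourly.csv", parse_dates=["time"], index_col="time")

# Pearson correlation of each indicator with PM2.5 (y in Figure 5).
cols = ["aqi", "pm10", "co", "no2", "so2", "o3"]
corr = df[cols + ["pm25"]].corr()["pm25"].drop("pm25")
print(corr.sort_values(ascending=False))  # AQI, PM10 highest; O3 lowest
```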
4. Conclusions
In the process of using ant colony optimization to optimize the model parameters, the choices of the number of ants and the maximum number of iterations are very important, and their values are crucial to the whole model. In the optimal stacking-ACO-LSTM model, this paper initially chose 20 ants and a maximum of 2 iterations, at which point the model's coefficient of determination (R2) reached 0.940. Since this was probably not the optimal result, the number of ants and the maximum number of iterations were gradually increased. With 30 ants and a maximum of 3 iterations, the model reaches its optimal state, with an R2 of 0.942. As the number of ants and the maximum number of iterations increase further, the results deteriorate. Therefore, the model reaches its optimum with 30 ants and a maximum of 3 iterations.
The stacking-ACO-LSTM model performs exceptionally well in predicting PM2.5 concentration in Nanchang City. The introduced deep learning LSTM framework plays a crucial role: its strengths in processing time-series data overcome the problems of unstable samples, long running times, and unstable, inaccurate results that afflict traditional prediction methods. Meanwhile, the combination of the stacking ensemble learning method and ant colony optimization further optimizes the model parameters and improves its prediction performance. Compared with the other models, the predicted values of the combined model are closest to the real values, making it highly practical.
Accurate predictions of PM2.5 levels can provide a data basis for air quality forecasting. At the same time, measures can be taken in advance, based on the results obtained, to address air pollution, effectively improving urban air quality and offering guidance for green travel. During the experiments, the initial model lacked parameter optimization for the overall model, so the results obtained might not have been optimal. In view of this, ant colony optimization is used in this paper to find the optimal hyperparameters of the ensemble model. The particulate substances in the air are numerous, complex, and nonlinear in character. This paper uses the stacking ensemble learning method to perform weight analysis on the feature variables, that is, the factors that can affect PM2.5; by selecting the feature variables with larger weights, the computational cost of prediction is reduced and the accuracy of the prediction model is improved. The experimental comparison and analysis show that the stacking-ACO-LSTM model proposed in this paper achieves the highest PM2.5 concentration prediction accuracy among these models.