1. Introduction
Air quality is intrinsically linked to human well-being and health. Air is a fundamental constituent of life and is pivotal for sustaining ecosystems [1]. The state of air quality transcends ecological health, directly impacting human life and public well-being, and severe air pollution inflicts substantial harm on economies and livelihoods [2]. In recent years, with rapid urbanization and industrialization, air pollution has become increasingly severe: the World Health Organization (WHO) estimates that poor air quality accounts for approximately 7 million deaths worldwide each year [3,4,5]. Consequently, air quality has become a focal point of global concern, and investigating its determinants and devising accurate prediction methods is highly relevant for regulating and remedying atmospheric pollution [6]. Numerous cities in China have deployed environmental monitoring systems capable of yielding extensive data on particulate matter concentrations [7]. Nevertheless, air quality prediction remains a formidable challenge, given its substantial temporal fluctuations and many influencing factors [8]. In the current era of pollution prevention and control, alongside efforts toward carbon neutrality, achieving high-precision prediction of the Air Quality Index (AQI) has become a critical research focus. Accurate AQI prediction holds significant value for urban development and public health, providing essential insights for policy making and environmental management. This research contributes to enhancing urban air quality and plays a pivotal role in safeguarding public health by enabling timely and effective pollution mitigation strategies.
Air quality prediction models can be categorized as numerical or statistical. Numerical models predict air quality by simulating the atmospheric physical and chemical processes that govern pollutant transport and transformation [9]. Since the 1950s, such models have developed rapidly, including the Community Multiscale Air Quality modeling system (CMAQ) [10] and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) [11]. These models depend heavily on accurate pollutant concentration inputs and require complex calculations, and even so their predictions carry high uncertainty. Developing air quality prediction models from a data-driven perspective is the common approach in statistical modeling. Several statistical methods have been widely used, such as the autoregressive integrated moving average (ARIMA) [12] and multiple linear regression (MLR) [13]. In 1993, Boznar et al. [14] developed a Multi-Layer Perceptron (MLP)-based model to predict SO2 concentration in air. Since then, machine learning models have developed rapidly and been widely applied to air quality prediction, including artificial neural networks (ANN) [15], random forests (RF) [16], and support vector regression (SVR) [17]. Combining machine learning methods with other techniques to build hybrid models can further improve accuracy, and two such improved methods have been notably successful. The first is LSSVR-GSA, proposed by Muhammad et al. [18] in 2020, which uses the Gravitational Search Algorithm (GSA) to optimize the unknown parameters of Least Squares Support Vector Regression (LSSVR) to reach the best model performance. The second, proposed by Zhu et al. [19] in 2017, applies empirical mode decomposition (EMD) to the raw air quality data and feeds the resulting series into an SVR to obtain more accurate predictions. Air quality data have pronounced spatial and temporal characteristics; however, these models are limited in extracting such spatiotemporal features from series data, and their prediction performance becomes unsatisfactory when handling very large datasets that demand accurate prediction.
Amid escalating air pollution and the rapid advancement of deep learning, accurate air quality prediction has become increasingly imperative, and numerous deep learning prediction models have emerged in response. However, efficiently capturing spatiotemporal dependencies and improving the generalization ability of existing models remain significant challenges. For instance, Li Xiaoli and colleagues [20] introduced a predictive approach based on image quality analysis, although its efficacy is heavily influenced by external factors and the quality of the captured images. Septiawa and associates [21] employed the BPTT algorithm, tailored for Recurrent Neural Network (RNN) architectures, for prediction; however, this model is susceptible to the well-documented vanishing and exploding gradient problems, particularly over extended sequences, which impede its ability to learn long-term dependencies. Gilik et al. [22] developed a CNN-LSTM hybrid deep neural network that leverages spatiotemporal correlations to estimate air pollutant concentrations across multiple urban locales; despite its sophisticated framework, the model's data dependency, sensitivity to data quality, and extensive hyperparameter optimization pose significant challenges. Deng GQ et al. [23] proposed a PSO-SVR model enhanced by particle swarm optimization, addressing the common PSO drawbacks of suboptimal global search efficiency and local minimum entrapment. Du YH and team [24] crafted a GA-ACO-BP neural network, merging the strengths of genetic algorithms and ant colony optimization into BP neural networks; while this boosts model performance, it also slows model convergence. To address these issues and further improve prediction accuracy, this study proposes a CNN-Bi-LSTM model optimized by PSO with adaptive inertia weights (APSO-CNN-Bi-LSTM).
The salient contributions of this study are twofold:
The model can effectively combine CNN and Bi-LSTM to extract complex spatiotemporal dependencies in historical air quality data and focus on the key factors, thus providing higher accuracy in air quality prediction.
The introduction of adaptive inertia weights enables the PSO algorithm to generalize better across different datasets, implying that the model can be effectively applied to air quality prediction in other regions or periods.
The structure of the subsequent chapters in this paper is outlined as follows:
Section 2 introduces the geographical environment, climatic characteristics, and data sources pertinent to the study area.
Section 3 details the APSO-CNN-Bi-LSTM model, including its design and implementation.
Section 4 presents the experimental setup and results, comparing the proposed model’s performance with that of other models and providing a thorough analysis of the findings. Finally,
Section 5 offers concluding remarks, summarizing the study’s contributions and implications. This structured layout ensures a comprehensive understanding of the research context, methodology, experimental evaluation, and conclusions drawn from the study.
3. Model Introduction
This paper used five models, including PSO-SVR, BPTT, CNN-LSTM, GA-ACO-BP, and APSO-CNN-Bi-LSTM, to predict the air quality in Xi’an.
Figure 2 illustrates the study’s flowchart, detailing the process and comparative analysis of these models.
3.1. CNN
CNN has powerful grid-data processing capabilities and is widely used in image analysis [27]. The input layer, convolutional layer, pooling layer, fully connected layer, and output layer form the main structure of the CNN. Information from the input layer passes through the convolutional and pooling layers for feature transformation and extraction. The fully connected layer further integrates the local features extracted by the convolutional and pooling layers and maps them to the output signal through the output layer.
Figure 3 depicts the structure of a CNN, illustrating the flow of information from the input layer through the convolutional and pooling layers to the fully connected and output layers.
Indeed, the convolutional layer is the cornerstone of a Convolutional Neural Network (CNN), distinguished by its unique capability to automatically detect and extract features from the input data, primarily images. This layer employs small, learnable filters, commonly called convolutional kernels, which slide across the entire input matrix to perform a convolution operation. Unlike traditional fully connected layers where each neuron is connected to every neuron in the previous layer, the convolutional layer maintains a local connectivity pattern, significantly reducing the number of parameters and enabling the network to learn spatial hierarchies of features. Each convolutional kernel, through a process termed ‘convolution’, performs element-wise multiplication between its weights and a portion of the input (referred to as the receptive field), followed by a summation operation to produce a single value in the feature map. This operation is iteratively applied across the entire input, generating a two-dimensional (or higher, depending on the dimensionality of the input and the number of filters) feature map that encodes various aspects of the input data. The mathematical formula representing the computation at each element (pixel) in a feature map can be generically described as follows:
$$y_{i,j} = f\Big(\sum_{c=1}^{C}\sum_{m}\sum_{n} w_{c,m,n}\, x_{c,\,i+m,\,j+n} + b\Big)$$

where $y_{i,j}$ is the output value of the feature map at position $(i,j)$, $x_{c,\,i+m,\,j+n}$ is the value of the input matrix at the corresponding position in channel $c$, $w_{c,m,n}$ is the weight of the convolution kernel, $b$ is the bias of the convolution kernel, $f(\cdot)$ is the selected activation function, and $C$ is the number of channels. The input matrix is usually convolved using multiple kernels. Each convolutional kernel captures and encodes specific spatial feature information by operating on the input data matrix, generating a high-dimensional feature map enriched with that information. This high-dimensional feature representation is then processed by a pooling layer, which selectively retains critical information by down-sampling, reducing computational complexity and dimensional redundancy in subsequent processing and thus improving computational efficiency.
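To make the operation concrete, the sliding-window computation described above can be sketched in NumPy. This is a minimal illustration with a single kernel, unit stride, and no padding; the function and variable names are ours, not from the paper:

```python
import numpy as np

def conv2d_single(x, kernel, bias=0.0, activation=np.tanh):
    """Valid convolution of a multi-channel input with one kernel.

    x:      (C, H, W) input with C channels
    kernel: (C, kh, kw) learnable weights, one slice per channel
    Returns the (H - kh + 1, W - kw + 1) feature map f(sum + bias).
    """
    C, H, W = x.shape
    _, kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            receptive_field = x[:, i:i + kh, j:j + kw]
            # element-wise multiplication and summation over all channels
            out[i, j] = np.sum(receptive_field * kernel) + bias
    return activation(out)
```

A 2-channel 4x4 input convolved with a 2x2 kernel yields a 3x3 feature map; stacking the maps produced by several kernels gives the multi-channel output that is passed on to the pooling layer.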
Since CNNs excel at feature extraction from grid data, in this study each category of the environmental variables covering the monitored stations was expanded into a high-dimensional construct, specifically a matrix of $m$ rows and $n$ columns, and the categories of data (including meteorological parameters, air quality indicators, and spatial and temporal attributes) were input as multiple channels. To deeply explore the time series' inherent temporal dynamics and long-term dependence, we incorporated a temporal distribution layer, which applies a hierarchical transformation to consecutive time slices to refine the historical timing information of the input series. The temporal distribution layer was therefore selected to encapsulate data from the past $T$ hours, so the input of the CNN constituted a 4-D array of shape $(T, m, n, C)$, where $C$ is the number of variable categories (Figure 4). After iterative refinement by convolutional and pooling layers, the extracted spatiotemporal features were integrated and transformed into a highly condensed 1-D feature vector, removing redundant information in the spatial and temporal dimensions. Through this series of operations, the model extrapolates the predicted AQI values for the next $h$ hours at the different monitoring stations.
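As an illustration of this input construction, the following sketch builds windowed CNN inputs and the corresponding multi-hour-ahead targets from an hourly series. The shapes and the convention that the first channel holds the AQI are our assumptions for illustration; the actual grid layout in the paper may differ:

```python
import numpy as np

def make_windows(series, T, h):
    """Slice an hourly series into CNN inputs and prediction targets.

    series: (hours, m, n, C) grid-shaped observations per hour
    Returns X with shape (samples, T, m, n, C) and y with shape
    (samples, h, m, n) holding the first channel (e.g., AQI) to predict.
    """
    hours = series.shape[0]
    X, y = [], []
    for t in range(hours - T - h + 1):
        X.append(series[t:t + T])                  # past T hours of input
        y.append(series[t + T:t + T + h, ..., 0])  # next h hours, AQI channel
    return np.stack(X), np.stack(y)
```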
3.2. LSTM
To address the gradient issues encountered in RNN networks, this paper incorporates the Long Short-Term Memory (LSTM) model. The fundamental unit of the LSTM comprises one or more cells alongside three adaptive gates: the input, forget, and output gates (illustrated in Figure 5). These gates facilitate the retention and regulation of information within the network. The gates employ a sigmoid activation function for decision-making, whereas the input and cell states are typically transformed by the tanh function [28,29,30].
As depicted in Figure 5, given an input sequence $x = (x_1, x_2, \ldots, x_T)$ and a corresponding sequence of hidden-layer states $h = (h_1, h_2, \ldots, h_T)$, the computational formulation is expressed as follows:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
$$h_t = o_t \odot \tanh(c_t)$$

Here $\odot$ represents the Hadamard product; $f_t$, $i_t$, and $o_t$ represent the outputs of the forget, input, and output gates; $c_t$ denotes the cell state vector; $h_{t-1}$ signifies the output of the hidden-layer unit at the preceding time step; and $\tilde{c}_t$ represents the updated candidate state of the memory cell. $W_f, W_i, W_o, W_c$ and $U_f, U_i, U_o, U_c$ are the corresponding weight matrices, $b_f, b_i, b_o, b_c$ are bias vectors, and $\sigma$ is the sigmoid activation function.
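A single LSTM time step following the gate formulation above can be written directly in NumPy. This is a didactic sketch; the weight dictionary layout is our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step: gates use sigmoid, candidate and output use tanh."""
    f_t = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])  # forget gate
    i_t = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])  # input gate
    o_t = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])  # output gate
    c_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])
    c_t = f_t * c_prev + i_t * c_tilde   # Hadamard-product state update
    h_t = o_t * np.tanh(c_t)             # hidden-state output
    return h_t, c_t
```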
3.3. Bi-LSTM
To address a limitation inherent in traditional LSTM cells, namely their unidirectional flow of information, which confines contextual understanding to past events without tapping into future context, Schuster and Paliwal [31] introduced the Bidirectional Recurrent Neural Network (BRNN). This architecture integrates a pair of hidden layers that operate in tandem but in opposing temporal orientations, collaboratively yielding a more comprehensive output. The essence of the BRNN lies in breaking free from strictly sequential processing by simultaneously considering both preceding and succeeding inputs within its predictive framework. It capitalizes on two complementary views: one scanning the sequence from start to end (the forward pass), the other reversing this trajectory (the backward pass). Integrating these dual perspectives significantly enriches the model's capacity for contextual understanding and nuanced prediction [32,33,34]. Figure 6 presents a schematic of the Bidirectional LSTM (Bi-LSTM) model, illustrating how these counter-propagating LSTM layers converge to enhance feature extraction and sequence analysis. This framework strengthens the model's ability to capture long-range dependencies, improving its performance in tasks that demand deep contextual sensitivity, such as natural language processing, speech recognition, and time-series prediction, and increasing its applicability across various real-world domains.
In Figure 6, the output of the model at time $t$ from the forward sequence, denoted $\overrightarrow{h}_t$, and from the reverse sequence, denoted $\overleftarrow{h}_t$, are computed as follows:

$$\overrightarrow{h}_t = f(W_1 x_t + W_2 \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}})$$
$$\overleftarrow{h}_t = f(W_3 x_t + W_4 \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}})$$
$$y_t = W_5 \overrightarrow{h}_t + W_6 \overleftarrow{h}_t + b_y$$

In this model, $W_1, \ldots, W_6$ represent the weight matrices, while $f$ denotes a smooth bounded function. $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ correspond to the forward and reverse outputs at time $t$, respectively. Additionally, $b_{\overrightarrow{h}}$, $b_{\overleftarrow{h}}$, and $b_y$ are bias vectors. Finally, $y_t$ constitutes the model's ultimate output at time $t$.
Considering Bi-LSTM's aptitude for time-series analysis, this study transformed the past $T$ hours' data from the $S$ monitored stations, amounting to $S \times V$ variables per hour, into a 2-D array of shape $(T, S \times V)$, effectively condensing the information into two dimensions. The fully connected output layers generate the AQI predictions for the $S$ stations over the next $h$ hours.
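The bidirectional scheme itself is independent of the cell type. A minimal sketch (our own illustration) runs any recurrent step function forward and backward over the sequence and concatenates the two hidden states at each time step:

```python
import numpy as np

def bidirectional(seq, step_fn, h0, c0):
    """Run step_fn over seq in both directions, concatenating hidden states.

    seq:     list of per-time-step input vectors
    step_fn: (x_t, h_prev, c_prev) -> (h_t, c_t), e.g. an LSTM step
    Returns a list of concatenated [forward; backward] states per step.
    """
    fwd, h, c = [], h0, c0
    for x in seq:                      # forward pass: t = 1 .. T
        h, c = step_fn(x, h, c)
        fwd.append(h)
    bwd, h, c = [], h0, c0
    for x in reversed(seq):            # backward pass: t = T .. 1
        h, c = step_fn(x, h, c)
        bwd.append(h)
    bwd.reverse()                      # realign with forward time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

With an LSTM step function plugged in, this is exactly the structure sketched in Figure 6: each output sees both the past and the future of the sequence.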
3.4. CNN-Bi-LSTM
The CNN-Bi-LSTM amalgamates the strengths of both CNN and Bi-LSTM, yielding remarkable results, particularly in natural language processing and classification tasks [35]. In this study, the CNN-Bi-LSTM model employed a streamlined process. It ingested the previous $T$ hours' data, comprising $V$ variable categories from $S$ stations, in the same format as a typical CNN input, i.e., $(T, m, n, C)$. The input, convolution, and pooling layers performed initial feature extraction; the extracted features were then vectorized into a 1-D array and fed into the Bi-LSTM layer, which examines their sequential patterns, effectively merging spatial and temporal feature analysis. Finally, the air quality index for the next $h$ hours at the $S$ stations was obtained through the fully connected layer and output layer.
Figure 7 illustrates the structure of the CNN-Bi-LSTM model.
Figure 7 also highlights the pre-processing applied to the input dataset, underscoring the importance of data normalization before model ingestion; this ensures that all features contribute equally to the learning process without bias induced by differing scales. The Bi-LSTM component, depicted in its cyclic architecture, is an evolution of the LSTM that retains its predecessor's strength in tackling the vanishing gradient problem and modeling long-term dependencies, while augmenting these qualities by examining the sequence in both temporal directions. This dual-directional mechanism extracts a broader spectrum of temporal features, enriching the model's interpretive depth and improving resilience to anomalous data points and outliers. The Dropout layer in this architecture serves a strategic purpose: by randomly deactivating a fraction of neurons during training, it discourages the model from relying on any specific subset of inputs, fostering a more distributed learning dynamic. This regularization enhances generalization and mitigates the risk of overfitting, where a model becomes overly tailored to the training data and fails to generalize to unseen instances. The judicious use of Dropout thus strengthens the model's adaptability and predictive accuracy across diverse datasets and real-world scenarios.
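The inverted-dropout mechanism described here can be sketched as follows (the rate and shapes are illustrative, not the paper's settings):

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of activations in training,
    rescaling survivors by 1/(1 - rate) so the expected activation is
    unchanged. At inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate   # keep with probability 1 - rate
    return x * mask / (1.0 - rate)
```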
3.5. PSO Based on Adaptive Inertia Weight
Particle Swarm Optimization (PSO), a foundational method in computational intelligence, was conceived by Kennedy and Eberhart in 1995 [36], inspired by the coordination and foraging dynamics observed in bird flocking. Characterized by its simplicity and broad applicability, PSO can navigate complex optimization landscapes without domain-specific prerequisites, a trait that has fueled significant research interest and propelled its application across many disciplines and problem classes [37]. At the heart of the algorithm lies a swarm of particles, each embodying a candidate solution within the search space. Particles are assessed by their 'fitness', a measure determined by the objective function being optimized. As the algorithm iterates, particles recalibrate their velocities and positions through a blend of individual learning from their own best-known positions and collective knowledge of the swarm's globally best position. At iteration $t$, the velocity and position updating equations for the $d$-th dimension of the $i$-th particle are articulated as follows:

$$v_{id}^{t+1} = w\, v_{id}^{t} + c_1 r_1 \big(p_{id} - x_{id}^{t}\big) + c_2 r_2 \big(g_{d} - x_{id}^{t}\big)$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}$$
In the PSO framework, $v_{id}^{t}$ and $x_{id}^{t}$ respectively represent the velocity and position of particle $i$ in dimension $d$. The inertia weight $w$ influences the particle's momentum and its ability to explore the search space. The coefficients $c_1$ and $c_2$, known as acceleration factors, embody the cognitive and social components, guiding the particle toward its personal best position $p_{id}$ and the swarm's global best position $g_{d}$, respectively. The variables $r_1$ and $r_2$ are two independent random numbers uniformly distributed in the interval [0, 1]. These stochastic elements introduce randomness into the algorithm's evolutionary process; this inherent uncertainty allows the algorithm to explore the search space more thoroughly and enhances its potential to find the optimal solution.
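One iteration of these updates, vectorized over the whole swarm, looks like this in NumPy (our sketch; c1 = c2 = 2.0 is a common default, not necessarily the paper's setting):

```python
import numpy as np

def pso_update(x, v, pbest, gbest, w, c1=2.0, c2=2.0, rng=None):
    """One PSO step for positions x and velocities v, both (particles, dims).

    pbest: per-particle best positions, same shape as x
    gbest: swarm-wide best position, shape (dims,)
    w:     inertia weight for this iteration
    """
    rng = rng or np.random.default_rng()
    r1 = rng.random(x.shape)   # cognitive (personal-best) randomness
    r2 = rng.random(x.shape)   # social (global-best) randomness
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```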
In PSO, the inertia weight $w$ is pivotal for modulating the influence of a particle's past velocity on its present velocity, and it is critical for balancing the algorithm's local and global search capabilities. A higher $w$ value strengthens global exploration, whereas a lower value enhances local exploitation [38].
The concept of an adaptive inertia weight entails initiating the algorithm with a higher inertia weight to bolster the global search, enabling particles to traverse a more extensive search space [39,40,41,42]. As the iterations progress, the inertia weight is methodically reduced to enhance the local search, allowing particles to refine their solutions. This dynamic adjustment of weights during the algorithm's operation is designed to optimize both the convergence rate and the solution quality. By balancing the varying requirements of exploration and exploitation at different phases, the algorithm maintains a continual equilibrium between local and global searches, which is crucial for identifying the global optimum.
The cosine-decreasing inertia weight strategy introduces more nuanced control over the exploration–exploitation balance throughout the PSO optimization process [43], adapting better to the inherently nonlinear nature of complex optimization landscapes. Unlike the conventional linear decrease, where the inertia weight reduces monotonically in a predetermined fashion, this strategy offers a smoother transition that aligns more closely with the varying needs for exploration and intensification at different optimization stages [44]. The inertia weight is calculated as follows:

$$w = w_{\min} + \frac{w_{\max} - w_{\min}}{2}\left(1 + \cos\frac{\pi t}{T_{\max}}\right)$$

where $t$ is the current iteration number, $w_{\min}$ and $w_{\max}$ are the minimum and maximum values of the weight $w$, $T_{\max}$ refers to the maximum permissible iteration count, and $\pi$ is the mathematical constant. In this study, $T_{\max}$ was 1500, the acceleration factors $c_1$ and $c_2$ were fixed, the number of particles was 4, the search dimension was 3, and an Early Stopping mechanism (Patience = 29) was added to save computing resources and prevent over-fitting. The APSO-CNN-Bi-LSTM algorithm is shown in Figure 8.
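The cosine-decreasing weight described above can be made concrete with a small helper. The bounds w_min = 0.4 and w_max = 0.9 are common choices, assumed here since the paper's exact values are not restated:

```python
import math

def cosine_inertia(t, t_max, w_min=0.4, w_max=0.9):
    """Cosine-decreasing inertia weight: w_max at t = 0, w_min at t = t_max."""
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t / t_max))
```

Early iterations thus favor global exploration with a weight near w_max, and the weight glides smoothly toward w_min as the run approaches t_max.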
The outlined implementation procedure for integrating the Adaptive Particle Swarm Optimization (APSO) algorithm with a predictive model for air quality prediction embodies a systematic and iterative approach aimed at refining model parameters and enhancing prediction accuracy. Summarizing the steps provided:
- (1).
Initialization: Commence by initializing the predictive model and the APSO algorithm. This includes setting up the model structure and parameters and defining the swarm characteristics for PSO, such as the number of particles and their initial positions in the solution space.
- (2).
Objective Function Definition: Define the objective function that the PSO algorithm will aim to minimize or maximize. In this case, the Mean Absolute Error (MAE) is chosen as the fitness function $F$, reflecting the error between predicted and actual air quality measurements. A fitness threshold $\varepsilon$ is established to determine whether a solution is acceptable.
- (3).
Initial Fitness Calculation: Introduce the initial parameter set $\theta_0$ into the model and compute its fitness value $F(\theta_0)$. If this initial fitness surpasses the predefined threshold $\varepsilon$, record the parameters and their corresponding fitness as a candidate solution; otherwise, proceed to the optimization stage.
- (4).
APSO Optimization: Initialize the swarm by assigning random positions and velocities to each particle, each representing a unique configuration of model parameters. Train the model using each particle's position (parameter set), evaluate its fitness using the MAE, and retain the particle's state $(\theta, F(\theta))$.
- (5).
Particle Update: Iterate through generations, adjusting each particle's position and velocity using the adaptive inertia weight. This dynamic adjustment balances exploration and exploitation to progressively steer the swarm toward better solutions. Recalculate fitness after each update.
- (6).
Termination and Selection: Repeat the parameter update process until the maximum number of iterations is reached. Upon termination, review all recorded $(\theta, F(\theta))$ pairs and identify the set with the best fitness score. Employ this optimal parameter configuration in the model for the final air quality prediction task.
By systematically refining model parameters through APSO, this methodology seeks to optimize the model’s predictive power, thereby enhancing the accuracy and reliability of air quality forecasts.
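The six steps above can be condensed into a generic optimization loop. The sketch below is our own: it minimizes an arbitrary fitness function rather than training the actual network, and the inertia-weight bounds and acceleration factors are assumed, but it shows how the pieces, including the early-stopping counter, fit together:

```python
import numpy as np

def apso_minimize(fitness, dim, n_particles=4, iters=200,
                  bounds=(0.0, 1.0), patience=29, seed=0):
    """Adaptive-inertia-weight PSO minimizing `fitness` over [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()
    g_f = float(pbest_f.min())
    stall = 0
    for t in range(iters):
        # cosine-decreasing inertia weight, 0.9 -> 0.4 (assumed bounds)
        w = 0.4 + 0.25 * (1.0 + np.cos(np.pi * t / iters))
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + 2.0 * r1 * (pbest - x) + 2.0 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f                  # update personal bests
        pbest[better], pbest_f[better] = x[better], f[better]
        if pbest_f.min() < g_f:               # update global best
            g = pbest[np.argmin(pbest_f)].copy()
            g_f = float(pbest_f.min())
            stall = 0
        else:
            stall += 1
            if stall > patience:              # early stopping
                break
    return g, g_f
```

In the paper's setting, `fitness` would train the CNN-Bi-LSTM with the candidate hyperparameters and return the validation MAE.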
3.6. Model Performance Evaluation Indicators
To rigorously assess the generalization capacity of the proposed model, the dataset was partitioned into two subsets: a training set comprising the initial 80% of the data, used for model learning and parameter tuning, and a validation set comprising the remaining 20%, used to gauge the model's performance on unseen data. This division adheres to standard machine-learning practice and ensures the model's predictive efficacy can be objectively evaluated. Consistent with many comparative predictive models, this study adopted the Mean Absolute Error (MAE) as the loss function. MAE is a prevalent metric for quantifying the discrepancy between the model's predicted outputs and the actual observations; unlike squared-error losses, it penalizes all deviations linearly, making it comparatively robust to outliers. The MAE is formally defined as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ represents the actual value, $\hat{y}_i$ denotes the predicted value generated by the model, and $n$ represents the total number of data points.
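For reference, the MAE loss is a one-liner in NumPy (function name ours):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))
```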
4. Experiment Analysis
In this experiment, the environment used was PyCharm 2023.2.1, Python 3.8, TensorFlow 2.12.0, CUDA 11.4, and an RTX 3090 GPU. Four real air quality datasets were employed to evaluate the model's performance. The following questions were validated and answered: (1) whether the APSO-CNN-Bi-LSTM model can achieve smaller prediction errors than previous models; (2) whether the APSO-CNN-Bi-LSTM model has better convergence; (3) whether the adaptive inertia weight PSO algorithm and the CNN module in the APSO-CNN-Bi-LSTM model each contribute to the model's predictions.
4.1. Datasets and Pre-Processing
The first step toward accurate air quality prediction is obtaining high-quality data. In this paper, four real air quality datasets for Xi'an were used to evaluate the performance and convergence speed of the APSO-CNN-Bi-LSTM model. To predict air quality effectively, the collected raw data must first be pre-processed to remove irrelevant, redundant, noisy, and unreliable records, preventing misleading results and yielding more accurate predictions.
The datasets were first normalized using min–max scaling to eliminate the effect of differing scales on the prediction results [45]. The specific mathematical formula is as follows:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the sample datum to be normalized, $x_{\min}$ and $x_{\max}$ denote the minimum and maximum values observed in the corresponding data column, and $x'$ is the result obtained after normalization.
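Applied column-wise, the min-max scaling reads as follows (a sketch; it assumes each column has distinct minimum and maximum values):

```python
import numpy as np

def min_max_normalize(data):
    """Column-wise min-max scaling of a 2-D array into [0, 1]."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    return (data - col_min) / (col_max - col_min)
```

In practice the training-set minima and maxima should be reused to scale the held-out data, so no information leaks from the validation 20%.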
Table 3 presents an overview of the four datasets, with detailed dataset information as follows.
In this section, we compare the APSO-CNN-Bi-LSTM model introduced in this paper against a set of alternative air quality prediction algorithms. The precision of a model is predominantly quantified by the discrepancy between the actual observations and the model's predictions. To appraise the efficacy of the model delineated herein, we employed specific metrics to ascertain predictive accuracy [46]. The calculation formulae used for this purpose are outlined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ represents the actual value, $\hat{y}_i$ denotes the predicted value generated by the model, and $n$ represents the total number of data points.
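The RMSE, which unlike the MAE penalizes large deviations quadratically, can be computed as follows (function name ours):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between observations and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```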
4.2. Effect of Particle Number on Experimental Results
The number of particles is a crucial factor in the Particle Swarm Optimization (PSO) algorithm, directly influencing its optimization performance and convergence speed. Increasing the number of particles allows for a more comprehensive coverage of the search space, enhancing the likelihood of finding the global optimal solution and minimizing the risk of falling into local optima. Conversely, having too few particles can lead to insufficient search space coverage, which might result in premature convergence to suboptimal solutions. An appropriate number of particles facilitates faster convergence because particles can quickly share information and cluster around potential solution areas. However, excessive particles can decelerate convergence due to the increased computational load and the tendency for over-exploration.
Table 4 demonstrates the average AQI prediction performance evaluation metrics for the next 2 h using the APSO-CNN-Bi-LSTM model with varying particle numbers. The results indicate that while the test error does not vary significantly with different particle numbers, optimal performance was achieved when the particle number was set to four. Specifically, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) reach their minimum values of 49.6534 and 28.5812, with a model training time of 32.3 min.
Choosing an appropriate number of particles in PSO is essential for balancing comprehensive search space coverage and computational efficiency. The results from
Table 4 illustrate that a particle number of four provides an optimal balance for the APSO-CNN-Bi-LSTM model, yielding minimal prediction errors and reasonable training time.
4.3. Forecast Error Analysis
Table 5 lists the prediction errors of the five models compared in this paper (PSO-SVR, BPTT, CNN-LSTM, GA-ACO-BP, and APSO-CNN-Bi-LSTM) on the four datasets. From Table 5, it can be seen that, in terms of RMSE and MAE, the proposed model almost always achieved the best results. Ablation experiments over three variants (APSO-Bi-LSTM, CNN-Bi-LSTM, and APSO-CNN-Bi-LSTM) also showed that the PSO algorithm with adaptive inertia weights and the CNN module both contribute to the model's prediction accuracy. Compared with the APSO-Bi-LSTM variant, which uses only adaptive-inertia-weight optimization, the APSO-CNN-Bi-LSTM model yielded improvements of 3.52%, 3.83%, 3.81%, and 0.9% across the four datasets. Furthermore, compared with the CNN-Bi-LSTM model, which uses only the CNN module, the APSO-CNN-Bi-LSTM model exhibited enhancements of 7.32%, 2.51%, 0.23%, and 0.64% on the same datasets, underscoring the complementary roles of the CNN and APSO modules in boosting prediction performance. Moreover, the APSO-CNN-Bi-LSTM model showcased superior performance compared with several other prediction models, reaffirming its effectiveness in air quality prediction tasks.
As is widely recognized, predicting air quality becomes increasingly challenging as the prediction time step extends. To further evaluate the short-term AQI prediction capability of the APSO-CNN-Bi-LSTM model, we predicted the AQI for the next 1 to 12 h. The RMSE and MAE change curves, illustrated in
Figure 9, provide a comparative analysis of the model’s prediction performance over these intervals. The data presented in
Figure 9 represent the average performance across the four datasets.
As shown in
Figure 9, the predictive ability of all models decreases as the prediction time step increases. It is evident that the APSO-CNN-Bi-LSTM model consistently outperforms the other deep learning models on both performance evaluation indicators, RMSE and MAE. For prediction time steps shorter than 6 h, the differences between the models are insignificant. However, when the prediction time step exceeds 6 h, APSO-CNN-Bi-LSTM clearly outperforms the other models. This indicates that APSO-CNN-Bi-LSTM is more adept at handling complex data as the prediction problem becomes more challenging.
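Both indicators can be computed directly from the observed and predicted AQI series. As a reference only (this is the standard definition of the metrics, not the paper's evaluation code), a minimal Python sketch:

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: penalizes large deviations quadratically."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Illustrative AQI readings (hypothetical values, not from the datasets)
y_true = [50, 60, 55]
y_pred = [48, 63, 54]
print(mae(y_true, y_pred))   # → 2.0
print(rmse(y_true, y_pred))
```

Because RMSE squares the residuals before averaging, it is always at least as large as MAE on the same series, which is why the two curves in Figure 9 separate more visibly when occasional large errors occur.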
The CNN module effectively captures spatial dependencies in the data, while the Bi-LSTM module leverages past and future information for predictions. Additionally, the APSO optimization algorithm further refines model hyperparameters, minimizing the need for manual intervention. This combination of techniques enhances the model’s ability to manage and predict complex data scenarios.
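As a rough illustration of the data flow just described (the layer sizes, window length, and feature count below are hypothetical placeholders, not the paper's tuned hyperparameters), the CNN-to-Bi-LSTM pipeline could be sketched in PyTorch as:

```python
import torch
import torch.nn as nn

class CNNBiLSTM(nn.Module):
    """Sketch of a CNN + Bi-LSTM pipeline: convolution extracts local
    feature patterns, a bidirectional LSTM reads the resulting sequence
    in both directions, and a linear head emits one AQI value."""
    def __init__(self, n_features=6, hidden=32):
        super().__init__()
        # 1-D convolution over the time axis
        self.conv = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool1d(2)
        # Bidirectional LSTM sees past and future context of each step
        self.bilstm = nn.LSTM(16, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # forward + backward states

    def forward(self, x):                      # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))       # -> (batch, 16, time)
        z = self.pool(torch.relu(z))           # downsample the time axis
        out, _ = self.bilstm(z.transpose(1, 2))
        return self.head(out[:, -1])           # last step -> scalar AQI

model = CNNBiLSTM()
pred = model(torch.randn(4, 24, 6))  # 4 samples, 24 h window, 6 features
```

In the full model, the APSO stage would search over choices such as the hidden size and learning rate instead of fixing them by hand.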
4.4. Convergence Analysis
A predictive model’s efficiency is a critical factor in practical engineering scenarios. The APSO-CNN-Bi-LSTM model demonstrates superior prediction accuracy together with enhanced convergence, requiring fewer iterations to reach optimal performance, as shown in
Figure 10—a significant advantage. This quicker convergence implies that the model training phase can be completed in a shorter timeframe, reducing computational resources and energy consumption. This leads to cost savings, a crucial aspect for businesses and organizations implementing such systems at scale.
Figure 11 further reinforces this advantage by showing that the proposed model requires less training time than the alternative models across all four datasets. This faster training accelerates initial deployment and facilitates more efficient iterative upgrades in response to new data or changing environmental conditions. In essence, the ability of the APSO-CNN-Bi-LSTM model to expedite both the initial training and future updates is a testament to its practical viability. It underscores how advances in algorithm design, combining adaptive optimization strategies with sophisticated neural network architectures, can translate into tangible benefits beyond accuracy improvements, addressing real-world constraints such as time-to-market and operational expenses. This efficiency is particularly valuable in fields like environmental monitoring, where timely insights and adaptive responses to air quality changes can have immediate public health and economic implications. The prediction time step for this part is set to 1 h.
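The adaptive-inertia-weight idea behind APSO can be illustrated with a minimal PSO sketch. A linearly decreasing inertia weight is assumed here purely for illustration; the paper's exact adaptation rule, swarm size, and iteration budget may differ, and the toy objective stands in for the model's validation loss:

```python
import random

def apso_minimize(f, dim, n_particles=20, iters=100,
                  w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, bounds=(-5.0, 5.0)):
    """PSO with an inertia weight that decays from w_max to w_min:
    a large w early favours global exploration, a small w late
    favours local exploitation around the best-known region."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g_idx = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g_idx][:], pbest_val[g_idx]
    for t in range(iters):
        w = w_max - (w_max - w_min) * t / (iters - 1)  # linear decay
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(0)
best, best_val = apso_minimize(lambda x: sum(v * v for v in x), dim=2)
```

In the hyperparameter-tuning setting, each particle position would encode a candidate configuration (e.g. hidden units, learning rate) and `f` would train and evaluate the network, which is why fewer iterations translate directly into the training-time savings discussed above.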
4.5. Analyzing AQI from Spatiotemporal Distribution Characteristics
The detailed analysis of the AQI trends in Xi’an City from 2019 to 2020, as depicted in
Figure 12, highlights the marked seasonal fluctuations and their underlying causes, providing crucial insights for environmental management and policymaking. The “U”-shaped pattern observed annually, with AQI peaking in December and reaching its nadir in August, underscores the strong influence of meteorological conditions and human activities on air pollution levels. The summer months benefit from favorable weather conditions for pollutant dispersion—enhanced solar radiation, increased atmospheric instability, and more frequent precipitation. These meteorological phenomena act in concert to cleanse the air, leading to a more consistent and healthier air quality environment. Conversely, the winter season experiences the worst air quality, characterized by a high AQI and substantial variability. The cold temperatures facilitate the formation of temperature inversions, a meteorological phenomenon that traps pollutants near the ground.
Additionally, the heightened demand for heating, often reliant on fossil fuels, and occasional festive fireworks contribute to a surge in pollutant emissions, exacerbating the air pollution problem. Spring and autumn present their own challenges. Springtime sees an increase in dust storms, introducing additional particulate matter into the atmosphere. Autumn, while generally more stable in terms of weather, witnesses increased biomass burning from agricultural practices, which releases pollutants such as fine particulate matter and volatile organic compounds. Understanding these seasonal dynamics is essential for devising targeted interventions to mitigate air pollution. For instance, measures to curb emissions from heating sources and regulate fireworks usage during winter could be prioritized. Similarly, efforts to control dust storms in the spring and manage agricultural burning in autumn would be instrumental in maintaining better air quality throughout the year.
The APSO-CNN-Bi-LSTM model, with its demonstrated capability for accurate and efficient prediction, can be a powerful tool in anticipating these seasonal shifts, enabling authorities to proactively implement strategies to mitigate pollution peaks and safeguard public health. By integrating such predictive analytics into policy planning, cities like Xi’an can work toward more sustainable and resilient urban environments.
As illustrated in
Figure 1 and the subsequent discussion, Xi’an’s geographical and meteorological context is crucial in shaping its air quality dynamics. The city’s unique topography significantly influences pollutant dispersion and accumulation patterns in the Weihe River Basin. The Qinling Mountains to the south act as a natural barrier, hindering the northwesterly winds that dominate during fall and winter. These winds, weakened upon encountering the mountain range, fail to effectively ventilate and cleanse the air over Xi’an, leading to higher pollutant concentrations. Furthermore, the mountains also obstruct the influx of moist air from southern regions, exacerbating dry conditions that favor pollutant buildup, especially during dry seasons. The basin-like configuration of the Weihe River Valley exacerbates this situation, trapping pollutants and further intensifying their concentration within the city. The region’s semi-arid climate, characterized by harsh, cold winters and sweltering summers, creates atmospherically stable conditions during the colder months. Lower wind speeds and atmospheric stability hamper the vertical and horizontal dispersion of pollutants, creating a conducive environment for air pollution episodes.

Urbanization and energy consumption patterns also contribute to the spatial variation in air quality within Xi’an. Analysis of the four datasets reveals that central urban areas exhibit higher AQI values than the outskirts, indicating a higher pollution load. The AQI progressively declines as one moves away from the densely populated and industrially active city center, highlighting a clear urban–rural gradient in air quality. This spatial and temporal heterogeneity underscores the complexity of managing air pollution in Xi’an. Strategies for mitigation must account for these geographical and meteorological peculiarities, along with the urban layout and energy consumption profiles.
Data-driven insights from the APSO-CNN-Bi-LSTM model can guide targeted interventions, such as optimizing urban planning, promoting cleaner energy alternatives, and implementing pollution control measures during high-risk periods. These measures will contribute to a more sustainable and livable urban environment.
4.6. Visualization and Analysis of Results
Visualizing the model’s prediction results against real-world data, as performed in
Figure 13 and
Figure 14 by comparing randomly selected periods from datasets 1036A and 1037A, demonstrates the proposed model’s performance. These figures showcase instances where the APSO-CNN-Bi-LSTM model excelled at accurately predicting sudden temporal variations in air quality, showing a closer alignment with the actual readings than the alternative models. This precision in capturing abrupt changes is fundamental in air quality prediction, since sudden spikes or drops in pollutant levels can have immediate implications for public health advisories and environmental management decisions. The small discrepancy between predicted and actual values underscores the model’s ability to learn and replicate the complex dynamics of air quality fluctuations, even during periods of instability. By accurately tracking these fluctuations, the APSO-CNN-Bi-LSTM model offers decision-makers a reliable tool to anticipate and respond to air quality events promptly. This heightened accuracy can support proactive measures to mitigate exposure risks, such as issuing timely alerts, adjusting industrial operations, or implementing temporary emission controls, thereby contributing to more effective environmental governance and public health protection. Overall, the graphical representations in
Figure 13 and
Figure 14 provide compelling empirical evidence of the model’s exceptional predictive capabilities, closely mirroring actual air quality measurements. These results underscore the model’s effectiveness and superiority over other approaches in addressing the complex challenges associated with predicting air quality dynamics. The visual alignment between predicted and actual values highlights the model’s robustness and reliability in real-world applications. The prediction time step for this part is set to 1 h.
4.7. Model Generalization Experiment
Model generalization verification is a crucial step when using deep learning models for air quality prediction tasks. Generalization refers to the model’s ability to predict unseen data. In air quality prediction, this means that the model must not only perform well on the training data but also accurately predict air quality conditions in new, unseen areas or periods. To verify the generalization capability of our model, we used dataset 1040A, collected from another monitoring station in Xi’an, as the data source. We visualized the predicted and actual values for generalization verification over three consecutive days. As shown in
Figure 15, this comparison allows us to observe how well the model’s predictions align with the real-world air quality data in an untrained environment. By closely examining these visualizations, we can assess the model’s performance and ability to generalize to new data. This approach is essential for ensuring that our model is not overfitting to the training data and can provide reliable predictions in diverse and previously unseen scenarios, a critical aspect of air quality management and planning.
Table 6 shows the parameter settings.
5. Summary
This study successfully integrates innovative methodologies from deep learning and optimization theory to tackle the intricate challenge of air quality prediction. The introduction of the APSO-CNN-Bi-LSTM model signifies a substantial advancement in environmental prediction capabilities. By synergistically combining the strengths of three key components—Convolutional Neural Networks (CNN) for spatial feature extraction, Bidirectional Long Short-Term Memory networks (Bi-LSTM) for capturing temporal dependencies, and Adaptive Particle Swarm Optimization (APSO) for efficient hyperparameter tuning—the model achieves a remarkable balance between exploration and exploitation. Experimental validations on real-world datasets affirm the model’s superiority in prediction accuracy, especially under sudden air quality shifts, where the model’s predictive power shines through with minimal deviation from actual measurements. This heightened accuracy is paramount for public health interventions and policy planning.
Moreover, the model’s accelerated convergence, which requires fewer iterations to attain optimal performance, underscores its practicality and efficiency. This research promotes the application of advanced computing techniques in environmental science. It provides a new approach to enhance our understanding and management of environmental challenges, ultimately contributing to a more sustainable and healthier living environment.
Future research can explore integrating more types of environmental data, such as satellite remote sensing data and social media data, with existing datasets. By incorporating these additional data sources, the predictive models can better understand the factors influencing air quality. Additionally, deeper neural network structures or novel machine learning algorithms, such as advanced neural networks or Transformer models, can improve prediction accuracy and model interpretability. These advancements will enhance the ability to predict air quality conditions more reliably and provide better environmental management and decision-making tools.