Enhancing Air Quality Prediction with an Adaptive PSO-Optimized CNN-Bi-LSTM Model

Zhu, Xuguang; Zou, Feifei; Li, Shanghai

doi:10.3390/app14135787

by Xuguang Zhu ¹, Feifei Zou ²,* and Shanghai Li ²

¹ College of Innovation and Practice, Liaoning Technical University, Fuxin 123008, China
² School of Software, Liaoning Technical University, Huludao 125105, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(13), 5787; https://doi.org/10.3390/app14135787

Submission received: 26 May 2024 / Revised: 27 June 2024 / Accepted: 28 June 2024 / Published: 2 July 2024

Abstract:

Effective air quality prediction models are crucial for the timely prevention and control of air pollution. However, previous models often fail to fully consider air quality’s temporal and spatial distribution characteristics. In this study, Xi’an City is used as the study area. Data from 1 January 2019 to 31 October 2020 are used as the training set, while data from 1 November 2020 to 31 December 2020 are used as the test set. This paper proposes a multi-time and multi-site air quality prediction model for Xi’an, leveraging a deep learning network model based on APSO-CNN-Bi-LSTM. The CNN model extracts the spatial features of the input data, the Bi-LSTM model extracts the time series features, and the PSO algorithm with adaptive inertia weight (APSO) optimizes the model’s hyperparameters. The results show that the model achieves the best results in terms of MAE and RMSE. Compared to the PSO-SVR, BPTT, CNN-LSTM, and GA-ACO-BP models, the MAE improved by 9.375%, 6.667%, 2.276%, and 4.975%, while the RMSE improved by 8.371%, 8.217%, 6.327%, and 5.293%. These significant improvements highlight the model’s accuracy and its promising application prospects.

Keywords:

Bi-LSTM; CNN; PSO algorithm; air quality prediction; AQI

1. Introduction

Air quality is intrinsically linked to human well-being and health. Air is a fundamental constituent of life and is pivotal for sustaining ecosystems [1]. The state of air quality transcends ecological health, directly impacting human life and public well-being. Severe air pollution inflicts substantial harm upon economies and livelihoods [2]. Recently, with the rapid development of urbanization and industrialization, air pollution has become increasingly severe. The World Health Organization (WHO) estimates that air quality issues account for approximately 7 million deaths globally each year [3,4,5]. Consequently, air quality has become a focal point of global concern. Investigating the determinants of air quality and devising accurate air quality prediction methods hold considerable relevance for regulating and remedying atmospheric pollution [6]. Presently, numerous urban centers in China have deployed environmental monitoring systems capable of yielding extensive data on particulate matter concentrations [7]. Nevertheless, air quality prediction remains a formidable challenge, given the substantial temporal fluctuations and many influencing factors [8]. In the current era of pollution prevention and control, alongside efforts toward carbon neutrality, achieving high-precision prediction of the Air Quality Index (AQI) has become a critical research focus. Accurate AQI prediction holds significant value for urban development and public health, providing essential insights for policy making and environmental management. This research contributes to enhancing urban air quality and helps safeguard public health by enabling timely and effective pollution mitigation strategies.

Air quality prediction models can be categorized as numerical or statistical. Numerical models predict air quality from atmospheric physical and chemical processes using meteorological and mathematical principles [9]. Since the 1950s, such models have developed rapidly, including the Community Multiscale Air Quality Modeling System [10] (CMAQ) and the Weather Research and Forecasting Model with Chemistry [11] (WRF-Chem). These models are overly dependent on specific values of airborne pollutant concentrations and require complex calculations. Even so, their prediction results have high uncertainty. Statistical modeling instead develops air quality prediction models from a data-driven perspective. Widely used statistical methods include the autoregressive integrated moving average [12] (ARIMA) and multiple linear regression [13] (MLR). In 1993, Boznar et al. [14] developed a Multi-Layer Perceptron (MLP)-based model to predict SO₂ concentration in air. Since then, machine learning models have developed rapidly and have been widely used in air quality prediction, such as artificial neural networks [15] (ANN), random forest [16] (RF), and support vector regression [17] (SVR). Combining machine learning methods with other methods to build hybrid models can improve model accuracy. Two such improved methods have been particularly successful. One is LSSVRGSA, proposed by Muhammad et al. [18] in 2020, which uses the Gravity Search Algorithm (GSA) to optimize the unknown parameters of the Least Squares Support Vector Regression (LSSVR) to reach the best model performance. The other was proposed by Zhu et al. [19] in 2017: the raw air quality data are decomposed by empirical mode decomposition (EMD), and the resulting series are input into SVR to obtain more accurate predictions. It is well known that air quality data have obvious spatial and temporal characteristics; however, these models have defects in extracting the spatial and temporal characteristics contained in the series data. Especially when dealing with a very large amount of data and requiring accurate prediction, the prediction performance of statistical models becomes unsatisfactory.

Amidst the escalation of air pollution and the swift advancement of deep learning technology, accurate air quality prediction has become increasingly imperative. Numerous predictive models grounded in deep learning have emerged in response. However, efficiently capturing spatiotemporal dependencies and improving the generalization ability of existing models remain significant challenges. For instance, Li Xiaoli and colleagues [20] introduced a predictive approach based on image quality analysis, although external factors and the quality of the captured images heavily influence its efficacy. For prediction purposes, Septiawa and associates [21] employed the BPTT algorithm, tailored for Recurrent Neural Network (RNN) architectures. However, this model is susceptible to the well-documented issues of vanishing and exploding gradients, particularly over extended sequences, which impedes its capability to learn long-term dependencies. Gilik et al. [22] developed a CNN-LSTM hybrid deep neural network that leverages spatiotemporal correlations to estimate concentrations of air pollutants across multiple urban locales. Despite its sophisticated framework, the model’s dependency on data, sensitivity to data quality, and extensive hyperparameter optimization pose significant challenges. Deng GQ et al. [23] proposed a PSO-SVR model enhanced by particle swarm optimization, addressing the common PSO drawbacks of suboptimal global search efficiency and local minimum entrapment. Du YH and team [24] built a GA-ACO-BP neural network, combining the merits of genetic algorithms and ant colony optimization in BP neural networks. While this approach boosts model performance, it also slows model convergence. This study proposes a CNN-Bi-LSTM model based on adaptive inertia weight PSO optimization (APSO-CNN-Bi-LSTM) to address these issues and further improve prediction accuracy.

The salient contributions of this study are twofold:

(1) The model effectively combines CNN and Bi-LSTM to extract complex spatiotemporal dependencies in historical air quality data and focus on the key factors, thus providing higher accuracy in air quality prediction.
(2) The introduction of adaptive inertia weights enables the PSO algorithm to generalize better across different datasets, implying that the model can be effectively applied to air quality prediction in other areas or periods.

The structure of the subsequent chapters in this paper is outlined as follows: Section 2 introduces the geographical environment, climatic characteristics, and data sources pertinent to the study area. Section 3 details the APSO-CNN-Bi-LSTM model, including its design and implementation. Section 4 presents the experimental setup and results, comparing the proposed model’s performance with that of other models and providing a thorough analysis of the findings. Finally, Section 5 offers concluding remarks, summarizing the study’s contributions and implications. This structured layout ensures a comprehensive understanding of the research context, methodology, experimental evaluation, and conclusions drawn from the study.

2. Introduction to the Study Area and Dataset Sources

2.1. Air Quality in Xi’an

Xi’an was chosen as the study area because it is an essential city in northwest China and belongs to the temperate semi-arid climate zone. It borders the Loess Plateau to the east and the Qinling Mountains to the west, with high elevations on all sides and low in the center. Therefore, this geographical location makes Xi’an susceptible to cold air masses in winter, creating meteorological conditions unfavorable to the dispersion of pollutants. At the same time, the Qinling Mountains, while blocking some of the intrusion of sandstorms to a certain extent, may also act as a barrier to the accumulation of pollutants under certain meteorological conditions. In summer, humid airflow from the south can bring better purification and improve air quality. According to the annual statistical report of Xi’an City, in 2019 and 2020, there were 297 days with significant air pollution levels, and the primary pollutants were PM₂.₅, O₃, and PM₁₀. These pollutants pose health risks to residents and impact the city’s ecological balance and economic sustainability.

To quickly and accurately assess air quality, it is necessary to adopt scientific evaluation methods and reasonable air pollution evaluation indicators to qualitatively or comprehensively characterize the current air quality situation [25]. Adopting the Air Quality Index (AQI) as a primary assessment tool underscores the necessity for a comprehensive and standardized approach to understanding air quality conditions. The AQI offers a holistic picture of air pollution levels by consolidating data on six key pollutants (NO₂, CO, O₃, SO₂, PM₁₀, and PM₂.₅). The AQI quantifies the individual pollutant concentrations and translates these concentrations into a single, easily understandable index, categorizing air quality into several levels ranging from good to hazardous. This simplification facilitates communication with the public, policymakers, and researchers, allowing for quick assessments and appropriate responses. Hourly AQI data from four monitoring stations in Xi’an City over two years (2019–2020) were acquired from the China Environmental Testing Center (CETC, http://www.cnemc.cn/sssj/, accessed 6 May 2024). Such granular data are vital for precise modeling and prediction exercises, as they capture the dynamic nature of air pollution and its variation across different locations and times. Table 1 and Figure 1 show the information on the four air quality monitoring stations and the study area.

2.2. Input Variables

Air quality is influenced by various factors, including pollution emissions, meteorological conditions, geographical location, and socio-economic factors. Historical concentrations of air pollutants can serve as valuable references for future air quality predictions. Meteorological indicators are reported to be essential input variables for air quality prediction models. Geographical location has a significant influence on climatic factors as well as wind speed and direction. The city’s industrialization level and energy structure also affect air quality to some degree. Under the combined effect of these factors, the air quality prediction task has strong spatiotemporal characteristics. Therefore, we selected the 6 h air quality, spatiotemporal, and meteorological data as input variables to construct the AQI prediction model (shown in Table 2). The temporal resolution for all data types is set to hourly to capture the dynamic nature of air quality changes accurately.

Meteorological datasets were sourced from the China National Meteorological Science Data Center (http://data.cma.cn/site/index.html, accessed 6 May 2024) [26]. These datasets include rainfall quantity, air pressure, dew points, wind speed, and direction. The air quality datasets were provided by the China Environmental Testing Center (CETC, http://www.cnemc.cn/sssj/, accessed 6 May 2024). These datasets encompass hourly concentration data of various pollutants (NO₂, CO, O₃, SO₂, PM₁₀, and PM₂.₅) collected from four air quality monitoring stations in Xi’an, spanning from 1 January 2019 to 31 December 2020. In addition to these datasets, the spatiotemporal data comprised three types of static spatial data: the longitude and latitude of the ground air quality monitoring stations, the distance from each monitoring station to the Xi’an station, and the structural data of these spatial attributes, which remained constant over time. Furthermore, three types of temporal data were included: seasonal data, weekday indices, and the specific times at each monitoring station. The monitoring season was mapped using 1 to 4, and the weekday index was represented by 0 (Saturday and Sunday) or 1 (Monday to Friday).
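The temporal encodings described above can be sketched in a few lines of Python. The weekday index follows the text (0 for Saturday and Sunday, 1 for Monday to Friday). The month-to-season mapping is an assumption, since the text only states that seasons are mapped to 1 through 4, and the function name `encode_time_features` is hypothetical.

```python
from datetime import datetime

def encode_time_features(ts: datetime) -> dict:
    """Return the season code, weekday index, and hour for one timestamp."""
    # Assumed mapping (not given in the text): 1 = spring (Mar-May),
    # 2 = summer (Jun-Aug), 3 = autumn (Sep-Nov), 4 = winter (Dec-Feb).
    season = {3: 1, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2,
              9: 3, 10: 3, 11: 3, 12: 4, 1: 4, 2: 4}[ts.month]
    # datetime.weekday() returns Monday = 0 ... Sunday = 6.
    weekday_index = 0 if ts.weekday() >= 5 else 1
    return {"season": season, "weekday_index": weekday_index, "hour": ts.hour}
```

Applied to every hourly record, this yields the three temporal inputs (season, weekday index, and time of day) alongside the static spatial attributes.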

3. Model Introduction

This paper used five models, including PSO-SVR, BPTT, CNN-LSTM, GA-ACO-BP, and APSO-CNN-Bi-LSTM, to predict the air quality in Xi’an. Figure 2 illustrates the study’s flowchart, detailing the process and comparative analysis of these models.

3.1. CNN

CNN has powerful grid data processing capabilities and is widely used in image analysis [27]. The input layer, convolutional layer, pooling layer, fully connected layer, and output layer form the main structure of the CNN. The information from the input layer is passed through the convolutional and pooling layers for feature transformation and extraction. The fully connected layer further integrates the local information extracted by the convolutional and pooling layers and maps this information to the output signal through the output layer. Figure 3 depicts the structure of a CNN, illustrating the flow of information from the input layer through the convolutional and pooling layers to the fully connected and output layers.

Indeed, the convolutional layer is the cornerstone of a Convolutional Neural Network (CNN), distinguished by its unique capability to automatically detect and extract features from the input data, primarily images. This layer employs small, learnable filters, commonly called convolutional kernels, which slide across the entire input matrix to perform a convolution operation. Unlike traditional fully connected layers where each neuron is connected to every neuron in the previous layer, the convolutional layer maintains a local connectivity pattern, significantly reducing the number of parameters and enabling the network to learn spatial hierarchies of features. Each convolutional kernel, through a process termed ‘convolution’, performs element-wise multiplication between its weights and a portion of the input (referred to as the receptive field), followed by a summation operation to produce a single value in the feature map. This operation is iteratively applied across the entire input, generating a two-dimensional (or higher, depending on the dimensionality of the input and the number of filters) feature map that encodes various aspects of the input data. The mathematical formula representing the computation at each element (pixel) in a feature map can be generically described as follows:

$$X_{x,y}^{o} = f_{c}\left(\sum_{m=0}^{k} \sum_{n=0}^{k} w_{m,n}\, X_{x+m,\,y+n}^{in} + b\right)$$

where $X_{x,y}^{o}$ is the output value of the feature map at $(x, y)$, $X_{x+m,\,y+n}^{in}$ is the value of the input matrix at $(x+m, y+n)$, $w_{m,n}$ is the weight of the convolution kernel at $(m, n)$, and $b$ is the bias of the convolution kernel. $f_{c}(\cdot)$ is the selected activation function, and $k$ is the upper limit of the kernel indices $m$ and $n$, so it sets the size of the kernel. The input matrix is usually convolved using multiple kernels. Each convolutional kernel captures and encodes one specific spatial feature of the input data matrix, thus generating a high-dimensional feature map enriched with that information. This high-dimensional feature representation is then processed by a pooling layer, which selectively retains critical information by down-sampling to reduce computational complexity and dimensional redundancy in subsequent processing, thus improving computational efficiency.
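As a minimal sketch of the convolution formula above, the following pure-Python function computes one feature map from a 2-D input and a single kernel. It assumes stride 1, no padding, and a ReLU activation for $f_c$; none of these settings are stated in the text, and the function name `conv2d_valid` is hypothetical.

```python
def conv2d_valid(x, w, b=0.0, activation=lambda z: max(0.0, z)):
    """Stride-1, no-padding 2-D convolution of input x with kernel w.

    Each output element is the activated sum of w[m][n] * x[i+m][j+n] plus b.
    """
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = b
            for m in range(kh):
                for n in range(kw):
                    s += w[m][n] * x[i + m][j + n]
            row.append(activation(s))  # ReLU is an assumed choice of f_c
        out.append(row)
    return out

feature_map = conv2d_valid([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]])
print(feature_map)  # [[6.0, 8.0], [12.0, 14.0]]
```

A pooling layer would then down-sample `feature_map`, for example by taking the maximum over each small window.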

Since CNNs have better feature extraction for grid data, in this study, each category of the $m$ environmental variables covering $n$ stations was expanded into a high-dimensional construct, specifically a matrix with $m$ rows and $n$ columns. A total of $c$ categories (including meteorological parameters, air quality indicators, and spatial and temporal attributes) of data were input as multiple channels. To explore the time series’ inherent temporal dynamics and long-term dependence, we incorporated a temporal distribution layer, which implemented a hierarchical transformation of the continuous time slices to refine the historical timing information of the input series effectively. Therefore, the temporal distribution layer was selected to encapsulate data from the past $t$ hours. Consequently, the input of the CNN constituted a 4-D array with dimensions of $m \times t \times c \times n$ ($6 \times 6 \times 3 \times 4$; Figure 4). After iterative refinement by convolutional and pooling layers, the extracted spatiotemporal features were integrated and transformed into a highly condensed 1-D feature vector, which removed redundant information in the spatial and temporal dimensions. Through this series of operations, the model systematically extrapolated the predicted values of the air quality index for the next $t'$ hours at $n$ different monitoring stations.

3.2. LSTM

To address the gradient issues encountered in RNN networks, this paper incorporates the Long Short-Term Memory (LSTM) model. The fundamental unit of the LSTM model comprises one or multiple cells alongside three adaptive gates: Input, Forget, and Output (illustrated in Figure 5). These gates facilitate the retention and regulation of information within the network. The gates employ a sigmoid activation function for decision-making, whereas the input and cell states are typically transformed by the tanh function [28,29,30].

As depicted in Figure 5, given an input sequence $X = (X_{1}, X_{2}, \dots, X_{t})$ and a corresponding sequence of hidden layer states $h = (h_{1}, h_{2}, \dots, h_{t})$, the computation is expressed as follows:

$$i_{t} = \mathrm{sigmoid}(W_{xi} X_{t} + W_{hi} h_{t-1})$$

$$f_{t} = \mathrm{sigmoid}(W_{xf} X_{t} + W_{hf} h_{t-1})$$

$$C_{t} = i_{t} * \tanh(W_{xc} X_{t} + W_{hc} h_{t-1}) + C_{t-1} * f_{t}$$

$$o_{t} = \mathrm{sigmoid}(W_{xo} X_{t} + W_{co} C_{t} + W_{ho} h_{t-1})$$

$$h_{t} = o_{t} * \tanh(C_{t})$$

where $*$ represents the Hadamard product, and $i_{t}$, $f_{t}$, and $o_{t}$ represent the outputs of the input, forget, and output gates. Additionally, $C_{t}$ denotes the cell state vector, $h_{t-1}$ signifies the output of the hidden layer unit from the preceding time step, and $h_{t}$ represents the updated hidden-layer output. $W_{hi}, W_{xi}, W_{hf}, W_{xf}, W_{xc}, W_{hc}, W_{xo}, W_{ho}$, and $W_{co}$ are the corresponding weights. Sigmoid is the activation function of the gates, and tanh is applied to the input and cell states, as noted above.
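One time step of these LSTM equations can be sketched for a single-unit cell, where every weight is a scalar. The sketch uses sigmoid for the three gates and tanh for the candidate and output transformations, following the standard LSTM formulation and the statement above that the input and cell states use tanh. The dictionary layout of the weights and the function name `lstm_step` are hypothetical, and biases are omitted as in the equations.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One time step of a single-unit LSTM; W maps weight names to scalars."""
    i_t = sigmoid(W["xi"] * x_t + W["hi"] * h_prev)        # input gate
    f_t = sigmoid(W["xf"] * x_t + W["hf"] * h_prev)        # forget gate
    c_tilde = math.tanh(W["xc"] * x_t + W["hc"] * h_prev)  # candidate state
    c_t = i_t * c_tilde + f_t * c_prev                     # new cell state
    # The output gate also looks at the new cell state through W["co"].
    o_t = sigmoid(W["xo"] * x_t + W["co"] * c_t + W["ho"] * h_prev)
    h_t = o_t * math.tanh(c_t)                             # new hidden state
    return h_t, c_t
```

Running the step over a sequence means feeding each returned `(h_t, c_t)` pair back in as `(h_prev, c_prev)` for the next input.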

3.3. Bi-LSTM

In addressing the limitations inherent in traditional LSTM cells—particularly their unidirectional flow of information, which confines their context understanding to past events without tapping into future context—Schuster and Paliwal [31] innovated the Bidirectional Recurrent Neural Network (BRNN) paradigm. This pioneering architecture ingeniously integrates a pair of LSTM hidden layers that operate in tandem yet in opposing temporal orientations, collaboratively yielding a more comprehensive output. The essence of the BRNN lies in its ability to break free from the constraint of sequential data processing by simultaneously considering both preceding and succeeding inputs within its predictive framework. By doing so, it capitalizes on the synergy of two complementary views: one scanning the sequence from start to end (forward pass), the other reversing this trajectory (backward pass). This dual perspective integration significantly enriches the model’s capacity for contextual understanding and nuanced prediction [32,33,34]. Illustrative of this sophisticated design, Figure 6 presents a Bi-directional LSTM (Bi-LSTM) model schematic, visually encapsulating how these counter-propagating LSTM layers converge to enhance feature extraction and sequence analysis. This framework enhances the model’s capacity to capture long-range dependencies, significantly improving its performance in tasks that demand deep contextual sensitivity, such as natural language processing, speech recognition, and time-series prediction. By leveraging these capabilities, the model is better equipped to tackle a wide range of complex, real-world challenges, thereby increasing its applicability across various domains.

In Figure 6, the outputs of the model at time $t$ from the forward sequence, denoted as $\overrightarrow{h}_{t}$, and from the reverse sequence, denoted as $\overleftarrow{h}_{t}$, are displayed:

$$\overrightarrow{h}_{t} = f(w_{1} x_{t} + w_{2} \overrightarrow{h}_{t-1} + b_{t1})$$

$$\overleftarrow{h}_{t} = f(w_{4} x_{t} + w_{3} \overleftarrow{h}_{t+1} + b_{t2})$$

$$y_{t} = g(w_{5} \overleftarrow{h}_{t} + w_{6} \overrightarrow{h}_{t} + b_{t3})$$

In the described model, $w_{1} \sim w_{6}$ represent the weight matrices, while $g(\cdot)$ denotes a smooth bounded function. Furthermore, $\overrightarrow{h}_{t}$ and $\overleftarrow{h}_{t}$ correspond to the forward and reverse outputs at time $t$, respectively. Additionally, $b_{t1} \sim b_{t3}$ are the bias vectors. Finally, $y_{t}$ constitutes the model’s ultimate output at time $t$.
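The forward pass, backward pass, and merged output can be illustrated with scalar weights. To keep the example short, each direction uses a plain recurrent update in place of a full LSTM cell, and tanh is assumed for both $f$ and $g$; the text only requires $g$ to be smooth and bounded. The function name and the dictionary keys are hypothetical.

```python
import math

def bidirectional_pass(xs, w, b, f=math.tanh, g=math.tanh):
    """Scalar bidirectional recurrence over xs; w holds w1..w6, b holds b1..b3."""
    T = len(xs)
    h_fwd, h_bwd = [0.0] * T, [0.0] * T
    prev = 0.0
    for t in range(T):                       # forward direction: start to end
        prev = f(w["w1"] * xs[t] + w["w2"] * prev + b["b1"])
        h_fwd[t] = prev
    nxt = 0.0
    for t in reversed(range(T)):             # backward direction: end to start
        nxt = f(w["w4"] * xs[t] + w["w3"] * nxt + b["b2"])
        h_bwd[t] = nxt
    # Each output combines both directions at the same time index.
    return [g(w["w5"] * h_bwd[t] + w["w6"] * h_fwd[t] + b["b3"]) for t in range(T)]
```

Because `h_bwd[t]` is computed from later inputs, every output `y[t]` depends on the whole sequence rather than only on the inputs up to time `t`.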

Considering Bi-LSTM’s aptitude for time-series analysis, this study transformed the past $t$ hours’ data from $n$ stations, amounting to $m \times c$ variables, into a 2-D array shaped as $72 \times 6$ ($72 = 6 \times 3 \times 4$), effectively condensing the information into $v \times t$ ($v = m \times c \times n$) dimensions. The fully connected output layers generate the AQI predictions for $n$ stations over the next $t'$ hours.
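The reshaping from the 4-D layout to the $v \times t$ layout can be checked with a short sketch. The order in which the variable, category, and station axes are merged into the 72 rows is an assumption, since the text only gives the resulting shape; the function name `flatten_for_bilstm` is hypothetical.

```python
from itertools import product

M, T, C, N = 6, 6, 3, 4  # variables, past hours, categories, stations

def flatten_for_bilstm(x4d):
    """Rearrange an M x T x C x N nested list into an (M*C*N) x T nested list."""
    rows = []
    # Assumed row order: variable index, then category, then station.
    for m, c, n in product(range(M), range(C), range(N)):
        rows.append([x4d[m][t][c][n] for t in range(T)])
    return rows

# Placeholder input: each entry records its own index so the layout can be inspected.
x4d = [[[[(m, t, c, n) for n in range(N)] for c in range(C)]
        for t in range(T)] for m in range(M)]
x2d = flatten_for_bilstm(x4d)
print(len(x2d), len(x2d[0]))  # 72 6
```

Each of the 72 rows is then one variable-category-station series over the past 6 hours.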

3.4. CNN-Bi-LSTM

The CNN-Bi-LSTM combines the strengths of both CNN and Bi-LSTM and has yielded notable advances, particularly in natural language processing and classification tasks [35]. In this study, the CNN-Bi-LSTM model employed a streamlined process. It ingested the previous $t$ hours’ data for $c \times m$ variables from $n$ stations. The input format mirrored that of a typical CNN, configured as $t \times m \times c \times n$ ($6 \times 6 \times 3 \times 4$). The input, convolution, and pooling layers were utilized for initial feature extraction from the input data. These extracted features were then vectorized into a 1-D array and fed into the Bi-LSTM layer, which examines their sequential patterns, effectively merging spatial and temporal feature analyses. Finally, the air quality index for the next $t'$ hours at $n$ stations was obtained through the fully connected layer and output layer. Figure 7 illustrates the structure of the CNN-Bi-LSTM model.

Figure 7 illustrates the meticulous pre-processing steps undertaken for the input dataset, underscoring the importance of data normalization before model ingestion. This vital pre-processing phase ensures that all features contribute equally to the learning process without bias induced by differing scales. The Bi-LSTM model, depicted in its intricate cyclic architecture, emerges as a sophisticated evolution of the LSTM, ingeniously marrying the strengths of its predecessor with bidirectional processing capabilities. It retains LSTM’s prowess in tackling the vanishing gradient problem and effectively modeling long-term dependencies while augmenting these qualities by concurrently examining historical and forecasted data. This dual-directional inspection mechanism enables the extraction of a broader spectrum of temporal features, thereby enriching the model’s interpretive depth and enhancing resilience against anomalous data points or outliers. Incorporating a Dropout layer within this architectural blueprint serves a strategic purpose. By randomly “dropping out” or temporarily turning off a fraction of neurons during training, the model is encouraged to avoid reliance on any specific subset of inputs, fostering a more distributed and collaborative learning dynamic. Consequently, this regularization technique enhances the model’s generalization capabilities, mitigating the risk of overfitting—where the model becomes overly tailored to the training data, compromising its ability to generalize to unseen instances. The judicious use of Dropout thus fortifies the Bi-LSTM’s adaptability and predictive accuracy across diverse datasets and real-world scenarios.
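The Dropout mechanism described above can be sketched as follows. The sketch uses the common "inverted" variant, in which surviving activations are rescaled during training so that nothing needs to change at inference time; whether the model uses exactly this variant, and at what rate, is not stated in the text, so both are assumptions.

```python
import random

def dropout(values, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate` while training."""
    if not training or rate == 0.0:
        return list(values)  # inference: activations pass through unchanged
    keep = 1.0 - rate
    # Survivors are scaled by 1/keep so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

With `rate=0.5`, roughly half of the activations are zeroed on each training pass, and the rest are doubled.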

3.5. PSO Based on Adaptive Inertia Weight

Particle Swarm Optimization (PSO) was proposed by Kennedy and Eberhart in 1995 [36], inspired by the coordination and foraging dynamics observed in bird flocking. Characterized by its simplicity and broad applicability, PSO can navigate complex optimization terrains without requiring domain-specific preliminaries. This versatility has fueled significant research interest and propelled its application across various disciplines and challenges [37]. At the heart of the PSO algorithm lies a swarm of particles, each embodying a candidate solution within the search space. These particles are assessed based on their ‘fitness’, a measure determined by the objective function being optimized. As the algorithm iterates, particles update their velocities and positions by combining individual learning from their own best-known positions with the collective knowledge of the swarm’s globally best position. At instant $t + 1$, the velocity and position updating equations for the $d$-th dimension of the $i$-th particle in the PSO algorithm are as follows:

$$v_{id}(t+1) = \omega v_{id}(t) + c_{1} r_{1d}(t)\,(p_{id}(t) - x_{id}(t)) + c_{2} r_{2d}(t)\,(p_{gd}(t) - x_{id}(t))$$

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1)$$

In the PSO framework, $v_{id}$ and $x_{id}$ respectively represent the velocity and position of a particle. $\omega$ represents the inertia weight, which influences the particle’s momentum and ability to explore the search space. The coefficients $c_{1}$ and $c_{2}$, known as acceleration factors, embody cognitive and social components, guiding the particle toward its personal best position $p_{id}$ and the swarm’s global best position $p_{gd}$, respectively. The variables $r_{1d}$ and $r_{2d}$ are two independent random numbers uniformly distributed in the interval [0, 1]. These stochastic elements introduce a degree of randomness into the algorithm’s evolutionary process. This randomness allows the algorithm to explore the search space more thoroughly and improves its chances of finding the optimal solution to the problem.
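The two update equations translate directly into code. The sketch below applies them to one particle, drawing fresh random numbers for each dimension; the function name `pso_update` and its argument layout are hypothetical.

```python
import random

def pso_update(x, v, p_best, g_best, omega, c1, c2, rng=random):
    """Apply the velocity and position updates to one particle, per dimension."""
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # independent uniform draws
        vd = (omega * v[d]
              + c1 * r1 * (p_best[d] - x[d])   # cognitive pull toward personal best
              + c2 * r2 * (g_best[d] - x[d]))  # social pull toward global best
        new_v.append(vd)
        new_x.append(x[d] + vd)
    return new_x, new_v
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and only the inertia term `omega * v[d]` moves it.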

In the PSO, the inertia weight $\omega$ is pivotal for modulating the influence of a particle’s past velocity on its present velocity. This inertia weight is critical in harmonizing the algorithm’s local and global search capabilities. A higher $\omega$ value augments the algorithm’s proficiency in global optimization, whereas a lower value enhances local optimization capabilities [38].

The concept of an adaptive inertia weight entails initiating the algorithm with a higher inertia weight to bolster the global search potential, thereby enabling particles to traverse a more extensive search space [39,40,41,42]. As the iterations progress, the inertia weight is methodically reduced to enhance the local search capability, allowing particles to identify more precise solutions. This dynamic adjustment of weights during the algorithm’s operation is designed to optimize both the convergence rate and the quality of the solution. By adapting the approach to balance the varying requirements of exploration and exploitation at different phases, the algorithm aims to maintain a continual equilibrium between local and global searches. This balance is crucial for facilitating the identification of the global optimal solution.

The cosine decreasing inertia weight strategy aims to introduce a more nuanced control over the exploration–exploitation balance throughout the optimization process in Particle Swarm Optimization (PSO) [43], adapting better to the inherently nonlinear nature of complex optimization landscapes. Unlike the conventional linear decrease, where the inertia weight reduces monotonically in a predetermined fashion, this adaptive strategy offers a smoother transition that aligns more closely with the varying needs for exploration and intensification during different optimization stages [44]. The formula for calculating this inertia weight is as follows:

$$\omega_{current\_iter} = \omega_{min} + \frac{1}{2}(\omega_{max} - \omega_{min})\left(1 + \cos\left(\frac{\pi \cdot current\_iter}{max\_iter}\right)\right)$$

where $current\_iter$ is the current number of iterations, $\omega_{min}$ and $\omega_{max}$ are the minimum and maximum values of the weight $\omega$, $max\_iter$ is the maximum permissible iteration count, and $\pi$ is the mathematical constant pi. In this study, $max\_iter$ was 1500, the acceleration factors were $c_{1} = 1$ and $c_{2} = 2$, the number of particles was 4, the search dimension was 3, and an Early Stopping mechanism (Patience = 29) was added to save computing resources and prevent over-fitting. The APSO-CNN-Bi-LSTM algorithm is shown in Figure 8.
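The cosine schedule is a one-line function. In the sketch below, the default bounds `w_min=0.4` and `w_max=0.9` are assumed values commonly used in the PSO literature; the text does not report the bounds used in this study.

```python
import math

def cosine_inertia_weight(current_iter, max_iter, w_min=0.4, w_max=0.9):
    """Cosine-decreasing inertia weight: w_max at iteration 0, w_min at max_iter."""
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * current_iter / max_iter))
```

The weight changes slowly near the start and the end of the run and fastest around the midpoint, where it equals the average of `w_min` and `w_max`.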

The outlined implementation procedure for integrating the Adaptive Particle Swarm Optimization (APSO) algorithm with a predictive model for air quality prediction embodies a systematic and iterative approach aimed at refining model parameters and enhancing prediction accuracy. Summarizing the steps provided:

(1) Initialization: Initialize the predictive model and the APSO algorithm. This includes setting up the model structure and parameters and defining the swarm characteristics for PSO, such as the number of particles and their initial positions in the solution space.
(2) Objective Function Definition: Define the objective function that the PSO algorithm will minimize. Here, the Mean Absolute Error (MAE) is chosen as the fitness function $f(x)$, reflecting the error between predicted and actual air quality measurements. Establish a fitness threshold $Y$ that determines whether a solution is acceptable.
(3) Initial Fitness Calculation: Feed the initial parameter set into the model and compute its fitness value $y$. If this initial fitness already meets the predefined threshold $Y$, record the parameters $θ^{*}$ and their corresponding fitness $y$ as a candidate solution. Otherwise, proceed to the optimization stage.
(4) APSO Optimization: Initialize the swarm by assigning random positions and velocities to each particle, each representing a unique configuration of model parameters. Train the model using each particle’s position (parameter set), evaluate its fitness using MAE, and retain the particle’s state $(θ^{*}, y)$.
(5) Particle Update: Iterate through generations, adjusting each particle’s position and velocity using adaptive inertia weights. This dynamic adjustment balances exploration and exploitation to progressively steer the swarm toward better solutions. Recalculate fitness after each update.
(6) Termination and Selection: Repeat the parameter update process until the maximum number of iterations is reached. Upon termination, review all recorded $(θ^{*}, y)$ pairs and identify the set with the best fitness score. Employ this optimal parameter configuration in the model for the final air quality prediction task.
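The steps above can be sketched as a small, self-contained PSO loop with a cosine-decreasing inertia weight. This is an illustrative sketch, not the authors' implementation: `toy_fitness` is a hypothetical stand-in for "train the CNN-Bi-LSTM with these hyperparameters and return its MAE", and the search bounds, `w_min`, `w_max`, and the reduced iteration count are assumptions. The particle count (4), search dimension (3), and acceleration factors ($c_1 = 1$, $c_2 = 2$) follow the text.

```python
import math
import random


def cosine_inertia_weight(it, max_iter, w_min=0.4, w_max=0.9):
    # w_min and w_max are assumed placeholder values.
    return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * it / max_iter))


def toy_fitness(position):
    # Hypothetical stand-in for model training + validation MAE (lower is better).
    target = (0.3, -0.2, 0.5)
    return sum(abs(p - t) for p, t in zip(position, target)) / len(target)


def apso(fitness, dim=3, n_particles=4, max_iter=200, c1=1.0, c2=2.0,
         bounds=(-1.0, 1.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    # Step (4): random initial positions and velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-0.1, 0.1) * (hi - lo) for _ in range(dim)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]

    # Steps (5)-(6): update particles until the iteration budget is spent.
    for it in range(1, max_iter + 1):
        w = cosine_inertia_weight(it, max_iter)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp positions to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


best, best_fit, history = apso(toy_fitness)
print("best position:", [round(b, 3) for b in best])
print("best fitness :", round(best_fit, 5))
```

In a real run, each fitness evaluation is a full model training, which is why a small swarm and early stopping matter for the overall cost.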

By systematically refining model parameters through APSO, this methodology seeks to optimize the model’s predictive power, thereby enhancing the accuracy and reliability of air quality forecasts.

3.6. Model Performance Evaluation Indicators

To assess the generalization capacity of the proposed model, the dataset was partitioned into two subsets: a training set, comprising the initial 80% of the data, was used for model learning and parameter tuning, while the remaining 20% constituted the validation set, used to gauge the model’s performance on unseen data. This division follows standard practice in machine learning, ensuring that the model’s predictive efficacy can be objectively evaluated. Consistent with many comparative predictive models, this study adopted the Mean Absolute Error (MAE) as the loss function. MAE is a prevalent metric for quantifying the discrepancy between the model’s predicted outputs and the actual observations. Because it penalizes errors linearly rather than quadratically, it is less dominated by a few large deviations than squared-error metrics such as RMSE. MAE is defined as follows:

E_{MAE} = \frac{1}{M} \sum_{j=1}^{M} \left| x_{j} - \tilde{x}_{j} \right|

Here, $x_{j}$ represents the actual value, $\tilde{x}_{j}$ denotes the predicted value generated by the model, and $M$ represents the total number of data points.
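The 80/20 partition described above can be sketched as a chronological split, in which the first 80% of the time-ordered records form the training set and the remainder the validation set. The function name and the placeholder series are illustrative assumptions, not taken from the study.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling, so validation data follow training data."""
    split = int(len(series) * train_fraction)
    return series[:split], series[split:]


# Placeholder values standing in for an hourly AQI series.
hourly_aqi = list(range(100))
train, val = chronological_split(hourly_aqi)
print(len(train), len(val))
```

Keeping the order intact avoids leaking future observations into the training set, which matters for time series forecasting.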

4. Experiment Analysis

The experimental environment was PyCharm 2023.2.1, Python 3.8, TensorFlow 2.12.0, CUDA 11.4, and a 3090 GPU. Four real air quality datasets were employed to evaluate the model’s performance. The experiments address the following questions: (1) whether the APSO-CNN-Bi-LSTM model achieves smaller prediction errors than previous models; (2) whether the APSO-CNN-Bi-LSTM model converges better; (3) whether the adaptive inertia weight PSO algorithm and the CNN each contribute to the predictions of the APSO-CNN-Bi-LSTM model.

4.1. Datasets and Pre-Processing

The first step toward accurate air quality prediction is to obtain high-quality data. In this paper, four real air quality datasets of Xi’an City were used to evaluate the performance and convergence speed of the APSO-CNN-Bi-LSTM model. The collected raw data must be pre-processed to remove irrelevant, redundant, noisy, and unreliable records, which prevents misleading results and improves prediction accuracy.

The datasets were first normalized using the MinMaxScaler function to eliminate the effect of differing scales on the prediction results [45]. The formula is as follows.

y^{*} = \frac{y - y_{min}}{y_{max} - y_{min}}

Here, $y$ is the sample data to be normalized, $y_{min}$ and $y_{max}$ denote the minimum and maximum values observed in the sample data column, and $y^{*}$ is the result obtained after normalization. Table 3 presents an overview of the four datasets.
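The normalization formula above can be written out in a few lines of plain Python. This is a sketch of the same min-max mapping that the MinMaxScaler function performs with its default range; the helper names and the sample values are illustrative assumptions.

```python
def min_max_normalize(column):
    """Map each value to [0, 1] using y* = (y - y_min) / (y_max - y_min)."""
    y_min, y_max = min(column), max(column)
    span = y_max - y_min
    if span == 0:
        # Constant column: every value maps to 0 (a choice made for this sketch).
        return [0.0 for _ in column]
    return [(y - y_min) / span for y in column]


def min_max_denormalize(scaled, y_min, y_max):
    """Invert the mapping so predictions can be reported in original units."""
    return [s * (y_max - y_min) + y_min for s in scaled]


aqi = [50.0, 100.0, 150.0]  # illustrative values only
scaled = min_max_normalize(aqi)
print(scaled)
print(min_max_denormalize(scaled, min(aqi), max(aqi)))
```

The inverse mapping is needed when errors such as MAE and RMSE are reported in AQI units rather than on the normalized scale.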

In this section, we juxtapose the APSO-CNN-Bi-LSTM model introduced in this paper against a cadre of alternate air quality prediction algorithms. The precision of the model was predominantly quantified by the discrepancy between the actual observational data and the predicted data by the model. To appraise the efficacy of the model delineated herein, we employed specific metrics to ascertain the model’s predictive accuracy [46]. The calculation formulae used for this purpose are outlined as follows:

E_{MAE} = \frac{1}{M} \sum_{j=1}^{M} \left| x_{j} - \tilde{x}_{j} \right|

E_{RMSE} = \sqrt{\frac{1}{M} \sum_{j=1}^{M} \left( x_{j} - \tilde{x}_{j} \right)^{2}}

Here, $x_{j}$ represents the actual value, $\tilde{x}_{j}$ denotes the predicted value generated by the model, and $M$ represents the total number of data points.
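Both metrics follow directly from the formulas above. The short sketch below computes them on a small set of made-up values (not data from the study) that includes one large error, to show how the two metrics respond differently.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average of |x_j - x~_j|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the average squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Illustrative values only; the last prediction misses by 40.
actual = [100.0, 120.0, 90.0, 200.0]
predicted = [110.0, 115.0, 95.0, 160.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", round(rmse(actual, predicted), 4))
```

Because RMSE squares each error before averaging, the single large miss raises it well above MAE, while MAE weights every error in proportion to its size.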

4.2. Effect of Particle Number on Experimental Results

The number of particles is a crucial factor in the Particle Swarm Optimization (PSO) algorithm, directly influencing its optimization performance and convergence speed. Increasing the number of particles allows for a more comprehensive coverage of the search space, enhancing the likelihood of finding the global optimal solution and minimizing the risk of falling into local optima. Conversely, having too few particles can lead to insufficient search space coverage, which might result in premature convergence to suboptimal solutions. An appropriate number of particles facilitates faster convergence because particles can quickly share information and cluster around potential solution areas. However, excessive particles can decelerate convergence due to the increased computational load and the tendency for over-exploration.

Table 4 reports the average AQI prediction performance metrics for the next 2 h using the APSO-CNN-Bi-LSTM model with varying particle numbers. The results indicate that while the test error does not vary significantly with the particle number, optimal performance was achieved when the particle number was set to four. Specifically, the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) reached their minimum values of 49.6534 and 28.5812, respectively, with a model training time of 32.3 min.

Choosing an appropriate number of particles in PSO is essential for balancing comprehensive search space coverage and computational efficiency. The results from Table 4 illustrate that a particle number of four provides an optimal balance for the APSO-CNN-Bi-LSTM model, yielding minimal prediction errors and reasonable training time.

4.3. Forecast Error Analysis

Table 5 lists the prediction errors of five models (PSO-SVR, BPTT, CNN-LSTM, GA-ACO-BP, and the APSO-CNN-Bi-LSTM model proposed in this paper) on the four datasets. In terms of RMSE and MAE, the proposed model achieved the best or nearly the best results. The ablation experiments on three models (APSO-Bi-LSTM, CNN-Bi-LSTM, and APSO-CNN-Bi-LSTM) also showed that both the PSO algorithm with adaptive inertia weights and the CNN module contribute to the model’s prediction accuracy. Compared with the APSO-Bi-LSTM model, which has only the APSO module, the APSO-CNN-Bi-LSTM model, featuring both modules, yielded improvements of 3.52%, 3.83%, 3.81%, and 0.9% across the four datasets. Compared with the CNN-Bi-LSTM model, which has only the CNN module, it yielded improvements of 7.32%, 2.51%, 0.23%, and 0.64% on the same datasets, underscoring the complementary roles of the CNN and APSO modules in boosting prediction performance. Moreover, the APSO-CNN-Bi-LSTM model outperformed several other prediction models, reaffirming its effectiveness in air quality prediction tasks.
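The text does not state how the percentage improvements are computed. A common convention, assumed here, is the relative error reduction with respect to the baseline model. The sketch below uses hypothetical error values purely to illustrate the arithmetic; they are not results from Table 5.

```python
def relative_improvement(baseline_error, proposed_error):
    """Percentage reduction in error relative to the baseline (assumed convention)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Hypothetical error values for illustration only.
baseline_mae = 20.0
proposed_mae = 19.0
print(f"{relative_improvement(baseline_mae, proposed_mae):.2f}%")
```

Under this convention a positive value means the proposed model has the lower error, and the figure depends on the baseline's error level, so improvements on different datasets are not directly comparable in absolute terms.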

As is widely recognized, predicting air quality becomes increasingly challenging as the prediction time step extends. To further evaluate the short-term AQI prediction capability of the APSO-CNN-Bi-LSTM model, we predicted the AQI for the next 1 to 12 h. The RMSE and MAE change curves, illustrated in Figure 9, provide a comparative analysis of the model’s prediction performance over these intervals. The data presented in Figure 9 represent the average performance across the four datasets.

As shown in Figure 9, the predictive ability of all models decreases as the prediction time step increases. The APSO-CNN-Bi-LSTM model consistently outperforms the other deep learning models on both performance evaluation indicators, $E_{RMSE}$ and $E_{MAE}$. For prediction time steps of less than 6 h, the differences between the models are insignificant. However, when the prediction time step exceeds 6 h, APSO-CNN-Bi-LSTM clearly outperforms the other models. This indicates that APSO-CNN-Bi-LSTM is more adept at handling complex data as the prediction problem becomes more challenging.

The CNN module effectively captures spatial dependencies in the data, while the Bi-LSTM module leverages past and future information for predictions. Additionally, the APSO optimization algorithm further refines model hyperparameters, minimizing the need for manual intervention. This combination of techniques enhances the model’s ability to manage and predict complex data scenarios.
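Evaluating predictions 1 to 12 h ahead requires pairing each input window with a target at the chosen horizon. The sketch below shows one common way to build such pairs from a single series; the function name, the `lookback` length, and the placeholder series are assumptions, and the study's actual input layout (multiple sites and variables) is richer than this.

```python
def make_windows(series, lookback, horizon):
    """Build (input, target) pairs: each input holds `lookback` consecutive values,
    and its target is the value `horizon` steps after the input's last value."""
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        inputs.append(series[start:end])
        targets.append(series[end + horizon - 1])
    return inputs, targets


# Placeholder hourly series; lookback and horizon are illustrative choices.
series = list(range(10))
x, y = make_windows(series, lookback=3, horizon=2)
print(x[0], "->", y[0])
print(len(x), "samples")
```

Repeating this for `horizon` values from 1 to 12 yields one dataset per prediction time step, which is how error curves over the horizon can be produced.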

4.4. Convergence Analysis

A predictive model’s efficiency is a critical factor in practical engineering scenarios. The APSO-CNN-Bi-LSTM model demonstrates both superior prediction accuracy and enhanced convergence properties: as shown in Figure 10, it needs fewer iterations to reach optimal performance. This quicker convergence implies that the model training phase can be completed in a shorter timeframe, reducing computational resources and energy consumption and, in turn, costs, a crucial aspect for businesses and organizations implementing such systems at scale. Figure 11 further reinforces this advantage by illustrating that the proposed model requires less time to finish training across all four datasets than the alternative models. Faster training accelerates the initial deployment process and facilitates more efficient iterative upgrades in response to new data or changing environmental conditions. In essence, the ability of the APSO-CNN-Bi-LSTM model to expedite both initial training and future updates is a testament to its practical viability. It underscores how advancements in algorithm design, combining adaptive optimization strategies with sophisticated neural network architectures, can translate into tangible benefits beyond mere accuracy improvements, addressing real-world constraints such as time-to-market and operational expenses. This efficiency is particularly valuable in fields like environmental monitoring, where timely insights and adaptive responses to air quality changes can have immediate public health and economic implications. The prediction time step for this part is set to 1 h.

4.5. Analyzing AQI from Spatiotemporal Distribution Characteristics

The detailed analysis of the AQI trends in Xi’an City from 2019 to 2020, as depicted in Figure 12, highlights the marked seasonal fluctuations and their underlying causes, providing crucial insights for environmental management and policymaking. The “U”-shaped pattern observed annually, with AQI peaking in December and reaching its nadir in August, underscores the strong influence of meteorological conditions and human activities on air pollution levels. The summer months benefit from favorable weather conditions for pollutant dispersion—enhanced solar radiation, increased atmospheric instability, and more frequent precipitation. These meteorological phenomena act in concert to cleanse the air, leading to a more consistent and healthier air quality environment. Conversely, the winter season experiences the worst air quality, characterized by a high AQI and substantial variability. The cold temperatures facilitate the formation of temperature inversions, a meteorological phenomenon that traps pollutants near the ground.

Additionally, the heightened demand for heating, often reliant on fossil fuels, and occasional festive fireworks contribute to a surge in pollutant emissions, exacerbating the air pollution problem. Spring and autumn present their own challenges. Springtime sees an increase in dust storms, introducing additional particulate matter into the atmosphere. While generally more stable in terms of weather, autumn witnesses increased biomass burning from agricultural practices, which releases pollutants such as fine particulate matter and volatile organic compounds. Understanding these seasonal dynamics is essential for devising targeted interventions to mitigate air pollution. For instance, measures to curb emissions from heating sources and regulate fireworks usage during winter could be prioritized. Similarly, efforts to control dust storms in the spring and manage agricultural burning in autumn would be instrumental in maintaining better air quality throughout the year.

The APSO-CNN-Bi-LSTM model, with its demonstrated capability for accurate and efficient prediction, can be a powerful tool in anticipating these seasonal shifts, enabling authorities to proactively implement strategies to mitigate pollution peaks and safeguard public health. By integrating such predictive analytics into policy planning, cities like Xi’an can work toward more sustainable and resilient urban environments.

As illustrated in Figure 1 and the subsequent discussion, Xi’an’s geographical and meteorological context is crucial in shaping its air quality dynamics. The city’s location in the Weihe River Basin significantly influences pollutant dispersion and accumulation patterns. The Qinling Mountains to the south act as a natural barrier, hindering the northwesterly winds that dominate during fall and winter. These winds, weakened upon encountering the mountain range, fail to effectively ventilate and cleanse the air over Xi’an, leading to higher pollutant concentrations. Furthermore, the mountains obstruct the influx of moist air from southern regions, exacerbating dry conditions that favor pollutant buildup, especially during dry seasons. The basin-like configuration of the Weihe River Valley compounds this situation, trapping pollutants and further intensifying their concentration within the city. The region’s semi-arid climate, characterized by harsh, cold winters and sweltering summers, creates atmospherically stable conditions during the colder months. Lower wind speeds and atmospheric stability hamper the vertical and horizontal dispersion of pollutants, creating a conducive environment for air pollution episodes. Urbanization and energy consumption patterns also contribute to the spatial variation in air quality within Xi’an. Analysis of the four datasets reveals that central urban areas exhibit higher AQI values than the outskirts, indicating a higher pollution load. The AQI progressively declines as one moves away from the densely populated and industrially active city center, highlighting a clear urban–rural gradient in air quality. This spatial and temporal heterogeneity underscores the complexity of managing air pollution in Xi’an. Strategies for mitigation must account for these geographical and meteorological peculiarities, along with the urban layout and energy consumption profiles. Data-driven insights from the APSO-CNN-Bi-LSTM model can guide targeted interventions, such as optimizing urban planning, promoting cleaner energy alternatives, and implementing pollution control measures during high-risk periods. These measures will contribute to a more sustainable and livable urban environment.

4.6. Visualization and Analysis of Results

Figure 13 and Figure 14 compare the model’s predictions with real-world data for randomly selected periods from datasets 1036A and 1037A. The figures show instances where the APSO-CNN-Bi-LSTM model accurately predicted sudden temporal variations in air quality, aligning more closely with actual readings than the alternative models. This precision in capturing abrupt changes is fundamental in air quality prediction since sudden spikes or drops in pollutant levels can have immediate implications for public health advisories and environmental management decisions. The small discrepancy between predicted and actual values indicates that the model effectively learns and replicates the complex dynamics of air quality fluctuations, even during periods of instability. By accurately tracking these fluctuations, the APSO-CNN-Bi-LSTM model offers decision-makers a reliable tool to anticipate and respond to air quality events promptly. This accuracy can support proactive measures to mitigate exposure risks, such as issuing timely alerts, adjusting industrial operations, or implementing temporary emission controls, thereby contributing to more effective environmental governance and public health protection. Overall, the predicted curves in Figure 13 and Figure 14 closely mirror the actual air quality measurements, providing empirical evidence of the model’s predictive capability and of its advantage over the other approaches in real-world applications. The prediction time step for this part is set to 1 h.

4.7. Model Generalization Experiment

Model generalization verification is a crucial step when using deep learning models for air quality prediction tasks. Generalization refers to the model’s ability to predict unseen data. In air quality prediction, this means that the model must perform well on training data and accurately predict air quality conditions in new, unknown areas or periods. To verify the generalization capability of our model, we used dataset 1040A, which was collected from another monitoring station in Xi’an, as the data source. We visualized the predicted and actual values for generalization verification over three consecutive days. As shown in Figure 15, this comparison allows us to observe how well the model’s predictions align with the real-world air quality data in an untrained environment. By closely examining these visualizations, we can assess the model’s performance and ability to generalize to new data. This approach is essential for ensuring that our model is not overfitting to the training data and can provide reliable predictions in diverse and previously unseen scenarios, a critical aspect of air quality management and planning.

Table 6 shows the parameter settings.

5. Summary

This study successfully integrates innovative methodologies from deep learning and optimization theory to tackle the intricate challenge of air quality prediction. The introduction of the APSO-CNN-Bi-LSTM model signifies a substantial advancement in environmental prediction capabilities. By synergistically combining the strengths of three key components—Convolutional Neural Networks (CNN) for spatial feature extraction, Bidirectional Long Short-Term Memory networks (Bi-LSTM) for capturing temporal dependencies, and Adaptive Particle Swarm Optimization (APSO) for efficient hyperparameter tuning—the model achieves a remarkable balance between exploration and exploitation. Experimental validations on real-world datasets affirm the model’s superiority in prediction accuracy, especially under sudden air quality shifts, where the model’s predictive power shines through with minimal deviation from actual measurements. This heightened accuracy is paramount for public health interventions and policy planning.

Moreover, the model’s accelerated convergence, which requires fewer iterations to attain optimal performance, underscores its practicality and efficiency. This research promotes the application of advanced computing techniques in environmental science. It provides a new approach to enhance our understanding and management of environmental challenges, ultimately contributing to a more sustainable and healthier living environment.

Future research can explore integrating more types of environmental data, such as satellite remote sensing data and social media data, with existing datasets. By incorporating these additional data sources, the predictive models can better understand the factors influencing air quality. Additionally, deeper neural network structures or novel machine learning algorithms, such as advanced neural networks or Transformer models, can improve prediction accuracy and model interpretability. These advancements will enhance the ability to predict air quality conditions more reliably and provide better environmental management and decision-making tools.

Author Contributions

Conceptualization, F.Z.; Data curation, F.Z. and S.L.; Formal analysis, F.Z.; Methodology, F.Z.; Software, F.Z.; Visualization, F.Z.; Writing—original draft, F.Z.; Writing—review and editing, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request. The download locations of the corresponding datasets were given in the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, H.; Wang, S.; Hao, J.; Wang, X.; Wang, S.; Chai, F.; Li, M. Air pollution and control action in Beijing. J. Clean. Prod. 2016, 112, 1519–1527.
2. Yeo, I.; Choi, Y.; Lops, Y.; Sayeed, A. Efficient PM₂.₅ forecasting using geographical correlation based on integrated deep learning algorithms. Neural Comput. Appl. 2021, 33, 15073–15089.
3. Han, J.; Liu, H.; Xiong, H.; Yang, J. Semi-supervised air quality forecasting via self-supervised hierarchical graph neural network. IEEE Trans. Knowl. Data Eng. 2022, 35, 5230–5243.
4. Zhang, Y.; Bocquet, M.; Mallet, V.; Seigneur, C.; Baklanov, A. Real-time air quality forecasting, part I: History, techniques, and current status. Atmos. Environ. 2012, 60, 632–655.
5. Wu, S.; Deng, F. Systematic Research Progress on Atmospheric PM₂.₅ and Health: From Exposure, Harm to Intervention. Chin. J. Pharmacol. Toxicol. 2016, 30, 797–801.
6. Biancofiore, F.; Busilacchio, M.; Verdecchia, M.; Tomassetti, B.; Aruffo, E.; Bianco, S.; Di Tommaso, S.; Colangeli, C.; Rosatelli, G.; Di Carlo, P. Recursive neural network model for analysis and forecast of PM₁₀ and PM₂.₅. Atmos. Pollut. Res. 2017, 8, 652–659.
7. Chen, Y.C.; Li, D.C. Selection of key features for PM₂.₅ prediction using a wavelet model and RBF-LSTM. Appl. Intell. 2021, 51, 2534–2555.
8. Méndez, M.; Merayo, M.G.; Núñez, M. Machine learning algorithms to forecast air quality: A survey. Artif. Intell. Rev. 2023, 56, 10031–10066.
9. An, J.; Huang, M.; Wang, Z.; Zhang, X.; Ueda, H.; Cheng, X. Numerical regional air quality forecast tests over the mainland of China. In Proceedings of the Acid Rain 2000: Proceedings from the 6th International Conference on Acidic Deposition: Looking back to the Past and Thinking of the Future, Tsukuba, Japan, 10–16 December 2000; Volume III/III Conference Statement Plenary and Keynote Papers. Springer: Dordrecht, The Netherlands, 2001; pp. 1781–1786.
10. Streets, D.G.; Fu, J.S.; Jang, C.J.; Hao, J.; He, K.; Tang, X.; Zhang, Y.; Wang, Z.; Li, Z.; Zhang, Q.; et al. Air quality during the 2008 Beijing Olympic games. Atmos. Environ. 2007, 41, 480–492.
11. Tie, X.; Madronich, S.; Li, G.H.; Ying, Z.; Zhang, R.; Garcia, A.R.; Lee-Taylor, J.; Liu, Y. Characterizations of chemical oxidants in Mexico City: A regional chemical dynamical model (WRF-Chem) study. Atmos. Environ. 2007, 41, 1989–2008.
12. Kumar, U.; Jain, V.K. ARIMA forecasting of ambient air pollutants (O₃, NO, NO₂ and CO). Stoch. Environ. Res. Risk Assess. 2010, 24, 751–760.
13. Vlachogianni, A.; Kassomenos, P.; Karppinen, A.; Karakitsios, S.; Kukkonen, J. Evaluation of a multiple regression model for the forecasting of the concentrations of NOₓ and PM₁₀ in Athens and Helsinki. Sci. Total Environ. 2011, 409, 1559–1571.
14. Boznar, M.; Lesjak, M.; Mlakar, P. A neural network-based method for short-term predictions of ambient SO₂ concentrations in highly polluted industrial areas of complex terrain. Atmos. Environ. Part B Urban Atmos. 1993, 27, 221–230.
15. Esen, H.; Inalli, M.; Sengur, A.; Esen, M. Performance prediction of a ground-coupled heat pump system using artificial neural networks. Expert Syst. Appl. 2008, 35, 1940–1948.
16. Kumar, D. Evolving Differential evolution method with random forest for prediction of Air Pollution. Procedia Comput. Sci. 2018, 132, 824–833.
17. Leong, W.C.; Kelani, R.O.; Ahmad, Z. Prediction of air pollution index (API) using support vector machine (SVM). J. Environ. Chem. Eng. 2020, 8, 103208.
18. Muhammad Adnan, R.; Chen, Z.; Yuan, X.; Kisi, O.; El-Shafie, A.; Kuriqi, A.; Ikram, M. Reference evapotranspiration modeling using new heuristic methods. Entropy 2020, 22, 547.
19. Zhu, S.; Lian, X.; Liu, H.; Hu, J.; Wang, Y.; Che, J. Daily air quality index forecasting with hybrid models: A case in China. Environ. Pollut. 2017, 231, 1232–1244.
20. Li, X.; Zhang, S.; Wang, K. PM₂.₅ Air Quality Prediction Based on Image Quality Analysis. J. Beijing Univ. Technol. 2020, 46, 191–198.
21. Septiawan, W.M.; Endah, S.N. Suitable recurrent neural network for air quality prediction with backpropagation through time. In Proceedings of the 2018 2nd International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 30–31 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
22. Gilik, A.; Ogrenci, A.S.; Ozmen, A. Air quality prediction using CNN+ LSTM-based hybrid deep learning architecture. Environ. Sci. Pollut. Res. 2022, 29, 11920–11938.
23. Guoqu, D.; Hu, C. Research on Short-term Air Quality Prediction Based on Unequal Weight Clustering Hybrid PSO-SVR. Oper. Res. Manag. Sci. 2023, 32, 106.
24. Du, Y.; Liu, Y. Air Quality Prediction Using Hybrid Genetic Ant Colony Algorithm Optimized BP Neural Network. Appl. Comput. Syst. 2023, 32, 223–230.
25. Ning, M.; Guan, J.; Liu, P.; Zhang, Z.; O’hare, G.M.P. GA-BP air quality evaluation method based on fuzzy theory. Comput. Mater. Contin. 2019, 58, 215–227.
26. Yan, R.; Liao, J.; Yang, J.; Sun, W.; Nong, M.; Li, F. Multi-hour and multi-site air quality index forecasting in Beijing using CNN, LSTM, CNN-LSTM, and spatiotemporal clustering. Expert Syst. Appl. 2021, 169, 114513.
27. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
28. Liu, Y.; Wang, Z.; Zheng, B. Application of regularized GRU-LSTM model in stock price prediction. In Proceedings of the 2019 IEEE 5th International Conference on Computer and Communications (ICCC), Chengdu, China, 6–9 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1886–1890.
29. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A review of recurrent neural networks: LSTM cells and network architectures. Neural Comput. 2019, 31, 1235–1270.
30. Smagulova, K.; James, A.P. A survey on LSTM memristive neural network architectures and applications. Eur. Phys. J. Spec. Top. 2019, 228, 2313–2324.
31. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681.
32. Shahid, F.; Zameer, A.; Muneeb, M. Predictions for COVID-19 with deep learning models of LSTM, GRU and Bi-LSTM. Chaos Solitons Fractals 2020, 140, 110212.
33. Le, T.; Vo, M.T.; Vo, B.; Hwang, E.; Rho, S.; Baik, S.W. Improving electric energy consumption prediction using CNN and Bi-LSTM. Appl. Sci. 2019, 9, 4237.
34. Li, C.; Zhan, G.; Li, Z. News text classification based on improved Bi-LSTM-CNN. In Proceedings of the 2018 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 890–893.
35. Jang, B.; Kim, M.; Harerimana, G.; Kang, S.U.; Kim, J.W. Bi-LSTM model to increase accuracy in text classification: Combining Word2vec CNN and attention mechanism. Appl. Sci. 2020, 10, 5841.
36. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; IEEE: Piscataway, NJ, USA, 1995; Volume 4, pp. 1942–1948.
37. Liu, Z.H.; Zhou, S.W.; Liu, K.; Zhang, J. Permanent magnet synchronous motor multiple parameter identification and temperature monitoring based on binary-modal adaptive wavelet particle swarm optimization. Acta Autom. Sin. 2013, 39, 2121–2130.
38. Wang, D.; Li, M. Performance Analysis and Parameter Selection of Particle Swarm Optimization Algorithm. Acta Autom. Sin. 2016, 42, 1552–1561.
39. Nickabadi, A.; Ebadzadeh, M.M.; Safabakhsh, R. A novel particle swarm optimization algorithm with adaptive inertia weight. Appl. Soft Comput. 2011, 11, 3658–3670.
40. Qin, Z.; Yu, F.; Shi, Z.; Wang, Y. Adaptive inertia weight particle swarm optimization. In Proceedings of the Artificial Intelligence and Soft Computing–ICAISC 2006: 8th International Conference, Zakopane, Poland, 25–29 June 2006; Proceedings 8. Springer: Berlin/Heidelberg, Germany, 2006; pp. 450–459.
41. Dong, C.; Wang, G.; Chen, Z.; Yu, Z. A method of self-adaptive inertia weight for PSO. In Proceedings of the 2008 International Conference on Computer Science and Software Engineering, Wuhan, China, 12–14 December 2008; IEEE: Piscataway, NJ, USA, 2008; Volume 1, pp. 1195–1198.
42. Taherkhani, M.; Safabakhsh, R. A novel stability-based adaptive inertia weight for particle swarm optimization. Appl. Soft Comput. 2016, 38, 281–295.
43. Shi, Y.; Eberhart, R. A modified particle swarm optimizer. In Proceedings of the 1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No. 98TH8360), Anchorage, AK, USA, 4–9 May 1998; IEEE: Piscataway, NJ, USA, 1998; pp. 69–73.
44. Li, Y.; Zhao, Y.; Liu, J. Dynamic sine cosine algorithm for large-scale global optimization problems. Expert Syst. Appl. 2021, 177, 114950.
45. Quackenbush, J. Microarray data normalization and transformation. Nat. Genet. 2002, 32, 496–501.
46. Chai, T.; Draxler, R.R. Root mean square error (RMSE) or mean absolute error (MAE). Geosci. Model Dev. Discuss. 2014, 7, 1525–1534.

Figure 1. Study area.

Figure 2. Flowchart of this study.

Figure 3. Structure of CNN.

Figure 4. Input dimension of CNN.

Figure 5. LSTM network structure.

Figure 6. Bi-LSTM network structure.

Figure 7. Structure of CNN-Bi-LSTM.

Figure 8. Algorithm Flowchart.

Figure 9. Average (a) RMSE and (b) MAE of all models at different prediction time steps.

Figure 10. Training loss. Subfigures (a–d) show the loss values of the different models on four datasets.

Figure 11. Model training time.

Figure 12. Monthly variation.

Figure 13. Comparison of prediction results (1036A).

Figure 14. Comparison of prediction results (1037A).

Figure 15. Model generalization ability.

Table 1. Investigated air quality monitoring sites in Xi’an.

Monitoring Station	Dataset	Latitude	Longitude	District
Xing Qing Community	1036A	34.26 °N	108.99 °E	Bei Lin
Yan Liang Aviation Base	1037A	34.64 °N	109.20 °E	Yan Liang
CPPCC Office Building	1038A	34.36 °N	108.62 °E	Zhou Zhi
Tin Ka Ping Secondary School	1039A	34.16 °N	109.52 °E	Lan Tian
Chong Wen Tower Station	1040A	34.24 °N	108.92 °E	Xian Yang

Table 2. Input variables of AQI prediction models.

Dataset Type	Variable	Unit	Min	Max	Transformed
Air quality data	NO₂	$\mu g/m^{3}$	3	221	[0, 1]
	CO	$mg/m^{3}$	0.1	9.2	[0, 1]
	O₃	$\mu g/m^{3}$	13	410	[0, 1]
	PM₂.₅	$\mu g/m^{3}$	4	267	[0, 1]
	PM₁₀	$\mu g/m^{3}$	2	190	[0, 1]
	SO₂	$\mu g/m^{3}$	2	420	[0, 1]
Meteorological data	Rainfall	mm	0.0	26.3	[0, 1]
	Temperature	°C	−14.7	38.4	[0, 1]
	Dew point	°C	−35	27.4	[0, 1]
	Air pressure	hPa	986.4	1044.8	[0, 1]
	Wind speed	m/s	0.4	10.7	[0, 1]
	Wind direction	/	2.3	322.5	[0, 1]
Spatiotemporal data	Longitude	/	108.89	108.99	[0, 1]
	Latitude	/	34.18	34.31	[0, 1]
	Distance from Xi’an Station	m	4884.65	13,115.34	[0, 1]
	Season Index	/	1	4	[0, 1]
	Weekday Index	/	0	1	[0, 1]
	Time	/	0	23	[0, 1]
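
The Transformed column indicates that every input variable is rescaled to [0, 1]. The following is a minimal sketch of such a rescaling, assuming standard min-max scaling, x' = (x − min) / (max − min), with the bounds listed in Table 2. The function name `min_max_scale` and the sample readings are hypothetical and are not taken from the study.

```python
def min_max_scale(value, lo, hi):
    """Map a value from the range [lo, hi] onto [0, 1] by min-max scaling."""
    if hi == lo:
        raise ValueError("min and max must differ")
    return (value - lo) / (hi - lo)


# Bounds (min, max) copied from Table 2.
BOUNDS = {
    "NO2": (3.0, 221.0),           # ug/m^3
    "Temperature": (-14.7, 38.4),  # degrees Celsius
    "Time": (0.0, 23.0),           # hour of day
}

# Hypothetical readings, chosen only to illustrate the transformation.
sample = {"NO2": 112.0, "Temperature": 38.4, "Time": 0.0}

scaled = {name: min_max_scale(x, *BOUNDS[name]) for name, x in sample.items()}
print(scaled)
```

A reading equal to the tabulated minimum maps to 0 and one equal to the maximum maps to 1; values outside the tabulated range would fall outside [0, 1] and would need clipping if that matters downstream.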

Table 3. Description of datasets.

Data Type	Data Set	Time Scale	Time Step	Data Size
Urban air quality data	1036A	1 January 2019–31 December 2020	1 h	17,542
	1037A			17,539
	1038A			17,518
	1039A			17,527
	1040A			17,528

Table 4. Effect of Particle Number on Experimental Results in APSO-CNN-Bi-LSTM.

Particle Number	$E_{MAE}$	$E_{RMSE}$	Training Time
3	28.6326	50.2209	31.1 min
4	28.5812	49.6534	32.3 min
5	29.0253	50.4226	33.6 min
6	28.7704	50.4704	34.7 min

Note: the model performance evaluation indicators ($E_{MAE}$, $E_{RMSE}$, and training time) represent the average values of the predictions for the next 1–6 h.
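
To make the role of the particle number concrete, the following is a rough sketch of a particle swarm search with a time-varying inertia weight. It is not the authors' APSO: the inertia weight here simply decreases linearly from `w_max` to `w_min`, a common simple schedule used as a stand-in for the paper's adaptive rule. The objective is a toy sphere function standing in for a validation error, and all names and default values (`pso_minimize`, `w_max`, `w_min`, `c1`, `c2`) are assumptions, apart from the default of four particles, which follows Table 4.

```python
import random


def pso_minimize(objective, bounds, n_particles=4, n_iters=50,
                 w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """Minimize `objective` over box `bounds` with a basic PSO.

    The inertia weight decreases linearly from w_max to w_min
    (an assumed schedule, not the paper's adaptive rule).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(n_iters):
        # Large weight early (exploration), small weight late (exploitation).
        w = w_max - (w_max - w_min) * t / max(n_iters - 1, 1)
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the particle inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective standing in for a model's validation error."""
    return sum(v * v for v in x)


search_box = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_val = pso_minimize(sphere, search_box)
print(best, best_val)
```

In a hyperparameter search, each call to `objective` would mean training and evaluating a network, so each additional particle adds one such evaluation per iteration. This is consistent with the training time in Table 4 growing as the particle number increases.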

Table 5. Comparison of prediction errors of each model.

	1036A		1037A		1038A		1039A
Models	$E_{RMSE}$	$E_{MAE}$	$E_{RMSE}$	$E_{MAE}$	$E_{RMSE}$	$E_{MAE}$	$E_{RMSE}$	$E_{MAE}$
PSO-SVR	43.6978	32.8549	23.6843	16.2445	22.5831	17.4816	17.9638	13.3836
BPTT	42.1986	30.1981	23.2581	15.8677	24.1578	18.7674	16.4549	13.0542
CNN-LSTM	42.1453	29.7911	22.2376	16.3033	22.6298	17.4288	16.2873	13.9929
GA-ACO-BP	42.0361	30.1735	22.2159	16.4132	25.7076	18.3761	17.1207	12.8305
CNN-Bi-LSTM	41.2659	30.6524	22.8206	16.3292	22.4862	17.7012	16.0755	12.4518
APSO-Bi-LSTM	42.5511	30.1736	22.6436	16.1949	22.7092	17.2056	15.9328	12.4395
APSO-CNN-Bi-LSTM	38.9324	29.1980	22.2685	15.8166	22.4338	16.9921	15.9729	12.3634

Note: the model performance evaluation indicators (RMSE and MAE) represent the prediction accuracy for the next 1 h.
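
As a worked illustration of how the entries in Table 5 can be compared, the sketch below computes the percentage reduction in RMSE of APSO-CNN-Bi-LSTM relative to four baselines on dataset 1036A. It assumes the usual definition of relative improvement, (baseline − proposed) / baseline × 100. The function name `relative_improvement` is hypothetical, and these single-station figures should not be expected to match aggregate percentages reported elsewhere in the article.

```python
def relative_improvement(baseline, proposed):
    """Percentage reduction in error relative to the baseline error."""
    return (baseline - proposed) / baseline * 100.0


# 1-h-ahead RMSE on dataset 1036A, copied from Table 5.
baseline_rmse_1036a = {
    "PSO-SVR": 43.6978,
    "BPTT": 42.1986,
    "CNN-LSTM": 42.1453,
    "GA-ACO-BP": 42.0361,
}
proposed_rmse_1036a = 38.9324  # APSO-CNN-Bi-LSTM

improvements = {
    model: relative_improvement(rmse, proposed_rmse_1036a)
    for model, rmse in baseline_rmse_1036a.items()
}
for model, pct in improvements.items():
    print(f"{model}: {pct:.2f}% lower RMSE")
```

The same function applies to the MAE columns and to the other datasets. A negative result would indicate that the baseline had the lower error for that entry.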

Table 6. Parameter settings of our model in this study.

Parameter	Value	Parameter	Value
Number of Bi-LSTM layers	2	Loss function	$E_{MAE}$
Number of units in each Bi-LSTM layer	64, 64	Number of particles	4
Dropout rate	0.2	Learning rate	0.001
Batch size	128	Optimizer	Adam
Kernel size of CNN	3 × 3	Convolution channels	32

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
