Prediction of PM2.5 Concentration Based on Deep Learning, Multi-Objective Optimization, and Ensemble Forecast

Gao, Zihang; Mo, Xinyue; Li, Huan

doi:10.3390/su16114643


by Zihang Gao ^†, Xinyue Mo ^*,† and Huan Li ^*

School of Cyberspace Security (School of Cryptology), Hainan University, Haikou 570228, China

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Sustainability 2024, 16(11), 4643; https://doi.org/10.3390/su16114643

Submission received: 27 April 2024 / Revised: 24 May 2024 / Accepted: 28 May 2024 / Published: 30 May 2024

(This article belongs to the Topic Accessing and Analyzing Air Quality and Atmospheric Environment)


Abstract

Accurate and stable prediction of atmospheric PM_2.5 concentrations is crucial for air pollution prevention and control. Existing studies usually rely on a single model, or use a single evaluation criterion in multi-model ensemble weighted forecasts, neglecting the dual need for accuracy and stability in PM_2.5 forecasting. In this study, a novel ensemble forecast model is proposed that overcomes these drawbacks by taking both forecast accuracy and stability into account. Specifically, four advanced deep learning models are first introduced: Long Short-Term Memory Network (LSTM), Graph Convolutional Network (GCN), Transformer, and Graph Sample and Aggregation Network (GraphSAGE). Two combined models, LSTM–GCN and Transformer–GraphSAGE, are then constructed as predictors. Finally, a combined weighting strategy assigns weights to these two combined models using a multi-objective optimization (MOO) algorithm, so as to produce more accurate and stable predictions. The experiments are conducted on a dataset from 36 air quality monitoring stations in Beijing, and the results show that the proposed model achieves more accurate and stable predictions than the benchmark models. It is hoped that the proposed ensemble forecast model will provide effective support for PM_2.5 pollution forecasting and early warning in the future.

Keywords:

air pollution; deep learning; multi-objective optimization; ensemble forecast

1. Introduction

PM_2.5 stands for particulate matter suspended in air with an aerodynamic diameter less than or equal to 2.5 microns. These fine particles not only penetrate deep into the lungs, but can also enter the bloodstream, posing a significant threat to human health. Studies have found that prolonged exposure to high levels of PM_2.5 increases the risk of respiratory and cardiovascular diseases [1]. In particular, PM_2.5 may affect other organs through blood circulation, causing long-term negative impacts on the health of developing children [2]. In addition, atmospheric pollution, especially high concentrations of PM_2.5, poses a direct threat to ecosystems, human health, and quality of life [3]. Therefore, accurate prediction of changes in PM_2.5 concentrations is essential for developing effective air pollution prevention and control strategies and protecting public health [4].

Existing PM_2.5 forecasting methods mainly include statistical forecasting, numerical forecasting, and machine learning methods [5,6]. Statistical methods establish mathematical relationships between PM_2.5 concentrations and influencing factors from historical and related data, through regression analysis, time series analysis, and similar techniques, without considering physicochemical processes. Their advantage is that they are easy to operate and can be combined with machine learning to improve forecast accuracy. Their limitation is that they rely on a large amount of data and on assumptions, which makes it difficult to capture the dynamic changes in PM_2.5. Numerical methods are based on atmospheric dynamics and atmospheric environmental chemistry: they calculate the spatial and temporal distribution of pollutants by constructing a mathematical model, a system of equations driven by air pollution emission source data and meteorological data, and solving it by computer. Numerical methods are able to synthesize physicochemical processes, simulate pollutant behavior in detail, are suitable for multi-scale forecasting, and provide high-resolution results. However, they rely on high-performance computing resources and are sensitive to input data and model parameters, which may introduce uncertainties and errors. In recent years, with the rapid development of artificial intelligence technology, the application of machine learning, especially deep learning, to air quality forecasting has gradually become a new research direction [7,8].

Several researchers have applied artificial neural network (ANN) techniques to air quality forecasting and achieved high forecasting accuracy. For example, Mao et al. forecasted the change in PM_2.5 concentration over 12 h. They used the PM_2.5 concentration of the previous day together with meteorological data, including wind direction, wind speed, air temperature, and humidity, for model training, and achieved better forecasting results [9]. Shishegaran et al. compared four predictive models to predict daily air quality in Tehran: ARIMA, Principal Component Regression (PCR), an ARIMA-PCR hybrid model, and a combined model of ARIMA and Gene Expression Programming (GEP) [10]. Some researchers have also combined different models to make full use of their respective advantages. For example, considering that air quality monitoring data involve not only temporal but also spatial features, Ma et al. constructed an LSTM–GCN model for predicting the PM_2.5 concentration in the next hour by fusing a Graph Convolutional Network (GCN) and a Long Short-Term Memory Network (LSTM); the model was applied in the Hunnan District of Shenyang and demonstrated higher accuracy than the traditional method [11]. Ali Kamali Mohammadzadeh et al. proposed a spatio-temporal deep neural structure combining GCN and an exogenous Long Short-Term Memory Network (E-LSTM) for predicting the PM_2.5 air quality index (AQI). Their results showed that this framework is significantly more accurate in predicting the PM_2.5 AQI than the traditional LSTM and E-LSTM methods, and that it is also robust to the network structure of EPA stations [12]. Li et al. constructed a hybrid model of a convolutional neural network (CNN) and LSTM, which demonstrated excellent performance in nonlinear time series forecasting [13].

Combined weighted prediction has been one of the hot topics in air pollutant concentration prediction research in recent years. Moghram and Rahman pointed out that there is no universally optimal forecasting strategy [14]. Bates and Granger recommended a combined weighted prediction method, which does not rely on a single model but improves the accuracy of the prediction ensemble through weight assignment [15]. However, the choice of weights remains a difficult part of combined prediction, and many scholars have studied how to select the weights of combined models. For example, Yan et al. constructed a self-varying weighted combination of CNN and LSTM models, and the experimental results showed that this model outperformed other benchmark models [16]. Other researchers, such as Yang's and Xiao's teams, used Differential Evolution (DE) and the Cuckoo Search Algorithm (CSO), respectively, to adjust the weights and enhance forecasting accuracy [17,18]. The results show that the combined model can improve forecasting performance to some extent.

The limitation of the above prediction methods is that they usually rely on a single model, or use a single evaluation criterion in multi-model combined weighted prediction, such as considering only accuracy or only stability, while neglecting the need for both in PM_2.5 forecasts. To further improve the performance of ensemble forecasting by taking into account both the accuracy and the stability of PM_2.5 forecasts, an innovative “ensemble forecast” model is proposed in this study. The ensemble forecast model predicts the PM_2.5 concentration at time t from the PM_2.5 concentration data for the 24 consecutive hours before time t and the site characteristics data at time t-1. The model employs a combined weighted prediction method, which combines two powerful models: the LSTM–GCN model (based on LSTM and GCN techniques) and the Transformer–GraphSAGE model (combining Transformer and GraphSAGE techniques). Both models can effectively capture the temporal and spatial properties and their interactions, providing a multi-dimensional perspective for PM_2.5 concentration forecasting. In this study, bias and variance are introduced as multi-objective optimization metrics to optimize the forecasting models for accuracy and stability simultaneously, and the optimal weighting coefficients of the two models are determined by the MOEA/D algorithm to achieve more accurate and stable forecasts.

The novelty of the “ensemble forecast” model is that it combines the advantages of LSTM–GCN and Transformer–GraphSAGE, and obtains the optimal weight coefficients through the multi-objective optimization algorithm to improve the accuracy and stability of the prediction at the same time, which is rarely seen in previous studies. Experimental validation shows that the ensemble forecast model outperforms the comparative benchmark models in terms of forecast accuracy and stability. This advancement not only provides new theoretical support for PM_2.5 forecasting, but also provides a solid scientific basis for the development of effective air quality management strategies.

2. Data and Methods

2.1. Data Sources

The data used in this study are derived from “Dataset for Air Quality Forecast” [19]. The dataset covers hourly monitoring data of PM_2.5, meteorological factors, and station latitude and longitude information from 36 air quality monitoring stations in Beijing from 2014 to 2015. The meteorological data include wind speed, temperature, relative humidity, and wind direction. Table 1 lists the input variables and corresponding units.

Wind direction has 8 different defined values, as shown in Table 2.

In this study, time series data and spatial feature data are used as model inputs to predict the PM_2.5 concentration at each monitoring station at moment t. The time series data contain the PM_2.5 concentrations for the 24 consecutive hours prior to moment t, extracted from the data table with a sliding window technique to capture the temporal trend. The spatial feature data include the monitoring station data at moment t − 1 and the adjacency matrix between the stations, which help the model capture the spatial correlation between the stations. The construction of the input data is detailed in Section 2.3.2.

2.2. Data Pre-Processing

2.2.1. Missing Value Processing

Missing PM_2.5 concentration values are filled by linear interpolation, which suits the characteristics of the data. The linear interpolation formula is shown in Equation (1):

P (t) = P (t_{1}) + \frac{(P (t_{2}) - P (t_{1}))}{(t_{2} - t_{1})} \cdot (t - t_{1})

(1)

In (1), P(t) represents the PM_2.5 concentration value at time point t, and P(t_{1}) and P(t_{2}) are the known PM_2.5 concentration values at time points t_{1} and t_{2}, respectively.
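As an illustration of Equation (1), the following minimal Python sketch fills interior gaps in an hourly series. It assumes that time points are simply the list indices and that missing values are marked with None; the function name interpolate_missing is hypothetical, and the handling of gaps at the start or end of the series is not described in the text, so they are left unfilled here.

```python
from typing import List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation, as in Equation (1).

    Assumption: time points are the list indices (hourly spacing).
    Leading or trailing gaps have a known neighbour on one side only
    and are left as None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known points (t1, t2).
    for t1, t2 in zip(known, known[1:]):
        p1, p2 = values[t1], values[t2]
        for t in range(t1 + 1, t2):
            # P(t) = P(t1) + (P(t2) - P(t1)) / (t2 - t1) * (t - t1)
            filled[t] = p1 + (p2 - p1) / (t2 - t1) * (t - t1)
    return filled


# Illustrative values only, not taken from the dataset.
print(interpolate_missing([10.0, None, None, 40.0, 50.0]))
```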

2.2.2. Data Normalization and Data Segmentation

Data normalization is an important step in data preprocessing: it converts data of different magnitudes and scales to a common scale for easy comparison and analysis. With min-max normalization, the normalized data lie in the interval [0, 1], which helps to improve the convergence speed and accuracy of the algorithm. The normalization formula is shown in Equation (2):

x^{'} = \frac{x - \min(x)}{\max(x) - \min(x)}

(2)

In (2), x represents the original data, \min(x) is the minimum value in the dataset, \max(x) is the maximum value in the dataset, and x^{'} represents the normalized data.

In this study, the PM_2.5 concentration data and the other numerical feature data were normalized to standardize the data range and ensure effective model training. The entire dataset was then divided into three subsets: a training set of 5241 samples (60% of the total), a validation set of 1747 samples (20% of the total), and a test set of 1747 samples (20% of the total). This division ensures that the model is trained and evaluated on different data subsets, which helps to detect and limit overfitting and supports the model’s generalization to unseen data.
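The normalization in Equation (2) and the 60/20/20 division can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a contiguous split in the original sample order; the text gives only the proportions and does not say whether the samples were shuffled.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1] following Equation (2)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: Equation (2) is undefined, so map everything to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def split_60_20_20(samples: Sequence) -> Tuple[list, list, list]:
    """Split samples into training, validation, and test parts.

    Assumption: the split is contiguous and keeps the original order.
    """
    n = len(samples)
    n_val = n // 5                 # 20% (rounded down)
    n_test = n // 5                # 20% (rounded down)
    n_train = n - n_val - n_test   # remaining samples, about 60%
    train = list(samples[:n_train])
    val = list(samples[n_train:n_train + n_val])
    test = list(samples[n_train + n_val:])
    return train, val, test


# With 8735 samples in total, the sizes match those reported in the text.
parts = split_60_20_20(list(range(8735)))
print([len(p) for p in parts])
```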

2.3. Research Methodology

2.3.1. Model

In this study, seven different models were constructed and compared: the LSTM, GCN, LSTM–GCN, Transformer, GraphSAGE, Transformer–GraphSAGE, and the ensemble forecasting model, Ensemble Forecast.

1. Long Short-Term Memory Network (LSTM)

The LSTM [20] model is a special type of recurrent neural network (RNN) that is capable of learning and retaining long-term dependencies. By introducing a gating mechanism, it addresses the vanishing gradient problem that traditional RNNs face when processing long sequences. LSTM uses forget gates, input gates, and output gates to manage the deletion and addition of information. It is widely used in time series analysis, natural language processing, and speech recognition, and is favored for its long-term memory capability.

2. Graph Convolutional Network (GCN)

The GCN [21] model is a neural network that efficiently handles graph-structured data. It is able to capture complex relationships and features between nodes by applying convolutional operations on the nodes of a graph. GCN performs well in tasks such as node classification, graph classification, and link prediction, and is particularly suited to areas such as social network analysis, bioinformatics, and recommender systems. The power of GCN lies in its ability to exploit the topology of a graph to extract deep feature representations.
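To make the graph convolution concrete, here is a minimal pure-Python sketch of one layer using the widely used propagation rule ReLU(D^{-1/2} (A + I) D^{-1/2} H W). Whether the GCN in this study uses exactly this normalization is not stated, so the sketch is an assumption; the function names and the tiny two-node graph are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, feats: Matrix, weight: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so that every node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the (weighted) node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]  # ReLU


# Two connected nodes with one feature each and an identity weight.
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))
```

In the setting of this paper, adj would be the 36 × 36 station adjacency matrix and feats the station feature matrix.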

3. LSTM–GCN model

The model combines LSTM and GCN to capture the characteristics of PM_2.5 concentration data in both temporal and spatial dimensions and make effective forecasts accordingly. The core of the LSTM–GCN model lies in its two main modules: the temporal and spatial characterization modules. The temporal characterization module utilizes the LSTM network to process the temporal data and capture the trend of PM_2.5 concentration over time, while the spatial characterization module processes the spatial data through the GCN network and learns the relationship between sites. The output features of these two modules are combined in the feature union layer, and finally, the forecast values are generated through the output layer.

4. Transformer

The Transformer [22] model is a neural network architecture based on a self-attention mechanism that significantly improves the efficiency of processing long sequences by processing all elements of a sequence in parallel. The model is suitable for tasks that require capturing long-distance dependencies, such as language translation, text generation, and complex time series forecasting. The Transformer’s core strength is its self-attention mechanism, which can capture the relationship between any two positions within a sequence regardless of their distance, thus providing powerful modeling support for a variety of forecasting tasks.
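The self-attention idea can be illustrated with a stripped-down Python sketch of scaled dot-product attention. It assumes queries, keys, and values are all equal to the input (no learned projections, a single head), which is a simplification of the full Transformer; the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with Q = K = V = x.

    Simplification: no learned projection matrices and a single head.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this position to every position in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Each output row is a weighted average of all input rows.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

Because every position attends to every other position directly, the distance between two time steps does not affect how easily they can interact.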

5. Graph Sample and Aggregation Network (GraphSAGE)

The GraphSAGE [23] model is a graph neural network (GNN) that generates embedded representations of nodes by sampling and aggregating features from neighboring nodes. Unlike traditional GNNs, GraphSAGE does not need to process the entire graph at once, so it can learn effectively from large-scale graphs. This makes GraphSAGE suitable for domains such as social network analysis, knowledge graph enhancement, and bioinformatics, where graph data can be very large. The key advantage of GraphSAGE is its ability to learn node representations through iterative aggregation of a node’s local neighborhood information, thus supporting downstream tasks such as node classification or link prediction.
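The sample-and-aggregate step can be sketched as follows. This minimal Python example uses a mean aggregator and concatenates each node's own features with the aggregated neighbor features; the learned weight matrix and nonlinearity that normally follow are omitted. The function name, the toy graph, and the choice of the mean aggregator are assumptions for illustration, not details from the study.

```python
import random
from typing import Dict, List


def sage_mean_step(
    features: Dict[int, List[float]],
    neighbors: Dict[int, List[int]],
    num_samples: int,
    rng: random.Random,
) -> Dict[int, List[float]]:
    """One sample-and-aggregate step with a mean aggregator.

    Returns, for each node, its own features concatenated with the mean
    of the features of up to num_samples sampled neighbors.
    """
    out = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        # Sample a fixed-size neighborhood instead of using all neighbors.
        sampled = rng.sample(nbrs, min(num_samples, len(nbrs)))
        if sampled:
            agg = [
                sum(features[n][c] for n in sampled) / len(sampled)
                for c in range(len(own))
            ]
        else:
            agg = [0.0] * len(own)  # isolated node: nothing to aggregate
        out[node] = own + agg
    return out


feats = {0: [1.0], 1: [3.0], 2: [5.0]}
graph = {0: [1, 2], 1: [0], 2: [0]}
print(sage_mean_step(feats, graph, num_samples=2, rng=random.Random(0)))
```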

6. Transformer–GraphSAGE model

This model combines Transformer’s self-attention mechanism and GraphSAGE’s graph neural network technology to effectively extract the features of PM_2.5 concentration data in both temporal and spatial dimensions and generate accurate forecasts accordingly. The core of the Transformer–GraphSAGE model lies in its two main modules: the temporal and spatial characterization modules. The temporal characterization module uses the Transformer model to process the time series data to capture the trend of PM_2.5 concentration over time, while the spatial characterization module processes the spatial data through the GraphSAGE model to learn the complex relationship between sites. The output features of these two modules are combined in the feature association layer, and finally, the forecast values are generated through the output layer.

7. Ensemble Forecast model

The model utilizes a combined weighting strategy, which does not simply pursue a single optimal model, but constructs an ensemble of forecasting models. The ensemble forecast is realized by assigning appropriate weights to the LSTM–GCN and Transformer–GraphSAGE models, a strategy that allows the ensemble of models to take advantage of their respective performance strengths and to complement the limitations of individual models through weight adjustments. Figure 1 shows the process of ensemble forecast.

An example of the input and output variables of the proposed ensemble forecast model is shown in Table 3.

2.3.2. Characterization Module

1. Temporal Characterization Module

This module processes and analyzes the time series data. Its key variables are A and P. A stores the input data, which consist of the PM_2.5 concentration data from the 36 monitoring stations for the 24 consecutive hours prior to time point t. P stores the model’s forecast data, representing the predicted PM_2.5 concentration values for these stations at moment t.

In this study, a sliding window technique is used to process the data. The window size is set to 25, covering the 24 h of data before moment t and the actual value at moment t. The specific procedure is as follows: place a sliding window at the top of the data sequence, extract the first 24 data points and store them in variable A, and store the 25th data point in variable P. Then, move the sliding window down by one hour and repeat the extraction process until the window reaches the bottom of the data sequence.
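The sliding window procedure described above can be sketched in Python as follows. The function name make_windows is hypothetical, and the series is assumed to be a list of hourly rows, each row holding one value per station.

```python
from typing import List, Tuple

Row = List[float]  # one hour of data: one PM2.5 value per station


def make_windows(series: List[Row], window: int = 25) -> Tuple[List[List[Row]], List[Row]]:
    """Cut an hourly series into (A, P) pairs with a sliding window.

    For each window of 25 consecutive hours, the first 24 rows form the
    input A and the 25th row is the target P. The window then moves
    down by one hour.
    """
    inputs, targets = [], []
    for start in range(len(series) - window + 1):
        chunk = series[start:start + window]
        inputs.append(chunk[:-1])   # A: hours t-24 ... t-1
        targets.append(chunk[-1])   # P: hour t
    return inputs, targets


# Toy series: 30 hours, a single station, value equal to the hour index.
toy = [[float(h)] for h in range(30)]
a_list, p_list = make_windows(toy)
print(len(a_list), p_list[0], p_list[-1])
```

A series of n hourly rows yields n - 24 windows under this scheme.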

A_{k} is used to represent the input data for the temporal feature component, which consists of the continuous 24-h PM_2.5 concentration data prior to time point t from 36 monitoring stations. The data format of A_{k} is depicted as Equation (3):

A_{k} = \begin{bmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,36} \\ a_{2,1} & a_{2,2} & \dots & a_{2,36} \\ \vdots & \vdots & \ddots & \vdots \\ a_{24,1} & a_{24,2} & \dots & a_{24,36} \end{bmatrix}

(3)

In (3), a_{i, j} represents the PM_2.5 concentration data of the jth station during the ith hour prior to time point t.

The forecast data matrix P_{t} is shown in Equation (4):

P_{t} = \begin{bmatrix} p_{1} & p_{2} & \dots & p_{36} \end{bmatrix}

(4)

In (4), p_{i} denotes the predicted PM_2.5 concentration at the ith station at time t.

It is worth emphasizing that the output matrix generated by Ensemble Forecast has exactly the same dimensions as P_{t}, which means that the predicted value matrix and the true value matrix match in size. This feature greatly facilitates the analysis of differences between predicted and true values.

2. Spatial Characterization Module

This module is used to capture the spatial trends among the 36 monitoring stations. Key variables include the adjacency matrix of the station distribution map and station feature data. Figure 2 displays the geographical distribution of the 36 monitoring stations.

The station distribution map is a weighted undirected graph in which the monitoring stations serve as vertices and the inverse of the distance between stations is used as the edge weight. It is represented by an adjacency matrix of dimensions 36 \times 36, where 36 is the number of stations. The matrix element c_{i, j} represents the inverse of the spatial distance between the ith and jth stations, that is, c_{i, j} = \frac{1}{d_{i, j}}. Here, d_{i, j} represents the spatial distance between the ith and jth stations, which is calculated using the Haversine formula. The formula is shown as Equation (5):

d_{i, j} = 2 r \arcsin\left(\sqrt{\sin^{2}\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_{i}) \cos(\phi_{j}) \sin^{2}\left(\frac{\Delta\lambda}{2}\right)}\right)

(5)

In (5), \phi_{i} and \phi_{j} are the latitudes of the two sites, \Delta\phi represents the difference in latitude between the two sites, \Delta\lambda represents the difference in longitude between the two sites, and r represents the radius of the earth, r = 6371 km.
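Equation (5) and the inverse-distance adjacency matrix can be sketched in Python as below. The function names and the example coordinates are hypothetical (they are not actual station locations), and the diagonal is set to 0 as an assumption, because the text does not say how c_{i, i} is defined when the distance is zero.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # r in Equation (5)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def inverse_distance_adjacency(coords: List[Tuple[float, float]]) -> List[List[float]]:
    """Symmetric adjacency matrix with c_ij = 1 / d_ij for i != j.

    Assumption: the diagonal is left at 0. Co-located stations (d = 0)
    would need special handling and are not expected here.
    """
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(coords[i][0], coords[i][1], coords[j][0], coords[j][1])
            adj[i][j] = adj[j][i] = 1.0 / d
    return adj


# Hypothetical (latitude, longitude) pairs, for illustration only.
stations = [(39.90, 116.40), (40.00, 116.50), (39.80, 116.30)]
for row in inverse_distance_adjacency(stations):
    print([round(v, 4) for v in row])
```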

The station feature data are built from the data of each monitoring site at moment t - 1. These features include not only the PM_2.5 concentration but also related meteorological data such as wind speed, wind direction, temperature, and humidity. The wind direction is encoded with one-hot encoding (OHE) as 8-bit data consisting of 0s and 1s, where each direction corresponds to one bit. For example, if the bit for the east direction is 1 and the bits for all other directions are 0, then the wind direction is east. Merging the four numerical features and the eight wind direction bits by column gives feature data with 12 columns, from which the feature matrix in Equation (6) is constructed:

T = \begin{bmatrix} m_{1,1} & m_{1,2} & \dots & m_{1,12} \\ m_{2,1} & m_{2,2} & \dots & m_{2,12} \\ \vdots & \vdots & \ddots & \vdots \\ m_{36,1} & m_{36,2} & \dots & m_{36,12} \end{bmatrix}

(6)

In (6), m_{i, j} denotes the jth feature of the ith site.
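One row of the feature matrix T could be assembled as in the following Python sketch. The direction labels, their order, and the column order of the numerical features are placeholders chosen for illustration; the actual eight values are those defined in Table 2, and the numerical inputs would already be normalized.

```python
from typing import List

# Placeholder labels and order; the real eight values are defined in Table 2.
WIND_DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def one_hot_wind(direction: str) -> List[float]:
    """Encode a wind direction as 8 bits with exactly one bit set."""
    if direction not in WIND_DIRECTIONS:
        raise ValueError(f"unknown wind direction: {direction!r}")
    return [1.0 if d == direction else 0.0 for d in WIND_DIRECTIONS]


def station_features(
    pm25: float,
    wind_speed: float,
    temperature: float,
    humidity: float,
    wind_direction: str,
) -> List[float]:
    """Build one 12-column row: 4 numerical features + 8 one-hot bits.

    Assumption: the numerical columns come first, in this order.
    """
    return [pm25, wind_speed, temperature, humidity] + one_hot_wind(wind_direction)


# Illustrative (already normalized) values for one station at t - 1.
row = station_features(0.42, 0.10, 0.55, 0.30, "E")
print(len(row), row)
```

Stacking one such row per station gives the 36 × 12 matrix in Equation (6).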

2.3.3. Multi-Objective Optimization

Multi-objective optimization involves optimizing two or more objectives simultaneously. Rather than a single best solution, it seeks a set of solutions that balance the objectives, typically Pareto-optimal solutions that represent compromises between them. In PM_2.5 concentration forecasting, both accuracy and stability are important. Therefore, this study introduces bias and variance as multi-objective optimization metrics to find the optimal weighting coefficients through a multi-objective optimization algorithm.

First, two forecast models are selected to form a forecast ensemble. Then, the ensemble forecast model based on these two models can be expressed as follows:

\hat{y} = w_{1} \times \hat{y}_{\text{LSTM-GCN}} + w_{2} \times \hat{y}_{\text{Transformer-GraphSAGE}}

(7)

In (7), \hat{y} represents the ensemble prediction result, where \hat{y}_{\text{LSTM-GCN}} and \hat{y}_{\text{Transformer-GraphSAGE}} correspond to the forecast results of LSTM–GCN and Transformer–GraphSAGE, respectively. The variables w_{1} and w_{2} denote the weights assigned to the two models, satisfying the constraint w_{1} + w_{2} = 1.
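Equation (7) amounts to an element-wise weighted average of the two model outputs. A minimal Python sketch, with a hypothetical function name and illustrative prediction values, is:

```python
from typing import List


def ensemble_forecast(
    pred_lstm_gcn: List[float],
    pred_transformer_sage: List[float],
    w1: float,
) -> List[float]:
    """Combine two forecasts as in Equation (7), with w2 = 1 - w1."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1] so that w1 + w2 = 1 with w2 >= 0")
    if len(pred_lstm_gcn) != len(pred_transformer_sage):
        raise ValueError("both forecasts must cover the same stations")
    w2 = 1.0 - w1
    return [w1 * a + w2 * b for a, b in zip(pred_lstm_gcn, pred_transformer_sage)]


# Illustrative predictions for three stations (not values from the study).
print(ensemble_forecast([100.0, 80.0, 60.0], [200.0, 90.0, 50.0], w1=0.26))
```

Because both weights are non-negative and sum to 1, each ensemble value lies between the two individual forecasts, so non-negative inputs give a non-negative output.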

In this study, a bias-variance framework is employed to evaluate the accuracy and stability of the ensemble prediction model. Let x_{t} - \hat{x}_{t} represent the difference between the true value and the predicted value. The average difference across all points is calculated as follows:

\frac{1}{T} \sum_{t=1}^{T} (x_{t} - \hat{x}_{t}) = \frac{1}{T} \sum_{t=1}^{T} x_{t} - \frac{1}{T} \sum_{t=1}^{T} \hat{x}_{t}

(8)

In (8), T represents the number of forecast data points. The expectation of the predicted values is denoted as E(\hat{x}) = \frac{1}{T} \sum_{t=1}^{T} \hat{x}_{t}, while the expectation of the observed values is denoted as x = \frac{1}{T} \sum_{t=1}^{T} x_{t}. According to the bias-variance framework, the decomposition can be expressed as follows:

\begin{aligned} E(\hat{x} - x)^{2} &= E(\hat{x} - E(\hat{x}) + E(\hat{x}) - x)^{2} \\ &= E(\hat{x} - E(\hat{x}))^{2} + (E(\hat{x}) - x)^{2} \\ &= \mathrm{Var}(\hat{x}) + \mathrm{Bias}^{2}(\hat{x}) \end{aligned}

(9)

The cross term vanishes in the second step because E(\hat{x} - E(\hat{x})) = 0. In (9), \mathrm{Bias}^{2}(\hat{x}) denotes the forecast accuracy of the forecast model, and \mathrm{Var}(\hat{x}) denotes the stability.

\mathrm{Bias}^{2}(\hat{x}) and \mathrm{Var}(\hat{x}) are minimized during multi-objective optimization and can be written as follows:

S = [s_{1}, s_{2}]

(10)

In (10), S contains the ensemble forecast result \hat{y} in Equation (7). Since \hat{y} depends on the weighting coefficients \omega, S also depends on \omega, written S(\omega).
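The two objectives can be computed directly from the definitions in Equations (8) and (9). The Python sketch below assumes that s_{1} is the squared bias and s_{2} is the variance, which the text implies but does not state explicitly; the function name and the sample values are illustrative.

```python
from typing import Sequence, Tuple


def bias_sq_and_variance(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float]:
    """Return (Bias^2, Var) of the predictions as defined around Equation (9).

    Bias^2 = (E(x_hat) - x)^2, where x is the mean of the observations.
    Var    = E(x_hat - E(x_hat))^2.
    """
    t = len(observed)
    mean_obs = sum(observed) / t
    mean_pred = sum(predicted) / t
    bias_sq = (mean_pred - mean_obs) ** 2
    variance = sum((p - mean_pred) ** 2 for p in predicted) / t
    return bias_sq, variance


# Illustrative values only.
obs = [1.0, 2.0, 3.0]
pred = [2.0, 3.0, 4.0]
s1, s2 = bias_sq_and_variance(obs, pred)

# Check of Equation (9): E(x_hat - x)^2 = Var + Bias^2, with x the observed mean.
mean_obs = sum(obs) / len(obs)
lhs = sum((p - mean_obs) ** 2 for p in pred) / len(pred)
print(s1, s2, abs(lhs - (s1 + s2)) < 1e-12)
```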

It is important to note that the ensemble forecast results represent the physical quantity of PM_2.5 concentration, so the result is non-negative. Therefore, the constraint can be expressed as follows:

P_{h} \geq 0

(11)

The optimization goal is to minimize the solution of the multi-objective function. Finding such a solution is equivalent to searching for a set of optimal weighting coefficients \omega_{i}, and obtaining an ensemble forecast that is as close as possible to the actual PM_2.5 concentration values through Equation (7). Thus, the constrained MOO can be represented as follows:

\begin{array}{l} \text{Minimize } \{S(\omega)\} \\ \text{Subject to } \{P_{h} \geq 0\}, \quad h = 1, 2, \dots, h_{n}. \end{array}

(12)

This study uses MOEA/D (multi-objective evolutionary algorithm based on decomposition) to solve the proposed multi-objective optimization problem. The main advantages of MOEA/D are its efficient problem-solving capability and the fact that it maintains the diversity of solutions during the search, which reduces the risk of falling into local optima. The algorithm explores the Pareto-optimal front by decomposing a complex multi-objective problem into a number of manageable scalar subproblems and optimizing these subproblems simultaneously. This study uses the Platypus library [24] to implement the MOEA/D algorithm. Platypus is a free and open-source Python library for multi-objective optimization that provides a variety of multi-objective evolutionary algorithms and analysis tools. The MOEA/D algorithm yielded the final weights w_{1} = 0.26 and w_{2} = 0.74.
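The decomposition idea can be illustrated without the evolutionary machinery. The Python sketch below is not MOEA/D and does not call Platypus: it replaces the evolutionary search with a brute-force grid over w_{1} and solves a handful of Tchebycheff-scalarized subproblems, one per preference vector. All names, the grid resolution, the number of subproblems, and the data are assumptions for illustration, and the objectives are taken to be the squared bias and the variance of the ensemble forecast.

```python
from typing import List, Sequence, Tuple


def objectives(
    w1: float, observed: Sequence[float], pred_a: Sequence[float], pred_b: Sequence[float]
) -> Tuple[float, float]:
    """(Bias^2, Var) of the ensemble w1 * pred_a + (1 - w1) * pred_b."""
    blended = [w1 * a + (1.0 - w1) * b for a, b in zip(pred_a, pred_b)]
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(blended) / n
    bias_sq = (mean_pred - mean_obs) ** 2
    variance = sum((p - mean_pred) ** 2 for p in blended) / n
    return bias_sq, variance


def decompose_and_search(
    observed: Sequence[float],
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    n_subproblems: int = 5,
    grid: int = 101,
) -> List[Tuple[Tuple[float, float], float, Tuple[float, float]]]:
    """Solve several scalar subproblems over a grid of candidate w1 values.

    Keeping w1 in [0, 1] (so w2 = 1 - w1 >= 0) keeps the ensemble
    non-negative whenever both base forecasts are non-negative.
    """
    candidates = [i / (grid - 1) for i in range(grid)]
    scores = {w: objectives(w, observed, pred_a, pred_b) for w in candidates}
    # Ideal point z*: best value found for each objective separately.
    z = (min(s[0] for s in scores.values()), min(s[1] for s in scores.values()))
    results = []
    for k in range(n_subproblems):
        lam0 = k / (n_subproblems - 1)
        lam = (lam0, 1.0 - lam0)  # preference between bias and variance
        # Tchebycheff scalarization: max_i lam_i * |f_i - z_i|.
        best = min(
            candidates,
            key=lambda w: max(
                lam[0] * abs(scores[w][0] - z[0]),
                lam[1] * abs(scores[w][1] - z[1]),
            ),
        )
        results.append((lam, best, scores[best]))
    return results


# Illustrative data: model A overshoots by 2, model B undershoots by 2.
obs = [10.0, 20.0, 30.0]
for lam, w1, (b, v) in decompose_and_search(obs, [12.0, 22.0, 32.0], [8.0, 18.0, 28.0]):
    print(lam, w1, round(b, 4), round(v, 4))
```

Each preference vector yields one compromise between the two objectives. MOEA/D evolves the solutions of such subproblems together, sharing information between neighboring subproblems, instead of enumerating a grid.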

2.3.4. Model Evaluation

In this study, mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), and coefficient of determination (R²) were used as the evaluation metrics of the model. The formula for each index is shown in Equations (13)–(16):

MAE = \frac{1}{N} \sum_{k=1}^{N} |y_{k} - \hat{y}_{k}|

(13)

RMSE = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (y_{k} - \hat{y}_{k})^{2}}

(14)

MAPE = \frac{1}{N} \sum_{k=1}^{N} \left| \frac{\hat{y}_{k} - y_{k}}{y_{k}} \right| \times 100\%

(15)

R^{2} = 1 - \frac{\sum_{k=1}^{N} (y_{k} - \hat{y}_{k})^{2}}{\sum_{k=1}^{N} (y_{k} - \overline{y})^{2}}

(16)

where y_{k} represents the monitoring value of the kth sample, \hat{y}_{k} represents the predicted value of the kth sample, \overline{y} represents the average of the monitoring values of all samples, and N denotes the total number of samples.
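Equations (13)–(16) translate directly into code. The following Python sketch uses hypothetical function names and illustrative values; note that MAPE is undefined when an observed value is zero, and R² is undefined when all observed values are identical.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error, Equation (13)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Root mean square error, Equation (14)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, Equation (15).

    Requires every observed value y_k to be non-zero.
    """
    return sum(abs((b - a) / a) for a, b in zip(y, y_hat)) / len(y) * 100.0


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination, Equation (16)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Illustrative observed and predicted values (not from the study).
obs = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 300.0]
print(mae(obs, pred), rmse(obs, pred), mape(obs, pred), r2(obs, pred))
```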

3. Experimental Results and Discussion

3.1. Overall Evaluation Results

The deep learning models in this study were developed on a Windows 10 operating system through the Google Colab platform, using the PyTorch framework and NVIDIA T4 GPUs to accelerate computation.

According to the real values on the sites and the predicted values of the seven models, the average MAE, RMSE, and MAPE of the 36 monitoring sites were calculated, and the results are shown in Table 4. These data are visualized through bar charts and line charts, as shown in Figure 3.

It can be seen from Table 4 and Figure 3 that the ensemble prediction model proposed in this study outperforms the LSTM–GCN, LSTM, GCN, Transformer–GraphSAGE, Transformer, and GraphSAGE models on the evaluation metrics. Among the four single models, LSTM has the smallest MAE and MAPE values, and its RMSE is smaller than those of Transformer and GraphSAGE and only slightly higher than that of GCN. LSTM therefore shows the best overall prediction performance among the single models, which may be because the specific features and patterns of the dataset are better suited to LSTM. Compared with GCN, the MAE, RMSE, and MAPE of LSTM–GCN are reduced by 3.2%, 3.8%, and 3.6%, respectively. Compared with LSTM, LSTM–GCN has advantages only in MAE and RMSE, which are reduced by 1.2% and 5.1%, respectively. The picture is different for Transformer–GraphSAGE: its MAE, RMSE, and MAPE are 2.514, 4.055, and 2.562, respectively, all markedly lower than those of Transformer (3.389, 5.292, 3.484) and GraphSAGE (3.280, 4.971, 3.372). These results indicate that the combined models have certain advantages over the single models, and that the size of the advantage varies between models, which may be related to the characteristics of both the models and the data.

Comparing the two combined models, Transformer–GraphSAGE is ahead of LSTM–GCN on MAE, RMSE, and MAPE, which are reduced by 17.5%, 12.9%, and 17.8%, respectively. This suggests that Transformer–GraphSAGE makes better use of the features and patterns in the data. Compared with LSTM–GCN, Ensemble Forecast reduces these three indicators by 22%, 15.4%, and 21%, respectively, and compared with Transformer–GraphSAGE it reduces them by 5.6%, 2.8%, and 3.8%, respectively. It can thus be seen that Ensemble Forecast, which uses the MOEA/D algorithm to assign weights, is able to exploit the advantages of both LSTM–GCN and Transformer–GraphSAGE to make better predictions.

3.2. Analysis of Predicted and Observed Values

Site No. 11 was randomly selected as the research object. In order to better demonstrate the forecasting effect of the model in the time dimension, the observed values for one week in February, which was more seriously polluted, and the forecast values of all models were extracted, and the curves were drawn as shown in Figure 4.

It can be observed from Figure 4 that the forecast results of all models fluctuate with changes in actual values, and this volatility shows a similar trend among the models. Among them, the forecast trend of Ensemble Forecast is consistent with the actual value, demonstrating its advantages in forecast performance.

A specific moment of high PM_2.5 concentration observation is randomly selected, from which the PM_2.5 observations of 36 monitoring stations and the forecast values of the models are extracted. In order to visualize the forecasting ability of these models in the spatial dimension, the graphs shown in Figure 5 are plotted.

As Figure 5 shows, in the spatial dimension the fluctuation range and trend of the values predicted by the Ensemble Forecast model are relatively consistent with the observed values, indicating good prediction performance.

The correlation between the predictions of each model and the observations at site No. 11 is shown in Figure 6. The black line is the y = x line, the red line is the regression line of the model, and the differently colored areas indicate the density of the data points.

Figure 6 presents the predictions of the seven models for PM_2.5 concentration as density scatter plots. A density scatter plot shows where the data points cluster (the high-density areas appear in red) and thereby conveys how closely the predicted values follow the observed values. In each subplot, the regions where the data points are most concentrated indicate a high degree of consistency between the predicted and observed values. In addition, the fitted linear equation and the R² coefficient in each subplot provide a quantitative assessment of the prediction accuracy.

In all subplots, the PM_2.5 concentrations are mostly concentrated between 85 and 90 μg/m³, which falls in the light pollution range. In ascending order of R², the models rank as follows: Transformer (0.7471), GraphSAGE (0.8332), LSTM–GCN (0.8443), LSTM (0.8493), GCN (0.8667), Transformer–GraphSAGE (0.8793), Ensemble Forecast (0.8831). The R² of Transformer–GraphSAGE is higher than those of Transformer and GraphSAGE, whereas the R² of LSTM–GCN is lower than those of LSTM and GCN. A possible reason is that the time series features and graph structure features in the data are not pronounced enough, which limits the prediction accuracy of the LSTM–GCN model.

The Ensemble Forecast model has the highest R² of all models, and its regression line lies closer to the y = x line, indicating the highest prediction accuracy among the models compared. This shows that the Ensemble Forecast model integrates the advantages of LSTM–GCN and Transformer–GraphSAGE well and achieves a better forecasting effect.
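The fitted line and R² reported in each subplot of Figure 6 can be obtained from paired observed and predicted values. The sketch below is one way to compute them, assuming an ordinary least-squares fit of predictions on observations and assuming that R² denotes the squared Pearson correlation of that fit; the text does not state which definition was used, and the function name `fit_and_r2` is hypothetical.

```python
from math import fsum


def fit_and_r2(observed, predicted):
    """Fit predicted = slope * observed + intercept by least squares.

    Returns (slope, intercept, r2), where r2 is the squared Pearson
    correlation between observed and predicted values (an assumed
    definition, see the lead-in).
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")

    mean_x = fsum(observed) / n
    mean_y = fsum(predicted) / n
    sxx = fsum((x - mean_x) ** 2 for x in observed)
    syy = fsum((y - mean_y) ** 2 for y in predicted)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("observed and predicted values must not be constant")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    obs = [35.0, 60.0, 88.0, 120.0, 150.0]
    pred = [40.0, 58.0, 85.0, 112.0, 141.0]
    print(fit_and_r2(obs, pred))
```

A slope near 1 and an intercept near 0 correspond to a regression line close to y = x, which is the visual criterion used in the discussion of Figure 6.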

3.3. Limitations and Prospects

The experimental results above show that the Ensemble Forecast model performs well in terms of accuracy and stability. Specifically, the proposed model has lower MAE, RMSE, and MAPE values and a higher R² value than the other models compared. The improvement in these indicators not only reflects the better performance of the model but also lays a foundation for its future application and development. Although the proposed ensemble forecast model performs well, this study still has some limitations that need to be explored further in the future. For example, the model could be tested in more regions to evaluate its generalization ability. In addition, the predictive power of the model could be enhanced by integrating more data sources, such as emission inventories and satellite observations. The potential of the model for multi-pollutant prediction and long-term prediction is also worth exploring.

4. Conclusions

Accurate and stable prediction of PM_2.5 concentration is of great significance for air pollution prevention and control. Existing studies usually rely on a single model or use a single evaluation criterion in multi-model weighted combination forecasting, ignoring the dual needs for accuracy and stability in PM_2.5 forecasting. To address these shortcomings, this paper proposes a PM_2.5 forecasting model, called Ensemble Forecast, which integrates multiple models with a multi-objective optimization algorithm.

The proposed model combines the LSTM–GCN and Transformer–GraphSAGE models, which together account for temporal and spatial features, and introduces bias and variance as the multi-objective optimization indicators. The MOEA/D algorithm is used to determine the optimal weight coefficients of the LSTM–GCN and Transformer–GraphSAGE models for prediction. Beijing is selected as the case area, and the PM_2.5 concentration in the next hour is predicted. A detailed evaluation and comparative analysis is carried out on the test set against the benchmark models (LSTM, GCN, Transformer, GraphSAGE, LSTM–GCN, Transformer–GraphSAGE). The results show that the proposed Ensemble Forecast model has smaller MAE, RMSE, and MAPE values and a larger R² value, and that its prediction performance is better than that of the other models compared. In conclusion, the Ensemble Forecast model is feasible for predicting PM_2.5 concentration and can serve as an effective PM_2.5 prediction tool. It is expected that the model will be further improved in the future to provide stronger support for public health guidance and the government's air pollution control work.
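The trade-off between the two objectives when weighting two predictors can be illustrated with a small sketch. This is not the MOEA/D procedure used in the study: it replaces the evolutionary search with a plain grid scan over a single weight and keeps the non-dominated candidates. It also assumes that the two weights sum to 1, that "bias" is the absolute mean error, and that "variance" is the population variance of the errors; these definitions, the function names, and the toy numbers are all illustrative assumptions.

```python
from statistics import fmean, pvariance


def objectives(weight, pred_a, pred_b, observed):
    """Return (bias, variance) of the weighted forecast
    weight * pred_a + (1 - weight) * pred_b.

    Assumed definitions: bias = |mean error|, variance = population
    variance of the errors.
    """
    errors = [
        weight * a + (1.0 - weight) * b - y
        for a, b, y in zip(pred_a, pred_b, observed)
    ]
    return abs(fmean(errors)), pvariance(errors)


def pareto_front(candidates):
    """Keep the (weight, objectives) pairs that no other candidate dominates.

    A candidate is dominated if another one is no worse in both
    objectives and differs in at least one of them.
    """
    front = []
    for weight, obj in candidates:
        dominated = any(
            other[0] <= obj[0] and other[1] <= obj[1] and other != obj
            for _, other in candidates
        )
        if not dominated:
            front.append((weight, obj))
    return front


def scan_weights(pred_a, pred_b, observed, steps=20):
    """Grid scan over weight in [0, 1]; a stand-in for MOEA/D, not MOEA/D."""
    candidates = [
        (i / steps, objectives(i / steps, pred_a, pred_b, observed))
        for i in range(steps + 1)
    ]
    return pareto_front(candidates)


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    observed = [10.0, 20.0, 30.0, 40.0]
    pred_a = [12.0, 22.0, 32.0, 42.0]  # biased but perfectly stable
    pred_b = [9.0, 21.0, 29.0, 41.0]   # unbiased but noisy
    for weight, (bias, variance) in scan_weights(pred_a, pred_b, observed, steps=4):
        print(weight, bias, variance)
```

In this toy case every weight is Pareto-optimal, because moving weight toward the stable predictor lowers variance but raises bias. Choosing one weight from such a front is the step where accuracy and stability have to be balanced.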

Author Contributions

Funding acquisition, X.M. and H.L.; methodology, Z.G. and X.M.; software, Z.G.; writing—original draft, Z.G., X.M. and H.L.; writing—review and editing, X.M. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Hainan Provincial Natural Science Foundation of China (Grant numbers: 623RC455, 623RC457), the Ministry of Education’s Industry-University Cooperation Collaborative Education Project (Grant number: 220902070162538), and the Scientific Research Fund of Hainan University (Grant numbers: KYQD(ZR)-22096, KYQD(ZR)-22097).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding author (moxinyue@hainanu.edu.cn) upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guo, X.B.; Wei, H.Y. Progress on the health effects of ambient PM2.5 pollution. Chin. Sci. Bull. 2013, 58, 1171–1177.
Lavigne, E.; Lima, I.; Hatzopoulou, M.; Van Ryswyk, K.; van Donkelaar, A.; Martin, R.V.; Weichenthal, S. Ambient Ultrafine Particle Concentrations and Incidence of Childhood Cancers. Environ. Int. 2020, 145, 106135.
Mo, X.; Li, H.; Zhang, L.; Qu, Z. Environmental Impact Estimation of PM2.5 in Representative Regions of China from 2015 to 2019: Policy Validity, Disaster Threat, Health Risk, and Economic Loss. Air Qual. Atmos. Health 2021, 14, 1571–1585.
Mo, X.; Li, H.; Zhang, L.; Qu, Z. A Novel Air Quality Evaluation Paradigm Based on the Fuzzy Comprehensive Theory. Appl. Sci. 2020, 10, 8619.
Lu, Y.; Li, B.; Fan, C.; Wang, J.T.; Zhang, H.Y.; Jiang, H.Q. Evolution and Development of Air Quality Prediction and Simulation Technology. Chin. J. Environ. Manag. 2021, 13, 84–92.
Mo, X.; Li, H.; Zhang, L.; Qu, Z. A Novel Air Quality Early-Warning System Based on Artificial Intelligence. Int. J. Environ. Res. Public Health 2019, 16, 3505.
Arsov, M.; Zdravevski, E.; Lameski, P.; Corizzo, R.; Koteli, N.; Gramatikov, S.; Trajkovik, V. Multi-Horizon Air Pollution Forecasting with Deep Neural Networks. Sensors 2021, 21, 1235.
Mo, X.; Li, H.; Zhang, L. Design a Regional and Multistep Air Quality Forecast Model Based on Deep Learning and Domain Knowledge. Front. Earth Sci. 2022, 10, 995843.
Mao, X.; Shen, T.; Feng, X. Prediction of Hourly Ground-Level PM2.5 Concentrations 3 Days in Advance Using Neural Networks with Satellite Data in Eastern China. Atmos. Pollut. Res. 2017, 8, 1005–1015.
Shishegaran, A.; Saeedi, M.; Kumar, A.; Ghiasinejad, H. Prediction of air quality in Tehran by developing the nonlinear ensemble model. J. Clean. Prod. 2020, 259, 120825.
Ma, J.W.; Yan, J.H.; Sun, R.W. Prediction Model of PM2.5 Concentration Based on LSTM-GCN. China Environ. Monit. 2022, 38, 153–160.
Mohammadzadeh, A.K.; Salah, H.; Jahanmahin, R.; Hussain, A.E.A.; Masoud, S.; Huang, Y. Spatiotemporal Integration of GCN and E-LSTM Networks for PM2.5 Forecasting. Mach. Learn. Appl. 2024, 15, 100521.
Li, T.; Hua, M.; Wu, X.U. A Hybrid CNN-LSTM Model for Forecasting Particulate Matter (PM2.5). IEEE Access 2020, 8, 26933–26940.
Moghram, I.; Rahman, S. Analysis and Evaluation of Five Short-Term Load Forecasting Techniques. IEEE Trans. Power Syst. 1989, 4, 1484–1491.
Bates, J.M.; Granger, C.W. The Combination of Forecasts. J. Oper. Res. Soc. 1969, 20, 451–468.
Yan, J.; Wang, G.Z. Combined PM2.5 Concentration Prediction Model Based on CNN & LSTM of Variable Weight—A Case Study of Beijing. Adv. Appl. Math. 2022, 11, 2095.
Yang, Y.; Chen, Y.; Wang, Y.; Li, C.; Li, L. Modelling a Combined Method Based on ANFIS and Neural Network Improved by DE Algorithm: A Case Study for Short-Term Electricity Demand Forecasting. Appl. Soft Comput. 2016, 49, 663–675.
Xiao, L.; Shao, W.; Liang, T.; Wang, C. A Combined Model Based on Multiple Seasonal Patterns and Modified Firefly Algorithm for Electrical Load Forecasting. Appl. Energy 2016, 167, 135–153.
Zheng, Y.; Yi, X.; Li, M.; Li, R.; Shan, Z.; Chang, E.; Li, T. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 2267–2276.
Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. arXiv 2015, arXiv:1506.04214.
Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs. arXiv 2017, arXiv:1706.02216.
Lee, A.N.; Hunter, C.J.; Ruiz, N. Platypus: Quick, Cheap, and Powerful Refinement of LLMs. arXiv 2023, arXiv:2308.07317.

Figure 1. The flowchart of ensemble forecast.

Figure 2. Longitudinal and latitudinal distribution of 36 monitoring stations.

Figure 3. Comparison of model evaluation metrics.

Figure 4. Forecast effects of all models at site No. 11 during the same period.

Figure 5. Forecast effects of all models on 36 sites at specific times.

Figure 6. Scatter plot of density between model forecast and observed values at site No. 11.

Table 1. The input variables and units.

Data Type	Unit
PM_2.5 Concentration	μg/m³
Wind Speed	m/s
Temperature	°C
Relative Humidity	%
Longitude	°
Latitude	°

Table 2. Defined values for wind direction.

Values	1	2	3	4	5	6	7	8
Wind direction	East	West	South	North	Southeast	Northeast	Southwest	Northwest

Table 3. The inputs and outputs of ensemble forecast model.

Input Variables	Output Variables
24-h PM2.5 timing matrix (24 × 36)	Matrix of predicted values at time t (1 × 36)
Characteristic matrix at time t − 1 (36 × 12)
Adjacency matrix (36 × 36)

Table 4. The evaluation results of different models.

Model	MAE	RMSE	MAPE (%)
Ensemble Forecast	2.374	3.940	2.465
Transformer–GraphSAGE	2.514	4.055	2.562
LSTM–GCN	3.046	4.655	3.118
GCN	3.146	4.840	3.234
LSTM	3.081	4.904	3.079
GraphSAGE	3.280	4.971	3.372
Transformer	3.389	5.292	3.484

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
