A Spatio-Temporal Graph Convolutional Network for Air Quality Prediction

Li, Pengfei; Zhang, Tong; Jin, Yantao

doi:10.3390/su15097624

by Pengfei Li, Tong Zhang * and Yantao Jin

School of Electrical and Mechanical Engineering, Pingdingshan University, Pingdingshan 467000, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(9), 7624; https://doi.org/10.3390/su15097624

Submission received: 2 April 2023 / Revised: 28 April 2023 / Accepted: 4 May 2023 / Published: 6 May 2023

Abstract:

Air pollution is a pressing issue that poses significant threats to human health and the ecological environment. The accurate prediction of air quality is crucial to enable management authorities and vulnerable populations to take measures to minimize their exposure to hazardous pollutants. Although many methods have been developed to predict air quality data, the spatio-temporal correlation of air quality data is complex and nonstationary, which makes air quality prediction still challenging. To address this, we propose a novel spatio-temporal neural network, GCNInformer, that combines the graph convolution network with Informer to predict air quality data. GCNInformer incorporates information about the spatial correlations among different monitoring sites through GCN layers and acquires both short-term and long-term temporal information in air quality data through Informer layers. Moreover, GCNInformer uses MLP layers to learn low-dimensional representations from meteorological and air quality data. These designs give GCNInformer the ability to capture the complex and nonstationary relationships between air pollutants and their surrounding environment, allowing for more accurate predictions. The experimental results demonstrate that GCNInformer outperforms other methods in predicting both short-term and long-term air quality data. Thus, the use of GCNInformer can provide useful information for air pollutant prevention and management, which can greatly improve public health by alerting individuals and communities to potential air quality hazards.

Keywords:

air quality prediction; spatio-temporal correlations; time series data; graph convolutional networks; Informer; deep learning

1. Introduction

Atmospheric pollution has gained widespread public attention due to its significant impact on human health and the ecological environment. Air quality plays a crucial role in our daily lives, and severe air pollution affects regions all over the world. The air quality index (AQI) is widely used to quantitatively indicate the status of air quality, where a higher AQI value and level indicate more severe air pollution. The AQI mainly focuses on air pollutants such as fine particulate matter (PM_2.5), inhalable particulate matter (PM₁₀), sulfur dioxide (SO₂), nitrogen dioxide (NO₂), ozone (O₃), and carbon monoxide (CO), and it is computed from the concentrations of these pollutants. Rapid urbanization and industrialization have made air pollution an alarming reality in many cities in China. Air pollution also poses a significant threat to global public health. For example, CO is a crucial pollutant in measuring regional atmospheric quality; it indirectly contributes to the greenhouse effect and near-surface photochemical smog [1]. A high level of O₃ is harmful to the human respiratory, cardiovascular, and immune systems, resulting in conditions such as asthma, respiratory tract infections, stroke, and arrhythmia [2,3,4]. PM₁₀, known for its long atmospheric retention time and high specific surface area, can easily transport toxic substances, making it a serious health hazard once it enters the human respiratory system and even the bloodstream [5]. NO₂ can be harmful to the alveoli and cause bronchitis, pneumonia, and emphysema, and high levels of SO₂ can have similarly negative effects [6]. The proper monitoring and control of air quality can help mitigate the harmful effects of air pollution. Thus, accurate prediction of air pollutant concentrations is crucial for management authorities and vulnerable populations to minimize exposure to hazardous pollutants.

In the past few years, several methods have been proposed to predict air pollutants. In general, air pollutant prediction methods can be divided into three categories: statistical methods, traditional machine learning methods, and deep learning methods. Statistical methods generally establish prediction models based on theoretical assumptions and prior knowledge of data. The autoregressive integrated moving average (ARIMA) is one of the representative statistical models for air quality prediction. For example, ARIMA was used to predict the air pollutants in different cities in India and achieved good performance [7,8]. ARIMA was integrated with a set of other statistical models to improve the AQIs (e.g., PM_2.5, NO₂, and O₃) prediction in Hong Kong [9]. However, the determination of parameters in statistical methods heavily relies on theoretical assumptions and the prior knowledge of data, making it difficult to fit nonlinear and nonstationary air quality data. Thus, the statistical methods may lead to an AQI prediction bias, especially for accurate long-term AQI prediction [10].

Unlike statistical methods, traditional machine learning methods and deep learning methods rely on neither prior physical knowledge of the data nor theoretical assumptions. They automatically learn their model parameters from the historical data to capture the nonlinear relationships in air quality data. Representative traditional machine learning methods include the support vector machine (SVM), artificial neural network (ANN), and random forest. For example, Osowski et al. combined SVM with wavelet decomposition to predict daily air pollutants [11]. Feng et al. used ANN and wavelet transform to predict PM_2.5 concentration [12]. Nouri et al. combined ANN with principal component analysis to predict PM_2.5 concentration in Urmia, Iran [13]. Bi et al. applied random forest to predict PM_2.5 levels in Imperial County, USA [14]. Compared with statistical methods, traditional machine learning methods generally achieve better performance in predicting air pollutants [15,16,17]. However, their performance is generally limited in practical applications, because the shallow structures of traditional machine learning models often fail to capture the complex and nonstationary fluctuations of air pollutant data. This results in poor performance in long-term air pollutant prediction, which is crucial to preemptive air pollutant prevention and management.

With the development of machine learning, deep learning models, which have deeper structures and more powerful learning abilities, have been proposed. In recent years, deep learning methods have gained momentum due to their outstanding performances in predicting both short-term and long-term air pollutants. For example, Yi et al. developed a DNN-based model to predict PM_2.5 by fusing heterogeneous data through some distributed subnets [18]. Prakash et al. proposed a wavelet-based recurrent neural network (RNN) to predict short-term and long-term air pollutant concentrations [19]. Tong et al. applied bidirectional long short-term memory (LSTM) to predict the PM_2.5 concentration [20]. Zhang et al. designed an encoder-decoder structure-based model by stacking multiple LSTM layers to predict the next-day air pollutants and achieved good results [21].

However, although the above methods have achieved good prediction results, they focused only on the temporal information in air quality data and ignored the spatial dependencies among surrounding monitoring sites. Because air quality data at different monitoring sites are strongly spatially correlated, efficiently capturing these spatial correlations can lead to better air quality prediction results [22]. Recently, a few methods have been proposed to predict air quality data by capturing both temporal and spatial correlations. For example, Qin et al. used a mixed CNN and LSTM model to learn spatial and temporal dependencies for predicting urban PM_2.5 concentration [23]. Pak et al. improved the prediction accuracy and stability by using the temporal and spatial correlations of air quality data among multiple monitoring sites [24]. Li et al. extended the LSTM and proposed an LSTM-based model to extract temporal and spatial correlations for improving air pollutant prediction [25]. These methods used prior knowledge or a CNN to extract spatial information and have the following shortcomings: (1) in practical applications, spatial information extracted from prior knowledge may miss important hidden spatial information because application scenarios are complex and changeable; (2) although a CNN can extract useful information from grid-structured data, such as images and regular grids, it cannot adequately characterize the spatial dependence among monitoring sites, which do not form a grid.

Recently, graph neural networks (GNNs) have shown outstanding performance in dealing with non-grid data, and they are widely used to capture the structural features of graph networks [26,27]. The spatial correlation among different monitoring sites can be naturally modeled as a graph, where different monitoring sites are represented as nodes connected by links. A link represents a strong correlation between two sites, such as close geographical proximity. Considering the air quality data recorded at a monitoring site as the signal of the corresponding node, GNNs can automatically extract spatial dependence from air quality data without any prior knowledge. A few studies have introduced GNNs into air quality prediction modeling. For example, Wang et al. combined the graph convolution network (GCN) with an RNN-structured gated recurrent unit (GRU) to capture the temporal and spatial correlations for improving PM_2.5 prediction [28]. Feng et al. designed three distributed GCN-based components to extract the recent, daily periodic, and weekly periodic spatio-temporal dependencies of air quality data and combined the outputs of the three components with an external information extraction component to generate air quality predictions [22]. The excellent spatial feature extraction ability of the GCN helped these methods achieve good prediction performance. However, their ability to capture temporal dependence still needs to be improved. For example, the accuracy of an RNN-based model decreases greatly as the prediction horizon grows because of its limited ability to extract global information [29]. Likewise, manually specified temporal components, such as recent, daily periodic, and weekly periodic components, may omit hidden temporal information in air quality data.

In view of the limitations of the aforementioned methods, in this study, we propose a novel spatio-temporal neural network (named GCNInformer) by combining a graph convolution network (GCN) with Informer [30] to predict air quality data. The main contributions of our study are as follows:

We propose a novel spatio-temporal neural network (GCNInformer) that incorporates the advantages of GCN and Informer to better predict air quality data. GCNInformer not only uses MLP layers to extract useful information from raw meteorological data and air quality data but also captures the complex and nonstationary relationships between air pollutants and their surrounding environment to boost air quality prediction. Meanwhile, incorporating Informer layers gives GCNInformer the ability to easily predict air quality data for multiple monitoring sites with only one trained model.
The experimental results on a real-world dataset demonstrate the outstanding performance of GCNInformer in predicting PM_2.5 concentration over different prediction lengths of time. The prediction results of GCNInformer are steadier and better than most of the comparison methods when the prediction time is longer.
The ablation experiments show the effectiveness of each component of GCNInformer, and GCNInformer not only accurately predicts PM_2.5 concentration but also efficiently predicts other air pollutant concentrations, such as PM₁₀, SO₂, NO₂, CO, and O₃. Moreover, GCNInformer can feasibly predict other similar spatio-temporal data, such as traffic data, wind speed data, and so on.

The rest of this paper is organized as follows. In Section 2, we give the definition of the air quality prediction problem and describe the framework and each component of GCNInformer in detail. In Section 3, we introduce the dataset, experimental settings, and experiments conducted to evaluate the effectiveness of GCNInformer. In Section 4, we provide the conclusion.

2. Methodology

2.1. Problem Formulation

The purpose of our research is to predict future air quality based on the historical data recorded by different monitoring sites and the spatial site network. The spatial site network can be represented as an undirected graph $G = (V, E, A)$, where $V$ is the node set, and each node of $V$ denotes a monitoring site; $E$ denotes the edges connecting different nodes; and the connectivity of paired nodes is represented by the adjacency matrix $A \in \mathbb{R}^{N \times N}$, where $N$ is the number of nodes. The data of the $N$ monitoring sites at timestamp $t$ can be denoted as $X^{t} = \{x_{1}^{t}, x_{2}^{t}, \dots, x_{N}^{t}\}$, where $x_{n}^{t} \in \mathbb{R}^{F}$ denotes the recorded $F$-dimensional features of the $n$-th monitoring site. Now, given the $T$ historical observations ending at timestamp $t_{0}$ and the spatial site network $G$ as inputs, the goal of air quality prediction is to find a mapping function $f$ that predicts the air quality data of the $N$ monitoring sites for the next $\tau$ time steps:

$$\{X^{t_{0} + 1}, X^{t_{0} + 2}, \dots, X^{t_{0} + \tau}\} = f(G; \{X^{t_{0} - T + 1}, X^{t_{0} - T + 2}, \dots, X^{t_{0}}\}) \tag{1}$$
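As an illustration of the input/output layout in Equation (1), the following minimal sketch slices a multi-site series into pairs of $T$ historical steps and $\tau$ future steps. It is not the authors' code; the function name `make_windows`, the array layout (time, sites, features), and the toy sizes are assumptions made for this example.

```python
import numpy as np

def make_windows(series, T, tau):
    """Split a (time, N, F) array into (history, future) pairs as in Equation (1).

    Each sample uses T consecutive steps ending at t0 as input and the
    following tau steps as the prediction target.
    """
    total = series.shape[0]
    n_samples = total - T - tau + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested T and tau")
    history = np.stack([series[i:i + T] for i in range(n_samples)])
    future = np.stack([series[i + T:i + T + tau] for i in range(n_samples)])
    return history, future

# Toy data: 100 hourly steps, 12 sites, 10 features (sizes are illustrative).
data = np.arange(100 * 12 * 10, dtype=float).reshape(100, 12, 10)
hist, fut = make_windows(data, T=24, tau=8)
print(hist.shape, fut.shape)
```

The graph $G$ does not appear in the windowing step because it is shared by all samples; it only enters the model through the adjacency matrix.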

2.2. Overview of GCNInformer

The overview architecture of the GCNInformer is shown in Figure 1. The GCNInformer mainly contains three components: (1) MLP layers, which extract useful information from the raw monitored data (i.e., the meteorological data and air quality data); (2) GCN layers, which capture the spatial dependence of the air quality data at multiple monitoring sites; and (3) Informer layers, which capture the long sequence temporal dependence of the historical data.

2.3. MLP Layers

Considering the influence of the meteorological context on the air quality and the correlations between different air quality indices (e.g., PM_2.5, PM₁₀, NO₂, CO, SO₂, O₃, etc.), we utilized the MLP to extract the features from the raw air quality data and meteorological data as the input to the GCN layers. For instance, taking an air quality index (e.g., PM_2.5) as the prediction target, the historical PM_2.5 data of length $L$ at the $n$-th monitoring site are represented as $X_{n} \in \mathbb{R}^{1 \times 1 \times L}$, and the corresponding $M$-dimensional feature data (i.e., the meteorological data and the air quality data other than PM_2.5) are denoted as $F_{n} \in \mathbb{R}^{1 \times M \times L}$. To extract useful information from the raw air quality data and the meteorological data, we first fed $F \in \mathbb{R}^{N \times M \times L}$ into the MLP to learn the low-dimensional feature $\tilde{F} \in \mathbb{R}^{N \times 1 \times L}$ from the $M$-dimensional feature data. Then, we assigned the learnable attention weights $\alpha_{1}$ and $\alpha_{2}$ to the historical PM_2.5 data $X \in \mathbb{R}^{N \times 1 \times L}$ and the learned low-dimensional feature $\tilde{F}$ to automatically adjust the importance of the information in $X$ and $\tilde{F}$. Finally, we obtained $\tilde{X} = \alpha_{1} X + \alpha_{2} \tilde{F}$ as the output of the MLP layers, which was fed into the GCN layers.
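The fusion step described above can be sketched with a plain NumPy forward pass. This is only an illustration under stated assumptions: the hidden activation (ReLU), the random weights standing in for trained parameters, and the fixed values of `alpha1` and `alpha2` (learnable in the paper) are all choices made for this example, and the name `mlp_fuse` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L, H = 12, 9, 24, 100   # sites, auxiliary features, history length, hidden units

X = rng.normal(size=(N, 1, L))   # target pollutant history (e.g., PM_2.5)
F = rng.normal(size=(N, M, L))   # meteorological and other pollutant features

# Randomly initialised parameters stand in for trained weights.
W1, b1 = rng.normal(scale=0.1, size=(M, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(H, 1)), np.zeros(1)
alpha1, alpha2 = 0.5, 0.5        # learnable in the paper; fixed here for illustration

def mlp_fuse(X, F):
    """Reduce the M features to one channel, then mix with X."""
    # Apply the MLP to the M-dimensional feature vector at every (site, time).
    feats = np.transpose(F, (0, 2, 1))                    # (N, L, M)
    hidden = np.maximum(feats @ W1 + b1, 0.0)             # ReLU is an assumption
    F_tilde = np.transpose(hidden @ W2 + b2, (0, 2, 1))   # (N, 1, L)
    return alpha1 * X + alpha2 * F_tilde

X_tilde = mlp_fuse(X, F)
print(X_tilde.shape)
```

The output keeps the shape of $X$, so it can be passed to the GCN layers as a length-$L$ signal per node.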

2.4. GCN Layers

Capturing the spatial dependence of the air quality data between monitoring sites is helpful in improving the prediction of air quality data. The traditional convolutional neural network (CNN) can capture local spatial features from data in Euclidean space, such as images, video, and speech, which are represented in the form of regular grids. However, the real spatial site network is an irregular graph in non-Euclidean space, which limits the performance of the traditional CNN model in capturing the spatial dependence of the air quality data at multiple monitoring sites. To tackle this problem, the graph convolutional network (GCN) [31] was proposed to handle graph-structured data by generalizing the CNN from data represented by regular grids to data represented by graphs. Due to its powerful ability to capture spatial dependence in graph-structured data, the GCN has attracted a lot of attention in recent years and has been successfully applied in different fields [26,27]. Given a graph $G$ with the adjacency matrix $A \in \mathbb{R}^{N \times N}$, the length-$L$ features $X \in \mathbb{R}^{N \times L}$ of the $N$ nodes can be regarded as the graph signals on $G$. The GCN can learn spatial features from $A$ and $X$ by constructing a convolutional filter in the spectral domain. A convolutional layer applies this filter to the node features and captures the spatial dependence between nodes by aggregating the first-order neighboring features of each node. A GCN model can stack multiple convolutional layers to aggregate node features from high-order neighborhoods, which can be formulated as:

$$H^{(l + 1)} = \sigma\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} H^{(l)} \theta^{(l)}\right), \tag{2}$$

where $\tilde{A} = A + I_{N}$ denotes the adjacency matrix with self-loops, $I_{N}$ is the identity matrix, $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$ with $\tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}$, $H^{(l)}$ and $\theta^{(l)}$ are the learned node features and the trainable weight matrix of the $l$-th layer, respectively, and $\sigma(\cdot)$ represents a non-linear activation function, such as the ReLU function.

In our GCNInformer, we used a 1-layer GCN model to learn spatial features from the adjacency matrix $A$ and the node feature matrix $X$, which can be formulated as:

$$f(X, A) = \mathrm{ReLU}\left(\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X W_{0}\right). \tag{3}$$

Here, $X$ is the node feature matrix generated by the MLP layers (that is, $X = \tilde{X}$), $W_{0}$ is the trainable weight matrix, and $f(X, A) \in \mathbb{R}^{N \times L}$ represents the learned spatial features of the $N$ nodes with length $L$.
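A minimal NumPy sketch of the one-layer propagation rule in Equation (3) is given below. It follows the equation as written rather than the Chebyshev layer used in the experiments, and the toy graph, the identity weight matrix, and the function names are assumptions made for readability.

```python
import numpy as np

def normalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a symmetric adjacency matrix A."""
    A_tilde = A + np.eye(A.shape[0])
    deg = A_tilde.sum(axis=1)              # D~_ii = sum_j A~_ij
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(X, A, W0):
    """One-layer GCN of Equation (3): ReLU(A_hat X W0)."""
    return np.maximum(normalized_adjacency(A) @ X @ W0, 0.0)

# Toy graph with 3 sites: site 0 is linked to sites 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # N = 3 nodes, L = 2
W0 = np.eye(2)                                        # identity weights for readability
out = gcn_layer(X, A, W0)
print(out)
```

Because self-loops are added inside `normalized_adjacency`, every node has a degree of at least one and the inverse square root is always defined.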

2.5. Informer Layers

Capturing the temporal dependence of air quality plays a crucial role in the prediction of air quality data. Here, we introduce the Informer, which is based on the Transformer architecture, to capture the temporal dependence. Similar to the typical Transformer model, the Informer employs the encoder-decoder architecture, and the overall architecture of the Informer is shown in Figure 1. The Informer improves the Transformer model by developing the ProbSparse self-attention mechanism and the self-attention distilling operation, which reduce the time complexity and memory usage when handling long sequence data, and by predicting outputs in a generative style.

The encoder of the Informer receives the long sequence data as input and captures the temporal dependence of long sequence data by the ProbSparse self-attention mechanism and the self-attention distilling operation. (i) ProbSparse self-attention: compared with the original self-attention mechanism of the Transformer, the ProbSparse self-attention mechanism efficiently reduces the time complexity. In the self-attention mechanism, each element is paired with all the other elements to determine the distribution of attention, which introduces a heavy computational burden and memory usage when handling long sequence data. However, the distribution of attention is dominated by only several significant pairs of queries and keys in most real-world scenarios. We define the attention of the $i$-th query on the keys as a probability $p(k_{j} \mid q_{i})$ and the uniform distribution as $q(k_{j} \mid q_{i}) = 1 / L_{k}$, where $L_{k}$ is the number of keys. The $p(k_{j} \mid q_{i})$ of the dominating pairs of query and key tends to show a separation from the uniform distribution $q(k_{j} \mid q_{i})$. To measure the separation between $p$ and $q$, the ProbSparse self-attention introduces an approximate measurement based on the Kullback-Leibler divergence [32], formulated as:

$$\bar{M}(q_{i}, K) = \max_{j} \left\{ \frac{q_{i} k_{j}^{T}}{\sqrt{d}} \right\} - \frac{1}{L_{k}} \sum_{j = 1}^{L_{k}} \frac{q_{i} k_{j}^{T}}{\sqrt{d}}, \tag{4}$$

where $q_{i}$ is the $i$-th row in the query matrix $Q$, $k_{j}$ is the $j$-th row in the key matrix $K$, and $d$ is the dimension of the input data. A larger value of $\bar{M}(q_{i}, K)$ indicates that the attention probability $p$ of the $i$-th query is more diverse; thus, it is more likely to contain the dominating pairs in the head of the long-tail self-attention distribution [30]. Based on the measurement of Equation (4), the ProbSparse self-attention lets each key attend only to the top-$u$ dominating queries:

$$A(Q, K, V) = \mathrm{softmax}\left(\frac{\bar{Q} K^{T}}{\sqrt{d}}\right) V, \tag{5}$$

where $\bar{Q}$ is a sparse matrix of the same size as $Q$ that contains only the top-$u$ queries with the largest values of $\bar{M}(q_{i}, K)$. (ii) Self-attention distilling: as the ProbSparse self-attention may lead to redundant combinations of values in the matrix $V$, the encoder uses the self-attention distilling operation to extract the dominating features and to construct a focused self-attention feature map. The self-attention distilling operation from the $j$-th layer to the $(j + 1)$-th layer is expressed as:

$$X_{j + 1}^{t} = \mathrm{MaxPooling}\left(\mathrm{ELU}\left(\mathrm{Conv1d}\left([X_{j}^{t}]_{\mathrm{att}}\right)\right)\right), \tag{6}$$

where $\mathrm{Conv1d}(\cdot)$ represents the 1D convolution operation on the time dimension of the data and $[\cdot]_{\mathrm{att}}$ represents the multi-head ProbSparse self-attention operation. The distilling operation adds a max-pooling layer with stride 2 that halves the length of $X_{j}^{t}$ after each layer, which reduces the memory usage to $O((2 - \varepsilon) L \log L)$, where $\varepsilon$ is a small number.
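To make Equations (4) and (5) concrete, the sketch below computes the sparsity measurement exactly for every query and applies softmax attention only to the top-$u$ queries. It is an illustration rather than the Informer implementation: the original method estimates the measurement from a sample of keys to save computation, which is omitted here, and the function names and toy matrices are assumptions. A zero row in $\bar{Q}$ yields uniform softmax weights, so the remaining queries receive the mean of $V$.

```python
import numpy as np

def sparsity_measure(Q, K):
    """Equation (4): max minus mean of the scaled scores for each query."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                # (L_q, L_k)
    return scores.max(axis=1) - scores.mean(axis=1)

def probsparse_attention(Q, K, V, u):
    """Softmax attention for the top-u queries; mean(V) for all other queries."""
    d = Q.shape[-1]
    M_bar = sparsity_measure(Q, K)
    top = np.argsort(M_bar)[-u:]                 # indices of the u dominating queries
    out = np.tile(V.mean(axis=0), (Q.shape[0], 1))
    scores = Q[top] @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    out[top] = weights @ V
    return out, top

rng = np.random.default_rng(0)
K = rng.normal(size=(6, 3))
V = rng.normal(size=(6, 2))
Q = np.zeros((4, 3))
Q[2] = [5.0, 0.0, 0.0]     # only this query has a peaked attention distribution
out, top = probsparse_attention(Q, K, V, u=1)
print(top, out.shape)
```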

The decoder receives the long sequence data as input, in which the elements to be predicted are set to zeros:

$$X_{i}^{\mathrm{de}} = \mathrm{Concat}\left(X_{i}^{\mathrm{token}}, X_{i}^{0}\right), \tag{7}$$

where $X_{i}^{\mathrm{de}}$ is the $i$-th input sequence of the decoder, $X_{i}^{\mathrm{token}}$ is the start token of the $i$-th input sequence, and $X_{i}^{0}$ is a placeholder for the elements to be predicted in the $i$-th input sequence, which are all set to the scalar 0 so that the decoder input contains no future data and step-by-step autoregressive decoding is avoided. Then, the decoder stacks two multi-head attention layers to output predictions for the zero-filled elements in a generative style. Specifically, the second multi-head attention layer takes the feature map generated by the encoder and the outputs from the first layer as inputs, and a fully connected layer then outputs the predicted elements.
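The construction of the decoder input in Equation (7) can be sketched as follows. The sketch assumes that the start token is the most recent slice of the observed sequence, as in the Informer design; the parameter names `label_len` and `pred_len` and the toy series are hypothetical.

```python
import numpy as np

def build_decoder_input(history, label_len, pred_len):
    """Concatenate the last label_len observed steps with pred_len zeros."""
    token = history[-label_len:]                               # X^token
    placeholder = np.zeros((pred_len,) + history.shape[1:])    # X^0
    return np.concatenate([token, placeholder], axis=0)        # X^de

history = np.arange(1, 11, dtype=float).reshape(10, 1)   # 10 steps, 1 feature
dec_in = build_decoder_input(history, label_len=4, pred_len=3)
print(dec_in.ravel())
```

All `pred_len` outputs are then produced in a single forward pass, which is what the generative style refers to.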

3. Experiments

3.1. Dataset and Preprocessing

In this study, the dataset contained two parts: the air quality data and the meteorological data. The air quality data were collected from the Beijing Municipal Environmental Monitoring Center and comprised the historical hourly air quality data at 12 nationally controlled air quality monitoring sites in Beijing, China, from 1 July 2015 to 28 February 2017. These 12 monitoring sites are distributed in the downtown and suburban areas of Beijing, as shown in Figure 2. Each of the 12 monitoring sites recorded six types of air pollutants hourly: PM_2.5, PM₁₀, SO₂, NO₂, CO, and O₃. The recorded air pollutant data were used to construct the spatial site network $G$ through the Spearman correlation coefficient method. Specifically, the Spearman correlation coefficient between each pair of monitoring sites was calculated by Equation (8),

$$\rho(X_{n}, X_{m}) = 1 - \frac{6 \sum_{t = 1}^{L} \left(X_{n}^{t} - X_{m}^{t}\right)^{2}}{L (L^{2} - 1)}. \tag{8}$$

Here, as in the standard Spearman formula, $X_{n}^{t}$ and $X_{m}^{t}$ are understood as the ranks of the observations of sites $n$ and $m$ at time $t$ within their respective series of length $L$.

The Spearman correlation coefficients reflect the spatial correlations between the recorded data of a monitoring site and that of its surrounding sites. Thus, to retain the most influential monitoring sites for a target monitoring site, we set the threshold as 0.9 for the Spearman correlation coefficient. Two monitoring sites were connected in the spatial site network G only if the Spearman correlation coefficient was not less than 0.9.
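The graph construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: ranks are computed without averaging ties (adequate for continuous measurements with few repeated values), the synthetic series are invented for the example, and the function names are hypothetical.

```python
import numpy as np

def ranks(x):
    """Ranks 1..L of a 1-D series (ties are not averaged in this sketch)."""
    order = np.argsort(x)
    r = np.empty(len(x), dtype=float)
    r[order] = np.arange(1, len(x) + 1)
    return r

def spearman(x, y):
    """Equation (8) applied to rank differences."""
    L = len(x)
    d = ranks(x) - ranks(y)
    return 1.0 - 6.0 * np.sum(d ** 2) / (L * (L ** 2 - 1))

def build_adjacency(series, threshold=0.9):
    """series: (N, L) array, one row per site. Link sites with rho >= threshold."""
    N = series.shape[0]
    A = np.zeros((N, N))
    for n in range(N):
        for m in range(n + 1, N):
            if spearman(series[n], series[m]) >= threshold:
                A[n, m] = A[m, n] = 1.0
    return A

rng = np.random.default_rng(0)
base = rng.normal(size=200)
# Sites 0 and 1 share a signal; site 2 is independent noise.
series = np.stack([base, base + 0.05 * rng.normal(size=200), rng.normal(size=200)])
A = build_adjacency(series)
print(A)
```

The diagonal is left at zero because the self-loops are added later by $\tilde{A} = A + I_{N}$ in the GCN layers.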

Additionally, as the meteorological factors show a significant impact on the air quality data [33], we collected four types of meteorological data (i.e., temperature, pressure, dew point temperature, and wind speed) from the weather observing stations of the China Meteorological Administration. To be consistent with the hourly air quality data, we used linear interpolation to transform the six-hourly meteorological data into hourly data. The meteorological data of each air quality monitoring site were obtained from the nearest weather observing station to the corresponding monitoring site.
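The conversion from six-hourly to hourly meteorological readings can be illustrated with `numpy.interp`. The temperature values and the function name `to_hourly` are invented for this example.

```python
import numpy as np

def to_hourly(values_6h):
    """Linearly interpolate readings taken every 6 h onto an hourly grid."""
    t_6h = np.arange(len(values_6h)) * 6       # hours of the coarse readings
    t_1h = np.arange(t_6h[-1] + 1)             # hourly grid up to the last reading
    return np.interp(t_1h, t_6h, values_6h)

temperature_6h = np.array([10.0, 16.0, 13.0])  # readings at 0 h, 6 h, and 12 h
temperature_1h = to_hourly(temperature_6h)
print(temperature_1h)
```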

After collecting the air quality data and meteorological data, data preprocessing was performed. On the one hand, due to sensor failures, equipment maintenance, and other disruptions at the monitoring sites, the air quality data of each monitoring site contained missing values. In this study, we used linear interpolation to fill in the missing values for the air quality data of each monitoring site, which is a simple strategy commonly used in related works [22]. As a result, each monitoring site contained 14,616 records of air pollutants, including PM_2.5, PM₁₀, SO₂, NO₂, CO, and O₃. On the other hand, to mitigate the impact of the different units and scales of the data on model training and to accelerate the convergence of the model, we standardized each feature of the air quality data and meteorological data, which can be expressed as:

$$z = \frac{x - \mu}{\sigma}, \tag{9}$$

where $x$ and $z$ are the raw data and standardized data, respectively, $\mu$ is the mean value of $x$, and $\sigma$ is the standard deviation of $x$. As a result, the distribution of each feature was scaled to have a mean of zero and a standard deviation of one.
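The two preprocessing steps can be sketched together as follows. The sample values and function names are invented, and the text does not state whether the mean and standard deviation were computed on the training portion only, so the sketch simply uses the series it is given.

```python
import numpy as np

def fill_missing(x):
    """Fill NaNs in a 1-D series by linear interpolation between valid neighbours."""
    x = np.asarray(x, dtype=float).copy()
    idx = np.arange(len(x))
    valid = ~np.isnan(x)
    x[~valid] = np.interp(idx[~valid], idx[valid], x[valid])
    return x

def standardize(x):
    """Equation (9): subtract the mean and divide by the standard deviation."""
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma, mu, sigma

pm25 = np.array([35.0, np.nan, np.nan, 50.0, 42.0])   # two missing hourly readings
filled = fill_missing(pm25)
z, mu, sigma = standardize(filled)
print(filled, z.mean(), z.std())
```

Keeping `mu` and `sigma` allows predictions to be mapped back to concentration units with `z * sigma + mu`. Note that `numpy.interp` holds the nearest valid value for gaps at the very start or end of a series.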

3.2. Evaluation Metrics

We used two widely used metrics to evaluate the prediction performance of our GCNInformer model and the baselines: the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), which are expressed as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{M N} \sum_{j = 1}^{M} \sum_{i = 1}^{N} \left(y_{i}^{j} - \hat{y}_{i}^{j}\right)^{2}}, \tag{10}$$

$$\mathrm{MAE} = \frac{1}{M N} \sum_{j = 1}^{M} \sum_{i = 1}^{N} \left| y_{i}^{j} - \hat{y}_{i}^{j} \right|, \tag{11}$$

where $y_{i}^{j}$ and $\hat{y}_{i}^{j}$ denote the ground truth and the predicted value of an air pollutant concentration (e.g., PM_2.5), respectively, at the $j$-th time in the $i$-th monitoring site, $M$ is the number of all test samples, and $N$ is the number of air quality monitoring sites. The RMSE and MAE measure the prediction error; the smaller the RMSE or MAE, the better the prediction performance.
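Equations (10) and (11) translate directly into NumPy. The small arrays below are invented values used only to show the computation.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Equation (10): root of the mean squared error over all samples and sites."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    """Equation (11): mean absolute error over all samples and sites."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))

# Rows are test samples (M = 2), columns are monitoring sites (N = 3).
y_true = np.array([[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]])
y_pred = np.array([[12.0, 20.0, 27.0], [40.0, 54.0, 60.0]])
print(rmse(y_true, y_pred), mae(y_true, y_pred))
```

The RMSE penalizes the single error of 4 more heavily than the MAE does, which is why the two metrics are usually reported together.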

3.3. Experimental Settings

Our GCNInformer mainly contained three parts, i.e., MLP layers, GCN layers, and Informer layers. The implementation details of each part are described as follows. Firstly, the MLP layers were implemented by three fully connected layers with 9, 100, and 1 neurons, where 9 is the dimensionality of the input feature data, 100 is the number of neurons in the hidden layer, and 1 is the number of neurons in the last layer. Secondly, the GCN layers were constructed by the Chebyshev GCN layer [34] with $K = 2$. Thirdly, the Informer layers were implemented by two encoder layers and one decoder layer with a hidden dimensionality of 512. In the encoder and decoder layers, the ProbSparse self-attention mechanism was implemented with 8 attention heads, together with residual connections, a position-wise feed-forward network layer with 2048 inner-layer neurons, and a dropout layer with a dropout rate of 0.05. We used a learning rate of 0.0001, a batch size of 32, the Mean Square Error (MSE) loss function, and the Adam optimizer to train the GCNInformer model. As each monitoring site contained air quality data for 20 months (from 1 July 2015 to 28 February 2017), the train/validation/test split was 16/2/2 months. The total number of epochs was 30, and an early stopping strategy with a patience of 5 was used to train the model. All experiments were conducted on an Ubuntu server with an Intel Xeon CPU (2.4 GHz, 128 GB RAM) and an Nvidia RTX 3090 GPU (24 GB GPU RAM).
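The stopping rule (at most 30 epochs, patience of 5) can be illustrated independently of the model. The sketch below takes a list of per-epoch validation losses instead of running training; the loss values and the function name are hypothetical, and the exact criterion used by the authors (for example, a minimum improvement threshold) is not stated in the text.

```python
def early_stopping_epochs(val_losses, max_epochs=30, patience=5):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    bad_epochs = 0
    stop_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        stop_epoch = epoch
        if loss < best_loss:
            # Improvement: remember this epoch and reset the patience counter.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch, stop_epoch

# Hypothetical validation losses: improvement stalls after epoch 4.
losses = [0.90, 0.70, 0.60, 0.55, 0.56, 0.57, 0.58, 0.56, 0.59, 0.60, 0.61]
best, stopped = early_stopping_epochs(losses)
print(best, stopped)
```

In practice, the model weights saved at `best` would be restored before evaluation on the test set.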

3.4. Performance Comparison

To prove the effectiveness of our proposed model, we compared the GCNInformer with six other different baselines. All the methods were trained and tested with the same train, validation, and test datasets. The details of the baselines are as follows:

ARIMA [35]: the Auto-Regressive Integrated Moving Average model is widely used to forecast future data by fitting historical time series data.

SVR [36]: the Support Vector Regression model aims to find a hyperplane that best fits the data. It uses historical data to train the model to discover the relationship between the historical data and the future data and then predicts the future data by the trained model.

DNN: the DNN is an artificial neural network that consists of multiple fully interconnected layers with a sigmoid activation function, which can be used to find the relationship between the historical data and the future data.

LSTM [37]: the Long Short-Term Memory model is a type of recurrent neural network (RNN) that is designed to handle the vanishing gradient problem in traditional RNNs. It is capable of learning long-term dependencies in sequential data by selectively remembering or forgetting information over time.

Transformer [29]: the Transformer is constructed by an encoder and a decoder with a self-attention mechanism, which allows it to capture long-range dependencies in sequential data.

Informer [30]: the Informer is a variant of the Transformer model designed specifically for long sequence time series forecasting. It includes a novel ProbSparse self-attention mechanism to reduce the computational cost of self-attention.

To ensure that the results were not biased by a particular random weight initialization, we ran each method five times and report the average of each metric in Table 1. Table 1 shows the performance of the GCNInformer and the baseline models on the 1 h, 8 h, 16 h, and 24 h prediction tasks for the PM_2.5 concentration. The GCNInformer significantly outperformed all baselines in terms of RMSE and MAE on every prediction task (Student’s t-test, significance level 0.01). In addition, the deep learning methods generally performed better than the traditional regression models, ARIMA and SVR, which indicates that deep learning methods are better able to handle complex and non-linear air quality data. For example, on the 1 h prediction task, the RMSE of the GCNInformer and of the DNN was approximately 38.49% and 22.78% lower, respectively, than that of ARIMA, and approximately 24.08% and 4.69% lower than that of SVR. On the 24 h prediction task, the RMSE of the GCNInformer and of the DNN was approximately 63.98% and 41.32% lower, respectively, than that of ARIMA, and approximately 56.36% and 28.90% lower than that of SVR. Moreover, the attention-based methods (Transformer, Informer, and GCNInformer), which use the attention mechanism to capture long-range dependencies in time series data, generally performed better than the DNN and LSTM. As the prediction horizon increased, the RMSE of the Transformer, Informer, and GCNInformer grew less than that of ARIMA, SVR, DNN, and LSTM (Figure 3), which indicates their effectiveness in long-term prediction. Furthermore, the GCNInformer integrates the GCN and the Informer to extract both spatial and temporal features that relate historical data to future data; accordingly, it achieved the lowest RMSE at every prediction horizon.
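
The percentages quoted above can be reproduced directly from Table 1. The sketch below does this arithmetic in plain Python; the RMSE values are copied from Table 1, while the helper names (`rmse_at`, `reduction_pct`, `rmse_increase`) are illustrative and not part of the original work.

```python
# RMSE values copied from Table 1 (PM_2.5); tuple positions follow HORIZONS (hours).
HORIZONS = (1, 8, 16, 24)
RMSE = {
    "ARIMA":       (0.5474, 1.6082, 2.3783, 3.0824),
    "SVR":         (0.4435, 1.3402, 1.9014, 2.5438),
    "DNN":         (0.4227, 1.2173, 1.5677, 1.8085),
    "LSTM":        (0.4082, 1.1821, 1.3846, 1.6184),
    "Transformer": (0.3888, 0.9045, 1.0951, 1.1972),
    "Informer":    (0.3809, 0.9006, 1.0870, 1.1559),
    "GCNInformer": (0.3367, 0.8196, 1.0193, 1.1101),
}

def rmse_at(method, horizon):
    # Look up the RMSE of a method at a given prediction horizon.
    return RMSE[method][HORIZONS.index(horizon)]

def reduction_pct(model, baseline, horizon):
    # Percentage by which `model` lowers the RMSE of `baseline`.
    base = rmse_at(baseline, horizon)
    return 100.0 * (base - rmse_at(model, horizon)) / base

def rmse_increase(method):
    # Absolute RMSE growth from the 1 h task to the 24 h task.
    return rmse_at(method, 24) - rmse_at(method, 1)

if __name__ == "__main__":
    for horizon in (1, 24):
        for model in ("GCNInformer", "DNN"):
            for baseline in ("ARIMA", "SVR"):
                pct = reduction_pct(model, baseline, horizon)
                print(f"{horizon:>2} h  {model} vs {baseline}: {pct:.2f}% lower RMSE")
    for method in RMSE:
        print(f"{method}: RMSE grows by {rmse_increase(method):.4f} from 1 h to 24 h")
```

Because Table 1 reports values rounded to four decimals, the recomputed 24 h percentages agree with the quoted ones only to within about 0.01 percentage points. The `rmse_increase` helper makes the trend of Figure 3 explicit: the three attention-based models show the smallest growth in RMSE between the 1 h and 24 h tasks.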

3.5. Evaluation of Model Components

In this section, we report ablation experiments that evaluate the effectiveness of the MLP layers, which extract useful information from the raw monitored data, and of the GCN layers, which capture the spatial dependence of the air quality data across multiple monitoring sites. Concretely, we generated two variants of the GCNInformer: one with the MLP layers removed (denoted GCNInformer-MLP) and one with the GCN layers removed (denoted GCNInformer-GCN). The performance of the GCNInformer and the two variants in predicting the PM_2.5 concentration over different horizons is shown in Table 2. The full GCNInformer achieved lower RMSE and MAE than both variants at every horizon. These results indicate that using the MLP layers to extract information from the raw monitored data and using the GCN layers to capture the spatial dependence of the air quality data both improve short-term and long-term predictions.
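
To quantify how much each component contributes, the sketch below computes the relative error increase of each ablated variant over the full model. The numbers are copied from Table 2; the helper name `degradation_pct` is illustrative and not part of the original work.

```python
# RMSE and MAE values copied from Table 2 (PM_2.5); tuples follow HORIZONS (hours).
HORIZONS = (1, 8, 16, 24)
TABLE2 = {
    "GCNInformer":     {"RMSE": (0.3367, 0.8196, 1.0193, 1.1101),
                        "MAE":  (0.1784, 0.4664, 0.6236, 0.7587)},
    # "-MLP" and "-GCN" denote the model WITHOUT the MLP or GCN layers.
    "GCNInformer-MLP": {"RMSE": (0.3651, 0.8230, 1.0697, 1.2379),
                        "MAE":  (0.1972, 0.4811, 0.6686, 0.7648)},
    "GCNInformer-GCN": {"RMSE": (0.3626, 0.8590, 1.0909, 1.2244),
                        "MAE":  (0.1917, 0.4726, 0.6659, 0.7596)},
}

def degradation_pct(variant, metric):
    # Percentage error increase of an ablated variant over the full model,
    # one value per prediction horizon.
    full = TABLE2["GCNInformer"][metric]
    ablated = TABLE2[variant][metric]
    return [100.0 * (a - f) / f for a, f in zip(ablated, full)]

if __name__ == "__main__":
    for variant in ("GCNInformer-MLP", "GCNInformer-GCN"):
        for metric in ("RMSE", "MAE"):
            cells = ", ".join(
                f"{h} h: +{p:.2f}%"
                for h, p in zip(HORIZONS, degradation_pct(variant, metric))
            )
            print(f"{variant} {metric}: {cells}")
```

Read this way, the contribution of each component varies with the horizon. At 8 h, for instance, removing the MLP layers raises the RMSE by roughly 0.4%, whereas removing the GCN layers raises it by roughly 4.8%.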

3.6. Performance on Predicting Other Air Pollutant Concentrations

To evaluate the performance of the GCNInformer on other air pollutants, we trained a separate GCNInformer model for each of PM₁₀, SO₂, NO₂, CO, and O₃, using the same model architecture and hyperparameters as for PM_2.5. The 1 h ahead predictions and the ground-truth values of 1438 test samples at the Tiantan monitoring site are illustrated in Figure 4. Figure 4 indicates that the GCNInformer can handle the complex and nonstationary data of different air pollutants, which suggests that the model transfers to pollutants other than PM_2.5 without changes to its architecture or hyperparameters.
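
A per-pollutant comparison of predictions against ground truth, such as the one plotted in Figure 4, can also be summarized numerically with the RMSE and MAE metrics used elsewhere in the paper. The following is a minimal sketch using their standard definitions; the function names and the toy values are placeholders for illustration only and are not data from the study.

```python
import math

def rmse(y_true, y_pred):
    # Root mean square error between two equal-length sequences.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    # Mean absolute error between two equal-length sequences.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_per_pollutant(series):
    # `series` maps a pollutant name to a (ground_truth, prediction) pair.
    return {
        name: {"RMSE": rmse(truth, pred), "MAE": mae(truth, pred)}
        for name, (truth, pred) in series.items()
    }

# Toy placeholder values, NOT measurements or predictions from the paper.
toy_series = {
    "PM10": ([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]),
    "O3":   ([0.5, 0.5], [0.5, 0.5]),
}

if __name__ == "__main__":
    for name, scores in evaluate_per_pollutant(toy_series).items():
        print(f"{name}: RMSE={scores['RMSE']:.4f}, MAE={scores['MAE']:.4f}")
```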

4. Conclusions

In this study, we proposed a spatio-temporal graph convolutional network, named GCNInformer, for air quality prediction. To leverage the spatial and temporal correlations in air quality data, the GCNInformer uses GCN layers to incorporate the underlying spatial correlations among different monitoring sites and Informer layers to capture both short-term and long-term temporal information. Furthermore, the GCNInformer uses MLP layers to learn low-dimensional representations from meteorological data and air quality data. Taken together, these components allow the GCNInformer to capture the complex and nonstationary relationships between air pollutants and their surrounding environment, which leads to more accurate predictions. In our experiments, the GCNInformer achieved lower RMSE and MAE than the compared methods on both short-term (1 h) and long-term (up to 24 h) prediction tasks. The GCNInformer can therefore provide better information for preemptive air pollution prevention and management, helping to protect public health by alerting individuals and communities to potential air quality hazards. In addition, the GCNInformer could be applied to other, similar spatio-temporal data, such as traffic data and wind speed data.

Author Contributions

Conceptualization, P.L. and T.Z.; methodology, P.L. and T.Z.; investigation, P.L.; data curation, P.L. and Y.J.; writing—original draft preparation, P.L. and T.Z.; writing—review and editing, P.L., T.Z. and Y.J.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the Project of Pingdingshan University Youth Fund under Grant Number PXY-QNJJ-2019004.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available upon request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Houweling, S.; Rockmann, T.; Aben, I.; Keppler, F.; Krol, M.; Meirink, J.F.; Dlugokencky, E.J.; Frankenberg, C. Atmospheric constraints on global emissions of methane from plants. Geophys. Res. Lett. 2006, 33, L15821.
2. Brauer, M.; Freedman, G.; Frostad, J.; van Donkelaar, A.; Martin, R.V.; Dentener, F.; van Dingenen, R.; Estep, K.; Amini, H.; Apte, J.S.; et al. Ambient Air Pollution Exposure Estimation for the Global Burden of Disease 2013. Environ. Sci. Technol. 2016, 50, 79–88.
3. Li, T.T.; Yan, M.L.; Ma, W.J.; Ban, J.; Liu, T.; Lin, H.L.; Liu, Z.R. Short-term effects of multiple ozone metrics on daily mortality in a megacity of China. Environ. Sci. Pollut. Res. 2015, 22, 8738–8746.
4. Devlin, R.B.; Duncan, K.E.; Jardim, M.; Schmitt, M.T.; Rappold, A.G.; Diaz-Sanchez, D. Controlled Exposure of Healthy Young Volunteers to Ozone Causes Cardiovascular Effects. Circulation 2012, 126, 104–111.
5. Zanobetti, A.; Franklin, M.; Koutrakis, P.; Schwartz, J. Fine particulate air pollution and its components in association with cause-specific emergency admissions. Environ. Health 2009, 8, 58.
6. Chen, C.; Zhao, B.; Weschler, C.J. Assessing the Influence of Indoor Exposure to “Outdoor Ozone” on the Relationship between Ozone and Short-term Mortality in US Communities. Environ. Health Perspect. 2012, 120, 235–240.
7. Kulkarni, G.E.; Muley, A.A.; Deshmukh, N.K.; Bhalchandra, P.U. Autoregressive integrated moving average time series model for forecasting air pollution in Nanded city, Maharashtra, India. Model. Earth Syst. Environ. 2018, 4, 1435–1444.
8. Barthwal, A.; Acharya, D. An IoT based Sensing System for Modeling and Forecasting Urban Air Quality. Wirel. Pers. Commun. 2021, 116, 3503–3526.
9. Liu, T.; Lau, A.K.H.; Sandbrink, K.; Fung, J.C.H. Time Series Forecasting of Air Quality Based On Regional Numerical Modeling in Hong Kong. J. Geophys. Res.-Atmos. 2018, 123, 4175–4196.
10. Zhao, J.; Lin, S.; Liu, X.; Chen, J.; Zhang, Y.; Mei, Q. ST-CCN-PM_2.5: Fine-grained PM_2.5 concentration prediction via spatial-temporal causal convolution network. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on Advances in Resilient and Intelligent Cities, Beijing, China, 1–10 November 2021; pp. 48–55.
11. Osowski, S.; Garanty, K. Forecasting of the daily meteorological pollution using wavelets and support vector machine. Eng. Appl. Artif. Intell. 2007, 20, 745–755.
12. Feng, X.; Li, Q.; Zhu, Y.J.; Hou, J.X.; Jin, L.Y.; Wang, J.J. Artificial neural networks forecasting of PM_2.5 pollution using air mass trajectory based geographic model and wavelet transformation. Atmos. Environ. 2015, 107, 118–128.
13. Nouri, A.; Lak, M.G.; Valizadeh, M. Prediction of PM_2.5 Concentrations Using Principal Component Analysis and Artificial Neural Network Techniques: A Case Study: Urmia, Iran. Environ. Eng. Sci. 2021, 38, 89–98.
14. Bi, J.Z.; Stowell, J.; Seto, E.Y.W.; English, P.B.; Al-Hamdan, M.Z.; Kinney, P.L.; Freedman, F.R.; Liu, Y. Contribution of low-cost sensor measurements to the prediction of PM_2.5 levels: A case study in Imperial County, California, USA. Environ. Res. 2020, 180, 108810.
15. Motesaddi, S.; Nowrouz, P.; Alizadeh, B.; Khalili, F.; Nemati, R. Sulfur dioxide AQI modeling by artificial neural network in Tehran between 2007 and 2013. Environ. Health Eng. Manag. J. 2015, 2, 173–178.
16. Bera, B.; Bhattacharjee, S.; Sengupta, N.; Saha, S. PM_2.5 concentration prediction during COVID-19 lockdown over Kolkata metropolitan city, India using MLR and ANN models. Environ. Chall. 2021, 4, 100155.
17. Zhou, Y.; Chang, F.-J.; Chen, H.; Li, H. Exploring Copula-based Bayesian Model Averaging with multiple ANNs for PM_2.5 ensemble forecasts. J. Clean. Prod. 2020, 263, 121528.
18. Yi, X.; Zhang, J.; Wang, Z.; Li, T.; Zheng, Y. Deep Distributed Fusion Network for Air Quality Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 965–973.
19. Prakash, A.; Kumar, U.; Kumar, K.; Jain, V.K. A Wavelet-based Neural Network Model to Predict Ambient Air Pollutants’ Concentration. Environ. Model. Assess. 2011, 16, 503–517.
20. Tong, W.; Li, L.; Zhou, X.; Hamilton, A.; Zhang, K. Deep learning PM_2.5 concentrations with bidirectional LSTM RNN. Air Qual. Atmos. Health 2019, 12, 411–423.
21. Zhang, Y.; Lv, Q.; Gao, D.; Shen, S.; Dick, R.; Hannigan, M.; Liu, Q. Multi-group encoder-decoder networks to fuse heterogeneous data for next-day air quality prediction. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 4341–4347.
22. Chang, F.; Ge, L.; Li, S.Y.; Wu, K.Y.; Wang, Y.Q. Self-adaptive spatial-temporal network based on heterogeneous data for air quality prediction. Connect. Sci. 2021, 33, 427–446.
23. Qin, D.; Yu, J.; Zou, G.; Yong, R.; Zhao, Q.; Zhang, B. A Novel Combined Prediction Scheme Based on CNN and LSTM for Urban PM_2.5 Concentration. IEEE Access 2019, 7, 20050–20059.
24. Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM_2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561.
25. Li, X.; Peng, L.; Yao, X.J.; Cui, S.L.; Hu, Y.; You, C.Z.; Chi, T.H. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environ. Pollut. 2017, 231, 997–1004.
26. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
27. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 10, 75729–75741.
28. Wang, S.; Li, Y.; Zhang, J.; Meng, Q.; Meng, L.; Gao, F. PM_2.5-GNN: A Domain Knowledge Enhanced Graph Neural Network For PM_2.5 Forecasting. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems, Seattle, WA, USA, 3–6 November 2020; pp. 163–166.
29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
30. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115.
31. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
32. Erven, T.V.; Harremos, P. Rényi Divergence and Kullback-Leibler Divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820.
33. Dai, H.; Huang, G.; Wang, J.; Zeng, H.; Zhou, F. Prediction of Air Pollutant Concentration Based on One-Dimensional Multi-Scale CNN-LSTM Considering Spatial-Temporal Characteristics: A Case Study of Xi’an, China. Atmosphere 2021, 12, 1626.
34. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016.
35. Box, G.E.P.; Pierce, D.A. Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. J. Am. Stat. Assoc. 1970, 65, 1509–1526.
36. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. In Proceedings of the 9th International Conference on Neural Information Processing Systems, Denver, CO, USA, 3–5 December 1996.
37. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.

Figure 1. Overview of the GCNInformer.

Figure 2. The locations of 12 nationally controlled air quality monitoring sites in Beijing.

Figure 3. The RMSE errors of the GCNInformer and the other methods for the 1 h, 8 h, 16 h, and 24 h predicting tasks.

Figure 4. The 1 h ahead prediction and ground-truth of six types of air pollutant concentrations at Tiantan monitoring site.

Table 1. The prediction results of the GCNInformer and other baselines for predicting the PM_2.5 air pollutant concentration.

Method	1 h RMSE	1 h MAE	8 h RMSE	8 h MAE	16 h RMSE	16 h MAE	24 h RMSE	24 h MAE
ARIMA	0.5474	0.2565	1.6082	0.8296	2.3783	1.2754	3.0824	1.7150
SVR	0.4435	0.2218	1.3402	0.7035	1.9014	1.1325	2.5438	1.4316
DNN	0.4227	0.2114	1.2173	0.6866	1.5677	1.0449	1.8085	1.3262
LSTM	0.4082	0.2146	1.1821	0.6467	1.3846	0.9155	1.6184	1.1281
Transformer	0.3888	0.2013	0.9045	0.4977	1.0951	0.6503	1.1972	0.7796
Informer	0.3809	0.2049	0.9006	0.4703	1.0870	0.6722	1.1559	0.7721
GCNInformer	0.3367	0.1784	0.8196	0.4664	1.0193	0.6236	1.1101	0.7587

Table 2. The prediction results of the GCNInformer and its variants for predicting the PM_2.5 air pollutant concentration.

Model	1 h RMSE	1 h MAE	8 h RMSE	8 h MAE	16 h RMSE	16 h MAE	24 h RMSE	24 h MAE
GCNInformer-MLP	0.3651	0.1972	0.8230	0.4811	1.0697	0.6686	1.2379	0.7648
GCNInformer-GCN	0.3626	0.1917	0.8590	0.4726	1.0909	0.6659	1.2244	0.7596
GCNInformer	0.3367	0.1784	0.8196	0.4664	1.0193	0.6236	1.1101	0.7587

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
