Article

A New Combination Model for Air Pollutant Concentration Prediction: A Case Study of Xi’an, China

School of Management, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Authors to whom correspondence should be addressed.
Sustainability 2023, 15(12), 9713; https://doi.org/10.3390/su15129713
Submission received: 17 May 2023 / Revised: 8 June 2023 / Accepted: 14 June 2023 / Published: 17 June 2023

Abstract

As energy demand continues to increase, environmental pollution is becoming more severe. Governments and researchers have made great efforts to avoid and reduce air pollution. The prediction of PM2.5, an important index affecting air quality, is therefore of great significance. However, PM2.5 concentration follows a complex change process that makes its prediction challenging. Analysis of PM2.5 concentration together with the concentrations of other atmospheric pollutants and meteorological factors shows that the variation in PM2.5 concentration is influenced by multiple factors and that the relevant features also influence each other. To reduce the computational cost while giving full consideration to the influencing factors, we used the maximum correlation minimum redundancy (MRMR) algorithm to calculate the correlation and redundancy between features. In addition, the Brock–Dechert–Scheinman (BDS) statistical results show that the change in PM2.5 is nonlinear. Due to the outstanding performance of bidirectional long short-term memory (BiLSTM) neural networks in nonlinear prediction, we constructed an encoder–decoder model based on BiLSTM, named ED-BiLSTM, to predict the PM2.5 concentration at monitoring stations. For areas without monitoring sites, the lack of historical data limits the application of neural networks. To obtain the pollutant concentration distribution in the study area, we divided the study area into a 1 km × 1 km grid and combined the ED-BiLSTM model with the inverse distance weighting (IDW) algorithm to obtain PM2.5 concentration values in regions without monitoring stations. Finally, ArcGIS was used to visualize the results. The data for the case study were obtained from Xi’an. The results show that, compared with the standard long short-term memory (LSTM) model, the RMSE, MAE, and MAPE of our proposed model were reduced by 24.06%, 24.93%, and 22.9%, respectively. The proposed model has a low error for PM2.5 prediction and can provide a theoretical basis for the formulation of environmental protection policies.

1. Introduction

Environmental pollution is becoming increasingly severe as industrialization and urbanization accelerate, attracting the attention of governments and experts worldwide [1]. PM2.5, an essential factor causing haze [2] and reducing atmospheric visibility [3], contributes much more to heavily polluted weather than other pollutants [4]. Many researchers have studied the relationship between PM2.5 and health. One study shows that breathing air with excessive pollution levels for a long time can worsen cardiovascular and respiratory diseases [5]. Ashtari et al. [6] demonstrated a positive relationship between air quality and multiple sclerosis severity through an eight-year study in the Isfahan region of Iran. An estimated 1.65 to 2.19 million premature deaths every year are associated with PM2.5 [7]. To prevent air pollution and protect the environment, air pollution prediction is crucial [8]. An accurate PM2.5 concentration forecast model can help the public to make sensible travel arrangements, lessen the risks to people’s health, and guide governments and the public to make the right choices.
Obtaining accurate PM2.5 concentration predictions from the large quantity of existing meteorological and pollutant historical monitoring data is a crucial issue in air pollution research. The concentration of pollutants at a given point in time can affect future pollutant concentrations, and the effects may last hours, days, or longer. At present, the time granularity of pollutant prediction is mainly hourly [9] or daily [10], and some studies have examined pollutant emissions on a yearly scale [11,12]. Current pollutant prediction models can be classified into mechanism models, statistical models, and deep learning models. Mechanism models simulate the chemical and physical evolution of pollutant transport through atmospheric kinetic theory. Typical methods are community multiscale air quality (CMAQ) [13], the nested air quality prediction and modeling system (NAQPMS) [14], and the weather research and forecasting (WRF) model [15]. The advantage of these methods is that the modeling does not require historical data. Still, the complexity of the model calculations and parameter settings requires a strong background in atmospheric science, limiting their practical application. With the development of sensing technology, data acquisition has become easy, and statistical models are widely used for pollutant prediction. Statistical models are divided into traditional statistical models and machine learning models. For pollutant prediction, the traditional statistical models known as autoregressive integrated moving average (ARIMA) [16] and autoregressive moving average with extra input (ARMAX) [17] are frequently utilized. However, they are based on linear assumptions and have simple model structures that cannot be used to solve nonlinear problems. Machine learning techniques are also used to address nonlinear problems in pollutant prediction. Machine learning-based prediction methods include back propagation (BP) [18], the extreme learning machine (ELM) [19], and support vector regression (SVR) [20]. Even though machine learning algorithms have produced positive results for pollutant prediction, PM2.5 concentration sequences exhibit complex fluctuations over time and are influenced by a variety of factors. The prediction accuracy of statistical models is constrained because they cannot satisfy the requirements of multivariate nonlinear data prediction.
With the development of computer technology, deep learning has demonstrated outstanding nonlinear data processing capabilities. For example, computer vision [21,22], image classification [23], and natural language processing [24] have demonstrated the effectiveness of deep learning models. Because recurrent neural networks (RNNs) are good at handling sequential data, they have been used in numerous models to predict pollutant concentration [25]. Nevertheless, RNNs suffer from gradient vanishing and gradient explosion when predicting long time series. Long short-term memory (LSTM) [26], a variant of the RNN, can solve this problem [27]. To demonstrate the viability of using LSTM models in short-term prediction, LSTM-based models have been compared with statistical models [28,29,30]. Both Wu et al. [31] and Wang et al. [32] predicted air pollution using LSTM; the difference was that Wang et al. [32] took into account the role of influencing factors through the chi-square test. As the LSTM can only encode data in one direction, Graves and Schmidhuber [33] proposed the bidirectional long short-term memory (BiLSTM) model, which contains a forward LSTM unit and a backward LSTM unit. Experiments have shown that BiLSTM is more accurate and stable than the single LSTM model in pollutant prediction, particularly for maximums and minimums. A single neural network cannot handle complex prediction tasks, since changes in pollutant concentration are complex and have spatiotemporal characteristics, so more and more combination models are being developed. Bai et al. [34] considered the seasonal characteristics of PM2.5 concentration variation and then obtained PM2.5 predicted values using a stacked auto-encoder model. Pak et al. [35] extracted the spatial features of multidimensional time series using a convolutional neural network and then used LSTM to make predictions. Chang et al. [36] achieved PM2.5 prediction by aggregating multiple LSTM units and considered the correlation of influencing factors in the model. A combination of an auto-encoder and BiLSTM has been proposed as a new model [37]. Furthermore, the encoder–decoder model based on LSTM has also been applied to time series prediction [38,39]. Because encoders and decoders based on LSTM can fully extract time series features, they are widely used in pollutant concentration prediction [40,41]. These models use an LSTM unit as an encoder to extract the features of the sequence data and then use another LSTM unit as a decoder to decode the encoded vector and obtain the predicted values. Existing studies show that BiLSTM performs better than LSTM in time series prediction, so we construct an encoder–decoder prediction model using BiLSTM.
In addition to prediction models, the presence of excessive features in PM2.5 short-term prediction can cause information redundancy and lead to error accumulation [42], and multiple-series data can lead to the overfitting of models [43]. Based on these issues, we select features from the many pollutants and the large quantity of meteorological historical monitoring data. Feature selection improves prediction accuracy by selecting strongly correlated variables from multidimensional data. The methods commonly used for feature selection include mutual information [44,45], the Pearson correlation coefficient [46,47], and the Kendall correlation coefficient [48]. As part of their study, Bai et al. [34] calculated the Kendall correlation coefficient for PM2.5 and meteorological variables. Zhang et al. [37] solved the same problem using the Pearson correlation coefficient, but the Pearson correlation coefficient is not a good measure of nonlinear relationships [49]. To measure correlations between PM2.5 and other pollutants, Wang et al. [2] used interval gray incidence analysis. However, the above studies did not consider the redundancy between features. The present study also shows that adjacent monitoring sites can provide important information for the prediction model. To overcome these limitations, we use the maximum correlation minimum redundancy (MRMR) algorithm based on mutual information [50] to decide the model’s input variables, and mutual information is also used to calculate the correlation of PM2.5 between adjacent stations.
Although the existing methods can adequately extract time series features, they only consider the correlation between multidimensional feature variables and ignore the effect of feature redundancy on model accuracy. In addition, few studies have predicted pollutant concentrations in areas without monitoring stations. Based on the above analysis, this paper proposes a new combination model for PM2.5 concentration prediction. This study makes the following contributions:
(1)
Based on the nonlinear characteristics of the PM2.5 concentration series, this paper adopts the MRMR algorithm based on mutual information for feature selection, removing the effect of redundant features with full consideration of the influencing factors.
(2)
To fully extract spatial and temporal features, we construct a dual encoder–decoder prediction model based on BiLSTM, named ED-BiLSTM, for predicting PM2.5 concentration. The proposed model accurately predicts PM2.5 concentration.
(3)
To obtain the PM2.5 concentration distribution in Xi’an, China, we combine the ED-BiLSTM model with inverse distance weight (IDW) spatial interpolation to obtain predicted PM2.5 concentration values in areas without monitoring stations. Finally, the overall distribution is obtained through ArcGIS visualization.

2. Study Area and Available Data

2.1. Study Data

The study area is located in Xi’an, Shaanxi Province, China. Xi’an lies in the middle of the Guanzhong Basin, with the Loess Plateau to the north and the Qinling Mountains to the south. The area is long and narrow from east to west, with low altitude in the east and high altitude in the west. Only cold air from the northwest is beneficial to the diffusion of pollutants, while contaminants entering the basin from the east are difficult to diffuse and remove. Moreover, the area to the northeast is a distribution region of heavy industry in China and is prone to the retention and accumulation of pollutants owing to the influence of climate and topography. Therefore, air pollution is an urgent problem to be solved in Xi’an, and PM2.5 prediction can provide a theoretical basis for pollution control. We collected pollutant data from the environmental monitoring stations in Xi’an; information on the monitoring stations is shown in Table 1.
The study data include pollutant and meteorological data for each hour from 1 January 2017 to 31 December 2019. The data set contains 26,280 historical records, of which 1180 values are missing. The pollutant features are PM2.5, PM10, SO2, NO2, O3, and CO, and the meteorological features are wind direction (WD), wind speed (WS), temperature (Tem), dewpoint temperature (TDew), pressure (Pre), and relative humidity (RH), for a total of 12 features. A detailed description of the feature variables can be found in Table 2.

2.2. Data Processing

2.2.1. Missing Value Filling

Pollutant and meteorological data are time series data collected continuously by sensors. Some missing values inevitably exist due to instrument failure and other reasons. To ensure data integrity, the missing values need to be filled. We use the linear function in Equation (1) to fill them; the missing value at time $x_t$ is replaced by $F(x_t)$.
$F(x_t) = a x_t + b$  (1)
where $a$ and $b$ represent the slope and intercept obtained by fitting the sample points of the characteristic variable $x$, and $F(x_t)$ is the fitted value at time $x_t$.
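As a minimal sketch of this filling step, the snippet below fills gaps in an hourly series with a piecewise linear fit between neighboring valid points, which applies Equation (1) locally. The file name and column label are placeholders, and pandas' built-in linear interpolation is assumed rather than the authors' exact implementation.

```python
import pandas as pd

# Minimal sketch: fill gaps in an hourly series with a linear fit between
# the neighboring valid points (Equation (1) applied piecewise).
# "xian_1463A_hourly.csv" and the column name "PM2.5" are placeholders.
df = pd.read_csv("xian_1463A_hourly.csv", parse_dates=["time"], index_col="time")
df["PM2.5"] = df["PM2.5"].interpolate(method="linear", limit_direction="both")
```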

2.2.2. Data Normalization

The multidimensional feature variables collected in this paper have different units and value ranges. We use min-max normalization to eliminate the dimensional differences between the data, reduce the computational complexity, and increase the convergence speed. Equation (2) shows the calculation formula.
$x_i = \dfrac{x - x_{min}}{x_{max} - x_{min}}$  (2)
where $x_{max}$ and $x_{min}$ represent the maximum and minimum values of the original sample $x$, respectively, and $x_i$ represents the normalized sample value.
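A short sketch of Equation (2), applied column by column to a feature matrix, is shown below; the random array simply stands in for the 12 monitored features.

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    """Scale a 1-D feature to [0, 1] as in Equation (2)."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)

# Placeholder array standing in for the (samples, 12 features) data matrix.
data = np.random.rand(100, 12) * 500
normalized = np.apply_along_axis(min_max_normalize, 0, data)
```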

3. Methods

We propose a new three-stage combination model for short-term PM2.5 prediction. In the first stage, we select the temporal and spatial features related to PM2.5 concentration using the MRMR algorithm. In the second stage, we construct a dual encoder–decoder model based on BiLSTM to learn the spatiotemporal features and perform prediction. Finally, the study area is divided into 1 km × 1 km grids, and the IDW method is used to interpolate the grids to obtain PM2.5 concentration values for all grids. Figure 1 presents a flow chart of the proposed model. This section describes the methods involved in the model in detail.

3.1. MRMR Feature Selection Algorithm

PM2.5 concentration is affected by other atmospheric pollutants, meteorological conditions [51], and the PM2.5 concentration at adjacent sites [52]. However, using multiple feature variables may result in information redundancy and reduce the prediction model’s accuracy [42]. To eliminate information redundancy between features, we use the MRMR feature selection algorithm based on mutual information to remove irrelevant and redundant features, reduce computational consumption, and improve prediction accuracy.
The MRMR algorithm considers the correlation and redundancy between features simultaneously. The correlation between the random variables x and y can be calculated using mutual information (MI) from Equation (3).
$I(x, y) = \iint P(x, y) \log \dfrac{P(x, y)}{P(x) P(y)} \, dx \, dy$  (3)
where $P(x)$ and $P(y)$ are the probability distribution functions of $x$ and $y$, $P(x, y)$ is the joint probability distribution function of $x$ and $y$, and $I(x, y)$ is the mutual information of $x$ and $y$.
Suppose that $S$ denotes the set of feature variables $x_i$ and that $|S| = m$, where $m$ represents the number of features. The correlation between the characteristic variables and the target variable is calculated by Equation (4), and the redundancy between the characteristic variables is calculated by Equation (5).
$\max C(S, y), \quad C = \dfrac{1}{|S|} \sum_{x_i \in S} I(x_i; y)$  (4)
$\min R(S), \quad R = \dfrac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j)$  (5)
where $x_i$ is the value of the $i$th feature variable, $y$ is the value of the target variable, $I(x_i; x_j)$ is the MI between feature variables, $I(x_i; y)$ is the MI between the target variable and a feature variable, $C$ is the correlation between the feature variables and the target variable, and $R$ is the redundancy between the features.
The objective function for selecting the initial feature subset is shown in Equations (6) and (7). After determining the initial feature subset, an incremental search is performed using Equation (8). The steps for conducting an incremental search are as follows:
Step 1: The initial feature set determined by the MRMR feature selection algorithm is represented as $S_{k-1}$, which contains $k-1$ features. The remaining features are expressed as $S - S_{k-1}$, and the feature variables contained in $S - S_{k-1}$ are denoted as $x_1, x_2, \ldots, x_j$.
Step 2: Add $x_1$ to $S_{k-1}$ to form the feature set $S'$. Input $S_{k-1}$ and $S'$ into the prediction model and calculate the evaluation indexes of the model for each. If the prediction error corresponding to input $S_{k-1}$ is smaller, the original feature set is kept; if the error corresponding to input $S'$ is smaller, $S'$ replaces $S_{k-1}$ as the new feature set.
Step 3: Add the remaining features in $S - S_{k-1}$ to the feature set with the minimum error in turn and repeat Step 2 until the final feature set with the minimum error is obtained.
$\max \Phi(C, R), \quad \Phi = C - R$  (6)
$\max \Phi(C, R), \quad \Phi = C / R$  (7)
$\max\limits_{x_j \in S - S_{k-1}} \left[ I(x_j; y) - \dfrac{1}{k-1} \sum\limits_{x_i \in S_{k-1}} I(x_j; x_i) \right]$  (8)
The final feature set fully preserves the correlation between the multidimensional characteristic variables and the prediction target and achieves a dimensionality reduction of the prediction model’s input, which reduces the computational cost. A greedy implementation of this selection criterion is sketched below.
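As an illustration of the selection criterion in Equation (8), the following sketch performs a greedy MRMR search, scoring each candidate feature by its relevance to the target minus its average redundancy with the already selected features. It relies on scikit-learn's mutual_info_regression for the MI estimates; the function name mrmr_select and the placeholder data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

def mrmr_select(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Greedy MRMR selection using the difference criterion of Equation (8)."""
    n_features = X.shape[1]
    relevance = mutual_info_regression(X, y)                    # I(x_i; y)
    # Pairwise MI between features, I(x_i; x_j), estimated column by column.
    pair_mi = np.vstack([mutual_info_regression(X, X[:, j]) for j in range(n_features)])
    selected, remaining = [], list(range(n_features))
    for _ in range(k):
        scores = [relevance[j] - (pair_mi[j, selected].mean() if selected else 0.0)
                  for j in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Illustrative call on placeholder data with 12 candidate features.
X = np.random.rand(500, 12)
y = X[:, 0] * 2 + np.random.rand(500)
print(mrmr_select(X, y, k=6))
```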

3.2. ED-BiLSTM Prediction Model

The ED-BiLSTM model comprises a dual encoder and a decoder based on BiLSTM. One encoder encodes the temporal features, and the other encodes the spatial features. The encoded vectors are then aggregated and input to the decoder, yielding the ED-BiLSTM model. Finally, PM2.5 concentration prediction is performed on the test set.
In a deep neural network prediction model, time series sample modeling and time step selection for the characteristic variables form the basis of the prediction model. Therefore, before presenting the ED-BiLSTM prediction model, we first introduce the time series modeling and time step calculation methods.

3.2.1. Time Series Sample Modeling

To convert the data into a form that the computer can understand, we model the time series through a rolling window. An example of time series sample modeling by a rolling window is shown in Figure 2. Suppose that the period contains 10 records $X_{T_1}, X_{T_2}, \ldots, X_{T_{10}}$, where $X = (x_1, x_2, \ldots, x_n)$ and $n$ is the number of feature dimensions. When $\Delta t = 6$, for Sample 1, $X_{T_1}, X_{T_2}, X_{T_3}, X_{T_4}, X_{T_5}$, and $X_{T_6}$ are the features and $X_{T_7}$ is the label. For Sample 2, the features are $X_{T_2}, X_{T_3}, X_{T_4}, X_{T_5}, X_{T_6}$, and $X_{T_7}$, and $X_{T_8}$ is the label; all samples are modeled in the same way. In the prediction model, a small $\Delta t$ makes the input information incomplete, while a larger one increases the noise and the computational cost [27]. We determined $\Delta t$ by calculating the autocorrelation coefficient (ACF) and the partial autocorrelation coefficient (PACF), as described in Section 3.2.2.
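A minimal sketch of this rolling-window sample construction is given below; the random array stands in for the real records, and storing PM2.5 in the first column is an assumption made only for illustration.

```python
import numpy as np

def make_samples(series: np.ndarray, delta_t: int):
    """Rolling-window modeling: each sample uses delta_t consecutive records
    as features and the record that follows them as the label (Figure 2)."""
    X, y = [], []
    for start in range(len(series) - delta_t):
        X.append(series[start:start + delta_t])    # features over delta_t steps
        y.append(series[start + delta_t, 0])       # label: PM2.5 at the next hour (column 0 assumed)
    return np.array(X), np.array(y)

# Example with 10 records of 12 features and delta_t = 6 -> 4 samples.
records = np.random.rand(10, 12)
X, y = make_samples(records, delta_t=6)
print(X.shape, y.shape)   # (4, 6, 12) (4,)
```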

3.2.2. Time Step Calculation

The number of time stamps in each sample is determined by the size of the time step $\Delta t$, which reflects the lag effect. A small value of $\Delta t$ does not ensure sufficient information input, while a large value introduces irrelevant information and causes computational loss. Therefore, determining an appropriate $\Delta t$ is crucial to designing the prediction model. We determined $\Delta t$ by analyzing the ACF and PACF.
As a measure of correlation, the ACF describes the degree of correlation between values over time. For the time series $\{Y_t, t \in T\}$, the ACF between $Y_{t-\Delta t}$ and $Y_t$ can be measured by Equation (9). A larger ACF value indicates a stronger correlation between $Y_{t-\Delta t}$ and $Y_t$.
$\rho(\Delta t) = \dfrac{\sum_{t=\Delta t+1}^{T} (Y_t - \bar{Y})(Y_{t-\Delta t} - \bar{Y})}{\sum_{t=1}^{T} (Y_t - \bar{Y})^2}$  (9)
where $Y_t$ is the label, $Y_{t-\Delta t}$ is the sample with a time lag of $\Delta t$, and $\bar{Y}$ is the average value of the sample.
Compared to the ACF, the PACF focuses on the direct correlation between $Y_{t-\Delta t}$ and $Y_t$: it is the correlation coefficient between $Y_{t-\Delta t}$ and $Y_t$ after removing the indirect effects of $Y_{t-\Delta t+1}, Y_{t-\Delta t+2}, \ldots, Y_{t-1}$. The PACF formula is given in Equations (10)–(12). A PACF value closer to 0 indicates a weaker correlation between $Y_{t-\Delta t}$ and $Y_t$.
$\rho_{Y_t, Y_{t-\Delta t} \mid Y_{t-1}, \ldots, Y_{t-\Delta t+1}} = \dfrac{E\big[(Y_t - \hat{E}[Y_t])(Y_{t-\Delta t} - \hat{E}[Y_{t-\Delta t}])\big]}{E\big[(Y_{t-\Delta t} - \hat{E}[Y_{t-\Delta t}])^2\big]}$  (10)
$\hat{E}[Y_t] = E\big[Y_t \mid Y_{t-1}, \ldots, Y_{t-\Delta t+1}\big]$  (11)
$\hat{E}[Y_{t-\Delta t}] = E\big[Y_{t-\Delta t} \mid Y_{t-1}, \ldots, Y_{t-\Delta t+1}\big]$  (12)
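In practice, the ACF and PACF in Equations (9)–(12) can be estimated with standard time series tools. The sketch below uses the statsmodels estimators and assumes the hourly PM2.5 series is available in a plain text file, which is a placeholder.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# Estimate ACF/PACF of an hourly PM2.5 series to help choose the time step.
pm25 = np.loadtxt("pm25_1463A.txt")      # placeholder file with one value per hour
acf_vals = acf(pm25, nlags=24)
pacf_vals = pacf(pm25, nlags=24)
for lag in range(1, 25):
    print(f"lag {lag:2d}: ACF={acf_vals[lag]:.3f}  PACF={pacf_vals[lag]:.3f}")
```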

3.2.3. ED-BiLSTM Model

The BiLSTM [33] neural network structure contains a forward LSTM cell and a backward LSTM cell. The BiLSTM can fully mine time series information and has shown outstanding performance in time series prediction tasks [27,29,53]. Figure 3 shows a comparison between LSTM and BiLSTM. Compared with LSTM, BiLSTM extracts features not only in the forward direction but also in the backward direction. To extract bidirectional features from sequence data, we use BiLSTM as the main component in constructing the dual encoder and decoder model. To provide a deeper understanding of BiLSTM, we give the detailed calculation process of LSTM.
Figure 4 shows that each LSTM neural unit consists of an input gate, an output gate, and a forget gate. Data are received through the input gate, historical information is preserved through the forget gate, and information is output through the output gate. Equations (13)–(18) give the calculation formulas.
$f_t = \sigma(U_f h_{t-1} + W_f x_t + b_f)$  (13)
$i_t = \sigma(U_i h_{t-1} + W_i x_t + b_i)$  (14)
$\tilde{c}_t = \tanh(U_c h_{t-1} + W_c x_t + b_c)$  (15)
$c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$  (16)
$o_t = \sigma(U_o h_{t-1} + W_o x_t + b_o)$  (17)
$h_t = o_t * \tanh(c_t)$  (18)
where $h_{t-1}$ is the output of the hidden layer at $t-1$, $c_{t-1}$ is the output of the memory cell at $t-1$, and $x_t$ is the current input vector. $U_f$, $U_i$, $U_c$, and $U_o$ are the parameter matrices controlling the hidden state; $W_f$, $W_i$, $W_c$, and $W_o$ are the parameter matrices controlling the input information; $b_f$, $b_i$, $b_c$, and $b_o$ are the bias vectors; $\sigma$ is the sigmoid function; $\tanh$ is the hyperbolic tangent function; and $*$ is the Hadamard (element-wise) product. $f_t$ is the forget gate used to control the degree of forgetting of the previous state $c_{t-1}$, $\tilde{c}_t$ is the memory of the current input information, and $i_t$ is the memory gate; $\tilde{c}_t$ and $i_t$ are combined through the Hadamard product to determine how much new information is stored. The forgetting of past states and the updating of the current input information together determine the state vector $c_t$. The output state $h_t$ is obtained from $o_t$ and $c_t$.
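To make Equations (13)–(18) concrete, the following NumPy sketch performs a single LSTM step with randomly initialized weights. The dimensions (7 input features, 4 hidden units), the parameter container, and the function name lstm_step are illustrative assumptions, not part of the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step following Equations (13)-(18); params holds U_*, W_*, b_*."""
    f_t = sigmoid(params["Uf"] @ h_prev + params["Wf"] @ x_t + params["bf"])    # forget gate
    i_t = sigmoid(params["Ui"] @ h_prev + params["Wi"] @ x_t + params["bi"])    # input gate
    c_hat = np.tanh(params["Uc"] @ h_prev + params["Wc"] @ x_t + params["bc"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_hat                                            # new cell state
    o_t = sigmoid(params["Uo"] @ h_prev + params["Wo"] @ x_t + params["bo"])    # output gate
    h_t = o_t * np.tanh(c_t)                                                    # new hidden state
    return h_t, c_t

# Illustrative run with random weights: 7 input features, 4 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 7, 4
params = {f"{m}{g}": rng.standard_normal((n_hid, n_hid if m == "U" else n_in))
          for m in "UW" for g in "fico"}
params.update({f"b{g}": np.zeros(n_hid) for g in "fico"})
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), params)
```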
The output of the BiLSTM neural network unit is a combination of the forward LSTM hidden layer output $\overrightarrow{h}_t$ and the backward LSTM hidden layer output $\overleftarrow{h}_t$. The calculation process is shown in Equation (19).
$h_t = \alpha \overrightarrow{h}_t + \beta \overleftarrow{h}_t$  (19)
where α and β are the output weights of the forward and backward LSTM hidden layers, respectively.
For multivariate time series, one hidden layer cannot express all the information, and adding hidden layers may give more accurate results. The general approach is to stack multiple hidden layers [54], using the output of the previous hidden layer as the input of the next. Since both temporal and spatial features are considered in our study, simply adding hidden layers cannot solve the complex data feature problem. Therefore, we propose a dual encoder and decoder model (ED-BiLSTM) based on BiLSTM. Figure 5 shows the model structure, which consists of two parts: the encoder and the decoder. One encoder encodes the temporal features obtained from MRMR, and the other extracts spatial features from adjacent sites. The encoded vectors from both encoders are then aggregated and fed into the decoder for prediction. The ED-BiLSTM prediction model also facilitates the spatial interpolation for grid cells without monitoring stations.
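As a rough illustration of this structure, a Keras sketch of a dual-encoder, single-decoder network is given below. The 128 BiLSTM units, 0.5 dropout rate, and time step of 4 follow the settings reported in Section 4.2; the feature counts, the RepeatVector bridge between the merged vector and the decoder, and the output activation are assumptions made for illustration, so the exact layer arrangement of Figure 5 may differ.

```python
from tensorflow.keras import layers, models

delta_t, n_temporal, n_spatial = 4, 7, 12   # assumed feature counts (illustrative)

# Encoder 1: temporal features selected by MRMR at the target station.
temporal_in = layers.Input(shape=(delta_t, n_temporal))
temporal_vec = layers.Bidirectional(layers.LSTM(128))(temporal_in)

# Encoder 2: PM2.5 series of adjacent stations (spatial features).
spatial_in = layers.Input(shape=(delta_t, n_spatial))
spatial_vec = layers.Bidirectional(layers.LSTM(128))(spatial_in)

# Aggregate the two encoded vectors and decode them with another BiLSTM layer.
merged = layers.Concatenate()([temporal_vec, spatial_vec])
decoded = layers.RepeatVector(1)(merged)
decoded = layers.Bidirectional(layers.LSTM(128))(decoded)
decoded = layers.Dropout(0.5)(decoded)
output = layers.Dense(1, activation="relu")(decoded)

ed_bilstm = models.Model([temporal_in, spatial_in], output)
```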

3.3. Inverse Distance Weight (IDW) Interpolation Method

With increasing public concern over environmental pollution, air pollutant prediction has made significant progress in recent years. However, deep learning requires large amounts of historical data, which leads most research on pollutant prediction to target monitoring stations [30,42]. A few studies have also examined new sites with limited historical data [27]. However, for areas without monitoring sites, the absence of historical data has led to a lack of research. In this study, the ED-BiLSTM prediction model is combined with the IDW algorithm to interpolate a 1 km × 1 km spatial grid and obtain predicted values for each grid cell. The method involves the following steps. Firstly, the PM2.5 concentration at the next moment is predicted for all monitoring stations in the study area using the ED-BiLSTM model. Then, the study area is divided into grids of 1 km × 1 km, and the coordinates of each grid center point are determined based on latitude and longitude. Finally, the IDW spatial interpolation algorithm yields the predicted values for all grids.
IDW is a point-based spatial interpolation algorithm. For an interpolated grid with center point coordinates $(x, y)$, the predicted PM2.5 value is calculated by Equation (20), where $n$ is the number of monitoring stations, $z_i$ is the predicted value at monitoring station $i$, and $d_i$ is the distance between the interpolation grid center $(x, y)$ and the monitoring station $(x_i, y_i)$, calculated by Equation (21). $k$ is the power exponent of the distance. For larger values of $k$, nearby stations have a greater effect on the interpolation; for smaller values of $k$, more distant stations have a greater effect. The core of the IDW algorithm is the selection of $k$. It has been shown that the search radius does not affect the results when $k$ is 3 or more [55]. Based on the analysis of previous studies, we set $k$ to 3.
$z = \sum_{i=1}^{n} \dfrac{z_i}{d_i^{k}} \Big/ \sum_{i=1}^{n} \dfrac{1}{d_i^{k}}$  (20)
$d_i = \sqrt{(x - x_i)^2 + (y - y_i)^2}$  (21)
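A compact NumPy sketch of Equations (20) and (21) is given below; the station coordinates and predicted values are made-up numbers used only to demonstrate the call, and k = 3 follows the setting above.

```python
import numpy as np

def idw(grid_xy: np.ndarray, station_xy: np.ndarray, station_pred: np.ndarray, k: float = 3.0):
    """Inverse distance weighting (Equations (20)-(21)).
    grid_xy: (G, 2) grid-center coordinates; station_xy: (n, 2); station_pred: (n,)."""
    # Pairwise Euclidean distances between grid centers and stations.
    d = np.sqrt(((grid_xy[:, None, :] - station_xy[None, :, :]) ** 2).sum(axis=-1))
    d = np.maximum(d, 1e-9)            # avoid division by zero at station locations
    w = 1.0 / d ** k
    return (w * station_pred).sum(axis=1) / w.sum(axis=1)

# Illustrative call: 3 stations, 2 grid centers, k = 3 as in the paper.
stations = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
preds = np.array([30.0, 45.0, 38.0])
grids = np.array([[1.0, 1.0], [4.0, 4.0]])
print(idw(grids, stations, preds, k=3.0))
```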

3.4. Evaluation Index

To quantify the prediction model’s performance, we evaluated the ED-BiLSTM model and the comparative models using the root mean square error (RMSE), the mean absolute error (MAE), the mean absolute percentage error (MAPE), and R squared ($R^2$). RMSE and MAE measure the deviation between actual and predicted values, and MAPE measures the percentage deviation. The smaller the RMSE, MAE, and MAPE values, the more accurate the prediction. $R^2$ reflects the ability to fit the data and ranges from 0 to 1; the higher the $R^2$, the better the model fits the data. MAE and MAPE are less affected by extreme values, while RMSE squares the errors, which amplifies large prediction errors and makes it more sensitive to anomalies. Therefore, we combine these evaluation metrics to evaluate the ED-BiLSTM model. The formulas are shown in Equations (22)–(25).
$RMSE = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$  (22)
$MAE = \dfrac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$  (23)
$MAPE = \dfrac{1}{N} \sum_{i=1}^{N} \left| \dfrac{y_i - \hat{y}_i}{y_i} \right|$  (24)
$R^2 = 1 - \dfrac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$  (25)
where $y_i$ is the observed value of PM2.5 concentration, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the monitored values, and $N$ denotes the number of observed samples.
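For reference, the four metrics in Equations (22)–(25) can be computed with a few lines of NumPy, as in the sketch below.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """RMSE, MAE, MAPE, and R^2 as defined in Equations (22)-(25)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}
```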

4. Case Study

4.1. Feature Selection

PM2.5 concentration is affected by other atmospheric pollutants, meteorological conditions, and the PM2.5 concentration at adjacent sites. However, in short-term prediction, the presence of multiple feature variables may result in information redundancy and reduce the prediction model’s accuracy. To eliminate the information redundancy caused by excessive features, we calculated the MI between PM2.5 and the other pollutants and meteorological factors, as well as the MI of PM2.5 between adjacent sites.
To choose the appropriate correlation analysis method, we first checked the linearity of the PM2.5 series using the Brock–Dechert–Scheinman (BDS) statistic [56]. The BDS test result with the dimension set to 6 is presented in Table 3. The Z-statistic is higher than the critical value of 1.96, and the p value is less than 0.05, indicating that the BDS test rejects the null hypothesis of linearity and confirms the nonlinearity of the PM2.5 time series. Therefore, we used mutual information to calculate the correlation between variables. Then, we used the MRMR algorithm to remove redundant features and determine the optimal feature subset, thereby reducing the dimensionality of the input data.
Figure 6 displays the heat map of the mutual information values between PM2.5 concentration and the influencing factors at the 1463A site. A darker color in the heat map indicates a higher mutual information value and a stronger correlation between the variables. The PM2.5 concentration at time t + 1 and the values of the other influencing factors at time t were used to calculate the mutual information. Figure 6 illustrates that PM10, NO2, O3, RH, and SO2 are highly correlated with PM2.5, with mutual information values greater than 1. When k = 6, the feature subset determined by the mutual information values was PM2.5, PM10, NO2, SO2, O3, and RH. However, the feature subset calculated by the MRMR algorithm, which considers redundancy, comprised PM2.5, PM10, NO2, SO2, CO, and RH. In Figure 6, we can see that O3 correlates strongly with PM10, NO2, and RH, indicating that the MRMR feature selection algorithm fully accounts for the redundancy between features.
We conducted experiments by feeding the feature subsets determined by the two selection methods (mutual information alone and the MRMR algorithm) into the prediction model. Table 4 shows the evaluation metric values of the prediction model for the different feature inputs. The MRMR algorithm obtained the optimal feature set. We then added the remaining features to the feature subset using the incremental search algorithm. By comparing the model evaluation metrics, we found that the features identified by the MRMR algorithm at k = 6 achieved the best prediction results. Therefore, the input feature subset of the prediction model contains PM2.5, PM10, NO2, SO2, RH, and CO. In addition, we calculated the mutual information of PM2.5 concentration between adjacent stations. Figure 7 shows a strong correlation between PM2.5 concentrations at adjacent sites. Comparing the model before and after adding spatial features, the RMSE decreased by 12.2% and the MAE decreased by 27.3% after the spatial features were added. Based on the feature selection results, we used PM2.5, PM10, NO2, SO2, RH, CO, and the PM2.5 of adjacent sites as inputs to the prediction model.

4.2. Parameter Setting of ED-BiLSTM Model

4.2.1. Time Step Calculation

In the time series forecasting task, the time step $\Delta t$ determines how many historical records are input to the model, and a suitable value of $\Delta t$ can improve the model’s forecasting performance. In this paper, we determine the value of $\Delta t$ by calculating the ACF and PACF.
Figure 8 shows the ACF calculation results of PM2.5 concentration at different sites in Xi’an. We can see that the ACF presents a decreasing trend as the time lag lengthens. When Δ t is less than 5, the ACF of most sites is greater than 0.9, and the data are highly correlated. When Δ t is greater than 30, the correlation is reduced.
Compared to the ACF, the PACF focuses on the direct correlation between $Y_{t-\Delta t}$ and $Y_t$. Figure 9 shows the PACF results for PM2.5 concentration at site 1463A. The PACF tends to 0 when $\Delta t = 4$, and the ACF is 0.905 at this point, indicating a strong correlation between $Y_t$ and $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$, and $Y_{t-4}$. When $\Delta t > 4$, the PACF fluctuates around zero, and when $\Delta t > 20$, the fluctuation becomes relatively flat.
To further determine $\Delta t$, we construct training samples with $\Delta t$ in the range [1, 24] to train and make predictions with ED-BiLSTM. The model prediction results under different $\Delta t$ settings are shown in Table 5. As seen in Table 5, when $\Delta t$ is set to 4, the RMSE is 6.603, the MAE is 4.066, and the prediction model has the smallest error. To see the effect of $\Delta t$ on prediction performance more intuitively, we plot the RMSE and MAE for different time lags in Figure 10. The RMSE and MAE show a similar trend, and a regular pattern emerges whereby the prediction error is smaller when $\Delta t$ is a multiple of 4. This agrees with the PACF results and is also confirmed by the study of Huang et al. [9]. Based on the ACF and PACF calculations, we set $\Delta t$ to 4, which means that when predicting $Y_t$, the values $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$, and $Y_{t-4}$ are input into the prediction model.

4.2.2. Experimental Settings

The experimental data in this paper, consisting of the historical records of PM2.5, PM10, NO2, SO2, RH, CO, and the adjacent sites’ PM2.5 concentrations over the previous 4 h, are used to predict the PM2.5 concentration for the next hour. For time series modeling, we model the temporal features and the spatial features separately. In the model construction, the hidden layer of each encoder is a BiLSTM unit, and a concatenate layer is added after the encoders to fuse the feature vectors of the two encoders. Finally, the decoder contains one hidden layer. The prediction model with the lowest error on the training set is obtained by continuously adjusting the parameters. Table 6 shows the parameters of the ED-BiLSTM model.
We model 26,280 historical records. Because the time step is set to 4, we obtain 26,276 samples, including 21,020 training samples, 2628 validation samples, and 2628 test samples. During model training, we input the samples into the model in batches with a batch size of 128. We set the number of epochs to 200 and the number of neurons in the BiLSTM to 128. The optimizer is Adam, the loss function is the mean absolute error (MAE), and the learning rate is 1 × 10−3. The activation function is ReLU. The dropout rate is set to 0.5 to prevent overfitting, and the number of encoder and decoder layers is set to 1. Once model training is complete, predictions are made on the test set and the evaluation indexes of the model are calculated.
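Continuing the ed_bilstm sketch from Section 3.2.3, the snippet below shows how the reported hyperparameters (Adam with a learning rate of 1e-3, MAE loss, batch size 128, 200 epochs) could be applied; the small random arrays are placeholders for the real rolling-window samples and the 21,020/2628/2628 split.

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

# Placeholder arrays that only mimic the shape of the rolling-window samples.
delta_t, n_temporal, n_spatial = 4, 7, 12
X_temp = np.random.rand(200, delta_t, n_temporal)
X_spat = np.random.rand(200, delta_t, n_spatial)
y = np.random.rand(200, 1)

# `ed_bilstm` is the dual-encoder model sketched in Section 3.2.3.
ed_bilstm.compile(optimizer=Adam(learning_rate=1e-3), loss="mae")
ed_bilstm.fit([X_temp[:160], X_spat[:160]], y[:160],
              validation_data=([X_temp[160:180], X_spat[160:180]], y[160:180]),
              batch_size=128, epochs=200)
y_pred = ed_bilstm.predict([X_temp[180:], X_spat[180:]])
```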

4.3. Prediction Results

To verify the predictive performance of the proposed model, we predicted the PM2.5 concentration at station 1463A for the next hour using ED-BiLSTM and the comparative models. We constructed seven comparative models, including the machine learning model SVR, a single LSTM, and a single BiLSTM, as well as four combination models, namely ED-LSTM, AE-BiLSTM, ED-BiLSTM-One, and ED-BiLSTM-IF. ED-LSTM means that the hidden layers of the encoder and decoder use LSTM neural network units. The combined model AE-BiLSTM uses an auto-encoder for the encoder and BiLSTM for the decoder. ED-BiLSTM-One indicates that a single encoder with a BiLSTM hidden layer encodes the spatiotemporal features. ED-BiLSTM-IF selects influencing factors by considering only the correlated feature variables of the target site and disregards the spatial correlation of PM2.5 concentrations at adjacent sites. All models were constructed in the Keras framework. To ensure the validity of the experimental results, all models used the same parameter settings. Additionally, each deep learning model was run 10 times to reduce the randomness of neural network training, and Table 7 shows the experimental results.
Table 7 displays the quantitative RMSE, MAE, MAPE, and R2 results, comparing the ED-BiLSTM model with SVR, LSTM, BiLSTM, ED-LSTM, AE-BiLSTM, ED-BiLSTM-One, and ED-BiLSTM-IF. The hyperparameter settings of all deep learning models are identical to those of the ED-BiLSTM model. The results in Table 7 show that the RMSE, MAE, and MAPE of the ED-BiLSTM model are lower than those of the comparative models and that its R2 is closer to 1. This indicates that the ED-BiLSTM model fits the PM2.5 time series better than the other machine learning and deep learning models. The average RMSE of the ED-BiLSTM model is 6.603, and the MAE is 4.066. Compared with the machine learning model SVR, the average RMSE is 53.91% lower and the MAE is 65.73% lower. Compared with the single LSTM and BiLSTM models, the average RMSE is 24.06% and 17.63% lower, and the average MAE is 24.93% and 22.18% lower, respectively. This demonstrates that the ED-BiLSTM model is more suitable for PM2.5 short-term prediction than the machine learning and single neural network models. Comparing the predicted and true value curves of all models, Figure 11 shows that the ED-BiLSTM model achieves the best fit.
In addition, we constructed four hybrid models and compared their prediction results with our model; Table 7 shows the experimental results. The AE-BiLSTM model comes from the work of Zhang et al. [37], and we constructed the ED-LSTM, ED-BiLSTM-One, and ED-BiLSTM-IF models. These models differ significantly in their time series feature modeling and hidden layer settings. The difference between ED-LSTM and the ED-BiLSTM model is that LSTM replaces the neural network unit of the hidden layer. Compared with ED-LSTM, ED-BiLSTM is on average 26.12% lower in RMSE and 24.04% lower in MAE, indicating that using BiLSTM as the hidden layer unit of the encoder and decoder better extracts the forward and backward dependencies from the sequence data. The encoder of AE-BiLSTM adopts an auto-encoder; compared with AE-BiLSTM, the average RMSE of our model is 37.12% lower and the MAE is 52.25% lower, indicating that BiLSTM interprets sequence data better than the auto-encoder. The ED-BiLSTM-One model uses one encoder to extract the spatiotemporal features; compared with ED-BiLSTM-One, our model achieves an average reduction of 12.49% in RMSE and 14.27% in MAE, indicating that using an independent encoder for the spatial features of adjacent stations improves the model’s prediction accuracy. The ED-BiLSTM-IF model ignores the mutual influence between adjacent stations; ED-BiLSTM achieves an average reduction of 12.21% in RMSE and 27.3% in MAE compared with ED-BiLSTM-IF, which demonstrates the necessity of considering spatial features in pollutant prediction. The comprehensive comparison shows that ED-BiLSTM has the smallest error and the best fit in PM2.5 short-term prediction.
Figure 11 shows the prediction results of all models together with a partial enlargement. As the prediction performance of each model is difficult to see directly from multiple curves, 50 prediction points are enlarged to observe the prediction results of the different models more clearly. We can see from Figure 11 that the ED-BiLSTM model fits the true value curve better than the machine learning model, the single deep learning models, and the other combined models. Figure 12 plots the curves of predicted and actual values for the ED-BiLSTM model; the red line is the predicted value, and the blue line is the monitored value. The figure shows that the ED-BiLSTM model tracks the fluctuations of PM2.5 concentration well, especially the sudden changes and the maximum and minimum points. Figure 13 shows the scatter plot of predicted versus actual values, with the horizontal and vertical coordinates representing the monitored and predicted values, respectively. The black line represents y = x, and the scatter points cluster closely around it. Combining the prediction result plot in Figure 12 and the scatter plot in Figure 13, it is clear that ED-BiLSTM fits the PM2.5 concentration sequence best. The proposed model is significantly better than the comparative models and can be effectively applied to PM2.5 concentration prediction.

4.4. Other Cases

To evaluate the robustness and stability of our model, we selected 12 monitoring sites in Xi’an from 1 January 2017 to 31 December 2019 for PM2.5 concentration prediction; information on the monitoring stations is shown in Table 1. The feature selection and prediction model construction process is the same as that for the 1463A site. To simplify the description and avoid repetition, we directly give the prediction results of the 12 monitoring stations. The evaluation indexes of each model at the 12 stations are shown in Table 8 and Table 9, and the prediction results differ among monitoring stations. The RMSE, MAE, and MAPE of the proposed model are smaller than those of the other comparative models, and its R2 is the largest. The largest RMSE, MAE, and MAPE values for the proposed model are 9.553, 5.786, and 0.128, respectively, and the smallest R2 is 0.974. Despite the different prediction results at each site, the proposed model’s RMSE, MAE, and MAPE remain small, and its R2 is greater than 0.97. This indicates that the proposed model can be used to estimate PM2.5 concentrations.
In addition, comparing the prediction results across all monitoring stations in Xi’an, we find that the proposed model has the smallest prediction error for 1470A, while the worst result is obtained for 1467A. Station 1470A is in Yanliang Industrial Park, which has high PM2.5 emission concentrations. In contrast, 1467A is in the Xi’an High-Tech Industrial Park, which combines modern technology and natural ecology and has low pollutant emission concentrations. This suggests that the PM2.5 predictions of our model tend to be high, although the error remains in a low range.

4.5. PM2.5 Concentration Prediction on Areas without Stations

In Section 4.3 and Section 4.4, we discussed the feasibility and prediction performance of the model. However, the information that can be obtained from the PM2.5 prediction results at the monitoring stations is limited. To gain a more comprehensive picture of future PM2.5 levels across the study area, Xi’an is gridded at a 1 km × 1 km resolution. Then, according to the prediction results of the monitoring stations, the IDW algorithm is used for grid interpolation to obtain the predicted PM2.5 concentration for each grid cell. Finally, the interpolation results are visualized through ArcGIS so that researchers and managers can view the distribution of PM2.5 concentration across the whole region.
To show the variation of PM2.5 concentration over a whole day more clearly, Figure 14 visualizes the distribution of PM2.5 concentration at two-hour intervals. The PM2.5 concentration varies significantly throughout the day, and its distribution across regions is imbalanced. To explain the reason for this phenomenon more clearly, the administrative divisions of Xi’an are given in Figure 15. Xi’an, the largest central city in northwest China, consists of 11 districts (Xincheng District, Beilin District, Lianhu District, Baqiao District, Weiyang District, Yanta District, Yanliang District, Lintong District, Chang’an District, Gaoling District, and Huyi District) and 2 counties (Lantian County and Zhouzhi County); the main urban area includes Weiyang District, Baqiao District, Yanta District, Lianhu District, Xincheng District, and Beilin District.
We combined Figure 14 and Figure 15 to analyze the variation of PM2.5 concentration in Xi’an on 10 September 2019. From the perspective of regional divisions, the PM2.5 concentrations in Zhouzhi County, Huyi District, and Lantian County are low. In contrast, the PM2.5 concentrations in the main urban area and Yanliang District are higher than in other districts and counties, because the central metropolitan area has a large population and dense traffic, while Yanliang District contains dense industrial parks. From the perspective of temporal evolution, PM2.5 concentrations are higher at night than during the day. The reason for this phenomenon is that in autumn and winter the atmosphere easily forms a cold inversion layer at night, preventing air convection. The accumulation of PM2.5 emissions during the day and the passage of high-emission trucks at night are also significant sources of PM2.5. After a night of accumulation and morning motor vehicle emissions, the highest PM2.5 concentration occurs at 8:00 a.m., during the latter part of the morning peak. Then, as the ground temperature gradually rises, the inversion layer breaks up and the airflow accelerates. The lowest pollutant concentration of the day occurs from 14:00 to 16:00, which is also the best time for outdoor activities. The PM2.5 concentration gradually increases after 18:00 as the ground-level temperature decreases and the evening peak arrives.
In summary, the complex road traffic in the main urban area and the enormous number of PM2.5 emission sources increase the difficulty of model prediction. The worst prediction result in Section 4.4 was obtained at site 1467A, which is in the Yanta District of the urban area, so the interpolation analysis is consistent with the conclusions obtained from the model prediction results. Therefore, more stations could be added if PM2.5 concentrations are to be predicted in areas with complex emission sources. Secondly, we strongly recommend that the government invest more effort in dealing with the problem of high PM2.5 concentrations in industrial parks. Finally, according to the distribution of pollutant concentrations over the entire day, residents are advised to choose the mid-afternoon, when the PM2.5 concentration is lowest, for outdoor activities.

5. Discussion

This paper proposes a new combination prediction model, combining the MRMR feature selection algorithm, the BiLSTM neural network, and the IDW spatial interpolation algorithm to predict PM2.5 concentrations in Xi’an. Firstly, we select an appropriate feature subset using the MRMR algorithm, and experiments show that the algorithm fully considers the correlation and redundancy among features, which helps to improve the accuracy of the prediction model. Secondly, we compare our model with machine learning and single deep learning models; because our model considers the spatiotemporal correlation of multivariate time series, it shows strong advantages. Thirdly, the model based on BiLSTM units is compared with models based on LSTM and auto-encoder units, and BiLSTM is found to have a more robust feature extraction ability. Next, our model improves predictive ability by considering the spatial correlation between adjacent sites. Finally, the experimental analysis finds that the dual encoder model outperforms a single encoder in extracting the spatiotemporal features of multiple time series. In the comparative prediction experiments, the ED-BiLSTM model has the lowest prediction error. In addition, we divided Xi’an into 1 km × 1 km grids to predict the PM2.5 concentration in each grid and explored the daily variation of PM2.5 concentration through ArcGIS visualization. The proposed model combines a feature selection algorithm, a neural network, and an interpolation algorithm, which addresses feature redundancy in PM2.5 prediction and the difficulty of predicting PM2.5 in regions without monitoring stations.
According to the analysis of PM2.5 distribution results in Figure 14, it can be found that industrial pollution and traffic pollution are the primary sources of PM2.5. The public can reduce PM2.5 emissions by choosing more public transportation and reducing the use of motor vehicles. Regarding industrial emissions, industrial production processes should be optimized to reduce and purify industrial waste gas. In addition, the emission of PM2.5 precursors, such as NOx, SO2, and volatile organic compounds, should be controlled in the management of PM2.5. When the level of PM2.5 pollution in the atmosphere is high, it is reasonable for the public to wear masks when traveling, which can effectively protect the cardiovascular system. Effective source prevention and control can cut off pollution sources and solve pollution problems. As such, environmental protection departments should formulate more reasonable pollution prevention and control countermeasures to eliminate pollution at the source.
Our model predicts PM2.5 concentration at the hourly level in the short term, and in the case analysis, the model’s accuracy is verified using stations in Xi’an only. However, this does not mean that the proposed model is geographically limited; it can be applied to PM2.5 or other pollutant prediction in different cities. Due to the limitation of data sources, we only consider the effects of other pollutants and meteorological factors on PM2.5 concentration. In addition, we only consider the effects of feature correlation and the nonlinear characteristics of PM2.5 concentration on the prediction results, ignoring the errors caused by the non-stationarity of the PM2.5 concentration series. In the future, we will analyze the non-stationarity of PM2.5 to achieve more accurate PM2.5 concentration prediction. In addition, model parameter optimization using optimization algorithms is also a future research direction.

6. Conclusions

To accurately predict PM2.5 concentration, we construct a new combination prediction model based on feature selection, BiLSTM, an encoder–decoder structure, and a spatial interpolation algorithm. The model comprises the following three parts. (1) Feature selection, in which the correlation of PM2.5 concentration with other pollutants and meteorological factors is calculated using the maximum correlation minimum redundancy algorithm based on mutual information. (2) Neural network model construction, in which the temporal and spatial correlation features of PM2.5 concentration are extracted using a dual encoder–decoder model. (3) Spatial interpolation, in which the predicted PM2.5 concentration in areas without monitoring sites is obtained by the IDW spatial interpolation algorithm, and the distribution map of pollutant concentration in the study area is obtained with ArcGIS. This paper takes the data of Xi’an city as an example to verify the proposed model, and the main conclusions are as follows.
(1)
Considering the correlation and redundancy between features can achieve data dimension reduction and increase prediction performance. The experimental results showed that, when the influencing factors were considered, the RMSE, MAE, and MAPE of the prediction model decreased by 12.2%, 27.3%, and 11.4%, respectively.
(2)
The model based on dual encoder–decoder fully extracts the multi-dimensional time series features. The results showed that RMSE, MAE, and MAPE of ED-BiLSTM were reduced by 17.63%, 22.18%, and 9.8%, respectively, compared with a single-BiLSTM model.
(3)
From the IDW spatial interpolation results and ArcGIS visualization, it can be concluded that Weiyang, Gaoling, and Yanliang Districts are seriously polluted and that pollution prevention and control in these areas should be strengthened. In addition, it is recommended that more air quality monitoring stations be installed in the main urban area of Xi’an, where air quality prediction is complicated by the complex roads and high traffic flow during the daytime.
The main contribution of this paper is to propose a new method for predicting PM2.5 concentration. The prediction results of the model can not only provide real-time air quality warnings and fill in missing monitoring values caused by instrument failures, but also capture the distribution of PM2.5 concentrations in the study area. In addition, it should be noted that other feature selection algorithms can be used in future research and that the encoder–decoder model structure can be adapted to other neural network models; thus, the proposed model has good generalization ability.

Author Contributions

Writing—original draft and methodology, F.Y.; writing—review and funding acquisition, G.H.; writing—review and editing, revision, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (71874134) and the Key Project of the Basic Natural Science Research Plan of Shaanxi Province (2019JZ-30).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this paper are publicly available at China Air Quality Online Monitoring and Analysis Platform (https://www.aqistudy.cn/historydata/, accessed on 1 May 2022), and China Meteorological Data Network (http://data.cma.cn/, accessed on 1 May 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

CMAQ: community multiscale air quality
NAQPMS: nested air quality prediction and modeling system
WRF: weather research and forecasting
ARIMA: autoregressive integrated moving average
ARMAX: autoregressive moving average with extra input
BP: back propagation
SVR: support vector regression
ELM: extreme learning machine
RNN: recurrent neural network
LSTM: long short-term memory
BiLSTM: bidirectional long short-term memory
MRMR: maximum correlation minimum redundancy
IDW: inverse distance weight
MI: mutual information
ACF: autocorrelation coefficient
PACF: partial autocorrelation coefficient
RMSE: root mean square error
MAE: mean absolute error
MAPE: mean absolute percentage error
R2: R squared
BDS: Brock–Dechert–Scheinman

References

  1. Jiang, F.; Zhang, C.; Sun, S.; Sun, J. Forecasting hourly PM2.5 based on deep temporal convolutional neural network and decomposition method. Appl. Soft Comput. 2021, 113, 107988.
  2. Wang, Z.; Chen, L.; Ding, Z.; Chen, H. An enhanced interval PM2.5 concentration forecasting model based on BEMD and MLPI with influencing factors. Atmos. Environ. 2020, 223, 117200.
  3. Liu, F.; Tan, Q.; Jiang, X.; Yang, F.; Jiang, W. Effects of relative humidity and PM2.5 chemical compositions on visibility impairment in Chengdu, China. J. Environ. Sci. 2019, 86, 15–23.
  4. Kan, H.; Chen, R.; Tong, S. Ambient air pollution, climate change, and population health in China. Environ. Int. 2012, 42, 10–19.
  5. Weber, S.A.; Insaf, T.Z.; Hall, E.S.; Talbot, T.O.; Huff, A.K. Assessing the impact of fine particulate matter (PM2.5) on respiratory-cardiovascular chronic diseases in the New York City Metropolitan area using Hierarchical Bayesian Model estimates. Environ. Res. 2016, 151, 399–409.
  6. Ashtari, F.; Esmaeil, N.; Mansourian, M.; Poursafa, P.; Mirmosayyeb, O.; Barzegar, M.; Pourgheisari, H. An 8-year study of people with multiple sclerosis in Isfahan, Iran: Association between environmental air pollutants and severity of disease. J. Neuroimmunol. 2018, 319, 106–111.
  7. Li, J.; Liu, H.; Lv, Z.; Zhao, R.; Deng, F.; Wang, C.; Qin, A.; Yang, X. Estimation of PM2.5 mortality burden in China with new exposure estimation and local concentration-response function. Environ. Pollut. 2018, 243, 1710–1718.
  8. Zhang, B.; Zou, G.; Qin, D.; Lu, Y.; Jin, Y.; Wang, H. A novel Encoder-Decoder model based on read-first LSTM for air pollutant prediction. Sci. Total Environ. 2021, 765, 144507.
  9. Huang, G.; Li, X.; Zhang, B.; Ren, J. PM2.5 concentration forecasting at surface monitoring sites using GRU neural network based on empirical mode decomposition. Sci. Total Environ. 2021, 768, 144516.
  10. Sun, W.; Huang, C. A hybrid air pollutant concentration prediction model combining secondary decomposition and sequence reconstruction. Environ. Pollut. 2020, 266, 115216.
  11. Bildirici, M.; Ersin, Ö. Markov-switching vector autoregressive neural networks and sensitivity analysis of environment, economic growth and petrol prices. Environ. Sci. Pollut. Res. 2018, 25, 31630–31655.
  12. Bildirici, M.; Ersin, Ö.Ö. Economic growth and CO2 emissions: An investigation with smooth transition autoregressive distributed lag models for the 1800–2014 period in the USA. Environ. Sci. Pollut. Res. 2018, 25, 200–219.
  13. Astitha, M.; Luo, H.; Rao, S.T.; Hogrefe, C.; Mathur, R.; Kumar, N. Dynamic evaluation of two decades of WRF-CMAQ ozone simulations over the contiguous United States. Atmos. Environ. 2017, 164, 102–116.
  14. Wang, Z.; Itahashi, S.; Uno, I.; Pan, X.; Osada, K.; Yamamoto, S.; Nishizawa, T.; Tamura, K.; Wang, Z. Modeling the long-range transport of particulate matters for January in East Asia using NAQPMS and CMAQ. Aerosol Air Qual. Res. 2017, 17, 3065–3078.
  15. Tie, X.; Geng, F.; Peng, L.; Gao, W.; Zhao, C. Measurement and modeling of O3 variability in Shanghai, China: Application of the WRF-Chem model. Atmos. Environ. 2009, 43, 4289–4302.
  16. Zhang, L.; Lin, J.; Qiu, R.; Hu, X.; Zhang, H.; Chen, Q.; Tan, H.; Lin, D.; Wang, J. Trend analysis and forecast of PM2.5 in Fuzhou, China using the ARIMA model. Ecol. Indic. 2018, 95, 702–710.
  17. Yu, H.; Yuan, J.; Yu, X.; Zhang, L.; Chen, W. Tracking prediction model for PM2.5 hourly concentration based on ARMAX. J. Tianjin Univ. 2017, 50, 105–111.
  18. Dong, L.; Li, S.; Yang, J.; Shi, W.; Zhang, L. Investigating the performance of satellite-based models in estimating the surface PM2.5 over China. Chemosphere 2020, 256, 127051.
  19. Wang, D.; Wei, S.; Luo, H.; Yue, C.; Grunder, O. A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine. Sci. Total Environ. 2017, 580, 719–733.
  20. Zhou, Y.; Chang, F.J.; Chang, L.C.; Kao, I.F.; Wang, Y.S.; Kang, C.C. Multi-output support vector machine for regional multi-step-ahead PM2.5 forecasting. Sci. Total Environ. 2018, 651, 230–240.
  21. Voulodimos, A.; Doulamis, N.; Doulamis, A.; Protopapadakis, E. Deep learning for computer vision: A brief review. Comput. Intell. Neurosci. 2018, 2018, 7068349.
  22. Ioannidou, A.; Chatzilari, E.; Nikolopoulos, S.; Kompatsiaris, I. Deep learning advances in computer vision with 3d data: A survey. ACM Comput. Surv. (CSUR) 2017, 50, 1–38.
  23. Rawat, W.; Wang, Z. Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 2017, 29, 2352–2449.
  24. Kasthuri, E.; Balaji, S. Natural language processing and deep learning chatbot using long short term memory algorithm. Mater. Today Proc. 2021, 81, 690–693.
  25. Biancofiore, F.; Busilacchio, M.; Verdecchia, M.; Tomassetti, B.; Aruffo, E.; Bianco, S.; Tommaso, S.D.; Colangeli, C.; Rosatelli, G.; Carlo, P.D. Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmos. Pollut. Res. 2017, 8, 652–659.
  25. Biancofiore, F.; Busilacchio, M.; Verdecchia, M.; Tomassetti, B.; Aruffo, E.; Bianco, S.; Tommaso, S.D.; Colangeli, C.; Rosatelli, G.; Carlo, P.D. Recursive neural network model for analysis and forecast of PM10 and PM2.5. Atmos. Pollut. Res. 2017, 8, 652–659. [Google Scholar] [CrossRef]
  26. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  27. Ma, J.; Li, Z.; Cheng, J.C.P.; Ding, Y.; Lin, C.; Xu, Z. Air quality prediction at new stations using spatially transferred bi-directional long short-term memory network. Sci. Total Environ. 2020, 705, 135771. [Google Scholar] [CrossRef]
  28. Zhao, J.; Deng, F.; Cai, Y.; Chen, J. Long short-term memory—Fully connected (LSTM-FC) neural network for PM2.5 concentration prediction. Chemosphere 2019, 220, 486–492. [Google Scholar] [CrossRef]
  29. Zhang, L.; Liu, P.; Zhao, L.; Wang, G.; Zhang, W.; Liu, J. Air quality predictions with a semi-supervised bidirectional LSTM neural network. Atmos. Pollut. Res. 2021, 12, 328–339. [Google Scholar] [CrossRef]
  30. Seng, D.; Zhang, Q.; Zhang, X.; Chen, G.; Chen, X. Spatiotemporal prediction of air quality based on LSTM neural network. Alex. Eng. J. 2021, 60, 2021–2032. [Google Scholar] [CrossRef]
  31. Wu, X.; Liu, Z.; Yin, L.; Zheng, W.; Song, L.; Tian, J.; Yang, B.; Liu, S. A Haze Prediction Model in Chengdu Based on LSTM. Atmosphere 2021, 12, 1479. [Google Scholar] [CrossRef]
  32. Wang, J.; Li, J.; Wang, X.; Wang, J.; Huang, M. Air quality prediction using CT-LSTM. Neural Comput. Appl. 2021, 33, 4779–4792. [Google Scholar] [CrossRef]
  33. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
  34. Bai, Y.; Li, Y.; Zeng, B.; Li, C.; Zhang, J. Hourly PM2.5 concentration forecast using stacked autoencoder model with emphasis on seasonality. J. Clean. Prod. 2019, 224, 739–750. [Google Scholar] [CrossRef]
  35. Pak, U.; Ma, J.; Ryu, U.; Ryom, K.; Juhyok, U.; Pak, K.; Pak, C. Deep learning-based PM2.5 prediction considering the spatiotemporal correlations: A case study of Beijing, China. Sci. Total Environ. 2020, 699, 133561. [Google Scholar] [CrossRef]
  36. Chang, Y.-S.; Chiao, H.-T.; Abimannan, S.; Huang, Y.-P.; Tsai, Y.-T.; Lin, K.-M. An LSTM-based aggregated model for air pollution forecasting. Atmos. Pollut. Res. 2020, 11, 1451–1463. [Google Scholar] [CrossRef]
  37. Zhang, B.; Zhang, H.; Zhao, G.; Lian, J. Constructing a PM2.5 concentration prediction model by combining auto-encoder with Bi-LSTM neural networks. Environ. Model. Softw. 2020, 124, 104600. [Google Scholar] [CrossRef]
  38. Bui, T.-C.; Le, V.-D.; Cha, S.-K. A deep learning approach for air pollution forecasting in South Korea using encoder-decoder networks & LSTM. arXiv 2018, arXiv:1804.07891. [Google Scholar] [CrossRef]
  39. Lyu, P.; Chen, N.; Mao, S.; Li, M. LSTM based encoder-decoder for short-term predictions of gas concentration using multi-sensor fusion. Process Saf. Environ. Prot. 2020, 137, 93–105. [Google Scholar] [CrossRef]
  40. Feng, H.; Zhang, X. A novel encoder-decoder model based on Autoformer for air quality index prediction. PLoS ONE 2023, 18, e0284293. [Google Scholar] [CrossRef]
  41. Jamei, M.; Ali, M.; Malik, A.; Karbasi, M.; Rai, P.; Yaseen, Z.M. Development of a TVF-EMD-based multi-decomposition technique integrated with Encoder-Decoder-Bidirectional-LSTM for monthly rainfall forecasting. J. Hydrol. 2023, 617, 129105. [Google Scholar] [CrossRef]
  42. Yang, H.; Wang, C.; Li, G. A new combination model using decomposition ensemble framework and error correction technique for forecasting hourly PM2.5 concentration. J. Environ. Manag. 2022, 318, 115498. [Google Scholar] [CrossRef] [PubMed]
  43. Sun, W.; Sun, J. Daily PM2.5 concentration prediction based on principal component analysis and LSSVM optimized by cuckoo search algorithm. J. Environ. Manag. 2017, 188, 144–152. [Google Scholar] [CrossRef] [PubMed]
  44. Jain, N.; Murthy, C.A. A new estimate of mutual information based measure of dependence between two variables: Properties and fast implementation. Int. J. Mach. Learn. Cybern. 2016, 7, 857–875. [Google Scholar] [CrossRef] [Green Version]
  45. Ryu, U.; Wang, J.; Kim, T.; Kwak, S.; Juhyok, U. Construction of traffic state vector using mutual information for short-term traffic flow prediction. Transp. Res. Part C Emerg. Technol. 2018, 96, 55–71. [Google Scholar] [CrossRef]
  46. Jebli, I.; Belouadha, F.-Z.; Kabbaj, M.I.; Tilioua, A. Prediction of solar energy guided by pearson correlation using machine learning. Energy 2021, 224, 120109. [Google Scholar] [CrossRef]
  47. Li, X.; Peng, L.; Yao, X.; Cui, S.; Hu, Y.; You, C.; Chi, T. Long short-term memory neural network for air pollutant concentration predictions: Method development and evaluation. Environ. Pollut. 2017, 231, 997–1004. [Google Scholar] [CrossRef]
  48. Gao, D.; Zhou, Y.; Wang, T.; Wang, Y. A method for predicting the remaining useful life of lithium-ion batteries based on particle filter using Kendall rank correlation coefficient. Energies 2020, 13, 4183. [Google Scholar] [CrossRef]
  49. Schober, P.; Boer, C.; Schwarte, L.A. Correlation Coefficients: Appropriate Use and Interpretation. Anesth. Analg. 2018, 126, 1763–1768. [Google Scholar] [CrossRef]
  50. Peng, H.; Long, F.; Ding, C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
  51. Huang, G.; Zhao, X.; Lu, Q. A new cross-domain prediction model of air pollutant concentration based on secure federated learning and optimized LSTM neural network. Environ. Sci. Pollut. Res. 2023, 30, 5103–5125. [Google Scholar] [CrossRef] [PubMed]
  52. Liu, X.; Qin, M.; He, Y.; Mi, X.; Yu, C. A new multi-data-driven spatiotemporal PM2.5 forecasting model based on an ensemble graph reinforcement learning convolutional network. Atmos. Pollut. Res. 2021, 12, 101197. [Google Scholar] [CrossRef]
  53. Ma, J.; Ding, Y.; Gan, V.J.L.; Lin, C.; Wan, Z. Spatiotemporal Prediction of PM2.5 Concentrations at Different Time Granularities Using IDW-BLSTM. IEEE Access 2019, 7, 107897–107907. [Google Scholar] [CrossRef]
  54. Zhou, Y.; Chang, F.-J.; Chang, L.-C.; Kao, I.-F.; Wang, Y.-S. Explore a deep learning multi-output neural network for regional multi-step-ahead air quality forecasts. J. Clean. Prod. 2019, 209, 134–145. [Google Scholar] [CrossRef]
  55. Harman, B.I.; Koseoglu, H.; Yigit, C.O. Performance evaluation of IDW, Kriging and multiquadric interpolation methods in producing noise mapping: A case study at the city of Isparta, Turkey. Appl. Acoust. 2016, 112, 147–157. [Google Scholar] [CrossRef]
  56. Greenblatt, S.A. Nonlinear Dynamics, Chaos, and Instability: Statistical Theory and Economic Evidence. Econ. J. 1993, 103, 751–752. [Google Scholar] [CrossRef]
Figure 1. The flow chart of the proposed model.
Figure 2. Modeling of time series samples. Black is the feature, red is the label.
Figure 3. LSTM and BiLSTM. Blue indicates forward propagation and orange indicates backward propagation.
Figure 4. LSTM network structure.
Figure 5. ED-BiLSTM model.
Figure 6. Mutual information of characteristic variables for site 1463A.
Figure 7. Mutual information between PM2.5 concentration at sites.
Figure 8. The ACF of different time lags for each site.
Figure 9. The PACF at the 1463A site.
Figure 10. The RMSE and MAE with different time lags.
Figure 11. Results of all models and partial magnification.
Figure 12. Result of the ED-BiLSTM prediction model.
Figure 13. The scatter plot of predicted and true values on the test set.
Figure 14. The distribution of PM2.5 concentrations at two-hour intervals on 10 September 2019.
Figure 15. The Map of Xi'an District.
Table 1. The information on the monitoring stations.

Monitoring Station Coding | Monitoring Station Name | Longitude | Latitude
1462A | High-voltage switch factory | 108.882 | 34.2749
1463A | Xinqing district | 108.993 | 34.2629
1464A | Textile city | 109.06 | 34.2572
1465A | Xiaozhai | 108.94 | 34.2324
1466A | City stadium | 108.954 | 34.2713
1467A | Xian high-tech zone | 108.883 | 34.2303
1468A | Economic development zone | 108.935 | 34.3474
1469A | Chang'an district | 108.906 | 34.1546
1470A | Yanliang district | 109.2 | 34.6575
1471A | Lintong district | 109.2186 | 34.3731
1472A | Caotan | 108.869 | 34.378
1473A | Qujiang Culture Industry Group | 108.985 | 34.1978
1474A | Guangyuntan | 109.043 | 34.3274
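The station coordinates above are what the IDW step works from when station-level PM2.5 predictions are spread onto the 1 km × 1 km grid. The snippet below is a minimal sketch of that interpolation, assuming a power parameter of 2, treating degree differences as planar distances, and using hypothetical predicted concentrations for three stations; it is not the authors' implementation.

```python
import numpy as np

# Station coordinates from Table 1 (subset) and hypothetical ED-BiLSTM
# predictions for one hour; the power parameter p = 2 is an assumption.
stations = {
    "1462A": (108.882, 34.2749),
    "1463A": (108.993, 34.2629),
    "1466A": (108.954, 34.2713),
}
predicted_pm25 = {"1462A": 62.0, "1463A": 58.5, "1466A": 60.3}  # placeholder values

def idw_estimate(lon, lat, p=2.0):
    """Inverse-distance-weighted PM2.5 estimate at an unmonitored grid cell."""
    weights, values = [], []
    for code, (s_lon, s_lat) in stations.items():
        d = np.hypot(s_lon - lon, s_lat - lat)   # planar distance in degrees (simplification)
        if d == 0.0:                             # grid cell coincides with a station
            return predicted_pm25[code]
        weights.append(1.0 / d ** p)
        values.append(predicted_pm25[code])
    weights = np.asarray(weights)
    return float(np.dot(weights, values) / weights.sum())

print(idw_estimate(108.93, 34.27))  # estimate for one grid cell near the city centre
```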
Table 2. The attributes of feature variables.

Categories | Variable | Unit | Min | Max | Mean | Std
Pollutant variables | PM2.5 | μg/m³ | 1 | 795 | 64.26 | 62.98
 | PM10 | μg/m³ | 1 | 1967 | 118.30 | 95.7
 | SO2 | μg/m³ | 1 | 475 | 13.87 | 10.96
 | NO2 | μg/m³ | 1 | 293 | 53.16 | 29.9
 | O3 | μg/m³ | 1 | 426 | 51.48 | 52.39
 | CO | mg/m³ | 1 | 10.5 | 1.2 | 0.61
Meteorological variables | WD | ° | 0 | 360 | 148.34 | 104.55
 | WS | m/s | 0 | 12 | 2.36 | 1.34
 | Tem | °C | −9.30 | 41.40 | 15.68 | 10.45
 | TDew | °C | −24.20 | 25.60 | 6.64 | 10.67
 | Pre | hPa | 991.90 | 1048.80 | 1026.89 | 11.18
 | RH | % | 8.03 | 99.33 | 59.25 | 21.75
Table 3. The BDS results of the PM2.5 concentration series at the 1463A site.

Dimension | Z-Statistic | p Value | 95% Confidence Interval
2 | 258.908 | 0.000 | [−1.96, 1.96]
3 | 273.814 | 0.000 | [−1.96, 1.96]
4 | 291.613 | 0.000 | [−1.96, 1.96]
5 | 318.066 | 0.000 | [−1.96, 1.96]
6 | 354.348 | 0.000 | [−1.96, 1.96]
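Every z-statistic above lies far outside [−1.96, 1.96] with p < 0.05, so the hypothesis that the series is i.i.d. is rejected at each embedding dimension, which is why the PM2.5 series is treated as nonlinear. The sketch below shows how such a test can be run in Python; it assumes statsmodels exposes a bds function and uses a random placeholder series rather than the 1463A data, so the printed statistics will not reproduce Table 3.

```python
import numpy as np
from statsmodels.tsa.stattools import bds  # assumed to be available in statsmodels

# Placeholder series standing in for the hourly PM2.5 record at site 1463A.
rng = np.random.default_rng(0)
series = rng.gamma(shape=2.0, scale=30.0, size=5000)

# Embedding dimensions 2-6, matching Table 3; |z| > 1.96 rejects i.i.d. at the 5% level.
z_stats, p_values = bds(series, max_dim=6)
for dim, (z, p) in enumerate(zip(np.atleast_1d(z_stats), np.atleast_1d(p_values)), start=2):
    print(f"dimension {dim}: z = {z:.3f}, p = {p:.3f}")
```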
Table 4. PM2.5 prediction results of different feature selection algorithms.

Feature Selection Method | Feature | RMSE | MAE | MAPE | R2
Original Feature | PM2.5, PM10, NO2, SO2, RH, CO, O3, Tem, TDew, Pre, WD, WS | 7.660 | 5.696 | 0.132 | 0.976
MI | PM2.5, PM10, NO2, SO2, RH, O3 | 7.080 | 4.397 | 0.126 | 0.982
MRMR | PM2.5, PM10, NO2, SO2, RH, CO | 6.603 | 4.066 | 0.101 | 0.986
Incremental Search | PM2.5, PM10, NO2, SO2, RH, CO, O3 | 6.686 | 4.075 | 0.103 | 0.985
 | PM2.5, PM10, NO2, SO2, RH, CO, Tem | 7.686 | 5.065 | 0.117 | 0.985
 | PM2.5, PM10, NO2, SO2, RH, CO, TDew | 7.853 | 5.264 | 0.122 | 0.984
 | PM2.5, PM10, NO2, SO2, RH, CO, Pre | 8.004 | 5.238 | 0.105 | 0.984
 | PM2.5, PM10, NO2, SO2, RH, CO, WD | 7.706 | 5.262 | 0.128 | 0.986
 | PM2.5, PM10, NO2, SO2, RH, CO, WS | 7.689 | 5.169 | 0.115 | 0.986
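The MRMR row above selects PM2.5, PM10, NO2, SO2, RH, and CO, which is the feature set carried into the remaining experiments. A minimal greedy MRMR sketch is given below: each candidate is scored by its mutual information with the prediction target minus its mean mutual information with the already selected features. It relies on scikit-learn's mutual_info_regression, and the data frame and target column name are hypothetical; the authors' exact implementation may differ.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

def mrmr_select(df: pd.DataFrame, target: str, k: int) -> list:
    """Greedy MRMR: maximise MI with the target, penalise mean MI with chosen features."""
    X, y = df.drop(columns=[target]), df[target].to_numpy()
    relevance = pd.Series(
        mutual_info_regression(X.to_numpy(), y, random_state=0), index=X.columns
    )
    selected, remaining = [], list(X.columns)
    for _ in range(k):
        best, best_score = None, -np.inf
        for f in remaining:
            redundancy = (
                np.mean([mutual_info_regression(X[[f]].to_numpy(), X[s].to_numpy(),
                                                random_state=0)[0] for s in selected])
                if selected else 0.0
            )
            score = relevance[f] - redundancy
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical usage, assuming `data` holds the Table 2 variables plus a
# next-hour PM2.5 column named "PM2.5_next":
# print(mrmr_select(data, target="PM2.5_next", k=6))
```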
Table 5. ED-BiLSTM model evaluation index with different time lags.

∆t | RMSE | MAE | MAPE | R2 | ∆t | RMSE | MAE | MAPE | R2
1 | 7.756 | 4.955 | 0.135 | 0.981 | 13 | 8.001 | 5.417 | 0.179 | 0.980
2 | 6.795 | 4.287 | 0.107 | 0.986 | 14 | 7.786 | 4.647 | 0.106 | 0.981
3 | 7.240 | 4.625 | 0.106 | 0.984 | 15 | 6.883 | 4.383 | 0.130 | 0.985
4 | 6.603 | 4.066 | 0.101 | 0.986 | 16 | 6.738 | 4.375 | 0.139 | 0.986
5 | 7.271 | 4.467 | 0.115 | 0.984 | 17 | 6.792 | 4.202 | 0.103 | 0.986
6 | 7.430 | 4.826 | 0.146 | 0.983 | 18 | 7.582 | 4.955 | 0.110 | 0.982
7 | 8.075 | 5.023 | 0.116 | 0.980 | 19 | 7.084 | 4.499 | 0.114 | 0.984
8 | 6.843 | 4.489 | 0.146 | 0.985 | 20 | 6.736 | 4.187 | 0.104 | 0.986
9 | 7.010 | 4.251 | 0.104 | 0.985 | 21 | 7.069 | 4.474 | 0.108 | 0.984
10 | 7.027 | 4.526 | 0.136 | 0.985 | 22 | 6.866 | 4.332 | 0.113 | 0.985
11 | 6.926 | 4.582 | 0.141 | 0.985 | 23 | 8.374 | 5.079 | 0.109 | 0.978
12 | 7.355 | 4.642 | 0.111 | 0.983 | 24 | 7.573 | 4.539 | 0.108 | 0.982
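The grid above points to ∆t = 4 as the time lag with the lowest RMSE and MAE, matching the best row of Table 4. The sketch below shows one common way to turn an hourly record into lagged input windows and next-hour labels (cf. Figure 2); the exact sample layout used by the authors is not reproduced here.

```python
import numpy as np

def make_windows(features: np.ndarray, pm25: np.ndarray, dt: int = 4):
    """Build (window, label) pairs: hours t-dt ... t-1 predict PM2.5 at hour t."""
    X, y = [], []
    for t in range(dt, len(pm25)):
        X.append(features[t - dt:t])   # shape (dt, n_features)
        y.append(pm25[t])              # next-step PM2.5 label
    return np.stack(X), np.asarray(y)

# Placeholder data: 1000 hours of the 6 MRMR-selected features.
feats, target = np.random.rand(1000, 6), np.random.rand(1000)
X, y = make_windows(feats, target, dt=4)
print(X.shape, y.shape)  # (996, 4, 6) (996,)
```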
Table 6. The parameter setting of the prediction model.

Parameter | Value
Training set | 80% (21,020)
Validation set | 10% (2628)
Test set | 10% (2628)
Number of BiLSTM neurons | 128
Epochs | 200
Batch size | 128
Optimizer | Adam
Loss | MAE
Activation function | ReLU
Learning rate | 1 × 10^−3
Dropout | 0.5
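The sketch below wires an encoder-decoder BiLSTM in Keras with the settings of Table 6 (128 BiLSTM units, dropout 0.5, Adam with a 1 × 10^−3 learning rate, MAE loss, ReLU activation). The precise layer arrangement of the ED-BiLSTM in Figure 5 is not documented here, so the encoder/decoder composition and the 32-unit dense layer are illustrative assumptions rather than the authors' architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

TIME_LAG, N_FEATURES = 4, 6   # time lag from Table 5 and the MRMR-selected features

inputs = keras.Input(shape=(TIME_LAG, N_FEATURES))
# Encoder: a bidirectional LSTM compresses the input window into a context vector.
encoded = layers.Bidirectional(layers.LSTM(128))(inputs)
encoded = layers.Dropout(0.5)(encoded)
# Decoder: the context vector is repeated and decoded by a second bidirectional LSTM.
decoded = layers.RepeatVector(1)(encoded)
decoded = layers.Bidirectional(layers.LSTM(128))(decoded)
decoded = layers.Dropout(0.5)(decoded)
outputs = layers.Dense(32, activation="relu")(decoded)
outputs = layers.Dense(1)(outputs)                      # next-hour PM2.5

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mae")
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, batch_size=128)
```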
Table 7. The prediction result of all models at the 1463A site.

Prediction Model | RMSE | MAE | MAPE | R2
SVR | 14.327 | 11.864 | 0.416 | 0.936
LSTM | 8.695 | 5.416 | 0.131 | 0.985
BiLSTM | 8.016 | 5.225 | 0.112 | 0.980
ED-LSTM | 8.937 | 5.353 | 0.164 | 0.980
AE-BiLSTM | 10.501 | 8.515 | 0.404 | 0.966
ED-BiLSTM-One | 7.545 | 4.748 | 0.119 | 0.982
ED-BiLSTM-IF | 7.521 | 5.593 | 0.114 | 0.982
ED-BiLSTM | 6.603 | 4.066 | 0.101 | 0.986
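The table above reports RMSE, MAE, MAPE, and R2, which are assumed to follow their standard definitions, sketched below. As a worked check against the table, the relative RMSE reduction of ED-BiLSTM over LSTM at site 1463A is (8.695 − 6.603)/8.695 ≈ 24.06%.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))  # reported as a fraction

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Tiny demonstration with placeholder hourly values (not data from the study).
y_true = np.array([64.0, 70.0, 55.0, 48.0])
y_pred = np.array([66.0, 68.0, 57.0, 47.0])
print(rmse(y_true, y_pred), mae(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))
```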
Table 8. Prediction results of different monitoring stations (bold indicates the best value).

Model | Metric | 1462A | 1464A | 1465A | 1466A | 1467A | 1468A
SVR | RMSE | 23.765 | 16.996 | 21.101 | 20.704 | 18.472 | 15.683
 | MAE | 21.415 | 14.814 | 18.093 | 15.501 | 14.315 | 12.639
 | MAPE | 0.644 | 0.508 | 0.582 | 0.480 | 0.420 | 0.433
 | R2 | 0.832 | 0.909 | 0.854 | 0.885 | 0.903 | 0.927
LSTM | RMSE | 10.173 | 8.778 | 9.470 | 9.883 | 10.102 | 10.199
 | MAE | 6.756 | 5.415 | 6.217 | 4.873 | 6.918 | 7.269
 | MAPE | 0.261 | 0.101 | 0.145 | 0.098 | 0.115 | 0.114
 | R2 | 0.969 | 0.976 | 0.971 | 0.975 | 0.972 | 0.965
BiLSTM | RMSE | 9.607 | 7.959 | 8.088 | 9.738 | 10.002 | 9.378
 | MAE | 5.741 | 4.781 | 5.001 | 4.848 | 6.718 | 7.010
 | MAPE | 0.126 | 0.110 | 0.125 | 0.093 | 0.112 | 0.125
 | R2 | 0.976 | 0.978 | 0.976 | 0.976 | 0.970 | 0.968
ED-LSTM | RMSE | 9.092 | 7.924 | 8.531 | 9.594 | 10.105 | 9.113
 | MAE | 5.864 | 4.868 | 5.229 | 4.615 | 6.325 | 5.759
 | MAPE | 0.139 | 0.094 | 0.134 | 0.078 | 0.108 | 0.121
 | R2 | 0.975 | 0.980 | 0.976 | 0.976 | 0.972 | 0.977
AE-BiLSTM | RMSE | 10.228 | 9.870 | 9.318 | 13.773 | 14.784 | 11.622
 | MAE | 6.999 | 6.953 | 6.013 | 6.103 | 10.361 | 7.665
 | MAPE | 0.256 | 0.160 | 0.131 | 0.125 | 0.264 | 0.131
 | R2 | 0.967 | 0.975 | 0.972 | 0.949 | 0.938 | 0.959
ED-BiLSTM-One | RMSE | 8.427 | 7.871 | 8.075 | 8.938 | 9.864 | 9.061
 | MAE | 5.364 | 4.975 | 4.919 | 4.525 | 5.911 | 5.760
 | MAPE | 0.127 | 0.103 | 0.124 | 0.084 | 0.106 | 0.120
 | R2 | 0.979 | 0.980 | 0.978 | 0.979 | 0.975 | 0.975
ED-BiLSTM-IF | RMSE | 8.916 | 8.319 | 9.110 | 9.001 | 10.556 | 9.108
 | MAE | 5.560 | 4.935 | 5.639 | 4.465 | 6.921 | 5.880
 | MAPE | 0.129 | 0.097 | 0.122 | 0.074 | 0.105 | 0.123
 | R2 | 0.976 | 0.978 | 0.973 | 0.978 | 0.974 | 0.978
ED-BiLSTM | RMSE | 7.393 | 6.609 | 7.997 | 8.023 | 9.553 | 8.001
 | MAE | 4.223 | 4.593 | 4.938 | 4.365 | 5.786 | 4.745
 | MAPE | 0.114 | 0.092 | 0.119 | 0.076 | 0.091 | 0.106
 | R2 | 0.979 | 0.982 | 0.979 | 0.983 | 0.978 | 0.985
Table 9. Prediction results for monitoring stations 1469A–1474A (bold indicates the best value).

Model | Metric | 1469A | 1470A | 1471A | 1472A | 1473A | 1474A
SVR | RMSE | 19.743 | 11.150 | 14.902 | 22.465 | 16.371 | 21.171
 | MAE | 17.325 | 8.570 | 11.695 | 19.500 | 13.489 | 18.849
 | MAPE | 0.553 | 0.191 | 0.312 | 0.842 | 0.531 | 0.768
 | R2 | 0.892 | 0.944 | 0.924 | 0.850 | 0.920 | 0.863
LSTM | RMSE | 9.248 | 7.334 | 10.485 | 10.067 | 9.196 | 10.132
 | MAE | 6.060 | 4.913 | 6.352 | 5.849 | 5.386 | 6.002
 | MAPE | 0.114 | 0.128 | 0.118 | 0.121 | 0.131 | 0.151
 | R2 | 0.971 | 0.975 | 0.962 | 0.970 | 0.974 | 0.963
BiLSTM | RMSE | 8.689 | 7.111 | 9.953 | 9.669 | 8.083 | 9.886
 | MAE | 5.287 | 4.725 | 5.802 | 5.558 | 4.543 | 6.132
 | MAPE | 0.107 | 0.115 | 0.117 | 0.129 | 0.128 | 0.143
 | R2 | 0.979 | 0.973 | 0.968 | 0.972 | 0.982 | 0.965
ED-LSTM | RMSE | 9.227 | 7.263 | 10.018 | 9.801 | 8.048 | 10.050
 | MAE | 5.688 | 4.547 | 5.882 | 5.563 | 5.044 | 5.812
 | MAPE | 0.108 | 0.111 | 0.114 | 0.117 | 0.123 | 0.131
 | R2 | 0.976 | 0.974 | 0.966 | 0.971 | 0.981 | 0.969
AE-BiLSTM | RMSE | 13.199 | 7.804 | 11.253 | 11.292 | 9.484 | 11.334
 | MAE | 10.676 | 6.328 | 6.973 | 7.800 | 6.910 | 7.210
 | MAPE | 0.122 | 0.117 | 0.116 | 0.326 | 0.286 | 0.209
 | R2 | 0.952 | 0.973 | 0.957 | 0.962 | 0.973 | 0.961
ED-BiLSTM-One | RMSE | 8.300 | 5.518 | 9.557 | 8.694 | 7.656 | 9.259
 | MAE | 5.284 | 4.245 | 5.725 | 4.693 | 4.662 | 5.578
 | MAPE | 0.105 | 0.112 | 0.113 | 0.150 | 0.112 | 0.132
 | R2 | 0.981 | 0.986 | 0.969 | 0.972 | 0.983 | 0.973
ED-BiLSTM-IF | RMSE | 8.645 | 5.807 | 9.787 | 8.959 | 8.127 | 9.453
 | MAE | 5.319 | 4.383 | 5.684 | 4.828 | 4.871 | 5.775
 | MAPE | 0.106 | 0.095 | 0.108 | 0.150 | 0.124 | 0.133
 | R2 | 0.979 | 0.985 | 0.967 | 0.971 | 0.980 | 0.973
ED-BiLSTM | RMSE | 8.095 | 4.872 | 8.124 | 8.625 | 7.037 | 9.228
 | MAE | 4.962 | 3.591 | 5.248 | 3.343 | 4.304 | 5.003
 | MAPE | 0.102 | 0.069 | 0.107 | 0.124 | 0.111 | 0.128
 | R2 | 0.982 | 0.989 | 0.979 | 0.981 | 0.985 | 0.974
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
