**1. Introduction**

Air pollution incidents caused by haze have repeatedly occurred in metropolises such as Los Angeles and London. Respiratory diseases caused by haze have led to some ten thousand deaths and widespread public panic [1,2]. In China, industrialization and urbanization have brought economic development, while awareness of and measures for environmental protection have lagged behind [2–5].

The Sichuan Basin, in western China, suffers from severe haze pollution. Among the 74 major cities monitored, the annual average PM2.5 concentration ranged from 26 to 160 μg/m³ with a mean of 72 μg/m³, and only 4.1% of the cities met the air quality standard; the annual average PM10 concentration ranged from 47 to 305 μg/m³ with a mean of 118 μg/m³, and only 14.9% of the cities met the standard. In the following years, haze pollution has been alleviated, but the overall pollution situation is still not optimistic.

The prevention and prediction of haze have become a focus of the public and of researchers. Haze research mainly addresses two questions: the causes of haze and its prediction. Hinton et al. [6] studied photochemical haze in Los Angeles and concluded that primary pollutants emitted by motor vehicles and chemical plants, together with secondary pollutants formed through photochemical reactions, are the main contributors. Gupta [7,8] compared PM10 concentrations between residential and industrial areas in Kolkata, India, and found that soot and motor vehicle emissions had the most significant impact on haze pollution in the area. Minguillón et al. [9] used positive matrix factorization to analyze the main components and formation factors of PM2.5 in Switzerland. Ho et al. [10] collected and tested the chemical composition of PM2.5 in the suburbs of Hong Kong, evaluated the relatively enriched factors among the crustal elements, and used multivariate correlation techniques to determine the sources of PM2.5 and their impact.

**Citation:** Wu, X.; Liu, Z.; Yin, L.; Zheng, W.; Song, L.; Tian, J.; Yang, B.; Liu, S. A Haze Prediction Model in Chengdu Based on LSTM. *Atmosphere* **2021**, *12*, 1479. https://doi.org/10.3390/atmos12111479

Academic Editors: Duanyang Liu, Kai Qin and Honglei Wang

Received: 12 October 2021; Accepted: 4 November 2021; Published: 9 November 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In terms of haze prediction, researchers have used many different methods in different regions, including multivariate statistical methods [11–14], chemical transport models [15–17], and prediction methods based on remote sensing satellite imagery [3,18–21].

RNN (recurrent neural network) and LSTM (long short-term memory) networks [22–27] have gradually been applied to haze prediction. Qin et al. [28] proposed a new urban PM2.5 concentration prediction scheme based on CNN (convolutional neural network) and LSTM. Tsai et al. [29] used RNN and LSTM networks to predict air pollution in Taiwan. Li et al. [16] developed a hybrid CNN-LSTM model to predict the PM2.5 concentration in Beijing over the next 24 h. Bai et al. [30] proposed an E-LSTM neural network, which constructs multiple LSTM models in different modes for ensemble learning to forecast hourly PM2.5 concentrations. Such time-series models have shown better results [26,31–33].

In this paper, we applied neural network methods to predict the concentrations of the haze pollutants PM2.5 and PM10. We made two assumptions: first, that the PM2.5/PM10 concentration is related to other gaseous pollutants such as O3, CO, NO2, and SO2; second, that the PM2.5/PM10 concentration has time-series continuity, meaning that the concentration curve is smooth and that concentrations at different times are correlated within a certain time window. Based on these two assumptions, we collected real-time PM2.5, PM10, O3, CO, NO2, and SO2 concentrations published by six ground monitoring stations in Chengdu from 1 June 2014 to 30 June 2017, together with meteorological data such as wind power and temperature. We then analyzed the correlation between the collected data and PM2.5, constructed separate datasets for predicting PM2.5 and PM10, and built a haze prediction model based on LSTM. The model takes the O3, CO, NO2, SO2, and PM2.5/PM10 concentrations of the last 24 h as input to predict future PM2.5/PM10 concentrations. We also focused on adjusting the model's hidden layers to explore its best performance.
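The 24-h sliding-window construction described above can be sketched as follows. This is our own minimal illustration, not the authors' actual preprocessing pipeline; the array layout (target concentration in column 0, other pollutants in the remaining columns) and the function name are assumptions.

```python
import numpy as np

def make_windows(series, window=24):
    """Slice hourly pollutant records into (input, target) pairs.

    series: array of shape (T, F), where column 0 is the PM2.5 (or PM10)
    concentration and the remaining columns are O3, CO, NO2, SO2, etc.
    Returns X of shape (N, window, F) and y of shape (N,): each X[i]
    holds the previous 24 h of features, and y[i] is the target
    concentration at the following hour.
    """
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])  # last `window` hours of features
        y.append(series[t, 0])          # next-hour target concentration
    return np.array(X), np.array(y)
```

Samples built this way can be fed directly to a sequence model whose input length is 24.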

#### **2. Approach**

The long short-term memory (LSTM) neural network [34] is a deep learning network built on the RNN. To avoid the vanishing-gradient and exploding-gradient problems, a mechanism for bridging long time lags is added to the network, so that the state unit can preserve the error signal over long delays. This successfully remedies the defects of the standard RNN, and LSTM has been widely used in many fields.

An RNN has only one hidden state, so when dealing with time series it is sensitive to short-term inputs but handles long-term dependencies poorly. LSTM therefore adds a cell state to the RNN network for long-term state preservation. An unrolled LSTM is shown in Figure 1: *xt*, *ht*, and *ct* denote the input, output, and cell state of the LSTM network at time *t*, while *ht*−1 and *ct*−1 denote the output and cell state at time *t* − 1; *x*, *h*, and *c* are all vectors.

**Figure 1.** Expanded view of LSTM.

The LSTM implements the preservation, update, and output of the long-term state *c* through its internal forget gate, input gate, and output gate, as shown in Figure 2. The forget gate and input gate control the cell state: they determine, respectively, how much of the previous cell state is preserved and how much of the input at time *t* is stored. The output gate controls which parts of the cell state are included in the output.

**Figure 2.** The internal structure of LSTM.

Gates are fully connected layers, and the expression of a gate is shown in Formula (1), where the input and output are both vectors. The output is a real vector with components in [0, 1]. In Formula (1), *W* denotes the gate's weight matrix, and *b* represents the bias vector. *σ* is the sigmoid function, whose output ranges from 0 to 1 and determines how much of the input can pass through the gate.

$$\mathbf{g}(\mathbf{x}) = \sigma(\mathbf{Wx} + \mathbf{b}) \tag{1}$$
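As a minimal illustration of Formula (1) (the function names here are ours, not part of the paper):

```python
import numpy as np

def sigmoid(z):
    # Maps any real value into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def gate(W, x, b):
    # g(x) = sigmoid(W x + b): each component lies in [0, 1] and acts
    # as a soft switch on the corresponding state component.
    return sigmoid(W @ x + b)
```

With zero weights and biases the gate outputs 0.5 everywhere, i.e., it is half open regardless of the input.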

The forget gate is shown in Formula (2). It allows the LSTM to selectively forget memories based on the current input. The inputs of the forget gate are the input at the current time and the output of the hidden layer at the previous time. The weight matrix *Wf* and bias *bf* adjust this input, and the sigmoid function filters out outdated information that is useless for the current output.

$$f\_t = \sigma\left(W\_f \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_f\right) \tag{2}$$

The calculation of the input gate is shown in Formula (3). The input gate controls how much new information is written to the cell state. *Wi* represents the weight matrix of the input gate, and *bi* represents its bias.

$$i\_t = \sigma\left(W\_i \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_i\right) \tag{3}$$

The candidate cell state *c't* at the current time *t* is calculated from the output of the network at time *t* − 1 and the input at time *t*, as shown in Formula (4).

$$c\_t' = \tanh\left(W\_c \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_c\right) \tag{4}$$

Therefore, the cell state *ct* at the current time can be obtained by Formula (5), where the symbol ◦ denotes elementwise multiplication.

$$c\_t = f\_t \circ c\_{t-1} + i\_t \circ c'\_t \tag{5}$$

The output gate can be expressed as Formula (6). It controls the influence of long-term information on the current output. The output of the LSTM network is then determined by the output gate and the cell state, as shown in Formula (7).

$$o\_t = \sigma\left(W\_o \cdot [h\_{t-1}, \mathbf{x}\_t] + b\_o\right) \tag{6}$$

$$h\_t = o\_t \circ \tanh(c\_t) \tag{7}$$
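Formulas (2)–(7) can be collected into a single LSTM time step. The following NumPy sketch mirrors the equations term by term; the parameter dictionary, names, and shapes are our own assumptions for illustration, not a production implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Formulas (2)-(7).

    Each weight matrix W_* has shape (hidden, hidden + input) and acts on
    the concatenation [h_{t-1}, x_t]; each bias b_* has shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate, (2)
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate, (3)
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate state, (4)
    c_t = f_t * c_prev + i_t * c_hat                    # cell state, (5)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate, (6)
    h_t = o_t * np.tanh(c_t)                            # output, (7)
    return h_t, c_t
```

Iterating `lstm_step` over a 24-element input sequence reproduces the unrolled network of Figure 1; in practice a framework layer (e.g., a Keras `LSTM` layer) performs the same recurrence with learned parameters.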
