**3. Dataset**

To predict PM2.5 and PM10 concentrations, we collected two types of data: pollutant concentration data and meteorological data [7]. First, we collected the real-time concentrations of PM2.5, PM10, O3, CO, NO2, and SO2 published by six ground monitoring stations in the Chengdu urban area from 1 June 2014 to 30 June 2017, updated once an hour. PM2.5, PM10, O3, and SO2 are reported in μg/m<sup>3</sup>, and CO and NO2 in mg/m<sup>3</sup>. For each hour, we averaged the concurrent readings of the six stations and used this average as the pollutant concentration data for Chengdu. In addition, we collected the temperature, humidity, and wind data published by the China Weather Network (WEATHER) as meteorological data (http://www.cnemc.cn/ (accessed on 16 October 2019)).
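The station-averaging step can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(timestamp, station_id, pollutant, value)` and the station IDs are hypothetical, and skipping missing readings (rather than imputing them) is an assumption, since the text does not say how gaps were handled.

```python
from collections import defaultdict
from statistics import mean


def citywide_hourly_mean(records):
    """Average each pollutant over all stations that reported in the same hour.

    records: iterable of (timestamp, station_id, pollutant, value) tuples
    (hypothetical layout). A value of None marks a missing reading; such
    readings are skipped (assumption), so an hour with no valid readings for a
    pollutant simply produces no entry.
    """
    grouped = defaultdict(list)
    for timestamp, _station, pollutant, value in records:
        if value is not None:
            grouped[(timestamp, pollutant)].append(value)
    # One citywide value per (hour, pollutant) pair.
    return {key: mean(values) for key, values in grouped.items()}
```

For example, if stations S1 and S2 report PM2.5 values of 30.0 and 40.0 μg/m<sup>3</sup> at the same hour and S3 is missing, the citywide value for that hour is 35.0 μg/m<sup>3</sup>.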

The six monitoring stations selected for this study cover the whole urban area of Chengdu and therefore capture changes in air quality across the city. Their geographical locations are shown in Figure 3.

**Figure 3.** The geographical location of the ground monitoring stations in Chengdu.

#### *3.1. Correlation Analysis*

Because haze occurs more frequently in autumn and winter than in spring and summer, we assume that its causes differ between seasons. To study the correlation between haze and meteorological conditions, we selected pollutant concentrations (PM2.5, PM10, O3, CO, NO2, and SO2) and meteorological data (temperature, humidity, and wind power) over two one-week periods: a summer period from 0:00 on 4 July 2016 to 23:00 on 10 July 2016, and a winter period from 0:00 on 24 December 2016 to 23:00 on 30 December 2016. We used the correlation analysis tool in MATLAB to compute the correlations between PM2.5 and these factors. The results are shown in Table 1.
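The per-window correlation analysis can be sketched in plain Python. The text does not name the coefficient, so this sketch assumes Pearson's correlation coefficient (the default of MATLAB's `corrcoef`); the `series` dictionary layout and the function names are hypothetical, and the sketch assumes every hour in the window has a value.

```python
from datetime import datetime, timedelta
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def hourly_window(start, end):
    """All hourly timestamps from start to end, both inclusive."""
    hours = []
    t = start
    while t <= end:
        hours.append(t)
        t += timedelta(hours=1)
    return hours


# The two one-week periods from the text: 7 days x 24 h = 168 hourly samples each.
SUMMER = hourly_window(datetime(2016, 7, 4, 0), datetime(2016, 7, 10, 23))
WINTER = hourly_window(datetime(2016, 12, 24, 0), datetime(2016, 12, 30, 23))


def correlations_with_pm25(series, window):
    """Correlate every other factor with PM2.5 over one window.

    series maps a factor name to {timestamp: value}; it must contain "PM2.5"
    and a value for every hour in the window (no gap handling here).
    """
    target = [series["PM2.5"][t] for t in window]
    return {
        name: pearson([values[t] for t in window], target)
        for name, values in series.items()
        if name != "PM2.5"
    }
```

Calling `correlations_with_pm25(series, SUMMER)` and `correlations_with_pm25(series, WINTER)` on the same data yields one coefficient per factor and season, which is the shape of the comparison in Table 1.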

**Table 1.** Correlation coefficients between PM2.5 and other pollutants and meteorological factors.


Correlation strength is classified by the magnitude of the correlation coefficient r: |r| < 0.4 indicates a low correlation, 0.4 ≤ |r| < 0.7 a significant linear correlation, and 0.7 ≤ |r| ≤ 1 a high correlation. Table 1 shows that in winter, the pollutant most correlated with PM2.5 is PM10, followed by NO2, CO, and SO2, whereas O3, wind power, and temperature show a low correlation. In summer, the pattern differs: the ranking is PM10, CO, O3, SO2, NO2. Overall, PM2.5 in Chengdu has a low correlation with temperature, humidity, and wind power, a significant correlation with CO, SO2, NO2, and O3, and a high correlation with PM10.
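The classification thresholds and the rankings above can be expressed as two small helpers. This is a sketch: ranking by |r| (strongest first) is an assumption about how the orderings were produced, and the function names are hypothetical. The example coefficients in the tests are made up and are not the values in Table 1.

```python
def correlation_level(r):
    """Map a coefficient to the categories used in the text, based on |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"correlation coefficient out of range: {r}")
    magnitude = abs(r)
    if magnitude < 0.4:
        return "low"
    if magnitude < 0.7:
        return "significant"
    return "high"


def rank_by_strength(coefficients):
    """Sort factor names by |r|, strongest first (assumed ranking rule)."""
    return sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)
```

Using the magnitude means a strong negative correlation (for example, with wind dispersing pollutants) ranks as high as a strong positive one.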

Because PM2.5 and PM10 are both essential factors in haze, this paper uses PM2.5 and PM10 concentrations to represent haze pollution; they are also the prediction targets of our LSTM-based haze prediction model. According to the correlation analysis, PM2.5 has a low correlation with temperature, humidity, and wind power in the short term, and we assume that weather parameters remain stable over such short periods. Therefore, we selected CO, SO2, NO2, O3, historical PM10, and historical PM2.5 as inputs to train the haze prediction model to predict PM2.5 and PM10 concentrations.
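One way to turn the selected inputs into training samples for an LSTM is a sliding window over the hourly series. In this sketch, the look-back length, the one-hour-ahead horizon, and the dictionary-per-hour row format are all hypothetical choices that the text does not specify. Each sample has the (time steps, features) layout that LSTM layers typically expect.

```python
FEATURES = ("CO", "SO2", "NO2", "O3", "PM10", "PM2.5")  # inputs named in the text
TARGETS = ("PM2.5", "PM10")  # prediction targets


def make_samples(rows, lookback):
    """Build supervised (X, y) pairs from chronological hourly rows.

    rows: list of dicts keyed by pollutant name, one per hour (hypothetical format).
    X[i] holds the `lookback` hours before hour i + lookback, each as a
    len(FEATURES) vector; y[i] is the [PM2.5, PM10] pair at that next hour
    (one-step-ahead horizon, an assumption).
    """
    if lookback < 1:
        raise ValueError("lookback must be at least 1 hour")
    X, y = [], []
    for end in range(lookback, len(rows)):
        window = rows[end - lookback:end]
        X.append([[row[f] for f in FEATURES] for row in window])
        y.append([rows[end][t] for t in TARGETS])
    return X, y
```

Because historical PM2.5 and PM10 appear in every input window, the model sees past values of its own targets alongside the co-pollutants, consistent with the input selection described above.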
