#### **5. Discussion**

This paper shows that stacking multiple LSTM layers can substantially increase prediction accuracy. This result is consistent with a general observation about deep learning models: a deeper structure copes better with complicated, multi-dimensional datasets than a model of limited depth.
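To make the idea of stacking concrete, the following is a minimal NumPy sketch of a forward pass through several LSTM layers, in which each layer passes its full hidden-state sequence to the next layer and the prediction is regressed from the final time step. It is an illustration under stated assumptions, not the implementation used in this paper: the weights are random and untrained, and the layer count, hidden size, output layer, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layer(input_size, hidden_size, rng):
    """Random (untrained) weights for one LSTM layer; gates stacked as i, f, g, o."""
    scale = 1.0 / np.sqrt(hidden_size)
    return {
        "W": rng.uniform(-scale, scale, (4 * hidden_size, input_size)),   # input weights
        "U": rng.uniform(-scale, scale, (4 * hidden_size, hidden_size)),  # recurrent weights
        "b": np.zeros(4 * hidden_size),
    }

def lstm_layer(params, inputs):
    """Run one LSTM layer over inputs of shape (time, features); return (time, hidden)."""
    hidden_size = params["U"].shape[1]
    h = np.zeros(hidden_size)  # hidden state
    c = np.zeros(hidden_size)  # cell state
    outputs = []
    for x_t in inputs:
        z = params["W"] @ x_t + params["U"] @ h + params["b"]
        i, f, g, o = np.split(z, 4)  # input, forget, candidate, output pre-activations
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def stacked_lstm_predict(layers, w_out, b_out, window):
    """Feed each layer's hidden sequence to the next; regress from the last time step."""
    seq = window
    for params in layers:
        seq = lstm_layer(params, seq)
    return float(w_out @ seq[-1] + b_out)

rng = np.random.default_rng(0)
n_features, hidden_size, n_layers = 5, 16, 3  # illustrative sizes, not the paper's settings
layers = [
    init_layer(n_features if k == 0 else hidden_size, hidden_size, rng)
    for k in range(n_layers)
]
w_out = rng.uniform(-0.1, 0.1, hidden_size)  # hypothetical linear output layer
b_out = 0.0
window = rng.random((24, n_features))  # random stand-in for 24 hourly rows of 5 readings
prediction = stacked_lstm_predict(layers, w_out, b_out, window)
```

Changing `n_layers` in this sketch changes the depth of the stack while leaving the input window and the output layer untouched.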

Furthermore, correlation analysis lets us decide which parts of the dataset to include, rather than feeding all available data into the network, which wastes training time and can degrade accuracy.
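As an illustration of correlation-based input selection, the sketch below keeps only the variables whose correlation with the target is strong enough. It assumes the Pearson coefficient and a hypothetical threshold of 0.3; neither the coefficient type nor the cut-off is taken from this section, and the toy readings are made up for the example rather than drawn from the study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(var_x * var_y)

def select_features(columns, target, threshold=0.3):
    """Return {name: r} for columns whose |r| with the target meets the threshold."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return {name: r for name, r in scores.items() if abs(r) >= threshold}

# Made-up toy readings, for illustration only (not data from the study).
toy = {
    "PM2.5": [35.0, 42.0, 58.0, 61.0, 49.0, 30.0],
    "CO": [0.6, 0.7, 1.0, 1.1, 0.8, 0.5],
    "O3": [80.0, 72.0, 40.0, 35.0, 60.0, 90.0],
    "unrelated": [2.0, 5.0, 3.0, 1.0, 5.0, 2.0],
}
selected = select_features(toy, target="PM2.5")
```

With these toy values, `CO` (strongly positive) and `O3` (strongly negative) are kept, while the `unrelated` column is dropped. Filtering on the absolute value of the coefficient retains negatively correlated variables as well.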

Compared with the CNN+LSTM model in [28], the multilayer LSTM model proposed in this paper achieves more accurate results. One possible reason is that the six monitoring stations selected in this article are all located in Chengdu, at evenly spaced intervals, so their climate data, apart from the main pollutants, are similar. Therefore, the mutual influence between the data is small, leading to more accurate results. The authors of [29] report that, among their forecast results for the past 3, 8, 24, and 72 h, the 72 h case gives the best forecast accuracy; we used the data of the past 24 h to predict PM2.5, and our result is better than their 72 h forecast accuracy. The model in [16] uses the previous week's data (7 days) as input, whereas this paper uses the data from the previous 24 h, which reduces the amount of calculation. Compared with [30], our multilayer LSTM shows more accurate and less biased results. Overall, our model made more accurate predictions than these models on this kind of prediction task.
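The 24 h input described above can be sketched as a sliding window over the hourly records. This is a minimal illustration, assuming hourly sampling, a one-hour-ahead target, and the target pollutant stored in column 0; the forecast horizon, the column layout, and the function name are assumptions made for the example.

```python
def make_windows(series, lookback=24, horizon=1):
    """Split hourly rows into (input window, target) pairs.

    series: list of per-hour feature rows; the target is assumed to be column 0.
    lookback: number of past hours fed to the model (24 h in this paper).
    horizon: hours ahead to predict (an assumption in this sketch).
    """
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1][0]
        samples.append((window, target))
    return samples

# Synthetic rows [target, other_feature], for illustration only.
rows = [[float(t), 0.1 * t] for t in range(30)]
samples = make_windows(rows)
```

Under the assumption of hourly sampling in both cases, a 24-step window requires one seventh of the recurrent steps per sample that a 7-day (168-step) window does, which is the source of the reduced computation.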

To further increase the model's accuracy and improve its ability to generalize, we are considering the following directions.


Since our experiments show that gas concentration data are useful inputs for predicting haze concentration, we are considering the broader potential of neural networks for such predictions, because neural networks can learn patterns in meteorological phenomena. Even when the mechanisms behind many meteorological phenomena are well understood, it is hard to predict what will happen a few days ahead, because there is too much noise and too many uncertain interferences. Because simulation methods are limited when generating long-term predictions, neural networks may achieve better accuracy.

Since the volume of meteorological data can be tremendous, it makes sense to use deep learning structures to learn the hidden patterns. Therefore, in future work, besides achieving better performance when using current data to predict the target quantity, there is also a need to develop models that forecast further into the future, since simulation has its limitations.

#### **6. Conclusions**

This paper proposes a multilayer LSTM haze prediction model to predict the PM2.5/PM10 concentration in Chengdu, using O3, CO, NO2, SO2, and PM2.5/PM10 from the last 24 h as inputs. Based on the accuracy of the PM2.5 and PM10 results, we argue that the model captures the correlation between O3, CO, NO2, SO2, and haze pollutants well and achieves accurate predictions of both haze concentration and haze level. The prediction results also show that, within a certain range, the more hidden layers there are, the higher the prediction accuracy; beyond a certain number of layers, the accuracy levels off. With the same network, the prediction accuracy for PM2.5 is significantly higher than that for PM10. Besides pre-processing the data, the primary way to boost prediction performance is to add layers on top of a single-layer LSTM model. Our results indicate that doing so lets the network make predictions more accurately and efficiently.

**Author Contributions:** S.L. and W.Z. contributed to the conception of the paper and supervision; L.Y. performed the formal experiment; W.Z. contributed significantly to analysis and manuscript preparation; J.T., L.S. and X.W. performed the data analyses and wrote the manuscript; B.Y. and Z.L. helped perform the analysis with constructive discussions; S.L. performed the formal analysis and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was jointly supported by the Sichuan Science and Technology Program (2021YFQ0003).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data that support the findings of this study are available from China National Environmental Monitoring Centre, but restrictions apply to the availability of these data, which were used under license for the current study, so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of the China National Environmental Monitoring Centre.

**Conflicts of Interest:** The authors declare no conflict of interest.
