**3. Methods**

#### *3.1. Data Preprocessing*

Concentrations below 0 μg/m<sup>3</sup> or above 1000 μg/m<sup>3</sup> are eliminated. If any item of the meteorological data for a given day is missing or abnormal, all data for that day are eliminated. Outliers are data points that lie far from the rest of the data; here they are defined as values that deviate from the mean by more than three times the standard deviation. Outliers are problematic for many statistical analyses because they can cause tests to miss significant findings or can distort real results, and they also strongly influence the output of a machine learning model. In this paper, abnormal and missing values are replaced with the mean value of the data.
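The preprocessing rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names (`drop_out_of_range`, `drop_incomplete_days`, `replace_outliers_and_missing`) are hypothetical, missing or abnormal readings are assumed to be encoded as `None` or NaN, and the mean and population standard deviation used for the 3σ rule are assumed to be computed from the non-missing values only.

```python
import math
import statistics
from typing import Dict, List, Optional

# Valid concentration range stated in the text, in ug/m^3.
LOWER, UPPER = 0.0, 1000.0


def _is_missing(v: Optional[float]) -> bool:
    """Treat None and NaN as missing (assumed encoding of bad readings)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_out_of_range(conc: List[Optional[float]]) -> List[Optional[float]]:
    """Eliminate concentrations below 0 or above 1000 ug/m^3.

    Missing readings are kept here so they can be filled in later.
    """
    return [c for c in conc if _is_missing(c) or LOWER <= c <= UPPER]


def drop_incomplete_days(
    met: Dict[str, Dict[str, Optional[float]]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Drop every day whose meteorological record has any missing item."""
    return {
        day: record
        for day, record in met.items()
        if not any(_is_missing(v) for v in record.values())
    }


def replace_outliers_and_missing(
    values: List[Optional[float]], k: float = 3.0
) -> List[float]:
    """Replace missing values and |x - mean| > k * std outliers with the mean.

    Mean and population standard deviation come from the non-missing
    values only (an assumption; the text does not specify this).
    """
    valid = [v for v in values if not _is_missing(v)]
    mu = statistics.mean(valid)
    sigma = statistics.pstdev(valid)
    return [
        mu if _is_missing(v) or abs(v - mu) > k * sigma else v
        for v in values
    ]
```

For example, in the series `[10, 12, 11, 13, None, 10, 12, 11, 13, 12, 11, 10, 500]` the reading 500 lies more than three standard deviations from the mean (about 52.1), so both it and the missing entry are replaced by that mean.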

In our experiment, the concentration data and the raw meteorological data were scaled to the fixed range [0, 1] using min-max normalization, implemented with the MinMaxScaler class of scikit-learn. The normalization formula is as follows [32]:

$$y\_i = \frac{\mathbf{x}\_i - \min\_{1 \le j \le n} \mathbf{x}\_j}{\max\_{1 \le j \le n} \mathbf{x}\_j - \min\_{1 \le j \le n} \mathbf{x}\_j}, i = 1, 2, \dots, n. \tag{1}$$

where *y*<sub>*i*</sub> is the normalized value, *x*<sub>*i*</sub> is the value before normalization, and *n* is the number of observations.
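Eq. (1) can be checked with a few lines of plain Python. This is an illustrative sketch: the function names `min_max_normalize` and `min_max_denormalize` are hypothetical, and mapping a constant series to 0 (to avoid division by zero) is our assumption. scikit-learn's `MinMaxScaler` with its default `feature_range=(0, 1)` applies the same formula column by column.

```python
from typing import List


def min_max_normalize(x: List[float]) -> List[float]:
    """Eq. (1): y_i = (x_i - min_j x_j) / (max_j x_j - min_j x_j)."""
    lo, hi = min(x), max(x)
    span = hi - lo
    if span == 0:
        # Assumption: a constant series is mapped to 0 to avoid dividing by zero.
        return [0.0 for _ in x]
    return [(xi - lo) / span for xi in x]


def min_max_denormalize(y: List[float], lo: float, hi: float) -> List[float]:
    """Invert Eq. (1), e.g. to map model outputs back to ug/m^3."""
    return [lo + yi * (hi - lo) for yi in y]


# Usage: concentrations of 20, 35, 50 and 80 map to 0, 0.25, 0.5 and 1.
print(min_max_normalize([20.0, 35.0, 50.0, 80.0]))
```

In a forecasting setup, the minimum and maximum would normally be taken from the training split only and reused for the test split (what `MinMaxScaler.fit` followed by `transform` does), so that no information from the test period leaks into training.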
