#### *2.1. Study Area and Data Selection*

Northeast China spans the middle temperate and cold temperate zones from south to north and has a temperate monsoon climate with four distinct seasons: summers are warm and rainy, and winters are cold and dry. From the southeast to the northwest, the annual precipitation drops from 1000 mm to less than 300 mm, transitioning from the humid and semi-humid zones to the semi-arid zone.

Considering the meteorological conditions and the current state of urban development in northeast China, three typical large cities (Harbin, Changchun, and Shenyang) were selected as the subjects of this study. A total of 27 automatic air quality monitoring stations were used: 10 in Harbin, nine in Changchun, and eight in Shenyang, as shown in Figure 1. Hourly PM2.5 and O3 concentration data for 2016–2020 were obtained from the China Environmental Quality Monitoring Platform (https://www.aqistudy.cn/historydata/) (accessed on 25 June 2021). The meteorological data were obtained from city stations in each city and are available on the China Meteorological Data Network (http://data.cma.cn/) (accessed on 16 June 2021).

#### *2.2. Backward Trajectory Clustering Analysis*

This study used the HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) model, developed by the National Oceanic and Atmospheric Administration (NOAA) and the Australian Bureau of Meteorology (BOM) (http://ready.arl.noaa.gov/HYSPLIT.php) (accessed on 21 June 2021), to simulate 72 h backward trajectories at a height of 500 m over the city center of each of the three provincial capitals and thereby analyze the transport and dispersion pathways of atmospheric pollutants. Other scholars have conducted similar studies using this model in different areas [16–18].

To facilitate the analysis of pollutant migration paths, we used the stepwise cluster analysis (SCA) algorithm to cluster the backward trajectories with some optimization [19,20]. The clustering analysis process is shown by Equations (1)–(3):

$$D = \sqrt{\sum_{j=0}^{t} d_{j}^{2}} \tag{1}$$

$$\text{SPVAR} = \sum_{i=1}^{X} \sum_{j=0}^{t} D_{ij}^{2} \tag{2}$$

$$\text{TSV} = \sum \text{SPVAR} \tag{3}$$

**Figure 1.** Geographical location and site distribution in the study area.

In the above equations, i is the index of the trajectory; j is the index of the passing point; t is the movement time of the airflow; dj is the distance between the jth points of two trajectories; X is the number of trajectories in the cluster; D is the distance between two trajectories, and Dij is the distance from the jth passing point of the ith trajectory to the corresponding point on the average trajectory of its cluster; SPVAR is the spatial variation of each group of trajectories; and TSV is the total spatial variation, summed over all groups. The stepwise cluster analysis method groups adjacent points in a large statistical sample into one category and then classifies together the trajectories with higher similarity. The more clusters are used, the more closely the classification represents the actual trajectories and the smaller the error of the results.
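The quantities in Equations (1)–(3) can be illustrated with a minimal Python sketch. This is not the HYSPLIT implementation. It assumes each trajectory is a list of planar (x, y) points sampled at the same time steps and uses plain Euclidean distance, whereas real trajectories are given in latitude and longitude. All function names and the toy data are hypothetical.

```python
import math


def point_dist_sq(p, q):
    """Squared planar distance between two (x, y) points (assumed metric)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def trajectory_distance(a, b):
    """Equation (1): D between two trajectories sampled at the same time steps."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    return math.sqrt(sum(point_dist_sq(p, q) for p, q in zip(a, b)))


def mean_trajectory(cluster):
    """Point-by-point average of all trajectories in one cluster."""
    n = len(cluster)
    steps = len(cluster[0])
    return [
        (sum(tr[j][0] for tr in cluster) / n, sum(tr[j][1] for tr in cluster) / n)
        for j in range(steps)
    ]


def spvar(cluster):
    """Equation (2): sum of squared distances D_ij to the cluster mean."""
    mean = mean_trajectory(cluster)
    return sum(point_dist_sq(p, m) for tr in cluster for p, m in zip(tr, mean))


def tsv(clusters):
    """Equation (3): total spatial variation over all clusters."""
    return sum(spvar(c) for c in clusters)


# Toy data: two clusters of trajectories, each with two points (j = 0, 1).
cluster_a = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 0.0), (1.0, 2.0)]]
cluster_b = [[(5.0, 5.0), (6.0, 5.0)]]

d_ab = trajectory_distance(cluster_a[0], cluster_a[1])
spvar_a = spvar(cluster_a)
total = tsv([cluster_a, cluster_b])
```

In this toy case the two trajectories in `cluster_a` differ only at their second point, so all of the cluster's spatial variation comes from that point, and the single-trajectory `cluster_b` contributes nothing to the TSV.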

Cluster analysis was performed on the 8760 trajectories for each year (8784 in leap years, corresponding to one trajectory per hour) throughout the study period to identify the transport pathways of pollutants in different periods. In this study, the clustering was set to yield eight trajectory clusters.
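The two trajectory counts match the number of hours in a common year and in a leap year. The short check below assumes one backward trajectory is started per hour, which the counts imply but the text does not state outright. The function name is hypothetical.

```python
import calendar


def hourly_trajectories_per_year(year):
    """Number of hourly start times in a year (assumes one trajectory per hour)."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24


# Study period 2016-2020; 2016 and 2020 are leap years.
counts = {year: hourly_trajectories_per_year(year) for year in range(2016, 2021)}
```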
