### 2.2.1. PM2.5 Ground Measurements

The PM2.5 station data used in this study are real-time, ground-based air quality measurements (including PM2.5) across China, obtained from the China National Environment Monitoring Center. A total of 775 ground observation stations in eastern China were selected (Figure 1). Daily values were then calculated from the measurements recorded at each station and subjected to rigorous quality control following Wei et al. (2019) [30]. The validated measurements served as ground truth for the ML models.
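
As an illustration of the daily aggregation step, the following minimal sketch computes station-level daily means with pandas. It is not the quality-control procedure of [30]: the column names (`station`, `time`, `pm25`), the assumption of sub-daily input records, the removal of non-positive values, and the `min_valid` threshold are all hypothetical choices made for this example.

```python
import pandas as pd


def daily_mean_pm25(records: pd.DataFrame, min_valid: int = 18) -> pd.DataFrame:
    """Aggregate sub-daily PM2.5 records to daily means per station.

    Assumed (hypothetical) input columns: 'station', 'time', 'pm25'.
    `min_valid` is an illustrative threshold, not a value from the paper.
    """
    df = records.copy()
    df["time"] = pd.to_datetime(df["time"])

    # Illustrative screening only: drop missing and non-positive readings.
    df = df[df["pm25"].notna() & (df["pm25"] > 0)].copy()

    # Group by station and calendar day, keeping the number of valid records.
    df["date"] = df["time"].dt.floor("D")
    grouped = df.groupby(["station", "date"])["pm25"].agg(["mean", "count"])

    # Keep only station-days with enough valid records.
    daily = grouped[grouped["count"] >= min_valid]
    return daily.rename(
        columns={"mean": "pm25_daily", "count": "n_valid"}
    ).reset_index()
```

A station-day with too few valid records is dropped rather than averaged, so sparse days do not enter the training set.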

### 2.2.2. MODIS AOD Products

The MODIS AOD product serves as the primary predictor for estimating surface PM2.5 in this study. Specifically, we use the combined Terra and Aqua MCD19A2 AOD product at a spatial resolution of 1 km. This product is retrieved over land with the MAIAC algorithm and provides quality assurance (QA) flags [40,41]. To ensure data quality, we retained only MAIAC AOD retrievals that passed the recommended QA checks, namely cloud screening (QACloudMask = Clear) and adjacency screening (QAAdjacencyMask = Clear), following our previous study [30].
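
The QA screening described above can be sketched as a bitmask filter. This is a minimal example under stated assumptions, not the processing code used in the study: it assumes the QA flags are packed in an integer array with the cloud mask in bits 0-2 (value 1 = clear) and the adjacency mask in bits 5-7 (value 0 = clear). That layout should be verified against the MCD19A2 user guide for the collection in use. The function name `mask_aod` is hypothetical.

```python
import numpy as np

# Assumed bit layout of the QA word (verify against the product user guide).
CLOUD_SHIFT, CLOUD_CLEAR = 0, 0b001          # bits 0-2, 001 = clear
ADJACENCY_SHIFT, ADJACENCY_CLEAR = 5, 0b000  # bits 5-7, 000 = clear
THREE_BITS = 0b111


def mask_aod(aod, qa):
    """Return AOD with retrievals failing the QA checks set to NaN."""
    aod = np.asarray(aod, dtype=float)
    qa = np.asarray(qa, dtype=np.uint16)

    # Extract the two 3-bit fields from the packed QA word.
    cloud = (qa >> CLOUD_SHIFT) & THREE_BITS
    adjacency = (qa >> ADJACENCY_SHIFT) & THREE_BITS

    # Keep a pixel only if both fields report clear conditions.
    keep = (cloud == CLOUD_CLEAR) & (adjacency == ADJACENCY_CLEAR)
    return np.where(keep, aod, np.nan)
```

Setting rejected pixels to NaN, rather than removing them, preserves the grid shape so the AOD field can still be collocated with the other gridded predictors.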

### 2.2.3. Auxiliary Data

Meteorological reanalysis data were obtained from ERA5, the fifth-generation reanalysis released by the European Centre for Medium-Range Weather Forecasts (ECMWF). This global hourly dataset has characterized the state of the atmosphere, oceans, and land surface since 1979 [42]. Seven meteorological parameters were used: boundary layer height (BLH; unit: m), evaporation (ET; unit: mm), relative humidity (RH; unit: %), surface pressure (SP; unit: hPa), 2 m air temperature (TEM; unit: K), and the 10 m U and V wind components (unit: m s−1). Copernicus Atmosphere Monitoring Service (CAMS) emission inventories for the four main precursors of PM2.5, i.e., ammonia, nitrogen oxides, sulfur dioxide, and volatile organic compounds, were also considered [43,44]. In addition, three parameters related to surface conditions and human activities were included: the normalized difference vegetation index (NDVI), a digital elevation model (DEM), and population density. In total, 15 predictor variables, including AOD, were used for ML-based PM2.5 modeling.
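
For bookkeeping, the predictor set can be written down as a small grouped mapping, which also makes the count of 15 easy to check. This is only a sketch: the short labels (e.g., `U10`, `NH3`, `POP`) and the group names are hypothetical identifiers chosen for this example, not variable names from the source datasets.

```python
# Hypothetical short labels for the 15 predictors, grouped by data source.
PREDICTORS = {
    "aod": ["AOD"],                                    # MODIS MAIAC AOD
    "meteorology": ["BLH", "ET", "RH", "SP", "TEM", "U10", "V10"],  # ERA5
    "emissions": ["NH3", "NOx", "SO2", "VOC"],         # CAMS inventories
    "surface_human": ["NDVI", "DEM", "POP"],           # surface and population
}


def flatten_predictors(groups):
    """Return all predictor labels as one ordered list."""
    return [name for names in groups.values() for name in names]


feature_names = flatten_predictors(PREDICTORS)
print(len(feature_names))
```

Keeping the groups explicit makes it straightforward to drop or add one source at a time, for example when testing how much each group contributes to model skill.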

#### *2.3. Methodology*
