**Air Quality Research Using Remote Sensing**

Editors

**Maria Jo˜ao Costa Daniele Bortoli**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Maria Joao Costa ˜ Institute of Earth Sciences and Earth Remote Sensing Lab Portugal

Daniele Bortoli Institute of Earth Sciences and Earth Remote Sensing Lab Portugal

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Remote Sensing* (ISSN 2072-4292) (available at: https://www.mdpi.com/journal/remotesensing/ special issues/AQ RS).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-5893-6 (Hbk) ISBN 978-3-0365-5894-3 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


Its Impact on Air Pollution in Yunnan Plateau, Southwest China


## **About the Editors**

#### **Maria Jo ˜ao Costa**

Maria Joao Costa is Associate Professor at the Department of Physics, University of ˜ Evora, ´ Portugal. She completed her habilitation in atmospheric and climate physics in 2020 and her PhD in physics, awarded by the University of Evora, in 2004. She is an integrated member of the Institute ´ of Earth Sciences (ICT) and the coordinator of the Atmospheric Sciences, Water and Climate Group at ICT. She is also the director of the Earth and Space Sciences Doctoral Program and of the Earth Remote Sensing Laboratory (EaRSLab) at the University of Evora. Her main research interests ´ concern cloud and aerosol physics, air and water quality, remote sensing, solar radiation, and atmospheric radiative transfer.

#### **Daniele Bortoli**

Daniele Bortoli is Assistant Professor at the Department of Physics, University of Evora, ´ Portugal. He completed his PhD in physics in 2005 at the University of Evora. Since 2007, he ´ has also worked as a non-stipendiary invited research fellow of the Italian Research Council. In 2007 (International Polar Year) he participated in the creation of the Portuguese Polar Program. He participated in scientific collaborations with the Antarctic Projects of Italy, India, Bulgaria, and New Zealand. His main research fields are the physics and chemistry of atmospheric compounds at high-/mid-latitudes, the ozone hole in the Arctic and Antarctica, air quality, the characterization of solar radiation, and the development of remote sensing instrumentation for the study of the atmospheric composition

## *Editorial* **Editorial for the Special Issue "Air Quality Research Using Remote Sensing"**

**Maria João Costa 1,2,3,\* and Daniele Bortoli 1,2,3**


Air pollution is a worldwide environmental hazard with serious consequences for health and climate as well as for agriculture, ecosystems, and cultural heritage, among others. According to the WHO, there are 8 million premature deaths every year resulting from exposure to ambient air pollution. In addition, more than 90% of the world's population lives in places where air quality is poor, exceeding the recommended limits; most of these places are in low- or middle-income countries. Air pollution and climate influence each other through complex physicochemical interactions in the atmosphere, altering the Earth's energy balance, with implications for climate change and air quality.

It is vital to measure specific atmospheric parameters and pollutant concentrations, monitor their variations, and analyze different scenarios with the aim of assessing air pollution levels and developing early-warning and forecast systems; such developments provide a means of improving air quality and assuring public health in favor of a reduction in air pollution casualties and a mitigation of climate change phenomena. Eleven research papers were published in this Special Issue, comprising one communication paper [1], seven articles [2–8], two technical notes [9,10], and one letter [11]. The published research signals the potential of applying remote sensing data in air quality studies, including combination with in situ data [1,3,6,8], modeling approaches [2,9,11], and the synergy of different instrumentations and techniques [4,5,7,10]. Significant pollutants considered in the studies include aerosols—using PM2.5 and aerosol optical depth (AOD) as quantification variables [1,2,4,5,9]—nitrogen dioxide (NO2) [7,8,11], formaldehyde (HCHO) [3], and carbon monoxide (CO) [6,10], among others [10].

The influence of meteorology on seasonal PM2.5 concentrations and AOD was analyzed, providing insight that may contribute to improving the retrievals of surface PM2.5 from satellite AOD [2]. The mechanisms of PM2.5 regional transport from biomass burning in Southeast Asia were examined for a case study during springtime, with an emphasis on the role of meteorology [9]. Furthermore, the influence of urban form on PM2.5 surface concentrations was investigated, providing a seasonal analysis method which is relevant for urban planning strategies surrounding air quality improvement in populated areas [4]. New methods combining remote sensing data and additional ancillary datasets with machine learning algorithms were proposed, allowing us to retrieve surface PM2.5 concentrations [1] and AOD [5]. Such prediction schemes can provide significant information for advances in air quality research.

The importance of drones for monitoring limited areas, often in areas of difficult access, is increasingly being recognized. An application of drones over a wastewater treatment plant, permitting the real-time monitoring of gaseous pollutants, was demonstrated in [10], and open challenges were identified.

An evaluation of satellite retrievals of HCHO, a recognized hazardous air pollutant, using ground-based data was carried out for a ten-year period [3]. Results suggest that

**Citation:** Costa, M.J.; Bortoli, D. Editorial for the Special Issue "Air Quality Research Using Remote Sensing". *Remote Sens.* **2022**, *14*, 5566. https://doi.org/10.3390/rs14215566

Received: 19 October 2022 Accepted: 31 October 2022 Published: 4 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

satellite results are more prone to seasonal variations than ground-based measurements and show evidence of a latitude dependency with a seasonal bias. Studies of satellite retrievals in comparison with ground-based measurements are very pertinent considering the use of new Earth observation sensors for air quality monitoring. CO concentration variability was also assessed from both satellite and ground-based measurements [6]. The authors of [6] examined the horizontal and vertical variations in CO concentrations caused by the COVID-19 lockdown in 2020 and compared the contributions from different sources with results from 2019.

The distribution and trends of tropospheric NO2 at a global scale were analyzed for a 13-year period using satellite retrievals [8]. Ground-based measurements were also used for comparison purposes. Hotspots of high concentrations of this air pollutant were identified, as well as regions of negative and positive trends during the period of study. The highest concentrations of tropospheric NO2 were detected in recent years, indicating the importance of monitoring anthropogenic emissions and implementing further actions for their reduction. The authors of [11] used satellite data combined with air quality modelling to estimate the impact of the COVID-19 lockdown on tropospheric NO2, while analyzing the role of meteorology and sampling variability in the process. Satellite data were used in combination with data from ground-based NO2 concentration measurements, NOx emissions, land uses, road networks, and population densities, in order to develop a regression model for determining surface NO2 with a high spatial resolution [7]. The model was applied at a city scale, with the results highlighting the key role of Earth observation technologies in support of exposure assessments and policy development for air quality control.

The publications in this Special Issue highlight the importance and topicality of air quality studies and the potential of remote sensing, particularly from Earth observation platforms, in contributing to this topic.

**Author Contributions:** Conceptualization, M.J.C. and D.B.; methodology, M.J.C.; resources, M.J.C. and D.B.; writing—original draft preparation, M.J.C.; writing—review and editing, M.J.C. and D.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The Guest Editors would like to thank all authors who contributed to this Special Issue for sharing their scientific findings in this forum. We would also like to thank the reviewers for their valuable work and the editorial team for all the support in the process.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Communication* **Machine Learning-Based Approach Using Open Data to Estimate PM2.5 over Europe**

**Saleem Ibrahim 1,\*, Martin Landa 1, Ondˇrej Pešek 1, Lukáš Brodský <sup>2</sup> and Lena Halounová <sup>1</sup>**


**\*** Correspondence: saleem.ibrahim@fsv.cvut.cz

**Abstract:** Air pollution is currently considered one of the most serious problems facing humans. Fine particulate matter with a diameter smaller than 2.5 micrometres (PM2.5) is a very harmful air pollutant that is linked with many diseases. In this study, we created a machine learning-based scheme to estimate PM2.5 using various open data such as satellite remote sensing, meteorological data, and land variables to increase the limited spatial coverage provided by ground-monitors. A space-time extremely randomised trees model was used to estimate PM2.5 concentrations over Europe, this model achieved good results with an out-of-sample cross-validated R<sup>2</sup> of 0.69, RMSE of 5 μg/m3, and MAE of 3.3 μg/m3. The outcome of this study is a daily full coverage PM2.5 dataset with 1 km spatial resolution for the three-year period of 2018–2020. We found that air quality improved throughout the study period over all countries in Europe. In addition, we compared PM2.5 levels during the COVID-19 lockdown during the months March–June with the average of the previous 4 months and the following 4 months. We found that this lockdown had a positive effect on air quality in most parts of the study area except for the United Kingdom, Ireland, north of France, and south of Italy. This is the first study that depends only on open data and covers the whole of Europe with high spatial and temporal resolutions. The reconstructed dataset will be published under free and open license and can be used in future air quality studies.

**Keywords:** PM2.5; AOD; machine learning; Europe; open data

#### **1. Introduction**

Air quality monitoring is one of the most important fields when it comes to the individual's health due to the high risks related to its low quality. Fine particulate matter is an air pollutant that consists of liquid and solid molecules such as acid condensates, sulphates, and nitrates that have negative effects on human health [1]. The harmful effects of these particles vary depending on the concentrations, time exposure, and the particulate diameter. Risks are higher when the diameter gets smaller; PM2.5 can penetrate deep into the lungs and may reach the blood circulation causing dangerous diseases such as cardiovascular problems, diabetes, prenatal disorder, and even mortality [2–5]. The effects are more notable in urban areas, where higher population density can be found, and more exposure will occur [6]. The form of the urban area plays an important role in the concentration of PM2.5 [7].

The U.S. Environmental Protection Agency (EPA) has set an annual average standard of 12 μg/m3 and a daily (24 h) of 35 μg/m3 for PM2.5 and when the amounts of these pollutants in the ambient air exceed these limits that could cause serious health issues [8]. The revised Directive 2008/50/EC of the European Parliament (EP) and of the Council on ambient air quality and cleaner air for Europe set limit values of annual PM2.5 to 25 μg/m3 since 1 January 2015 and not to exceed 20 μg/m3 since 1 January 2020. PM2.5

**Citation:** Ibrahim, S.; Landa, M.; Pešek, O.; Brodský, L.; Halounová, L. Machine Learning-Based Approach Using Open Data to Estimate PM2.5 over Europe. *Remote Sens.* **2022**, *14*, 3392. https://doi.org/10.3390/ rs14143392

Academic Editors: Maria João Costa and Daniele Bortoli

Received: 14 May 2022 Accepted: 12 July 2022 Published: 14 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

ground-based monitors are used to measure PM2.5 with high accuracy. These stations are considered the backbone in almost all analyses related to these particles. However, the high cost of establishing these monitors limits the overall spatial coverage and the researchers who are focusing on air quality were seeking new methodologies to increase the spatial coverage so they have a better understanding on larger geographical scales. Numerous techniques were used to increase PM2.5 spatial coverage, in other words, to estimate the pollutant concentrations in the areas where no monitors exist. Examples of that are interpolation techniques that count only on the ground stations [9,10]. The accuracy of these interpolations is highly related to the spatial distribution of the stations; although they can have good estimations in the areas that are surrounded by the network stations, they will probably fail to have good estimations where there is a lack of the stations [9]. Land use regression (LUR) models were also used to analyse pollution, particularly in densely populated areas [11,12].

Satellite remote sensing provides wide spatial coverage compared to the spatial coverage obtained from ground monitors. Aerosol optical depth (AOD) is an air quality indicator that can be observed from satellite remote sensing, and it is defined as the measure of the columnar atmospheric aerosol content. Numerous studies have found a positive correlation between satellite-based AOD and surface particulate matter [13,14]. Researchers have utilised satellite AOD to estimate PM2.5 by developing different types of models such as physical models that were built based on the physical relationship between AOD and surface PM2.5 [15]. Statistical methods which train the relationship between AOD and PM2.5 using different statistical models [16,17] are suitable for the regions with a sufficient number of ground stations since they require a large amount of training data [18]. The generalised additive model (GAM) empowers the AOD–PM relationship by adding meteorological and land use information [19]. In the last few decades, artificial intelligence models have been applied to estimate PM2.5 and were found to give a better description of the complex non-linear relationship between PM2.5, AOD, and other independent variables than the previously mentioned methods [18] based on the usage of machine learning algorithms [20–22] or deep neural networks [23,24]. These algorithms utilise satellite observations, various modelled meteorological variables, population, land use, land cover, etc., to estimate PM2.5. The importance of the inputs differs from one area to another, but generally, they can enhance PM2.5 estimations since counting solely on AOD to estimate near-surface particulate matter values is not sufficient [25]. AOD without other variables was not enough to provide good PM2.5 estimations over Europe [26]. In Great Britain, AOD was not among the 15 most important variables when predicting PM2.5 levels [20]. Satellite AOD are more correlated with surface PM when the aerosols are well mixed within the planetary boundary layer height (PBLH) [9]. A global study found that 69% of the total AOD are within the PBLH [27], other studies have shown that temperature plays an important role in capturing AOD and understanding its vertical distribution that improve PM analysis [28]. Moreover, a higher humidity atmosphere is likely to have higher AOD without affecting the levels of PM2.5 [9]. Other meteorological variables that affect PM2.5 are the precipitation that showed a negative correlation in some areas [29] and a positive correlation in other parts of the world [30], and wind speed (WS) that also has different effects from one area to another [30,31].

In this study, we report the modelling of spatiotemporal heterogeneity of PM2.5 using machine learning to generate daily estimations of PM2.5 over the European Union member states, together with the United Kingdom, Iceland, Liechtenstein, Norway, Switzerland, Albania, Bosnia and Herzegovina, Kosovo, Montenegro, North Macedonia, and Serbia [32]. We will refer to the area of study as "Europe" located inside the coordinates box 26◦W, 72◦N, 42◦E, and 36◦S. The total study area covers 13,391,504 of 1 km grid cells; 5,450,009 of the total cell number are located over land. The study period covers the years 2018–2020 with full coverage of 1 km spatial resolution using various open data. In the following sections, we will introduce the study area and period and present the preliminary data that were tested while building the predicting model.

#### **2. Primary Data**

In this section, we will introduce the primary data we investigated while building the model. Not all these data were utilised while building the model. The chosen data can be found in Section 3.3.

#### *2.1. PM2.5 Measurements*

PM2.5 observations were collected from 848 stations across Europe represented in Figure 1. Data was downloaded from OpenAQ which is a non-profit organisation that collects air quality data from different governmental and research institutions and provides it to the users [33].

**Figure 1.** The location of PM2.5 ground stations with the number of valid measurements used in this study.

For each station, data between 10 a.m. and 2 p.m. local time were averaged where there are at least 2 available observations to be consistent with MODIS satellites overpassing. We identified a skewed distribution for PM2.5 as shown in Figure 2, we calculated the 25th percentile (Q1), the 75th percentile (Q2) of the dataset, and the inter-quartile range (IQR = Q3 − Q1). All PM2.5 values that are higher than 2 × (Q3 + 3 × IQR which is refer as outer fence [34]) were removed, which counted less than 1% of the total data. The number of valid PM2.5 observations was 123,248 in 2018, 143,048 in 2019, and 158,964 in 2020 totalling 425,260 observations throughout the study period.

**Figure 2.** The distribution of the measured PM2.5 used in this study.

#### *2.2. AOD Data*

AOD data were downloaded from GHADA, which is a Geo-Harmonized Atmospheric Dataset for Aerosol optical depth at 550 nm [35]. It contains daily estimations of AOD550 over Europe with 1 km spatial resolution. GHADA was built based on the MODIS MCD19A2 product [36] and modelled AOD data from Copernicus Atmosphere Monitoring Service (CAMS) [37] that were used to overcome the high percentage of gaps found in the MCD19A2 product. This dataset showed good results when validated with NASA's Aerosols Robotic Network (AERONET).

#### *2.3. Meteorological Data*

Meteorological data of the following variables wind component *u*, wind component *v*, PBLH, total column water vapour, total perception, evaporation, surface pressure, and temperature at 2 m (T2m) were collected from ERA5-Land which is a reanalysis dataset offering a consistent view of the development of land parameters over several decades with a spatial resolution of ~9 km. ERA5-Land was produced by replaying the land component of the European Centre for Medium-Range Weather Forecasts ERA5 climate reanalysis [38]. Relative humidity was collected from ERA5 with 0.25 × 0.25 horizontal resolution.

#### *2.4. Digital Elevation Model*

The Japan Aerospace Exploration Agency (JAXA) provides a worldwide digital surface model with a horizontal resolution of ~30 m by the Panchromatic Remote-sensing Instrument for Stereo Mapping (PRISM), which was carried on the Advanced Land Observing Satellite "ALOS" [39]. Data were accessed on the 8 March 2021 from https: //www.eorc.jaxa.jp/ALOS/.

#### *2.5. Normalised Difference Vegetation Index*

MODIS Terra satellite provides a monthly normalised difference vegetation index (NDVI) product called MOD13A3 [40]. It has 1 km spatial resolution, and it quantifies vegetation presence with values ranging between −1 and 1. NDVI is commonly expressed as shown in Equation (1):

$$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}} \tag{1}$$

where NIR and Red are spectral reflectance values in the near-infrared and red wavelengths.

#### *2.6. Land Cover*

Land cover data were extracted from the 2018 CORINE Land Cover (CLC) inventory that was built based on ortho-rectified satellite images having a spatial resolution ranging from 5 m to 60 m and were aggregated into 100 m. We grouped the original 44 CLC classes into seven level 1 classes defined as: agricultural areas, artificial areas, continues urban areas, discontinues urban areas, forests, industrial areas, and water surfaces. Then, we calculated the percentage of each class in every 1 × 1 km2 grid cell.

#### *2.7. Population Data*

Population data was extracted from the Visible Infrared Imaging Radiometer Suite (VIIRS) night-time lights (NTL) data by averaging the monthly data of the year 2019.

#### **3. Methodology**

#### *3.1. Data Pre-Processing*

All data were reprojected to the European Terrestrial Reference System 1989 (EPSG:3035) that uses metres as measuring units. This system is used for statistical mapping and other purposes which requires a true area representation, using a 1 km grid cell with bilinear interpolation method for ECMWF data and the cubic convolution for the ALOS elevation model. In addition, we calculated WS based on the two wind U and V components.

A spatio-temporal dataset was created by extracting the information from all input data at the locations of PM2.5 stations. The Julian day, month, and year were added as the temporal information; longitude and latitude were added as the spatial information. The generated dataset was used to train and test the model.

#### *3.2. Model Development*

We first analysed the linear relationship between the primary independent variables and PM2.5 values. PBLH was negatively correlated to PM2.5 with Pearson correlation of r = −0.24. Most of the meteorological variables were also negatively related to PM2.5 with r = −0.2 for WS, r = −0.15 for T2m, r = −0.13 for RH, and r = −0.1 for TP. AOD and evaporation had the highest positive correlation with PM2.5 r = 0.14. Based on this initial data exploratory analysis, we excluded some primary inputs that had high correlation with other inputs such as skin temperature, which was correlated to T2m with r = 0.93. We tested linear models to estimate PM2.5. These models suffered from underfitting issues and failed to describe the relationship between the independent variables and PM2.5. Therefore, we used a more complex algorithm called Extremely Randomised Trees (ET).

ET is a very similar decision tree-based ensemble method to the widely used Random Forest (RF). Both algorithms are composed of large number of trees, where the final decision is obtained from the prediction of every tree by majority vote in classification problems and arithmetic average in regression problems. Both algorithms have the same growing tree procedure and selecting the partition of each node. Additionally, both algorithms randomly choose a subset of input features.

ET, on the other hand, strongly randomises the selection of both attribute and cut point while splitting a tree node using the whole learning sample to grow the trees which adds randomisation, making it a more robust algorithm against overfitting. From computational point of view, the complexity of the tree growing procedure is on the order of N log N with respect to learning sample size [41]. The main parameters in the ET splitting process are the number of attributes that are randomly selected at each node and the minimum sample size for splitting a node. For further information on how the ET algorithm operates refer to Table 1 in [41]. In addition to accuracy, the ET algorithm has higher computational efficiency than the RF algorithm since it chooses the splits randomly and does not look for the optimum split as the latter one [41]. The number of estimators (number of trees in the forest), the maximum depth of the trees, the number of samples required to split an internal node, and the minimum number of samples required to be at a leaf node were the main parameters while tuning our model.


**Table 1.** The dependent and independent variables used to build the ET model.

To reduce model complexity due to the large number of independent variables we excluded the input variables based on the feature importance in the ET algorithm. Besides the spatio-temporal information, we used PM2.5 with the independent variables that are shown in Table 1 to develop our model.

#### *3.3. Model Validation*

3.3.1. Sample-Based Cross Validation

Cross validation (CV) is a common method to analyse the model performance and detect potential overfitting problems where the model achieves high accuracy on the training set and performs badly on new data or the test set. We applied a 10-fold CV where all samples in the training dataset were randomly divided into 10 equal subsets. Then, in each round, 9 subsets were used to fit the model, and the remaining subset was used for testing the model performance [42]. This approach is used widely in PM studies [20,21,43–45].

#### 3.3.2. Spatial and Temporal 10-Fold Cross Validation

In this validation, we divided the samples based on two factors. For the spatial 10-fold cross validation we splatted the data based on the location of the stations, the stations were divided randomly into 10 folds. In each fold, the model was trained on the samples from 90% of the stations and the samples from the remaining 10% for testing. For the temporal 10-fold cross validation, we divided the samples into 10 folds based on the Julian day and applied the cross validation in a similar way to the previously mentioned one.

#### **4. Results**

The results of sample-based, spatial, and temporal 10-fold cross validation are shown in Table 2. The density scatter plot for the sample-based cross validation is shown in Figure 3.



**Figure 3.** Density scatter plot of the sample-based 10-CV results of the model.

It must be noted that PM2.5 levels in general are low in Europe when compared to more polluted areas and this is reflected by the low RMSE we obtained in our study when compared to some studies outside Europe with higher R2 values [44,45]. Our model proved its efficiency in predicting PM2.5 when our results (out-of-sample R<sup>2</sup> = 0.69, RMSE = 5 μg/m3) were comparable with results obtained from a recent study over a smaller geographic area in Europe (Great Britain; out-of-sample R<sup>2</sup> = 0.77, RMSE = 4 μg/m3) [20]. It is also noted that the model underestimates high PM2.5 values (>40 μg/m3) since such values are not abundant over our study area.

To justify the difference in the model performance spatially and temporally, we applied site-based cross validation where we used samples from one station as the test set, and the samples from all remaining stations were used to train the model. We applied this method to analyse the model performance spatially, since the standard 10-CV may not be able to detect potential spatial overfitting [18].

The results are shown in Figure 4. The model performs well in most of the locations in Central Europe with an average R2~0.7. A total of 63% of all stations in Europe have R2 > 0.6. The accuracy of the model is lower in the northern and southern parts of Europe. However, the RMSE and MAE are relatively small even in the northern and southern parts.

**Figure 4.** Spatial distribution of the site-based cross validation of coefficient of determination, the root mean square error, and the mean absolute error.

#### **5. Creating PM2.5 Maps**

Daily PM2.5 maps during MODIS satellite overpassing were created for the period 2018–2020 over Europe. Figure 5 shows the average PM2.5 for the year 2018, 2019, and 2020.

**Figure 5.** The average PM2.5 for the years 2018, 2019, and 2020 over Europe.

A significant decline in PM2.5 levels has occurred over Europe throughout the study period. Poland had the highest PM2.5 average level in the year 2018 with an average level~19.5 μg/m3, in 2019 Romania had the highest average~16.5 μg/m<sup>3</sup> whereas Serbia had the highest average in 2020 with an average~15.8 μg/m3. Finland had the lowest PM2.5 average level in all three years with 7.1 in 2018, 6.3 in 2019, and 5.8 in 2020. Comparing the results of the average PM2.5 levels for the years 2018, 2019, and 2020 were highly compatible with the reports of the European Environment Agency (EEA). According to EEA the highest PM2.5 concentrations were found in central and eastern Europe and northern Italy. For the central and western parts, the main reason for high PM2.5 is the usage of solid fuels with older vehicle compared to other parts of Europe [46], besides using the solid fuels for heating as was found in Poland [47]. For the northern part of Italy, the high levels of PM2.5 are due to the combination of a high density of anthropogenic emissions and meteorological conditions [46,48]. Furthermore, Milan, the largest city in the north of Italy previously reported levels of PM2.5 exceeding the safety limit set by the EU [49].

As an application, we used the proposed machine learning-based prediction approach in PM2.5 levels analysis to study the effect of the COVID-19 lockdown (March to June of the year 2020) on air quality over Europe. As an attempt to verify the influence of the lockdown on air quality, we compared the average PM2.5 of the previous 4 months (November to December in 2019 and January to February 2020) and the following 4 months (July to October 2020) to the 4 months of the lockdown by calculating the relative percentage difference (RPD). By doing so, we masked the general improvement trend in air quality over Europe. RPD calculated using Equation (2).

$$\text{RPD} = \frac{\text{PM}\_{2.5} \text{ (lockdown)} - \text{PM}\_{2.5} \text{avg} (\text{before clockdown}, \text{after clockdown})}{\text{PM}\_{2.5} \text{avg} (\text{before clockdown}, \text{after clockdown})} \times 100 \quad (2)$$

We found a significant improvement in air quality over Europe except for UK, Ireland, north of France, and south of Italy as shown in Figure 6. Our results are in agreement with another study over Poland (Eastern Europe), where the air quality represented by PM2.5 has significantly improved in the months of March to April in 2020 when the authors compared to the same months from the previous two years [50]. Interestingly, the unusual increase in PM2.5 levels in the UK was consistent with what was reported in [51] as the authors justified such increase by unusual meteorological conditions. The latter conditions may also justify the increase in PM2.5 over northern France. In Italy, where people were spending most of their time at home, the increased house heating during the lockdown period limited the decrease in PM2.5 levels besides the effects of the agriculture sector that kept performing during the lockdown [52].

**Figure 6.** Relative percentage difference of PM2.5 for the lock down period of the year 2020 with the average of the previous 4 months and the following 4 months.

#### **6. Discussion**

In this study, we proposed the first machine learning-based scheme to estimate PM2.5 levels over Europe with high spatial resolution of 1 km. We trained an extra trees model using observed PM2.5 from 848 stations as the target variable. AOD, different meteorological variables, land variables, and NDVI as the independent variables.

The sample-based 10-fold CV showed that our model underestimates high PM2.5 values (>40 μg/m3) which may limit the model ability to detect hazard situations. This underestimation occurred since high PM2.5 values were not common over our study area as shown in Figure 2. The spatial cross validation showed that the model estimates PM2.5 with a higher R2 in the areas with high ground stations density the compared to the areas with a lower density. The occurred spatial overfitting is expected to happen due to spatially unbalanced data.

In Central Europe (Czech Republic, Poland, Slovakia, and surrounding areas), the model performed with a higher R<sup>2</sup> compared to the northern and southern parts of Europe. However, the RMSE in Central Europe was comparably higher than the ones in the previously mentioned parts. This is due to the fact that the average PM2.5 value in Central Europe is higher and have more variations than the northern and southern parts. The highest RMSE in Central Europe can be found in three stations in the Czech Republic. These stations are located near mining areas with higher PM2.5 values compared to other stations that are mostly located in urban areas. This issue can be potentially solved by including a detailed land cover data with an appropriate classification for each station which is usually difficult to achieve on a large scale such as in our study.

Having unbalanced spatial-temporal data made the modelling more complex than other studies which focused on smaller areas with well-balanced data and with similar instruments in measuring PM2.5 values. However, by tuning the parameters in the model, we were able to achieve acceptable results for most parts of our study area. The effect of the chosen independent variables in estimating PM2.5 differs across the study area. We analysed the spatial potential relationships of the independent variables in estimating PM2.5 by calculating feature importance in four parts of Europe: north-west (latitude > 50 and longitude < 10), north-east (latitude > 50 and longitude > 10), south-west (latitude < 50 and longitude < 10), and south-east (latitude < 50 and longitude > 10). AOD and PBLH had the most feature importance in all parts of Europe with an average of 10.4% and 14.1%, respectively. WS and temperature had more effect in estimating PM2.5 in the south of Europe compared to the north. Rh had more importance in estimating PM2.5 in the western part of Europe compared to the eastern part.

Table 3 shows the effects of AOD and the most important meteorological variables on PM2.5 estimates. We tried to train multiple models based on the area. However, this approach did not improve the overall performance over the whole study area.


**Table 3.** The effects (%) of AOD and the most important meteorological variables on PM2.5 estimations in the four chosen parts of our study area.

#### **7. Conclusions**

In this study, we developed a spatio-temporal machine learning model to estimate daily PM2.5 levels for the years 2018–2020 with 1 km spatial resolution over Europe using open data from multiple sources such as remote sensing satellite-based products, meteorological reanalysis datasets, and other land variables.

The developed model was used to estimate PM2.5 values over 5,450,009 land cells (1 km2) for a 3-year period (1096 days) totalling more than 5.973 billion estimations with a good sample-based CV coefficient of 0.69, RMSE of 5 μg/m3, and MAE of 3.3 μg/m3.

We calculated the yearly average of PM2.5 levels, and we found that PM2.5 values have dropped in almost all parts of Europe during the study period.

The full coverage dataset of PM2.5 that we produced can be used to investigate air quality over Europe with higher spatial resolution compared to the available products which may provide better understanding in time series analysis in this field.

**Author Contributions:** Conceptualization, S.I.; Data curation, S.I.; Formal analysis, S.I., M.L., O.P. and L.B.; Funding acquisition, M.L.; Investigation, S.I.; Methodology, S.I., L.B. and L.H.; Project administration, M.L. and L.H.; Resources, S.I. and L.H.; Software, S.I., M.L. and O.P.; Supervision, L.H.; Validation, S.I.; Visualization, S.I.; Writing—original draft, S.I.; Writing—review & editing, S.I., M.L., O.P., L.B. and L.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work is co-financed under the Grant Agreement Connecting Europe Facility (CEF) Telecom project 2018-EU-IA-0095 by the European Union and by the Grant Agency of the Czech Technical University in Prague, grant No. SGS22/047/OHK1/1T/11.

**Data Availability Statement:** The data and data analysis methods are available upon request.

**Acknowledgments:** The authors sincerely thank the OpenAQ organisation for providing PM2.5 observations, NASA EOSDIS for providing the daily MODIS MAIAC AOD product (MCD19A2) which was used to build GHADA and that is available from the Land Processes Distributed Active Archive Centre (LPDAAC), the European Centre for Medium-Range Weather Forecasts (ECMWF) for providing global reanalysis of atmospheric composition, and the Japan Aerospace Exploration Agency (JAXA) for providing the digital surface model used in this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Effects of Meteorology Changes on Inter-Annual Variations of Aerosol Optical Depth and Surface PM2.5 in China—Implications for PM2.5 Remote Sensing**

**Ling Qi 1, Haotian Zheng 2, Dian Ding 3, Dechao Ye <sup>2</sup> and Shuxiao Wang 2,4,\***


**Abstract:** PM2.5 retrieval from satellite-observed aerosol optical depth (AOD) is still challenging due to the strong impact of meteorology. We investigate influences of meteorology changes on the inter-annual variations of AOD and surface PM2.5 in China between 2006 and 2017 using a nested 3D chemical transport model, GEOS-Chem, by fixing emissions at the 2006 level. We then identify major meteorological elements controlling the inter-annual variations of AOD and surface PM2.5 using multiple linear regression. We find larger influences of meteorology changes on trends of AOD than that of surface PM2.5. On the seasonal scale, meteorology changes are beneficial to AOD and surface PM2.5 reduction in spring (1–50%) but show an adverse effect on aerosol reduction in summer. In addition, major meteorological elements influencing variations of AOD and PM2.5 are similar between spring and fall. In winter, meteorology changes are favorable to AOD reduction (−0.007 yr−1, <sup>−</sup>1.2% yr−1; *<sup>p</sup>* < 0.05) but enhanced surface PM2.5 between 2006 and 2017. The difference in winter is mainly attributed to the stable boundary layer that isolates surface PM2.5 from aloft. The significant decrease in AOD over the years is related to the increase in meridional wind speed at 850 hPa in NCP (*p* < 0.05). The increase of surface PM2.5 in NCP in winter is possibly related to the increased temperature inversion and more stable stratification in the boundary layer. This suggests that previous estimates of wintertime surface PM2.5 using satellite measurements of AOD corrected by meteorological elements should be used with caution. Our findings provide potential meteorological elements that might improve the retrieval of surface PM2.5 from satellite-observed AOD on the seasonal scale.

**Keywords:** meteorology; PM2.5; AOD

#### **1. Introduction**

Fine particle (PM2.5) lowers visibility [1,2], affects human health [3], modifies cloud properties [4], and exerts radiative forcing on the Earth's surface and at the top of the atmosphere [5]. Accurate estimates of high spatio-temporal resolution of surface PM2.5 concentrations is critical for assessing its effects. However, national-wide in situ measurements of surface PM2.5 in China were unavailable until 2013. To study the driving forces of long-term variations of surface PM2.5, many studies use long-term measurements of fog–haze days [6], visibilities [7], and conducive weather conditions [8] as approximations. Specifically, Niu et al. showed that in the past three decades, the doubled frequencies of fog events in wintertime over eastern-central China was strongly related to the weakening

**Citation:** Qi, L.; Zheng, H.; Ding, D.; Ye, D.; Wang, S. Effects of Meteorology Changes on Inter-Annual Variations of Aerosol Optical Depth and Surface PM2.5 in China—Implications for PM2.5 Remote Sensing. *Remote Sens.* **2022**, *14*, 2762. https://doi.org/10.3390/ rs14122762

Academic Editors: Maria João Costa and Daniele Bortoli

Received: 28 April 2022 Accepted: 7 June 2022 Published: 8 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

of the East Asian winter monsoon (EAWM), which showed decreased surface wind speed and number of cold air outbreaks and increased relative humidity and frequency of light wind events [6]. Li et al. found that the number of wintertime fog–haze days correlates with the inter-annual variations of the winter monsoon index, with a correlation coefficient of −0.41 [9]. Shi et al. also found that wind speed change in the lower troposphere explains 81.6% of the visibility variance between 1960 and 2014 in Eastern China [7]. Lower wind speed decreased dust emissions, and this decrease moderated the wintertime land–sea surface air temperature difference and further decreased wind speed [10]. However, these meteorological approximations only partly reflect the aerosol variations, and their in situ observations are sparse.

Many studies use satellite observations of aerosol optical depth (AOD, [11]) to retrieve surface PM2.5. AOD are observed with large spatio-temporal coverage by remote sensing instruments on board satellites. For example, the total Ozone Mapping Spectroradiometer [12], the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP, [13]), and the moderate resolution imaging spectrora-diometer [14]. A number of studies [15–17] have developed different methods, such as statistical relations [18] and machine learning [15], to derive surface PM2.5 with high spatiotemporal resolution in China to study the long-term exposure of population to surface PM2.5. However, studies showed that AOD–surface PM2.5 relations show large spatial and temporal variations (e.g., *r* = 0.17–0.57, varying within regions in China [19]). Meteorological elements, such as cloud cover, wind speed, boundary layer height, and relative humidity, are used to correct the AOD–surface PM2.5 relation for better prediction of surface PM2.5 ([11,20–22]). However, to the knowledge of the authors, no study has systematically quantified the different responses of AOD and surface PM2.5 to changes in meteorological elements to better understand the AOD–surface PM2.5 relations.

In this study, we systematically quantify the contributions of meteorology changes to the inter-annual variations of AOD and surface PM2.5 in different seasons and regions in China between 2006 and 2017 by fixing emissions at the 2006 level using a nested global 3D chemical transport model, GEOS-Chem. We study the relationship of AOD and surface PM2.5 in different regions and seasons and their relationship with mesoscale weather patterns. We further identify the major meteorological elements that control the inter-annual variations of AOD and surface PM2.5 in key regions during different seasons using multiple linear regressions.

#### **2. Materials and Methods**

#### *2.1. Model Description*

We used the nested 3D chemical transport model, GEOS-Chem version 11.01, over Asia to simulate surface PM2.5 and AOD. The nested model has a horizontal resolution of 0.5◦ latitude × 0.667◦ longitude, with boundary conditions archived from global simulations at 2◦ latitude × 2.5◦ longitude. We simulated AOD in Asia between 2006 and 2017 [23] with a model spin-up of one month. The model was driven with Modern-Era Retrospective analysis for Research and Application, Version 2 (MERRA-2) meteorological fields. We ran GEOS-Chem with full gaseous chemistry and online aerosol calculations, including sulphate-nitrate-ammonium particles [24], black carbon (BC, [25,26]), primary [27] and secondary organic carbon (OC, [28]), natural dusts [29–31], and sea salts [32]. The model coupled aerosol and gas-phase chemistry through nitrate and ammonium partitioning, sulphur chemistry, secondary organic aerosol formation, and uptake of acidic gases by sea salt and dust [24]. We used monthly Asian anthropogenic emissions of SO2, NOx, BC, OC, NMVOCs, and NH3 from [33]. We developed anthropogenic emission inventories of these gases and aerosols from industrial, transport, residential, and agricultural sectors in China between 2006 and 2017 using a bottom-up approach [34]. We used daily open fire emissions from the Global Fire Emissions Database, Version 4. Dust emissions followed Fairlie et al. [29] with an improved dust size distribution scheme from [31]. Dry deposition of aerosols followed a resistance-in-series method [35] with updates of dry deposition

velocity [36]. Wet removal of aerosols in convective updrafts and large-scale precipitation followed Liu et al. [37], with updates of below-cloud and in-cloud scavenging in ice clouds [38,39] and in-cloud scavenging in mixed-phase clouds [40].

Assuming external mixing, AOD at wavelength λ in each layer was estimated as the sum of AODs of each component *i*

$$\pi\_{\lambda} = \sum\_{i=1}^{n} \frac{3}{4} \frac{m\_i Q\_{\lambda,i}}{\rho\_i r\_{eff,i}} = \sum\_{i=1}^{n} m\_i \beta\_{\lambda,i}$$

where *τλ* is AOD at wavelength *λ*, *n* is the number of aerosol components, *mi* is aerosol mass concentration of component *i*, *Qλ*,*<sup>i</sup>* is extinction efficiency factor at wavelength λ calculated with Mie theory, *ρ<sup>i</sup>* is aerosol mass density, *reff*,*<sup>i</sup>* is particle effective radius. We accounted for the hygroscopicity growth of aerosol particles, as all parameters in the above equation are functions of relative humidity for hydrophilic aerosol components. We used the updated aerosol size distribution and refractive index [41] to calculate *Qλ,i* and *reff*,*<sup>i</sup>* in a Mie code. Uncertainties of the model-simulated AOD stemmed from aerosol vertical profiles and assumptions on aerosol physical and chemical properties, including mixing state, density, refractive index, and hygroscopic growth [42]. Among the aerosol physical and chemical properties, the mixing state is the most important factor, causing an uncertainty of between 30 and 35% on the simulated AOD [42]. Uncertainties caused by other properties were less than 10% [42]. We used in situ station radiometer AOD measurements, AOD measurements from a moderate resolution imaging spectrora-diometer, and surface in situ measurements of PM2.5 to validate model simulations (see details in Supplementary Materials S1 and S2).

We conducted two experiments to quantify the contributions of meteorological changes in AOD and surface PM2.5 in China. In the BASE simulation, we used varying meteorology and emissions from 2006 to 2017. The observed and simulated inter-annual variations and trends of annual mean AOD and surface PM2.5 are discussed in Supplementary Materials S3. To investigate the contribution of meteorology, we kept emissions at the level of 2006 and varying meteorology in simulation FIXEMISS. The inter-annual variations of AOD and surface PM2.5 in this experiment reflect the effects of varying meteorology. Comparing trends of AOD and surface PM2.5 simulated by the two experiments shows the relative contribution of meteorology changes in these variations.

#### *2.2. Multiple Linear Regression*

We analysed results in three hot spot key regions, including the North China Plain (NCP: 35–41◦N, 110–120◦E), the Yangtze River Delta (YRD: 27–35◦N, 116–122◦E), and the Pearl River Delta (PRD: 22–25◦N, 110–117◦E) as shown in Figure S1. We built multiple linear regression models for all the target regions during the four seasons to investigate the contribution of meteorological elements to variations of AOD and surface PM2.5. The candidate variables included temperature (T), zonal wind speed (U), meridional wind speed (V), vertical air movement (O), relative humidity (RH), potential vorticity (PV) at surface, 850 hPa and 500 hPa, pressure at surface (PS), sea level pressure (SLP), tropopause pressure (TROPPT), planetary boundary layer height (PBLH), and precipitation (PREC). The list of abbreviations is shown in Table S2. To avoid redundancy, we use adjusted (adj) the R2 criterion to determine the best subset of regressors.

$$\text{adj } \mathbb{R}^2 = 1 - \frac{(1 - \mathbb{R}^2)(n - 1)}{n - k - 1} \tag{1}$$

where *k* is the number of model parameters, *n* is the number of pairs of data in the data set, R2 is the coefficient of determination because the inclusion of the penalty term *<sup>n</sup>* − *<sup>k</sup>* − 1, adj R<sup>2</sup> decreases when redundant variables are included. We used a stepwise procedure to select variables. Subsets with the smallest number of variables and largest adj R2 are regarded as the best predictors. GEOS-Chem-simulated and the best multiple linear regression model-fitted AOD and surface PM2.5 are shown in Figures 1 and 2.

**Figure 1.** GEOS-Chem-simulated (solid line) and multiple linear regression-predicted (dashed line) monthly mean AOD in NCP, YRD, and PRD.

**Figure 2.** GEOS-Chem-simulated (solid line) and multiple linear regression-predicted (dashed line) monthly mean PM2.5 in NCP, YRD, and PRD.

#### **3. Results**

*3.1. Meteorology Changes Have Larger Influences on Inter-Annual Variations of AOD than Those of Surface PM2.5*

GEOS-Chem shows that the influence of meteorology changes on AOD trends between 2006 and 2017 is relatively larger than their influences on surface PM2.5 (Table 1). Contributions from meteorology changes to downward trends of annual mean AOD are 33% in PRD, larger than that of surface PM2.5 (23%). In YRD, meteorology changes reduce AOD by 10%, but are counter-productive with regard to surface PM2.5 reduction. Meteorology changes show negligible effects on trends of AOD and surface PM2.5 in NCP. In spring, meteorology changes between 2006 and 2017 are favorable to AOD and surface PM2.5 reduction. GEOS-Chem attributes 36%, 22%, and 50% of AOD reduction in NCP, YRD, and PRD, respectively, to meteorology changes during the 12 years. However, for surface PM2.5 reduction, the contributions of meteorology changes are much smaller: 1% in NCP, 12% in YRD, and 40% in PRD. In summer, meteorology changes show adverse effects on aerosol reduction during the 12 years. AOD enhancements due to meteorology change offset 42%, 25%, and 33% of the reduction caused by anthropogenic emission control in NCP, YRD, and PRD, respectively. Influences of meteorology changes on surface PM2.5 are much smaller in NCP (5%) and PRD (10%). In fall, the model suggests a significant adverse effect of meteorology on aerosols in NCP between 2006 and 2017 (+0.011 yr−<sup>1</sup> and +0.26 μg m−<sup>3</sup> yr−1). These enhancements offset the reduction caused by anthropogenic emission control by 85% for AOD and by 11% for surface PM2.5, respectively. Meteorology changes contribute 25% and 40% to the total reduction in AOD in YRD and PRD (Table 1). Their contributions to surface PM2.5 reductions are much lower, with 8% in NCP and 35% in YRD.

**Table 1.** GEOS-Chem-simulated inter-annual trends of annual and seasonal mean AOD (yr−1, % yr−<sup>1</sup> in parentheses) and surface PM2.5 (μg m−<sup>3</sup> yr−<sup>1</sup> and % yr−<sup>1</sup> in parentheses) in NCP, YRD, and PRD between 2006 and 2017.

