Bayesian Aerosol Retrieval-Based PM2.5 Estimation through Hierarchical Gaussian Process Models

Zhang, Junbo; Li, Daoji; Xia, Yingzhi; Liao, Qifeng

doi:10.3390/math10162878

Open AccessArticle

Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models

by

Junbo Zhang

¹,

Daoji Li

²,

Yingzhi Xia

¹ and

Qifeng Liao

^1,*

¹

School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China

²

Department of Information Systems and Decision Sciences, California State University, Fullerton, CA 92831, USA

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(16), 2878; https://doi.org/10.3390/math10162878

Submission received: 4 July 2022 / Revised: 9 August 2022 / Accepted: 10 August 2022 / Published: 11 August 2022

(This article belongs to the Special Issue Fixed Point Theory and Its Applications in Nonlinear Analysis and Optimization)

Download

Browse Figures

Versions Notes

Abstract

:

Satellite-based aerosol optical depth (AOD) data are widely used to estimate land surface

{PM}_{2.5}

concentrations in areas not covered by ground

{PM}_{2.5}

monitoring stations. However, AOD data obtained from satellites are typically at coarse spatial resolutions, limiting their applications on small or medium scales. In this paper, we propose a new two-step approach to estimate 1-km-resolution

{PM}_{2.5}

concentrations in Shanghai using high spatial resolution AOD retrievals from MODIS. In the first step, AOD data are refined to a

1 \times 1 {km}^{2}

resolution via a Bayesian AOD retrieval method. In the second step, a hierarchical Gaussian process model is used to estimate

{PM}_{2.5}

concentrations. We evaluate our approach by model fitting and out-of-sample cross-validation. Our results show that the proposed approach enjoys accurate predictive performance in estimating

{PM}_{2.5}

concentrations.

Keywords:

Bayesian retrieval algorithm; PM_2.5; hierarchical Gaussian process model; MAIAC

MSC:

62M20; 62J05; 62F15

1. Introduction

Particulate matter with an aerodynamic diameter of less than 2.5 μm (

{PM}_{2.5}

) can affect human health (see [1] for detailed discussions). Exposure to

{PM}_{2.5}

over a few hours to weeks can trigger cardiovascular disease-related mortality; longer-term exposure (e.g., a few years) increases the risk for cardiovascular mortality and ischaemic heart disease and even reduces life expectancy [2,3,4,5]. In recent years, the public has regarded

{PM}_{2.5}

as one of the primary air pollutants, especially after the Chinese government proposed the National Ambient Air Quality Standard for

{PM}_{2.5}

and began to establish ground-based

{PM}_{2.5}

monitor networks in 2012 [6]. Although the ground-based

{PM}_{2.5}

monitor networks can accurately measure the land surface

{PM}_{2.5}

concentrations, the number of monitoring sites is limited.

To assess

{PM}_{2.5}

concentrations in large areas, satellite-based products such as the aerosol optical depth (AOD) are commonly used due to their broad coverage. Various statistical models have been developed to predict

{PM}_{2.5}

concentrations using AOD data in the literature, including the geographically weighted regression model and the linear mixed-effects model [7,8]. In addition, based on a moderate resolution imaging spectroradiometer (MODIS), Gaussian process modeling in a Bayesian hierarchical setting has been used to estimate

{PM}_{2.5}

concentrations [9]. However, AOD data obtained from satellites are typically at coarse spatial resolutions, limiting their applications on small or medium scales. To address this issue, many methods have been introduced to refine AOD data. For example, Lipponen et al. [10] proposed a Bayesian aerosol retrieval algorithm for MODIS AOD retrieval over land. Wang et al. [11] suggested a hierarchical Bayesian approach for multi-angle imaging spectroradiometer (MISR) data such that the resolution of AOD can be improved from 17.6 to 4.4 km. Based on the multi-angle implementation of the atmospheric correction (MAIAC) retrieval algorithm in [12], Wei et al. [13] developed a new space-time random forest model to predict

{PM}_{2.5}

concentrations using MODIS MAIAC AOD and other pertinent variables related to meteorological conditions, land use, and human activities.

Although the space-time random forest model has good performance, one main limitation of the random forest model is that a large number of trees can make the algorithm too slow and ineffective for real-time predictions. There is a tradeoff between the training time (and space) and increased number of trees. In addition, random forest may not obtain good results for small data. When estimating

{PM}_{2.5}

1-km-resolution, it can be expected that the results from the random forest model are not good if only a few observations are available at some location. To address these challenges, in this paper, we propose a new two-step approach to predict

{PM}_{2.5}

concentrations using high spatial resolution AOD retrievals from MODIS. In the first step, AOD data are refined to a

1 \times 1 {km}^{2}

resolution via a Bayesian AOD retrieval method. In the second step, a hierarchical Gaussian process model is used to estimate

{PM}_{2.5}

concentrations. Our approach combines the strengths of recent advances in [9,10]. In addition, the studies on

{PM}_{2.5}

concentration estimates in Shanghai are limited in the literature, although Shanghai is one of the largest cities in China and a popular tourist destination. Our work tries to fill this gap.

The rest of the paper is organized as follows. Section 2 describes the study region and the datasets used in this work. Section 3 introduces the Bayesian algorithm to retrieve AOD and the Hierarchical Gaussian process model to estimate

{PM}_{2.5}

concentrations. Section 4 provides the experimental results for our approach. We discuss some limitations and extensions of our work in Section 5.

2. Datasets

In this study, we are interested in estimating

{PM}_{2.5}

concentrations in Shanghai. Shanghai is one of the largest cities in China, with intensive human activities and high amounts of complex aerosols in the air. However, the studies on

{PM}_{2.5}

concentration estimates in Shanghai are limited in the literature. We would like to remark that the proposed procedure in this paper can be used to estimate

{PM}_{2.5}

concentrations in any other city.

The

{PM}_{2.5}

concentration monitoring network in Shanghai consists of 20 monitoring stations across all districts of Shanghai, and these are shown as dots in Figure 1. The solid lines are the city border and district borders, while the green filling is the mainland of Shanghai. We consider the hourly ground-level

{PM}_{2.5}

concentrations in Shanghai from 1 January 2021 to 31 December 2021, which are provided by the China Environmental Monitoring Center (http://www.cnemc.cn/, accessed 1 February 2022).

The MODIS instruments on the Earth Observation System’s (EOS) Terra and Aqua satellites provide measurements of the atmosphere, land, and ocean in the visible, near-infrared, and infrared from 0.4 to 14.4 μm of the 36 spectral bands. The instrument collects the data once or twice a day at a given location and views the entire globe every one to two days. The MODIS product

M O D 04_L 2

provides initial AOD data in Shanghai with a spatial resolution of about

1 km \times 1 km

at nadir and other variables needed in the Bayesian retrieval algorithm of our procedure, while

M C D 19 A 2

offers MAIAC AOD products. Both can be downloaded from Level-1 and the Atmosphere Archive and Distribution System Distributed Active Archive Center (https://ladsweb.modaps.eosdis.nasa.gov/, accessed 1 February 2022).

3. Methods

Now we are ready to introduce our procedure for

{PM}_{2.5}

concentrations estimation. As mentioned earlier, our procedure includes two steps. In the first step, we use the Bayesian aerosol retrieval algorithm [10] to refine the AOD data, while in the second step, the proposed hierarchical Gaussian process model estimates land surface

{PM}_{2.5}

concentrations using the refined AOD data from the first step.

3.1. Bayesian Aerosol Retrieval Algorithm

The Bayesian aerosol retrieval algorithm was firstly proposed by Lipponen et al. [10] for the retrieval of AOD over land in Finland using MODIS aerosol products, including the retrieved aerosol properties and measurement data with a spatial resolution of about

10 \times 10 {km}^{2}

at nadir. In the first step of our procedure, we use the Bayesian aerosol retrieval algorithm proposed to refine the AOD data in Shanghai with a spatial resolution of about

1 \times 1 {km}^{2}

at nadir. The Bayesian aerosol retrieval algorithm is composed of three main parts. The first part is building a radiative transfer model called the forward model. The second part is the AOD retrieval based on the forward model, which is called the inversion part. The last part is combining the retrieved AOD with MAIAC AOD products.

3.1.1. Establishing Forward Models

We first construct a radiative transfer model using MODIS products to simulate top of Atmosphere (TOA) reflectance. Then we use the simulated TOA reflectance and the TOA reflectance measured by MODIS to do the retrieval. The AOD values are the outputs of the retrieval. The radiative transfer model we used here is the same as the Dark Target (DT) algorithm [14], except that we perform the retrieval using a Bayesian algorithm. There are two versions for the DT algorithm. One is for retrieval over land, and the other one is for retrieval over the ocean. We will use the DT algorithm over land. The DT algorithm uses bright aerosols against a dark target because it can reflect so much solar radiation to the space that the MODIS sensor can capture it. The main idea of this DT algorithm is to find the aerosol properties that minimize the difference between the TOA reflectance measured by the MODIS sensor and the TOA reflectance calculated by radiative transfer simulations using the aerosol properties. We reformulate it as a Bayesian form, and more details will be given later.

In this study, the TOA reflectance is calculated as follows:

\begin{matrix} f (τ, η, ρ^{s}; γ) = η \times f_{f i n e} (τ, ρ^{s}; γ) + (1 - η) \times f_{c o a r s e} (τ, ρ^{s}; γ), \end{matrix}

(1)

where

f (τ, η, ρ^{s}; γ)

denotes the total simulated reflectance,

f_{f i n e}

represents the simulated reflectance corresponding to fine aerosol models,

f_{c o a r s e}

is the simulated reflectance corresponding to coarse aerosol models,

τ

denotes AOD,

η

is the fine mode fraction (FMF),

ρ^{s}

denotes the surface reflectance, and

γ

denotes the vector of all additional parameters corresponding to auxiliary variables, such as wavelength, aerosol models, solar zenith angle, view zenith angle, and relative azimuth angle in Table 1. There are five main aerosol models related to this work, including three different fine aerosol models, one coarse aerosol model, and one continental aerosol model. In the DT retrieval, the fine aerosol model is taken from a predefined database that contains aerosol model information based on the location and the season. The TOA reflectances and other radiative transfer-related variables corresponding to each aerosol model are precomputed and stored in lookup tables (LUT) to make the algorithm computationally more efficient. We use the same LUT as the Collection 6 MODIS aerosol products in [15]. The variables in the LUT and corresponding input parameters are listed in Table 1.

We use the same aerosol models and data processing procedures (such as cloud screening) as those in the DT algorithm [14]. Therefore, we retrieve the same pixels as the DT algorithm. For each pixel, the forward model at different wavelength

λ

is as follows:

\begin{matrix} f_{λ} (τ, ρ_{λ}^{s}; γ) = ρ_{λ}^{a} (τ; θ_{0}, θ, ϕ) + \frac{T_{λ} (τ; θ_{0}) T_{λ} (τ; θ) ρ_{λ}^{s} (τ; θ_{0}, θ, ϕ)}{1 - s_{λ} (τ) ρ_{λ}^{s} (τ; θ_{0}, θ, ϕ)}, \end{matrix}

(2)

where

ρ_{λ}^{a}

denotes the atmospheric path reflectance depending on AOD

τ

, wavelength

λ

, solar zenith angle

θ_{0}

, view zenith angle

θ

, and relative azimuth angle

ϕ

,

T_{λ} (τ; θ_{0})

and

T_{λ} (τ; θ)

are the downward and upward atmospheric transmissions at wavelength

λ

, respectively,

s = s_{λ}

is the atmospheric backscattering ratio at wavelength

λ

, and

ρ_{λ}^{s}

denotes the surface reflectance at wavelength

λ

. The simulated reflectances

f_{f i n e}

and

f_{c o a r s e}

in (1) are calculated separately using forward model (2) based on the LUT.

3.1.2. Retrieve AOD Using Bayesian Methods

The inversion part of the Bayesian aerosol retrieval algorithm can be formulated as an optimization problem. The retrieval problem is to find the parameters that maximize the following posterior distribution

\begin{matrix} π (τ, η, ρ^{s}; γ | ρ^{T O A}), \end{matrix}

(3)

where

ρ^{T O A}

denotes TOA reflectances measured by the MODIS instrument. We apply the Bayes theorem to the posterior distribution, and the posterior becomes

\begin{matrix} π (τ, η, ρ^{s}; γ | ρ^{T O A}) \propto π (ρ^{T O A} | τ, η, ρ^{s}; γ) π (τ, η, ρ^{s}) . \end{matrix}

(4)

The function that describes the relationship between TOA reflectance measured by the MODIS instrument and TOA reflectance simulated by radiative transfer models is

\begin{matrix} ρ^{T O A} = f (τ, η, ρ^{s}; γ) + e, \end{matrix}

(5)

where

f (τ, η, ρ^{s}; γ)

is the TOA reflectance simulated by radiative transfer models in (1), e denotes the total error, including the observation noise and the approximation error in TOA reflectances due to aerosol and radiative transfer models. In our study, we assume that the random error e follows a Gaussian distribution, that is,

e \sim N (E_{e}, Γ_{e})

. In this retrieval algorithm, errors mainly come from two sources, which are observation noise n and approximation error u. The observation noise is caused by the instrument, while the approximation error is caused by the variables such as geometry and aerosol models used in the radiative transfer model. In this study, the observation noise is modeled as Gaussian zero-mean random variable, and its variances are based on MODIS aerosol product

S T D_R e f l e c t a n c e_L a n d

. The approximation error is modeled as an additive Gaussian random variable. We use the approximation error established in [10], which runs N simulations using AOD, FMF values from the AERONET, and surface reflectance values from the MODIS MCD43C3 product to construct a database of

(ρ^{M O D I S} - ρ^{s i m u l a t i o n})

for all regions (AERONET stations all over the world) and month and estimates the expected value and covariance matrix using sample median and sample covariance of the approximation error model, respectively. Then, we can obtain

\begin{matrix} π (ρ^{T O A} | τ, η, ρ^{s}; γ) = π_{e} (ρ^{T O A} - f (τ, η, ρ^{s}; γ)) . \end{matrix}

(6)

In this study, we retrieve total AOD

τ

at 0.55 μm, fine mode fraction η, and land surface reflectance

ρ^{s}

at four different wavelengths: 0.47, 0.55, 0.67, and 2.1 μm. We model AOD, FMF, and surface reflectances as mutually uncorrelated variables at all bands. We depict all unknown parameters in a granule using multivariate Gaussian prior models. The prior probability density models encode prior knowledge such as spatial correlation information, seasonal variability, or positivity constraints into the retrieval. The prior models are fully described by their expected value vectors and covariance matrices:

$τ \sim N (E_{τ}, Γ_{τ})$ , where $E_{τ}$ is the expected value vector of AOD, and $Γ_{τ}$ is the covariance matrix of AOD. Following [16], the nearest value from the MAC-V2 climatology is taken as the prior expectation for each pixel to be retrieved. The MAC-V2 climatology is a $12 \times 180 \times 360$ tensor (denotes month, latitude, and longitude), and we use the nearest value to the pixel we retrieve as the prior expectation. We define the $(i, j)$ element of the prior covariance matrix $Γ_{τ}$ of AOD as

$\begin{matrix} Γ_{τ} (i, j) = σ_{τ, n u g g e t}^{2} δ_{i, j} + σ_{τ, s i l l}^{2} e x p \{- 3 {∥\frac{x_{i} - x_{j}}{r_{τ, r a n g e}}∥}^{p}\}, \end{matrix}$

(7)

where $δ_{i, j} = 1$ if $i = j$ and 0 otherwise, $σ_{τ, n u g g e t}$ is the so-called nugget, representing the local variance, $σ_{τ, s i l l}$ is the spatial correlation, $∥ x_{i} - x_{j} ∥$ measures the distance between the pixel i and j, $r_{τ, r a n g e}$ describes the spatial correlation length, and p is the spatial smoothness of the AOD fields. The values of these covariance matrix parameters used in (7) are listed in Table 2.
$η \sim N (E_{η}, Γ_{η})$ , where $E_{η}$ is the expected value vector of FMF, and $Γ_{η}$ is the covariance matrix of FMF. Similar to the AOD, the prior expectation value for FMF is also computed from the MAC-V2 climatology. The prior covariance matrix $Γ_{η}$ of FMF is the same as the covariance matrix $Γ_{τ}$ of AOD, except for the values of those covariance matrix parameters. See Table 2 for the values of covariance matrix parameters used in the prior model for the FMF.
$ρ^{s} \sim N (E_{ρ^{s}}, Γ_{ρ^{s}})$ , where $E_{ρ^{s}}$ and $Γ_{ρ^{s}}$ are the expected value vector and covariance matrix of land surface reflectance, respectively. We also use Gaussian prior models for the surface reflectances. We use the blue-sky albedos computed with the weighting coefficient of 0.5 ( $50 %$ of the white-sky albedo and $50 %$ of the black-sky albedo) in the MODIS MCD43C3 albedo product as the expected values for the surface reflectances. For the Bayesian aerosol retrieval algorithm, the monthly surface reflectance is computed as the temporal average of surface reflectances ±45 days around the middle day of the month. The expected values for the surface reflectances in the retrieval are computed as an average of the three closest pixels in the monthly surface reflectance. Both the temporal variance in the original surface albedo product and the variance due to averaging are taken into account in the construction of the surface reflectance variance.

Then, with the prior models above, we can rewrite (4) as

\begin{matrix} π (τ, η, ρ^{s}; γ | ρ^{T O A}) \\ \propto & π_{e} (ρ^{T O A} - f (τ, η, ρ^{s}; γ)) π (τ) π (η) π (ρ^{s}) \\ \propto & exp \{- \frac{1}{2} {(ρ^{T O A} - f (τ, η, ρ^{s}; γ) - E_{e})}^{T} Γ_{e}^{- 1} (ρ^{T O A} - f (τ, η, ρ^{s}; γ) - E_{e}) \\ - \frac{1}{2} {(τ - E_{τ})}^{T} Γ_{τ}^{- 1} (τ - E_{τ}) - \frac{1}{2} {(η - E_{η})}^{T} Γ_{η}^{- 1} (η - E_{η}) \\ - \frac{1}{2} {(ρ^{s} - E_{ρ^{s}})}^{T} Γ_{ρ^{s}}^{- 1} (ρ^{s} - E_{ρ^{s}})\} . \end{matrix}

We simultaneously retrieve all dark land pixels in a granule. The inversion part of the Bayesian aerosol retrieval algorithm looks for the maximum a posteriori (MAP) estimate for the unknown parameters. This is equivalent to the following optimization problem

\begin{matrix} {(τ, η, ρ^{s})}_{MAP} & = \underset{τ, η, ρ^{s}}{arg min} \{{∥L_{e} [ρ^{T O A} - f (τ, η, ρ^{s}; γ) - E_{e}]∥}^{2} + {∥ L_{τ} (τ - E_{τ}) ∥}^{2} \\ + ∥ L_{η} (η - E_{η}) ∥^{2} + {∥ L_{ρ^{s}} (ρ^{s} - E_{ρ^{s}}) ∥}^{2}\}, \end{matrix}

(8)

where

L_{τ}

,

L_{η}

, and

L_{ρ^{s}}

are the Cholesky factors of

Γ_{τ}^{- 1}

,

Γ_{η}^{- 1}

, and

Γ_{ρ^{s}}^{- 1}

, respectively. We use AOD, FMF, and the surface reflectances at all bands in MODIS products MMOD04_L2 as the initial values for

τ

,

η

, and

ρ^{s}

, and use the L-BFGS-B optimization algorithm in [17] to find

{(τ, η, ρ^{s})}_{MAP}

.

3.1.3. Combine AOD Retrievals with MAIAC AOD Products

We create a

0 . 01^{\circ} \times 0 . 01^{\circ}

grid with a total of

118 \times 137

grid cells that cover the whole region of Shanghai. If there is an AOD retrieval in a grid cell, we fill it with the AOD retrieval. After that, we fill the empty grid cells with MAIAC AOD products. The final grid cells are the products to be used in our hierarchical Gaussian process model for estimating

{PM}_{2.5}

concentrations in Shanghai.

3.2. Hierarchical Gaussian Process Model

Next, with the refined data obtained in the first step by the Bayesian aerosol retrieval algorithm above, we use the hierarchical Gaussian process model introduced in [9] to estimate

{PM}_{2.5}

concentrations in Shanghai. To be more specific, the hierarchical Gaussian process model is given by

\begin{matrix} \begin{matrix} p_{i} = β_{0} + β_{1} τ_{i} + ω_{i} + ϵ_{i}, \\ ω_{i} & \sim N (0, K (h; σ^{2}, ϕ)), ϵ_{i} \sim N (0, σ_{ϵ}^{2}), \end{matrix} \end{matrix}

(9)

where

p_{i}

and

τ_{i}

denote the

{PM}_{2.5}

and AOD at location i, respectively,

β_{0}

and

β_{1}

are, respectively, the intercept and the coefficient for AOD,

ω_{i}

is a spatial random effect for the location i,

ϵ_{i}

is the random error,

K

is the covariance function,

σ^{2}

is a variance parameter, h is the Euclidean distance between any two spatial locations, and

ϕ

represents the spatial decay. Here, we assume that each random effect

ω_{i}

follows a multivariate Gaussian process with a mean zero and covariance function of

K (h; σ^{2}, ϕ)

, and the random error

δ_{i}

also follows a Gaussian distribution with mean zero and variance

σ_{ϵ}^{2}

. Following [9], the covariance function

K (h; σ^{2}, ϕ)

is given by

\begin{matrix} K (h; σ^{2}, ϕ) = \{\begin{matrix} σ^{2} [1 - 1.5 h ϕ + 0.5 {(h ϕ)}^{3}] & i f 0 < h < ϕ^{- 1}, \\ 0 & o t h e r w i s e . \end{matrix} \end{matrix}

(10)

We also set prior distributions for parameters

β_{0}

,

β_{1}

,

σ^{2}

,

ϕ

, and

σ_{ϵ}^{2}

. To be more precise, following [9,18], we use the following prior distributions:

$(\begin{matrix} β_{0} \\ β_{1} \end{matrix}) \sim N (0, [\begin{matrix} 1000 & 0 \\ 0 & 1000 \end{matrix}])$
$σ^{2}$ follows the inverse gamma distribution with shape parameter 2 and scale parameter 2.
$ϕ$ follows the uniform distribution $U [3, 100]$ .
$σ_{ϵ}^{2}$ follows the inverse gamma distribution with shape parameter 2 and scale parameter 0.1.

These parameters are updated by the Metropolis–Hastings algorithm with 5000 iterations for each parameter. Similar to [19,20,21], we use the first 2000 iterations as burn in and the last 3000 iterations to recover the

{PM}_{2.5}

concentrations at location i. Let

β_{0 j}

,

β_{1 j}

,

σ_{j}^{2}

,

ϕ_{j}

, and

σ_{j ϵ}^{2}

be the values of these parameters

β_{0}

,

β_{1}

,

σ^{2}

,

ϕ

, and

σ_{ϵ}^{2}

from the prior distributions above for the jth iteration,

j = 2001, \dots, 5000

. Then we can obtain the random effect

ω_{i j}

from

N (0, K (h; σ_{j}^{2}, ϕ_{j}))

and then generate

p_{i j}

from

N (β_{0 j} + β_{1 j} τ_{i} + ω_{i j}, σ_{j ϵ}^{2})

. The sample mean of these 3000

p_{i j}

is defined as the

{PM}_{2.5}

concentrations at location i, which is the sample mean of all 3000

p_{i j}

at location i.

We evaluate the performance of our approach by out-of-sample validation using the cross-validation (CV). In this study, we use 5-fold CV on the day when there are more than five working

{PM}_{2.5}

concentration monitoring stations. The training data were randomly split into five equal subsets based on the

{PM}_{2.5}

concentration monitoring stations in 5-fold CV. Four subsets were used for the model fitting, and the remaining one was used for model validation. This process was repeated five times until all the subsets were tested. Three commonly used statistical performance metrics, including the coefficient of determination (

R^{2}

), the mean absolute error (MAE), and the root-mean-square error (RMSE), are used to evaluate the model performance quantitatively. MAE represents the difference between model-estimated

{PM}_{2.5}

concentrations and actual

{PM}_{2.5}

concentrations measured at monitoring stations, while RMSE is the square root of the mean of the squares of the differences between model-estimated

{PM}_{2.5}

concentrations and actual

{PM}_{2.5}

concentrations measured at monitoring stations. Finally, we apply the hierarchical Gaussian process model to estimate 1-km-resolution

{PM}_{2.5}

concentrations. We use Python and R in this study.

4. Results and Discussion

4.1. Evaluation of the Model Performance

In this study, we improve the accuracy and coverage of the traditional MODIS AOD products using the Bayesian aerosol retrieval algorithm. Compared with existing results in [10,12], we replace the

10 \times 10 {km}^{2}

grid cells with

1 \times 1 {km}^{2}

grid cells and improve the accuracy by about 10–20%. Figure 2 shows the traditional MODIS AOD grid cells and AOD grid cells established by our approach in Shanghai. Note that we use different color scales for the two colorbars in this figure for better visibility. It can be seen from Figure 2 that the traditional MODIS grid cells are too coarse, and thus, many identical AOD values are used to estimate

{PM}_{2.5}

concentrations at different geolocations. This can decrease the estimation accuracy of

{PM}_{2.5}

concentrations. However, the grid cells in our approach are fine enough to capture the variation of AOD values at different geolocations.

Figure 3 plots estimated

{PM}_{2.5}

values in Shanghai obtained by our approach vs. actual

{PM}_{2.5}

values for the whole year in Figure 3. It shows that our approach can estimate

{PM}_{2.5}

concentrations in Shanghai very well. Three commonly used statistical performance metrics, MAE,

R^{2}

, and RMSE, are also reported in the same figure. The value of

R^{2}

is about

0.9536

. The mean absolute error (MAE) is 3.6042079 μg/m

^{3}

and the root-mean-square error (RMSE) is 2.47042 μg/m

^{3}

. A large value of

R^{2}

and small values of the RMSE and MAE indicate the predictive ability of our approach.

To consider the seasonal variations in the

{PM}_{2.5}

concentrations, we also plot estimated

{PM}_{2.5}

values in Shanghai obtained by our approach vs. actual

{PM}_{2.5}

values for different seasons in Figure 4. It is clear from Figure 4 that our approach performs very well, especially for spring, autumn, and winter seasons. The performance of our approach for the summer season is not as good as that for other seasons. One of the possible reasons is that the cloud condition during the summer season in Shanghai hinders the accuracy of our Bayesian AOD retrieval and thus affects the estimation accuracy of

{PM}_{2.5}

concentrations in the summer season.

Table 3 records the average, minimum and maximum of estimated

{PM}_{2.5}

concentrations. It is clear that the

{PM}_{2.5}

concentrations in Shanghai during the winter and spring seasons are high, while

{PM}_{2.5}

concentrations during the summer and autumn seasons are relatively low. This finding is not surprising. One of the most important reasons is that there is more coal burning for heat in winter and early spring.

4.2. Spatial and Seasonal Variations of ${PM}_{2.5}$ Concentrations in Shanghai

To further explore the spatial and seasonal variations of the estimated

{PM}_{2.5}

concentrations in Shanghai by our approach, we show the annual and seasonal distributions of

{PM}_{2.5}

concentrations in Shanghai estimated by our approach in Figure 5 and Figure 6. It can be seen that the

{PM}_{2.5}

concentrations in the central region of Shanghai are higher than those in surrounding suburban areas, especially during the summer season. This is not surprising because Shanghai is one of the largest cities in China and a global financial hub, and there are more human activities in the central region of Shanghai than the surrounding suburban areas. This causes more air pollutants such as vehicle exhaust, which accounts for the high

{PM}_{2.5}

concentrations. This finding is also consistent with the fact that there are more visitors from other places to Shanghai during the summer season beause Shanghai is a popular tourist destination renowned for its historical landmarks and a great mix of the traditional and modern.

We also find that the estimated

{PM}_{2.5}

concentrations on the southwest side of Chongming island (which is the island in the upper right corner in Figure 5) are higher than in other regions of Shanghai. As we can see from Figure 1, there is only one

{PM}_{2.5}

concentration monitoring station in Chongming island. The number of observations of

{PM}_{2.5}

concentrations on the southwest side of the island is limited. Therefore, the

{PM}_{2.5}

concentrations estimated by our approach in this particular region may be misleading.

5. Conclusions

In this paper, we propose a new two-step approach to estimate 1-km-resolution

{PM}_{2.5}

concentrations in Shanghai using high spatial resolution AOD retrievals from MODIS. It enjoys accurate predictive performance in model fitting and cross-validation. Our model only needs AOD products to estimate ground

{PM}_{2.5}

concentrations without sophisticated auxiliary variables.

We use the random cross-validation when evaluating the predictive performance of our approach. As pointed out by one anonymous reviewer, the random cross-validation can be misleading in the presence of spatial correlations. There are many other approaches for cross-validation available in the literature; for example, see [22,23,24,25,26]. One can use one of these cross-validation techniques to evaluate the predictive performance of the proposed approach.

In addition, only two variables,

{PM}_{2.5}

and AOD, are used in our hierarchical Gaussian process model. It is possible to include more additional variables, such as population, wind, and rivers, in our hierarchical Gaussian process model. In this case, one may need to set non-Gaussian priors for these additional variables [27]. This is beyond the scope of our current paper, and we leave it for future research.

Author Contributions

Conceptualization, J.Z. and Q.L.; methodology, Q.L. and D.L.; software, J.Z. and Y.X.; validation, J.Z. and Y.X.; formal analysis, J.Z. and Y.X.; investigation, J.Z. and Y.X.; resources, J.Z. and Y.X.; data curation, J.Z. and Y.X.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z., D.L. and Q.L.; visualization, J.Z. and D.L.; supervision, Q.L.; project administration, Q.L. and D.L.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Commission of Shanghai Municipality (No. 20JC1414300) and the Natural Science Foundation of Shanghai (No. 20ZR1436200).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AOD	Aerosol Optical Depth
FMF	Fine Mode Fraction
MODIS	Moderate Resolution Imaging Spectroradiomete
MAIMC	Multi-Angle Implementation of Atmospheric Correction
CV	Cross Validation
${PM}_{2.5}$	Particulate matter with an aerodynamic diameter less than 2.5 μm

References

Yan, X.D.; Wang, Q.M.; Tie, C.; Jin, H.T.; Han, Y.X.; Zhang, J.L.; Yu, X.M.; Hou, Q.; Zhang, P.P.; Wang, A.P. Polydatin protects the respiratory system from PM_2.5 exposure. Sci. Rep. 2017, 7, 40030. [Google Scholar] [CrossRef] [PubMed]
Brook, R.D.; Rajagopalan, S.; Pope, C.A., III; Brook, J.R.; Bhatnagar, A.; Diez-Roux, A.V.; Holguin, F.; Hong, Y.; Luepker, R.V.; Mittleman, M.A. Particulate matter air pollution and cardiovascular disease: An update to the scientific statement from the American Heart Association. Circulation 2010, 121, 2331–2378. [Google Scholar] [CrossRef] [PubMed]
Hayes, R.B.; Lim, C.; Zhang, Y.; Cromar, K.; Shao, Y.; Reynolds, H.R.; Silverman, D.T.; Jones, R.R.; Park, Y.; Jerrett, M. PM_2.5 air pollution and cause-specific cardiovascular disease mortality. Int. J. Epidemiol. 2020, 49, 25–35. [Google Scholar] [CrossRef] [PubMed]
Wu, J.Z.; Ge, D.D.; Zhou, L.F.; Hou, L.Y.; Zhou, Y.; Li, Q.Y. Effects of particulate matter on allergic respiratory diseases. Chronic Dis. Transl. Med. 2018, 4, 95–102. [Google Scholar] [CrossRef] [PubMed]
Dominici, F.; Peng, R.D.; Bell, M.L.; Pham, L.; McDermott, A.; Zeger, S.L.; Samet, J.M. Fine particulate air pollution and hospital admission for cardiovascular and respiratory diseases. JAMA 2006, 295, 1127–1134. [Google Scholar] [CrossRef] [PubMed]
Yuan, Y.; Liu, S.; Castro, R.; Pan, X. PM_2.5 monitoring and mitigation in the cities of China. Environ. Sci. Technol. 2012, 46, 3627–3628. [Google Scholar] [CrossRef] [PubMed]
Song, W.; Jia, H.; Huang, J.; Zhang, Y. A satellite-based geographically weighted regression model for regional PM_2.5 estimation over the Pearl River Delta region in China. Remote Sens. Environ. 2014, 154, 1–7. [Google Scholar] [CrossRef]
Ma, Z.; Liu, Y.; Zhao, Q.; Liu, M.; Zhou, Y.; Bi, J. Satellite-derived high resolution PM_2.5 concentrations in Yangtze River Delta Region of China using improved linear mixed effects model. Atmos. Environ. 2016, 133, 156–164. [Google Scholar] [CrossRef]
Yu, W.; Liu, Y.; Ma, Z.; Bi, J. Improving satellite-based PM 2.5 estimates in China using Gaussian processes modeling in a Bayesian hierarchical setting. Sci. Rep. 2017, 7, 7048. [Google Scholar] [CrossRef] [PubMed]
Lipponen, A.; Mielonen, T.; Pitkänen, M.R.; Levy, R.C.; Sawyer, V.R.; Romakkaniemi, S.; Kolehmainen, V.; Arola, A. Bayesian aerosol retrieval algorithm for MODIS AOD retrieval over land. Atmos. Meas. Tech. 2018, 11, 1529–1547. [Google Scholar] [CrossRef]
Wang, Y.; Jiang, X.; Yu, B.; Jiang, M. A hierarchical Bayesian approach for aerosol retrieval using MISR data. J. Am. Stat. Assoc. 2013, 108, 483–493. [Google Scholar] [CrossRef]
Lyapustin, A.; Wang, Y.; Korkin, S.; Huang, D. MODIS collection 6 MAIAC algorithm. Atmos. Meas. Tech. 2018, 11, 5741–5765. [Google Scholar] [CrossRef]
Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-km-resolution PM_2.5 concentrations across China using the space-time random forest approach. Remote Sens. Environ. 2019, 231, 111221. [Google Scholar] [CrossRef]
Kaufman, Y.; Tanré, D.; Remer, L.A.; Vermote, E.; Chu, A.; Holben, B. Operational remote sensing of tropospheric aerosol over land from EOS moderate resolution imaging spectroradiometer. J. Geophys. Res. Atmos. 1997, 102, 17051–17067. [Google Scholar] [CrossRef]
Levy, R.; Mattoo, S.; Munchak, L.; Remer, L.; Sayer, A.; Patadia, F.; Hsu, N. The Collection 6 MODIS aerosol products over land and ocean. Atmos. Meas. Tech. 2013, 6, 2989–3034. [Google Scholar] [CrossRef]
Kinne, S.; O’Donnel, D.; Stier, P.; Kloster, S.; Zhang, K.; Schmidt, H.; Rast, S.; Giorgetta, M.; Eck, T.F.; Stevens, B. MAC-v1: A new global aerosol climatology for climate studies. J. Adv. Model. Earth Syst. 2013, 5, 704–740. [Google Scholar] [CrossRef]
Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208. [Google Scholar] [CrossRef]
Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2. [Google Scholar]
Finley, A.O.; Banerjee, S.; Carlin, B.P. spBayes: An R package for univariate and multivariate hierarchical point-referenced spatial models. J. Stat. Softw. 2007, 19, 1–24. [Google Scholar] [CrossRef] [PubMed]
Finley, A.O.; Banerjee, S.; Gelfand, A.E. spBayes for large univariate and multivariate point-referenced spatio-temporal data models. J. Stat. Softw. 2015, 63, 1–28. [Google Scholar] [CrossRef]
Bakar, K.S.; Kokic, P. Bayesian Gaussian models for point referenced spatial and spatio-temporal data. J. Stat. Res. 2017, 51, 17–40. [Google Scholar] [CrossRef]
Brenning, A. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: The R package sperrorest. In Proceedings of the 2012 IEEE International Geoscience and Remote Sensing Symposium, Munich, Germany, 22–27 July 2012; pp. 5372–5375. [Google Scholar] [CrossRef]
Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929. [Google Scholar] [CrossRef]
Valavi, R.; Elith, J.; Lahoz-Monfort, J.J.; Guillera-Arroita, G. block CV: An r package for generating spatially or environmentally separated folds for k-fold cross-validation of species distribution models. Methods Ecol. Evol. 2019, 10, 225–232. [Google Scholar] [CrossRef]
Ploton, P.; Mortier, F.; Réjou-Méchain, M.; Barbier, N.; Picard, N.; Rossi, V.; Dormann, C.; Cornu, G.; Viennois, G.; Bayol, N. Spatial validation reveals poor predictive performance of large-scale ecological mapping models. Nat. Commun. 2020, 11, 4540. [Google Scholar] [CrossRef] [PubMed]
Brenning, A. Spatial machine-learning model diagnostics: A model-agnostic distance-based approach. arXiv 2021, arXiv:2111.08478. [Google Scholar]
Ren, M.; Zhang, Q.; Zhang, J. An introductory survey of probability density function control. Syst. Sci. Control Eng. 2019, 7, 158–170. [Google Scholar] [CrossRef]

Figure 1. Spatial distributions of

{PM}_{2.5}

concentrations monitoring stations in Shanghai. Solid lines: city border and district borders. Black dots:

{PM}_{2.5}

concentrations monitoring stations.

Figure 1. Spatial distributions of

{PM}_{2.5}

concentrations monitoring stations in Shanghai. Solid lines: city border and district borders. Black dots:

{PM}_{2.5}

concentrations monitoring stations.

Figure 2. Top panel: The traditional MODIS AOD grid cells. Bottom panel: AOD grid cells by our approach. Solid lines: city border and district borders. Black dots:

{PM}_{2.5}

concentrations monitoring stations in Shanghai. Colored rectangles: the traditional MODIS AOD grid cells. For better visibility, different color scales are used in the two colorbars.

Figure 2. Top panel: The traditional MODIS AOD grid cells. Bottom panel: AOD grid cells by our approach. Solid lines: city border and district borders. Black dots:

{PM}_{2.5}

concentrations monitoring stations in Shanghai. Colored rectangles: the traditional MODIS AOD grid cells. For better visibility, different color scales are used in the two colorbars.

Figure 3. Scatter plot of estimated

{PM}_{2.5}

values by our approach and actual

{PM}_{2.5}

values for the whole year in Shanghai. The dashed line is the 1:1 reference line.

Figure 3. Scatter plot of estimated

{PM}_{2.5}

values by our approach and actual

{PM}_{2.5}

values for the whole year in Shanghai. The dashed line is the 1:1 reference line.

Figure 4. Scatter plot of estimated

{PM}_{2.5}

values by our approach and actual

{PM}_{2.5}

values for four seasons in Shanghai. The dashed line is the 1:1 reference line. (a) Spring; (b) Summer; (c) Autumn; (d) Winter.

Figure 4. Scatter plot of estimated

{PM}_{2.5}

values by our approach and actual

{PM}_{2.5}

values for four seasons in Shanghai. The dashed line is the 1:1 reference line. (a) Spring; (b) Summer; (c) Autumn; (d) Winter.

Figure 5. The mean values of estimated

{PM}_{2.5}

concentrations for the year 2021 using our approach.

Figure 5. The mean values of estimated

{PM}_{2.5}

concentrations for the year 2021 using our approach.

Figure 6. The mean values of estimated

{PM}_{2.5}

concentrations for four seasons using our model. (a) Spring; (b) Summer; (c) Autumn; (d) Winter.

Figure 6. The mean values of estimated

{PM}_{2.5}

concentrations for four seasons using our model. (a) Spring; (b) Summer; (c) Autumn; (d) Winter.

Table 1. Input variables and parameters in LUT.

Variables	Input Parameters
AOD	From 0 to 6, step 1
Wavelength	0.466, 0.554, 0.645, 2.113
Aerosol type	0, 1, 2, 3, 4
Solar zenith angle	From 0 to 84, step 12
View zenith angle	From 0 to 84, step 12
Relative azimuth angle	From 0 to 180, step 12
Topographic altitude	From −0.2 to 9, step 0.1

Table 2. The covariance matrix parameters used in AOD and FMF prior models.

Parameter	AOD	FMF
$r_{τ, r a n g e}$	50 km	50 km
$σ_{τ, n u g g e t}^{2}$	0.0025	0.01
$σ_{τ s i l l}^{2}$	0.10	0.25
p	1.5	1.5

Table 3. The average, minimum, and maximum of estimated

{PM}_{2.5}

concentrations in Shanghai.

Table 3. The average, minimum, and maximum of estimated

{PM}_{2.5}

concentrations in Shanghai.

	Average	Minimum	Maximum
Spring	34.16 μg/m $^{3}$	16.53 μg/m $^{3}$	68.90 μg/m $^{3}$
Summer	18.69 μg/m $^{3}$	8.07 μg/m $^{3}$	38.20 μg/m $^{3}$
Autumn	25.10 μg/m $^{3}$	10.86 μg/m $^{3}$	55.57 μg/m $^{3}$
Winter	39.19 μg/m $^{3}$	17.28 μg/m $^{3}$	82.15 μg/m $^{3}$
Whole year	29.62 μg/m $^{3}$	10.86 μg/m $^{3}$	74.41 μg/m $^{3}$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhang, J.; Li, D.; Xia, Y.; Liao, Q. Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models. Mathematics 2022, 10, 2878. https://doi.org/10.3390/math10162878

AMA Style

Zhang J, Li D, Xia Y, Liao Q. Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models. Mathematics. 2022; 10(16):2878. https://doi.org/10.3390/math10162878

Chicago/Turabian Style

Zhang, Junbo, Daoji Li, Yingzhi Xia, and Qifeng Liao. 2022. "Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models" Mathematics 10, no. 16: 2878. https://doi.org/10.3390/math10162878

APA Style

Zhang, J., Li, D., Xia, Y., & Liao, Q. (2022). Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models. Mathematics, 10(16), 2878. https://doi.org/10.3390/math10162878

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models

Abstract

1. Introduction

2. Datasets

3. Methods

3.1. Bayesian Aerosol Retrieval Algorithm

3.1.1. Establishing Forward Models

3.1.2. Retrieve AOD Using Bayesian Methods

3.1.3. Combine AOD Retrievals with MAIAC AOD Products

3.2. Hierarchical Gaussian Process Model

4. Results and Discussion

4.1. Evaluation of the Model Performance

4.2. Spatial and Seasonal Variations of ${PM}_{2.5}$ Concentrations in Shanghai

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Bayesian Aerosol Retrieval-Based PM2.5 Estimation through Hierarchical Gaussian Process Models

Abstract

1. Introduction

2. Datasets

3. Methods

3.1. Bayesian Aerosol Retrieval Algorithm

3.1.1. Establishing Forward Models

3.1.2. Retrieve AOD Using Bayesian Methods

3.1.3. Combine AOD Retrievals with MAIAC AOD Products

3.2. Hierarchical Gaussian Process Model

4. Results and Discussion

4.1. Evaluation of the Model Performance

4.2. Spatial and Seasonal Variations of PM 2.5 Concentrations in Shanghai

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Bayesian Aerosol Retrieval-Based PM_2.5 Estimation through Hierarchical Gaussian Process Models

4.2. Spatial and Seasonal Variations of ${PM}_{2.5}$ Concentrations in Shanghai