Article

The Functional Spatio-Temporal Statistical Model with Application to O3 Pollution in Beijing, China

1
Guanghua School of Management, Peking University, Beijing 100871, China
2
School of Statistics, University of International Business and Economics, Beijing 100029, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Environ. Res. Public Health 2020, 17(9), 3172; https://doi.org/10.3390/ijerph17093172
Submission received: 28 March 2020 / Revised: 19 April 2020 / Accepted: 28 April 2020 / Published: 2 May 2020

Abstract

In recent years, with rapid industrialization and massive energy consumption, ground-level ozone (O3) has become one of the most severe air pollutants. In this paper, we propose a functional spatio-temporal statistical model to analyze air quality data. Firstly, since pollutant data from a monitoring network usually show strong spatial and temporal correlation, a spatio-temporal statistical model is a reasonable way to reveal the spatial correlation structure and the temporal dynamic mechanism in the data. Secondly, covariate effects are introduced to explore the formation mechanism of ozone pollution. Thirdly, considering the obvious diurnal pattern of ozone data, we explore the diurnal cycle of O3 pollution using the functional data analysis approach. In a comparison with other models, the proposed spatio-temporal model shows strong potential for application. Applying the model to O3 pollution data from 36 stations in Beijing, China, we explain the effects of covariates, such as other pollutants and meteorological variables, on ozone pollution, and we discuss the diurnal cycle of ozone pollution.

1. Introduction

As one of the major pollutants, ground-level ozone (O3) has received a lot of public attention. Many studies have shown that O3 can have detrimental effects on human health, including exacerbation of cardiovascular and respiratory dysfunction, and even premature mortality [1,2]. Additionally, tropospheric ozone, as a greenhouse gas, plays an important role in climate change, and further affects, for example, agricultural crop production [3,4]. In recent years, as a consequence of rapid industrialization and alarmingly increasing energy consumption, China has encountered severe air pollution [5,6,7,8]. In particular, ozone has become one of the serious and worsening pollutants in major areas of China, such as the Beijing–Tianjin–Hebei urban agglomeration and the Pearl River delta [9,10]. With a population of over 20 million, Beijing is one of the world’s largest mega cities. Due to coal burning, fugitive dust, and more recently a rapid increase in vehicular emissions, Beijing faces serious air pollution problems, and studies of photochemical ozone pollution in particular are attracting more and more attention [11,12].
The Chinese government has recognized the urgency of air quality assessment and emission control, and has built a large monitoring network since 2013. There are now over 1500 national pollution monitoring stations in over 300 cities. Hourly readings of air pollutants are regularly recorded and directly transferred to the China National Environmental Monitoring Center (CNEMC). The real-time observation and recording of air pollution data provide a solid basis for studying the dynamic changes of pollutants and their underlying causes. Air quality data are collected over space and time; thus, the amount of data is large, and the analysis is complex. One important and common statistical characteristic of such data is that observations that are close together (both in space and time) tend to be more alike than those far apart. Consequently, the assumption that spatio-temporal data follow the “independent and identically distributed” (iid) statistical paradigm should typically be avoided. Given the underlying spatio-temporal structure of the pollution data, a spatio-temporal statistical model, which simultaneously considers both the spatial covariance and the temporal dependence, is a sensible choice [13]. Moreover, O3 data show a clear diurnal cycle: the concentration peaks during the day and reaches a minimum at night. Since ozone data are sampled at a high frequency in time, they provide an overview of the daily cycle of pollutant concentrations.
A spatio-temporal statistical model is a powerful tool for revealing the spatial correlation structure and the temporal dynamic mechanism in data. Huang and Cressie (1996) [14] introduced a dynamic random field with a separable spatio-temporal covariance structure, which is widely used in the environmental field. When the spatio-temporal dependencies become complicated, the power of hierarchical statistical modeling (HM), which is capable of decomposing the sources of uncertainty in the data, becomes apparent. The strength of HM is well discussed in Cressie et al. [15]. Moreover, the daily pattern of ozone pollution needs more exploration. To do this, we divide the collection time into two parts, one related to intra-day fluctuations and the other related to inter-day changes. Geographic space is defined by latitude and longitude, the date is the third dimension, and the intra-day hour is regarded as the fourth dimension, which gives a four-dimensional representation of the data. In this way, the functional data analysis (FDA) approach [16] is used to model the intra-day variation of the measurement data, and the remaining dimensions are processed according to classic spatio-temporal data modeling. To summarize, in addition to the dynamic random field and the hierarchical modeling, the third building block is the functional representation of daily profiles of atmospheric pollution through a functional data analysis approach, which is the main innovation of the method.
In the present study, we propose a functional spatio-temporal statistical model, which is also a two-level hierarchical spatio-temporal model. A fruitful approach is based on the representation of random functional objects as linear combinations of basis functions with Gaussian random coefficients. This allows a functional model to be represented as a random components model and to inherit the related inferential machinery, e.g., Wood [17]. Model inference for the parameter estimates is implemented with the Kalman filter and the expectation–maximization (EM) algorithm [18,19]. In addition, an information matrix is obtained from the marginal likelihood function to measure the uncertainty of the model parameters [20]. The proposed model has the following advantages: (i) the dynamic random field is used to describe the spatio-temporal characteristics of air pollution emissions; (ii) covariate effects are incorporated to analyze the underlying formation mechanism of atmospheric pollutants; and (iii) as the main innovation, the functional data analysis approach is introduced to explore the daily pattern of pollutants. In the paper, we show the capability of the model by using O3 pollution data from 36 pollution monitoring stations in Beijing, China.
The paper is organized as follows. In Section 2, we describe the data in the study region, and introduce the Fourier basis functions, and the functional spatio-temporal statistical model, including the implementation of model estimation and cross-validation. In Section 3, we first show the selection of covariates and basis numbers. After comparing our model with others, we show the outstanding model capability, and finally give a comprehensive interpretation of the results. Conclusions are in Section 4.

2. Material and Methods

In this section, we first describe the data in the study region. Then, we introduce the Fourier basis function, and describe the functional spatio-temporal statistical model. In particular, model equations, model estimation, and cross-validation are discussed.

2.1. Data Description

The World Health Organization (WHO) sets a guideline of 100 μg/m³ for the maximum daily 8-h average exposure to ground-level O3; above this level, adverse impacts on human health may occur [21]. Considering the increasing public concern about ozone, we analyze the effects of other pollutants and meteorological variables on ozone pollution, and provide some insight into the diurnal cycle of O3, which peaks at mid-day and reaches its minimum at night.
In this study, we collect hourly concentrations of ground-level ozone in the spring, summer, and autumn of 2017 from thirty-six pollution monitoring stations in Beijing, China, which are directly managed by the Ministry of Environmental Protection (MEP). We also collect four other pollutants: particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), and carbon monoxide (CO). All of the pollutants are measured in μg/m³. The oxides of nitrogen (NOx) and volatile organic compounds (VOC) are known to be important precursors of ground-level ozone generation [22]. However, the components of VOC are not measured by the air quality monitoring network.
We also collect meteorological data: barometric pressure (PRES, in hectopascals), air temperature (TEMP, in degrees Celsius), dew point temperature (DEWP, in degrees Celsius), integrated rainfall (IRAIN, in millimeters), and integrated wind speed (IWS, in meters per second) from nine weather stations of the China Meteorological Administration (CMA). All the measurements are recorded hourly. We match air quality stations to meteorological stations by geodesic distance. Figure 1 displays the spatial locations of the air quality stations as red dots and the meteorological stations as blue triangles [23]. In addition to these meteorological variables, ultraviolet radiation is also a significant meteorological factor that influences O3 generation. Therefore, we download UVB data (in J/m²) with wavelengths between 200 and 440 nanometers from the European Centre for Medium-Range Weather Forecasts (ECMWF, https://cds.climate.copernicus.eu). The UVB data are provided over the study region on a 0.25° × 0.25° grid at hourly frequency. Since the UVB data vary greatly between day and night, we take their log-transform before adding them to the model. Note that the integrated wind speed and integrated rainfall are respectively calculated by:
$$IWS_t = \begin{cases} WS_t, & WD_t \neq WD_{t-1}, \\ IWS_{t-1} + WS_t, & WD_t = WD_{t-1}, \end{cases} \tag{1}$$
$$IRAIN_t = \begin{cases} RAIN_t, & RAIN_t = 0, \\ IRAIN_{t-1} + RAIN_t, & RAIN_t \neq 0. \end{cases} \tag{2}$$
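The two recursions can be sketched in a few lines of Python. This is only an illustration: it reads WS and WD as hourly wind speed and wind direction (the text does not define these symbols), treats the first hour as the start of a new run, and the function names are hypothetical.

```python
def integrated_wind_speed(ws, wd):
    """Accumulate wind speed while the wind direction stays unchanged."""
    iws = []
    for t, (speed, direction) in enumerate(zip(ws, wd)):
        if t > 0 and direction == wd[t - 1]:
            iws.append(iws[-1] + speed)  # same direction: keep accumulating
        else:
            iws.append(speed)            # direction changed (or first hour): restart
    return iws


def integrated_rainfall(rain):
    """Accumulate rainfall over consecutive wet hours; dry hours reset to zero."""
    irain = []
    for t, r in enumerate(rain):
        if t > 0 and r != 0:
            irain.append(irain[-1] + r)  # wet hour: add to the running total
        else:
            irain.append(r)              # dry hour (r == 0) or first hour
    return irain


# Made-up hourly values, for illustration only
iws_example = integrated_wind_speed([2.0, 3.0, 1.0, 4.0], ["N", "N", "E", "E"])
irain_example = integrated_rainfall([0.0, 1.5, 0.5, 0.0, 2.0])
```

With these inputs the wind series restarts when the direction switches from "N" to "E", and the rainfall series returns to zero in the dry fourth hour.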

2.2. Fourier Basis

The basic philosophy of functional data analysis is to think of observed data functions as single entities, rather than merely as a sequence of individual observations. In practice, functional data are usually observed and recorded discretely as $n$ pairs $(t_j, y_j)$, where $y_j$ is a snapshot of the function at time $t_j$, possibly blurred by measurement error. Time is so often the continuum over which functional data are recorded that we may slip into the habit of referring to $t_j$ as such, but other continua may certainly be involved, such as spatial position, frequency, weight, and so forth:
$$y_j = x(t_j) + \epsilon_j. \tag{3}$$
In functional data analysis, we need a strategy for constructing functions that balances model fit against complexity. We build a set of functions $\phi_k$, $k = 1, \ldots, K$, called basis functions, and define a function as their linear combination:
$$x(t) = \sum_{k=1}^{K} c_k \phi_k(t) = c^{\top} \phi(t), \tag{4}$$
which is the basis function expansion, where the parameters $c_k$, $k = 1, \ldots, K$, are the expansion coefficients to be estimated. In effect, basis expansion methods represent the potentially infinite-dimensional world of functions within the finite-dimensional framework of vectors like $c$. The functional data analysis is thereby simplified to multivariate data analysis.
The basis functions used for data modeling mostly belong to two categories: periodic and non-periodic. Most functional data analyses involve either a Fourier basis for periodic data, or a B-spline basis for non-periodic data. Since we are interested in the diurnal variations of ozone, we introduce the Fourier basis functions in detail. In order to express the repeated pattern in long-term sequences, the basis functions need to repeat with a certain period $T$. The well-known basis function expansion for periodic data provided by the Fourier series is:
$$\hat{x}(t) = c_0 + c_1 \sin(\omega t) + c_2 \cos(\omega t) + c_3 \sin(2 \omega t) + c_4 \cos(2 \omega t) + \cdots \tag{5}$$
where $\omega = 2\pi / T$. Defining a Fourier basis system requires two pieces of information: the number of basis functions $K$ and the period $T$. Figure 2 shows the Fourier basis system with $K = 5$ and $T = 1$. After the constant term, the Fourier basis functions are arranged in consecutive sine/cosine pairs.
For a preliminary analysis, we select the ozone data from one of the pollution stations, the Wanliu Monitoring Station in Haidian District, Beijing. The time span is one week, from 21 May 2017 to 27 May 2017. We capture the daily variation of the ozone data by using five Fourier basis functions. The mean square error (MSE) of the fitted residuals is 14.79 μg/m³. As shown in Figure 3, the predicted value at hour 24 matches the predicted value at hour 0 of the next day, which reflects the periodic nature of the daily cycle.
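The following sketch shows how a five-function Fourier basis with a 24-hour period could be built and fitted by ordinary least squares. It is an illustration under stated assumptions, not the authors' code: the hourly series is synthetic (the Wanliu data are not reproduced here), and `fourier_basis` is a hypothetical helper name.

```python
import numpy as np


def fourier_basis(t, n_basis, period):
    """Evaluate a Fourier basis (constant, then sin/cos pairs) at times t."""
    t = np.asarray(t, dtype=float)
    omega = 2.0 * np.pi / period
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(k * omega * t))
        if len(cols) < n_basis:
            cols.append(np.cos(k * omega * t))
        k += 1
    return np.column_stack(cols)


# Synthetic hourly series covering 7 days (assumed values, for illustration only)
rng = np.random.default_rng(0)
hours = np.arange(7 * 24)
y = 80 + 40 * np.sin(2 * np.pi * (hours - 9) / 24) + rng.normal(0, 5, hours.size)

# Least-squares estimate of the expansion coefficients c in x(t) = c' phi(t)
Phi = fourier_basis(hours % 24, n_basis=5, period=24)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
mse = np.mean((y - Phi @ coef) ** 2)

# Periodicity: the fitted curve takes the same value at hour 0 and hour 24
x0 = fourier_basis([0.0], 5, 24) @ coef
x24 = fourier_basis([24.0], 5, 24) @ coef
```

Because every basis function has period 24, `x0` and `x24` coincide regardless of the fitted coefficients, which is the property illustrated in Figure 3.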

2.3. Model Equation

Let $s = (s_{lat}, s_{lon})$ be a generic spatial location on the Earth’s sphere $\mathbb{S}^2$, with $n$ the number of sampled locations, let $t = 1, \ldots, T$ be the day index, and let the domain $\mathcal{H} = [h_1, h_2] \subset \mathbb{R}$ be the time within the day expressed in hours. The model for the ozone observations $O_3(s,t,h)$ is:
$$O_3(s,t,h) = x(s,t,h)^{\top} \beta(h) + \phi(h)^{\top} z(s,t) + \varepsilon(s,t,h), \tag{6}$$
$$z(s,t) = G z(s,t-1) + \eta(s,t). \tag{7}$$
This model is referred to as the functional dynamic spatio-temporal model. In Equation (6), $\varepsilon$ is a zero-mean Gaussian measurement error, independent in space and time, with functional variance $\sigma_{\varepsilon}^2(h)$, which implies that $\varepsilon$ is heteroskedastic across the domain $\mathcal{H}$. The variance is modeled as
$$\log\big(\sigma_{\varepsilon}^2(h)\big) = \phi(h)^{\top} c_{\varepsilon}, \tag{8}$$
where $\phi(h)$ is a $p \times 1$ vector of basis functions evaluated at $h$, while $c_{\varepsilon}$ is a vector of coefficients to be estimated. In Equation (6), $x(s,t,h)$ is a $b \times 1$ vector of covariates, while $\beta(h) = (\beta_1(h), \ldots, \beta_b(h))^{\top}$ is the vector of functional parameters modeled as
$$\beta_j(h) = \phi(h)^{\top} c_{\beta,j}, \tag{9}$$
and $c_{\beta} = (c_{\beta,1}^{\top}, \ldots, c_{\beta,b}^{\top})^{\top}$ is the $pb \times 1$ vector of coefficients to be estimated. Additionally, $z(s,t)$ is a $p \times 1$ latent space-time variable with the Markovian dynamics given in Equation (7). The matrix $G$ is a diagonal transition matrix with diagonal elements in the $p \times 1$ vector $g$. The vector $\eta$ is described by a multivariate Gaussian process, independent in time but correlated across space, with matrix spatial covariance function given by
$$\Gamma(s, s'; \theta) = \mathrm{diag}\big(v_1 \rho(s, s'; \theta_1), \ldots, v_p \rho(s, s'; \theta_p)\big), \tag{10}$$
where $v = (v_1, \ldots, v_p)^{\top}$ is the vector of scale coefficients, $\rho(s, s'; \theta_j)$ is a valid spatial correlation function for locations $s, s' \in \mathbb{S}^2$ parametrized by $\theta_j$, and $\theta = (\theta_1, \ldots, \theta_p)^{\top}$. The unknown model parameter vector is $\psi = (c_{\varepsilon}^{\top}, c_{\beta}^{\top}, g^{\top}, v^{\top}, \theta^{\top})^{\top}$.
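As a rough illustration of the covariance structure in Equation (10), the sketch below assembles the covariance matrix of the innovations at $n$ sites. Several choices here are assumptions rather than details from the paper: the exponential correlation $\rho(s, s'; \theta_j) = \exp(-d(s, s')/\theta_j)$, the haversine great-circle distance $d$ in kilometres, the site-by-site stacking of the $p$ components, and the coordinates and parameter values.

```python
import numpy as np


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


def innovation_covariance(lat, lon, v, theta):
    """Covariance of (eta(s_1,t)', ..., eta(s_n,t)')' under an assumed
    exponential correlation; components are independent of each other."""
    lat, lon = np.asarray(lat, float), np.asarray(lon, float)
    d = great_circle_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    n, p = lat.size, len(v)
    cov = np.zeros((n * p, n * p))
    for j in range(p):
        # rows/columns j, j+p, j+2p, ... hold component j at sites 1..n
        cov[j::p, j::p] = v[j] * np.exp(-d / theta[j])
    return cov


# Hypothetical coordinates and parameters, for illustration only
lat = [39.90, 40.00, 39.95]
lon = [116.40, 116.30, 116.50]
v = [2000.0, 500.0]
theta = [40.0, 60.0]
cov = innovation_covariance(lat, lon, v, theta)
```

The diagonal form of $\Gamma$ means that different components of $\eta$ are uncorrelated even at the same site, so the entry linking component 1 and component 2 at any pair of sites is zero.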
In Figure 4, we summarize the methodology. The main innovation is to incorporate the functional data analysis approach into the classic spatio-temporal statistical model, which facilitates exploring the intra-day fluctuations of ozone pollution as well as the functional effects of covariates. Note that, in order to ease the notation, the same $p$-dimensional basis functions $\phi(h)$ are used to model $\sigma_{\varepsilon}^2$, $\beta_j$, and $\phi(h)^{\top} z(s,t)$ in Equations (6) and (7). In the empirical analysis, we choose different numbers of basis functions for each of them according to model criteria such as the mean square error (MSE) and $R^2$ (see Section 2.5 for details).

2.4. Model Estimation

The estimation of $\psi$ and of the latent space-time variable $z(s,t)$ is based on the maximum likelihood approach and the Kalman filter. At a specific location $s_i$ and time $t$, $q$ measurements are taken at hour points $h = 1, 2, \ldots, q$ and collected in the vector
$$y_{s_i,t} = \big(O_3(s_i, t, 1), \ldots, O_3(s_i, t, q)\big)^{\top}, \tag{11}$$
where $q = 24$, as pollutants are recorded hourly. The daily profiles of ozone data observed at time $t$ across the spatial locations $S$ are then stored in the vector $y_t = (y_{s_1,t}^{\top}, \ldots, y_{s_n,t}^{\top})^{\top}$. Accordingly, Equations (6) and (7) are rewritten as
$$y_t = \tilde{X}_t c_{\beta} + \Phi_{z,t} z_t + \varepsilon_t, \tag{12}$$
$$z_t = \tilde{G} z_{t-1} + \eta_t, \tag{13}$$
where $\tilde{X}_t = X_t \Phi_{\beta,t}$ is an $nq \times bp$ matrix, with $X_t$ the matrix of covariates and $\Phi_{\beta,t}$ the basis matrix for $\beta$. $\Phi_{z,t}$ is the $nq \times np$ basis matrix for the latent $np \times 1$ vector $z_t = (z(s_1,t)^{\top}, \ldots, z(s_n,t)^{\top})^{\top}$, $\eta_t = (\eta(s_1,t)^{\top}, \ldots, \eta(s_n,t)^{\top})^{\top}$ is the $np \times 1$ innovation vector, and $\varepsilon_t$ is the $nq \times 1$ vector of measurement errors. Additionally, $\tilde{G} = G \otimes I_n$ is the $np \times np$ diagonal transition matrix.
The complete-data likelihood function $L(\psi; Y, Z)$ can be written as
$$L(\psi; Y, Z) = L(\psi_{z_0}; z_0) \prod_{t=1}^{T} L(\psi_y; y_t \mid z_t)\, L(\psi_z; z_t \mid z_{t-1}), \tag{14}$$
where $Y = (y_1, \ldots, y_T)$, $Z = (z_0, z_1, \ldots, z_T)$, $\psi_z = (g^{\top}, v^{\top}, \theta^{\top})^{\top}$, $\psi_y = (c_{\varepsilon}^{\top}, c_{\beta}^{\top})^{\top}$, and $z_0$ is the Gaussian initial vector with parameter $\psi_{z_0}$. The model parameter set $\psi$ is initialized with starting values $\psi^{(0)}$ and then updated at each iteration $\iota$ of the EM algorithm. The algorithm terminates when either of the following conditions is satisfied:
$$\big\| \psi^{(\iota)} - \psi^{(\iota-1)} \big\| \,/\, \big\| \psi^{(\iota)} \big\| < \epsilon, \tag{15}$$
$$\big| L(\psi^{(\iota)}; Y) - L(\psi^{(\iota-1)}; Y) \big| \,/\, \big| L(\psi^{(\iota)}; Y) \big| < \epsilon, \tag{16}$$
where $\| \cdot \|$ is the $l_2$ norm, $\psi^{(\iota)}$ is the parameter set at the $\iota$-th iteration, $L(\psi^{(\iota)}; Y)$ is the observed-data likelihood function evaluated at $\psi^{(\iota)}$, and $\epsilon$ is a small positive number (e.g., $10^{-3}$).
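The two stopping rules can be written as a small helper. This is a sketch with a hypothetical function name; the likelihood arguments are whatever scalar the EM loop tracks (in practice often the log-likelihood, which is an assumption here rather than something the text specifies).

```python
import math


def em_converged(psi_new, psi_old, lik_new, lik_old, tol=1e-3):
    """Return True if either relative-change criterion falls below tol."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(psi_new, psi_old)))
    norm = math.sqrt(sum(a ** 2 for a in psi_new))
    rel_param = diff / norm                       # relative change in parameters
    rel_lik = abs(lik_new - lik_old) / abs(lik_new)  # relative change in likelihood
    return rel_param < tol or rel_lik < tol
```

For example, a parameter vector that barely moves triggers the first rule even if the likelihood still changes, while a large parameter step with a nearly flat likelihood triggers the second.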
The EM algorithm provides a point estimate of the parameter vector $\psi$, but without uncertainty information. Note that $Y$ is a vector of dimension $N = nqT$. Generally speaking, inverting the full variance–covariance matrix of the $N$-dimensional data vector $Y$ has a computational complexity of order $O(N^3)$, which is clearly infeasible. Thanks to the state-space representation of the model, we instead estimate the variance–covariance matrix $\hat{\Sigma}_{\psi} = V(\psi \mid Y)$ from the marginal likelihood, which may be used for model selection and inference.
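To make the role of the state-space form concrete, here is a minimal Kalman filter that evaluates the Gaussian log-likelihood of a linear state-space model one time step at a time, so that only matrices of the size of a single day are ever inverted. It is a generic textbook recursion under simplifying assumptions (time-invariant matrices, covariate term already subtracted from the observations), not the estimation routine used in the paper; all names are hypothetical.

```python
import numpy as np


def kalman_loglik(y, Phi, G, Q, R, z0, P0):
    """Log-likelihood of y_1..y_T for
         y_t = Phi z_t + eps_t,      eps_t ~ N(0, R)
         z_t = G z_{t-1} + eta_t,    eta_t ~ N(0, Q)
       with z_0 ~ N(z0, P0). Returns (loglik, filtered mean, filtered cov)."""
    z, P = z0.copy(), P0.copy()
    loglik = 0.0
    for y_t in y:
        # Prediction step
        z = G @ z
        P = G @ P @ G.T + Q
        # Innovation and its covariance
        e = y_t - Phi @ z
        S = Phi @ P @ Phi.T + R
        S_inv_e = np.linalg.solve(S, e)
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (e.size * np.log(2 * np.pi) + logdet + e @ S_inv_e)
        # Update step; gain K = P Phi' S^{-1} (P and S are symmetric)
        K = np.linalg.solve(S, Phi @ P).T
        z = z + K @ e
        P = P - K @ Phi @ P
    return loglik, z, P


# Tiny scalar example with made-up numbers
y_demo = np.array([[0.3], [-1.1]])
ll_demo, z_demo, P_demo = kalman_loglik(
    y_demo, np.eye(1), np.array([[0.5]]), np.eye(1), np.eye(1),
    np.zeros(1), np.eye(1))
```

Each pass through the loop costs on the order of the cube of the daily dimension rather than the cube of $N = nqT$, which is the computational saving the paragraph above refers to.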

2.5. Cross-Validation

We implement cross-validation by partitioning the original spatial locations $S$ into two subsets, $S_{est}$ and $S_{xval}$. Data related to $S_{est}$ are used for model estimation, while data related to $S_{xval}$ are used for cross-validation. The cross-validation mean squared errors are then computed by
$$MSE_s = \frac{1}{B} \sum_{t=1}^{T} \sum_{h \in h_{s,t}} \big( O_3(s,h,t) - \hat{O}_3(s,h,t) \big)^2, \tag{17}$$
where $\hat{O}_3(s,h,t) = E_{\hat{\psi}}\big(O_3(s,h,t) \mid Y\big)$ is the prediction of the ozone data at the cross-validation stations, and $B$ is the number of terms in the sum. We also obtain the cross-validation $R^2$ with respect to station $s$:
$$R_s^2 = 1 - \frac{MSE_s}{\mathrm{VAR}\{O_3(s,h,t);\ t, h\}}. \tag{18}$$
The choice of the numbers of basis functions is essential for model estimation. Here, based on the cross-validated mean square error and other model criteria, we choose reasonable numbers of basis functions to estimate $\sigma_{\varepsilon}^2$, $\beta_j$, and $\phi(h)^{\top} z(s,t)$, respectively. After implementing leave-one-station-out cross-validation, we take the averages $\overline{MSE}$ and $\overline{R^2}$ as our criteria:
$$\overline{MSE} = \frac{1}{n} \sum_{i=1}^{n} MSE_{s_i}, \tag{19}$$
$$\overline{R^2} = \frac{1}{n} \sum_{i=1}^{n} R_{s_i}^2. \tag{20}$$
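The cross-validation criteria above reduce to a few array operations. The sketch below assumes that observations and predictions for a held-out station are stored as day-by-hour arrays, that missing hours are marked with NaN, and that the variance in Equation (18) is the population variance (divisor $B$); the function names are hypothetical.

```python
import numpy as np


def station_cv_metrics(obs, pred):
    """MSE_s and R_s^2 for one held-out station.
    obs, pred: arrays of shape (T, q); NaN in obs marks a missing hour."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    mask = ~np.isnan(obs)
    resid = obs[mask] - pred[mask]
    mse = float(np.mean(resid ** 2))          # (1/B) * sum of squared errors
    r2 = 1.0 - mse / float(np.var(obs[mask]))
    return mse, r2


def average_cv_metrics(obs_by_station, pred_by_station):
    """Average MSE and R^2 over the held-out stations."""
    metrics = [station_cv_metrics(o, p)
               for o, p in zip(obs_by_station, pred_by_station)]
    mse, r2 = zip(*metrics)
    return float(np.mean(mse)), float(np.mean(r2))
```

A perfect prediction gives $MSE_s = 0$ and $R_s^2 = 1$, while predicting the station mean everywhere gives $R_s^2 = 0$, which is a convenient sanity check for any implementation.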

3. Analysis of O 3 Pollution in Beijing

In this section, we first show the selection of covariates and of the numbers of basis functions with application to the ozone data in Beijing, and then focus on the summertime modeling. By comparing our proposed model with other models, we show the advantage of the functional spatio-temporal statistical model. Finally, we present the model results and interpret the parameter estimates, especially the functional effects of the covariates.

3.1. Selection of Covariates and Basis Numbers

In what follows, we select the covariates in $x(s,t,h)$ by using the Akaike information criterion (AIC). Table 1 displays the results of forward selection based on AIC, which means starting with no covariates and iteratively adding the covariate that contributes most. For instance, in the summertime modeling, at the beginning (Iter 0), we select the variable NO2, which results in the best model performance with maximum AIC. Then, at the next iteration (Iter 1), the variable particulate matter (PM10) is further selected. Table 1 shows that the importance of the covariates varies among seasons, but the most important variables are SO2, NO2, and PM10.
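The greedy procedure can be sketched as follows. To keep the example self-contained, ordinary least squares with Gaussian errors stands in for the full spatio-temporal model, and the conventional definition AIC = 2k − 2 log L (smaller is better) is used, which may differ in sign convention from the values tabulated in the paper. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def gaussian_aic(y, X):
    """AIC = 2k - 2 log L for an OLS fit with Gaussian errors."""
    n = y.size
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                       # coefficients plus error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik


def forward_selection(y, covariates):
    """At each iteration, add the covariate that gives the best (lowest) AIC.
    Returns the list of (name, AIC) pairs in order of selection."""
    history = []
    remaining = list(covariates)
    X = np.ones((y.size, 1))                 # start from an intercept-only model
    while remaining:
        scores = {name: gaussian_aic(y, np.column_stack([X, covariates[name]]))
                  for name in remaining}
        best = min(scores, key=scores.get)
        history.append((best, scores[best]))
        remaining.remove(best)
        X = np.column_stack([X, covariates[best]])
    return history


# Synthetic example: "a" and "b" drive y, "c" is pure noise
rng = np.random.default_rng(1)
a, b, c = rng.normal(size=(3, 500))
y_sim = 3 * a - 2 * b + rng.normal(size=500)
history = forward_selection(y_sim, {"a": a, "b": b, "c": c})
```

Inspecting `history` shows the criterion improving sharply while informative covariates are added and flattening afterwards, which is the pattern used to decide where to stop.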
Ozone concentrations display a significant seasonal pattern, being high in summer and moderate in winter [24,25]. Therefore, we focus on the analysis of O3 pollution in summer. Figure 5 shows the maximum AIC at each iteration for the summertime modeling.
The improvement in model AIC is no longer significant after five iterations; therefore, we take the optimal subset of covariates to be NO2, PM10, SO2, TEMP, IRAIN, and UVB. Hence, the measurement equation for the ozone data is
$$\begin{aligned} O_3(s,h,t) ={}& \beta_0(h) + x_{NO_2}(s,h,t)\,\beta_{NO_2}(h) + x_{PM_{10}}(s,h,t)\,\beta_{PM_{10}}(h) + x_{SO_2}(s,h,t)\,\beta_{SO_2}(h) \\ &+ x_{TEMP}(s,h,t)\,\beta_{TEMP}(h) + x_{IRAIN}(s,h,t)\,\beta_{IRAIN}(h) + x_{UVB}(s,h,t)\,\beta_{UVB}(h) \\ &+ \phi(h)^{\top} z(s,t) + \varepsilon(s,h,t), \end{aligned} \tag{21}$$
where data are available at $h = 1, \ldots, 24$, $s \in \{s_1, \ldots, s_{36}\}$, and $t = 1, \ldots, 92$. Moreover, because the time of day is circular, Fourier basis functions are adopted. This implies that $\beta_j(h)$ and $\sigma_{\varepsilon}^2(h)$ are periodic functions. For different combinations of the numbers of basis functions, the model criteria $\overline{MSE}$ and $\overline{R^2}$ of Section 2.5 are obtained and shown in Table 2.
Table 2 shows that increasing the number of basis functions for estimating $\phi(h)^{\top} z(s,t)$ significantly reduces $\overline{MSE}$. When the number of basis functions increases from 5 to 7, $\overline{MSE}$ is reduced more than when it increases from 7 to 9. Considering the heavy computational burden, we choose seven basis functions to estimate the latent component $\phi(h)^{\top} z(s,t)$. By contrast, increasing the number of basis functions for the variance $\sigma_{\varepsilon}^2(h)$ of the residual $\varepsilon(s,h,t)$ does not significantly reduce $\overline{MSE}$, but it helps to improve the AIC. We find that an increase from 3 to 5 improves the AIC, but the improvement becomes very minor from 5 to 7. Thus, we choose five basis functions to estimate the variance $\sigma_{\varepsilon}^2(h)$. Finally, we choose five basis functions to estimate the covariate effects $\beta_j(h)$, considering the trade-off between model interpretability and over-fitting. Based on the analysis above, the numbers of basis functions for $\beta_j(h)$, $\sigma_{\varepsilon}^2(h)$, and $\phi(h)^{\top} z(s,t)$ are chosen to be 5, 5, and 7, respectively.

3.2. Model Comparison

In this section, we compare five models, namely Equations (22), (23), (24), (25), and (26). Equation (22) is an ordinary regression model; Equation (23) is a regression model with functional $\beta(h)$ estimates; Equation (24) introduces the latent spatio-temporal variable $z(s,t)$ to characterize the spatio-temporal correlation; Equation (25) is a simplified version of the proposed functional spatio-temporal statistical model in which $\beta(h) \equiv \beta$ and $\sigma_{\varepsilon}^2(h) \equiv \sigma_{\varepsilon}^2$; Equation (26) is the functional spatio-temporal statistical model:
$$O_3 = X \beta + \epsilon \tag{22}$$
$$O_3(h) = X(h)\, \beta(h) + \epsilon(h) \tag{23}$$
$$\begin{aligned} O_3(s,t) &= X(s,t)\, \beta + z(s,t) + \epsilon(s,t), \\ z(s,t) &= G z(s,t-1) + \eta(s,t) \end{aligned} \tag{24}$$
$$\begin{aligned} O_3(s,h,t) &= X(s,h,t)\, \beta + \phi(h)^{\top} z(s,t) + \epsilon(s,h,t), \\ z(s,t) &= G z(s,t-1) + \eta(s,t) \end{aligned} \tag{25}$$
$$\begin{aligned} O_3(s,h,t) &= X(s,h,t)\, \beta(h) + \phi(h)^{\top} z(s,t) + \epsilon(s,h,t), \\ z(s,t) &= G z(s,t-1) + \eta(s,t) \end{aligned} \tag{26}$$
As in the selection of the numbers of basis functions, the averages $\overline{MSE}$ and $\overline{R^2}$, together with the AIC, are used to assess model performance. As shown in Table 3, our model, Equation (26), is the best of the five models on all three criteria. Equation (23) is much improved over Equation (22) in terms of $\overline{MSE}$ and $\overline{R^2}$, which means a better model forecast in general. Benefiting from the latent spatio-temporal variable $z(s,t)$, Equation (24) has a clear advantage over the ordinary regression models, achieving a much smaller $\overline{MSE}$ and much larger $\overline{R^2}$ and AIC. Equation (25) introduces the functional data analysis approach, and characterizes the latent component as a linear combination of the basis functions and the latent random spatio-temporal variable $z(s,t)$. Although the AIC increases only slightly, a smaller $\overline{MSE}$ and a larger $\overline{R^2}$ are achieved. Finally, when Equation (26) adds the functional covariate effects $\beta(h)$ and the functional variance of the residuals $\sigma_{\epsilon}^2(h)$, $\overline{MSE}$ and $\overline{R^2}$ are not improved much. However, the AIC is further improved, which reflects the richer interpretation of the covariates and the flexibility of the residual variance.
In Equation (26), firstly, the latent variable $z(s,t)$ captures the spatial correlation through the range parameter $\theta$ and the variance parameter $v$, which shows that an average standard deviation of 48 μg/m³ of the ozone data is explained by $z(s,t)$ (see Table 5). Secondly, the functional $\beta(h)$ shows that the covariate effects are both significant and nonlinear, and the functional representation thus reveals the complicated formation of ozone pollution (see Figure 6). In summary, the hierarchical spatio-temporal statistical model, combined with the functional data analysis approach, contributes to the high $\overline{R^2}$.

3.3. Model Result

Figure 6 shows the estimated $\beta(h)$ and $\sigma_{\epsilon}^2(h)$ for the model in Equation (21). Thanks to the Fourier basis functions, the estimate at the end of the day matches that at the beginning of the next day. Since, in general, the confidence bands of the estimated $\beta(h)$ may contain zero, it is useful to test the significance of the covariates. The $\chi^2$ tests are introduced as follows:
$$\beta_j(h) = \phi(h)^{\top} c_{\beta,j}, \qquad c_{\beta,j} \sim N\big(0, \hat{\Sigma}_{c_{\beta,j}}\big). \tag{27}$$
Thus, $c_{\beta,j}^{\top}\, \hat{\Sigma}_{c_{\beta,j}}^{-1}\, c_{\beta,j} \sim \chi^2\big(\mathrm{rank}(\hat{\Sigma}_{c_{\beta,j}})\big)$. In Figure 6, the coefficient of IRAIN fluctuates around zero. The results of the $\chi^2$ tests for the significance of the covariates are reported in Table 4, and indicate that the effect of the variable IRAIN is not jointly significant.
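The joint test amounts to a quadratic form compared against a $\chi^2$ reference distribution. The sketch below computes the statistic and its degrees of freedom; the coefficient vector and covariance matrix are made-up numbers, the function name is hypothetical, and a pseudo-inverse is used so that a rank-deficient covariance estimate is handled consistently with the rank-based degrees of freedom (an implementation choice, not something stated in the text).

```python
import numpy as np


def joint_significance_stat(c_hat, cov_hat):
    """Return (statistic, dof) for H0: c_{beta,j} = 0,
    with statistic = c' Sigma^{-1} c and dof = rank(Sigma)."""
    c_hat = np.asarray(c_hat, float)
    cov_hat = np.asarray(cov_hat, float)
    stat = float(c_hat @ np.linalg.pinv(cov_hat) @ c_hat)
    dof = int(np.linalg.matrix_rank(cov_hat))
    return stat, dof


# Made-up estimates for one covariate with five basis coefficients
c_demo = [2.0, -1.0, 0.5, 0.0, 0.3]
cov_demo = 0.25 * np.eye(5)
stat_demo, dof_demo = joint_significance_stat(c_demo, cov_demo)

# The 5% critical value of a chi-square distribution with 5 degrees of
# freedom is about 11.07, so this illustrative covariate would be significant
reject_demo = stat_demo > 11.07
```

A covariate whose coefficient curve hovers around zero, as described for IRAIN, would instead yield a small statistic and fail to exceed the critical value.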
Therefore, excluding the IRAIN variable gives the final model equation:
$$\begin{aligned} O_3(s,h,t) ={}& \beta_0(h) + x_{NO_2}(s,h,t)\,\beta_{NO_2}(h) + x_{PM_{10}}(s,h,t)\,\beta_{PM_{10}}(h) + x_{SO_2}(s,h,t)\,\beta_{SO_2}(h) \\ &+ x_{TEMP}(s,h,t)\,\beta_{TEMP}(h) + x_{UVB}(s,h,t)\,\beta_{UVB}(h) + \phi(h)^{\top} z(s,t) + \varepsilon(s,h,t). \end{aligned} \tag{28}$$
In Table 5, we show the estimates and standard deviations of the parameters relevant to the latent spatio-temporal variable $z(s,t)$, namely the transition coefficient $g$, the range parameter $\theta$, and the variance vector $v$. Most estimates of the $g$ parameter are positive, and their absolute values are all less than one, which guarantees the stability of the 7-variate spatio-temporal process $z(s,t)$. Compared with the geodesic distance across Beijing (around 50 km), the values of the $\theta$ parameter, ranging from 31.92 km to 63.12 km, indicate a strong spatial correlation within the city. The average estimate of $v$ is around 2313, which corresponds to a standard deviation of about 48 μg/m³ ($\sqrt{2313} \approx 48.1$), and shows that the latent variable $z(s,t)$ accounts for a much larger proportion of the original O3 variance than the unexplained term $\sigma_{\epsilon}^2(h)$. Hence, the latent spatio-temporal variable $z(s,t)$ is the main source of the advantage of the proposed model.
Finally, in Figure 7, we show the estimated $\beta(h)$ and $\sigma_{\epsilon}^2(h)$. The last panel is the plot of the functional variance $\sigma_{\epsilon}^2(h)$, which represents the unexplained portion of the O3 variance. The plot shows that the model explains the data better during the daytime [26].
As shown in Figure 7, the coefficient curves of TEMP, UVB, and PM10 are similar: they increase from the early morning, attain their peak at 12:00 p.m.–2:00 p.m., and then fall. Focusing on the daytime, we see that the three curves are consistent with the trend of temperature (or UVB), which implies that the relationship between ozone and temperature (or UVB) might be quadratic [27], or that there may be interactions between temperature and UVB; that is, ozone concentrations may depend on TEMP², UVB², or TEMP × UVB. The coefficients of TEMP and UVB in the daytime are positive, which is consistent with existing research [28]. The coefficient of PM10, by contrast, is negative at 5:00 a.m.–10:00 a.m. and positive during other time periods. The positive correlation between PM10 and ozone may be caused by their common sources, secondary nature, and interactions of their precursors [29], and the negative correlation could be explained by PM’s consumption of hydroperoxy (HO2) radicals, which would otherwise react with NO to generate ozone [30]. Furthermore, the positive correlation is strongest at 3:00 p.m., the time at which the ozone concentration attains its maximum.
In addition, the coefficient curves of NO2 and SO2 both have two spikes, but the coefficient of NO2 is negative whereas that of SO2 is positive. The negative relationship between NO2 and ozone is consistent with the results of many studies [31,32], and the positive correlation between SO2 and ozone could be explained by their common dependence on meteorology [33]. The strongest correlation between NO2 and ozone in the daytime appears at about 11:00 a.m., and the weakest correlation appears at 5:00 a.m. and 6:00 p.m. In contrast, the correlation between SO2 and ozone is strongest at 9:00 a.m. and 8:00 p.m., that is, approximately at the end of the morning and evening rush hours in Beijing, respectively, and it is weakest at 3:00 p.m.

4. Conclusions

In this paper, we propose a functional spatio-temporal statistical method to analyze air quality data, and explore the mechanism of pollution formation.
  • The method has several advantages. First, as a hierarchical spatio-temporal statistical model, it is flexible enough to handle latent variables while capturing spatio-temporal dynamics. Second, the proposed model takes covariates into consideration, and is therefore effective in uncovering how chemical reactions and meteorological factors relate to the formation of O3 pollution. Third, in the framework of spatio-temporal models, we are the first to explore the intra-day variation of ozone through the functional data analysis approach, which is the most innovative part of the model.
  • The model makes the following contributions. First of all, our model outperforms the other models on the criteria considered, as shown in Section 3.2. Second, the latent spatio-temporal variable $z(s,t)$ captures the temporal dynamics and spatial structure of the ozone data well. Third, from the functional effects of the covariates, we explore the possible effects of air pollutants and meteorological variables on ozone.
  • Our model is flexible enough to model any kind of data with a spatio-temporal structure; therefore, apart from the environment, it can be applied in many fields, such as economics and agriculture. The functional data analysis approach in the functional spatio-temporal model is not restricted to modeling the daily pattern of the data, and it gives us more capability to explore the nature of the data of interest.

Author Contributions

Conceptualization, Y.W. and K.X.; Methodology, Y.W. and K.X.; Software, Y.W. and K.X.; Validation, Y.W. and K.X.; Writing, Y.W. and K.X.; Writing-Review & Editing, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China’s National Key Research Special Program (Grant 2016YFC0207701). Ke Xu is supported by the Fundamental Research Funds for the Central Universities at UIBE (No. 19QD22).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lippmann, M. Health effects of ozone: A critical review. Japca 1989, 39, 672–695.
  2. Bell, M.L.; McDermott, A.; Zeger, S.L.; Samet, J.M.; Dominici, F. Ozone and short-term mortality in 95 US urban communities, 1987–2000. JAMA 2004, 292, 2372–2378.
  3. Ashmore, M.R.; Marshall, F. Ozone impacts on agriculture: An issue of global concern. In Advances in Botanical Research; Elsevier: Amsterdam, The Netherlands, 1998; Volume 29, pp. 31–52.
  4. Simpson, D.; Arneth, A.; Mills, G.; Solberg, S.; Uddling, J. Ozone—The persistent menace; interactions with the N cycle and climate change. Curr. Opin. Environ. Sustain. 2014, 9, 9–19.
  5. Chan, C.K.; Yao, X. Air pollution in mega cities in China. Atmos. Environ. 2008, 42, 1–42.
  6. Hu, H.; Yang, Q.; Lu, X.; Wang, W.; Wang, S.; Fan, M. Air pollution and control in different areas of China. Crit. Rev. Environ. Sci. Technol. 2010, 40, 452–518.
  7. Liang, X.; Zou, T.; Guo, B.; Li, S.; Zhang, H.; Zhang, S.; Huang, H.; Chen, S.X. Assessing Beijing’s PM2.5 pollution: Severity, weather impact, APEC and winter heating. Proc. R. Soc. A 2015, 471, 20150257.
  8. Lin, J.; Zhang, A.; Chen, W.; Lin, M. Estimates of Daily PM2.5 Exposure in Beijing Using Spatio-Temporal Kriging Model. Sustainability 2018, 10, 2772.
  9. Wang, T.; Xue, L.; Brimblecombe, P.; Lam, Y.F.; Li, L.; Zhang, L. Ozone pollution in China: A review of concentrations, meteorological influences, chemical precursors, and effects. Sci. Total Environ. 2017, 575, 1582–1596.
  10. Li, J.; Lu, K.; Lv, W.; Li, J.; Zhong, L.; Ou, Y.; Chen, D.; Huang, X.; Zhang, Y. Fast increasing of surface ozone concentrations in Pearl River Delta characterized by a regional air quality monitoring network during 2006–2011. J. Environ. Sci. China 2014, 26, 23–36.
  11. Xue, L.K.; Wang, T.; Gao, J.; Ding, A.J.; Zhou, X.H.; Blake, D.R.; Wang, X.F.; Saunders, S.M.; Fan, S.J.; Zuo, H.C.; et al. Ground-level ozone in four Chinese cities: Precursors, regional transport and heterogeneous processes. Atmos. Chem. Phys. 2014, 14, 13175–13188.
  12. Wang, W.N.; Cheng, T.H.; Gu, X.F.; Chen, H.; Guo, H.; Wang, Y.; Bao, F.W.; Shi, S.Y.; Xu, B.R.; Zuo, X.; et al. Assessing Spatial and Temporal Patterns of Observed Ground-level Ozone in China. Sci. Rep. 2017, 7, 3651.
  13. Cressie, N.; Wikle, C.K. Statistics for Spatio-Temporal Data; John Wiley & Sons: Hoboken, NJ, USA, 2015.
  14. Huang, H.C.; Cressie, N. Spatio-temporal prediction of snow water equivalent using the Kalman filter. Comput. Stat. Data Anal. 1996, 22, 159–175.
  15. Cressie, N.; Calder, C.A.; Clark, J.S.; Hoef, J.M.V.; Wikle, C.K. Accounting for uncertainty in ecological analysis: The strengths and limitations of hierarchical statistical modeling. Ecol. Appl. 2009, 19, 553–570.
  16. Ramsay, J.O.; Silverman, B.W. Applied Functional Data Analysis: Methods and Case Studies; Springer: Berlin, Germany, 2007.
  17. Wood, S.N. Generalized Additive Models: An Introduction with R; CRC Press: Boca Raton, FL, USA, 2017.
  18. Kalman, R.E. A new approach to linear filtering and prediction problems. J. Basic Eng. 1960, 35–45.
  19. Krishnan, T.; McLachlan, G. The EM algorithm and extensions. Wiley 1997, 1, 58–60.
  20. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; Springer: Berlin, Germany, 2017.
  21. WHO Regional Office for Europe. Air Quality Guidelines Global Update 2005: Particulate Matter, Ozone, Nitrogen Dioxide and Sulfur Dioxide; World Health Organization: Geneva, Switzerland, 2006.
  22. Sillman, S. The relation between ozone, NOx and hydrocarbons in urban and polluted rural environments. Atmos. Environ. 1999, 33, 1821–1845.
  23. Kahle, D.; Wickham, H. Ggmap: Spatial Visualization with ggplot2. R J. 2013, 5, 144–161.
  24. Pf, C.; Q, Z.; Jn, Q.; Y, G.; My, H. Temporal and Spatial Distribution of Ozone Concentration by Aircraft Sounding over Beijing. Environ. Sci. 2012, 33, 4141.
  25. Dufour, G.; Eremenko, M.; Orphal, J.; Flaud, J.M. IASI observations of seasonal and day-to-day variations of tropospheric ozone over three highly populated areas of China: Beijing, Shanghai, and Hong Kong. Atmos. Chem. Phys. 2010, 10, 3787–3801.
  26. Dohan, J.; Masschelein, W. The photochemical generation of ozone: Present state-of-the-art. Ozone Sci. Eng. 1987, 315–334.
  27. Belan, B.D.; Savkin, D.E.; Tolmachev, G.N. Air-Temperature Dependence of the Ozone Generation Rate in the Surface Air Layer. Atmos. Ocean. Opt. 2018, 31, 187–196.
  28. Awang, N.R.; Ramli, N.A.; Yahaya, A.S.; Elbayoumi, M. Multivariate methods to predict ground level ozone during daytime, nighttime, and critical conversion time in urban areas. Atmos. Pollut. Res. 2015, 6, 726–734.
  29. Lamarque, J.; Kiehl, J.T.; Hess, P.G.; Collins, W.D.; Emmons, L.K.; Ginoux, P.; Luo, C.; Tie, X. Response of a coupled chemistry-climate model to changes in aerosol emissions: Global impact on the hydrological cycle and the tropospheric burdens of OH, ozone, and NOx. Geophys. Res. Lett. 2005, 32.
  30. Li, K.; Jacob, D.J.; Liao, H.; Zhu, J.; Shah, V.; Shen, L.; Bates, K.H.; Zhang, Q.; Zhai, S. A two-pollutant strategy for improving ozone and particulate air quality in China. Nat. Geosci. 2019, 12, 906–910.
  31. Chou, C.C.K.; Liu, S.C.; Lin, C.Y.; Shiu, C.J.; Chang, K.H. The trend of surface ozone in Taipei, Taiwan, and its causes: Implications for ozone control strategies. Atmos. Environ. 2006, 40, 3898–3908.
  32. Geng, F.; Tie, X.; Xu, J.; Zhou, G.; Peng, L.; Gao, W.; Tang, X.; Zhao, C. Characterizations of ozone, NOx, and VOCs measured in Shanghai, China. Atmos. Environ. 2008, 42, 6873–6883.
  33. National Research Council. Rethinking the Ozone Problem in Urban and Regional Air Pollution; National Academies Press: Washington, DC, USA, 1992.

Figure 1. Thirty-six air quality monitoring stations (red dots) and nine meteorological stations (blue triangles).
Figure 2. Fourier basis function system with $K = 5$ and $T = 1$.
Figure 3. Ozone data fitting using five Fourier basis functions.
Figure 4. Methodology summary.
Figure 5. Improvement of AIC at each iteration for summertime modeling.
Figure 6. Estimated $\beta_{cons}(hour)$, $\beta_{PM10}(hour)$, $\beta_{SO2}(hour)$, $\beta_{NO2}(hour)$, $\beta_{TEMP}(hour)$, $\beta_{IRAIN}(hour)$, $\beta_{UVB}(hour)$ and $\sigma_\epsilon^2(hour)$, with 90%, 95%, and 99% confidence bands.
Figure 7. Estimated $\beta_{cons}(hour)$, $\beta_{PM10}(hour)$, $\beta_{SO2}(hour)$, $\beta_{NO2}(hour)$, $\beta_{TEMP}(hour)$, $\beta_{UVB}(hour)$ and $\sigma_\epsilon^2(hour)$, with 90%, 95%, and 99% confidence bands.

Table 1. The selection of model covariates according to AIC, by iteration.

| Season | Iter 0 | Iter 1 | Iter 2 | Iter 3 | Iter 4 | Iter 5 | Iter 6 | Iter 7 | Iter 8 | Iter 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| Spring | NO2 | SO2 | PM10 | CO | PRES | UVB | Iws | IRAIN | TEMP | DEWP |
| Summer | NO2 | PM10 | SO2 | TEMP | IRAIN | UVB | DEWP | PRES | CO | Iws |
| Autumn | NO2 | SO2 | TEMP | DEWP | PM10 | UVB | CO | Iws | PRES | IRAIN |
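
Table 2. Criteria $\overline{MSE}$, $\overline{R^2}$, and AIC under different numbers of Fourier basis functions.

| $\phi(h)z(s,t)$ | $\beta(h)$ | $\sigma_\varepsilon^2$ | $\overline{MSE}$ | $\overline{R^2}$ | AIC |
|---|---|---|---|---|---|
| 5 | 3 | 3 | 357.58 | 0.9206 | −255,385 |
| 5 | 3 | 5 | 356.32 | 0.9209 | −254,607 |
| 5 | 3 | 7 | 356.47 | 0.9208 | −254,599 |
| 5 | 5 | 3 | 352.61 | 0.9215 | −254,235 |
| 5 | 5 | 5 | 352.25 | 0.9216 | −253,459 |
| 5 | 5 | 7 | 352.39 | 0.9215 | −253,454 |
| 5 | 7 | 3 | 352.14 | 0.9215 | −254,116 |
| 5 | 7 | 5 | 351.62 | 0.9217 | −253,309 |
| 5 | 7 | 7 | 351.76 | 0.9217 | −253,306 |
| 7 | 3 | 3 | 332.95 | 0.9259 | −249,514 |
| 7 | 3 | 5 | 331.88 | 0.9261 | −248,723 |
| 7 | 3 | 7 | 331.99 | 0.9261 | −248,713 |
| 7 | 5 | 3 | 330.06 | 0.9264 | −248,848 |
| 7 | 5 | 5 | 329.80 | 0.9264 | −248,066 |
| 7 | 5 | 7 | 329.90 | 0.9264 | −248,062 |
| 7 | 7 | 3 | 329.09 | 0.9266 | −248,656 |
| 7 | 7 | 5 | 328.97 | 0.9266 | −247,879 |
| 7 | 7 | 7 | 329.05 | 0.9266 | −247,876 |
| 9 | 3 | 3 | 324.08 | 0.9278 | −246,673 |
| 9 | 3 | 5 | 323.13 | 0.9280 | −245,937 |
| 9 | 3 | 7 | 323.19 | 0.9280 | −245,928 |
| 9 | 5 | 3 | 322.07 | 0.9281 | −246,056 |
| 9 | 5 | 5 | 321.80 | 0.9282 | −245,327 |
| 9 | 5 | 7 | 321.86 | 0.9282 | −245,322 |
| 9 | 7 | 3 | 321.28 | 0.9283 | −245,879 |
| 9 | 7 | 5 | 321.12 | 0.9283 | −245,152 |
| 9 | 7 | 7 | 321.13 | 0.9283 | −245,150 |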
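
Table 3. $\overline{MSE}$, $\overline{R^2}$, and AIC for the five models. The columns $\beta$, $\phi(h)z(s,t)$, and $\sigma_\epsilon$ give the number of basis functions; the remaining columns are model criteria.

| Model | $\beta$ | $\phi(h)z(s,t)$ | $\sigma_\epsilon$ | $\overline{MSE}$ | $\overline{R^2}$ | AIC | logL ¹ | Npar ² |
|---|---|---|---|---|---|---|---|---|
| Equation (22) | 0 | 0 | 0 | 1880.54 | 0.5863 | −414,714 | −414,700 | 7 |
| Equation (23) | 5 | 0 | 0 | 1171.55 | 0.7423 | −395,874 | −395,812 | 31 |
| Equation (24) | 0 | 0 | 0 | 552.7 | 0.879 | −256,960 | −256,940 | 10 |
| Equation (25) | 0 | 7 | 0 | 336.88 | 0.925 | −252,426 | −252,370 | 28 |
| Equation (26) | 5 | 7 | 0 | 329.8 | 0.9264 | −248,066 | −247,954 | 56 |

¹ Log likelihood. ² Number of parameters.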
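
The criteria in Table 3 can be cross-checked with a few lines of arithmetic. The sketch below transcribes the AIC, logL, and Npar columns and tests whether they satisfy AIC = logL − 2·Npar. That relation is inferred from the reported numbers, not stated in this section, and it differs in sign orientation from the more common −2·logL + 2·Npar form; under it, larger AIC values would indicate the preferred model. The function and variable names are hypothetical.

```python
# (AIC, logL, Npar) transcribed from Table 3.
table3 = {
    "Equation (22)": (-414714, -414700, 7),
    "Equation (23)": (-395874, -395812, 31),
    "Equation (24)": (-256960, -256940, 10),
    "Equation (25)": (-252426, -252370, 28),
    "Equation (26)": (-248066, -247954, 56),
}


def aic_from_loglik(loglik, n_params):
    # Assumed convention, inferred from the table: AIC = logL - 2 * Npar.
    return loglik - 2 * n_params


# Models whose reported AIC does not match the assumed relation.
mismatches = [
    name
    for name, (aic, loglik, n_params) in table3.items()
    if aic_from_loglik(loglik, n_params) != aic
]

# Under the assumed "larger is better" orientation, pick the top model.
best_model = max(table3, key=lambda name: table3[name][0])

print("rows inconsistent with AIC = logL - 2*Npar:", mismatches)
print("model with the largest AIC:", best_model)
```

All five rows reproduce the reported AIC under this relation, and the full model of Equation (26) has the largest value, which agrees with it also having the smallest $\overline{MSE}$ and largest $\overline{R^2}$ in the table.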

Table 4. $\chi^2$ tests for significance of fixed effects.

| Covariate | $\chi^2$ Statistic | p-Value |
|---|---|---|
| Cons | 282.77 | 0 |
| PM10 | 2114.06 | 0 |
| SO2 | 1048.50 | 0 |
| NO2 | 29,032.23 | 0 |
| TEMP | 5554.16 | 0 |
| IRAIN | 0.91 | 0.96 |
| UVB | 30,934 | 0 |

Table 5. Estimates and standard errors of the parameters $g$, $\theta$, and $v$.

| Basis | Transition $g$: Est | Transition $g$: Std.err | $\theta$ [km]: Est | $\theta$ [km]: Std.err | Variance $v$: Est | Variance $v$: Std.err |
|---|---|---|---|---|---|---|
| Basis 1 | 0.739 | 0.018 | 63.12 | 4.57 | 8422.14 | 549.12 |
| Basis 2 | 0.229 | 0.026 | 50.94 | 0.96 | 3799.47 | 176.33 |
| Basis 3 | 0.179 | 0.03 | 36.98 | 1.02 | 2027.63 | 106.59 |
| Basis 4 | 0.034 | 0.032 | 36.34 | 0.54 | 896.86 | 50.61 |
| Basis 5 | 0.106 | 0.034 | 39.75 | 0.84 | 702.64 | 41.65 |
| Basis 6 | 0.043 | 0.043 | 31.92 | 0.87 | 191.09 | 13.53 |
| Basis 7 | −0.210 | 0.042 | 37.10 | 0.35 | 151.80 | 10.78 |
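
As a rough reading aid for Table 5, the sketch below forms approximate 95% Wald intervals (estimate ± 1.96 × standard error) for the transition parameter $g$ of each basis and lists those whose interval contains zero. The normal approximation and the 1.96 multiplier are assumptions introduced here for illustration; the paper may assess significance differently, and the names used are hypothetical.

```python
Z_95 = 1.96  # assumed normal quantile for an approximate 95% interval

# (estimate, standard error) of the transition parameter g, from Table 5.
transition_g = {
    "Basis 1": (0.739, 0.018),
    "Basis 2": (0.229, 0.026),
    "Basis 3": (0.179, 0.03),
    "Basis 4": (0.034, 0.032),
    "Basis 5": (0.106, 0.034),
    "Basis 6": (0.043, 0.043),
    "Basis 7": (-0.210, 0.042),
}


def wald_interval(estimate, std_err, z=Z_95):
    """Return (lower, upper) bounds of estimate +/- z * std_err."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width


# Bases whose approximate interval for g includes zero.
interval_contains_zero = []
for name, (estimate, std_err) in transition_g.items():
    lower, upper = wald_interval(estimate, std_err)
    if lower <= 0.0 <= upper:
        interval_contains_zero.append(name)
    print(f"{name}: g in [{lower:.3f}, {upper:.3f}]")

print("intervals containing zero:", interval_contains_zero)
```

Under this approximation, only the intervals for Basis 4 and Basis 6 include zero, while Basis 1 shows by far the strongest temporal persistence.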

Share and Cite

Chicago/Turabian Style

Wang, Yaqiong, Ke Xu, and Shaomin Li. 2020. "The Functional Spatio-Temporal Statistical Model with Application to O3 Pollution in Beijing, China" International Journal of Environmental Research and Public Health 17, no. 9: 3172. https://doi.org/10.3390/ijerph17093172
