1. Introduction
Ozone concentrations, measured here as the daily maximum 8 h moving averages of hourly ozone concentration data in micrograms per cubic meter (μg/m³), are key indicators of air quality. Monitoring their changes over space and time is important for assessing changes in air quality, which strongly affect our environment, society and economy. However, modeling ozone concentrations is not easy because they vary over space and time with complicated spatial structures, temporal structures and spatio-temporal interactions. Missing data make the task harder still. As commented in [
1], although we cannot escape the “curse of dimensionality”, we can take advantage of recent developments in computing speed and numerical advances (e.g., Markov chain Monte Carlo) that allow us to implement Bayesian spatio-temporal dynamical models in a hierarchical framework. Such a framework provides simple strategies for incorporating complicated spatio-temporal interactions at different stages of the model hierarchy, and the resulting models can feasibly be implemented for high-dimensional data. Two popular hierarchical Bayesian spatio-temporal models can be found in [
1,
2], among others. The latter one was used in [
3].
Ref. [
3] studied the ozone concentrations within
to
longitude and
to
latitude around the Pittsburgh region (
,
), in which all of the monitoring stations have missing data. That paper dealt with the missing-data problem in two steps. First, it filled in some of the missing measurements using linear models so that the pattern of missing data became monotone (a monotone missing-data pattern is also referred to as a staircase pattern). Second, it applied the hierarchical Bayesian spatio-temporal (HBST) modeling proposed in [2] to this staircase of missing data to estimate the hyperparameters of the spatio-temporal model. Based on the estimated hyperparameters, it estimated the spatial correlation function for the monitoring stations. It then estimated the covariance matrix for all of the stations and derived the predictive distribution for the ungauged sites.
Generalized linear models can be used to accommodate non-Gaussian geostatistical data (e.g., see [
4]). Ref. [
3] selected the generalized linear model with the quasi-Poisson family as an appropriate spatial correlation function by examining a plot of the spatial correlations obtained via the hierarchical model. However, their link function is not appropriate if there are negative correlations. This is a strong restriction because negative correlations are common for ozone concentrations and other spatio-temporal data. Moreover, choosing a model by examining plots of the observed data set is not rigorous and may be suitable only for that particular data set.
In this paper, we propose a method to estimate the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields, based on the estimates from hierarchical Bayesian spatio-temporal modeling. For demonstration, we apply the proposed method to the same data as in Jin et al. [3]. Because it makes no assumption on the correlation structure, the proposed method is more general than the method in [3] and is applicable to other spatio-temporal data sets. When the covariance matrix estimated by the proposed method is used with the entropy criterion in the environmental network design problem, the selected ungauged sites are more reasonably located, in that they are not clustered in one corner of the region. We compare the two methods through leave-one-out cross-validation, which shows that the proposed method gives improved results.
The paper is arranged as follows. In
Section 2, we briefly introduce hierarchical Bayesian spatio-temporal modeling. In
Section 3, we describe the ozone concentrations in the Pittsburgh region and apply the hierarchical Bayesian spatio-temporal modeling techniques for filling in missing measurements following [
3]. In
Section 4, we model the ozone concentrations in the Pittsburgh region. We first introduce the method for estimating the covariance matrix through a dimension expansion method for modeling the semivariograms in nonstationary fields, and we then give spatial predictive distributions on the ungauged sites using the covariance matrix estimated by the proposed method. In
Section 5, we present the results of the entropy of the predictive distributions and an optimality criterion for extending an environmental network. In
Section 6, we evaluate the model through leave-one-out cross-validation. We conclude in
Section 7.
Throughout the rest of the paper, the -norm of a vector is denoted by , a identity matrix is denoted by , the transpose of a matrix A is denoted by and the trace of a square matrix B is denoted by . In addition, ‘⊗’ represents the Kronecker product, refers to a matrix Gaussian distribution, denotes a matric-t distribution, stands for the inverted Wishart distribution (see (a) of the appendix for definitions of these distributions) and denotes the generalized inverted Wishart distribution.
2. Hierarchical Bayesian Spatio-Temporal Modeling
We briefly describe HBST modeling in this section. Excluding Step 3 of the HBST modeling procedure, it is the same as the modeling given in [3] and is a special case of the HBST modeling presented in Chapter 10 of Le and Zidek (2006).
We use the following notation:
d = number of station types (e.g., agricultural, residential, commercial and industrial);
n = number of time points (e.g., number of days);
u = number of locations with no monitors (i.e., ungauged sites);
g = number of locations with monitors (i.e., gauged sites).
The stations are organized into
k blocks where the
(
) sites in the
jth block have the same number of timepoints
at which no measurements are taken. These blocks are numbered so that the measurements correspond to a monotone data pattern or a staircase structure, that is,
The response variables are written as
Here,
of dimension
denotes the unobserved responses at ungauged sites while
of dimension
is given by
where
is an
matrix of missing measurements at the
gauged sites for the
time points and
is an
matrix of observed measurements at the
gauged sites for the
time points.
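The block structure described above can be illustrated with a small sketch. It assumes, purely for illustration, that each station's missing measurements occur at the start of its record and that blocks are ordered from the most to the fewest missing time points; the station names, values and the helper names `leading_missing` and `staircase_blocks` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def leading_missing(series):
    """Count the missing values (None) at the start of one station's series."""
    count = 0
    for value in series:
        if value is not None:
            break
        count += 1
    return count


def staircase_blocks(data):
    """Group stations into blocks with equal numbers of leading missing
    time points, ordered from most to fewest missing (an assumed ordering)."""
    groups = defaultdict(list)
    for station, series in data.items():
        m = leading_missing(series)
        # A monotone pattern allows no gaps after the first observed value.
        if any(v is None for v in series[m:]):
            raise ValueError(f"station {station!r} is not monotone")
        groups[m].append(station)
    return [(m, sorted(groups[m])) for m in sorted(groups, reverse=True)]


# Hypothetical toy data: 5 time points, None marks a missing measurement.
toy = {
    "A": [None, None, None, 41.0, 39.5],
    "B": [None, 52.1, 50.3, 48.8, 47.0],
    "C": [None, None, None, 44.2, 40.1],
    "D": [60.2, 58.9, 57.4, 55.0, 54.1],
}
blocks = staircase_blocks(toy)
print(blocks)
```

Each tuple pairs a number of missing time points with the stations that share it, so stations "A" and "C" form one block. A record with a gap after its first observation raises an error, which corresponds to the case where some measurements must first be filled in before the pattern becomes monotone.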
We assume that the response matrix
Y follows the Gaussian and generalized inverted Wishart model specified by
where
B is an
coefficient matrix with the hyperparameter mean matrix
and the variance components
,
X is the matrix of covariates which is defined in (
4) and
is a set of model parameters specified below.
We partition
B corresponding to the
l time-varying covariates in conformance with the block structure as
By assuming an exchangeable structure across sites, B can be written as , where is the hyperparameter matrix and with for Station j under Class i and otherwise.
Likewise, we partition the
covariance matrix
over gauged and ungauged sites conformably as
where
is a
matrix for the ungauged sites. Further, we partition the
covariance matrix
for the gauged site blocks as follows:
Similarly, for
, we put
We reparametrize the matrix
through the recursive one-to-one Bartlett transformation for the two blocks:
where
and
. Similarly, by applying the Bartlett decomposition, we can represent the submatrix
, for
, as
where
and for
,
with
Therefore, the GIW prior distribution for
in (
1) is equivalently defined in terms of
and
,
as follows:
where
is the slope of the optimal linear predictor of
based on
and
is the residual covariance of the optimal linear predictor. Similar interpretations can be applied to
and
, for
Let
be the set of the hyperparameters in (
1) and (
2), i.e.,
, where
,
with degrees of freedom parameters
. Write
. Here,
, which represents the hyperparameters involved in the marginal distribution of
.
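As a rough numerical illustration of the Bartlett reparametrization for two blocks, the sketch below splits a covariance matrix into the residual covariance of the optimal linear predictor, the slope of that predictor and the gauged-site covariance, and then rebuilds the original matrix to show that the mapping is one-to-one. The matrix values, the choice of one ungauged and two gauged sites, and all function names are assumptions made for this example only.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


def bartlett(sigma, u):
    """Map sigma (first u rows/columns ungauged) to (gamma, tau, sigma_gg)."""
    s_uu = [row[:u] for row in sigma[:u]]
    s_ug = [row[u:] for row in sigma[:u]]
    s_gg = [row[u:] for row in sigma[u:]]
    tau = solve(s_gg, transpose(s_ug))  # slope: sigma_gg^{-1} sigma_gu
    fitted = matmul(s_ug, tau)          # sigma_ug sigma_gg^{-1} sigma_gu
    gamma = [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(s_uu, fitted)]
    return gamma, tau, s_gg


def inverse_bartlett(gamma, tau, s_gg):
    """Rebuild sigma from (gamma, tau, sigma_gg)."""
    s_gu = matmul(s_gg, tau)
    s_ug = transpose(s_gu)
    s_uu = [[g + f for g, f in zip(rg, rf)]
            for rg, rf in zip(gamma, matmul(s_ug, tau))]
    return [ru + rg for ru, rg in zip(s_uu, s_ug)] + \
           [rgu + rgg for rgu, rgg in zip(s_gu, s_gg)]


# Hypothetical covariance: site 0 ungauged, sites 1 and 2 gauged.
sigma = [[4.0, 1.0, -0.5],
         [1.0, 2.0, 0.3],
         [-0.5, 0.3, 1.0]]
gamma, tau, s_gg = bartlett(sigma, u=1)
rebuilt = inverse_bartlett(gamma, tau, s_gg)
print(gamma, tau)
```

The residual covariance is smaller than the marginal variance of the ungauged site, as expected when the gauged sites carry information about it, and `rebuilt` agrees with `sigma` up to rounding.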
If the data matrix has an ascending staircase pattern, the HBST modeling procedure is as follows:
- Step 1.
Compute the hyperparameter values that maximize the marginal distribution
using an empirical Bayesian approach (see (b) of
Appendix A). The EM algorithm is used to obtain
.
- Step 2.
Obtain the predictive distributions
of missing measurements as in (c) of
Appendix A. Fill in the missing data by using the predictive distributions.
- Step 3.
Obtain the estimate
from the estimate of
. In terms of
, obtain the estimate of the covariance matrix by using a dimension expansion method given in Qin et al. [
5] and the thin-plate spline method given in Wahba and Wendelberger (1980). The details are given in
Section 4.1.
- Step 4.
Estimate the hyperparameters
and obtain the conditional predictive distribution
(see
Section 4.2).
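To make the input to Step 3 more concrete, the following sketch converts an estimated covariance matrix among gauged sites into empirical semivariogram values using the standard identity that the semivariogram is half the variance of the difference between two sites. It is only an illustration of that identity, not the dimension expansion method itself; the covariance values and site coordinates are hypothetical.

```python
import math


def semivariogram_from_cov(cov):
    """Return gamma with gamma[i][j] = (c_ii + c_jj) / 2 - c_ij,
    which equals half of Var(Y_i - Y_j)."""
    n = len(cov)
    return [[0.5 * (cov[i][i] + cov[j][j]) - cov[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical covariance estimate for three gauged sites; note the
# negative covariance between sites 0 and 2.
cov = [[4.0, 1.2, -0.8],
       [1.2, 3.0, 0.5],
       [-0.8, 0.5, 2.0]]
# Hypothetical planar coordinates of the three sites.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

gamma = semivariogram_from_cov(cov)
# (distance, semivariogram) pairs, the usual raw material for fitting
# a semivariogram model.
pairs = [(math.dist(coords[i], coords[j]), gamma[i][j])
         for i in range(3) for j in range(i + 1, 3)]
print(pairs)
```

The semivariogram values remain non-negative even when a covariance is negative, which is one reason a semivariogram-based approach can accommodate negative correlations.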
5. Environmental Network Extension
Assume that
Y has the density function
f. The total reduction in uncertainty of
Y can be represented by the entropy of its distribution, i.e., $H(Y) = E[-\log(f(Y)/h(Y))]$, where $h$ is a not necessarily integrable reference density (Jaynes [
11]). According to the predictive distribution (
7), the total entropy
can be defined as
where
is a constant depending on the degree of freedom and the dimension of the ungauged sites.
The key step in expanding an environmental network is to find the ungauged sites that, when added to the existing network, maximize the corresponding entropy. We use the following optimality criterion as given in [
3]:
The
sites, in a vector of dimension
, are selected to maximize the entropy in (
8). In [
3], the grid points
were selected with the highest entropy 11.3774. The proposed method selects the grid points
with entropy 12.1207. This selection is more reasonable because the selected points are not clustered in the southeast corner of the region, as those selected in [3] are.
The sites selected among the 100 grid points by the two methods are shown in
Figure 4 below.
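The site selection step can be sketched as an exhaustive search. The sketch assumes, as a simplification, that apart from an additive constant the entropy of a candidate set is half the log-determinant of the corresponding submatrix of the predictive covariance; the covariance values, the number of candidate sites and the function names are hypothetical.

```python
import math
from itertools import combinations


def log_det(matrix):
    """Log-determinant of a symmetric positive definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def best_subset(cov, size):
    """Search all subsets of the given size for the largest entropy score."""
    best_sites, best_score = None, -math.inf
    for sites in combinations(range(len(cov)), size):
        sub = [[cov[i][j] for j in sites] for i in sites]
        score = 0.5 * log_det(sub)  # entropy up to an additive constant
        if score > best_score:
            best_sites, best_score = sites, score
    return best_sites, best_score


# Hypothetical predictive covariance among four candidate ungauged sites.
cov = [[2.0, 1.8, 0.2, 0.1],
       [1.8, 2.0, 0.3, 0.2],
       [0.2, 0.3, 1.5, -0.4],
       [0.1, 0.2, -0.4, 1.0]]
sites, score = best_subset(cov, size=2)
print(sites, score)
```

In this toy case the two highly correlated sites 0 and 1 are never chosen together, since adding a site that is nearly redundant with one already selected contributes little entropy. With 100 grid points an exhaustive search over larger subsets grows quickly, so a greedy or exchange-type search may be preferable in practice.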
6. Model Evaluation
In this section, we use leave-one-out cross-validation to evaluate the accuracy of the predictive model derived using the proposed method and to compare the proposed method with the one in [3]. We select the observations from one of the original 25 stations as validation data and treat the observations from the remaining 24 stations as training data. We use the data from day 855 to day 1586, at the end of the study, from each station to evaluate the prediction because none of the stations has missing data during this period. Choosing this period avoids using the Bayesian hierarchical modeling technique to estimate missing data in the training set, which is time-consuming and is not the focus when evaluating the proposed covariance estimation method. Station 22 is excluded because it is the only industrial station in the study. For each of the remaining 24 stations, we generate 100 samples from the predictive distribution with parameters estimated using observations from the other 23 stations. We compute the average of relative absolute bias (ARAB) as
, where
is the
jth sample generated from the predictive distributions and
is the observation from Station
i at time
t. The results are given in
Table 2.
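A minimal sketch of how such a summary might be computed for one held-out station is given below. It assumes that ARAB averages the absolute difference between each predictive sample and the observation, relative to the observation, over all time points and samples; this form is an assumption made for illustration, and the function name and numbers are hypothetical.

```python
def arab(samples, observed):
    """Average relative absolute bias for one held-out station.

    samples[t][j] is the j-th predictive draw at time t and observed[t]
    is the corresponding observation (assumed to be nonzero).
    """
    total, count = 0.0, 0
    for draws, y in zip(samples, observed):
        for draw in draws:
            total += abs(draw - y) / abs(y)
            count += 1
    return total / count


# Hypothetical observations at two time points with two draws each.
observed = [50.0, 40.0]
samples = [[55.0, 45.0], [44.0, 40.0]]
value = arab(samples, observed)
print(value)
```

In a leave-one-out evaluation this function would be called once per held-out station, with the draws generated from the predictive distribution fitted to the remaining stations.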
In
Table 2, “-” means that there is no prediction for the station because there are negative correlations and the method in [
3] is not applicable to estimate the predictive distribution. The results in
Table 2 show that the proposed method provides slightly more accurate predictions than the one in [
3] for most of the stations. More importantly, when the estimates from the hierarchical Bayesian spatio-temporal modeling technique contain negative correlations, the method in [3] fails to estimate the covariance matrix, while the proposed method still provides accurate predictions except for Station 3. This is expected because Station 3 is an influential station: holding out its observations as the validation data set greatly changes the estimated covariance matrix.