1. Introduction
The occurrence of extreme events such as extreme temperatures, excess flood peaks, and rapid increases in pollutant concentrations has steadily increased over the past decade. However, the volume of data generated by such events is small compared with the data generated by normal events that occur daily. Consequently, providing accurate predictions for extreme events can be challenging because of the insufficient number of past records. Bayesian methods may be more suitable for modeling small samples with skewness or lack of symmetry, as in the case of record values, because they do not rely on asymptotic theory in the way that frequentist methods do, provided that a reasonable prior distribution is chosen.
The concept of record values was introduced by Chandler [1] and can be described as follows. Let $\{X_1, X_2, \ldots\}$ be a sequence of independent and identically distributed random variables. Then, for every $i < j$, $X_j$ is called the upper (lower) record value if $X_j > (<)\, X_i$, which indicates that $X_j$ is higher (lower) than all previous observations. That is, the upper (lower) record values are the members of a series that are larger (smaller) than all preceding members. The indices at which upper record values occur are given by the record times $\{U(k), k \ge 1\}$, where $U(k) = \min\{j : j > U(k-1),\ X_j > X_{U(k-1)}\}$, $k > 1$, with $U(1) = 1$. The record times for the lower record values are $\{L(k), k \ge 1\}$, where $L(k) = \min\{j : j > L(k-1),\ X_j < X_{L(k-1)}\}$, with $L(1) = 1$. Therefore, the sequences of the upper and lower record values are denoted as $\{X_{U(k)}, k = 1, 2, \ldots\}$ and $\{X_{L(k)}, k = 1, 2, \ldots\}$, respectively, from the original sequence $\{X_1, X_2, \ldots\}$.
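The definition of record times and record values can be illustrated with a minimal Python sketch. The function names `records`, `lower_records`, and `upper_records` are hypothetical helpers introduced here for illustration only; record times are 1-based, so the first observation is always a record.

```python
from typing import Callable, List, Sequence, Tuple


def records(xs: Sequence[float],
            beats: Callable[[float, float], bool]) -> Tuple[List[int], List[float]]:
    """Return (record times, record values) of xs.

    `beats(new, current)` decides whether `new` breaks the current record.
    Record times are 1-based indices into the original sequence.
    """
    times: List[int] = []
    values: List[float] = []
    for j, x in enumerate(xs, start=1):
        # The first observation is a record by convention (record time 1).
        if not values or beats(x, values[-1]):
            times.append(j)
            values.append(x)
    return times, values


def lower_records(xs: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Lower records: each value is strictly smaller than all earlier ones."""
    return records(xs, lambda new, cur: new < cur)


def upper_records(xs: Sequence[float]) -> Tuple[List[int], List[float]]:
    """Upper records: each value is strictly larger than all earlier ones."""
    return records(xs, lambda new, cur: new > cur)
```

For a strictly decreasing series, every observation is a lower record, so the lower record times coincide with the serial numbers of the series.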
Since record values arise in many real-world situations related to climate, economics, sports, and life testing, relevant studies have been conducted in various fields. Coles and Tawn [2] analyzed a daily rainfall series to model the extremes of the rainfall process in the context of record values. Madi and Raqab [3] analyzed average temperatures in Neuenburg, Switzerland, using a Bayesian predictive method for record values from the Pareto distribution. Wergen et al. [4] analyzed both the probability of occurrence and the probability density function (PDF) of record-breaking temperatures in Europe and the United States. Seo and Kim [5] proposed an objective Bayesian inference method for record values from the Gumbel distribution, which was applied to the analysis of sulfur dioxide concentrations. Seo and Kang [6] proposed an estimation method using record values from an exponentiated half-logistic distribution from Bayesian and non-Bayesian perspectives. The authors demonstrated the efficiency of their method by comparing it with existing estimation methods through an analysis of rainfall data.
This paper proposes a predictive method based on an objective Bayesian approach, which avoids the effort of eliciting an exact prior distribution when sufficient prior information is unavailable, for record values from the exponentiated Gumbel distribution (EGD) with cumulative distribution function (CDF)
$$F(x) = \left[\exp\left(-e^{-x/\sigma}\right)\right]^{\lambda}, \quad -\infty < x < \infty, \quad \sigma, \lambda > 0, \tag{1}$$
where $\sigma$ and $\lambda$ denote the scale and shape parameters, respectively. The EGD is a generalized version of the Gumbel distribution (GD), which is the most widely applied statistical distribution in the extreme value analysis of extreme events such as global warming, floods, heavy rainfall, and high wind speeds. The EGD can be considered as being simply the $\lambda$th power of the CDF of the GD with the scale parameter $\sigma$. Therefore, the EGD can offer improved performance and wider applicability than the GD for models built on a variety of complex datasets. Note that time-series techniques can be applied if no data are lost while acquiring the record values, since the lower record time $L(k)$ is then the serial number of the record values in an infinite time series. For comparison with the proposed objective Bayesian predictive method, two types of time-series models are considered in this study: the autoregressive integrated moving average (ARIMA) model introduced by Box et al. [7] and the dynamic linear model (DLM) developed by West and Harrison [8].
The rest of the paper is organized as follows.
Section 2 presents objective priors for unknown parameters of the EGD in the context of record statistics values, along with the corresponding posterior analysis and predictive method.
Section 3 provides a brief description of the ARIMA and DLM that are employed in this study as benchmarks to validate the proposed objective Bayesian method.
Section 4 presents the results of applying both the time-series and objective Bayesian models to real data.
Section 5 concludes this paper.
2. Bayesian Prediction
The aim of this study is to predict future lower record values based on an objective Bayesian approach. To accomplish this, an objective Bayesian approach that does not require determining hyperparameters is presented first. The following subsection introduces objective priors based on the Fisher information (FI) matrix for unknown parameters of the EGD with the CDF (1).
2.1. Objective Prior
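Let $X_{L(1)}, \ldots, X_{L(k)}$ be the lower record values from the CDF (1). Then, the corresponding likelihood function and its natural logarithm can be expressed as
$$L(\lambda, \sigma) = f\left(x_{L(k)}\right) \prod_{i=1}^{k-1} \frac{f\left(x_{L(i)}\right)}{F\left(x_{L(i)}\right)} = \left(\frac{\lambda}{\sigma}\right)^{k} \exp\left(-\frac{1}{\sigma}\sum_{i=1}^{k} x_{L(i)} - \lambda e^{-x_{L(k)}/\sigma}\right)$$
and
$$\log L(\lambda, \sigma) = k \log \lambda - k \log \sigma - \frac{1}{\sigma}\sum_{i=1}^{k} x_{L(i)} - \lambda e^{-x_{L(k)}/\sigma}, \tag{2}$$
respectively. For computational convenience, let $\theta = 1/\sigma$. Then, based on the log-likelihood (2), the FI matrix for $(\lambda, \theta)$ can be defined as follows.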
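Proposition 1. The FI matrix for $(\lambda, \theta)$ is of the form
$$I(\lambda, \theta) = k \begin{pmatrix} \dfrac{1}{\lambda^{2}} & -\dfrac{A}{\lambda\theta} \\[2ex] -\dfrac{A}{\lambda\theta} & \dfrac{1 + A^{2} + \psi'(k+1)}{\theta^{2}} \end{pmatrix}, \tag{3}$$
where $A = \log\lambda - \psi(k+1)$ and $\psi(\cdot)$ and $\psi'(\cdot)$ are the digamma and trigamma functions, respectively.
Proof. In the FI matrix
$$I(\lambda, \theta) = -E\begin{pmatrix} \dfrac{\partial^{2}}{\partial\lambda^{2}} \log L & \dfrac{\partial^{2}}{\partial\lambda\,\partial\theta} \log L \\[2ex] \dfrac{\partial^{2}}{\partial\theta\,\partial\lambda} \log L & \dfrac{\partial^{2}}{\partial\theta^{2}} \log L \end{pmatrix} = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix},$$
the element $I_{11} = k/\lambda^{2}$ can be easily computed, while the other elements can be expressed as
$$I_{12} = I_{21} = -E\left[X_{L(k)}\, e^{-\theta X_{L(k)}}\right]$$
and
$$I_{22} = \frac{k}{\theta^{2}} + \lambda\, E\left[X_{L(k)}^{2}\, e^{-\theta X_{L(k)}}\right].$$
Then, the proof is completed given the marginal density function of $X_{L(k)}$ defined in Ahsanullah [9] as
$$f_{X_{L(k)}}(x) = \frac{1}{\Gamma(k)} \left[-\log F(x)\right]^{k-1} f(x)$$
and letting $t = \lambda e^{-\theta x}$. □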
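The objective priors such as the Jeffreys and reference priors based on the FI matrix (3) are defined according to the following theorems.
Theorem 1. The Jeffreys prior for $(\lambda, \theta)$ is
$$\pi_J(\lambda, \theta) \propto \frac{1}{\lambda\theta}.$$
Proof. According to the definition of the Jeffreys prior (Jeffreys [10]), it follows that
$$\pi_J(\lambda, \theta) \propto \sqrt{|I(\lambda, \theta)|} = \frac{k\sqrt{1 + \psi'(k+1)}}{\lambda\theta},$$
where $|I(\lambda, \theta)|$ denotes the determinant of the FI matrix (3). This completes the proof. □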
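In the following, the reference priors for each parameter of interest are derived using the algorithm provided in Berger and Bernardo [11]. Here, $I_{11}$ and $I_{22}$ denote the diagonal elements of the FI matrix (3), and $|I|$ denotes its determinant.
Theorem 2. If λ is the parameter of interest, the reference prior for $(\lambda, \theta)$ is
$$\pi_R(\lambda, \theta) \propto \frac{1}{\lambda\theta\sqrt{1 + \psi'(k+1) + \{\log\lambda - \psi(k+1)\}^{2}}}$$
and, if θ is the parameter of interest, the reference prior for $(\lambda, \theta)$ is
$$\pi(\lambda, \theta) \propto \frac{1}{\lambda\theta}.$$
Proof. When $\lambda$ is the parameter of interest, the conditional prior distribution of $\theta$ given $\lambda$ can be defined based on the FI matrix (3) as
$$\pi(\theta \mid \lambda) \propto \sqrt{I_{22}} \propto \frac{1}{\theta}.$$
Then, by choosing a sequence of compact sets $\Omega_i = (a_i, b_i) \times (c_i, d_i)$ for $(\lambda, \theta)$ such that $a_i, c_i \to 0$ and $b_i, d_i \to \infty$ as $i \to \infty$, it follows that
$$K_i(\lambda)^{-1} = \int_{c_i}^{d_i} \frac{1}{\theta}\, d\theta = \log d_i - \log c_i$$
and
$$p_i(\theta \mid \lambda) = K_i(\lambda)\, \pi(\theta \mid \lambda) = \frac{1}{\theta\left(\log d_i - \log c_i\right)}. \tag{4}$$
In addition, the marginal reference prior for $\lambda$ can be defined based on the FI matrix (3) and (4) as
$$\pi_i(\lambda) = \exp\left\{\frac{1}{2}\int_{c_i}^{d_i} p_i(\theta \mid \lambda) \log\frac{|I|}{I_{22}}\, d\theta\right\} \propto \frac{1}{\lambda\sqrt{1 + \psi'(k+1) + \{\log\lambda - \psi(k+1)\}^{2}}},$$
which leads to the following reference prior:
$$\pi_R(\lambda, \theta) = \lim_{i \to \infty} \frac{K_i(\lambda)\, \pi_i(\lambda)}{K_i(\lambda_0)\, \pi_i(\lambda_0)}\, \pi(\theta \mid \lambda) \propto \frac{1}{\lambda\theta\sqrt{1 + \psi'(k+1) + \{\log\lambda - \psi(k+1)\}^{2}}}$$
for any fixed point $\lambda_0$. When $\theta$ is the parameter of interest, the same argument is applied with $\pi(\lambda \mid \theta) \propto \sqrt{I_{11}} \propto 1/\lambda$ and $p_i(\lambda \mid \theta) = 1/\{\lambda(\log b_i - \log a_i)\}$.
In addition, the marginal reference prior for $\theta$ can be expressed as
$$\pi_i(\theta) = \exp\left\{\frac{1}{2}\int_{a_i}^{b_i} p_i(\lambda \mid \theta) \log\frac{|I|}{I_{11}}\, d\lambda\right\} \propto \frac{1}{\theta},$$
from which the reference prior can be expressed as
$$\pi(\lambda, \theta) = \lim_{i \to \infty} \frac{K_i(\theta)\, \pi_i(\theta)}{K_i(\theta_0)\, \pi_i(\theta_0)}\, \pi(\lambda \mid \theta) \propto \frac{1}{\lambda\theta}$$
for any fixed point $\theta_0$. This completes the proof. □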
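Note that, since all the derived objective priors are improper, the corresponding posterior distributions must be shown to be proper. Since the Jeffreys prior and the reference prior obtained when θ is the parameter of interest have the same form, the notation $\pi_J$ is used for both from now on, and the reference prior obtained when λ is the parameter of interest is denoted as $\pi_R$.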
2.2. Posterior Analysis
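Let $\mathbf{x} = \left(x_{L(1)}, \ldots, x_{L(k)}\right)$ be the observed lower record values. Then, the objective prior $\pi_J(\lambda, \theta) \propto 1/(\lambda\theta)$ results in the following marginal posterior density functions of $\lambda$ and $\theta$:
$$\pi_J(\lambda \mid \mathbf{x}) \propto \lambda^{k-1} \int_{0}^{\infty} \theta^{k-1} \exp\left(-\theta \sum_{i=1}^{k} x_{L(i)} - \lambda e^{-\theta x_{L(k)}}\right) d\theta \tag{5}$$
and
$$\pi_J(\theta \mid \mathbf{x}) = \frac{\left[\sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)\right]^{k}}{\Gamma(k)}\, \theta^{k-1} \exp\left(-\theta \sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)\right), \tag{6}$$
respectively. Note that the marginal posterior distribution of $\theta$ is a gamma distribution with the shape parameter $k$ and the rate parameter $\sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)$. Then, the Bayes estimators under the squared error loss function from the marginal posterior density functions (5) and (6) can be expressed as
$$\hat{\lambda}_J = \int_{0}^{\infty} \lambda\, \pi_J(\lambda \mid \mathbf{x})\, d\lambda$$
and
$$\hat{\theta}_J = \frac{k}{\sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)},$$
respectively.
In terms of $\sigma = 1/\theta$, the corresponding marginal posterior density function can be expressed as
$$\pi_J(\sigma \mid \mathbf{x}) = \frac{\left[\sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)\right]^{k}}{\Gamma(k)}\, \sigma^{-(k+1)} \exp\left(-\frac{1}{\sigma} \sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)\right).$$
Since it is the PDF of an inverse gamma distribution with the shape parameter $k$ and the scale parameter $\sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right)$, the Bayes estimator of $\sigma$ under the squared error loss function can be expressed as
$$\hat{\sigma}_J = \frac{1}{k-1} \sum_{i=1}^{k}\left(x_{L(i)} - x_{L(k)}\right).$$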
Theorem 3. From the frequentist perspective, the estimator $\hat{\sigma}_J$ is an unbiased estimator of σ.
Proof. According to Lemma 2 provided in Wang and Ye [
12], independent and identically distributed random variables from the uniform distribution on
are defined as
where
are independent random variables having
distributions with
degrees of freedom. Then, the estimator
has a gamma distribution with the parameters
and
for any
because
has a gamma distribution with the parameters
and 1. This completes the proof. □
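The unbiasedness claim can be checked numerically with a small Monte Carlo sketch. Everything below is an assumption made for illustration: the EGD CDF is taken to be $F(x) = \exp(-\lambda e^{-x/\sigma})$, the estimator is taken to be $\hat{\sigma} = \sum_{i=1}^{k}(x_{L(i)} - x_{L(k)})/(k-1)$, and the helper names are hypothetical. The simulation uses the standard fact that $-\log F(X_{L(i)})$, $i = 1, \ldots, k$, behave like the partial sums of independent standard exponential variables.

```python
import math
import random
from typing import List


def simulate_lower_records(k: int, sigma: float, lam: float,
                           rng: random.Random) -> List[float]:
    """Simulate the first k lower records under the assumed EGD CDF.

    With T_i the i-th partial sum of standard exponentials, solving
    lam * exp(-x / sigma) = T_i for x gives the i-th lower record.
    """
    t = 0.0
    out: List[float] = []
    for _ in range(k):
        t += rng.expovariate(1.0)
        out.append(-sigma * math.log(t / lam))
    return out


def sigma_hat(lower_recs: List[float]) -> float:
    """Assumed estimator: sum of (x_i - x_k) over all records, divided by k - 1."""
    k = len(lower_recs)
    x_last = lower_recs[-1]
    return sum(x - x_last for x in lower_recs) / (k - 1)


def monte_carlo_mean(k: int, sigma: float, lam: float,
                     reps: int, seed: int = 0) -> float:
    """Average of sigma_hat over `reps` simulated record sequences."""
    rng = random.Random(seed)
    total = sum(sigma_hat(simulate_lower_records(k, sigma, lam, rng))
                for _ in range(reps))
    return total / reps
```

If the assumptions hold, `monte_carlo_mean(6, 2.0, 1.5, 20000)` should be close to the true scale 2.0, whatever value of the shape parameter is used.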
The highest posterior density (HPD) credible intervals (CrIs) for $\lambda$ and $\theta$ can be constructed by generating MCMC samples from the marginal posterior density functions (5) and (6), respectively. However, since the marginal posterior density function (5) does not correspond to a well-known probability distribution, sampling from it directly is difficult. Instead, sampling can be performed indirectly using the relationship
$$\pi_J(\lambda \mid \mathbf{x}) = \int_{0}^{\infty} \pi_J(\lambda \mid \theta, \mathbf{x})\, \pi_J(\theta \mid \mathbf{x})\, d\theta,$$
because the conditional posterior distribution $\pi_J(\lambda \mid \theta, \mathbf{x})$ is a gamma distribution with the shape parameter $k$ and the rate parameter $e^{-\theta x_{L(k)}}$. To achieve this, $\theta$ should be generated from its marginal posterior distribution first, and then $\lambda$ should be generated from its conditional posterior distribution given the generated value of $\theta$. Finally, $\sigma = 1/\theta$ can be computed. Then, the $100(1-\alpha)\%$ equal-tails (ETs) and HPD CrIs can be constructed for each parameter using the method proposed in Chen and Shao [13].
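The two-stage sampling scheme and the sample-based intervals can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the prior is taken to be proportional to $1/(\lambda\theta)$, so that $\theta$ given the data is gamma with shape $k$ and rate $\sum_i (x_{L(i)} - x_{L(k)})$ and $\lambda$ given $\theta$ is gamma with shape $k$ and rate $e^{-\theta x_{L(k)}}$. The function names are hypothetical. Note that `random.gammavariate` takes a shape and a scale, so each rate is inverted.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_posterior(lower_recs: Sequence[float], n: int,
                     rng: random.Random) -> List[Tuple[float, float]]:
    """Draw n pairs (lambda, theta) from the assumed joint posterior."""
    k = len(lower_recs)
    x_last = lower_recs[-1]
    rate_theta = sum(x - x_last for x in lower_recs)  # positive when k >= 2
    draws = []
    for _ in range(n):
        theta = rng.gammavariate(k, 1.0 / rate_theta)        # scale = 1 / rate
        lam = rng.gammavariate(k, math.exp(theta * x_last))  # scale = 1 / rate
        draws.append((lam, theta))
    return draws


def credible_intervals(samples: Sequence[float], alpha: float = 0.05
                       ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return (equal-tails interval, HPD interval) from posterior draws.

    The HPD interval is the shortest window of sorted draws that spans
    about 100(1 - alpha)% of the sample, in the spirit of the
    sample-based approach of Chen and Shao.
    """
    s = sorted(samples)
    n = len(s)
    m = min(int(round((1.0 - alpha) * n)), n - 1)  # index span of each window
    lo = (n - m) // 2                              # equal mass cut from each tail
    equal_tails = (s[lo], s[lo + m])
    j = min(range(n - m), key=lambda i: s[i + m] - s[i])
    hpd = (s[j], s[j + m])
    return equal_tails, hpd
```

Draws of $\sigma$ follow by taking `1.0 / theta` for each pair, and the same `credible_intervals` helper then applies to them.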
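Under the prior $\pi_R$, that is, the reference prior obtained when λ is the parameter of interest, the resulting posterior is proper because $\pi_R(\lambda, \theta)$ is bounded above by a constant multiple of $\pi_J(\lambda, \theta) \propto 1/(\lambda\theta)$, whose posterior is proper. However, since this posterior does not have a known closed form, MCMC samples for $\lambda$ and $\theta$ can be generated using the Metropolis–Hastings algorithm. For efficient mixing, the proposal variance-covariance structure is updated adaptively. For the corresponding Bayes estimators under the squared error loss function, the notations $\hat{\lambda}_R$ and $\hat{\theta}_R$ are used.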
2.3. Prediction
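Let $X_{L(s)}$, $s > k$, be the future lower record values. Since the sequence $\{X_{L(i)}, i \ge 1\}$ is a Markov chain, the conditional density function of $X_{L(s)}$ given $x_{L(1)}, \ldots, x_{L(k)}$ is the same as that of $X_{L(s)}$ given $x_{L(k)}$. That is, it follows that
$$f\left(y \mid x_{L(k)}\right) = \frac{\left[H(y) - H\left(x_{L(k)}\right)\right]^{s-k-1}}{\Gamma(s-k)}\, \frac{f(y)}{F\left(x_{L(k)}\right)}, \quad y < x_{L(k)}, \tag{7}$$
with $H(x) = -\log F(x)$, by Ahsanullah [9]. Then, for the EGD with the CDF (1) and $\theta = 1/\sigma$, (7) becomes
$$f\left(y \mid x_{L(k)}; \lambda, \theta\right) = \frac{\lambda^{s-k}\theta}{\Gamma(s-k)} \left(e^{-\theta y} - e^{-\theta x_{L(k)}}\right)^{s-k-1} e^{-\theta y} \exp\left[-\lambda\left(e^{-\theta y} - e^{-\theta x_{L(k)}}\right)\right], \quad y < x_{L(k)}, \tag{8}$$
and the corresponding Bayesian predictive density function can be expressed as
$$f(y \mid \mathbf{x}) = \int_{0}^{\infty}\!\!\int_{0}^{\infty} f\left(y \mid x_{L(k)}; \lambda, \theta\right) \pi(\lambda, \theta \mid \mathbf{x})\, d\lambda\, d\theta,$$
where $\pi(\lambda, \theta \mid \mathbf{x})$ is a general joint posterior distribution for $(\lambda, \theta)$.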
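Note that evaluating this predictive density requires very complex and tedious computations. In fact, there is no guarantee that it can be expressed in a closed form. Instead, a much simpler approach is to use a pivotal quantity obtained by transforming the random variable.
Let $W = \lambda\left(e^{-\theta Y} - e^{-\theta x_{L(k)}}\right)$ in the conditional density function (8), where $Y = X_{L(s)}$. Then, $W$ has a gamma distribution with the parameters $s - k$ and 1 with a PDF of
$$f(w) = \frac{1}{\Gamma(s-k)}\, w^{s-k-1} e^{-w}, \quad w > 0,$$
because $y \in \left(-\infty, x_{L(k)}\right)$ maps onto $w \in (0, \infty)$ and the Jacobian of the transformation is
$$\left|\frac{dy}{dw}\right| = \frac{1}{\lambda\theta e^{-\theta y}},$$
which leads to the following algorithm for generating the MCMC samples $y_1, \ldots, y_N$.
- Step 1a.
Generate $w$ from the gamma distribution with the parameters $s - k$ and 1.
- Step 1b.
Generate $\lambda$ and $\theta$ from the joint posterior distribution $\pi(\lambda, \theta \mid \mathbf{x})$.
- Step 2.
Compute $y = -\dfrac{1}{\theta} \log\left(\dfrac{w}{\lambda} + e^{-\theta x_{L(k)}}\right)$.
- Step 3.
Repeat steps 1 and 2, N times.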
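The sampling steps above can be sketched in Python as follows. The pivot inversion assumes the EGD CDF $F(x) = \exp(-\lambda e^{-\theta x})$ with $\theta = 1/\sigma$, so that $W = \lambda(e^{-\theta Y} - e^{-\theta x_{L(k)}})$ is gamma distributed with shape $s - k$ and unit scale. The function names and the argument `posterior_draws` (a list of $(\lambda, \theta)$ pairs produced by whichever posterior sampler is in use) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def record_from_pivot(w: float, lam: float, theta: float, x_last: float) -> float:
    """Solve w = lam * (exp(-theta * y) - exp(-theta * x_last)) for y.

    Because w > 0, the solution is always below the last observed record.
    """
    return -math.log(w / lam + math.exp(-theta * x_last)) / theta


def sample_future_record(x_last: float, steps_ahead: int,
                         posterior_draws: Sequence[Tuple[float, float]],
                         rng: random.Random) -> List[float]:
    """One predictive draw of the future lower record per posterior draw.

    steps_ahead is s - k, the number of records beyond the last observed one.
    """
    samples = []
    for lam, theta in posterior_draws:             # Step 1b: a posterior draw
        w = rng.gammavariate(steps_ahead, 1.0)     # Step 1a: the pivot
        samples.append(record_from_pivot(w, lam, theta, x_last))  # Step 2
    return samples                                 # Step 3: one value per draw
```

A predictive interval can then be read off the sorted output, for example by taking the shortest window that contains the desired share of the draws.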
Then, the corresponding
predictive interval (PI) for
can be constructed using the method proposed in [
13] as in the case of
and
. For the purpose of clarity,
and
are used to denote future lower record values under the priors
and
, respectively.
3. Time Series Approach
Provided that the record values are observed from a time series of uncorrelated random variables sampled from continuous probability distributions, the Bayesian method presented in the previous section can be compared with the ARIMA and DLM time-series techniques described below.
The ARIMA model is the most widely used approach to time-series forecasting. Conventionally, it is defined using three components (p, d, q), where
p denotes the order of the autoregressive (AR) term,
d denotes the number of differencing operations required to make the time series stationary, and
q denotes the order of the moving average (MA) term.
Here, the AR process assumes that the current value of the series $X_t$ can be expressed as a function of $p$ past values $X_{t-1}, \ldots, X_{t-p}$ in the form
$$X_t - \mu = \phi_1\left(X_{t-1} - \mu\right) + \cdots + \phi_p\left(X_{t-p} - \mu\right) + \varepsilon_t,$$
where $\mu$ is the mean of this process, $\phi_1, \ldots, \phi_p$ are constants $(\phi_p \neq 0)$, and $\varepsilon_t$ is a weak white noise series with a mean of zero and a variance of $\sigma_\varepsilon^2$. The MA process uses past forecast errors and is expressed as
$$X_t = \mu + \varepsilon_t + \vartheta_1 \varepsilon_{t-1} + \cdots + \vartheta_q \varepsilon_{t-q},$$
that is, a weighted average of the past values of the white noise process $\varepsilon_t$. Then, the time series $X_t$ is an ARIMA$(p, d, q)$ process if $\Delta^d X_t$ is an ARMA$(p, q)$ process obtained by combining the AR and MA terms, where $\Delta^d$ is the $d$th-order differencing operator. For non-stationary data, one usually fits an ARMA model after differencing the data until stationarity is achieved.
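As a small illustration of the $d$th-order differencing operator, the sketch below applies first differences repeatedly. The function name `difference` is a hypothetical helper, not part of any package mentioned in this paper.

```python
from typing import List, Sequence


def difference(series: Sequence[float], d: int = 1) -> List[float]:
    """Apply the first-difference operator d times.

    Each pass replaces x_t with x_t - x_{t-1}, so the output is d
    elements shorter than the input.
    """
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out
```

For example, `difference([1, 4, 9, 16, 25], d=2)` returns `[2, 2, 2]`: a quadratic trend becomes constant after two differences, which is the situation in which d = 2 is appropriate.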
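The second time-series approach considered in this study for comparison with the proposed method is the DLM. Let $Y_t$ be an $m$-dimensional vector observed at time $t$, and let $\boldsymbol{\theta}_t$ be a generally unobservable $p$-dimensional state vector of the system at time $t$. Then, the DLM can be defined as
$$Y_t = F_t \boldsymbol{\theta}_t + v_t, \quad v_t \sim N_m(0, V_t),$$
$$\boldsymbol{\theta}_t = G_t \boldsymbol{\theta}_{t-1} + w_t, \quad w_t \sim N_p(0, W_t),$$
for each time $t \ge 1$ together with a prior distribution for the $p$-dimensional state vector at time $t = 0$, $\boldsymbol{\theta}_0 \sim N_p(m_0, C_0)$, where $F_t$ and $G_t$ are known matrices of order $m \times p$ and $p \times p$, respectively, and $V_t$ and $W_t$ are variance matrices. Furthermore, it is assumed that the error sequences $(v_t)$ and $(w_t)$ are independent of each other and of $\boldsymbol{\theta}_0$.
Note that the lower record values from a univariate time series have a strong decreasing trend over time. Therefore, a DLM with $p = 2$ is considered, namely, the linear growth model (LGM)
$$\begin{aligned} Y_t &= \mu_t + v_t, \\ \mu_t &= \mu_{t-1} + \beta_{t-1} + w_{t,1}, \\ \beta_t &= \beta_{t-1} + w_{t,2}, \end{aligned} \tag{9}$$
where $\mu_t$ and $\beta_t$ denote the local level and local growth rate at time $t$, respectively, and $v_t$, $w_{t,1}$, and $w_{t,2}$ denote uncorrelated errors.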
4. Sulfur Dioxide Data
This section evaluates the proposed Bayesian method by comparing it with the ARIMA and DLM methods.
The three methods are applied to time-series data on sulfur dioxide emissions in the United States (U.S.) from 1970 to 2017 (in units of 1000 tons) measured by the U.S. Environmental Protection Agency. Owing to the Acid Rain Program created under Title IV of the 1990 Clean Air Act, sulfur dioxide emissions have decreased significantly over recent decades through a cap-and-trade program for fossil-fuel power plants. The observed volume of sulfur dioxide emissions and its descriptive statistics are presented in Figure 1 and Table 1, respectively. Note that each data point was divided by 1000 for computational convenience. Because the data decreased continuously during the observation period, as shown in Figure 1, every observation is a lower record value, so no data were lost in acquiring the lower record values.
To conduct the goodness-of-fit test for the observed sulfur dioxide data, the replicated data are first considered. If the estimated model is adequate, then it should look similar to the observed data. The replicated data are generated from the Bayesian predictive function
and denoted as
for
. Under each prior distribution, the correlation coefficient of the mean
and observed lower record values
r can then be computed. For further examination, the weighted mean squared error (WMSE)
is also computed. These results are reported in
Figure 2.
It can be noticed from
Figure 2 that the estimated models fit the observed sulfur dioxide lower record values very well, and the estimated models under the priors
and
provide almost the same results for the considered statistical criteria.
Table 2 reports the estimation results for the derived priors for comparison and corresponding maximum likelihood counterparts. For the maximum likelihood procedure, the maximum likelihood estimators (MLEs)
and
are obtained by maximizing the log-likelihood function (
2), while the approximate
confidence intervals (CIs) are calculated based on the MLEs as
where
denotes the upper
percentile point of the standard normal distribution, and
and
are the diagonal elements of the asymptotic variance-covariance matrix of the MLE obtained by inverting the Fisher information matrix (
3). For the shape parameter
, the Bayes estimate
has a slightly lower value than the other estimates
and
that have almost the same values. However, the
HPD CrI under the prior
has the shortest length. For the scale parameter
, all estimates vary slightly, while the approximate
CI based on the
has a shorter length than the other CrIs have.
For prediction, the last lower record value is assumed to be known. As mentioned earlier, since no data are lost in acquiring the observed lower record values, the time-series analysis outlined in Section 3 can be conducted at the same time.
Table 3 reports the prediction results for the next lower record value
. The R package
dlm (Petris [
14]) was used to estimate the parameters and state vector in the LGM (
9) with
In addition, for the ARIMA model, ARIMA
was chosen as the best model in terms of the corrected Akaike Information Criterion (AICc), indicating that it has the smallest AICc value when
with
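after differencing the data twice. The forecast accuracy is evaluated in terms of the mean absolute deviation (MAD), the mean square error (MSE), and the mean absolute percentage error (MAPE), defined respectively as
$$\mathrm{MAD} = \frac{1}{n}\sum_{t=1}^{n} |e_t|, \quad \mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n} e_t^{2}, \quad \mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left|\frac{e_t}{x_t}\right|,$$
where $e_t = x_t - \hat{x}_t$ and $\hat{x}_t$ is a point forecast of $x_t$. In this example,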
and
. These results are reported in
Table 4, which indicates that there is little difference in the predictive performance of the two models.
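For concreteness, the three accuracy measures can be computed as in the sketch below, which uses their standard definitions (MAPE expressed as a percentage). The function name `forecast_accuracy` is hypothetical, and the numbers in the usage note are made up for illustration; they are not the values reported in Table 4.

```python
from typing import Dict, Sequence


def forecast_accuracy(actual: Sequence[float],
                      forecast: Sequence[float]) -> Dict[str, float]:
    """Return MAD, MSE, and MAPE (in percent) for paired observations."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MAD": sum(abs(e) for e in errors) / n,
        "MSE": sum(e * e for e in errors) / n,
        # MAPE is undefined when an actual value is zero.
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }
```

For instance, with the illustrative inputs `actual=[100, 200]` and `forecast=[110, 190]`, the function returns a MAD of 10, an MSE of 100, and a MAPE of 7.5.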
Table 3 shows that the proposed Bayesian PI has a shorter length than those obtained for the considered time-series ARIMA model and DLM, especially under the prior
. That is, the predictive result under the prior
shows the best performance in terms of uncertainty. Using the best performing Bayesian predictive model in terms of uncertainty, the Bayesian predictive density functions for the three future record values are estimated as the kernel density functions based on their MCMC samples. The results are plotted in
Figure 3, which shows that both estimated Bayesian predictive models under the priors
and
have a greater variance as the future record time
increases.
5. Conclusions
This paper defined the Jeffreys and reference priors for unknown parameters of the EGD based on record values and proposed a Bayesian method for predicting future record values. The method makes it very easy to generate MCMC samples for prediction. To validate the proposed method, it was compared to two time-series approaches, namely, the ARIMA model and DLM, using a sulfur dioxide emissions dataset. The results of the comparison demonstrated that the proposed method outperforms the time-series approaches in terms of uncertainty.
While there was no clear difference in the results of the goodness-of-fit tests among the proposed objective prior distributions when analyzing the observed data, the results of forecasting under the prior distribution were better than those under the prior distribution ; both derived prior distributions were proved to be valid.