1. Introduction
The energy markets have long been known as among the most volatile financial markets, where significant fluctuations have been observed in both production and consumption [1]. To date, the majority of previous research has focused on modeling and forecasting the physical production and consumption of energy, as well as the movement of the resulting prices. Typically employed models include econometric models, artificial intelligence models, etc. [2,3]. However, the significant volatility observed in the financial energy markets and the measurement of the associated risk have received much less attention, despite their significant impact on trading, operation and risk management practice in the energy markets. Accurate measurement of market price fluctuations and the risk level in the energy markets has both theoretical and practical merits [4].
Theoretically, energy market risk measurement represents an important and difficult research problem in the field of risk measurement [4]. Firstly, since energy is difficult to store and storage is usually costly, energy markets exhibit unique price fluctuation patterns and very significant market risk exposure, so that current market risk measurement methodology no longer suffices; new and innovative market risk measurement models are needed. For example, since electricity is difficult to store and its price is settled instantaneously, it is a financial asset mostly of a consumption nature and demonstrates a significantly high level of volatility [5]. Extreme and transient events are prevalent in the market, with behavioral patterns very different from those of other financial markets [6]. Electricity risk measurement therefore needs to be conducted on a much shorter time scale and needs to pay particular attention to the finer market structure, especially the impact of transient and disruptive risk factors over a short-term time horizon [7]. Secondly, market risk measurement is closely related to other important research topics, including volatility forecasting, electricity derivative pricing, portfolio optimization, etc. Value at risk serves as an important ex post measure for evaluating volatility forecasting accuracy, which is in turn a critical parameter for financial derivative pricing.
Practically, accurate electricity market risk measurement serves as the foundation for risk monitoring systems and enterprise risk management systems in electricity-related industries. It is a critical factor to consider and control during the operation management and capital budgeting processes of any electricity-intensive company, such as electric grids, the paper manufacturing industry, the steel industry, etc. It is also an important risk factor that needs to be measured and integrated into the enterprise-wide risk measure for the cost and operation management of most manufacturing companies. For example, according to a survey study by the KPMG management consulting firm, half of the power plants in Russia pay close attention to risk measurement, and they usually use value at risk to measure risk [8].
Over the years, numerous empirical studies have measured market risk in the energy markets. Over the long-term time horizon, we have witnessed widespread application of econometric approaches to model market volatility under equilibrium conditions in different energy markets, especially considering the spillover effects among different energy markets and other financial markets. For example, Chevallier [9] proposes a bivariate model to forecast the density function of West Texas Intermediate (WTI) crude oil futures returns and finds that the stochastic jump components contain useful information that needs to be incorporated into the model and can help improve forecasting accuracy [9]. Barunik et al. [10] analyzed the dynamic asymmetry in volatility spillovers among petroleum markets and found significant regime changes in volatility asymmetry around both 2001 and 2008 [10]. Lin and Li [11] used the Vector Error Correction-Multivariate Generalized Autoregressive Conditional Heteroskedasticity (VEC-MGARCH) model to identify the spillover between the crude oil and gas markets [11]. Over the short-term time horizon, the volatility level increases significantly and its nonlinear dynamic movement becomes more complex, and more assumption-free, data-driven approaches show superior performance; typical examples are multifractal models, neural network models, etc. In particular, Youssef et al. [12] used the Fractionally Integrated Generalized Autoregressive Conditional Heteroskedasticity (FIGARCH) model to analyze the fractal, long memory and extreme risk characteristics and to estimate downside risk measures; they found that the fractal and extreme value-based model improves the accuracy of risk estimates in the energy markets [12]. Wang et al. [13] proposed a Markov switching multifractal model to forecast crude oil volatility at a higher level of accuracy [13]. Bildirici and Ersin [14] introduced a neural network model to augment nonlinear econometric models and achieved improved crude oil volatility forecasting performance [14].
Although data-driven approaches have shown that empirical data contain complex features that violate the assumptions of most traditional approaches and result in inferior performance, what exactly constitutes these complex data features remains an open question in the literature. Recently, the recognition and modeling of heterogeneous risk factors, which exhibit drastically different data characteristics, has emerged as an alternative approach to explaining and modeling the widely acknowledged nonlinear risk movements. For example, we would expect the supply and demand risk factors to demonstrate more stationary long-term behavior, while shocks in other macroeconomic factors may translate into more short-term stochastic behavior. This is why current modeling approaches encounter difficulty: they usually assume that the market is efficient and homogeneous, dominated by one particular data feature only. This assumption may approximate the data over the long-term time horizon. However, over short and medium time horizons, when the markets have not reached equilibrium, this assumption is violated and needs to be relaxed further to allow the existence of heterogeneous data features for more accurate modeling of the nonlinear data movement.
When the homogeneity assumption is relaxed, we may assume that the market is dominated by several heterogeneous risk factors. These risk factors are governed by different underlying Data Generating Processes (DGPs). Some well-known data features include autocorrelation, volatility clustering, etc. Multivariate time series models have been proposed to model the multivariate autocorrelation in the mean and correlations and the volatility clustering observed in practice; typical models include multivariate Autoregressive Moving Average (ARMA) models and multivariate GARCH models. The non-normal dependence structure among financial portfolio assets and the multiscale data characteristics are two recently emerging data features that have attracted much research attention, and they have given rise to some promising methodologies for dealing with complex data features.
Given the heterogeneous structure of practical data, the selection and integration of appropriate models capturing different DGPs is an important research problem. The key research issue is to develop a new methodology that integrates different models to capture different components of the data. Traditionally, ensemble models adopting linear or artificial intelligence models have been widely used in the literature. For example, Alamaniotis et al. [15] proposed a genetic algorithm that combined relevance vector machines and linear regression models; applied to electricity price forecasting, it achieved improved forecasting accuracy compared to the traditional ARMA model [15]. However, traditional ensemble models assume that the individual models capture data characteristics independently. They integrate different models ex post, where significant estimation bias resulting from the violation of the assumptions of the different models accumulates. Recently, an alternative ex ante approach has emerged under different names, such as divide and conquer, decomposition, etc. For example, Yu et al. [16] applied the divide and conquer principle to construct a data characteristics-driven decomposition ensemble methodology, whose application to crude oil price forecasting showed improved performance [16]. Yu et al. [17] proposed a decomposition and ensemble learning paradigm for crude oil price forecasting [17]. The key to all of these approaches is to separate ex ante the data components with unique data characteristics and to select models whose assumptions match the data. The chosen models are then integrated together, with lower biases and higher forecasting accuracy. However, current approaches are mainly ad hoc. We argue that the different scales in the data can be used as the criterion of separation, so that different models can be used to model different data features. Thus, we propose a new multiscale methodology to estimate the value at risk for high-dimensional portfolio assets. Following this methodology, we propose a BEMD-copula-based model to demonstrate the effectiveness of the approach in capturing the multiscale dependence structure in the data. We also propose to use entropy as the criterion to determine the parameters of the BEMD-copula model. Empirical studies using Australian electricity data as a typical example show that there exists a dynamic time-varying dependence structure between the New South Wales (NSW) and Queensland (QLD) markets. We show that the proposed model, by incorporating the multiscale dependence structure, tracks the risk evolution more closely and achieves statistically significant improvement in risk measurement accuracy and reliability. This lends further support to the effectiveness of the proposed methodology.
The work in this paper contributes to the literature in the following aspects. Firstly, a new multiscale methodology is proposed to analyze the multiscale characteristics of risk movement and co-movement. For the proposed methodology, we identify some important research problems, including the choice of models and the determination of model parameters. Secondly, we propose a BEMD-copula model under the proposed methodology, together with an alternative entropy-based approach to tackle these research problems. We show that there are different constituent portfolio components with a multiscale structure characterized by different copula functions. The mixture of time-varying dependence structures is revealed and explicitly modeled in the BEMD-transformed domain. We confirm in the empirical studies that the proposed model provides the most accurate characterization of the empirical data. More importantly, with the proposed model, we explicitly address the determination of the multiscale model specifications. We find that the dependence structure is time-varying in both the model specifications and parameters across different time scales. During the model fitting process, we find that the entropy measure serves as a better criterion for determining the BEMD-decomposed multiscale structure.
The rest of the paper proceeds as follows. In Section 2, we present a comprehensive review of multiscale analysis in the energy markets, as well as of the copula model in the energy markets. In Section 3, we propose and illustrate the multiscale methodology for the Portfolio Value at Risk (PVaR) estimate. We also illustrate in detail the numerical algorithm of the BEMD-copula-based model. In Section 4, we report and analyze in detail the results from experiments conducted using the Australian electricity market data. Section 5 provides some summarizing remarks about the main results, contributions and implications of the work in this paper.
3. A Multiscale Dependence-Based Methodology for Portfolio Value at Risk Estimate
PVaR is defined as the threshold downside risk statistic measuring the maximal level of losses on a portfolio under normal market conditions, at a given confidence level and investment time horizon [48]. Statistically, PVaR estimation is chiefly concerned with estimating a particular quantile of the empirical distribution. The prevailing methods to estimate VaR in the literature mainly include parametric, non-parametric and semi-parametric approaches [48]. The parametric approach goes to one extreme by imposing very strict assumptions in order to derive analytical solutions; Riskmetrics' Multivariate Exponentially Weighted Moving Average (MEWMA) and DCC-GARCH models are two typical approaches [49]. The non-parametric approach goes to the other extreme by imposing the least amount of assumptions and taking a data-driven approach. The ideal model should impose more realistic assumptions while retaining a high level of tractability. Thus, in recent years, new methodologies with more relaxed assumptions have emerged. They are usually classified as semi-parametric approaches and may include copula theory, Extreme Value Theory (EVT) and signal processing techniques, such as Fourier analysis, wavelet analysis, the EMD algorithm, etc. [34].
Given the diverse range of data features, we propose an integrated methodology that takes a decomposition-based multiscale approach to estimate the PVaR of the empirical data distribution. We assume that the data are generated by different DGPs and that, in the proposed model, different DGPs are distinguished from one another by different time-frequency characteristics. This implies a one-to-one correspondence between time scales and frequency scales. One implication of this assumption for risk assessment is that volatilities are independent and heterogeneous across scales, while homogeneous within each scale. Thus, with the chosen basis functions, the time scale can be used to project the original data into the time scale domain and to separate the different DGPs. In the multiscale approach, different model specifications and parameters are employed across different time scales to capture different frequency information. Typical multiscale models include the recently emerged Empirical Mode Decomposition (EMD), wavelet analysis, etc. Multiscale theory can be used to empirically derive the constituent data components at different scales, given that these components are separable and distinguishable by their different speeds of oscillation and sizes of fluctuation. The methodology is illustrated in Figure 1.
In the first step, in the general case of a portfolio of two assets with returns $r_{1,t}$ and $r_{2,t}$ at time $t$, a particular multiscale model is used to project the given multivariate energy data series simultaneously into the multiscale domain up to scale $N$, as in Equation (1):

$r_{i,t} = \sum_{j=1}^{N} r_{i,j,t}, \quad i = 1, 2, \qquad (1)$

where $r_{i,t}$ and $r_{i,j,t}$ refer to the original return series and the decomposed return series at scale $j$ at time $t$, respectively. Among them, the decomposed data at one scale $n$ are assumed to be the transient data governed by disruptive factors, while the data at the remaining scales are assumed to be the main factors governing the normal data behavior in the market.
In the second step, using the training data, we fit a series of multivariate models across the different scales. More specifically, we estimate the parameters of the multivariate models at each scale, with different distributions used to fit the energy data in each market individually.
In the third step, we assume that one scale corresponds to the transient data, while the other scales correspond to the normal data. The transient data reflect the market behavior of the transient and disruptive risk factors, while the normal data reflect the joint influences of the normal market risk factors. We assume that the risks of both the normal and transient data contribute equally to the overall market risk level. Thus, using the model tuning data, the aggregated variance-covariance estimate is calculated as the equal average of the individual variance-covariance matrices for the normal and transient risk factors, as in Equation (2):

$\Sigma_{t} = \frac{1}{2}\left(\Sigma^{normal}_{t} + \Sigma^{transient}_{t}\right), \qquad (2)$

where $\Sigma_{t}$ is the aggregated conditional variance-covariance forecast matrix at time $t$; $\Sigma^{normal}_{t}$ and $\Sigma^{transient}_{t}$ are the conditional variance-covariance forecast matrices for returns over the normal and transient phenomena, respectively, at time $t$. Each $\Sigma_{t} = (\sigma_{ik,t})$ is the covariance matrix at time $t$, $i = 1, 2$ and $k = 1, 2$.
In the fourth step, we estimate the PVaR for the model tuning data using the estimated conditional mean matrix and conditional variance-covariance matrix. $\mathrm{PVaR}_{t,\alpha}$ is estimated as in Equation (3) [48]:

$\mathrm{PVaR}_{t,\alpha} = P \, Z_{\alpha} \, \sigma_{p,t} \, \sqrt{h}, \qquad (3)$

where $\sigma_{p,t} = \sqrt{w^{T} \Sigma_{t} w}$. $\Sigma_{t}$ refers to the variance-covariance matrix. $w$ refers to the weight matrix for the portfolio. $Z_{\alpha}$ refers to the relevant quantile from the standard normal distribution. $P$ is the invested portfolio value. $h$ is the holding period. $\mathrm{PVaR}_{t,\alpha}$ is the threshold at the confidence level $\alpha$.
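To make Equations (2) and (3) concrete, the following is a minimal sketch under the assumptions above (equal weighting of the normal and transient covariance forecasts, a standard normal quantile $Z_{\alpha}$). All numerical inputs, weights and names are illustrative placeholders, not estimates from the paper's data.

```python
# A minimal sketch of Equations (2) and (3); all inputs are illustrative placeholders.
import numpy as np
from scipy import stats

sigma_normal = np.array([[0.0004, 0.0001],      # conditional covariance forecast,
                         [0.0001, 0.0003]])     # normal risk factors
sigma_transient = np.array([[0.0009, 0.0002],   # conditional covariance forecast,
                            [0.0002, 0.0008]])  # transient risk factors

sigma_agg = 0.5 * (sigma_normal + sigma_transient)   # Equation (2): equal average

w = np.array([0.5, 0.5])                             # portfolio weight vector
P, h, alpha = 1_000_000, 1, 0.95
z_alpha = stats.norm.ppf(alpha)                      # relevant standard normal quantile

pvar = P * z_alpha * np.sqrt(w @ sigma_agg @ w) * np.sqrt(h)   # Equation (3)
print(f"One-day 95% PVaR: {pvar:,.0f}")
```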
In the fifth step, assuming that the in-sample PVaR model performance is indicative of the out-of-sample PVaR model performance, we adopt different performance measures and decision rules to evaluate the in-sample model performance using the model tuning data. Typical decision rules and performance measures include the minimization of the number of exceedances, the maximization of the p-value of the Kupiec backtesting procedure and the minimization of the entropy value.
In the sixth step, with the determined model specifications and parameters, including the optimal scale for the transient factors, we estimate the PVaR for the test dataset. Different performance measures, such as the Kupiec backtesting procedure, are used to evaluate the model performance.
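As one concrete instance of the decision rules used in the fifth and sixth steps, the sketch below implements the standard Kupiec proportion-of-failures (POF) likelihood ratio test. The function name and inputs are illustrative placeholders, the PVaR is assumed to be expressed on the same (return) scale as the portfolio returns, and the simple form shown assumes at least one exceedance.

```python
# A hedged sketch of the Kupiec POF backtest used as one in-sample decision rule.
import numpy as np
from scipy import stats

def kupiec_pof(portfolio_returns, pvar, alpha=0.95):
    """Return the number of exceedances and the Kupiec LR test p-value."""
    p = 1 - alpha                                   # theoretical exceedance rate
    exceed = portfolio_returns < -pvar              # losses beyond the PVaR threshold
    x, T = int(exceed.sum()), len(portfolio_returns)
    if x == 0 or x == T:
        return x, np.nan                            # degenerate case for this simple form
    pi_hat = x / T                                  # observed exceedance rate
    lr = -2 * ((T - x) * np.log(1 - p) + x * np.log(p)
               - (T - x) * np.log(1 - pi_hat) - x * np.log(pi_hat))
    return x, 1 - stats.chi2.cdf(lr, df=1)          # exceedances, p-value
```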
Under the proposed multiscale dependence methodology, we propose a BEMD-copula-based portfolio value at risk model for portfolio risk assessment. Key to the proposed BEMD-copula PVaR model is the introduction of the BEMD model and the copula model as particular implementations of the proposed methodology.
Firstly, EMD is a recently emerging signal processing technique to explore, decompose and analyze the underlying data structure with a multiscale approach. It adaptively extracts the underlying data components, represented as a set of oscillation functions called Intrinsic Mode Functions (IMFs) with different amplitudes and frequency bands, i.e., different fluctuation bands. In the univariate setting, these data characteristics of different amplitudes and frequencies across different scales correspond to different oscillation or fluctuation levels. In the multivariate setting, defining the boundary between different data structures across different scales represents a much more difficult research problem, and the original concept of univariate oscillation is extended to rotation in the multivariate case. There have been several attempts to define this boundary in the multivariate setting in the literature, including modulated multivariate oscillation models, complex EMD, rotation-invariant EMD and univariate EMD. The bivariate EMD is usually preferred, as it guarantees the same number of IMFs across data channels and is very flexible in the number of projection directions. In the bivariate setting, this provides an important alternative to the traditional multivariate distribution-based analysis for separating data components with unique characteristics in both their dependence structure and their individual patterns, regardless of the particular distributional assumptions.
The BEMD algorithm is illustrated in Algorithm 1.
Algorithm 1: Bivariate Empirical Mode Decomposition Algorithm
1. Identify the number of projection directions $N$ and compute the projection directions $\varphi_{k} = 2\pi k / N$, $k = 1, \ldots, N$.
2. Project the complex-valued signal $x(t)$ on direction $\varphi_{k}$: $p_{\varphi_{k}}(t) = \mathrm{Re}\left(e^{-\mathrm{i}\varphi_{k}} x(t)\right)$.
3. Extract all local maxima of $p_{\varphi_{k}}(t)$, namely $\left(t_{i}^{\varphi_{k}}, x(t_{i}^{\varphi_{k}})\right)$, where $i$ denotes the index of the individual local maxima.
4. Interpolate the set $\left\{\left(t_{i}^{\varphi_{k}}, x(t_{i}^{\varphi_{k}})\right)\right\}$ by spline interpolation to obtain the partial envelope curve $e_{\varphi_{k}}(t)$ in direction $\varphi_{k}$, and repeat Step 2 to Step 4 until the envelope curves in all $N$ projected directions are obtained.
5. Calculate the mean of all envelope curves according to the following equation: $m(t) = \frac{1}{N}\sum_{k=1}^{N} e_{\varphi_{k}}(t)$.
6. Extract $d(t) = x(t) - m(t)$ and obtain the candidate IMF as such.
7. Check whether $d(t)$ is an IMF. If not, replace $x(t)$ with $d(t)$ and repeat Step 2 to Step 6 until $d(t)$ is an IMF; otherwise, repeat the procedure from Step 2 on the residual signal.
8. Finally, the original signal can be expressed as $x(t) = \sum_{j=1}^{K} c_{j}(t) + r(t)$, where $K$ indicates the total number of IMFs and $c_{j}(t)$, $r(t)$ respectively represent the extracted complex IMFs and the residue.
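For illustration, the following is a simplified sketch of a single sifting pass of the projection-based scheme in Algorithm 1, written directly with NumPy/SciPy rather than a dedicated BEMD library. The stopping criteria, boundary handling and IMF checks of a full implementation are omitted, and all names and data are illustrative assumptions.

```python
# A simplified sketch of one BEMD sifting pass (Algorithm 1, Steps 1-6).
# This is an illustrative reduction, not a full BEMD implementation.
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def bemd_sift_once(x, n_dirs=8):
    """One sifting iteration on a complex signal x(t) = x1(t) + i*x2(t)."""
    t = np.arange(len(x))
    env_sum = np.zeros(len(x), dtype=complex)
    used = 0
    for k in range(1, n_dirs + 1):
        phi = 2 * np.pi * k / n_dirs                 # projection direction
        proj = np.real(np.exp(-1j * phi) * x)        # Step 2: project on direction phi
        maxima = argrelextrema(proj, np.greater)[0]  # Step 3: local maxima of projection
        if len(maxima) < 4:
            continue                                 # too few extrema to build an envelope
        env_re = CubicSpline(maxima, x[maxima].real)(t)
        env_im = CubicSpline(maxima, x[maxima].imag)(t)
        env_sum += env_re + 1j * env_im              # Step 4: partial envelope in direction phi
        used += 1
    mean_env = env_sum / max(used, 1)                # Step 5: mean of all envelope curves
    return x - mean_env                              # Step 6: candidate IMF

# Usage on two placeholder return series (e.g., two regional electricity markets):
rng = np.random.default_rng(0)
r1, r2 = rng.normal(size=500), rng.normal(size=500)
candidate_imf = bemd_sift_once(r1 + 1j * r2)
```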
Secondly, copula theory is introduced in the proposed BEMD-copula algorithm to model the non-normal dependence structure of the decomposed risk components across different scales. Under the Heterogeneous Market Hypothesis (HMH), the dependence structure is assumed to differ across scales, exhibiting distinctive frequency characteristics and reflecting different time-varying dependence. This demands the use of different copula functions appropriate for characterizing the dependence structure at each scale. We can then construct the time-varying multiscale dependence structure in the BEMD-decomposed domain, revealing the heterogeneous time-varying risk structure.
When modeling the individual DGPs with the multiscale approach, we assume that the DGPs follow the same model specification, but with different parameters. This implies that different trading strategies are adopted with different parameters. In the proposed model, we introduce the conditional copula model to analyze and model the multivariate conditional distributions. The conditional copula theory extends the original unconditional copula theory to the conditional case, which is of particular relevance to economic and financial forecasting.
Given some information set $W$, let $X$ and $Y$ be random variables with conditional distribution functions $X \mid W = w \sim F(\cdot \mid w)$ and $Y \mid W = w \sim G(\cdot \mid w)$ and conditional joint distribution $(X, Y) \mid W = w \sim H(\cdot, \cdot \mid w)$, where $W$ has support $\Omega$. The conditional Sklar theorem states that there exists a copula $C(\cdot, \cdot \mid w)$, such that, for any $(x, y) \in \bar{\mathbb{R}}^{2}$ and $w \in \Omega$, $H(x, y \mid w) = C\left(F(x \mid w), G(y \mid w) \mid w\right)$ [50].
With the conditional extension of the Sklar theorem, any n-dimensional joint distribution can be represented by its n univariate marginal distributions and a dependence function, given that the information set w is consistently used for both the marginal distributions and the copula.
The copula GARCH model is proposed to model the movement dynamics and the dependence structure conditional on the past observations. It is one particular realization of the conditional copula theory, where the random variables X in the marginal distributions are assumed to follow some time series models, such as the ARMA model for the conditional mean and GARCH model for the conditional volatility.
The conditional ARMA($r$, $m$)-GARCH($p$, $q$) model with the assumed marginal distributions is defined as in Equation (5):

$r_{t} = \mu_{t} + \varepsilon_{t}, \qquad \mu_{t} = c + \sum_{i=1}^{r} \phi_{i}\, r_{t-i} + \sum_{j=1}^{m} \theta_{j}\, \varepsilon_{t-j},$
$\varepsilon_{t} = \sigma_{t} z_{t}, \qquad z_{t} \sim f(0, 1; \vartheta),$
$\sigma_{t}^{2} = \omega + \sum_{i=1}^{p} \beta_{i}\, \sigma_{t-i}^{2} + \sum_{j=1}^{q} \alpha_{j}\, \varepsilon_{t-j}^{2}, \qquad (5)$

where $\mu_{t}$ and $\sigma_{t}^{2}$ are the conditional mean and conditional variance at time $t$, respectively. $r_{t-r}$ is the lag $r$ return with parameter $\phi_{r}$, and $\varepsilon_{t-m}$ represents the lag $m$ residual in the previous period with parameter $\theta_{m}$. $\sigma_{t-p}^{2}$ is the lag $p$ variance with parameter $\beta_{p}$, and $\varepsilon_{t-q}^{2}$ is the lag $q$ squared error in the previous period with parameter $\alpha_{q}$. $f(\cdot)$ refers to the selected probability density function of the residuals from the ARMA model, where $\vartheta$ refers to the shape parameters of the assumed distribution.
The estimation of the multivariate joint distribution thus boils down to the estimation of the marginal distributions and the copula function. Popular methods include the Inference Functions for Margins (IFM) and full maximum likelihood estimation. The IFM is a two-step process, in which the parameters of the marginal distributions are estimated first, and then the parameters of the dependence structure characterized by the copula function are estimated.
In the copula GARCH literature, the inference for margins is the dominant method to estimate the parameters of the marginal distributions, while maximum likelihood estimation is usually used to estimate the parameters of the dependence structure characterized by the copula function [51].
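As a hedged illustration of this two-step estimation, the sketch below fits an AR(1)-GARCH(1,1) marginal with Student's t innovations using the Python arch package, applies the probability integral transform, and then estimates a Gaussian copula correlation from the normal scores. The model orders, the choice of a Gaussian copula and the placeholder data are assumptions for illustration only, not the specification fitted in the paper.

```python
# A minimal sketch of the two-step (IFM-style) estimation: marginals first, then the copula.
import numpy as np
from scipy import stats
from arch import arch_model

def fit_marginal_pit(returns):
    """Step 1: fit an AR(1)-GARCH(1,1)-t marginal and return PIT values."""
    res = arch_model(returns, mean="AR", lags=1, vol="GARCH",
                     p=1, q=1, dist="t").fit(disp="off")
    nu = res.params["nu"]                     # estimated degrees of freedom
    z = np.asarray(res.std_resid)
    z = z[~np.isnan(z)]                       # drop the AR burn-in observation
    return np.clip(stats.t.cdf(z, df=nu), 1e-6, 1 - 1e-6)

rng = np.random.default_rng(0)
r1 = rng.standard_t(5, 1000) * 0.02                      # placeholder return series
r2 = 0.6 * r1 + rng.standard_t(5, 1000) * 0.016

u1, u2 = fit_marginal_pit(r1), fit_marginal_pit(r2)
# Step 2: estimate the Gaussian copula correlation from the normal scores.
z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)
rho = np.corrcoef(z1, z2)[0, 1]
print(f"Estimated Gaussian copula correlation: {rho:.3f}")
```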
Following the conditional Sklar theorem, in the two-dimensional case, suppose $F$, $G$ and $C$ are differentiable; then the conditional joint density function can be obtained as in Equation (6) [50]:

$h(x, y \mid w; \theta) = f(x \mid w; \theta_{1}) \cdot g(y \mid w; \theta_{2}) \cdot c\left(F(x \mid w), G(y \mid w) \mid w; \theta_{c}\right), \qquad (6)$

where $c(u, v \mid w) = \partial^{2} C(u, v \mid w) / \partial u \, \partial v$ and $\theta = (\theta_{1}, \theta_{2}, \theta_{c})$ is a vector of parameters of the joint density. The log-likelihood function is defined as in Equation (7):

$l(\theta) = \sum_{t=1}^{T} \left[ \log f(x_{t} \mid w; \theta_{1}) + \log g(y_{t} \mid w; \theta_{2}) + \log c\left(F(x_{t} \mid w), G(y_{t} \mid w) \mid w; \theta_{c}\right) \right]. \qquad (7)$
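As a concrete, hedged instance of the copula term in Equation (7), the snippet below evaluates the log-density of a bivariate Gaussian copula on PIT values; the Gaussian choice and the inputs u1, u2 are illustrative assumptions, and other copula families would replace this closed-form density accordingly.

```python
# A small sketch of the copula contribution to the log-likelihood in Equation (7),
# using the closed-form bivariate Gaussian copula density as an illustrative choice.
import numpy as np
from scipy import stats

def gaussian_copula_loglik(rho, u1, u2):
    """Sum of log c(u1, u2; rho) over all observations."""
    z1, z2 = stats.norm.ppf(u1), stats.norm.ppf(u2)   # normal scores
    det = 1.0 - rho ** 2
    log_c = (-0.5 * np.log(det)
             - (rho ** 2 * (z1 ** 2 + z2 ** 2) - 2.0 * rho * z1 * z2) / (2.0 * det))
    return np.sum(log_c)
```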
Typical copula functions include the elliptical copula family, comprising the Gaussian (normal) copula and Student's t copula, as well as the Archimedean copula family, comprising the Clayton copula, the Gumbel copula, the Symmetrized Joe-Clayton (SJC) copula, and so on.
Different performance measures are available for selecting the optimal BEMD model specifications, such as the scale. These include the number of exceedances, the p-value of the Kupiec backtesting procedure and the entropy measure. The adoption of a particular performance measure as the objective function may result in different model specifications and will critically affect the out-of-sample performance of the proposed model. Minimizing the number of exceedances searches for the BEMD-copula PVaR that is more conservative and provides the maximal coverage of market risks. Maximizing the p-value searches for the BEMD-copula PVaR whose number of exceedances is statistically closest to the theoretical value, but risks overfitting the in-sample data, as the p-value is also used as the main criterion for the out-of-sample model performance evaluation. We also propose the entropy measure as an objective function to minimize. This chooses the BEMD-copula model with the lowest level of information complexity, i.e., the minimal information content in the predictors, which suggests that more relevant information is captured in more orderly predictors. As a well-defined measure of complexity in information theory, the Shannon entropy is defined as in Equation (8):
$H(\hat{r}) = -\sum_{i} p(\hat{r}_{i}) \log p(\hat{r}_{i}), \qquad (8)$

where $H(\hat{r})$ refers to the Shannon entropy of the predictor $\hat{r}$ and $p(\cdot)$ refers to the Probability Density Function (PDF). $\hat{r}$ and $r$ are the model forecasts and the actual observations, respectively.
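For illustration, a histogram-based estimate of Equation (8) could be computed as in the short sketch below; the bin count and the use of the PVaR forecast series as the predictor are assumptions made for illustration only.

```python
# A hedged sketch of the entropy criterion in Equation (8), using a simple
# histogram estimate of the predictor's PDF (the bin count is an illustrative choice).
import numpy as np

def shannon_entropy(forecasts, bins=50):
    """Shannon entropy (in nats) of the model forecasts."""
    counts, _ = np.histogram(forecasts, bins=bins)
    p = counts[counts > 0] / counts.sum()      # empirical probabilities of occupied bins
    return -np.sum(p * np.log(p))
```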