Article

Probability Distribution of Extreme Events in Complex Systems: Application to Climate Data

1 Department of Statistics, University of Brasília, Brasília 70910-900, Brazil
2 Department of Statistics, University of São Paulo, São Paulo 05508-090, Brazil
* Author to whom correspondence should be addressed.
Symmetry 2024, 16(12), 1639; https://doi.org/10.3390/sym16121639
Submission received: 10 November 2024 / Revised: 28 November 2024 / Accepted: 4 December 2024 / Published: 11 December 2024
(This article belongs to the Section Mathematics)

Abstract

It has been observed that the statistical structure of certain climate vectors, such as wind speed versus air density and temperature versus humidity, may exhibit more than one mode due to the complexity of climate systems. This study proposes a new bivariate extreme value distribution, called the transformed symmetric logistic extreme value distribution, which can capture the multimodal characteristics of the joint distribution of extreme observations in complex systems. We derive some of its properties, such as the marginal distributions, tail indices, conditional distribution, and $P(Y<X)$. The parameters of the new distribution are estimated by the maximum likelihood method. The applicability of the proposed model is illustrated with climate data, including an analysis of $P(Y<X)$.

1. Introduction

Extreme value theory (EVT) is widely applied in various fields, such as hydrology, climatology, insurance, finance, and geology [1]. In hydrology, for example, the interest is in modeling flood frequency and thus estimating the return period T of flood discharges; in climatology, the interest lies in analyzing, individually or jointly, variables such as maximum wind speed, maximum temperature, minimum relative humidity, maximum radiation, and maximum pressure.
This work focuses on modeling bivariate extreme observations (maxima or minima) in complex systems characterized by heterogeneous or multimodal data. An example of a climate variable from such systems is wind speed, which often exhibits bimodal behavior. In [2,3,4,5,6,7,8,9,10,11,12,13,14], wind speed data were modeled using Weibull, Lognormal, and Gamma distributions, as well as mixtures of these distributions. Ref. [10] conducted a case study using wind speed data from 83 stations in the province of Quebec (Canada), a northern region with great potential for wind energy production, in which ten two-component mixture models, including mixtures of Gamma, Weibull, Gumbel, and Truncated Normal distributions, were fitted to the univariate data. The results indicated that, for several stations, the mixture of Gumbel components was the most suitable.
Accurate assessments of the joint distributions of extreme events are important in a variety of applications, from the evaluation of new wind turbine projects [14] to hydrological risk management [15,16]. There are two main parametric approaches to modeling bivariate extreme data: copula functions and bivariate extreme value (BEV) distributions [1,17,18]. In the context of complex systems, Yang et al. (2023) [14] applied a mixture of Gaussian copulas to jointly model wind speed and air density, with marginal distributions given by a mixture of Weibull distributions and a mixture of lognormal distributions. However, this approach may not be the most suitable for the joint modeling of extreme events in complex systems, because the Gaussian copula exhibits no tail dependence and Weibull mixtures are not heavy-tailed, whereas heavy tails are an inherent property of extreme events. Along the same lines, ref. [19] combined bimodal triangular distributions in a Gaussian copula to model multimodal data; again, the limitations of the Gaussian copula do not allow for modeling heavy-tailed data.
To fit multimodal extreme data (bivariate data from complex systems) more accurately within classical EVT, one needs mixtures of bivariate extreme value distributions whose marginal distributions are themselves mixtures of extreme value distributions. Such models are complex, because they lack a closed-form bivariate cumulative distribution function and require exhaustive procedures to estimate a large number of parameters. The symmetric logistic distribution, defined by [20], is one of the most widely used bivariate extreme value models in applied areas. Ref. [15] applied this distribution to flood frequency analysis with marginal distributions given by mixtures of Gumbel distributions. Although these mixtures captured the bimodal characteristics of the marginals, the symmetric logistic bivariate distribution cannot reproduce the multimodal behavior of the bivariate data.
This work is motivated by the need to model bivariate data from complex systems with possibly multimodal densities, and by the lack of a multimodal bivariate probability distribution for such data whose statistical inference is tractable. We develop a new multimodal bivariate extreme value distribution whose mathematical properties make it straightforward to apply in several areas.
We propose a new generalization of the logistic symmetric bivariate extreme value (LSBEV) distribution that can accommodate multiple modes and has bimodal marginal distributions of extreme values, providing a new model for extreme bivariate data in complex systems. We call this generalization the transformed logistic symmetric bivariate extreme value (TLSBEV) distribution. We show that if a random vector $(X,Y)$ has a TLSBEV distribution, then $X$ and $Y$ have bimodal GEV (BGEV) distributions, as defined in [21]. We also study the tail dependence of $X$ and $Y$, the conditional distribution of $Y\mid X=x$, and an analytical expression for the probability $P(Y<X)$, known as the reliability or stress–strength function [22]. This last function is applied to wind speed data to indicate the most suitable locations for installing two possible wind farms.
The rest of this paper is organized as follows. In Section 2, we present preliminary concepts necessary for the development of the other sections. The proposed new distribution, TLSBEV, is described in Section 3, where we also derive some of its properties, such as marginal distributions, tail indices, conditional distribution, and $P(Y<X)$. The computational algorithms required to estimate the parameters of the new distribution by the maximum likelihood method are presented in Section 4. Finally, in Section 5, we demonstrate the applicability of the proposed model using climatic data, including an analysis of $P(Y<X)$.

2. Preliminaries

Extreme value distributions are limit distributions of suitably normalized extremal statistics. More precisely, let $X_1,\ldots,X_n$ be independent and identically distributed (iid) random variables with distribution function $F(\cdot)$ and let
$$M_n=\max(X_1,\ldots,X_n)$$
be the extremal statistic of the maxima. If there are sequences of normalization constants $a_n>0$ and $b_n$ such that
$$\lim_{n\to\infty}P\left(\frac{M_n-b_n}{a_n}\le x\right)=\lim_{n\to\infty}F^n(a_nx+b_n)=F_G(x), \tag{1}$$
where $F_G$ is a non-degenerate distribution function, then $F_G$ must be one of the extreme value distributions first identified by [23]. These distributions are unified by the generalized extreme value (GEV) distribution, with cumulative distribution function (CDF)
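$$F_G(x;\xi,\mu,\sigma)=\begin{cases}\exp\left\{-\left[1+\xi\left(\dfrac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}, & \xi\neq0,\\[4pt]\exp\left\{-\exp\left(-\dfrac{x-\mu}{\sigma}\right)\right\}, & \xi=0,\end{cases} \tag{2}$$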
defined for $x>\mu-\sigma/\xi$ when $\xi>0$, for $x<\mu-\sigma/\xi$ when $\xi<0$, and for all real $x$ when $\xi=0$. A random variable $X$ with CDF (2), $X\sim F_G(\cdot;\xi,\mu,\sigma)$, has shape, location, and scale parameters $\xi\in\mathbb{R}$, $\mu\in\mathbb{R}$, and $\sigma>0$, respectively. The parameter $\xi$ is also known as the tail index because it determines the weight of the tails: when $\xi>0$, the distribution is heavy-tailed; otherwise, it is light-tailed. In the bivariate case, let $(X_i,Y_i)$, $i=1,\ldots,n$, be iid random vectors with common distribution function $F$ and let
$$(M_{1n},M_{2n})=\left(\max_{1\le i\le n}\{X_i\},\ \max_{1\le i\le n}\{Y_i\}\right)$$
be the bivariate extremal statistic of the maxima. As in the univariate case, if there are sequences of constants $a_{in}>0$ and $b_{in}\in\mathbb{R}$ $(i=1,2)$ such that
$$\lim_{n\to\infty}P\left(\frac{M_{1n}-b_{1n}}{a_{1n}}\le x,\ \frac{M_{2n}-b_{2n}}{a_{2n}}\le y\right)=\lim_{n\to\infty}F^n(a_{1n}x+b_{1n},\,a_{2n}y+b_{2n})=G(x,y), \tag{3}$$
where $G$ is a non-degenerate distribution function, then $G$ is called a bivariate extreme value distribution.
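The following minimal Python sketch (standard library only) evaluates the GEV CDF (2), including its support restrictions, and the quantile function obtained by solving $F_G(x)=p$ for $x$. The function names are our own, chosen for illustration; they are not the API of the R packages used in this work.

```python
import math

def gev_cdf(x, xi, mu, sigma):
    """GEV CDF (2) with shape xi, location mu, and scale sigma > 0."""
    z = (x - mu) / sigma
    if xi == 0.0:
        if z < -700.0:  # exp(-z) would overflow; the CDF is 0 to double precision
            return 0.0
        return math.exp(-math.exp(-z))
    s = 1.0 + xi * z
    if s <= 0.0:
        # Below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0)
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def gev_quantile(p, xi, mu, sigma):
    """Inverse of (2) for 0 < p < 1."""
    y = -math.log(p)
    if xi == 0.0:
        return mu - sigma * math.log(y)
    return mu + sigma * (y ** (-xi) - 1.0) / xi

# Heavy-tailed (xi > 0) versus bounded (xi < 0) upper tail at the same point
print(gev_cdf(5.0, 0.5, 0.0, 1.0), gev_cdf(5.0, -0.5, 0.0, 1.0))
```

The second value printed is exactly 1 because $x=5$ lies above the upper endpoint $\mu-\sigma/\xi=2$ of the bounded ($\xi<0$) case.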
One of the particular cases of (3) most used in applications is the symmetric logistic distribution, defined in [20] by
$$G(x,y)=\exp\left\{-\left[\left(-\ln F_{G_1}(x;\xi_1,\mu_1,\sigma_1)\right)^r+\left(-\ln F_{G_2}(y;\xi_2,\mu_2,\sigma_2)\right)^r\right]^{1/r}\right\},\quad r\in[1,+\infty), \tag{4}$$
where the marginal distributions $F_{G_1}(x)=\lim_{y\to\infty}G(x,y)$ and $F_{G_2}(y)=\lim_{x\to\infty}G(x,y)$ are GEV distributions as defined in (2).
We denote the probability density function (PDF) of (4) by
$$g(x,y)=\frac{\partial^2G(x,y)}{\partial x\,\partial y}. \tag{5}$$
Note that the CDF (4) and the PDF (5) have seven parameters, $(r,\xi_1,\mu_1,\sigma_1,\xi_2,\mu_2,\sigma_2)$, where $r$ is the dependence parameter of $G$ and the other six are the parameters of the marginal distributions.
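Because (5) is defined only through a mixed partial derivative, it may help to see it written out. Differentiating (4) twice, with the shorthand $u=-\ln F_{G_1}(x)$, $v=-\ln F_{G_2}(y)$, $A=u^r+v^r$, and the GEV densities $f_{G_1}=F_{G_1}'$ and $f_{G_2}=F_{G_2}'$ (notation introduced here only for this derivation), gives
$$g(x,y)=G(x,y)\,\frac{f_{G_1}(x)\,f_{G_2}(y)}{F_{G_1}(x)\,F_{G_2}(y)}\,(uv)^{r-1}A^{1/r-2}\left(A^{1/r}+r-1\right).$$
For $r=1$, the factor $(uv)^{0}A^{-1}(A+0)$ equals 1 and $G=F_{G_1}F_{G_2}$, so the expression reduces to $f_{G_1}(x)f_{G_2}(y)$, the independence case.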

3. Main Results

In this section, we present the main results of this work. As a generalization of the logistic symmetric bivariate extreme value distribution, we obtain the TLSBEV distribution by composing it with monotone transformations of its arguments. We derive some properties of the new model, such as its marginal distributions, tail dependence, conditional distribution, and stress–strength function.
Definition 1.
Let $G$ be a CDF as defined in (4), with the location parameters of its GEV margins set to zero, and consider the transformations
$$T_1(x)=(x-\mu_1)|x-\mu_1|^{\delta_1}\quad\text{and}\quad T_2(y)=(y-\mu_2)|y-\mu_2|^{\delta_2}, \tag{6}$$
where $\delta_i>0$ and $\mu_i\in\mathbb{R}$, $i=1,2$. A random vector $(X,Y)\sim G_\Theta$ has the TLSBEV distribution if its CDF is
$$G_\Theta(x,y)=G(T_1(x),T_2(y)). \tag{7}$$
Explicitly, the CDF and PDF of the TLSBEV distribution are given by
$$G_\Theta(x,y)=\exp\left\{-\left[\left(-\ln F_{G_1}(T_1(x);\xi_1,0,\sigma_1)\right)^r+\left(-\ln F_{G_2}(T_2(y);\xi_2,0,\sigma_2)\right)^r\right]^{1/r}\right\} \tag{8}$$
and
$$g_\Theta(x,y)=g(T_1(x),T_2(y))\,(\delta_1+1)(\delta_2+1)\,|x-\mu_1|^{\delta_1}\,|y-\mu_2|^{\delta_2}, \tag{9}$$
where $\Theta=(r,\xi_1,\sigma_1,\mu_1,\delta_1,\xi_2,\sigma_2,\mu_2,\delta_2)$.
The cases $r=1$ and $r\to\infty$ correspond, respectively, to complete independence and complete dependence of the components of $(X,Y)\sim G_\Theta$.
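To make (8) and Proposition 1 concrete, here is a minimal Python sketch of the TLSBEV CDF under the zero-location convention of Proposition 1, in which the locations $\mu_i$ enter only through $T_i$. It assumes the parameter order $\Theta=(r,\xi_1,\sigma_1,\mu_1,\delta_1,\xi_2,\sigma_2,\mu_2,\delta_2)$ used in the text; the function names and the numerical values in the example are hypothetical.

```python
import math

def gev0_cdf(z, xi, sigma):
    """GEV CDF with location 0, as it appears inside (8)."""
    w = z / sigma
    if xi == 0.0:
        return 0.0 if w < -700.0 else math.exp(-math.exp(-w))
    s = 1.0 + xi * w
    if s <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def bgev_cdf(x, xi, mu, sigma, delta):
    """BGEV CDF: GEV(xi, 0, sigma) evaluated at T(x) = (x - mu)|x - mu|^delta."""
    return gev0_cdf((x - mu) * abs(x - mu) ** delta, xi, sigma)

def tlsbev_cdf(x, y, theta):
    """TLSBEV CDF (8); theta = (r, xi1, sigma1, mu1, delta1, xi2, sigma2, mu2, delta2)."""
    r, xi1, s1, mu1, d1, xi2, s2, mu2, d2 = theta
    f1 = bgev_cdf(x, xi1, mu1, s1, d1)
    f2 = bgev_cdf(y, xi2, mu2, s2, d2)
    if f1 == 0.0 or f2 == 0.0:
        return 0.0
    u = -math.log(f1) if f1 < 1.0 else 0.0  # u = -ln F_BG1(x)
    v = -math.log(f2) if f2 < 1.0 else 0.0  # v = -ln F_BG2(y)
    return math.exp(-(u ** r + v ** r) ** (1.0 / r))

theta = (2.0, 0.1, 1.0, 0.0, 0.5, 0.2, 2.0, 1.0, 0.3)  # illustrative values only
print(tlsbev_cdf(0.5, 1.5, theta))
```

With $r=1$ the function returns $F_{BG_1}(x)F_{BG_2}(y)$, and letting $y$ grow recovers the margin $F_{BG_1}(x)$, in line with Proposition 1.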
Identifying the marginal distributions of a bivariate model is essential to determine several properties of the model. The marginal distributions of the TLSBEV model are derived in Proposition 1.
Proposition 1.
Let $(X,Y)\sim G_\Theta$ be as in (8). Then $X\sim F_{BG_1}$ and $Y\sim F_{BG_2}$, where $F_{BG_1}$ and $F_{BG_2}$ are bimodal GEV (BGEV) distributions as defined in [21].
Proof. 
The proof is straightforward. Since $T_1$ and $T_2$ are continuous and increasing, with $T_i(t)\to+\infty$ as $t\to+\infty$, the limits
$$\lim_{y\to\infty}G_\Theta(x,y)=F_{G_1}(T_1(x);\xi_1,0,\sigma_1)\quad\text{and}\quad\lim_{x\to\infty}G_\Theta(x,y)=F_{G_2}(T_2(y);\xi_2,0,\sigma_2)$$
are the BGEV distributions $F_{BG_1}(x)=F_{G_1}(T_1(x);\xi_1,0,\sigma_1)$ and $F_{BG_2}(y)=F_{G_2}(T_2(y);\xi_2,0,\sigma_2)$ as defined in [21]. □
Remark 1.
According to [21], the bimodal generalized extreme value (BGEV) distribution is obtained by composing the CDF $F_{\xi,0,\sigma}$ of a GEV distribution with location parameter $\mu=0$ with the transformation $T_{\mu,\delta}(x)=(x-\mu)|x-\mu|^{\delta}$. Thus, the CDF of a BGEV random variable, denoted by $X\sim F_{BG}$, is given by
$$F_{BG}(x)=F_{\xi,0,\sigma}(T_{\mu,\delta}(x)).$$
In R [24], the BGEV implementation “pbgev <- function(y, xi, mu, sigma, delta)” yields various bimodal forms of the distribution for different values of the parameters xi, mu, sigma, and delta.
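Since $T_{\mu,\delta}$ is continuous and strictly increasing, $F_{BG}$ can be inverted as $F_{BG}^{-1}(p)=\mu+\operatorname{sign}(z)|z|^{1/(\delta+1)}$ with $z=F_{\xi,0,\sigma}^{-1}(p)$, which gives a simple inverse-transform sampler. The Python sketch below illustrates this together with the density $f_{BG}(x)=f_{\xi,0,\sigma}(T_{\mu,\delta}(x))\,(\delta+1)|x-\mu|^{\delta}$. It is an independent illustration with hypothetical names and the argument order (xi, mu, sigma, delta), not a reimplementation of the bgev package.

```python
import math
import random

def gev0_quantile(p, xi, sigma):
    """Quantile of GEV(xi, 0, sigma)."""
    y = -math.log(p)
    return -sigma * math.log(y) if xi == 0.0 else sigma * (y ** (-xi) - 1.0) / xi

def bgev_quantile(p, xi, mu, sigma, delta):
    """Invert F_BG via T^{-1}(z) = mu + sign(z)|z|^{1/(delta+1)}."""
    z = gev0_quantile(p, xi, sigma)
    return mu + math.copysign(abs(z) ** (1.0 / (delta + 1.0)), z)

def bgev_pdf(x, xi, mu, sigma, delta):
    """GEV(xi, 0, sigma) density at T(x), times T'(x) = (delta+1)|x-mu|^delta."""
    w = (x - mu) * abs(x - mu) ** delta / sigma
    if xi == 0.0:
        if w < -700.0:
            return 0.0  # density is 0 to double precision
        tz = math.exp(-w)
    else:
        s = 1.0 + xi * w
        if s <= 0.0:
            return 0.0  # outside the support
        tz = s ** (-1.0 / xi)
    gev_density = tz ** (xi + 1.0) * math.exp(-tz) / sigma
    return gev_density * (delta + 1.0) * abs(x - mu) ** delta

def bgev_sample(n, xi, mu, sigma, delta, seed=0):
    """Inverse-transform sampling of n BGEV variates (p kept strictly inside (0, 1))."""
    rng = random.Random(seed)
    return [bgev_quantile(rng.uniform(1e-12, 1.0 - 1e-12), xi, mu, sigma, delta)
            for _ in range(n)]
```

For $\delta>0$ the density vanishes at $x=\mu$, which is what separates the two modes; a histogram of bgev_sample output shows this dip.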
Remark 2.
By Proposition 1, the TLSBEV distribution can be written as
$$G_\Theta(x,y)=\exp\left\{-\left[\left(-\ln F_{BG_1}(x)\right)^r+\left(-\ln F_{BG_2}(y)\right)^r\right]^{1/r}\right\},$$
where $\Theta=(r,\xi_1,\sigma_1,\mu_1,\delta_1,\xi_2,\sigma_2,\mu_2,\delta_2)$, in which $\xi_1,\sigma_1,\mu_1,\delta_1$ are the parameters of $F_{BG_1}$ and $\xi_2,\sigma_2,\mu_2,\delta_2$ are the parameters of $F_{BG_2}$.
Remark 3.
Quantifying the dependence of extreme data is a central theme in probabilistic and statistical methods for multivariate extreme value analysis [17]. Given a random vector $(X,Y)$ with $X\sim F_1$ and $Y\sim F_2$ identically distributed (i.d.), a natural measure of dependence is the bivariate tail dependence index
$$\chi=\lim_{t\to t^*}P(Y>t\mid X>t), \tag{12}$$
where $t^*$ is the upper limit of the support of the marginal distributions. When $\chi=0$, the variables are said to be asymptotically independent, and if $\chi>0$, they are said to be asymptotically dependent.
For a bivariate random vector $(X,Y)$ with standard GEV marginal distributions $F_{G_1}(\cdot;1,0,1)$ and $F_{G_2}(\cdot;1,0,1)$, ref. [25] showed that the conditional probability in (12) satisfies
$$P(Y>t\mid X>t)\sim L(t)\,t^{1-\frac{1}{\eta}} \tag{13}$$
as $t\to\infty$, where $1/2\le\eta\le1$ is a constant and $L(t)$ is a slowly varying function, that is,
$$\lim_{t\to+\infty}\frac{L(at)}{L(t)}=1,\quad a>0.$$
Here, $\eta$ is known as the tail dependence coefficient. $X$ and $Y$ are asymptotically dependent if $\eta=1$ (with $L(t)\to c>0$) and asymptotically independent if $\eta<1$.
On the other hand, if $X\sim F_1$ and $Y\sim F_2$ are not necessarily identically distributed, ref. [17] defined $\tilde\chi=\lim_{z\to1}\tilde\chi(z)$, where
$$\tilde\chi(z)=2-\frac{\log P\left(F_1(X)\le z,\,F_2(Y)\le z\right)}{\log P\left(F_1(X)\le z\right)}. \tag{14}$$
Tail indices of types (13) and (14) for a random vector $(X,Y)\sim G_\Theta$ as in (8) are presented in the next propositions.
Proposition 2.
Let $(X,Y)\sim G_\Theta$ be as in (8), with $X$ and $Y$ i.d. $F_{BG}(\cdot;1,0,1,\delta)$. Then
$$P(Y>t\mid X>t)\sim c\left(t|t|^{\delta}\right)\exp\left\{\int_{B}^{t|t|^{\delta}}\frac{\epsilon(x)}{x}\,dx\right\}\left(t|t|^{\delta}\right)^{1-\frac{1}{\eta}}, \tag{15}$$
where $c$ is a measurable non-negative function such that $\lim_{t\to+\infty}c(t)=c_0>0$, $B>0$ is a constant, $\lim_{x\to+\infty}\epsilon(x)=0$, and $\eta$ is as defined in (13).
Proof. 
From relation (7), we have that
$$P(Y>t\mid X>t)=P(W>T(t)\mid Z>T(t)), \tag{16}$$
where $T(t)=t|t|^{\delta}$ and $(Z,W)\sim G$ as defined in (4). Substituting (13) into (16) yields
$$P(Y>t\mid X>t)\sim L(T(t))\,[T(t)]^{1-\frac{1}{\eta}}. \tag{17}$$
The result (15) follows from (17) by taking $t>0$ and using the Karamata representation of $L(t)$ [26]. □
The above proposition shows that the tail dependence or tail independence of $X$ and $Y$ also depends on the parameter $\delta$. On the other hand, as shown in the next proposition, the dependence of the components of $(F_{BG_1}(X),F_{BG_2}(Y))$ is governed by the parameter $r$.
Proposition 3.
Let $(X,Y)\sim G_\Theta$ be as in (8). Then
$$\tilde\chi=2-2^{1/r}. \tag{18}$$
Proof. 
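From Proposition 1 and (14), we have that $X\sim F_{BG_1}$, $Y\sim F_{BG_2}$, and
$$\begin{aligned}\tilde\chi(z)&=2-\frac{\log P\left(X\le F_{BG_1}^{-1}(z),\,Y\le F_{BG_2}^{-1}(z)\right)}{\log P\left(X\le F_{BG_1}^{-1}(z)\right)}=2-\frac{\log G_\Theta\left(F_{BG_1}^{-1}(z),\,F_{BG_2}^{-1}(z)\right)}{\log P\left(X\le F_{BG_1}^{-1}(z)\right)}\\&=2+\frac{\left\{(-\ln z)^r+(-\ln z)^r\right\}^{1/r}}{\ln z}.\end{aligned} \tag{19}$$
Since $\left\{2(-\ln z)^r\right\}^{1/r}=-2^{1/r}\ln z$ for $0<z<1$, the right-hand side of (19) equals $2-2^{1/r}$ for every such $z$, and taking the limit as $z\to1$ completes the proof of (18). □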
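As a rough diagnostic, (14) can be estimated from data by replacing $F_1$ and $F_2$ with rank-based empirical CDFs and compared with the model value (18). The Python sketch below does this; the rank/(n+1) pseudo-observations and the function names are our own choices, not part of the text.

```python
import math

def pseudo_observations(values):
    """Rank-based estimates of F(X_i): rank / (n + 1), ties broken by position."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def chi_tilde_hat(x, y, z):
    """Empirical version of (14) at level z, with F_1 and F_2 replaced by pseudo-observations."""
    u, v = pseudo_observations(x), pseudo_observations(y)
    n = len(u)
    joint = sum(1 for a, b in zip(u, v) if a <= z and b <= z) / n
    marginal = sum(1 for a in u if a <= z) / n
    if joint == 0.0 or marginal in (0.0, 1.0):
        raise ValueError("z too extreme for this sample; choose another level")
    return 2.0 - math.log(joint) / math.log(marginal)

def chi_tilde_tlsbev(r):
    """Model value (18) implied by the TLSBEV distribution."""
    return 2.0 - 2.0 ** (1.0 / r)
```

Because (19) does not depend on $z$ for the TLSBEV model, plotting the estimate over several levels $z$ close to 1 against $2-2^{1/r}$ gives a simple visual check of a fitted $r$.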
The following proposition is useful for simulating the vector $(X,Y)\sim G_\Theta$.
Proposition 4.
Let $(X,Y)\sim G_\Theta$ and $(Z,W)\sim G$, with $G_\Theta$ and $G$ defined according to (8) and (4), respectively. Then the following relationship between the conditional distributions holds:
$$F_{Y\mid X=x}(y)=F_{W\mid Z=T_1(x)}(T_2(y)). \tag{20}$$
Proof. 
From Proposition 1, we have that $X\sim F_{BG_1}$ and $Y\sim F_{BG_2}$. Then, by the definition of conditional probability,
$$F_{Y\mid X=x}(y)=\int_{-\infty}^{y}\frac{g_\Theta(x,t)}{F_{BG_1}'(x)}\,dt. \tag{21}$$
Replacing $g_\Theta(x,t)=g(T_1(x),T_2(t))\,T_1'(x)\,T_2'(t)$ and $F_{BG_1}'(x)=F_{G_1}'(T_1(x);\xi_1,0,\sigma_1)\,T_1'(x)$ in (21), and substituting $s=T_2(t)$, it follows that
$$F_{Y\mid X=x}(y)=\int_{-\infty}^{T_2(y)}\frac{g(T_1(x),s)}{F_{G_1}'(T_1(x);\xi_1,0,\sigma_1)}\,ds. \tag{22}$$
The right-hand side of (22) is the conditional CDF of $W$ given $Z=T_1(x)$ evaluated at $T_2(y)$, which proves (20). □
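Proposition 4 suggests the conditional-inversion method for simulating $(X,Y)\sim G_\Theta$: draw $t=F_{BG_1}(X)$ uniformly, draw a second uniform $w$, solve $\partial C_r(t,s)/\partial t=w$ for $s$, where $C_r(t,s)=\exp\{-[(-\ln t)^r+(-\ln s)^r]^{1/r}\}$ is the copula of $G$, and map $(t,s)$ back through the BGEV quantile functions. The Python sketch below is our own illustration of this standard technique, using bisection for simplicity; names and the (xi, mu, sigma, delta) margin tuples are hypothetical. As a check, it relies on the known fact that Kendall's tau of this (Gumbel–Hougaard) copula is $1-1/r$.

```python
import math
import random

def d1_logistic_copula(t, s, r):
    """dC_r/dt = P(F2(Y) <= s | F1(X) = t) for the logistic copula C_r."""
    u = -math.log(t)
    v = -math.log(s) if s < 1.0 else 0.0
    a = u ** r + v ** r
    return math.exp(-a ** (1.0 / r)) * a ** (1.0 / r - 1.0) * u ** (r - 1.0) / t

def conditional_draw(t, w, r, iters=60):
    """Solve dC_r/dt (t, s) = w for s by bisection; it is increasing in s."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if d1_logistic_copula(t, mid, r) < w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bgev_quantile(p, xi, mu, sigma, delta):
    y = -math.log(p)
    z = -sigma * math.log(y) if xi == 0.0 else sigma * (y ** (-xi) - 1.0) / xi
    return mu + math.copysign(abs(z) ** (1.0 / (delta + 1.0)), z)

def sample_tlsbev(n, r, marg1, marg2, seed=0):
    """Draw n pairs (X, Y); marg = (xi, mu, sigma, delta) of each BGEV margin."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        t = rng.uniform(1e-12, 1.0 - 1e-12)  # t = F_BG1(X)
        w = rng.uniform(1e-12, 1.0 - 1e-12)  # conditional probability level
        s = min(max(conditional_draw(t, w, r), 1e-12), 1.0 - 1e-12)  # s = F_BG2(Y)
        pairs.append((bgev_quantile(t, *marg1), bgev_quantile(s, *marg2)))
    return pairs

def kendall_tau(pairs):
    """Sample Kendall's tau without tie correction."""
    n, score = len(pairs), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)
```

Kendall's tau is invariant under the increasing BGEV quantile maps, so the sample value should be close to $1-1/r$ whatever the margins.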
The stress–strength function is a tool used in reliability engineering to analyze the failure probability $P(Y<X)$, where $X$ and $Y$ represent stress and strength, respectively. Ref. [22] collected and synthesized theoretical results on $P(Y<X)$ for several families of independent random variables $X$ and $Y$, together with applications, particularly in engineering. When $X$ and $Y$ are dependent, the literature is limited. Here, we calculate the stress–strength function assuming that $X$ and $Y$ are dependent and that $(X,Y)$ has a TLSBEV distribution.
Proposition 5.
Consider $(X,Y)\sim G_\Theta$ defined in (8). Then
$$P(Y<X)=\int_0^1 D_1C_r\left(t,\,F_{BG_2}\left(F_{BG_1}^{-1}(t)\right)\right)dt, \tag{23}$$
where
$$C_r(t,s)=\exp\left\{-\left[(-\ln t)^r+(-\ln s)^r\right]^{1/r}\right\} \tag{24}$$
and $D_1C_r(t,s)=\partial C_r(t,s)/\partial t$.
Proof. 
From Proposition 1, we have that $X\sim F_{BG_1}$ and $Y\sim F_{BG_2}$; hence
$$P(Y<X)=\int_{-\infty}^{+\infty}P(Y<x\mid X=x)\,dF_{BG_1}(x). \tag{25}$$
On the other hand, note that
$$G_\Theta(x,y)=C_r\left(F_{BG_1}(x),F_{BG_2}(y)\right)\quad\text{and}\quad\left(F_{BG_1}(X),F_{BG_2}(Y)\right)\sim C_r,$$
with $C_r$ as defined in (24). With the substitutions $F_{BG_1}(x)=t$ and $F_{BG_2}(y)=s$, the partial derivative of (24) with respect to $t$ gives
$$D_1C_r(t,s)=P\left(F_{BG_2}(Y)<s\mid F_{BG_1}(X)=t\right)=P(Y<y\mid X=x) \tag{26}$$
and
$$P(Y<x\mid X=x)=P(Y<y\mid X=x)\big|_{y=x}=D_1C_r(t,s)\big|_{s=F_{BG_2}(F_{BG_1}^{-1}(t))}. \tag{27}$$
The result (23) follows by substituting (27) into (25) and changing the variable of integration to $t=F_{BG_1}(x)$. □
This result will be used in Section 5 to compare the wind speeds of two weather stations.
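Formula (23) reduces $P(Y<X)$ to a one-dimensional integral over $(0,1)$, so it can be approximated with a simple quadrature rule. The Python sketch below uses the midpoint rule with the closed form $D_1C_r(t,s)=C_r(t,s)\,[(-\ln t)^r+(-\ln s)^r]^{1/r-1}(-\ln t)^{r-1}/t$, obtained by differentiating (24). Margins are passed as hypothetical (xi, mu, sigma, delta) tuples, and the number of nodes is an arbitrary choice.

```python
import math

def gev0_cdf(z, xi, sigma):
    w = z / sigma
    if xi == 0.0:
        return 0.0 if w < -700.0 else math.exp(-math.exp(-w))
    s = 1.0 + xi * w
    if s <= 0.0:
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-s ** (-1.0 / xi))

def bgev_cdf(x, xi, mu, sigma, delta):
    return gev0_cdf((x - mu) * abs(x - mu) ** delta, xi, sigma)

def bgev_quantile(p, xi, mu, sigma, delta):
    y = -math.log(p)
    z = -sigma * math.log(y) if xi == 0.0 else sigma * (y ** (-xi) - 1.0) / xi
    return mu + math.copysign(abs(z) ** (1.0 / (delta + 1.0)), z)

def d1_logistic_copula(t, s, r):
    """D_1 C_r(t, s) from (24)."""
    u = -math.log(t)
    v = -math.log(s) if s < 1.0 else 0.0
    a = u ** r + v ** r
    return math.exp(-a ** (1.0 / r)) * a ** (1.0 / r - 1.0) * u ** (r - 1.0) / t

def prob_y_less_x(r, marg_x, marg_y, n=20000):
    """Midpoint-rule approximation of (23)."""
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        s = bgev_cdf(bgev_quantile(t, *marg_x), *marg_y)  # F_BG2(F_BG1^{-1}(t))
        s = max(s, 1e-300)  # keep -ln(s) finite if F_BG2 underflows to 0
        total += d1_logistic_copula(t, s, r)
    return total / n
```

With identical margins the result is about $1/2$, as expected for an exchangeable continuous pair; in an application, fitted margins and a fitted $r$ would be plugged in.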

4. Maximum Likelihood Estimation

The parameter vector $\Theta=(r,\xi_1,\sigma_1,\mu_1,\delta_1,\xi_2,\sigma_2,\mu_2,\delta_2)$ of $(X,Y)\sim G_\Theta$ is estimated by the maximum likelihood method. The log-likelihood function of $\Theta$ for observations $\{(x_i,y_i)\}_{i=1}^n$ of $(X,Y)$ is given by
$$\ell(\Theta)=\sum_{i=1}^{n}\ln\left[g_\Theta(x_i,y_i)\right], \tag{28}$$
where $g_\Theta$ is as in (9).
The maximum likelihood estimate (MLE) of $\Theta$, denoted by $\hat\Theta$, is obtained by maximizing (28) numerically. The procedure for computing the MLE of $\Theta$ is shown in Algorithms 1 and 2. Algorithm 1 computes the log-likelihood function (28), and the function obtained in Algorithm 1 is used in step 9 of Algorithm 2. The estimation procedure relies on two R [24] packages: evd [27] and bgev [21].
To help ensure that the maximum of (28) is attained, we choose the initial values $r_0,\xi_{01},\sigma_{01},\mu_{01},\delta_{01},\xi_{02},\sigma_{02},\mu_{02},\delta_{02}$ by the following strategy: for each parameter, the initial value is the point, from a grid of values over a bounded interval, that gives the largest value of the function $\ell$. These intervals are $0<r\le10$, $-1/2<\xi_i<10$, $0<\sigma_i\le\frac{1}{2}\max\{\text{data}\}$, $\min\{\text{data}\}<\mu_i\le\max\{\text{data}\}$, and $0<\delta_i\le10$.
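The initialization strategy above can be sketched as a grid search. The text does not say whether the grids are searched jointly or one parameter at a time; this Python sketch searches the Cartesian product jointly, so its cost grows as $k^9$ for nine parameters with $k$ points each, and coarse grids (or a one-parameter-at-a-time variant) would be needed in practice. The names are hypothetical, and loglik stands for any function of a parameter tuple, for instance one implementing (28).

```python
import itertools

def interval_grid(lo, hi, k):
    """k equally spaced points in the half-open interval (lo, hi]."""
    return [lo + (hi - lo) * (j + 1) / k for j in range(k)]

def grid_initial_values(loglik, grids):
    """Return (point, value) for the grid point that maximizes loglik."""
    best_point, best_value = None, float("-inf")
    for point in itertools.product(*grids):
        try:
            value = loglik(point)
        except (ValueError, OverflowError, ZeroDivisionError):
            continue  # invalid parameter combination; skip it
        if value > best_value:  # NaN compares as False, so it is skipped as well
            best_point, best_value = point, value
    return best_point, best_value

# Toy objective with maximum at (2, 4), standing in for a real log-likelihood
print(grid_initial_values(lambda p: -(p[0] - 2) ** 2 - (p[1] - 4) ** 2,
                          [[1, 2, 3], [3, 4, 5]]))
```

The returned point would then be passed as the starting value to a numerical optimizer.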
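Algorithm 1: Log-likelihood function
Require: data $z=(x,y)$ and parameters $\Theta=(r,\theta)$
Ensure: $\ell$ = log-likelihood function
  Extract from $\theta=(\theta_1,\theta_2)$:
    $\xi_1\leftarrow\theta[1]$, $\sigma_1\leftarrow\theta[2]$, $\mu_1\leftarrow\theta[3]$, $\delta_1\leftarrow\theta[4]$
    $\xi_2\leftarrow\theta[5]$, $\sigma_2\leftarrow\theta[6]$, $\mu_2\leftarrow\theta[7]$, $\delta_2\leftarrow\theta[8]$
  Set $r\leftarrow\max(r,0.00001)$
  Compute the transformed values as in (6):
    $t_1\leftarrow(x-\mu_1)\cdot|x-\mu_1|^{\delta_1}$, $t_2\leftarrow(y-\mu_2)\cdot|y-\mu_2|^{\delta_2}$
  Combine $t_1$ and $t_2$ into updateddata $=(t_1,t_2)$ and evaluate (9):
    gtheta $\leftarrow$ dbvevd(updateddata, r, model = "log", mar1 = $(0,\sigma_1,\xi_1)$, mar2 = $(0,\sigma_2,\xi_2)$) $\times\,(\delta_1+1)(\delta_2+1)|x-\mu_1|^{\delta_1}|y-\mu_2|^{\delta_2}$
  Compute the log-likelihood function:
    $\ell\leftarrow\sum\log(\text{gtheta})$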
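For readers without R, Algorithm 1 can be sketched in Python. Instead of calling evd's dbvevd, this sketch swaps in the closed-form density of the symmetric logistic model obtained by differentiating (4) twice, and it assumes the parametrization of (4) with $r\ge1$ (returning $-\infty$ otherwise), which may differ from the dependence-parameter convention of dbvevd. The parameter order follows $\Theta$ in the text; all function names are hypothetical.

```python
import math

def gev_parts(z, xi, sigma):
    """GEV(xi, 0, sigma) at z: (log density, tz) with CDF exp(-tz); None where density is 0."""
    w = z / sigma
    try:
        if xi == 0.0:
            if w < -700.0:
                return None  # density is 0 to double precision
            tz = math.exp(-w)
        else:
            s = 1.0 + xi * w
            if s <= 0.0:
                return None  # outside the support
            tz = s ** (-1.0 / xi)
    except OverflowError:
        return None  # tz too large: density is 0 to double precision
    if tz <= 0.0:
        return None
    return -math.log(sigma) + (xi + 1.0) * math.log(tz) - tz, tz

def log_jacobian(v, mu, delta):
    """log T'(v) = log(delta + 1) + delta * log|v - mu|, the Jacobian factor in (9)."""
    if delta == 0.0:
        return 0.0
    d = abs(v - mu)
    return math.log(delta + 1.0) + delta * math.log(d) if d > 0.0 else float("-inf")

def tlsbev_logpdf(x, y, theta):
    """log g_Theta(x, y) from (9) with the logistic density in closed form."""
    r, xi1, s1, mu1, d1, xi2, s2, mu2, d2 = theta
    p1 = gev_parts((x - mu1) * abs(x - mu1) ** d1, xi1, s1)
    p2 = gev_parts((y - mu2) * abs(y - mu2) ** d2, xi2, s2)
    if p1 is None or p2 is None:
        return float("-inf")
    (lf1, u), (lf2, v) = p1, p2  # u = -log F1, v = -log F2
    a = u ** r + v ** r
    log_g = (-a ** (1.0 / r) + lf1 + lf2 + u + v
             + (r - 1.0) * (math.log(u) + math.log(v))
             + (1.0 / r - 2.0) * math.log(a)
             + math.log(a ** (1.0 / r) + r - 1.0))
    return log_g + log_jacobian(x, mu1, d1) + log_jacobian(y, mu2, d2)

def tlsbev_loglik(theta, data):
    """Log-likelihood (28) for a list of (x, y) pairs."""
    r, _, s1, _, d1, _, s2, _, d2 = theta
    if r < 1.0 or s1 <= 0.0 or s2 <= 0.0 or d1 < 0.0 or d2 < 0.0:
        return float("-inf")  # outside the parameter space assumed here
    return sum(tlsbev_logpdf(x, y, theta) for x, y in data)
```

Returning $-\infty$ outside the support lets a generic maximizer treat such parameter values as infeasible rather than failing.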
Algorithm 2: MLE of $\Theta=(r,\theta)$ with $\theta=(\theta_1,\theta_2)$, $\theta_1=(\xi_1,\sigma_1,\mu_1,\delta_1)$, and $\theta_2=(\xi_2,\sigma_2,\mu_2,\delta_2)$.

5. Applications

In this section, we use some of the wind speed data used by [10], as well as other climate variables such as air temperature and relative humidity, to demonstrate the applicability of the proposed TLSBEV model defined in (8).
The data are from Environment and Climate Change Canada (accessed on 1 November 2023) and are available at http://climate.weather.gc.ca. The climate variables analyzed and fitted are temperature (T), relative humidity (HUM), and wind speed (WS). Air temperature is measured in degrees Celsius (°C), relative humidity in percent, and wind speed as the hourly average in meters per second, measured at a height of 10 m above the ground. The data were collected at the weather stations of Montreal (MTL), Cap-Madeleine (Cap-M), and Kuujjuarapik (KJ), in the province of Quebec. The data collection period was from January 1953 to October 2017 for Montreal, from January 1994 to October 2017 for Cap-Madeleine, and from January 1957 to October 2017 for Kuujjuarapik.
The geographical locations of the selected stations are shown in red in Figure 1.
All analyses were performed using the R Statistical Software [24].

5.1. Extreme Data Modeling (Temperature, Humidity)

In this subsection, we model bivariate vectors of two climatic variables in the context of extreme data. To ensure the serial independence of the observations, we used the block maximum (minimum) technique [29], keeping the maximum (or minimum) of each block of N consecutive observations, with N chosen so that the Ljung–Box test [28] does not reject the null hypothesis of serial independence of the resulting subsample at the 5% significance level. Table 1 shows the values of N for the maximum temperature and minimum relative humidity subsamples.
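The block maximum (minimum) step and the Ljung–Box check described above can be sketched in Python as follows. The statistic is $Q=n(n+2)\sum_{k=1}^{h}\hat\rho_k^2/(n-k)$, compared with a $\chi^2_h$ quantile; the lag $h=10$ and the hard-coded 5% critical value 18.307 are illustrative assumptions, since the text does not state the number of lags. In R [24], Box.test(x, lag = h, type = "Ljung-Box") computes the same statistic.

```python
def block_extremes(series, n, fn=max):
    """Maximum (fn=max) or minimum (fn=min) of each complete block of n values."""
    return [fn(series[i:i + n]) for i in range(0, len(series) - n + 1, n)]

def ljung_box_q(x, h):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    total = 0.0
    for k in range(1, h + 1):
        rho_k = sum(d[t] * d[t + k] for t in range(n - k)) / denom
        total += rho_k * rho_k / (n - k)
    return n * (n + 2) * total

CHI2_95_DF10 = 18.307  # 95% quantile of chi-square with 10 degrees of freedom

def independent_at_5pct(series, n, fn=max, h=10):
    """Hypothetical check: is the block-extreme subsample serially independent at 5%?"""
    sub = block_extremes(series, n, fn)
    return ljung_box_q(sub, h) < CHI2_95_DF10
```

A block size N can then be chosen as the smallest value for which independent_at_5pct returns True.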

5.1.1. Exploratory Data Analysis

In this subsection, we perform a preliminary exploratory analysis of the data before fitting our model. Since the model (8) is designed for data with more than one mode, the idea is to verify, through histograms (marginal data) and scatterplots (bivariate data), whether these data present more than one mode.
Figure 2 shows the bivariate data (points and contours) and the marginal data on the sides. The figure shows that both the bivariate and the univariate data are bimodal, so the distributions fitted to them must be bimodal as well.
To observe changes in the climate system of Kuujjuarapik over roughly 15-year periods, we plotted and analyzed scatterplots of the data divided into four time bins: [1957–1972], [1973–1987], [1988–2002], and [2003–2017]. The analysis of the data from Montreal and Cap-Madeleine is analogous and therefore omitted. Within each time bin, the extremes (maximum temperature and minimum humidity) were extracted with block size N = 73 days.
The scatterplots of the extreme data for each period are presented in Figure 3. They reveal a bimodal bivariate density in every period, with subtle variations among periods, especially in the regions of highest concentration (modes). The modes identified for each period, shown in Table 2, suggest changes in the climate, particularly an increased concentration of joint temperature and relative humidity values.
Furthermore, since the overall shape of the contours in Figure 3 follows the same pattern as that of the data for the entire period [1957–2017], shown in Figure 2, we fitted the model to the complete Kuujjuarapik series. The same approach was applied to Montreal and Cap-Madeleine.

5.1.2. Fitting of Bivariate Extremes

We now model these data using $(X,Y)\sim G_\Theta$ as defined in (8), with $\Theta=(r,\xi_1,\sigma_1,\mu_1,\delta_1,\xi_2,\sigma_2,\mu_2,\delta_2)$. The maximum likelihood estimates of $\Theta$, obtained by Algorithms 1 and 2, are given in Table 3.
The estimates in Table 3 can be interpreted as follows. Since the values of the parameter $r$ are greater than 1 for all three vectors, temperature and humidity are not independent. The parameters that indicate bimodality in the marginal densities, or multimodality in the joint density, are the $\delta$'s. For the three vectors (T, HUM), the values of $\delta_1$ and $\delta_2$ are positive, which indicates that both temperature and humidity have two modes. For humidity, the two modes are closer together, since the value $\delta_2=0.1$ is small, whereas for temperature they are further apart, especially for Montreal, where $\delta_1=1.14$. The location parameters are $\mu_1$ and $\mu_2$: $\mu_1$ indicates the temperature value with the lowest frequency between the two modes, and $\mu_2$ the corresponding value for humidity. The graphs in Figure 4 help illustrate these comments.
To assess the goodness of fit of the proposed TLSBEV model, we compared the AIC and BIC values of the BEV and TLSBEV models. The results are presented in Table 4. The AIC and BIC values of the TLSBEV model are markedly lower than those of the bivariate extreme value (BEV) distribution, indicating that the proposed model fits better than the reference BEV model.
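For reference, the criteria in Table 4 are $\mathrm{AIC}=2k-2\hat\ell$ and $\mathrm{BIC}=k\ln n-2\hat\ell$, where $\hat\ell$ is the maximized log-likelihood, $k$ the number of parameters (nine for TLSBEV and seven for BEV, as counted in Sections 2 and 4), and $n$ the sample size. A minimal Python sketch follows; the log-likelihood values in the example are hypothetical, not those behind Table 4.

```python
import math

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*loglik."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*ln(n) - 2*loglik."""
    return k * math.log(n) - 2 * loglik

K_TLSBEV = 9  # (r, xi1, sigma1, mu1, delta1, xi2, sigma2, mu2, delta2)
K_BEV = 7     # (r, xi1, mu1, sigma1, xi2, mu2, sigma2)

def prefers_tlsbev(loglik_tlsbev, loglik_bev, n):
    """True when TLSBEV has both a lower AIC and a lower BIC than BEV."""
    return (aic(loglik_tlsbev, K_TLSBEV) < aic(loglik_bev, K_BEV)
            and bic(loglik_tlsbev, K_TLSBEV, n) < bic(loglik_bev, K_BEV, n))

# Hypothetical maximized log-likelihoods for a sample of size 200
print(prefers_tlsbev(-250.0, -300.0, 200))
```

The two extra parameters of TLSBEV are penalized by both criteria, so a lower value indicates that the gain in likelihood outweighs the added complexity.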
Another statistical test performed to assess the goodness of fit of the TLSBEV and BEV models is the Kolmogorov–Smirnov test [30]. For sample data $x_1,x_2,\ldots,x_n$ from a model $X\sim F$, the Kolmogorov–Smirnov statistic $D=\sup_x|F_n(x)-F(x)|$ quantifies the distance between the empirical distribution function $F_n(x)=\#\{x_i\le x\}/n$ and the reference cumulative distribution function (model). The hypotheses are
$$H_0:\ \text{the data follow model }F\quad\text{versus}\quad H_1:\ \text{the data do not follow model }F.$$
At significance level $\alpha$, $H_0$ is rejected if the p-value is less than $\alpha$.
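Because $F_n$ is a step function and $F$ is continuous, the supremum in $D$ is attained at a sample point, so $D=\max_i\max\{i/n-F(x_{(i)}),\,F(x_{(i)})-(i-1)/n\}$ over the order statistics $x_{(i)}$. A minimal Python sketch of this univariate statistic follows. The p-value is not computed here (in R [24], ks.test returns both), and when the parameters of $F$ are estimated from the same data, standard KS p-values are only approximate.

```python
def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)| for a continuous cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x; check the gap on both sides of the jump
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
print(ks_statistic([0.9, 0.1, 0.5], uniform_cdf))
```

Here cdf would be a fitted marginal distribution, such as a BGEV CDF evaluated at the estimated parameters.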
For the TLSBEV and BEV models, we calculated the value of $D$ and its corresponding p-value, using a significance level of $\alpha=5\%$. These results, shown in Table 5, indicate that the null hypothesis is rejected for the BEV model but not for the TLSBEV model. This test reinforces the indication that the TLSBEV model provides a good fit to the temperature and humidity data.
The marginal densities fitted by the bimodal GEV distribution (Figure 4) show a good fit to the univariate data. Furthermore, Figure 5 shows that the contours of the original data (in black) are close to the contours of the fitted density $g_{\hat\Theta}$ (in orange). We conclude that, in both the univariate and bivariate cases, the TLSBEV model captures the modes. This agrees with the Akaike Information Criterion, Bayesian Information Criterion, and KS test results presented in Table 4 and Table 5, which indicate that the proposed model fits better than the BEV distribution, which is unimodal. Therefore, the TLSBEV model is a good alternative for modeling extreme climate data in complex systems.

5.2. Wind Speed Modeling for Iid Data

Assessing wind energy potential in areas of interest requires reliable estimates of the statistical characteristics of wind speed [10]. In this section, as in [10], we assume that the wind speed data are independent and identically distributed (iid). We apply the proposed theory to identify the best locations for the construction of two wind power plants in two of the three regions corresponding to the meteorological stations of Montreal, Cap-Madeleine, and Kuujjuarapik. We consider the bivariate vectors (Cap-M, MTL) and (Cap-M, KJ) to determine the two best locations. Since the histograms of the marginal data are bimodal, we assume that the bivariate data follow a TLSBEV distribution $G_\Theta$. To compute the estimates of $\Theta$, we use Algorithm 2 of Section 4; the results are presented in Table 6. Figure 6 shows that the BGEV density fits the univariate wind speed data well. Using the estimates from Table 6, we compute the density $g_{\hat\Theta}$ (orange) and compare its contours with those of the observed data. These densities and contour comparisons (fitted vs. observed) are shown in Figure 7 and Figure 8. While the density of (Cap-M, MTL) has two modes, the density of (Cap-M, KJ) has four modes; in other words, the bivariate wind speed data show the characteristics of a complex multimodal system. The results demonstrate the potential of the TLSBEV distribution for modeling data from complex systems, which are typically modeled using mixtures of distributions.
Having obtained good fits to the bivariate wind speed data, we estimate the reliability function (23). The estimates of (23) for the two vectors are given in the last column of Table 6. Since the reliability function associated with the wind speeds of the station pairs (Cap-Madeleine, Montreal) and (Cap-Madeleine, Kuujjuarapik) is greater than 0.5, we conclude that, in probabilistic terms, Montreal and Kuujjuarapik are the two most suitable locations for the installation of wind farms.

6. Conclusions

In this work, based on the BEV distribution, we proposed a new extreme value distribution (TLSBEV) capable of modeling data from complex systems. We derived some properties of the new distribution and computed maximum likelihood estimates of its parameters. For climate data from weather stations in the province of Quebec, the new distribution provided a good fit.
We leave for future work the study of the asymptotic properties of the maximum likelihood estimator of TLSBEV, as well as the study of other estimation techniques for this model.
Since the symmetric logistic BEV distribution used in this work does not capture asymmetry in bivariate data, a natural continuation of this work is the development of new multimodal distributions that are not necessarily symmetric and are therefore more flexible than the TLSBEV.

Author Contributions

Conceptualization, C.G.O. and Y.S.O.; methodology, C.G.O.; software, Y.S.O.; validation, Y.S.O., C.G.O. and Y.S.M.; formal analysis, C.G.O.; investigation, C.G.O.; resources, C.G.O.; data curation, Y.S.O.; writing—original draft preparation, C.G.O.; writing—review and editing, Y.S.M.; visualization, Y.S.M.; supervision, C.G.O.; project administration, C.G.O.; funding acquisition, C.G.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Brasilia with resources from the DPI/DPG/BCE call for proposals no. 01/2024.

Data Availability Statement

http://climate.weather.gc.ca, accessed on 27 January 2022.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Embrechts, P.; Klüppelberg, C.; Mikosch, T. Modelling Extremal Events for Insurance and Finance; Springer: Berlin/Heidelberg, Germany, 1997.
  2. Akpinar, S.; Akpinar, E. Estimation of wind energy potential using finite mixture distribution models. Energy Convers. Manag. 2009, 50, 877–884.
  3. Carta, J.; Ramírez, P.; Velazquez, S. A review of wind speed probability distributions used in wind energy analysis: Case studies in the Canary Islands. Renew. Sustain. Energy Rev. 2009, 13, 933–955.
  4. Chang, T. Estimation of wind energy potential using different probability density functions. Appl. Energy 2011, 88, 1848–1856.
  5. Jaramillo, O.; Borja, M. Wind speed analysis in La Ventosa, Mexico: A bimodal probability distribution case. Renew. Energy 2004, 29, 1613–1630.
  6. Khamees, A.; Abdelaziz, A.; Ali, Z.; Alharthi, M.; Ghoneim, S.; Eskaros, M. Mixture probability distribution functions using novel metaheuristic method in wind speed modeling. Ain Shams Eng. J. 2022, 13, 101613.
  7. Kollu, R.; Rayapudi, S.; Narasimham, S.; Pakkurthi, K. Mixture probability distribution functions to model wind speed distributions. Int. J. Energy Environ. Eng. 2012, 3, 27.
  8. Li, Q.; Wang, J.; Zhang, H. Comparison of the goodness-of-fit of intelligent-optimized wind speed distributions and calculation in high-altitude wind-energy potential assessment. Energy Convers. Manag. 2021, 247, 114737.
  9. Nezhad, M.; Heydari, A.; Neshat, M.; Keynia, F.; Piras, G.; Garcia, D. A Mediterranean Sea Offshore Wind Classification using MERRA-2 and machine learning models. Renew. Energy 2022, 190, 156–166.
  10. Ouarda, T.; Charron, C. On the mixture of wind speed distribution in a Nordic region. Energy Convers. Manag. 2018, 174, 33–44.
  11. Tsvetkova, O.; Ouarda, T. A review of sensitivity analysis practices in wind resource assessment. Energy Convers. Manag. 2021, 238, 114112.
  12. Wang, Y.; Li, Y.; Zou, R.; Song, D. Bayesian infinite mixture models for wind speed distribution estimation. Energy Convers. Manag. 2021, 236, 113946.
  13. Yang, Z.; Lin, Y.; Dong, S. Joint model of wind speed and corresponding direction based on wind rose for wind energy exploitation. J. Ocean Univ. China 2022, 21, 876–892.
  14. Yang, Z.; Huang, W.; Dong, S.; Li, H. Mixture bivariate distribution of wind speed and air density for wind energy assessment. Energy Convers. Manag. 2023, 276, 116540.
  15. Escalante, C.; Raynal, J. A Trivariate Extreme Value Distribution Applied to Flood Frequency Analysis. J. Res. Natl. Inst. Stand. Technol. 1994, 99, 369–375.
  16. Raynal, J.; Salas, J. Multivariate Extreme Value Distributions in Hydrological Analyses. In Water for the Future: Hydrology in Perspective; IAHS Publications: Wallingford, UK, 1987; Volume 164, pp. 111–119.
  17. Coles, S. An Introduction to Statistical Modeling of Extreme Values; Springer Series in Statistics; Springer: London, UK, 2001.
  18. Kotz, S.; Nadarajah, S. Extreme Value Distributions: Theory and Applications, 1st ed.; Imperial College Press: London, UK, 2000.
  19. Waal, D.; Harris, T.; de Waal, A.; Mazarura, J. Modelling Bimodal Data Using a Multivariate Triangular-Linked Distribution. Mathematics 2022, 10, 2370.
  20. Gumbel, E. Distributions des Valeurs Extremes en Plusieurs Dimensions. In Annales de l’ISUP; Publications de l’Institut de Statistique de l’Université de Paris: Paris, France, 1960; Volume 9, pp. 171–173.
  21. Otiniano, C.; Oliveira, Y.; Sousa, T. Bimodal GEV Distribution with Location Parameter. 2024. R Package Version 0.1, License GPL-3, Repository CRAN. Available online: https://cran.r-project.org/web/packages/bgev/index.html (accessed on 1 November 2023).
  22. Kotz, S.; Lumelskii, Y.; Pensky, M. The Stress-Strength Model and Its Generalizations: Theory and Applications; World Scientific: Singapore, 2003.
  23. Fisher, R.; Tippett, L. Limiting forms of the frequency distribution of the largest or smallest member of a sample. Proc. Camb. Philos. Soc. 1928, 24, 180–190.
  24. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 10 October 2024).
  25. Ledford, A.; Tawn, J. Statistics for near independence in multivariate extreme values. Biometrika 1996, 83, 169–187.
  26. Bingham, N.; Goldie, C.; Teugels, J. Regular Variation; Cambridge University Press: Cambridge, UK, 1987.
  27. Stephenson, A. evd: Extreme Value Distributions. R News 2024, 1, 31–32.
  28. Ljung, G.; Box, G. On a measure of lack of fit in time series models. Biometrika 1978, 65, 297–303.
  29. Jondeau, E.; Poon, S.; Rockinger, M. Financial Modeling Under Non-Gaussian Distributions; Springer: London, UK, 2007.
  30. Dimitrova, D.S.; Jia, Y.; Kaishev, V.K.; Tan, S. KSgeneral: Computing P-Values of the One-Sample K-S Test and the Two-Sample K-S and Kuiper Tests for (Dis)Continuous Null Distribution. 2024. R Package Version 2.0.2. Available online: https://cran.r-project.org/web/packages/KSgeneral/index.html (accessed on 25 May 2024).
Figure 1. On the map of the province of Quebec, points 1, 2, and 3 (in red) mark the stations of Montreal, Cap-Madeleine, and Kuujjuarapik, respectively.
Figure 2. Scatterplots (dots) and contours of the extreme subsamples $(X, Y)$. The marginal data are shown along the sides.
Figure 3. Maximum temperature (horizontal axis) and minimum humidity (vertical axis) at Kuujjuarapik for each 15-year period. The respective marginal distributions are shown along the sides.
Figure 4. Marginal densities fitted with $F_{BG}(\cdot;\hat{\xi},\hat{\sigma},\hat{\mu},\hat{\delta})$ (solid lines), using the estimates from Table 3.
Figure 5. Data contours (black) and TLSBEV density contour lines (orange), using the estimates from Table 3.
Figure 6. Fitted BGEV densities (orange): (a) $g_{BG}(x; 0.28, 3.7, 2.85, 0.33)$ for Cap M, (b) $g_{BG}(y; -0.15, 2.57, 2.46, 0.17)$ for MTL, and (c) $g_{BG}(z; -0.13, 4.7, 2.697, 0.42)$ for KJ.
Figure 7. Right panel: the density $g_{\hat{\Theta}}$, with $\hat{\Theta}$ taken from the first row of Table 6. Left panel: contours of the observed data (black) and contour lines of $g_{\hat{\Theta}}$ (orange).
Figure 8. Right panel: the density $g_{\hat{\Theta}}$, with $\hat{\Theta}$ taken from the second row of Table 6. Left panel: contours of the observed data (black) and contour lines of $g_{\hat{\Theta}}$ (orange).
Table 1. Block size N and p-value for Quebec variables.
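| Station | Variable | N in hours (days) | p-value |
|---|---|---|---|
| Cap-M | T | 1634 (68.08) | 0.05260 |
| Cap-M | HUM | 1210 (50.4) | 0.07597 |
| MTL | T | 1859 (77.45) | 0.06334 |
| MTL | HUM | 1813 (75.54) | 0.13485 |
| KJ | T | 1938 (80.7) | 0.10132 |
| KJ | HUM | 1767 (73.7) | 0.08313 |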
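Table 1 lists the block size N used to extract extremes from the hourly records; for example, N = 1634 hours at Cap-M corresponds to 1634/24 ≈ 68.08 days. The sketch below illustrates how block maxima can be taken from an hourly series and how their serial independence can be checked. It assumes consecutive, non-overlapping blocks with any incomplete trailing block dropped, and it assumes that the p-values come from a Ljung–Box test (the reference list cites [28]); this section does not state which test produced them. The function names are hypothetical and are not the authors' code.

```python
from typing import List, Sequence


def block_maxima(series: Sequence[float], block_size: int) -> List[float]:
    """Return the maximum of each consecutive, non-overlapping block.

    Assumption for illustration: a trailing block shorter than
    `block_size` is discarded.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_blocks = len(series) // block_size
    return [max(series[i * block_size:(i + 1) * block_size])
            for i in range(n_blocks)]


def ljung_box_q(x: Sequence[float], max_lag: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1}^{h} rho_k^2 / (n - k)."""
    n = len(x)
    if not 0 < max_lag < n:
        raise ValueError("need 0 < max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation undefined")
    total = 0.0
    for k in range(1, max_lag + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Toy hourly series (not the Quebec data) split into blocks of 5 "hours".
hourly = [float(i % 7) for i in range(20)]
print(block_maxima(hourly, 5))                          # [4.0, 6.0, 6.0, 5.0]
print(ljung_box_q([1.0, -1.0, 1.0, -1.0], max_lag=1))  # 4.5
```

For a minimum-type variable such as minimum humidity, the same function can be applied to the negated series, since the block minima of x equal the negated block maxima of -x. The p-value is then the upper tail of a chi-square distribution with `max_lag` degrees of freedom evaluated at Q (for example `scipy.stats.chi2.sf(q, max_lag)` if SciPy is available).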
Table 2. Modes of the empirical density of (T; HUM) for the four periods.
| | 1957–1972 | 1973–1987 | 1988–2002 | 2003–2017 |
|---|---|---|---|---|
| Mode 1 | (11; 41) | (3.7; 39.2) | (4.9; 43) | (4; 38) |
| Mode 2 | (26.4; 25.5) | (26.4; 24.5) | (28.5; 23) | (29.5; 22.7) |
Table 3. Maximum likelihood estimates of $\Theta = (r, \xi_1, \sigma_1, \mu_1, \delta_1, \xi_2, \sigma_2, \mu_2, \delta_2)$.
( X , Y ) r ^ ξ ^ 1 σ ^ 1 μ ^ 1 δ ^ 1 ξ ^ 2 σ ^ 2 μ ^ 2 δ ^ 2
(T, HUM) Cap-M1.24−0.3819.0215.940.35−0.099.3329.090.11
(T, HUM) MTL1.13−0.41161.320.191.14−0.219.325.340.1
(T, HUM) KJ1.34−0.38816.760.73−0.019.8825.360.1
Table 4. AIC and BIC values for TLSBEV and BEV models.
| Variables | TLSBEV AIC | TLSBEV BIC | BEV AIC | BEV BIC |
|---|---|---|---|---|
| (T, HUM) Cap-M | 1145.5 | 1150.4 | 1818.7 | 1819.6 |
| (T, HUM) MTL | 511.2 | 516.2 | 4550.5 | 4543.4 |
| (T, HUM) KJ | 520.6 | 524.1 | 3152.3 | 3143.7 |
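Table 4 compares the models by AIC and BIC, where lower values indicate a better balance between fit and complexity. This section does not state the exact convention used, so the sketch below encodes the standard definitions as an assumption: with maximized log-likelihood $\hat{\ell}$, $k$ free parameters, and $n$ observations, AIC $= 2k - 2\hat{\ell}$ and BIC $= k \ln n - 2\hat{\ell}$. Under these definitions BIC − AIC $= k(\ln n - 2)$, which is positive whenever $n > e^2 \approx 7.39$. The log-likelihoods and model names in the example are hypothetical, not values from the article.

```python
import math
from typing import Dict, Tuple


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2*loglik."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian (Schwarz) information criterion: BIC = k*ln(n) - 2*loglik."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def best_by(criterion: Dict[str, float]) -> str:
    """Name of the model with the smallest criterion value."""
    return min(criterion, key=criterion.get)


# Hypothetical fits: (maximized log-likelihood, number of parameters).
fits: Dict[str, Tuple[float, int]] = {
    "model_A": (-100.0, 9),  # e.g. a 9-parameter model such as Theta in Table 3
    "model_B": (-120.0, 7),
}
n_obs = 50
aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
bics = {name: bic(ll, k, n_obs) for name, (ll, k) in fits.items()}
print(aics)           # {'model_A': 218.0, 'model_B': 254.0}
print(best_by(bics))  # model_A
```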
Table 5. Kolmogorov–Smirnov (KS) test for TLSBEV and BEV models.
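| Data | TLSBEV D | TLSBEV p-value | BEV D | BEV p-value |
|---|---|---|---|---|
| (T, HUM) Cap-M | 0.108 | 0.62 | 0.303 | 0.127 |
| (T, HUM) MTL | 0.136 | 0.84 | 0.481 | 0.084 |
| (T, HUM) KJ | 0.505 | 0.98 | 0.195 | 0.092 |
| (WS, T) KJ | 0.843 | 0.95 | 0.438 | 0.099 |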
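Table 5 reports the Kolmogorov–Smirnov statistic D and its p-value; the reference list includes the R package KSgeneral [30], presumably for these computations. For a univariate sample and a fully specified continuous CDF F, D is the largest vertical distance between the empirical CDF and F, and it is attained at the order statistics. The sketch below computes only that univariate statistic; how the article applies the test to bivariate pairs is not described in this section. The Gumbel CDF is used purely as an illustrative hypothesized distribution, not as the fitted TLSBEV or BEV model, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|.

    For continuous F the supremum is reached at the order statistics x_(i):
    D_n = max_i max(i/n - F(x_(i)), F(x_(i)) - (i - 1)/n).
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def gumbel_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gumbel CDF exp(-exp(-(x - mu)/sigma)), the GEV case with shape 0."""
    return math.exp(-math.exp(-(x - mu) / sigma))


def uniform_cdf(x: float) -> float:
    """CDF of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


print(ks_statistic([0.1, 0.9], uniform_cdf))                # 0.4
print(round(ks_statistic([-1.0, 0.0, 2.0], gumbel_cdf), 4))  # 0.2988
```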
Table 6. Estimates of $\Theta = (r, \xi_1, \sigma_1, \mu_1, \delta_1, \xi_2, \sigma_2, \mu_2, \delta_2)$ for $(X, Y) \sim G_{\Theta}$ and $P(Y < X)$ computed as in (23).
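| $(X, Y)$ | $\hat{r}$ | $\hat{\xi}_1$ | $\hat{\sigma}_1$ | $\hat{\mu}_1$ | $\hat{\delta}_1$ | $\hat{\xi}_2$ | $\hat{\sigma}_2$ | $\hat{\mu}_2$ | $\hat{\delta}_2$ | $\hat{P}(Y < X)$ |
|---|---|---|---|---|---|---|---|---|---|---|
| (Cap M, MTL) | 1.4 | 0.28 | 3.7 | 2.85 | 0.33 | −0.15 | 2.57 | 2.46 | 0.17 | 0.6157 |
| (Cap M, KJ) | 1.0 | 0.28 | 3.7 | 2.85 | 0.33 | −0.13 | 4.7 | 2.697 | 0.42 | 0.5422 |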
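Table 6 reports $\hat{P}(Y < X)$ obtained from the fitted model through equation (23), which is not reproduced in this section. As an independent sanity check, $P(Y < X)$ can also be approximated by simulation or estimated directly from paired observations. The sketch below shows both ideas under explicit assumptions: `sample_pair` stands for any routine that draws one $(X, Y)$ pair from a fitted model, and because no TLSBEV sampler is given here, a hypothetical pair of independent Gumbel variables with a common scale $\sigma$ serves as the stand-in. That stand-in ignores the dependence modeled in the article, but it has a known closed form, $P(Y < X) = 1/(1 + e^{-(\mu_X - \mu_Y)/\sigma})$, since the difference of two such Gumbel variables is logistic. None of these function names come from the article.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]


def empirical_p_y_less_x(x: Sequence[float], y: Sequence[float]) -> float:
    """Share of paired observations (x_i, y_i) with y_i < x_i (ties count as 'no')."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    return sum(1 for xi, yi in zip(x, y) if yi < xi) / len(x)


def mc_p_y_less_x(sample_pair: Callable[[random.Random], Pair],
                  n_draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(Y < X) from simulated (X, Y) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x, y = sample_pair(rng)
        if y < x:
            hits += 1
    return hits / n_draws


def gumbel_draw(rng: random.Random, mu: float, sigma: float) -> float:
    """Inverse-CDF draw: mu - sigma*log(-log U) is Gumbel(mu, sigma) for U ~ U(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); exclude 0 so the logs stay finite
        u = rng.random()
    return mu - sigma * math.log(-math.log(u))


def independent_gumbel_pair(mu_x: float, mu_y: float,
                            sigma: float) -> Callable[[random.Random], Pair]:
    """Hypothetical stand-in sampler: X and Y independent Gumbel, common scale."""
    def draw(rng: random.Random) -> Pair:
        return gumbel_draw(rng, mu_x, sigma), gumbel_draw(rng, mu_y, sigma)
    return draw


exact = 1.0 / (1.0 + math.exp(-(1.0 - 0.0) / 1.0))
estimate = mc_p_y_less_x(independent_gumbel_pair(1.0, 0.0, 1.0))
print(round(exact, 4))                                          # 0.7311
print(empirical_p_y_less_x([3, 1, 4, 1, 5], [2, 2, 4, 0, 1]))  # 0.6
```

With 100,000 draws the Monte Carlo standard error for a probability near 0.73 is roughly 0.0014, so `estimate` should agree with `exact` to about two decimal places.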
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Otiniano, C.G.; Oliveira, Y.S.; Maluf, Y.S. Probability Distribution of Extreme Events in Complex Systems: Application to Climate Data. Symmetry 2024, 16, 1639. https://doi.org/10.3390/sym16121639

AMA Style

Otiniano CG, Oliveira YS, Maluf YS. Probability Distribution of Extreme Events in Complex Systems: Application to Climate Data. Symmetry. 2024; 16(12):1639. https://doi.org/10.3390/sym16121639

Chicago/Turabian Style

Otiniano, Cira G., Yasmin S. Oliveira, and Yuri S. Maluf. 2024. "Probability Distribution of Extreme Events in Complex Systems: Application to Climate Data" Symmetry 16, no. 12: 1639. https://doi.org/10.3390/sym16121639

APA Style

Otiniano, C. G., Oliveira, Y. S., & Maluf, Y. S. (2024). Probability Distribution of Extreme Events in Complex Systems: Application to Climate Data. Symmetry, 16(12), 1639. https://doi.org/10.3390/sym16121639

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
