Modeling the Characteristics of Unhealthy Air Pollution Events Using Bivariate Copulas

Ismail, Mohd Sabri; Masseran, Nurulkamal

doi:10.3390/sym15040907

by Mohd Sabri Ismail * and Nurulkamal Masseran

Department of Mathematical Science, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(4), 907; https://doi.org/10.3390/sym15040907

Submission received: 7 March 2023 / Revised: 12 April 2023 / Accepted: 12 April 2023 / Published: 13 April 2023

(This article belongs to the Special Issue Selected Papers from the 5th International Conference on Mathematical Sciences (ICMS5 2023))

Abstract:

Investigating the dependence structures among the characteristics of the current unhealthy air pollution events is a valuable endeavor to understand the pollution behavior more clearly and determine the potential future risks. This study determined the characteristics of air pollution events based on their duration, severity, and intensity. It focused on modeling the dependence structures for all the possible pairs of characteristics, which were (duration, intensity), (severity, intensity), and (duration, severity), using various parametric copula models. The appropriate copula models for describing the behavior of the relationship pairs of the (duration, intensity), (severity, intensity), and (duration, severity) were found to be the Tawn type 1, 180°-rotated Tawn type 1, and Joe, respectively. This result showed that the dependence structures for the pairs were skewed and asymmetric. Therefore, the obtained copulas were appropriate models for such non-elliptical structures. These obtained models can be further extended in future work through the vine copula approach to provide a more comprehensive insight into the tri-variate relationship of the duration–intensity–severity characteristics.

Keywords:

unhealthy air pollution; dependence modeling; bivariate copula

1. Introduction

Air pollution refers to the presence of pollutants such as carbon monoxide (CO), ozone (O₃), nitrogen dioxide (NO₂), sulfur dioxide (SO₂), and particulate matter with a size of less than 10 microns (PM₁₀) in the outdoor air [1]. These pollutants come from sources such as motor vehicles, industrial activities, open burning, and forest fires [2]. To monitor the local air quality, authorities usually use the air pollution index (API) [3]. When the API crosses a certain threshold, air pollution is classified as unhealthy, which implies that the air is dangerous to breathe and harmful to human health [4]. In addition to affecting health, unhealthy air pollution events have large social and economic impacts [5,6,7,8]. For instance, such events can lead to severe health problems such as respiratory diseases, cardiovascular diseases, and skin allergies [9,10]. In addition, public movement is restricted, causing people to stay indoors, masks to be worn when outside, and schools to be closed [11]. Consequently, unhealthy air pollution events reduce the national income due to partial business operations, a decline in tourist activity, increased national health costs, and decreased national productivity [12]. Therefore, examining unhealthy air pollution events is vital to provide a clearer understanding of the air pollution data and determine the future potential risks.

In multivariate modeling, copula functions have been applied successfully to model the dependence structure of random variables related to air pollution data. This success is due to the flexibility that copula functions provide by separating the dependence structure from the univariate distributions of the variables [13,14]. Previously, Sak et al. [15] applied the t-copula to PM2.5 levels to investigate the air pollution risk. Chan and So [16] used a Gaussian copula on multiple air pollutants to understand air pollution. Falk et al. [17] analyzed the air pollutants in Milan, Italy, using the generalized Pareto copula. Kim et al. [18] investigated the PM2.5 time series data in China using an asymmetric bivariate copula function comprising the mixed Frank and Gumbel copulas. Masseran and Hussain [19] applied Gaussian and modified Joe–Clayton copulas individually for constant and dynamic cases of air pollutant variables to investigate air pollution events. In addition, He et al. [20] employed a mixed copula model to investigate the dynamic relationship between the meteorological factors and the atmospheric pollutants in the cities of Beijing and Guangzhou, China.

In addition to the PM2.5 levels, air pollutants data, and other relevant variables (e.g., meteorological factors) that have been used in the above studies, the characteristics of a calamity such as a drought or unhealthy air pollution events can also be studied for risk monitoring and controlling purposes [21]. Motivated by this idea, Masseran [22] studied two characteristics of unhealthy air pollution events, namely, the duration and severity, and the result showed that the fitted Joe copula could model the dependence structure between those two variables. Furthermore, Masseran [22] derived probability measures to plan and mitigate the risks of unhealthy air pollution events. Therefore, continuing the latter study, this study also aims to model the characteristics of unhealthy air pollution events, such as the duration, severity, and intensity.

In contrast to [22], a new characteristic, the intensity, is studied alongside the duration and severity using copula functions. For a better understanding of the air pollution behavior, this new addition is reasonable because the severity depends on not only the duration but also the intensity [23]. Therefore, this study focuses on determining the most appropriate bivariate copula model for each possible pair of variables, which are the (intensity, duration), (intensity, severity), and (duration, severity). To this end, a wide range of existing parametric copula models was examined in this study. In searching for the appropriate model, the information-based model comparison (AIC, BIC, log-likelihood) and non-nested model comparison (Vuong and Clarke tests) were applied.

Our study found that the appropriate bivariate copula models for modeling the distribution of the relationship pairs of the (duration, intensity), (severity, intensity), and (duration, severity) were the Tawn type 1, 180°-rotated Tawn type 1, and Joe, respectively. This result showed that the dependence structures for the pairs were skewed and asymmetric. Therefore, the obtained copulas were the most appropriate models for such non-elliptical structures. Furthermore, the obtained bivariate copula models can serve as the basis (or building blocks) for developing a vine copula model (a more flexible and tractable model for multivariate data built from bivariate copulas) in future studies.

This paper is organized as follows. Section 2 introduces a bivariate copula and its relevant properties. Section 3 briefly covers the study area and data. Section 4 describes the proposed methods. Section 5 presents and discusses the results. Section 6 ends this paper with a conclusion and some suggestions for future work. Figure 1 illustrates a pipeline for the content of this manuscript to help readers understand the full text.

2. Bivariate Copula

Originating from Sklar’s theorem [24], a bivariate copula $C$ is a bivariate distribution function on the two-dimensional hypercube $[0, 1]^2$ with uniformly distributed marginals [25]. The theorem holds in any dimension, but this discussion focuses on the two-dimensional case. It states that for a two-dimensional random vector $X = (X_1, X_2)$ with a joint distribution function $F$ and marginal distribution functions $F_i$, for $i = 1, 2$, the joint distribution function can be expressed as the following.

$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2))$$

(1)

with the corresponding density of

$$f(x_1, x_2) = c(F_1(x_1), F_2(x_2)) f_1(x_1) f_2(x_2),$$

(2)

where $C$ and $c$ are the copula and copula density, respectively, and $f_1$ and $f_2$ are the marginal densities. For continuous marginal distributions, the copula $C$ is unique, and the corresponding copula density is obtained by partial differentiation:

$$c(u_1, u_2) = \frac{\partial^2}{\partial u_1 \partial u_2} C(u_1, u_2),$$

(3)

where $u_i = F_i(x_i)$ for $i = 1, 2$ are the variables on what is known as the copula scale [26].

Various bivariate copula models with distinct characteristics have been proposed in the literature to model the relationship between two random variables. Some examples are the Clayton copula, the Frank copula, and the Joe copula. To investigate the dependence properties of these bivariate copulas, a rank-based measure of dependence, Kendall’s tau, is considered. In particular, Kendall’s tau can be defined as the following.

$$\tau(X_1, X_2) = P((X_{11} - X_{21})(X_{12} - X_{22}) > 0) - P((X_{11} - X_{21})(X_{12} - X_{22}) < 0),$$

(4)

where $(X_{11}, X_{12})$ and $(X_{21}, X_{22})$ are independent and identically distributed (i.i.d.) copies of $(X_1, X_2)$. Since $\tau(X_1, X_2)$ is invariant under strictly increasing transformations of the marginals, it depends only on the underlying copula. Specifically, Kendall’s tau satisfies the following [27].

$$\tau(X_1, X_2) = 4 \int_0^1 \int_0^1 C(u_1, u_2) \, dC(u_1, u_2) - 1$$

(5)
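The definition in Equation (4) has a direct sample counterpart: count the concordant and discordant pairs of observations and divide their difference by the number of pairs. The following is a minimal sketch of that estimator, not the routine used in the study; the function name `kendall_tau` and the toy data are hypothetical, and tied pairs are simply counted as neither concordant nor discordant.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Sample version of Equation (4): (concordant - discordant) / number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # tied pairs (sign == 0) count as neither concordant nor discordant
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy values for illustration only (not taken from the Klang data).
durations = [1, 2, 5, 9, 30]
severities = [101, 230, 540, 980, 3500]
print(kendall_tau(durations, severities))
```

Because only the ordering of the observations enters the count, the estimate is unchanged when either variable is transformed by a strictly increasing function, which is why it can be computed on the original data or on the copula scale.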

The Clayton copula is useful for capturing positive dependence between two variables, where the strength of the dependence is summarized by Kendall’s tau. With a suitable rotation, this copula can also model a negative dependence structure in a dataset [28]. The bivariate Clayton copula is given below.

$$C(u_1, u_2; \theta) = (u_1^{-\theta} + u_2^{-\theta} - 1)^{-\frac{1}{\theta}},$$

(6)

where $\theta \in (0, \infty)$ controls the degree of dependence.

In contrast, the Frank copula is more versatile because it can accommodate the entire range of dependence, both positive and negative [29,30]. The bivariate Frank copula is provided below.

$$C(u_1, u_2; \theta) = -\frac{1}{\theta} \ln \left[1 + \frac{(e^{-\theta u_1} - 1)(e^{-\theta u_2} - 1)}{e^{-\theta} - 1}\right],$$

(7)

where $\theta \in (-\infty, \infty)$ and $\theta \neq 0$. Finally, the Joe copula can be used to describe positive dependence among the variables [28,29]. In addition, the Joe copula has been proposed to address cases with a high positive correlation [31]. To cover negative dependence, a rotation can be applied to the Joe copula [22]. The bivariate Joe copula is shown below.

$$C(u_1, u_2; \theta) = 1 - \left[(1 - u_1)^{\theta} + (1 - u_2)^{\theta} - (1 - u_1)^{\theta} (1 - u_2)^{\theta}\right]^{\frac{1}{\theta}},$$

(8)

where $\theta \in [1, \infty)$. The three abovementioned copulas are Archimedean copulas, each with a different generator function (the function that uniquely identifies a copula within the Archimedean family) [26].

For Archimedean copulas, such as the Clayton, Frank, and Joe copulas, Kendall’s tau can also be expressed through the generator function $\phi(t, \theta)$. Generally, the corresponding Kendall’s tau for an Archimedean copula is obtained using the following [26,32].

$$\tau(\theta) = 1 + 4 \int_0^1 \frac{\phi(t, \theta)}{\phi'(t, \theta)} \, dt$$

(9)

For the Clayton copula, the generator function is $\phi(t, \theta) = \frac{1}{\theta}(t^{-\theta} - 1)$ and Kendall’s tau is $\tau(\theta) = \frac{\theta}{\theta + 2} \in [0, 1]$. The generator function and Kendall’s tau for the Frank copula are, respectively, as follows.

$$\phi(t, \theta) = -\ln \left(\frac{e^{-\theta t} - 1}{e^{-\theta} - 1}\right) \quad \text{and}$$

(10)

$$\tau(\theta) = 1 - \frac{4}{\theta} + 4 \frac{D_1(\theta)}{\theta} \in [-1, 1],$$

(11)

where $D_1(\theta) = \int_0^{\theta} \frac{x / \theta}{e^{x} - 1} \, dx$ is the Debye function [25]. For the Joe copula, the generator function and Kendall’s tau are, respectively, given below.

$$\phi(t, \theta) = -\ln \left(1 - (1 - t)^{\theta}\right) \quad \text{and}$$

(12)

$$\tau(\theta) = 1 + \left[\frac{-2 + 2\gamma + 2\ln(2) + \varphi\left(\frac{1}{\theta}\right) + \varphi\left(\frac{1}{2} \frac{2 + \theta}{\theta}\right) + \theta}{\theta - 2}\right] \in [0, 1],$$

(13)

where $\gamma = \lim_{n \to \infty} \left(\sum_{i = 1}^{n} \frac{1}{i} - \ln(n)\right) \approx 0.57721$ is the Euler constant, and $\varphi(x) = \frac{d}{dx} \ln(\Gamma(x)) = \frac{\frac{d}{dx} \Gamma(x)}{\Gamma(x)}$ is the digamma function [26].
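As a numerical check on these formulas, Kendall’s tau can be approximated directly from a generator and its derivative using Equation (9). The sketch below applies a plain midpoint rule to the generators of the Clayton, Frank, and Joe copulas; the function names are hypothetical, the derivatives were worked out by hand from Equations (10) and (12) and the Clayton generator, and the parameter values are the rounded estimates listed in Table 3.

```python
import math


def archimedean_tau(phi, dphi, n=20_000):
    """Equation (9): 1 + 4 * integral over (0, 1) of phi(t) / phi'(t), midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoints avoid the endpoints t = 0 and t = 1
        total += phi(t) / dphi(t)
    return 1.0 + 4.0 * h * total


def clayton(theta):
    """Generator and derivative of the Clayton copula (theta > 0)."""
    return (lambda t: (t ** -theta - 1.0) / theta,
            lambda t: -(t ** (-theta - 1.0)))


def frank(theta):
    """Generator and derivative of the Frank copula (theta != 0)."""
    d = math.exp(-theta) - 1.0
    return (lambda t: -math.log((math.exp(-theta * t) - 1.0) / d),
            lambda t: theta * math.exp(-theta * t) / (math.exp(-theta * t) - 1.0))


def joe(theta):
    """Generator and derivative of the Joe copula (theta >= 1)."""
    return (lambda t: -math.log(1.0 - (1.0 - t) ** theta),
            lambda t: -theta * (1.0 - t) ** (theta - 1.0) / (1.0 - (1.0 - t) ** theta))


tau_clayton = archimedean_tau(*clayton(2.0))  # closed form: 2 / (2 + 2) = 0.5
tau_frank = archimedean_tau(*frank(3.64))     # rounded Frank estimate from Table 3
tau_joe = archimedean_tau(*joe(1.83))         # rounded Joe estimate from Table 3
print(tau_clayton, tau_frank, tau_joe)
```

The Frank and Joe values can be compared with the tau column of Table 3. Small differences are expected because the tabulated parameters are rounded to two decimals.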

To extend the range of dependence of $\tau(\theta)$, counterclockwise rotations of the copula density $c(\cdot, \cdot)$ by 90°, 180°, and 270° can be used, defined as $c_{90}(u_1, u_2) = c(1 - u_2, u_1)$, $c_{180}(u_1, u_2) = c(1 - u_1, 1 - u_2)$, and $c_{270}(u_1, u_2) = c(u_2, 1 - u_1)$, respectively [26]. For example, using a 90° rotation, the Clayton copula can be extended to a copula with a full range of Kendall’s tau values by defining the following.

$$c_{Clayton}^{extended}(u_1, u_2; \theta) = \begin{cases} c_{Clayton}(u_1, u_2; \theta) & \text{if } \theta > 0 \\ c_{Clayton}(1 - u_2, u_1; -\theta) & \text{otherwise} \end{cases},$$

(14)

where $c_{Clayton}(u_1, u_2; \theta) = (\theta + 1) (u_1^{-\theta} + u_2^{-\theta} - 1)^{-\left(\frac{1 + 2\theta}{\theta}\right)} (u_1 u_2)^{-\theta - 1}$ [26].
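The rotations amount to reflecting the arguments of a density before evaluating it, which a few lines of code make concrete. This is an illustrative sketch with hypothetical function names. It assumes that the rotated branch of the extended Clayton density is evaluated at the positive value $-\theta$, since the Clayton density itself is only defined for a positive parameter.

```python
def clayton_density(u1, u2, theta):
    """Clayton copula density for theta > 0."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    s = u1 ** -theta + u2 ** -theta - 1.0
    return (theta + 1.0) * s ** (-(1.0 + 2.0 * theta) / theta) * (u1 * u2) ** (-theta - 1.0)


def rotate(density, degrees):
    """Return the counterclockwise-rotated density c_90, c_180 or c_270."""
    if degrees == 90:
        return lambda u1, u2, *params: density(1.0 - u2, u1, *params)
    if degrees == 180:
        return lambda u1, u2, *params: density(1.0 - u1, 1.0 - u2, *params)
    if degrees == 270:
        return lambda u1, u2, *params: density(u2, 1.0 - u1, *params)
    raise ValueError("degrees must be 90, 180 or 270")


def clayton_extended(u1, u2, theta):
    """Extended Clayton density: positive theta as is, negative theta via a 90-degree rotation."""
    if theta > 0:
        return clayton_density(u1, u2, theta)
    # Assumption: the rotated density is evaluated at -theta > 0.
    return rotate(clayton_density, 90)(u1, u2, -theta)


survival_clayton = rotate(clayton_density, 180)
print(clayton_density(0.5, 0.5, 2.0))
print(survival_clayton(0.9, 0.9, 2.0), clayton_density(0.1, 0.1, 2.0))
```

The 180° rotation moves the concentration of the Clayton density from the lower-left corner of the unit square to the upper-right corner, so the two values on the last line coincide up to floating-point rounding.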

3. Study Area and Data

In Malaysia, the local air quality is continuously monitored by the Malaysian Department of the Environment (DOE). The DOE is responsible for collecting, supervising, and reporting the API data. To measure the API in a certain area, air quality monitoring stations are placed in strategic locations covering urban, suburban, and industrial areas. Each station records the concentration readings for the five main pollutants: carbon monoxide (CO), ozone (O₃), nitrogen dioxide (NO₂), sulfur dioxide (SO₂), and particulate matter less than 10 microns in size (PM₁₀). O₃, CO, NO₂, and SO₂ are measured in parts per million (ppm), while PM₁₀ is measured in micrograms per cubic meter ($\mu g / m^3$).

The API is determined based on the highest level of the five main standardized pollutants. The calculation of the standardized sub-API indices can be undertaken using the mathematical formulas provided by the DOE [33]. The standardized sub-API value for the CO pollutant can be computed using the following equation.

$$Idx(CO) = \begin{cases} CO \times 11.11111, & \text{if } CO < 9 \text{ ppm}, \\ 100 + \{[CO - 9] \times 16.66667\}, & \text{if } 9 \leq CO < 15 \text{ ppm}, \\ 200 + \{[CO - 15] \times 6.66667\}, & \text{if } 15 \leq CO < 30 \text{ ppm}, \\ 300 + \{[CO - 30] \times 10\}, & \text{if } CO \geq 30 \text{ ppm}. \end{cases}$$

(15)

The standardized sub-API value for the O₃ pollutant can be computed using the following equation.

$$Idx(O_3) = \begin{cases} O_3 \times 1000, & \text{if } O_3 < 0.2 \text{ ppm}, \\ 200 + \{[O_3 - 0.2] \times 500\}, & \text{if } 0.2 \leq O_3 < 0.4 \text{ ppm}, \\ 300 + \{[O_3 - 0.4] \times 1000\}, & \text{if } O_3 \geq 0.4 \text{ ppm}. \end{cases}$$

(16)

The standardized sub-API value for the NO₂ pollutant can be computed using the following equation.

$$Idx(NO_2) = \begin{cases} NO_2 \times 588.23529, & \text{if } NO_2 < 0.17 \text{ ppm}, \\ 100 + \{[NO_2 - 0.17] \times 232.56\}, & \text{if } 0.17 \leq NO_2 < 0.6 \text{ ppm}, \\ 200 + \{[NO_2 - 0.6] \times 166.667\}, & \text{if } 0.6 \leq NO_2 < 1.2 \text{ ppm}, \\ 300 + \{[NO_2 - 1.2] \times 250\}, & \text{if } NO_2 \geq 1.2 \text{ ppm}. \end{cases}$$

(17)

The standardized sub-API value for the SO₂ pollutant can be computed using the following equation.

$$Idx(SO_2) = \begin{cases} SO_2 \times 2500, & \text{if } SO_2 < 0.04 \text{ ppm}, \\ 100 + \{[SO_2 - 0.04] \times 384.61\}, & \text{if } 0.04 \leq SO_2 < 0.3 \text{ ppm}, \\ 200 + \{[SO_2 - 0.3] \times 333.333\}, & \text{if } 0.3 \leq SO_2 < 0.6 \text{ ppm}, \\ 300 + \{[SO_2 - 0.6] \times 500\}, & \text{if } SO_2 \geq 0.6 \text{ ppm}. \end{cases}$$

(18)

The standardized sub-API value for the PM₁₀ pollutant can be computed using the following equation.

$$Idx(PM_{10}) = \begin{cases} PM_{10}, & \text{if } PM_{10} < 50 \ \mu g / m^3, \\ 50 + \{[PM_{10} - 50] \times 0.5\}, & \text{if } 50 \leq PM_{10} < 350 \ \mu g / m^3, \\ 200 + \{[PM_{10} - 350] \times 1.4286\}, & \text{if } 350 \leq PM_{10} < 420 \ \mu g / m^3, \\ 300 + \{[PM_{10} - 420] \times 1.25\}, & \text{if } 420 \leq PM_{10} < 500 \ \mu g / m^3, \\ 400 + [PM_{10} - 500], & \text{if } PM_{10} \geq 500 \ \mu g / m^3. \end{cases}$$

(19)

From these standardized individual indices, the API value at a particular time can then be determined based on the highest value among these sub-indices [34,35]. Figure 2 shows the process for determining the API.
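The piecewise formulas in Equations (15)–(19) and the final maximum can be collected into a small lookup table. The sketch below is one possible encoding and not the DOE’s software; the names `SEGMENTS`, `sub_index`, and `api` are hypothetical. The final NO₂ and SO₂ segments are assumed to start at 300, which is the value that keeps each sub-index continuous at its last breakpoint.

```python
import math

# Each segment: (upper limit, offset, lower limit, slope),
# giving sub-index = offset + (x - lower limit) * slope for x below the upper limit.
SEGMENTS = {
    "CO": [(9, 0, 0, 11.11111), (15, 100, 9, 16.66667),
           (30, 200, 15, 6.66667), (math.inf, 300, 30, 10)],
    "O3": [(0.2, 0, 0, 1000), (0.4, 200, 0.2, 500), (math.inf, 300, 0.4, 1000)],
    "NO2": [(0.17, 0, 0, 588.23529), (0.6, 100, 0.17, 232.56),
            (1.2, 200, 0.6, 166.667), (math.inf, 300, 1.2, 250)],  # 300 assumed
    "SO2": [(0.04, 0, 0, 2500), (0.3, 100, 0.04, 384.61),
            (0.6, 200, 0.3, 333.333), (math.inf, 300, 0.6, 500)],  # 300 assumed
    "PM10": [(50, 0, 0, 1), (350, 50, 50, 0.5), (420, 200, 350, 1.4286),
             (500, 300, 420, 1.25), (math.inf, 400, 500, 1)],
}


def sub_index(pollutant, concentration):
    """Standardized sub-API value for one pollutant (ppm, or micrograms per cubic meter for PM10)."""
    for upper, offset, lower, slope in SEGMENTS[pollutant]:
        if concentration < upper:
            return offset + (concentration - lower) * slope
    raise ValueError("concentration must be a finite number")


def api(concentrations):
    """API at one time point: the largest of the five sub-indices."""
    return max(sub_index(name, concentrations[name]) for name in SEGMENTS)


# Hypothetical readings for a single hour.
reading = {"CO": 4.5, "O3": 0.05, "NO2": 0.1, "SO2": 0.02, "PM10": 100}
print(api(reading))
```

In this hypothetical reading, PM₁₀ has the largest sub-index, so it alone determines the API for that hour.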

For our analysis, Klang (latitude 3°2′41.701″ N and longitude 101°26′44.023″ E), located in Peninsular Malaysia, was chosen as the study area. It is one of the largest cities in Malaysia with a dense population and a high level of economic and industrial activity, particularly in import and export trade. In addition, Klang is the 13th busiest trans-shipment and 16th busiest container port in the world [36]. Therefore, the frequency of unhealthy air pollution events in Klang is higher than in other cities [23], leading to more sample data. Figure 3 [37] illustrates the location of Peninsular Malaysia and Klang.

To examine the unhealthy air pollution events in Klang, its regional API data were obtained. An unhealthy air pollution event, as classified by the DOE, refers to a period when the API value exceeds 100 [38]. Take the obtained API data as a set, $API = \{x_t\}$, where $x_t$ is the API value at the time index $t \in \{1, 2, \dots, T\}$. Furthermore, let the total number of the recorded unhealthy air pollution events be $N$. Then, for $j = 1, 2, \dots, N$, the period of the $j$-th unhealthy air pollution event can be denoted as $P_j \subseteq \{t \mid x_t > 100\} \subseteq \{1, 2, \dots, T\}$, that is, a run of consecutive time indices at which the API exceeds 100. For each non-overlapping $P_j$, the duration, severity, and intensity, respectively, can be denoted as the following.

$d_{j} = |P_{j}|$ (the cardinality of the period $P_{j}$ ),
$s_{j} = \sum_{t \in P_{j}} x_{t}$ (the summation of all the API values within the period $P_{j}$ ), and
$i_{j} = \max_{t \in P_{j}} \{x_{t}\}$ (the maximum API value within the period $P_{j}$ ).
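The three definitions can be illustrated by scanning an API series for runs of consecutive values above the threshold. The following is a minimal sketch under that reading of an event; the function name `extract_events` and the toy series are hypothetical, and the real hourly series may contain missing values that this sketch does not handle.

```python
def extract_events(api_series, threshold=100):
    """Split a series into runs of consecutive values above the threshold and
    return the duration, severity, and intensity of each run."""
    runs = []
    current = []
    for value in api_series:
        if value > threshold:
            current.append(value)
        elif current:
            runs.append(current)
            current = []
    if current:  # the series ended while an event was still in progress
        runs.append(current)
    return [
        {"duration": len(run), "severity": sum(run), "intensity": max(run)}
        for run in runs
    ]


# Toy hourly API values (illustrative only).
series = [80, 105, 120, 90, 101, 95, 130, 150, 140]
for event in extract_events(series):
    print(event)
```

In the toy series, the first event lasts two hours with a severity of 225 and an intensity of 120, which shows how the severity grows with both the duration and the intensity.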

Figure 4 provides an illustration for determining the durations, severities, and intensities of the first three air pollution events.

The hourly API data from 1 January 1997 to 31 August 2020, obtained from the DOE, are depicted in Figure 5. Three metrics were derived from the series in Figure 5, namely, the severity, duration, and intensity. Each metric had 301 observations, which means that 301 unhealthy air pollution events were recorded from 1997 to 2020. Moreover, the descriptive statistics for the severity, duration, and intensity data are provided in Table 1. The measures of center (means and medians) differed considerably, especially for the severity and duration. Furthermore, the measures of spread (the ranges between the minimum and maximum values and the standard deviations) were large, showing that the data had a substantial variation, particularly the severity data. In addition, all the characteristics (severity, duration, and intensity) had considerable positive skewness and large kurtosis values, indicating distributions with a long right tail.

In this study, the severity measures how severe an unhealthy air pollution event was, the duration is its time span, and the intensity is its largest magnitude. These three metrics (severity, duration, and intensity) are strongly dependent on each other, and this dependence shapes the joint distribution of each possible pair. Therefore, the dependence for each pair of variables is analyzed using a bivariate copula. The details of our methods are provided in the section below.

Figure 5. Time series plot that corresponds to an unhealthy threshold of air pollution events.

Table 1. Descriptive statistics for the intensity, severity and duration data.

Variable	Mean	Median	Min. Value	Max. Value	Std. Deviation	Skewness	Kurtosis
Intensity	125.11	112	100	543	44.77	5.61	44.97
Severity	2241.76	231.27	100	36,677	4948.3	3.92	20.92
Duration (hours)	16.74	2	1	224	31.91	3.24	15.73

4. Methodology

Let $Dur = \{d_j \mid j = 1, \dots, N\}$, $Sev = \{s_j \mid j = 1, \dots, N\}$, and $Int = \{i_j \mid j = 1, \dots, N\}$ be the sets of the duration, severity, and intensity of $N$ unhealthy air pollution events. The focus of this study is on investigating the dependence structure among all the possible pairs of these three random variables using bivariate copula modeling (as discussed in Section 2). Three pairs are possible, namely ($Int$, $Dur$), ($Int$, $Sev$), and ($Dur$, $Sev$).

For simplicity, let $X_1$, $X_2$, and $X_3$ represent $Dur$, $Sev$, and $Int$, respectively. First, each variable was transformed into the corresponding pseudo-copula data using an estimated probability integral transform (PIT) by setting $u_{jk} = \hat{F}_k(x_{jk})$ for $j = 1, 2, \dots, N$ and $k = 1, 2, 3$. Here, an empirical distribution function was used to transform the variables, defined as the following.

$$\hat{F}_k(x) = \frac{1}{N + 1} \sum_{j = 1}^{N} 1_{\{x_{jk} \leq x\}}, \quad \text{for all } x,$$

(20)

where $x_{1k}, x_{2k}, \dots, x_{Nk}$ are the observations of the $k$-th variable. Thus, the copula data $U = (U_1, U_2, U_3)$ were obtained, where $U_k = \hat{F}_k(X_k)$ is the copula data for the $k$-th variable.
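The transform in Equation (20) reduces to counting, for each observation, how many sample values do not exceed it and dividing by the sample size plus one. A minimal sketch follows; the function name `pseudo_observations` and the toy values are hypothetical. Tied observations receive the same pseudo-observation here, which is what the indicator in Equation (20) implies, although other tie conventions exist.

```python
from bisect import bisect_right


def pseudo_observations(sample):
    """u_j = (number of observations <= x_j) / (N + 1), following Equation (20)."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right returns how many sorted values are <= x
    return [bisect_right(ordered, x) / (n + 1) for x in sample]


# Toy durations in hours (illustrative only).
durations = [2, 1, 30, 2, 9]
print(pseudo_observations(durations))
```

Dividing by $N + 1$ rather than $N$ keeps every value strictly inside the unit interval, so a copula density can be evaluated at all the transformed points without reaching the boundary.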

Next, using the copula data $U = (U_1, U_2, U_3)$, the pairwise dependencies among the variables ($Dur$, $Sev$, and $Int$) were explored. For that purpose, a plot comprising the marginal histograms of the copula data, pair plots of the copula data, Kendall’s tau coefficients, and empirical contour plots of the normalized copula data was used to inspect the pairwise dependency structures.

Generally, copula models can be classified into at least two groups, namely elliptical copulas and non-elliptical copulas. The copulas derived from elliptical distributions are the Gaussian and Student t-copulas. The other copulas are non-elliptical and have more flexibility to model asymmetric and skewed distributions. All these copula models can be applied to the characteristics of the unhealthy air pollution events, and the obtained models can provide useful information regarding their dependencies, regardless of whether the underlying distribution is symmetric or asymmetric. Table 2 lists all the bivariate copula models considered in this study.

Then, the copula modeling approach was used to model the dependence structure for each pair of variables. For each pair, the parameters for each considered bivariate copula model were estimated using the maximum likelihood estimation (MLE). For the observations $u_{r, s}$, where $r = 1, 2, \dots, N$ and $s = 1, 2$, the MLE was computed as follows.

$$\hat{\theta}_{MLE} = \arg\max_{\theta \in \Theta} \{l(u; \theta)\},$$

(21)

with a likelihood function of

$$l(u; \theta) = \prod_{r = 1}^{N} c(u_{r1}, u_{r2}; \theta),$$

(22)

where $\Theta$ is the set of possible parameter values and $\hat{\theta}_{MLE}$ is the value of $\theta$ in $\Theta$ that maximizes the likelihood function.
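To make the estimation step concrete, the sketch below maximizes the logarithm of the likelihood in Equation (22) for a one-parameter Clayton copula. It is not the procedure of the VineCopula package; all function names are hypothetical, the data are simulated from a Clayton copula with a known parameter, and a golden-section search over a bounded interval stands in for a general-purpose optimizer.

```python
import math
import random


def clayton_loglik(theta, data):
    """Sum over the pairs of the log Clayton density, i.e. the log of Equation (22)."""
    total = 0.0
    for u1, u2 in data:
        s = u1 ** -theta + u2 ** -theta - 1.0
        total += (math.log(1.0 + theta)
                  - (1.0 + theta) * (math.log(u1) + math.log(u2))
                  - (2.0 + 1.0 / theta) * math.log(s))
    return total


def simulate_clayton(theta, n, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    data = []
    for _ in range(n):
        u1 = 1.0 - rng.random()  # lies in (0, 1]
        w = 1.0 - rng.random()
        u2 = (u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        data.append((u1, u2))
    return data


def golden_section_max(f, lo, hi, tol=1e-5):
    """Maximize a unimodal function f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return (a + b) / 2.0


rng = random.Random(1)
data = simulate_clayton(theta=2.0, n=1000, rng=rng)
# The search interval (0.01, 10) is an arbitrary choice for this illustration.
theta_hat = golden_section_max(lambda th: clayton_loglik(th, data), 0.01, 10.0)
print(theta_hat)
```

Working with the sum of log densities instead of the product avoids numerical underflow and leaves the maximizer unchanged, because the logarithm is strictly increasing.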

Generally, there are two ways to estimate the copula parameters, namely the MLE and the inversion of the empirical Kendall’s tau. As defined above, the MLE is suitable for cases where the number of parameters is not too large (e.g., one or two). In contrast, the inversion approach uses the one-to-one relationship between Kendall’s tau and the copula parameter to estimate the parameter. However, the inversion of the empirical Kendall’s tau is less efficient and is not applicable to all the bivariate copula models [27]. It can be used only for one-parameter bivariate copula models and the Student t-copula [39]. Therefore, the MLE is preferable to the inversion of the empirical Kendall’s tau.

Next, for each bivariate copula model with its optimized parameters, an information-based model comparison comprising the log-likelihood, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC) was applied to choose the best model for each pair. Here, the bivariate copula model that obtained the highest log-likelihood and the lowest AIC and BIC was considered superior. For the observations $u_{r, s}$, where $r = 1, 2, \dots, N$ and $s = 1, 2$, the AIC, BIC, and log-likelihood were computed as follows.

$$AIC = -2 \sum_{r = 1}^{N} \ln [c(u_{r1}, u_{r2}; \theta)] + 2k,$$

(23)

$$BIC = -2 \sum_{r = 1}^{N} \ln [c(u_{r1}, u_{r2}; \theta)] + \ln(N) k,$$

(24)

where $k$ is the number of the bivariate copula model parameters, and [26,39]

$$Loglik = \sum_{r = 1}^{N} \ln [c(u_{r1}, u_{r2}; \theta)]$$

(25)

For this study, the function named BiCopEstList under the R-package VineCopula was used to build these copula models and obtain all the AIC, BIC, and log-likelihood values.
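Equations (23) and (24) differ only in the penalty attached to the number of parameters, which a short calculation makes visible. The sketch below recomputes the two criteria from a log-likelihood value; the function name `information_criteria` is hypothetical, and the inputs are the rounded log-likelihoods of three models from Table 3 with $N = 301$ events.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """AIC and BIC from Equations (23) and (24)."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + math.log(n_obs) * n_params
    return aic, bic


# (log-likelihood, number of parameters), rounded values from Table 3.
candidates = {
    "Gaussian": (52.57, 1),
    "Gumbel": (55.57, 1),
    "Tawn type 1": (64.67, 2),
}
results = {name: information_criteria(ll, k, 301) for name, (ll, k) in candidates.items()}
best_by_aic = min(results, key=lambda name: results[name][0])
for name, (aic, bic) in results.items():
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}")
print("lowest AIC:", best_by_aic)
```

Because the inputs are rounded to two decimals, the recomputed criteria can differ from the tabulated ones in the last digit. With 301 observations, each extra parameter costs about 5.71 in the BIC against 2 in the AIC, so the BIC penalizes the two-parameter models more heavily.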

In addition, a scoring scheme based on the Vuong and Clarke goodness-of-fit tests was applied to compare the models [40,41,42]. In these two tests, the best model was assumed to fit the data better than all the other models. Therefore, if model one was superior to model two, a score of one was assigned to model one. However, if model two was favored over model one, a score of −1 was assigned to model one. If the tests could not discriminate between the two compared models, nothing was assigned. The model with the highest total score on the Vuong and Clarke tests was selected as the best model. In this study, the function BiCopVuongClarke in the R-package VineCopula was used for the computation [39].
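The scoring rule can be sketched from the pointwise log-likelihood differences between two fitted models. The code below is a simplified illustration and not the BiCopVuongClarke implementation: the function names are hypothetical, no correction for the number of parameters is applied, a 5% two-sided level (critical value 1.96) is assumed, and the Clarke statistic uses a normal approximation to the binomial count of observations that favor model one.

```python
import math


def vuong_statistic(loglik1, loglik2):
    """Standardized mean of the pointwise log-likelihood differences (uncorrected)."""
    m = [a - b for a, b in zip(loglik1, loglik2)]
    n = len(m)
    mean = sum(m) / n
    var = sum((v - mean) ** 2 for v in m) / n
    if var == 0.0:
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return math.sqrt(n) * mean / math.sqrt(var)


def clarke_statistic(loglik1, loglik2):
    """Standardized count of observations on which model one has the larger log density."""
    m = [a - b for a, b in zip(loglik1, loglik2)]
    n = len(m)
    favour_one = sum(1 for v in m if v > 0)
    return (favour_one - n / 2.0) / math.sqrt(n / 4.0)


def score(statistic, critical=1.96):
    """+1 if model one is favored, -1 if model two is favored, 0 if undecided."""
    if statistic > critical:
        return 1
    if statistic < -critical:
        return -1
    return 0


# Toy pointwise log densities for two models (illustrative only).
model_one = [0.1, 0.2, 0.3, 0.2] * 10
model_two = [0.0] * 40
print(score(vuong_statistic(model_one, model_two)),
      score(clarke_statistic(model_one, model_two)))
```

Summing these scores for one model over its comparisons with every other candidate gives a total of the kind described above, and the model with the largest total is preferred.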

5. Results

In this study, three random variables related to the characteristics of the unhealthy air pollution events, namely, the intensity, duration, and severity, were examined. Specifically, each pair of variables was modeled using a bivariate copula to describe its dependence structure. First, each original variable was transformed into a copula variable using the PIT approach. The marginal histograms of the original and copula variables are illustrated in the first and second rows in Figure 6, respectively. In Figure 6, the marginal distributions of the copula variables appeared closer to uniform than those of the original variables. This comparison showed that, in the copula variables, the marginal distribution effects were separated from the dependence structure, which allows more accurate and flexible modeling using the copula function.

Before the copula function was applied, a pairwise dependency for each pair of copula variables was investigated using marginal histograms, pair plots, Kendall’s tau coefficients, and empirical contour plots. The obtained marginal histograms, pair plots, Kendall’s tau coefficients, and empirical contour plots for all the pairs are provided in Figure 7. The lower left blocks of Figure 7 located below the diagonal contain the normalized contour plots. The different colors in the empirical contour plots indicate the density of each pair of variables, where the yellowish color represents a higher density than the greenish color. As a preliminary analysis in the copula modeling, the most important observation was the shape of the empirical contour plots.

Focusing on the empirical contour plots, Figure 7 provides evidence of non-elliptical and asymmetric shapes. This evidence suggests it was more appropriate to model each pair using the Tawn, Joe, and other non-elliptical and asymmetric copula functions. Therefore, the non-elliptical and asymmetric copula functions were considered more suitable than the elliptical copulas for modeling each possible pair. In addition, Figure 7 also identified positive dependencies, since the Kendall’s tau estimates were positive for all the pairs, ranging from 0.39 to 0.89. The strongest positive dependency was found for the pair of the severity and duration, implying that these variables had a strong positive monotonic relationship; a higher severity was generally associated with a longer duration.

A wide range of the existing parametric bivariate copula models, as listed in Table 2, was examined to fit each pair of copula variables. In this study, the parameters for all the considered bivariate copula models were optimized using the MLE. Next, the Kendall’s tau coefficient for each optimized bivariate copula model was obtained to evaluate its dependency. In addition, for each optimized bivariate copula model, the log-likelihood, AIC, and BIC were computed. These results were used to compare all the bivariate copula models and choose the best model for each pair. The details on the parameter estimates, Kendall’s tau, log-likelihood, AIC, and BIC for the pairs ($Int$, $Dur$), ($Int$, $Sev$), and ($Dur$, $Sev$) are provided in Table 3, Table 4, and Table 5, respectively.

In Table 3, the obtained and bolded log-likelihood, AIC, and BIC values indicated that the Tawn type 1 was the best bivariate copula model for the intensity and duration relationship. This conclusion was drawn because the Tawn type 1 had the highest log-likelihood (64.67) value, and the lowest AIC (−125.33) and BIC (−117.92) values, showing that it fit the pair better than the other considered models. Moreover, the computed Kendall’s tau coefficient from the Tawn type 1 copula was 0.32, which was near the empirical Kendall’s tau coefficient of 0.39, as reported in Figure 7. The latter result indicated that the Tawn type 1 preserved a moderate degree of the reported positive dependency. A surface plot of the Tawn type 1 copula density and its contour plot with standard normal margins for the intensity and duration relationship are shown in the first and second columns in Figure 8, respectively.

For the intensity and severity relationship, Table 4 shows that the best model was the 180°-rotated Tawn type 1, according to the log-likelihood, AIC, and BIC values bolded in the table. Compared to the other considered models, the 180°-rotated Tawn type 1 obtained the highest log-likelihood (159.57) value and the lowest AIC (−315.14) and BIC (−307.72) values. In addition, the 180°-rotated Tawn type 1 provided a Kendall’s tau coefficient of 0.49, which was near the empirical Kendall’s tau coefficient of 0.55, as shown in Figure 7. This result showed that the 180°-rotated Tawn type 1 preserved a strong degree of positive dependency. A surface plot of the 180°-rotated Tawn type 1 copula density and its contour plot with standard normal margins for the intensity and severity relationship are shown in Figure 9a and 9b, respectively.

In contrast, for the duration and severity relationship, the obtained log-likelihood, AIC, and BIC values indicated that the Joe was the best bivariate copula model, as shown by the bold font in Table 5. Table 5 reported that the Joe had the highest log-likelihood (441.84) value and the lowest AIC (−881.68) and BIC (−877.98) values, indicating that it fit the pair better than the other considered models. Moreover, the computed Kendall’s tau coefficient from the Joe was 0.85, which was near the empirical Kendall’s tau coefficient of 0.89, as reported in Figure 7. This result showed that the Joe preserved an extremely strong degree of positive dependency. A surface plot of the Joe copula density and its contour plot with standard normal margins for the duration and severity relationship are shown in Figure 10a and 10b, respectively.

Table 3. Parameter estimates, Kendall’s tau, and log-likelihood, AIC, and BIC values for the intensity and duration relationship.

Copula	Par. Num.	Par. 1	Par. 2	tau	Log-lik.	AIC	BIC
Gaussian	1	0.59	0.00	0.40	52.57	−103.15	−99.44
t	2	0.58	30.00	0.40	52.61	−101.22	−93.80
Clayton	1	0.84	0.00	0.30	28.65	−55.30	−51.59
Gumbel	1	1.58	0.00	0.37	55.57	−109.14	−105.44
Frank	1	3.64	0.00	0.36	44.14	−86.28	−82.58
Joe	1	1.83	0.00	0.31	53.85	−105.69	−101.98
BB1	2	0.00	1.58	0.37	55.57	−107.14	−99.73
BB6	2	1.07	1.51	0.36	55.60	−107.20	−99.79
BB7	2	1.74	0.31	0.36	55.40	−106.81	−99.40
BB8	2	2.08	0.97	0.34	56.07	−108.14	−100.73
Survival Clayton	1	0.99	0.00	0.33	55.96	−109.93	−106.22
Survival Gumbel	1	1.57	0.00	0.36	40.64	−79.28	−75.57
Survival Joe	1	1.72	0.00	0.28	24.10	−46.20	−42.50
Survival BB1	2	0.82	1.11	0.36	56.48	−108.96	−101.55
Survival BB6	2	1.00	1.57	0.36	40.62	−77.24	−69.83
Survival BB7	2	1.13	0.95	0.35	56.34	−108.69	−101.28
Survival BB8	2	6.00	0.47	0.34	40.44	−76.88	−69.47
Tawn type 1	2	2.62	0.42	0.32	64.67	−125.33	−117.92
180°-rotated Tawn type 1	2	1.72	0.59	0.29	31.47	−58.94	−51.53
Tawn type 2	2	1.57	0.59	0.26	40.29	−76.58	−69.17
180°-rotated Tawn type 2	2	1.76	0.59	0.30	38.75	−73.51	−66.09

Figure 10. Surface plot of the Joe copula density (a) and its contour plot with standard normal margins (b) for the severity and duration relationship.

Table 4. Parameter estimates, Kendall’s tau, and log-likelihood, AIC, and BIC values for the intensity and severity relationship.

Copula	Par. Num.	Par. 1	Par. 2	tau	Log-lik.	AIC	BIC
Gaussian	1	0.74	0.00	0.53	116.36	−230.72	−227.01
t	2	0.70	2.78	0.49	121.38	−238.76	−231.35
Clayton	1	1.90	0.00	0.49	115.69	−229.39	−225.68
Gumbel	1	1.96	0.00	0.49	106.31	−210.63	−206.92
Frank	1	5.42	0.00	0.48	89.43	−176.86	−173.15
Joe	1	2.21	0.00	0.40	83.13	−164.26	−160.55
BB1	2	1.04	1.42	0.54	129.43	−254.86	−247.44
BB6	2	1.00	1.96	0.49	106.30	−208.59	−201.18
BB7	2	1.75	1.72	0.54	135.18	−266.36	−258.94
BB8	2	6.00	0.62	0.47	86.36	−168.71	−161.30
Survival Clayton	1	1.42	0.00	0.41	87.69	−173.39	−169.68
Survival Gumbel	1	2.13	0.00	0.53	125.54	−249.08	−245.37
Survival Joe	1	2.68	0.00	0.48	115.37	−228.73	−225.02
Survival BB1	2	0.27	1.90	0.54	128.01	−252.03	−244.61
Survival BB6	2	1.07	2.03	0.53	125.58	−247.16	−239.75
Survival BB7	2	2.36	0.97	0.54	136.81	−269.63	−262.22
Survival BB8	2	2.68	1.00	0.48	115.37	−226.73	−219.32
Tawn type 1	2	2.11	0.75	0.43	100.61	−197.23	−189.81
180°-rotated Tawn type 1	2	4.70	0.58	0.49	159.57	−315.14	−307.72
Tawn type 2	2	2.08	0.75	0.43	96.36	−188.72	−181.31
180°-rotated Tawn type 2	2	2.07	0.75	0.42	104.50	−204.99	−197.58

In Figure 10a, the Joe copula density increased sharply when u_{1} and u_{2} approached 1. This behavior showed that the severity and duration were very highly correlated with each other and highlighted that the possibility for severe air pollution events to continue to occur over a long period was high. This result also highlighted the importance of risk awareness for the prolonged, severe, and unhealthy air pollution events.
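One way to quantify this concentration of mass near (1, 1) is the upper tail-dependence coefficient, which the text above does not report. The sketch below assumes the standard Joe copula form C(u, v) = 1 − [(1 − u)^θ + (1 − v)^θ − (1 − u)^θ (1 − v)^θ]^{1/θ} and its known coefficient λ_U = 2 − 2^{1/θ}, evaluated at the Par. 1 estimate from Table 5. The function names are hypothetical.

```python
def joe_cdf(u, v, theta):
    """Joe copula C(u, v) for theta >= 1."""
    a = (1.0 - u) ** theta
    b = (1.0 - v) ** theta
    return 1.0 - (a + b - a * b) ** (1.0 / theta)

def joe_upper_tail(theta):
    """Upper tail-dependence coefficient of the Joe copula."""
    return 2.0 - 2.0 ** (1.0 / theta)

theta = 11.81  # Par. 1 of the Joe copula in Table 5
u = 0.999
# P(U1 > u | U2 > u) = (1 - 2u + C(u, u)) / (1 - u), evaluated close to 1
cond_prob = (1.0 - 2.0 * u + joe_cdf(u, u, theta)) / (1.0 - u)
print(round(joe_upper_tail(theta), 3), round(cond_prob, 3))
```

Both quantities come out at about 0.94, which is consistent with the sharp rise of the density in the upper corner: under the fitted model, an event that is extreme in one characteristic is very likely to be extreme in the other.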

Next, two score-based goodness-of-fit comparisons, built on the Vuong and Clarke tests, were applied. Each test compares two models at a time, and the model with the highest total score across these comparisons was taken as the one that fit a pair better than the other considered models. The comparison results based on the Vuong and Clarke tests for the pairs (Int, Dur), (Int, Sev), and (Dur, Sev) are provided in Table 6, Table 7, and Table 8, respectively.
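The scoring idea can be sketched as follows. The section does not spell out the test statistics, so this sketch assumes the textbook forms: the Vuong statistic built from pointwise log-likelihood differences and compared with a standard normal quantile, and the Clarke test as a sign test against Binomial(n, 1/2). Each pairwise comparison contributes +1, −1, or 0, and a model's total score is the sum over its rivals. The significance level, the absence of a parameter-count correction, and all names here are assumptions; the authors' software may differ.

```python
import math
from statistics import NormalDist

def vuong_score(m, alpha=0.05):
    """+1 if model 1 is preferred, -1 if model 2 is, 0 if undecided.

    m holds pointwise log-likelihood differences log f1(x_i) - log f2(x_i).
    No correction for the number of parameters is applied in this sketch.
    """
    n = len(m)
    mean = sum(m) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in m) / n)
    stat = math.sqrt(n) * mean / sd
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 1 if stat > crit else -1 if stat < -crit else 0

def clarke_score(m, alpha=0.05):
    """Sign-test analogue: count positive differences against Binomial(n, 1/2)."""
    n = len(m)
    b = sum(1 for x in m if x > 0)
    upper = sum(math.comb(n, j) for j in range(b, n + 1)) / 2 ** n  # P(B >= b)
    lower = sum(math.comb(n, j) for j in range(0, b + 1)) / 2 ** n  # P(B <= b)
    return 1 if upper < alpha / 2.0 else -1 if lower < alpha / 2.0 else 0

def total_score(diffs_vs_rivals, score_fn):
    """Sum the pairwise scores of one model against each rival."""
    return sum(score_fn(m) for m in diffs_vs_rivals)

# Hypothetical pointwise differences, not taken from the study's data.
favours_model_1 = [0.2, 0.3, 0.1, 0.4, 0.25, 0.35, 0.15, 0.3, 0.2, 0.45]
undecided = [0.1, -0.1, 0.2, -0.2, 0.05, -0.05]
print(vuong_score(favours_model_1), clarke_score(favours_model_1))
print(total_score([favours_model_1, undecided], vuong_score))
```

Under this scheme a model that beats every rival collects the maximum score, while negative totals mark models that lose most of their comparisons, which is how the entries in Tables 6 to 8 can be read.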

Table 5. Parameter estimates, Kendall’s tau, and log-likelihood, AIC, and BIC values for the duration and severity relationship.

Copula	Par. Num.	Par. 1	Par. 2	tau	Log-lik.	AIC	BIC
Gaussian	1	0.93	0.00	0.76	280.62	−559.25	−555.54
t	2	0.96	2.00	0.82	331.71	−659.41	−652.00
Clayton	1	2.75	0.00	0.58	144.35	−286.70	−282.99
Gumbel	1	5.98	0.00	0.83	377.95	−753.90	−750.19
Frank	1	22.45	0.00	0.83	353.82	−705.64	−701.93
Joe	1	11.81	0.00	0.85	441.84	−881.68	−877.98
BB1	2	0.00	5.97	0.83	377.84	−751.69	−744.27
BB6	2	6.00	1.77	0.84	432.39	−860.77	−853.36
BB7	2	5.00	0.12	0.68	352.76	−701.51	−694.10
BB8	2	6.00	1.00	0.72	382.67	−761.33	−753.92
Survival Clayton	1	11.10	0.00	0.85	440.08	−878.16	−874.45
Survival Gumbel	1	3.69	0.00	0.73	230.44	−458.88	−455.17
Survival Joe	1	3.63	0.00	0.58	144.06	−286.12	−282.41
Survival BB1	2	5.00	1.72	0.83	417.01	−830.02	−822.61
Survival BB6	2	1.00	3.68	0.73	230.38	−456.75	−449.34
Survival BB7	2	1.00	6.00	0.75	396.18	−788.36	−780.95
Survival BB8	2	6.00	0.85	0.63	221.85	−439.70	−432.29
Tawn type 1	2	7.10	0.96	0.83	388.06	−772.12	−764.70
180°-rotated Tawn type 1	2	3.69	0.99	0.72	229.34	−454.67	−447.26
Tawn type 2	2	6.15	0.99	0.83	378.53	−753.05	−745.64
180°-rotated Tawn type 2	2	5.83	0.93	0.78	263.57	−523.14	−515.73
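For several one-parameter families, the tau column in Tables 3 to 5 can be cross-checked against Par. 1 with closed-form relations. The sketch below uses the standard formulas for the Gaussian, Clayton, and Gumbel copulas and a series representation for the Joe copula, applied to the Table 5 estimates. The function names and the truncation length of the series are choices made for this illustration only.

```python
import math

def tau_gaussian(rho):
    """Kendall's tau of a Gaussian copula with correlation rho."""
    return 2.0 / math.pi * math.asin(rho)

def tau_clayton(theta):
    return theta / (theta + 2.0)

def tau_gumbel(theta):
    return 1.0 - 1.0 / theta

def tau_joe(theta, terms=100_000):
    """Series form: 1 - 4 * sum_k 1 / (k * (theta*k + 2) * (theta*(k-1) + 2))."""
    s = sum(1.0 / (k * (theta * k + 2.0) * (theta * (k - 1) + 2.0))
            for k in range(1, terms + 1))
    return 1.0 - 4.0 * s

# (family, Par. 1, reported tau) from Table 5, duration and severity
rows = [("Gaussian", 0.93, 0.76), ("Clayton", 2.75, 0.58),
        ("Gumbel", 5.98, 0.83), ("Joe", 11.81, 0.85)]
formulas = {"Gaussian": tau_gaussian, "Clayton": tau_clayton,
            "Gumbel": tau_gumbel, "Joe": tau_joe}
for family, par, tau_reported in rows:
    print(f"{family}: tau={formulas[family](par):.2f} (reported {tau_reported})")
```

The recomputed values match the reported tau column to two decimals for these four rows. The two-parameter families, including the Tawn copulas, generally need numerical integration and are left out of this sketch.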

Using a non-nested model comparison based on the Vuong and Clarke tests, Table 6 shows that the Tawn type 1 was the best model because it obtained the highest total score of 21 on both the Vuong and Clarke tests. This result was consistent with the above outcomes for the information-based model comparison (log-likelihood, AIC, and BIC). Furthermore, the total scores of the Tawn type 1 were far better than those of the second- and third-best models, which were the BB8 and Gumbel copulas, respectively. On the Vuong and Clarke tests, the BB8 copula obtained scores of 10 and 18, respectively, and the Gumbel copula had scores of 9 and 15, respectively. Therefore, these tests also indicated that the Tawn type 1 was much more appropriate for the intensity and duration relationship than the other considered models.

In Table 7, for the intensity and severity relationship, the 180°-rotated Tawn type 1 was reported as the best model since it obtained the highest total score of 21 on both the Vuong and Clarke tests. This outcome was similar to the result obtained from the information-based model comparison (log-likelihood, AIC, and BIC). On the Vuong test, the best model was the 180°-rotated Tawn type 1 (21), followed by Survival BB6 and Survival BB7 (12 each). On the Clarke test, the 180°-rotated Tawn type 1 (21) was only slightly ahead of the second-best model, the Tawn type 2 (19), but clearly ahead of the third-best models, Survival Gumbel and Survival BB1 (15 each). Overall, these tests indicated that the 180°-rotated Tawn type 1 was more suitable for the intensity and severity relationship than the other considered models.

Finally, for the duration and severity relationship, Table 8 shows that the Joe and Survival Clayton copulas performed equally well, each obtaining total scores of 19 and 20 on the Vuong and Clarke tests, respectively. BB6 was close to these two models, with total scores of 19 and 17 on the Vuong and Clarke tests, respectively. These results indicated that the three copula models fit this pair similarly well. However, considering the outcome of the information-based model comparison (log-likelihood, AIC, and BIC), the Joe model maintained an advantage in modeling the duration and severity relationship over the other considered models. We note that a similar finding was reported in Masseran [22] for the duration and severity relationship.

Table 6. Details on the Vuong test and Clarke tests for the intensity and duration relationship.

Copula	Vuong Test	Clarke Test
Gaussian	2	−6
t	2	6
Clayton	−15	−15
Gumbel	9	15
Frank	7	8
Joe	6	5
BB1	9	15
BB6	8	12
BB7	6	5
BB8	10	18
Survival Clayton	6	−2
Survival Gumbel	−9	−9
Survival Joe	−18	−18
Survival BB1	6	3
Survival BB6	−11	−11
Survival BB7	6	0
Survival BB8	4	0
Tawn type 1	21	21
180°-rotated Tawn type 1	−17	−18
Tawn type 2	−11	−13
180°-rotated Tawn type 2	0	5

Based on the abovementioned results, it is also worth noting that the information-based model comparison (log-likelihood, AIC, and BIC) and the non-nested model comparison (the Vuong and Clarke tests) need not agree. This can occur because of the nature of the two comparisons. The first evaluates each model solely on its own criterion value, without a direct test against another model. In addition, some very simple formulas connecting the log-likelihood, AIC, and BIC can be derived and used to improve mathematical models, such as quantitative structure–activity relationship (qSAR) models for understanding how the structure and activity of random variables relate [43]. In contrast, the second comparison investigates whether there is significant evidence to distinguish one model’s specification from another’s. Therefore, the different focuses of these two comparisons can lead to different outcomes, which highlights the need for careful interpretation.

Overall, based on the information-based model comparison using the log-likelihood, AIC, and BIC values and the non-nested model comparison using the Vuong and Clarke tests, our findings indicated that the Tawn type 1, 180°-rotated Tawn type 1, and Joe copulas were the best models for the pairs (Int, Dur), (Int, Sev), and (Dur, Sev), respectively. This outcome also highlighted that the dependence structures for the pairs had skewed, asymmetric, and non-Gaussian shapes. These characteristics established the selected copulas as appropriate models for such dependence structures. Furthermore, they highlighted the importance of developing statistical tests for skewed, asymmetric, and non-Gaussian shapes, since most statistical tests rely on Gaussian shapes.

Table 7. Details on the Vuong test and Clarke tests for the intensity and severity relationship.

Copula	Vuong Test	Clarke Test
Gaussian	−9	−9
t	9	12
Clayton	5	−2
Gumbel	−9	−8
Frank	−1	4
Joe	−18	−16
BB1	7	9
BB6	−11	−10
BB7	6	4
BB8	−7	−5
Survival Clayton	−16	−15
Survival Gumbel	9	15
Survival Joe	10	−1
Survival BB1	9	15
Survival BB6	12	10
Survival BB7	12	7
Survival BB8	9	4
Tawn type 1	−17	−19
180°-rotated Tawn type 1	21	21
Tawn type 2	9	19
180°-rotated Tawn type 2	−9	−14

For future studies, the obtained models can be extended through the vine copula approach to provide a more comprehensive insight into the tri-variate relationship between the duration, intensity, and severity. Furthermore, some tests for general distributions have recently been developed. Such tests have been applied to detect outliers in continuous distributions based on the cumulative distribution function and to detect extreme values with order statistics in samples from continuous distributions. These tests can also be explored in future research to better understand air pollution behavior.

Table 8. Details on the Vuong test and Clarke test for the relationship between duration and severity.

Copula	Vuong Test	Clarke Test
Gaussian	−5	−4
t	−1	0
Clayton	−18	−19
Gumbel	9	9
Frank	−1	5
Joe	19	20
BB1	8	6
BB6	19	17
BB7	−1	−4
BB8	3	0
Survival Clayton	19	20
Survival Gumbel	−10	−11
Survival Joe	−18	−17
Survival BB1	15	15
Survival BB6	−12	−13
Survival BB7	9	4
Survival BB8	−11	−15
Tawn type 1	10	13
180°-rotated Tawn type 1	−11	−8
Tawn type 2	9	11
180°-rotated Tawn type 2	−11	−8

6. Conclusions

This study examined the bivariate dependence structures among the characteristics of unhealthy air pollution events, namely, the duration, severity, and intensity. Air pollution is classified as unhealthy when the API values cross a certain threshold. For Malaysia, the threshold is equal to 100. For each non-overlapping period of an unhealthy air pollution event, the duration is the total number of days of that period, the severity is the summation of all the API values within that period, and the intensity is the maximum API value within that period.
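The three definitions above can be sketched in a few lines. The code assumes a daily API series and treats a day as unhealthy when its API is strictly greater than the threshold; the strict inequality, the function name, and the sample values are assumptions made for illustration and are not taken from the Klang data.

```python
def extract_events(api, threshold=100):
    """Split a daily API series into non-overlapping unhealthy events.

    An event is a maximal run of consecutive days with API > threshold
    (the strict inequality is an assumption of this sketch).
    Returns a list of (duration, severity, intensity) tuples.
    """
    events, run = [], []
    for value in api:
        if value > threshold:
            run.append(value)
        elif run:
            events.append((len(run), sum(run), max(run)))
            run = []
    if run:  # the series ends while an event is still in progress
        events.append((len(run), sum(run), max(run)))
    return events

# Hypothetical daily API values.
daily_api = [80, 120, 150, 90, 101, 60, 130, 140, 160]
print(extract_events(daily_api))
# -> [(2, 270, 150), (1, 101, 101), (3, 430, 160)]
```

Each tuple is one observation of (duration, severity, intensity), and the collection of such tuples is the trivariate sample whose pairs are modeled by the copulas.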

For modeling purposes, copula models were suggested to fit the bivariate dependence structure for all the possible pairs, including the (intensity, duration), (intensity, severity), and (duration, severity). The normalized contour plots of the pairs illustrated that the pairs had skewed, asymmetric, and non-Gaussian shapes. Therefore, the copula models were suitable for this application because they provided a great flexibility in modeling multivariate non-Gaussian distributions due to the separation of the margins and dependence by the copula function.
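The separation of margins and dependence can be illustrated with rank-based pseudo-observations: transforming each margin to the unit interval changes the marginal distributions but leaves Kendall's tau unchanged. This is a minimal sketch that assumes no ties and uses the scaling rank / (n + 1); whether the study used ranks or fitted margins to obtain its copula data is not stated in this section, and the sample values are hypothetical.

```python
def pseudo_observations(x):
    """Rank-transform a sample to (0, 1): u_i = rank(x_i) / (n + 1). Assumes no ties."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    u = [0.0] * len(x)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (len(x) + 1)
    return u

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n, s = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

# Hypothetical paired values, not the study's data.
x = [3.0, 1.0, 4.0, 1.5, 9.0]
y = [2.0, 0.5, 8.0, 1.0, 7.0]
u, v = pseudo_observations(x), pseudo_observations(y)
print(kendall_tau(x, y), kendall_tau(u, v))
```

Both calls return the same value, because tau depends only on the ordering of the observations. Real event durations are integer day counts, so ties would need explicit handling that this sketch omits.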

A wide range of existing parametric copula models was fitted to each pair by maximum likelihood estimation (MLE). Then, the best model was determined based on two comparison methods. The first method was the information-based model comparison that relied on the log-likelihood estimate and the AIC and BIC criteria. The second method was a non-nested model comparison based on the Vuong and Clarke tests.

According to the results of these two comparison methods, the Tawn type 1, 180°-rotated Tawn type 1, and Joe copulas were the best models to fit the relationship for the pairs of the (intensity, duration), (intensity, severity), and (duration, severity), respectively. These models showed that the dependence structures for the pairs were skewed, and the obtained copula models were appropriate tools for such structures. In future work, these obtained models can be further extended through the vine copula approach to provide a more comprehensive insight into the tri-variate relationship of the duration–intensity–severity characteristics.

Author Contributions

Conceptualization, M.S.I. and N.M.; data curation, N.M.; methodology, N.M.; software, M.S.I.; supervision, N.M.; writing—original draft, M.S.I.; writing—review and editing, N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universiti Kebangsaan Malaysia, which provided a research grant through the Dana Impak Perdana 2.0 (Grant No: DIP-2022-002).

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is available from the authors upon request.

Acknowledgments

The authors would like to acknowledge the Malaysian Department of Environment for kindly providing the data on the air pollution index in Klang, Malaysia.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zulkepli, N.F.S.; Noorani, M.S.M.; Razak, F.A.; Ismail, M.; Alias, M.A. Hybridization of hierarchical clustering with persistent homology in assessing haze episodes between air quality monitoring stations. J. Environ. Manag. 2022, 306, 114434.
Forsyth, T. Public concerns about transboundary haze: A comparison of Indonesia, Singapore, and Malaysia. Glob. Environ. Chang. 2014, 25, 76–86.
Afroz, R.; Hassan, M.N.; Ibrahim, N.A. Review of air pollution and health impacts in Malaysia. Environ. Res. 2003, 92, 71–77.
Usmani, R.S.A.; Saeed, A.; Abdullahi, A.M.; Pillai, T.R.; Jhanjhi, N.Z.; Hashem, I.A.T. Air pollution and its health impacts in Malaysia: A review. Air Qual. Atmos. Health 2020, 13, 1093–1118.
Araujo, L.N.; Belotti, J.T.; Alves, T.A.; de Souza Tadano, Y.; Siqueira, H. Ensemble method based on Artificial Neural Networks to estimate air pollution health risks. Environ. Model. Softw. 2020, 123, 104567.
Li, R.; Dong, Y.; Zhu, Z.; Li, C.; Yang, H. A dynamic evaluation framework for ambient air pollution monitoring. Appl. Math. Model. 2019, 65, 52–71.
Lu, M.; Schmitz, O.; de Hoogh, K.; Hoek, G.; Li, Q.; Karssenberg, D. Integrating statistical and agent-based modelling for activity-based ambient air pollution exposure assessment. Environ. Model. Softw. 2022, 158, 105555.
Sacks, J.D.; Lloyd, J.M.; Zhu, Y.; Anderton, J.; Jang, C.J.; Hubbell, B.; Fann, N. The Environmental Benefits Mapping and Analysis Program–Community Edition (BenMAP–CE): A tool to estimate the health and economic benefits of reducing air pollution. Environ. Model. Softw. 2018, 104, 118–129.
Aditama, T.Y. Impact of haze from forest fire to respiratory health: Indonesian experience. Respirology 2000, 5, 169–174.
Hod, R. The impact of air pollution and haze on hospital admission for cardiovascular and respiratory diseases. Int. J. Public Health Res. 2016, 6, 707–712.
Wen, Y.S.; bin Mohd Nor, A.F. Transboundary air pollution in Malaysia: Impact and perspective on haze. Nova J. Eng. Appl. Sci. 2016, 5, 1.
Quah, E.; Varkkey, H. The political economy of transboundary pollution: Mitigation of forest fires and haze in Southeast Asia. Asian Community Concepts Prospect 2013, 323, 358.
Durante, F.; Sempi, C. Principles of Copula Theory; CRC Press: Boca Raton, FL, USA, 2016; Volume 474.
Nelsen, R.B. An Introduction to Copulas; Springer Science & Business Media: Berlin, Germany, 2007.
Sak, H.; Yang, G.; Li, B.; Li, W. A copula-based model for air pollution portfolio risk and its efficient simulation. Stoch. Environ. Res. Risk Assess. 2017, 31, 2607–2616.
Chan, R.K.; So, M.K. Multivariate modelling of spatial extremes based on copulas. J. Stat. Comput. Simul. 2018, 88, 2404–2424.
Falk, M.; Padoan, S.A.; Wisheckel, F. Generalized pareto copulas: A key to multivariate extremes. J. Multivar. Anal. 2019, 174, 104538.
Kim, J.-M.; Lee, N.; Xiao, X. Directional dependence between major cities in China based on copula regression on air pollution measurements. PLoS ONE 2019, 14, e0213148.
Masseran, N.; Hussain, S.I. Copula modelling on the dynamic dependence structure of multiple air pollutant variables. Mathematics 2020, 8, 1910.
He, S.; Li, Z.; Wang, W.; Yu, M.; Liu, L.; Alam, M.N.; Gao, Q.; Wang, T. Dynamic relationship between meteorological conditions and air pollutants based on a mixed Copula model. Int. J. Climatol. 2021, 41, 2611–2624.
Shiau, J. Fitting drought duration and severity with two-dimensional copulas. Water Resour. Manag. 2006, 20, 795–815.
Masseran, N. Modeling the characteristics of unhealthy air pollution events: A copula approach. Int. J. Environ. Res. Public Health 2021, 18, 8751.
Masseran, N.; Safari, M.A.M. Intensity–duration–frequency approach for risk assessment of air pollution events. J. Environ. Manag. 2020, 264, 110429.
Sklar, M. Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris 1959, 8, 229–231.
Chowdhary, H.; Escobar, L.A.; Singh, V.P. Identification of suitable copulas for bivariate frequency analysis of flood peak and flood volume data. Hydrol. Res. 2011, 42, 193–216.
Czado, C. Analyzing dependent data with vine copulas. In Lecture Notes in Statistics; Springer: Berlin, Germany, 2019; Volume 222.
Czado, C.; Nagler, T. Vine copula based modeling. Annu. Rev. Stat. Its Appl. 2022, 9, 453–477.
Yusof, F.; Hui-Mean, F.; Suhaila, J.; Yusof, Z. Characterisation of drought properties with bivariate copula analysis. Water Resour. Manag. 2013, 27, 4183–4207.
Latif, S.; Mustafa, F. Bivariate flood distribution analysis under parametric copula framework: A case study for Kelantan River basin in Malaysia. Acta Geophys. 2020, 68, 821–859.
Tosunoglu, F.; Can, I. Application of copulas for regional bivariate frequency analysis of meteorological droughts in Turkey. Nat. Hazards 2016, 82, 1457–1477.
McNeil, A.J.; Frey, R.; Embrechts, P. Quantitative Risk Management: Concepts, Techniques and Tools-Revised Edition; Princeton University Press: Princeton, NJ, USA, 2015.
Hürlimann, W. Hutchinson-Lai’s conjecture for bivariate extreme value copulas. Stat. Probab. Lett. 2003, 61, 191–198.
Department of Environment. A Guide to Air Pollutant Index in Malaysia (API); Ministry of Science, Technology and the Environment: Kuala Lumpur, Malaysia, 1997.
Masseran, N.; Safari, M.A.M. Mixed POT-BM approach for modeling unhealthy air pollution events. Int. J. Environ. Res. Public Health 2021, 18, 6754.
AL-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H. Hierarchical-Generalized Pareto model for estimation of unhealthy air pollution index. Environ. Model. Assess. 2020, 25, 555–564.
AL-Dhurafi, N.A.; Masseran, N.; Zamzuri, Z.H.; Razali, A.M. Modeling unhealthy air pollution index using a peaks-over-threshold method. Environ. Eng. Sci. 2018, 35, 101–110.
Google Maps. Pictures of Klang. 2019. Available online: https://www.google.com/maps/place/Klang,+Selangor/@3.0431358,101.3582538,12z/data=!3m1!4b1!4m6!3m5!1s0x31cc534c4ffe81cf:0xeb61f5772fd54514!8m2!3d3.044917!4d101.4455621!16zL20vMDJtMmgw (accessed on 2 January 2023).
Masseran, N.; Safari, M.A.M. Risk assessment of extreme air pollution based on partial duration series: IDF approach. Stoch. Environ. Res. Risk Assess. 2020, 34, 545–559.
Schepsmeier, U.; Stoeber, J.; Brechmann, E.C.; Graeler, B.; Nagler, T.; Erhardt, T.; Almeida, C.; Min, A.; Czado, C.; Hofmann, M. Package ‘VineCopula’; R Package Version; 2015; Volume 2. Available online: https://cran.r-project.org/web/packages/VineCopula/VineCopula.pdf (accessed on 15 February 2023).
Belgorodski, N. Selecting Pair-Copula Families for Regular Vines with Application to the Multivariate Analysis of European Stock Market Indices. Diploma Thesis, Technische Universität München, Munich, Germany, 2010.
Clarke, K.A. A simple distribution-free test for nonnested model selection. Political Anal. 2007, 15, 347–363.
Vuong, Q.H. Likelihood ratio tests for model selection and non-nested hypotheses. Econom. J. Econom. Soc. 1989, 57, 307–333.
Bolboaca, S.D.; Jäntschi, L. Comparison of Quantitative Structure-Activity Relationship Model Performances on Carboquinone Derivatives. Sci. World J. 2009, 9, 272946.

Figure 1. Pipeline for the content of this manuscript.

Figure 2. Process for determining the API data.

Figure 3. Maps of (a) Peninsular Malaysia (Klang is denoted by the red dot) and (b) the city of Klang.

Figure 4. The process used to determine the durations, severities, and intensities of the first three air pollution events (red regions).

Figure 6. Margin histograms of the original data (first row) and copula data (second row).

Figure 7. Pairwise dependencies among the intensity, duration, and severity. Within the plot, above the diagonal lists the pair plots of the copula data with their corresponding Kendall’s tau estimation; the diagonal lists the histogram of the copula margins; below the diagonal lists the normalized contour plots.

Figure 8. Surface plot of the Tawn type 1 copula density (a) and its contour plot with standard normal margins (b) for the intensity and duration relationship.

Figure 9. Surface plot of the 180°-rotated Tawn type 1 copula density (a) and its contour plot with standard normal margins (b) for the intensity and severity relationship.

Table 2. List of the considered copula models.

Number	Copula Short Name	Copula Long Name	Parameter Number
1	N	Gaussian	1
2	t	t	2
3	C	Clayton	1
4	G	Gumbel	1
5	F	Frank	1
6	J	Joe	1
7	BB1	BB1	2
8	BB6	BB6	2
9	BB7	BB7	2
10	BB8	BB8	2
11	SC	Survival Clayton	1
12	SG	Survival Gumbel	1
13	SJ	Survival Joe	1
14	SBB1	Survival BB1	2
15	SBB6	Survival BB6	2
16	SBB7	Survival BB7	2
17	SBB8	Survival BB8	2
18	Tawn	Tawn type 1	2
19	Tawn 180	180°-rotated Tawn type 1	2
20	Tawn 2	Tawn type 2	2
21	Tawn 2 180	180°-rotated Tawn type 2	2

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
