Article

Quantile-Based Multivariate Log-Normal Distribution

by Raúl Alejandro Morán-Vásquez *,†, Alejandro Roldán-Correa and Daya K. Nagar
Instituto de Matemáticas, Universidad de Antioquia, Calle 67 No. 53-108, Medellín 050010, Colombia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Symmetry 2023, 15(8), 1513; https://doi.org/10.3390/sym15081513
Submission received: 28 June 2023 / Revised: 22 July 2023 / Accepted: 30 July 2023 / Published: 31 July 2023
(This article belongs to the Section Mathematics)

Abstract

We introduce a quantile-based multivariate log-normal distribution, providing a new multivariate skewed distribution with positive support. The parameters of this distribution are interpretable in terms of quantiles of marginal distributions and associations between pairs of variables, a desirable feature for statistical modeling purposes. We derive statistical properties of the quantile-based multivariate log-normal distribution, including results on transformations and closed-form expressions for the mixed moments, expected value, covariance matrix, mode, Shannon entropy, and Kullback–Leibler divergence. We also present results on marginalization, conditioning, and independence. Additionally, we discuss parameter estimation and verify its performance through simulation studies. We evaluate the model fit using Mahalanobis-type distances. An application to anthropometric data on children is presented.

1. Introduction

Quantile regression modeling has been widely applied in different fields such as economics, environmental science, ecology, and medicine, among many others (Cade and Noon [1], Yu et al. [2]). A number of studies on nonparametric quantile regression and its applications have been developed since the seminal work of Koenker and Bassett [3]. Recently, several parametric quantile models have been studied in the regression literature, which have motivated the study of probability distributions that are useful for this purpose.
In the univariate setting, some distributions suitable for parametric quantile modeling appear in Ferrari and Fumes [4], Gijbels et al. [5], Mazucheli et al. [6], and Smithson and Shou [7]. Multivariate quantile modeling is less frequent in the statistical literature and often uses nonparametric methods. Several studies are based on extensions of the quantile concept to a multivariate setting. Some examples can be found in Breckling and Chambers [8], Kong and Mizera [9], McKeague et al. [10], and Wei [11]. Other multivariate models are based on the univariate quantile notion. For instance, Petrella and Raponi [12], Morán-Vásquez and Ferrari [13], and Morán-Vásquez et al. [14] propose methods for jointly modeling univariate marginal quantiles, taking into account the potential correlation between marginals.
In the present article, we define a quantile-based multivariate log-normal distribution. This distribution has positive support and reduces to the quantile-based log-normal distribution (Saulo et al. [15]) in the univariate setting. Conversely, the usual multivariate log-normal distribution (Morán-Vásquez and Ferrari [13] and Morán-Vásquez et al. [14]) can be expressed as a quantile-based multivariate log-normal distribution. The parameters of the proposed distribution are interpretable in terms of marginal quantiles and associations between pairs of variables, which makes this model attractive for quantile modeling of correlated multivariate positive skewed data.
In this article, we study some statistical properties of the quantile-based multivariate log-normal family, describe the estimation of its parameters, and show its usefulness through an application to real data. We derive distributional properties obtained through transformations, as well as results related to the mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginal and conditional distributions, and independence. Applications of some of our results derived in this article establish new properties of the multivariate log-normal distribution. We compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution from the maximum likelihood estimates of the multivariate log-normal distribution. We evaluate the performance of the proposed estimation procedure through Monte Carlo simulations. The usefulness of the proposed distribution for modeling multivariate positive skewed data is illustrated through an analysis of real data on children’s weights and heights.
The paper is organized as follows. Section 2 presents the quantile-based multivariate log-normal distribution. Section 3 deals with the derivation of various statistical properties of the proposed distribution. Section 4 focuses on maximum likelihood estimation and simulation studies. Also, a graphical method to assess the goodness of fit is described. Section 5 presents an application to real data. Finally, Section 6 closes the paper with concluding remarks.

2. Quantile-Based Multivariate Log-Normal Distribution

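We denote vectors with lowercase Greek letters in bold and matrices with capital Greek letters in bold. For vectors and matrices, the components are denoted by the respective Greek letter in normal font. For example, if $\boldsymbol{\theta} \in \mathbb{R}^p$ is a vector and $\boldsymbol{\Omega}\,(p \times q)$ is a real matrix, then $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^{\top}$ and $\boldsymbol{\Omega} = (\omega_{jk})_{p \times q}$. We denote by $\mathbf{0} = (0, \ldots, 0)^{\top}$ and $\mathbf{1} = (1, \ldots, 1)^{\top}$ the p-dimensional vectors whose components are all zero and one, respectively. We denote by $\mathbf{I}_p$ the $p \times p$ identity matrix. Let $\boldsymbol{\Omega}\,(p \times p)$ be a square matrix. We denote by $\det(\boldsymbol{\Omega})$ and $\operatorname{tr}(\boldsymbol{\Omega})$ the determinant and trace of $\boldsymbol{\Omega}$, respectively. If $\boldsymbol{\Omega}$ is a symmetric matrix, then $\boldsymbol{\Omega} > 0$ means that $\boldsymbol{\Omega}$ is positive definite. Additionally, $\boldsymbol{\Omega}^{1/2}$ is the unique symmetric positive definite square root of $\boldsymbol{\Omega} > 0$. If $\boldsymbol{\Omega}_1$ and $\boldsymbol{\Omega}_2$ are matrices of the same dimension, then $\boldsymbol{\Omega}_1 \odot \boldsymbol{\Omega}_2$ denotes the Hadamard product of $\boldsymbol{\Omega}_1$ and $\boldsymbol{\Omega}_2$. If $\boldsymbol{\theta} \in \mathbb{R}^p$ is a vector, then $\mathbf{D}_{\boldsymbol{\theta}}$ denotes the diagonal matrix whose diagonal elements are the components of $\boldsymbol{\theta}$, that is, $\mathbf{D}_{\boldsymbol{\theta}} = \operatorname{diag}(\theta_1, \ldots, \theta_p)$. We define the set $\mathbb{R}_+^p$ as
$$\mathbb{R}_+^p = \{\boldsymbol{\theta} \in \mathbb{R}^p : \theta_k > 0,\ k = 1, \ldots, p\}.$$
If $\boldsymbol{\theta} \in \mathbb{R}^p$ and $f$ is a real function, we denote $f(\boldsymbol{\theta}) = (f(\theta_1), \ldots, f(\theta_p))^{\top}$, provided that the components of $\boldsymbol{\theta}$ are in the domain of $f$. If $\boldsymbol{\theta} \in \mathbb{R}_+^p$ and $\boldsymbol{\beta} \in \mathbb{R}^p$, we write $\boldsymbol{\theta}^{\boldsymbol{\beta}} = (\theta_1^{\beta_1}, \ldots, \theta_p^{\beta_p})^{\top}$. We denote random vectors and their components with capital Roman letters in bold and normal fonts, respectively.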
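It is well known that the PDF of a multivariate normal vector $\mathbf{X} \sim \mathrm{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is given by
$$\phi_p(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-p/2} \det(\boldsymbol{\Sigma})^{-1/2} \exp\left\{-\tfrac{1}{2}\, \delta_{\boldsymbol{\Sigma}}(\mathbf{x}, \boldsymbol{\mu})\right\},$$
where $\delta_{\boldsymbol{\Sigma}}(\mathbf{x}, \boldsymbol{\mu}) = (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$ is the square of the Mahalanobis distance between $\mathbf{x}$ and $\boldsymbol{\mu}$ with respect to $\boldsymbol{\Sigma}$. On the other hand, the random vector $\mathbf{Y} \in \mathbb{R}_+^p$ has a multivariate log-normal distribution with median vector $\boldsymbol{\mu} \in \mathbb{R}_+^p$ and dispersion matrix $\boldsymbol{\Sigma}\,(p \times p) > 0$, denoted by $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, if $\log(\mathbf{Y}) \sim \mathrm{N}_p(\log(\boldsymbol{\mu}), \boldsymbol{\Sigma})$, where $\log$ denotes the natural logarithm function. The PDF of $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is (Morán-Vásquez et al. [14])
$$\mathrm{LN}_p(\mathbf{y}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \phi_p(\log(\mathbf{y}); \log(\boldsymbol{\mu}), \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{y_k}, \quad \mathbf{y} \in \mathbb{R}_+^p. \tag{1}$$
The multivariate log-normal distribution given in (1) has a slightly different parameterization than the one used by Fang et al. ([16], Section 2.8).
Let $\alpha_1, \ldots, \alpha_p$ be fixed values in $(0, 1)$. Theorem 5 and Corollary 2 of Morán-Vásquez and Ferrari [13] permit us to establish that the $\alpha_k$-quantile $Q_k$ of $Y_k$ satisfies $Q_k = \mu_k \exp(\sqrt{\sigma_{kk}}\, q_k)$, where $q_k$ is the $\alpha_k$-quantile of a standard normal distribution, $k = 1, \ldots, p$. Note that the quantile vector $\mathbf{Q} = (Q_1, \ldots, Q_p)^{\top}$ can be expressed as
$$\mathbf{Q} = \mathbf{D}_{\boldsymbol{\mu}} \exp(\boldsymbol{\Omega} \mathbf{q}), \tag{2}$$
where $\boldsymbol{\Omega} = (\boldsymbol{\Sigma} \odot \mathbf{I}_p)^{1/2}$ and $\mathbf{q} = (q_1, \ldots, q_p)^{\top}$. A reparameterization of the multivariate log-normal distribution in terms of $\mathbf{Q}$ is obtained by replacing $\boldsymbol{\mu} = \mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q})$ in (1). Based on this, we present the quantile-based multivariate log-normal distribution in Definition 1.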
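Definition 1.
Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p)^{\top} \in (0, 1)^p$ and $\mathbf{q} = (q_1, \ldots, q_p)^{\top} \in \mathbb{R}^p$ be fixed vectors such that $q_k$ is the $\alpha_k$-quantile of a standard normal distribution, $k = 1, \ldots, p$. The random vector $\mathbf{Y} \in \mathbb{R}_+^p$ is said to have a quantile-based multivariate log-normal distribution with quantile vector $\mathbf{Q} = (Q_1, \ldots, Q_p)^{\top}$ and dispersion matrix $\boldsymbol{\Sigma}\,(p \times p) > 0$, denoted by $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, if its PDF is
$$\mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) = \phi_p(\log(\mathbf{y}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{y_k}, \quad \mathbf{y} \in \mathbb{R}_+^p, \tag{3}$$
where $\boldsymbol{\Omega} = (\boldsymbol{\Sigma} \odot \mathbf{I}_p)^{1/2}$.
If we choose $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$, then Definition 1 coincides with the definition of a multivariate log-normal distribution (Morán-Vásquez et al. [14]). In this case, $\mathbf{Q} = \boldsymbol{\mu}$ is the median vector. Note that $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ if $\log(\mathbf{Y}) \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$, which establishes the way in which the quantile-based multivariate log-normal and normal distributions are related through the logarithmic transformation.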
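The relation between the median vector and the quantile vector can be illustrated numerically. The following is a minimal sketch using only the Python standard library; the function names `quantile_vector` and `median_vector` and the numerical values are assumptions made for illustration and do not come from the article.

```python
import math
from statistics import NormalDist


def quantile_vector(mu, sigma_diag, alpha):
    """Q_k = mu_k * exp(sqrt(sigma_kk) * q_k), with q_k the alpha_k-quantile of N(0, 1)."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    return [m * math.exp(math.sqrt(s) * qk) for m, s, qk in zip(mu, sigma_diag, q)]


def median_vector(Q, sigma_diag, alpha):
    """Inverse map: mu_k = Q_k * exp(-sqrt(sigma_kk) * q_k)."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    return [Qk * math.exp(-math.sqrt(s) * qk) for Qk, s, qk in zip(Q, sigma_diag, q)]


mu = [2.0, 5.0]             # illustrative medians (assumed values)
sigma_diag = [0.25, 0.04]   # illustrative diagonal of Sigma (assumed values)
alpha = [0.25, 0.5]         # first quartile of Y_1, median of Y_2

Q = quantile_vector(mu, sigma_diag, alpha)
mu_back = median_vector(Q, sigma_diag, alpha)
print("Q =", Q)
print("recovered mu =", mu_back)
```

Only the diagonal of the dispersion matrix enters this map, so the off-diagonal elements are left untouched by the reparameterization.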
Figure 1 displays contour plots (at levels 0.15, 0.1, 0.05, 0.02, 0.01) of the quantile-based bivariate log-normal distribution. The legend indicates the values of $\alpha_1$, $\alpha_2$, and all the parameters considered in the first plot and the values that are changed from a plot to the subsequent one (in alphabetical order). The parameters $Q_1$ and $Q_2$ of the distribution in Figure 1a are the marginal medians of $Y_1$ and $Y_2$, respectively. For Figure 1b–f, these parameters are the first quartile of $Y_1$ and the median of $Y_2$, respectively. The parameter $Q_2$ impacts the scale of the marginal distribution of $Y_2$ (Figure 1b,c). The parameter $\sigma_{11}$ controls the dispersion of the marginal distribution of $Y_1$ (Figure 1c,d). The parameter $\sigma_{12}$ controls the association between the marginal distributions of $Y_1$ and $Y_2$, ranging from a negative to positive association (Figure 1d–f).
The quantile-based multivariate log-normal distribution is suitable in situations where it is necessary to model quantiles of the marginals, taking into account the correlation between them. Additionally, our model can be useful for regression modeling purposes. For instance, assume that, for fixed $k$, $\log(Q_k) = \sum_{j=1}^{r} \beta_j x_j$, where $\beta_1, \ldots, \beta_r$ are unknown regression parameters and $x_1, \ldots, x_r$ are fixed covariates. So, $\exp(\beta_j)$ is the multiplicative effect of a one-unit increase in $x_j$ on the $\alpha_k$-quantile of $Y_k$. This is a parametric methodology that allows us to jointly analyze marginal quantiles, taking into account the association among the response variables through the dispersion matrix $\boldsymbol{\Sigma}\,(p \times p) > 0$. These types of models can provide more accurate estimates than those that consider univariate models for each marginal assuming independence among them (Morán-Vásquez et al. [14]).
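The multiplicative interpretation of $\exp(\beta_j)$ follows in one step from the log-linear specification. In the sketch below, the notation $Q_k(x_j)$, meaning the $\alpha_k$-quantile viewed as a function of $x_j$ with the remaining covariates held fixed, is an assumption introduced only for this illustration.

```latex
\begin{align*}
\log Q_k(x_j + 1) - \log Q_k(x_j) &= \beta_j (x_j + 1) - \beta_j x_j = \beta_j, \\
\frac{Q_k(x_j + 1)}{Q_k(x_j)} &= \exp(\beta_j).
\end{align*}
```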

3. Main Properties

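Theorems 1–3 state distributional results involving the transformation of quantile-based multivariate log-normal random vectors.
Theorem 1.
Let $\boldsymbol{\theta} \in \mathbb{R}_+^p$. If $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, then $\mathbf{D}_{\boldsymbol{\theta}} \mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{D}_{\boldsymbol{\theta}} \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$.
Proof. 
From the transformation $\mathbf{U} = \mathbf{D}_{\boldsymbol{\theta}} \mathbf{Y}$, with the Jacobian $J(\mathbf{y} \to \mathbf{u}) = \prod_{k=1}^{p} \theta_k^{-1}$, in (3), we arrive at
$$f_{\mathbf{U}}(\mathbf{u}) = \phi_p(\log(\mathbf{D}_{\boldsymbol{\theta}}^{-1} \mathbf{u}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{u_k} = \phi_p(\log(\mathbf{u}) - \log(\boldsymbol{\theta}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{u_k}. \tag{4}$$
Since $\phi_p(\log(\mathbf{u}) - \log(\boldsymbol{\theta}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) = \phi_p(\log(\mathbf{u}); \log(\mathbf{Q}) + \log(\boldsymbol{\theta}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$, (4) can be expressed as
$$f_{\mathbf{U}}(\mathbf{u}) = \phi_p(\log(\mathbf{u}); \log(\mathbf{D}_{\boldsymbol{\theta}} \mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{u_k},$$
where the last line is obtained by using the identity $\log(\mathbf{Q}) + \log(\boldsymbol{\theta}) = \log(\mathbf{D}_{\boldsymbol{\theta}} \mathbf{Q})$. □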
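Corollary 1.
Let $\boldsymbol{\theta} \in \mathbb{R}_+^p$. If $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{D}_{\boldsymbol{\theta}} \mathbf{Y} \sim \mathrm{LN}_p(\mathbf{D}_{\boldsymbol{\theta}} \boldsymbol{\mu}, \boldsymbol{\Sigma})$.
Proof. 
The result follows by applying Theorem 1 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □
The result stated in the above corollary can also be obtained as a particular case of Theorem 3(1) of Morán-Vásquez and Ferrari [13].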
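Theorem 2.
Let $\boldsymbol{\beta} \in \mathbb{R}^p$ have nonzero components. If $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, then $\mathbf{Y}^{\boldsymbol{\beta}} \sim \mathrm{QLN}_p(\mathbf{Q}^{\boldsymbol{\beta}}, \mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}}, \mathbf{q}^{*})$, where $\mathbf{q}^{*} = \mathbf{D}_{\operatorname{sgn}(\boldsymbol{\beta})} \mathbf{q}$.
Proof. 
Transforming $\mathbf{T} = \mathbf{Y}^{\boldsymbol{\beta}}$, with the Jacobian $J(\mathbf{y} \to \mathbf{t}) = \prod_{k=1}^{p} \beta_k^{-1} t_k^{1/\beta_k - 1}$, in (3), we have
$$f_{\mathbf{T}}(\mathbf{t}) = \phi_p(\log(\mathbf{t}^{1/\boldsymbol{\beta}}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{|\beta_k|\, t_k} = \phi_p(\mathbf{D}_{\boldsymbol{\beta}}^{-1} \log(\mathbf{t}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{|\beta_k|\, t_k}. \tag{5}$$
By using the identity
$$\phi_p(\mathbf{D}_{\boldsymbol{\beta}}^{-1} \log(\mathbf{t}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} \frac{1}{|\beta_k|} = \phi_p(\log(\mathbf{t}); \mathbf{D}_{\boldsymbol{\beta}} \log(\mathbf{Q}) - \boldsymbol{\Omega}^{*} \mathbf{q}^{*}, \mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}}),$$
with $\boldsymbol{\Omega}^{*} = (\mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}} \odot \mathbf{I}_p)^{1/2}$ and $\mathbf{q}^{*} = \mathbf{D}_{\operatorname{sgn}(\boldsymbol{\beta})} \mathbf{q}$, in (5), we have
$$f_{\mathbf{T}}(\mathbf{t}) = \phi_p(\log(\mathbf{t}); \log(\mathbf{Q}^{\boldsymbol{\beta}}) - \boldsymbol{\Omega}^{*} \mathbf{q}^{*}, \mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}}) \prod_{k=1}^{p} \frac{1}{t_k},$$
where the last line is derived by noting that $\mathbf{D}_{\boldsymbol{\beta}} \log(\mathbf{Q}) = \log(\mathbf{Q}^{\boldsymbol{\beta}})$. □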
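The identity used in the proof of Theorem 2 rests on three elementary facts, sketched below. The symbols $\mathbf{D}_{|\boldsymbol{\beta}|} = \operatorname{diag}(|\beta_1|, \ldots, |\beta_p|)$, $\mathbf{w} = \log(\mathbf{t})$, and $\mathbf{m} = \log(\mathbf{Q}) - \boldsymbol{\Omega}\mathbf{q}$ are shorthand introduced here for illustration and are not part of the original notation.

```latex
\begin{align*}
\boldsymbol{\Omega}^{*}
  &= (\mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}} \odot \mathbf{I}_p)^{1/2}
   = \operatorname{diag}\bigl(|\beta_1| \sqrt{\sigma_{11}}, \ldots, |\beta_p| \sqrt{\sigma_{pp}}\bigr)
   = \mathbf{D}_{|\boldsymbol{\beta}|} \boldsymbol{\Omega}, \\
\boldsymbol{\Omega}^{*} \mathbf{q}^{*}
  &= \mathbf{D}_{|\boldsymbol{\beta}|} \boldsymbol{\Omega} \mathbf{D}_{\operatorname{sgn}(\boldsymbol{\beta})} \mathbf{q}
   = \mathbf{D}_{|\boldsymbol{\beta}|} \mathbf{D}_{\operatorname{sgn}(\boldsymbol{\beta})} \boldsymbol{\Omega} \mathbf{q}
   = \mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Omega} \mathbf{q}, \\
\det(\mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}})^{1/2}
  &= \Bigl(\prod_{k=1}^{p} |\beta_k|\Bigr) \det(\boldsymbol{\Sigma})^{1/2}, \\
\delta_{\mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}}}(\mathbf{w}, \mathbf{D}_{\boldsymbol{\beta}} \mathbf{m})
  &= \delta_{\boldsymbol{\Sigma}}(\mathbf{D}_{\boldsymbol{\beta}}^{-1} \mathbf{w}, \mathbf{m}).
\end{align*}
```

The second line uses the fact that diagonal matrices commute, so the location $\mathbf{D}_{\boldsymbol{\beta}}(\log(\mathbf{Q}) - \boldsymbol{\Omega}\mathbf{q})$ of $\log(\mathbf{Y}^{\boldsymbol{\beta}})$ equals $\log(\mathbf{Q}^{\boldsymbol{\beta}}) - \boldsymbol{\Omega}^{*}\mathbf{q}^{*}$.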
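Corollary 2.
Let $\boldsymbol{\beta} \in \mathbb{R}^p$ with nonzero components. If $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then $\mathbf{Y}^{\boldsymbol{\beta}} \sim \mathrm{LN}_p(\boldsymbol{\mu}^{\boldsymbol{\beta}}, \mathbf{D}_{\boldsymbol{\beta}} \boldsymbol{\Sigma} \mathbf{D}_{\boldsymbol{\beta}})$.
Proof. 
The result follows by applying Theorem 2 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □
The above corollary can also be obtained as a particular case of Theorem 3(2) of Morán-Vásquez and Ferrari [13].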
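Theorem 3.
Let $\boldsymbol{\lambda} \in \mathbb{R}^p \setminus \{\mathbf{0}\}$. If $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, then
$$\prod_{k=1}^{p} Y_k^{\lambda_k} \sim \mathrm{QLN}_1\left(\prod_{k=1}^{p} Q_k^{\lambda_k},\ \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda},\ \tilde{q}\right),$$
where $\tilde{q} = (\boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda})^{-1/2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Omega} \mathbf{q}$.
Proof. 
Since $\log(\mathbf{Y}) \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$, we have
$$\boldsymbol{\lambda}^{\top} \log(\mathbf{Y}) = \log\left(\prod_{k=1}^{p} Y_k^{\lambda_k}\right) \sim \mathrm{N}_1\left(\log\left(\prod_{k=1}^{p} Q_k^{\lambda_k}\right) - \tilde{\omega} \tilde{q},\ \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\right),$$
where $\tilde{\omega} = (\boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda})^{1/2}$ and $\tilde{q} = (\boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda})^{-1/2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Omega} \mathbf{q}$. This completes the proof. □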
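Corollary 3.
Let $\boldsymbol{\lambda} \in \mathbb{R}^p \setminus \{\mathbf{0}\}$. If $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then
$$\prod_{k=1}^{p} Y_k^{\lambda_k} \sim \mathrm{LN}_1\left(\prod_{k=1}^{p} \mu_k^{\lambda_k},\ \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\right).$$
Proof. 
Simply apply Theorem 3 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □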
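In Theorem 4, we give a closed-form expression for the mixed moments of quantile-based multivariate log-normal random vectors.
Theorem 4.
Let $\boldsymbol{\lambda} \in \mathbb{R}^p \setminus \{\mathbf{0}\}$. If $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, then
$$\mathrm{E}\left(\prod_{k=1}^{p} Y_k^{\lambda_k}\right) = \exp\left(-\boldsymbol{\lambda}^{\top} \boldsymbol{\Omega} \mathbf{q} + \tfrac{1}{2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\right) \prod_{k=1}^{p} Q_k^{\lambda_k}. \tag{6}$$
Proof. 
From (3), we have
$$\mathrm{E}\left(\prod_{k=1}^{p} Y_k^{\lambda_k}\right) = \int_{\mathbb{R}_+^p} \phi_p(\log(\mathbf{y}); \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} y_k^{\lambda_k - 1}\, \mathrm{d}\mathbf{y}. \tag{7}$$
By making the change of variables $\mathbf{y} = \exp(\mathbf{x})$, with the Jacobian $J(\mathbf{y} \to \mathbf{x}) = \exp(\sum_{k=1}^{p} x_k)$, in (7), we arrive at $\mathrm{E}(\prod_{k=1}^{p} Y_k^{\lambda_k}) = M_{\mathbf{X}}(\boldsymbol{\lambda})$, where $M_{\mathbf{X}}$ is the moment-generating function of $\mathbf{X} \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$. This completes the proof. □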
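The last step of the proof of Theorem 4 can be written out explicitly by using the well-known moment-generating function of the multivariate normal distribution. The shorthand $\mathbf{m} = \log(\mathbf{Q}) - \boldsymbol{\Omega}\mathbf{q}$ is introduced here only for brevity.

```latex
\begin{align*}
\int_{\mathbb{R}_+^p} \phi_p(\log(\mathbf{y}); \mathbf{m}, \boldsymbol{\Sigma}) \prod_{k=1}^{p} y_k^{\lambda_k - 1}\, \mathrm{d}\mathbf{y}
  &= \int_{\mathbb{R}^p} \phi_p(\mathbf{x}; \mathbf{m}, \boldsymbol{\Sigma})
     \exp\bigl(\boldsymbol{\lambda}^{\top} \mathbf{x} - \mathbf{1}^{\top} \mathbf{x}\bigr)
     \exp\bigl(\mathbf{1}^{\top} \mathbf{x}\bigr)\, \mathrm{d}\mathbf{x} \\
  &= M_{\mathbf{X}}(\boldsymbol{\lambda})
   = \exp\bigl(\boldsymbol{\lambda}^{\top} \mathbf{m} + \tfrac{1}{2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\bigr) \\
  &= \exp\bigl(-\boldsymbol{\lambda}^{\top} \boldsymbol{\Omega} \mathbf{q} + \tfrac{1}{2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\bigr)
     \prod_{k=1}^{p} Q_k^{\lambda_k},
\end{align*}
```

since $\exp(\boldsymbol{\lambda}^{\top} \log(\mathbf{Q})) = \prod_{k=1}^{p} Q_k^{\lambda_k}$.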
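In the following corollary, we derive the expected value and the covariance matrix of a quantile-based multivariate log-normal random vector.
Corollary 4.
Let $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$. Then,
  • $\mathrm{E}(\mathbf{Y}) = \mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q} + \boldsymbol{\sigma}/2)$, where $\boldsymbol{\sigma} = (\sigma_{11}, \ldots, \sigma_{pp})^{\top}$ is the vector with elements being the main diagonal elements of $\boldsymbol{\Sigma}$.
  • $\mathrm{Cov}(\mathbf{Y}) = (\mathrm{Cov}(Y_j, Y_k))_{p \times p}$, where
    $$\mathrm{Cov}(Y_j, Y_k) = Q_j Q_k \exp\left(-\sqrt{\sigma_{jj}}\, q_j - \sqrt{\sigma_{kk}}\, q_k + \tfrac{1}{2}(\sigma_{jj} + \sigma_{kk})\right) (\exp(\sigma_{jk}) - 1). \tag{8}$$
Proof. 
For each $k = 1, \ldots, p$, by choosing $\boldsymbol{\lambda}$ with all its components being 0, except the kth which is 1, in (6), we obtain
$$\mathrm{E}(Y_k) = Q_k \exp\left(-\sqrt{\sigma_{kk}}\, q_k + \frac{\sigma_{kk}}{2}\right).$$
From the above expression, we get the first assertion. Similarly, for each $j, k = 1, \ldots, p$, by choosing $\boldsymbol{\lambda}$ with all components equal to 0, except the jth and kth, which are 1, in (6), we have
$$\mathrm{E}(Y_j Y_k) = Q_j Q_k \exp\left(-\sqrt{\sigma_{jj}}\, q_j - \sqrt{\sigma_{kk}}\, q_k + \tfrac{1}{2}(\sigma_{jj} + \sigma_{kk}) + \sigma_{jk}\right).$$
The second assertion is obtained from the identity $\mathrm{Cov}(Y_j, Y_k) = \mathrm{E}(Y_j Y_k) - \mathrm{E}(Y_j)\, \mathrm{E}(Y_k)$. □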
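The closed-form expressions of Corollary 4 translate directly into code. The sketch below uses only the Python standard library; the function name `qln_mean_cov` is hypothetical, and the parameter values are the estimates reported later in the application section, used here purely as example input.

```python
import math
from statistics import NormalDist


def qln_mean_cov(Q, Sigma, alpha):
    """Expected value and covariance matrix of Y ~ QLN_p(Q, Sigma, q) from Corollary 4."""
    p = len(Q)
    q = [NormalDist().inv_cdf(a) for a in alpha]
    # E(Y_k) = Q_k * exp(-sqrt(sigma_kk) * q_k + sigma_kk / 2)
    mean = [Q[k] * math.exp(-math.sqrt(Sigma[k][k]) * q[k] + Sigma[k][k] / 2.0)
            for k in range(p)]
    # Equation (8) factorizes as Cov(Y_j, Y_k) = E(Y_j) E(Y_k) (exp(sigma_jk) - 1)
    cov = [[mean[j] * mean[k] * (math.exp(Sigma[j][k]) - 1.0) for k in range(p)]
           for j in range(p)]
    return mean, cov


Q = [101.62, 13.101]
Sigma = [[0.0062, 0.0121], [0.0121, 0.0314]]
alpha = [0.75, 0.25]

mean, cov = qln_mean_cov(Q, Sigma, alpha)
print("E(Y)   =", mean)
print("Cov(Y) =", cov)
```

Writing (8) as a product of the two marginal means and $\exp(\sigma_{jk}) - 1$ makes it visible that the sign of each covariance is the sign of $\sigma_{jk}$.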
In Section 2, we described the behavior of the quantile-based multivariate log-normal distribution in terms of the parameters involved in the matrix $\boldsymbol{\Sigma}$. The following corollary establishes an exact interpretation of these parameters in terms of covariance between pairs of variables according to their signs.
Corollary 5.
Let $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$. Then,
  • $\mathrm{Cov}(Y_j, Y_k) > 0$ if and only if $\sigma_{jk} > 0$, $j \neq k$.
  • $\mathrm{Cov}(Y_j, Y_k) = 0$ if and only if $\sigma_{jk} = 0$, $j \neq k$.
Proof. 
The result follows from (8). □
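Corollary 6.
Let $\boldsymbol{\lambda} \in \mathbb{R}^p \setminus \{\mathbf{0}\}$. If $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then
$$\mathrm{E}\left(\prod_{k=1}^{p} Y_k^{\lambda_k}\right) = \exp\left(\tfrac{1}{2} \boldsymbol{\lambda}^{\top} \boldsymbol{\Sigma} \boldsymbol{\lambda}\right) \prod_{k=1}^{p} \mu_k^{\lambda_k}.$$
Moreover, $\mathrm{E}(\mathbf{Y}) = \mathbf{D}_{\boldsymbol{\mu}} \exp(\boldsymbol{\sigma}/2)$, where $\boldsymbol{\sigma} = (\sigma_{11}, \ldots, \sigma_{pp})^{\top}$ is the vector with elements being the main diagonal elements of $\boldsymbol{\Sigma}$, and $\mathrm{Cov}(\mathbf{Y}) = (\mathrm{Cov}(Y_j, Y_k))_{p \times p}$, with
$$\mathrm{Cov}(Y_j, Y_k) = \mu_j \mu_k \exp\left(\tfrac{1}{2}(\sigma_{jj} + \sigma_{kk})\right) (\exp(\sigma_{jk}) - 1).$$
Proof. 
Apply Theorem 4 and Corollary 4 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □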
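Theorem 5 gives a closed-form expression for the mode of the quantile-based multivariate log-normal distribution.
Theorem 5.
The mode of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ is given by $\operatorname{Mode}(\mathbf{Y}) = \mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q} - \boldsymbol{\Sigma} \mathbf{1})$. The value of the PDF of $\mathbf{Y}$ at the mode is
$$\mathrm{QLN}_p(\operatorname{Mode}(\mathbf{Y}); \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) = (2\pi)^{-p/2} \det(\boldsymbol{\Sigma})^{-1/2} \exp\left\{\tfrac{1}{2} \mathbf{1}^{\top} \boldsymbol{\Sigma} \mathbf{1} - \mathbf{1}^{\top} (\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q})\right\}.$$
Proof. 
The mode of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ is obtained by maximizing (3) with respect to $\mathbf{y}$, which is equivalent to maximizing the function
$$f(\mathbf{y}) = -\tfrac{1}{2}\, \delta_{\boldsymbol{\Sigma}}(\log(\mathbf{y}), \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}) - \sum_{k=1}^{p} \log(y_k)$$
with respect to $\mathbf{y}$. By using results on vector differentiation (Seber ([17], Chapter 17)), we find that the equation $\partial f / \partial \mathbf{y} = \mathbf{0}$ is equivalent to
$$\boldsymbol{\Sigma}^{-1} (\log(\mathbf{y}) - \log(\mathbf{Q}) + \boldsymbol{\Omega} \mathbf{q}) + \mathbf{1} = \mathbf{0}.$$
The solution for $\mathbf{y}$ of the above equation is $\mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q} - \boldsymbol{\Sigma} \mathbf{1})$.
Now, for $\mathbf{y} \in \mathbb{R}_+^p$, we have
$$(\log(\mathbf{y}) - \log(\mathbf{Q}) + \boldsymbol{\Omega} \mathbf{q} + \boldsymbol{\Sigma} \mathbf{1})^{\top} \boldsymbol{\Sigma}^{-1} (\log(\mathbf{y}) - \log(\mathbf{Q}) + \boldsymbol{\Omega} \mathbf{q} + \boldsymbol{\Sigma} \mathbf{1}) \geq 0,$$
which implies that
$$\mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) \leq \mathrm{QLN}_p(\mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q} - \boldsymbol{\Sigma} \mathbf{1}); \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) = (2\pi)^{-p/2} \det(\boldsymbol{\Sigma})^{-1/2} \exp\left\{\tfrac{1}{2} \mathbf{1}^{\top} \boldsymbol{\Sigma} \mathbf{1} - \mathbf{1}^{\top} (\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q})\right\},$$
for all $\mathbf{y} \in \mathbb{R}_+^p$. Hence, $\operatorname{Mode}(\mathbf{Y}) = \mathbf{D}_{\mathbf{Q}} \exp(-\boldsymbol{\Omega} \mathbf{q} - \boldsymbol{\Sigma} \mathbf{1})$. □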
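The mode formula of Theorem 5 can be checked numerically in the bivariate case. The sketch below is written for $p = 2$ only, so that the inverse of the dispersion matrix can be spelled out by hand; the function names and the parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def qln_mode(Q, Sigma, alpha):
    """Mode(Y) = D_Q exp(-Omega q - Sigma 1), computed componentwise."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    return [Q[k] * math.exp(-math.sqrt(Sigma[k][k]) * q[k] - sum(Sigma[k]))
            for k in range(len(Q))]


def qln2_logpdf(y, Q, Sigma, alpha):
    """Log of the PDF (3) for p = 2, using the explicit inverse of a 2 x 2 matrix."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    # r = log(y) - (log(Q) - Omega q)
    r = [math.log(y[k]) - math.log(Q[k]) + math.sqrt(Sigma[k][k]) * q[k]
         for k in range(2)]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    delta = (Sigma[1][1] * r[0] ** 2 - 2.0 * Sigma[0][1] * r[0] * r[1]
             + Sigma[0][0] * r[1] ** 2) / det
    return (-math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * delta
            - math.log(y[0]) - math.log(y[1]))


# Illustrative (assumed) parameter values
Q = [3.0, 10.0]
Sigma = [[0.3, 0.1], [0.1, 0.2]]
alpha = [0.25, 0.5]

mode = qln_mode(Q, Sigma, alpha)
peak = qln2_logpdf(mode, Q, Sigma, alpha)

# The log-density at a few perturbed points should not exceed the value at the mode
nearby = [qln2_logpdf([mode[0] * a, mode[1] * b], Q, Sigma, alpha)
          for a in (0.9, 1.1) for b in (0.9, 1.1)]
print("mode =", mode)
print("log-density at mode =", peak, " max nearby =", max(nearby))
```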
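Corollary 7.
The mode of $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is given by $\operatorname{Mode}(\mathbf{Y}) = \mathbf{D}_{\boldsymbol{\mu}} \exp(-\boldsymbol{\Sigma} \mathbf{1})$. The value of the PDF of $\mathbf{Y}$ at the mode is
$$\mathrm{LN}_p(\operatorname{Mode}(\mathbf{Y}); \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-p/2} \det(\boldsymbol{\Sigma})^{-1/2} \exp\left\{\tfrac{1}{2} \mathbf{1}^{\top} \boldsymbol{\Sigma} \mathbf{1} - \mathbf{1}^{\top} \log(\boldsymbol{\mu})\right\}.$$
Proof. 
Apply Theorem 5 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □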
Theorem 6 provides the distribution of a Mahalanobis-type distance involving a quantile-based multivariate log-normal random vector.
Theorem 6.
If $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, then $\delta_{\boldsymbol{\Sigma}}(\log(\mathbf{Y}), \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}) \sim \chi_p^2$.
Proof. 
The result follows by noting that $\log(\mathbf{Y}) \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$. □
The above result allows us to evaluate the goodness of fit of the quantile-based multivariate log-normal distribution by using quantile–quantile plots to compare empirical Mahalanobis distances with theoretical quantiles obtained from a chi-squared distribution with p degrees of freedom.
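The quantile–quantile comparison suggested by Theorem 6 can be sketched for $p = 2$, where the chi-squared quantile function has the closed form $-2\log(1 - a)$. This is an illustrative sketch under stated assumptions: the parameter values are made up, the distances are computed with the true parameters rather than with estimates, the plotting positions $i/(n+1)$ are one common choice, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def mahalanobis2(y, Q, Sigma, alpha):
    """delta_Sigma(log(y), log(Q) - Omega q) for p = 2."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    r = [math.log(y[k]) - math.log(Q[k]) + math.sqrt(Sigma[k][k]) * q[k]
         for k in range(2)]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] ** 2
    return (Sigma[1][1] * r[0] ** 2 - 2.0 * Sigma[0][1] * r[0] * r[1]
            + Sigma[0][0] * r[1] ** 2) / det


def chi2_quantile_2df(a):
    """Quantile function of the chi-squared distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - a)


# Illustrative (assumed) parameter values
Q = [3.0, 10.0]
Sigma = [[0.3, 0.1], [0.1, 0.2]]
alpha = [0.25, 0.5]

# Simulate from QLN_2 by exponentiating bivariate normal draws (Cholesky factor by hand)
q = [NormalDist().inv_cdf(a) for a in alpha]
m = [math.log(Q[k]) - math.sqrt(Sigma[k][k]) * q[k] for k in range(2)]
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[0][1] / l11
l22 = math.sqrt(Sigma[1][1] - l21 ** 2)

rng = random.Random(1)
n = 2000
sample = []
for _ in range(n):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    sample.append((math.exp(m[0] + l11 * z1), math.exp(m[1] + l21 * z1 + l22 * z2)))

# Pairs (theoretical quantile, ordered empirical distance) are the points of the Q-Q plot
empirical = sorted(mahalanobis2(y, Q, Sigma, alpha) for y in sample)
theoretical = [chi2_quantile_2df(i / (n + 1)) for i in range(1, n + 1)]
print(list(zip(theoretical, empirical))[:3])
```

If the model is adequate, the plotted points should lie close to the identity line.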
The Shannon entropy (also called differential entropy) of a continuous random vector $\mathbf{X} \in \mathbb{R}^p$ with PDF $f_{\mathbf{X}}$ is defined as
$$H(\mathbf{X}) = -\mathrm{E}[\log(f_{\mathbf{X}}(\mathbf{X}))].$$
On the other hand, the Kullback–Leibler (KL) divergence between the distributions of two p-dimensional random vectors $\mathbf{T}$ and $\mathbf{U}$ is given by
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \mathrm{E}\left[\log\left(\frac{f_{\mathbf{T}}(\mathbf{T})}{f_{\mathbf{U}}(\mathbf{T})}\right)\right],$$
where $f_{\mathbf{T}}$ and $f_{\mathbf{U}}$ denote the PDFs of $\mathbf{T}$ and $\mathbf{U}$, respectively. The above expected value is defined with respect to the PDF $f_{\mathbf{T}}$. A detailed study about Shannon entropy and KL divergence can be found in Pardo [18].
Lemmas 1 and 2 provide the Shannon entropy and the KL divergence for the multivariate normal distribution, respectively.
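Lemma 1.
The Shannon entropy of $\mathbf{X} \sim \mathrm{N}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is given by
$$H(\mathbf{X}) = \tfrac{1}{2} \log[\det(\boldsymbol{\Sigma}) (2\pi e)^p].$$
Proof. 
See Pardo ([18], p. 32). □
Note that $H(\mathbf{X})$ in the above lemma can be expressed as
$$H(\mathbf{X}) = \frac{p}{2} (1 + \log(2\pi)) + \tfrac{1}{2} \log(\det(\boldsymbol{\Sigma})).$$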
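Lemma 2.
The KL divergence between $\mathbf{X} \sim \mathrm{N}_p(\boldsymbol{\mu}_a, \boldsymbol{\Sigma}_a)$ and $\mathbf{W} \sim \mathrm{N}_p(\boldsymbol{\mu}_b, \boldsymbol{\Sigma}_b)$ is given by
$$D_{KL}(\mathbf{X}, \mathbf{W}) = \frac{1}{2} \left[\log\left(\frac{\det(\boldsymbol{\Sigma}_b)}{\det(\boldsymbol{\Sigma}_a)}\right) + \operatorname{tr}(\boldsymbol{\Sigma}_b^{-1} \boldsymbol{\Sigma}_a - \mathbf{I}_p) + (\boldsymbol{\mu}_a - \boldsymbol{\mu}_b)^{\top} \boldsymbol{\Sigma}_b^{-1} (\boldsymbol{\mu}_a - \boldsymbol{\mu}_b)\right].$$
Proof. 
See Pardo ([18], p. 33). □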
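In the following theorem, we derive the Shannon entropy of the quantile-based multivariate log-normal distribution.
Theorem 7.
The Shannon entropy of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ is given by
$$H(\mathbf{Y}) = \frac{p}{2} (1 + \log(2\pi)) + \tfrac{1}{2} \log(\det(\boldsymbol{\Sigma})) + \mathbf{1}^{\top} (\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}).$$
Proof. 
By definition,
$$H(\mathbf{Y}) = -\mathrm{E}[\log(\mathrm{QLN}_p(\mathbf{Y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}))] = -\int_{\mathbb{R}_+^p} \log(\mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}))\, \mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})\, \mathrm{d}\mathbf{y}.$$
By making the change of variables $\mathbf{y} = \exp(\mathbf{x})$, with Jacobian $J(\mathbf{y} \to \mathbf{x}) = \exp(\sum_{k=1}^{p} x_k)$, in the above integral, we have
$$H(\mathbf{Y}) = -\int_{\mathbb{R}^p} [\log(\phi_p(\mathbf{x}; \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})) - \mathbf{1}^{\top} \mathbf{x}]\, \phi_p(\mathbf{x}; \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})\, \mathrm{d}\mathbf{x} = H(\mathbf{X}) + \mathbf{1}^{\top} \mathrm{E}(\mathbf{X}),$$
where $\mathbf{X} \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$. The result follows by calculating $H(\mathbf{X})$ by using Lemma 1 and replacing $\mathrm{E}(\mathbf{X}) = \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}$ in the above expression. □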
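As an illustration of Theorem 7, the entropy can be evaluated in closed form. The sketch below covers the bivariate case only, with the determinant written out by hand; the function name `qln2_entropy` and the parameter values are assumptions made for this example.

```python
import math
from statistics import NormalDist


def qln2_entropy(Q, Sigma, alpha):
    """Shannon entropy of Y ~ QLN_2(Q, Sigma, q) from Theorem 7 with p = 2."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
    # 1' (log(Q) - Omega q): sum of the location parameters on the log scale
    shift = sum(math.log(Q[k]) - math.sqrt(Sigma[k][k]) * q[k] for k in range(2))
    # p / 2 = 1 when p = 2
    return (1.0 + math.log(2.0 * math.pi)) + 0.5 * math.log(det) + shift


# Illustrative (assumed) parameter values
Q = [3.0, 10.0]
Sigma = [[0.3, 0.1], [0.1, 0.2]]
alpha = [0.25, 0.5]
print("H(Y) =", qln2_entropy(Q, Sigma, alpha))
```

Unlike the normal case in Lemma 1, the entropy here depends on the location through the term $\mathbf{1}^{\top}(\log(\mathbf{Q}) - \boldsymbol{\Omega}\mathbf{q})$.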
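Corollary 8.
The Shannon entropy of $\mathbf{Y} \sim \mathrm{LN}_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is given by
$$H(\mathbf{Y}) = \frac{p}{2} (1 + \log(2\pi)) + \tfrac{1}{2} \log(\det(\boldsymbol{\Sigma})) + \mathbf{1}^{\top} \log(\boldsymbol{\mu}).$$
Proof. 
The result follows by applying Theorem 7 to the quantile-based multivariate log-normal distribution generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □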
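In Theorem 8, we derive the KL divergence between two quantile-based multivariate log-normal distributions.
Theorem 8.
The KL divergence between $\mathbf{T} \sim \mathrm{QLN}_p(\mathbf{Q}_a, \boldsymbol{\Sigma}_a, \mathbf{q}_a)$ and $\mathbf{U} \sim \mathrm{QLN}_p(\mathbf{Q}_b, \boldsymbol{\Sigma}_b, \mathbf{q}_b)$ is given by
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \frac{1}{2} \left[\log\left(\frac{\det(\boldsymbol{\Sigma}_b)}{\det(\boldsymbol{\Sigma}_a)}\right) + \operatorname{tr}(\boldsymbol{\Sigma}_b^{-1} \boldsymbol{\Sigma}_a - \mathbf{I}_p) + (\log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a - \log(\mathbf{Q}_b) + \boldsymbol{\Omega}_b \mathbf{q}_b)^{\top} \boldsymbol{\Sigma}_b^{-1} (\log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a - \log(\mathbf{Q}_b) + \boldsymbol{\Omega}_b \mathbf{q}_b)\right].$$
Proof. 
By definition,
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \mathrm{E}\left[\log\left(\frac{\mathrm{QLN}_p(\mathbf{T}; \mathbf{Q}_a, \boldsymbol{\Sigma}_a, \mathbf{q}_a)}{\mathrm{QLN}_p(\mathbf{T}; \mathbf{Q}_b, \boldsymbol{\Sigma}_b, \mathbf{q}_b)}\right)\right] = \int_{\mathbb{R}_+^p} \log\left(\frac{\mathrm{QLN}_p(\mathbf{t}; \mathbf{Q}_a, \boldsymbol{\Sigma}_a, \mathbf{q}_a)}{\mathrm{QLN}_p(\mathbf{t}; \mathbf{Q}_b, \boldsymbol{\Sigma}_b, \mathbf{q}_b)}\right) \mathrm{QLN}_p(\mathbf{t}; \mathbf{Q}_a, \boldsymbol{\Sigma}_a, \mathbf{q}_a)\, \mathrm{d}\mathbf{t}.$$
We substitute $\mathbf{t} = \exp(\mathbf{w})$, with Jacobian $J(\mathbf{t} \to \mathbf{w}) = \exp(\sum_{k=1}^{p} w_k)$, above to arrive at
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \int_{\mathbb{R}^p} \log\left(\frac{\phi_p(\mathbf{w}; \log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a, \boldsymbol{\Sigma}_a)}{\phi_p(\mathbf{w}; \log(\mathbf{Q}_b) - \boldsymbol{\Omega}_b \mathbf{q}_b, \boldsymbol{\Sigma}_b)}\right) \phi_p(\mathbf{w}; \log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a, \boldsymbol{\Sigma}_a)\, \mathrm{d}\mathbf{w} = D_{KL}(\mathbf{T}^{*}, \mathbf{U}^{*}),$$
where $\mathbf{T}^{*} \sim \mathrm{N}_p(\log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a, \boldsymbol{\Sigma}_a)$ and $\mathbf{U}^{*} \sim \mathrm{N}_p(\log(\mathbf{Q}_b) - \boldsymbol{\Omega}_b \mathbf{q}_b, \boldsymbol{\Sigma}_b)$. By using Lemma 2 to calculate $D_{KL}(\mathbf{T}^{*}, \mathbf{U}^{*})$ we arrive at the desired result. □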
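The KL divergence of Theorem 8 can be computed directly from the parameters. The sketch below is restricted to $p = 2$ so that the inverse and the determinant can be written by hand; the function name `qln2_kl` and all numerical values are assumptions. As a sanity check, it also evaluates the divergence between two parameterizations of the same distribution (same median vector and dispersion matrix, different quantile levels), which should be zero up to rounding.

```python
import math
from statistics import NormalDist


def qln2_kl(Qa, Sa, alpha_a, Qb, Sb, alpha_b):
    """KL divergence between QLN_2(Qa, Sa, qa) and QLN_2(Qb, Sb, qb) from Theorem 8."""
    qa = [NormalDist().inv_cdf(a) for a in alpha_a]
    qb = [NormalDist().inv_cdf(a) for a in alpha_b]
    # Location vectors on the log scale: log(Q) - Omega q
    ma = [math.log(Qa[k]) - math.sqrt(Sa[k][k]) * qa[k] for k in range(2)]
    mb = [math.log(Qb[k]) - math.sqrt(Sb[k][k]) * qb[k] for k in range(2)]
    d = [ma[0] - mb[0], ma[1] - mb[1]]
    det_a = Sa[0][0] * Sa[1][1] - Sa[0][1] ** 2
    det_b = Sb[0][0] * Sb[1][1] - Sb[0][1] ** 2
    # tr(Sb^{-1} Sa) and d' Sb^{-1} d with the explicit 2 x 2 inverse of Sb
    trace = (Sb[1][1] * Sa[0][0] - 2.0 * Sb[0][1] * Sa[0][1] + Sb[0][0] * Sa[1][1]) / det_b
    quad = (Sb[1][1] * d[0] ** 2 - 2.0 * Sb[0][1] * d[0] * d[1] + Sb[0][0] * d[1] ** 2) / det_b
    return 0.5 * (math.log(det_b / det_a) + (trace - 2.0) + quad)


# Illustrative (assumed) values: one distribution written with two choices of alpha
Sigma = [[0.3, 0.1], [0.1, 0.2]]
mu = [2.0, 5.0]
alpha_b = [0.75, 0.25]
Q_b = [mu[k] * math.exp(math.sqrt(Sigma[k][k]) * NormalDist().inv_cdf(alpha_b[k]))
       for k in range(2)]

same = qln2_kl(mu, Sigma, [0.5, 0.5], Q_b, Sigma, alpha_b)
different = qln2_kl(mu, Sigma, [0.5, 0.5], [2.5, 5.0], Sigma, [0.5, 0.5])
print("KL, same distribution:", same)
print("KL, shifted median:   ", different)
```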
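Corollary 9.
The KL divergence between $\mathbf{T} \sim \mathrm{QLN}_p(\mathbf{Q}_a, \boldsymbol{\Sigma}_a, \mathbf{q}_a)$ and $\mathbf{U} \sim \mathrm{LN}_p(\boldsymbol{\mu}_b, \boldsymbol{\Sigma}_b)$ is given by
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \frac{1}{2} \left[\log\left(\frac{\det(\boldsymbol{\Sigma}_b)}{\det(\boldsymbol{\Sigma}_a)}\right) + \operatorname{tr}(\boldsymbol{\Sigma}_b^{-1} \boldsymbol{\Sigma}_a - \mathbf{I}_p) + (\log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a - \log(\boldsymbol{\mu}_b))^{\top} \boldsymbol{\Sigma}_b^{-1} (\log(\mathbf{Q}_a) - \boldsymbol{\Omega}_a \mathbf{q}_a - \log(\boldsymbol{\mu}_b))\right].$$
Proof. 
Take the quantile-based multivariate log-normal random vector $\mathbf{U}$ in Theorem 8 generated by $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □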
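Corollary 10.
The KL divergence between $\mathbf{T} \sim \mathrm{LN}_p(\boldsymbol{\mu}_a, \boldsymbol{\Sigma}_a)$ and $\mathbf{U} \sim \mathrm{LN}_p(\boldsymbol{\mu}_b, \boldsymbol{\Sigma}_b)$ is given by
$$D_{KL}(\mathbf{T}, \mathbf{U}) = \frac{1}{2} \left[\log\left(\frac{\det(\boldsymbol{\Sigma}_b)}{\det(\boldsymbol{\Sigma}_a)}\right) + \operatorname{tr}(\boldsymbol{\Sigma}_b^{-1} \boldsymbol{\Sigma}_a - \mathbf{I}_p) + (\log(\boldsymbol{\mu}_a) - \log(\boldsymbol{\mu}_b))^{\top} \boldsymbol{\Sigma}_b^{-1} (\log(\boldsymbol{\mu}_a) - \log(\boldsymbol{\mu}_b))\right].$$
Proof. 
Generate the quantile-based multivariate log-normal random vector $\mathbf{T}$ in Corollary 9 with $\boldsymbol{\alpha} = (1/2, \ldots, 1/2)^{\top}$. □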
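In order to derive results on marginal and conditional distributions and independence, relating sub-vectors of a random vector having a quantile-based multivariate log-normal distribution, we introduce notation for partitions of $\mathbf{Y} \in \mathbb{R}_+^p$, $\mathbf{Q} \in \mathbb{R}_+^p$, $\mathbf{q} \in \mathbb{R}^p$, and $\boldsymbol{\Sigma}\,(p \times p) > 0$ as follows:
$$\mathbf{Y} = (\mathbf{Y}_1^{\top}, \mathbf{Y}_2^{\top})^{\top}, \quad \mathbf{Q} = (\mathbf{Q}_1^{\top}, \mathbf{Q}_2^{\top})^{\top}, \quad \mathbf{q} = (\mathbf{q}_1^{\top}, \mathbf{q}_2^{\top})^{\top}, \quad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}, \tag{9}$$
where $\mathbf{Y}_1 \in \mathbb{R}_+^{p_1}$, $\mathbf{Y}_2 \in \mathbb{R}_+^{p_2}$, $\mathbf{Q}_1 \in \mathbb{R}_+^{p_1}$, $\mathbf{Q}_2 \in \mathbb{R}_+^{p_2}$, $\mathbf{q}_1 \in \mathbb{R}^{p_1}$, $\mathbf{q}_2 \in \mathbb{R}^{p_2}$, $\boldsymbol{\Sigma}_{11}\,(p_1 \times p_1) > 0$, $\boldsymbol{\Sigma}_{22}\,(p_2 \times p_2) > 0$, and $\boldsymbol{\Sigma}_{12}\,(p_1 \times p_2)$ and $\boldsymbol{\Sigma}_{21}\,(p_2 \times p_1)$ are such that $\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}^{\top}$. The Schur complement of the block $\boldsymbol{\Sigma}_{11}$ of $\boldsymbol{\Sigma}$ is given by $\boldsymbol{\Sigma}_{22 \cdot 1} = \boldsymbol{\Sigma}_{22} - \boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12}$. Also, we define $\boldsymbol{\Omega}_1 = (\boldsymbol{\Sigma}_{11} \odot \mathbf{I}_{p_1})^{1/2}$, $\boldsymbol{\Omega}_2 = (\boldsymbol{\Sigma}_{22} \odot \mathbf{I}_{p_2})^{1/2}$, and $\mathbf{Q}_{2 \cdot 1} = \mathbf{D}_{\mathbf{Q}_2} \exp(\boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} (\log(\mathbf{y}_1) - \log(\mathbf{Q}_1) + \boldsymbol{\Omega}_1 \mathbf{q}_1))$. The dimension p is such that $p = p_1 + p_2$.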
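In Lemma 3, we give a factorization of the PDF of the quantile-based multivariate log-normal distribution.
Lemma 3.
Consider the partitions given in (9). The PDF of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ can be expressed as
$$\mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) = \mathrm{QLN}_{p_1}(\mathbf{y}_1; \mathbf{Q}_1, \boldsymbol{\Sigma}_{11}, \mathbf{q}_1)\, \mathrm{QLN}_{p_2}(\mathbf{y}_2; \mathbf{Q}_{2 \cdot 1}, \boldsymbol{\Sigma}_{22 \cdot 1}, \mathbf{q}_{2 \cdot 1}), \tag{10}$$
where $\mathbf{q}_{2 \cdot 1} = \boldsymbol{\Omega}_{2 \cdot 1}^{-1} \boldsymbol{\Omega}_2 \mathbf{q}_2$, with $\boldsymbol{\Omega}_{2 \cdot 1} = (\boldsymbol{\Sigma}_{22 \cdot 1} \odot \mathbf{I}_{p_2})^{1/2}$.
Proof. 
It suffices to show that
$$\delta_{\boldsymbol{\Sigma}}(\log(\mathbf{y}), \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}) = \delta_{\boldsymbol{\Sigma}_{11}}(\log(\mathbf{y}_1), \log(\mathbf{Q}_1) - \boldsymbol{\Omega}_1 \mathbf{q}_1) + \delta_{\boldsymbol{\Sigma}_{22 \cdot 1}}(\log(\mathbf{y}_2), \log(\mathbf{Q}_{2 \cdot 1}) - \boldsymbol{\Omega}_{2 \cdot 1} \mathbf{q}_{2 \cdot 1}).$$
A straightforward calculation shows that
$$\begin{aligned} \delta_{\boldsymbol{\Sigma}}(\log(\mathbf{y}), \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}) - \delta_{\boldsymbol{\Sigma}_{11}}(\log(\mathbf{y}_1), \log(\mathbf{Q}_1) - \boldsymbol{\Omega}_1 \mathbf{q}_1) ={}& (\log(\mathbf{y}_1) - \log(\mathbf{Q}_1) + \boldsymbol{\Omega}_1 \mathbf{q}_1)^{\top} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22 \cdot 1}^{-1} \boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} (\log(\mathbf{y}_1) - \log(\mathbf{Q}_1) + \boldsymbol{\Omega}_1 \mathbf{q}_1) \\ & - 2 (\log(\mathbf{y}_1) - \log(\mathbf{Q}_1) + \boldsymbol{\Omega}_1 \mathbf{q}_1)^{\top} \boldsymbol{\Sigma}_{11}^{-1} \boldsymbol{\Sigma}_{12} \boldsymbol{\Sigma}_{22 \cdot 1}^{-1} (\log(\mathbf{y}_2) - \log(\mathbf{Q}_2) + \boldsymbol{\Omega}_2 \mathbf{q}_2) \\ & + (\log(\mathbf{y}_2) - \log(\mathbf{Q}_2) + \boldsymbol{\Omega}_2 \mathbf{q}_2)^{\top} \boldsymbol{\Sigma}_{22 \cdot 1}^{-1} (\log(\mathbf{y}_2) - \log(\mathbf{Q}_2) + \boldsymbol{\Omega}_2 \mathbf{q}_2). \end{aligned}$$
Now, using the result
$$(\log(\mathbf{Q}_{2 \cdot 1}) - \boldsymbol{\Omega}_{2 \cdot 1} \mathbf{q}_{2 \cdot 1}) - (\log(\mathbf{Q}_2) - \boldsymbol{\Omega}_2 \mathbf{q}_2) = \boldsymbol{\Sigma}_{21} \boldsymbol{\Sigma}_{11}^{-1} (\log(\mathbf{y}_1) - \log(\mathbf{Q}_1) + \boldsymbol{\Omega}_1 \mathbf{q}_1),$$
we have
$$\begin{aligned} \delta_{\boldsymbol{\Sigma}}(\log(\mathbf{y}), \log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}) - \delta_{\boldsymbol{\Sigma}_{11}}(\log(\mathbf{y}_1), \log(\mathbf{Q}_1) - \boldsymbol{\Omega}_1 \mathbf{q}_1) &= (\log(\mathbf{y}_2) - \log(\mathbf{Q}_{2 \cdot 1}) + \boldsymbol{\Omega}_{2 \cdot 1} \mathbf{q}_{2 \cdot 1})^{\top} \boldsymbol{\Sigma}_{22 \cdot 1}^{-1} (\log(\mathbf{y}_2) - \log(\mathbf{Q}_{2 \cdot 1}) + \boldsymbol{\Omega}_{2 \cdot 1} \mathbf{q}_{2 \cdot 1}) \\ &= \delta_{\boldsymbol{\Sigma}_{22 \cdot 1}}(\log(\mathbf{y}_2), \log(\mathbf{Q}_{2 \cdot 1}) - \boldsymbol{\Omega}_{2 \cdot 1} \mathbf{q}_{2 \cdot 1}), \end{aligned}$$
which is the desired result. □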
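In Theorem 9, we show that the quantile-based multivariate log-normal family is preserved under marginalization and conditioning. In this theorem, we also present a characterization of the independence between subvectors of this family.
Theorem 9.
Let $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$. Consider the partitions given in (9). Then,
  • $\mathbf{Y}_1 \sim \mathrm{QLN}_{p_1}(\mathbf{Q}_1, \boldsymbol{\Sigma}_{11}, \mathbf{q}_1)$.
  • $\mathbf{Y}_2 \mid \mathbf{Y}_1 = \mathbf{y}_1 \sim \mathrm{QLN}_{p_2}(\mathbf{Q}_{2 \cdot 1}, \boldsymbol{\Sigma}_{22 \cdot 1}, \mathbf{q}_{2 \cdot 1})$.
  • $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are independent if and only if $\boldsymbol{\Sigma}_{12} = \mathbf{0}$.
Proof. 
Statements 1 and 2 follow from the factorization given in (10). To prove statement 3, note that $\mathbf{Y}_1$ and $\mathbf{Y}_2$ are independent if and only if
$$\mathrm{QLN}_p(\mathbf{y}; \mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q}) = \mathrm{QLN}_{p_1}(\mathbf{y}_1; \mathbf{Q}_1, \boldsymbol{\Sigma}_{11}, \mathbf{q}_1)\, \mathrm{QLN}_{p_2}(\mathbf{y}_2; \mathbf{Q}_2, \boldsymbol{\Sigma}_{22}, \mathbf{q}_2),$$
which, from (10), is satisfied if and only if $\boldsymbol{\Sigma}_{12} = \mathbf{0}$. □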
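The conditional parameters of Theorem 9 take a simple form in the bivariate case with $p_1 = p_2 = 1$. The following sketch is illustrative only: the function name `qln2_conditional`, the conditioning value, and the parameter values are assumptions. It also reports $\Phi(q_{2 \cdot 1})$, the quantile level that $Q_{2 \cdot 1}$ corresponds to in the conditional distribution, which in general differs from $\alpha_2$.

```python
import math
from statistics import NormalDist


def qln2_conditional(y1, Q, Sigma, alpha):
    """Parameters (Q_{2.1}, sigma_{22.1}, q_{2.1}) of Y_2 | Y_1 = y1 for p1 = p2 = 1."""
    q = [NormalDist().inv_cdf(a) for a in alpha]
    s11, s12, s22 = Sigma[0][0], Sigma[0][1], Sigma[1][1]
    s22_1 = s22 - s12 ** 2 / s11                      # Schur complement
    Q2_1 = Q[1] * math.exp(s12 / s11 * (math.log(y1) - math.log(Q[0])
                                        + math.sqrt(s11) * q[0]))
    q2_1 = math.sqrt(s22) * q[1] / math.sqrt(s22_1)   # Omega_{2.1}^{-1} Omega_2 q_2
    return Q2_1, s22_1, q2_1


# Illustrative (assumed) parameter values and conditioning value
Q = [3.0, 10.0]
Sigma = [[0.3, 0.1], [0.1, 0.2]]
alpha = [0.25, 0.75]

Q2_1, s22_1, q2_1 = qln2_conditional(4.0, Q, Sigma, alpha)
level = NormalDist().cdf(q2_1)
print("Q_2.1 =", Q2_1, " sigma_22.1 =", s22_1, " conditional level =", level)
```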

4. Parameter Estimation

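The reparameterization used in Definition 1 permits us to compute the maximum likelihood estimates of the parameters of the quantile-based multivariate log-normal distribution through the maximum likelihood estimates of the parameters of the multivariate log-normal distribution. Let $\mathbf{y}_1, \ldots, \mathbf{y}_n$ be the observed values of a random sample $\mathbf{Y}_1, \ldots, \mathbf{Y}_n$ of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$. We denote the maximum likelihood estimators of $\mathbf{Q}$ and $\boldsymbol{\Sigma}$ by $\hat{\mathbf{Q}}$ and $\hat{\boldsymbol{\Sigma}}$, respectively. From (2), we have
$$\hat{\mathbf{Q}} = \mathbf{D}_{\hat{\boldsymbol{\mu}}} \exp(\hat{\boldsymbol{\Omega}} \mathbf{q}),$$
where $\hat{\boldsymbol{\Omega}} = (\hat{\boldsymbol{\Sigma}} \odot \mathbf{I}_p)^{1/2}$, and $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ are the maximum likelihood estimators of the multivariate log-normal distribution given by (Morán-Vásquez et al. [14])
$$\hat{\boldsymbol{\mu}} = \exp\left(\frac{1}{n} \sum_{i=1}^{n} \log(\mathbf{y}_i)\right), \quad \hat{\boldsymbol{\Sigma}} = \frac{1}{n} \sum_{i=1}^{n} (\log(\mathbf{y}_i) - \log(\hat{\boldsymbol{\mu}})) (\log(\mathbf{y}_i) - \log(\hat{\boldsymbol{\mu}}))^{\top}.$$
Note that the maximum likelihood estimator of $\boldsymbol{\Sigma}$ in the quantile-based multivariate log-normal distribution is the same as in the multivariate log-normal distribution. Furthermore, this estimator is the same for any choice of $\mathbf{q}$.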
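The estimators above can be computed in a few lines. The sketch below uses only the Python standard library; the function name `qln_mle` is hypothetical, and the five observations are made-up numbers (not taken from the data set analyzed in this article) that serve only to show the call.

```python
import math
from statistics import NormalDist


def qln_mle(sample, alpha):
    """Maximum likelihood estimates (Q_hat, Sigma_hat) from positive observations."""
    n, p = len(sample), len(sample[0])
    logs = [[math.log(v) for v in y] for y in sample]
    # log(mu_hat): componentwise mean of the log-observations
    log_mu = [sum(row[k] for row in logs) / n for k in range(p)]
    # Sigma_hat: covariance of the log-observations with divisor n
    Sigma = [[sum((row[j] - log_mu[j]) * (row[k] - log_mu[k]) for row in logs) / n
              for k in range(p)] for j in range(p)]
    q = [NormalDist().inv_cdf(a) for a in alpha]
    # Q_hat = D_{mu_hat} exp(Omega_hat q)
    Q = [math.exp(log_mu[k] + math.sqrt(Sigma[k][k]) * q[k]) for k in range(p)]
    return Q, Sigma


# Hypothetical observations, for illustration only
sample = [(95.2, 13.8), (101.5, 15.9), (88.4, 12.1), (108.0, 17.6), (99.3, 14.7)]
Q_hat, Sigma_hat = qln_mle(sample, alpha=[0.75, 0.25])
print("Q_hat     =", Q_hat)
print("Sigma_hat =", Sigma_hat)
```

Changing `alpha` changes `Q_hat` but leaves `Sigma_hat` unchanged, in agreement with the remark that the estimator of the dispersion matrix does not depend on the choice of $\mathbf{q}$.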
We assess the goodness of fit of the quantile-based multivariate log-normal distributions by using quantile–quantile plots, comparing the empirical Mahalanobis distances $\delta_{\hat{\boldsymbol{\Sigma}}}(\log(\mathbf{y}_i), \log(\hat{\mathbf{Q}}) - \hat{\boldsymbol{\Omega}} \mathbf{q})$, $i = 1, \ldots, n$, with the theoretical quantiles $\delta^{2}_{\alpha_i}$, where $\alpha_i = i/(n+1)$, $i = 1, \ldots, n$, obtained from a chi-squared distribution with p degrees of freedom. Additionally, we plot simulated envelopes (Atkinson [19]) for the quantile–quantile plots in order to help the comparison between quantiles and judge the adequacy of the models.
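To evaluate the estimation procedure, we conducted simulations with the quantile-based bivariate log-normal distribution. We consider the sample sizes $n = 50, 100, 500$, and $1000$, with 10,000 Monte Carlo replicates. The random samples of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$ were generated through the following steps:
  • Generate a random sample $\mathbf{x}_1, \ldots, \mathbf{x}_n$ of $\mathbf{X} \sim \mathrm{N}_p(\log(\mathbf{Q}) - \boldsymbol{\Omega} \mathbf{q}, \boldsymbol{\Sigma})$.
  • Compute $\mathbf{y}_1 = \exp(\mathbf{x}_1), \ldots, \mathbf{y}_n = \exp(\mathbf{x}_n)$. Then, $\mathbf{y}_1, \ldots, \mathbf{y}_n$ is a random sample of $\mathbf{Y} \sim \mathrm{QLN}_p(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$.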
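The two sampling steps above can be sketched as follows. The article reports that its computations were carried out in R; the Python version below is only an illustrative stand-in, with a hand-written Cholesky factorization and hypothetical function names (`cholesky`, `rqln`). The parameter values are the estimates reported in the application section.

```python
import math
import random
from statistics import NormalDist


def cholesky(S):
    """Lower-triangular L with L L' = S, for a symmetric positive definite S."""
    p = len(S)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def rqln(n, Q, Sigma, alpha, rng):
    """Draw n observations from QLN_p(Q, Sigma, q)."""
    p = len(Q)
    q = [NormalDist().inv_cdf(a) for a in alpha]
    mean = [math.log(Q[k]) - math.sqrt(Sigma[k][k]) * q[k] for k in range(p)]
    L = cholesky(Sigma)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]
        # Step 1: x ~ N_p(log(Q) - Omega q, Sigma)
        x = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(p)]
        # Step 2: y = exp(x)
        out.append([math.exp(v) for v in x])
    return out


rng = random.Random(2023)
Q = [101.62, 13.101]
Sigma = [[0.0062, 0.0121], [0.0121, 0.0314]]
alpha = [0.75, 0.25]

draws = rqln(20000, Q, Sigma, alpha, rng)
# Share of observations at or below Q_k; this should be close to alpha_k
share = [sum(1 for y in draws if y[k] <= Q[k]) / len(draws) for k in range(2)]
print("empirical levels:", share)
```

The empirical shares give a quick check that $Q_k$ is indeed the $\alpha_k$-quantile of the $k$th marginal.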
The true parameters were obtained by fitting the quantile-based bivariate log-normal distribution to the children data set considered in Section 5. Table 1 reports the median and the interquartile range of the estimated values of the parameters of the investigated models. The medians approach the true parameters and the interquartile ranges shrink as the sample size grows, indicating a satisfactory performance of the estimators. All the computations were conducted in the R software [20].

5. Application

Anthropometric measures are useful for monitoring growth and identifying childhood developmental problems. The World Health Organization [21,22] provides quantile estimates of several children's anthropometric characteristics such as height, weight, and head and arm circumferences, among others. These estimates are obtained by fitting univariate models for each anthropometric measurement separately, ignoring the association between them. We use the quantile-based bivariate log-normal distribution to estimate quantiles of children's weights (in kilograms) and heights (in centimeters), considering the natural correlation between them. We consider a sample of 587 children between 2 and 5 years of age collected in 2018 in the El Poblado neighborhood, located in Medellín, Colombia [23].
The bagplot in Figure 2 shows that children's weights and heights are positively associated with slight joint skewness and highlights two outliers. In order to estimate the third quartile of height and the first quartile of weight, we fitted the quantile-based bivariate log-normal distribution $\mathbf{Y} \sim \mathrm{QLN}_2(\mathbf{Q}, \boldsymbol{\Sigma}, \mathbf{q})$, where $Y_1$ is height and $Y_2$ is weight, with $\alpha_1 = 0.75$ ($q_1 = 0.67$) and $\alpha_2 = 0.25$ ($q_2 = -0.67$). The maximum likelihood estimates of the parameters are $\hat{Q}_1 = 101.62$, $\hat{Q}_2 = 13.101$, $\hat{\sigma}_{11} = 0.0062$, $\hat{\sigma}_{22} = 0.0314$, and $\hat{\sigma}_{12} = 0.0121$. Therefore, the third quartile of the children's height is estimated to be 101.62 cm, and the first quartile of the children's weight is estimated to be 13.101 kg. Since $\hat{\sigma}_{12} = 0.0121 > 0$, the children's weights and heights are estimated to be positively correlated, which is consistent with the descriptive analysis presented in Figure 2.
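Because the marginal quantiles satisfy $Q_k = \mu_k \exp(\sqrt{\sigma_{kk}}\, q_k)$, the fitted model also yields marginal quantiles at levels other than those used in the fit. The sketch below plugs the reported estimates into this relation to obtain the implied marginal medians; the function name `requantile` is hypothetical, and the resulting values are not reported in the article.

```python
import math
from statistics import NormalDist


def requantile(Q_k, sigma_kk, alpha_from, alpha_to):
    """Move a marginal quantile from level alpha_from to level alpha_to."""
    nd = NormalDist()
    shift = nd.inv_cdf(alpha_to) - nd.inv_cdf(alpha_from)
    return Q_k * math.exp(math.sqrt(sigma_kk) * shift)


# Estimates reported above: third quartile of height (cm), first quartile of weight (kg)
height_q3, weight_q1 = 101.62, 13.101
sigma11_hat, sigma22_hat = 0.0062, 0.0314

height_median = requantile(height_q3, sigma11_hat, 0.75, 0.5)
weight_median = requantile(weight_q1, sigma22_hat, 0.25, 0.5)
print("implied median height (cm):", height_median)
print("implied median weight (kg):", weight_median)
```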
Figure 3 shows the quantile–quantile plot with simulated envelopes for the Mahalanobis distances for the fitted quantile-based bivariate log-normal distribution. This plot suggests a suitable fit.

6. Final Remarks

In this article, we have proposed a multivariate distribution with positive support, obtained by reparameterizing the multivariate log-normal distribution in terms of its marginal quantiles. This distribution should be of interest to researchers working on quantile modeling for correlated multivariate positive skewed data. We derived a number of statistical properties of this distribution involving transformations, mixed moments, expected value, covariance matrix, mode, Shannon entropy, Kullback–Leibler divergence, marginalization, conditioning, and independence. The quantile-based multivariate log-normal distribution defined in this article is rich in theoretical properties and is mathematically tractable. The parameters were estimated by the maximum likelihood method, and the satisfactory behavior of the estimation procedure was verified through simulation studies. A graphical diagnostic tool was employed to assess the quality of the fitted distributions. Finally, an application to real data was presented and discussed as an alternative for the quantile estimation of children's weights and heights, considering the natural association between these variables.
There are several aspects that will be addressed in future articles. Bayesian approaches for the estimation of the parameters of the quantile-based multivariate log-normal distribution will be developed. The study of regression models based on the quantile-based multivariate log-normal distributions together with inferential developments and applications to real data will also be undertaken. These models will allow us to analyze the relationship between marginal quantiles of response vectors and a set of explanatory variables, taking into account the potential association among the marginal response variables. Additionally, a comparative analysis of this methodology with the model proposed by Petrella and Raponi [12] will be included in a forthcoming article.

Author Contributions

Conceptualization, R.A.M.-V., A.R.-C. and D.K.N.; methodology, R.A.M.-V., A.R.-C. and D.K.N.; investigation, R.A.M.-V., A.R.-C. and D.K.N.; writing-original draft preparation, R.A.M.-V., A.R.-C. and D.K.N.; writing-review and editing, R.A.M.-V., A.R.-C. and D.K.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

  1. Cade, B.S.; Noon, B.R. A gentle introduction to quantile regression for ecologists. Front. Ecol. Environ. 2003, 1, 412–420.
  2. Yu, K.; Lu, Z.; Stander, J. Quantile regression: Applications and current research areas. J. R. Stat. Soc. Ser. D 2003, 52, 331–350.
  3. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica 1978, 46, 33–50.
  4. Ferrari, S.L.P.; Fumes, G. Box–Cox symmetric distributions and applications to nutritional data. Adv. Stat. Anal. 2017, 101, 321–344.
  5. Gijbels, I.; Karim, R.; Verhasselt, A. Semiparametric quantile regression using family of quantile-based asymmetric densities. Comput. Stat. Data Anal. 2021, 157, 107–129.
  6. Mazucheli, J.; Alves, B.; Menezes, A.; Leiva, V. An overview on parametric quantile regression models and their computational implementation with applications to biomedical problems including COVID-19 data. Comput. Methods Programs Biomed. 2022, 221, 106816.
  7. Smithson, M.; Shou, Y. CDF-quantile distributions for modelling random variables on the unit interval. Br. J. Math. Stat. Psychol. 2017, 70, 412–438.
  8. Breckling, J.; Chambers, R. M-Quantiles. Biometrika 1988, 75, 761–771.
  9. Kong, L.; Mizera, I. Quantile tomography: Using quantiles with multivariate data. Stat. Sin. 2012, 22, 1589–1610.
  10. McKeague, I.W.; López-Pintado, S.; Hallin, M.; Šiman, M. Analyzing growth trajectories. J. Dev. Orig. Health Dis. 2011, 2, 322–329.
  11. Wei, Y. An approach to multivariate covariate-dependent quantile contours with application to bivariate conditional growth charts. J. Am. Stat. Assoc. 2008, 103, 397–409.
  12. Petrella, L.; Raponi, V. Joint estimation of conditional quantiles in multivariate linear regression models with an application to financial distress. J. Multivar. Anal. 2019, 173, 70–84.
  13. Morán-Vásquez, R.A.; Ferrari, S.L.P. Box-Cox elliptical distributions with application. Metrika 2018, 82, 547–571.
  14. Morán-Vásquez, R.A.; Mazo-Lopera, M.A.; Ferrari, S.L.P. Quantile modeling through multivariate log-normal/independent linear regression models with application to newborn data. Biom. J. 2021, 63, 1290–1308.
  15. Saulo, H.; Dasilva, A.; Leiva, V.; Sánchez, L.; de la Fuente-Mella, H. Log-symmetric quantile regression models. Stat. Neerl. 2022, 76, 124–163.
  16. Fang, K.T.; Kotz, S.; Ng, K.W. Symmetric Multivariate and Related Distributions; Chapman and Hall: London, UK, 1990.
  17. Seber, G.A.F. A Matrix Handbook for Statisticians; John Wiley & Sons: Hoboken, NJ, USA, 2008.
  18. Pardo, L. Statistical Inference Based on Divergence Measures; Chapman & Hall/CRC: Boca Raton, FL, USA, 2006.
  19. Atkinson, A.C. Two graphical displays for outlying and influential observations in regression. Biometrika 1981, 68, 13–20.
  20. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 21 July 2023).
  21. World Health Organization. WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2006. Available online: https://apps.who.int/iris/handle/10665/43413 (accessed on 25 May 2023).
  22. World Health Organization. WHO Child Growth Standards: Head Circumference-for-Age, Arm Circumference-for-Age, Triceps Skinfold-for-Age and Subscapular Skinfold-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2007. Available online: https://apps.who.int/iris/handle/10665/43706 (accessed on 25 May 2023).
  23. MEData: Portal de datos de Medellín. Estado Nutricional de Menores de 6 años Programa de Crecimiento y Desarrollo. 2022. Available online: http://medata.gov.co/dataset/estado-nutricional-de-menores-de-6-anos-programa-de-crecimiento-y-desarrollo (accessed on 21 July 2023).
Figure 1. Contour plots at levels 0.15, 0.1, 0.05, 0.02, 0.01 of the joint PDF of $Y \sim \mathrm{QLN}_2(Q, \Sigma, q)$ given in Definition 1, where (a) $\alpha_1 = \alpha_2 = 0.5$, $Q_1 = 0.8$, $Q_2 = 1$, $\sigma_{11} = 0.8$, $\sigma_{22} = 1$, $\sigma_{12} = 0.5$, (b) $\alpha_1 = 0.25$, (c) $Q_2 = 1.2$, (d) $\sigma_{11} = 1$, (e) $\sigma_{12} = 0$, (f) $\sigma_{12} = 0.7$.
Figure 2. Bagplot of weight vs. height; children’s data.
Figure 3. Quantile–quantile plot with simulated envelopes for the Mahalanobis distances for the fitted distribution.
Table 1. Median (M) and interquartile range (IQR) of the parameter estimates of the quantile-based bivariate log-normal distributions.

| Probability | Parameter | True value | M (n = 50) | IQR (n = 50) | M (n = 100) | IQR (n = 100) | M (n = 500) | IQR (n = 500) | M (n = 1000) | IQR (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|---|
| $\alpha_1 = 0.75$ | $Q_1$ | 101.62 | 101.59 | 1.6480 | 101.59 | 1.1849 | 101.60 | 0.5391 | 101.61 | 0.3747 |
| $\alpha_2 = 0.25$ | $Q_2$ | 13.101 | 13.115 | 0.4923 | 13.108 | 0.3378 | 13.102 | 0.1557 | 13.103 | 0.1115 |
| $\alpha_1 = 0.50$ | $Q_1$ | 96.368 | 96.373 | 1.4397 | 96.365 | 1.0000 | 96.365 | 0.4582 | 96.367 | 0.3229 |
| $\alpha_2 = 0.50$ | $Q_2$ | 14.764 | 14.768 | 0.4965 | 14.766 | 0.3417 | 14.763 | 0.1625 | 14.764 | 0.1123 |
| $\alpha_1 = 0.25$ | $Q_1$ | 91.392 | 91.429 | 1.5166 | 91.406 | 1.0509 | 91.397 | 0.4784 | 91.396 | 0.3437 |
| $\alpha_2 = 0.75$ | $Q_2$ | 16.640 | 16.632 | 0.6190 | 16.634 | 0.4332 | 16.635 | 0.2016 | 16.636 | 0.1412 |
| | $\sigma_{11}$ | 0.0062 | 0.0062 | 0.0016 | 0.0062 | 0.0012 | 0.0062 | 0.0005 | 0.0062 | 0.0004 |
| | $\sigma_{22}$ | 0.0314 | 0.0313 | 0.0085 | 0.0313 | 0.0059 | 0.0313 | 0.0027 | 0.0313 | 0.0019 |
| | $\sigma_{12}$ | 0.0121 | 0.0121 | 0.0035 | 0.0121 | 0.0025 | 0.0121 | 0.0011 | 0.0121 | 0.0008 |
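
The true values in Table 1 can be cross-checked against one another. The following Python sketch assumes that each marginal is log-normal with log-scale variance sigma_ii, so that quantiles at two probability levels a and b differ by the factor exp(sqrt(sigma_ii) (z_b - z_a)), with z denoting standard normal quantiles. The helper name `shift_quantile` is hypothetical and is introduced only for illustration.

```python
import math
from statistics import NormalDist


def shift_quantile(q_from, alpha_from, alpha_to, sigma_ii):
    """Map a log-normal marginal quantile from level alpha_from to level alpha_to.

    Assumption: sigma_ii is the variance of the log-variable.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    return q_from * math.exp(math.sqrt(sigma_ii) * (z(alpha_to) - z(alpha_from)))


# True medians and log-scale variances as listed in Table 1.
q1_median, q2_median = 96.368, 14.764
sigma_11, sigma_22 = 0.0062, 0.0314

q1_75 = shift_quantile(q1_median, 0.50, 0.75, sigma_11)  # compare with Q_1 at alpha_1 = 0.75
q2_25 = shift_quantile(q2_median, 0.50, 0.25, sigma_22)  # compare with Q_2 at alpha_2 = 0.25
print(round(q1_75, 2), round(q2_25, 3))
```

Under that assumption, the computed values agree with the tabulated true values of Q_1 at probability 0.75 and Q_2 at probability 0.25, up to the rounding of the table entries.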
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Morán-Vásquez, R.A.; Roldán-Correa, A.; Nagar, D.K. Quantile-Based Multivariate Log-Normal Distribution. Symmetry 2023, 15, 1513. https://doi.org/10.3390/sym15081513
