Article

Quantile Estimation Using the Log-Skew-Normal Linear Regression Model with Application to Children’s Weight Data

by Raúl Alejandro Morán-Vásquez 1,*,†, Anlly Daniela Giraldo-Melo 1,† and Mauricio A. Mazo-Lopera 2,†
1 Instituto de Matemáticas, Universidad de Antioquia, Calle 67 No. 53-108, Medellín 050010, Colombia
2 Escuela de Estadística, Universidad Nacional de Colombia, Carrera 65 No. 59A-110, Medellín 050034, Colombia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(17), 3736; https://doi.org/10.3390/math11173736
Submission received: 27 June 2023 / Revised: 26 August 2023 / Accepted: 29 August 2023 / Published: 30 August 2023
(This article belongs to the Section Probability and Statistics)

Abstract

In this article, we establish properties that relate quantiles of the log-skew-normal distribution to its parameters, allowing us to investigate the relationship between quantiles of a positive skewed response variable and a set of explanatory variables via the log-skew-normal linear regression model. We compute the maximum likelihood estimates of the parameters through a correspondence between the log-skew-normal and skew-normal linear regression models. Monte Carlo simulations show the satisfactory performance of the quantile estimators. An application to children’s data is presented and discussed.

1. Introduction

Regression models with a continuous, positive and skewed response have been widely employed in applied sciences such as physics, economics, biology, medicine and engineering, among others. A possible approach when dealing with a positive response is to fit the Box–Cox transformed observations using a symmetric linear regression model. In this approach, the parameters of the model are interpretable in terms of the transformed observations (not the original response), which is a disadvantage from a statistical modeling viewpoint. In addition, traditional regression procedures for analyzing positive skewed responses focus on the conditional mean under a non-normal distribution assumption, such as gamma or inverse Gaussian. These models are not suitable in situations where it is necessary to emphasize the entire distribution of the response. For example, in pediatric studies it is of interest to predict the children’s weight in order to classify them in a category that determines their health and growth status. These categories are established from quantiles of the children’s weight depending on covariates such as age and gender and are typically illustrated through growth charts. Quantile regression (QR) procedures address these situations, since they describe how the entire distribution of a response changes with the covariates.
The seminal work on QR is due to Koenker and Bassett [1], who expressed the $\lambda$th conditional quantile, $0 < \lambda < 1$, of a response variable $Y$ as a function of an $r$-dimensional vector of explanatory variables $\mathbf{x}$ as
$$Q_y(\lambda \mid \mathbf{x}) = \mathbf{x}^\top \boldsymbol{\beta}(\lambda), \tag{1}$$
where the parameter vector $\boldsymbol{\beta}(\lambda) \in \mathbb{R}^r$ is estimated by solving
$$\min_{\boldsymbol{\beta} \in \mathbb{R}^r} \sum_{i=1}^{n} \rho_\lambda(y_i - \mathbf{x}_i^\top \boldsymbol{\beta}),$$
with $\rho_\lambda$ being the function $\rho_\lambda(u) = \lambda u$ if $u \geq 0$ and $\rho_\lambda(u) = (\lambda - 1) u$ if $u < 0$. This procedure is non-parametric and provides estimates that are robust to outliers in the response. Recently, several parametric approaches to quantile modeling have been studied in the regression literature. Some parametric QR models have been obtained from reparameterized distributions using their quantile functions (Mazucheli et al. [2]). Another approach is based on proportionality relationships between quantiles and parameters of distributions, allowing the relationship between covariates and quantiles of a response variable to be modeled (Morán-Vásquez and Ferrari [3], Morán-Vásquez et al. [4]). In the present paper, we use this approach with the log-skew-normal (LSN) distribution, which can be obtained from the skew-normal (SN) distribution (Azzalini and Capitanio [5]) and has the log-normal (LN) distribution (Morán-Vásquez and Ferrari [3], Morán-Vásquez et al. [4,6]) as a special case.
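As a small illustration of the classical QR objective, the sketch below evaluates the function $\rho_\lambda$ and minimizes the resulting sum for an intercept-only model, in which case the minimizer is a sample quantile. The data values and the helper names `check_loss` and `qr_objective` are hypothetical and are not taken from the paper.

```python
def check_loss(u, lam):
    """rho_lambda(u): lam * u if u >= 0, and (lam - 1) * u if u < 0."""
    return lam * u if u >= 0 else (lam - 1.0) * u


def qr_objective(b, y, lam):
    """Sum of check losses for an intercept-only model (x_i = 1 for all i)."""
    return sum(check_loss(yi - b, lam) for yi in y)


# Hypothetical toy responses, not taken from the paper.
y = [1.2, 0.7, 3.4, 2.1, 5.0, 1.9, 2.8]
lam = 0.5

# The objective is piecewise linear in b, so a minimizer can be found
# among the observations themselves.
best = min(y, key=lambda b: qr_objective(b, y, lam))
print(best)  # for lam = 0.5 this is the sample median
```

With covariates, the same objective is minimized over the whole vector of regression coefficients, and a separate minimization is needed for each value of $\lambda$.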
The SN distribution has received growing attention in recent years and has been used in a wide variety of applications, including time series (Genton and Thompson [7]), Bayesian analysis (Liseo and Loperfido [8]), regression analysis (Chai and Bailey [9]), spatial statistics (Allard and Naveau [10]) and graphical models (Capitanio et al. [11]), among others. The importance of the SN family lies in its flexibility to describe skewness in the data through a shape parameter. However, for positive data this distribution may not be suitable for statistical modeling purposes because its support is the real line. For this type of data, the LSN distribution may be an alternative since it has positive support (Azzalini et al. [12], Marchenko and Genton [13]). The LSN family is closely related to the SN family and is useful for handling skewness in positive data via a shape parameter.
In this article, we derive properties, in addition to those established by Marchenko and Genton [13], that relate quantiles to parameters of the LSN distribution. Based on these facts, we propose and study the log-skew-normal linear regression model (LSNLRM), allowing us to analyze the relationship between the quantiles of a positive skewed response variable and a set of explanatory variables, making it easy to interpret the regression coefficients. This study of the LSNLRM is motivated by the need to have alternative parametric models to estimate the relationship between explanatory variables and any quantile of a positive skewed response variable. This type of model has been used by the World Health Organization [14,15] to provide reference quantile curves of children’s anthropometric variables according to age and gender.
We establish a correspondence between the LSNLRM and the skew-normal linear regression model (SNLRM), and, based on it, we calculate the maximum likelihood estimates of the parameters. We verify using simulation studies the performance of our regression model coupled with the quantile estimation procedure and present comparisons with classical QR. We display quantile–quantile plots with simulated envelopes as a graphical diagnostic to assess the model fitting. We illustrate the usefulness of the proposed regression model in the construction of reference quantile curves for children’s weight according to age and gender. These types of growth curves are widely used in medicine to track growth and form an overall picture of children’s health. We show the importance of handling skewness in the response by means of a comparison between the LSNLRM and the log-normal linear regression model (LNLRM) (Morán-Vásquez et al. [4]).
This paper is organized as follows. In Section 2, we derive properties that relate quantiles of the LSN distribution to its parameters. In Section 3, we study the LSNLRM and propose a quantile estimation methodology. The parameter estimation based on the maximum likelihood estimation method is described, and a graphical tool to evaluate the goodness of fit is presented. Section 4 is devoted to results of simulation studies. In Section 5, a practical use of the LSNLRM is illustrated through the construction of anthropometric growth charts.

2. Connections between Parameters of the Log-Skew-Normal Distribution and Quantiles

In this section, we define the LSN distribution and derive properties that relate some of its parameters to quantiles.
A random variable $X \in \mathbb{R}$ has an SN distribution with location parameter $\xi \in \mathbb{R}$, scale parameter $\omega > 0$ and shape parameter $\alpha \in \mathbb{R}$, denoted by $X \sim \mathrm{SN}(\xi, \omega^2, \alpha)$, if its probability density function (PDF) is
$$\mathrm{SN}(x; \xi, \omega, \alpha) = \frac{2}{\omega}\, \varphi\!\left(\frac{x - \xi}{\omega}\right) \Phi\!\left(\alpha\, \frac{x - \xi}{\omega}\right), \quad x \in \mathbb{R}, \tag{2}$$
where $\varphi$ and $\Phi$ are the PDF and the cumulative distribution function of a standard normal random variable, respectively.
The SN distribution has important theoretical properties and is easily manipulated from a mathematical viewpoint, which has motivated its application in a wide variety of areas. In particular, the parameter $\alpha$ allows the skewness in the data to be handled. The well-known PDF of a normal distribution is obtained from (2) when $\alpha = 0$. Moreover, regardless of the value of $\alpha$, if $X \sim \mathrm{SN}(\xi, \omega^2, \alpha)$, then
$$\frac{(X - \xi)^2}{\omega^2} \sim \chi^2_1. \tag{3}$$
This result is of central relevance for the evaluation of the goodness of fit of the LSNLRM (Section 3). A detailed study of the SN distribution can be found in Azzalini and Capitanio [5].
In Definition 1, we present the LSN distribution, which is useful for modeling skewed positive data (Azzalini et al. [12], Marchenko and Genton [13]).
Definition 1. 
A positive random variable $Y$ has an LSN distribution with scale parameter $\xi > 0$, relative dispersion parameter $\omega > 0$ and shape parameter $\alpha \in \mathbb{R}$ if $\log(Y) \sim \mathrm{SN}(\log(\xi), \omega^2, \alpha)$. We write $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$.
The PDF of $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$ is given by
$$\mathrm{LSN}(y; \xi, \omega, \alpha) = \frac{2}{y\,\omega}\, \varphi\!\left(\frac{\log(y/\xi)}{\omega}\right) \Phi\!\left(\alpha\, \frac{\log(y/\xi)}{\omega}\right), \quad y > 0. \tag{4}$$
The LSN distribution given in (4) has a slightly different parameterization than the one used by Marchenko and Genton [13]. Our parameterization facilitates the interpretation of the regression coefficients of the LSNLRM in the estimation of the quantiles of a response variable (Section 3).
For α = 0 in (4), we obtain the PDF of an LN distribution (Morán-Vásquez and Ferrari [3], Morán-Vásquez et al. [4,6]).
Figure 1 displays different shapes of the PDF of $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$ according to the values of its parameters. Note that the parameter $\xi$ affects the scale of the distribution of $Y$ (Figure 1a). The dispersion of the distribution of $Y$ is controlled by the parameter $\omega$ (Figure 1b). The parameter $\alpha$ impacts the skewness of the distribution of $Y$ (Figure 1c). The parameters $\xi$ and $\omega$ are related to quantiles of $Y$, making the LSN distribution attractive for regression modeling purposes. In order to establish these relationships, we present in Theorem 1 a closed-form expression for the $\lambda$-quantile, $y_\lambda$, of $Y$, with $\lambda \in (0, 1)$.
Theorem 1. 
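Let $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$. The $\lambda$-quantile $y_\lambda$ of $Y$, $\lambda \in (0, 1)$, satisfies
$$y_\lambda = \xi \exp(\omega q_\lambda), \tag{5}$$
where $q_\lambda$ is the $\lambda$-quantile of $Z \sim \mathrm{SN}(0, 1, \alpha)$.
Proof. 
Note that $y_\lambda$ is obtained from
$$P\!\left(Z \leq \frac{\log(y_\lambda) - \log(\xi)}{\omega}\right) = \lambda,$$
where $Z \sim \mathrm{SN}(0, 1, \alpha)$. Hence, $(\log(y_\lambda) - \log(\xi))/\omega = q_\lambda$, where $q_\lambda$ is the $\lambda$-quantile of $Z \sim \mathrm{SN}(0, 1, \alpha)$. This shows the result.  □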
From (5), it is noteworthy that all the quantiles of $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$ are proportional to the parameter $\xi$. This feature of the LSN distribution allows any quantile of $Y$ to be related to a set of explanatory variables by considering a regression structure on $\xi$ (Section 3).
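Theorem 1 also allows us to obtain an interpretation of the quantiles related to parameter $\omega$. For this, we define a quantile-based coefficient of variation for $Y$ as (Rigby and Stasinopoulos [16])
$$\mathrm{CV}_Y = \frac{3}{4}\, \frac{y_{3/4} - y_{1/4}}{y_{1/2}}. \tag{6}$$
Substituting (5) into (6), the scale parameter $\xi$ cancels and we obtain
$$\mathrm{CV}_Y = \frac{3}{4}\, \frac{\exp(q_{3/4}\,\omega) - \exp(q_{1/4}\,\omega)}{\exp(q_{1/2}\,\omega)} = \frac{3}{4} \left[ \exp\{(q_{3/4} - q_{1/2})\,\omega\} - \exp\{-(q_{1/2} - q_{1/4})\,\omega\} \right].$$
Since $q_{1/4} < q_{1/2} < q_{3/4}$, the first exponential increases with $\omega$ and the second decreases. Hence, $\mathrm{CV}_Y$ depends on $\omega$ through a monotonically increasing function, and $\omega$ can be seen as a relative dispersion parameter of the distribution of $Y$.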
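As a worked special case (an illustration added here, not part of the original derivation), take $\alpha = 0$, so that $q_\lambda$ is a standard normal quantile and the quantile-based coefficient of variation has an explicit form:

```latex
% LN case (alpha = 0): q_{1/2} = 0 and q_{3/4} = -q_{1/4} \approx 0.6745
\mathrm{CV}_Y
  = \frac{3}{4}\left( e^{0.6745\,\omega} - e^{-0.6745\,\omega} \right)
  = \frac{3}{2}\,\sinh(0.6745\,\omega)
  \approx 1.012\,\omega \quad \text{for small } \omega .
```

Because $\sinh$ is strictly increasing, this confirms the monotone dependence on $\omega$ in the LN case, and the small-$\omega$ approximation shows that the factor $3/4$ places $\mathrm{CV}_Y$ on roughly the same scale as $\omega$.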

3. Quantile Modeling through the LSNLRM

In this section, we define the LSNLRM, which together with the properties established in Section 2 allows us to investigate the relationship between explanatory variables and any quantile of a positive response variable, taking into account the potential skewness of the response.
Let $Y_1, \ldots, Y_n$ be independent random variables that represent observations of $Y > 0$ on $n$ individuals. The LSNLRM is defined as
$$Y_i \overset{\text{ind}}{\sim} \mathrm{LSN}(\xi_i, \omega^2, \alpha), \quad \log(\xi_i) = \mathbf{x}_i^\top \boldsymbol{\beta}, \tag{7}$$
for $i = 1, \ldots, n$, where $\mathbf{x}_i = (x_{i1}, \ldots, x_{ir})^\top$ is a constant vector with the measurements of the $i$th individual on the explanatory variables $x_1, \ldots, x_r$. Thus, $x_{ij}$ is the observed value of the $j$th explanatory variable for the $i$th individual, $j = 1, \ldots, r$; $\xi_i > 0$ represents the scale parameter of $Y_i$; $\omega > 0$ is the relative dispersion parameter; $\alpha \in \mathbb{R}$ is the shape parameter; and $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_r)^\top$ is the vector of regression coefficients. We assume that $x_{i1} = 1$ for $i = 1, \ldots, n$, so that $\beta_1$ is an intercept. The number of parameters in (7) is $k = r + 2$.
The LNLRM is a particular case of the LSNLRM when α = 0 . This model can also be obtained as a particular case of the regression models studied in Morán-Vásquez et al. [4] and Vanegas and Paula [17].
As a consequence of Theorem 1, the $\lambda$-quantile $y_\lambda$ of $Y$, $\lambda \in (0, 1)$, under the regression model (7), is related to the explanatory variables through the expression
$$y_\lambda = \exp\!\left( \sum_{j=1}^{r} \beta_j x_j + \omega q_\lambda \right), \tag{8}$$
where $q_\lambda$ is the $\lambda$-quantile of $Z \sim \mathrm{SN}(0, 1, \alpha)$. Hence, $\exp(\beta_j)$ is the multiplicative effect on $y_\lambda$ when $x_j$ is increased by one unit, keeping the other explanatory variables fixed. Note that this multiplicative effect is the same for all the quantiles.
From (8), $y_\lambda$ varies according to $q_\lambda$ weighted by the relative dispersion parameter $\omega$. Therefore, to estimate $y_{\lambda_1}, \ldots, y_{\lambda_m}$ it suffices to fit the model (7) once and then compute $q_{\lambda_1}, \ldots, q_{\lambda_m}$ separately. In contrast, to estimate several quantiles using the classical QR given in (1), one must carry out a separate fit for each one.
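To make the "single fit, many quantiles" point concrete, the sketch below evaluates expression (8) for several values of $\lambda$ from one set of parameter values. It assumes the parameter values listed as true parameters in Table 1 (which the paper obtains by fitting the LSNLRM to the children's data), a covariate vector of the form (1, gender, age), and the Owen's T representation of the SN(0, 1, α) distribution function; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sn_cdf(z, alpha, n=400):
    """CDF of SN(0, 1, alpha), written as Phi(z) - 2 T(z, alpha), with Owen's T
    function approximated by composite Simpson's rule on [0, alpha]."""
    def f(x):
        return math.exp(-0.5 * z * z * (1.0 + x * x)) / (1.0 + x * x)

    step = alpha / n
    s = f(0.0) + f(alpha)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(k * step)
    owen_t = (s * step / 3.0) / (2.0 * math.pi)
    return NormalDist().cdf(z) - 2.0 * owen_t


def sn_quantile(lam, alpha):
    """lam-quantile of SN(0, 1, alpha) by bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, alpha) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lsnlrm_quantiles(x, beta, omega, alpha, lams):
    """Expression (8): y_lam = exp(x' beta + omega * q_lam) for each lam."""
    log_xi = sum(b * xj for b, xj in zip(beta, x))
    return {lam: math.exp(log_xi + omega * sn_quantile(lam, alpha)) for lam in lams}


# Parameter values taken from the "True Parameter" column of Table 1.
beta, omega, alpha = (2.0836, 0.0239, 0.1423), 0.1601, 1.2961
lams = (0.03, 0.05, 0.25, 0.50, 0.75, 0.95, 0.97)

girl = lsnlrm_quantiles((1, 0, 3.0), beta, omega, alpha, lams)  # girl aged 3
boy = lsnlrm_quantiles((1, 1, 3.0), beta, omega, alpha, lams)   # boy aged 3

# The boy/girl ratio equals exp(beta_2) at every quantile level.
for lam in lams:
    print(lam, round(girl[lam], 2), round(boy[lam], 2), boy[lam] / girl[lam])
```

Only the values $q_\lambda$ change across quantile levels; the linear predictor is computed once per covariate profile.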
Using Definition 1, it is straightforward to show that the model (7) is equivalent to $\log(Y_i) \overset{\text{ind}}{\sim} \mathrm{SN}(\mathbf{x}_i^\top \boldsymbol{\beta}, \omega^2, \alpha)$. Hence, the parameters involved in the LSNLRM can be estimated through an SNLRM with a $\log(Y)$ response.
Let $y_1, \ldots, y_n$ be the observed values of the independent random variables $Y_1, \ldots, Y_n$ from (7). The log-likelihood function of $\boldsymbol{\theta} = (\boldsymbol{\beta}^\top, \omega, \alpha)^\top$ is given by $\ell(\boldsymbol{\theta}) = \sum_{i=1}^{n} \ell_i(\boldsymbol{\theta})$, where
$$\ell_i(\boldsymbol{\theta}) = C - \log(\omega) - \frac{(\log(y_i) - \mathbf{x}_i^\top \boldsymbol{\beta})^2}{2\omega^2} + \log \Phi\!\left( \alpha\, \frac{\log(y_i) - \mathbf{x}_i^\top \boldsymbol{\beta}}{\omega} \right),$$
for $i = 1, \ldots, n$, where $C$ is a constant that does not depend on $\boldsymbol{\theta}$. The maximum likelihood estimator of $\boldsymbol{\theta}$, denoted by $\hat{\boldsymbol{\theta}} = (\hat{\boldsymbol{\beta}}^\top, \hat{\omega}, \hat{\alpha})^\top$, does not have a closed form. The package sn (Azzalini [18]) in R provides the implementation of several numerical optimization methods to find the maximum likelihood estimates of $\boldsymbol{\theta}$. This package also computes the estimated asymptotic standard errors for the parameters based on the Fisher information matrix described in Azzalini ([5] Section 3.1), which is useful for inferential procedures about the parameters.
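For readers who want to see the log-likelihood written out, the following is a minimal Python sketch that evaluates it, including the terms absorbed into the constant $C$. The function name `lsnlrm_loglik` and its argument layout are hypothetical; the paper itself maximizes the likelihood with the R package sn, and this sketch only evaluates the objective.

```python
import math
from statistics import NormalDist


def lsnlrm_loglik(beta, omega, alpha, y, X):
    """Log-likelihood of the LSNLRM at (beta, omega, alpha).

    y is a list of positive responses and X a list of covariate rows
    (each row starting with 1 for the intercept).
    """
    std = NormalDist()
    total = 0.0
    for yi, xi in zip(y, X):
        z = (math.log(yi) - sum(b * x for b, x in zip(beta, xi))) / omega
        # log of the LSN density:
        # log 2 - log(y_i) - log(omega) + log phi(z) + log Phi(alpha * z)
        total += (
            math.log(2.0) - math.log(yi) - math.log(omega)
            - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
            + math.log(std.cdf(alpha * z))
        )
    return total


# With alpha = 0 the value reduces to a log-normal log-density.
print(lsnlrm_loglik((math.log(2.0),), 1.0, 0.0, [2.0], [(1.0,)]))
```

The term `math.log(std.cdf(alpha * z))` can underflow when `alpha * z` is very negative; a more careful implementation would use a dedicated log-CDF routine.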
Let $\alpha \in (0, 1)$ denote a significance level (in this paragraph only, $\alpha$ is not the shape parameter). An asymptotic $100(1 - \alpha)\%$ confidence interval for the regression coefficient $\beta_j$ is given by $\hat{\beta}_j \pm z_{1 - \alpha/2}\, \widehat{\mathrm{SE}}(\hat{\beta}_j)$, where $z_{1 - \alpha/2}$ is the $100(1 - \alpha/2)$th percentile of a standard normal distribution, and $\widehat{\mathrm{SE}}(\hat{\beta}_j)$ is the estimated asymptotic standard error for $\hat{\beta}_j$. Using this confidence interval, we can quantify the effect of the explanatory variable $x_j$ on the response variable $Y$. Moreover, the Wald statistic $W = (\hat{\beta}_j / \widehat{\mathrm{SE}}(\hat{\beta}_j))^2$ can be used to test the null hypothesis $H_0: \beta_j = 0$ against the alternative hypothesis $H_1: \beta_j \neq 0$. The asymptotic distribution of the Wald statistic under the null hypothesis is $\chi^2_1$.
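The interval and the Wald test can be computed directly from an estimate and its standard error. The sketch below uses hypothetical input values (not results from the paper) and the fact that a $\chi^2_1$ variable is the square of a standard normal variable, so no chi-square routine is needed.

```python
import math
from statistics import NormalDist


def wald_summary(beta_hat, se, level=0.95):
    """Asymptotic confidence interval and Wald test of H0: beta_j = 0."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - (1.0 - level) / 2.0)   # z_{1 - alpha/2}
    lower, upper = beta_hat - z * se, beta_hat + z * se
    w = (beta_hat / se) ** 2                     # Wald statistic
    # For a chi-square with 1 df: P(W > w) = 2 * (1 - Phi(sqrt(w))).
    p_value = 2.0 * (1.0 - std.cdf(math.sqrt(w)))
    return lower, upper, w, p_value


# Hypothetical estimate and standard error, for illustration only.
lower, upper, w, p_value = wald_summary(beta_hat=0.5, se=0.25)
print(lower, upper, w, p_value)
```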
For model selection purposes, we consider the Akaike information criterion (AIC) (Akaike [19]) and the Bayesian information criterion (BIC) (Schwarz [20]), which are given by
$$\mathrm{AIC} = -2\,\ell(\hat{\boldsymbol{\theta}}) + 2k, \qquad \mathrm{BIC} = -2\,\ell(\hat{\boldsymbol{\theta}}) + k \log(n),$$
respectively. The model having the smallest AIC or BIC value is claimed to provide the best fit. Additionally, we assess the adequacy of the LSNLRM using quantile–quantile plots, comparing the observed square normalized residuals $\hat{r}_i^2 = (\log(y_i) - \log(\hat{\xi}_i))^2 / \hat{\omega}^2$, $i = 1, \ldots, n$, with the theoretical quantiles $r^2_{\alpha_i}$ of a $\chi^2$ distribution with one degree of freedom, evaluated at the probabilities $\alpha_i = i/(n + 1)$, $i = 1, \ldots, n$; see (3). We also construct simulated envelopes for the quantile–quantile plots to facilitate the comparison between quantiles and judge the adequacy of the models. A simulated envelope is a graphical procedure for detecting large deviations due to an outlier or a lack of fit (Atkinson [21]), enabling a better comparison between the observed square normalized residuals and the quantiles of a $\chi^2$ distribution with one degree of freedom. The construction of simulated envelopes is carried out through the following steps:
  1. Draw a random sample of size $n$, say $z_1, \ldots, z_n$, of $Z \sim \mathrm{SN}(0, 1, \hat{\alpha})$.
  2. Compute $r_i^2 = z_i^2$, $i = 1, \ldots, n$.
  3. Repeat steps 1 and 2 $m$ times, to obtain the square normalized residuals $r_{ij}^2$, $i = 1, \ldots, n$, $j = 1, \ldots, m$.
  4. For each $i = 1, \ldots, n$, compute $r_{(i)L}^2 = \min\{r_{i1}^2, \ldots, r_{im}^2\}$ and $r_{(i)U}^2 = \max\{r_{i1}^2, \ldots, r_{im}^2\}$. Thus, the bounds of the $i$th square normalized residual are given by $r_{(i)L}^2$ and $r_{(i)U}^2$.
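The steps above can be sketched in Python as follows. Two assumptions are made that the text does not state explicitly: the simulated residuals are sorted within each replicate, so that the bounds refer to order statistics (as the notation $r^2_{(i)}$ suggests), and SN draws are generated through the standard representation $Z = \delta |U_0| + \sqrt{1 - \delta^2}\, U_1$ with $\delta = \alpha/\sqrt{1 + \alpha^2}$ and independent standard normal $U_0$, $U_1$. The number of replicates `m`, the seed and the function names are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def rsn(alpha, rng):
    """One draw from SN(0, 1, alpha) via delta * |U0| + sqrt(1 - delta^2) * U1."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0, u1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return delta * abs(u0) + math.sqrt(1.0 - delta * delta) * u1


def simulated_envelope(n, alpha_hat, m=100, seed=1):
    """Lower and upper envelope bounds for the n ordered squared residuals."""
    rng = random.Random(seed)
    # Steps 1-3: m replicates of n squared draws, sorted within each replicate.
    reps = [sorted(rsn(alpha_hat, rng) ** 2 for _ in range(n)) for _ in range(m)]
    # Step 4: pointwise minimum and maximum across replicates.
    lower = [min(rep[i] for rep in reps) for i in range(n)]
    upper = [max(rep[i] for rep in reps) for i in range(n)]
    return lower, upper


def chi2_1_quantile(p):
    """p-quantile of a chi-square with 1 df: (Phi^{-1}((1 + p) / 2))^2."""
    return NormalDist().inv_cdf((1.0 + p) / 2.0) ** 2


n = 50
theoretical = [chi2_1_quantile(i / (n + 1)) for i in range(1, n + 1)]
lower, upper = simulated_envelope(n, alpha_hat=1.2961, m=100, seed=1)
print(theoretical[:3], lower[:3], upper[:3])
```

In a quantile–quantile plot, the sorted observed residuals are plotted against `theoretical`, and points falling outside the band between `lower` and `upper` indicate a possible lack of fit.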
The code to reproduce the plots and numerical results presented in this paper can be found at https://github.com/moranvasquez/LSNLRM (accessed on 18 August 2023).

4. Simulation Studies

In this section, we present simulation studies to evaluate the estimation methodology of the LSNLRM described in Section 3 and the performance of the quantile estimation procedure given in (8) compared with classical QR.
We conducted simulations with the model
$$Y_i \overset{\text{ind}}{\sim} \mathrm{LSN}(\xi_i, \omega^2, \alpha), \quad \log(\xi_i) = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3}, \tag{9}$$
for $i = 1, \ldots, n$. We consider different sample sizes, namely $n = 50, 100, 500, 1000$, and 10,000 Monte Carlo replicates. The true parameters were obtained by fitting the LSNLRM to the children data set described in Section 5. The response variable, $Y$, simulates the weight of the children (in kilograms); the explanatory variables, $x_2$ (gender; 0 for female, 1 for male) and $x_3$ (age, in years), were generated as random draws from a Bernoulli distribution (with probability of success 0.5) and a uniform distribution (with lower limit 2 and upper limit 5), respectively. The needed optimizations were performed using the default method of the selm.fit function from the sn package in R, namely, nlminb. As initial values of the parameter estimates, we use those provided by default in the selm.fit function, whose detailed computation can be found in Azzalini ([5] Section 3.1.7).
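A single simulated data set under this design can be generated as in the sketch below. It assumes the parameter values listed as true parameters in Table 1 and draws the SN error through the representation $Z = \delta |U_0| + \sqrt{1 - \delta^2}\, U_1$ with $\delta = \alpha/\sqrt{1 + \alpha^2}$; the function name and the seed are hypothetical, and the fitting step (done in the paper with selm.fit) is not reproduced here.

```python
import math
import random

# Parameter values from the "True Parameter" column of Table 1.
BETA = (2.0836, 0.0239, 0.1423)
OMEGA, ALPHA = 0.1601, 1.2961


def simulate_lsnlrm(n, rng):
    """Simulate n triples (weight, gender, age) from the model used in the study."""
    delta = ALPHA / math.sqrt(1.0 + ALPHA * ALPHA)
    data = []
    for _ in range(n):
        gender = 1 if rng.random() < 0.5 else 0   # Bernoulli(0.5)
        age = rng.uniform(2.0, 5.0)               # Uniform(2, 5)
        # SN(0, 1, ALPHA) draw via its stochastic representation.
        z = delta * abs(rng.gauss(0.0, 1.0)) + math.sqrt(1.0 - delta * delta) * rng.gauss(0.0, 1.0)
        log_y = BETA[0] + BETA[1] * gender + BETA[2] * age + OMEGA * z
        data.append((math.exp(log_y), gender, age))
    return data


sample = simulate_lsnlrm(50, random.Random(0))
print(sample[:3])
```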
Table 1 reports the median and the median absolute deviation (MAD) of the parameter estimates of the model (9). The medians are close to the true parameters and the MAD becomes smaller as the sample size grows, indicating a satisfactory performance of the estimators.
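The two summary measures can be computed across Monte Carlo replicates as in the short sketch below. The paper does not say whether the MAD is rescaled; the sketch uses the raw median absolute deviation by default and exposes a `scale` argument, since R's `mad()` multiplies by 1.4826 by default. The input values are hypothetical.

```python
from statistics import median


def mad(values, scale=1.0):
    """Median absolute deviation about the median, optionally rescaled."""
    values = list(values)
    med = median(values)
    return scale * median(abs(v - med) for v in values)


# Hypothetical estimates from five replicates, for illustration only.
estimates = [1.0, 2.0, 3.0, 4.0, 100.0]
print(median(estimates), mad(estimates))
```

Both measures are insensitive to the single extreme value in the example, which is the usual reason for preferring them to the mean and standard deviation when some replicates produce outlying estimates.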
Table 2 shows comparisons between our quantile estimation method and classical QR. For both models, we estimate $y_{\lambda, x_2}$, where $y_{\lambda, 0}$ and $y_{\lambda, 1}$ denote the $\lambda$-quantile of $Y$ when $x_2 = 0$ and $x_2 = 1$, respectively. The true values of the quantiles were calculated for each value of $x_2$ by replacing in (8) the true parameters and the explanatory variable $x_3$ by the mean of the simulated observations. It can be seen that the quantile estimators of both models have suitable behavior since the medians are close to the true quantiles and the MAD decreases as $n$ grows. Nevertheless, it is remarkable that in all simulated scenarios the MADs of the estimates of our model are smaller than those provided by the classical QR. As a result, based on the MAD as a performance measure, our quantile estimation method has a better performance than the classical QR.

5. Construction of Anthropometric Growth Charts Using the LSNLRM

Anthropometric growth charts are useful tools to track growth and form an overall health picture for the child being measured. The World Health Organization [14,15] provides a wide variety of reference quantile curves to describe the dependence of several children’s anthropometric characteristics on age according to gender, such as height, weight, and head and arm circumferences, among others. The construction of these curves was based on samples with healthy breastfed infants and young children from diverse ethnic backgrounds and cultural settings, collected by the Multicentre Growth Reference Study between 1997 and 2003 (de Onis et al. [22]). Particularly, weight-for-age is one of the most employed growth charts for monitoring changes in health or nutritional status in children. In contrast to the anthropometric references provided by the World Health Organization, some authors propose growth curves for infants that take into account the particularities of developing countries (Khadilkar and Khadilkar [23]). It is also recommended to consider certain age ranges, within which the range of 2 to 5 years stands out as a period of transition in growth (Khadilkar et al. [24], Grummer-Strawn et al. [25]).
We use the LSNLRM to construct reference quantile curves for children’s weights according to gender and age. We consider a sample of 3663 children (1728 girls and 1935 boys) between 2 and 5 years of age collected during the year 2018 at the Buenos Aires neighborhood, located in the Medellín municipality, department of Antioquia, Colombia [26]. The response variable, Y, is the child’s weight (in kilograms), and the explanatory variables are the gender (G; 0 for female, 1 for male) and age (A; in years).
Figure 2 shows the comparative boxplots of the children’s weight according to the gender for six age intervals. It can be seen that the empirical quartiles of weight are affected by both age and gender. Furthermore, for each age interval, the weights of the boys and girls have a slight positive skewness and outliers. To investigate these relationships, we fit the LSNLRM
$$Y_i \overset{\text{ind}}{\sim} \mathrm{LSN}(\xi_i, \omega^2, \alpha), \quad \log(\xi_i) = \beta_1 + \beta_2 G_i + \beta_3 A_i, \tag{10}$$
for $i = 1, \ldots, 3663$. In order to see the importance of controlling the skewness in the response through the parameter $\alpha$, we consider the LNLRM, in contrast to the LSNLRM when $\alpha$ is estimated. Figure 3 displays the quantile–quantile plots with simulated envelopes for the square normalized residuals for each model. There are 273 points outside the envelope associated with the LNLRM (Figure 3a), while there are no points outside the envelope for the LSNLRM (Figure 3b). Table 3 shows the AIC and BIC for each fit. Based on the AIC, BIC and Figure 3, we conclude that the LSNLRM provides a better fit when compared with the LNLRM.
Table 4 gives the maximum likelihood estimates of the regression coefficients, asymptotic standard errors (SEs), lower and upper bounds of the 95% confidence intervals and the p-value of the Wald test for testing $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$, $j = 1, 2, 3$, of the model (10). It is evident that all explanatory variables are significant in the model. The maximum likelihood estimates and standard errors (in parentheses) for the relative dispersion and shape parameters are $\hat{\omega} = 0.1601\ (0.0043)$ and $\hat{\alpha} = 1.2961\ (0.1141)$, respectively.
Figure 4 provides seven fitted quantile curves (the 3rd, 5th, 25th, 50th, 75th, 95th and 97th percentiles), given by (8), plotted with the data, for children’s weight vs. age, according to gender.

6. Final Remarks

In this article, we showed that all the quantiles of an LSN random variable are proportional to its scale parameter, ξ . This allowed us to relate the parameter ω to a robust coefficient of variation, making possible an interpretation of this parameter in terms of relative dispersion. This property motivated us to define and study the LSNLRM in order to analyze the relationship between quantiles of a positive response variable and a set of explanatory variables, taking into account the potential skewness of the response. Our proposal to estimate quantiles allows easy parameter interpretation, an attractive feature for statistical modeling purposes. We stated a correspondence between the LSNLRM and SNLRM and discussed maximum likelihood estimation issues, inferential procedures and the evaluation of the model fitting. Monte Carlo simulations suggested a suitable performance of our quantile estimation procedure and exhibited more accurate estimates than the traditional QR. An application to the construction of anthropometric growth charts was presented and discussed. The plots and numerical results presented in this paper are reproducible using the code available at https://github.com/moranvasquez/LSNLRM (accessed on 18 August 2023).
Future research related to the LSNLRM will focus on additional diagnostic procedures, incomplete data and mixed models, as well as its multivariate extensions to model the relationship between covariates and quantiles of positive skewed response vectors. In addition, it is interesting to study the approach described in Mazucheli et al.  [2] for the LSN family, as well as for other distributions recently proposed in the statistical literature (MirMostafaee et al. [27], Tamandi et al. [28], Reyes and Iriarte [29]), including Bayesian and score-adjusted approaches to parameter estimation and inferential procedures (MirMostafaee et al. [30], Ren and Hu [31], Nawa and Nadarajah [32]).

Author Contributions

Conceptualization, R.A.M.-V., A.D.G.-M. and M.A.M.-L.; methodology, R.A.M.-V., A.D.G.-M. and M.A.M.-L.; software, R.A.M.-V., A.D.G.-M. and M.A.M.-L.; investigation, R.A.M.-V., A.D.G.-M. and M.A.M.-L.; writing—original draft preparation, R.A.M.-V., A.D.G.-M. and M.A.M.-L.; writing—review and editing, R.A.M.-V., A.D.G.-M. and M.A.M.-L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

References

  1. Koenker, R.; Bassett, G., Jr. Regression quantiles. Econometrica 1978, 46, 33–50.
  2. Mazucheli, J.; Alves, B.; Menezes, A.; Leiva, V. An overview on parametric quantile regression models and their computational implementation with applications to biomedical problems including COVID-19 data. Comput. Methods Programs Biomed. 2022, 221, 106816.
  3. Morán-Vásquez, R.A.; Ferrari, S.L.P. Box-Cox elliptical distributions with application. Metrika 2018, 82, 547–571.
  4. Morán-Vásquez, R.A.; Mazo-Lopera, M.A.; Ferrari, S.L.P. Quantile modeling through multivariate log-normal/independent linear regression models with application to newborn data. Biom. J. 2021, 63, 1290–1308.
  5. Azzalini, A. The Skew-Normal and Related Families; Cambridge University Press: Cambridge, UK, 2014.
  6. Morán-Vásquez, R.A.; Roldán-Correa, A.; Nagar, D.K. Quantile-Based Multivariate Log-Normal Distribution. Symmetry 2023, 15, 1513.
  7. Genton, M.G.; Thompson, K.R. Skew-elliptical time series with application to flooding risk. In Time Series Analysis and Applications to Geophysical Systems; Springer: Berlin/Heidelberg, Germany, 2004; pp. 169–185.
  8. Liseo, B.; Loperfido, N. A Bayesian interpretation of the multivariate skewnormal distribution. Stat. Probab. Lett. 2003, 61, 395–401.
  9. Chai, H.S.; Bailey, K.R. Use of log-skew-normal distribution in analysis of continuous data with a discrete component at zero. Stat. Med. 2008, 27, 3643–3655.
  10. Allard, D.; Naveau, P. A new spatial skew-normal random field model. Commun. Stat. Theory Methods 2007, 36, 1821–1834.
  11. Capitanio, A.; Azzalini, A.; Stanghellini, E. Graphical models for skew-normal variates. Scand. J. Stat. 2003, 30, 129–144.
  12. Azzalini, A.; dal Cappello, T.; Kotz, S. Log-skew-normal and log-skew-t distributions as models for family income data. J. Income Distrib. 2002, 11, 12–20.
  13. Marchenko, Y.; Genton, M. Multivariate log-skew-elliptical distributions with applications to precipitation data. Environmetrics 2010, 21, 318–340.
  14. World Health Organization. WHO Child Growth Standards: Length/Height-for-Age, Weight-for-Age, Weight-for-Length, Weight-for-Height and Body Mass Index-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2006; Available online: https://apps.who.int/iris/handle/10665/43413 (accessed on 18 August 2023).
  15. World Health Organization. WHO Child Growth Standards: Head Circumference-for-Age, Arm Circumference-for-Age, Triceps Skinfold-for-Age and Subscapular Skinfold-for-Age: Methods and Development; World Health Organization: Geneva, Switzerland, 2007; Available online: https://apps.who.int/iris/handle/10665/43706 (accessed on 18 August 2023).
  16. Rigby, R.A.; Stasinopoulos, D.M. Using the Box-Cox t distribution in GAMLSS to model skewness and kurtosis. Stat. Model. 2006, 6, 209–229.
  17. Vanegas, L.H.; Paula, G.A. A semiparametric approach for joint modeling of median and skewness. Test 2015, 24, 110–135.
  18. Azzalini, A. The R Package sn: The Skew-Normal and Related Distributions Such as the Skew-t and the SUN (Version 2.1.0). 2022. Available online: https://cran.r-project.org/package=sn (accessed on 18 August 2023).
  19. Akaike, H. Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory; Akademiai Kiado: Budapest, Hungary, 1973; pp. 267–281.
  20. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464.
  21. Atkinson, A.C. Two graphical displays for outlying and influential observations in regression. Biometrika 1981, 68, 13–20.
  22. de Onis, M.; Garza, C.; Victora, C.G.; Bhan, M.K.; Norum, K.R. WHO Multicentre Growth Reference Study (MGRS): Rationale, planning and implementation. Food Nutr. Bull. 2004, 25 (Suppl. S1), S1–S89.
  23. Khadilkar, V.; Khadilkar, A. Growth charts: A diagnostic tool. Indian J. Endocrinol. Metab. 2011, 15 (Suppl. S3), S166–S171.
  24. Khadilkar, V.V.; Khadilkar, A.V.; Chiplonkar, S.A. Growth performance of affluent Indian preschool children: A comparison with the new WHO growth standard. Indian Pediatr. 2010, 47, 869–872.
  25. Grummer-Strawn, L.M.; Reinold, C.; Krebs, N.F. Use of World Health Organization and CDC growth charts for children aged 0–59 months in the United States. Centers Dis. Control. Prev. CDC 2010, 59, 1–14.
  26. MEData: Portal de datos de Medellín. Estado Nutricional de Menores de 6 Años Programa de Crecimiento y Desarrollo. 2022. Available online: http://medata.gov.co/dataset/estado-nutricional-de-menores-de-6-anos-programa-de-crecimiento-y-desarrollo (accessed on 18 August 2023).
  27. MirMostafaee, S.M.T.K.; Mahdizadeh, M.; Lemonte, A.J. The Marshall-Olkin Extended Generalized Rayleigh distribution: Properties and applications. Commun. Stat.-Theory Methods 2017, 46, 653–671.
  28. Tamandi, M.; Jamalizadehb, A.; Mahdizadeh, M. A generalized Birnbaum-Saunders distribution with application to the air pollution data. Electron. J. Appl. Stat. Anal. 2019, 12, 26–43.
  29. Reyes, J.; Iriarte, Y.A. A New Family of Modified Slash Distributions with Applications. Mathematics 2023, 11, 3018.
  30. MirMostafaee, S.M.T.K.; Mahdizadeh, M.; Aminzadeh, M. Bayesian inference for the Topp-Leone distribution based on lower k-record values. Jpn. J. Ind. Appl. Math. 2016, 33, 637–669.
  31. Ren, H.; Hu, X. Bayesian Estimations of Shannon Entropy and Rényi Entropy of Inverse Weibull Distribution. Mathematics 2023, 11, 2483.
  32. Nawa, V.M.; Nadarajah, S. New Closed Form Estimators for the Beta Distribution. Mathematics 2023, 11, 2799.
Figure 1. PDF of $Y \sim \mathrm{LSN}(\xi, \omega^2, \alpha)$, where (a) $\omega = 0.3$, $\alpha = 1$, $\xi = 1, 3, 5, 7$; (b) $\xi = 3$, $\alpha = 1$, $\omega = 0.2, 0.3, 0.4, 0.5$; (c) $\xi = 4$, $\omega = 0.4$, $\alpha = -2, 0, 2, 5$.
Figure 2. Comparative boxplots of children’s weight for age by gender; children’s data.
Figure 3. Quantile–quantile plots of square normalized residuals with simulated envelope for (a) LNLRM and (b) LSNLRM.
Figure 4. Fitted quantile curves (at the 3rd, 5th, 25th, 50th, 75th, 95th and 97th percentiles) for weight by age; children’s data: (a) for girls; (b) for boys.
Table 1. Median and MAD of the parameter estimates; LSNLRM.

| Parameter | True value | n = 50, Median | n = 50, MAD | n = 100, Median | n = 100, MAD | n = 500, Median | n = 500, MAD | n = 1000, Median | n = 1000, MAD |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_1$ | 2.0836 | 2.1105 | 0.0865 | 2.0964 | 0.0530 | 2.0856 | 0.0212 | 2.0848 | 0.0142 |
| $\beta_2$ | 0.0239 | 0.0235 | 0.0249 | 0.0236 | 0.0172 | 0.0240 | 0.0075 | 0.0239 | 0.0053 |
| $\beta_3$ | 0.1423 | 0.1427 | 0.0136 | 0.1424 | 0.0089 | 0.1424 | 0.0044 | 0.1424 | 0.0030 |
| $\omega$ | 0.1601 | 0.1630 | 0.0221 | 0.1593 | 0.0169 | 0.1592 | 0.0085 | 0.1596 | 0.0060 |
| $\alpha$ | 1.2961 | 1.3824 | 1.3592 | 1.3207 | 0.6667 | 1.2990 | 0.2334 | 1.2957 | 0.1640 |
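Table 2. Median and MAD of estimated 3rd, 5th, 25th, 50th, 75th, 95th and 97th percentiles; LSNLRM and classical QR.

| Quantile | True value | n = 50, LSNLRM, Median | n = 50, LSNLRM, MAD | n = 50, QR, Median | n = 50, QR, MAD | n = 100, LSNLRM, Median | n = 100, LSNLRM, MAD | n = 100, QR, Median | n = 100, QR, MAD | n = 500, LSNLRM, Median | n = 500, LSNLRM, MAD | n = 500, QR, Median | n = 500, QR, MAD | n = 1000, LSNLRM, Median | n = 1000, LSNLRM, MAD | n = 1000, QR, Median | n = 1000, QR, MAD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $y_{0.03,0}$ | 11.646 | 12.178 | 0.307 | 12.226 | 0.433 | 11.631 | 0.195 | 11.725 | 0.282 | 11.938 | 0.091 | 12.044 | 0.143 | 11.767 | 0.064 | 11.858 | 0.100 |
| $y_{0.05,0}$ | 11.952 | 12.469 | 0.272 | 12.593 | 0.363 | 11.924 | 0.179 | 11.997 | 0.249 | 12.250 | 0.084 | 12.345 | 0.124 | 12.075 | 0.060 | 12.159 | 0.088 |
| $y_{0.25,0}$ | 13.339 | 13.819 | 0.239 | 13.908 | 0.285 | 13.269 | 0.156 | 13.393 | 0.188 | 13.664 | 0.074 | 13.767 | 0.095 | 13.474 | 0.053 | 13.572 | 0.066 |
| $y_{0.50,0}$ | 14.457 | 14.934 | 0.268 | 15.073 | 0.298 | 14.354 | 0.169 | 14.508 | 0.199 | 14.804 | 0.081 | 14.922 | 0.098 | 14.601 | 0.058 | 14.714 | 0.069 |
| $y_{0.75,0}$ | 15.744 | 16.233 | 0.300 | 16.427 | 0.380 | 15.616 | 0.188 | 15.790 | 0.244 | 16.117 | 0.093 | 16.250 | 0.127 | 15.899 | 0.067 | 16.027 | 0.087 |
| $y_{0.95,0}$ | 17.970 | 18.465 | 0.499 | 18.519 | 0.680 | 17.794 | 0.330 | 18.032 | 0.474 | 18.389 | 0.153 | 18.535 | 0.245 | 18.142 | 0.107 | 18.295 | 0.169 |
| $y_{0.97,0}$ | 18.586 | 19.068 | 0.600 | 19.191 | 0.864 | 18.390 | 0.394 | 18.562 | 0.581 | 19.016 | 0.180 | 19.127 | 0.295 | 18.763 | 0.127 | 18.898 | 0.210 |
| $y_{0.03,1}$ | 11.964 | 12.431 | 0.324 | 12.590 | 0.465 | 11.918 | 0.224 | 12.098 | 0.324 | 12.021 | 0.091 | 12.092 | 0.140 | 12.021 | 0.063 | 12.115 | 0.099 |
| $y_{0.05,1}$ | 12.278 | 12.732 | 0.293 | 12.928 | 0.388 | 12.220 | 0.206 | 12.389 | 0.283 | 12.334 | 0.084 | 12.419 | 0.120 | 12.336 | 0.058 | 12.437 | 0.086 |
| $y_{0.25,1}$ | 13.703 | 14.116 | 0.266 | 14.206 | 0.318 | 13.593 | 0.187 | 13.697 | 0.223 | 13.757 | 0.073 | 13.854 | 0.091 | 13.765 | 0.053 | 13.879 | 0.066 |
| $y_{0.50,1}$ | 14.852 | 15.251 | 0.294 | 15.413 | 0.330 | 14.707 | 0.205 | 14.838 | 0.237 | 14.905 | 0.079 | 15.013 | 0.096 | 14.917 | 0.057 | 15.042 | 0.067 |
| $y_{0.75,1}$ | 16.174 | 16.585 | 0.323 | 16.789 | 0.414 | 16.002 | 0.224 | 16.157 | 0.295 | 16.228 | 0.092 | 16.345 | 0.123 | 16.242 | 0.066 | 16.379 | 0.085 |
| $y_{0.95,1}$ | 18.460 | 18.861 | 0.529 | 18.825 | 0.707 | 18.234 | 0.359 | 18.235 | 0.535 | 18.512 | 0.152 | 18.643 | 0.232 | 18.536 | 0.107 | 18.685 | 0.160 |
| $y_{0.97,1}$ | 19.094 | 19.474 | 0.621 | 19.561 | 0.952 | 18.848 | 0.418 | 18.829 | 0.655 | 19.142 | 0.180 | 19.307 | 0.295 | 19.171 | 0.127 | 19.334 | 0.204 |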
Table 3. Model selection criteria for children’s data.

| Criterion | LNLRM | LSNLRM |
|---|---|---|
| AIC | −4879.52 | −4915.05 |
| BIC | −4854.70 | −4884.02 |
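To make the comparison in Table 3 concrete, the following minimal sketch computes the AIC and BIC differences between the two models. It assumes the usual definitions AIC = 2k − 2ℓ and BIC = k ln n − 2ℓ, and it assumes k = 4 parameters for the LNLRM (three regression coefficients plus ω) and k = 5 for the LSNLRM (adding α); these parameter counts and all function names are illustrative assumptions, not values stated in the table.

```python
import math

# Values copied from Table 3.
aic = {"LNLRM": -4879.52, "LSNLRM": -4915.05}
bic = {"LNLRM": -4854.70, "LSNLRM": -4884.02}

# Assumed parameter counts (not given in Table 3).
n_params = {"LNLRM": 4, "LSNLRM": 5}

# Positive differences mean the LSNLRM has the smaller (better) criterion.
delta_aic = aic["LNLRM"] - aic["LSNLRM"]
delta_bic = bic["LNLRM"] - bic["LSNLRM"]


def implied_loglik(aic_value, k):
    """Log-likelihood implied by AIC = 2k - 2*loglik."""
    return (2 * k - aic_value) / 2.0


def implied_n(aic_value, bic_value, k):
    """Sample size implied by BIC - AIC = k*(ln n - 2)."""
    return math.exp((bic_value - aic_value) / k + 2.0)


print("AIC difference:", round(delta_aic, 2))
print("BIC difference:", round(delta_bic, 2))
for model in aic:
    k = n_params[model]
    print(model,
          "implied log-likelihood:", round(implied_loglik(aic[model], k), 2),
          "implied n:", round(implied_n(aic[model], bic[model], k)))
```

If the assumed parameter counts are right, the two implied sample sizes should roughly agree, which gives a quick consistency check on the reported criteria.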
Table 4. Summary of the model fitted to children’s data.

| Explanatory Variable | Estimate | SE | Lower | Upper | p-Value |
|---|---|---|---|---|---|
| Intercept | 2.0836 | 0.0099 | 2.0642 | 2.1029 | <0.0001 |
| Gender | 0.0239 | 0.0041 | 0.0159 | 0.0318 | <0.0001 |
| Age | 0.1423 | 0.0023 | 0.1378 | 0.1469 | <0.0001 |
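As a rough illustration of how the coefficients in Table 4 translate into weight quantiles, the sketch below evaluates $y_q = \exp(\xi + \omega z_q)$, where $z_q$ is the $q$-quantile of the standard skew-normal distribution with shape $\alpha$. It rests on several assumptions: the linear predictor is taken to be the location parameter, $\xi = \beta_1 + \beta_2\,\text{Gender} + \beta_3\,\text{Age}$; $\omega$ and $\alpha$ are taken from the True Parameter column of Table 1; Gender is coded 0/1; and the age value 3 is arbitrary. The function names are hypothetical, the skew-normal quantile is obtained by plain numerical integration and bisection rather than by the authors' procedure, and the output is not meant to reproduce Table 2.

```python
import math


def sn_pdf(z, alpha):
    """Density of the standard skew-normal: 2 * phi(z) * Phi(alpha * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 * phi * big_phi


def sn_cdf(z, alpha, lower=-12.0, steps=2000):
    """CDF by composite Simpson's rule on [lower, z]; steps must be even.

    Mass below `lower` is ignored, which is negligible for moderate alpha.
    """
    if z <= lower:
        return 0.0
    h = (z - lower) / steps
    total = sn_pdf(lower, alpha) + sn_pdf(z, alpha)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * sn_pdf(lower + i * h, alpha)
    return total * h / 3.0


def sn_quantile(q, alpha, tol=1e-8):
    """q-quantile of the standard skew-normal, by bisection on the CDF."""
    lo, hi = -12.0, 12.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, alpha) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lsn_quantile(q, xi, omega, alpha):
    """q-quantile of Y when log(Y) is skew-normal with (xi, omega^2, alpha)."""
    return math.exp(xi + omega * sn_quantile(q, alpha))


# Coefficients from Table 4; omega and alpha from Table 1 (assumed here).
BETA = (2.0836, 0.0239, 0.1423)
OMEGA, ALPHA = 0.1601, 1.2961


def fitted_quantile(q, gender, age):
    """Quantile of weight for a given gender (0/1) and age (assumed coding)."""
    xi = BETA[0] + BETA[1] * gender + BETA[2] * age
    return lsn_quantile(q, xi, OMEGA, ALPHA)


for q in (0.03, 0.50, 0.97):
    print(q, round(fitted_quantile(q, gender=0, age=3.0), 3))
```

Because the quantile enters through the exponential, a one-unit change in a covariate multiplies every quantile by the same factor $\exp(\beta_j)$ under these assumptions, which is what makes the coefficients interpretable on the original weight scale.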
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
