Article

A Bimodal Extension of the Log-Normal Distribution on the Real Line with an Application to DNA Microarray Data

by Mai F. Alfahad ¹, Mohamed E. Ghitany ¹,*, Ahmad N. Alothman ¹ and Saralees Nadarajah ²
¹ Department of Statistics and Operations Research, Faculty of Science, Kuwait University, Kuwait City 13060, Kuwait
² Department of Mathematics, University of Manchester, Manchester M13 9PL, UK
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(15), 3360; https://doi.org/10.3390/math11153360
Submission received: 3 July 2023 / Revised: 23 July 2023 / Accepted: 25 July 2023 / Published: 31 July 2023
(This article belongs to the Topic Mathematical Modeling)

Abstract

A bimodal double log-normal distribution on the real line is proposed using the random sign mixture transform. Its associated statistical inferences are derived. Its parameters are estimated by the maximum likelihood method. The performance of the estimators and the corresponding confidence intervals is checked by simulation studies. Application of the proposed distribution to a real data set from a DNA microarray is presented.

1. Introduction

A log-normal distribution is perhaps the most popular model for skewed data [1]. However, a log-normal distribution is defined only on the positive real line, whereas many of its application areas involve data spanning the entire real line. One example is the modeling of stock returns: the log-normal distribution is a popular model for stock returns, yet stock returns can be positive or negative. Positive stock returns correspond to profits, while negative stock returns correspond to losses. Other application areas of log-normal distributions involving data spanning the entire real line are discussed later on. Hence, a double log-normal distribution is needed.
We follow the procedure presented in [2] to construct a double log-normal (DLN) distribution. Consider the following two transforms [2]:
(i)
The random sign transform (RST), given by
$$W = (2Y - 1)X;$$
(ii)
The random sign mixture transform (RSMT), given by
$$Z = Y X_1 - (1 - Y) X_2,$$
where $Y$ is a Bernoulli random variable (RV) with parameter $\beta \in (0, 1)$, $\bar\beta = 1 - \beta$, and $X$, $X_1$ and $X_2$ are non-negative RVs independent of $Y$. The probability density function (PDF) of $W$ is
$$f_W(w;\beta,\theta) = \begin{cases} \bar\beta\, f_X(|w|;\theta), & w < 0,\\ \beta\, f_X(w;\theta), & w \ge 0,\end{cases}$$
and the PDF of $Z$ is
$$f_Z(z;\beta,\theta_1,\theta_2) = \begin{cases} \bar\beta\, f_{X_2}(|z|;\theta_2), & z < 0,\\ \beta\, f_{X_1}(z;\theta_1), & z \ge 0,\end{cases}$$
where $f_X(\cdot;\theta)$, $f_{X_1}(\cdot;\theta_1)$ and $f_{X_2}(\cdot;\theta_2)$ are the PDFs of the non-negative RVs $X$, $X_1$ and $X_2$, respectively, with (vector) parameters $\theta$, $\theta_1$ and $\theta_2$, respectively. If $X$ is an RV from a family of distributions $\mathcal{F}_1$, then $W$ is said to have a double $\mathcal{F}_1$ distribution. If $X_1$ and $X_2$ are independent RVs from a family of distributions $\mathcal{F}_2$, then $Z$ is said to have a double $\mathcal{F}_2$ distribution.
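To make the two transforms concrete, here is a minimal Python sketch using only the standard library. The function names `rst_sample` and `rsmt_sample` are hypothetical, and the samplers of the non-negative RVs are passed in as callables; it illustrates the definitions above and is not code from [2].

```python
import random
from typing import Callable

# A sampler returns one draw of a non-negative RV using the supplied generator.
Sampler = Callable[[random.Random], float]


def rst_sample(beta: float, draw_x: Sampler, rng: random.Random) -> float:
    """Random sign transform: W = (2Y - 1) X, with Y ~ Bernoulli(beta)."""
    y = 1 if rng.random() < beta else 0
    return (2 * y - 1) * draw_x(rng)


def rsmt_sample(beta: float, draw_x1: Sampler, draw_x2: Sampler,
                rng: random.Random) -> float:
    """Random sign mixture transform: Z = Y X1 - (1 - Y) X2."""
    y = 1 if rng.random() < beta else 0
    x1, x2 = draw_x1(rng), draw_x2(rng)
    return y * x1 - (1 - y) * x2


if __name__ == "__main__":
    rng = random.Random(2023)
    # Illustration only: X1 ~ Exp(rate 1) on the right, X2 ~ Exp(rate 2) on the left.
    zs = [rsmt_sample(0.3, lambda r: r.expovariate(1.0),
                      lambda r: r.expovariate(2.0), rng) for _ in range(10_000)]
    print(sum(z > 0 for z in zs) / len(zs))  # fraction of positive draws, close to beta
```

The RST reuses one RV $X$ on both sides, so the two halves differ only by the weights $\beta$ and $\bar\beta$; the RSMT lets each side have its own distribution, which is what the DLN construction below relies on.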
There are many double continuous distributions on the real line. Note, however, that the words ‘double’ or ‘reflection’ are sometimes used instead to denote the distribution of the absolute value of a random variable. Some double continuous distributions based on the RST are:
  • Double exponential distribution (Laplace) [3].
  • Double generalized gamma distribution [4].
  • Double Weibull distribution [5].
  • Double gamma distribution [6].
  • Double generalized Pareto distribution [7].
  • Double Lomax distribution [8].
  • Double Lindley distribution [9].
Some double continuous distributions based on the RSMT are:
  • Double half-normal distribution [10,11].
  • Double exponential distribution [12].
  • Double inverse gamma distribution [13].
  • Double Gompertz distribution [14].
  • Double Pareto II distribution [15].
  • Double inverse Gaussian distribution [16].
We construct the DLN distribution using the RSMT, i.e., the distribution of $Z$ when $X_1$ and $X_2$ independently follow log-normal distributions.
The remainder of this paper is organized as follows. In Section 2, the statistical properties of the DLN distribution are presented. The maximum likelihood estimates (MLEs) of the parameters and their asymptotic distributions are studied in Section 3. Simulations to check the finite sample performance of the estimators of the parameters and the corresponding confidence intervals are presented in Section 4. An application of the proposed double distribution to a real data set from a DNA microarray is presented in Section 5. Finally, the conclusions and comments are stated in Section 6.

2. Statistical Properties

We present the statistical properties of the DLN distribution in this section.

2.1. Probability Density Function

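The PDF of the DLN distribution is
$$f_Z(z) = \begin{cases} \bar\beta\, f_{X_2}(|z|;\mu_2,\sigma_2), & z < 0,\\ \beta\, f_{X_1}(z;\mu_1,\sigma_1), & z \ge 0,\end{cases}$$
where $0 < \beta < 1$, $-\infty < \mu_1, \mu_2 < \infty$, $\sigma_1, \sigma_2 > 0$, and
$$f_{X_j}(x;\mu_j,\sigma_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j x}\exp\left\{-\frac{(\ln x - \mu_j)^2}{2\sigma_j^2}\right\}, \quad x > 0,\; j = 1, 2,$$
are the PDFs of the log-normal (LN) distributions. We write $Z \sim \mathrm{DLN}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$.
Figure 1 shows the bimodality of the PDF of the DLN distribution for selected parameter values.
Since $f_Z(z) \to 0$ as $z \to 0$, the DLN distribution has two modes, one on each side of the origin, given by
$$\mathrm{Mode}(Z) = -\mathrm{Mode}(X_2) \quad \text{and} \quad \mathrm{Mode}(X_1),$$
where
$$\mathrm{Mode}(X_j) = e^{\mu_j - \sigma_j^2}, \quad j = 1, 2,$$
are the modes of the LN distributions.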
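As a quick numerical check of the PDF and the two modes, here is a minimal standard-library sketch; `lognormal_pdf`, `dln_pdf` and `dln_modes` are hypothetical helper names, and $\bar\beta$ is coded as `1 - beta`.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """LN(mu, sigma) density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    t = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * t * t) / (math.sqrt(2 * math.pi) * sigma * x)


def dln_pdf(z: float, beta: float, mu1: float, sigma1: float,
            mu2: float, sigma2: float) -> float:
    """DLN density: (1 - beta) f_X2(|z|) for z < 0, beta f_X1(z) for z >= 0."""
    if z < 0:
        return (1 - beta) * lognormal_pdf(-z, mu2, sigma2)
    return beta * lognormal_pdf(z, mu1, sigma1)


def dln_modes(mu1: float, sigma1: float, mu2: float, sigma2: float):
    """Return (-Mode(X2), Mode(X1)), where Mode(X_j) = exp(mu_j - sigma_j^2)."""
    return -math.exp(mu2 - sigma2 ** 2), math.exp(mu1 - sigma1 ** 2)


# First parameter set of Figure 1: (beta, mu1, sigma1, mu2, sigma2).
theta = (0.3, -0.5, 1.0, 0.5, 1.0)
left, right = dln_modes(*theta[1:])
print(left, right)  # approximately -0.6065 and 0.2231
```

Note that the modes do not depend on $\beta$: the weights $\beta$ and $\bar\beta$ only rescale each half, so they change the heights of the two peaks but not their locations.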

2.2. Cumulative Distribution Function

The cumulative distribution function (CDF) of the DLN distribution is
$$F_Z(z) = P(Z \le z) = \begin{cases} \bar\beta\,\bigl[1 - F_{X_2}(|z|;\mu_2,\sigma_2)\bigr], & z < 0,\\ \bar\beta + \beta\, F_{X_1}(z;\mu_1,\sigma_1), & z \ge 0,\end{cases}$$
where
$$F_{X_j}(x;\mu_j,\sigma_j) = P(X_j \le x) = \Phi\left(\frac{\ln x - \mu_j}{\sigma_j}\right), \quad x > 0,\; j = 1, 2,$$
are the CDFs of the LN distributions and
$$\Phi(a) = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt, \quad -\infty < a < \infty,$$
is the CDF of the standard normal distribution.
Figure 2 shows the CDF of the DLN distribution for selected parameter values. We can observe that $F_Z(0) = \bar\beta$, and hence $F_Z(0)$ decreases as $\beta$ increases.
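A minimal sketch of the DLN CDF, assuming only Python's `math` module: $\Phi$ is evaluated through the identity $\Phi(a) = \tfrac12\{1 + \mathrm{erf}(a/\sqrt2)\}$. The helper names are hypothetical.

```python
import math


def std_normal_cdf(a: float) -> float:
    """Phi(a) via the error function: Phi(a) = (1 + erf(a / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """F_X(x) = Phi((ln x - mu) / sigma) for x > 0, and 0 otherwise."""
    if x <= 0:
        return 0.0
    return std_normal_cdf((math.log(x) - mu) / sigma)


def dln_cdf(z: float, beta: float, mu1: float, sigma1: float,
            mu2: float, sigma2: float) -> float:
    """(1 - beta)[1 - F_X2(|z|)] for z < 0; (1 - beta) + beta F_X1(z) for z >= 0."""
    if z < 0:
        return (1 - beta) * (1.0 - lognormal_cdf(-z, mu2, sigma2))
    return (1 - beta) + beta * lognormal_cdf(z, mu1, sigma1)


theta = (0.3, -0.5, 1.0, 0.5, 1.0)
print(dln_cdf(0.0, *theta))  # F_Z(0) = 1 - beta
```

Both branches agree at the origin (each equals $\bar\beta$), so the CDF is continuous there, which the sketch reproduces.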

2.3. Hazard Rate Function

The survival function (SF) of the DLN distribution is
$$S_Z(z) = P(Z > z) = \begin{cases} 1 - \bar\beta\, S_{X_2}(|z|;\mu_2,\sigma_2), & z < 0,\\ \beta\, S_{X_1}(z;\mu_1,\sigma_1), & z \ge 0,\end{cases}$$
where
$$S_{X_j}(x;\mu_j,\sigma_j) = P(X_j > x) = 1 - \Phi\left(\frac{\ln x - \mu_j}{\sigma_j}\right), \quad x > 0,\; j = 1, 2,$$
are the SFs of the LN distributions.
The hazard rate function (HRF) of the DLN distribution is
$$h_Z(z) = \frac{f_Z(z)}{S_Z(z)} = \begin{cases} \dfrac{\bar\beta\, f_{X_2}(|z|;\mu_2,\sigma_2)}{1 - \bar\beta\, S_{X_2}(|z|;\mu_2,\sigma_2)}, & z < 0,\\[2ex] \dfrac{f_{X_1}(z;\mu_1,\sigma_1)}{S_{X_1}(z;\mu_1,\sigma_1)}, & z \ge 0.\end{cases}$$
In particular, for $z \ge 0$ the factor $\beta$ cancels and the HRF coincides with that of $X_1$.
Figure 3 shows the HRF of the DLN distribution for selected parameter values. This figure shows that the HRF of the DLN distribution can be bimodal, with one mode on each side of the origin.
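A minimal standard-library sketch of the piecewise HRF. The LN survival function is written with `math.erfc` rather than `1 - Phi(...)` to avoid cancellation in the right tail; helper names are hypothetical, and for extremely large $z$ the ratio would still need log-space evaluation.

```python
import math


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """LN(mu, sigma) density for x > 0."""
    t = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * t * t) / (math.sqrt(2 * math.pi) * sigma * x)


def lognormal_sf(x: float, mu: float, sigma: float) -> float:
    """S_X(x) = 1 - Phi((ln x - mu) / sigma), written with erfc to avoid cancellation."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2.0)))


def dln_hrf(z, beta, mu1, sigma1, mu2, sigma2):
    """DLN hazard rate h_Z(z) = f_Z(z) / S_Z(z), piecewise as in the text."""
    if z < 0:
        x = -z
        return ((1 - beta) * lognormal_pdf(x, mu2, sigma2)
                / (1.0 - (1 - beta) * lognormal_sf(x, mu2, sigma2)))
    if z == 0:
        return 0.0  # f_X1(z) -> 0 as z -> 0 while S_X1(0) = 1
    # For z > 0 the factor beta cancels: h_Z is the hazard rate of X1.
    return lognormal_pdf(z, mu1, sigma1) / lognormal_sf(z, mu1, sigma1)


# First parameter set of Figure 3: (beta, mu1, sigma1, mu2, sigma2).
theta = (0.3, -0.5, 1.0, 0.5, 0.9)
print([round(dln_hrf(z, *theta), 4) for z in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)])
```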

2.4. Moments and Associated Measures

The $r$th raw moment of the DLN distribution is
$$E(Z^r) = \beta\, E(X_1^r) + (-1)^r\, \bar\beta\, E(X_2^r), \quad r = 1, 2, \ldots,$$
where
$$E(X_j^r) = e^{r\mu_j + r^2\sigma_j^2/2}, \quad j = 1, 2,$$
are the $r$th moments of the LN distributions.
In particular, the first four raw moments of $Z$ are
$$\begin{aligned} E(Z) &= \beta e^{\mu_1+\sigma_1^2/2} - \bar\beta e^{\mu_2+\sigma_2^2/2}, & E(Z^2) &= \beta e^{2\mu_1+2\sigma_1^2} + \bar\beta e^{2\mu_2+2\sigma_2^2},\\ E(Z^3) &= \beta e^{3\mu_1+9\sigma_1^2/2} - \bar\beta e^{3\mu_2+9\sigma_2^2/2}, & E(Z^4) &= \beta e^{4\mu_1+8\sigma_1^2} + \bar\beta e^{4\mu_2+8\sigma_2^2}.\end{aligned}$$
The variance, skewness and kurtosis of the DLN distribution can be obtained by substituting these raw moments into the well-known expressions
$$\mathrm{Var}(Z) = E(Z^2) - [E(Z)]^2,$$
$$\mathrm{Skewness}(Z) = \frac{E(Z^3) - 3E(Z^2)E(Z) + 2[E(Z)]^3}{[\mathrm{Var}(Z)]^{3/2}},$$
$$\mathrm{Kurtosis}(Z) = \frac{E(Z^4) - 4E(Z^3)E(Z) + 6E(Z^2)[E(Z)]^2 - 3[E(Z)]^4}{[\mathrm{Var}(Z)]^2}.$$
Figure 4 shows the mean, variance, skewness and kurtosis of the DLN distribution as a function of $\beta$ for selected values of $(\mu_1, \sigma_1, \mu_2, \sigma_2)$. We can observe that the skewness can be negative or positive, i.e., the DLN distribution can be skewed to the left or to the right.
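The moment formulas translate directly into code. A minimal sketch (hypothetical function names, standard library only) that computes the raw moments and then the variance, skewness and kurtosis from the expressions above:

```python
import math


def dln_raw_moment(r: int, beta, mu1, sigma1, mu2, sigma2) -> float:
    """E(Z^r) = beta E(X1^r) + (-1)^r (1 - beta) E(X2^r), E(X_j^r) = exp(r mu_j + r^2 sigma_j^2 / 2)."""
    m1 = math.exp(r * mu1 + r * r * sigma1 ** 2 / 2)
    m2 = math.exp(r * mu2 + r * r * sigma2 ** 2 / 2)
    return beta * m1 + (-1) ** r * (1 - beta) * m2


def dln_shape(beta, mu1, sigma1, mu2, sigma2):
    """Return (mean, variance, skewness, kurtosis) from the first four raw moments."""
    m1, m2, m3, m4 = (dln_raw_moment(r, beta, mu1, sigma1, mu2, sigma2)
                      for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m2 * m1 + 2 * m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4) / var ** 2
    return m1, var, skew, kurt


print(dln_shape(0.3, -0.5, 1.0, 0.5, 1.0))
```

A useful sanity check: in the symmetric case $\beta = 1/2$, $\mu_1 = \mu_2$, $\sigma_1 = \sigma_2 = \sigma$, the odd moments vanish and the kurtosis reduces to $E(Z^4)/[E(Z^2)]^2 = e^{4\sigma^2}$.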

2.5. Harmonic Mean

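The harmonic mean of an RV $V$ is defined as
$$\mathrm{HM}(V) = \frac{1}{E[1/V]},$$
provided $E[1/V]$ exists and is nonzero.
Proposition 1.
The harmonic mean of the RSMT $Z$ is
$$\mathrm{HM}(Z) = \frac{1}{\dfrac{\beta}{\mathrm{HM}(X_1)} - \dfrac{\bar\beta}{\mathrm{HM}(X_2)}}.$$
Proof. 
Since, substituting $u = -z$ in the second integral,
$$\frac{1}{\mathrm{HM}(Z)} = E[1/Z] = \int_0^\infty \frac{1}{z}\,\beta f_{X_1}(z)\,dz + \int_{-\infty}^0 \frac{1}{z}\,\bar\beta f_{X_2}(|z|)\,dz = \beta E[1/X_1] - \bar\beta E[1/X_2] = \frac{\beta}{\mathrm{HM}(X_1)} - \frac{\bar\beta}{\mathrm{HM}(X_2)},$$
the proposition follows. □
Corollary 1.
The harmonic mean of the DLN distribution is
$$\mathrm{HM}(Z) = \frac{1}{\dfrac{\beta}{\mathrm{HM}(X_1)} - \dfrac{\bar\beta}{\mathrm{HM}(X_2)}},$$
where
$$\mathrm{HM}(X_j) = e^{\mu_j - \sigma_j^2/2}, \quad j = 1, 2,$$
are the harmonic means of the LN distributions.
Figure 5 shows the harmonic mean of the DLN distribution as a function of $\beta$ for selected parameter values.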
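A minimal sketch of Corollary 1 (hypothetical function name, standard library only). It computes $E[1/Z]$ first and refuses to divide when that expectation is exactly zero, since the harmonic mean is then undefined:

```python
import math


def dln_harmonic_mean(beta, mu1, sigma1, mu2, sigma2):
    """HM(Z) = 1 / (beta / HM(X1) - (1 - beta) / HM(X2)), HM(X_j) = exp(mu_j - sigma_j^2 / 2)."""
    inv_hm1 = math.exp(-mu1 + sigma1 ** 2 / 2)  # E[1/X1] = 1 / HM(X1)
    inv_hm2 = math.exp(-mu2 + sigma2 ** 2 / 2)  # E[1/X2] = 1 / HM(X2)
    e_inv_z = beta * inv_hm1 - (1 - beta) * inv_hm2  # E[1/Z]
    if e_inv_z == 0.0:
        raise ValueError("E[1/Z] = 0, so the harmonic mean is undefined")
    return 1.0 / e_inv_z


print(dln_harmonic_mean(0.3, -0.5, 1.0, 0.5, 1.0))
```

Because $E[1/Z]$ is a difference of two positive terms, $\mathrm{HM}(Z)$ is negative whenever the negative half dominates, i.e., when $\beta/\mathrm{HM}(X_1) < \bar\beta/\mathrm{HM}(X_2)$.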

2.6. Entropies

Entropies are measures of a system’s variation, instability or unpredictability. For an RV $V$ with PDF $f_V(v)$, the following are two well-known entropies:
1.
Tsallis entropy [17]:
$$T_\alpha(V) = \frac{1}{\alpha - 1}\left\{1 - E\left[f_V^{\alpha-1}(V)\right]\right\}, \quad \alpha > 0,\ \alpha \ne 1.$$
2.
Shannon entropy [18]:
$$H(V) = -E[\ln f_V(V)] = \lim_{\alpha \to 1} T_\alpha(V).$$
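Proposition 2.
The Tsallis entropy of the RSMT $Z$ is
$$T_\alpha(Z) = T_\alpha(Y) + \beta^\alpha T_\alpha(X_1) + \bar\beta^\alpha T_\alpha(X_2)$$
for $\alpha > 0$, $\alpha \ne 1$, where
$$T_\alpha(Y) = \frac{1 - \beta^\alpha - \bar\beta^\alpha}{\alpha - 1}.$$
Proof. 
See [16]. □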
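Since the proof is only cited, here is a short derivation sketch of Proposition 2. It is our reconstruction from the piecewise PDF of $Z$, not necessarily the argument in [16]. It uses the identity $1 - (\alpha - 1)T_\alpha(V) = E[f_V^{\alpha-1}(V)] = \int f_V^{\alpha}(v)\,dv$, which follows from the definition of the Tsallis entropy, together with the substitution $u = -z$ on the negative half-line:

$$
\begin{aligned}
\int_{-\infty}^{\infty} f_Z^{\alpha}(z)\,dz
&= \bar\beta^{\alpha}\int_{-\infty}^{0} f_{X_2}^{\alpha}(|z|)\,dz + \beta^{\alpha}\int_{0}^{\infty} f_{X_1}^{\alpha}(z)\,dz\\
&= \bar\beta^{\alpha}\bigl[1-(\alpha-1)T_\alpha(X_2)\bigr] + \beta^{\alpha}\bigl[1-(\alpha-1)T_\alpha(X_1)\bigr],\\
T_\alpha(Z) &= \frac{1}{\alpha-1}\Bigl[1-\int_{-\infty}^{\infty} f_Z^{\alpha}(z)\,dz\Bigr]
= \frac{1-\beta^{\alpha}-\bar\beta^{\alpha}}{\alpha-1} + \beta^{\alpha}T_\alpha(X_1) + \bar\beta^{\alpha}T_\alpha(X_2).
\end{aligned}
$$

The first term on the right is exactly $T_\alpha(Y)$ for $Y \sim \mathrm{Bernoulli}(\beta)$.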
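Corollary 2.
The Shannon entropy of the RSMT $Z$ is
$$H(Z) = \lim_{\alpha \to 1} T_\alpha(Z) = H(Y) + \beta H(X_1) + \bar\beta H(X_2),$$
where
$$H(Y) = \lim_{\alpha \to 1} T_\alpha(Y) = -\beta \ln\beta - \bar\beta \ln\bar\beta.$$
Proposition 3.
The Tsallis entropy of the LN distribution with parameters $(\mu, \sigma)$ is
$$T_\alpha(X) = \frac{1}{\alpha - 1}\left[1 - \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\alpha-1} \frac{1}{\sqrt{\alpha}} \exp\left\{\mu(1-\alpha) + \frac{\sigma^2}{2\alpha}(1-\alpha)^2\right\}\right]$$
for $\alpha > 0$, $\alpha \ne 1$.
Proof. 
Since, with the substitution $y = \ln x$,
$$\begin{aligned}1 - (\alpha-1)T_\alpha(X) &= \int_0^\infty f_X^\alpha(x)\,dx = \int_0^\infty \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^\alpha \frac{1}{x^\alpha}\exp\left\{-\frac{\alpha}{2\sigma^2}(\ln x - \mu)^2\right\}dx\\ &= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^\alpha \int_{-\infty}^\infty e^{(1-\alpha)y}\exp\left\{-\frac{1}{2(\sigma/\sqrt{\alpha})^2}(y-\mu)^2\right\}dy\\ &= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\alpha-1}\frac{1}{\sqrt{\alpha}}\exp\left\{\mu(1-\alpha) + \frac{\sigma^2}{2\alpha}(1-\alpha)^2\right\},\end{aligned}$$
the proposition follows. □
Corollary 3.
The Shannon entropy of the LN distribution with parameters $(\mu, \sigma)$ is
$$H(X) = \lim_{\alpha \to 1} T_\alpha(X) = \frac{1}{2} + \mu + \ln\left(\sqrt{2\pi}\,\sigma\right).$$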
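Proposition 4.
The Tsallis entropy of $Z \sim \mathrm{DLN}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is
$$T_\alpha(Z) = T_\alpha(Y) + \beta^\alpha T_\alpha(X_1) + \bar\beta^\alpha T_\alpha(X_2)$$
for $\alpha > 0$, $\alpha \ne 1$, where
$$T_\alpha(Y) = \frac{1 - \beta^\alpha - \bar\beta^\alpha}{\alpha - 1}$$
and
$$T_\alpha(X_j) = \frac{1}{\alpha - 1}\left[1 - \left(\frac{1}{\sqrt{2\pi}\,\sigma_j}\right)^{\alpha-1}\frac{1}{\sqrt{\alpha}}\exp\left\{\mu_j(1-\alpha) + \frac{\sigma_j^2}{2\alpha}(1-\alpha)^2\right\}\right], \quad j = 1, 2.$$
The proof of Proposition 4 follows directly from Propositions 2 and 3.
Corollary 4.
The Shannon entropy of $Z \sim \mathrm{DLN}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is
$$H(Z) = H(Y) + \beta H(X_1) + \bar\beta H(X_2),$$
where
$$H(Y) = -\beta \ln\beta - \bar\beta \ln\bar\beta$$
and
$$H(X_j) = \frac{1}{2} + \mu_j + \ln\left(\sqrt{2\pi}\,\sigma_j\right), \quad j = 1, 2.$$
Figure 6 shows the Tsallis and Shannon entropies of the DLN distribution as a function of $\beta$ for selected parameter values.
Note that the Tsallis and Shannon entropies can be negative for continuous distributions.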
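A minimal standard-library sketch of Proposition 4 and Corollary 4 (hypothetical function names). Evaluating the Tsallis entropy at $\alpha$ close to 1 gives a quick consistency check against the Shannon entropy, since $H = \lim_{\alpha\to1} T_\alpha$.

```python
import math


def tsallis_lognormal(alpha, mu, sigma):
    """Proposition 3: Tsallis entropy of LN(mu, sigma), for alpha > 0, alpha != 1."""
    # (sqrt(2 pi) sigma)^(1 - alpha) equals (1 / (sqrt(2 pi) sigma))^(alpha - 1).
    integral = ((math.sqrt(2 * math.pi) * sigma) ** (1 - alpha) / math.sqrt(alpha)
                * math.exp(mu * (1 - alpha) + sigma ** 2 * (1 - alpha) ** 2 / (2 * alpha)))
    return (1 - integral) / (alpha - 1)


def shannon_lognormal(mu, sigma):
    """Corollary 3: H(X) = 1/2 + mu + ln(sqrt(2 pi) sigma)."""
    return 0.5 + mu + math.log(math.sqrt(2 * math.pi) * sigma)


def tsallis_dln(alpha, beta, mu1, sigma1, mu2, sigma2):
    """Proposition 4: T(Y) + beta^alpha T(X1) + (1 - beta)^alpha T(X2)."""
    bbar = 1 - beta
    t_y = (1 - beta ** alpha - bbar ** alpha) / (alpha - 1)
    return (t_y + beta ** alpha * tsallis_lognormal(alpha, mu1, sigma1)
            + bbar ** alpha * tsallis_lognormal(alpha, mu2, sigma2))


def shannon_dln(beta, mu1, sigma1, mu2, sigma2):
    """Corollary 4: H(Y) + beta H(X1) + (1 - beta) H(X2)."""
    bbar = 1 - beta
    h_y = -beta * math.log(beta) - bbar * math.log(bbar)
    return h_y + beta * shannon_lognormal(mu1, sigma1) + bbar * shannon_lognormal(mu2, sigma2)


theta = (0.3, -3.0, 1.0, 3.0, 1.0)
print(tsallis_dln(2.0, *theta), shannon_dln(*theta))
```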

3. Maximum Likelihood Estimation

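In this section, the MLEs of the parameters of the DLN distribution and their asymptotic distribution are derived.
Let $z_1, z_2, \ldots, z_n$ be a random sample from the $\mathrm{DLN}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$ distribution. The log-likelihood function is
$$\ln L(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2) = \sum_{i=1}^n \ln\bigl[\beta f_{X_1}(z_i;\mu_1,\sigma_1)\bigr]\,\mathbb{1}(z_i > 0) + \sum_{i=1}^n \ln\bigl[\bar\beta f_{X_2}(|z_i|;\mu_2,\sigma_2)\bigr]\,\mathbb{1}(z_i < 0),$$
where $\mathbb{1}(A)$ denotes the indicator function of the event $A$. The log-likelihood separates into a term involving $\beta$ only and two ordinary log-normal log-likelihoods, so the MLEs of the parameters $\beta, \mu_1, \sigma_1, \mu_2, \sigma_2$ are available in closed form:
$$\hat\beta = \frac{n_1}{n}, \quad \hat\mu_1 = \frac{1}{n_1}\sum_{i=1}^n \ln(z_i)\,\mathbb{1}(z_i > 0), \quad \hat\sigma_1 = \sqrt{\frac{1}{n_1}\sum_{i=1}^n \bigl(\ln z_i - \hat\mu_1\bigr)^2\,\mathbb{1}(z_i > 0)},$$
$$\hat\mu_2 = \frac{1}{n_2}\sum_{i=1}^n \ln|z_i|\,\mathbb{1}(z_i < 0), \quad \hat\sigma_2 = \sqrt{\frac{1}{n_2}\sum_{i=1}^n \bigl(\ln|z_i| - \hat\mu_2\bigr)^2\,\mathbb{1}(z_i < 0)},$$
where
$$n_1 = \sum_{i=1}^n \mathbb{1}(z_i > 0), \quad n_2 = \sum_{i=1}^n \mathbb{1}(z_i < 0), \quad n_1 + n_2 = n.$$
The Fisher information matrix about $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$ is the block-diagonal matrix
$$I(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2) = \begin{pmatrix} I_Y(\beta) & 0 & 0\\ 0 & \beta\, I_{X_1}(\mu_1, \sigma_1) & 0\\ 0 & 0 & \bar\beta\, I_{X_2}(\mu_2, \sigma_2)\end{pmatrix},$$
where $I_Y(\beta) = 1/(\beta\bar\beta)$ is the Fisher information about $\beta$ and $I_{X_j}(\mu_j, \sigma_j) = \mathrm{diag}(1/\sigma_j^2,\, 2/\sigma_j^2)$, $j = 1, 2$, is the Fisher information matrix about $(\mu_j, \sigma_j)$.
Moreover, the asymptotic distribution of the MLEs as $n \to \infty$ is
$$\sqrt{n}\begin{pmatrix}\hat\beta - \beta\\ \hat\mu_1 - \mu_1\\ \hat\sigma_1 - \sigma_1\\ \hat\mu_2 - \mu_2\\ \hat\sigma_2 - \sigma_2\end{pmatrix} \xrightarrow{d} \mathrm{MVN}\bigl(\mathbf{0},\, I^{-1}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)\bigr),$$
where $\xrightarrow{d}$ denotes convergence in distribution, MVN stands for the multivariate normal distribution and
$$I^{-1}(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2) = \mathrm{diag}\left(\beta\bar\beta,\ \frac{\sigma_1^2}{\beta},\ \frac{\sigma_1^2}{2\beta},\ \frac{\sigma_2^2}{\bar\beta},\ \frac{\sigma_2^2}{2\bar\beta}\right).$$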
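The closed-form MLEs and the diagonal of $I^{-1}$ give a complete fitting routine. A minimal sketch, assuming plug-in standard errors $\sqrt{[I^{-1}(\hat\theta)]_{kk}/n}$ (a standard choice; the text does not spell out how its S.E.s are computed). The name `dln_mle` is hypothetical, and exact zeros are rejected because $P(Z = 0) = 0$ under the model.

```python
import math


def dln_mle(z):
    """Closed-form DLN MLEs and plug-in asymptotic standard errors."""
    log_pos = [math.log(v) for v in z if v > 0]   # ln z_i for z_i > 0
    log_neg = [math.log(-v) for v in z if v < 0]  # ln |z_i| for z_i < 0
    n1, n2 = len(log_pos), len(log_neg)
    if n1 + n2 != len(z):
        raise ValueError("exact zeros are not allowed: P(Z = 0) = 0 under the DLN model")
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one positive and one negative observation")
    n = n1 + n2
    beta = n1 / n
    mu1 = sum(log_pos) / n1
    mu2 = sum(log_neg) / n2
    sigma1 = math.sqrt(sum((v - mu1) ** 2 for v in log_pos) / n1)
    sigma2 = math.sqrt(sum((v - mu2) ** 2 for v in log_neg) / n2)
    est = {"beta": beta, "mu1": mu1, "sigma1": sigma1, "mu2": mu2, "sigma2": sigma2}
    # Diagonal of I^{-1} at the MLEs; each asymptotic variance is this value divided by n.
    avar = {"beta": beta * (1 - beta),
            "mu1": sigma1 ** 2 / beta, "sigma1": sigma1 ** 2 / (2 * beta),
            "mu2": sigma2 ** 2 / (1 - beta), "sigma2": sigma2 ** 2 / (2 * (1 - beta))}
    se = {k: math.sqrt(v / n) for k, v in avar.items()}
    return est, se


est, se = dln_mle([1.0, math.e, -1.0, -math.e ** 2])
print(est, se)
```

Since $n\hat\beta = n_1$, the S.E. of $\hat\mu_1$ simplifies to $\hat\sigma_1/\sqrt{n_1}$, i.e., the usual normal-theory standard error computed on the positive observations alone.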

4. Simulations

This section details simulations to check the finite sample performance of the MLEs of the parameters of the DLN distribution. The performance is evaluated in terms of biases, mean squared errors of the MLEs and coverage probabilities of the corresponding 95% confidence intervals.
The simulation was repeated $M = 10{,}000$ times. In each of the $M$ repetitions, a random sample of size $n = 50, 100, \ldots, 500$ was drawn from the DLN distribution with selected parameter values $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2) = (0.3, -2, 1, -1, 2)$, $(0.5, 0, 1, 1, 2)$, $(0.8, 2, 1, -1, 2)$ and $(0.547, -2.812, 1.016, -2.224, 0.764)$, using the following algorithm:
  • Generate $Y_i \sim \mathrm{Bernoulli}(\beta)$, $i = 1, 2, \ldots, n$;
  • Generate $X_{1,i} \sim \mathrm{LN}(\mu_1, \sigma_1)$, $i = 1, 2, \ldots, n$;
  • Generate $X_{2,i} \sim \mathrm{LN}(\mu_2, \sigma_2)$, $i = 1, 2, \ldots, n$;
  • Set $Z_i = Y_i X_{1,i} - (1 - Y_i) X_{2,i}$, $i = 1, 2, \ldots, n$.
The parameter values $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2) = (0.547, -2.812, 1.016, -2.224, 0.764)$ are those estimated in the real data application in Section 5.
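The measures examined in this simulation study, where $\hat\theta_j$ denotes the MLE of $\theta$ in the $j$th repetition, are:
  • The bias of the MLEs:
    $$\mathrm{Bias}(\hat\theta) = \frac{1}{M}\sum_{j=1}^{M} \bigl(\hat\theta_j - \theta\bigr), \quad \theta = \beta, \mu_1, \sigma_1, \mu_2, \sigma_2.$$
  • The mean squared error (MSE) of the MLEs:
    $$\mathrm{MSE}(\hat\theta) = \frac{1}{M}\sum_{j=1}^{M} \bigl(\hat\theta_j - \theta\bigr)^2, \quad \theta = \beta, \mu_1, \sigma_1, \mu_2, \sigma_2.$$
  • The coverage probability (CP) of the 95% confidence interval of each parameter:
    $$\mathrm{CP}(\theta) = \frac{1}{M}\sum_{j=1}^{M} \mathbb{1}\Bigl(\theta \in \bigl[\hat\theta_j - 1.96\,\mathrm{S.E.}(\hat\theta_j),\ \hat\theta_j + 1.96\,\mathrm{S.E.}(\hat\theta_j)\bigr]\Bigr), \quad \theta = \beta, \mu_1, \sigma_1, \mu_2, \sigma_2.$$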
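A reduced-scale sketch of this simulation study, under stated assumptions: the closed-form MLEs of Section 3 with plug-in standard errors $\sqrt{[I^{-1}(\hat\theta)]_{kk}/n}$ (the text does not say how its S.E.s were computed), far fewer repetitions than $M = 10{,}000$ for speed, and redrawing of samples with fewer than two positive or two negative values, which the text does not describe. It uses `random.lognormvariate(mu, sigma)`, whose arguments are the mean and standard deviation of the underlying normal, i.e., $\mu_j$ and $\sigma_j$ here. Function names are hypothetical.

```python
import math
import random

NAMES = ("beta", "mu1", "sigma1", "mu2", "sigma2")


def fit(z):
    """Closed-form DLN MLEs and plug-in asymptotic S.E.s, as tuples in NAMES order."""
    log_pos = [math.log(v) for v in z if v > 0]
    log_neg = [math.log(-v) for v in z if v < 0]
    n, n1, n2 = len(z), len(log_pos), len(log_neg)
    b = n1 / n
    m1, m2 = sum(log_pos) / n1, sum(log_neg) / n2
    s1 = math.sqrt(sum((v - m1) ** 2 for v in log_pos) / n1)
    s2 = math.sqrt(sum((v - m2) ** 2 for v in log_neg) / n2)
    avar = (b * (1 - b), s1 ** 2 / b, s1 ** 2 / (2 * b),
            s2 ** 2 / (1 - b), s2 ** 2 / (2 * (1 - b)))
    return (b, m1, s1, m2, s2), tuple(math.sqrt(v / n) for v in avar)


def simulate(theta, n, M, seed=0):
    """Bias, MSE and 95% CP of the MLEs over M samples of size n from DLN(theta)."""
    beta, mu1, s1, mu2, s2 = theta
    rng = random.Random(seed)
    bias, mse, cover = [0.0] * 5, [0.0] * 5, [0] * 5
    done = 0
    while done < M:
        z = []
        for _ in range(n):  # Y, X1, X2, then Z = Y X1 - (1 - Y) X2
            y = 1 if rng.random() < beta else 0
            x1 = rng.lognormvariate(mu1, s1)
            x2 = rng.lognormvariate(mu2, s2)
            z.append(y * x1 - (1 - y) * x2)
        if sum(v > 0 for v in z) < 2 or sum(v < 0 for v in z) < 2:
            continue  # assumption: redraw degenerate samples
        est, se = fit(z)
        for k in range(5):
            err = est[k] - theta[k]
            bias[k] += err
            mse[k] += err * err
            cover[k] += int(abs(err) <= 1.96 * se[k])  # theta inside the Wald interval
        done += 1
    return {NAMES[k]: (bias[k] / M, mse[k] / M, cover[k] / M) for k in range(5)}


for name, (b, m, cp) in simulate((0.3, -2.0, 1.0, -1.0, 2.0), n=100, M=500).items():
    print(f"{name:>6}: bias={b:+.4f}  MSE={m:.4f}  CP={cp:.3f}")
```

Looping `simulate` over $n = 50, 100, \ldots, 500$ and the four parameter vectors would reproduce the design of the study, at a correspondingly higher run time.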
The results of the simulation study are reported in Figure 7, Figure 8 and Figure 9.
  • Figure 7 shows that the absolute biases of the MLEs are small and approach zero as n increases.
  • Figure 8 shows that the MSEs of the MLEs are small and decrease as n increases.
  • Figure 9 shows that the coverage probabilities of the 95% confidence intervals are close to the nominal level.
Figure 7. Bias of the MLEs of the parameters of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −2, 1, −1, 2), (0.5, 0, 1, 1, 2) (- - -), (0.8, 2, 1, −1, 2) (.....), (0.547, −2.812, 1.016, −2.224, 0.764) (.).
Figure 8. MSE of the MLEs of the parameters of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −2, 1, −1, 2), (0.5, 0, 1, 1, 2) (- - -), (0.8, 2, 1, −1, 2) (.....), (0.547, −2.812, 1.016, −2.224, 0.764) (.).
Figure 9. CP of the 95% confidence intervals of the parameters of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −2, 1, −1, 2), (0.5, 0, 1, 1, 2) (- - -), (0.8, 2, 1, −1, 2) (.....), (0.547, −2.812, 1.016, −2.224, 0.764) (.).
These results indicate that the MLEs of the DLN parameters perform well for both point and interval estimation.

5. Application

In this section, we apply the proposed DLN distribution to a real data set from a DNA microarray reported in [19]. According to Wikipedia, “A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome”. The data labelled as “SID 377353, ESTs [5’:, 3’:AA055048]” consist of the following 118 observations: 0.029, 0.062, 0.011, 0.009, 0.065, −0.128, 0.133, 0.116, 0.184, 0.111, −0.066, −0.049, 0.05, 0.137, 0.162, 0.173, 0.033, 0.107, 0.11, 0.147, 0.118, 0.172, 0.284, −0.137, 0.038, −0.145, −0.181, −0.155, 0.198, 0.024, 0.079, −0.252, 0.062, 0.097, 0.032, 0.026, 0.195, 0.019, 0.138, −0.3, −0.105, −0.11, −0.168, −0.173, −0.15, 0.078, 0.113, −0.047, 0.024, 0.001, −0.075, 0.014, 0.058, −0.083, −0.339, −0.177, −0.073, −0.044, −0.106, −0.159, −0.101, −0.074, −0.126, −0.131, −0.22, −0.184, −0.105, 0.173, 0.151, 0.064, −0.007, −0.005, −0.189, −0.219, −0.301, −0.212, −0.088, 0.157, 0.042, 0.184, 0.114, 0.102, 0.119, −0.064, −0.075, 0.073, 0.038, 0.017, −0.134, −0.118, −0.097, 0.059, 0.025, −0.102, −0.096, −0.035, 0.057, −0.055, 0.015, −0.23, −0.115, 0.255, 0.034, 0.078, 0.129, 0.081, 0.032, 0.047, −0.145, 0.012, −0.224, 0.074, −0.06, −0.137, 0.034, 0.009, −0.139, −0.141.
Figure 10 shows the histogram of the data, which indicates bimodality around the origin.
For the sake of comparing the bimodal DLN distribution with other bimodal distributions, we consider the double inverse Gaussian (DIG) distribution proposed in [16]. The PDF of the DIG distribution is
$$f_Z(z) = \begin{cases} \bar\beta\, f_{X_2}(|z|;\nu_2,\lambda_2), & z < 0,\\ \beta\, f_{X_1}(z;\nu_1,\lambda_1), & z \ge 0,\end{cases}$$
where
$$f_{X_j}(x;\nu_j,\lambda_j) = \sqrt{\frac{\lambda_j}{2\pi}}\; x^{-3/2}\exp\left\{-\frac{\lambda_j (x - \nu_j)^2}{2\nu_j^2\, x}\right\}, \quad x > 0,\ \nu_j, \lambda_j > 0,\ j = 1, 2,$$
are the PDFs of inverse Gaussian distributions.
Table 1 gives the MLEs, their standard errors (S.E.s), the maximized log-likelihoods ($\ln\hat L$) and the Kolmogorov–Smirnov (KS), Anderson–Darling (AD) and Cramér–von Mises (CVM) goodness-of-fit statistics, with p-values, for the fitted DIG and DLN distributions. This table shows that the MLE of $\beta$ and its S.E. are the same for both fitted distributions, since in the RSMT the Bernoulli parameter $\beta$ is estimated separately from the parameters of $X_1$ and $X_2$. In addition, the MLEs of $\mu_1$ and $\mu_2$ in the fitted DLN distribution are both negative.
Table 1 also shows that all three goodness-of-fit statistics are much smaller for the fitted DLN distribution than for the fitted DIG distribution, and that the DLN distribution attains the larger maximized log-likelihood (64.829 versus 39.249). At the 5% significance level, all three tests reject the DIG distribution (p-values 0.046, 0.020 and 0.030) but do not reject the DLN distribution (p-values 0.709, 0.446 and 0.570). This conclusion is supported by the diagnostic plots in Figure 11 and Figure 12. In these figures, (i) the PDF and CDF plots indicate, informally, that the fitted DIG distribution may not be suitable for the given data, whereas the fitted DLN distribution may be; (ii) the quantile–quantile (Q–Q) plots show that neither the fitted DIG nor the fitted DLN distribution adequately describes the tails of the data; (iii) the probability–probability (P–P) plots show that the fitted DIG distribution describes the center of the distribution inappropriately, whereas the fitted DLN distribution describes it appropriately.
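To make the DLN fit reproducible, here is a minimal standard-library sketch that applies the closed-form MLEs of Section 3 to the 118 observations listed above and computes the one-sample KS statistic of the fitted DLN CDF. The data are transcribed from the text; function names are hypothetical. If the transcription and the sketch match the authors' computations, the printed estimates should agree with the DLN row of Table 1 up to rounding, but this has not been independently confirmed, and the sketch computes only the KS statistic, not its p-value or the AD and CVM statistics.

```python
import math

# Section 5 microarray data (118 observations), transcribed from the text above.
DATA = [
    0.029, 0.062, 0.011, 0.009, 0.065, -0.128, 0.133, 0.116, 0.184, 0.111,
    -0.066, -0.049, 0.05, 0.137, 0.162, 0.173, 0.033, 0.107, 0.11, 0.147,
    0.118, 0.172, 0.284, -0.137, 0.038, -0.145, -0.181, -0.155, 0.198, 0.024,
    0.079, -0.252, 0.062, 0.097, 0.032, 0.026, 0.195, 0.019, 0.138, -0.3,
    -0.105, -0.11, -0.168, -0.173, -0.15, 0.078, 0.113, -0.047, 0.024, 0.001,
    -0.075, 0.014, 0.058, -0.083, -0.339, -0.177, -0.073, -0.044, -0.106, -0.159,
    -0.101, -0.074, -0.126, -0.131, -0.22, -0.184, -0.105, 0.173, 0.151, 0.064,
    -0.007, -0.005, -0.189, -0.219, -0.301, -0.212, -0.088, 0.157, 0.042, 0.184,
    0.114, 0.102, 0.119, -0.064, -0.075, 0.073, 0.038, 0.017, -0.134, -0.118,
    -0.097, 0.059, 0.025, -0.102, -0.096, -0.035, 0.057, -0.055, 0.015, -0.23,
    -0.115, 0.255, 0.034, 0.078, 0.129, 0.081, 0.032, 0.047, -0.145, 0.012,
    -0.224, 0.074, -0.06, -0.137, 0.034, 0.009, -0.139, -0.141,
]


def fit_dln(z):
    """Closed-form MLEs (beta, mu1, sigma1, mu2, sigma2) from Section 3; assumes no zeros."""
    log_pos = [math.log(v) for v in z if v > 0]   # ln z_i, z_i > 0
    log_neg = [math.log(-v) for v in z if v < 0]  # ln |z_i|, z_i < 0
    n1, n2 = len(log_pos), len(log_neg)
    mu1, mu2 = sum(log_pos) / n1, sum(log_neg) / n2
    s1 = math.sqrt(sum((v - mu1) ** 2 for v in log_pos) / n1)
    s2 = math.sqrt(sum((v - mu2) ** 2 for v in log_neg) / n2)
    return n1 / (n1 + n2), mu1, s1, mu2, s2


def std_normal_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def dln_cdf(z, beta, mu1, s1, mu2, s2):
    """DLN CDF from Section 2.2."""
    if z < 0:
        return (1 - beta) * (1.0 - std_normal_cdf((math.log(-z) - mu2) / s2))
    if z == 0:
        return 1 - beta
    return (1 - beta) + beta * std_normal_cdf((math.log(z) - mu1) / s1)


def ks_statistic(z, cdf):
    """KS distance sup_x |F_n(x) - F(x)|, checked just before and at each order statistic."""
    xs = sorted(z)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


theta_hat = fit_dln(DATA)
ks = ks_statistic(DATA, lambda x: dln_cdf(x, *theta_hat))
print("MLEs:", [round(t, 3) for t in theta_hat])
print("KS statistic:", round(ks, 3))
```

With 64 positive and 54 negative observations, $\hat\beta = 64/118 \approx 0.542$, matching Table 1; and because every observation lies in $(-1, 1)$, both $\hat\mu_1$ and $\hat\mu_2$ are necessarily negative, consistent with the remark above.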

6. Conclusions and Comments

We have proposed a bimodal distribution on the real line, referred to as the double log-normal distribution. We have derived its statistical properties, including the probability density, cumulative distribution and hazard rate functions, the moments and associated measures and harmonic mean, as well as Tsallis and Shannon entropies. Additionally, maximum likelihood estimates of the parameters and their asymptotic distribution are provided. Simulation studies showed that the maximum likelihood estimation performed well in terms of the bias, mean squared error and coverage probability of confidence intervals. Application to a DNA microarray data set showed that the proposed distribution is flexible and competitive for modeling bimodal data around the origin.
Instead of the log-normal distribution, one can consider the length-biased log-normal distribution developed in [20]. It would be interesting to formulate a double length-biased log-normal distribution.

Author Contributions

Conceptualization, M.E.G.; Methodology, M.F.A. and M.E.G.; Software, M.F.A. and A.N.A.; Validation, M.F.A., A.N.A. and S.N.; Formal analysis, M.F.A.; Data curation, M.F.A. and A.N.A.; Writing—original draft, M.E.G.; Writing—review & editing, S.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are given in the paper. The code used can be obtained from the corresponding author.

Acknowledgments

The authors would like to thank the Editor and the three referees for careful reading and comments which greatly improved the paper.

Conflicts of Interest

The authors have no conflict of interest.

References

  1. Crow, E.L.; Shimizu, K. Lognormal Distributions: Theory and Applications; Statistics Textbooks and Monographs; Routledge: London, UK, 2018.
  2. Aly, E. A unified approach for developing Laplace-type distributions. J. Indian Soc. Probab. Stat. 2018, 19, 245–269.
  3. Laplace, P.S. Memoire sur la probabilite des causes par les evenements. Mem. L’Acad. R. Sci. Present. Divers. Savan 1774, 6, 621–656.
  4. Plucinska, A. On a general form of the probability density function and its application to the investigation of the distribution of rheostat resistence. Zastosow. Mat. 1966, 9, 9–19.
  5. Balakrishnan, N.; Kocherlakota, S. On the double Weibull distribution: Order statistics and estimation. Sankhya Indian J. Stat. Ser. B 1985, 47, 161–178.
  6. Kantam, R.R.L.; Narasimham, V.L. Linear estimation in reflected gamma distribution. Sankhya Indian J. Stat. Ser. B 1991, 53, 25–47.
  7. Nadarajah, S.; Afuecheta, E.; Chan, S. A double generalized Pareto distribution. Stat. Probab. Lett. 2013, 83, 2656–2663.
  8. Bindu, P.; Sangita, K. Double Lomax distribution and its applications. Statistica 2015, 75, 331–342.
  9. Kumar, S.C.; Jose, R. On double Lindley distribution and some of its properties. Am. J. Math. Manag. Sci. 2019, 38, 23–43.
  10. John, S. The three-parameter two-piece normal family of distributions and its fitting. Commun. Stat. Theory Methods 1982, 11, 879–885.
  11. Kimber, A. Methods for the two-piece normal distribution. Commun. Stat. Theory Methods 1985, 14, 235–245.
  12. Lingappaiah, G. On two-piece double exponential distribution. J. Korean Stat. Soc. 1988, 17, 46–55.
  13. Abdulah, E.; Elsalloukh, H. Bimodal Class based on the Inverted Symmetrized Gamma Distribution with Applications. J. Stat. Appl. Probab. 2014, 3, 1–7.
  14. Hoseinzadeh, A.; Maleki, M.; Khodadadi, Z.; Contreras-Reyes, J.E. The skew-reflected-Gompertz distribution for analyzing symmetric and asymmetric data. J. Comput. Appl. Math. 2019, 349, 132–141.
  15. Halvarsson, D. Maximum likelihood estimation of asymmetric double type II Pareto distributions. J. Stat. Theory Pract. 2020, 14, 22.
  16. Almutairi, A.; Ghitany, M.; Alothman, A.; Gupta, R.C. Double Inverse-Gaussian Distributions and Associated Inference. J. Indian Soc. Probab. Stat. 2023, 1–32.
  17. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  18. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
  19. Cankaya, M. Asymmetric Bimodal Exponential Power Distribution on the Real Line. Entropy 2018, 20, 23.
  20. Ratnaparkhi, M.V.; Naik-Nimbalkar, U.V. The Length-biased Lognormal Distribution and its application in the Analysis of data from oil Field Exploration studies. J. Mod. Appl. Stat. Methods 2012, 11, 225–260.
Figure 1. PDF of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −0.5, 1, 0.5, 1), (0.5, −0.5, 1, 0.5, 1), (0.8, 0.5, 1, 0.5, 1).
Figure 2. CDF of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −0.5, 1, 0.5, 1), (0.5, −0.5, 1, 0.5, 1), (0.8, 0.5, 1, 0.5, 1).
Figure 3. HRF of the DLN distribution for $(\beta, \mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.3, −0.5, 1, 0.5, 0.9), (0.5, 0.5, 0.9, −0.5, 1), (0.8, 0, 1, 0, 0.5).
Figure 4. Mean, variance, skewness and kurtosis of the DLN distribution as a function of $\beta$ for $(\mu_1, \sigma_1, \mu_2, \sigma_2)$: (0.5, 1, 0.5, 1), (0.5, 1, 0.5, 1), (0.5, 1, 0.5, 1).
Figure 5. Harmonic mean of the DLN distribution as a function of $\beta$ for $(\mu_1, \sigma_1, \mu_2, \sigma_2)$: (−0.5, 1, 0.5, 1), (0.5, 1, 0.5, 1), (0.5, 1, 0.5, 1).
Figure 6. Tsallis and Shannon entropies of the DLN distribution as a function of $\beta$ for $(\mu_1, \sigma_1, \mu_2, \sigma_2)$: (3, 1, 3, 1), (3, 1, −3, 1), (0, 1, 0, 1).
Figure 10. Histogram of the microarray data.
Figure 11. Diagnostic plots of the fitted DIG distribution.
Figure 12. Diagnostic plots of the fitted DLN distribution.
Table 1. Summary of the fitted DIG and DLN distributions for DNA microarray data.

| Model | Parameter | MLE | S.E. | $\ln\hat L$ | KS (p-value) | AD (p-value) | CVM (p-value) |
|---|---|---|---|---|---|---|---|
| DIG | $\beta$ | 0.542 | 0.046 | 39.249 | 0.126 (0.046) | 3.285 (0.020) | 0.545 (0.030) |
| | $\nu_1$ | 0.087 | 0.017 | | | | |
| | $\lambda_1$ | 0.036 | 0.006 | | | | |
| | $\nu_2$ | 0.132 | 0.018 | | | | |
| | $\lambda_2$ | 0.126 | 0.024 | | | | |
| DLN | $\beta$ | 0.542 | 0.046 | 64.829 | 0.065 (0.709) | 0.851 (0.446) | 0.103 (0.570) |
| | $\mu_1$ | −2.812 | 0.127 | | | | |
| | $\sigma_1$ | 1.016 | 0.090 | | | | |
| | $\mu_2$ | −2.224 | 0.104 | | | | |
| | $\sigma_2$ | 0.764 | 0.074 | | | | |