Article

A Multivariate Flexible Skew-Symmetric-Normal Distribution: Scale-Shape Mixtures and Parameter Estimation via Selection Representation

1
Department of Statistics, Faculty of Mathematics & Computer, Shahid Bahonar University of Kerman, Kerman 7616914111, Iran
2
Institute of Statistics, National Chung Hsing University, Taichung 402, Taiwan
3
Department of Public Health, China Medical University, Taichung 404, Taiwan
*
Author to whom correspondence should be addressed.
Symmetry 2021, 13(8), 1343; https://doi.org/10.3390/sym13081343
Submission received: 2 July 2021 / Revised: 20 July 2021 / Accepted: 22 July 2021 / Published: 25 July 2021

Abstract

Multivariate skew-symmetric-normal (MSSN) distributions have been recognized as an appealing tool for modeling data with non-normal features such as asymmetry and heavy tails, rendering them suitable for applications in diverse areas. We introduce a richer class of MSSN distributions based on a scale-shape mixture of (multivariate) flexible skew-symmetric normal distributions, called the SSMFSSN distributions. This very general class of SSMFSSN distributions can capture various shapes of multimodality, skewness, and leptokurtic behavior in the data. We investigate some of its probabilistic characterizations and distributional properties which are useful for further methodological developments. An efficient EM-type algorithm designed under the selection mechanism is advocated to compute the maximum likelihood (ML) estimates of parameters. Simulation studies as well as applications to a real dataset are employed to illustrate the usefulness of the presented methods. Numerical results show the superiority of our proposed model in comparison to several existing competitors.

1. Introduction

The use of multivariate normal (MN) distribution plays a central role in statistical modeling. However, there are some situations where the data are not in agreement with the MN distribution. Departure from normality can take place in different ways, such as multimodality, lack in central symmetry, and positive or negative excess of kurtosis. The class of scale mixtures of skew-normal distributions (SMSN) whose general form was first introduced by Branco and Dey [1] includes many multivariate skew symmetric (MSS) distributions with only one mode.
More formally, a scale mixture distribution can be constituted by mixing a base density over a scaling distribution. Its density can be expressed in the form of the integral given by
f(y) = \int_0^{\infty} g\big(y \mid \kappa(\tau)\big)\, dH(\tau;\nu),   (1)
where g(y | κ(τ)) is the conditional density of a p × 1 random vector Y given κ(τ). Herein, κ(·) is a positive function of a scaling variable τ with cumulative distribution function (cdf) H(τ; ν), indexed by the parameter vector ν.
Using (1), the family of SMSN distributions can be generated by assuming a multivariate skew-normal (MSN) distribution [2] with location ξ, scale covariance matrix κ(τ)Σ, and shape parameter λ for g(y | κ(τ)). The marginal probability density function (pdf) of Y can be obtained as follows:
f(y) = 2\int_0^{\infty} \phi_p\big(y;\, \xi,\, \kappa(\tau)\Sigma\big)\, \Phi\big(\kappa(\tau)^{-1/2}\lambda^{\top}\Sigma^{-1/2}(y-\xi)\big)\, dH(\tau;\nu),   (2)
where ϕ p ( · ; ξ , Σ ) denotes the pdf of the p-variate MN distribution with mean vector ξ and covariance matrix Σ , and Φ ( · ) is the cdf of the standard normal distribution. Some simple scaling functions such as κ ( τ ) = τ and κ ( τ ) = 1 / τ lead to well-known distributions. As described by Branco and Dey [1], some remarkable examples are the multivariate skew-t (MST), multivariate skew-slash (MSSL), and multivariate skew-contaminated-normal (MSCN) distributions, to name just a few. The SMSN family collapses to the class of scale mixture of normal distributions when the skewness parameter vanishes. The MSN distribution [2] with the parameterization given by Arellano-Valle and Genton [3] can be recovered when H ( τ ; ν ) is degenerated by imposing τ = 1 .
The multivariate skew-scale mixtures of normal (MSSMN) distributions [4] are established when g(y | κ(τ)) defined in (1) is given by
g\big(y \mid \kappa(\tau)=\tau^{-1}\big) = 2\, \phi_p\big(y;\, \xi,\, \tau^{-1}\Sigma\big)\, \Phi\big(\lambda^{\top}\Sigma^{-1/2}(y-\xi)\big).   (3)
Therefore, the pdf of the MSSMN distribution is obtained as follows:
f(y) = 2\, \Phi\big(\lambda^{\top}\Sigma^{-1/2}(y-\xi)\big)\int_0^{\infty} \phi_p\big(y;\, \xi,\, \tau^{-1}\Sigma\big)\, dH(\tau;\nu).   (4)
Recently, Arellano-Valle et al. [5] proposed a multivariate class of scale-shape mixtures of skew-normal (MSSMSN) distributions which provides alternative candidates for modeling asymmetric data. A convenient hierarchical representation of the MSSMSN distribution is given by
Y \mid \tau = (\tau_1,\tau_2) \sim SN_p\big(\xi,\, \kappa(\tau_1)\Sigma,\, \eta(\tau_1,\tau_2)\lambda\big),   (5)
where τ = (τ_1, τ_2) is a mixing vector with joint cdf H(τ_1, τ_2; ν), and η(τ_1, τ_2): ℝ⁺ × ℝ⁺ → ℝ is a real-valued shape mixing function which is not necessarily symmetric about zero. The family of MSSMSN distributions encapsulates several renowned unimodal asymmetric distributions generated by varying the scale and shape functions, κ(τ_1) and η(τ_1, τ_2), for a given distribution of τ, or alternatively by fixing κ(τ_1) and η(τ_1, τ_2) but varying the distribution of τ. A convenient setup for the mixing functions is to choose κ(τ_1) = 1/τ_1 and η(τ_1, τ_2) = τ_2, leading to a particular form of the shape mixture of SMSN distributions. If we choose η(τ_1, τ_2) = (τ_1/τ_2)^{1/2}, it produces another form of shape mixtures of MSSMN distributions. See [5] for a more comprehensive discussion and detailed investigation.
As shown by Azzalini and Capitanio [6], the pdf of the MSS distribution can be written as follows:
f(y; \xi, \Sigma) = 2\, |\Sigma|^{-1/2}\, f_0\big(\Sigma^{-1/2}(y-\xi)\big)\, G_0\big(w\{\Sigma^{-1/2}(y-\xi)\}\big),   (6)
where ξ and Σ are, respectively, the location vector and scale covariance matrix; f_0: ℝ^p → ℝ⁺ is a p-variate pdf centrally symmetric with respect to the origin, i.e., f_0(−x) = f_0(x); G_0: ℝ → [0, 1] is a univariate cdf satisfying G_0(−x) = 1 − G_0(x); and w: ℝ^p → ℝ is an odd real-valued function, namely w{−x} = −w{x}. The MSS class is equivalent to the one studied by Wang et al. [7], for which G_0(w{x}) is replaced by π: ℝ^p → [0, 1] satisfying π(−x) = 1 − π(x).
In light of (6), it is possible to generate a wide range of asymmetric families of unimodal and multimodal skew distributions depending on the specification of the function w { x } . This essential property induces greater flexibility in the available shapes. For example, Ma and Genton [8] proposed a flexible class of skew-symmetric distributions by choosing w { x } = P K ( x ) , where P K ( x ) is an odd polynomial function of order K.
The Hadamard product, also known as the Schur product, is a type of matrix multiplication that is commutative and simpler than the ordinary matrix product. See [9] for a comprehensive review and its applications to multivariate statistical analysis. The Hadamard product is advantageous in computations and algebraic manipulations because the products are entry-wise, the multiplication is commutative, and, in particular, the inverse is very easy to obtain and the computation of power matrices is straightforward. By convention, we use "⊙" to denote the Hadamard operations (product and power). Let A = (a_ij) and B = (b_ij) be two p × q matrices of the same dimension but not necessarily square. Then, the Hadamard product between these two matrices, denoted by A ⊙ B, is the element-wise product of A and B, that is, a p × q matrix whose (i, j) entry is a_ij b_ij. Accordingly, the nth Hadamard power of matrix A is defined as A^{⊙n} = [a_ij^n]. More key properties concerning the multiplication and partial derivatives of the Hadamard product are deferred to Appendix A.
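To make these operations concrete, the following minimal R snippet illustrates the Hadamard product and power using R's native element-wise operators; the matrices A, B, and D are arbitrary illustrative values, and the last line numerically checks the diagonal-matrix property stated in Appendix A.

    # Hadamard (element-wise) product and power in R: `*` and `^` act entry-wise on matrices.
    A <- matrix(1:6, nrow = 2)                 # a 2 x 3 matrix
    B <- matrix(7:12, nrow = 2)                # another 2 x 3 matrix
    A * B                                      # Hadamard product: (i,j) entry is a_ij * b_ij
    A^3                                        # third Hadamard power: entry-wise cubes
    D <- diag(c(2, 3))                         # a 2 x 2 diagonal matrix
    all.equal((D %*% A)^3, (D %*% D %*% D) %*% A^3)  # checks (DA)^{odot 3} = D^3 A^{odot 3}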
In the multivariate setup, the odd polynomial of order K admits various combinations of the coefficients. For example, the bivariate case under an odd polynomial of order K = 3 is given by P_K(x_1, x_2) = α_1 x_1 + α_2 x_2 + α_3 x_1^3 + α_4 x_2^3 + α_5 x_1^2 x_2 + α_6 x_1 x_2^2. As an alternative to that of Ma and Genton [8] in a more general setting, we introduce the multivariate flexible skew-symmetric-normal (MFSSN) distribution, denoted by MFSSN_p(ξ, Σ, α), which has the following pdf:
f_{MFSSN}(y; \xi, \Sigma, \alpha) = 2\, \phi_p(y; \xi, \Sigma)\, \Phi\big(\lambda_1^{\top}\eta_1 + \lambda_2^{\top}\eta_3 + \cdots + \lambda_m^{\top}\eta_{2m-1}\big),   (7)
where α = (λ_1^⊤, …, λ_m^⊤)^⊤ is a pm × 1 multiple-scaled vector of shape parameters and η_{2k−1} = [Σ^{−1/2}(y − ξ)]^{⊙(2k−1)} remains a p × 1 vector through an odd Hadamard power transformation of order 2k − 1, for k = 1, …, m. Notably, the MFSSN distribution encompasses the flexible generalized skew-normal (FGSN) distribution introduced by Ma and Genton [8] as its univariate case. Figure 1 illustrates the scatter-contour plots coupled with their marginal histograms of the bivariate MFSSN distribution under ξ = 0, Σ = I_2 and various specifications of shape parameters arising from two setups of m. As can be seen, many different non-elliptical distributional shapes with multiple modes and asymmetric patterns can be produced. Additional flexibility can be gained by expanding the thickness of the tails. This motivates us to propose a more general class of scale-shape mixtures of MFSSN (SSMFSSN) distributions.
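As a concrete illustration of (7), the sketch below evaluates the MFSSN density at a single point. It assumes the mvtnorm package for the multivariate normal density; dmfssn() and its arguments are illustrative names rather than part of any published implementation.

    # A minimal sketch of the MFSSN density in (7), assuming the mvtnorm package.
    library(mvtnorm)

    dmfssn <- function(y, xi, Sigma, lambda) {
      # y: observation (length p); lambda: list of shape vectors (lambda_1, ..., lambda_m)
      ev <- eigen(Sigma, symmetric = TRUE)
      Sig.inv.sqrt <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)  # Sigma^{-1/2}
      z <- as.vector(Sig.inv.sqrt %*% (y - xi))
      # skewing argument: sum_j lambda_j' [Sigma^{-1/2}(y - xi)]^{odot(2j-1)}
      zeta <- sum(sapply(seq_along(lambda), function(j) sum(lambda[[j]] * z^(2 * j - 1))))
      2 * dmvnorm(y, mean = xi, sigma = Sigma) * pnorm(zeta)
    }

    # Illustrative call with m = 2 shape vectors:
    dmfssn(c(0.5, -0.3), xi = c(0, 0), Sigma = diag(2), lambda = list(c(2, 2), c(1, 1)))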
The class of SSMFSSN distributions can be hierarchically represented by
Y \mid \tau = (\tau_1, \tau_2) \sim MFSSN_p\big(\xi,\, \tau_1^{-1}\Sigma,\, \vartheta\big),   (8)
where ϑ = (τ_2^{1/2} τ_1^{−1/2} λ_1^⊤, …, τ_2^{1/2} τ_1^{−(2m−1)/2} λ_m^⊤)^⊤. Further, we denote the pdf of τ = (τ_1, τ_2) by h_τ(τ_1, τ_2; ν). From (8), the marginal pdf of Y is given by
f_{SSMFSSN}(y; \xi, \Sigma, \alpha, \nu) = \int\!\!\int \phi_p\big(y;\, \xi,\, \tau_1^{-1}\Sigma\big)\, \Phi\big(\tau_2^{1/2}\{\lambda_1^{\top}\eta_1 + \lambda_2^{\top}\eta_3 + \cdots + \lambda_m^{\top}\eta_{2m-1}\}\big)\, dH(\tau;\nu).   (9)
Obviously, the MFSSN model is recovered by setting τ_1 = τ_2 = 1 in (9).
The family of SSMFSSN distributions introduced in (8) is quite vast, containing several subfamilies of asymmetric and multimodal distributions which have never been considered in the literature. Notice that the MSSMSN distribution described in (5) is a simple case of SSMFSSN by taking κ ( τ 1 ) = 1 / τ 1 , η ( τ 1 , τ 2 ) = ( τ 1 / τ 2 ) 1 / 2 , and λ j = 0 , for j = 2 , , m . More importantly, the scale-shape mixtures of flexible generalized skew-normal (SSMFGSN) distributions proposed very recently by Mahdavi et al. [10] can be thought of as a univariate case of SSMFSSN when the dimension p = 1 .
The EM algorithm [11] and some of its extraordinary variants such as the expectation conditional maximization (ECM) algorithm [12] and the expectation conditional maximization either (ECME) algorithm [13] are broadly applicable methods to carry out ML estimation for multivariate skew distributions in a complete-data framework. To the best of our knowledge, previous developments of the EM-type algorithm are based on the convolution-type representations, see, e.g., [14] for the MSN distribution, [15] for the MST distribution, [1] for the SMSN distribution, and [5] for the MSSMSN distribution. Since our proposed SSMFSSN model cannot be explicitly expressed by a convolution-type representation, we develop a novel EM-based procedure designed under the selection mechanism to compute the ML estimates.
The rest of the paper is organized as follows. Section 2 presents the formulation of the general SSMFSSN model and discusses how to deploy the ECME algorithm for ML estimation based on the selection-type mechanism. Section 3 exemplifies some particular cases of SSMFSSN distributions constructed by setting different mixing distributions for τ. The proposed techniques are illustrated by conducting two simulation studies in Section 4 and analyzing a real data example in Section 5. We conclude in Section 6 with a few remarks and offer directions for future research.

2. Methodology

2.1. The Family of SSMFSSN Distributions

A p-variate random vector Y ∼ SSMFSSN_p(ξ, Σ, α, ν) is asserted to follow the SSMFSSN distribution with location vector ξ, scale covariance matrix Σ, shape parameters α = (λ_1^⊤, …, λ_m^⊤)^⊤, and flatness parameters ν if it has the following selection-type representation:
Y \overset{d}{=} V \mid (U > 0),   (10)
where V = ξ + Σ^{1/2} τ_1^{−1/2} Z_1, U = λ_1^⊤(τ_1^{−1/2} Z_1) + ⋯ + λ_m^⊤(τ_1^{−(2m−1)/2} Z_1^{⊙(2m−1)}) − τ_2^{−1/2} Z_2, and (Z_1^⊤, Z_2)^⊤ ∼ N_{p+1}(0, I_{p+1}). Herein, '=_d' stands for equality in distribution, and U is obviously a continuous random variable symmetric about zero. Using this characterization, random samples from SSMFSSN_p(0, I_p, α, ν) can be simulated through the following scheme:
X = \begin{cases} \tau_1^{-1/2} Z_1, & \text{if } U > 0,\\ -\tau_1^{-1/2} Z_1, & \text{otherwise}. \end{cases}   (11)
As a result, random samples from the general SSMFSSN_p(ξ, Σ, α, ν) can be obtained by the affine transformation Y = ξ + Σ^{1/2} X.
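The scheme (10)-(11) translates directly into a simulator. The sketch below is an illustrative, non-optimized R function (rssmfssn1() is a hypothetical name) that draws one observation; the gamma mixing choice τ_1 = τ_2 = τ ∼ Γ(ν/2, ν/2) is used purely as an example and can be swapped for any of the mixing distributions of Section 3.

    # A hedged sketch of sampling via the selection representation (10)-(11).
    rssmfssn1 <- function(xi, Sigma, lambda, nu) {
      p   <- length(xi)
      m   <- length(lambda)
      tau <- rgamma(1, shape = nu / 2, rate = nu / 2)   # example mixing: tau1 = tau2 = tau
      z1  <- rnorm(p)
      z2  <- rnorm(1)
      # U in (10): skewing combination of odd Hadamard powers of tau^{-1/2} Z1, minus tau^{-1/2} Z2
      U <- sum(sapply(1:m, function(j)
              sum(lambda[[j]] * tau^(-(2 * j - 1) / 2) * z1^(2 * j - 1)))) - z2 / sqrt(tau)
      x <- if (U > 0) z1 / sqrt(tau) else -z1 / sqrt(tau)  # scheme (11) for SSMFSSN_p(0, I_p, alpha, nu)
      ev <- eigen(Sigma, symmetric = TRUE)
      Sig.sqrt <- ev$vectors %*% diag(sqrt(ev$values)) %*% t(ev$vectors)
      as.vector(xi + Sig.sqrt %*% x)                       # affine transformation to general parameters
    }

    set.seed(1)
    y <- t(replicate(500, rssmfssn1(xi = c(1, 1), Sigma = matrix(c(1, 0.5, 0.5, 4), 2),
                                    lambda = list(c(2, 2), c(1, 1)), nu = 3)))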
For fitting the SSMFSSN model (10) within the complete-data framework via the EM-type algorithm, we introduce two latent variables W =_d U | (U > 0) and γ = (γ_1, γ_2) =_d (τ_1, τ_2) | (U > 0). Then, (Y, W, γ) =_d (V, U, τ) | (U > 0) has the following joint pdf:
f_{Y,W,\gamma}(y, W, \gamma) = \frac{1}{\Pr(U>0)} f_{V,U,\tau}(y, W, \gamma) = 2\, f_{\tau}(\gamma)\, f_{V\mid\tau}(y)\, f_{U\mid V,\tau}(W) = 2\, \gamma_2^{1/2}\, h_{\tau}(\gamma_1,\gamma_2;\nu)\, \phi_p\big(y;\, \xi,\, \gamma_1^{-1}\Sigma\big)\, \phi\big(\gamma_2^{1/2}\{W-\zeta\}\big),   (12)
where ζ = λ_1^⊤η_1 + ⋯ + λ_m^⊤η_{2m−1}, and h_τ(γ_1, γ_2; ν) is the pdf of τ = (τ_1, τ_2) evaluated at the point γ = (γ_1, γ_2).
Integrating out W from (12) gives the following joint pdf
f_{Y,\gamma}(y, \gamma_1, \gamma_2) = 2\, h_{\tau}(\gamma_1,\gamma_2;\nu)\, \phi_p\big(y;\, \xi,\, \gamma_1^{-1}\Sigma\big)\, \Phi\big(\gamma_2^{1/2}\zeta\big).   (13)
Therefore, the marginal pdf of Y is given by
f_Y(y) = \int_0^{\infty}\!\!\int_0^{\infty} f_{Y,\gamma}(y, \gamma_1, \gamma_2)\, d\gamma_1\, d\gamma_2 = 2\left[\int_0^{\infty} \phi_p\big(y;\, \xi,\, \gamma_1^{-1}\Sigma\big)\, dH_{\tau_1}(\gamma_1;\nu_1)\right]\left[\int_0^{\infty} \Phi\big(\gamma_2^{1/2}\zeta\big)\, dH_{\tau_2}(\gamma_2;\nu_2)\right],   (14)
where the second equality holds if we further assume τ 1 and τ 2 are mutually independent. It is noteworthy that the shape mixtures of MSSMN distribution established by Arellano-Valle et al. [5] also belongs to the family of our proposed SSMFSSN model by imposing λ 2 = = λ m = 0 .
Dividing (12) by (13) yields the following relation
f(W \mid y, \gamma_1, \gamma_2) = \frac{\gamma_2^{1/2}\, \phi\big(\gamma_2^{1/2}(W-\zeta)\big)}{\Phi\big(\gamma_2^{1/2}\zeta\big)} \equiv f(W \mid y, \gamma_2),   (15)
for which '≡' in (15) means that W and γ_1 are conditionally independent. Moreover, it is straightforward to show that
W \mid (y, \gamma_2) \sim TN\big(\zeta,\, \gamma_2^{-1}\big)\, \mathbb{I}_{(0,\infty)},   (16)
where TN(μ, σ^2) I_A represents a doubly truncated normal distribution confined within the interval A = {a_1 < x < a_2}, and I_A is an indicator function of the set A. Using Lemma 2 of Lin et al. [16], we have the following conditional expectation:
E(W \mid y, \gamma_2) = \zeta + \gamma_2^{-1/2}\, \frac{\phi\big(\gamma_2^{1/2}\zeta\big)}{\Phi\big(\gamma_2^{1/2}\zeta\big)}.   (17)
By Bayes' rule, the conditional pdf of γ = (γ_1, γ_2) given Y = y is
f(\gamma_1, \gamma_2 \mid y) = \frac{2\, h_{\tau}(\gamma_1,\gamma_2;\nu)\, \phi_p\big(y;\, \xi,\, \gamma_1^{-1}\Sigma\big)\, \Phi\big(\gamma_2^{1/2}\zeta\big)}{f_Y(y)}.   (18)

2.2. Parameter Estimation via the ECME Algorithm

The EM algorithm [11] is a widely used iterative technique to deal with ML estimation in models that involve incomplete data or latent variables. One primary virtue of EM lies in the fact of attractive monotone convergence properties and the preservation of simplicity and stability. In practice, a major limitation of EM is often that some estimators in the M-step cannot be solved in terms of closed-form expressions. To overcome this obstacle, the ECM algorithm proposed by Meng and Rubin [12] recommends replacing the E-step of EM with a sequence of simpler conditional maximization (CM) steps, yet it also enjoys the same convergence properties as EM. However, in certain problems, some of the CM-steps of ECM may become analytically intractable or suffer from slow convergence. As a further flexible extension, the ECME algorithm [13] divides the CM-steps of ECM to maximize either the Q-function, called the CMQ-step, or the corresponding constrained actual log-likelihood function, named as the CML-step. In what follows, we describe in greater detail how the proposed SSMFSSN model can be fitted by using the ECME algorithm.
Suppose that y = ( y 1 , , y n ) constitutes a set of p-dimensional observed samples of size n arising from the SSMFSSN model. Under the EM framework, the latent variables W = ( W 1 , , W n ) and γ = ( γ 1 , , γ n ) introduced in the preceding subsection are treated as missing data. Then, the complete data are given by y c = ( y , W , γ ) . Further, we let θ = ( ξ , vech ( Σ ) , α , ν ) denote the entire unknown parameters to be estimated in the SSMFSSN model, where vech ( · ) is the half-vectorization operator that stacks the lower triangular elements of a p × p symmetric matrix into a single p ( p + 1 ) / 2 vector.
According to (12), the log-likelihood function of θ corresponding to the complete-data y c , excluding additive constants and terms that do not involve parameters of the model, is given by
\ell_c(\theta \mid y_c) = -\frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}\Big\{\gamma_{1i}(y_i-\xi)^{\top}\Sigma^{-1}(y_i-\xi) + \gamma_{2i}(W_i-\zeta_i)^2 - 2\ln h(\gamma_{1i},\gamma_{2i};\nu)\Big\},   (19)
with
\zeta_i = \lambda_1^{\top}\eta_{i,1} + \lambda_2^{\top}\eta_{i,3} + \cdots + \lambda_m^{\top}\eta_{i,2m-1}
        = \lambda_1^{\top}\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot 1} + \lambda_2^{\top}\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot 3} + \cdots + \lambda_m^{\top}\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot(2m-1)}
        = 1_p^{\top}\Lambda_1\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot 1} + 1_p^{\top}\Lambda_2\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot 3} + \cdots + 1_p^{\top}\Lambda_m\big[\Sigma^{-1/2}(y_i-\xi)\big]^{\odot(2m-1)}
        = 1_p^{\top}\big[\Delta_1(y_i-\xi)\big]^{\odot 1} + 1_p^{\top}\big[\Delta_2(y_i-\xi)\big]^{\odot 3} + \cdots + 1_p^{\top}\big[\Delta_m(y_i-\xi)\big]^{\odot(2m-1)}
        = 1_p^{\top}\sum_{j=1}^{m}\big[\Delta_j(y_i-\xi)\big]^{\odot(2j-1)},
where 1_p is a p × 1 vector of ones, η_{i,2j−1} = [Σ^{−1/2}(y_i − ξ)]^{⊙(2j−1)}, Λ_j = Diag{λ_j} is a p × p diagonal matrix containing the elements of λ_j on the main diagonal, and Δ_j = Λ_j^{1/(2j−1)} Σ^{−1/2} is a p × p reparameterized matrix.
On the kth iteration, the E-step requires the calculation of the so-called Q-function, which is the conditional expectation of (19) given the observed data y and the current estimate θ̂^(k), where the superscript (k) denotes the updated estimate at iteration k. To evaluate the Q-function, we require the following conditional expectations:
\hat{s}_{1i}^{(k)} = E\big(\gamma_{1i} \mid y_i, \hat{\theta}^{(k)}\big), \quad \hat{s}_{2i}^{(k)} = E\big(\gamma_{2i} \mid y_i, \hat{\theta}^{(k)}\big), \quad \hat{s}_{3i}^{(k)} = E\big(W_i\gamma_{2i} \mid y_i, \hat{\theta}^{(k)}\big),   (20)
which have explicit expressions that are discussed in detail in a subsequent section, along with
\hat{s}_{4i}^{(k)}(\nu) = E\big(\ln h(\gamma_{1i},\gamma_{2i};\nu) \mid y_i, \hat{\theta}^{(k)}\big),   (21)
which may not have a standard form for some subfamilies. Substituting (20) and (21) into (19) yields the following Q-function:
Q\big(\theta \mid \hat{\theta}^{(k)}\big) = -\frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}\Big\{\hat{s}_{1i}^{(k)}(y_i-\xi)^{\top}\Sigma^{-1}(y_i-\xi) + \hat{s}_{2i}^{(k)}\zeta_i^2 - 2\hat{s}_{3i}^{(k)}\zeta_i - 2\hat{s}_{4i}^{(k)}(\nu)\Big\}.   (22)
The CM-steps are implemented to update estimates of θ in the order of ξ , Σ , α and ν by maximizing, one by one, the Q-function obtained in the E-step. After some algebraic manipulations, they are summarized by the following CMQ and CML steps:
  • CMQ-Step 1: Fixing Σ = Σ̂^(k) and α = α̂^(k), we update ξ̂^(k) via Proposition A2 by taking the partial derivative of (22) with respect to ξ. Since no closed-form expression is available for the maximizer, the solution ξ̂^(k+1) is obtained by numerically solving the root of the following equation:
    \sum_{i=1}^{n}\Big[\hat{s}_{1i}^{(k)}\hat{\Sigma}^{(k)-1}(y_i-\xi) + \hat{s}_{2i}^{(k)}\zeta_i^{(k)}\hat{a}_i - \hat{s}_{3i}^{(k)}\hat{a}_i\Big] = 0,   (23)
    where the two terms ζ_i^(k) = 1_p^⊤ Σ_{j=1}^m [Δ̂_j^(k)(y_i − ξ)]^{⊙(2j−1)} and â_i = Σ_{j=1}^m (2j−1) Δ̂_j^{(k)⊤} [Δ̂_j^(k)(y_i − ξ)]^{⊙(2j−2)} are nonlinear functions of ξ, with Δ̂_j^(k) = Diag{λ̂_j^(k)}^{1/(2j−1)} Σ̂^{(k)−1/2}.
  • CMQ-Step 2: Fixing ξ = ξ̂^(k+1) and then updating Σ̂^(k) by maximizing (22) over Σ gives
    \hat{\Sigma}^{(k+1)} = \frac{1}{n}\sum_{i=1}^{n}\hat{s}_{1i}^{(k)}\big(y_i-\hat{\xi}^{(k+1)}\big)\big(y_i-\hat{\xi}^{(k+1)}\big)^{\top}.   (24)
  • CMQ-Step 3: Fixing ξ = ξ̂^(k+1), we update each Δ̂_j^(k) via Proposition A3 by taking the partial derivative of (22) with respect to Δ_j, j = 1, …, m. Since the solutions cannot be isolated in closed form, we solve the following equation for the nonlinear roots of Δ_j:
    \sum_{i=1}^{n}\big(\hat{s}_{3i}^{(k)} - \hat{s}_{2i}^{(k)}\zeta_i^{(k+1)}(\Delta_j)\big)\big[\Delta_j\big(y_i-\hat{\xi}^{(k+1)}\big)\big]^{\odot(2j-2)}\big(y_i-\hat{\xi}^{(k+1)}\big)^{\top} = 0,   (25)
    where ζ_i^(k+1)(Δ_j) = 1_p^⊤ Σ_{j=1}^m [Δ_j(y_i − ξ̂^(k+1))]^{⊙(2j−1)} is a nonlinear function of Δ_j. After simplification, we can transform Δ̂_j^(k+1) back to
    \hat{\lambda}_j^{(k+1)} = \big[\hat{\Delta}_j^{(k+1)}\hat{\Sigma}^{(k+1)\,1/2}\big]^{\odot(2j-1)}\, 1_p, \quad j = 1, \ldots, m.   (26)
Collecting the above solutions yields α̂^(k+1) = (λ̂_1^{(k+1)⊤}, …, λ̂_m^{(k+1)⊤})^⊤.
For some members of SSMFSSN, the calculation of s ^ 4 i ( k ) ( ν ) is not straightforward. An update of ν ^ ( k ) can be achieved by directly maximizing the constrained actual log-likelihood function. This gives rise to the following CML-step:
  • CML-Step: ν ^ ( k ) is updated by optimizing the following constrained log-likelihood function:
    \hat{\nu}^{(k+1)} = \arg\max_{\nu}\sum_{i=1}^{n}\ln f_{SSMFSSN}\big(y_i;\, \hat{\xi}^{(k+1)}, \hat{\Sigma}^{(k+1)}, \hat{\alpha}^{(k+1)}, \nu\big).   (27)
Note that the maximization in the above CML-step requires only a low-dimensional search over the one or two components of ν in the objective function (constrained log-likelihood), which can be easily accomplished by using, for example, the optim routine in R [17]. The iterations of the above algorithm are alternately repeated until a suitable convergence rule is satisfied, e.g., the relative difference |ℓ(θ̂^(k+1))/ℓ(θ̂^(k)) − 1| is sufficiently small (less than 10^{−6} in our numerical experiments), where ℓ(θ) = Σ_{i=1}^n ln f_{SSMFSSN}(y_i; θ). To prevent an infinite loop under this criterion, the maximum number of iterations is set to 5000.
On the initialization of parameters for starting the algorithm, the location vector ξ̂^(0) and the scale covariance matrix Σ̂^(0) are specified as the sample mean vector and sample covariance matrix, respectively. The initial values for the shape parameters λ̂_1^(0) are taken as the sample skewness of the p variables, while the remaining λ̂_2^(0), …, λ̂_m^(0) are fixed near 0. As for ν̂^(0) = (ν̂_1^(0), ν̂_2^(0)), the initial values are chosen as relatively small values depending on the parameter domain. To avoid getting stuck in one of the many local maxima of the likelihood function, a convenient strategy is to try a variety of different initial values with perturbations or to use the bootstrap resampling method [14]. The solution with the highest log-likelihood value is treated as the ML estimate, denoted by θ̂ = (ξ̂^⊤, vech(Σ̂)^⊤, α̂^⊤, ν̂^⊤)^⊤.
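For concreteness, the snippet below sketches the initialization and stopping rule just described. It is a plain illustration under the stated choices (sample moments, zero higher-order shapes, a small starting ν, and a relative log-likelihood tolerance of 10^{-6}); init_ssmfssn() and converged() are illustrative helper names.

    # Illustrative initialization and convergence check for the ECME iterations.
    init_ssmfssn <- function(y, m = 2, nu0 = 5) {
      # y: n x p data matrix; m: number of shape vectors
      skew <- apply(y, 2, function(v) mean((v - mean(v))^3) / sd(v)^3)  # marginal sample skewness
      list(xi     = colMeans(y),
           Sigma  = cov(y),
           lambda = c(list(skew), replicate(m - 1, rep(0, ncol(y)), simplify = FALSE)),
           nu     = nu0)  # a relatively small value; adjust to the parameter domain of the subfamily
    }

    converged <- function(loglik_new, loglik_old, tol = 1e-6) {
      abs(loglik_new / loglik_old - 1) < tol   # relative-difference stopping rule
    }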

3. Examples of SSMFSSN Distributions

We present some special cases of SSMFSSN distributions which are induced by setting different mixing distributions for τ . For each case, additional conditional expectations are also derived for the implementation of ECME.

3.1. The Multivariate Flexible Skew-Symmetric-t-Normal Distribution

The multivariate flexible skew-symmetric-t-normal (MFSSTN) distribution, denoted by Y ∼ MFSSTN_p(ξ, Σ, α, ν), is produced by taking τ_1 ∼ Γ(ν/2, ν/2) and τ_2 = 1 in (10). In this case, h_τ(γ_1, γ_2; ν) = g(γ_1; ν/2, ν/2), where g(·; α, β) represents the pdf of the gamma distribution with mean α/β. The pdf of the MFSSTN distribution can be expressed as
f_{MFSSTN}(y; \xi, \Sigma, \alpha, \nu) = 2\, t_p(y; \xi, \Sigma, \nu)\, \Phi(\zeta),   (28)
where t_p(·; ξ, Σ, ν) stands for the pdf of the p-dimensional multivariate t distribution with location vector ξ, scale covariance matrix Σ, and degrees of freedom (DOF) ν. One special case of the MFSSTN distribution is the multivariate skew-t-normal (MSTN) distribution of Lin et al. [18], obtained by letting λ_j = 0 for j = 2, …, m. In addition, the MFSSTN distribution reduces to the MFSSN distribution as ν → ∞.
According to (18), it is easy to observe that γ_1 | Y = y ∼ Γ((ν + p)/2, (ν + δ²)/2), where δ² = (y − ξ)^⊤ Σ^{−1}(y − ξ) denotes the squared Mahalanobis distance. Therefore,
E(\gamma_1 \mid Y = y) = \frac{\nu+p}{\nu+\delta^2} \quad \text{and} \quad E(\ln\gamma_1 \mid Y = y) = \psi\Big(\frac{\nu+p}{2}\Big) - \ln\Big(\frac{\nu+\delta^2}{2}\Big),   (29)
where ψ(x) = Γ′(x)/Γ(x) is the digamma function.
From (20) and (21), the E-step involves the calculation of ŝ_{1i}^(k) = E(γ_{1i} | y_i, θ̂^(k)), ŝ_{2i}^(k) = 1, ŝ_{3i}^(k) = E(W_i | y_i, θ̂^(k)), and ŝ_{4i}^(k) = E(ln γ_{1i} | y_i, θ̂^(k)), which can be easily evaluated via the results of (17) and (29). As an alternative way of estimating ν, the CML-step for the MFSSTN distribution can be replaced by the following CMQ-step.
  • CMQ-Step 4: ν ^ ( k + 1 ) is obtained by solving the root of the following equation:
    1 + \ln\Big(\frac{\nu}{2}\Big) - \psi\Big(\frac{\nu}{2}\Big) + \frac{1}{n}\sum_{i=1}^{n}\Big(\hat{s}_{4i}^{(k)} - \hat{s}_{1i}^{(k)}\Big) = 0.   (30)
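The E-step quantities in (29) and the root-finding update in (30) are simple to code. The sketch below assumes the squared Mahalanobis distances delta2 of the n observations have already been computed; uniroot() is used under the assumption that the root of (30) lies in the supplied interval, and both function names are illustrative.

    # Hedged sketch of the MFSSTN E-step pieces (29) and the DOF update (30).
    estep_mfsstn <- function(delta2, nu, p) {
      list(s1 = (nu + p) / (nu + delta2),                        # E(gamma_1i | y_i)
           s4 = digamma((nu + p) / 2) - log((nu + delta2) / 2))  # E(log gamma_1i | y_i)
    }

    update_nu <- function(s4, s1, lower = 0.1, upper = 200) {
      f <- function(nu) 1 + log(nu / 2) - digamma(nu / 2) + mean(s4 - s1)
      uniroot(f, c(lower, upper))$root   # assumes a sign change over (lower, upper)
    }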

3.2. The Multivariate Flexible Skew-Symmetric-Slash-Normal Distribution

Let Y be a p-dimensional random vector with the following representation
Y = \xi + \tau^{-1/2}\Sigma^{1/2}Z,   (31)
where Z ∼ N_p(0, I_p) is independent of τ =_d U^{1/ν} ∼ Beta(ν, 1), with U uniformly distributed on the interval (0, 1). From (31), the conditional distribution of Y given τ is N_p(ξ, τ^{−1}Σ). Then, Y has a multivariate slash distribution with pdf given by
f_{SL}(y; \xi, \Sigma, \nu) = \nu\int_0^{1}\tau^{\nu-1}\phi_p\big(y;\, \xi,\, \tau^{-1}\Sigma\big)\, d\tau = \begin{cases} \dfrac{2^{\nu}\,\nu\,|\Sigma|^{-1/2}\,\Gamma(\nu+p/2)\,G(\delta^2/2;\, \nu+p/2)}{\pi^{p/2}\,\delta^{2\nu+p}}, & y \neq \xi,\\[2mm] \dfrac{\nu}{(\nu+p/2)\,(2\pi)^{p/2}\,|\Sigma|^{1/2}}, & y = \xi, \end{cases}   (32)
where G(·; r) denotes the cdf of the gamma distribution with scale parameter 1 and shape parameter r. Using the law of iterated expectations, the mean vector and variance-covariance matrix of Y are
E(Y) = \xi \quad \text{and} \quad \mathrm{cov}(Y) = \frac{\nu}{\nu-1}\Sigma.   (33)
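The stochastic representation (31) gives a direct way to simulate multivariate slash variates and to check (33) empirically; rmvslash() below is an illustrative helper that assumes the mvtnorm package.

    # A minimal sketch of simulating from the multivariate slash distribution via (31).
    library(mvtnorm)

    rmvslash <- function(n, xi, Sigma, nu) {
      tau <- runif(n)^(1 / nu)                       # tau = U^{1/nu} ~ Beta(nu, 1)
      t(sapply(tau, function(s) rmvnorm(1, mean = xi, sigma = Sigma / s)))  # Y | tau ~ N_p(xi, Sigma/tau)
    }

    set.seed(2)
    y <- rmvslash(5000, xi = c(0, 0), Sigma = diag(2), nu = 3)
    colMeans(y)   # approximately xi
    cov(y)        # approximately (nu / (nu - 1)) * Sigma, in line with (33)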
If we select τ_1 ∼ Beta(ν, 1) and τ_2 = 1 in (10), this generates the multivariate flexible skew-symmetric-slash-normal (MFSSSLN) distribution, denoted by Y ∼ MFSSSLN_p(ξ, Σ, α, ν), with the pdf taking the form of
f_{MFSSSLN}(y; \xi, \Sigma, \alpha, \nu) = 2\, f_{SL}(y; \xi, \Sigma, \nu)\, \Phi(\zeta).   (34)
Note that the MFSSSLN distribution contains the MFSSN distribution as a limiting case as ν → ∞ and encloses the multivariate skew-slash distribution considered by Wang and Genton [19] as a reduced case when λ_2 = ⋯ = λ_m = 0.
According to (18), the conditional distribution of γ_1 given Y = y is
f(\gamma_1 \mid y) = \begin{cases} \dfrac{\delta^{2\nu+p}\,\gamma_1^{\nu+p/2-1}\exp(-\gamma_1\delta^2/2)}{2^{\nu+p/2}\,\Gamma(\nu+p/2)\,G(\delta^2/2;\, \nu+p/2)}, & y \neq \xi,\\[2mm] (\nu+p/2)\,\gamma_1^{\nu+p/2-1}, & y = \xi. \end{cases}   (35)
In addition, some algebraic manipulations yield
E(\gamma_1 \mid Y = y) = \begin{cases} \dfrac{2\nu+p}{\delta^2}\,\dfrac{G(\delta^2/2;\, \nu+p/2+1)}{G(\delta^2/2;\, \nu+p/2)}, & y \neq \xi,\\[2mm] \dfrac{2\nu+p}{2\nu+p+2}, & y = \xi, \end{cases}   (36)
and
E(\ln\gamma_1 \mid Y = y) = \begin{cases} \ln\Big(\dfrac{2}{\delta^2}\Big) + \dfrac{\int_0^{\delta^2/2}\ln(x)\,x^{\nu+p/2-1}e^{-x}\,dx}{\Gamma(\nu+p/2)\,G(\delta^2/2;\, \nu+p/2)}, & y \neq \xi,\\[2mm] -\dfrac{2}{2\nu+p}, & y = \xi. \end{cases}   (37)
To implement the ECME procedure for fitting the MFSSSLN model, the conditional expectations involved in (20) and (21) can be easily evaluated via the results of (17), (36), and (37). Besides, the DOF ν̂^(k) can be alternatively updated by maximizing the Q-function over ν, leading to the following CMQ-step:
  • CMQ-Step 4:
    \hat{\nu}^{(k+1)} = -\frac{n}{\sum_{i=1}^{n}\hat{s}_{4i}^{(k)}}.   (38)

3.3. The Multivariate Flexible Skew-Symmetric-Contaminated-Normal Distribution

The multivariate flexible skew-symmetric-contaminated-normal (MFSSCN) distribution, denoted by Y ∼ MFSSCN_p(ξ, Σ, α, ν_1, ν_2), arises when τ_2 = 1, while τ_1 has a discrete distribution taking the value ν_2 ∈ (0, 1) with probability ν_1 and the value 1 with probability 1 − ν_1. More precisely,
h(\tau_1; \nu) = \nu_1\,\mathbb{I}(\tau_1=\nu_2) + (1-\nu_1)\,\mathbb{I}(\tau_1=1), \quad 0<\nu_1<1 \ \text{and}\ 0<\nu_2<1,   (39)
where ν_1 is the proportion of outliers and ν_2 is an inflation parameter denoting the degree of contamination.
Using (14), the pdf of Y is given by
f_{MFSSCN}(y; \xi, \Sigma, \alpha, \nu_1, \nu_2) = 2\big[\nu_1\,\phi_p\big(y;\, \xi,\, \nu_2^{-1}\Sigma\big) + (1-\nu_1)\,\phi_p(y;\, \xi,\, \Sigma)\big]\Phi(\zeta).   (40)
Obviously, the MFSSCN distribution reduces to the MFSSN distribution when ν_2 = 1, to the multivariate skew-contaminated-normal distribution [4] when λ_2 = ⋯ = λ_m = 0, and Y is said to follow the "MFSSCNe" distribution if ν_1 and ν_2 are restricted to be equal, namely ν_1 = ν_2 = ν.
To obtain ŝ_{4i}^(k), we require the following conditional expectation:
E(\gamma_1 \mid Y = y) = \frac{1-\nu_1+\nu_1\,\nu_2^{1+p/2}\exp\{(1-\nu_2)\delta^2/2\}}{1-\nu_1+\nu_1\,\nu_2^{p/2}\exp\{(1-\nu_2)\delta^2/2\}}.   (41)
The resulting Q-function can be easily evaluated through (17) and (41) since s ^ 2 i ( k ) = 1 . To estimate ν 1 and ν 2 , we perform the CML-Step, so the calculation of s ^ 4 i ( k ) can be omitted.
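Since ŝ_{2i}^(k) = 1 for this subfamily, the only model-specific E-step ingredient besides (17) is (41); a short vectorized sketch (e_gamma1_mfsscn() is an illustrative name) is given below.

    # E(gamma_1 | y) in (41) for the MFSSCN model, vectorized over squared Mahalanobis distances.
    e_gamma1_mfsscn <- function(delta2, nu1, nu2, p) {
      w <- nu1 * nu2^(p / 2) * exp((1 - nu2) * delta2 / 2)
      (1 - nu1 + nu2 * w) / (1 - nu1 + w)
    }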

3.4. The Multivariate Flexible Skew-Symmetric-t Distribution

The multivariate flexible skew-symmetric-t (MFSST) distribution, denoted by Y ∼ MFSST_p(ξ, Σ, α, ν), is created by setting τ_1 = τ_2 = τ with τ ∼ Γ(ν/2, ν/2). Utilizing (8), the hierarchical representation for Y can be simplified as
Y \mid \tau \sim MFSSN_p\big(\xi,\, \tau^{-1}\Sigma,\, \vartheta\big),   (42)
where ϑ = (λ_1^⊤, τ^{−1}λ_2^⊤, …, τ^{−m+1}λ_m^⊤)^⊤. Therefore, it can be verified that the MFSST distribution has the following pdf:
f_{MFSST}(y; \xi, \Sigma, \alpha, \nu) = 2\, t_p(y; \xi, \Sigma, \nu)\, T\Big(\zeta\sqrt{\tfrac{\nu+p}{\nu+\delta^2}};\ \nu+p\Big),   (43)
where T(·; ν) denotes the cdf of the Student-t distribution with DOF ν. The detailed proof of (43) is sketched in Appendix B. It is notable that the MFSST distribution includes the multivariate skew-t distribution of Azzalini and Capitanio [6] as a particular member by letting λ_2 = ⋯ = λ_m = 0 and the MFSSN distribution as a limiting case as ν grows to infinity.
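The density (43) only requires the multivariate t density and the univariate t cdf; the sketch below (dmfsst() is an illustrative name, not part of any published package) assumes the mvtnorm package.

    # A hedged sketch of the MFSST density (43).
    library(mvtnorm)

    dmfsst <- function(y, xi, Sigma, lambda, nu) {
      p  <- length(xi)
      ev <- eigen(Sigma, symmetric = TRUE)
      Sig.inv.sqrt <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)
      z  <- as.vector(Sig.inv.sqrt %*% (y - xi))
      zeta   <- sum(sapply(seq_along(lambda), function(j) sum(lambda[[j]] * z^(2 * j - 1))))
      delta2 <- sum(z^2)                               # squared Mahalanobis distance
      2 * dmvt(y, delta = xi, sigma = Sigma, df = nu, log = FALSE) *
        pt(zeta * sqrt((nu + p) / (nu + delta2)), df = nu + p)
    }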
Using (18) subject to γ_1 = γ_2 = γ, it suffices to show that
f(\gamma \mid y) = \frac{1}{T(M;\, \nu+p)}\, g\Big(\gamma;\ \frac{\nu+p}{2},\ \frac{\nu+\delta^2}{2}\Big)\, \Phi\big(\gamma^{1/2}\zeta\big),   (44)
where M = \zeta\sqrt{\dfrac{\nu+p}{\delta^2+\nu}}.
With arguments similar to those of Lin et al. [20], it is straightforward to derive
E(\gamma \mid Y = y) = \frac{\nu+p}{\delta^2+\nu}\, \frac{T\Big(M\sqrt{\tfrac{\nu+p+2}{\nu+p}};\ \nu+p+2\Big)}{T(M;\, \nu+p)}   (45)
and
E(\ln\gamma \mid Y = y) = \psi\Big(\frac{\nu+p}{2}\Big) - \ln\Big(\frac{\delta^2+\nu}{2}\Big) + \frac{\nu+p}{\delta^2+\nu}\left[\frac{T\Big(M\sqrt{\tfrac{\nu+p+2}{\nu+p}};\ \nu+p+2\Big)}{T(M;\, \nu+p)} - 1\right] + \frac{\zeta(\delta^2-1)}{\sqrt{(\nu+p)(\nu+\delta^2)^3}}\, \frac{t(M;\, \nu+p)}{T(M;\, \nu+p)} + \frac{1}{T(M;\, \nu+p)}\int_{-\infty}^{M}\kappa_{\nu}(x)\, t(x;\, \nu+p)\, dx,   (46)
where
\kappa_{\nu}(x) = \psi\Big(\frac{\nu+p+1}{2}\Big) - \psi\Big(\frac{\nu+p}{2}\Big) - \ln\Big(1+\frac{x^2}{\nu+p}\Big) + \frac{(\nu+p)x^2-\nu-p}{(\nu+p)(\nu+p+x^2)}.
Additionally, using the law of iterated expectations, we can deduce that
E(W\gamma \mid Y = y) = \frac{1}{T(M;\, \nu+p)}\left\{M\sqrt{\frac{\nu+p}{\delta^2+\nu}}\, T\Big(M\sqrt{\tfrac{\nu+p+2}{\nu+p}};\ \nu+p+2\Big) + \frac{(\delta^2+\nu)^{-1/2}\,\Gamma\big((\nu+p+1)/2\big)}{\sqrt{\pi}\,\Gamma\big((\nu+p)/2\big)}\Big(1+\frac{\zeta^2}{\delta^2+\nu}\Big)^{-\frac{\nu+p+1}{2}}\right\}.   (47)
As a consequence, the E-step in (20) and (21) requires the calculation of ŝ_{1i}^(k) = ŝ_{2i}^(k) = E(γ_i | y_i, θ̂^(k)), ŝ_{3i}^(k) = E(W_iγ_i | y_i, θ̂^(k)), and ŝ_{4i}^(k) = E(ln γ_i | y_i, θ̂^(k)), which can be directly evaluated via (45)–(47). Moreover, the procedure for updating ν̂^(k) is the same as (30).

3.5. The Multivariate Flexible Skew-Symmetric-t-t Distribution

Consider two independent random variables τ_1 ∼ Γ(ν_1/2, ν_1/2) and τ_2 ∼ Γ(ν_2/2, ν_2/2) with joint pdf given by
h_{\tau}(\gamma_1, \gamma_2;\, \nu_1, \nu_2) = g(\gamma_1;\, \nu_1/2, \nu_1/2)\, g(\gamma_2;\, \nu_2/2, \nu_2/2).   (48)
Using (14), we thus generate the multivariate flexible skew-symmetric-t-t (MFSSTT) distribution, denoted by Y ∼ MFSSTT_p(ξ, Σ, α, ν_1, ν_2), whose pdf is of the form
f_{MFSSTT}(y; \xi, \Sigma, \alpha, \nu_1, \nu_2) = 2\, t_p(y; \xi, \Sigma, \nu_1)\, T(\zeta;\, \nu_2).   (49)
When the two DOFs are restricted to be equal, namely ν_1 = ν_2 = ν, Y is said to follow the 'MFSSTTe' distribution. One thing worth noting is that the MFSSTT distribution reduces to the MFSSN distribution when (ν_1, ν_2) → (∞, ∞), to the MFSSTN distribution by letting ν_1 = ν and ν_2 → ∞, and embeds the multivariate skew-t-t (MSTT) distribution introduced by Wang et al. [21] as a special case under the setting of λ_2 = ⋯ = λ_m = 0.
According to (18), it is easy to see that γ_1 | Y = y ∼ Γ((ν_1 + p)/2, (ν_1 + δ²)/2). The conditional pdf of γ_2 given Y = y is
f(\gamma_2 \mid y) = \frac{g(\gamma_2;\, \nu_2/2, \nu_2/2)\, \Phi\big(\gamma_2^{1/2}\zeta\big)}{T(\zeta;\, \nu_2)}.   (50)
Further, we have the following conditional expectations:
E(\gamma_2 \mid Y = y) = \frac{T\Big(\zeta\sqrt{\tfrac{\nu_2+2}{\nu_2}};\ \nu_2+2\Big)}{T(\zeta;\, \nu_2)}   (51)
and
E(\ln\gamma_2 \mid Y = y) = \psi\Big(\frac{\nu_2+1}{2}\Big) - \ln\Big(\frac{\nu_2}{2}\Big) - \left[\frac{T\Big(\zeta\sqrt{\tfrac{\nu_2+2}{\nu_2}};\ \nu_2+2\Big)}{T(\zeta;\, \nu_2)} - 1\right] + \frac{\zeta}{\nu_2}\, \frac{t(\zeta;\, \nu_2)}{T(\zeta;\, \nu_2)} - \frac{1}{T(\zeta;\, \nu_2)}\int_{-\infty}^{\zeta}\ln\Big(1+\frac{x^2}{\nu_2}\Big)\, t(x;\, \nu_2)\, dx.   (52)
Applying the law of iterated expectations to (15) and (50) gives
E(W\gamma_2 \mid Y = y) = \frac{1}{T(\zeta;\, \nu_2)}\Big[\zeta\, T\Big(\zeta\sqrt{\tfrac{\nu_2+2}{\nu_2}};\ \nu_2+2\Big) + t(\zeta;\, \nu_2)\Big].   (53)
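For the γ_2-related quantities, (51) and (53) reduce to ratios of Student-t densities and cdfs; a short sketch with illustrative function names is:

    # Hedged sketch of (51) and (53) for the MFSSTT E-step, as functions of zeta.
    e_gamma2_mfsstt <- function(zeta, nu2) {
      pt(zeta * sqrt((nu2 + 2) / nu2), df = nu2 + 2) / pt(zeta, df = nu2)         # E(gamma_2 | y)
    }
    e_wgamma2_mfsstt <- function(zeta, nu2) {
      (zeta * pt(zeta * sqrt((nu2 + 2) / nu2), df = nu2 + 2) + dt(zeta, df = nu2)) /
        pt(zeta, df = nu2)                                                         # E(W gamma_2 | y)
    }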
With slight modifications to (20) and (21), the necessary conditional expectations in the E-step include ŝ_{1i}^(k) = E(γ_{1i} | y_i, θ̂^(k)), ŝ_{2i}^(k) = E(γ_{2i} | y_i, θ̂^(k)), ŝ_{3i}^(k) = E(W_iγ_{2i} | y_i, θ̂^(k)), ŝ_{4i}^(k) = E(ln γ_{1i} | y_i, θ̂^(k)), and ŝ_{5i}^(k) = E(ln γ_{2i} | y_i, θ̂^(k)), which can be easily evaluated via the results given in (29) and (51)–(53), respectively. To numerically estimate ν_1 and ν_2 for the MFSSTT distribution, we resort to the following two root-finding equations:
  • CMQ-Step 4: ν̂_1^(k+1) and ν̂_2^(k+1) are obtained by solving the roots of the following two equations:
    1 + \ln\Big(\frac{\nu_1}{2}\Big) - \psi\Big(\frac{\nu_1}{2}\Big) + \frac{1}{n}\sum_{i=1}^{n}\Big(\hat{s}_{4i}^{(k)} - \hat{s}_{1i}^{(k)}\Big) = 0   (54)
    and
    1 + \ln\Big(\frac{\nu_2}{2}\Big) - \psi\Big(\frac{\nu_2}{2}\Big) + \frac{1}{n}\sum_{i=1}^{n}\Big(\hat{s}_{5i}^{(k)} - \hat{s}_{2i}^{(k)}\Big) = 0.   (55)

4. Simulation Studies

4.1. Recovery of the True Underlying Parameters

The first experiment intends to investigate the ability of the proposed ECME algorithm to recover the true underlying parameters. Monte Carlo samples of different sample sizes n = 100, 250, 500, and 1000 were generated from the MFSSN distribution specified in (7) and the five examples of SSMFSSN distributions studied in Section 3. For ease of exposition, we considered the Hadamard power transformation of order three (K = 3) that allows two shape parameters in the skewing function Φ(·). Moreover, the flatness parameters for MFSSCN and MFSSTT were assumed to be equal, say ν_1 = ν_2 = ν, referred to as the MFSSCNe and MFSSTTe distributions. The presumed true parameters were ξ = (1, 1)^⊤, σ = vech(Σ) = (1, 0.5, 4)^⊤, λ_1 = (2, 2)^⊤, and λ_2 = (1, 1)^⊤. Furthermore, ν = 3 was taken for the MFSSTN, MFSSSLN, MFSST, and MFSSTTe distributions, while ν = 0.7 was adopted for the MFSSCNe distribution since its support lies within the interval (0, 1). As an illustration, Figure 2 displays the scatter plots superimposed on the fitted contours for each type of data simulated from one trial.
For all scenarios, the accuracies of the parameter estimates are assessed by computing the mean absolute bias (MAB) and the root mean square error (RMSE) over R = 100 replications. For a vector of parameters θ = ( θ 1 , , θ p ) , these measures are, respectively, defined as
\mathrm{MAB} = \frac{1}{pR}\sum_{k=1}^{p}\sum_{r=1}^{R}\big|\hat{\theta}_{kr} - \theta_k^{A}\big| \quad \text{and} \quad \mathrm{RMSE} = \sqrt{\frac{1}{pR}\sum_{k=1}^{p}\sum_{r=1}^{R}\big(\hat{\theta}_{kr} - \theta_k^{A}\big)^2},
where θ ^ k r denotes the ML estimate of the kth parameter at the rth replication and θ k A represents the actual value of θ k .
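Given an R × p matrix of estimates across replications and the vector of actual values, the two measures can be computed as follows (mab() and rmse() are illustrative helper names):

    # MAB and RMSE over R replications of a p-dimensional parameter vector.
    mab  <- function(theta_hat, theta_true) mean(abs(sweep(theta_hat, 2, theta_true)))
    rmse <- function(theta_hat, theta_true) sqrt(mean(sweep(theta_hat, 2, theta_true)^2))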
The experimental results are summarized in Table 1. It is readily seen that both the MAB and RMSE values tend to approach zero as the sample size increases. While this study is limited to the simplest case (p = 2; m = 2), our developed ECME algorithm shows a favorable ability to recover the true parameter values when the data are generated exactly according to the model assumptions. Similar experiments have also been undertaken for more complex scenarios (p = 3; m = 3). Those extensive results are not reported here since the conclusions are in accordance with those already presented.

4.2. Comparing the Proposed Procedure with Convolution-Type EM Algorithms

The second experiment aims to compare the performance of the proposed selection-type ECME procedure outlined in Section 2.2 with traditional EM-based algorithms derived from convolution-type representations. As an illustration, we consider the fitting of the MSN, MST, and MSCN distributions arising from the multivariate skew-normal independent (SNI) family studied by Cabral et al. [22]. As discussed above, they are special cases of our proposed MFSSN, MFSST, and MFSSCN distributions obtained by setting λ_2 = ⋯ = λ_m = 0. Accordingly, the CMQ-Step 1 in Section 2 can be simplified as follows:
  • CMQ-Step 1: Fixing Σ = Σ ^ ( k ) and λ 1 = λ ^ 1 ( k ) , we obtain ξ ^ ( k + 1 ) by
    \hat{\xi}^{(k+1)} = \left[I_p\sum_{i=1}^{n}\hat{s}_{1i}^{(k)} + \hat{\Sigma}^{(k)}\hat{\Delta}_1^{(k)\top}1_p1_p^{\top}\hat{\Delta}_1^{(k)}\sum_{i=1}^{n}\hat{s}_{2i}^{(k)}\right]^{-1}\left\{\sum_{i=1}^{n}\hat{s}_{1i}^{(k)}y_i + \hat{\Sigma}^{(k)}\hat{\Delta}_1^{(k)\top}1_p1_p^{\top}\hat{\Delta}_1^{(k)}\sum_{i=1}^{n}\hat{s}_{2i}^{(k)}y_i - \hat{\Sigma}^{(k)}\hat{\Delta}_1^{(k)\top}1_p\sum_{i=1}^{n}\hat{s}_{3i}^{(k)}\right\}.
In total, 100 Monte Carlo (MC) samples of sizes n = 100, 500, and 1000 were generated from each of the three distributions. The true parameters were the same as those given in the previous experiment except that λ_2 = 0. Each simulated sample was fitted twice, using the proposed selection-type ECME procedure and the EM-type algorithm based on convolution-type representations as implemented in the mixsmsn R package [23]. For a fair comparison, we started the two algorithms from the same initial values as described at the end of Section 2. All computations were carried out with Microsoft R Open 3.5.1 in a win64 environment on a desktop computer with a 2.80 GHz Intel Core(TM) i7-7700HQ CPU and 16.0 GB RAM. Performance was assessed by the execution CPU time and the converged log-likelihood maxima.
The box plots depicted in Figure 3 reveal that the selection-type algorithm demands a much lower computational cost than the convolution-type algorithm. The phenomenon is more apparent for the MST and MSCN distributions, particularly for larger n. The high efficiency of the selection-type algorithm can be ascribed to the simpler form of its E-step. Finally, it is worth mentioning that both algorithms reach the same final log-likelihood, as demonstrated by the violin plots in Figure 4.

5. An Illustrative Example: The Wind Speed Data

We considered a trivariate dataset analyzed by Azzalini and Genton [24] for the study of the spatial distribution of wind speed by means of the MST and various MSSMSN distributions proposed by Azzalini and Capitanio [6] and Arellano-Valle et al. [5], respectively. This dataset contains 278 hourly average wind speed measurements assembled at three meteorological towers: Goodnoe Hills (gh), Kennewick (kw), and Vansycle (vs), from 23 February to 30 November 2003, recorded at midnight when wind speeds tend to peak. The positive and negative signs of the wind speed measurements represent a westerly and an easterly wind direction, respectively. The Ljung–Box test indicates weak serial correlation for the observations measured at the three stations. For modeling these data, we followed Azzalini and Genton [24] and treated the observations as independent and identically distributed. Figure 5 presents histograms overlaid with kernel density curves, obtained using the R density() function, for the measurements collected at each tower.
Six SSMFSSN models of order 3 were considered to fit the wind speed data. For the sake of comparison, the MST, MSTN, and MSTC distributions belonging to the MSSMSN family [5] were also fitted as sub-models of SSMFSSN subject to the constraint of λ 2 = 0 . To select an appropriate model from the candidates, we adopted the Akaike information criterion (AIC) [25] and the Bayesian information criterion (BIC) [26], which are the two most widely used model selection indices based on penalized likelihood and applicable for both nested and non-nested models. The two criteria are defined as
\mathrm{AIC} = 2d - 2\ell_{\max} \quad \text{and} \quad \mathrm{BIC} = d\ln n - 2\ell_{\max},
where d is the number of free parameters in the model and ℓ_max is the maximized log-likelihood value. A lower AIC or BIC value indicates a closer fit of the model to the data.
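In R, the two criteria are one-liners given the maximized log-likelihood, the number of free parameters d, and the sample size n:

    # AIC and BIC as defined above.
    aic <- function(loglik, d)    2 * d - 2 * loglik
    bic <- function(loglik, d, n) d * log(n) - 2 * loglik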
Table 2 compares the ML estimation results for the nine candidate models. As can be seen, our proposed SSMFSSN models perform favorably compared with the three MSSMSN analogs, which lack the ability to capture the possibly bimodal behavior of the wind speed data (Figure 5). The MFSSTT distribution provides the best fit in terms of the lowest values of AIC as well as BIC, followed by the MFSST distribution. The MSTC and MST are the top two MSSMSN models with smaller AIC and BIC values.
Table 3 summarizes the resulting ML estimates of the six considered SSMFSSN models together with their asymptotic standard errors obtained by the parametric bootstrap method [27]. Notably, the estimates of the shape and flatness parameters indicate the presence of skewed and leptokurtic characteristics in different directions among the three variables. As an illustration, the fitted 3D contour densities of the MSTC, MST, MFSST, and MFSSTT distributions are depicted in Figure 6. It is interesting to see that the MFSSTT model, having the smallest AIC and BIC, adapts to the shape of the wind speed data more closely than the other three competitors.

6. Conclusions

We introduce a novel family of SSMFSSN distributions as a generalization of the work of Mahdavi et al. [10] that can simultaneously capture the dependency among multivariate responses, skewness, heavy-tailedness, and, in particular, multimodal density shapes without resorting to the use of finite mixtures [14,15,21]. Since the SSMFSSN model cannot be represented in a convolution-type form, this stimulated us to devise a feasible ECME algorithm for ML estimation under the selection-type mechanism. The effectiveness and efficiency of the algorithm are evaluated through two simulation studies. Numerical analysis of a real dataset highlights the potential and capability of our proposed approach as a promising alternative tool for modeling multimodal multivariate data with asymmetrical behavior. Computer programs implementing our methods can be installed as an R package from GitHub via devtools::install_github("a-mahdavi/SSMFSSN.EM"). Further developments of the current approach could be exploited for powerful extensions of the factor analysis model or finite mixtures thereof with censored or possibly missing values, as considered recently by the authors of [28,29,30,31,32,33,34,35]. One limitation of the SSMFSSN model is that it may not be well suited to data whose modes are widely separated. Another worthwhile extension of this work is to pursue a mixture modeling framework based on the current approach, which would be an effective way to resolve this problem.

Author Contributions

A.M. and T.-I.L. conceived the project, developed the statistical methods, designed the approach, and analyzed the results. All authors contributed to the development of the methodology and to writing the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

TIL was partially supported by the Ministry of Science and Technology of Taiwan (Grant No. MOST 109-2118-M-005-005-MY3).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors gratefully acknowledge three anonymous referees for their comments and suggestions that greatly improved this paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Hadamard Product

Below, we present three propositions concerning the multiplication and partial derivatives of the Hadamard product which are useful for our methodological developments.
Proposition A1.
Let A be a p × q matrix and D be a p × p diagonal matrix. Then, (DA)^{⊙n} = D^n A^{⊙n}, where n ∈ ℕ.
Proof. 
Multiplication of a Hadamard product by diagonal matrix D satisfies the following equation:
D(A ⊙ B) = (DA) ⊙ B = A ⊙ (DB).   (A1)
By taking B = A in (A1), we obtain
D(A ⊙ A) = (DA) ⊙ A = A ⊙ (DA).   (A2)
Now, consider n = 2; using (A2), we have
D^2 A^{⊙2} = D[(DA) ⊙ A] = (DA) ⊙ (DA) = (DA)^{⊙2}.
Similarly, for n = 3, we have
D^3 A^{⊙3} = D[(DA)^{⊙2} ⊙ A] = (DA)^{⊙2} ⊙ (DA) = (DA)^{⊙3}.
Clearly, the desired statement can be established by mathematical induction on n. □
Proposition A2.
Let A be a p × q matrix and let b ∈ ℝ^p and x ∈ ℝ^q be two column vectors. Then,
\frac{\partial}{\partial x}\, b^{\top}(Ax)^{\odot n} = n\, A^{\top}\big[(Ax)^{\odot(n-1)} \odot b\big].
Proof. 
Let f = b ( A x ) n . It is easy to show
f = \mathrm{trace}\big(b^{\top}(Ax)^{\odot n}\big) = b : (Ax)^{\odot n},
where ':' denotes the Frobenius product defined as A : B = trace(A^⊤B).
Making use of the following facts
(A ⊙ B) : C = A : (B ⊙ C), \quad A : BC = B^{\top}A : C = AC^{\top} : B, \quad \partial_x(A ⊙ B) = (\partial_x A) ⊙ B + A ⊙ (\partial_x B),
the differential of the scalar function f yields
df = b : \big[n\,(Ax)^{\odot(n-1)} \odot A\,dx\big] = n\big[(Ax)^{\odot(n-1)} \odot b\big] : A\,dx = n\,A^{\top}\big[(Ax)^{\odot(n-1)} \odot b\big] : dx.
This completes the proof. □
Proposition A3.
Let X be a p × q matrix and let a ∈ ℝ^p and b ∈ ℝ^q be two column vectors. Then,
\frac{\partial}{\partial X}\, a^{\top}(Xb)^{\odot n} = n\big[(Xb)^{\odot(n-1)} \odot a\big]\, b^{\top}.
Proof. 
Similar to the proof of Proposition A2, we define f = a : (Xb)^{⊙n} and obtain its differential as below:
df = a : \big[n\,(Xb)^{\odot(n-1)} \odot (dX)b\big] = n\big[(Xb)^{\odot(n-1)} \odot a\big] : (dX)b = n\big[(Xb)^{\odot(n-1)} \odot a\big]b^{\top} : dX,
which completes the proof. □

Appendix B. Proof of Equation (43)

Proof. 
Without loss of generality, we assume that ξ = 0 and Σ = I_p. Let Y =_d τ^{−1/2}Z_1 | (τ^{−1/2}Z_2 < λ_1^⊤(τ^{−1/2}Z_1) + ⋯ + λ_m^⊤(τ^{−1/2}Z_1)^{⊙(2m−1)}), where Z_1 ∼ N_p(0, I_p) and Z_2 ∼ N(0, 1) are two independent random variables and τ ∼ Γ(ν/2, ν/2). Clearly, (X_1^⊤, X_2)^⊤ =_d τ^{−1/2}(Z_1^⊤, Z_2)^⊤ ∼ t_{p+1}(0, I_{p+1}, ν). Therefore, it is easy to verify that X_1 ∼ t_p(0, I_p, ν), X_2 ∼ t(0, 1, ν), and
\sqrt{\frac{\nu+p}{\nu+x_1^{\top}x_1}}\, X_2 \,\Big|\, (X_1 = x_1) \sim t(0, 1, \nu+p).
By Bayes' theorem, the pdf of Y =_d X_1 | (X_2 < λ_1^⊤X_1 + ⋯ + λ_m^⊤X_1^{⊙(2m−1)}) is
f_Y(y) = f_{X_1}(y)\, \frac{\Pr\big(X_2 < \lambda_1^{\top}X_1 + \cdots + \lambda_m^{\top}X_1^{\odot(2m-1)} \mid X_1 = y\big)}{\Pr\big(X_2 < \lambda_1^{\top}X_1 + \cdots + \lambda_m^{\top}X_1^{\odot(2m-1)}\big)} = 2\, f_{X_1}(y)\, \Pr\Big(\sqrt{\tfrac{\nu+p}{\nu+y^{\top}y}}\, X_2 < \sqrt{\tfrac{\nu+p}{\nu+y^{\top}y}}\big(\lambda_1^{\top}y + \cdots + \lambda_m^{\top}y^{\odot(2m-1)}\big) \,\Big|\, X_1 = y\Big) = 2\, t_p(y; 0, I_p, \nu)\, T\Big(\sqrt{\tfrac{\nu+p}{\nu+y^{\top}y}}\big(\lambda_1^{\top}y + \cdots + \lambda_m^{\top}y^{\odot(2m-1)}\big);\ \nu+p\Big).
This completes the proof. □

References

  1. Branco, M.D.; Dey, D.K. A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 2001, 79, 99–113.
  2. Azzalini, A.; Valle, A.D. The multivariate skew-normal distribution. Biometrika 1996, 83, 715–726.
  3. Arellano-Valle, R.B.; Genton, M.G. On fundamental skew distributions. J. Multivar. Anal. 2005, 96, 93–116.
  4. Ferreira, C.S.; Lachos, V.H.; Bolfarine, H. Likelihood-based inference for multivariate skew scale mixtures of normal distributions. AStA Adv. Stat. Anal. 2016, 100, 421–441.
  5. Arellano-Valle, R.B.; Ferreira, C.S.; Genton, M.G. Scale and shape mixtures of multivariate skew-normal distributions. J. Multivar. Anal. 2018, 166, 98–110.
  6. Azzalini, A.; Capitanio, A. Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution. J. R. Stat. Soc. Ser. B 2003, 65, 367–389.
  7. Wang, J.; Boyer, J.; Genton, M.G. A skew-symmetric representation of multivariate distributions. Stat. Sin. 2004, 1259–1270. Available online: https://www.jstor.org/stable/24307231 (accessed on 5 January 2005).
  8. Ma, Y.; Genton, M.G. Flexible class of skew-symmetric distributions. Scand. J. Stat. 2004, 31, 459–468.
  9. Styan, G.P. Hadamard products and multivariate statistical analysis. Linear Algebra Its Appl. 1973, 6, 217–240.
  10. Mahdavi, A.; Amirzadeh, V.; Jamalizadeh, A.; Lin, T.I. Maximum likelihood estimation for scale-shape mixtures of flexible generalized skew normal distributions via selection representation. Comput. Stat. 2021, 36, 2201–2230.
  11. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22.
  12. Meng, X.L.; Rubin, D.B. Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 1993, 80, 267–278.
  13. Liu, C.; Rubin, D.B. The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika 1994, 81, 633–648.
  14. Lin, T.I. Maximum likelihood estimation for multivariate skew normal mixture models. J. Multivar. Anal. 2009, 100, 257–265.
  15. Lin, T.I. Robust mixture modeling using multivariate skew t distributions. Stat. Comp. 2010, 20, 343–356.
  16. Lin, T.I.; Lee, J.C.; Yen, S.Y. Finite mixture modelling using the skew normal distribution. Stat. Sin. 2007, 909–927. Available online: https://www.jstor.org/stable/24307705 (accessed on 5 December 2007).
  17. R Development Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020. Available online: http://www.R-project.org (accessed on 10 January 2021).
  18. Lin, T.I.; Ho, H.J.; Lee, C.-R. Flexible mixture modelling using the multivariate skew-t-normal distribution. Stat. Comp. 2014, 24, 531–546.
  19. Wang, J.; Genton, M.G. The multivariate skew-slash distribution. J. Stat. Plan. Inference 2006, 136, 209–220.
  20. Lin, T.I.; Lee, J.C.; Hsieh, W.J. Robust mixture modeling using the skew t distribution. Stat. Comp. 2007, 17, 81–92.
  21. Wang, W.L.; Jamalizadeh, A.; Lin, T.I. Finite mixtures of multivariate scale-shape mixtures of skew-normal distributions. Stat. Pap. 2020, 61, 2643–2670.
  22. Cabral, C.R.B.; Lachos, V.H.; Prates, M.O. Multivariate mixture modeling using skew-normal independent distributions. Comput. Stat. Data Anal. 2012, 56, 126–142.
  23. Prates, M.O.; Lachos, V.H.; Cabral, C.R.B. mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. J. Stat. Softw. 2013, 54, 1–20.
  24. Azzalini, A.; Genton, M.G. Robust likelihood methods based on the skew-t and related distributions. Int. Stat. Rev. 2008, 76, 106–129.
  25. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
  26. Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
  27. Efron, B.; Tibshirani, R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1986, 54–75. Available online: https://www.jstor.org/stable/2245500 (accessed on 22 May 2003).
  28. Lin, T.I.; Wu, P.H.; McLachlan, G.J.; Lee, S.X. A robust factor analysis model using the restricted skew-t distribution. Test 2015, 24, 510–531.
  29. Liu, M.; Lin, T. Skew-normal factor analysis models with incomplete data. J. Appl. Stat. 2015, 42, 789–805.
  30. Wang, W.L.; Liu, M.; Lin, T.I. Robust skew-t factor analysis models for handling missing data. Stat. Methods Appl. 2017, 26, 649–672.
  31. Lin, T.I.; Wang, W.L.; McLachlan, G.J.; Lee, S.X. Robust mixtures of factor analysis models using the restricted multivariate skew-t distribution. Stat. Model. 2018, 28, 50–72.
  32. Wang, W.L.; Castro, L.M.; Lachos, V.H.; Lin, T.I. Model-based clustering of censored data via mixtures of factor analyzers. Comput. Stat. Data Anal. 2019, 140, 104–121.
  33. Wang, W.L.; Castro, L.M.; Hsieh, W.C.; Lin, T.I. Mixtures of factor analyzers with covariates for modeling multiply censored dependent variables. Stat. Pap. 2021.
  34. Wang, W.L.; Lin, T.I. Robust clustering of multiply censored data via mixtures of t factor analyzers. TEST 2021.
  35. Wang, W.L.; Lin, T.I. Robust clustering via mixtures of t factor analyzers with incomplete data. Adv. Data Anal. Classif. 2021.
Figure 1. Four scatter-contour plots coupled with their marginal histograms of the bivariate MFSSN distribution: (a) m = 2, λ1 = (0, 2) and λ2 = (0, 2) (bimodal); (b) m = 2, λ1 = (2, 2) and λ2 = (1, 1) (trimodal); (c) m = 3, λ1 = (2, 2), λ2 = (1, 1) and λ3 = (2, 2) (trimodal); (d) m = 3, λ1 = (0, 2), λ2 = (0, 2) and λ3 = (2, 1) (trimodal).
Figure 2. Scatter-contour plots of one simulation case with 500 random samples generated from six subfamilies of the proposed model.
Figure 3. Box plots of CPU time for convergence of selection-type and convolution-type algorithms for fitting MSN, MST, and MSCN distributions under various sample sizes.
Figure 4. Violin plots of converged log-likelihood obtained by selection and convolution EM-type algorithms for fitting MSN, MST, and MSCN distributions under various sample sizes.
Figure 5. Histograms of univariate measurements overlaid with kernel density curves for the hourly average wind speed collected at three meteorological towers.
Figure 6. The 3-D contour densities for MSTC, MST, MFSST, and MFSSTT distributions fitted to the wind speed data.
Table 1. Simulation results based on 100 replications with different sample sizes.

Model | Parameter | MAB (n=100) | RMSE (n=100) | MAB (n=250) | RMSE (n=250) | MAB (n=500) | RMSE (n=500) | MAB (n=1000) | RMSE (n=1000)
MFSSN | ξ | 0.090 | 0.122 | 0.057 | 0.074 | 0.037 | 0.048 | 0.027 | 0.033
MFSSN | σ | 0.257 | 0.382 | 0.143 | 0.209 | 0.115 | 0.168 | 0.081 | 0.123
MFSSN | λ1 | 0.476 | 0.625 | 0.203 | 0.270 | 0.111 | 0.155 | 0.079 | 0.112
MFSSN | λ2 | 0.339 | 0.489 | 0.160 | 0.209 | 0.104 | 0.131 | 0.064 | 0.082
MFSSTN | ξ | 0.095 | 0.127 | 0.062 | 0.086 | 0.040 | 0.053 | 0.028 | 0.036
MFSSTN | σ | 0.419 | 0.673 | 0.244 | 0.383 | 0.158 | 0.241 | 0.121 | 0.179
MFSSTN | λ1 | 0.382 | 0.469 | 0.349 | 0.426 | 0.317 | 0.382 | 0.306 | 0.355
MFSSTN | λ2 | 0.335 | 0.439 | 0.194 | 0.244 | 0.177 | 0.216 | 0.151 | 0.181
MFSSTN | ν | 2.495 | 10.745 | 0.458 | 0.651 | 0.319 | 0.413 | 0.203 | 0.258
MFSSSLN | ξ | 0.091 | 0.121 | 0.052 | 0.069 | 0.036 | 0.047 | 0.025 | 0.035
MFSSSLN | σ | 0.424 | 0.641 | 0.306 | 0.489 | 0.190 | 0.314 | 0.127 | 0.200
MFSSSLN | λ1 | 0.458 | 0.588 | 0.267 | 0.347 | 0.216 | 0.269 | 0.174 | 0.206
MFSSSLN | λ2 | 0.470 | 0.614 | 0.259 | 0.378 | 0.174 | 0.227 | 0.114 | 0.138
MFSSSLN | ν | 6.530 | 11.849 | 3.425 | 8.206 | 1.191 | 3.251 | 0.481 | 0.721
MFSSCNe | ξ | 0.087 | 0.117 | 0.048 | 0.064 | 0.036 | 0.047 | 0.026 | 0.034
MFSSCNe | σ | 0.363 | 0.546 | 0.285 | 0.440 | 0.223 | 0.339 | 0.133 | 0.202
MFSSCNe | λ1 | 0.446 | 0.595 | 0.288 | 0.353 | 0.202 | 0.247 | 0.138 | 0.174
MFSSCNe | λ2 | 0.426 | 0.586 | 0.265 | 0.347 | 0.178 | 0.219 | 0.131 | 0.160
MFSSCNe | ν | 0.199 | 0.216 | 0.162 | 0.176 | 0.116 | 0.137 | 0.083 | 0.098
MFSST | ξ | 0.114 | 0.152 | 0.069 | 0.091 | 0.042 | 0.055 | 0.032 | 0.040
MFSST | σ | 0.393 | 0.586 | 0.250 | 0.386 | 0.160 | 0.238 | 0.132 | 0.216
MFSST | λ1 | 0.453 | 0.559 | 0.265 | 0.329 | 0.207 | 0.253 | 0.194 | 0.228
MFSST | λ2 | 0.355 | 0.468 | 0.215 | 0.273 | 0.137 | 0.175 | 0.110 | 0.139
MFSST | ν | 1.152 | 2.210 | 0.430 | 0.621 | 0.266 | 0.366 | 0.207 | 0.265
MFSSTTe | ξ | 0.114 | 0.146 | 0.070 | 0.094 | 0.047 | 0.061 | 0.029 | 0.038
MFSSTTe | σ | 0.400 | 0.654 | 0.213 | 0.323 | 0.165 | 0.255 | 0.115 | 0.188
MFSSTTe | λ1 | 0.437 | 0.546 | 0.411 | 0.476 | 0.406 | 0.453 | 0.414 | 0.438
MFSSTTe | λ2 | 0.313 | 0.409 | 0.216 | 0.260 | 0.183 | 0.217 | 0.167 | 0.193
MFSSTTe | ν | 1.332 | 2.733 | 0.424 | 0.573 | 0.300 | 0.408 | 0.210 | 0.279
Table 2. Summary results from fitting various models to the wind speed data. The model with the smallest value of AIC and BIC is displayed in bold.

Family | Model | ℓ_max | d | AIC | BIC
MSSMSN | MSTC | –3178.7 | 13 | 6383.4 | 6430.5
MSSMSN | MST | –3180.7 | 13 | 6387.5 | 6434.6
MSSMSN | MSTN | –3180.9 | 13 | 6387.8 | 6434.9
SSMFSSN | MFSSN | –3171.7 | 15 | 6373.4 | 6427.8
SSMFSSN | MFSSTN | –3145.6 | 16 | 6323.1 | 6381.2
SSMFSSN | MFSSSLN | –3147.1 | 16 | 6326.2 | 6384.2
SSMFSSN | MFSSCN | –3145.6 | 16 | 6323.3 | 6381.3
SSMFSSN | MFSST | –3143.0 | 16 | 6318.1 | 6376.1
SSMFSSN | MFSSTT | –3138.2 | 17 | 6310.4 | 6372.1
Table 3. ML estimates of key parameters for six SSMFSSN models. The associated standard errors are shown in parentheses.

Parameter | MFSSN | MFSSTN | MFSSSLN | MFSSCN | MFSST | MFSSTT
ξ1 | 23.0 (0.05) | 21.3 (0.09) | 21.1 (0.04) | 21.5 (0.07) | 19.9 (0.06) | 18.6 (0.08)
ξ2 | 14.8 (0.04) | 15.6 (0.08) | 15.2 (0.04) | 15.0 (0.07) | 15.1 (0.04) | 15.0 (0.06)
ξ3 | 14.6 (0.04) | 14.9 (0.06) | 14.8 (0.02) | 15.4 (0.04) | 13.4 (0.08) | 12.6 (0.09)
σ11 | 221.2 (0.26) | 122.9 (0.49) | 80.6 (0.24) | 115.8 (0.31) | 115.6 (0.21) | 115.6 (0.25)
σ21 | 138.8 (0.16) | 96.4 (0.36) | 64.6 (0.18) | 91.5 (0.24) | 92.8 (0.15) | 93.2 (0.17)
σ31 | 151.0 (0.13) | 104.8 (0.26) | 69.5 (0.13) | 99.1 (0.16) | 102.9 (0.18) | 106.0 (0.19)
σ22 | 181.3 (0.10) | 134.5 (0.27) | 90.3 (0.16) | 128.0 (0.18) | 131.2 (0.18) | 132.1 (0.20)
σ32 | 111.4 (0.07) | 80.8 (0.18) | 54.4 (0.09) | 79.6 (0.12) | 78.8 (0.15) | 79.1 (0.15)
σ33 | 296.4 (0.07) | 203.4 (0.19) | 134.0 (0.13) | 187.2 (0.09) | 204.4 (0.24) | 206.7 (0.27)
λ11 | –1.7 (0.01) | –0.4 (0.04) | –0.2 (0.01) | –0.3 (0.02) | 0.1 (0.02) | 1.0 (0.02)
λ12 | 1.6 (0.04) | 1.1 (0.04) | 1.0 (0.03) | 1.3 (0.04) | 1.3 (0.01) | 3.2 (0.01)
λ13 | 1.9 (0.04) | 1.4 (0.04) | 1.1 (0.02) | 1.3 (0.02) | 1.4 (0.01) | 2.8 (0.02)
λ21 | –0.7 (1.99) | –0.1 (2.36) | 0.0 (1.03) | –0.1 (1.53) | –0.2 (0.04) | 1.2 (0.32)
λ22 | –0.7 (0.05) | –0.4 (0.25) | –0.2 (0.05) | –0.4 (0.19) | –0.5 (0.01) | –1.6 (0.21)
λ23 | –0.1 (1.52) | –0.2 (1.98) | –0.1 (0.78) | –0.2 (1.18) | –0.1 (0.02) | –2.3 (0.17)
ν1 |  | 5.0 (0.18) | 1.5 (0.03) | 0.2 (0.04) | 4.7 (0.06) | 4.7 (0.08)
ν2 |  |  |  | 0.2 (0.54) |  | 1.0 (0.16)
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
