*Article* **The Properties of a Decile-Based Statistic to Measure Symmetry and Asymmetry**

**Mohammad Reza Mahmoudi <sup>1,2</sup>, Roya Nasirzadeh <sup>2</sup>, Dumitru Baleanu <sup>3,4</sup> and Kim-Hung Pho <sup>5,\*</sup>**


Received: 20 January 2020; Accepted: 4 February 2020; Published: 18 February 2020

**Abstract:** This paper studies a simple skewness measure for detecting symmetry and asymmetry in samples. The statistic can be computed from only three central tendencies: the first and ninth deciles and the median. Its power to detect symmetry and asymmetry is studied through numerous Monte Carlo simulations and is compared with that of several alternative measures. The results show that the statistic generally performs well in the simulations.

**Keywords:** symmetry; asymmetry; measure of skewness; decile; Monte Carlo algorithm

### **1. Introduction**

In scientific studies, researchers can summarize a given dataset using descriptive statistics. Descriptive statistics cover three kinds of tendencies: central tendencies, dispersion tendencies and shape tendencies [1]. The central and dispersion tendencies, such as the mean, median, standard deviation and variance, describe the location and spread of the dataset [1–5]. The shape tendencies, such as skewness and kurtosis, are related to the form of the distribution of the dataset [6–8]. These measures are used across divergent disciplines, for instance in tests of normality and in assessing the robustness of procedures based on normal theory. Skewness is often used with reference to symmetry. Nevertheless, symmetry is rarely defined explicitly; it is assumed that everybody knows what it means, and the definitions in use depend on the discipline. In the literature, any statement about the symmetry of a structure has to be made with reference to some rule of symmetry: a point, a line or an axis [9]. In statistical inference, the meaningful point or axis is taken to be the center of the distribution. Several measures have been employed to quantify the degree of skewness of a distribution. Assume that µ, *m*, *M*, σ, µ<sub>3</sub>, *Q*<sub>1</sub> and *Q*<sub>3</sub> are the mean, median, mode, standard deviation, third central moment, and the first and third quartiles, respectively. The statistics introduced for measuring skewness are Pearson's coefficient of skewness:

$$\text{SK}\_P = \frac{\mu - M}{\sigma} \tag{1}$$

Pearson's second coefficient of skewness:

$$SK_{P2} = \frac{3(\mu - m)}{\sigma} \tag{2}$$

Yule's coefficient of skewness:

$$SK\_Y = \frac{(\mu - m)}{\sigma} \tag{3}$$

the standardized third central moment:

$$\gamma\_1 = \frac{\mu\_3}{\sigma^3} \tag{4}$$

Bowley's coefficient of skewness:

$$\text{SK}\_{\text{B}} = \frac{Q\_{\text{3}} + Q\_{\text{1}} - 2m}{Q\_{\text{3}} - Q\_{\text{1}}} \tag{5}$$

and three Galip's coefficients of skewness:

$$\text{SK}\_{\text{G1}} = \frac{\text{X}\_{\text{Max}} + \text{X}\_{\text{min}} - 2\text{M}}{\text{X}\_{\text{Max}} - \text{X}\_{\text{min}}} \tag{6}$$

$$\text{SK}\_{\text{G2}} = \frac{\text{X}\_{\text{Max}} + \text{X}\_{\text{min}} - 2m}{\text{X}\_{\text{Max}} - \text{X}\_{\text{min}}} \tag{7}$$

$$\text{SK}\_{\text{G3}} = \frac{\text{X}\_{\text{Max}} + \text{X}\_{\text{min}} - 2\mu}{\text{X}\_{\text{Max}} - \text{X}\_{\text{min}}} \tag{8}$$

[9–17].
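For concreteness, the classical coefficients (1)–(8) can be computed from a sample as in the following sketch. The function name and the histogram-based mode estimate are our own choices (the paper does not prescribe a mode estimator), so treat this as an illustration rather than a reference implementation:

```python
import numpy as np

def classical_skewness_coefficients(x):
    """Sample versions of the skewness coefficients (1)-(8).

    The mode is estimated as the midpoint of the most populated
    histogram bin, which is only one of several conventions.
    """
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=0)
    med = np.median(x)
    q1, q3 = np.quantile(x, [0.25, 0.75])
    xmin, xmax = x.min(), x.max()
    # crude mode estimate from a histogram
    counts, edges = np.histogram(x, bins="auto")
    k = np.argmax(counts)
    mode = 0.5 * (edges[k] + edges[k + 1])
    mu3 = np.mean((x - mu) ** 3)  # third central moment
    return {
        "SK_P":   (mu - mode) / sigma,                      # Pearson's first, Eq. (1)
        "SK_P2":  3 * (mu - med) / sigma,                   # Pearson's second, Eq. (2)
        "SK_Y":   (mu - med) / sigma,                       # Yule, Eq. (3)
        "gamma1": mu3 / sigma ** 3,                         # standardized third moment, Eq. (4)
        "SK_B":   (q3 + q1 - 2 * med) / (q3 - q1),          # Bowley, Eq. (5)
        "SK_G1":  (xmax + xmin - 2 * mode) / (xmax - xmin), # Galip, Eq. (6)
        "SK_G2":  (xmax + xmin - 2 * med) / (xmax - xmin),  # Galip, Eq. (7)
        "SK_G3":  (xmax + xmin - 2 * mu) / (xmax - xmin),   # Galip, Eq. (8)
    }
```

On a symmetric sample all of these should be close to zero, while on a right-skewed sample they should come out positive.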

Although numerous different measures exist, and practical extensions of the above coefficients were proposed later, the original measures are still employed to this day, especially γ1 (or its variants), which is widely used in statistical software.

When we face a dataset containing outliers, we need a measure that treats these outliers carefully. Therefore, the measures that are based on the extreme values (maximum and minimum), such as the three Galip's coefficients of skewness; on the first and third quartiles (*Q*<sub>1</sub> and *Q*<sub>3</sub>), such as Bowley's coefficient of skewness; or on the first and ninth deciles (*D*<sub>1</sub> and *D*<sub>9</sub>), are likely to be more effective than other methods. Previous studies indicated that the three Galip's coefficients of skewness had the most power to detect symmetry and asymmetry, whereas Bowley's coefficient did not perform as well. There has been no deep study of a definition of skewness based on deciles, nor a comparison between such a measure and the alternatives.

In this work, we first consider the definition of skewness based on deciles and then study its asymptotic properties, similar to the approach applied in [18–23]. Finally, the power of the considered statistic to detect symmetry and asymmetry is compared with the powers of other measures of skewness.

#### **2. Decile-Based Skewness**

Let *X*<sub>1</sub>, . . . , *X<sub>n</sub>* be a sample from a distribution *F* on the real line, where *F* is assumed continuous so that all observations are distinct with probability one. We may then arrange the observations in increasing order without ties, *X*<sub>(1)</sub> < . . . < *X*<sub>(*n*)</sub>. These variables are called the order statistics, where *X*<sub>(*k*)</sub> is the *k*th order statistic. For 0 < *p* < 1, the *p*th quantile of *F* is defined as $x_p = F^{-1}(p)$, and the corresponding sample quantile is defined as *X*<sub>(*k*)</sub> with $k = \lceil np \rceil$, the ceiling of *np* (the smallest integer greater than or equal to *np*). Let *D*<sub>1</sub> and *D*<sub>9</sub> be the first and ninth sample deciles (the 0.1 and 0.9 sample quantiles), respectively. We define our statistic for measuring skewness by

$$SK = \frac{(D\_9 - m) - (m - D\_1)}{D\_9 - D\_1} \tag{9}$$
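As a minimal sketch, the statistic of Equation (9) can be computed with NumPy. Note that `np.quantile` interpolates between order statistics by default, which differs slightly from the $X_{(\lceil np \rceil)}$ definition above for small samples; the function name is our own:

```python
import numpy as np

def decile_skewness(x):
    """Decile-based skewness SK = ((D9 - m) - (m - D1)) / (D9 - D1), Eq. (9).

    D1 and D9 are the first and ninth sample deciles and m is the median.
    np.quantile's default linear interpolation is used in place of the
    ceiling-index order statistic; the two agree asymptotically.
    """
    d1, m, d9 = np.quantile(np.asarray(x, dtype=float), [0.1, 0.5, 0.9])
    return ((d9 - m) - (m - d1)) / (d9 - d1)
```

For a symmetric distribution SK is close to 0; for a right-skewed distribution, such as the exponential (where the population value is roughly 0.465), it is clearly positive.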

In the following, the asymptotic distribution of the proposed statistic is explored.

**Lemma 1.** *Let U*<sub>1</sub>, . . . , *U<sub>n</sub> be independent, identically distributed (iid, in short) random variables from U*(0, 1) *and let U*<sub>(1)</sub> < . . . < *U*<sub>(*n*)</sub> *be the order statistics of U*<sub>1</sub>, . . . , *U<sub>n</sub>. If n* → ∞*, then*

$$\sqrt{n}\begin{pmatrix} U_{(\lceil np_1 \rceil)} - p_1\\ U_{(\lceil np_2 \rceil)} - p_2\\ U_{(\lceil np_3 \rceil)} - p_3 \end{pmatrix} \xrightarrow{D} N(\mathbf{0}, \Sigma) \tag{10}$$

*where* 0 < *p*<sup>1</sup> < *p*<sup>2</sup> < *p*<sup>3</sup> < 1, *and*

$$\Sigma = \begin{bmatrix} p_1(1-p_1) & p_1(1-p_2) & p_1(1-p_3) \\ p_1(1-p_2) & p_2(1-p_2) & p_2(1-p_3) \\ p_1(1-p_3) & p_2(1-p_3) & p_3(1-p_3) \end{bmatrix} \tag{11}$$

**Proof.** Assume that *Y*<sub>1</sub>, *Y*<sub>2</sub>, . . . are iid exponential variables with mean 1 and $S_j = \sum_{i=1}^{j} Y_i$. Additionally, assume that $\sqrt{n}\left(\frac{k_1}{n} - p_1\right) \to 0$, $\sqrt{n}\left(\frac{k_2}{n} - p_2\right) \to 0$ and $\sqrt{n}\left(\frac{k_3}{n} - p_3\right) \to 0$ as *k*<sub>1</sub>, *k*<sub>2</sub>, *k*<sub>3</sub> and *n* → ∞. Then, by the extension of the results given in [24],

$$
\sqrt{n+1} \begin{bmatrix}
\frac{1}{n+1} S\_{k\_1} - p\_1 \\
\frac{1}{n+1} (S\_{k\_2} - S\_{k\_1}) - (p\_2 - p\_1) \\
\frac{1}{n+1} (S\_{k\_3} - S\_{k\_2}) - (p\_3 - p\_2) \\
\frac{1}{n+1} (S\_{n+1} - S\_{k\_3}) - (1 - p\_3)
\end{bmatrix} \overset{D}{\rightarrow} N(\mathbf{0}, \Sigma\_1),
$$

such that

$$
\boldsymbol{\Sigma}\_1 = \begin{bmatrix}
p\_1 & 0 & 0 & 0 \\
0 & p\_2 - p\_1 & 0 & 0 \\
0 & 0 & p\_3 - p\_2 & 0 \\
0 & 0 & 0 & 1 - p\_3
\end{bmatrix}
$$

Take $g(x_1, x_2, x_3, x_4) = \frac{1}{x_1+x_2+x_3+x_4}\left(x_1,\; x_1 + x_2,\; x_1 + x_2 + x_3\right)'$; then, by Cramér's theorem [24],

$$\sqrt{n}\begin{pmatrix}\frac{S\_{k\_1}}{S\_{n+1}} - p\_1\\\frac{S\_{k\_2}}{S\_{n+1}} - p\_2\\\frac{S\_{k\_3}}{S\_{n+1}} - p\_3\end{pmatrix} \xrightarrow{D} N(\mathbf{0}, \Sigma)$$

Finally, the proof is completed by the fact that the distribution of $\left(\frac{S_{k_1}}{S_{n+1}}, \frac{S_{k_2}}{S_{n+1}}, \frac{S_{k_3}}{S_{n+1}}\right)'$ given $S_{n+1}$ is the same as the distribution of $\left(U_{(k_1)}, U_{(k_2)}, U_{(k_3)}\right)'$. □

**Corollary 1.** *Let X*<sub>1</sub>, . . . , *X<sub>n</sub> be iid random variables with density and distribution functions f and F, respectively. Additionally, assume that f*(*x*) *is continuous and positive in a neighborhood of the quantiles* $x_{p_1}$, $x_{p_2}$ *and* $x_{p_3}$ *with p*<sub>1</sub> < *p*<sub>2</sub> < *p*<sub>3</sub>*; then,*

$$\sqrt{n}\begin{pmatrix}X_{(\lceil np_1\rceil)}-x_{p_1} \\ X_{(\lceil np_2\rceil)}-x_{p_2} \\ X_{(\lceil np_3\rceil)}-x_{p_3}\end{pmatrix}\xrightarrow{D}N(\mathbf{0},\Sigma^*)\tag{12}$$

*where*

$$
\Sigma^{*} = \begin{bmatrix}
\frac{p_1(1-p_1)}{f^2(x_{p_1})} & \frac{p_1(1-p_2)}{f(x_{p_1}) f(x_{p_2})} & \frac{p_1(1-p_3)}{f(x_{p_1}) f(x_{p_3})} \\
\frac{p_1(1-p_2)}{f(x_{p_1}) f(x_{p_2})} & \frac{p_2(1-p_2)}{f^2(x_{p_2})} & \frac{p_2(1-p_3)}{f(x_{p_2}) f(x_{p_3})} \\
\frac{p_1(1-p_3)}{f(x_{p_1}) f(x_{p_3})} & \frac{p_2(1-p_3)}{f(x_{p_2}) f(x_{p_3})} & \frac{p_3(1-p_3)}{f^2(x_{p_3})}
\end{bmatrix} \tag{13}
$$

**Proof.** Applying the transformation $g(y_1, y_2, y_3) = \left(F^{-1}(y_1), F^{-1}(y_2), F^{-1}(y_3)\right)'$ to the variables $\left(U_{(\lceil np_1\rceil)} - p_1,\; U_{(\lceil np_2\rceil)} - p_2,\; U_{(\lceil np_3\rceil)} - p_3\right)'$ in Lemma 1 completes the proof. Note that the derivative of *g* is

$$
\dot{\mathcal{g}}(y\_1, y\_2, y\_3) = \begin{bmatrix}
\frac{1}{f(F^{-1}(y\_1))} & 0 & 0 \\
0 & \frac{1}{f(F^{-1}(y\_2))} & 0 \\
0 & 0 & \frac{1}{f(F^{-1}(y\_3))}
\end{bmatrix}.
$$

The asymptotic distribution of SK is provided in the following theorem. This is our major contribution, and it is what makes inference about the skewness of a population possible.

**Theorem 1.** *Let X*<sub>1</sub>, . . . , *X<sub>n</sub> be iid random variables with density function f. Additionally, assume that f*(*x*) *is continuous and positive in a neighborhood of the quantiles x*<sub>0.1</sub>, *x*<sub>0.5</sub> *and x*<sub>0.9</sub>*. Then, the asymptotic distribution of the proposed statistic is given by*

$$T_n = \sqrt{n}\left(SK - \frac{x_{0.9} + x_{0.1} - 2x_{0.5}}{x_{0.9} - x_{0.1}}\right) \xrightarrow{D} N(0, \sigma^2)$$

*where*

$$\begin{aligned} \sigma^{2} = \frac{1}{\left(x_{0.9} - x_{0.1}\right)^{4}} & \left[\frac{0.36(x_{0.9} - x_{0.5})^{2}}{f^{2}(x_{0.1})} + \frac{(x_{0.9} - x_{0.1})^{2}}{f^{2}(x_{0.5})} + \frac{0.36(x_{0.5} - x_{0.1})^{2}}{f^{2}(x_{0.9})} \right. \\ & \quad - \frac{0.4(x_{0.9} - x_{0.1})(x_{0.9} - x_{0.5})}{f(x_{0.1})f(x_{0.5})} + \frac{0.08(x_{0.5} - x_{0.1})(x_{0.9} - x_{0.5})}{f(x_{0.1})f(x_{0.9})} \\ & \quad \left. - \frac{0.4(x_{0.9} - x_{0.1})(x_{0.5} - x_{0.1})}{f(x_{0.5})f(x_{0.9})} \right] \end{aligned} \tag{14}$$

**Proof.** The proof is simply achieved using Cramér's theorem [24] and taking $g(x_1, x_2, x_3) = \frac{x_1 - 2x_2 + x_3}{x_3 - x_1}$. □
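For readers who wish to verify the variance formula, the delta-method step can be written out explicitly. With $g(x_1,x_2,x_3) = (x_1 - 2x_2 + x_3)/(x_3 - x_1)$, a direct calculation gives

```latex
% Gradient of g, evaluated at (x_{0.1}, x_{0.5}, x_{0.9}):
\nabla g(x_1, x_2, x_3) =
\begin{pmatrix}
  \dfrac{2(x_3 - x_2)}{(x_3 - x_1)^2} \\[2ex]
  \dfrac{-2}{x_3 - x_1} \\[2ex]
  \dfrac{2(x_2 - x_1)}{(x_3 - x_1)^2}
\end{pmatrix},
\qquad
\sigma^2 = \nabla g' \, \Sigma^* \, \nabla g,
```

where $\Sigma^*$ is the matrix of Corollary 1 with $(p_1, p_2, p_3) = (0.1, 0.5, 0.9)$. Expanding the quadratic form produces six terms; for instance, the first diagonal contribution is $\left(\frac{2(x_{0.9}-x_{0.5})}{(x_{0.9}-x_{0.1})^2}\right)^2 \cdot \frac{0.1 \times 0.9}{f^2(x_{0.1})} = \frac{0.36(x_{0.9}-x_{0.5})^2}{(x_{0.9}-x_{0.1})^4 f^2(x_{0.1})}$, and the remaining terms follow in the same way.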

**Corollary 2.** *Let X*<sub>1</sub>, . . . , *X<sub>n</sub> be iid random variables from U*(0, 1)*; then, the asymptotic distribution of the proposed statistic is given by*

$$
\sqrt{n}\,(SK - 0) \xrightarrow{D} N(0, 1.25) \tag{15}
$$
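Corollary 2 is easy to check numerically. The following sketch (sample size and replication count are our own choices) simulates $\sqrt{n}\,SK$ on uniform samples and inspects its mean and variance, which should be close to 0 and 1.25:

```python
import numpy as np

def decile_skewness(x):
    # SK of Equation (9), with interpolated quantiles standing in for
    # the ceiling-index order statistics
    d1, m, d9 = np.quantile(x, [0.1, 0.5, 0.9])
    return ((d9 - m) - (m - d1)) / (d9 - d1)

# Monte Carlo sanity check of Corollary 2 on U(0,1) data
rng = np.random.default_rng(2020)
n, reps = 2000, 4000
t = np.array([np.sqrt(n) * decile_skewness(rng.uniform(size=n))
              for _ in range(reps)])
print(round(t.mean(), 3), round(t.var(), 3))  # mean near 0, variance near 1.25
```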

These results can be employed to build an asymptotic confidence interval and to test hypotheses about the skewness of a population.

#### *2.1. Asymptotic Confidence Interval*

Now, *T<sub>n</sub>* can be utilized as a pivotal quantity to build an asymptotic confidence interval for the population skewness,

$$\left(SK - \frac{\hat{\sigma}}{\sqrt{n}}\, z_{\alpha/2},\; SK + \frac{\hat{\sigma}}{\sqrt{n}}\, z_{\alpha/2}\right) \tag{16}$$

where

$$\hat{\sigma}^2 = \frac{1}{(D_9 - D_1)^4} \left[\frac{0.36(D_9 - m)^2}{f^2(D_1)} + \frac{(D_9 - D_1)^2}{f^2(m)} + \frac{0.36(m - D_1)^2}{f^2(D_9)} - \frac{0.4(D_9 - D_1)(D_9 - m)}{f(D_1)f(m)} + \frac{0.08(m - D_1)(D_9 - m)}{f(D_1)f(D_9)} - \frac{0.4(D_9 - D_1)(m - D_1)}{f(m)f(D_9)}\right] \tag{17}$$
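In practice the density values appearing in (17) are unknown and must be estimated. The sketch below plugs in a Gaussian kernel density estimate; that plug-in choice, and the function name, are ours, since the paper does not prescribe a particular density estimator:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

def decile_skewness_ci(x, alpha=0.05):
    """Asymptotic (1 - alpha) confidence interval (16) for the population
    skewness, with the density values in (17) replaced by a Gaussian KDE."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d1, m, d9 = np.quantile(x, [0.1, 0.5, 0.9])
    sk = ((d9 - m) - (m - d1)) / (d9 - d1)   # Equation (9)
    f = gaussian_kde(x)                       # plug-in density estimate
    f1, fm, f9 = f(d1)[0], f(m)[0], f(d9)[0]
    var = (0.36 * (d9 - m) ** 2 / f1 ** 2
           + (d9 - d1) ** 2 / fm ** 2
           + 0.36 * (m - d1) ** 2 / f9 ** 2
           - 0.4 * (d9 - d1) * (d9 - m) / (f1 * fm)
           + 0.08 * (m - d1) * (d9 - m) / (f1 * f9)
           - 0.4 * (d9 - d1) * (m - d1) / (fm * f9)) / (d9 - d1) ** 4
    half = norm.ppf(1 - alpha / 2) * np.sqrt(var / n)
    return sk - half, sk + half
```

Over repeated samples from a symmetric distribution, the interval should contain the true skewness of 0 in roughly 95% of runs.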

#### *2.2. Hypothesis Testing*

Hypothesis testing on *skewness* is a crucial issue in practical applications. For instance, the hypothesis *Skewness* = 0 is tantamount to symmetry. Generally, to test *H*<sub>0</sub> : *Skewness* = γ<sub>0</sub>, the test statistic can be taken as

$$T_0 = \sqrt{n}\left(\frac{SK - \gamma_0}{\hat{\sigma}}\right) \tag{18}$$

By the same argument as in Theorem 1, it can be shown that, under the null hypothesis, *T*<sub>0</sub> asymptotically follows the standard normal distribution.
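A worked example of the test for H<sub>0</sub>: *Skewness* = 0 on a clearly right-skewed sample follows. As above, the density values entering σ̂ are replaced by a Gaussian kernel density estimate, which is our own plug-in choice:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

# Two-sided asymptotic test of H0: skewness = 0 on exponential data
rng = np.random.default_rng(42)
x = rng.exponential(size=2000)               # right-skewed data
n = len(x)
d1, m, d9 = np.quantile(x, [0.1, 0.5, 0.9])
sk = ((d9 - m) - (m - d1)) / (d9 - d1)       # Equation (9)
f = gaussian_kde(x)                          # plug-in density estimate
f1, fm, f9 = f(d1)[0], f(m)[0], f(d9)[0]
var = (0.36 * (d9 - m) ** 2 / f1 ** 2        # sigma-hat^2, Equation (17)
       + (d9 - d1) ** 2 / fm ** 2
       + 0.36 * (m - d1) ** 2 / f9 ** 2
       - 0.4 * (d9 - d1) * (d9 - m) / (f1 * fm)
       + 0.08 * (m - d1) * (d9 - m) / (f1 * f9)
       - 0.4 * (d9 - d1) * (m - d1) / (fm * f9)) / (d9 - d1) ** 4
t0 = np.sqrt(n) * sk / np.sqrt(var)          # Equation (18) with gamma_0 = 0
pvalue = 2 * norm.sf(abs(t0))
print(sk, t0, pvalue)                        # symmetry is clearly rejected
```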

#### **3. Asymptotic Properties of the Proposed Statistic**

In this section, many datasets are drawn to analyze the performance of the proposed approach for distinct symmetric distributions and divergent sample sizes. First, we checked that the given CI and test statistic are truly asymptotic. For every parameter setting, the empirical coverage probability (the percentage of runs for which the given CI contains zero, the true skewness) was calculated from 10,000 repetitions using R 3.6.2 and SPSS 25. In addition, for each repetition, the value of the given test statistic was computed, and normal Q–Q plots of the test statistic are provided. Shapiro-Wilk's normality test is used to confirm the normality of the given test statistic. The empirical coverage probabilities for divergent parameters are shown in Table 1.


**Table 1.** The empirical coverage probability of the proposed confidence interval.

The results show that the empirical coverage probability of the proposed approach exceeds the nominal level (0.95), especially as the sample sizes grow. In other words, we can accept the given CI as an asymptotic CI for the skewness of the population. Figure 1 and Table 2 show the Q–Q plots versus the standard normal distribution and the results of Shapiro-Wilk's normality test for the test statistic, respectively.

**Table 2.** Shapiro-Wilk's normality test *p*-value for the given test statistic.


U(0,1): 0.3144, 0.5566, 0.6034, 0.6219, 0.8249, 0.9488

**Figure 1.** The Q–Q plots versus the standard normal distribution. Normal distribution: *n* = 50 (**a**), *n* = 1000 (**b**). *t* distribution: *n* = 50 (**c**), *n* = 1000 (**d**). Uniform distribution: *n* = 50 (**e**), *n* = 1000 (**f**).

It can then be seen that the asymptotic properties are relatively well satisfied in all situations (the *p*-value is greater than 5%). Consequently, our approach is a good choice for building a CI and executing hypothesis testing for the skewness of a population.

#### **4. Comparison with Alternative Measures**

To check the performance of the considered statistic, its power to detect asymmetry is compared with that of the conventional measures of skewness by employing a Monte Carlo simulation. As in Section 3, numerous datasets were drawn to check the performance of the measures, for different asymmetric distributions and different sample sizes, using R software. For this purpose, we generated 10,000 samples of size *n* = 10, 20, 50 from a chi-square distribution with *m* degrees of freedom, χ<sup>2</sup>(*m*). We considered three cases: extremely skewed (*m* = 1), moderately skewed (*m* = 5) and slightly skewed (*m* = 40). The powers (at the 5% significance level) of the different measures to detect asymmetry are summarized in Table 3.


**Table 3.** The powers of different measures to detect skewness.

As preliminary results, based on maximum power, it can be observed that the performances of *SK*, γ<sub>1</sub>, *SK<sub>G1</sub>*, *SK<sub>G2</sub>* and *SK<sub>G3</sub>* are approximately similar; they are more powerful than the other methods for all simulated datasets, and are therefore very promising. The performances of *SK<sub>P</sub>*, *SK<sub>P2</sub>* and *SK<sub>Y</sub>* are approximately similar and rank next, while *SK<sub>B</sub>* performs worst in all situations. In general, the measures based on the extreme values (maximum and minimum), such as the three Galip's coefficients of skewness, and those based on the first and ninth deciles (*D*<sub>1</sub> and *D*<sub>9</sub>), are more effective than the other methods, because of their better performance and easy calculation.
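The power experiment above can be sketched for the decile-based statistic alone. The replication count is reduced from 10,000 for speed, and the Gaussian KDE plug-in used for the variance is again our own choice, so the resulting numbers are only indicative of the pattern in Table 3:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(9)

def reject_symmetry(x, level=0.05):
    """Apply the asymptotic test (18) of H0: skewness = 0 at the given level."""
    n = len(x)
    d1, m, d9 = np.quantile(x, [0.1, 0.5, 0.9])
    sk = ((d9 - m) - (m - d1)) / (d9 - d1)
    f = gaussian_kde(x)                      # plug-in density estimate
    f1, fm, f9 = f(d1)[0], f(m)[0], f(d9)[0]
    var = (0.36 * (d9 - m) ** 2 / f1 ** 2
           + (d9 - d1) ** 2 / fm ** 2
           + 0.36 * (m - d1) ** 2 / f9 ** 2
           - 0.4 * (d9 - d1) * (d9 - m) / (f1 * fm)
           + 0.08 * (m - d1) * (d9 - m) / (f1 * f9)
           - 0.4 * (d9 - d1) * (m - d1) / (fm * f9)) / (d9 - d1) ** 4
    t0 = np.sqrt(n) * sk / np.sqrt(var)
    return 2 * norm.sf(abs(t0)) < level

def power(m_df, n, reps=400):
    """Estimated rejection rate for chi-square(m_df) samples of size n."""
    return float(np.mean([reject_symmetry(rng.chisquare(m_df, size=n))
                          for _ in range(reps)]))

for m_df in (1, 5, 40):                      # extreme / moderate / slight skew
    print(m_df, power(m_df, n=50))
```

As expected, the estimated power decreases from the extremely skewed case (*m* = 1) toward the nearly symmetric case (*m* = 40).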

#### **5. Discussion**

In this work, we first considered the definition of skewness based on deciles and then studied its asymptotic properties. The results showed that the empirical coverage probability of this measure exceeded the nominal level (0.95), especially as the sample size increased. The Q–Q plots versus the standard normal distribution and the results of Shapiro-Wilk's normality test verified the theoretical asymptotic properties. Finally, the power of the considered statistic to detect symmetry and asymmetry was compared with the powers of other measures of skewness. The power study indicated that the performances of the decile-based measure and the three Galip's coefficients of skewness were approximately similar and more powerful than the other methods for all simulated datasets; they are therefore promising for application in practice.

#### **6. Conclusions**

We presented a simple measure to detect skewness in samples. The new measure relies on a definition of skewness that has many outstanding advantages. The proposed coefficient of skewness can be calculated from only three simple statistics: the first and ninth deciles and the median. The power of the proposed statistic to detect symmetry and asymmetry was studied by employing numerous Monte Carlo simulations. The results show that the performance of the new statistic is generally very good in the simulations. There are many definitions describing symmetry and asymmetry. To investigate skewness in datasets containing outliers, we should use measures that account for the effects of outliers. Therefore, the measures based on the extreme values (maximum and minimum), such as the three Galip's coefficients of skewness; those based on the first and third quartiles (*Q*<sub>1</sub> and *Q*<sub>3</sub>), such as Bowley's coefficient of skewness; and those based on the first and ninth deciles (*D*<sub>1</sub> and *D*<sub>9</sub>), are natural candidates for application. Other studies showed that Galip's coefficients of skewness are more powerful for detecting symmetry and asymmetry, but there has been no deep study of a definition of skewness based on deciles or a comparison with the alternatives. In this work, we first considered the definition of skewness based on deciles and then studied its asymptotic properties. Finally, the power of the considered statistic to detect symmetry and asymmetry was compared with the powers of other measures of skewness. For future work, we suggest using definitions of skewness based on combinations of more deciles, not only the first and the ninth. We believe such combinations will improve the detection of symmetry and asymmetry.

**Author Contributions:** Conceptualization, M.R.M., R.N., D.B. and K.-H.P.; data curation, M.R.M.; formal analysis, M.R.M., R.N. and K.-H.P.; investigation, M.R.M., R.N. and D.B.; methodology, M.R.M. and K.-H.P.; project administration, D.B.; supervision, M.R.M.; validation, M.R.M.; visualization, M.R.M.; writing—original draft, M.R.M. and R.N.; writing—review and editing, D.B. and K.-H.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
