*2.1. Moments*

The following proposition presents an expression for computing the *k*-th moment of a random variable with APST(*λ*, *α*, *ν*) distribution.

**Proposition 2.** *Let X* ∼ APST(*λ*, *α*, *ν*)*; then*

$$\mathbb{E}\left[X^{k}\right] = \mathbb{E}\left[\left(\mathcal{F}\_{ST}^{-1}(Y;\lambda,\nu)\right)^{k}\right] \tag{4}$$

*where Y follows a* Beta(*α*, 1) *distribution and* $\mathcal{F}\_{ST}^{-1}(\cdot; \lambda, \nu)$ *is the inverse of the function* $\mathcal{F}\_{ST}(\cdot; \lambda, \nu)$*.*

**Proof.** We have by definition that

$$\mathbb{E}\left[X^{k}\right] = \int\_{\mathbb{R}} x^{k}\, \alpha f\_{ST}(x; \lambda, \nu) \left(\mathcal{F}\_{ST}(x; \lambda, \nu)\right)^{\alpha-1} dx$$

thus, letting $y = \mathcal{F}\_{ST}(x; \lambda, \nu)$, so that $x = \mathcal{F}\_{ST}^{-1}(y; \lambda, \nu)$, it follows that

$$\mathbb{E}\left[X^{k}\right] = \int\_{0}^{1} \alpha \left(\mathcal{F}\_{ST}^{-1}(y;\lambda,\nu)\right)^{k} y^{\alpha-1} dy$$

which is the expected value of $\left(\mathcal{F}\_{ST}^{-1}(Y; \lambda, \nu)\right)^{k}$, where *Y* follows a beta distribution with parameters *α* and 1.

The indices of skewness ($\sqrt{\beta\_1}$) and kurtosis ($\beta\_2$) of the APST distribution can be calculated from the moments (4) as follows,

$$\sqrt{\beta\_1} = \frac{\mu\_3 - 3\mu\_1\mu\_2 + 2\mu\_1^3}{(\mu\_2 - \mu\_1^2)^{3/2}} \quad \text{and} \quad \beta\_2 = \frac{\mu\_4 - 4\mu\_1\mu\_3 + 6\mu\_2\mu\_1^2 - 3\mu\_1^4}{(\mu\_2 - \mu\_1^2)^2}$$

where $\mu\_k = \mathbb{E}[X^k]$ for *k* = 1, ..., 4. Table 1 presents the ranges of possible values of the asymmetry and kurtosis indices for the ST(*λ*, *ν*), PT(*α*, *ν*), and APST(*λ*, *α*, *ν*) distributions, for values of *λ* between −40 and 40, values of *α* between 0.5 and 50, and *ν* = 2, 3, 4, 5, 6, 7. It can be seen from Table 1 that the admissible intervals for the skewness and kurtosis of the APST distribution are wider than the corresponding intervals for the ST and PT distributions. This is an indicator that the APST model is more flexible in terms of asymmetry and kurtosis than the ST and PT models.
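For concreteness, the moment-based indices above can be evaluated numerically. The sketch below (Python with NumPy/SciPy rather than the authors' R code; the function names, the grid bounds, and the trapezoid quadrature are our own assumptions) computes the raw moments of APST(*λ*, *α*, *ν*) directly from the density $\alpha f\_{ST}(x)\mathcal{F}\_{ST}(x)^{\alpha-1}$ and then the skewness and kurtosis indices.

```python
import numpy as np
from scipy import stats

def apst_moment(k, lam, alpha, nu, grid=np.linspace(-60.0, 60.0, 20001)):
    """k-th raw moment of APST(lam, alpha, nu) by trapezoid quadrature.

    Uses the Azzalini skew-t density
    f_ST(x) = 2 t_nu(x) T_{nu+1}(lam * x * sqrt((nu+1)/(x^2+nu))).
    """
    pdf = 2.0 * stats.t.pdf(grid, nu) * stats.t.cdf(
        lam * grid * np.sqrt((nu + 1.0) / (grid**2 + nu)), nu + 1.0)
    # cumulative trapezoid rule gives F_ST on the grid
    cdf = np.concatenate(([0.0], np.cumsum(np.diff(grid) * 0.5 * (pdf[1:] + pdf[:-1]))))
    cdf = np.clip(cdf / cdf[-1], 1e-300, 1.0)
    integrand = grid**k * alpha * pdf * cdf**(alpha - 1.0)
    return float(np.sum(np.diff(grid) * 0.5 * (integrand[1:] + integrand[:-1])))

def apst_skew_kurt(lam, alpha, nu):
    """(sqrt(beta_1), beta_2) from the first four raw moments."""
    m1, m2, m3, m4 = (apst_moment(k, lam, alpha, nu) for k in range(1, 5))
    var = m2 - m1**2
    sqrt_b1 = (m3 - 3.0*m1*m2 + 2.0*m1**3) / var**1.5
    b2 = (m4 - 4.0*m1*m3 + 6.0*m2*m1**2 - 3.0*m1**4) / var**2
    return sqrt_b1, b2
```

As a sanity check, for *λ* = 0 and *α* = 1 the distribution reduces to the Student-*t*, so with *ν* = 7 one should recover $\sqrt{\beta\_1} \approx 0$ and $\beta\_2 \approx 3 + 6/(\nu-4) = 5$; the kurtosis index requires *ν* > 4.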

**Table 1.** Skewness and kurtosis for the models ST(*λ*, *ν*), PT(*α*, *ν*), and APST(*λ*, *α*, *ν*), for *λ* ∈ (−40, 40), *α* ∈ (0.5, 50), and *ν* = 2, ..., 7.


*2.2. Distribution Function*

**Proposition 3.** *Let X* ∼ APST(*λ*, *α*, *ν*)*; then the CDF of X, namely* $\mathcal{F}\_{APST}(x; \lambda, \alpha, \nu)$*, is*

$$\mathcal{F}\_{APST}(x; \lambda, \alpha, \nu) = \left[ \mathcal{F}\_{ST}(x; \lambda, \nu) \right]^{\alpha}, \quad x \in \mathbb{R}. \tag{5}$$

**Proof.** The proof is immediate and follows from the results of Durrans [2].

The inversion method can be used to generate a random variable with the APST distribution. Thus, taking *λ* ∈ R, *α*, *ν* ∈ R<sup>+</sup>, and a random variable with uniform distribution, namely, *U* ∼ U(0, 1), a random variable *X* with APST(*λ*, *α*, *ν*) distribution is generated by taking

$$X = \mathcal{F}\_{ST}^{-1}\left(U^{1/\alpha}; \lambda, \nu\right).$$
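This generator is straightforward to sketch numerically. The following Python function (our illustration, not the paper's code; the grid bounds and resolution are assumptions that work for moderate *ν*) tabulates $\mathcal{F}\_{ST}$ on a grid, inverts it by interpolation, and applies $X = \mathcal{F}\_{ST}^{-1}(U^{1/\alpha}; \lambda, \nu)$.

```python
import numpy as np
from scipy import stats

def rapst(n, lam, alpha, nu, seed=None, grid=np.linspace(-60.0, 60.0, 20001)):
    """Draw n variates from APST(lam, alpha, nu) by the inversion method."""
    rng = np.random.default_rng(seed)
    # skew-t density and a tabulated CDF for numerical inversion
    pdf = 2.0 * stats.t.pdf(grid, nu) * stats.t.cdf(
        lam * grid * np.sqrt((nu + 1.0) / (grid**2 + nu)), nu + 1.0)
    cdf = np.concatenate(([0.0], np.cumsum(np.diff(grid) * 0.5 * (pdf[1:] + pdf[:-1]))))
    cdf /= cdf[-1]
    u = rng.uniform(size=n) ** (1.0 / alpha)  # U^(1/alpha) ~ Beta(alpha, 1)
    return np.interp(u, cdf, grid)            # F_ST^{-1}(U^(1/alpha); lam, nu)
```

Note that $U^{1/\alpha} \sim$ Beta(*α*, 1), which is exactly the representation used in Proposition 2.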

**Remark 1.** *We consider a truncated* APST(*λ*, *α*, *ν*) *distribution to obtain a new and useful lifetime distribution. A random variable T has an alpha-power skew-t distribution truncated at zero, denoted by* TAPST(*λ*, *α*, *ν*)*, if its PDF is given by*

$$f(t) = \frac{\alpha f\_{ST}(t; \lambda, \nu) \left[\mathcal{F}\_{ST}(t; \lambda, \nu)\right]^{\alpha-1}}{1 - \left[\mathcal{F}\_{ST}(0; \lambda, \nu)\right]^{\alpha}}; \quad t > 0\tag{6}$$

*The survival and hazard rate functions of a random variable T following a* TAPST(*λ*, *α*, *ν*) *distribution are given by*

$$\mathcal{S}\_T(t) = P(T > t) = \frac{1 - \left[\mathcal{F}\_{ST}(t; \lambda, \nu)\right]^{\alpha}}{1 - \left[\mathcal{F}\_{ST}(0; \lambda, \nu)\right]^{\alpha}}; \quad t > 0 \tag{7}$$

*and*

$$h\_T(t) = \frac{\alpha f\_{ST}(t; \lambda, \nu) \left[ \mathcal{F}\_{ST}(t; \lambda, \nu) \right]^{\alpha-1}}{1 - \left[ \mathcal{F}\_{ST}(t; \lambda, \nu) \right]^{\alpha}}; \quad t > 0 \tag{8}$$

*respectively.*
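As a quick numerical check of these lifetime quantities, the sketch below (Python/SciPy; the helper names are our own) evaluates $\mathcal{F}\_{ST}$ by numerical integration and computes the survival and hazard functions of the zero-truncated distribution.

```python
import numpy as np
from scipy import stats, integrate

def st_pdf(x, lam, nu):
    """Azzalini skew-t density."""
    return 2.0 * stats.t.pdf(x, nu) * stats.t.cdf(
        lam * x * np.sqrt((nu + 1.0) / (x**2 + nu)), nu + 1.0)

def st_cdf(x, lam, nu):
    """Skew-t CDF by numerical integration of the density."""
    return integrate.quad(st_pdf, -np.inf, x, args=(lam, nu))[0]

def tapst_survival(t, lam, alpha, nu):
    """S_T(t) = (1 - F_ST(t)^alpha) / (1 - F_ST(0)^alpha), t > 0."""
    return ((1.0 - st_cdf(t, lam, nu)**alpha)
            / (1.0 - st_cdf(0.0, lam, nu)**alpha))

def tapst_hazard(t, lam, alpha, nu):
    """h_T(t) = alpha f_ST(t) F_ST(t)^(alpha-1) / (1 - F_ST(t)^alpha), t > 0."""
    Ft = st_cdf(t, lam, nu)
    return alpha * st_pdf(t, lam, nu) * Ft**(alpha - 1.0) / (1.0 - Ft**alpha)
```

Internal consistency checks are available: $S\_T(0) = 1$, and $h\_T$ equals the truncated density (6) divided by $S\_T$.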

#### *2.3. Location and Scale Extension*

We can also consider a generalization of the APST distribution by adding location and scale parameters. The following definition gives this generalization of the APST model.

**Definition 2.** *Let X* ∼ APST(*λ*, *α*, *ν*)*. The location-scale APST distribution is defined as the distribution of Y* = *μ* + *σX, for μ* ∈ R *and σ* > 0*. The corresponding PDF is given by*

$$f\_{APST}(y;\mu,\sigma,\lambda,\alpha,\nu) = \frac{\alpha}{\sigma} f\_{ST}\left(\frac{y-\mu}{\sigma};\lambda,\nu\right) \left[\mathcal{F}\_{ST}\left(\frac{y-\mu}{\sigma};\lambda,\nu\right)\right]^{\alpha-1}, \quad y \in \mathbb{R},\tag{9}$$

*for λ* ∈ R *and α*, *ν* ∈ R<sup>+</sup>*. A random variable following a location-scale APST distribution is denoted by Y* ∼ APST(*μ*, *σ*, *λ*, *α*, *ν*)*.*

The *k*-th moment of a random variable *Y* ∼ APST(*μ*, *σ*, *λ*, *α*, *ν*) can be obtained from the formula

$$\mathbb{E}\left[Y^k\right] = \sum\_{i=0}^k \binom{k}{i} \mu^i \sigma^{k-i}\, \mathbb{E}\left[X^{k-i}\right]$$

where *X* ∼ APST(*λ*, *α*, *ν*).
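The binomial expansion above can be sketched in a few lines of plain Python (the helper name is ours; `x_moments` holds the raw moments of the standardized variable):

```python
from math import comb

def y_raw_moment(k, mu, sigma, x_moments):
    """E[Y^k] for Y = mu + sigma * X via the binomial expansion.

    x_moments[j] must hold E[X^j] for j = 0, ..., k (x_moments[0] = 1).
    """
    return sum(comb(k, i) * mu**i * sigma**(k - i) * x_moments[k - i]
               for i in range(k + 1))
```

For instance, with *λ* = 0 and *α* = 1 the APST moments reduce to Student-*t* moments (E[*X*] = 0, E[*X*²] = *ν*/(*ν* − 2)), so E[*Y*²] = *μ*² + *σ*²*ν*/(*ν* − 2).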

#### **3. Statistical Inference for APST Distribution**

This section concerns likelihood inference about the parameter vector *θ* = (*μ*, *σ*, *λ*, *α*, *ν*)<sup>⊤</sup> of the location-scale family defined in Equation (9). Let **Y** = (*Y*<sub>1</sub>, ... , *Y<sub>n</sub>*)<sup>⊤</sup> be a random sample from the APST(*μ*, *σ*, *λ*, *α*, *ν*) distribution. The log-likelihood function for *θ* = (*μ*, *σ*, *λ*, *α*, *ν*)<sup>⊤</sup> can be written as follows,

$$\begin{split} \ell(\boldsymbol{\theta}; \mathbf{Y}) & \propto n \log \alpha - n \log \sigma - \frac{n}{2} \log \nu \\ &+ n \log \Gamma \left( \frac{\nu + 1}{2} \right) - n \log \Gamma \left( \frac{\nu}{2} \right) - \frac{\nu + 1}{2} \sum\_{i=1}^{n} \log \left( 1 + \frac{z\_i^2}{\nu} \right) \\ &+ \sum\_{i=1}^{n} \log \mathcal{F}\_T \left( \lambda z\_i \sqrt{\frac{\nu + 1}{z\_i^2 + \nu}}; \nu + 1 \right) + (\alpha - 1) \sum\_{i=1}^{n} \log \mathcal{F}\_{ST} ( z\_i; \lambda, \nu ) \end{split} \tag{10}$$

where *z<sub>i</sub>* = (*y<sub>i</sub>* − *μ*)/*σ*. Thus, by differentiating the log-likelihood function, we obtain the following score equations,

$$\frac{\partial \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \mu} = \frac{\nu+1}{\sigma\nu}\sum\_{i=1}^{n} z\_i\left(1+\frac{z\_i^2}{\nu}\right)^{-1} - \frac{\lambda}{\sigma}\sum\_{i=1}^{n} w\_i\left(1+\frac{z\_i^2}{\nu}\right)^{-1}\frac{f\_T(\lambda z\_i w\_i; \nu+1)}{\mathcal{F}\_T(\lambda z\_i w\_i; \nu+1)} - \frac{\alpha-1}{\sigma}\sum\_{i=1}^{n}\frac{f\_{ST}(z\_i; \lambda, \nu)}{\mathcal{F}\_{ST}(z\_i; \lambda, \nu)} = 0 \tag{11}$$

$$\frac{\partial \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\nu+1}{\sigma\nu}\sum\_{i=1}^{n} z\_i^2\left(1+\frac{z\_i^2}{\nu}\right)^{-1} - \frac{\lambda}{\sigma}\sum\_{i=1}^{n} z\_i w\_i\left(1+\frac{z\_i^2}{\nu}\right)^{-1}\frac{f\_T(\lambda z\_i w\_i; \nu+1)}{\mathcal{F}\_T(\lambda z\_i w\_i; \nu+1)} - \frac{\alpha-1}{\sigma}\sum\_{i=1}^{n} z\_i\frac{f\_{ST}(z\_i; \lambda, \nu)}{\mathcal{F}\_{ST}(z\_i; \lambda, \nu)} = 0 \tag{12}$$

*Symmetry* **2020**, *12*, 82

$$\frac{\partial \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \lambda} = \sum\_{i=1}^{n} z\_i w\_i \frac{f\_{T}(\lambda z\_i w\_{i}; \nu + 1)}{\mathcal{F}\_{T}(\lambda z\_i w\_{i}; \nu + 1)} - \frac{\alpha - 1}{\pi (1 + \lambda^2)} \sum\_{i=1}^{n} \frac{\left(1 + (1 + \lambda^2) z\_i^2 / \nu \right)^{-\frac{\nu}{2}}}{\mathcal{F}\_{ST}(z\_i; \lambda, \nu)} = 0,\tag{13}$$

$$\frac{\partial \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \alpha} = \frac{n}{\alpha} + \sum\_{i=1}^{n} \log \mathcal{F}\_{ST}(z\_i; \lambda, \nu) = 0,\tag{14}$$

$$\begin{split}\frac{\partial \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \nu} &= \frac{n}{2}\left\{\psi\left(\frac{\nu+1}{2}\right) - \psi\left(\frac{\nu}{2}\right) - \frac{1}{\nu}\right\} - \frac{1}{2}\sum\_{i=1}^{n}\log\left(1+\frac{z\_i^2}{\nu}\right) + \frac{\nu+1}{2\nu^2}\sum\_{i=1}^{n} z\_i^2\left(1+\frac{z\_i^2}{\nu}\right)^{-1} \\ &\quad + \frac{\lambda}{2\nu(\nu+1)}\sum\_{i=1}^{n} z\_i^3 w\_i\left(1+\frac{z\_i^2}{\nu}\right)^{-1}\frac{f\_T(\lambda z\_i w\_i; \nu+1)}{\mathcal{F}\_T(\lambda z\_i w\_i; \nu+1)} - \frac{\lambda}{2\nu(\nu+1)}\sum\_{i=1}^{n} z\_i w\_i\left(1+\frac{z\_i^2}{\nu}\right)^{-1}\frac{f\_T(\lambda z\_i w\_i; \nu+1)}{\mathcal{F}\_T(\lambda z\_i w\_i; \nu+1)} \\ &\quad - \frac{(\alpha-1)\lambda}{2\pi(\nu+1)(1+\lambda^2)}\sum\_{i=1}^{n}\frac{\left(1+(1+\lambda^2)z\_i^2/\nu\right)^{-\frac{\nu}{2}}}{\mathcal{F}\_{ST}(z\_i; \lambda, \nu)} + \frac{\alpha-1}{2}\sum\_{i=1}^{n}\frac{g(z\_i; \nu)}{\mathcal{F}\_{ST}(z\_i; \lambda, \nu)} = 0\end{split}\tag{15}$$

where *ψ*(·) is the digamma function, $w\_i = \sqrt{(\nu+1)/(z\_i^2+\nu)}$ for *i* = 1, ..., *n*, and *g*(*x*; *ν*) is the function defined by

$$\begin{split} g(x;\nu) &= \int\_{-\infty}^{x} \left\{ \frac{\nu+1}{\nu^2}\, s^2 \left(1 + \frac{s^2}{\nu}\right)^{-1} - \log\left(1 + \frac{s^2}{\nu}\right) \right\} f\_{ST}(s;\lambda,\nu)\, ds \\ &\quad - \frac{\lambda}{\pi\nu} \int\_{-\infty}^{x} s \left(1 + \frac{s^2}{\nu}\right)^{-1} \left\{ 1 + (1 + \lambda^2) \frac{s^2}{\nu} \right\}^{-\frac{\nu+2}{2}} ds \end{split}\tag{16}$$

Equations (11)–(15) involve nonlinear functions; therefore, it is not possible to obtain explicit forms of the maximum likelihood estimators (MLEs), and they must be calculated by numerical methods. In this work, we used the *maxLik* function of R [15], which uses the Newton–Raphson optimization method. The elements of the observed information matrix are easily obtained by calculating the second derivatives of the log-likelihood function and multiplying by −1, that is,

$$j\_{\theta\_i\theta\_k} = -\frac{\partial^2 \ell(\boldsymbol{\theta}; \mathbf{Y})}{\partial \theta\_i\, \partial \theta\_k}, \quad i, k = 1, 2, \dots, 5 \tag{17}$$

where *θ* = (*μ*, *σ*, *λ*, *α*, *ν*)<sup>⊤</sup>. These elements are given in Appendix A. To find the standard errors (SE) of the MLEs and calculate confidence intervals, the Fisher information matrix **I** must be calculated; it is defined as minus the expected value of the second derivative of the log-likelihood function, i.e., minus the expected value of the Hessian matrix. From this matrix, the SEs are obtained as the square roots of the diagonal elements of its inverse. The elements of the matrix **I** are obtained as

$$\mathbf{I}(i,k) = -\mathbb{E}\left(\frac{\partial^2 \ell(\boldsymbol{\theta};\mathbf{Y})}{\partial \theta\_i\, \partial \theta\_k}\right), \quad i,k = 1,2,\dots,5\tag{18}$$

The role of the Fisher information in the asymptotic theory of maximum-likelihood estimation was emphasized by Ronald Fisher following some initial results by Francis Edgeworth; see Lehmann and Casella [16] and Frieden [17] for more details. The Fisher information matrix is used to calculate the covariance matrices associated with maximum-likelihood estimates, and it can also be used in the formulation of test statistics, such as the Wald test.

As the expected values under the APST distribution and the second-order derivatives are not available in closed form, numerical methods must be used to obtain the information matrix **I**. Therefore, we use the observed information matrix to calculate the standard errors in the rest of the paper.
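For illustration, a minimal numerical sketch of this procedure in Python follows (the paper itself uses *maxLik* in R; the grid-based likelihood, the restriction to estimating only *μ* and *σ* with *λ*, *α*, *ν* held fixed, and all function names are our simplifying assumptions):

```python
import numpy as np
from scipy import stats, optimize

GRID = np.linspace(-60.0, 60.0, 20001)

def apst_negloglik(theta, y, lam, alpha, nu):
    """Negative APST log-likelihood, parameterized by (mu, log sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    pdf = 2.0 * stats.t.pdf(GRID, nu) * stats.t.cdf(
        lam * GRID * np.sqrt((nu + 1.0) / (GRID**2 + nu)), nu + 1.0)
    cdf = np.concatenate(([0.0], np.cumsum(np.diff(GRID) * 0.5 * (pdf[1:] + pdf[:-1]))))
    cdf /= cdf[-1]
    fz = np.clip(np.interp(z, GRID, pdf), 1e-300, None)
    Fz = np.clip(np.interp(z, GRID, cdf), 1e-300, 1.0)
    return -np.sum(np.log(alpha / sigma) + np.log(fz) + (alpha - 1.0) * np.log(Fz))

def fit_apst_location_scale(y, lam, alpha, nu):
    """MLE of (mu, sigma) with (lam, alpha, nu) fixed; rough SEs from the
    inverse-Hessian approximation returned by BFGS (observed information)."""
    res = optimize.minimize(apst_negloglik, x0=[np.median(y), 0.0],
                            args=(y, lam, alpha, nu), method="BFGS")
    se = np.sqrt(np.diag(res.hess_inv))  # SEs on the (mu, log sigma) scale
    return res.x[0], np.exp(res.x[1]), se
```

The BFGS inverse-Hessian is only an approximation to the inverse observed information; in practice one would compute the Hessian of the full five-parameter log-likelihood numerically, as the paper does.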

When *ν* tends to infinity, the ST distribution converges to the SN distribution. We recall that the information matrix of a random variable *X* ∼ SN(*μ*, *σ*, *λ*), denoted by **I**<sub>*λ*</sub>(*ϕ*) with *ϕ* = (*μ*, *σ*, *λ*)<sup>⊤</sup>, is singular for *λ* = 0. Therefore, it is convenient to use the centered parameterization of the ST distribution proposed by Arellano-Valle and Azzalini [18].

The centered parameterization of the SN distribution was proposed as an alternative to the problem of singularity of the information matrix of the SN when *λ* = 0. Arellano-Valle and Azzalini [19] proposed a second representation of the SN by defining a new random variable *X* as

$$X = \mu + \sigma \left( \frac{Z - \mathbb{E}[Z]}{\sqrt{\text{Var}[Z]}} \right)$$

where *μ* ∈ R and *σ* > 0 are parameters of the random variable *X* and *Z* ∼ SN(*λ*). This representation is called the centered parameterization, since E[*X*] = *μ* and Var[*X*] = *σ*<sup>2</sup>, and it is denoted by CSN(*μ*, *σ*, *γ*<sub>1</sub>), where −0.9953 < *γ*<sub>1</sub> < 0.9953. Under the centered parameterization, *μ*, *σ*, and *γ*<sub>1</sub> = √*β*<sub>1</sub> represent the mean, the standard deviation, and the skewness index of *X*, respectively. If *Z* ∼ SN(*λ*), then E[*Z*] = *bδ* and Var[*Z*] = 1 − (*bδ*)<sup>2</sup>, where *b* = √(2/*π*) and *δ* = *λ*/√(1 + *λ*<sup>2</sup>); it follows that the random variable *X* has an SN(*λ*<sub>1</sub>, *λ*<sub>2</sub>, *λ*) distribution, where

$$
\lambda\_1 = \mu - c\sigma \gamma\_1^{1/3}, \quad \lambda\_2 = \sigma \sqrt{1 + c^2 \gamma\_1^{2/3}}, \quad \lambda = \frac{c\gamma\_1^{1/3}}{\sqrt{b^2 + c^2(b^2 - 1)\gamma\_1^{2/3}}} \tag{19}
$$

with *c* = {2/(4 − *π*)}<sup>1/3</sup>. Under this parameterization, the information matrix can be written as **I**<sub>*γ*1</sub> = **D**<sup>⊤</sup>**I**<sub>*λ*</sub>**D**, where **D** is the matrix of derivatives of the parameters of the standard representation (*λ*<sub>1</sub>, *λ*<sub>2</sub>, and *λ*) with respect to the new parameters (*μ*, *σ*, and *γ*<sub>1</sub>). It also follows that the information matrix converges to the diagonal matrix **Σ**<sub>*c*</sub><sup>−1</sup>, with **Σ**<sub>*c*</sub> = diag(*σ*<sup>2</sup>, *σ*<sup>2</sup>/2, 6), when *λ* → 0. This guarantees the existence and uniqueness of the MLEs of *λ*<sub>1</sub> and *λ*<sub>2</sub> for each fixed value of *λ*.
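The mapping (19) from the centered parameters (*μ*, *σ*, *γ*<sub>1</sub>) to the direct parameters is straightforward to code. The sketch below (Python; the function name is ours, and the signed cube root is our addition so that negative skewness is handled) can be verified by recovering the SN mean, variance, and skewness from the direct parameters.

```python
from math import pi, sqrt, copysign

def csn_to_direct(mu, sigma, gamma1):
    """Map CSN(mu, sigma, gamma1) to the direct SN parameters
    (lambda1, lambda2, lambda) via Equation (19); requires |gamma1| < 0.9953."""
    b = sqrt(2.0 / pi)
    c = (2.0 / (4.0 - pi)) ** (1.0 / 3.0)
    g = copysign(abs(gamma1) ** (1.0 / 3.0), gamma1)  # signed cube root of gamma1
    lam1 = mu - c * sigma * g
    lam2 = sigma * sqrt(1.0 + c**2 * g**2)
    lam = c * g / sqrt(b**2 + c**2 * (b**2 - 1.0) * g**2)
    return lam1, lam2, lam
```

A round-trip check: with *μ* = 0, *σ* = 1, *γ*<sub>1</sub> = 0.5, the resulting SN(*λ*<sub>1</sub>, *λ*<sub>2</sub>, *λ*) variable should again have mean 0, variance 1, and skewness 0.5.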

Following this same line of thought, we suppose that *Y* follows the model (1) with location parameter *μ* ∈ R and scale parameter *σ* > 0, that is,

$$f\_{ST}(y; \mu, \sigma, \lambda, \nu) = \frac{2}{\sigma} f\_T\left(\frac{y-\mu}{\sigma}; \nu\right) \mathcal{F}\_T\left(\lambda \sqrt{\frac{\nu+1}{Q\_y+\nu}} \left(\frac{y-\mu}{\sigma}\right); \nu+1\right), \quad y \in \mathbb{R} \tag{20}$$

where *λ* ∈ R and *Q<sub>y</sub>* = ((*y* − *μ*)/*σ*)<sup>2</sup>. This representation corresponds to the direct parameterization of the ST distribution with parameter vector *ρ* = (*μ*, *σ*, *λ*, *ν*)<sup>⊤</sup>. It follows that *Z<sub>T</sub>* = (*Y* − *μ*)/*σ* ∼ ST(*λ*, *ν*), and the stochastic representation of the ST distribution gives *Z<sub>T</sub>* = *Z*/√*V*, where *Z* ∼ SN(*λ*) and *V* ∼ *χ*<sup>2</sup><sub>*ν*</sub>/*ν*. This makes it possible to compute the first four cumulants of *Z<sub>T</sub>*, denoted by *μ*<sub>1</sub>(*δ*, *ν*), *μ*<sub>2</sub>(*δ*, *ν*), *μ*<sub>3</sub>(*δ*, *ν*), and *μ*<sub>4</sub>(*δ*, *ν*); see [18]. The centered parameterization of the ST distribution of a random variable *Y* is obtained by defining

$$\mu\_t = \mathbb{E}[\boldsymbol{Y}] = \mu + \sigma \mu\_1(\delta, \nu) = \mu + \sigma b\_\nu \delta$$

$$
\sigma\_t^2 = \text{Var}[Y] = \sigma^2 \mu\_2(\delta, \nu) = \sigma^2 \left\{ \frac{\nu}{\nu - 2} - b\_\nu^2 \delta^2 \right\},
$$


$$\begin{aligned} \gamma\_{1t} &= \frac{\mu\_3(\delta, \nu)}{\mu\_2(\delta, \nu)^{3/2}} = \frac{b\_\nu \delta}{\mu\_2(\delta, \nu)^{3/2}} \left\{ \frac{\nu(3 - \delta^2)}{\nu - 3} - \frac{3\nu}{\nu - 2} + 2b\_\nu^2 \delta^2 \right\} \\\\ \gamma\_{2t} &= \frac{\mu\_4(\delta, \nu)}{\mu\_2(\delta, \nu)^2} = \frac{1}{\mu\_2(\delta, \nu)^2} \left\{ \frac{3\nu^2}{(\nu - 2)(\nu - 4)} - \frac{4b\_\nu^2 \delta^2 \nu(3 - \delta^2)}{\nu - 3} + \frac{6b\_\nu^2 \delta^2 \nu}{\nu - 2} - 3b\_\nu^4 \delta^4 \right\} - 3 \end{aligned}$$

The new representation is defined as the centered skew-*t* distribution with parameter vector *ρ***˜** = (*μ*, *σ*<sup>2</sup>, *γ*<sub>1</sub>, *γ*<sub>2</sub>)<sup>⊤</sup>. According to Arellano-Valle and Azzalini [18], the information matrix of this representation can be written as

$$\mathbf{I}(\tilde{\boldsymbol{\rho}}) = \mathbf{B}^{\top} \mathbf{I}(\boldsymbol{\rho}) \mathbf{B},$$

where **B** is the matrix of derivatives of the parameter vector *ρ* with respect to the new vector *ρ***˜**. It can be shown that *b<sub>ν</sub>* → *b* as *ν* → ∞; see [18]. Therefore, the parameters of the centered ST model converge to those of the CSN, that is, *μ<sub>t</sub>* → *μ*, *σ*<sup>2</sup><sub>*t*</sub> → *σ*<sup>2</sup>, and *γ*<sub>1*t*</sub> → *γ*<sub>1</sub> as *ν* → ∞. As *Z<sub>T</sub>* → SN(*λ*) when *ν* → ∞, it follows that the random variable *Y* converges to a distribution with information matrix

$$\mathbf{I}(\mu, \sigma^2, \gamma\_1, \alpha) = \begin{pmatrix} \mathbf{I}\_{\theta\_1 \theta\_1} & \mathbf{I}\_{\theta\_1 \alpha} \\ \mathbf{I}\_{\theta\_1 \alpha}^\top & I\_{\alpha\alpha} \end{pmatrix},\tag{21}$$

where the diagonal blocks correspond to the information of the parameter vector *θ*<sub>1</sub> = (*μ*, *σ*<sup>2</sup>, *γ*<sub>1</sub>)<sup>⊤</sup> and of *α*, and **I**<sub>*θ*1,*α*</sub> is the joint information of *θ*<sub>1</sub> and *α*. Now, when *λ* → 0 and *α* = 1, it can be shown that **I**<sub>*θ*1*θ*1</sub> → diag(*σ*<sup>2</sup>, *σ*<sup>2</sup>/2, 6)<sup>−1</sup>, with determinant equal to 0.3333/*σ*<sup>4</sup>, and **I**<sub>*θ*1,*α*</sub> = (0.9031/*σ*, −0.5956/*σ*, 0.7206)<sup>⊤</sup>; therefore, the determinant |**I**(*μ*, *σ*<sup>2</sup>, *γ*<sub>1</sub>, *α*)| ≠ 0, and we conclude that the random variable *Y* converges to a distribution with a non-singular information matrix as *ν* tends to infinity.

#### *3.1. Extension to Censored Data*

Based on the ability of the APST distribution to fit asymmetric and heavy-tailed data, in this section we introduce the censored APST model, which we denote by CAPST.

**Definition 3.** *Suppose that the random variable Y follows an APST distribution, and consider a random sample* **Y** = (*Y*<sub>1</sub>, *Y*<sub>2</sub>, ... , *Y<sub>n</sub>*) *in which only the values Y<sub>i</sub> greater than a constant k are recorded; for values Y<sub>i</sub>* ≤ *k, only the value of k is recorded. Therefore, for i* = 1, 2, ..., *n, the observed values* $Y\_i^o$ *can be written as*

$$Y\_i^o = \begin{cases} k, & \text{if } Y\_i \le k, \\ Y\_i, & \text{if } Y\_i > k. \end{cases}$$

*The resulting sample is said to be a censored APST sample, and we say that Y is a censored APST random variable. We will use the notation Y* ∼ CAPST(*θ*)*, where θ* = (*μ*, *σ*, *λ*, *α*, *ν*)<sup>⊤</sup>*.*

From Definition 3, it follows that $P(Y\_i^o = k) = P(Y\_i \le k) = \left\{\mathcal{F}\_{ST}((k-\mu)/\sigma; \lambda, \nu)\right\}^{\alpha}$, and for the observations $Y\_i^o = Y\_i$ the distribution of $Y\_i^o$ is the same as that of *Y<sub>i</sub>*, i.e., $Y\_i^o \sim$ APST(*θ*). For convenience, we work with the case of left-censored data; however, the following results can be extended to other types of censoring.

#### *3.2. Properties of the CAPST Model*

Let *Y* ∼ CAPST(*μ*, *σ*, *λ*, *α*, *ν*).


The estimates of the model parameters can be obtained via the maximum likelihood method, where the log-likelihood function is given by

$$\begin{split} \ell(\boldsymbol{\theta};\mathbf{Y}) & \propto \alpha \sum\_{0} \log \mathcal{F}\_{ST} \left( \frac{k-\mu}{\sigma}; \lambda, \nu \right) + n\_{1} \log \alpha - n\_{1} \log \sigma - \frac{n\_{1}}{2} \log \nu \\ & \quad + n\_{1} \log \Gamma \left( \frac{\nu + 1}{2} \right) - n\_{1} \log \Gamma \left( \frac{\nu}{2} \right) - \frac{\nu + 1}{2} \sum\_{1} \log \left( 1 + \frac{x\_{i}^{2}}{\nu} \right) \\ & \quad + \sum\_{1} \log \mathcal{F}\_{T} \left( \lambda x\_{i} \sqrt{\frac{\nu + 1}{x\_{i}^{2} + \nu}}; \nu + 1 \right) + (\alpha - 1) \sum\_{1} \log \mathcal{F}\_{ST} ( x\_{i}; \lambda, \nu ) \end{split}$$

where *x<sub>i</sub>* = (*y<sub>i</sub>* − *μ*)/*σ*; ∑<sub>0</sub> and ∑<sub>1</sub> denote the sums over the censored and uncensored individuals, respectively; and *n*<sub>1</sub> is the number of uncensored individuals.
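A direct transcription of this censored log-likelihood can be sketched as follows (Python with a tabulated skew-t CDF; the grid, the clipping guards, and the function names are our assumptions, and only the evaluation, not the maximization, is shown):

```python
import numpy as np
from scipy import stats

GRID = np.linspace(-60.0, 60.0, 20001)

def capst_loglik(y, k, mu, sigma, lam, alpha, nu):
    """Left-censored APST log-likelihood: observations with y <= k contribute
    alpha * log F_ST((k - mu)/sigma), the rest the APST log-density."""
    pdf = 2.0 * stats.t.pdf(GRID, nu) * stats.t.cdf(
        lam * GRID * np.sqrt((nu + 1.0) / (GRID**2 + nu)), nu + 1.0)
    cdf = np.concatenate(([0.0], np.cumsum(np.diff(GRID) * 0.5 * (pdf[1:] + pdf[:-1]))))
    cdf /= cdf[-1]
    x = (np.asarray(y, dtype=float) - mu) / sigma
    cens = np.asarray(y) <= k
    fz = np.clip(np.interp(x, GRID, pdf), 1e-300, None)
    Fz = np.clip(np.interp(x, GRID, cdf), 1e-300, 1.0)
    Fk = np.clip(np.interp((k - mu) / sigma, GRID, cdf), 1e-300, 1.0)
    ll_cens = alpha * np.log(Fk)
    ll_obs = np.log(alpha / sigma) + np.log(fz) + (alpha - 1.0) * np.log(Fz)
    return float(np.sum(np.where(cens, ll_cens, ll_obs)))
```

With *λ* = 0 and *α* = 1 and no censored observations, this reduces to the plain Student-*t* log-likelihood, which provides a convenient check.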

#### **4. Real Data Applications**

In this section, we illustrate the applicability of the model proposed in Section 2 by analyzing two data sets. We use the statistical software *R* [15], version 3.5.3, with the package *maxLik* for maximizing the corresponding likelihood functions. For comparing the various models, the AIC (Akaike [20]), BIC (Schwarz [21]), and corrected AIC (CAIC; Bozdogan [22]) information criteria were used.

#### *4.1. Application 1: Volcano Heights Data*


Consider the data set of the heights of 1520 volcanoes in the world, which is available at dx.doi.org/10.5479/si.GVP.VOTW4-2013 [23]. Table 2 presents the summary statistics for the data set. It can be noted that the asymmetry and kurtosis indices indicate that an asymmetric and heavy-tailed model is appropriate for this data set. We analyzed these data by fitting the Student-*t*, ST, PT, and APST distributions.



Table 3 shows the parameter estimates, together with their corresponding standard errors (SE). Note that the standard errors of the *μ* and *σ* estimates for the APST model are smaller than the corresponding standard errors for the Student-*t*, ST, and PT models. Table 3 also presents some model selection criteria, together with the values of the log-likelihood. The AIC, BIC, and CAIC criteria indicate that the APST model provides a better fit to the volcano heights data than the T, ST, and PT models, supporting the asymmetry of the volcano heights variable. Figure 2 shows the QQ plots of the fitted models; it can be clearly seen that the APST model fits the data better than the Student-*t*, ST, and PT models. In addition, we can use the likelihood ratio (LR) test statistic to confirm our claim. To do this, we consider the following hypotheses,

$$H\_0: (\lambda, \alpha) = (0, 1) \ (\text{T}(\mu, \sigma, \nu)) \quad \text{vs.} \quad H\_1: (\lambda, \alpha) \neq (0, 1) \ (\text{APST}(\mu, \sigma, \lambda, \alpha, \nu)),$$


The value of the LR test statistic is $-2\log(\Lambda) = -2\left(\ell\_{T}(\hat{\boldsymbol{\theta}}) - \ell\_{APST}(\hat{\boldsymbol{\theta}})\right) = 134.823$; comparing this quantity with the 95th percentile of the $\chi^2\_2$ distribution, 5.9914, the null hypothesis is rejected. The APST model is also compared with the ST and PT models by considering the hypotheses

$$H\_{01}: \alpha = 1 \ (\text{ST}(\mu, \sigma, \lambda, \nu)) \quad \text{vs.} \quad H\_{11}: \alpha \neq 1 \ (\text{APST}(\mu, \sigma, \lambda, \alpha, \nu)),$$

and

$$H\_{02}: \lambda = 0 \ (\text{PT}(\mu, \sigma, \alpha, \nu)) \quad \text{vs.} \quad H\_{12}: \lambda \neq 0 \ (\text{APST}(\mu, \sigma, \lambda, \alpha, \nu)),$$

respectively. The respective values of the LR test statistic are $-2\log(\Lambda\_1) = -2\left(\ell\_{ST}(\hat{\boldsymbol{\theta}}) - \ell\_{APST}(\hat{\boldsymbol{\theta}})\right) = 26.620$ and $-2\log(\Lambda\_2) = -2\left(\ell\_{PT}(\hat{\boldsymbol{\theta}}) - \ell\_{APST}(\hat{\boldsymbol{\theta}})\right) = 45.660$; comparing these quantities with the 95th percentile of the $\chi^2\_1$ distribution, 3.8414, both null hypotheses are rejected. Finally, Figure 3 (left) shows the histogram of the volcano heights variable, whereas Figure 3 (right) presents the empirical CDF (solid line) together with the CDF of the fitted APST model (dotted line).

**Figure 2.** Volcano height data: QQplot for Student-*t*, ST, PT, and APST models.

**Figure 3.** (**Left**) Graph of fitted densities to volcano height data. (**Right**) Empirical CDF and CDF of fitted APST model.


**Table 3.** Parameter estimates (SE) for the fitted models to the volcano height data.

#### *4.2. Application 2: Stellar Abundances Data*

The second data set contains measurements for 68 solar-type stars, which are available in the package *astrodatR* of the software *R* [24] under the name *Stellar abundances*. These data were previously analyzed by Mattos et al. [25] using Scale Mixture of Skew-Normal Censored Regression (SMSNCR) models. We take only the response variable log *N*(Be), which represents the log of the abundance of beryllium scaled to the Sun's abundance (i.e., the Sun has log *N*(Be) = 0.0).

In astronomical research, a previously identified sample of objects (stars, galaxies, quasars, X-ray sources, etc.) is observed at some new wavebands. According to Feigelson [24], due to limited sensitivities, some objects may be undetected, leading to upper limits in their derived luminosities. For this data set we have 12 left-censored data points, i.e., 12 undetected beryllium measurements, which represent 19.35% of the observations. Table 4 presents the ML estimates for the parameters of the censored Student-*t* (CT), censored skew-*t* (CST), censored power-*t* (CPT), and censored alpha-power skew-*t* (CAPST) models, together with their corresponding standard errors. Table 4 also compares the fit of the four models using the model selection criteria (AIC, CAIC, and BIC). Note that, again, the CAPST model with heavy tails has a better fit than the CT, CST, and CPT models.

To identify atypical observations and/or model misspecification, we analyzed the transformation of the martingale residual, $r\_{MT\_i}$, proposed by Barros et al. [26]. These residuals are defined by

$$r\_{MT\_i} = \text{sign}(r\_{M\_i})\sqrt{-2\left[r\_{M\_i} + \delta\_i \log(\delta\_i - r\_{M\_i})\right]}, \qquad i = 1, \dots, n$$

where $r\_{M\_i} = \delta\_i + \log S(y\_i; \hat{\boldsymbol{\theta}})$ is the martingale residual proposed by Ortega et al. [27], in which *δ<sub>i</sub>* = 0, 1 indicates whether the *i*-th observation is censored or uncensored, respectively; sign(·) denotes the sign function; and $S(y\_i; \hat{\boldsymbol{\theta}}) = P\_{\hat{\boldsymbol{\theta}}}(Y\_i > y\_i)$ represents the survival function evaluated at *y<sub>i</sub>*, with $\hat{\boldsymbol{\theta}}$ the MLE of *θ*. The plots of $r\_{MT\_i}$ with simulated confidence envelopes are presented in Figure 4. From this figure we can clearly see that the CST, CPT, and CAPST models fit the data better than the CT model, since in those cases no observations lie outside the envelopes. Figure 5 shows the densities of the different models fitted to the stellar abundances data; the CAPST model seems to fit these data better than the CT, CST, and CPT models.
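The residual transformation itself is simple to implement. The sketch below (Python; the clipping guards are our own additions against rounding, and *δ<sub>i</sub>* = 1 marks an uncensored observation, as above) may help when reproducing plots such as Figure 4:

```python
import numpy as np

def martingale_transform(r_m, delta):
    """r_MTi = sign(r_Mi) * sqrt(-2 [r_Mi + delta_i * log(delta_i - r_Mi)]),
    with r_Mi = delta_i + log S(y_i), so that delta_i - r_Mi = -log S(y_i) >= 0."""
    r_m = np.asarray(r_m, dtype=float)
    delta = np.asarray(delta, dtype=float)
    inner = np.where(delta > 0.0,
                     delta * np.log(np.clip(delta - r_m, 1e-300, None)),
                     0.0)  # delta_i = 0 removes the log term
    return np.sign(r_m) * np.sqrt(np.clip(-2.0 * (r_m + inner), 0.0, None))
```

The transformed residuals are approximately standard normal when the model is adequate, which is what the simulated envelopes assess.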


**Table 4.** Parameter estimates (SE) for the fitted models to the stellar abundances data.

**Figure 4.** Stellar abundances data. Envelopes of transformed martingale residuals for CT, CST, CPT, and CAPST models.

**Figure 5.** Graph of fitted densities to stellar abundances data.
