*Article* **Testing for the Rayleigh Distribution: A New Test with Comparisons to Tests for Exponentiality Based on Transformed Data**

**Gerrit Lodewicus Grobler, Elzanie Bothma \* and James Samuel Allison**

School of Mathematical and Statistical Sciences, Faculty of Natural and Agricultural Sciences, North-West University, Potchefstroom 2531, South Africa; gerrit.grobler@nwu.ac.za (G.L.G.); james.allison@nwu.ac.za (J.S.A.)

**\*** Correspondence: elzanie.bothma@nwu.ac.za

**Abstract:** We propose a new goodness-of-fit test for the Rayleigh distribution which is based on a distributional fixed-point property of the Stein characterization. The limiting null distribution of the test is derived and the consistency against fixed alternatives is also shown. The results of a finite-sample comparison are presented, where we compare the power performance of the new test to a variety of other tests. In addition to existing tests for the Rayleigh distribution we also exploit the link between the exponential and Rayleigh distributions. This allows us to include some powerful tests developed specifically for the exponential distribution in the comparison. It is found that the new test outperforms competing tests for many of the alternative distributions. Interestingly, the highest estimated power, against all alternative distributions considered, is obtained by one of the tests specifically developed for the Rayleigh distribution and not by any of the exponentiality tests based on the transformed data. The use of the new test is illustrated on a real-world COVID-19 data set.

**Keywords:** asymptotics; goodness-of-fit; Monte Carlo simulation; Rayleigh distribution; Stein characterization

**MSC:** 62F03; 62F05

#### **1. Introduction**

In 1880 an acoustics problem gave rise to a distribution that nowadays plays a prominent role in research fields such as reliability theory, life testing and survival analysis (see, e.g., [1]). The Rayleigh distribution was introduced by [2], while undertaking a study regarding the resultant of a great number of sound waves with differing phases. Refs. [3,4] demonstrated the importance of the Rayleigh distribution in communication engineering and electro-vacuum devices, respectively. Ref. [5] found that the Rayleigh distribution has clinical applications, specifically estimating the noise variance of Magnetic Resonance Images (MRI). Ref. [6] discussed this phenomenon and proposed that this estimation can be done by fitting the density function of the Rayleigh distribution to the partial histogram of the MRI. Ref. [7] improved this estimation with the use of background segmentation, by fitting the density function of the Rayleigh distribution to the histogram of the segmented background in order to estimate the noise variance. The estimation of the noise forms a crucial part in efficiently denoising the MRI as well as in the quality assessment of these images. The Rayleigh distribution has also become a popular model in survival analysis and reliability theory, see, e.g., [8,9].

For any of the above-mentioned applications to be relevant, it is crucial to test the hypothesis that the observed data are indeed realisations from a Rayleigh distribution. Since the square of a Rayleigh distributed variable is exponentially distributed, goodness-of-fit tests designed for the exponential distribution can be used to test for the Rayleigh distribution, a fact that we investigate further in Section 4. However, even though the applications of the Rayleigh distribution increased significantly over the past few decades, literature on tests specifically developed for the Rayleigh distribution is relatively scarce. Some of these include a test proposed by [10] based on the empirical Laplace transform, a test based on entropy suggested by [11] as well as [12] and an empirical likelihood based test by [13]. It has become a common approach to use distributional characterizations to propose goodness-of-fit testing procedures, see, e.g., Ref. [14] and the references therein. In this paper, we propose a new test for the Rayleigh distribution based on a modification of Stein's characterization discussed by [15].

**Citation:** Grobler, G.L.; Bothma, E.; Allison, J.S. Testing for the Rayleigh Distribution: A New Test with Comparisons to Tests for Exponentiality Based on Transformed Data. *Mathematics* **2022**, *10*, 1316. https://doi.org/10.3390/math10081316

Academic Editors: Alexandru Agapie, Denis Enachescu, Vlad Stefan Barbu and Bogdan Iftimie

Received: 15 March 2022 Accepted: 7 April 2022 Published: 15 April 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The standard Stein characterization (see [16]) of the normal distribution states that *Z* is standard normal if, and only if,

$$\mathbb{E}[g'(Z) - Zg(Z)] = 0 \tag{1}$$

is true for all absolutely continuous functions *g* for which the expectation exists. Some applications, such as goodness-of-fit tests based on (1), are rather complicated, since the results depend on the choice of *g*. Instead of using this relationship, Ref. [17] characterised the standard normal distribution based on the zero bias distribution. A real valued random variable *X*∗ is said to have the *X* zero-bias distribution if

$$\mathbb{E}[g'(X^*)] = \mathbb{E}[Xg(X)]$$

holds for all absolutely continuous functions *g* for which the expectation exists. If E*X* = 0 and Var(*X*) = 1, the *X* zero-bias distribution exists, is unique and has distribution function

$$F(t) = \mathbb{E}[X(X - t) \mathbf{1} \{X \le t\}], t \in \mathbb{R}.$$

Using this distribution function, ref. [17] showed that *Z* is standard normal if, and only if, the distribution function of *Z* is given by *F*(*t*). Ref. [15] generalised this method to a wide range of continuous distributions by generalising Stein's characterization. They showed that if *X* has support [0, ∞), then it has distribution *F* if, and only if, the distribution function of *X* is given by

$$F(t) = \mathbb{E}\left[ -\frac{f'(X)}{f(X)} \min\{X, t\} \right], t \in (0, \infty), \tag{2}$$

where *f* is the density of *X*. The result in (2) is true under some regularity conditions on *f* , which will be discussed in Section 2. The characterization in (2) will be used to develop a new goodness-of-fit test specifically for the Rayleigh distribution.

Before proceeding some notation is introduced. Let *X*1, ... , *Xn* be independent and identically distributed (i.i.d.) continuous realisations of a positive random variable *X* with unknown distribution function *F* and density *f* . If *X* follows a Rayleigh distribution with density function

$$f(x) = \frac{x}{\theta^2} e^{-\frac{x^2}{2\theta^2}}, \quad x \ge 0, \ \theta > 0,$$

it will be denoted by *X* ∼ *Ral*(*θ*). The composite goodness-of-fit hypothesis to be tested is

$$H\_0: \text{ the distribution of } X \text{ is } \text{Ral}(\theta), \tag{3}$$

for some *θ* > 0, against general alternatives.

The remainder of the article is organised as follows: In Section 2, the new test statistic is introduced. Section 3 contains the basic theoretical results pertaining to the asymptotic behaviour of the test. The results of a Monte Carlo study, where the power performance of the newly proposed test is compared to some existing tests, are given in Section 4. The competing tests also include five powerful tests for exponentiality based on transformed data. The paper concludes in Section 5 with an application of the test to a real-world COVID-19 data set and some concluding remarks.

#### **2. Test Statistic**

For the characterization in (2) to be true the following regularity conditions, see [15], should hold:


$$\text{(V)}\qquad\lim_{x\to 0}\frac{F(x)}{f(x)} = 0;$$

$$\text{(VI)}\qquad\lim_{x\to\infty}\frac{1-F(x)}{f(x)}=0.$$

It can easily be seen that conditions (I), (II), (V) and (VI) hold for the Rayleigh distribution. For *X* ∼ *Ral*(*θ*), $\kappa_f$ in condition (III) becomes

$$\kappa_f(x) = \frac{\theta^2 e^{x^2/2\theta^2}}{x^2} \min\{F(x), 1 - F(x)\} \left| 1 - \frac{x^2}{\theta^2} \right|,$$

and for $x^2 > \theta^2$ we have $\left|1 - \frac{x^2}{\theta^2}\right| = \frac{x^2}{\theta^2} - 1$. For $x$ large enough we have that $1 - F(x) < F(x)$; thus,

$$\lim_{x \to \infty} \kappa_f(x) = \theta^2 \lim_{x \to \infty} \left( \frac{\exp(x^2 / 2\theta^2)}{x^2} \right) (1 - F(x)) \left( \frac{x^2}{\theta^2} - 1 \right) = 1.$$

For $x$ sufficiently small we have that $F(x) < 1 - F(x)$; thus,

$$\lim_{x\to 0}\kappa_f(x) = \theta^2 \lim_{x\to 0} \exp(x^2/2\theta^2) \left(1 - \frac{x^2}{\theta^2}\right) \frac{\left(1 - e^{-x^2/2\theta^2}\right)}{x^2} = \frac{1}{2\theta^2}\theta^2 = \frac{1}{2}.$$

Since $\kappa_f(x)$ is continuous with limits 1 and $\frac{1}{2}$ as $x$ tends to infinity and zero, respectively, it follows that $\sup_{x\in[0,\infty)} \kappa_f(x) < \infty$.
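As a worked check of the two limits, the following sketch uses only $F(x) = 1 - e^{-x^2/2\theta^2}$ and the expansion $e^{u} - 1 = u + O(u^2)$. The abbreviation $u = x^2/(2\theta^2)$ is introduced here for illustration and is not part of the original notation.

$$\begin{aligned} x^2 > \theta^2,\ x \text{ large}: \quad \kappa_f(x) &= \frac{\theta^2 e^{u}}{x^2}\, e^{-u} \left(\frac{x^2}{\theta^2} - 1\right) = 1 - \frac{\theta^2}{x^2} \longrightarrow 1 \quad (x \to \infty), \\ x \text{ small}: \quad \kappa_f(x) &= \frac{\theta^2}{x^2}\left(e^{u} - 1\right)\left(1 - \frac{x^2}{\theta^2}\right) = \frac{\theta^2}{x^2}\left(\frac{x^2}{2\theta^2} + O(x^4)\right)\left(1 - \frac{x^2}{\theta^2}\right) \longrightarrow \frac{1}{2} \quad (x \to 0). \end{aligned}$$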

The integral in condition (IV) can be written as follows in terms of expectations:

$$\int_0^\infty (1+|x|) \left|1-\frac{x^2}{\theta^2}\right| \frac{1}{\theta^2} e^{-x^2/2\theta^2}\, dx = \mathbb{E}\left( \frac{1+X}{X} \left|1-\frac{X^2}{\theta^2}\right| \right),$$

where *X* is Rayleigh distributed. All moments of the Rayleigh distribution are finite, i.e., $\mathbb{E}(X^k) < \infty$, $k \in \mathbb{N}$, and in addition $\mathbb{E}(X^{-1}) = \sqrt{\pi/2}/\theta < \infty$. Therefore,

$$\int\_0^\infty (1+|x|)|f'(x)|dx < \infty.$$

In Proposition 1 below, the characterization in (2) is re-stated specifically for the Rayleigh distribution.

**Proposition 1.** *Let X* : Ω → (0, ∞) *be a random variable with distribution function F and density function f that satisfies conditions (I)–(VI) and* E[*X*] < ∞*. Then X* ∼ *Ral*(*θ*) *if, and only if,*

$$\mathbb{E}\left[\left(\frac{X}{\theta^2} - \frac{1}{X}\right)\min\{X, t\}\right] - F(t) = 0, t > 0.$$
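The fixed-point property in Proposition 1 can be checked numerically. The sketch below is illustrative only: the function names, the midpoint quadrature and the truncation point `upper` are assumptions made here and are not part of the original derivation. It approximates the expectation for *X* ∼ *Ral*(*θ*) and compares it with the Rayleigh distribution function.

```python
import math


def rayleigh_cdf(t, theta):
    """Distribution function of Ral(theta)."""
    return 1.0 - math.exp(-t * t / (2.0 * theta * theta))


def fixed_point_lhs(t, theta, steps=100_000):
    """Midpoint-rule approximation of E[(X/theta^2 - 1/X) min(X, t)] for X ~ Ral(theta)."""
    upper = 12.0 * theta  # assumed truncation point; the density is negligible beyond it
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        density = x / theta**2 * math.exp(-x * x / (2.0 * theta**2))
        score = x / theta**2 - 1.0 / x  # equals -f'(x)/f(x) for the Rayleigh density
        total += score * min(x, t) * density * h
    return total


if __name__ == "__main__":
    theta = 2.0
    for t in (0.5, 1.0, 3.0, 6.0):
        # The two printed values should agree up to quadrature error.
        print(t, fixed_point_lhs(t, theta), rayleigh_cdf(t, theta))
```

Under the null hypothesis the two columns agree for every *t*, which is exactly the statement that the difference in Proposition 1 vanishes.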

Note that *X* ∼ *Ral*(*θ*) if, and only if, $Y = X/\theta \sim Ral(1)$. This follows from the invariance property of the Rayleigh distribution with respect to scale transformations. This implies that *Y* ∼ *Ral*(1) if, and only if, for all *t* > 0

$$
\psi(t) = T^Y(t) - F^Y(t) = 0,\tag{4}
$$

where $T^Y(t) = \mathbb{E}[(Y - 1/Y)\min(Y, t)]$ and $F^Y$ is the distribution function of $Y$. Our newly proposed test is motivated by (4). Since $\psi(t)$ is unknown, we estimate it by its empirical counterpart,

$$
\hat{\psi}_n(t) = T_n^Y(t) - F_n^Y(t),
$$

where $T_n^Y(t) = \frac{1}{n}\sum_{j=1}^n \left(Y_j - 1/Y_j\right)\min(Y_j, t)$, $F_n^Y(t) = \frac{1}{n}\sum_{j=1}^n I(Y_j \le t)$ and $Y_j = X_j/\hat{\theta}_n$, with $\hat{\theta}_n = \left((2n)^{-1}\sum_{j=1}^n X_j^2\right)^{1/2}$ the maximum likelihood estimator for $\theta$.

We propose the following weighted $L^2$-distance between $\hat{\psi}_n(t)$ and 0 to test the hypothesis in (3):

$$R_{n,a} = n \int_0^\infty \hat{\psi}_n^2(t) w_a(t)\, dt,\tag{5}$$

where $w_a(t)$ is a positive, continuous weight function depending on a positive tuning parameter $a$. The test rejects for large values of $R_{n,a}$. Throughout the paper we use $w_a(t) = e^{-at}$ as the weight function, which results in the following easily calculable form of the test statistic:

$$\begin{split} R_{n,a} = & \frac{1}{n} \sum_{j=1}^{n} \left( -\frac{1}{a} e^{-aY_{(j)}} \left[ \left\{ Y_{(j)} - \frac{1}{Y_{(j)}} \right\}^2 \left\{ \frac{2}{a} Y_{(j)} + \frac{2}{a^2} \right\} + 2Y_{(j)}^2 - 3 \right] + \frac{2}{a^3} \left[ Y_{(j)}^2 - 2 + \frac{1}{Y_{(j)}^2} \right] \right) \\ & + \frac{2}{n} \sum_{1 \le j < k \le n} \Bigg( \left\{ Y_{(j)} - \frac{1}{Y_{(j)}} \right\} \left\{ Y_{(k)} - \frac{1}{Y_{(k)}} \right\} \left\{ -\frac{1}{a} e^{-aY_{(j)}} \left\{ \frac{1}{a} Y_{(j)} + \frac{2}{a^2} \right\} + \frac{2}{a^3} - \frac{Y_{(j)}}{a^2} e^{-aY_{(k)}} \right\} \\ & + \left\{ Y_{(j)} - \frac{1}{Y_{(j)}} \right\} \left\{ -\frac{Y_{(j)}}{a} e^{-aY_{(k)}} \right\} \\ & + \left\{ Y_{(k)} - \frac{1}{Y_{(k)}} \right\} \left\{ \frac{1}{a^2} e^{-aY_{(k)}} - \frac{1}{a} e^{-aY_{(j)}} \left( Y_{(j)} + \frac{1}{a} \right) \right\} + \frac{1}{a} e^{-aY_{(k)}} \Bigg), \end{split}$$

where $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$ denote the order statistics of $Y_1, \ldots, Y_n$.
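As a rough illustration of how the closed-form expression can be evaluated, the following Python sketch computes $R_{n,a}$ from a sample and cross-checks it against a direct numerical evaluation of the integral in (5). The function names, the piecewise midpoint rule and the simulated sample are assumptions made for this illustration. The scale estimate is taken to be the square root of $(2n)^{-1}\sum_j X_j^2$.

```python
import math
import random


def scaled_order_statistics(x):
    """Return the sorted values Y_(j) = X_(j) / theta_hat, with theta_hat = sqrt(mean(X^2) / 2)."""
    theta_hat = math.sqrt(sum(v * v for v in x) / (2.0 * len(x)))
    return sorted(v / theta_hat for v in x)


def r_stat_closed_form(x, a):
    """Closed-form evaluation of R_{n,a} for the weight function exp(-a t)."""
    y = scaled_order_statistics(x)
    n = len(y)
    c = [v - 1.0 / v for v in y]  # the factors Y_(j) - 1 / Y_(j)
    diag = 0.0
    for yj, cj in zip(y, c):
        ej = math.exp(-a * yj)
        diag += -ej / a * (cj**2 * (2.0 * yj / a + 2.0 / a**2) + 2.0 * yj**2 - 3.0)
        diag += 2.0 / a**3 * cj**2
    cross = 0.0
    for j in range(n):
        ej = math.exp(-a * y[j])
        for k in range(j + 1, n):  # y[j] <= y[k] because y is sorted
            ek = math.exp(-a * y[k])
            cross += c[j] * c[k] * (-ej / a * (y[j] / a + 2.0 / a**2)
                                    + 2.0 / a**3 - y[j] / a**2 * ek)
            cross += c[j] * (-y[j] / a * ek)
            cross += c[k] * (ek / a**2 - ej / a * (y[j] + 1.0 / a))
            cross += ek / a
    return diag / n + 2.0 * cross / n


def r_stat_numeric(x, a, m=2000):
    """Numerical check: midpoint rule on each interval between consecutive order statistics."""
    y = scaled_order_statistics(x)
    n = len(y)
    c = [v - 1.0 / v for v in y]

    def psi(t):
        return sum(cj * min(yj, t) - (1.0 if yj <= t else 0.0)
                   for yj, cj in zip(y, c)) / n

    total = 0.0
    knots = [0.0] + y
    for lo, hi in zip(knots[:-1], knots[1:]):
        h = (hi - lo) / m
        for i in range(m):
            t = lo + (i + 0.5) * h
            total += psi(t) ** 2 * math.exp(-a * t) * h
    # Beyond the largest order statistic psi is constant, so the tail is available exactly.
    total += psi(y[-1]) ** 2 * math.exp(-a * y[-1]) / a
    return n * total


if __name__ == "__main__":
    rng = random.Random(2022)
    sample = [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(20)]
    for a in (1.0, 5.0):
        print(a, r_stat_closed_form(sample, a), r_stat_numeric(sample, a))
```

The double loop costs on the order of $n^2$ operations, which is negligible for the sample sizes considered in the simulation study. Multiplying the sample by a positive constant leaves the value unchanged, in line with the scale invariance of the statistic.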

**Remark 1.** *The most commonly used choices for the weight function $w_a(\cdot)$ are $w_a(t) = e^{-a|t|}$ and $w_a(t) = e^{-at^2}$ (see, e.g., [18,19]). Due to the positive support of the Rayleigh distribution, we use $w_a(t) = e^{-a|t|} = e^{-at}$, $t \ge 0$. This choice not only provides a closed-form expression for the test statistic, but also yields competitive powers, which are reported in the Monte Carlo simulation study (see Section 4).*

#### **3. Asymptotics**

In this section, we will first show that, under the null hypothesis, $R_{n,a}$ converges in distribution to a norm of a Gaussian element of the Hilbert space $\mathcal{H} = L^2((0, \infty), \mathcal{B})$ of measurable, square integrable functions. The norm $\|\cdot\|_{\mathcal{H}}$ that will be used is defined in terms of a random element $G_n$ of $\mathcal{H}$, $n \in \mathbb{N}$, by

$$||G\_n||\_{\mathcal{H}} = \left(\int\_0^\infty \{G\_n(t)\}^2 e^{-at}dt\right)^{\frac{1}{2}}.$$

We will also show that the newly proposed test is consistent.

First note that by substituting $t$ with $s/\hat{\theta}_n$ and $Y_j$ with $X_j/\hat{\theta}_n$ in (5) the test statistic $R_{n,a}$ can be rewritten as

$$R\_{n,a} = \frac{1}{\hat{\theta}\_n} \int\_0^\infty \left( \sqrt{n} \left\{ \hat{T}\_n^X(s) - F\_n^X(s) \right\} \right)^2 e^{-as/\hat{\theta}\_n} ds,\tag{6}$$

where

$$\hat{T}_n^X(s) = \frac{1}{n\hat{\theta}_n^2} \sum_{j=1}^n \left( X_j - \frac{\hat{\theta}_n^2}{X_j} \right) \min\{ X_j, s \} \tag{7}$$

is a continuous function.

To obtain our two main results, we use the following Lemma, in which the notation $G_n \approx H_n$ is used when $\|G_n - H_n\|^2_{\mathcal{H}} = o_{\mathbb{P}}(1)$, where $o_{\mathbb{P}}(1)$ denotes a sequence of random variables that converge to zero in probability. We also assume, w.l.o.g., that $\theta = 1$.

**Lemma 1.** *Suppose $X, X_1, X_2, \ldots$ are i.i.d. random variables with distribution function $F^X$ and $\mathbb{E}[X^4] < \infty$. Let $\hat{T}_n^X(s)$ be defined as in* (7)*, then*

$$\hat{T}_n^X(s) = \frac{1}{\hat{\theta}_n^2} \left\{ T_n^X(s) + \left(1 - \hat{\theta}_n^2\right) r_n^X(s) \right\},$$

*where*

$$r_n^X(s) = \frac{1}{n} \sum_{j=1}^n \frac{1}{X_j} \min\{X_j, s\},$$

*and*

$$T\_n^X(s) = \frac{1}{n} \sum\_{j=1}^n \left( X\_j - \frac{1}{X\_j} \right) \min(X\_j, s).$$

*We also have that*

$$\sqrt{n}\hat{T}_n^X(s) \approx \frac{\sqrt{n}}{\hat{\theta}_n^2} \left\{ T_n^X(s) + \left(1 - \hat{\theta}_n^2\right) r^X(s) \right\},$$

*where*

$$r^X(s) = \mathbb{E}\left[\frac{1}{X}\min\{X, s\}\right].$$

**Proof.** The first result follows immediately by rewriting $\hat{T}_n^X(s)$ in (7) as

$$\hat{T}_n^X(s) = \frac{1}{n\hat{\theta}_n^2} \sum_{j=1}^n \left[ \left( X_j - \frac{1}{X_j} \right) \min \{ X_j, s \} + \left( \frac{1}{X_j} - \frac{\hat{\theta}_n^2}{X_j} \right) \min \{ X_j, s \} \right].$$

To show the second result we notice that

$$\sqrt{n}\left\{\hat{T}\_n^X(s) - \frac{1}{\hat{\theta}\_n^2} \left[T\_n^X(s) + \left(1 - \hat{\theta}\_n^2\right)r^X(s)\right]\right\} = \frac{\sqrt{n}(1 - \hat{\theta}\_n^2)}{\hat{\theta}\_n^2} \left\{r\_n^X(s) - r^X(s)\right\}.$$

Applying a weak form of the law of large numbers in separable Hilbert spaces, we have that $r_n^X(s) = r^X(s) + o_{\mathbb{P}}(1)$ and by the continuous mapping theorem $\|r^X - r_n^X\|^2_{\mathcal{H}} = o_{\mathbb{P}}(1)$. Since $\hat{\theta}_n^2$ is the maximum likelihood estimator of $\theta^2$, we have that $\sqrt{n}\left(1 - \hat{\theta}_n^2\right) = O_{\mathbb{P}}(1)$, where $O_{\mathbb{P}}(1)$ denotes a sequence of random variables that is bounded in probability. The result then follows from Slutsky's theorem.

**Theorem 1.** *Let $X, X_1, X_2, \ldots$ be i.i.d. standard Rayleigh random variables. There exists a centred Gaussian element $\mathcal{W}$ of $\mathcal{H}$ such that $R_{n,a} \xrightarrow{\mathcal{D}} \|\mathcal{W}\|^2_{\mathcal{H}}$, where the covariance kernel of $\mathcal{W}$ is given by*

$$\begin{split} K(s,t) &= \mathrm{Cov}\left[W_{j}(s), W_{j}(t)\right] \\ &= F^{X}(s\wedge t) + (s\wedge t)[I_{3}(s\wedge t, s\vee t) - 2I_{1}(s\wedge t, s\vee t) + I_{-1}(s\wedge t, s\vee t)] \\ &+ st\left[2F^{X}(s\vee t) - 2 + I_{2}(s\wedge t, \infty) + I_{-2}(s\vee t, \infty)\right] + I_{4}(0,s\wedge t) - 2I_{2}(0,s\wedge t) \\ &+ r^{X}(s)r^{X}(t) + 2F^{X}(s)F^{X}(t) \\ &+ \left\{-\frac{1}{2}I_{4}(0,s) + \frac{3}{2}I_{2}(0,s) + s\left[-\frac{1}{2}I_{3}(s,\infty) + \frac{3}{2}I_{1}(s,\infty) + I_{-1}(s,\infty)\right]\right\}r^{X}(t) \\ &+ \left\{-\frac{1}{2}I_{4}(0,t) + \frac{3}{2}I_{2}(0,t) + t\left[-\frac{1}{2}I_{3}(t,\infty) + \frac{3}{2}I_{1}(t,\infty) + I_{-1}(t,\infty)\right]\right\}r^{X}(s) \\ &- \frac{1}{2}\{I_{4}(0,s) - I_{2}(0,s) + s[I_{3}(s,\infty) + I_{1}(s,\infty)]\}F^{X}(t) \\ &- \frac{1}{2}\{I_{4}(0,t) - I_{2}(0,t) + t[I_{3}(t,\infty) + I_{1}(t,\infty)]\}F^{X}(s), \end{split}$$

*where*

$$I\_k(a,b) = \mathbb{E}\left[X\_j^k \ 1 \left(a \le X\_j \le b\right)\right],$$

*and*

$$I\_k(a,\infty) = \lim\_{b \to \infty} I\_k(a,b).$$

**Proof.** First note that

$$\sqrt{n}\left\{\hat{T}\_n^X(s) - F\_n^X(s)\right\} \approx \frac{\sqrt{n}}{\hat{\theta}\_n^2} \left\{ T\_n^X(s) + \left(1 - \hat{\theta}\_n^2\right) r^X(s) - \hat{\theta}\_n^2 F^X(s) \right\},$$

since $\|F^X - F_n^X\|^2_{\mathcal{H}} = o_{\mathbb{P}}(1)$. We can therefore write

$$\sqrt{n}\left\{\hat{T}\_n^X(s) - F\_n^X(s)\right\} \approx \frac{1}{\sqrt{n}\hat{\theta}\_n^2} \sum\_{j=1}^n W\_j(s),$$

where

$$W_j(s) = \left(X_j - \frac{1}{X_j}\right) \min\{X_j, s\} + \left(1 - \frac{1}{2}X_j^2\right)r^X(s) - \frac{1}{2}X_j^2 F^X(s).$$

We note that $W_1, \ldots, W_n$ are i.i.d. random variables with $\mathbb{E}(W_1) = 0$ and $\mathbb{E}\|W_1\|^2_{\mathcal{H}} < \infty$. Therefore, by the central limit theorem for separable Hilbert spaces (see [20]) there exists a centred Gaussian element $\mathcal{W} \in \mathcal{H}$ with

$$\frac{1}{\sqrt{n}}\sum_{j=1}^{n} W_j(\cdot)\xrightarrow{\mathcal{D}}\mathcal{W}(\cdot).$$

From this we have that $\sqrt{n}\left\{\hat{T}_n^X(s) - F_n^X(s)\right\} = O_{\mathbb{P}}(1)$. Therefore, since $\hat{\theta}_n = 1 + o_{\mathbb{P}}(1)$ and by Hölder's inequality we have that

$$\begin{split} & \left| \int\_{0}^{\infty} \left( \sqrt{n} \left\{ \hat{T}\_{n}^{X}(s) - F\_{n}^{X}(s) \right\} \right)^{2} e^{-as/\hat{\theta}\_{n}} \mathrm{ds} - \int\_{0}^{\infty} \left( \sqrt{n} \left\{ \hat{T}\_{n}^{X}(s) - F\_{n}^{X}(s) \right\} \right)^{2} e^{-as} \mathrm{ds} \right| \\ & \leq \sup\_{s>0} \left| e^{-as \left( \frac{1}{\hat{\theta}\_{n}} - 1 \right)} - 1 \right| ||\sqrt{n} \left\{ \hat{T}\_{n}^{X} - F\_{n}^{X} \right\}||\_{\mathcal{H}}^{2} = o\_{\mathbb{P}}(1) . \end{split}$$

Therefore,

$$R_{n,a} = \|\sqrt{n}\{\hat{T}_n^X - F_n^X\}\|_{\mathcal{H}}^2 + o_{\mathbb{P}}(1). \tag{8}$$

The final result then follows from Slutsky's theorem.

**Remark 2.** *A closed form expression for the covariance kernel of the limiting centred Gaussian distribution does not exist. However, for non-negative even values of $k$, closed form expressions for the functions $I_k(a, b)$ can be obtained from the following recursive formulas*

$$\begin{aligned} I_0(a,b) &= F^X(b) - F^X(a), \\ I_k(a,b) &= a^k e^{-a^2/2} - b^k e^{-b^2/2} + k I_{k-2}(a,b) .\end{aligned}$$
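The recursion can be sketched in a few lines of Python for the standard Rayleigh case ($\theta = 1$). The function names and the quadrature check are illustrative assumptions, and the bounds are called `lo` and `hi` to avoid a clash with the tuning parameter $a$.

```python
import math


def rayleigh_cdf(x):
    """Distribution function of the standard Rayleigh distribution, Ral(1)."""
    return 1.0 if math.isinf(x) else 1.0 - math.exp(-x * x / 2.0)


def boundary_term(x, k):
    """x^k * exp(-x^2 / 2), with its limiting value 0 at infinity."""
    return 0.0 if math.isinf(x) else x**k * math.exp(-x * x / 2.0)


def truncated_moment(k, lo, hi):
    """I_k(lo, hi) = E[X^k 1(lo <= X <= hi)] for X ~ Ral(1) and non-negative even k."""
    if k < 0 or k % 2 != 0:
        raise ValueError("the recursion is used here only for non-negative even k")
    if k == 0:
        return rayleigh_cdf(hi) - rayleigh_cdf(lo)
    return (boundary_term(lo, k) - boundary_term(hi, k)
            + k * truncated_moment(k - 2, lo, hi))


def truncated_moment_quadrature(k, lo, hi, steps=100_000):
    """Midpoint-rule evaluation of the same integral, for finite hi, as a cross-check."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x**k * x * math.exp(-x * x / 2.0) * h  # x^k times the Ral(1) density
    return total


if __name__ == "__main__":
    print(truncated_moment(2, 0.0, math.inf))  # second moment of Ral(1)
    print(truncated_moment(4, 0.5, 2.0), truncated_moment_quadrature(4, 0.5, 2.0))
```

The recursion follows from one integration by parts of $\int_a^b x^{k+1} e^{-x^2/2}\,dx$, which is why each step lowers $k$ by two.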

Now that we have shown that, under the null hypothesis, $R_{n,a}$ converges in distribution to a norm of a Gaussian element of the Hilbert space $\mathcal{H}$, we can continue to show that the newly proposed test is consistent. To this end, we will show that $R_{n,a}/n = \Delta + o_{\mathbb{P}}(1)$, where $\Delta = \|T^X - F^X\|^2_{\mathcal{H}}$ with the properties that $\Delta = 0$ under the null hypothesis and $\Delta > 0$ under fixed alternatives. This follows from the characterization of the Rayleigh distribution in Proposition 1.

**Theorem 2.** *Suppose $X, X_1, X_2, \ldots$ are i.i.d. random variables with distribution function $F^X$ and $\mathbb{E}[X^2] < \infty$. As $n \to \infty$, we have*

$$\frac{R_{n,a}}{n} = \|T^{X} - F^{X}\|_{\mathcal{H}}^2 + o_{\mathbb{P}}(1).$$

**Proof.** From (8) we have that

$$\frac{R_{n,a}}{n} = \|\hat{T}_n^X - F_n^X\|_{\mathcal{H}}^2 + o_{\mathbb{P}}(1).$$

To prove the theorem we need to show that

$$||\hat{T}\_n^X - F\_n^X||\_{\mathcal{H}}^2 = ||T^X - F^X||\_{\mathcal{H}}^2 + o\_\mathbb{P}(1).$$

By a weak form of the law of large numbers for separable Hilbert spaces we have that $T_n^X(s) = T^X(s) + o_{\mathbb{P}}(1)$ and $F_n^X(s) = F^X(s) + o_{\mathbb{P}}(1)$. Moreover, from Lemma 1 we have that $\hat{T}_n^X(s) = T_n^X(s) + o_{\mathbb{P}}(1)$ and hence $\hat{T}_n^X(s) = T^X(s) + o_{\mathbb{P}}(1)$. We also have that

$$
\hat{T}_n^X(s) - F_n^X(s) = \left(\hat{T}_n^X(s) - T^X(s)\right) + \left(T^X(s) - F^X(s)\right) + \left(F^X(s) - F_n^X(s)\right),
$$

and by the continuous mapping theorem the result follows.

#### **4. Simulation Study**

In this section, Monte Carlo simulations are used to compare the finite sample performance of the newly proposed test to the following existing goodness-of-fit tests for the Rayleigh distribution:


The estimated powers of $R_{n,a}$, $EL_{n,a}$ and $KL_{n,a}$ are functions of a tuning parameter, $a$. For $R_{n,a}$ and $EL_{n,a}$ we report the results for $a = 1$ and $a = 5$, and for $KL_{n,a}$ results are reported for $a = 3$ and $a = 4$. The motivation for these choices of $a$ will be discussed in Section 4.2.

In addition to the existing tests, we also compare the performance of the new test to the following five powerful tests for exponentiality (see, e.g., the overview papers by [21] as well as [22] for a discussion on a variety of tests for exponentiality):


Here, we test for the Rayleigh distribution by testing for exponentiality of the transformed data (using the well known property that the square of a Rayleigh distributed random variable follows an exponential distribution). The estimated powers of $BH_{n,a}$ and $HM_{n,a}$ are functions of a tuning parameter, $a$. For both $BH_{n,a}$ and $HM_{n,a}$ we report the results for $a = 0.75$, 1 and 1.25.
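The property used here follows in one line from the Rayleigh distribution function. As a brief worked step (with $y$ an auxiliary variable introduced only for this illustration), for *X* ∼ *Ral*(*θ*) and $y \ge 0$:

$$\mathbb{P}(X^2 \le y) = \mathbb{P}(X \le \sqrt{y}) = 1 - e^{-y/(2\theta^2)},$$

so that $X^2$ is exponentially distributed with mean $2\theta^2$. Testing $H_0$ in (3) is therefore equivalent to testing that $X_1^2, \ldots, X_n^2$ come from an exponential distribution with unspecified scale.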

#### *4.1. Simulation Setting*

A significance level of 5% is used throughout. Critical values of all the tests are obtained using 50,000 independent Monte Carlo replications drawn from a standard Rayleigh distribution (all the test statistics are invariant with respect to scale transformations). Power estimates are calculated and reported for sample sizes *n* = 20 and *n* = 30 using 10,000 independent Monte Carlo replications obtained from various alternative distributions. These include some 'local' alternatives as well as those given in Table 1. These alternative distributions were chosen since they are frequently used alternatives for the Rayleigh distribution, which has an increasing hazard rate. The hazard rates of the considered alternative distributions include constant hazard rates (CHR), increasing hazard rates (IHR), decreasing hazard rates (DHR) and non-monotone hazard rates (NMHR). These alternatives all have support in $\mathbb{R}^+$ and are used in many other empirical studies for goodness-of-fit tests of lifetime distributions (see, e.g., [10,21,27]). In Table 1, all scale parameters are set to one due to the scale transformation $Y_j = X_j/\hat{\theta}_n$, $j = 1, \ldots, n$. All simulations and calculations are done in R [28]. The tables are produced using the *Stargazer* package, see [29].

**Table 1.** Probability density functions of the alternative distributions considered in the Monte Carlo study.


We first consider some local power estimates. Here, we consider a mixture distribution, which is obtained by sampling with probability *p* from a standard exponential distribution (*Exp*(1)) and with probability (1 − *p*) from a *Ral*(1) distribution. The value *p* = 0 corresponds to the standard Rayleigh distribution, whereas increasing values of *p* imply a larger deviation from the null distribution. These estimated powers are given in Table 2 and the estimated powers for the exponentiality tests based on the transformed data are given in Table 3. The estimated powers for sample sizes 20 and 30 against every alternative distribution in Table 1 are given in Tables 4 and 5, respectively. The estimated powers, obtained using the tests for exponentiality based on the transformed data, for sample sizes 20 and 30 are given in Tables 6 and 7, respectively. The entries in these tables are the percentages of 10,000 independent Monte Carlo samples that resulted in the rejection of the null hypothesis (rounded to the nearest integer). For the reader's convenience, the highest estimated power for each alternative distribution among the existing tests, as well as the tests for exponentiality based on the square of the data, are displayed separately in bold in each of their respective tables. The last column of Tables 2, 4 and 5 contains the highest estimated powers from the corresponding exponentiality tests based on the transformed data (i.e., the highest powers obtained from Tables 3, 6 and 7 are also reported in the last column of Tables 2, 4 and 5); this will make comparison easier.
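The simulation design described above (critical values simulated under the standard Rayleigh distribution, followed by rejection percentages under the Rayleigh-exponential mixture) can be sketched as follows. This is a minimal Python illustration and not the R code used for the reported tables: the helper names are hypothetical, the replication counts are reduced, and a Kolmogorov-Smirnov type statistic with estimated scale is used as a stand-in for the tests compared in the study.

```python
import math
import random


def rayleigh_sample(n, rng):
    """Draw n observations from Ral(1) by inversion."""
    return [math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(n)]


def mixture_sample(n, p, rng):
    """Each observation is Exp(1) with probability p and Ral(1) otherwise."""
    out = []
    for _ in range(n):
        if rng.random() < p:
            out.append(-math.log(1.0 - rng.random()))
        else:
            out.append(math.sqrt(-2.0 * math.log(1.0 - rng.random())))
    return out


def ks_rayleigh(x):
    """Stand-in statistic: Kolmogorov-Smirnov distance to Ral(1) after scaling by the MLE."""
    n = len(x)
    theta_hat = math.sqrt(sum(v * v for v in x) / (2.0 * n))
    y = sorted(v / theta_hat for v in x)
    d = 0.0
    for j, v in enumerate(y, start=1):
        f0 = 1.0 - math.exp(-v * v / 2.0)
        d = max(d, j / n - f0, f0 - (j - 1) / n)
    return d


def critical_value(stat, n, reps, rng, level=0.05):
    """Empirical (1 - level) quantile of the statistic under Ral(1)."""
    values = sorted(stat(rayleigh_sample(n, rng)) for _ in range(reps))
    return values[math.ceil((1.0 - level) * reps) - 1]


def estimated_power(stat, crit, n, p, reps, rng):
    """Percentage of mixture samples for which the statistic exceeds the critical value."""
    rejections = sum(stat(mixture_sample(n, p, rng)) > crit for _ in range(reps))
    return 100.0 * rejections / reps


if __name__ == "__main__":
    rng = random.Random(2022)
    crit = critical_value(ks_rayleigh, n=20, reps=5000, rng=rng)
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, estimated_power(ks_rayleigh, crit, n=20, p=p, reps=2000, rng=rng))
```

Because the statistic is scale invariant, a single set of critical values simulated under *Ral*(1) serves for every value of *θ*. Any other scale-invariant statistic can be passed in place of `ks_rayleigh`.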

**Table 2.** Estimated local powers for the mixture of the Rayleigh and exponential distributions for various choices of the mixture parameter, *p*.


**Table 3.** Estimated local powers for the mixture of the Rayleigh and exponential distributions, using transformed data, for various choices of the mixture parameter, *p*.




**Table 4.** Estimated powers for general alternatives for the Rayleigh distribution for sample size *n* = 20.



**Table 5.** Estimated powers for general alternatives for the Rayleigh distribution for sample size *n* = 30.

**Table 6.** Estimated powers for general alternatives for the exponential distribution for sample size *n* = 20.




**Table 7.** Estimated powers for general alternatives for the exponential distribution for sample size *n* = 30.


#### *4.2. Simulation Results*

We will now present some general conclusions regarding the tabulated estimated powers of the different tests considered. Since the performance of the tests is affected by the type of hazard rate of the alternative distribution, we will discuss the overall performance as well as the performance when the results are grouped according to the type of hazard rate.

First, we will consider the estimated local powers, presented in Tables 2 and 3. We find that $KS_n$ and $CM_n$ exhibit poor power performance, displaying the lowest powers among the tests for the majority of the choices of the mixture probability, $p$. We note that $EL_{n,1}$ and $R_{n,5}$ are tied for the best test for the majority of mixture proportions. Figure 1 displays the local powers of $AD$, $EL_{n,1}$, $CR$ and $R_{n,5}$ over the complete range of mixture probabilities. The superior performance of $EL_{n,1}$ and $R_{n,5}$, for this mixture distribution, is clear from this figure.

**Figure 1.** Local powers for some of the tests over the entire range of mixture probabilities of the Rayleigh exponential mixture distribution for *n* = 20.

For the transformed data, $\widetilde{KS}_n$ exhibits the lowest powers overall and $HM_{n,0.75}$ has the highest overall powers for the majority of the alternatives considered.

We will now consider the performance of the tests, developed specifically for the Rayleigh distribution, against all of the general alternative distributions listed in Table 1. From both Tables 4 and 5 we see that, in general, the powers of $KS_n$ and $CR_n$ are lower for the majority of the alternatives considered and that these tests perform unfavourably in comparison to the other tests, for both sample sizes. On the other hand, $EL_{n,1}$ and $R_{n,5}$ perform quite well as we find that they outperform the other tests, having the highest estimated power for the majority of the alternatives considered. All tests considered perform quite well against the standard exponential distribution (which has a constant hazard rate) for both sample sizes.

Shifting our attention now to results associated with alternatives with increasing hazard rates, one finds, once again, that $KS_n$ and $CR_n$ have lower powers for both sample sizes considered. For most of the alternatives in this category $EL_{n,1}$ and $R_{n,5}$ have the highest power, only being outperformed, or equaled, for a handful of these alternatives by other tests.

Moving our attention to alternatives with a decreasing hazard rate, we see that all the tests considered perform very well and, since there are such minor differences in the power performance between all the tests, it is difficult to identify a single 'best' test for this set of alternatives. However, for the smaller sample size, $KS_n$ still attains powers that are slightly lower than the rest of the tests.

We now observe the results associated with alternatives with non-monotone hazard rates. The tests that generally perform well are $AD_n$, $EL_{n,1}$ and $R_{n,1}$. However, the test that exhibits the highest power for the majority of the alternatives, for both sample sizes, is $R_{n,5}$.

Finally, we consider the performance of the tests for exponentiality based on the transformed data. The tests with the lowest powers are $\widetilde{KS}_n$ and $\widetilde{CM}_n$. $BH_{n,1}$ and $HM_{n,1.25}$ perform very well, exhibiting high powers for most of the alternatives considered, especially for alternatives with decreasing or non-monotone hazard rates. $HM_{n,0.75}$ displays the highest overall powers for the majority of the alternatives considered. However, the highest estimated power, against all alternative distributions considered, is obtained by one of the tests specifically developed for the Rayleigh distribution and not by any of the exponentiality tests based on the transformed data. Therefore, we recommend that the tests proposed specifically for the Rayleigh distribution are used when goodness-of-fit testing is performed for the Rayleigh distribution.

To conclude, we provide a brief demonstration of how the choice of the tuning parameter, *a*, influences the powers of the newly proposed test. In order to visualise the behaviour of the powers for different values of *a*, Figure 2 presents the powers of *Rn*,*a* over a grid of *a* values for six different alternative distributions. This figure is also used to motivate the choice of *a* values included in the study.

**Figure 2.** Estimated powers for *R*100,*<sup>a</sup>* for some alternatives appearing in Table 1.

The choice of *a* = 1 was made since it is the point where the powers for most of the alternative distributions start to stabilize and reach a plateau. The choice of *a* = 5 was made since it is the point where the powers for most of the alternative distributions reach their maximum value.

#### **5. Practical Application**

As noted in Section 1, the Rayleigh distribution has found various applications in the fields of survival analysis and reliability theory. In this section we demonstrate the use of the tests specifically developed for the Rayleigh distribution by applying them to a real-world survival data set: the COVID-19 data set of Italy given in Table 8 (for a discussion of the data set, see [30]). The data set contains the COVID-19 mortality rates recorded for 59 days in Italy from 27 February 2020 to 27 April 2020. Ref. [30] discussed and analysed the use of an extended three-parameter Rayleigh distribution to model the data and concluded that this extended Rayleigh distribution is a good fit to the data. We, however, will investigate the goodness-of-fit of the traditional one-parameter Rayleigh distribution as well as that of the exponential distribution. Figure 3 shows the probability plots of both the Rayleigh (grey dots) and exponential (black dots) distributions fitted to the data, where *θ̂* = 6.583 in the case of the Rayleigh distribution and *λ̂* = 0.123 in the case of the exponential distribution.

**Figure 3.** Probability plot of a fitted Rayleigh (grey dots) and exponential (black dots) distribution.

The probability plot suggests that the Rayleigh distribution is a more plausible model for the underlying distribution of the data than the exponential distribution.

**Table 8.** COVID-19 data set of Italy.


Table 9 contains the estimated *p*-value of each test (calculated from 50,000 samples of size 59 simulated from the standard Rayleigh distribution) for formally testing whether the data originated from a Rayleigh distribution.
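The simulation scheme behind such estimated *p*-values can be sketched as follows. This is a minimal illustration under stated assumptions: the statistic is a Kolmogorov-Smirnov-type statistic with the scale estimated by maximum likelihood (an assumption, since the estimator is not restated here), large values of the statistic are taken to indicate departure from the null, and all function names are hypothetical. Since the statistic is scale invariant, its null distribution can be simulated from the standard Rayleigh distribution.

```python
import numpy as np


def rayleigh_scale_mle(x):
    """Maximum likelihood estimate of the Rayleigh scale (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2) / 2.0)


def ks_rayleigh_statistic(x):
    """KS-type distance between the empirical CDF of the scaled data
    and the standard Rayleigh CDF 1 - exp(-y^2 / 2)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    y = x / rayleigh_scale_mle(x)
    cdf = 1.0 - np.exp(-y ** 2 / 2.0)
    i = np.arange(1, n + 1)
    return max(np.max(i / n - cdf), np.max(cdf - (i - 1) / n))


def monte_carlo_pvalue(x, statistic, n_sim=50_000, seed=0):
    """Proportion of simulated null statistics at least as large as
    the observed one (upper-tail rejection is assumed)."""
    rng = np.random.default_rng(seed)
    observed = statistic(x)
    sims = rng.rayleigh(scale=1.0, size=(n_sim, len(x)))
    null_stats = np.array([statistic(row) for row in sims])
    return float(np.mean(null_stats >= observed))


# Synthetic stand-in for a data set of size 59 (not the COVID-19 data).
sample = np.random.default_rng(42).rayleigh(scale=6.5, size=59)
print(monte_carlo_pvalue(sample, ks_rayleigh_statistic, n_sim=2_000))
```

Any other scale-invariant statistic can be passed as the `statistic` argument in the same way; a smaller `n_sim` is used in the usage line only to keep the run time short.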

**Table 9.** *p*-values for the COVID-19 data of Italy.


From these *p*-values it is clear that none of the tests rejects the null hypothesis in (3) at a 5% significance level, and we can therefore conclude that the Rayleigh distribution is also a feasible option to model the data.

Having found that the Rayleigh distribution is a good fit to the observed data, one can now calculate quantiles, moments and other useful distributional properties by using the theoretical Rayleigh distribution with estimated parameter *θ̂* = 6.583. For example, the fitted Rayleigh distribution gives an estimated mean mortality rate over the 59 days of 8.2506.
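The quoted mean can be reproduced with a short calculation. The sketch below assumes the parameterisation in which *θ* is the Rayleigh scale, so that the mean is *θ*√(π/2) and the *p*-th quantile is *θ*√(−2 ln(1 − *p*)); this assumption is consistent with the value 8.2506 reported above. The function names are illustrative.

```python
import math


def rayleigh_mean(theta):
    """Mean of a Rayleigh distribution with scale theta."""
    return theta * math.sqrt(math.pi / 2.0)


def rayleigh_quantile(p, theta):
    """p-th quantile of a Rayleigh distribution with scale theta, 0 <= p < 1."""
    return theta * math.sqrt(-2.0 * math.log(1.0 - p))


theta_hat = 6.583  # estimated parameter reported in the text

print(round(rayleigh_mean(theta_hat), 4))   # 8.2506
print(rayleigh_quantile(0.5, theta_hat))    # fitted median
print(rayleigh_quantile(0.95, theta_hat))   # fitted 95th percentile
```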

#### **6. Conclusions and Future Research**

In this article, a new goodness-of-fit test statistic specifically designed for the Rayleigh distribution was considered. The finite-sample performance of this newly suggested test was studied by means of a Monte Carlo simulation. From the results, it is clear that this new test is not only feasible when testing goodness-of-fit for the Rayleigh distribution, but that it also outperforms or equals competitor tests for the majority of the alternative distributions considered. For practical implementation we suggest using the choice *a* = 5 for *Rn*,*a*. Alternatively, one can use a data-dependent choice of this tuning parameter as suggested, e.g., in [31].

In analysing mortality or survival data (like the COVID-19 data set) one will, more often than not, deal with observations that are censored. For our newly proposed test to be applicable in these kinds of situations, it needs to be modified to accommodate censoring. Naturally, this modification will complicate some of the asymptotic derivations and might be an avenue for future research. Some work in this regard has been started by [32] as well as [33].

**Author Contributions:** The authors (G.L.G., E.B., J.S.A.) contributed equally. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The work of E. Bothma and J.S. Allison is based on research supported by the National Research Foundation (NRF). Any opinion, finding and conclusion or recommendation expressed in this material is that of the authors and the NRF does not accept any liability in this regard.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

