Functional ANOVA (FANOVA)

Functional Analysis of Variance, like its vector counterpart, contrasts the distance between the mean levels of the factor variable. The aim of this contrast is to determine whether the sets of functions studied are statistically distinguishable [41]. As before, there are $Q$ independent samples $X\_{gj}(t)$, $j = 1, \dots, n\_g$; $t \in I = [a, b]$, but these samples are drawn from $L^2(I)$ processes $X\_g(t)$, $g = 1, \dots, Q$, whose mean functions are $E(X\_g(t)) = m\_g(t)$ [42,43]. If the functional sample is divided into groups $\{(X\_j, A\_j)\}\_{j=1}^{n} \in \mathcal{F} \times \mathcal{A}$, with $\mathcal{A} = \{1, \dots, A\}$ the factor variable, the hypothesis contrast has the following form:

$$\begin{cases} H\_0: \overline{X}\_1(t) = \overline{X}\_2(t) = \dots = \overline{X}\_A(t) \\ H\_1: \exists \, h, c \in \{1, \dots, A\}, \; h \neq c, \text{ s.t. } \overline{X}\_h(t) \neq \overline{X}\_c(t) \end{cases} \tag{5}$$

The model for the *j*-th observation belonging to the *g*-th group has the following form [41]:

$$X\_{jg}(t) = \mu(t) + \alpha\_{g}(t) + \epsilon\_{jg}(t) \tag{6}$$

where $X\_{jg}(t)$ is the functional value of the $j$-th observation of group $g$, $\mu(t)$ is the grand mean function, $\alpha\_g(t)$ is the effect of belonging to a given group, and $\epsilon\_{jg}(t)$ represents the unexplained variability of the $j$-th observation of group $g$. Furthermore, the model in Equation (6) can be represented in matrix notation:

$$\mathbf{X}(t) = \mathbf{Z}\boldsymbol{\gamma}(t) + \boldsymbol{\varepsilon}(t) \tag{7}$$

being $\mathbf{X}(t)$ an $N$-dimensional vector, $\boldsymbol{\gamma}(t) = (\mu(t), \alpha\_1(t), \dots, \alpha\_Q(t))^T$ a $(Q+1)$-dimensional vector, $\boldsymbol{\varepsilon}(t)$ a vector of $N$ residual functions, and $\mathbf{Z}$ the design matrix of dimension $N \times (Q+1)$.
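As a minimal illustration of the model in Equations (6) and (7), the NumPy sketch below generates $N = Q \cdot n\_g$ curves on a common grid; the grid, the group effects and the noise level are hypothetical choices of ours, not values from this work:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)   # discretised grid over I = [a, b]
Q, n_g = 3, 20                   # hypothetical: 3 groups, 20 curves per group

mu = np.sin(2 * np.pi * t)       # grand mean function mu(t)
# group effects alpha_g(t), chosen so that they satisfy the
# sum-to-zero constraint of Equation (8)
alpha = np.stack([0.3 * t, -0.1 * t, -0.2 * t])

# X[g, j, :] is the j-th curve of group g, following Equation (6)
X = mu + alpha[:, None, :] + 0.2 * rng.standard_normal((Q, n_g, t.size))
```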

Thus, to ensure the identifiability of the functional effects $\alpha\_g(t)$, the sum-to-zero constraint is introduced [41,44]:

$$\sum\_{s=2}^{Q+1} \gamma\_s(t) = 0, \; \forall t \tag{8}$$

The parameter vector $\boldsymbol{\gamma}(t)$ in Equation (7) can be estimated by minimising the least squares criterion:

$$LMSSE(\boldsymbol{\gamma}) = \int \left[\mathbf{X}(t) - \mathbf{Z}\boldsymbol{\gamma}(t)\right]^T \left[\mathbf{X}(t) - \mathbf{Z}\boldsymbol{\gamma}(t)\right] dt$$

subject to the constraint (8) [41,44].
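A pointwise estimate of $\boldsymbol{\gamma}(t)$ under constraint (8) can be sketched in NumPy by appending the constraint as an extra row of an augmented system; since that row can be satisfied exactly, the solution meets the constraint. The function name is ours, and curves are assumed to share a discretisation grid, stored as an array of shape $(Q, n\_g, T)$:

```python
import numpy as np

def fit_fanova(X):
    """Pointwise least-squares estimate of gamma(t) = (mu(t), alpha_1(t), ...,
    alpha_Q(t))^T for curves X of shape (Q, n_g, T), enforcing the
    sum-to-zero constraint of Equation (8)."""
    Q, n_g, T = X.shape
    N = Q * n_g
    Z = np.zeros((N, Q + 1))
    Z[:, 0] = 1.0                               # column multiplying mu(t)
    for g in range(Q):
        Z[g * n_g:(g + 1) * n_g, g + 1] = 1.0   # indicator of group g
    # augmented system: the extra row encodes sum_g alpha_g(t) = 0
    Zc = np.vstack([Z, np.r_[0.0, np.ones(Q)]])
    Yc = np.vstack([X.reshape(N, T), np.zeros((1, T))])
    gamma, *_ = np.linalg.lstsq(Zc, Yc, rcond=None)
    return gamma                                # shape (Q + 1, T)
```

For a balanced design this reproduces the classical estimates: the first row equals the grand mean curve, and row $g+1$ equals $\overline{X}\_g(t) - \overline{X}(t)$.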

Regarding the contrast in Equation (5), most tests are based on the F-test statistic [32,45]:

$$F\_n(t) = \frac{SSR\_n(t)/(Q-1)}{SSE\_n(t)/(n-Q)}$$

where

$$SSR\_n(t) = \sum\_{g=1}^{Q} n\_{g} \left( \overline{X}\_{g}(t) - \overline{X}(t) \right)^2$$

$$SSE\_n(t) = \sum\_{g=1}^{Q} \sum\_{j=1}^{n\_{g}} \left( X\_{gj}(t) - \overline{X}\_{g}(t) \right)^2$$

represent the between-group and within-group variation, respectively. In addition, for these calculations, the sample mean function $\overline{X}(t) = (1/n)\sum\_{g=1}^{Q}\sum\_{j=1}^{n\_g} X\_{gj}(t)$ and the sample group mean functions $\overline{X}\_g(t) = (1/n\_g)\sum\_{j=1}^{n\_g} X\_{gj}(t)$, $g = 1, \dots, Q$, were taken into account.
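These pointwise quantities translate directly into code. The sketch below (a balanced design is assumed for brevity; the function name is ours) evaluates $SSR\_n(t)$, $SSE\_n(t)$ and $F\_n(t)$ on the discretisation grid:

```python
import numpy as np

def pointwise_F(X):
    """Pointwise F_n(t) for curves X of shape (Q, n_g, T) (balanced design)."""
    Q, n_g, T = X.shape
    n = Q * n_g
    group_means = X.mean(axis=1)        # sample group means \bar{X}_g(t)
    grand_mean = X.mean(axis=(0, 1))    # sample mean \bar{X}(t)
    ssr = n_g * ((group_means - grand_mean) ** 2).sum(axis=0)
    sse = ((X - group_means[:, None, :]) ** 2).sum(axis=(0, 1))
    return (ssr / (Q - 1)) / (sse / (n - Q))
```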

In this work, two specific tests were used to contrast the similarity between samples. On the one hand, the F test with the reduced bias estimation method (FB) [46] was applied. This test uses both the pointwise variation between groups and the variation within groups. Specifically, the test statistic has the form:

$$F\_{n} = \frac{\int\_{I} SSR\_{n}(t)\,dt / \left(Q - 1\right)}{\int\_{I} SSE\_{n}(t)\,dt / \left(n - Q\right)}\tag{9}$$

The distribution of this statistic is approximated by $F\_{(Q-1)k,(n-Q)k}$, where $k$ is estimated by the bias-reduced method. The $p$-value taken into account comes from $P(F\_{(Q-1)k,(n-Q)k} > F\_n)$ [45,46].
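A sketch of this procedure in NumPy/SciPy is given below. The integrals are approximated by Riemann sums on a uniform grid, and, as a hedged stand-in for the bias-reduced estimator of [46], $k$ is computed with the naive trace estimator $k = \operatorname{tr}(\hat{\Sigma})^2 / \operatorname{tr}(\hat{\Sigma}^2)$ based on the pooled sample covariance; the bias-reduced variant further corrects these traces. The function name is ours:

```python
import numpy as np
from scipy.stats import f as f_dist

def fb_test(X, t):
    """Global statistic of Equation (9) and its F_{(Q-1)k,(n-Q)k} p-value
    for curves X of shape (Q, n_g, T) observed on a uniform grid t."""
    Q, n_g, T = X.shape
    n = Q * n_g
    gm = X.mean(axis=1)                           # \bar{X}_g(t)
    mm = X.mean(axis=(0, 1))                      # \bar{X}(t)
    ssr = n_g * ((gm - mm) ** 2).sum(axis=0)
    sse = ((X - gm[:, None, :]) ** 2).sum(axis=(0, 1))
    dt = t[1] - t[0]                              # Riemann-sum integration
    Fn = (ssr.sum() * dt / (Q - 1)) / (sse.sum() * dt / (n - Q))
    resid = (X - gm[:, None, :]).reshape(n, T)
    cov = resid.T @ resid / (n - Q)               # pooled covariance estimate
    k = np.trace(cov) ** 2 / np.trace(cov @ cov)  # naive estimator of k
    pval = f_dist.sf(Fn, (Q - 1) * k, (n - Q) * k)
    return Fn, pval
```

Here `scipy.stats.f.sf` returns the upper tail $P(F\_{(Q-1)k,(n-Q)k} > F\_n)$ directly.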

On the other hand, a permutation test based on a basis function representation (FP) was used. This test follows the basis representation procedure presented by Górecki and Smaga [47]. The functional observations are represented by a finite number of basis functions $\varphi\_m \in L^2(I)$, $m = 1, \dots, K$, as follows:

$$X\_{gj}(t) \approx \sum\_{m=1}^{K} c\_{gjm} \varphi\_{m}(t), \ t \in I \tag{10}$$

where $c\_{gjm}$ are random variables and $K$ is sufficiently large. Moreover, this test uses the following approximation of Equation (9) [45]:

$$\frac{(a-b)/(Q-1)}{(c-a)/(n-Q)}\tag{11}$$

where

$$a = \sum\_{g=1}^{Q} \frac{1}{n\_{g}} \mathbf{1}\_{n\_{g}}^T \mathbf{C}\_{g}^T \mathbf{J}\_{\varphi} \mathbf{C}\_{g} \mathbf{1}\_{n\_{g}}$$

$$b = \frac{1}{n} \sum\_{g=1}^{Q} \sum\_{j=1}^{Q} \mathbf{1}\_{n\_{g}}^T \mathbf{C}\_{g}^T \mathbf{J}\_{\varphi} \mathbf{C}\_{j} \mathbf{1}\_{n\_{j}}$$

$$c = \sum\_{g=1}^{Q} \operatorname{trace}(\mathbf{C}\_{g}^T \mathbf{J}\_{\varphi} \mathbf{C}\_{g})$$

being $\mathbf{1}\_a$ a vector of ones of dimension $a \times 1$, $\mathbf{C}\_g = (c\_{gjm})\_{j=1,\dots,n\_g;\,m=1,\dots,K}$, $g = 1, \dots, Q$, and $\mathbf{J}\_{\varphi} := \int\_I \boldsymbol{\varphi}(t)\boldsymbol{\varphi}^T(t)\,dt$ the $K \times K$ matrix of cross-products based on $\boldsymbol{\varphi}(t) = (\varphi\_1(t), \dots, \varphi\_K(t))^T$.
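Putting Equations (10) and (11) together, a compact sketch of the FP test is possible. Below, curves are projected on a Fourier basis by least squares (the basis choice and function names are ours; [47] allows any suitable basis), $\mathbf{J}\_\varphi$ is approximated by numerical integration, and the permutation distribution of statistic (11) is generated by reshuffling the pooled coefficient columns across groups. Each $\mathbf{C}\_g$ is stored with coefficients in its columns, i.e. as a $K \times n\_g$ array, so the quadratic forms above conform:

```python
import numpy as np

def fourier_basis(t, K):
    """K Fourier basis functions evaluated on grid t, as a (T, K) matrix."""
    cols, m = [np.ones_like(t)], 1
    while len(cols) < K:
        cols.append(np.sin(2 * np.pi * m * t))
        if len(cols) < K:
            cols.append(np.cos(2 * np.pi * m * t))
        m += 1
    return np.column_stack(cols)

def fp_stat(C_list, J):
    """Statistic (11) from coefficient matrices C_g of shape (K, n_g)."""
    Q, n = len(C_list), sum(C.shape[1] for C in C_list)
    v = [C.sum(axis=1) for C in C_list]               # C_g 1_{n_g}
    a = sum(vg @ J @ vg / C.shape[1] for vg, C in zip(v, C_list))
    b = sum(vg @ J @ vh for vg in v for vh in v) / n
    c = sum(np.trace(C.T @ J @ C) for C in C_list)
    return ((a - b) / (Q - 1)) / ((c - a) / (n - Q))

def fp_test(X, t, K=7, n_perm=199, seed=0):
    """Permutation p-value of the FP test for curves X of shape (Q, n_g, T)
    on a uniform grid t (balanced design assumed for brevity)."""
    rng = np.random.default_rng(seed)
    Q, n_g, T = X.shape
    Phi = fourier_basis(t, K)
    # least-squares coefficients of Equation (10), one column per curve
    C, *_ = np.linalg.lstsq(Phi, X.reshape(Q * n_g, T).T, rcond=None)
    dt = t[1] - t[0]
    J = Phi.T @ Phi * dt                              # J_phi by Riemann sums
    C_list = [C[:, g * n_g:(g + 1) * n_g] for g in range(Q)]
    observed = fp_stat(C_list, J)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Q * n_g)                # reshuffle group labels
        shuffled = C[:, idx]
        perm = [shuffled[:, g * n_g:(g + 1) * n_g] for g in range(Q)]
        count += fp_stat(perm, J) >= observed
    return (count + 1) / (n_perm + 1)
```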
