Article

Two-Sample Hypothesis Test for Functional Data

1 China National Institute of Standardization, Beijing 100191, China
2 School of Mathematics and Statistics, Zhengzhou University, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(21), 4060; https://doi.org/10.3390/math10214060
Submission received: 26 September 2022 / Revised: 16 October 2022 / Accepted: 19 October 2022 / Published: 1 November 2022

Abstract: In this paper, we develop and study a novel testing procedure that has a more powerful ability to detect mean differences for functional data. In general, it includes two stages: first, splitting the sample into two parts and selecting principal components adaptively based on the first half-sample; then, constructing a test statistic based on the other half-sample. An extensive simulation study is presented, which shows that the proposed test works very well in comparison with several other methods in a variety of alternative settings.

1. Introduction

In the recent literature, there has been increasing interest in functional data analysis, with extensive applications in biometrics, chemometrics, econometrics, and medical research, as well as other fields. Functional data are intrinsically infinite-dimensional, so classical methods for multivariate observations are not applicable. Therefore, it is necessary to develop special techniques for this type of data. There has been intensive methodological and theoretical development in functional data analysis; see [1,2,3,4,5], among others.
In functional data analysis, a functional data set or curve can be modeled as independent realizations of an underlying stochastic process:
$$x_i(t) = \mu(t) + \alpha_i(t) + \epsilon_i(t), \quad i = 1, 2, \ldots, n, \quad (1)$$
where $\mu(t)$ is the mean function of the stochastic process, $\alpha_i(t)$ is the $i$th individual functional variation from $\mu(t)$, and $\epsilon_i(t)$ is the $i$th measurement error process. In general, we assume $\alpha_i(t)$ and $\epsilon_i(t)$ are independent and are i.i.d. samples from $\alpha(t)$ and $\epsilon(t)$, respectively, where $\alpha(t) \sim \mathrm{SP}(0, \gamma)$, $\epsilon(t) \sim \mathrm{SP}(0, \gamma_\epsilon)$, and $\mathrm{SP}(\mu, \gamma)$ denotes a stochastic process with mean function $\mu(t)$ and covariance function $\gamma(s,t)$.
The mean function $\mu(t)$ reflects the underlying trend and can be used as an important index for the population response, for drug evaluation and biomedical purposes, among others. One important statistical inference problem beyond estimation is testing various hypotheses about the mean function. Therefore, we focus on the problem of testing the equality of mean functions in two random samples independently drawn from two functional random variables. Several approaches have been proposed to address this problem. For instance, Ref. [6] proposed an adaptive Neyman test, but for the case when the sampling information is in a "discrete" format. Ref. [7] discussed two methods: multivariate analysis-based and bootstrap-based testing.
However, these methods have only been applied in narrow fields and are not available for a global testing result. Refs. [8,9] proposed an $L^2$-norm-based statistic to test the equality of mean functions. Ref. [10] proposed and studied a so-called Globalized Pointwise F-test, abbreviated as the GPF test. The GPF test is in general comparable with the $L^2$-norm-based test and the F-type test adopted for the one-way ANOVA problem. Then, Ref. [11] proposed the $F_{\max}$-test; via some simulation studies, it was found that in terms of both level accuracy and power, the $F_{\max}$-test outperforms the GPF test of [10] when the functional data are highly or moderately correlated, and its performance is comparable with the latter otherwise. Ref. [5] proposed a statistic based on functional principal component semi-distances. Furthermore, they gave a normalized version based on the functional principal components that has a chi-square limit distribution; this statistic is scale-invariant. However, this method requires pre-specifying a threshold to choose the leading principal components (PCs), where PCs are ranked by their eigenvalues. They chose the number of PCs, say $d$, based on the percentage of variance explained for the functional covariates. This method has two drawbacks: one is that it can only detect mean differences in this $d$-dimensional subspace; the other is that different thresholds often lead to different tests, whose power depends on the particular simple alternative hypothesis.
In this paper, we develop and study a novel testing procedure that overcomes the drawback that many tests can only detect mean differences in a $d$-dimensional subspace. Furthermore, the novel testing procedure is very powerful when the differences lie in the middle and latter parts of the two coefficient sequences. Additionally, we derive the asymptotic distribution of the new test statistics under the alternative hypothesis, which is the key difficulty in the current approach. In general, the novel testing procedure includes two stages: first, split the sample into two parts and select PCs adaptively based on the first half-sample; then, construct the test statistic based on the other half-sample. Sample splitting is often used in high-dimensional regression problems because most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, so asymptotically valid p-values are not available. Ref. [12] adopted this technique to reduce the number of variables to a manageable size using the first split sample, and then applied classical variable selection techniques to the remaining variables using the data from the second split.
In our procedure, we mainly adopt two methods to select PCs in the first stage: the adaptive Neyman test [6] and an adaptive ordered Neyman test. We also consider selecting PCs based on a pre-specified threshold; however, this threshold is an association–projection index that combines both the variation and the projection along each direction. The purpose of splitting the sample is two-fold: (1) to decrease the effect of random noise in the first stage; (2) to derive the asymptotic distribution of the test statistic. From the simulation results, we can see that our testing procedure asymptotically achieves the pre-specified significance level and enjoys certain optimality in terms of power, even when the population is a non-Gaussian process.
This paper is organized as follows. In Section 2, we introduce the test problem and briefly review the existing global two-sample test methods. Section 3 proposes our testing procedure. Simulation studies are given in Section 4. A real-data example is analyzed in Section 5. Section 6 concludes the present work. The derivations are given in Appendix A.

2. The Testing Problem for Functional Data

2.1. Preliminary

Let $\mathrm{SP}(\mu, \Gamma)$ denote a stochastic process with mean function $\mu(t)$, $t \in \mathcal{T}$, and covariance function $\Gamma(s,t)$, $s, t \in \mathcal{T}$, where $\mathcal{T} = [0,1]$. Suppose we have the following two independent functional samples:
$$X_1(t), \ldots, X_{n_1}(t) \sim \mathrm{SP}(\mu_1, \Gamma_1(s,t)), \qquad Y_1(t), \ldots, Y_{n_2}(t) \sim \mathrm{SP}(\mu_2, \Gamma_2(s,t)), \quad (2)$$
where $\Gamma_1(s,t)$ denotes the covariance function of the functional data $X(t)$ and $\Gamma_2(s,t)$ denotes that of $Y(t)$. However, we do not know whether $\Gamma_1(s,t)$ and $\Gamma_2(s,t)$ are equal. We want to test whether the two mean functions are equal:
$$H_0: \mu_1(t) = \mu_2(t) \quad \text{vs.} \quad H_1: \mu_1(t) \neq \mu_2(t). \quad (3)$$
Let $\bar X(t)$ and $\bar Y(t)$ denote the sample mean functions of the two samples, respectively. First, we have
$$\hat\mu_1(t) = \bar X(t), \quad \hat\mu_2(t) = \bar Y(t), \quad (4)$$
and
$$\sqrt{n}\big(\hat\mu_1(t) - \hat\mu_2(t)\big) \sim \mathrm{SP}\Big(\sqrt{n}\big(\mu_1(t) - \mu_2(t)\big),\ \Gamma_{12}(s,t)\Big), \quad (5)$$
where $\Gamma_{12}(s,t) = \frac{\Gamma_1(s,t)}{n_1/n} + \frac{\Gamma_2(s,t)}{n_2/n}$ and $n = n_1 + n_2$. $\Gamma_{12}(s,t)$ can be written as $\Gamma_{12}(s,t) = \sum_{k=1}^{\infty}\lambda_k\varphi_k(s)\varphi_k(t)$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq 0$ are the eigenvalues and $\varphi_k(t)$, $k = 1, 2, \ldots$, are the eigenfunctions, satisfying $\int_{\mathcal{T}}\varphi_k(t)^2\,dt = 1$ and $\int_{\mathcal{T}}\varphi_k(t)\varphi_l(t)\,dt = 0$ for $k \neq l$.
It is easy to see that $\hat\Gamma_1(s,t) = \frac{1}{n_1}\sum_{i=1}^{n_1}(X_i(t) - \hat\mu_1(t))(X_i(s) - \hat\mu_1(s))$, $\hat\Gamma_2(s,t) = \frac{1}{n_2}\sum_{i=1}^{n_2}(Y_i(t) - \hat\mu_2(t))(Y_i(s) - \hat\mu_2(s))$, and $\hat\Gamma_{12}(s,t) = \frac{\hat\Gamma_1(s,t)}{n_1/n} + \frac{\hat\Gamma_2(s,t)}{n_2/n}$.
$\hat\Gamma_{12}(s,t)$ can also be written as an eigen-decomposition
$$\hat\Gamma_{12}(s,t) = \sum_{k=1}^{\infty}\hat\lambda_k\hat\varphi_k(s)\hat\varphi_k(t), \quad (6)$$
where the nonincreasing sequence $(\hat\lambda_k : k \geq 1)$ consists of the sample eigenvalues and $(\hat\varphi_k : k \geq 1)$ are the corresponding eigenfunctions, forming an orthonormal basis of $L^2[0,1]$.
To simplify notation, we use the symbol $\Gamma$ for both the kernel and the operator. Now, the functional Mahalanobis semi-distance between $\hat\mu_1(t)$ and $\hat\mu_2(t)$ is defined as
$$d_{FM}^2(\hat\mu_1, \hat\mu_2) = \Big\langle \hat\Gamma_{12,p_n}^{-\frac12}\sqrt{n}(\hat\mu_1 - \hat\mu_2),\ \hat\Gamma_{12,p_n}^{-\frac12}\sqrt{n}(\hat\mu_1 - \hat\mu_2)\Big\rangle. \quad (7)$$
Plugging (6) into (7), we have
$$d_{FM}^2(\hat\mu_1, \hat\mu_2) = \sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\hat\lambda_k}. \quad (8)$$
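To make the construction concrete, the following minimal sketch (our own illustration, not the authors' code) computes the eigen-decomposition of $\hat\Gamma_{12}$ and the semi-distance in (8) from curves observed on a common grid; the function names and the quadrature-weight discretization are our own assumptions.

```python
import numpy as np

def pooled_eigen(X, Y, grid):
    """Eigen-decomposition of Gamma_12_hat = Gamma_1_hat/(n1/n) + Gamma_2_hat/(n2/n).

    X: (n1, T) curves, Y: (n2, T) curves, both evaluated on `grid`.
    Returns eigenvalues, eigenfunctions (columns), and quadrature weights.
    """
    n1, n2 = X.shape[0], Y.shape[0]
    n = n1 + n2
    w = np.gradient(grid)                      # simple quadrature weights
    G1 = np.cov(X, rowvar=False, bias=True)    # Gamma_1 hat (1/n1 normalization)
    G2 = np.cov(Y, rowvar=False, bias=True)    # Gamma_2 hat
    G12 = G1 / (n1 / n) + G2 / (n2 / n)
    sw = np.sqrt(w)                            # discretize the integral operator
    lam, U = np.linalg.eigh(sw[:, None] * G12 * sw[None, :])
    order = np.argsort(lam)[::-1]              # sort eigenvalues decreasingly
    lam, U = lam[order], U[:, order]
    phi = U / sw[:, None]                      # eigenfunctions with unit L2 norm
    return lam, phi, w

def mahalanobis_semidistance(X, Y, grid, p):
    """d_FM^2 with the first p principal components, as in (8)."""
    n = X.shape[0] + Y.shape[0]
    lam, phi, w = pooled_eigen(X, Y, grid)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    scores = (phi[:, :p] * (diff * w)[:, None]).sum(axis=0)  # <mu1-mu2, phi_k>
    return float(np.sum(n * scores**2 / lam[:p]))
```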

2.2. Existing Global Testing Methods

Although there is a significant literature discussing the equality of means for two functional data sets, the existing methods can be roughly grouped into a few broad categories, as follows.
(1)
$L^2$-norm-based test
The test is based on the $L^2$-norm of the difference between $\hat\mu_1(t)$ and $\hat\mu_2(t)$:
$$T_L = n\|\hat\mu_1(t) - \hat\mu_2(t)\|^2 = n\int \big(\hat\mu_1(t) - \hat\mu_2(t)\big)^2\,dt.$$
Ref. [8] proved that $T_L = n\int(\hat\mu_1(t) - \hat\mu_2(t))^2\,dt \stackrel{d}{=} \sum_{k=1}^{\infty}\lambda_k A_k + o_p(1)$, where $A_k \sim \chi_1^2(n u_k^2/\lambda_k)$, $X \stackrel{d}{=} Y$ denotes that $X$ and $Y$ have the same distribution, and $u_k = \int_0^1(\mu_1(t) - \mu_2(t))\varphi_k(t)\,dt$, $k = 1, 2, \ldots$. Furthermore, they used the two-cumulant matched $\chi^2$ approximation method and obtained an approximate distribution of $T_L$, namely $\alpha\chi_d^2 + \beta$, where
$$\alpha = \frac{\sum_{k=1}^{\infty}\hat\lambda_k^3}{\sum_{k=1}^{\infty}\hat\lambda_k^2}, \quad d = \frac{\big(\sum_{k=1}^{\infty}\hat\lambda_k^2\big)^3}{\big(\sum_{k=1}^{\infty}\hat\lambda_k^3\big)^2}, \quad \beta = \sum_{k=1}^{\infty}\hat\lambda_k - \frac{\big(\sum_{k=1}^{\infty}\hat\lambda_k^2\big)^2}{\sum_{k=1}^{\infty}\hat\lambda_k^3}.$$
Then, they have $P(T_L > K) \approx P\big(\chi_d^2 > (K - \beta)/\alpha\big)$.
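As a hedged illustration of how the approximation is used in practice, the sketch below computes $T_L$ and the $\alpha\chi_d^2 + \beta$ p-value; it reuses the `pooled_eigen` helper sketched in Section 2.1, which is an assumption of this example.

```python
import numpy as np
from scipy.stats import chi2

def l2_norm_test(X, Y, grid):
    """L2-norm test T_L with the matched chi-square approximation of [8]."""
    n = X.shape[0] + Y.shape[0]
    w = np.gradient(grid)
    diff = X.mean(axis=0) - Y.mean(axis=0)
    TL = n * np.sum(diff**2 * w)               # n * int (mu1_hat - mu2_hat)^2 dt
    lam, _, _ = pooled_eigen(X, Y, grid)
    lam = lam[lam > 1e-12]                     # drop numerically zero eigenvalues
    s1, s2, s3 = lam.sum(), (lam**2).sum(), (lam**3).sum()
    alpha, d, beta = s3 / s2, s2**3 / s3**2, s1 - s2**2 / s3
    pval = chi2.sf((TL - beta) / alpha, d)     # P(T_L > K) ~ P(chi2_d > (K-beta)/alpha)
    return TL, pval
```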
(2)
Projection-based test
Ref. [5] considered projecting the observed mean difference onto the space spanned by $\hat\varphi_1, \ldots, \hat\varphi_d$, where $d$ is determined based on the percentage of variance explained by the eigenvalues, and constructed the following test statistic:
$$T_H = \sum_{k=1}^{d} n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2.$$
Given $d$, the asymptotic distribution of $T_H$ under the null hypothesis is the distribution of $\sum_{k=1}^{d}\lambda_k Z_k^2$, where $Z_k$, $k = 1, \ldots, d$, are independent standard normal random variables. Alternatively, they proposed a normalized version of $T_H$, given by
$$NT_H = \sum_{k=1}^{d} n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2/\hat\lambda_k.$$
Then, under the null hypothesis, $NT_H$ has an asymptotic $\chi_d^2$ distribution.
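A minimal sketch of $NT_H$ (again our own illustration built on the `pooled_eigen` helper; the 99% variance threshold mirrors the choice used in the simulations below):

```python
import numpy as np
from scipy.stats import chi2

def projection_test(X, Y, grid, var_threshold=0.99):
    """Normalized projection test NT_H of [5] with a variance-explained cutoff."""
    n = X.shape[0] + Y.shape[0]
    lam, phi, w = pooled_eigen(X, Y, grid)
    lam = np.clip(lam, 0.0, None)
    d = int(np.searchsorted(np.cumsum(lam) / lam.sum(), var_threshold)) + 1
    diff = X.mean(axis=0) - Y.mean(axis=0)
    scores = (phi[:, :d] * (diff * w)[:, None]).sum(axis=0)
    NTH = float(np.sum(n * scores**2 / lam[:d]))
    return NTH, chi2.sf(NTH, d)                # asymptotic chi2_d null distribution
```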
(3)
F-test
Finally, we describe the testing procedure proposed by [13]. Although they proposed a functional F-test in a functional linear regression setting, the method can be specialized to our two-sample test as follows.
The F-test statistic for our setting is
$$T_F = \frac{RSS_0 - RSS_1}{RSS_1/(n-2)},$$
where
$$RSS_1 = \sum_{i=1}^{n_1}\int_0^1\big(X_i(t) - \bar X(t)\big)^2\,dt + \sum_{i=1}^{n_2}\int_0^1\big(Y_i(t) - \bar Y(t)\big)^2\,dt,$$
$$RSS_0 = \sum_{i=1}^{n_1}\int_0^1\big(X_i(t) - \bar Z(t)\big)^2\,dt + \sum_{i=1}^{n_2}\int_0^1\big(Y_i(t) - \bar Z(t)\big)^2\,dt, \quad \bar Z(t) = \frac{n_1\bar X(t) + n_2\bar Y(t)}{n_1 + n_2}.$$
Ref. [13] also presented the distribution of the F-statistic as $\frac{\sum_{k=1}^{\infty}\lambda_k\chi_1^2}{\sum_{k=1}^{\infty}\lambda_k\chi_{n-2}^2}$. In practice, they applied the approximation idea of [14] to derive an approximate distribution of the F-statistic, namely $(\chi_{f_1}^2/f_1)/(\chi_{f_2}^2/f_2)$, an ordinary F distribution with degrees of freedom $f_1$ and $f_2$, where $f_1 = \big(\sum_{k=1}^{\infty}\hat\lambda_k\big)^2/\sum_{k=1}^{\infty}\hat\lambda_k^2$ and $f_2 = (n-2)\big(\sum_{k=1}^{\infty}\hat\lambda_k\big)^2/\sum_{k=1}^{\infty}\hat\lambda_k^2$.
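The following sketch (ours, under the same discretization assumptions as above) computes $T_F$ and its $F(f_1, f_2)$ approximation:

```python
import numpy as np
from scipy.stats import f as fdist

def functional_f_test(X, Y, grid):
    """Two-sample functional F-test of [13] with the approximation of [14]."""
    n1, n2 = X.shape[0], Y.shape[0]
    n = n1 + n2
    w = np.gradient(grid)
    Xbar, Ybar = X.mean(axis=0), Y.mean(axis=0)
    Zbar = (n1 * Xbar + n2 * Ybar) / n         # pooled mean function
    rss1 = np.sum((X - Xbar)**2 @ w) + np.sum((Y - Ybar)**2 @ w)
    rss0 = np.sum((X - Zbar)**2 @ w) + np.sum((Y - Zbar)**2 @ w)
    TF = (rss0 - rss1) / (rss1 / (n - 2))
    lam, _, _ = pooled_eigen(X, Y, grid)
    lam = lam[lam > 1e-12]
    f1 = lam.sum()**2 / (lam**2).sum()         # Satterthwaite-type dfs
    f2 = (n - 2) * f1
    return TF, fdist.sf(TF, f1, f2)
```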

3. Our Testing Procedure

In order to determine the number of PCs $p_n$ adaptively and find the significant components with which to construct a more powerful test statistic, we propose a two-stage procedure via a data-splitting technique. With the help of this technique, we can derive the distribution of the test statistic.
First, we assume for simplicity that the sample sizes are even and randomly split the samples into two groups: $(X^{(1)}, Y^{(1)})$ and $(X^{(2)}, Y^{(2)})$. In the first stage, we choose $p_n$ based on adaptively truncated Hotelling $T^2$-type statistics computed from the first group sample $(X^{(1)}, Y^{(1)})$. In the second stage, we construct the test statistic via the second group sample $(X^{(2)}, Y^{(2)})$ and $p_n$.
Next, we describe three methods for choosing $p_n$ in the general case.
Denote $\hat V_k^{(1)} = \frac{n}{2}\big\langle\hat\mu_1^{(1)} - \hat\mu_2^{(1)}, \hat\varphi_k^{(1)}\big\rangle^2/\hat\lambda_k^{(1)}$, $k = 1, \ldots, d_n$. In practice, many of the trailing eigenvalues are close to zero, so $\hat V_k$ will be very large for large $k$. Hence, we generally give the cutoff $d_n$ a high threshold, for example, $d_n = \max\{k : \hat\lambda_k/\sum_{t=1}^{k}\hat\lambda_t > 0.001\}$. For most inference problems there is no optimal test, but the adaptive Neyman tests have been shown to work well against a broad range of alternatives. Therefore, we choose $p_n$ adaptively based on the following adaptive Neyman test methods. One method maximizes the normalized version of (8) (in Appendix A, we prove that $\hat V_k$ has an approximate $\chi_1^2$ distribution); that is,
$$p_{1n} = \operatorname*{argmax}_{1 \leq d \leq d_n} \frac{\sum_{k=1}^{d}\hat V_k^{(1)} - d}{\sqrt{2d}}.$$
Considering that some of the $\hat V_k^{(1)}$ terms may be nonsignificant, we propose another method that maximizes the sum of the normalized order statistics $\hat V_{(k)}$; that is,
$$p_{2n} = \operatorname*{argmax}_{1 \leq d \leq d_n} \frac{\sum_{k=1}^{d}\hat V_{(k)}^{(1)} - E\big(\sum_{k=1}^{d}\hat V_{(k)}^{(1)}\big)}{\sqrt{\operatorname{var}\big(\sum_{k=1}^{d}\hat V_{(k)}^{(1)}\big)}},$$
where $\hat V_{(k)}^{(1)}$ is the $k$-th order statistic of $\hat V_k^{(1)}$, $k = 1, \ldots, d_n$, in decreasing order. Unfortunately, there is no closed form for the normalizing mean and variance, since they involve order statistics. However, these quantities can be approximated empirically by very fast Monte Carlo simulation, as sketched below. The third method is a hard-threshold truncation method, but we truncate at the $d$-th term based on the percentage of $\hat V_k$, which combines both the variation and the projection along each direction. In our simulations, we set the same truncation threshold as in [5]'s projection-based test for comparison.
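The sketch below illustrates the first-stage selection (our own hedged implementation): $p_{1n}$ maximizes the normalized partial sums, and $p_{2n}$ uses ordered partial sums whose null mean and variance are approximated by Monte Carlo, since no closed form is available.

```python
import numpy as np

def select_p1n(V, d_n):
    """p_1n: maximize (sum_{k<=d} V_k - d) / sqrt(2d) over 1 <= d <= d_n."""
    d = np.arange(1, d_n + 1)
    stat = (np.cumsum(V[:d_n]) - d) / np.sqrt(2 * d)
    return int(np.argmax(stat)) + 1

def order_stat_moments(d_n, n_mc=10000, rng=None):
    """Monte Carlo null mean/std of top-d partial sums of chi2(1) order statistics."""
    rng = rng if rng is not None else np.random.default_rng(0)
    sims = rng.chisquare(1, size=(n_mc, d_n))
    sims.sort(axis=1)
    csum = np.cumsum(sims[:, ::-1], axis=1)    # cumulative sums, largest first
    return csum.mean(axis=0), csum.std(axis=0)

def select_p2n(V, d_n, mean, std):
    """p_2n: maximize the normalized ordered partial sums."""
    csum = np.cumsum(np.sort(V[:d_n])[::-1])
    return int(np.argmax((csum - mean) / std)) + 1
```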
Remark 1.
$p_{1n}$ and $p_{2n}$ are chosen adaptively based on the first group of data. For the convenience of the subsequent theoretical analysis, we denote both of them uniformly by $p_n$.
After we derive $p_n$ in the first stage, we construct the following statistics based on the second group sample:
$$T_{1A} = \sum_{k=1}^{p_n}\hat V_k^{(2)},$$
$$T_{2A} = \sum_{k=1}^{p_n}\hat V_{(k)}^{(2)},$$
$$NT_{1A} = \frac{\sum_{k=1}^{p_n}\hat V_k^{(2)} - p_n}{\sqrt{2p_n}},$$
$$NT_{2A} = \frac{\sum_{k=1}^{p_n}\hat V_{(k)}^{(2)} - E\big(\sum_{k=1}^{p_n}\hat V_{(k)}^{(2)}\big)}{\sqrt{\operatorname{var}\big(\sum_{k=1}^{p_n}\hat V_{(k)}^{(2)}\big)}},$$
where
$$\hat V_k^{(2)} = \frac{n}{2}\big\langle\hat\mu_1^{(2)} - \hat\mu_2^{(2)}, \hat\varphi_k^{(2)}\big\rangle^2/\hat\lambda_k^{(2)}.$$
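An end-to-end sketch of the two-stage procedure for $NT_{1A}$ (illustrative only; it reuses the hedged helpers `pooled_eigen` and `select_p1n` defined above):

```python
import numpy as np
from scipy.stats import norm

def v_statistics(Xh, Yh, grid, d_n):
    """V_k = m <mu1_hat - mu2_hat, phi_k>^2 / lambda_k on a half-sample,
    where m = total number of curves in the half-sample (the n/2 in the text)."""
    m = Xh.shape[0] + Yh.shape[0]
    lam, phi, w = pooled_eigen(Xh, Yh, grid)
    diff = Xh.mean(axis=0) - Yh.mean(axis=0)
    scores = (phi[:, :d_n] * (diff * w)[:, None]).sum(axis=0)
    return m * scores**2 / lam[:d_n]

def two_stage_test(X, Y, grid, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    i1, i2 = rng.permutation(X.shape[0]), rng.permutation(Y.shape[0])
    X1, X2 = X[i1[::2]], X[i1[1::2]]           # random half-samples
    Y1, Y2 = Y[i2[::2]], Y[i2[1::2]]
    lam1, _, _ = pooled_eigen(X1, Y1, grid)
    d_n = int(np.max(np.nonzero(lam1 / np.cumsum(lam1) > 0.001)[0]) + 1)
    V1 = v_statistics(X1, Y1, grid, d_n)
    p_n = select_p1n(V1, d_n)                  # stage 1: choose p_n adaptively
    V2 = v_statistics(X2, Y2, grid, d_n)       # stage 2: independent half-sample
    NT1A = (V2[:p_n].sum() - p_n) / np.sqrt(2 * p_n)
    return NT1A, norm.sf(NT1A)                 # one-sided normal p-value (Theorem 2)
```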
To derive the asymptotic distribution of $T_{1A}$, $T_{2A}$, $NT_{1A}$, and $NT_{2A}$, we make the following assumptions:
Assumption 1.
There exist constants $a > 1$ and $C > 0$ such that $\lambda_k - \lambda_{k+1} \geq C k^{-a-1}$ for $k \geq 1$ and $C^{-1}k^{-a} \leq \lambda_k \leq C k^{-a}$.
Assumption 2.
$E\|X\|^4 < \infty$, $E\|Y\|^4 < \infty$.
Assumption 3.
$\tau = \lim_{n\to\infty}\frac{n_1}{n}$, $0 < \tau < 1$.
Assumption 4.
$p_n^{2a+3}m^{-1} = o_p(1)$, where $m = \min(n_1, n_2)$.
Assumptions 1 and 2 are standard in functional principal component analysis (FPCA). Assumption 1 implies that $\lambda_k \asymp k^{-a}$; because the covariance functions are bounded, one has $a > 1$. Assumption 1 essentially assumes that all the eigenvalues are positive but decay polynomially. Assumption 3 requires that the two sample sizes $n_1$ and $n_2$ tend to infinity proportionally. Assumption 4 specifies the growth rate of $p_n$. In the related literature, e.g., [15], to guarantee estimation consistency, $p_n$ is usually assumed to satisfy $p_n^{a+1}n^{-1} = o_p(1)$.
Theorem 1.
Under $H_0$, $\sqrt{n}(\bar X - \bar Y) \stackrel{d}{\to} G$, where $G$ is a Gaussian process with mean zero and covariance function $\Gamma(s,t) = \frac{\Gamma_1(s,t)}{\tau} + \frac{\Gamma_2(s,t)}{1-\tau}$.
The proof of Theorem 1 follows from the standard central limit theorem for stochastic processes, and we omit it.
Remark 2.
We can write $G$ as $G = \sum_{j=1}^{\infty}\eta_j\sqrt{\lambda_j}\,\varphi_j$, where $\eta_j$, $j \geq 1$, are i.i.d. centered real Gaussian random variables with variance 1.
Theorem 2.
Under Assumptions 1–4, there exist increasing sequences $(p_n)_n$ such that, under $H_0$,
$$\lim_{p_n\to\infty} P(NT_{1A} \leq x) = \Phi(x),$$
where $\Phi(x)$ denotes the cumulative distribution function (cdf) of the standard normal distribution.
Theorem 3.
Under Assumptions 1–4, the test statistic $T_{2A}$ is approximately equivalent to $\sum_{k=1}^{p_n}\chi^2_{(k)}(1)$ under $H_0$, where $\chi^2_{(1)}(1), \ldots, \chi^2_{(p_n)}(1)$ are the order statistics (in decreasing order) of $p_n$ $\chi^2(1)$ random variables.
Remark 3.
The asymptotic null distribution of $T_{2A}$ is affected not only by the values of $\hat V_k^{(2)}$ but also by their order. In practice, the quantiles and tail probabilities of the null distribution of $T_{2A}$ can be approximated empirically by very fast Monte Carlo simulation.
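A sketch of this Monte Carlo step (our reading of Theorem 3, with the order statistics drawn from the first-stage pool of size $d_n$, which is an assumption of this illustration):

```python
import numpy as np

def t2a_null_quantile(p_n, d_n, alpha=0.05, n_mc=100_000, rng=None):
    """Approximate the (1 - alpha) null quantile of T_2A by simulating
    sums of the p_n largest of d_n chi2(1) random variables."""
    rng = rng if rng is not None else np.random.default_rng(0)
    sims = rng.chisquare(1, size=(n_mc, d_n))
    sims.sort(axis=1)
    t2a = sims[:, -p_n:].sum(axis=1)           # sum of the p_n largest values
    return float(np.quantile(t2a, 1 - alpha))
```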
Theorem 4.
Under Assumptions 1–4 and $H_0$,
$$\lim_{(m,p_n)\to\infty} P(NT_{2A} \leq x) = \Phi(x),$$
where $\Phi(x)$ denotes the cdf of the standard normal distribution.
To obtain the asymptotic distribution of $NT_{1A}$ under the alternative in (3), we choose the local alternative defined in the following assumption:
Assumption 5.
$$H_{1n}: \mu_1(t) - \mu_2(t) = n^{-\frac12}u(t),$$
where $u(t)$ is any fixed real function such that $0 < \|u\| < \infty$.
Then, we have the following asymptotic power of $NT_{1A}$:
Theorem 5.
Under Assumptions 1–5, the asymptotic distribution of $NT_{1A}$ is given by
$$\lim_{(m,p_n)\to\infty} P\big(NT_{1A} > z_{1-\alpha}\big) = \Phi\big({-z_{1-\alpha}} + \|\Gamma_{12}^{-1}u(t)\|^2\big),$$
where $P$ denotes that the probability is computed under the alternative, and $z_{1-\alpha}$ is the upper $100(1-\alpha)\%$ point of the standard normal distribution.
Theorem 6.
Under Assumptions 1–5, the distribution of $T_{2A}$ is approximately equivalent to a noncentral $\chi^2$ distribution $\chi^2_{p_n}(\zeta_0)$, where
$$\zeta_0 = \sum_{k=1}^{p_n} V_{(k)}^{(2)},$$
and $V_k^{(2)} = \frac{n}{2}\big\langle\mu_1 - \mu_2, \varphi_k^{(2)}\big\rangle^2/\lambda_k^{(2)} = \big\langle u(t), \varphi_k^{(2)}\big\rangle^2/\lambda_k^{(2)}$.

4. Simulations

In this section, we report Monte Carlo simulation results comparing the finite-sample performance of the classical and proposed methods on the two-sample mean testing problem under different settings, including a fixed simple alternative and sparse signals with varying locations.

4.1. Fixed Simple Alternative

In this subsection, we first look at a simple setting where the alternatives are fixed. We generate curves from two populations using 40 Fourier basis functions as
$$X(t) = \sum_{k=1}^{40}\big(\theta_k^{1/2}z_{1k} + \mu_{1k}\big)\phi_k(t), \quad Y(t) = \sum_{k=1}^{40}\big(\theta_k^{1/2}z_{2k} + \mu_{2k}\big)\phi_k(t).$$
Here, $z_{1k}, z_{2k}$ are independent standard normal random variables. In each case, we take $\phi_k(t) = \sqrt{2}\sin((k-0.5)\pi t)$, $t \in [0,1]$, for $k = 1, 2, \ldots$, and generate the data on a discrete grid of 100 equispaced points in $[0,1]$. We take $\theta_k = 1/(\pi(k-0.5))^2$. We choose the $\mu_{1k}$s and $\mu_{2k}$s depending on the property that we want to illustrate (see below). We compare the power and size of three existing methods, namely [8]'s $L^2$-norm-based test ($T_L$), [13]'s F-test ($T_F$), and [5]'s projection-based test ($T_H$) with fixed truncation, against our two methods. We choose the commonly used threshold of 99% to determine the truncation term in [5]'s projection-based test ($T_H$). The results are based on 1000 Monte Carlo replications. In all scenarios, we set the nominal size $\alpha = 0.05$.
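The data-generating process can be sketched as follows (our own code; Setting 1 below is used only as an example of choosing the $\mu_{1k}$s and $\mu_{2k}$s):

```python
import numpy as np

def make_sample(n, mu, rng):
    """Generate n curves X(t) = sum_k (theta_k^{1/2} z_k + mu_k) phi_k(t)
    on 100 equispaced points in [0, 1], with 40 Fourier-sine basis functions."""
    t = np.linspace(0, 1, 100)
    k = np.arange(1, 41)
    theta = 1.0 / (np.pi * (k - 0.5))**2
    Phi = np.sqrt(2) * np.sin(np.outer(t, (k - 0.5) * np.pi))  # (100, 40)
    coef = np.sqrt(theta) * rng.standard_normal((n, 40)) + mu  # basis coefficients
    return coef @ Phi.T                                        # (n, 100) curves

rng = np.random.default_rng(1)
mu1 = np.zeros(40)
mu1[:6] = [0.5, 0.5, 1.5, 0.5, 1.5, 0.5]       # Setting 1: early-part signal
delta = 0.2
mu2 = mu1 + np.where(np.arange(40) < 6, delta, 0.0)
X = make_sample(120, mu1, rng)                  # n1 = n2 = 120
Y = make_sample(120, mu2, rng)
```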
To cover as many different scenarios as possible, we consider five settings for the mean difference:
(1) The mean differences arise early in the sequences $\mu_{1k}$ and $\mu_{2k}$; that is, $(\mu_{11}, \mu_{12}, \mu_{13}, \mu_{14}, \mu_{15}, \mu_{16}) = (0.5, 0.5, 1.5, 0.5, 1.5, 0.5)$ and $\mu_{1k} = 0$ for $k > 6$; $\mu_{2k} = \mu_{1k} + \delta$ for $k \leq 6$, $\mu_{2k} = 0$ for $k > 6$.
(2) The mean differences arise in the middle of the sequences; that is, $(\mu_{1,11}, \mu_{1,12}, \mu_{1,13}, \mu_{1,14}, \mu_{1,15}, \mu_{1,16}) = (0.5, 0.5, 1.5, 0.5, 1.5, 0.5)$ and $\mu_{1k} = 0$ for other $k$; $\mu_{2k} = \mu_{1k} + \delta$ for $11 \leq k \leq 16$, $\mu_{2k} = 0$ for other $k$.
(3) The mean differences arise in the latter part of the sequences; that is, $(\mu_{1,21}, \mu_{1,22}, \mu_{1,23}, \mu_{1,24}, \mu_{1,25}, \mu_{1,26}) = (0.5, 0.5, 1.5, 0.5, 1.5, 0.5)$ and $\mu_{1k} = 0$ for other $k$; $\mu_{2k} = \mu_{1k} + \delta$ for $21 \leq k \leq 26$, $\mu_{2k} = 0$ for other $k$.
(4) The mean differences are scattered over the early, middle, and latter parts; that is, $(\mu_{1,11}, \mu_{1,12}, \mu_{1,13}, \mu_{1,21}, \mu_{1,22}, \mu_{1,23}) = (0.5, 0.5, 1.5, 0.5, 1.5, 0.5)$ and $\mu_{1k} = 0$ for other $k$; $\mu_{2k} = \mu_{1k} + \delta$ for $k \in \{1, 2, 11, 12, 21, 22\}$, $\mu_{2k} = 0$ for other $k$.
(5) Tiny differences appear in all the principal components. In this case, we set the $\mu_{1k}$ as independent $N(0,1)$ random variables, and $\mu_{2k} = \mu_{1k} + \delta$, $1 \leq k \leq 40$.
From Table 1, Table 2, Table 3, Table 4 and Table 5, we can see that the methods perform quite differently across settings. From Table 1, we can see that when the mean difference lies in the early part of the sequence, Ref. [5]'s projection-based test ($T_H$) has the most powerful performance. This should not be surprising, because their method chooses precisely the projection space spanned by the first few eigenfunctions, where the mean difference lies. From Table 2, we observe that when the mean difference lies in the middle part of the sequence, our method has very high power compared to $T_H$, $T_F$, and $T_L$. In particular, we notice that Ref. [5]'s projection-based test ($T_H$) suffers a dramatic power loss. From Table 3, we can see that when the mean difference lies in the latter part of the sequence, our method still has the best performance. At the same time, we find that $T_F$ and $T_L$ have higher power than $T_H$ in this case. This illustrates that $T_F$ and $T_L$ are sensitive to the degree of divergence, while $T_H$ is more sensitive to the location of the mean difference. Furthermore, we notice that the power of $T_L$ and $T_F$ outperforms our method only for large sample sizes and large discrepancies between the null and alternative hypotheses. This is understandable, because our method also depends on the projection of the mean difference onto the space spanned by the eigenfunctions, excluding the last few eigenfunctions. Table 4 and Table 5 illustrate more general cases. Table 4 demonstrates the performance of each method when there are differences in the early, middle, and latter parts, and shows that our method has the most satisfactory performance in this general case. Table 5 demonstrates that when there are tiny differences in all directions, our method is still the most powerful, while $T_L$ and $T_F$ are useless.
We also conducted simulation studies under other similar scenarios. As they demonstrated similar patterns to those discussed above, we omit them here to save space.
It is worth noting that the proposed method has a first stage with randomly split data, so there could be a potential limitation due to the randomness of the split. In order to assess the robustness of the procedure to this splitting, we perform some supporting simulation studies based on multi-fold cross-validation (CV), including two-fold CV, five-fold CV, and ten-fold CV. For convenience, we use the same data setting as in Table 5. The results are shown in Table 6. From Table 6, we can see that in most cases, the test is robust to the splitting.

4.2. Sparse Signals with Varying Locations

In this subsection, we demonstrate the performance of the different tests under signals with varying locations. We set $\mu_{1k}$ ($1 \leq k \leq 40$) as independent $U(0,1)$ random variables, and $\mu_{2k} = \mu_{1k} + \delta$ at six random locations $k$ out of 40. In Setting 1, the signal of difference appears randomly in six of the first twenty principal components; we denote this $M(20,6)$. In Setting 2, the signal of difference appears in six of all forty principal components; we denote this $M(40,6)$. The simulation results are presented in Table 7 and Table 8. From the simulation results, we can see that in these cases, our method again has the most satisfactory performance.
We also compare our method with the full-sample adaptive methods, namely the adaptive Neyman–Pearson test ($T_A$) and the ordered adaptive test ($T_{OA}$). In the full sample, the null distributions of $T_A$ and $T_{OA}$ are intractable; the statistics are
$$T_A = \max_{1 \leq d \leq p_n}\frac{\sum_{k=1}^{d}\hat V_k - d}{\sqrt{2d}},$$
$$T_{OA} = \max_{1 \leq d \leq p_n}\frac{\sum_{k=1}^{d}\hat V_{(k)} - E\big(\sum_{k=1}^{d}\hat V_{(k)}\big)}{\sqrt{\operatorname{var}\big(\sum_{k=1}^{d}\hat V_{(k)}\big)}}.$$
We use the permutation method to calculate size and power. Table 9 presents the simulation results, where Time1 is the time (in seconds) for one run of $T_A$ and $T_{OA}$, and Time2 is the time for one run of $NT_{1A}$ and $NT_{2A}$. From Table 9, we can see that our sample-splitting methods have a slight power loss compared to the adaptive Neyman–Pearson test $T_A$ and the ordered adaptive test $T_{OA}$. However, they save significant time in real computing.
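For completeness, the permutation calibration can be sketched as follows (our own generic implementation: pool the curves, reshuffle the group labels, and recompute the statistic):

```python
import numpy as np

def permutation_pvalue(X, Y, grid, statistic, n_perm=1000, rng=None):
    """Permutation p-value for any two-sample statistic(X, Y, grid)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    obs = statistic(X, Y, grid)
    pooled, n1 = np.vstack([X, Y]), X.shape[0]
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # reshuffle group labels
        if statistic(pooled[idx[:n1]], pooled[idx[n1:]], grid) >= obs:
            count += 1
    return (count + 1) / (n_perm + 1)           # add-one permutation p-value
```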

5. Application

In this section, we apply our proposed hypothesis testing procedures to a real PM2.5 dataset for Beijing, Tianjin, and Shijiazhuang between January 2017 and December 2019. The dataset was downloaded from the website http://www.tianqihoubao.com/aqi/, accessed on 10 June 2021. The data readings were taken every day, so the total data size is 1085 for each city. Beijing is surrounded by Tianjin and Shijiazhuang; therefore, we want to learn more about the average PM2.5 differences among these three areas. Figure 1 and Figure 2 show the mean PM2.5 (μg/m³) in Beijing, Tianjin, and Shijiazhuang over different time periods. There are some missing days in some cycles. Note that Figure 1 shows negative values at the beginning, for a measure that is always greater than zero, because of the B-spline approximation.
It is obvious that PM2.5 changes over the individual periods. Here, we test whether there is a significant difference in PM2.5 among the three cities using the proposed method. First, the sample is divided into two data sets: the training sample is the data from 2017; the test sample is the data from 2018–2019. The principal components are selected adaptively based on the training sample. Then, the test statistic is constructed from the test sample and the principal components selected by the training sample, as sketched below. To test whether there is a significant difference in PM2.5 between each pair of cities, we carry out 1000 permutations within each group to calculate the rejection proportions; then, we obtain the p-value of the test. The results are shown in Table 10.
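The workflow can be sketched as follows (illustrative; the array names, the year labels, and the reuse of the hedged helpers `pooled_eigen`, `v_statistics`, `select_p1n`, and `permutation_pvalue` defined earlier are all assumptions of this example):

```python
import numpy as np

def two_city_pvalue(curves_a, curves_b, year, grid, rng=None):
    """curves_a, curves_b: (n_cycles, T) PM2.5 curves; year: (n_cycles,) labels."""
    train = year == 2017                        # training sample: 2017
    Xtr, Ytr = curves_a[train], curves_b[train]
    Xte, Yte = curves_a[~train], curves_b[~train]   # test sample: 2018-2019
    lam, _, _ = pooled_eigen(Xtr, Ytr, grid)
    d_n = int(np.max(np.nonzero(lam / np.cumsum(lam) > 0.001)[0]) + 1)
    p_n = select_p1n(v_statistics(Xtr, Ytr, grid, d_n), d_n)  # stage 1 on 2017

    def nt1a(X, Y, g):                          # stage-2 statistic with fixed p_n
        V = v_statistics(X, Y, g, d_n)
        return (V[:p_n].sum() - p_n) / np.sqrt(2 * p_n)

    return permutation_pvalue(Xte, Yte, grid, nt1a, n_perm=1000, rng=rng)
```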
From Table 10, we can see that all p-values are less than 0.05. The tests are statistically significant and suggest that the average PM2.5 levels in these three areas differ from each other at the 0.05 level of significance.

6. Conclusions and Discussions

In this paper, we consider the problem of testing the equality of mean functions in two random samples independently drawn from two functional random variables. We develop and study a novel testing procedure that has a more powerful ability to detect mean differences. In general, it includes two stages: first, splitting the sample into two parts and selecting principal components adaptively based on the first half-sample; then, constructing the test statistic based on the other half-sample. An extensive simulation study is presented, which shows that the proposed test works very well in comparison with several other methods in a variety of settings. Our future project is to detect differences in the covariance functions of independent sample curves. Several approaches have been proposed to address this problem, for instance, the factor-based test proposed by [4] and the regularized M-test introduced by [16].

Author Contributions

Data curation, S.F.; Funding acquisition, S.F.; Investigation, J.Z.; Methodology, J.Z.; Project administration, Y.H.; Validation, Y.H.; Writing—original draft, J.Z.; Writing—review & editing, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by China National Institute of Standardization through the “Special funds for basic R.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of the Main Results

Before proving the main results, we introduce the following useful lemmas. For notational convenience, we give the proofs for the full sample.
Lemma A1.
Under Assumptions 1–4, we have
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\hat\lambda_k} = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k} + o_p(1). \quad (A1)$$
Proof. 
Denote $\varepsilon_p(n) = \big\{\hat\Delta = |||\Gamma_{12} - \hat\Gamma_{12}||| \leq \frac{1}{2}\lambda_{p_n}\big\}$; note that, provided $\varepsilon_p(n)$ holds, we have
$$\left|\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\hat\lambda_k} - \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k}\right| = \left|\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n(\lambda_k - \hat\lambda_k)}{\hat\lambda_k\lambda_k}\langle\bar X - \bar Y, \hat\varphi_k\rangle^2\right| \leq \sup_k|\hat\lambda_k - \lambda_k| \cdot \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\bar X - \bar Y, \hat\varphi_k\rangle^2}{\hat\lambda_k\lambda_k} \leq \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\bar X - \bar Y, \hat\varphi_k\rangle^2}{\hat\lambda_k\lambda_k}\cdot 2\,|||\Gamma_{12} - \hat\Gamma_{12}|||.$$
It can be proven easily that $E|||\Gamma_1 - \hat\Gamma_1|||^2 = O(n_1^{-1})$ and $E|||\Gamma_2 - \hat\Gamma_2|||^2 = O(n_2^{-1})$; then, $|||\Gamma_{12} - \hat\Gamma_{12}||| = O_p(m^{-1/2})$.
According to the central limit theorem, we have
$$\frac{\sqrt{n}\,\langle\bar X - \bar Y, \hat\varphi_k\rangle}{\sqrt{\hat\lambda_k}} \stackrel{d}{\to} N(0,1). \quad (A2)$$
Then, $\frac{n\langle\bar X - \bar Y, \hat\varphi_k\rangle^2}{\hat\lambda_k} \stackrel{d}{\to} \chi_1^2$, which means that $\frac{n\langle\bar X - \bar Y, \hat\varphi_k\rangle^2}{\hat\lambda_k}$ is bounded in probability.
Notice that $\hat\lambda_k - \lambda_k = O_p(m^{-1/2})$; therefore,
$$\left|\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\hat\lambda_k} - \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k}\right| = O_p\left(\frac{1}{\sqrt{2p_n}}\,m^{-\frac12}\,p_n^{a+1}\right) = o_p(1).$$
Ref. [15] proved that $P(\varepsilon_p) \to 1$ as $m \to \infty$; thus, Lemma A1 holds. □
Lemma A2.
Under Assumptions 1–4, we have
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k} = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle^2}{\lambda_k} + o_p(1).$$
Proof. 
First,
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k} = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k - \varphi_k + \varphi_k\rangle^2}{\lambda_k} = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k - \varphi_k\rangle^2}{\lambda_k} + \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle^2}{\lambda_k} + \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{2n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k - \varphi_k\rangle\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle}{\lambda_k}.$$
Then,
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle^2}{\lambda_k} - \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle^2}{\lambda_k} = I_1 + I_2,$$
where
$$I_1 = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k - \varphi_k\rangle^2}{\lambda_k}, \quad I_2 = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{2n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k - \varphi_k\rangle\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle}{\lambda_k}.$$
It is obvious that
$$\sum_{k=1}^{p_n}\frac{n\langle\bar X - \bar Y, \hat\varphi_k - \varphi_k\rangle^2}{\lambda_k} \leq \sum_{k=1}^{p_n}\frac{n\|\bar X - \bar Y\|^2 \cdot \|\hat\varphi_k - \varphi_k\|^2}{\lambda_k}.$$
It can also be easily proven that $n\|\bar X - \bar Y\|^2 = O_p(1)$. According to the result of [17], we have $\|\hat\varphi_k - \varphi_k\| = O_p(k m^{-1/2})$ under the corresponding conditions. Then, we have
$$I_1 = O_p\left(\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}k^2 m^{-1} k^{a}\right) = O_p\left(\frac{1}{\sqrt{2p_n}}\,p_n^{a+3}\,m^{-1}\right) = o_p(1).$$
Similarly, we can prove $I_2 = o_p(1)$. □
Lemma A3.
Under Assumptions 1–4,
$$T^* = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\left(\frac{n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle^2}{\lambda_k} - 1\right)$$
converges in distribution to a centered Gaussian random variable $g$ with variance 1.
The proof of Lemma A3 uses techniques similar to those of [18], so we omit it here.
Proof of Theorem 2.
Combining Lemmas A1 and A2 with Lemma A3 proves Theorem 2. □
Proof of Theorem 3.
According to Theorem 1 and Lemmas A1 and A2, we have $\sqrt{\hat V_k} \stackrel{d}{\to} N(0,1)$ and $\hat V_k \stackrel{d}{\to} \chi^2(1)$. Then, the conclusion is obvious. □
Proof of Theorem 5.
We note that
$$n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle^2 = n\langle\hat\mu_1 - \hat\mu_2 - \mu_1 + \mu_2, \varphi_k\rangle^2 + 2n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle\langle\mu_1 - \mu_2, \varphi_k\rangle - n\langle\mu_1 - \mu_2, \varphi_k\rangle^2 = J_{k1} + 2J_{k2} - J_{k3},$$
where $J_{k1} = n\langle\hat\mu_1 - \hat\mu_2 - \mu_1 + \mu_2, \varphi_k\rangle^2$, $J_{k2} = n\langle\hat\mu_1 - \hat\mu_2, \varphi_k\rangle\langle\mu_1 - \mu_2, \varphi_k\rangle$, and $J_{k3} = n\langle\mu_1 - \mu_2, \varphi_k\rangle^2$. Then,
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\hat V_k = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}(J_{k1} + 2J_{k2} - J_{k3})/\hat\lambda_k.$$
Observe that
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}(J_{k2} - J_{k3})/\hat\lambda_k = \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n} n\langle\mu_1 - \mu_2, \varphi_k\rangle\langle\hat\mu_1 - \hat\mu_2 - \mu_1 + \mu_2, \varphi_k\rangle/\hat\lambda_k.$$
According to (A2), we have
$$\sqrt{n}\,\langle\hat\mu_1 - \hat\mu_2 - \mu_1 + \mu_2, \varphi_k\rangle/\sqrt{\hat\lambda_k} = O_p(1).$$
By Assumptions 1–5 and Lemma A1, we have that
$$\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\langle\mu_1 - \mu_2, \varphi_k\rangle/\hat\lambda_k \leq \frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n} n^{-\frac12}\,\|u(t)\|/\lambda_k = O_p\left(\frac{n^{-\frac12}\sum_{k=1}^{p_n}k^{a}}{\sqrt{2p_n}}\right) = O_p\left(n^{-\frac12}\,p_n^{a+\frac12}\right) = o_p(1). \quad (A3)$$
Under Assumptions 4 and 5, we have
$$\lim_{p_n\to\infty}\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}J_{k2}/\hat\lambda_k = \lim_{p_n\to\infty}\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n} n\langle\hat\mu_1 - \hat\mu_2, \hat\varphi_k\rangle\langle\mu_1 - \mu_2, \hat\varphi_k\rangle/\hat\lambda_k = \lim_{p_n\to\infty}\frac{1}{\sqrt{2p_n}}\sum_{k=1}^{p_n}\frac{n\langle\mu_1 - \mu_2, \varphi_k\rangle^2}{\lambda_k} = \lim_{p_n\to\infty}\sum_{k=1}^{p_n}\langle u(t), \varphi_k\rangle^2/\lambda_k = \|\Gamma_{12}^{-1}u(t)\|^2. \quad (A4)$$
From Theorem 2 and the above results, we have
$$P(NT_{1A} \geq z_{1-\alpha}) = P\left(\frac{\sum_{k=1}^{p_n}\hat V_k - p_n}{\sqrt{2p_n}} \geq z_{1-\alpha}\right) = P\left(\frac{\sum_{k=1}^{p_n}J_{k1}/\hat\lambda_k - p_n}{\sqrt{2p_n}} \geq z_{1-\alpha} - \frac{\sum_{k=1}^{p_n}(2J_{k2} - J_{k3})/\hat\lambda_k}{\sqrt{2p_n}}\right).$$
From (A2), we can obtain $J_{k1}/\hat\lambda_k \stackrel{d}{\to} \chi_1^2$; then,
$$\lim_{(m,p_n)\to\infty} P\left(\frac{\sum_{k=1}^{p_n}J_{k1}/\hat\lambda_k - p_n}{\sqrt{2p_n}} \leq x\right) = \Phi(x).$$
Combined with (A4), we have
$$\lim_{(m,p_n)\to\infty} P(NT_{1A} > z_{1-\alpha}) = \Phi\big({-z_{1-\alpha}} + \|\Gamma_{12}^{-1}u(t)\|^2\big). \qquad \square$$
Proof of Theorem 6.
By Lemma A2, we have $\big(\sum_{k=1}^{p_n}\hat V_k - \sum_{k=1}^{p_n}V_k\big) \stackrel{P}{\to} 0$ as $n \to \infty$. Define $(k_1^*, \ldots, k_{p_n}^*)$ as the decreasing ordering of $V_1, \ldots, V_{p_n}$ and $(k_1, \ldots, k_{p_n})$ as the decreasing ordering of $\hat V_1, \ldots, \hat V_{p_n}$. Ref. [19] proved that the random orders $\hat V_{(1)}, \ldots, \hat V_{(p_n)}$ in the selection procedure are asymptotically equivalent to the fixed orders $V_{(1)}, \ldots, V_{(p_n)}$. □

References

  1. Besse, P.; Ramsay, J.O. Principal components analysis of sampled functions. Psychometrika 1986, 51, 285–311.
  2. Rice, J.A.; Silverman, B.W. Estimating the mean and covariance structure nonparametrically when the data are curves. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1991, 53, 233–243.
  3. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: Berlin/Heidelberg, Germany, 2005.
  4. Ferraty, F.; Vieu, P.; Viguier-Pla, S. Factor-based comparison of groups of curves. Comput. Stat. Data Anal. 2007, 51, 4903–4910.
  5. Horváth, L.; Kokoszka, P. Inference for Functional Data with Applications; Springer: Berlin/Heidelberg, Germany, 2012.
  6. Fan, J.Q.; Lin, S.K. Test of significance when data are curves. J. Am. Stat. Assoc. 1998, 93, 1007–1021.
  7. Faraway, J.J. Regression analysis for a functional response. Technometrics 1997, 39, 254–261.
  8. Zhang, C.Q.; Peng, H.; Zhang, J.T. Two samples tests for functional data. Commun. Stat. Theory Methods 2010, 39, 559–578.
  9. Zhang, J.T. Statistical inferences for linear models with functional responses. Stat. Sin. 2011, 21, 1431–1451.
  10. Zhang, J.T.; Liang, X. One-way ANOVA for functional data via globalizing the pointwise F-test. Scand. J. Stat. 2014, 41, 51–71.
  11. Zhang, J.T.; Cheng, M.Y.; Wu, H.T.; Zhou, B. A new test for functional one-way ANOVA with applications to ischemic heart screening. Comput. Stat. Data Anal. 2019, 132, 3–17.
  12. Wasserman, L.; Roeder, K. High dimensional variable selection. Ann. Stat. 2009, 37, 2178–2201.
  13. Shen, Q.; Faraway, J.J. An F test for linear models with functional responses. Stat. Sin. 2004, 14, 1239–1257.
  14. Satterthwaite, F.E. Synthesis of variance. Psychometrika 1941, 6, 309–316.
  15. Hall, P.; Hosseini-Nasab, M. On properties of functional principal components analysis. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2006, 68, 109–126.
  16. Kraus, D.; Panaretos, V.M. Dispersion operators and resistant second-order functional data analysis. Biometrika 2012, 99, 813–832.
  17. Kong, D.; Xue, K.; Yao, F.; Zhang, H.H. Partially functional linear regression in high dimensions. Biometrika 2016, 103, 147–159.
  18. Shang, Y.L. A central limit theorem for randomly indexed m-dependent random variables. Filomat 2012, 26, 713–717.
  19. Su, Y.R.; Di, C.Z.; Li, H. Hypothesis testing in functional linear models. Biometrics 2017, 73, 551–561.
Figure 1. The mean PM2.5 (μg/m³) of Beijing, Tianjin, and Shijiazhuang from January 2019 to December 2019. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.
Figure 2. The mean PM2.5 (μg/m³) of Beijing, Tianjin, and Shijiazhuang from January 2017 to December 2019. The black line stands for Beijing. The red dashed line stands for Tianjin. The green dotted line stands for Shijiazhuang.
Table 1. Size and power of five methods in Setting 1 (mean difference lies in the early part: μ2[1:6] = μ1[1:6] + δ).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.067   0.065   0.068   0.045   0.055
           0.10   0.151   0.105   0.461   0.357   0.365
           0.20   0.245   0.258   0.786   0.483   0.471
           0.30   0.405   0.413   0.998   0.872   0.805
n1 = 220   0.00   0.045   0.053   0.066   0.051   0.065
           0.10   0.265   0.253   0.643   0.482   0.471
           0.20   0.385   0.393   0.881   0.564   0.585
           0.30   0.553   0.512   0.999   0.976   0.918
Table 2. Size and power of five methods in Setting 2 (mean difference lies in the middle part: μ2[11:16] = μ1[11:16] + δ).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.052   0.063   0.055   0.067   0.066
           0.10   0.352   0.355   0.384   0.876   0.923
           0.20   0.445   0.466   0.448   0.998   0.999
           0.30   0.581   0.592   0.544   0.999   0.999
n1 = 220   0.00   0.034   0.045   0.066   0.068   0.063
           0.10   0.442   0.461   0.583   0.975   0.923
           0.20   0.565   0.543   0.654   0.999   0.999
           0.30   0.675   0.691   0.685   0.999   0.999
Table 3. Size and power of five methods in Setting 3 (mean difference lies in the latter part: μ2[21:26] = μ1[21:26] + δ).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.035   0.052   0.054   0.055   0.042
           0.10   0.342   0.355   0.372   0.486   0.497
           0.20   0.565   0.552   0.593   0.744   0.795
           0.30   0.725   0.753   0.765   0.996   0.985
n1 = 220   0.00   0.065   0.061   0.045   0.052   0.055
           0.10   0.431   0.425   0.456   0.527   0.588
           0.20   0.628   0.635   0.681   0.823   0.885
           0.30   0.864   0.875   0.824   0.999   0.999
Table 4. Size and power of five methods in Setting 4 (mean difference lies in scattered parts: μ2[1:2] = μ1[1:2] + δ, μ2[11:12] = μ1[11:12] + δ, μ2[21:22] = μ1[21:22] + δ).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.045   0.053   0.067   0.068   0.062
           0.10   0.162   0.165   0.326   0.463   0.554
           0.20   0.205   0.272   0.466   0.963   0.955
           0.30   0.361   0.320   0.565   0.999   0.999
n1 = 220   0.00   0.045   0.038   0.067   0.045   0.068
           0.10   0.192   0.215   0.393   0.497   0.605
           0.20   0.282   0.314   0.516   0.999   0.999
           0.30   0.461   0.465   0.689   1.000   1.000
Table 5. Size and power of five methods in Setting 5 (difference averaged over the total vector: μ2 = μ1 + δ·1_40).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.045   0.052   0.068   0.065   0.053
           0.10   0.182   0.179   0.366   0.794   0.836
           0.20   0.372   0.364   0.854   0.999   0.999
           0.30   0.577   0.564   0.896   1.000   1.000
n1 = 220   0.00   0.036   0.045   0.062   0.063   0.065
           0.10   0.212   0.249   0.905   0.826   0.887
           0.20   0.413   0.424   0.935   0.999   0.999
           0.30   0.625   0.636   0.943   1.000   1.000
Table 6. Size and power of NT_2A in Setting 5 (difference averaged over the total vector: μ2 = μ1 + δ·1_40), n2 = n1 = 120.

                δ = 0.00   δ = 0.10   δ = 0.20   δ = 0.30
two-fold CV     0.058      0.468      0.786      0.947
five-fold CV    0.056      0.459      0.772      0.943
ten-fold CV     0.053      0.465      0.765      0.978
Table 7. Size and power of five methods under randomized signal in Setting 1 M(20,6).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.034   0.035   0.046   0.057   0.065
           0.10   0.262   0.275   0.366   0.594   0.606
           0.20   0.375   0.384   0.589   0.926   0.935
           0.30   0.534   0.548   0.825   0.999   0.999
n1 = 220   0.00   0.067   0.056   0.063   0.066   0.068
           0.10   0.309   0.315   0.417   0.628   0.733
           0.20   0.423   0.439   0.615   0.986   0.995
           0.30   0.674   0.695   0.923   0.999   0.999
Table 8. Size and power of five methods under randomized signal in Setting 2 M(40,6).

n2 = n1    δ      T_L     T_F     T_H     NT_1A   NT_2A
n1 = 120   0.00   0.063   0.038   0.067   0.065   0.063
           0.10   0.275   0.286   0.447   0.457   0.478
           0.20   0.355   0.372   0.743   0.785   0.799
           0.30   0.426   0.465   0.878   0.943   0.966
n1 = 220   0.00   0.042   0.053   0.064   0.066   0.065
           0.10   0.323   0.376   0.524   0.539   0.567
           0.20   0.465   0.486   0.874   0.923   0.995
           0.30   0.549   0.563   0.925   0.999   0.999
Table 9. Size and power of seven methods under randomized signal in Setting 2 M(40,6). Time1 and Time2 are in seconds.

n2 = n1   δ      T_L     T_F     T_H     T_A     T_OA    NT_1A   NT_2A   Time1     Time2
120       0.00   0.045   0.052   0.045   0.063   0.065   0.064   0.061   41.4575   1.1953
          0.10   0.221   0.236   0.252   0.315   0.403   0.224   0.367   41.6208   1.1964
          0.20   0.415   0.454   0.621   0.883   0.966   0.749   0.851   37.4070   1.0789
          0.30   0.626   0.684   0.827   0.994   0.975   0.891   0.896   36.4592   1.0681
180       0.00   0.051   0.062   0.054   0.057   0.054   0.063   0.067   34.6782   1.4470
          0.10   0.293   0.296   0.305   0.483   0.491   0.317   0.428   41.6208   1.1964
          0.20   0.495   0.521   0.715   0.936   0.995   0.836   0.922   37.4070   1.0789
          0.30   0.721   0.784   0.935   0.999   0.999   0.966   0.975   36.4592   1.0681
Table 10. p-values of the two tests.

           Beijing vs. Tianjin    Beijing vs. Shijiazhuang    Tianjin vs. Shijiazhuang
Test       NT_1A     NT_2A        NT_1A     NT_2A             NT_1A     NT_2A
p-value    0.035     0.034        0.025     0.032             0.039     0.045
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
