In this section, we report Monte Carlo simulation results comparing the finite-sample performance of the classical and proposed methods on the two-sample mean testing problem under different settings, including a fixed simple alternative and sparse signals with varying locations.
4.1. Fixed Simple Alternative
In this subsection, we first consider a simple setting where the alternatives are fixed. We generate curves from two populations using 40 Fourier basis functions as
$$X_{ij}(t) = \mu_i(t) + \sum_{k=1}^{40} \sqrt{\lambda_k}\, \xi_{ijk}\, \phi_k(t), \quad i = 1, 2, \; j = 1, \ldots, n_i.$$
Here, $\{\phi_k\}_{k=1}^{40}$ is the Fourier basis and the scores $\xi_{ijk}$ are independent standard normal random variables. In each case, we specify the eigenvalues $\lambda_k$ for $k = 1, \ldots, 40$ and generate the data on a discrete grid of 100 equispaced points in $[0, 1]$; the sample sizes $n_1$ and $n_2$ vary across scenarios. We choose the mean functions $\mu_1$ and $\mu_2$ depending on the property that we want to illustrate (see below).
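To make the data-generating process concrete, the following Python sketch simulates curves of this form. The eigenvalue sequence `lams`, the mean shift in `mu2`, and the sample sizes are hypothetical placeholders, since the exact values used in the study are not reproduced above.

```python
import numpy as np

def fourier_basis(K, grid):
    """First K Fourier basis functions on [0, 1], evaluated on `grid`."""
    B = np.empty((K, grid.size))
    B[0] = 1.0
    for k in range(1, K):
        m = (k + 1) // 2
        if k % 2 == 1:
            B[k] = np.sqrt(2.0) * np.sin(2.0 * np.pi * m * grid)
        else:
            B[k] = np.sqrt(2.0) * np.cos(2.0 * np.pi * m * grid)
    return B

def generate_sample(n, mu_coef, lams, basis, rng):
    """n curves X_j(t) = sum_k (mu_k + sqrt(lam_k) * xi_jk) * phi_k(t)."""
    K = len(lams)
    xi = rng.standard_normal((n, K))        # i.i.d. N(0, 1) scores
    coefs = mu_coef + np.sqrt(lams) * xi    # per-curve basis coefficients
    return coefs @ basis                    # (n, 100) matrix of discretized curves

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 100)           # 100 equispaced points in [0, 1]
K = 40
basis = fourier_basis(K, grid)
lams = 1.0 / np.arange(1, K + 1) ** 2       # placeholder decaying eigenvalues
mu1 = np.zeros(K)                           # group-1 mean coefficients
mu2 = np.zeros(K); mu2[0] = 0.3             # hypothetical mean shift
X1 = generate_sample(50, mu1, lams, basis, rng)
X2 = generate_sample(50, mu2, lams, basis, rng)
```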
We compare size and power across five tests: the $L^2$-norm based test of Ref. [8]; the F-test of Ref. [13]; the projection-based test of Ref. [5] with fixed truncation; and our two proposed methods. For the projection-based test of Ref. [5], we use the commonly adopted variance-explained threshold to determine the truncation level. The results are based on 1000 Monte Carlo replications, and in all scenarios we set the nominal size to the same level $\alpha$.
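For illustration, the sketch below (continuing from the data-generating code above) shows how an $L^2$-norm-type statistic can be calibrated by permutation and how empirical power is estimated over Monte Carlo replications. This is a generic stand-in rather than the exact implementations of Refs. [5,8,13], and the 5% level in the code is the conventional choice, not a value taken from the text.

```python
def l2_stat(X1, X2, grid):
    """L2-norm statistic (n1*n2/(n1+n2)) * integral of (mean1 - mean2)^2."""
    n1, n2 = X1.shape[0], X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    dt = grid[1] - grid[0]                   # equispaced grid spacing
    return n1 * n2 / (n1 + n2) * np.sum(diff ** 2) * dt

def perm_pvalue(X1, X2, grid, n_perm=500, rng=None):
    """Permutation p-value for the L2-norm statistic."""
    rng = rng or np.random.default_rng()
    pooled = np.vstack([X1, X2])
    n1 = X1.shape[0]
    obs = l2_stat(X1, X2, grid)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        count += l2_stat(pooled[idx[:n1]], pooled[idx[n1:]], grid) >= obs
    return (count + 1) / (n_perm + 1)

# Empirical power over Monte Carlo replications at a conventional 5% level:
alpha, n_rep = 0.05, 1000
rejections = 0
for _ in range(n_rep):
    X1 = generate_sample(50, mu1, lams, basis, rng)
    X2 = generate_sample(50, mu2, lams, basis, rng)
    rejections += perm_pvalue(X1, X2, grid, rng=rng) <= alpha
print("empirical power:", rejections / n_rep)
```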
To cover as many different scenarios as possible, we consider five settings for the mean difference: (1) the mean differences arise early in the coefficient sequence; (2) the mean differences arise in the middle of the sequence; (3) the mean differences arise in the latter part of the sequence; (4) the mean differences are scattered across the first, middle, and latter parts of the sequence; (5) tiny differences appear in all the principal components, in which case the coefficient differences are set as independent random variables.
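Since the exact coefficient values are not reproduced above, the following sketch merely illustrates one way the five patterns of mean-difference coefficients could be encoded; the positions and the magnitude `delta` are hypothetical.

```python
def mean_diff_coefs(setting, K=40, delta=0.3, rng=None):
    """Hypothetical coefficient differences mu2 - mu1 for the five settings."""
    d = np.zeros(K)
    if setting == 1:            # differences early in the sequence
        d[:3] = delta
    elif setting == 2:          # differences in the middle of the sequence
        d[18:21] = delta
    elif setting == 3:          # differences in the latter part
        d[-3:] = delta
    elif setting == 4:          # scattered: first, middle, and latter parts
        d[[0, 19, 39]] = delta
    elif setting == 5:          # tiny differences in all components
        rng = rng or np.random.default_rng()
        d = 0.05 * rng.standard_normal(K)   # small random differences everywhere
    return d
```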
From Table 1, Table 2, Table 3, Table 4 and Table 5, we can see that the methods perform quite differently across the five settings.
From Table 1, we see that when the mean difference lies in the early part of the sequence, the projection-based test of Ref. [5] is the most powerful. This is not surprising, because that method projects onto the space spanned by the first few eigenfunctions, which is exactly where the mean difference lies.
From Table 2, we observe that when the mean difference lies in the middle part of the sequence, our method has very high power compared with the $L^2$-norm based test, the F-test, and the projection-based test. In particular, the projection-based test of Ref. [5] suffers a dramatic loss of power.
From Table 3, we see that when the mean difference lies in the latter part of the sequence, our method still performs best. At the same time, the $L^2$-norm based test and the F-test have higher power than the projection-based test in this case. This illustrates that the $L^2$-norm based test and the F-test are sensitive to the degree of divergence from the null, whereas the projection-based test is more sensitive to the location of the mean difference. Furthermore, the $L^2$-norm based test and the F-test outperform our method only for large sample sizes and large discrepancies between the null and alternative hypotheses. This is understandable, because our method also depends on the projection of the mean difference onto the space spanned by the eigenfunctions, excluding the last few eigenfunctions.
Table 4 and Table 5 illustrate more general cases. Table 4 demonstrates that when there are tiny differences in all directions, our method is still the most powerful, while two of the competing tests are essentially powerless.
Table 5 demonstrates the performance of each method when differences appear in the early, middle, and latter parts of the sequence. The simulation results show that in this general case, our method has the most satisfactory performance.
We also conducted simulation studies under other similar scenarios. As they demonstrated similar patterns to those discussed above, we omit them here to save space.
It is worth noting that the first stage of the proposed method relies on a random data split, and the associated randomness is a potential limitation. To assess robustness to this splitting, we perform supporting simulation studies using multi-fold cross-validation (CV), namely two-fold, five-fold, and ten-fold CV. For convenience, we use the same data setting as in Table 5. The results are shown in Table 6, from which we see that in most cases the test is robust to the splitting.
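To clarify the splitting scheme examined here, below is a sketch of how a K-fold variant of a two-stage split-sample test could be organized, reusing `perm_pvalue` from above as a stand-in for the second-stage test. The fold-wise aggregation rule (twice the average p-value, a conservative but valid combination) is our assumption, not necessarily the authors' choice.

```python
def kfold_split_test(X1, X2, grid, n_folds=5, rng=None):
    """K-fold variant of a two-stage split-sample test (sketch).

    Each fold's held-out part plays the role of the first-stage data
    (e.g., to estimate a projection direction); the remaining data are
    used for testing.  Fold-wise p-values are combined by the
    conservative rule 2 * mean(p), which remains a valid p-value.
    """
    rng = rng or np.random.default_rng()
    idx1 = rng.permutation(X1.shape[0])
    idx2 = rng.permutation(X2.shape[0])
    folds1 = np.array_split(idx1, n_folds)
    folds2 = np.array_split(idx2, n_folds)
    pvals = []
    for f1, f2 in zip(folds1, folds2):
        rest1 = np.setdiff1d(idx1, f1)
        rest2 = np.setdiff1d(idx2, f2)
        # A first stage would use X1[f1], X2[f2] to learn a direction;
        # here we simply run the generic L2 test on the remaining data.
        pvals.append(perm_pvalue(X1[rest1], X2[rest2], grid, rng=rng))
    return min(1.0, 2.0 * float(np.mean(pvals)))
```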