Weighted Graph-Based Two-Sample Test via Empirical Likelihood

Zhao, Xiaofeng; Yuan, Mingao

doi:10.3390/math12172745


by Xiaofeng Zhao ¹ and Mingao Yuan ²,*

¹ School of Mathematics and Statistics, North China University of Water Resources and Electric Power, Zhengzhou 450045, China

² Department of Statistics, North Dakota State University, Fargo, ND 58103, USA

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(17), 2745; https://doi.org/10.3390/math12172745

Submission received: 2 August 2024 / Revised: 30 August 2024 / Accepted: 2 September 2024 / Published: 4 September 2024

(This article belongs to the Special Issue Network Biology and Machine Learning in Bioinformatics)


Abstract

In network data analysis, one of the important problems is determining if two collections of networks are drawn from the same distribution. This problem can be modeled in the framework of two-sample hypothesis testing. Several graph-based two-sample tests have been studied. However, the methods mainly focus on binary graphs, and many real-world networks are weighted. In this paper, we apply empirical likelihood to test the difference in two populations of weighted networks. We derive the limiting distribution of the test statistic under the null hypothesis. We use simulation experiments to evaluate the power of the proposed method. The results show that the proposed test has satisfactory performance. Then, we apply the proposed method to a biological dataset.

Keywords: two-sample test; weighted graph; empirical likelihood

MSC: 60K35; 05C80

1. Introduction

Comparing the distributions of two samples is a fundamental problem in statistics, known as the two-sample hypothesis test. Under the null hypothesis, the two distributions are equal; under the alternative hypothesis, they differ. The two-sample test has many practical applications. In medicine, it is widely used in clinical trials [1]. In biology, researchers apply two-sample tests to detect differences in gene expression [2]. In manufacturing, two-sample tests are used to select more efficient production processes and to examine product quality [3]. In the social sciences, the two-sample test is used to compare groups of people that differ in race, gender, ethnicity, etc. [4]. The two-sample testing problem has been studied extensively in the literature. Classical two-sample tests include the two-sample t test, Hotelling’s T-squared test, the Wilcoxon test, and the Kolmogorov–Smirnov test.

A network is a structure that represents a group of objects and the relationships between them. In mathematics, it is known as a graph. A network consists of nodes and edges, with nodes representing objects and edges representing the relationships between those objects. In the past decades, network data analysis has received intense attention.

In many applications, a number of graphs from several populations are available. A natural question is whether the graph samples are from the same distribution. This problem can be formulated as a graph-based two-sample test. Under the null hypothesis, the distributions are the same. Under the alternative hypothesis, the distributions are different. The graph-based two-sample test has wide applications. Ref. [5] used a graph-based two-sample test to study how different screening rules may influence the diversification benefits of portfolios in asset management and wealth management. In brain network data analysis, a graph-based two-sample test can be employed to distinguish between various brain disorders [6,7].

Graph-based two-sample testing has been widely studied. Ref. [8] considered testing whether two networks have the same distribution and proposed a consistent and minimax optimal two-sample test. Ref. [9] proposed two new tests for the small-sample setting. Ref. [10] provided sufficient conditions under which it is possible to test the difference between two populations of inhomogeneous random graphs. Refs. [11,12] studied the two-sample problem for random dot product graphs, with test statistics based on a kernel function of the spectral decomposition of the adjacency matrix. Refs. [13,14] proposed a test based on subgraph counts. Ref. [15] proposed a powerful test for a weighted graph-based two-sample test. However, the aforementioned graph-based two-sample tests have the following drawbacks: (1) Most of the tests are developed for binary (or unweighted) graphs. (2) The assumption that the edges are independent is a strong condition, which makes the tests conservative in some sense.

In practice, many real-world networks are weighted and the edges are correlated. For example, in brain networks, the edges are constructed based on correlations or other association measures between two brain regions. The association measures are the weights of the edges, and they may be correlated [16]. In this paper, we study a weighted graph-based two-sample test. We propose a novel graph-based two-sample test based on empirical likelihood [17]. Empirical likelihood is a nonparametric method that does not require specifying the form of the underlying distribution of the data, and it retains some of the advantages of likelihood-based inference. We derive the asymptotic distribution of the test statistic under the null hypothesis. We use simulations to study the power of the proposed test, and we apply the proposed method to real-world weighted networks.

The rest of the paper is organized as follows: Section 2 describes the model and the proposed new graph-based two-sample test. Section 3 evaluates the performance of the new test using simulations. Section 4 applies the test to real data. Section 5 concludes. The proofs are deferred to Appendix A.

Notations: Let $c_{1}, c_{2}$ be two positive constants. For two positive sequences $a_{n}$, $b_{n}$, denote $a_{n} \asymp b_{n}$ if $c_{1} \leq \frac{a_{n}}{b_{n}} \leq c_{2}$; $a_{n} = O(b_{n})$ if $\frac{a_{n}}{b_{n}} \leq c_{2}$; and $a_{n} = o(b_{n})$ if $\lim_{n \to \infty} \frac{a_{n}}{b_{n}} = 0$. For a sequence of random variables $X_{n}$, $X_{n} = O_{P}(a_{n})$ means that $\frac{X_{n}}{a_{n}}$ is bounded in probability, and $X_{n} = o_{P}(a_{n})$ means that $\frac{X_{n}}{a_{n}}$ converges to zero in probability.

2. Weighted Graph-Based Two-Sample Empirical Likelihood Test

A graph $G$ is defined as $G = (V, E)$, where $V$ is the set of all vertices (nodes) and $E$ is the set of all edges in the graph. The adjacency matrix $A$ of $G$ is defined as follows: $A_{ij} = 1$ if there is an edge between vertices $i$ and $j$, and $A_{ij} = 0$ otherwise. We assume the graph is undirected and does not have self-loops, that is, $A_{ij} = A_{ji}$ and $A_{ii} = 0$. Then, the adjacency matrix $A$ is symmetric and its diagonal elements are zero. If $A_{ij} \sim \mathrm{Bern}(p_{ij})$ with $0 \leq p_{ij} \leq 1$, we refer to it as an inhomogeneous random graph. Since $A_{ij} \in \{0, 1\}$, the random graph $A$ is said to be binary or unweighted.

To incorporate weights, we define the weighted random graph as follows.

Definition 1. Let $F(z; θ)$ be a probability distribution with parameter $θ$, and let $h : [0, 1]^{2} \to \mathbb{R}^{d}$ be a symmetric function, that is, $h(x, y) = h(y, x)$. Here, $d$ is a positive integer. Let $n$ be the number of nodes and $U = \{U_{1}, U_{2}, \dots, U_{n}\}$ be an independent sample from the uniform distribution $\mathrm{Unif}([0, 1])$. The random weighted graph $G(n, F, h)$ is defined as follows. Given $U$, the edges $A_{ij}$ ($1 \leq i < j \leq n$) are conditionally independent and follow the distribution $F(z; h(U_{i}, U_{j}))$, that is,

$$A_{ij} \sim F(z; h(U_{i}, U_{j})).$$

Denote $A \sim G(n, F, h)$.

When $F(z; θ)$ is the Bernoulli distribution, $G(n, F, h)$ is the well-known random graphon model for unweighted graphs. Further, if $h$ is a constant between zero and one, it is the Erdős–Rényi random graph. If $F(z; θ)$ is not the Bernoulli distribution, the graph $A$ is weighted. Since the distributions of $A_{ij}$ depend on $U$, the edges $A_{ij}$ ($1 \leq i < j \leq n$) are not independent unconditionally. Hence, the random graph $G(n, F, h)$ can model weights and correlations in weighted networks. Figure 1 provides visualizations of two weighted graphs. In Figure 1, $F(z; θ) = 0.7 δ_{\{0\}} + 0.3 F_{\exp}(z; θ)$, where $δ_{\{0\}}$ is the Dirac measure centered on 0, and $F_{\exp}(z; θ)$ is the exponential distribution $F_{\exp}(z; θ) = 1 - e^{-θ z}$. The left weighted graph is constructed based on $F(z; h_{1}(x, y))$ with $h_{1}(x, y) = e^{-xy}$, and the right weighted graph is constructed based on $F(z; h_{2}(x, y))$ with $h_{2}(x, y) = 2 + e^{-xy}$. The number on an edge represents the weight of the edge. Because the exponential distribution with rate $θ$ has mean $1/θ$ and $h_{1} < h_{2}$, the left graph has larger weights than the right graph on average.
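To make Definition 1 concrete, the following Python sketch draws one weighted graph under the Figure 1 setting. It is an illustration under stated assumptions rather than the authors' code: the function name `sample_weighted_graph`, the seed, and the choice of 10 nodes are hypothetical, and θ is treated as the rate of the exponential component, in line with $F_{\exp}(z; θ) = 1 - e^{-θ z}$.

```python
import math
import random


def sample_weighted_graph(n, h, rng, zero_prob=0.7):
    """Draw a symmetric weight matrix from G(n, F, h), where
    F(z; theta) = zero_prob * delta_0 + (1 - zero_prob) * Exp(rate=theta).
    (Hypothetical helper; names are illustrative.)"""
    u = [rng.random() for _ in range(n)]      # latent U_i ~ Unif[0, 1]
    a = [[0.0] * n for _ in range(n)]         # zero diagonal: no self-loops
    for i in range(n):
        for j in range(i + 1, n):
            theta = h(u[i], u[j])             # edge-specific parameter
            if rng.random() >= zero_prob:     # nonzero weight w.p. 1 - zero_prob
                a[i][j] = a[j][i] = rng.expovariate(theta)
    return a


def h1(x, y):
    return math.exp(-x * y)


def h2(x, y):
    return 2.0 + math.exp(-x * y)


rng = random.Random(0)                        # fixed seed, for reproducibility only
left = sample_weighted_graph(10, h1, rng)
right = sample_weighted_graph(10, h2, rng)
```

Given the latent positions, the edges are drawn independently; the shared `u` values are what induce dependence between edges once the positions are integrated out.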

Let $m_{1}, m_{2}$ be two positive integers, and let $n_{1} = (n_{11}, n_{12}, \dots, n_{1 m_{1}})$ and $n_{2} = (n_{21}, n_{22}, \dots, n_{2 m_{2}})$ be two sequences of positive integers. Given independent networks $A_{1}, \dots, A_{m_{1}}$ with $A_{l} \sim G(n_{1l}, F, h_{1})$ ($1 \leq l \leq m_{1}$) and independent networks $B_{1}, \dots, B_{m_{2}}$ with $B_{l} \sim G(n_{2l}, F, h_{2})$ ($1 \leq l \leq m_{2}$), we are interested in testing whether the two samples have the same distribution, that is, we test the hypotheses

$$H_{0} : h_{1} = h_{2}, \quad H_{1} : h_{1} \neq h_{2}. \tag{1}$$

Under the null hypothesis, $h_{1} = h_{2}$, and the two graph samples are drawn from the same distribution. Under the alternative hypothesis, $h_{1} \neq h_{2}$, and the distributions of the two samples are different.

A similar hypothesis testing problem to (1) has been studied in several papers. In [11,13,14], the distribution F is assumed to be the Bernoulli distribution. Ref. [15] considers (1) under the assumption that the edges are independent and $d = 1$. In this work, we investigate (1) in a more general setting, and we propose a new test based on empirical likelihood.

Empirical likelihood (EL) was introduced by Owen [17,18] to construct a confidence region for the mean. It is a nonparametric method that does not require a prespecified distribution for the data. As a counterpart of the parametric likelihood method, it inherits the advantageous properties of the likelihood-based method. The empirical likelihood confidence region respects the shape of the data and usually outperforms the method based on asymptotic normality. Empirical likelihood has also been widely used in hypothesis testing.

Now, we present the empirical likelihood test for (1). For positive integers $k, l$ with $k \geq 3$ and $l \geq 1$, define the cycle statistics

$$C_{t}^{(l)}(A) = \frac{1}{\binom{n_{1t}}{k}} \sum_{1 \leq i_{1} < i_{2} < \dots < i_{k} \leq n_{1t}} A_{t, i_{1} i_{2}}^{l} A_{t, i_{2} i_{3}}^{l} \cdots A_{t, i_{k} i_{1}}^{l},$$

$$C_{s}^{(l)}(B) = \frac{1}{\binom{n_{2s}}{k}} \sum_{1 \leq i_{1} < i_{2} < \dots < i_{k} \leq n_{2s}} B_{s, i_{1} i_{2}}^{l} B_{s, i_{2} i_{3}}^{l} \cdots B_{s, i_{k} i_{1}}^{l},$$

where $A_{t, ij}$ and $B_{s, ij}$ denote the $(i, j)$ entries of $A_{t}$ and $B_{s}$, respectively.

In the binary graph case, $C_{t}^{(l)}(A)$ represents the density of $k$-cycles in graph $A_{t}$, and $C_{t}^{(l_{1})}(A) = C_{t}^{(l_{2})}(A)$ for all $l_{1}, l_{2}$, because $A_{t, ij}^{l} = A_{t, ij}$ when $A_{t, ij} \in \{0, 1\}$. For weighted graphs, in general $C_{t}^{(l_{1})}(A) \neq C_{t}^{(l_{2})}(A)$ if $l_{1} \neq l_{2}$. Let

$$X_{t} = (C_{t}^{(1)}(A), C_{t}^{(2)}(A), \dots, C_{t}^{(d)}(A)), \quad 1 \leq t \leq m_{1},$$

$$Y_{s} = (C_{s}^{(1)}(B), C_{s}^{(2)}(B), \dots, C_{s}^{(d)}(B)), \quad 1 \leq s \leq m_{2}.$$
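The cycle statistics and the feature vectors $X_{t}$, $Y_{s}$ can be computed by direct enumeration. The sketch below is a minimal, unoptimized illustration that follows the displayed sum literally; the function names `cycle_statistic` and `feature_vector` and the default $k = 3$ are assumptions made here, not part of the paper.

```python
from itertools import combinations
from math import comb


def cycle_statistic(a, k=3, l=1):
    """C^{(l)} for one graph: average over index sets i_1 < ... < i_k of
    a[i_1][i_2]^l * a[i_2][i_3]^l * ... * a[i_k][i_1]^l."""
    n = len(a)
    total = 0.0
    for idx in combinations(range(n), k):      # enumerates i_1 < i_2 < ... < i_k
        prod = 1.0
        for s in range(k):
            i, j = idx[s], idx[(s + 1) % k]    # last factor closes the cycle
            prod *= a[i][j] ** l
        total += prod
    return total / comb(n, k)


def feature_vector(a, d, k=3):
    """d-dimensional vector (C^{(1)}, ..., C^{(d)}) for one graph."""
    return [cycle_statistic(a, k=k, l=l) for l in range(1, d + 1)]
```

For $k = 3$ every index set corresponds to exactly one triangle, so on a binary graph the statistic is the triangle density. The enumeration costs on the order of $\binom{n}{k}$ products, which is fine for the small graphs considered here but would need a matrix-based shortcut for large $n$.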

That is, $X_{t}$ and $Y_{s}$ are $d$-dimensional vectors with the components $C_{t}^{(l)}(A)$ and $C_{s}^{(l)}(B)$, respectively. Let $(p_{1}, p_{2}, \dots, p_{m_{1}})$ and $(q_{1}, q_{2}, \dots, q_{m_{2}})$ be probability vectors, that is,

$$\sum_{i = 1}^{m_{1}} p_{i} = 1, \quad \sum_{j = 1}^{m_{2}} q_{j} = 1, \quad p_{i} \geq 0, \quad q_{j} \geq 0.$$

Define the empirical likelihood test statistic as

$$R_{m_{1}, m_{2}} = \max \left\{ \prod_{i = 1}^{m_{1}} m_{1} p_{i} \prod_{j = 1}^{m_{2}} m_{2} q_{j} \;\middle|\; \sum_{i = 1}^{m_{1}} p_{i} X_{i} - \sum_{j = 1}^{m_{2}} q_{j} Y_{j} = 0_{d} \right\}.$$

According to the Lagrange multiplier method, the maximizer is given by

$$\hat{p}_{i} = \frac{1}{m_{1} (1 + λ_{1}^{\top} (X_{i} - μ))}, \quad \hat{q}_{j} = \frac{1}{m_{2} (1 + λ_{2}^{\top} (Y_{j} - μ))},$$

where $μ, λ_{1}, λ_{2} \in \mathbb{R}^{d}$ are the solutions to the following nonlinear equations:

$$\frac{1}{m_{1}} \sum_{j = 1}^{m_{1}} \frac{X_{j} - μ}{1 + λ_{1}^{\top} (X_{j} - μ)} = 0, \tag{2}$$

$$\frac{1}{m_{2}} \sum_{j = 1}^{m_{2}} \frac{Y_{j} - μ}{1 + λ_{2}^{\top} (Y_{j} - μ)} = 0, \tag{3}$$

$$m_{1} λ_{1} + m_{2} λ_{2} = 0. \tag{4}$$
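For $d = 1$, Equations (2)–(4) can be solved numerically by profiling over the common mean $μ$: for each candidate $μ$, (2) and (3) are one-dimensional root-finding problems in $λ_{1}$ and $λ_{2}$, and (4) is the first-order condition for minimizing $2 \sum_{i} \log[1 + λ_{1}(X_{i} - μ)] + 2 \sum_{j} \log[1 + λ_{2}(Y_{j} - μ)]$ over $μ$, which is the expression for $-2 \log R_{m_{1}, m_{2}}$ used in Appendix A. The sketch below is one possible implementation under these assumptions, using bisection and a ternary search; the function names are hypothetical and this is not the authors' code.

```python
import math


def solve_lambda(z, iters=200):
    """Bisection for sum z_i / (1 + lam * z_i) = 0; needs min(z) < 0 < max(z)."""
    lo, hi = -1.0 / max(z), -1.0 / min(z)   # inside (lo, hi) all 1 + lam * z_i > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g = sum(zi / (1.0 + mid * zi) for zi in z)
        if g > 0.0:                          # g is decreasing in lam: root is to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def neg2_log_el(x, mu):
    """One-sample term 2 * sum log(1 + lam * (x_i - mu)) at the optimal lam."""
    z = [xi - mu for xi in x]
    lam = solve_lambda(z)
    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


def two_sample_el(x, y, iters=100):
    """Return (-2 log R, p-value) for scalar samples x and y (d = 1)."""
    lo, hi = max(min(x), min(y)), min(max(x), max(y))
    if lo >= hi:                             # no common mean is feasible: R = 0
        return math.inf, 0.0
    pad = 1e-6 * (hi - lo)                   # stay strictly inside both sample ranges
    lo, hi = lo + pad, hi - pad

    def f(mu):
        return neg2_log_el(x, mu) + neg2_log_el(y, mu)

    for _ in range(iters):                   # ternary search; f is convex in mu
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    stat = max(f(0.5 * (lo + hi)), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))   # P(chi^2_1 > stat)
    return stat, p_value
```

Calling `two_sample_el(xs, ys)` on two lists of scalar cycle statistics returns the statistic together with its chi-square p-value for one degree of freedom. For $d \geq 2$ the same idea applies with vector-valued $λ_{1}, λ_{2}$, but a multivariate solver would be needed.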

The test statistic $R_{m_{1}, m_{2}}$ is a generalization of the classical two-sample empirical likelihood in [19,20]. The difference is that here $X_{1}, X_{2}, \dots, X_{m_{1}}$ need not be identically distributed, and neither do $Y_{1}, Y_{2}, \dots, Y_{m_{2}}$, whereas identical distribution within each sample is assumed in [19,20]. The limiting distribution of $R_{m_{1}, m_{2}}$ under the null hypothesis is given by the following theorem.

Theorem 1. Suppose $n_{1}, n_{2}$ are fixed and the $4d$-th moment of the distribution $F$ is finite. Assume that $\frac{m_{1}}{m_{1} + m_{2}} \to β_{0} \in (0, 1)$ as $m_{1}, m_{2} \to \infty$. Then, under the null hypothesis $H_{0}$, $-2 \log R_{m_{1}, m_{2}}$ converges in distribution to $χ_{d}^{2}$ as $m_{1}, m_{2} \to \infty$. Here, $χ_{d}^{2}$ is the chi-square distribution with $d$ degrees of freedom.

According to Theorem 1, we define the empirical likelihood test for (1) as follows:

$$\text{Reject } H_{0} \text{ at significance level } α \text{ if } -2 \log R_{m_{1}, m_{2}} > χ_{d, 1 - α}^{2},$$

where $χ_{d, 1 - α}^{2}$ is the $100(1 - α)\%$ quantile of the chi-square distribution with $d$ degrees of freedom. The p-value of the empirical likelihood test is defined as

$$p\text{-value} = P(χ_{d}^{2} > -2 \log R_{m_{1}, m_{2}}).$$
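The p-value above only requires the upper tail of a chi-square distribution with an integer number of degrees of freedom. As a small self-contained sketch (the names `chi2_sf` and `reject_h0` are hypothetical), the tail can be evaluated with the standard recurrence $Q(a + 1, x) = Q(a, x) + x^{a} e^{-x} / Γ(a + 1)$ for the regularized upper incomplete gamma function, starting from $Q(1/2, x) = \operatorname{erfc}(\sqrt{x})$ or $Q(1, x) = e^{-x}$.

```python
import math


def chi2_sf(t, d):
    """P(chi^2_d > t) for a positive integer d and t >= 0."""
    x = t / 2.0
    if d % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))   # d = 1 base case
    else:
        a, q = 1.0, math.exp(-x)              # d = 2 base case
    while a < d / 2.0:                        # raise the shape parameter to d / 2
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def reject_h0(stat, d, alpha=0.05):
    """Decision rule: reject when the p-value of -2 log R falls below alpha."""
    return chi2_sf(stat, d) < alpha
```

Rejecting when the p-value is below $α$ is equivalent to comparing $-2 \log R_{m_{1}, m_{2}}$ with the quantile $χ_{d, 1 - α}^{2}$.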

The assumption that the $4d$-th moment of the distribution $F$ is finite is not very restrictive. For unweighted networks, F is the Bernoulli distribution, and all moments of F exist. In modeling weighted networks, the weights are usually assumed to follow a distribution in the exponential family. Many distributions in the exponential family satisfy this assumption, for instance, the normal distribution, the exponential distribution, and the Poisson distribution. In many applications, the weights are correlations, which lie between −1 and 1; such weights are bounded, so all of their moments exist.

3. Simulations

In this section, we run simulations to evaluate our proposed empirical likelihood test. For binary (or unweighted) graphs, refs. [13,14] proposed a two-sample t-test based on $X_{t}$ ($1 \leq t \leq m_{1}$) and $Y_{s}$ ($1 \leq s \leq m_{2}$). We compare our empirical likelihood test with the t-test in the unweighted network case. For weighted networks, it is not clear whether the t-test still works. Nevertheless, as a comparison, we run a simulation to evaluate its performance. Note that the t-test statistic is a function of $A_{t, ij}$ ($1 \leq i < j \leq n_{1t}$, $1 \leq t \leq m_{1}$) and $B_{s, ij}$ ($1 \leq i < j \leq n_{2s}$, $1 \leq s \leq m_{2}$). For weighted networks, $A_{t, ij} \in \mathbb{R}$ and $B_{s, ij} \in \mathbb{R}$. We plug them into the t-test statistic and adopt the same rejection rule as in the unweighted network case. Then, we evaluate its performance and compare our empirical likelihood test with it.

In the simulations, we set the nominal significance level $α$ to 0.05 and report the empirical size and power over 2000 trials. The empirical size is calculated as follows. Generate an independent sample $A_{1}, \dots, A_{m_{1}}, B_{1}, \dots, B_{m_{2}} \sim G(n, F, h)$, perform the empirical likelihood test, and record whether $H_{0}$ is rejected. Repeat the experiment 2000 times; the rejection rate is the empirical size. The empirical power is calculated as follows. Generate an independent sample $A_{1}, \dots, A_{m_{1}} \sim G(n, F, h_{1})$ and an independent sample $B_{1}, \dots, B_{m_{2}} \sim G(n, F, h_{2})$, perform the empirical likelihood test, and record whether $H_{0}$ is rejected. Repeat the experiment 2000 times; the rejection rate is the empirical power. The empirical size and power of the t-test are calculated similarly.
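The Monte Carlo protocol just described can be sketched generically. In the illustration below, `rejection_rate` repeats a draw-and-test experiment and returns the fraction of rejections. To keep the example short and self-contained, the graph sampler and the empirical likelihood test are replaced by stand-ins that are assumptions of this sketch: scalar normal samples and a large-sample two-sample z-test. All names are hypothetical.

```python
import math
import random


def rejection_rate(draw_pair, p_value, alpha=0.05, trials=2000, seed=0):
    """Fraction of trials in which H0 is rejected at level alpha."""
    rng = random.Random(seed)
    rejections = sum(p_value(*draw_pair(rng)) < alpha for _ in range(trials))
    return rejections / trials


def z_test_p_value(x, y):
    """Stand-in test (NOT the empirical likelihood test): two-sample z-test."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return math.erfc(abs(z) / math.sqrt(2.0))


def make_sampler(m, shift):
    """Stand-in sampler: m scalar summaries per group, second group shifted."""
    def draw(rng):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        y = [rng.gauss(shift, 1.0) for _ in range(m)]
        return x, y
    return draw


size = rejection_rate(make_sampler(30, 0.0), z_test_p_value)    # shift 0: null holds
power = rejection_rate(make_sampler(30, 1.0), z_test_p_value)   # shift 1: alternative
```

To reproduce the setting of this section, `draw_pair` would instead generate the two graph samples and reduce them to cycle statistics, and `p_value` would be the empirical likelihood p-value.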

We take $m_{1} = m_{2} = m \in \{20, 30, 40\}$ and $n_{11} = n_{12} = \dots = n_{1 m_{1}} = n_{21} = n_{22} = \dots = n_{2 m_{2}} = n \in \{10, 20, 30\}$. We consider the following five situations.

In the first simulation, $d = 1$, $h_{1}(x, y) = 1 - \frac{1}{5 + xy}$, and $h_{2}(x, y) = h_{1}(x, y) + c$ with $c = 0, 0.01, 0.02$. $F(z; θ)$ is the Bernoulli distribution. The result is shown in Table 1.

In the second simulation, $d = 1$, $h_{1}(x, y) = \frac{e^{xy}}{10}$, and $h_{2}(x, y) = h_{1}(x, y) + c$ with $c = 0, 0.02, 0.04$. $F(z; θ)$ is the Poisson distribution. The result is shown in Table 2.

In the third simulation, we consider $d = 1$, $h_{1}(x, y) = \frac{e^{xy + x^{2} y^{2}}}{2}$, and $h_{2}(x, y) = h_{1}(x, y) + c$ with $c = 0, 0.04, 0.06$, etc. $F(z; θ)$ is the exponential distribution. The result is shown in Table 3.

In the fourth simulation, we study $d = 2$, $h_{1}(x, y) = (xy, e^{xy} + x^{2} y^{2})$, and $h_{2}(x, y) = h_{1}(x, y) + c$. Here, $c = (c_{1}, c_{2})$ takes the values

$$c = (0, 0), \quad c = (0.1, 0.1), \quad c = (0.2, 0.2).$$

$F(z; h_{1}(x, y))$ and $F(z; h_{2}(x, y))$ are normal distributions whose two-dimensional parameter (mean, variance) equals $h_{1}(x, y)$ and $h_{2}(x, y)$, respectively. The result is given in Table 4.

In the last simulation, we consider $d = 2$ and the Gamma distribution with parameters $α, β$. Let $h_{1}(x, y) = (f_{1}(x, y), f_{2}(x, y)) = (x^{3} y^{3}, xy)$ and $h_{2}(x, y) = h_{1}(x, y) + c$. Here, $c = (c_{1}, c_{2})$ takes the same values as in the normal distribution case. $F(z; h_{1}(x, y))$ is the Gamma distribution with parameters $α, β$, which depend on $f_{1}, f_{2}$ as follows:

$$f_{1} = \frac{α}{β}, \quad f_{2} = \frac{α}{β^{2}} + f_{1}^{2} = \frac{α + α^{2}}{β^{2}}.$$

That is, $f_{1}$ and $f_{2}$ are the first and second moments of the edge weight. The result is given in Table 5.
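The two moment equations can be inverted in closed form: the variance is $f_{2} - f_{1}^{2} = α / β^{2}$, so $β = f_{1} / (f_{2} - f_{1}^{2})$ and $α = f_{1} β$. The short sketch below carries out this inversion; the helper name `gamma_params` is hypothetical, and it assumes $f_{1} > 0$ and $f_{2} > f_{1}^{2}$.

```python
import random


def gamma_params(f1, f2):
    """Shape alpha and rate beta of a Gamma law with mean f1 and second moment f2."""
    var = f2 - f1 ** 2                 # variance = alpha / beta^2
    if f1 <= 0.0 or var <= 0.0:
        raise ValueError("need f1 > 0 and f2 > f1^2")
    beta = f1 / var                    # rate
    alpha = f1 * beta                  # shape
    return alpha, beta


alpha, beta = gamma_params(0.5, 0.375)
# random.gammavariate takes (shape, scale), so the scale is 1 / rate.
weight = random.Random(0).gammavariate(alpha, 1.0 / beta)
```

With mean 0.5 and second moment 0.375, the variance is 0.125, which gives $β = 4$ and $α = 2$.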

All the empirical sizes (the third columns of the tables) are close to the nominal level 0.05, which is consistent with Theorem 1. As m increases, the power of the tests approaches 1. Moreover, the power of our method is higher and approaches 1 faster than that of the two-sample t-test.

4. Real Data Analysis

In this section, we apply our proposed method to a real dataset from a proof-of-concept study [21]. The dataset comprises 72 samples, with 47 patients diagnosed with acute lymphoblastic leukemia (ALL) and 25 with acute myeloid leukemia (AML). The dataset contains 7129 features representing gene expression levels measured via DNA microarray analysis. We randomly choose 10, 20, and 50 features and also include all the 7129 features to calculate the correlation matrices. We take the absolute value of the correlation matrix elements and set the diagonal elements to 0, thereby mimicking the adjacency matrix of a weighted graph. Here, the number of nodes $n$ is the number of selected features. We apply the proposed empirical likelihood test and the t-test to the networks and report the p-values. The results are shown in Table 6. Note that the two groups of networks are constructed based on two different types of diseases, so the tests should reject the null hypothesis that they come from the same graph distribution. When $n \geq 20$, the p-values of the empirical likelihood test are smaller than 0.05. Hence, we reject the null hypothesis. For $n = 20, 30$, the p-values of the t-test are larger than 0.05, so we fail to reject the null hypothesis. This suggests that the proposed empirical likelihood test is more powerful than the t-test on this dataset.
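The construction of a weighted adjacency matrix from expression data can be sketched as follows. This is a minimal illustration of the step described above (absolute Pearson correlation with a zeroed diagonal); the function names and the toy data are assumptions made here, and the sketch does not show how the samples are grouped into the two collections of networks.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_network(data):
    """data: list of samples (rows), each a list of feature values (columns).
    Returns the |correlation| matrix between features with a zero diagonal."""
    cols = list(zip(*data))                  # one tuple per feature
    n = len(cols)
    a = [[0.0] * n for _ in range(n)]        # diagonal stays 0: no self-loops
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(cols[i], cols[j]))
    return a


toy = [[1.0, 2.0, 4.0], [2.0, 4.0, 3.0], [3.0, 6.0, 2.0], [4.0, 8.0, 1.0]]
adjacency = correlation_network(toy)
```

In the toy data the second feature is twice the first and the third decreases linearly, so every off-diagonal weight equals 1 after taking absolute values.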

5. Conclusions and Discussion

In this paper, we study the weighted graph two-sample test problem and propose an empirical likelihood test. A simulation is employed to evaluate the performance of the proposed empirical likelihood test. As a comparison, we also run a simulation for the graph-based two-sample t-test, which is developed in an unweighted network case. The simulation study shows that the t-test still works for weighted networks, and the proposed empirical likelihood test has higher power than the t-test.

One limitation of the current work is that the proposed empirical likelihood test only works for large sample sizes $m_{1}, m_{2}$. In Theorem 1, we require $m_{1}, m_{2} \to \infty$. In practice, the sample sizes $m_{1}, m_{2}$ may be very small, for instance, $m_{1} = m_{2} = 1$ [22]. In this case, the proposed empirical likelihood test does not work. Developing a powerful test that is valid for small $m_{1}$ and $m_{2}$ is an important future topic. Another important future topic is to study the weighted graph-based two-sample test problem from a minimax perspective. In the unweighted case, this problem has been studied in [10].

Author Contributions

Conceptualization, X.Z. and M.Y.; methodology, M.Y.; formal analysis, X.Z.; data curation, X.Z.; writing—review and editing, X.Z.; supervision, M.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Proof of Empirical Likelihood.

Firstly, we consider the case $d = 1$. In this case,

$$X_{i} = \frac{1}{\binom{n_{i}}{k}} \sum_{1 \leq i_{1} < i_{2} < \dots < i_{k} \leq n_{i}} A_{i, i_{1} i_{2}} A_{i, i_{2} i_{3}} \cdots A_{i, i_{k} i_{1}},$$

$$Y_{i} = \frac{1}{\binom{l_{i}}{k}} \sum_{1 \leq i_{1} < i_{2} < \dots < i_{k} \leq l_{i}} B_{i, i_{1} i_{2}} B_{i, i_{2} i_{3}} \cdots B_{i, i_{k} i_{1}},$$

where $n_{i}$ and $l_{i}$ denote the numbers of nodes of $A_{i}$ and $B_{i}$, respectively. Let

$$\bar{X}_{m_{1}} = \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} X_{i}, \quad \bar{Y}_{m_{2}} = \frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} Y_{i},$$

$$σ_{i}^{2} = \mathrm{Var}(X_{i}), \quad τ_{i}^{2} = \mathrm{Var}(Y_{i}), \quad σ^{2} = \frac{\sum_{i = 1}^{m_{1}} σ_{i}^{2}}{m_{1}}, \quad τ^{2} = \frac{\sum_{i = 1}^{m_{2}} τ_{i}^{2}}{m_{2}},$$

$$S_{n}^{2} = \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - \bar{X}_{m_{1}})^{2}, \quad τ_{n}^{2} = \frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} (Y_{i} - \bar{Y}_{m_{2}})^{2},$$

$$R_{n}^{2} = \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ})^{2}, \quad T_{n}^{2} = \frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} (Y_{i} - \hat{μ})^{2},$$

$$\hat{μ} = \hat{η} \bar{X}_{m_{1}} + (1 - \hat{η}) \bar{Y}_{m_{2}}, \quad \hat{η} = \frac{\frac{m_{1}}{S_{n}^{2}}}{\frac{m_{1}}{S_{n}^{2}} + \frac{m_{2}}{τ_{n}^{2}}}.$$

Let $ρ = \frac{m_{1}}{m_{1} + m_{2}}$. Then, equivalently, $\hat{η} = [ρ τ_{n}^{2} + (1 - ρ) S_{n}^{2}]^{-1} ρ τ_{n}^{2}$. Moreover,

$$\bar{X}_{m_{1}} - \hat{η} \bar{X}_{m_{1}} - (1 - \hat{η}) \bar{Y}_{m_{2}} = (1 - \hat{η}) (\bar{X}_{m_{1}} - \bar{Y}_{m_{2}}),$$

$$\bar{Y}_{m_{2}} - \hat{η} \bar{X}_{m_{1}} - (1 - \hat{η}) \bar{Y}_{m_{2}} = -\hat{η} (\bar{X}_{m_{1}} - \bar{Y}_{m_{2}}).$$

Let

$$μ_{0} = E[A_{1, 12} A_{1, 23} A_{1, 34} \cdots A_{1, k1}] = \int f(u_{1}, u_{2}) f(u_{2}, u_{3}) \cdots f(u_{k}, u_{1}) \, du_{1} \, du_{2} \cdots du_{k},$$

where $f(x, y)$ denotes the conditional mean of an edge weight given that the latent positions of its endpoints are $x$ and $y$.

□

Lemma A1. Suppose $\frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} σ_{i}^{2} = σ^{2} + o(1)$ and $\frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} τ_{i}^{2} = τ^{2} + o(1)$ for two positive constants, σ and τ. Then, $S_{n}^{2} = σ^{2} + o_{p}(1)$ and $τ_{n}^{2} = τ^{2} + o_{p}(1)$. Here, the graph sizes are assumed to be bounded: $0 < c_{1} \leq n_{i} \leq c_{2} < +\infty$ and $0 < c_{1} \leq l_{i} \leq c_{2} < +\infty$.

Proof. Note that

$$\begin{aligned} S_{n}^{2} &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - μ_{0})^{2} + (μ_{0} - \bar{X}_{m_{1}})^{2} + 2 (\bar{X}_{m_{1}} - μ_{0}) (μ_{0} - \bar{X}_{m_{1}}) \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - μ_{0})^{2} - (\bar{X}_{m_{1}} - μ_{0})^{2} \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} σ_{i}^{2} + O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right) \\ &= σ^{2} + O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right). \end{aligned}$$

Hence, $S_{n}^{2} = σ^{2} + o_{p}(1)$. Similarly, one can obtain $τ_{n}^{2} = τ^{2} + o_{p}(1)$. □

Lemma A2. Suppose $\frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} σ_{i}^{2} = σ^{2} + o(1)$ and $\frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} τ_{i}^{2} = τ^{2} + o(1)$ for two positive constants, σ and τ, and the 4-th moment of distribution F is finite. Then, as $m_{1}, m_{2} \to \infty$, $\sqrt{m_{1}} (\bar{X}_{m_{1}} - μ_{0}) \overset{d}{\to} N(0, σ^{2})$ and $\sqrt{m_{2}} (\bar{Y}_{m_{2}} - μ_{0}) \overset{d}{\to} N(0, τ^{2})$.

Proof. We shall use the Lindeberg central limit theorem to prove the results. Note that the $X_{i}$ are independent and

$$\sqrt{m_{1}} (\bar{X}_{m_{1}} - μ_{0}) = \sum_{i = 1}^{m_{1}} \frac{X_{i} - μ_{0}}{\sqrt{m_{1}}}.$$

Then,

$$\sum_{i = 1}^{m_{1}} \mathrm{Var}\left(\frac{X_{i} - μ_{0}}{\sqrt{m_{1}}}\right) = \sum_{i = 1}^{m_{1}} \frac{σ_{i}^{2}}{m_{1}} = σ^{2} + o(1).$$

Next, we verify Lindeberg’s condition. Let $ε > 0$ be any fixed constant. According to the Cauchy–Schwarz inequality and Markov’s inequality, we have

$$\begin{aligned} \frac{1}{σ^{2}} \sum_{i = 1}^{m_{1}} E\left[\left(\frac{X_{i} - μ_{0}}{\sqrt{m_{1}}}\right)^{2} I\left[\left|\frac{X_{i} - μ_{0}}{\sqrt{m_{1}}}\right| > ε σ\right]\right] &\leq \frac{1}{σ^{2}} \sum_{i = 1}^{m_{1}} E\left(\frac{X_{i} - μ_{0}}{\sqrt{m_{1}}}\right)^{4} \frac{1}{ε^{2} σ^{2}} \\ &= \frac{1}{ε^{2} σ^{4}} \frac{\sum_{i = 1}^{m_{1}} E(X_{i} - μ_{0})^{4}}{m_{1}^{2}} \\ &= o(1), \end{aligned}$$

where we used the fact that the 4-th moment of distribution F exists. Hence, $\sqrt{m_{1}} (\bar{X}_{m_{1}} - μ_{0}) \overset{d}{\to} N(0, σ^{2})$. Similarly, one has $\sqrt{m_{2}} (\bar{Y}_{m_{2}} - μ_{0}) \overset{d}{\to} N(0, τ^{2})$. □

Lemma A3. Suppose $\frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} σ_{i}^{2} = σ^{2} + o(1)$ and $\frac{1}{m_{2}} \sum_{i = 1}^{m_{2}} τ_{i}^{2} = τ^{2} + o(1)$ for two positive constants, σ and τ. Then, $R_{n}^{2} = σ^{2} + o_{p}(1)$ and $T_{n}^{2} = τ^{2} + o_{p}(1)$.

Proof. According to Lemma A2, we have $\bar{X}_{m_{1}} - μ_{0} = O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right)$. Then,

$$\begin{aligned} R_{n}^{2} &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ})^{2} \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - μ_{0} + μ_{0} - \hat{μ})^{2} \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} \left[(X_{i} - μ_{0})^{2} + (μ_{0} - \hat{μ})^{2} + 2 (X_{i} - μ_{0}) (μ_{0} - \hat{μ})\right] \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - μ_{0})^{2} + (μ_{0} - \hat{μ})^{2} + 2 (\bar{X}_{m_{1}} - μ_{0}) (μ_{0} - \hat{μ}) \\ &= σ^{2} + O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right) + (\hat{μ} - μ_{0})^{2}. \end{aligned}$$

Note that

$$\begin{aligned} \hat{μ} - μ_{0} &= \hat{η} (\bar{X}_{m_{1}} - μ_{0}) + (1 - \hat{η}) (\bar{Y}_{m_{2}} - μ_{0}) \\ &= O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right) + O_{P}\left(\frac{1}{\sqrt{m_{2}}}\right). \end{aligned}$$

Hence, $R_{n}^{2} = σ^{2} + o_{p}(1)$. Similarly, we have $T_{n}^{2} = τ^{2} + o_{p}(1)$. □

Proof of Theorem 1. Note that

$$0 = \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} \frac{X_{i} - \hat{μ}}{1 + λ_{1} (X_{i} - \hat{μ})} = \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ}) - \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} \frac{λ_{1} (X_{i} - \hat{μ})^{2}}{1 + λ_{1} (X_{i} - \hat{μ})},$$

from which it follows that

$$\bar{X}_{m_{1}} - \hat{μ} = λ_{1} \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} \frac{(X_{i} - \hat{μ})^{2}}{1 + λ_{1} (X_{i} - \hat{μ})}. \tag{A1}$$

Taking absolute value on both sides of (A1) yields

$$|\bar{X}_{m_{1}} - \hat{μ}| \geq |λ_{1}| \frac{R_{n}^{2}}{1 + |λ_{1}|} = R_{n}^{2} \frac{|λ_{1}|}{1 + |λ_{1}|}.$$

According to Lemmas A2 and A3, we have $\bar{X}_{m_{1}} - μ_{0} = O_{P}\left(\frac{1}{\sqrt{m_{1}}}\right)$ and $R_{n}^{2} = σ^{2} + o_{p}(1)$, respectively. Then, $|λ_{1}| = O_{p}\left(\frac{1}{\sqrt{m_{1}}}\right)$. Similarly, $|λ_{2}| = O_{p}\left(\frac{1}{\sqrt{m_{2}}}\right)$.

Next, we find the leading terms of $λ_{1}$ and $λ_{2}$. Note that

$$\begin{aligned} 0 &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} \frac{X_{i} - \hat{μ}}{1 + λ_{1} (X_{i} - \hat{μ})} \\ &= \frac{1}{m_{1}} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ}) - λ_{1} \frac{\sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ})^{2}}{m_{1}} + \frac{λ_{1}^{2}}{m_{1}} \sum_{i = 1}^{m_{1}} \frac{(X_{i} - \hat{μ})^{3}}{1 + λ_{1} (X_{i} - \hat{μ})}. \end{aligned} \tag{A2}$$

Then, $\bar{X}_{m_{1}} - \hat{μ} = λ_{1} R_{n}^{2} + O_{p}\left(\frac{1}{m_{1}}\right)$. Hence, $λ_{1} = \frac{\bar{X}_{m_{1}} - \hat{μ}}{R_{n}^{2}} + O_{p}\left(\frac{1}{m_{1}}\right)$. Similarly, $λ_{2} = \frac{\bar{Y}_{m_{2}} - \hat{μ}}{T_{n}^{2}} + O_{p}\left(\frac{1}{m_{2}}\right)$.

$$\begin{aligned} -2 \log R_{m_{1}, m_{2}} &= -2 \left\{ -\sum_{i = 1}^{m_{1}} \log[1 + λ_{1} (X_{i} - \hat{μ})] - \sum_{i = 1}^{m_{2}} \log[1 + λ_{2} (Y_{i} - \hat{μ})] \right\} \\ &= 2 \Big[ λ_{1} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ}) - \frac{1}{2} λ_{1}^{2} \sum_{i = 1}^{m_{1}} (X_{i} - \hat{μ})^{2} + O(|λ_{1}|^{3}) \\ &\qquad + λ_{2} \sum_{i = 1}^{m_{2}} (Y_{i} - \hat{μ}) - \frac{1}{2} λ_{2}^{2} \sum_{i = 1}^{m_{2}} (Y_{i} - \hat{μ})^{2} + O(|λ_{2}|^{3}) \Big] \\ &= 2 \Big[ \frac{m_{1} (\bar{X}_{m_{1}} - \hat{μ})^{2}}{R_{n}^{2}} - \frac{m_{1} (\bar{X}_{m_{1}} - \hat{μ})^{2} R_{n}^{2}}{2 R_{n}^{4}} + O\left(\frac{1}{m_{1} \sqrt{m_{1}}}\right) \\ &\qquad + \frac{m_{2} (\bar{Y}_{m_{2}} - \hat{μ})^{2}}{T_{n}^{2}} - \frac{m_{2} (\bar{Y}_{m_{2}} - \hat{μ})^{2} T_{n}^{2}}{2 T_{n}^{4}} + O\left(\frac{1}{m_{2} \sqrt{m_{2}}}\right) \Big] \\ &= \frac{m_{1} (\bar{X}_{m_{1}} - \hat{μ})^{2}}{R_{n}^{2}} + \frac{m_{2} (\bar{Y}_{m_{2}} - \hat{μ})^{2}}{T_{n}^{2}} + O\left(\frac{1}{m_{1} \sqrt{m_{1}}} + \frac{1}{m_{2} \sqrt{m_{2}}}\right) \\ &= \frac{m_{1} (1 - \hat{η})^{2} (\bar{X}_{m_{1}} - \bar{Y}_{m_{2}})^{2}}{R_{n}^{2}} + \frac{m_{2} \hat{η}^{2} (\bar{X}_{m_{1}} - \bar{Y}_{m_{2}})^{2}}{T_{n}^{2}} + O\left(\frac{1}{m_{1} \sqrt{m_{1}}} + \frac{1}{m_{2} \sqrt{m_{2}}}\right) \\ &= \left(\frac{m_{1} (1 - \hat{η})^{2}}{R_{n}^{2}} + \frac{m_{2} \hat{η}^{2}}{T_{n}^{2}}\right) (\bar{X}_{m_{1}} - \bar{Y}_{m_{2}})^{2} + o_{p}(1). \end{aligned} \tag{A3}$$

Let $η = \frac{\frac{m_{1}}{σ^{2}}}{\frac{m_{1}}{σ^{2}} + \frac{m_{2}}{τ^{2}}}$. Then, it is easy to obtain $\hat{η} = η + o_{p}(1)$. Hence,

$$\begin{aligned} \frac{m_{1} (1 - \hat{η})^{2}}{R_{n}^{2}} + \frac{m_{2} \hat{η}^{2}}{T_{n}^{2}} &= \frac{m_{1}}{σ^{2}} \left(\frac{\frac{m_{2}}{τ^{2}}}{\frac{m_{1}}{σ^{2}} + \frac{m_{2}}{τ^{2}}}\right)^{2} + \frac{m_{2}}{τ^{2}} \left(\frac{\frac{m_{1}}{σ^{2}}}{\frac{m_{1}}{σ^{2}} + \frac{m_{2}}{τ^{2}}}\right)^{2} + o_{p}(1) \\ &= \frac{\frac{m_{1}}{σ^{2}} \frac{m_{2}}{τ^{2}} \left(\frac{m_{2}}{τ^{2}} + \frac{m_{1}}{σ^{2}}\right)}{\left(\frac{m_{1}}{σ^{2}} + \frac{m_{2}}{τ^{2}}\right)^{2}} + o_{p}(1) \\ &= \frac{\frac{m_{1}}{σ^{2}} \frac{m_{2}}{τ^{2}}}{\frac{m_{1}}{σ^{2}} + \frac{m_{2}}{τ^{2}}} + o_{p}(1) \\ &= \frac{1}{\frac{τ^{2}}{m_{2}} + \frac{σ^{2}}{m_{1}}} + o_{p}(1). \end{aligned}$$

Note that $X_{i}$ and $Y_{i}$ are independent. Then, according to Lemma A2, we have

$$\frac{\bar{X}_{m_{1}} - \bar{Y}_{m_{2}}}{\sqrt{\frac{σ^{2}}{m_{1}} + \frac{τ^{2}}{m_{2}}}} = \frac{\sqrt{\frac{σ^{2}}{m_{1}}}}{\sqrt{\frac{σ^{2}}{m_{1}} + \frac{τ^{2}}{m_{2}}}} \cdot \frac{\bar{X}_{m_{1}} - μ_{0}}{\sqrt{\frac{σ^{2}}{m_{1}}}} - \frac{\sqrt{\frac{τ^{2}}{m_{2}}}}{\sqrt{\frac{σ^{2}}{m_{1}} + \frac{τ^{2}}{m_{2}}}} \cdot \frac{\bar{Y}_{m_{2}} - μ_{0}}{\sqrt{\frac{τ^{2}}{m_{2}}}} \overset{d}{\to} N(0, 1). \tag{A4}$$

Then the desired result for $d = 1$ follows from (A3) and (A4).

The proof for $d \geq 2$ is similar to the proof for $d = 1$. In this case,

$$\hat{η} = (ρ τ_{n}^{2} + (1 - ρ) S_{n}^{2})^{-1} ρ τ_{n}^{2}, \quad \hat{μ} = \hat{η} \bar{X}_{m_{1}} + (I - \hat{η}) \bar{Y}_{m_{2}},$$

where $S_{n}^{2}$ and $τ_{n}^{2}$ are now the $d \times d$ sample covariance matrices and $I$ is the identity matrix. The rest of the proof is almost the same, and we omit it. □

References

1. Callegaro, A.; Spiessens, B. Testing treatment effect in randomized clinical trials with possible nonproportional hazards. Stat. Biopharm. Res. 2017, 9, 204–211.
2. Tusher, V.G.; Tibshirani, R.; Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 2001, 98, 5116–5121.
3. Montgomery, D.C. A modern framework for achieving enterprise excellence. Int. J. Lean Six Sigma 2010, 1, 56–65.
4. Blau, F.D.; Kahn, L.M. The gender wage gap: Extent, trends, and explanations. J. Econ. Lit. 2017, 55, 789–865.
5. Gudmundarson, R.; Peters, G. Assessing portfolio diversification via two-sample graph kernel inference. A case study on the influence of ESG screening. PLoS ONE 2024, 19, e0301804.
6. Arroyo, J.; Kessler, D.; Levina, E.; Taylor, S. Network classification with applications to brain connectomics. Ann. Appl. Stat. 2017, 13, 1648–1677.
7. Stam, C.J.; Jones, B.; Nolte, G.; Breakspear, M.; Scheltens, P. Small-world networks and functional connectivity in Alzheimer’s disease. Cereb. Cortex 2007, 17, 92–99.
8. Ghoshdastidar, D.; Gutzeit, M.; Carpentier, A.; von Luxburg, U. Two-sample tests for large random graphs using network statistics. In Proceedings of the Conference on Learning Theory, PMLR, Amsterdam, The Netherlands, 7–10 July 2017; pp. 954–977.
9. Ghoshdastidar, D.; von Luxburg, U. Practical methods for graph two-sample testing. Adv. Neural Inf. Process. Syst. 2018, 31, 1568.
10. Ghoshdastidar, D.; Gutzeit, M.; Carpentier, A.; von Luxburg, U. Two-sample hypothesis testing for inhomogeneous random graphs. Ann. Stat. 2020, 48, 2208–2229.
11. Tang, M.; Athreya, A.; Sussman, D.L.; Lyzinski, V.; Priebe, C.E. A nonparametric two-sample hypothesis testing problem for random graphs. Bernoulli 2017, 23, 1599–1630.
12. Tang, M.; Athreya, A.; Sussman, D.L.; Lyzinski, V.; Park, Y.; Priebe, C.E. A semiparametric two-sample hypothesis testing problem for random graphs. J. Comput. Graph. Stat. 2017, 26, 344–354.
13. Maugis, P.A.; Olhede, S.; Priebe, C.; Wolfe, P. Testing for equivalence of network distribution using subgraph counts. J. Comput. Graph. Stat. 2020, 29, 455–465.
14. Maugis, P. Central limit theorems for local network statistics. arXiv 2020, arXiv:2006.15738.
15. Yuan, M.; Wen, Q. A practical two-sample test for weighted random graphs. J. Appl. Stat. 2021, 50, 495–511.
16. Simpson, S.; Bowman, F.; Laurienti, P. Analyzing complex functional brain networks: Fusing statistics and network science to understand the brain. Stat. Surv. 2013, 7, 1–36.
17. Owen, A.B. Empirical Likelihood; Chapman and Hall/CRC: Boca Raton, FL, USA, 2001.
18. Owen, A.B. Empirical likelihood confidence region. Ann. Stat. 1990, 18, 90–120.
19. Liu, Y.; Zou, C.; Zhang, R. Empirical likelihood for the two-sample mean problem. Stat. Probab. Lett. 2008, 78, 548–556.
20. Wu, C.; Yan, Y. Empirical Likelihood Inference for Two-Sample Problems. Stat. Its Interface 2012, 5, 345–354.
21. Golub, T.R.; Slonim, D.K.; Tamayo, P.; Huard, C.; Gaasenbeek, M.; Mesirov, J.P.; Coller, H.; Loh, M.L.; Downing, J.R.; Caligiuri, M.A.; et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 1999, 286, 531–537.
22. Ghoshdastidar, D.; von Luxburg, U. Two-sample hypothesis testing for inhomogeneous random graphs. In Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018; Volume 48, pp. 3019–3028.

Figure 1. Weighted graphs with $F (z; θ) = 0.3 δ_{{1}} + 0.7 F_{exp} (z; θ)$. The left weighted graph corresponds to $F (z; h (x, y))$ with $h (x, y) = e^{- x y}$, and the right weighted graph corresponds to $F (z; h (x, y))$ with $h (x, y) = 2 + e^{- x y}$.

Table 1. Simulated size and power with graphs generated from the Bernoulli distribution.

$n_{1}, n_{2}$ $(m = 20)$	Method	$c = 0$ (Size)	$c = 0.01$ (Power)	$c = 0.02$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.054	0.092	0.196
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.117	0.284
$n_{1} = 10, n_{2} = 30$	t test	0.052	0.125	0.323
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.194	0.601
$n_{1} = 20, n_{2} = 30$	t test	0.051	0.241	0.745
$n_{1} = 30, n_{2} = 30$	t test	0.054	0.387	0.917
$n_{1} = 10, n_{2} = 10$	EL test	0.048	0.106	0.213
$n_{1} = 10, n_{2} = 20$	EL test	0.050	0.130	0.300
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.136	0.337
$n_{1} = 20, n_{2} = 20$	EL test	0.049	0.220	0.634
$n_{1} = 20, n_{2} = 30$	EL test	0.053	0.269	0.768
$n_{1} = 30, n_{2} = 30$	EL test	0.049	0.419	0.926
$n_{1}, n_{2}$ $(m = 30)$	Method	$c = 0$ (Size)	$c = 0.01$ (Power)	$c = 0.02$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.096	0.269
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.134	0.393
$n_{1} = 10, n_{2} = 30$	t test	0.050	0.160	0.432
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.264	0.774
$n_{1} = 20, n_{2} = 30$	t test	0.049	0.361	0.891
$n_{1} = 30, n_{2} = 30$	t test	0.051	0.528	0.982
$n_{1} = 10, n_{2} = 10$	EL test	0.052	0.112	0.283
$n_{1} = 10, n_{2} = 20$	EL test	0.053	0.143	0.211
$n_{1} = 10, n_{2} = 30$	EL test	0.052	0.164	0.438
$n_{1} = 20, n_{2} = 20$	EL test	0.053	0.279	0.787
$n_{1} = 20, n_{2} = 30$	EL test	0.051	0.375	0.899
$n_{1} = 30, n_{2} = 30$	EL test	0.050	0.548	0.988
$n_{1}, n_{2}$ $(m = 40)$	Method	$c = 0$ (Size)	$c = 0.01$ (Power)	$c = 0.02$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.049	0.119	0.329
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.169	0.495
$n_{1} = 10, n_{2} = 30$	t test	0.052	0.180	0.540
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.350	0.886
$n_{1} = 20, n_{2} = 30$	t test	0.050	0.463	0.953
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.657	0.996
$n_{1} = 10, n_{2} = 10$	EL test	0.053	0.122	0.338
$n_{1} = 10, n_{2} = 20$	EL test	0.052	0.176	0.503
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.183	0.550
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.360	0.890
$n_{1} = 20, n_{2} = 30$	EL test	0.053	0.471	0.958
$n_{1} = 30, n_{2} = 30$	EL test	0.055	0.670	0.998

Table 2. Simulated size and power with graphs generated from the Poisson distribution.

$n_{1}, n_{2}$ $(m = 20)$	Method	$c = 0$ (Size)	$c = 0.02$ (Power)	$c = 0.04$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.048	0.107	0.169
$n_{1} = 10, n_{2} = 20$	t test	0.050	0.208	0.418
$n_{1} = 10, n_{2} = 30$	t test	0.052	0.234	0.464
$n_{1} = 20, n_{2} = 20$	t test	0.053	0.292	0.774
$n_{1} = 20, n_{2} = 30$	t test	0.051	0.440	0.931
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.633	0.982
$n_{1} = 10, n_{2} = 10$	EL test	0.051	0.141	0.280
$n_{1} = 10, n_{2} = 20$	EL test	0.052	0.252	0.478
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.289	0.540
$n_{1} = 20, n_{2} = 20$	EL test	0.056	0.333	0.811
$n_{1} = 20, n_{2} = 30$	EL test	0.053	0.466	0.943
$n_{1} = 30, n_{2} = 30$	EL test	0.055	0.670	0.995
$n_{1}, n_{2}$ $(m = 30)$	Method	$c = 0$ (Size)	$c = 0.02$ (Power)	$c = 0.04$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.126	0.243
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.235	0.498
$n_{1} = 10, n_{2} = 30$	t test	0.050	0.261	0.549
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.402	0.912
$n_{1} = 20, n_{2} = 30$	t test	0.049	0.577	0.974
$n_{1} = 30, n_{2} = 30$	t test	0.051	0.790	0.985
$n_{1} = 10, n_{2} = 10$	EL test	0.052	0.157	0.352
$n_{1} = 10, n_{2} = 20$	EL test	0.053	0.285	0.561
$n_{1} = 10, n_{2} = 30$	EL test	0.052	0.333	0.650
$n_{1} = 20, n_{2} = 20$	EL test	0.053	0.441	0.916
$n_{1} = 20, n_{2} = 30$	EL test	0.051	0.597	0.985
$n_{1} = 30, n_{2} = 30$	EL test	0.050	0.807	0.992
$n_{1}, n_{2}$ $(m = 40)$	Method	$c = 0$ (Size)	$c = 0.02$ (Power)	$c = 0.04$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.049	0.113	0.325
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.234	0.616
$n_{1} = 10, n_{2} = 30$	t test	0.052	0.272	0.621
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.496	0.972
$n_{1} = 20, n_{2} = 30$	t test	0.050	0.658	0.980
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.891	0.996
$n_{1} = 10, n_{2} = 10$	EL test	0.053	0.140	0.404
$n_{1} = 10, n_{2} = 20$	EL test	0.052	0.310	0.694
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.352	0.713
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.524	0.981
$n_{1} = 20, n_{2} = 30$	EL test	0.053	0.689	0.997
$n_{1} = 30, n_{2} = 30$	EL test	0.055	0.900	1.000

Table 3. Simulated size and power with graphs generated from the exponential distribution.

$n_{1}, n_{2}$ $(m = 20)$	Method	$c = 0$ (Size)	$c = 0.04$ (Power)	$c = 0.06$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.051	0.165	0.288
$n_{1} = 10, n_{2} = 20$	t test	0.053	0.170	0.322
$n_{1} = 10, n_{2} = 30$	t test	0.054	0.174	0.347
$n_{1} = 20, n_{2} = 20$	t test	0.050	0.411	0.759
$n_{1} = 20, n_{2} = 30$	t test	0.052	0.512	0.827
$n_{1} = 30, n_{2} = 30$	t test	0.055	0.671	0.931
$n_{1} = 10, n_{2} = 10$	EL test	0.050	0.219	0.346
$n_{1} = 10, n_{2} = 20$	EL test	0.053	0.262	0.467
$n_{1} = 10, n_{2} = 30$	EL test	0.055	0.275	0.515
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.458	0.788
$n_{1} = 20, n_{2} = 30$	EL test	0.051	0.565	0.871
$n_{1} = 30, n_{2} = 30$	EL test	0.054	0.699	0.952
$n_{1}, n_{2}$ $(m = 30)$	Method	$c = 0$ (Size)	$c = 0.04$ (Power)	$c = 0.06$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.049	0.206	0.410
$n_{1} = 10, n_{2} = 20$	t test	0.052	0.258	0.514
$n_{1} = 10, n_{2} = 30$	t test	0.053	0.280	0.551
$n_{1} = 20, n_{2} = 20$	t test	0.048	0.590	0.891
$n_{1} = 20, n_{2} = 30$	t test	0.053	0.674	0.953
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.848	0.982
$n_{1} = 10, n_{2} = 10$	EL test	0.053	0.239	0.457
$n_{1} = 10, n_{2} = 20$	EL test	0.051	0.351	0.621
$n_{1} = 10, n_{2} = 30$	EL test	0.054	0.397	0.679
$n_{1} = 20, n_{2} = 20$	EL test	0.055	0.622	0.904
$n_{1} = 20, n_{2} = 30$	EL test	0.049	0.727	0.968
$n_{1} = 30, n_{2} = 30$	EL test	0.050	0.863	0.995
$n_{1}, n_{2}$ $(m = 40)$	Method	$c = 0$ (Size)	$c = 0.04$ (Power)	$c = 0.06$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.053	0.253	0.522
$n_{1} = 10, n_{2} = 20$	t test	0.052	0.334	0.671
$n_{1} = 10, n_{2} = 30$	t test	0.055	0.357	0.694
$n_{1} = 20, n_{2} = 20$	t test	0.051	0.711	0.958
$n_{1} = 20, n_{2} = 30$	t test	0.053	0.816	0.983
$n_{1} = 30, n_{2} = 30$	t test	0.056	0.933	1.000
$n_{1} = 10, n_{2} = 10$	EL test	0.055	0.281	0.553
$n_{1} = 10, n_{2} = 20$	EL test	0.049	0.423	0.756
$n_{1} = 10, n_{2} = 30$	EL test	0.052	0.461	0.797
$n_{1} = 20, n_{2} = 20$	EL test	0.054	0.732	0.969
$n_{1} = 20, n_{2} = 30$	EL test	0.055	0.845	0.995
$n_{1} = 30, n_{2} = 30$	EL test	0.051	0.948	1.000

Table 4. Simulated size and power with graphs generated from the multivariate normal distribution.

$n_{1}, n_{2}$ $(m = 20)$	Method	$c = (0, 0)$ (Size)	$c = (0.1, 0.1)$ (Power)	$c = (0.2, 0.2)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.126	0.273
$n_{1} = 10, n_{2} = 20$	t test	0.051	0.161	0.315
$n_{1} = 10, n_{2} = 30$	t test	0.054	0.232	0.409
$n_{1} = 20, n_{2} = 20$	t test	0.056	0.245	0.414
$n_{1} = 20, n_{2} = 30$	t test	0.049	0.252	0.468
$n_{1} = 30, n_{2} = 30$	t test	0.051	0.263	0.531
$n_{1} = 10, n_{2} = 10$	EL test	0.052	0.175	0.344
$n_{1} = 10, n_{2} = 20$	EL test	0.054	0.183	0.358
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.243	0.414
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.258	0.468
$n_{1} = 20, n_{2} = 30$	EL test	0.053	0.263	0.521
$n_{1} = 30, n_{2} = 30$	EL test	0.052	0.289	0.588
$n_{1}, n_{2}$ $(m = 30)$	Method	$c = (0, 0)$ (Size)	$c = (0.1, 0.1)$ (Power)	$c = (0.2, 0.2)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.404	0.830
$n_{1} = 10, n_{2} = 20$	t test	0.054	0.512	0.881
$n_{1} = 10, n_{2} = 30$	t test	0.049	0.616	0.926
$n_{1} = 20, n_{2} = 20$	t test	0.058	0.619	0.959
$n_{1} = 20, n_{2} = 30$	t test	0.052	0.731	0.974
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.796	0.983
$n_{1} = 10, n_{2} = 10$	EL test	0.052	0.466	0.857
$n_{1} = 10, n_{2} = 20$	EL test	0.055	0.527	0.898
$n_{1} = 10, n_{2} = 30$	EL test	0.053	0.635	0.933
$n_{1} = 20, n_{2} = 20$	EL test	0.050	0.662	0.972
$n_{1} = 20, n_{2} = 30$	EL test	0.051	0.775	0.982
$n_{1} = 30, n_{2} = 30$	EL test	0.053	0.812	0.992
$n_{1}, n_{2}$ $(m = 40)$	Method	$c = (0, 0)$ (Size)	$c = (0.1, 0.1)$ (Power)	$c = (0.2, 0.2)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.732	0.981
$n_{1} = 10, n_{2} = 20$	t test	0.049	0.841	0.991
$n_{1} = 10, n_{2} = 30$	t test	0.052	0.872	0.992
$n_{1} = 20, n_{2} = 20$	t test	0.050	0.942	0.993
$n_{1} = 20, n_{2} = 30$	t test	0.049	0.951	1.000
$n_{1} = 30, n_{2} = 30$	t test	0.055	0.985	1.000
$n_{1} = 10, n_{2} = 10$	EL test	0.051	0.777	0.992
$n_{1} = 10, n_{2} = 20$	EL test	0.052	0.850	0.996
$n_{1} = 10, n_{2} = 30$	EL test	0.050	0.882	0.997
$n_{1} = 20, n_{2} = 20$	EL test	0.051	0.947	1.000
$n_{1} = 20, n_{2} = 30$	EL test	0.052	0.958	1.000
$n_{1} = 30, n_{2} = 30$	EL test	0.052	0.989	1.000

Table 5. Simulated size and power with graphs generated from the multivariate Gamma distribution.

$n_{1}, n_{2}$ $(m = 20)$	Method	$c = (0, 0)$ (Size)	$c = (0.05, 0.05)$ (Power)	$c = (0.1, 0.1)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.147	0.159
$n_{1} = 10, n_{2} = 20$	t test	0.053	0.151	0.165
$n_{1} = 10, n_{2} = 30$	t test	0.049	0.231	0.171
$n_{1} = 20, n_{2} = 20$	t test	0.055	0.254	0.256
$n_{1} = 20, n_{2} = 30$	t test	0.052	0.266	0.269
$n_{1} = 30, n_{2} = 30$	t test	0.049	0.279	0.295
$n_{1} = 10, n_{2} = 10$	EL test	0.052	0.153	0.179
$n_{1} = 10, n_{2} = 20$	EL test	0.054	0.158	0.332
$n_{1} = 10, n_{2} = 30$	EL test	0.055	0.272	0.346
$n_{1} = 20, n_{2} = 20$	EL test	0.050	0.283	0.353
$n_{1} = 20, n_{2} = 30$	EL test	0.052	0.289	0.385
$n_{1} = 30, n_{2} = 30$	EL test	0.051	0.340	0.410
$n_{1}, n_{2}$ $(m = 30)$	Method	$c = (0, 0)$ (Size)	$c = (0.05, 0.05)$ (Power)	$c = (0.1, 0.1)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.271	0.795
$n_{1} = 10, n_{2} = 20$	t test	0.050	0.323	0.845
$n_{1} = 10, n_{2} = 30$	t test	0.051	0.331	0.872
$n_{1} = 20, n_{2} = 20$	t test	0.052	0.411	0.934
$n_{1} = 20, n_{2} = 30$	t test	0.055	0.429	0.954
$n_{1} = 30, n_{2} = 30$	t test	0.052	0.466	0.970
$n_{1} = 10, n_{2} = 10$	EL test	0.049	0.338	0.812
$n_{1} = 10, n_{2} = 20$	EL test	0.053	0.401	0.888
$n_{1} = 10, n_{2} = 30$	EL test	0.051	0.432	0.912
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.467	0.950
$n_{1} = 20, n_{2} = 30$	EL test	0.054	0.481	0.962
$n_{1} = 30, n_{2} = 30$	EL test	0.052	0.552	0.989
$n_{1}, n_{2}$ $(m = 40)$	Method	$c = (0, 0)$ (Size)	$c = (0.05, 0.05)$ (Power)	$c = (0.1, 0.1)$ (Power)
$n_{1} = 10, n_{2} = 10$	t test	0.052	0.612	0.950
$n_{1} = 10, n_{2} = 20$	t test	0.053	0.729	0.973
$n_{1} = 10, n_{2} = 30$	t test	0.051	0.754	0.987
$n_{1} = 20, n_{2} = 20$	t test	0.050	0.831	0.990
$n_{1} = 20, n_{2} = 30$	t test	0.052	0.849	1.000
$n_{1} = 30, n_{2} = 30$	t test	0.050	0.914	1.000
$n_{1} = 10, n_{2} = 10$	EL test	0.051	0.677	0.959
$n_{1} = 10, n_{2} = 20$	EL test	0.052	0.742	0.988
$n_{1} = 10, n_{2} = 30$	EL test	0.054	0.777	0.995
$n_{1} = 20, n_{2} = 20$	EL test	0.052	0.843	0.999
$n_{1} = 20, n_{2} = 30$	EL test	0.051	0.864	1.000
$n_{1} = 30, n_{2} = 30$	EL test	0.053	0.925	1.000

Table 6. p-values of the tests for the real-data graphs.

$m_{1}, m_{2}$	Method	$n = 10$	$n = 20$	$n = 50$	$n = 7129$
$m_{1} = 47, m_{2} = 25$	t test	0.423	0.231	0.115	$< 0.001$
$m_{1} = 47, m_{2} = 25$	EL test	0.146	0.047	0.032	$< 0.001$

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
