Abstract
This paper generalizes the structural Markov properties for undirected decomposable graphs to arbitrary undirected graphs. This allows us to exploit the conditional independence properties of joint prior laws to analyze and compare multiple graphical structures, while taking advantage of common conditional independence constraints. This work provides theoretical support for full Bayesian posterior updating of the structure of a graph using data from a certain distribution. We further investigate ratios of graph laws in order to simplify the acceptance probability of Metropolis–Hastings sampling algorithms.
MSC:
62H05; 62D05
1. Introduction
A probabilistic graphical model (PGM) or a structured probabilistic model (SPM) is a statistical model that consists of a graph and a distribution family for which the graph encodes the conditional independence information between random variables. Such models are always associated with independence models, which are the sets of conditional independence constraints encoded by graphs via the global Markov property. They arise naturally in multivariate analysis and provide versatility and convenience in analyzing complex, large-scale data.
It is known that different classes of graphs with different independence interpretations have been developed over the past decades; the reader can refer to [1,2,3,4] for details. One of the most important classes of graphs in graphical models is undirected graphs (UGs). Their corresponding Markov models are often known as undirected graphical models or Markov networks [1,2]. These models have found applications in a wide range of areas, such as econometrics, medical science and artificial intelligence [5,6,7]. Our research in this paper concerns the structure determination of these models by Bayesian methods.
The main objective of Bayesian structure learning is to learn the structure of a graph from data. This requires a clear specification of a prior distribution over graphical structures, which is termed a graph law. Statisticians have proposed several approaches to specifying the prior law of a graph. The simplest graph law is the uniform distribution of [8]. The Erdős–Rényi random graph model has also been used as a graph law in [9]. Furthermore, a characterization of graph laws in exponential family form was proposed in [10]. However, simplifying this prior law is a significant task, especially for the posterior inference of graphical structures. In view of this, the structural Markov property was first proposed for the purpose of characterizing the conditional independence of the structure of a graph. The structural Markov properties require that the structures of distinct components of graphs be conditionally independent given the existence of a separating component; see [10]. These properties reflect conditional independence at the structural level. It has been proved that a graph law is structural Markov if and only if it is a member of the clique exponential family, given that the support is the set of decomposable undirected graphs; see [10]. Further, a weaker support condition for an equivalent characterization of graph laws is given via a closure operation on graphical structures in [11].
Indeed, the structural Markov property is an extension of the hyper Markov property, which was proposed in [12] and reflects the global Markov property at the parameter level. These hyper Markov properties are used to describe the conditional independence properties of a distribution of random variables or statistical quantities in graphical models. The hyper Markov laws arise naturally as sampling distributions of maximum likelihood estimators and as prior or posterior distributions in Bayesian inference.
Recently, a weaker version of the structural Markov properties for decomposable graphs was introduced in [13], where the authors provided an analogous clique-separator factorization for the graph law. These weakly structural Markov properties require that the separator is complete. It has been shown that this provides a more flexible family of graph prior laws to use in full Bayesian posterior updating.
It should be pointed out that the work in [8,10,13] focuses only on decomposable graphical models. However, based on conditional independence and graphical separation, the structural Markov properties can be extended to non-decomposable undirected graphical models. The aim of this paper is to fill this gap in the field of graphical models. Further, we focus on a full Bayesian method for the posterior updating of graph laws via observed data from a certain distribution, and we prove that this full Bayesian posterior of the graph law is feasible and reasonable. Finally, as examples, we illustrate our theory with detailed investigations of two significant cases, based on graphical Gaussian models and multinomial models, respectively.
The outline of this paper is as follows. In Section 2, we introduce the terminology and concepts used in this paper. Section 3 first investigates the structural Markov properties for non-decomposable graphs, and then exploits the joint prior laws of a random sample distribution for full Bayesian inference. Section 4 gives two examples, based on the inverse Wishart distribution and the Dirichlet distribution, to study the posterior updating of graph laws in detail. Further, we discuss some computational details for structural Markov graph laws in Section 5. Finally, Section 6 concludes the paper.
2. Preliminaries
For terms and symbols, we follow [10,12], since much of the theoretical framework of this paper is constructed and developed based on them. Several concrete notions and terminologies used in this paper are given in the following for clarity and consistency.
2.1. Graphical Terminologies and Notation
A graph $G = (V, E)$ consists of a finite set of vertices $V$ and a set of edges $E$. An edge of $G$ is said to be undirected if it is an unordered pair. A graph $G$ is said to be an undirected graph if all its edges are undirected. Unless otherwise specified, $G$ is assumed to be undirected, simple and connected throughout the paper.
For $A \subseteq V$, the induced subgraph of $G$ on $A$ will be denoted by $G_A = (A, E_A)$, where $E_A = E \cap (A \times A)$. All subgraphs in this paper are induced subgraphs. A set $A$ is complete (or a clique) if any two of its distinct vertices are adjacent, i.e., $\{u, v\} \in E$ for all distinct $u, v \in A$. A graph is a clique if its vertex set is a clique. A clique $C$ is a maximal clique if $C'$ is incomplete for any superset $C' \supset C$. Two vertices $u$ and $v$ are neighbors if $\{u, v\} \in E$. For $A \subseteq V$, the boundary of $A$ is the set of vertices in $V \setminus A$ that are neighbors of vertices in $A$. $G$ is collapsible onto $A$ if every connected component of $G_{V \setminus A}$ has a complete boundary in $G$.
For any subsets $A$, $B$ and $C$ of $V$, we say that $C$ separates $A$ from $B$, and write $\langle A, B \mid C \rangle_G$, if every path in $G$ between some $a \in A$ and some $b \in B$ contains a vertex in $C$. Usually, we call $C$ a separator of $A$ and $B$. Separators that are cliques are called clique separators.
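To make these notions concrete, the separation and clique-separator tests can be sketched in a few lines of Python; the adjacency-dictionary representation and the function names below are our own choices, not notation from the paper.

```python
from collections import deque

def separates(adj, A, B, C):
    """True if C separates A from B: every path between A and B meets C.

    adj: dict mapping each vertex to the set of its neighbours.
    """
    C = set(C)
    A, B = set(A) - C, set(B) - C
    seen, frontier = set(A), deque(A)
    while frontier:                      # BFS in the graph with C removed
        for w in adj[frontier.popleft()] - C:
            if w in B:
                return False             # found an A-B path avoiding C
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

def is_clique(adj, C):
    """True if the vertices of C are pairwise adjacent."""
    C = list(C)
    return all(v in adj[u] for i, u in enumerate(C) for v in C[i + 1:])

def is_clique_separator(adj, A, B, C):
    return is_clique(adj, C) and separates(adj, A, B, C)
```

For the 4-cycle a-b-c-d with chord b-d, the set {b, d} is a clique separator of {a} and {c}, while {b} alone separates nothing, since the path a-d-c avoids it.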
For any disjoint subsets $A$, $B$ and $S$ of $V$, we say that $(A, B, S)$ forms a decomposition of $G$ if (i) $V = A \cup B \cup S$; (ii) $S$ separates $A$ from $B$; and (iii) $S$ is complete, i.e., $S$ is a clique separator in $G$. A decomposition is said to be proper if the sets $A \cup S$ and $B \cup S$ are both proper subsets of $V$.
Definition 1
([14]). Let $G = (V, E)$ be an undirected graph. A graph $G$ is reducible if its vertex set contains a clique separator; otherwise, $G$ is said to be prime. For example, $G$ is prime if $G$ is a clique, while $G$ is reducible if $G$ is a disconnected graph. An induced subgraph $G_A$ is a maximal prime subgraph of $G$ if it satisfies
- (i) $G_A$ is prime, and
- (ii) for every $B$ such that $A \subset B \subseteq V$, $G_B$ is reducible.
In Figure 1, it is easy to find that is prime since there is no clique separator in . However, is reducible because of a clique separator in .
Figure 1.
is a prime graph and is a reducible graph.
Definition 2
([14]). A proper decomposition $(A, B, S)$ of an undirected graph $G$ is said to form a prime decomposition if $G_{A \cup S}$ and $G_{B \cup S}$ are prime, or if they can be recursively decomposed into pairwise different maximal prime subgraphs of $G$.
In particular, $G$ is decomposable if $G_{A \cup S}$ and $G_{B \cup S}$ are complete, or if they are both decomposable subgraphs of $G$. Note that the prime decomposition of arbitrary undirected graphs is a generalization of that of chordal graphs. For instance, in Figure 2, G is a non-decomposable undirected graph with , which involves two maximal prime subgraphs and , with and , respectively, and a clique separator . It is obvious that forms a prime decomposition of G. Additionally, we find that is complete since all its pairs of vertices are joined, while is incomplete because the vertex pairs b, d and c, e are not joined.
Figure 2.
A prime decomposition for an undirected graph G.
It is worthwhile to point out that all the maximal prime subgraphs of an undirected graph can form a perfect sequence in a certain way. If there exists a proper decomposition of an undirected graph G, then G admits a perfect sequence of maximal prime subgraphs, so that for each , there exists some , and we have
where are clique separators. Specifically, G is decomposable if all its maximal prime subgraphs are complete (cliques).
In a PGM, a vertex v denotes a random variable , which takes values in a space . Let be a p-dimensional random vector on some product space with P or representing its distribution. All the concerned distributions in the present paper are assumed to be positive and closed under marginalization and conditioning with respect to the type of a joint distribution family. For the sake of simplicity, we use to represent the set of all positive distributions over X. For , will denote the marginal distribution of and the conditional distribution of given .
Let be the set of undirected graphs with fixed vertex set . A probability distribution of a random graph G, which takes values in , is said to be a law, denoted by . Further, define to be the set of undirected graphs for which is a prime decomposition.
2.2. Independence Model and Collapsibility
Given a finite set N, for , an independence model, denoted by , is the set of triplets of the form , which are termed as conditional independence statements. A graphical independence model is an independence model induced by a graph. For a graph , the graphical independence model of G can be defined as
Obviously, is the set of triples , encoding its global Markov property over G.
It should be pointed out that conditional independence in a statistical model [15,16] shares the same properties as graph separation in [2]; i.e., for a graphical independence model , it has the following properties:
- (triviality) for all disjoint $A, B, C \subseteq N$, $\langle A, B \mid C \rangle \in \mathcal{I}$ whenever $A = \emptyset$ or $B = \emptyset$;
- (symmetry) if $\langle A, B \mid C \rangle \in \mathcal{I}$, then $\langle B, A \mid C \rangle \in \mathcal{I}$;
- (decomposition) if $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \mid C \rangle \in \mathcal{I}$;
- (weak union) if $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$;
- (contraction) if $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$ and $\langle A, D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$.

In particular, the following intersection property holds when $B$ and $D$ are disjoint.

If $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$ and $\langle A, D \mid C \cup B \rangle \in \mathcal{I}$, then $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$.
Further, a graphical independence model has a natural projection operation on that
It is worthwhile to point out that , where is the independence model induced by the induced subgraph .
Definition 3 (CI-collapsibility).
Let G be a fixed undirected graph in . For , is conditional independence collapsible (CI-collapsible) onto D if .
CI-collapsibility reflects the consistency of the conditional independence relations induced by with those induced by G but constrained to D.
We say a distribution P is Markov with respect to G if for , it holds that
where represents the assertion that is independent of given under P.
In order to ensure that various distributions, and the distributions of statistical quantities, are Markov with respect to G, we are now in a position to review graphical models within the framework of undirected graphs. A graphical model, denoted by , is a statistical model such that
For the Markov distribution family , we say that it is faithful to G if there exists a distribution such that , where
All graphical models considered throughout this paper are assumed to be faithful to G. This assumption is called the “Faithfulness Assumption” [17]. In fact, it is a mild assumption, since Gaussian distribution families and multinomial distribution families satisfy it.
Moreover, a statistical model also admits a natural projection operation on , denoted by , which is defined as follows:
Generally, is not equal to , but it is obviously shown that .
Definition 4 (M-collapsibility).
Let G be a fixed undirected graph in . For , is model collapsible (M-collapsible) onto D if .
M-collapsibility indicates that the marginal distribution family is identical to the distribution family induced by .
Theorem 1.
Let G be a fixed undirected graph in and . Then, the following statements are equivalent.
- 1.
- G is graphically collapsible onto D;
- 2.
- is CI-collapsible onto D;
- 3.
- is M-collapsible onto D.
Proof.
See Appendix A. □
Let denote the histories set for each . By Theorem 1, we can obtain the following result.
Proposition 1.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Then, the following statements hold for each .
- 1.
- G is graphically collapsible onto ;
- 2.
- ;
- 3.
- .
Proof.
This follows easily from the definition of collapsibility and Theorem 1. □
3. Structural Markov Graph Laws for Full Bayesian Inference
3.1. Basic Concepts and Properties
We begin with the definition of the structural Markov property of [10].
Definition 5.
A graph law over is structural Markov if
where and is the set of undirected graphs for which is a prime decomposition.
Specifically, if G is decomposable in , Definition 5 reduces to the definition in [10].
The structural Markov property indicates that the structures of different induced subgraphs are conditionally independent when the event happens; see Figure 3 as an illustration.
Figure 3.
A representation of the structural Markov property for non-decomposable undirected graphs: is complete and separates A from B.
Proposition 2.
Let G be a fixed undirected graph in . For any subsets A, B and S of satisfying , if is structural Markov, then
whenever S is complete and separates A from B in G.
Proof.
By Definition 5, the existence of the remaining edges in is independent of those in since S is complete and separates A from B in G. Therefore, we are naturally left with a statement of marginal independence since the term is redundant. Hence, the result follows. □
Proposition 2 indicates that different components of undirected graphs are conditionally independent provided that the corresponding separators are complete. To illustrate this with a detailed investigation, Figure 4 gives a non-decomposable graph G in which separates A from B while is incomplete. We can easily see that the two subgraphs and may share edges in , which makes the existence of the remaining edges in depend on those in . In other words, these dependencies disappear as soon as is complete.
Figure 4.
separates A from B while is incomplete.
This also implies that an arbitrary undirected graph can be written as the graph product of its induced subgraphs:
The structural Markov property can be well-characterized by the above operation.
Proposition 3.
Let π be the density of a graph law with respect to the counting measure on . Suppose that . Then,
- 1.
- and ;
- 2.
- if is structural Markov on , then
Proof.
See Appendix A. □
For any subset , define to be the graph on A such that is complete in C and empty otherwise.
Proposition 4.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. If G has a structural Markov graph law with density π, then π factorizes as
Proof.
See Appendix A. □
3.2. Joint Distribution Law
In this section, we will investigate how the structural Markov laws interact with the hyper Markov laws when they are considered as the joint prior laws.
Hyper Markov laws are motivated by the property that graph decomposition allows one to decompose a prior or posterior distribution into the product of marginal distributions on corresponding maximal prime subgraphs. For a fixed graph , any prior or posterior distribution of is uniquely characterized by its marginals and , taking values in and , respectively.
Following [12], to be specific, a probability distribution of a random distribution , which takes values in , is said to be a law, denoted by . For , the marginal law of will be denoted by and will denote the conditional law of .
Here, we give the definitions of weak and strong hyper Markov properties.
Definition 6
([12], Weak and strong hyper Markov). Suppose that G is a fixed graph in and . Let be a law of θ. We say that is weak hyper Markov over G if
Further, we say that is strong hyper Markov over G if
Let X be a random sample from . The conditional independence property of the joint distribution law for the pair on can be characterized as follows.
Proposition 5.
Let G be a fixed undirected graph in with a prime decomposition . X is a random sample from . Then, the joint distribution law of satisfies:
- 1.
- if is weak hyper Markov with respect to G, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
It is worth mentioning that the hyper Markov property does not hold for the cases where separators are not complete. For instance, the graph in Figure 1 is incomplete, and we do not have or . However, it is worthwhile to point out that the corresponding pairwise Markov property or holds under P if P is Markov with respect to .
Let be the family of Markov distributions over and the family of hyper Markov laws over . For the sake of discussion, we revisit the notion of hyper compatibility, first proposed in [10], to characterize families of laws for every graph.
Definition 7 (Hyper compatibility).
Let be the laws of with respect to G and on , respectively. For , we say is hyper compatible on if whenever are collapsible onto A and .
Here, is always assumed to be hyper compatible over . Based on the arguments above, some significant conditional independence properties of such a joint law can be investigated as follows.
Proposition 6.
Suppose that G has a graph law over . If θ has a law from a hyper compatible family over , then
Proof.
Suppose that . Since G is collapsible onto both and , by hyper compatibility, can only take values in for any . □
Theorem 2.
Suppose that is structural Markov over . For any ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
Theorem 2 reflects the conditional independence properties at both the parameter and the structural level.
Further, for any , let X be a random sample from a distribution on . If G is assigned the prior law and is assigned the prior law , then a joint distribution law is thereby created for .
Proposition 7.
Suppose that is structural Markov on . Let X be a random sample from . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
The conditional independence property of any such joint distribution law of can be characterized as follows.
Theorem 3.
Suppose that is structural Markov on . Let X be a random sample from on . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
Theorem 3 reflects that a random sample is determined by both the hyper and the structural parameters, which plays a significant role in full Bayesian inference.
Corollary 1.
Suppose that is structural Markov on . Let X be a random sample from on . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
It can be easily obtained from Theorem 3. □
Corollary 1 can be considered as a generalization of Proposition 5 since G is a random undirected graph on with a prime decomposition . Without loss of generality, when the event happens, i.e., given a graph G with a prime decomposition , we can deduce from Corollary 1 that
3.3. Posterior Updating for Graph Law
Our research in this section aims to identify the structure of models via the Bayesian approach. Based on our results in Section 3.2, in the following, we will use data from a certain distribution to learn the structure of a graph.
We assume that G has a structural Markov graph law over . For , let have a law from a hyper compatible family . Let denote a random sample of n observations from . Focusing on the density of the posterior graph law with its conjugate prior graph law , the full Bayesian posterior graph law is
where Z is a normalizing constant and is a hyperparameter that characterizes the law of . In general, it is hard to estimate the structure of a graph G directly, since the hyperparameter is unknown.
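Over a finite candidate set of graphs, the posterior display above can be normalized explicitly. The following sketch assumes user-supplied log-prior and log-marginal-likelihood functions; all names here are illustrative, not notation from the paper.

```python
import math

def posterior_graph_law(graphs, log_prior, log_marginal_lik, data):
    """Unnormalized log posterior log pi(G) + log p(data | G) per graph,
    then normalized over the candidate set (the constant Z in the text)."""
    logs = {G: log_prior(G) + log_marginal_lik(data, G) for G in graphs}
    mx = max(logs.values())                      # stabilize the exponentials
    w = {G: math.exp(l - mx) for G, l in logs.items()}
    Z = sum(w.values())
    return {G: x / Z for G, x in w.items()}
```

For example, with a flat prior and marginal likelihoods in ratio 3 : 1 over two candidate graphs, the posterior is 0.75 versus 0.25.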
In the following, we investigate the properties of structural Markov laws when used as priors for models.
Proposition 8.
If the prior graph law is structural Markov on , then the posterior graph law, obtained by conditioning on data , is structural Markov on .
Proof.
By the conditional independence and Theorem 3, we can easily find that
□
Proposition 9.
Assume that the prior graph law is structural Markov and is strong hyper Markov on . Then, the following properties hold:
- 1.
- The posterior graph law obtained by conditioning on data is structural Markov with respect to ;
- 2.
- The marginal data distribution of is Markov with respect to ;
- 3.
- The posterior law of θ conditioning on is Markov with respect to .
Proof.
By the conditional independence and Theorem 3, we have
This implies (i).
To prove (ii), by the conditional independence and Theorem 3, we have
In particular, if G is given from , then
From Theorem 3, we have
which implies (iii). □
Our Bayesian approaches call for a strong hyper Markov prior law on with respect to . By Proposition 9, the posterior law of , given G, has a density ℓ of the following form:
where is the set of maximal prime subgraphs of G and is the set of corresponding clique separators.
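A density of this prime-components-over-separators form can be evaluated by summing log marginals; a minimal sketch, where log_margin stands for a user-supplied log marginal density on a vertex subset (the name is ours):

```python
def factorized_log_density(log_margin, primes, separators):
    """Log of a Markov density that factorizes as the product of marginals
    over maximal prime subgraphs divided by those over clique separators."""
    return (sum(log_margin(P) for P in primes)
            - sum(log_margin(S) for S in separators))
```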
If is structural Markov and is strong hyper Markov with respect to G, then the posterior graph law of G will be given by
It is worthwhile to point out that (3) indicates that the posterior graph law of G preserves the structural Markov property under hyper compatible laws. This result coincides with Proposition 8. Further, the updating may be performed locally by (3), which implies that the posterior graph law on each maximal prime subgraph of G depends only on the posterior of the hyper compatible law on that maximal prime subgraph.
4. Two Special Cases
4.1. Graphical Gaussian Models and the Inverse Wishart Law
A graphical Gaussian model is defined by a p-dimensional multivariate Gaussian distribution with the expected value and covariance matrix , i.e.,
For simplicity, we assume that the model has zero mean in the following. Define to be the precision matrix of G, where
where denotes the set of positive definite matrices. For any matrix , will denote the matrix obtained by . It has been shown that the global, local and pairwise Markov properties are equivalent in graphical Gaussian models; see [2]. We therefore conclude that the graphical Gaussian distribution P is Markov with respect to G if and only if
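namely, the standard pairwise condition that the precision matrix has a zero entry for every non-adjacent pair of vertices. This zero-pattern check can be sketched as follows (assuming NumPy, with vertices indexed 0, ..., p-1; the function name is ours):

```python
import numpy as np

def is_markov_precision(K, edges, tol=1e-10):
    """Check the pairwise Markov condition for a graphical Gaussian model:
    K[u, v] == 0 for every non-adjacent pair {u, v}."""
    p = K.shape[0]
    E = {frozenset(e) for e in edges}
    return all(
        abs(K[u, v]) < tol
        for u in range(p) for v in range(u + 1, p)
        if frozenset((u, v)) not in E
    )
```

For a tridiagonal precision matrix, the check passes for the chain graph 0-1-2 but fails once the edge (1, 2) is dropped.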
Let be observations of sample matrix , a random sample of size n from the graphical Gaussian distribution , and let denote the observed sum-of-products matrix. Then, for any ,
where is the cardinality of U, and is the determinant of . It is similar for , .
The inverse Wishart distribution is also termed the inverse Wishart law, denoted by . It is used as the prior for the graphical Gaussian distribution . Conditioning on (4), has a hyper inverse Wishart prior law, denoted by . The marginal density is of the form
It was shown in [12] that the hyper inverse Wishart law satisfies the strong hyper Markov property, which allows us to compute the posterior updating of via the margins on the maximal prime subgraphs of the graph G. That is, for any ,
with the density
We conclude that
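The conjugacy behind this updating can be sketched numerically. Assuming the standard parameterization of the inverse Wishart law by degrees of freedom delta and scale matrix Phi (names ours), the posterior after n zero-mean Gaussian observations is obtained by adding n to delta and the observed sum-of-products matrix to Phi:

```python
import numpy as np

def iw_posterior_update(delta, Phi, X):
    """Conjugate update for a zero-mean Gaussian with inverse Wishart prior
    IW(delta, Phi): the posterior is IW(delta + n, Phi + S) with S = X'X."""
    n = X.shape[0]
    S = X.T @ X              # observed sum-of-products matrix
    return delta + n, Phi + S
```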
4.2. Multinomial Models and the Dirichlet Law
Suppose that all the variables are discrete-valued. Let denote the contingency table , where is a finite set for each . An element is referred to as a cell of this table. Accordingly, takes values in the finite set . Indeed, I is a discrete-valued random vector whose distribution is assumed to be Markov with respect to G. Then,
where and .
Let be observations of , a random sample from . is an matrix where each row denotes an observation of I. The distribution of is the multinomial distribution with index n and probabilities , denoted by . Then, the likelihood function has the form
where , , and counts the number of elements of from the marginal cell . It is similar for , .
The Dirichlet distribution is also termed the Dirichlet law, denoted by , where are hyperparameters. It is used as the prior for the multinomial distribution . The Dirichlet law satisfies the strong hyper Markov property; see [12]. Thus, we have
and then the posterior law can be written as
Based on the above arguments, we can conclude that . Further, if we assign a prior law of form (1) for G, by Proposition 8, the posterior graph law of G, given data obtained from , has density in the following way:
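The Dirichlet-multinomial updating used in this section amounts to adding observed cell counts to the hyperparameters. A minimal sketch, with alpha as our name for the hyperparameter vector indexed by cells:

```python
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Conjugate update for multinomial data with a Dirichlet prior:
    each cell's hyperparameter is incremented by its observed count."""
    counts = Counter(observations)
    return {cell: a + counts.get(cell, 0) for cell, a in alpha.items()}
```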
4.3. An Example on Simulated Data
4.3.1. Dataset Description
In this section, we present the results for one application to a real dataset. We analyze a labor force survey dataset, which is available from [18]. This dataset is used to analyze the multivariate associations among income, education and family background on 1002 males in the American labor force. Here, we briefly describe these variables in this dataset.
- inc: The income of the respondents.
- deg: Respondents’ highest educational degree.
- chi: The number of children of the respondents.
- pin: The income of the respondents’ parents.
- pde: The highest educational degree of respondents’ parents.
- pch: The number of children of respondents’ parents.
- age: Respondents’ age in years.
4.3.2. Experiments and Results
We consider the posterior graph law of G in Equation (5). A Gibbs sampler can then be formed by using the following conditional posteriors:
- ;
- .
For the prior graph law of G, following Example 3.5 in [10], we consider an Erdős–Rényi random graph model prior on each edge with
where the parameter is the prior probability that an edge is present. In this case, we set . We use the inverse Wishart law as the prior for the covariance matrix over the graph G, with and taken to be the identity matrix.
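Under this Erdős–Rényi prior, the log prior of a graph depends only on its number of edges, since each of the C(n, 2) possible edges is present independently with probability p. A sketch (the function name is ours):

```python
import math

def er_log_prior(n_vertices, n_edges, p=0.5):
    """Log density of the Erdos-Renyi graph prior: each of the C(n, 2)
    possible edges is present independently with probability p."""
    m = n_vertices * (n_vertices - 1) // 2
    return n_edges * math.log(p) + (m - n_edges) * math.log(1 - p)
```

Note that for p = 1/2 every graph on a fixed vertex set receives the same prior probability, recovering the uniform graph law.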
By using the function above, we simulate observations. The experiments are implemented in R, running for 5000 iterations with the first 2500 discarded as burn-in.
The experimental results on this dataset are displayed in Figure 5 and Figure 6. The estimated posterior probabilities of the size of the graphs are shown on the left of Figure 5, which shows that our algorithm mainly visits graphs with between nine and twelve edges. The figure on the right exhibits the estimated posterior probabilities of all visited graphs of various sizes, and shows that more than 15 different graphs are visited. Figure 6 displays the selected graph with the highest posterior probability among the visited graphs.
Figure 5.
The figure on the left shows the estimated posterior probabilities of the size of the graphs. The figure on the right shows the estimated posterior probabilities of all visited graphs.
Figure 6.
The inferred graph with the highest posterior probability.
The results also suggest that the respondents’ income is related to their own education and age. They also show that the income of respondents’ parents is related only to their education.
5. Computations
In this section, we aim to design an algorithm for drawing samples of interest, such as decomposable undirected graphs, from the structural Markov graph law on .
5.1. Ratio for Graph Law
Model comparison plays an important role in statistical analysis, especially in computing the ratio of the distributions of variables in different states. We treat the graph itself as a random variable and construct the ratio between two undirected graphs and G, where is obtained from G by removing or adding one edge. This ratio can be written as
The main objective in what follows is to greatly simplify this computation under the assumption that the graph law is structural Markov on . For convenience, we define and for and .
Figure 7 shows a special case in which is obtained from G by removing the edge , which lies in exactly one prime component of G.
Figure 7.
is obtained from G by removing the edge .
Proposition 10.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Suppose that is obtained from G by removing the edge . Then,
- 1.
- if u and v are contained in exactly one maximal prime subgraph of G, then
- 2.
- if u and v are contained in two neighboring maximal prime subgraphs of G, then where in G.
Proof.
See Appendix A. □
Figure 8 shows a case in which is obtained from G by adding the edge between two neighboring prime components and of G such that and .
Figure 8.
is obtained from G by adding the edge .
Proposition 11.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Suppose that is obtained from G by adding the edge . Then,
- 1.
- if u and v are contained in exactly one incomplete prime subgraph , then
- 2.
- if and are the two distinct maximal prime subgraphs of G, then there are some prime components such that where .
Proof.
See Appendix A. □
In particular, if G is a decomposable graph in , then we have the following results.
Lemma 1
([19]). Let G be a decomposable graph in and suppose that G has a perfect sequence of cliques . Suppose that is decomposable and obtained from G by removing or adding one edge . Then,
- 1.
- If is obtained from G by removing the edge , then u and v must belong to a clique of G;
- 2.
- If is obtained from G by adding the edge , then there exist two different cliques and such that is complete and separates and .
Corollary 2.
Let G be a decomposable graph in and suppose that G has a perfect sequence of cliques . Suppose that is decomposable and obtained from G by removing or adding one edge . Then,
- 1.
- If is obtained from G by removing the edge within , then where , and ;
- 2.
- If is obtained from G by adding the edge such that and , then the ratio is where , , and .
Proof.
We first give the proof of 1. If and , by Lemma 1, the deleted edge must belong to a single clique . It is worthwhile to point out that all of , and are cliques in G and . Then,
which, combined with (A19), gives the result. The proof of 2 is similar. □
5.2. Sampling Decomposable Graphs from Structural Markov Graph Laws
We now take a random graph on as the initial state and design a Markov chain Monte Carlo (MCMC) sampler for sampling from a structural Markov graph law. This technique relies on small perturbations of the edge set of a graph, in which a single edge is removed or added at each step.
A reversible jump MCMC sampler for posterior sampling of decomposable graphical models, relying on single-edge additions and removals, was introduced in [8]. We now adapt this methodology to sampling from a structural Markov law.
Let G denote the current state and the proposed state, where is obtained from G by removing or adding one edge; the chain moves to the proposed state with probability , which ensures detailed balance with respect to the target distribution . The Metropolis–Hastings acceptance ratio can then be written as
In fact, Equation (9) is not the only choice yielding detailed balance. In particular, in order to reduce numerical error when the ratio is large, we can make the following adjustment:
In general, we take the proposal kernel to be symmetric, that is, . Consequently, the acceptance probability depends only on the relative densities, so we only need to compute
We select a pair of vertices uniformly at random. If the corresponding edge is present in G, it is removed; otherwise, it is added. Let denote the graph obtained from G by adding the edge , and similarly for . Let denote the state of G at time t and let be the set of decomposable undirected graphs with vertex set . Starting from an Erdös–Rényi (ER) random graph as the initial state, a Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law can be constructed as the following Algorithm 1:
| Algorithm 1 A Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law. |
| Input: An ER random graph . |
| Output: A set of decomposable graphs from . |
| Set |
| for do |
|  if and then |
|   set with probability |
|  else if and then |
|   set with probability |
|  else |
|  end if |
| end for |
| return A set of decomposable graphs. |
Based on our results in Section 5.1, the acceptance probability in this algorithm can be obtained by evaluating only the marginal likelihoods of the corresponding subsets of at each step when sampling from a posterior graph law as in Proposition 8 or Proposition 9.
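As an illustration only, the single-edge move of Algorithm 1 might be sketched in Python as below. The function names and the uniform-pair proposal are our own; `log_density` stands for the (unnormalized) log graph law, and decomposability is checked via chordality, since an undirected graph is decomposable if and only if it is chordal.

```python
import math
import random

def is_chordal(vertices, edges):
    # Chordality test: maximum-cardinality search (MCS) followed by the
    # Tarjan-Yannakakis perfect-elimination check.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    weight = {v: 0 for v in vertices}
    order, numbered = [], set()
    for _ in vertices:
        v = max((u for u in vertices if u not in numbered),
                key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for w in adj[v] - numbered:
            weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = {w for w in adj[v] if pos[w] < pos[v]}
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if not (earlier - {u}) <= adj[u]:
                return False
    return True

def mh_step(vertices, edges, log_density):
    # One Metropolis-Hastings move: toggle a uniformly chosen vertex
    # pair, reject immediately if the proposal leaves the space of
    # decomposable graphs, otherwise accept with probability
    # min(1, pi(G') / pi(G)).
    u, v = random.sample(vertices, 2)
    e = (min(u, v), max(u, v))
    proposal = set(edges)
    if e in proposal:
        proposal.remove(e)
    else:
        proposal.add(e)
    if not is_chordal(vertices, proposal):
        return set(edges)
    log_ratio = log_density(proposal) - log_density(edges)
    if math.log(random.random()) < min(0.0, log_ratio):
        return proposal
    return set(edges)
```

Rejected or non-decomposable proposals leave the chain at the current graph, which preserves detailed balance on the set of decomposable graphs.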
6. Conclusions
The main contribution of this paper is to extend the structural Markov properties of [10] to non-decomposable undirected graphs. It is shown that an arbitrary undirected graph can be decomposed into the sum of several prime subgraphs. Based on this prime decomposition of undirected graphs and conditional independence, the structural Markov properties extend naturally to arbitrary undirected graphs.
Then, we propose a full Bayesian method for estimating the structure of a graph. This method requires that the observed data come from a certain distribution. Using our results, we have shown that the posterior updating of a graph law is determined by the margins of the prime components, which greatly simplifies the computation of the posterior graph law.
It should be pointed out that our research focuses only on undirected graphs. However, other classes of graphs, such as chain graphs or ancestral graphs, may have further interesting and valuable properties reflecting the conditional independence of the graph structure in the problem of model determination. We will study them in detail in future work.
Author Contributions
Methodology, Y.S.; Validation, X.K.; Writing—review & editing, Y.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2022D01C406), the National Natural Science Foundation of China (Grant Nos. 11861064, 11726629, 11726630) and the National Key Laboratory for Applied Statistics of MOE, Northeast Normal University (Grant No. 130028906).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proofs of Some Main Theorems and Propositions
Proof of Theorem 1.
The equivalence of (i) and (ii) follows from Corollary 2.5 of [20], so it suffices to show that (ii) ⇔ (iii). We first give the proof of (ii) ⇒ (iii). Firstly, we know that
By the definition of , we define
For , let . By CI-collapsibility, we have
So, (A1) and (A2) imply that . From this, it follows that . Hence, the result follows by . Conversely, under the “Faithfulness Assumption”, there is a such that , implying . By M-collapsibility, we know that , which gives . Hence, we have . The result follows since it is easy to obtain that . □
Proof of Proposition 3.
By the graph product operation, since S is complete and separates A from B, for any , is a prime decomposition of the graph with vertex set , and so is . This implies that (i) holds. As for (ii), if is structural Markov on , then
and similarly we can have the same result for . From (i), we have
and so is . Thus, the results follow from the above arguments. □
Proof of Proposition 4.
Let , , . Since forms a prime decomposition of , for each , we have
For , since is complete, then
Whence we have
By Proposition 3, we can obtain
Equation (1) can then be obtained recursively. □
Proof of Proposition 5.
Suppose that forms a prime decomposition of G. Since G is graphically collapsible onto , by Theorem 1, takes values only in . This implies that can in fact be obtained from . Then, we obtain
From (A3), we deduce
By the definition of the hyper Markov property, it follows that . Combining this with (A4) and the axioms of conditional independence gives
which implies that
Proof of Theorem 2.
The weak hyper Markov property states that
Since , we have . Thus, from (A7) we deduce
Again, by Proposition 6 and the structural Markov property, we have
Then, we have
Proof of Proposition 7.
By Theorem 2, we obtain
From (A11), we deduce
Whence we have
By conditional independence property and Theorem 2,
Thus, we have
which, combined with (A12), gives the result. The proof of the strong case is similar and is omitted. □
Proof of Theorem 3.
Since X is a random sample from and is hyper Markov with respect to G, by Proposition 5,
Since , we have . Then, from (A14), we find that
Additionally, from the structural Markov property and Proposition 7, we have
From (A17), we deduce
Proof of Proposition 10.
We first give the proof of (i). Suppose that . If is structural Markov on , then we have
The proof of (ii) is given as follows. It is obvious that is prime in . Consequently, has a perfect sequence of maximal prime subgraphs , and Equation (7) then follows by using (i). □
Proof of Proposition 11.
The proof of (i) follows similar steps to that of Proposition 10. To prove (ii), let be the junction tree whose vertices are the maximal prime subgraphs of G; for its construction, see [21]. Since are in two different maximal prime subgraphs of G, connecting and in yields a unique cycle. Without loss of generality, the vertices on this cycle are denoted by , where and are connected by an edge in . Then, it is easy to see that is the set of all maximal prime subgraphs of . So, applying (i), Equation (8) follows. □
References
- Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2009.
- Lauritzen, S.L. Graphical Models; Oxford University Press: New York, NY, USA, 1996.
- Richardson, T. A factorization criterion for acyclic directed mixed graphs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009.
- Richardson, T.; Spirtes, P. Ancestral graph Markov models. Ann. Stat. 2002, 30, 962–1030.
- Iqbal, K.; Buijsse, B.; Wirth, J. Gaussian graphical models identify networks of dietary intake in a German adult population. J. Nutr. 2016, 146, 646–652.
- Larranaga, P.; Moral, S. Probabilistic graphical models in artificial intelligence. Appl. Soft Comput. 2011, 11, 1511–1528.
- Verzilli, C.J.; Stallard, N.; Whittaker, J.C. Bayesian graphical models for genomewide association studies. Am. J. Hum. Genet. 2006, 79, 100–112.
- Giudici, P.; Green, P.J. Decomposable graphical Gaussian model determination. Biometrika 1999, 86, 785–801.
- Madigan, D.; Raftery, A.E. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Am. Stat. Assoc. 1994, 89, 1535–1546.
- Byrne, S.; Dawid, A.P. Structural Markov graph laws for Bayesian model uncertainty. Ann. Stat. 2015, 43, 1647–1681.
- Li, B.C. Support condition for equivalent characterization of graph laws. Sci. Sin. Math. 2022, 52, 467–474.
- Dawid, A.P.; Lauritzen, S.L. Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 1993, 21, 1272–1317.
- Green, P.J.; Thomas, A. A structural Markov property for decomposable graph laws that allows control of clique intersections. Biometrika 2018, 105, 19–29.
- Leimer, H.G. Optimal decomposition by clique separators. Discret. Math. 1993, 113, 99–123.
- Dawid, A.P. Conditional independence in statistical theory. J. R. Stat. Soc. B 1979, 41, 1–15.
- Dawid, A.P. Conditional independence for statistical operations. Ann. Stat. 1980, 8, 598–617.
- Meek, C. Strong Completeness and Faithfulness in Bayesian Networks; Morgan Kaufmann: San Francisco, CA, USA, 1995.
- Hoff, P.D. Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 2007, 1, 265–283.
- Frydenberg, M.; Lauritzen, S.L. Decomposition of maximum likelihood in mixed graphical interaction models. Biometrika 1989, 76, 539–555.
- Asmussen, S.; Edwards, D. Collapsibility and response variables in contingency tables. Biometrika 1983, 70, 567–578.
- Wang, X.F.; Guo, J.H. Junction trees of general graphs. Front. Math. China 2008, 3, 399–413.