Abstract
This paper generalizes the structural Markov properties for undirected decomposable graphs to arbitrary undirected graphs. This allows us to exploit the conditional independence properties of joint prior laws to analyze and compare multiple graphical structures, while taking advantage of common conditional independence constraints. This work provides theoretical support for full Bayesian posterior updating of the structure of a graph using data from a certain distribution. We further investigate ratios of graph laws in order to simplify the acceptance probability of Metropolis–Hastings sampling algorithms.
MSC:
62H05; 62D05
1. Introduction
A probabilistic graphical model (PGM) or a structured probabilistic model (SPM) is a statistical model that consists of a graph and a distribution family for which the graph encodes the conditional independence information between random variables. Such models are always associated with independence models, which are the sets of conditional independence constraints encoded by graphs via the global Markov property. They arise naturally in multivariate analysis and provide versatility and convenience in analyzing complex, large-scale data.
It is known that different classes of graphs with different independence interpretations have been developed over the past decades; the reader can refer to [1,2,3,4] for details. One of the most important classes of graphs in graphical models is undirected graphs (UGs). Their corresponding Markov models are often known as undirected graphical models or Markov networks [1,2]. These models have found applications in a wide range of areas, such as econometrics, medical science and artificial intelligence [5,6,7]. Our research in this paper concerns the structure determination of these models by Bayesian methods.
The main objective of Bayesian structure learning is to learn the structure of a graph from data. This requires a clear specification of a prior distribution over graphical structures, which is termed a graph law. Statisticians have proposed several approaches to specifying the prior law of a graph. The simplest graph law is the uniform distribution of [8]. The Erdős–Rényi random graph model has also been used as a graph law in [9]. Furthermore, a characterization of graph laws in exponential family form was proposed in [10]. However, simplifying this prior law is a significant task, especially for the posterior inference of graphical structures. In view of this, the structural Markov property was first proposed for the purpose of characterizing the conditional independence of the structure of a graph. The structural Markov properties require that the structures of distinct components of graphs be conditionally independent given the existence of a separating component; see [10]. These properties reflect conditional independence at the structural level. It has been proved that a graph law is structural Markov if and only if it is a member of the clique exponential family, given that the support is the set of decomposable undirected graphs; see [10]. Further, a weaker support condition for an equivalent characterization of graph laws is given via a closure operation on graphical structures in [11].
Indeed, the structural Markov property is an extension of the hyper Markov property, which was proposed in [12] and reflects the global Markov property at the parameter level. These hyper Markov properties are used to describe the conditional independence properties of a distribution of random variables or statistical quantities in graphical models. The hyper Markov laws arise naturally as sampling distributions of maximum likelihood estimators and as prior or posterior distributions in Bayesian inference.
Recently, a weaker version of the structural Markov properties for decomposable graphs was introduced in [13], where the authors provided an analogous clique-separator factorization for the graph law. These weakly structural Markov properties require that the separator is complete. It has been shown that this provides a more flexible family of graph prior laws to use in full Bayesian posterior updating.
It should be pointed out that the work in [8,10,13] focuses only on decomposable graphical models. However, based on conditional independence and graphical separation, the structural Markov properties can be extended to non-decomposable undirected graphical models. The aim of this paper is to fill this gap in the field of graphical models. Further, we focus on a full Bayesian method for the posterior updating of graph laws via observed data from a certain distribution, and we prove that this full Bayesian posterior of the graph law is feasible and reasonable. Finally, as examples, we illustrate our theory with detailed investigations of two significant cases, based on graphical Gaussian models and multinomial models, respectively.
The outline of this paper is as follows. In Section 2, we introduce the terminology and concepts used in this paper. Section 3 first investigates the structural Markov properties for non-decomposable graphs, and then exploits the joint prior laws of a random sample distribution for full Bayesian inference. Section 4 gives two examples, based on the inverse Wishart distribution and the Dirichlet distribution, to study the posterior updating of graph laws in detail. Further, we discuss some computational details for structural Markov graph laws in Section 5. Finally, Section 6 concludes the paper.
2. Preliminaries
For terms and symbols, we follow [10,12], since much of the theoretical framework of this paper is constructed and developed based on them. Several concrete notions and terminologies used in this paper are given in the following for clarity and consistency.
2.1. Graphical Terminologies and Notation
A graph $G = (V, E)$ consists of a finite set of vertices $V$ and a set of edges $E$. An edge of $G$ is said to be undirected if it is an unordered pair. A graph $G$ is said to be an undirected graph if all its edges are undirected. Unless otherwise specified, $G$ is assumed to be undirected, simple and connected throughout the paper.
For $A \subseteq V$, the induced subgraph of $G$ on $A$ will be denoted by $G_A = (A, E_A)$, where $E_A = E \cap (A \times A)$. All subgraphs in this paper are induced subgraphs. A set $A$ is complete (or a clique) if any two of its distinct vertices are adjacent, i.e., $\{u, v\} \in E$ for all distinct $u, v \in A$. A graph is a clique if its vertex set is a clique. A clique $C$ is a maximal clique if $C'$ is incomplete for any superset $C' \supset C$. Two vertices $u$ and $v$ are neighbors if $\{u, v\} \in E$. For $A \subseteq V$, the boundary of $A$ is the set of vertices in $V \setminus A$ that are neighbors of vertices in $A$. $G$ is collapsible onto $A$ if every connected component of $G_{V \setminus A}$ has a complete boundary in $G$.
For any subsets $A$, $B$ and $C$ of $V$, we say that $C$ separates $A$ from $B$, and write $\langle A, B \mid C \rangle_G$, if every path in $G$ between some $a \in A$ and some $b \in B$ contains a vertex in $C$. Usually, we call $C$ a separator of $A$ and $B$. Separators that are cliques are called clique separators.
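To make these notions concrete, the separation and clique-separator tests can be sketched in a few lines of Python; the adjacency-dictionary representation and the function names below are our own choices, not notation from the paper.

```python
from collections import deque

def separates(adj, A, B, C):
    """True if C separates A from B: every path between A and B meets C.

    adj: dict mapping each vertex to the set of its neighbours.
    """
    C = set(C)
    A, B = set(A) - C, set(B) - C
    seen, frontier = set(A), deque(A)
    while frontier:                      # BFS in the graph with C removed
        for w in adj[frontier.popleft()] - C:
            if w in B:
                return False             # found an A-B path avoiding C
            if w not in seen:
                seen.add(w)
                frontier.append(w)
    return True

def is_clique(adj, C):
    """True if the vertices of C are pairwise adjacent."""
    C = list(C)
    return all(v in adj[u] for i, u in enumerate(C) for v in C[i + 1:])

def is_clique_separator(adj, A, B, C):
    return is_clique(adj, C) and separates(adj, A, B, C)
```

For the 4-cycle a-b-c-d with chord b-d, the set {b, d} is a clique separator of {a} and {c}, while {b} alone separates nothing, since the path a-d-c avoids it.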
For any disjoint subsets $A$, $B$ and $S$ of $V$, we say that $(A, B, S)$ forms a decomposition of $G$ if (i) $V = A \cup B \cup S$; (ii) $S$ separates $A$ from $B$; and (iii) $S$ is complete, i.e., $S$ is a clique separator in $G$. A decomposition is said to be proper if the sets $A \cup S$ and $B \cup S$ are both proper subsets of $V$.
Definition 1
([14]). Let $G = (V, E)$ be an undirected graph. A graph $G$ is reducible if its vertex set contains a clique separator; otherwise, $G$ is said to be prime. For example, $G$ is prime if $G$ is a clique, while $G$ is reducible if $G$ is a disconnected graph. An induced subgraph $G_A$ is a maximal prime subgraph of $G$ if it satisfies
- (i) $G_A$ is prime, and
- (ii) for every $B$ such that $A \subset B \subseteq V$, $G_B$ is reducible.
In Figure 1, it is easy to find that is prime since there is no clique separator in . However, is reducible because of a clique separator in .
Figure 1.
is a prime graph and is a reducible graph.
Definition 2
([14]). A proper decomposition $(A, B, S)$ of an undirected graph $G$ is said to form a prime decomposition if $G_{A \cup S}$ and $G_{B \cup S}$ are prime, or if they can be recursively decomposed into pairwise different maximal prime subgraphs of $G$.
In particular, $G$ is decomposable if $G_{A \cup S}$ and $G_{B \cup S}$ are complete, or if they are both decomposable subgraphs of $G$. Note that the prime decomposition of arbitrary undirected graphs is a generalization of that of chordal graphs. For instance, in Figure 2, G is a non-decomposable undirected graph with , which involves two maximal prime subgraphs and , with and , respectively, and a clique separator . It is obvious that forms a prime decomposition of G. Additionally, we find that is complete since all its pairs of vertices are joined, while is incomplete because the vertex pairs b, d and c, e are not joined.
Figure 2.
A prime decomposition for an undirected graph G.
It is worthwhile to point out that all the maximal prime subgraphs of an undirected graph can form a perfect sequence in a certain way. If there exists a proper decomposition of an undirected graph G, then G admits a perfect sequence of maximal prime subgraphs, so that for each , there exists some , and we have
where are clique separators. Specifically, G is decomposable if all its maximal prime subgraphs are complete (cliques).
In a PGM, a vertex v denotes a random variable , which takes values in a space . Let be a p-dimensional random vector on some product space with P or representing its distribution. All the concerned distributions in the present paper are assumed to be positive and closed under marginalization and conditioning with respect to the type of a joint distribution family. For the sake of simplicity, we use to represent the set of all positive distributions over X. For , will denote the marginal distribution of and the conditional distribution of given .
Let be the set of undirected graphs with fixed vertex set . A probability distribution of a random graph G, which takes values in , is said to be a law, denoted by . Further, define to be the set of undirected graphs for which is a prime decomposition.
2.2. Independence Model and Collapsibility
Given a finite set N, for , an independence model, denoted by , is the set of triplets of the form , which are termed as conditional independence statements. A graphical independence model is an independence model induced by a graph. For a graph , the graphical independence model of G can be defined as
Obviously, is the set of triples , encoding its global Markov property over G.
It should be pointed out that conditional independence in a statistical model [15,16] shares the same properties as graph separation in [2]; i.e., for a graphical independence model , it has the following properties:
- (triviality) for all disjoint $A, B, C \subseteq N$, $\langle A, B \mid C \rangle \in \mathcal{I}$ whenever $A = \emptyset$ or $B = \emptyset$;
- (symmetry) if $\langle A, B \mid C \rangle \in \mathcal{I}$, then $\langle B, A \mid C \rangle \in \mathcal{I}$;
- (decomposition) if $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \mid C \rangle \in \mathcal{I}$;
- (weak union) if $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$;
- (contraction) if $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$ and $\langle A, D \mid C \rangle \in \mathcal{I}$, then $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$.

In particular, the following intersection property holds when $B$ and $D$ are disjoint.

If $\langle A, B \mid C \cup D \rangle \in \mathcal{I}$ and $\langle A, D \mid C \cup B \rangle \in \mathcal{I}$, then $\langle A, B \cup D \mid C \rangle \in \mathcal{I}$.
Further, a graphical independence model has a natural projection operation on that
It is worthwhile to point out that , where is the independence model induced by the induced subgraph .
Definition 3 (CI-collapsibility).
Let G be a fixed undirected graph in . For , is conditional independence collapsible (CI-collapsible) onto D if .
CI-collapsibility reflects the consistency of the conditional independence relations induced by with those induced by G but constrained to D.
We say a distribution P is Markov with respect to G if for , it holds that
where represents the assertion that is independent of given under P.
In order to ensure that various distributions, and the distributions of statistical quantities, are Markov with respect to G, we are now in a position to review graphical models within the framework of undirected graphs. A graphical model, denoted by , is a statistical model such that
For the Markov distribution family , we say that it is faithful to G if there exists a distribution such that , where
All graphical models considered throughout this paper are assumed to be faithful to G. This assumption is called the “Faithfulness Assumption” [17]. In fact, it is a mild assumption, since Gaussian distribution families and multinomial distribution families satisfy it.
Moreover, a statistical model also admits a natural projection operation on , denoted by , which is defined as follows:
Generally, is not equal to , but it is obviously shown that .
Definition 4 (M-collapsibility).
Let G be a fixed undirected graph in . For , is model collapsible (M-collapsible) onto D if .
M-collapsibility indicates that the marginal distribution family is identical to the distribution family induced by .
Theorem 1.
Let G be a fixed undirected graph in and . Then, the following statements are equivalent.
- 1.
- G is graphically collapsible onto D;
- 2.
- is CI-collapsible onto D;
- 3.
- is M-collapsible onto D.
Proof.
See Appendix A. □
Let denote the histories set for each . By Theorem 1, we can obtain the following result.
Proposition 1.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Then, the following statements hold for each .
- 1.
- G is graphically collapsible onto ;
- 2.
- ;
- 3.
- .
Proof.
This follows easily from the definition of collapsibility and Theorem 1. □
3. Structural Markov Graph Laws for Full Bayesian Inference
3.1. Basic Concepts and Properties
We begin with the definition of the structural Markov property of [10].
Definition 5.
A graph law over is structural Markov if
where and is the set of undirected graphs for which is a prime decomposition.
Specifically, if G is decomposable in , Definition 5 reduces to the definition in [10].
The structural Markov property indicates that the structures of different induced subgraphs are conditionally independent when the event happens; see Figure 3 as an illustration.
Figure 3.
A representation of the structural Markov property for non-decomposable undirected graphs: is complete and separates A from B.
Proposition 2.
Let G be a fixed undirected graph in . For any subsets A, B and S of satisfying , if is structural Markov, then
whenever S is complete and separates A from B in G.
Proof.
By Definition 5, the existence of the remaining edges in is independent of those in since S is complete and separates A from B in G. Therefore, we are naturally left with a statement of marginal independence since the term is redundant. Hence, the result follows. □
Proposition 2 indicates that different components of undirected graphs are conditionally independent provided that the corresponding separators are complete. To illustrate this with a detailed investigation, Figure 4 gives a non-decomposable graph G in which separates A from B while is incomplete. We can easily see that the two subgraphs and may share edges in , which makes the existence of the remaining edges in depend on those in . In other words, these dependencies disappear as soon as is complete.
Figure 4.
separates A from B while is incomplete.
This also implies that an arbitrary undirected graph can be written as the graph product of its induced subgraphs:
The structural Markov property can be well-characterized by the above operation.
Proposition 3.
Let π be the density of a graph law with respect to the counting measure on . Suppose that . Then,
- 1.
- and ;
- 2.
- if is structural Markov on , then
Proof.
See Appendix A. □
For any subset , define to be the graph on A such that is complete in C and empty otherwise.
Proposition 4.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. If G has a structural Markov graph law with density π, then π factorizes as
Proof.
See Appendix A. □
3.2. Joint Distribution Law
In this section, we will investigate how the structural Markov laws interact with the hyper Markov laws when they are considered as the joint prior laws.
Hyper Markov laws are motivated by the property that graph decomposition allows one to decompose a prior or posterior distribution into the product of marginal distributions on corresponding maximal prime subgraphs. For a fixed graph , any prior or posterior distribution of is uniquely characterized by its marginals and , taking values in and , respectively.
Following [12], to be specific, a probability distribution of a random distribution , which takes values in , is said to be a law, denoted by . For , the marginal law of will be denoted by and will denote the conditional law of .
Here, we give the definitions of weak and strong hyper Markov properties.
Definition 6
([12], Weak and strong hyper Markov). Suppose that G is a fixed graph in and . Let be a law of θ. We say that is weak hyper Markov over G if
Further, we say that is strong hyper Markov over G if
Let X be a random sample from . The conditional independence property of the joint distribution law for the pair on can be characterized as follows.
Proposition 5.
Let G be a fixed undirected graph in with a prime decomposition . X is a random sample from . Then, the joint distribution law of satisfies:
- 1.
- if is weak hyper Markov with respect to G, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
It is worth mentioning that the hyper Markov property does not hold for the cases where separators are not complete. For instance, the graph in Figure 1 is incomplete, and we do not have or . However, it is worthwhile to point out that the corresponding pairwise Markov property or holds under P if P is Markov with respect to .
Let be the family of Markov distributions over and the family of hyper Markov laws over . For the sake of discussion, we revisit the notion of hyper compatibility, first proposed in [10], to characterize families of laws for every graph.
Definition 7 (Hyper compatibility).
Let be the laws of with respect to G and on , respectively. For , we say is hyper compatible on if whenever are collapsible onto A and .
Here, is always assumed to be hyper compatible over . Based on the arguments above, some significant conditional independence properties of such a joint law can be investigated as follows.
Proposition 6.
Suppose that G has a graph law over . If θ has a law from a hyper compatible family over , then
Proof.
Suppose that . Since G is collapsible onto both and , by hyper compatibility, can only take values in for any . □
Theorem 2.
Suppose that is structural Markov over . For any ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
Theorem 2 reflects the conditional independence properties at both the parameter and the structural level.
Further, for any , let X be a random sample from a distribution on . If G is assigned the prior law and is assigned the prior law , then a joint distribution law is thereby created for .
Proposition 7.
Suppose that is structural Markov on . Let X be a random sample from . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
The conditional independence property of any such joint distribution law of can be characterized as follows.
Theorem 3.
Suppose that is structural Markov on . Let X be a random sample from on . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
See Appendix A. □
Theorem 3 reflects that a random sample is determined by both the hyper and the structural parameters, which plays a significant role in full Bayesian inference.
Corollary 1.
Suppose that is structural Markov on . Let X be a random sample from on . For ,
- 1.
- if is weak hyper Markov, then
- 2.
- if is strong hyper Markov, then
Proof.
It can be easily obtained from Theorem 3. □
Corollary 1 can be considered as a generalization of Proposition 5 since G is a random undirected graph on with a prime decomposition . Without loss of generality, when the event happens, i.e., given a graph G with a prime decomposition , we can deduce from Corollary 1 that
3.3. Posterior Updating for Graph Law
Our research in this section aims to identify the structure of models via the Bayesian approach. Based on our results in Section 3.2, in the following, we will use data from a certain distribution to learn the structure of a graph.
We assume that G has a structural Markov graph law over . For , let have a law from a hyper compatible family . Let denote a random sample of n observations from . Focusing on the density of the posterior graph law with its conjugate prior graph law , the full Bayesian posterior graph law is
where Z is a normalizing constant and is a hyperparameter that characterizes the law of . In general, it is hard to estimate the structure of a graph G directly, since the hyperparameter is unknown.
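Over a finite candidate set of graphs, the posterior display above can be normalized explicitly. The following sketch assumes user-supplied log-prior and log-marginal-likelihood functions; all names here are illustrative, not notation from the paper.

```python
import math

def posterior_graph_law(graphs, log_prior, log_marginal_lik, data):
    """Unnormalized log posterior log pi(G) + log p(data | G) per graph,
    then normalized over the candidate set (the constant Z in the text)."""
    logs = {G: log_prior(G) + log_marginal_lik(data, G) for G in graphs}
    mx = max(logs.values())                      # stabilize the exponentials
    w = {G: math.exp(l - mx) for G, l in logs.items()}
    Z = sum(w.values())
    return {G: x / Z for G, x in w.items()}
```

For example, with a flat prior and marginal likelihoods in ratio 3 : 1 over two candidate graphs, the posterior is 0.75 versus 0.25.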
In the following, we investigate the properties of structural Markov laws when used as priors for models.
Proposition 8.
If the prior graph law is structural Markov on , then the posterior graph law, obtained by conditioning on data , is structural Markov on .
Proof.
By the conditional independence and Theorem 3, we can easily find that
□
Proposition 9.
Assume that the prior graph law is structural Markov and is strong hyper Markov on . Then, the following properties hold:
- 1.
- The posterior graph law obtained by conditioning on data is structural Markov with respect to ;
- 2.
- The marginal data distribution of is Markov with respect to ;
- 3.
- The posterior law of θ conditioning on is Markov with respect to .
Proof.
By the conditional independence and Theorem 3, we have
This implies (i).
To prove (ii), by the conditional independence and Theorem 3, we have
In particular, if G is given from , then
From Theorem 3, we have
which implies (iii). □
Our Bayesian approaches call for a strong hyper Markov prior law on with respect to . By Proposition 9, the posterior law of , given G, has a density ℓ of the following form:
where is the set of maximal prime subgraphs of G and is the set of corresponding clique separators.
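A density of this prime-components-over-separators form can be evaluated by summing log marginals; a minimal sketch, where log_margin stands for a user-supplied log marginal density on a vertex subset (the name is ours):

```python
def factorized_log_density(log_margin, primes, separators):
    """Log of a Markov density that factorizes as the product of marginals
    over maximal prime subgraphs divided by those over clique separators."""
    return (sum(log_margin(P) for P in primes)
            - sum(log_margin(S) for S in separators))
```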
If is structural Markov and is strong hyper Markov with respect to G, then the posterior graph law of G will be given by
It is worthwhile to point out that (3) indicates that the posterior graph law of G preserves the structural Markov property under hyper compatible laws. This result coincides with Proposition 8. Further, the updating may be performed locally by (3), which implies that the posterior graph law on each maximal prime subgraph of G depends only on the posterior of the hyper compatible law on that maximal prime subgraph.
4. Two Special Cases
4.1. Graphical Gaussian Models and the Inverse Wishart Law
A graphical Gaussian model is defined by a p-dimensional multivariate Gaussian distribution with the expected value and covariance matrix , i.e.,
For simplicity, we assume that the model has zero mean in the following. Define to be the precision matrix of G, where
where denotes the set of positive definite matrices. For any matrix , will denote the matrix obtained by . It has been shown that the global, local and pairwise Markov properties are equivalent in graphical Gaussian models; see [2]. We therefore conclude that the graphical Gaussian distribution P is Markov with respect to G if and only if
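namely, the standard pairwise condition that the precision matrix has a zero entry for every non-adjacent pair of vertices. This zero-pattern check can be sketched as follows (assuming NumPy, with vertices indexed 0, ..., p-1; the function name is ours):

```python
import numpy as np

def is_markov_precision(K, edges, tol=1e-10):
    """Check the pairwise Markov condition for a graphical Gaussian model:
    K[u, v] == 0 for every non-adjacent pair {u, v}."""
    p = K.shape[0]
    E = {frozenset(e) for e in edges}
    return all(
        abs(K[u, v]) < tol
        for u in range(p) for v in range(u + 1, p)
        if frozenset((u, v)) not in E
    )
```

For a tridiagonal precision matrix, the check passes for the chain graph 0-1-2 but fails once the edge (1, 2) is dropped.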
Let be observations of sample matrix , a random sample of size n from the graphical Gaussian distribution , and let denote the observed sum-of-products matrix. Then, for any ,
where is the cardinality of U, and is the determinant of . It is similar for , .
The inverse Wishart distribution is also termed the inverse Wishart law, denoted by . It is used as the prior for the graphical Gaussian distribution . Conditioning on (4), has a hyper inverse Wishart prior law, denoted by . The marginal density is of the form
It was shown in [12] that the hyper inverse Wishart law satisfies the strong hyper Markov property, which allows us to compute the posterior updating of via the margins on the maximal prime subgraphs of the graph G. That is, for any ,
with the density
We conclude that
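The conjugacy behind this updating can be sketched numerically. Assuming the standard parameterization of the inverse Wishart law by degrees of freedom delta and scale matrix Phi (names ours), the posterior after n zero-mean Gaussian observations is obtained by adding n to delta and the observed sum-of-products matrix to Phi:

```python
import numpy as np

def iw_posterior_update(delta, Phi, X):
    """Conjugate update for a zero-mean Gaussian with inverse Wishart prior
    IW(delta, Phi): the posterior is IW(delta + n, Phi + S) with S = X'X."""
    n = X.shape[0]
    S = X.T @ X              # observed sum-of-products matrix
    return delta + n, Phi + S
```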
4.2. Multinomial Models and the Dirichlet Law
Suppose that all the variables are discrete-valued. Let denote the contingency table , where is a finite set for each . An element is referred to as a cell of this table. Accordingly, takes values in the finite set . Indeed, I is a discrete-valued random vector whose distribution is assumed to be Markov with respect to G. Then,
where and .
Let be observations of , a random sample from . is an matrix where each row denotes an observation of I. The distribution of is the multinomial distribution with index n and probabilities , denoted by . Then, the likelihood function has the form
where , , and counts the number of elements of from the marginal cell . It is similar for , .
The Dirichlet distribution is also termed the Dirichlet law, denoted by , where are hyperparameters. It is used as the prior for the multinomial distribution . The Dirichlet law satisfies the strong hyper Markov property; see [12]. Thus, we have
and then the posterior law can be written as
Based on the above arguments, we can conclude that . Further, if we assign a prior law of form (1) for G, by Proposition 8, the posterior graph law of G, given data obtained from , has density in the following way:
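The Dirichlet-multinomial updating used in this section amounts to adding observed cell counts to the hyperparameters. A minimal sketch, with alpha as our name for the hyperparameter vector indexed by cells:

```python
from collections import Counter

def dirichlet_posterior(alpha, observations):
    """Conjugate update for multinomial data with a Dirichlet prior:
    each cell's hyperparameter is incremented by its observed count."""
    counts = Counter(observations)
    return {cell: a + counts.get(cell, 0) for cell, a in alpha.items()}
```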
4.3. An Example on Simulated Data
4.3.1. Dataset Description
In this section, we present the results for one application to a real dataset. We analyze a labor force survey dataset, which is available from [18]. This dataset is used to analyze the multivariate associations among income, education and family background on 1002 males in the American labor force. Here, we briefly describe these variables in this dataset.
- inc: The income of the respondents.
- deg: Respondents’ highest educational degree.
- chi: The number of children of the respondents.
- pin: The income of the respondents’ parents.
- pde: The highest educational degree of respondents’ parents.
- pch: The number of children of respondents’ parents.
- age: Respondents’ age in years.
4.3.2. Experiments and Results
We consider the posterior graph law of G in Equation (5). A Gibbs sampler can then be formed by using the following conditional posteriors:
- ;
- .
For the prior graph law of G, following Example 3.5 in [10], we consider an Erdős–Rényi random graph model prior on each edge with
where the parameter is the prior probability that an edge is present. In this case, we set . We use the inverse Wishart law as the prior for the covariance matrix over the graph G, with and taken to be the identity matrix.
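Under this Erdős–Rényi prior, the log prior of a graph depends only on its number of edges, since each of the C(n, 2) possible edges is present independently with probability p. A sketch (the function name is ours):

```python
import math

def er_log_prior(n_vertices, n_edges, p=0.5):
    """Log density of the Erdos-Renyi graph prior: each of the C(n, 2)
    possible edges is present independently with probability p."""
    m = n_vertices * (n_vertices - 1) // 2
    return n_edges * math.log(p) + (m - n_edges) * math.log(1 - p)
```

Note that for p = 1/2 every graph on a fixed vertex set receives the same prior probability, recovering the uniform graph law.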
By using the function above, we simulate observations. The experiments are implemented in R, running for 5000 iterations with the first 2500 discarded as burn-in.
The experimental results on this dataset are displayed in Figure 5 and Figure 6. The estimated posterior probabilities of the size of the graphs are shown on the left of Figure 5, which shows that our algorithm mainly visits graphs with between nine and twelve edges. The figure on the right exhibits the estimated posterior probabilities of all visited graphs of various sizes, and shows that more than 15 different graphs are visited. Figure 6 displays the selected graph with the highest posterior probability among the visited graphs.
Figure 5.
The figure on the left shows the estimated posterior probabilities of the size of the graphs. The figure on the right shows the estimated posterior probabilities of all visited graphs.
Figure 6.
The inferred graph with the highest posterior probability.
The results also suggest that the respondents’ income is related to their own education and age. They also show that the income of respondents’ parents is related only to their education.
5. Computations
In this section, we aim to design an algorithm for drawing samples of interest, such as decomposable undirected graphs, from the structural Markov graph law on .
5.1. Ratio for Graph Law
Model comparison plays an important role in statistical analysis, especially in computing the ratio of the distributions of variables in different states. We treat the graph itself as a random variable and construct the ratio between two undirected graphs and G, where is obtained from G by removing or adding one edge. This ratio can be written as
The main objective in what follows is to greatly simplify this computation under the assumption that the graph law is structural Markov on . For convenience, we define and for and .
Figure 7 shows a special case in which is obtained from G by removing the edge , which lies in exactly one prime component of G.
Figure 7.
is obtained from G by removing the edge .
Proposition 10.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Suppose that is obtained from G by removing the edge . Then,
- 1.
- if u and v are contained in exactly one maximal prime subgraph of G, then
- 2.
- if u and v are contained in two neighboring maximal prime subgraphs of G, then where in G.
Proof.
See Appendix A. □
Figure 8 shows a case in which is obtained from G by adding the edge between two neighboring prime components and of G such that and .
Figure 8.
is obtained from G by adding the edge .
Proposition 11.
Let G be a fixed graph in and suppose that G has a perfect sequence of maximal prime subgraphs. Suppose that is obtained from G by adding the edge . Then,
- 1.
- if u and v are contained in exactly one incomplete prime subgraph , then
- 2.
- if and are the two distinct maximal prime subgraphs of G, then there are some prime components such that where .
Proof.
See Appendix A. □
In particular, if G is a decomposable graph in , then we have the following results.
Lemma 1
([19]). Let G be a decomposable graph in and suppose that G has a perfect sequence of cliques . Suppose that is decomposable and obtained from G by removing or adding one edge . Then,
- 1.
- If is obtained from G by removing the edge , then u and v must belong to a clique of G;
- 2.
- If is obtained from G by adding the edge , then there exist two different cliques and such that is complete and separates and .
Corollary 2.
Let G be a decomposable graph in and suppose that G has a perfect sequence of cliques . Suppose that is decomposable and obtained from G by removing or adding one edge . Then,
- 1.
- If is obtained from G by removing the edge within , then where , and ;
- 2.
- If is obtained from G by adding the edge such that and , then the ratio is where , , and .
Proof.
We first give the proof of 1. If and , by Lemma 1, the deleted edge must belong to a single clique . It is worthwhile to point out that all of , and are cliques in G and . Then,
which, combined with (A19), gives the result. The proof of 2 is similar. □
5.2. Sampling Decomposable Graphs from Structural Markov Graph Laws
We now take a random graph on as the initial state and design a Markov chain Monte Carlo (MCMC) sampler for sampling from a structural Markov graph law. This technique relies on small perturbations of the edge set of a graph, in which a single edge is removed or added at each step.
A reversible jump MCMC sampler for posterior sampling of decomposable graphical models, relying on single-edge additions and removals, was introduced in [8]. We now adapt this methodology to sampling from a structural Markov law.
Let G denote the current state and the proposed state, where is obtained from G by removing or adding one edge; the chain moves to the proposed state with probability , which ensures detailed balance with respect to the target distribution . The Metropolis–Hastings acceptance ratio can then be written as
In fact, Equation (9) is not the only choice yielding detailed balance. In particular, in order to reduce numerical error when the ratio is large, we can make the following adjustment:
In general, we take the proposal kernel to be symmetric, that is, . Consequently, the acceptance probability depends only on the relative densities, so we only need to compute
We select a pair of vertices uniformly at random. If the corresponding edge is present in G, it is removed; otherwise, it is added. Let denote the graph obtained from G by adding the edge , and similarly for . Let denote the state of G at time t and let be the set of decomposable undirected graphs with vertex set . Starting from an Erdös–Rényi (ER) random graph as the initial state, a Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law can be constructed as the following Algorithm 1:
| Algorithm 1 A Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law. |
| Input: An ER random graph . |
| Output: A set of decomposable graphs from . |
| Set |
| for do |
|  if and then |
|   set with probability |
|  else if and then |
|   set with probability |
|  else |
|  end if |
| end for |
| return A set of decomposable graphs. |
Based on our results in Section 5.1, the acceptance probability in this algorithm can be obtained by evaluating only the marginal likelihoods of the corresponding subsets of at each step when sampling from a posterior graph law as in Proposition 8 or Proposition 9.
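As an illustration only, the single-edge move of Algorithm 1 might be sketched in Python as below. The function names and the uniform-pair proposal are our own; `log_density` stands for the (unnormalized) log graph law, and decomposability is checked via chordality, since an undirected graph is decomposable if and only if it is chordal.

```python
import math
import random

def is_chordal(vertices, edges):
    # Chordality test: maximum-cardinality search (MCS) followed by the
    # Tarjan-Yannakakis perfect-elimination check.
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    weight = {v: 0 for v in vertices}
    order, numbered = [], set()
    for _ in vertices:
        v = max((u for u in vertices if u not in numbered),
                key=lambda u: weight[u])
        order.append(v)
        numbered.add(v)
        for w in adj[v] - numbered:
            weight[w] += 1
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        earlier = {w for w in adj[v] if pos[w] < pos[v]}
        if earlier:
            u = max(earlier, key=lambda w: pos[w])
            if not (earlier - {u}) <= adj[u]:
                return False
    return True

def mh_step(vertices, edges, log_density):
    # One Metropolis-Hastings move: toggle a uniformly chosen vertex
    # pair, reject immediately if the proposal leaves the space of
    # decomposable graphs, otherwise accept with probability
    # min(1, pi(G') / pi(G)).
    u, v = random.sample(vertices, 2)
    e = (min(u, v), max(u, v))
    proposal = set(edges)
    if e in proposal:
        proposal.remove(e)
    else:
        proposal.add(e)
    if not is_chordal(vertices, proposal):
        return set(edges)
    log_ratio = log_density(proposal) - log_density(edges)
    if math.log(random.random()) < min(0.0, log_ratio):
        return proposal
    return set(edges)
```

Rejected or non-decomposable proposals leave the chain at the current graph, which preserves detailed balance on the set of decomposable graphs.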
6. Conclusions
The main contribution of this paper is to extend the structural Markov properties of [10] to non-decomposable undirected graphs. It is shown that an arbitrary undirected graph can be decomposed into the sum of several prime subgraphs. Based on this prime decomposition of undirected graphs and conditional independence, the structural Markov properties extend naturally to arbitrary undirected graphs.
Then, we propose a full Bayesian method for estimating the structure of a graph. This method requires that the observed data come from a certain distribution. Using our results, we have shown that the posterior updating of a graph law is determined by the margins of the prime components, which greatly simplifies the computation of the posterior graph law.
It should be pointed out that our research focuses only on undirected graphs. However, other classes of graphs, such as chain graphs or ancestral graphs, may have further interesting and valuable properties reflecting the conditional independence of the graph structure in the problem of model determination. We will study them in detail in future work.
Author Contributions
Methodology, Y.S.; Validation, X.K.; Writing—review & editing, Y.H. All authors have read and agreed to the published version of the manuscript.
Funding
This work was partially supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2022D01C406), the National Natural Science Foundation of China (Grant Nos. 11861064, 11726629, 11726630) and the National Key Laboratory for Applied Statistics of MOE, Northeast Normal University (Grant No. 130028906).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proofs of Some Main Theorems and Propositions
Proof of Theorem 1.
The equivalence of (i) and (ii) follows from Corollary 2.5 of [20], so it suffices to show that (ii) ⇔ (iii). We first give the proof of (ii) ⇒ (iii). Firstly, we know that
By the definition of , we define
For , let . By CI-collapsibility, we have
So, (A1) and (A2) imply that . From this, it follows that . Hence, the result follows by . Conversely, under the “Faithfulness Assumption”, there is a such that , implying . By M-collapsibility, we know that , which gives . Hence, we have . The result follows since it is easy to obtain that . □
Proof of Proposition 3.
By the graph product operation, since S is complete and separates A from B, for any , is a prime decomposition of the graph with vertex set , and so is . This implies that (i) holds. As for (ii), if is structural Markov on , then
and similarly we can have the same result for . From (i), we have
and so is . Thus, the results follow from the above arguments. □
Proof of Proposition 4.
Let , , . Since forms a prime decomposition of , for each , we have
For , since is complete, then
Whence we have
By Proposition 3, we can obtain
Equation (1) can then be obtained recursively. □
Proof of Proposition 5.
Suppose that forms a prime decomposition of G. Since G is graphically collapsible onto , by Theorem 1, takes values only in . This implies that can in fact be obtained from . Then, we obtain
From (A3), we deduce
By the definition of the hyper Markov property, it follows that . Combining this with (A4) and the axioms of conditional independence gives
which implies that
Proof of Theorem 2.
The weak hyper Markov property states that
Since , we have . Thus, from (A7) we deduce
Again, by Proposition 6 and the structural Markov property, we have
Then, we have
Proof of Proposition 7.
By Theorem 2, we obtain
From (A11), we deduce
Whence we have
By conditional independence property and Theorem 2,
Thus, we have
which, combined with (A12), gives the result. The proof of the strong case is similar and is omitted. □
Proof of Theorem 3.
Since X is a random sample from and is hyper Markov with respect to G, by Proposition 5,
Since , we have . Then, from (A14), we find that
Additionally, from the structural Markov property and Proposition 7, we have
From (A17), we deduce
Proof of Proposition 10.
We first give the proof of (i). Suppose that . If is structural Markov on , then we have
The proof of (ii) is given as follows. It is obvious that is prime in . Consequently, has a perfect sequence of maximal prime subgraphs , and Equation (7) then follows by using (i). □
Proof of Proposition 11.
The proof of (i) follows similar steps to that of Proposition 10. To prove (ii), let be the junction tree whose vertices are the maximal prime subgraphs of G; for its construction, see [21]. Since are in two different maximal prime subgraphs of G, connecting and in yields a unique cycle. Without loss of generality, the vertices on this cycle are denoted by , where and are connected by an edge in . Then, it is easy to see that is the set of all maximal prime subgraphs of . So, applying (i), Equation (8) follows. □
References
- Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2009.
- Lauritzen, S.L. Graphical Models; Oxford University Press: New York, NY, USA, 1996.
- Richardson, T. A factorization criterion for acyclic directed mixed graphs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009.
- Richardson, T.; Spirtes, P. Ancestral graph Markov models. Ann. Stat. 2002, 30, 962–1030.
- Iqbal, K.; Buijsse, B.; Wirth, J. Gaussian graphical models identify networks of dietary intake in a German adult population. J. Nutr. 2016, 146, 646–652.
- Larranaga, P.; Moral, S. Probabilistic graphical models in artificial intelligence. Appl. Soft Comput. 2011, 11, 1511–1528.
- Verzilli, C.J.; Stallard, N.; Whittaker, J.C. Bayesian graphical models for genomewide association studies. Am. J. Hum. Genet. 2006, 79, 100–112.
- Giudici, P.; Green, P.J. Decomposable graphical Gaussian model determination. Biometrika 1999, 86, 785–801.
- Madigan, D.; Raftery, A.E. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Am. Stat. Assoc. 1994, 89, 1535–1546.
- Byrne, S.; Dawid, A.P. Structural Markov graph laws for Bayesian model uncertainty. Ann. Stat. 2015, 43, 1647–1681.
- Li, B.C. Support condition for equivalent characterization of graph laws. Sci. Sin. Math. 2022, 52, 467–474.
- Dawid, A.P.; Lauritzen, S.L. Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 1993, 21, 1272–1317.
- Green, P.J.; Thomas, A. A structural Markov property for decomposable graph laws that allows control of clique intersections. Biometrika 2018, 105, 19–29.
- Leimer, H.G. Optimal decomposition by clique separators. Discret. Math. 1993, 113, 99–123.
- Dawid, A.P. Conditional independence in statistical theory. J. R. Stat. Soc. B 1979, 41, 1–15.
- Dawid, A.P. Conditional independence for statistical operations. Ann. Stat. 1980, 8, 598–617.
- Meek, C. Strong Completeness and Faithfulness in Bayesian Networks; Morgan Kaufmann: San Francisco, CA, USA, 1995.
- Hoff, P.D. Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 2007, 1, 265–283.
- Frydenberg, M.; Lauritzen, S.L. Decomposition of maximum likelihood in mixed graphical interaction models. Biometrika 1989, 76, 539–555.
- Asmussen, S.; Edwards, D. Collapsibility and response variables in contingency tables. Biometrika 1983, 70, 567–578.
- Wang, X.F.; Guo, J.H. Junction trees of general graphs. Front. Math. China 2008, 3, 399–413.