1. Introduction
The marginal problem consists of computing the consequences of a set of propositional formulae in a reduced subset of variables. The basic algorithm to solve it has been the so-called Davis and Putnam (DP) deletion algorithm [1]. This algorithm is a particular case of the Shenoy–Shafer deletion algorithm [2,3] or the bucket elimination scheme [4,5]. The problem with this algorithm is its space complexity (it tends to produce too many clauses of large size). Its time complexity is exponential in the tree width of its connectivity graph [4]. The computations can be organized in a join tree [6] in the same way as probabilistic computations in Bayesian networks [2,7]. However, in the case of propositional logic, we are computing in an idempotent valuation system or valuation algebra [3,8], which has some special features that can be exploited in order to build efficient algorithms.
This paper proposes new algorithms for the marginal problem by applying the Shenoy–Shafer abstract framework [2], but with some differences compared to the DP algorithm. First, it represents sets of clauses as Boolean arrays. This is a semantic representation: the set of true assignments satisfying a given set of clauses, i.e., a truth table. Boolean arrays can be a very efficient representation of a large set of clauses, as the values in the array are simple 0–1 values, though their size is exponential in the number of variables. We define the basic operations and show that the set of Boolean arrays with these operations has the structure of an information algebra [3]. Secondly, we also give alternative deletion procedures and a set of possible optimizations of the full procedure. In this sense, a very important contribution is the study of the cases in which a variable is functionally determined by a set of variables and the exploitation of this fact in the marginalization algorithms. The Boolean array representation is especially appropriate for this improvement.
The final result is a family of tools and algorithms that can solve moderate-sized problems, even in cases where the associated connectivity graph has a large tree width (more than 100), as is shown in the experiments.
A related problem is the satisfiability problem (SAT) [9], which consists of determining if a set of propositional formulae is satisfiable, i.e., whether there is a true assignment such that every formula becomes true. It was the first problem proven to be NP-complete [10]. This implies that any NP problem can be efficiently reduced to it, and therefore, any good algorithm to solve SAT could also be used for such an NP problem. In fact, many well-known problems are solved nowadays by encoding them as a SAT problem [11,12,13]. Current approaches for SAT are mainly based on the Davis–Putnam–Logemann–Loveland backtracking algorithm (DPLL) [14] and its successors, conflict-driven clause-learning algorithms (CDCL) [15]. The SAT problem can be solved from the marginal problem by deleting all the variables (marginalizing over the empty set). The result is consistent if and only if this marginalization is vacuous. However, the marginal problem can be used without deleting all the variables, computing the marginal on a given set, and it can be used to compute all the solutions (configurations satisfying all the clauses) or to simulate in the space of solutions, in the case of a satisfiable problem [6]. As stated in [5], the marginal problem is a kind of knowledge compilation that can be useful in many other related problems. Furthermore, in a recent survey of SAT-related problems [16], the following was noted: “We see that DP responds to a more difficult problem than the simple problem of satisfiability. Except for a few specific problems of small induced width, DP-60 never really competed with the version of DP-62” (where DP-60 is what we have called DP and DP-62 refers to DPLL). For this reason, the basic deletion algorithm has received little attention in the literature. For example, in Knuth’s book [17], covering the existing approaches for SAT, there is only a short reference to the DP approach, saying that it works well for small problems, but that it can be very inefficient in the worst case. Our paper will show that it is possible to solve large problems, even problems with a large tree width. The initial DP algorithm was revisited in [4,5], but this approach was also based on a clause representation and the main contributions were the determination of good deletion sequences.
Another related problem is propositional model counting or #SAT, consisting of computing the number of models satisfying a given formula [18]. This is related to probabilistic computations and is known to be #P-complete. Similar deletion algorithms have been applied; however, as stated in Gogate and Dechter [19], these algorithms are exponential in the tree width. In this paper, we take advantage of using idempotent valuations and develop a computation method that can solve some concrete cases, even if the tree width is large, using representations that are not exponential in the tree width.
We provide some experiments carried out with our Python 3.8.8 implementation based on the NumPy library for representing arrays (the library together with the data to reproduce the experiments is available as a GitHub repository at: https://github.com/serafinmoral/SAT-solver (accessed on 16 May 2023)). In them, we show that it is possible to solve some moderate-sized problems, expanding the class of problems solved by the existing deletion algorithms. The contribution is also significant because of the possibilities it opens for future developments, as proposed in the last section of the paper devoted to the conclusions and future work. We have also shown that it is possible to solve the 0–1 problems associated with hard Bayesian network inference problems. The rest of the paper is organized as follows: in Section 2, the notation and the problem specification are given; Section 3 is devoted to the table representation of sets of clauses, the introduction of the basic operations (combination, marginalization, and conditioning), and the study of their properties; Section 4 studies the basic deletion algorithm and the alternative procedures, providing formal results about their main properties; in Section 5, a set of additional tools for the deletion algorithm is given, and the final decision procedure for the deletion algorithm is described; Section 6 is devoted to the experimental part; and Section 7 details the conclusions and future work. All the proofs of the results in the paper are in Appendix A.
2. Problem Specification
Let V be a finite set of variables or propositional symbols. A literal is either a variable, p (a positive literal), or a negated variable, ¬p (a negative literal). A clause c is a disjunction of literals, ℓ1 ∨ ⋯ ∨ ℓk, which we will represent as the set {ℓ1, …, ℓk}. If a clause contains both literals p and ¬p, then it will be called trivial or a tautology. The set of all non-trivial clauses defined for variables V is denoted as C_V.
Information will be given as a finite set of non-trivial clauses C. The set of variables appearing in clause c will be denoted by var(c). If C is a set of clauses, then var(C) is the union of var(c) for c ∈ C.
A true assignment, t, is a mapping from V to {0, 1}. A true assignment t satisfies the set of clauses C when for each c ∈ C, there is a positive literal, p ∈ c, with t(p) = 1, or a negative literal, ¬p ∈ c, with t(p) = 0.
Two sets of clauses, C1 and C2, defined for the same set of variables V, will be considered logically equivalent, C1 ≡ C2, when for each true assignment t, we have that t satisfies C1 if and only if t satisfies C2.
The basic syntactic operation with clauses is resolution: given two clauses c1 and c2 such that there is a variable r with r ∈ c1 and ¬r ∈ c2, then the resolution of c1 and c2 by r is the clause (c1 ∖ {r}) ∪ (c2 ∖ {¬r}). A set of clauses C is said to be complete under a set of variables V if and only if for each c1, c2 ∈ C, if their resolution c is not trivial, then c ∈ C, and if c ∈ C and c ⊆ c′ for a non-trivial clause c′ (i.e., c′ is subsumed by c), then c′ ∈ C. Given a set of clauses, C, there is always a minimum set of clauses (in the sense of inclusion) which contains C and is complete (the set obtained by adding to C all the clauses obtained by resolution and subsumption). It will be denoted by C*. It is clear that two sets of clauses, C1 and C2, defined for the same set of variables V, are equivalent if and only if C1* = C2*.
A set of clauses C implies clause c if and only if c ∈ C*, and this is denoted as C ⊨ c. It is well known that this is equivalent to the fact that any true assignment t satisfying C also satisfies c. If C1 and C2 are two sets of clauses and for each c ∈ C2 we have that C1 ⊨ c, it can be said that C1 implies C2, which is denoted by C1 ⊨ C2. This is equivalent to the fact that any satisfying true assignment for C1 is also a satisfying true assignment for C2. It is also well known that C1 and C2 are equivalent if and only if C1 ⊨ C2 and C2 ⊨ C1.
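The resolution operation above can be sketched in a few lines of Python. In this example (an illustration, not the paper's implementation), clauses are frozensets of DIMACS-style integer literals: a positive integer v stands for the literal v and -v for its negation.

```python
# Sketch of clause resolution; clauses are frozensets of DIMACS-style literals
# (a positive integer v is the literal "v", and -v is its negation).

def resolve(c1, c2, r):
    """Resolution of clauses c1 and c2 by variable r (r in c1, -r in c2)."""
    assert r in c1 and -r in c2
    return (c1 - {r}) | (c2 - {-r})

def is_trivial(c):
    """A clause is trivial (a tautology) if it contains both p and not-p."""
    return any(-lit in c for lit in c)

c1 = frozenset({1, 2})    # p1 or p2
c2 = frozenset({-1, 3})   # not-p1 or p3
print(sorted(resolve(c1, c2, 1)))      # [2, 3], i.e., p2 or p3
print(is_trivial(frozenset({2, -2})))  # True
```

The completion C* could then be obtained by iterating these two operations until no new non-trivial clause appears.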
The satisfiability problem (SAT) consists of determining whether for a given set of clauses C there is a satisfying true assignment t. It is clear that if C is empty, then the answer is positive, and if the empty clause belongs to C, the answer is negative. This is an NP-complete problem [10], and therefore, hard to solve.
The algorithms in this paper will be based on the Davis–Putnam deletion algorithm [1]. The basic step of this algorithm is the deletion of a variable v in a set of clauses C, denoted as C^{-v}. This operation is carried out with the following steps:
- (1) Compute R_v, the set of all non-trivial clauses obtained by resolution by v of pairs of clauses of C;
- (2) The result is C^{-v} = {c ∈ C : v ∉ var(c)} ∪ R_v.
An algorithm for SAT based on the deletion of variables is depicted in Algorithm 1. At the end of the loop, as all the variables have been deleted, one of the two conditions (C is empty or the empty clause belongs to C) must be satisfied. Its efficiency depends on the order in which the variables are removed; however, in general, this algorithm has the problem of producing too many clauses that are large in size (number of literals).
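The deletion step can be sketched on a clause representation as follows. This is a minimal illustration (not the paper's array-based implementation), again with clauses encoded as frozensets of DIMACS-style integer literals:

```python
# Sketch of the deletion of a variable v in a set of clauses C: keep the
# clauses not mentioning v and add all non-trivial resolvents by v.
# Clauses are frozensets of DIMACS-style integer literals (-v means "not v").

def delete_variable(clauses, v):
    pos = [c for c in clauses if v in c]
    neg = [c for c in clauses if -v in c]
    keep = {c for c in clauses if v not in c and -v not in c}
    resolvents = {(c1 - {v}) | (c2 - {-v}) for c1 in pos for c2 in neg}
    non_trivial = {c for c in resolvents if not any(-lit in c for lit in c)}
    return keep | non_trivial

# deleting p1 from {p1 or p2, not-p1 or p3} leaves the resolvent {p2 or p3}
C = {frozenset({1, 2}), frozenset({-1, 3})}
print(delete_variable(C, 1) == {frozenset({2, 3})})   # True
```

The quadratic pairing of positive and negative occurrences is exactly where the clause blow-up mentioned above comes from.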
It is also advisable to perform some basic operations on C so that a simpler equivalent set is obtained. For example, it is always good to remove subsumed clauses, i.e., if c1, c2 ∈ C and c1 ⊆ c2, then remove c2 from C. Unit propagation is another important step. For that, we need the concept of the restriction of a set of clauses C to a literal ℓ, which is the set of clauses that is equivalent to C under the condition that ℓ is true. It will be denoted as C^ℓ and it is the set of clauses obtained from C by removing any clause containing ℓ and removing ¬ℓ from the clauses containing ¬ℓ. It is important to remark that the true assignments satisfying C^ℓ are the same as those satisfying C with ℓ set to true.
Algorithm 1 Davis–Putnam deletion algorithm
- Require: C, a set of clauses.
- Ensure: r, a logical value indicating whether C is satisfiable.
- 1: procedure DP(C)
- 2: V ← variables of C
- 3: for v ∈ V do
- 4: C ← C^{-v}
- 5: if C is empty then
- 6: r ← True
- 7: Break
- 8: end if
- 9: if the empty clause belongs to C then
- 10: r ← False
- 11: Break
- 12: end if
- 13: end for
- 14: return r
- 15: end procedure
Unit propagation consists of transforming a set C containing a unit clause {ℓ} with a single literal ℓ into C^ℓ ∪ {{ℓ}}. This operation is repeated for each literal ℓ appearing in a unit clause.
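The restriction and unit-propagation steps above can be sketched as follows, under the same illustrative DIMACS-style clause encoding used earlier (this is an example, not the paper's implementation):

```python
# Sketch of the restriction C^l and of unit propagation; clauses are
# frozensets of DIMACS-style integer literals (-v means "not v").

def restrict(clauses, lit):
    """Clauses equivalent to C under the condition that lit is true:
    drop clauses containing lit, remove -lit from the remaining ones."""
    return {c - {-lit} for c in clauses if lit not in c}

def unit_propagate(clauses):
    """Repeatedly restrict to the literal of every unit clause found."""
    units = set()
    while True:
        unit = next((next(iter(c)) for c in clauses
                     if len(c) == 1 and next(iter(c)) not in units), None)
        if unit is None:
            return clauses, units
        units.add(unit)
        clauses = restrict(clauses, unit) | {frozenset({unit})}

C = {frozenset({1}), frozenset({-1, 2}), frozenset({-2, 3, 4})}
result, units = unit_propagate(C)
print(units)   # {1, 2}: 1 is a unit clause, and restricting by 1 makes 2 one
```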
If C is a set of clauses defined for variables V and V′ is a subset of V, then the marginalization of C to V′ is the set of clauses in the completion of C that are defined for variables in V′, i.e., the set of all the clauses that are defined for variables in V′ and are a consequence of the clauses in C. The marginalization to V′ will be denoted as C^{-(V∖V′)}, making reference to the set of removed variables. The Davis–Putnam algorithm has two main advantages over other satisfiability algorithms [5,6]. The first one is that the deletion algorithm is really an algorithm to compute the marginal information and has many other uses. Then, if the loop starting in step 3 of the algorithm is applied only for the variables in a set W, then the value of C will be equivalent to C^{-W}. This is a consequence of the fact that each step carries out the deletion of a variable and that the set of clauses is a particular case of the Shenoy–Shafer axiomatic framework for local computation [2,6].
3. Table Representation of Sets of Clauses
A set of clauses C defined for variables V can be represented by a table with a dimension for each variable v ∈ V. If v ∈ V, we consider the set Ω_v = {0, 1} and Ω_V = ∏_{v ∈ V} Ω_v, where ∏ stands for Cartesian product. An element from Ω_V will be denoted in boldface, x. The component v of vector x will be denoted as x_v. If W ⊆ V, the subvector of components not in W will be denoted by x^{-W}, making reference to the removed components.
A table T for variables V will be a mapping T : Ω_V → {0, 1}. The set of variables of table T will be denoted by var(T). A table T needs 2^{|V|} bits to be represented.
To simplify the notation, if a table T is defined for variables V and x ∈ Ω_{V′} with V ⊆ V′, we will assume that T(x) = T(x^{-(V′∖V)}), i.e., a table can be applied to a larger frame than the actual frame in which it is defined, simply by ignoring the components not in the set of variables for which it is defined.
The set of all tables defined for variables V will be denoted as T_V.
A set of clauses C is represented by a table T if T(x) = 1 if and only if the true assignment t given by t(v) = 1 when x_v = 1 and t(v) = 0 when x_v = 0 satisfies C. This is a semantic representation of the set of clauses C, determining the true assignments satisfying all the clauses, i.e., a truth table.
To simplify the notation, we will consider that the vector x and the associated assignment t are equivalent, so that x can also be called a true assignment, but in fact making reference to t. Then, in the tables, a true assignment will be a vector x ∈ Ω_V, and this true assignment satisfies table T when T(x) = 1. The set of true assignments satisfying a table T will be s(T) = {x ∈ Ω_V : T(x) = 1}.
The trivial table is the table with T(x) = 1 for any x, and the contradictory table is the table with T(x) = 0 for any x. A trivial table represents the empty set of clauses, and a contradictory table, an unsatisfiable set of clauses.
Given C, a table T can be easily computed starting with a trivial table, and then, for each c ∈ C, making T(x) = 0 for each x that falsifies every literal of c, i.e., such that for each literal ℓ ∈ c associated with variable v, we have x_v = 0 if ℓ = v and x_v = 1 if ℓ = ¬v.
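This construction can be sketched with NumPy, the library used by our implementation. The encoding below (variables indexed 0..n-1, literals as (variable, sign) pairs) is a choice of this example:

```python
import numpy as np

# Sketch of the table construction: start from the trivial table for n
# variables and zero out every configuration violating some clause.
# A clause is a list of (variable, sign) pairs: sign 1 for a positive
# literal, 0 for a negative one (an encoding chosen for this example).

def table_from_clauses(clauses, n):
    T = np.ones((2,) * n, dtype=bool)      # trivial table, one axis per variable
    for clause in clauses:
        # configurations violating the clause: every literal falsified
        idx = [slice(None)] * n
        for v, sign in clause:
            idx[v] = 1 - sign              # x_v = 0 for v, x_v = 1 for not-v
        T[tuple(idx)] = False
    return T

# C = {p0 or p1, not-p0 or p1} over two variables; only x with p1 = 1 survives
T = table_from_clauses([[(0, 1), (1, 1)], [(0, 0), (1, 1)]], 2)
print(T.astype(int))   # [[0 1], [0 1]]
```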
Given a set of clauses C defined for variables V, the direct table representation is unfeasible if the number of variables in V is not small, given that the table size is 2^{|V|}, where |V| is the cardinality of V. For this reason, to represent a set of clauses C, first we will partition the full set of clauses into small sets, C_1, …, C_k, each one of them defined for a small set of variables V_i. Then, the set C will be represented by the set of tables {T_1, …, T_k}, where T_i is the table representing C_i.
In our experiments, the partition has been computed by the following steps:
- (1) Carry out unit propagation in C;
- (2) Group together clauses defined for the same set of variables, i.e., if var(c_1) = var(c_2), then c_1 and c_2 are assigned to the same set;
- (3) Remove sets defined for non-maximal sets of variables, i.e., if C_i is such that there is another set C_j with V_i ⊆ V_j, then C_j is updated to C_j ∪ C_i and C_i is removed.
This is a procedure that we have found reasonable, but it is not the only possible one. We find that steps 1 and 2 are basic, but there can be other alternative ways of performing step 3. For example, joining two sets C_i and C_j into one if the intersection V_i ∩ V_j is not small and the union V_i ∪ V_j is not too large, where large and small can be defined in terms of a couple of parameters in each case.
There are three basic operations with tables:
Combination. If T_1 and T_2 are tables, then their combination is the table T_1 ⊗ T_2 defined for the set of variables var(T_1) ∪ var(T_2) and given by:
(T_1 ⊗ T_2)(x) = T_1(x) · T_2(x).
When considering T_1(x), it is important to notice that this value is T_1 applied to the subvector of x with components in var(T_1).
Variable deletion. If T is a table and v ∈ var(T), then table T^{-v} is the table defined for variables var(T) ∖ {v} and given by:
T^{-v}(x) = max { T(x, v = 0), T(x, v = 1) }.
T^{-v} will also be called the marginalization of T to var(T) ∖ {v}. If W = {v_1, …, v_k}, then ((T^{-v_1}) ⋯)^{-v_k} will also be denoted as T^{-W}.
Conditioning. If T is a table and ℓ is a literal associated with variable v ∈ var(T), then the conditioning of T to ℓ is the table T^{ℓ} defined for variables var(T) ∖ {v} and given by:
T^{ℓ}(x) = T(x, v = 1) if ℓ = v, and T^{ℓ}(x) = T(x, v = 0) if ℓ = ¬v.
The conditioning operator can be extended to a partial true assignment: if T is a table, W ⊆ var(T), and y is a true assignment for variables W, then T^{y} is the table defined for variables var(T) ∖ W and given by:
T^{y}(x) = T(z),
where z is the vector in Ω_{var(T)} with components equal to y on W and equal to x on var(T) ∖ W.
We will assume that conditioning can be applied even if the variable v associated with ℓ is not in var(T). In this case, T^{ℓ} = T. Analogously, if W is not included in var(T) and y is a true assignment for W, we consider that T^{y} is the conditioning of T to the restriction of y to W ∩ var(T).
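The three operations can be sketched in NumPy, with a table stored as a pair (variable names, Boolean array), one axis per variable; this pairing and the helper names are choices of the example, not the paper's API:

```python
import numpy as np

# Sketch of combination, variable deletion, and conditioning on tables
# stored as (vars, array): a tuple of variable names plus a Boolean
# ndarray with one axis per variable, in that order.

def combine(t1, t2):
    (v1, a1), (v2, a2) = t1, t2
    vs = v1 + tuple(v for v in v2 if v not in v1)        # union of variables
    # broadcast both arrays to the common frame by inserting length-1 axes
    sh1 = [2 if v in v1 else 1 for v in vs]
    sh2 = [2 if v in v2 else 1 for v in vs]
    b1 = a1.transpose([v1.index(v) for v in vs if v in v1]).reshape(sh1)
    b2 = a2.transpose([v2.index(v) for v in vs if v in v2]).reshape(sh2)
    return vs, b1 & b2

def delete(t, v):
    vs, a = t
    i = vs.index(v)
    return vs[:i] + vs[i + 1:], a.any(axis=i)            # max over v's axis

def condition(t, v, value):
    vs, a = t
    i = vs.index(v)
    return vs[:i] + vs[i + 1:], a.take(value, axis=i)    # fix v to 0 or 1

T1 = (('p', 'q'), np.array([[0, 1], [0, 1]], dtype=bool))  # forces q = 1
T2 = (('q', 'r'), np.array([[1, 0], [1, 1]], dtype=bool))  # clause {not-q, r}... with q=0 free
vs, a = combine(T1, T2)
print(vs, delete((vs, a), 'q')[1].astype(int))  # all-ones table over (p, r)
```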
The following facts are immediate:
- If T_1 and T_2 are two tables associated with C_1 and C_2, respectively, then T_1 ⊗ T_2 is associated with C_1 ∪ C_2.
- If table T is associated with C, then T^{-v} is associated with C^{-v} and T^{ℓ} is associated with C^{ℓ}.
We will now give an example, comparing the computations with clauses and tables.
Example 1. Assume that we have a set C of clauses given by: Assume that we want to delete variable q. Then, we have to compute all the resolutions of clauses containing q with clauses containing ¬q. After eliminating trivial clauses, the result is given by the following set of clauses: Using a table representation, we can build a table for the first three clauses and a table for the last three with the following values: The combination will be the table defined for the union of their variables and given by:
If in this combination variable q is deleted by marginalization, the following table is obtained: which is exactly the table associated with the two clauses obtained by resolution. The following properties show that tables satisfy the basic Shenoy–Shafer axioms, and therefore, local computation is possible:
- Combination is commutative and associative: T_1 ⊗ T_2 = T_2 ⊗ T_1, (T_1 ⊗ T_2) ⊗ T_3 = T_1 ⊗ (T_2 ⊗ T_3);
- If W_1 and W_2 are two disjoint subsets of V, then (T^{-W_1})^{-W_2} = T^{-(W_1 ∪ W_2)};
- If v ∉ var(T_1) and v ∈ var(T_2), then (T_1 ⊗ T_2)^{-v} = T_1 ⊗ (T_2^{-v}).
The tables also satisfy the idempotency property: if T is a table and W ⊆ var(T), then T ⊗ T^{-W} = T. As a consequence of these properties, the system of tables defined for the subsets of a set of variables V, with these operations, is said to be an information algebra [3,8]. It also has a neutral element, the trivial table, and a null element, the contradictory table.
In an information algebra, it is always possible to define a partial order, which in this case is the following: if T_1 and T_2 are tables defined for sets of variables V_1 and V_2, respectively, then we say that T_1 ≤ T_2 if and only if for each x ∈ Ω_{V_1 ∪ V_2}, we have that T_2(x) ≤ T_1(x). The intuitive idea is that T_2 contains more or the same information as T_1 (any true assignment satisfying T_2 will also satisfy T_1).
The following properties of this relation are immediate:
- If T is a table and W ⊆ var(T), then T^{-W} ≤ T;
- If T_1 and T_2 are tables, then T_1 ≤ T_1 ⊗ T_2;
- If T_1, T_2 are tables defined for the same set of variables, then T_1 = T_2 if and only if T_1 ≤ T_2 and T_2 ≤ T_1.
Two tables, T_1 and T_2, are said to be equivalent, T_1 ≡ T_2, if and only if T_1 ≤ T_2 and T_2 ≤ T_1. If T_1 and T_2 are defined on V_1 and V_2, respectively, and I_1 and I_2 are the neutral tables for variables V_1 and V_2, respectively, we can immediately prove that T_1 and T_2 are equivalent if and only if T_1 ⊗ I_2 = T_2 ⊗ I_1. The multiplication by the neutral element is necessary for the tables to be defined for the same set of variables. If T_1 and T_2 are equivalent and var(T_1) = var(T_2), then they are identical, i.e., T_1 = T_2. The quotient set of the tables under this equivalence relation is called the domain-free valuation algebra associated with the valuation algebra [3]. In the following, we will consider that we work with equivalence classes of tables, and that a table can be changed into any equivalent one. All the neutral tables defined on different sets of variables are equivalent. Furthermore, the contradictory tables are equivalent. As a consequence, we will not make reference to the set of variables on which they are defined.
There is a disjunction operation [3,8] which can be defined on the set of tables: if T_1 and T_2 are tables, then their disjunction is the table T_1 ∨ T_2 defined for the set of variables var(T_1) ∪ var(T_2) and given by:
(T_1 ∨ T_2)(x) = max { T_1(x), T_2(x) }.
It is immediately clear that disjunction is commutative and associative. Furthermore, combination satisfies the distributive property with respect to disjunction, and disjunction is also distributive with respect to combination. In fact, we have the Boolean information algebra from [3], the complement of a table T being the table 1 − T.
We have some interesting properties relating disjunction with the basic table operations.
Proposition 1. If T is a table and v ∈ var(T), then T^{-v} = T^{v} ∨ T^{¬v}.
Proposition 2. If T_1 and T_2 are tables and v is a variable, then (T_1 ∨ T_2)^{-v} = T_1^{-v} ∨ T_2^{-v}.
4. Deletion Algorithm with Tables
We assume that we have some information represented by a set of tables: H = {T_1, …, T_k}. This set is intended to be a representation of the combination T_1 ⊗ ⋯ ⊗ T_k, which will be denoted as ⊗H. As the size of a table increases exponentially with its number of dimensions, the representation by a set can use much less space than the representation by a single table (except when the tables are defined for the same set of variables, but this is never the case according to our procedure to build the initial tables). The total size of the tables in H will be the sum of the sizes of its tables. For example, if we have three tables defined for variables {v_1, v_2, v_3}, {v_2, v_3, v_4}, and {v_3, v_4, v_5}, the size of the tables will be 8 + 8 + 8 = 24, but their combination will be defined for {v_1, …, v_5}, which corresponds to a table of size 32.
If x is a vector defined for the variables of H, then x satisfies the set of tables H when it satisfies the combination of all the tables in H. We will say that two sets, H_1 and H_2, are equivalent if and only if their combinations are equivalent tables. We also say that H_1 ≤ H_2 when the combination of H_1 is less than or equal to the combination of H_2.
The following properties are immediate:
- If H_1 and H_2 are equivalent, then H_1 ∪ H will be equivalent to H_2 ∪ H;
- x satisfies H if and only if T(x) = 1 for any T ∈ H;
- If T ∈ H and W is a set of variables, then H ∪ {T^{-W}} is equivalent to H;
- If T_1, T_2 ∈ H, then H ∪ {T_1 ⊗ T_2} is equivalent to H.
The set of true assignments of a set H will be the intersection of the sets of true assignments of its tables, where the intersection of two sets defined on different sets of indexes is defined as follows: if s_1 and s_2 are sets of vectors for variables V_1 and V_2, then s_1 ∩ s_2 is the set of vectors x for variables V_1 ∪ V_2 such that the subvector of x on V_1 belongs to s_1 and the subvector of x on V_2 belongs to s_2.
The operations with tables can be translated to sets of tables, taking into account that an operation on a set H is really carried out on the combination of its tables and that equivalent tables represent the same information. Some operations can be carried out in a simple way:
- The combination can be completed by a simple union: H_1 ⊗ H_2 = H_1 ∪ H_2;
- The disjunction of sets of tables can be computed as follows: H_1 ∨ H_2 = {T_1 ∨ T_2 : T_1 ∈ H_1, T_2 ∈ H_2};
- The conditioning is the conditioning of its tables: H^{ℓ} = {T^{ℓ} : T ∈ H}.
The deletion of variables is the most complex operation for sets of tables. In the following, we describe several methods to carry out this operation.
The marginal problem is as follows: given a set of tables, H, and a set of variables W, compute a set of tables H′ such that the combination of H′ is equivalent to the result of deleting the variables of W in the combination of H. This set will also be denoted as H^{-W}, considering that we are computing an element of the equivalence class.
As tables satisfy the basic Shenoy–Shafer axioms, the marginalization can be computed with the basic Algorithm 2, which is very similar to Algorithm 1, but now expressed as a marginalization algorithm and with information represented by tables. The procedure Marginalize0 is very simple and is depicted in Algorithm 3, where the test of T being equivalent to the contradictory table seems a bit artificial and unnecessary; it was included to show the similarity with other variants of this operation that we will introduce. When implemented, this test is also carried out, and when T is equivalent to the contradictory table, we return the contradictory table which is defined for the empty set of variables and contains a single 0 value.
Algorithm 2 Deletion algorithm
- Require: H, a set of tables.
- Require: W, a set of variables to remove.
- Ensure: a set of tables representing the marginalization of H removing the variables in W.
- 1: procedure Deletion(H, W)
- 2: for v ∈ W do
- 3: H_v ← tables T ∈ H such that v ∈ var(T)
- 4: H′_v ← Marginalize0(H_v, v)
- 5: if H′_v contains the contradictory table then
- 6: return H′_v
- 7: else
- 8: H ← (H ∖ H_v) ∪ H′_v
- 9: end if
- 10: end for
- 11: return H
- 12: end procedure
This algorithm can be used to solve the satisfiability problem: if the function Deletion(H, W) is called with W equal to the full set of variables, then all the variables are removed and H will only contain tables defined for the empty set of variables, which have a single value. Taking into account that trivial tables are not introduced, there are only two possibilities: H contains the contradictory table and then the problem is unsatisfiable, or H is empty and the problem is satisfiable. However, the algorithm can also be used to compute marginal information and to compile the information in the initial set H. This compilation is based on the following result.
Algorithm 3 Basic version of marginalize
- Require: H_v, a set of tables containing variable v.
- Require: v, the variable to remove.
- Ensure: H′_v, a set of tables representing the deletion of v in the combination of H_v.
- Ensure: H_v, a set of tables containing v.
- 1: procedure Marginalize0(H_v, v)
- 2: T ← combination of all the tables in H_v
- 3: T′ ← T^{-v}
- 4: if T is equivalent to the contradictory table then
- 5: H′_v ← {contradictory table}
- 6: else
- 7: if T′ is equivalent to the trivial table then
- 8: H′_v ← ∅
- 9: else
- 10: H′_v ← {T′}
- 11: end if
- 12: end if
- 13: H_v ← {T}
- 14: return H′_v
- 15: end procedure
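The control flow of the deletion loop with the basic marginalization step can be sketched as follows. For brevity, every table in this sketch is stored over the full set of variables, one axis per variable (the real point of the method is to keep each table on its own small frame; the sketch only shows the select / combine / marginalize / reinsert loop, and all names are the example's own):

```python
import numpy as np

# Control-flow sketch of Algorithm 2 with the basic Marginalize0 step,
# on full-frame Boolean arrays (axis = variable).

def depends(t, axis):
    # a full-frame table really uses an axis if its two slices differ
    if t.shape[axis] == 1:
        return False
    return (t.take(0, axis=axis) != t.take(1, axis=axis)).any()

def deletion(tables, remove_axes):
    for ax in remove_axes:
        h_v = [t for t in tables if depends(t, ax)]       # tables containing v
        rest = [t for t in tables if not depends(t, ax)]
        if not h_v:
            tables = rest
            continue
        T = h_v[0]
        for t in h_v[1:]:
            T = T & t                                     # combine all of h_v
        if not T.any():
            return None                                   # contradictory: unsatisfiable
        tables = rest + [T.any(axis=ax, keepdims=True)]   # delete v, reinsert
    return tables

# three variables p, q, r; tables for the clauses {p, q} and {not-q, r}
p = np.arange(2).reshape(2, 1, 1).astype(bool)
q = np.arange(2).reshape(1, 2, 1).astype(bool)
r = np.arange(2).reshape(1, 1, 2).astype(bool)
full = np.ones((2, 2, 2), dtype=bool)
T1, T2 = (p | q) & full, (~q | r) & full
print(deletion([T1, T2], [1, 0, 2]) is not None)   # True: the set is satisfiable
```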
Proposition 3. In each application of Marginalize0, the returned set of tables is equivalent to the result of deleting v in the combination of the input set. Furthermore, if we bring the input set of tables back into the updated set H, computed in Step 8 of Algorithm 2, then the resulting set is equivalent to the original H.
As a consequence of this result, when applying the deletion algorithm, if we keep the set of selected tables obtained when removing each variable v, we have that H is equivalent to the union of the final marginal set with all these kept sets. It is important to remark that the final set is the result of removing all the variables, and then a table from this set is defined for the empty set of variables, which is a number. There are two possibilities: first, if H is satisfiable, then all these tables are 1 and can be removed, representing the result by the empty set. Second, if H is unsatisfiable, the result will contain the contradictory table, and the full H is equivalent to it.
If the variables are removed in a given order, then the set obtained after removing the first variables, together with the kept sets for the removed ones, is equivalent to H. In this way, we do not only compute the marginal information: with the kept sets, we have the necessary information to recover, in a backward way, the marginal sets, from the marginal in which all the variables are removed to the marginal in which no variable is removed, H. This fact can be useful, among other things, to obtain the true assignments satisfying all the tables, in case they are satisfiable. The following result provides a procedure to obtain the true assignments satisfying a set of tables H from the sets of tables obtained when applying a deletion algorithm.
Proposition 4. Assume a set of tables H and that Algorithm 2 is applied removing variables in in order . Assume also that is equivalent to the empty set, i.e., the problem is satisfiable; then, if is the set of true assignments satisfying the set of tables and is the true assignments of H, then these sets can be computed in reverse order of in the following way:
Start with ;
Make equal to the set containing an empty vector ;
For each , compute , which is the only table in , which is a table defined only for variable , then this table is never equal to , and if and are the true assignments obtained by extending to variables and given by , i.e., by considering true and false, respectively, then:
- −
if , add and to ;
- −
if , add to ;
- −
if , add to .
This result is the basis for algorithms to compute one solution, all the solutions, or a random solution given a satisfiable set of clauses. The process starts with a set containing an empty vector. The main difference between these algorithms appears when both extensions are possible: when computing one solution, we only pick one of them; when computing all the solutions, both are selected; and when computing a random solution, there is a random selection of one of them. In the last case, an importance sampling algorithm in the set of solutions is obtained: starting with an initial weight, each time we have a random selection, the weight must be multiplied by the number of available choices.
In Algorithm 3, we have described the basic marginalization operation, which is the same as the one applied in general valuation-based systems [2]. However, Boolean tables allow other alternative forms of marginalization. The first one is depicted in Algorithm 4 and shows that it is not necessary to combine all the tables in order to compute the marginal. In fact, only pairwise combinations are necessary.
Algorithm 4 Pairwise combination version of marginalize
- Require: H_v, a set of tables containing variable v.
- Require: v, the variable to remove.
- Ensure: H′_v, a set of tables representing the deletion of v in the combination of H_v.
- Ensure: H_v, a set of tables containing v.
- 1: procedure Marginalize1(H_v, v)
- 2: P ← {T_1 ⊗ T_2 : T_1, T_2 ∈ H_v}
- 3: H′_v ← {T^{-v} : T ∈ P}
- 4: if H′_v contains the contradictory table then
- 5: H′_v ← {contradictory table}
- 6: end if
- 7: Remove neutral tables from H′_v
- 8: return H′_v
- 9: end procedure
The following proposition shows that the output of Marginalize1 is also a set of tables representing the required marginalization.
Proposition 5. If the input is a set of tables containing variable v, then the set computed in Algorithm 4 represents the deletion of v in the combination of the input tables.
Once this is proved, we can replace Marginalize0 with Marginalize1 in the deletion algorithm and everything works, even the method to compute the solutions given in Proposition 4. The only difference is that now the stored set contains, in general, more than one table, and when computing the conditioning, we have to condition every table in the stored set, the result being a set of tables depending on variable v. These tables are combined to produce the required table.
The main difference between Marginalize0 and Marginalize1 is that the former produces a unique table in each of its two output sets, while the latter produces several tables in both sets, but of smaller size. As the size of a table is exponential in its number of variables, in general, Marginalize1 is more efficient, but the number of tables it produces is quadratic in the number of input tables, and this fact should be taken into account.
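The pairwise variant can be sketched on the same full-frame representation used earlier (again an illustration with the example's own names, not the paper's implementation):

```python
import numpy as np
from itertools import combinations_with_replacement

# Sketch of Marginalize1: instead of combining all the selected tables and
# then deleting the variable, each pair is combined and marginalized
# separately; `axis` plays the role of the deleted variable.

def marginalize1(h_v, axis):
    out = []
    for t1, t2 in combinations_with_replacement(h_v, 2):
        m = (t1 & t2).any(axis=axis, keepdims=True)   # (T1 x T2)^{-v}
        if not m.any():
            return [m]                 # contradictory table found
        if not m.all():                # drop neutral (all-ones) results
            out.append(m)
    return out

# same two clauses as before: {p, q} and {not-q, r} over variables (p, q, r)
p = np.arange(2).reshape(2, 1, 1).astype(bool)
q = np.arange(2).reshape(1, 2, 1).astype(bool)
r = np.arange(2).reshape(1, 1, 2).astype(bool)
full = np.ones((2, 2, 2), dtype=bool)
res = marginalize1([(p | q) & full, (~q | r) & full], axis=1)
print(len(res))   # 1: only the pair of distinct tables gives a non-neutral result
```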
However, there is another very important alternative marginalization when we have a variable which is functionally dependent on other variables [20]. This happens very often, especially in problems encoding circuits [21].
If T is a table and v ∈ var(T), we say that v is functionally determined in T if and only if the combination of the conditionings of T to the two values of v, T^{v} ⊗ T^{¬v}, is equivalent to the contradictory table. This implies that for any configuration x of the remaining variables, we have that T^{v}(x) · T^{¬v}(x) = 0, i.e., either T^{v}(x) = 0 or T^{¬v}(x) = 0; that is, for each x, there is, at most, one possible value of variable v, true or false, for which table T has a value of 1 (at least one of the two values is impossible). This definition generalizes the definition given in terms of clauses in [20], which also requires that T^{-v} is equivalent to the neutral element.
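The functional-determination test is cheap on the array representation: the two slices of T along v's axis must never both be 1. A NumPy sketch (names and example data are this example's own):

```python
import numpy as np

# Sketch of the functional-determination test: v is functionally determined
# in T when, for every configuration of the remaining variables, at most one
# of the two values of v is allowed.

def is_determined(t, axis):
    return not (t.take(0, axis=axis) & t.take(1, axis=axis)).any()

# T over (q, p) encoding "p equals q": p is determined by q
T_eq = np.array([[1, 0], [0, 1]], dtype=bool)
# T over (q, p) encoding the clause {q, p}: both values of p allowed when q = 1
T_or = np.array([[0, 1], [1, 1]], dtype=bool)
print(is_determined(T_eq, axis=1), is_determined(T_or, axis=1))  # True False
```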
In the case that there is a table T among those containing v such that v is functionally determined in T, the marginalization can be done as in Algorithm 5.
This marginalization is much more efficient, as the number of tables in the result is smaller than with Marginalize1. In fact, the number of tables in the problem does not increase: the number of output tables is always less than or equal to the number of input tables. We give an example illustrating the benefits of using this marginalization.
Algorithm 5 Marginalize with functional dependence
- Require: H_v, a set of tables containing variable v.
- Require: v, the variable to remove.
- Require: T, a table in which v is functionally determined.
- Ensure: H′_v, a set of tables representing the deletion of v in the combination of H_v.
- Ensure: H_v, a set of tables containing v.
- 1: procedure Marginalize2(H_v, v, T)
- 2: P ← {T′ ⊗ T : T′ ∈ H_v}
- 3: H′_v ← {T″^{-v} : T″ ∈ P}
- 4: if H′_v contains the contradictory table then
- 5: H′_v ← {contradictory table}
- 6: end if
- 7: Remove neutral tables from H′_v
- 8: return H′_v
- 9: end procedure
Example 2. Assume that we are deleting variable p and that we have three tables: If Marginalize0 is applied, then we have to combine the three tables, producing a table which depends on six variables and has a size of 64. If Marginalize1 is applied, then we have to compute the pairwise combinations (the combination of a table with itself is not necessary, as it produces the same table, which is included in one of these combinations), i.e., we have more tables but of smaller size (16 + 16 + 32). However, if it is known that p is determined by q in one of the tables, then, for Marginalize2, only the combinations of the other two tables with that one are necessary, and the result is based on two tables of sizes 16 + 16, producing an important saving in space and computation. Please note that to finish the marginalization step, variable p should be removed from the computed tables by marginalization, but this step has been omitted in order to keep the notation simpler.
This marginalization is correct, as the output set is equivalent to the required marginal, as the following result shows.
Proposition 6. If the initial conditions required by Algorithm 5 are satisfied, then the output set computed in Marginalize2 is equivalent to the deletion of v in the combination of the input set, and the stored set is equivalent to the input set.
5. Additional Processing Steps and Marginalization Strategy
In this section, we introduce some additional steps which can be added to the basic deletion algorithm to improve efficiency or to enlarge the family of problems that can be solved.
5.1. Combining Tables
Many times, we have in a problem two tables T_1 and T_2 such that var(T_1) ⊆ var(T_2). We can substitute them by their combination, obtaining an equivalent problem. Doing this can have potential advantages: it reduces the size of the problem specification, as the size of the combination is the same as the size of T_2, but it also increases the chances of having a variable which is functionally dependent on the others. Remember that if v is determined in T_1, then v will also be determined in the combination T_1 ⊗ T_2. We can consider the procedure CombineIncluded(H), which compares the sets of variables of each pair of tables and substitutes the pair by its combination when one set is included in the other. One important point is the sets to which this procedure is applied. In Algorithm 2, CombineIncluded(H) could be applied to the set H after each deletion of a variable, to the set of selected tables containing the variable, or to the result of the marginalization. Applying it to H could be very costly, as the number of tables can be high and we have to compare all the pairs of tables. A similar effect can be obtained by applying it to the set of selected tables, which are the tables effectively used in each deletion step and which form a smaller set. It is also important to apply it to the result of the marginalization, as it can significantly reduce the number of tables, which can be high when Marginalize1 is used to delete a variable.
In the case of , we have also implemented a more aggressive combination method, which combines two tables and when . This always happens when , but also when and , i.e., when the tables are defined for the same variables, except for variables . We will call this procedure GroupTables(H); it will be applied to in the case of Marginalize1 when the number of tables in H is greater than or equal to a given threshold, N.
Example 3. Assume three tables, i.e., for variables , for variables , and for variables , that are given by: If GroupTables(H) is applied, then tables and are combined, as the result has a size of 8 and the combined tables have sizes 4 + 4. Then, table is also combined, as it is defined for a set of variables included in the variables of the combination of . The result is that the three tables are replaced by their combination, which is given by the following table: Observe that this table is smaller than the sum of the sizes of the three combined tables. Furthermore, in the combination, we can observe that r is determined by , for which Marginalize2 can be applied.
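The size criterion behind GroupTables can be sketched as a simple predicate on the variable sets: with one axis of size 2 per variable, table sizes are powers of two, so two tables are worth combining when the combined array is no larger than the two arrays it replaces. The exact threshold logic of the paper's implementation is an assumption here:

```python
def worth_grouping(vars1, vars2):
    # Combine when 2**|V1 ∪ V2| <= 2**|V1| + 2**|V2|, i.e., the combined
    # table needs no more space than the pair it replaces. This holds when
    # one set contains the other, and also when the sets differ in one
    # variable, as described in the text.
    union = set(vars1) | set(vars2)
    return 2 ** len(union) <= 2 ** len(set(vars1)) + 2 ** len(set(vars2))
```

In Example 3, two tables of two variables each (size 4 + 4) that share one variable combine into a three-variable table of size 8, so the predicate accepts them.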
5.2. Unidimensional Arrays
Unidimensional arrays deserve special consideration. If we are going to introduce a table T with into H, then there are three possibilities:
T is equal to , the neutral element. In this case, the table is not introduced in H.
T is equal to , the contradiction. In this case, the whole set is equivalent to and all the other tables are removed from H. In fact, in our implementation, the equality to is checked for any table to be introduced into H, whatever its dimension.
In T, there is one cell with value 0 and another with value 1. If ℓ is the literal with value 1, this table is equivalent to the unit literal ℓ and we can carry out a unit propagation. This is done by transforming any other table into and finally introducing table T. In our implementation, instead of introducing the table T, we keep a set of unit literals, and each time another table is introduced in H, the conditioning to the literals in this set is carried out.
Example 4. Consider the table about given by: If a table, T, with p is introduced with , then we can transform table into , which is again a unidimensional table defined for q with . This table may produce further simplifications of tables containing variable q.
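A sketch of this unit-propagation step on the Boolean-array representation follows: a one-variable table with a single 1 forces a literal, and conditioning any other table to that literal is a slice of the corresponding axis. The table layout and function names are illustrative assumptions:

```python
import numpy as np

def unit_literal(t):
    # Return (var, value) if the 1-D table t forces a unit literal
    # (exactly one satisfying value), else None.
    (v,), a = t
    if a.sum() == 1:
        return v, int(a.argmax())
    return None

def condition(t, var, value):
    # Fix var = value in table t by selecting that slice of its axis.
    vs, a = t
    if var not in vs:
        return t
    ax = vs.index(var)
    return tuple(w for w in vs if w != var), np.take(a, value, axis=ax)
```

As in Example 4, conditioning a two-variable table to one unit literal can yield a new unidimensional table that in turn forces another literal, so propagation can cascade.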
5.3. Splitting
The size of the tables in is important when applying the deletion step: the smaller the size, the faster this step can be completed. For that reason, we have implemented a procedure to reduce this size. In order to do this, for each , Algorithm 6 is applied, where Minimize(,, T, V) is a procedure that tries to make as small as possible (by marginalization) under the condition that and that variables in V cannot be removed. The details of this minimizing algorithm can be found in Algorithm 7.
When this split is applied to any table T in , each time T is split into with , then T is changed to in H and to in . The general procedure making this transformation is called SplitG(, H).
Algorithm 6 Splitting a table before deleting a variable
Require: T, a table containing variable v.
Require: v, the variable to remove.
Ensure: , a table containing v.
Ensure: , a table not containing v.
1: procedure Split(T, v)
2: 
3: 
4: Minimize(,, T, )
5: return 
6: end procedure
Algorithm 7 Minimizing the splitting table
Require: , tables such that .
Require: V, a set of variables that cannot be removed.
Ensure: M, a table in which is minimized.
1: procedure Minimize(,, T, V)
2: if then
3: 
4: return M
5: end if
6: Let v be an element from 
7: 
8: if then
9: return Minimize(, T, , )
10: else
11: return Minimize(, T, , )
12: end if
13: end procedure
Example 5. Assume that in our set of tables, we have a table T with variables given by: If we want to remove variable r, then instead of using this table, we can try to split it into two tables, one of them not depending on r. Then we first compute the marginal , which is given by: Next, we minimize T conditioned to , obtaining in this case the following table : In this way, we obtain the decomposition ; we can replace T in our set of tables by the two tables , and then, as does not depend on r, when deleting this variable, only has to be considered, which has a lower dimension than the original table T, simplifying in this way the deletion step.
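A sketch of the Split/Minimize pair on Boolean arrays: the factor not containing v is the marginal of T without v, and, by idempotency, T itself is always a valid first factor, which we then greedily shrink by marginalizing out any variable whose removal keeps the combination equal to T. The greedy order and all helper names are our own assumptions:

```python
import numpy as np

def _expand(vs, a, union):
    order = [vs.index(w) for w in union if w in vs]
    shape = [2 if w in vs else 1 for w in union]
    return a.transpose(order).reshape(shape)

def combine(t1, t2):
    (v1, a1), (v2, a2) = t1, t2
    union = tuple(v1) + tuple(w for w in v2 if w not in v1)
    return union, _expand(v1, a1, union) & _expand(v2, a2, union)

def remove_var(t, w):
    vs, a = t
    return tuple(u for u in vs if u != w), a.any(axis=vs.index(w))

def same(t1, t2):
    # Table equality up to axis order (assumes equal variable sets).
    (v1, a1), (v2, a2) = t1, t2
    return set(v1) == set(v2) and np.array_equal(
        a1, a2.transpose([v2.index(w) for w in v1]))

def split(t, v):
    t2 = remove_var(t, v)        # the marginal of t without v
    t1 = t                       # idempotency: t == combine(t, t2)
    for w in list(t1[0]):
        if w == v:
            continue             # t1 must keep v
        cand = remove_var(t1, w)
        if same(combine(cand, t2), t):
            t1 = cand            # w was redundant in t1
    return t1, t2
```

For T encoding (¬p ∨ q) ∧ (p ∨ r), splitting on r recovers exactly those two factors: t2 = ¬p ∨ q (no r) and t1 = p ∨ r.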
5.4. Minimizing the Dependence Set
If we have the set of tables and there is a table in which v is functionally determined, then the deletion is, in general, quite efficient, but it also depends on the size of T. If there is a table that can be obtained from T by marginalization and v is still determined in , then the result is also correct if we call Marginalize2 (, v,). The reason is very simple: as is obtained by marginalization of a table in , is equivalent to , and Marginalize2 (, v,) produces a correct result. The only difference between Marginalize2 (, v,) and Marginalize2 (, v,) is that in the former, is included. However, this table is less informative than , which is included in Marginalize2 (, v,), and then the two results are equivalent.
The algorithm we have applied to compute a table of smaller size in which there is functional dependence is depicted in Algorithm 8, initially with . In it, we assume that we have a function CheckDeter(T, v) that determines when v is functionally determined in T.
Algorithm 8 Minimizing the dependence of a variable in a table
Require: T, a table.
Require: v, a variable which is determined in T.
Require: V, a set of variables which cannot be deleted.
Ensure: a marginal table in which v is still determined.
1: procedure MinDep(T, v, V)
2: if then
3: 
4: end if
5: Let v be an element from 
6: 
7: if CheckDeter (, v) then
8: MinDep (, v, )
9: else
10: MinDep (T, v, )
11: end if
12: return 
13: end procedure
Example 6. Assume a table T about variables given by: In this table, r is determined by , but if is computed, the result is: Furthermore, r is determined by p in it. Marginalize2 is more efficient using this smaller table instead of the original one.
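On Boolean arrays, both CheckDeter and the greedy core of MinDep have short sketches: v is functionally determined when every assignment of the remaining variables admits at most one value of v, which is a sum along v's axis. The following is an illustrative rendering under the same assumed table layout as above, not the paper's code:

```python
import numpy as np

def check_deter(t, v):
    # v is determined when each assignment of the other variables
    # allows at most one value of v (column sums along v's axis <= 1).
    vs, a = t
    return bool((a.sum(axis=vs.index(v)) <= 1).all())

def remove_var(t, w):
    vs, a = t
    return tuple(u for u in vs if u != w), a.any(axis=vs.index(w))

def min_dep(t, v):
    # Greedily marginalize out other variables while v stays determined,
    # mirroring the recursion of Algorithm 8.
    for w in list(t[0]):
        if w != v:
            cand = remove_var(t, w)
            if check_deter(cand, v):
                t = cand
    return t
```

As in Example 6, when r is determined by p alone, min_dep drops the irrelevant variable q and returns a marginal on {p, r} in which r is still determined.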
5.5. Alternative Deletion Procedures
When applying Marginalize0 (, v), the table is computed. In some situations, it is possible that this table has a small size, but Marginalize1 (, v) or Marginalize2 (, v) is applied. In that case, instead of computing , an alternative method is to compute the maximal sets of the tables from : , where Maximal removes from a family of sets those sets which are strictly included in another set of the family.
Observe that it is not necessary to actually compute the tables in , but only the sets of variables associated with these tables.
Then, we can compute . When comparing and we can observe the following facts:
Each element from is equal to , where , or a combination of several sets of this type when CombineIncluded has been applied. Then, there will be another table defined for the same set of variables and computed as . As is the result of marginalizing after combining all the tables, instead of combining only two tables, we have that , and in some cases, the tables are not equivalent.
The whole sets of tables and are equivalent. In fact, both are equivalent to . This is known for . For , as any element from is a marginalization of T, . On the other hand, as a consequence of the above point, , and as is equivalent to T, we have that is also equivalent to .
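The Maximal operation itself works only on the variable sets, so no table needs to be computed; a straightforward filter suffices. This is a sketch with illustrative names:

```python
def maximal(family):
    # Keep only the sets not strictly included in another member
    # of the family.
    sets = [frozenset(s) for s in family]
    return [s for s in sets
            if not any(s < other for other in sets)]
```

For instance, in the family {p}, {p, q}, {q, r}, the singleton {p} is discarded because it is strictly included in {p, q}.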
Computing by computing T and and performing marginalizations has, in general, a higher computational cost, but this may be lowered if T contains few variables; moreover, on the positive side, we have more informative tables in and there are more opportunities to find functional dependencies of variables in tables, which can improve the efficiency of subsequent deletion steps.
In our implementations, this decision is taken by fixing a threshold K: if , then is computed; otherwise, is computed. The versions of Marginalize1 and Marginalize2 computing are called Marginalize1b and Marginalize2b, respectively.
Example 7. Assume that we are going to delete p and that we have three tables , defined for variables , and , given by: Assume that Marginalize1 is applied and we are going to compute , given by the following tables: Alternatively, we can compute , which are defined for the same variables, obtaining: The product of these three tables is equal to the product of the original tables in Equation (1). However, individually, each one of them can be more informative (has more 0's) than the corresponding one in Equation (1). In this example, the first one has 0 assigned to in Equation (2), while this value was 1 in Equation (1). Marginalize1b will compute the arrays in Equation (2).
5.6. The Final Global Marginalization Procedure
Now, we put everything together and describe the global marginalization procedure. In Algorithm 9, we describe the algorithm MarginalizeG, which, given a set H and the variable v, computes a set equivalent to and another set , such that H is equivalent to . We assume that Determined (, v) is a procedure that checks whether there is a table in in which v is functionally determined. In that case, it returns MinDep (). If this table does not exist, it returns the neutral element .
The first thing is to test whether there is a functional dependence. In that case, Marginalize2 is applied, after minimizing the table with that dependence. If the size of the global combination is lower than a given threshold, then the version b of the marginalization is applied.
Otherwise, we have to choose between Marginalize0 and Marginalize1. For that, we first compute , the set of maximal sets of the sets of variables of the tables of after applying Marginalize1, and compare the size of the tables defined for these sets with the size of the table obtained with the basic Marginalize0, selecting the method with the smaller final size of the tables. If Marginalize1 is selected, then GroupTables is applied if the number of tables in is greater than a given threshold N.
Algorithm 9 The final global marginalization algorithm
Require: H, a set of tables.
Require: v, the variable to remove.
Require: W, a Boolean variable indicating the splitting procedure.
Require: N, a threshold for grouping tables.
Require: K, a threshold for the alternative deletion procedure.
Ensure: , a set of tables representing .
Ensure: , a set of tables such that H is equivalent to .
1: procedure MarginalizeG (H, v)
2: tables such that 
3: CombineIncluded ()
4: if W then
5: Split 
6: end if
7: Determined (, v)
8: if then
9: if then
10: Marginalize2b (, v, T)
11: else
12: Marginalize2 (, v, T)
13: end if
14: else
15: ▹ Maximal sets of after Marginalize1
16: if then
17: Marginalize0 (, v)
18: else
19: if then
20: GroupTables ()
21: end if
22: if then
23: Marginalize1b (, v)
24: else
25: Marginalize1 (, v)
26: end if
27: end if
28: end if
29: CombineIncluded ()
30: 
31: 
32: return 
33: end procedure
6. Experiments
Computations with tables have been implemented using Python and the NumPy library [22]. For the basic operations with tables (combination, marginalization, and conditioning), we have taken as a basis the implementation of the same operations in probabilistic calculus in the pgmpy library [23]. An important fact about the deletion algorithm is the order in which variables are chosen to be deleted. We have followed the usual heuristic of selecting the variable v with a minimum number of variables in , with the exception of the case in which v is not functionally determined in any table of , but there is another variable that is determined in the table of , and where . This is done to give preference to selecting variables to which we can apply Marginalize2, which is the most efficient deletion procedure. We have carried out three experiments.
In the first experiment, we tested the basic deletion algorithms on several SAT examples imported from several repositories of benchmark problems for SAT, mainly from Hoos and Stützle [24], Junttila [25], Tsuji and Gelder [26], and Burkardt [27]. The main criterion for selecting the cases has been that the arrays do not surpass the maximum of 32 dimensions that exists in the NumPy library. The characteristics of the examples can be found in Table 1. We also provide the maximum cluster size (number of variables) of a join tree built from the connectivity graph under the same ordering as used in our deletion algorithm. We have selected the problems with the restrictions that they are not too simple (a maximum cluster size of at least 15) and that the deletion algorithm can be applied, taking into account that the number of dimensions of a table in NumPy is limited to 32, i.e., it is required that for each table T, we have that .
We have applied the deletion algorithm with the general deletion step of Algorithm 9. In it, we have considered a value of for grouping tables when pairwise marginalization is applied. We have tested two variants of the algorithm, and , to test whether splitting is a good idea, and a set of values of .
First, we can observe that, in general, the problems are solved fast, with an average of less than 1 min for all the parameter settings.
With a non-parametric Friedman test, the differences between the different combinations of K and W are significant (p-value = 0.001149). The average times as a function of K are depicted in Figure 1 for the case (the case is very similar). We can see that the time increases as a function of K, while being more or less constant for . This increasing pattern does not always occur. For example, in aes_32_1_keyfind_1.cnf, the times with can be seen in Figure 2. In that case, it is possible to see that the optimal K is 15. The extra work of computing the global table is compensated by the presence of more zeros in the resulting tables, which speeds up the subsequent deletions.
We have carried out a post hoc Conover–Friedman test (without correction for multiple tests). The resulting p-values for pairwise comparisons can be seen in Table 2. A 'Y' means that and an 'N' means that . First, we can observe that there are never significant differences with the use of Split. There are significant differences of with the cases in which , especially if Split is not applied. In this case, the computation of large tables does not compensate for the presence of more zeros. The comparison of is only significant with . Again, the significance is higher with no Split. with Split shows no significant difference with any smaller value of K. There is only one small significance (with ) if Split is not applied.
To compare with previous versions of the DP algorithm, we have also implemented the basic Algorithm 2, as in [5], including a step for unit propagation (each time a unit clause is found, unit propagation is carried out). As this algorithm is less efficient, we have given it a limit of 600 s to solve each problem, to avoid very long running times in some of the problems. For example, the case aes_32_1_keyfind_1.cnf was not solved even with 2 h of running time. A total of 12 problems could not be solved within this time limit (600 s), and the average time was 168.08 s. This average is much larger than the worst case of our algorithms, even taking into account that the running time was limited to 600 s. A non-parametric Wilcoxon test was not significant. This is due to the fact that this approach was usually faster on the simpler problems.
In the second experiment, we consider four sets of clauses with 504, 708, 912, and 1116 variables and 1840, 2664, 3488, and 4312 clauses, respectively. Our exact algorithms were not able to solve these cases. However, we have computed the number of variables which have been deleted without surpassing a maximum table size of 25, for each combination of K and (the results were the same for ). We can observe that there is a tendency to delete, in an exact way, more variables when K increases, though this is not always true in each particular problem (Table 3).
As a summary of the first two experiments, we find that the use of with Split is the best option, but we leave open the possibility of using higher values of K, especially in difficult problems.
The third experiment identifies a situation in which marginalization algorithms can be applied. For that, we consider the Bayesian networks used in the UAI 2022 competition (see https://uaicompetition.github.io/uci-2022/ (accessed on 10 May 2023)). We discarded the networks in which all the values in the conditional probability tables are non-zero (five networks) and the networks in which there were non-binary variables (three networks). This makes a total of 96 networks used in the partition function and the marginal probability competitions. The characteristics of these networks can be seen in Table 4, where is the number of variables, is the number of observed nodes, is the maximum cluster size (in a deletion algorithm with the original probability tables), and is the percentage of 0 values in the original conditional probability tables. We can observe that the percentage of 0 values is very high in all the networks. We have to take into account that, these being conditional probability tables for binary variables, at least 0.5 of the values are different from 0 (for each 0 value, there is another value equal to 1). Thus, in some networks with 0.455 of the values being 0, this implies that only 0.002% of the values are different from 0 and 1. With these networks, we have solved the associated propositional problem. This problem is defined by transforming each conditional probability table T into a logical table in such a way that if and , otherwise. If a variable v has been observed, then a unitary logical table is added with value 1 in the observed value and 0 in the unobserved one.
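The transformation described above has a very short sketch on arrays, assuming the logical table is 1 exactly where the conditional probability is non-zero (the exact encoding in the paper's code is not quoted here, and the function names are ours):

```python
import numpy as np

def cpt_to_logical(cpt):
    # Logical table: 1 exactly where the conditional probability
    # is non-zero (assumed reading of the transformation).
    return np.asarray(cpt) > 0

def observation_table(value):
    # Unitary table for an observed binary variable: 1 in the observed
    # value, 0 in the unobserved one.
    t = np.zeros(2, dtype=bool)
    t[value] = True
    return t
```

The resulting Boolean tables can then be processed by the deletion algorithms exactly like any other set of tables.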
Then, we have applied our basic algorithm with and Split. All the problems were solved with an average time of 20.54 s (minimum of 0.035 s and maximum of 542.367 s). It is important to remark that most of the problems, 59 out of 96, were solved in less than 1 s, and almost all of them, 82 out of 96, were solved within a time limit of 10 s.
We have also applied the DP algorithm (Algorithm 2) to the equivalent sets of clauses of the different problems. The average time was 200.736 s, but we need to take into account that there was a time limit of 2000 s and that nine problems were not solved within this time limit. The average, after taking into account the full resolution of these nine problems, would have been higher. This shows that our method was able to solve more problems and in less time, especially for difficult cases.
It is important to remark that the result of the deletion algorithms can be used to develop Monte Carlo algorithms to obtain compatible configurations according to Proposition 4, without rejecting any configuration due to a 0 probability. This is important for the development of approximate algorithms, and it is not feasible with classical SAT algorithms, which decide the satisfiability of the case and provide one satisfying assignment.
7. Conclusions and Future Work
In this paper, we have proposed a new procedure which can be applied to solve the marginalization problem in propositional logic based on the use of Boolean arrays. The experiments show that it is possible to solve, in an exact way, moderate-sized problems (even with thousands of variables and a tree width of more than 100). The method is based on a classical deletion algorithm, but with some improvements based on the special characteristics of Boolean arrays. We have provided a full set of tools for working and operating with these arrays, allowing us to apply the Shenoy–Shafer abstract framework [2]. We have studied different methods for carrying out the deletion of a variable. Of special interest is the deletion of a variable when its value is functionally determined by the values of the other variables. Previous works with deletion algorithms [4,5] reported experiments on general problems with up to 25 variables. They also reported experiments with other problems, called 'chain problems', that were randomly generated in such a way that the tree width was bounded by 5, while we have been able to solve problems with a tree width of 103. We have shown that we have been able to expand the class of problems that can be solved with the deletion algorithm, as the previous versions were unable to solve the more difficult problems in our experiments. Some problems which could be solved in seconds with our approach could not be solved in hours with the former basic deletion algorithm. This opens a new set of possibilities for algorithms to solve the marginal problem.
For the future, this framework opens a wide range of possible developments:
To optimize the deletion strategy by selecting the most appropriate method depending on the characteristics of the tables in .
To improve the methods for decomposing a large table T as a product of smaller tables, which can be more useful. Of special interest is also obtaining tables of low dimensions, even if T is not decomposed as their product. The extreme case is a table of dimension 1, which always simplifies the problem (if different from the neutral element).
To combine the clause representation and the table representation, as we have found that the basic clause representation was able to solve the simpler problems faster.
To organize computations in a join tree with a local computation algorithm [
2,
6].
To develop approximate algorithms. For example, the mini-bucket elimination algorithm [28] is a general framework for approximate algorithms based on partitioning the set before carrying out the combination of all the tables. In this case, we have more opportunities, given that potentials are idempotent and that we have provided alternative marginalization procedures.
To combine approximate and exact computations. An approximate computation can be the basis to obtain small informative tables, which can be useful to speed up an exact algorithm.
To develop a backtracking algorithm, taking as a basis the array representation.
To speed up approximate inference in Bayesian networks, which is NP-hard, especially when there are extreme probabilities [29]. Approximate algorithms, such as likelihood weighting [30] or penniless propagation [31], could take advantage of a previous and fast propagation of 0–1 values with the procedures proposed in this paper. Experiment 3 has already shown that our algorithms can propagate 0–1 values very fast in hard problems (from the UAI 2022 competition).
The special role of potentials representing functional dependence could also be studied in the general framework of Information Algebras [
32] and applied to other problems which are particular cases, such as constraint satisfaction [
33].
To consider the easy deletion of some variables in a SAT problem as a preprocessing step [34] that could be used as a simplification procedure.