Article

A Deletion Algorithm for the Marginal Problem in Propositional Logic Based on Boolean Arrays

by
Efraín Díaz-Macías
1,*,† and
Serafín Moral
2,†
1
Faculty of Engineering Sciences, State Technical University of Quevedo, Quevedo 120301, Ecuador
2
Department of Computer Science and Artificial Intelligence, University of Granada, 18071 Granada, Spain
*
Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2023, 11(12), 2748; https://doi.org/10.3390/math11122748
Submission received: 16 May 2023 / Revised: 14 June 2023 / Accepted: 15 June 2023 / Published: 17 June 2023
(This article belongs to the Section Mathematics and Computer Science)

Abstract: This paper proposes a deletion algorithm for the marginal problem in propositional logic. The algorithm is based on the general Davis and Putnam deletion algorithm DP, expressed as a bucket elimination algorithm, representing sets of clauses with the same set of variables by means of a Boolean array. The main contribution is the development of alternative procedures when deleting a variable, which allow more efficient computations. In particular, it takes advantage of the case in which the variable to delete is determined by a subset of the rest of the variables. It also provides a set of useful results and tools for reasoning with Boolean tables. The algorithms are implemented using Python and the NumPy library. Experiments show that this procedure is feasible for intermediate problems and for difficult problems derived from hard Bayesian network cases.

1. Introduction

The marginal problem consists of computing the consequences of a set of propositional formulae on a reduced subset of variables. The basic algorithm to solve it has been the so-called Davis and Putnam (DP) deletion algorithm [1]. This algorithm is a particular case of the Shenoy–Shafer deletion algorithm [2,3] or the bucket elimination scheme [4,5]. The main problem of this algorithm is its space complexity (it tends to produce too many clauses of large size). Its time complexity is exponential in the tree width of the connectivity graph [4]. The computations can be organized in a join tree [6] in the same way as probabilistic computations in Bayesian networks [2,7]. However, in the case of propositional logic, we are dealing with an idempotent valuation system or valuation algebra [3,8], which has some special features that can be exploited in order to build efficient algorithms.
This paper proposes new algorithms for the marginal problem by applying the Shenoy–Shafer abstract framework [2], but with some differences compared to the DP algorithm. First, it represents sets of clauses as Boolean arrays. This is a semantic representation of the set of true assignments satisfying a given set of clauses, i.e., a truth table. Boolean arrays can be a very efficient representation of a large set of clauses, as the values in the array are simple 0–1 values, though their size is exponential in the number of variables. We define the basic operations and show that the set of Boolean arrays with these operations has the structure of an information algebra [3]. Secondly, we also give alternative deletion procedures and a set of possible optimizations of the full procedure. In this sense, a very important contribution is the study of the cases in which a variable is functionally determined by a set of variables and the exploitation of this fact in the marginalization algorithms. The Boolean array representation is especially appropriate for this improvement.
The final result is a family of tools and algorithms that can solve moderate-sized problems, even in cases where the associated connectivity graph has a large tree width (more than 100), as is shown in the experiments.
A related problem is the satisfiability problem (SAT) [9], which consists of determining if a set of propositional formulae is satisfiable, i.e., whether there is a true assignment such that every formula becomes true. It was the first problem proven to be NP-complete [10]. This implies that any NP problem can be efficiently reduced to it, and therefore, any good algorithm to solve SAT could also be used for such an NP problem. In fact, many well-known problems are solved nowadays by encoding them as SAT instances [11,12,13]. Current approaches for SAT are mainly based on the Davis–Putnam–Logemann–Loveland backtracking algorithm (DPLL) [14] and its successors, conflict-driven clause-learning algorithms (CDCL) [15]. The SAT problem can be solved from the marginal problem by deleting all the variables (marginalizing over the empty set): the set is consistent if and only if this marginalization is vacuous. However, the marginal problem can also be used without deleting all the variables, to compute the marginal on a given set, and it can be used to compute all the solutions (configurations satisfying all the clauses) or to sample from the space of solutions in the case of a satisfiable problem [6]. As stated in [5], the marginal problem is a kind of knowledge compilation that can be useful in many other related problems. Furthermore, in a recent survey of SAT-related problems [16], the following was noted: “We see that DP responds to a more difficult problem than the simple problem of satisfiability. Except for a few specific problems of small induced width, DP-60 never really competed with the version of DP-62” (where DP-60 is what we have called DP and DP-62 makes reference to DPLL). For this reason, the basic deletion algorithm has received little attention in the literature.
For example, in Knuth’s book [17], covering the existing approaches for SAT, there is only a short reference to the DP approach, saying that it works well for small problems, but that it can be very inefficient in the worst case. Our paper will show that it is possible to solve large problems, even problems with a large tree width. The initial DP algorithm was revisited in  [4,5], but this approach was also based on a clause representation and the main contributions were the determination of good deletion sequences.
Another related problem is propositional model counting or #-SAT, consisting of computing the number of models satisfying a given formula [18]. This is related to probabilistic computations and is known to be #P-complete. Similar deletion algorithms have been applied; however, as noted by Gogate and Dechter [19], these algorithms are exponential in the tree width. In this paper, we take advantage of using idempotent valuations and develop a computation method that can solve some concrete cases, even if the tree width is large, using representations that are not exponential in that tree width.
We provide some experiments carried out with our Python 3.8.8 implementation based on the NumPy library for representing arrays (the library, together with the data to reproduce the experiments, is available as a GitHub repository at: https://github.com/serafinmoral/SAT-solver (accessed on 16 May 2023)). They show that it is possible to solve some moderate-sized problems, expanding the class of problems solved by the existing deletion algorithms. The contribution is also significant because of the possibilities it opens for future developments, as proposed in the last section of the paper devoted to the conclusions and future work. We have also shown that it is possible to solve the 0–1 problems associated with hard Bayesian network inference problems. The rest of the paper is organized as follows: in Section 2, the notation and the problem specification are given; Section 3 is devoted to the table representation of sets of clauses, the introduction of the basic operations (combination, marginalization, and conditioning), and the study of their properties; Section 4 studies the basic deletion algorithm and the alternative procedures, providing formal results of their main properties; in Section 5, a set of additional tools for the deletion algorithm is given, and the final decision procedure for the deletion algorithm is described; Section 6 is devoted to the experimental part; and Section 7 details the conclusions and future work. All the proofs of the results in the paper are in Appendix A.

2. Problem Specification

Let V = {p, q, r, s, …} be a finite set of variables or propositional symbols. A literal is either a variable p (positive literal) or a negated variable ¬p (negative literal). A clause c is a disjunction of literals, p ∨ ¬r ∨ s, which we will represent as a set c = {p, ¬r, s}. If a clause contains both literals p and ¬p, then it will be called trivial or a tautology. The set of all non-trivial clauses defined for variables V is denoted as L(V).
Information will be given as a finite set of non-trivial clauses C. The set of variables appearing in clause c will be denoted by V(c). If C is a set of clauses, then V(C) = ⋃_{c∈C} V(c).
A true assignment, t, is a mapping from V to {False, True}. A true assignment t satisfies the set of clauses C when for each c ∈ C, there is a positive literal p ∈ c with t(p) = True or a negative literal ¬r ∈ c with t(r) = False.
Two sets of clauses, C 1 and C 2 , defined for the same set of variables V, will be considered logically equivalent, C 1 C 2 , when for each true assignment t, we have that t satisfies C 1 if and only if t satisfies C 2 .
The basic syntactic operation with clauses is resolution: given two clauses c_1 and c_2 such that there is a variable r with r ∈ c_1 and ¬r ∈ c_2, the resolution of c_1 and c_2 by r is the clause R(c_1, c_2, r) = (c_1 ∖ {r}) ∪ (c_2 ∖ {¬r}). A set of clauses C is said to be complete under the set of variables V if and only if for each c_1, c_2 ∈ C, if R(c_1, c_2, r) is not trivial, then R(c_1, c_2, r) ∈ C, and if c ∈ C and c ⊆ c′ ∈ L(V) (i.e., c′ is subsumed by c), then c′ ∈ C. Given a set of clauses C, there is always a minimum set of clauses (in the sense of inclusion) which contains C and is complete (the set obtained by adding to C all the clauses obtained by resolution and subsumption). It will be denoted by C(C). It is clear that two sets of clauses C_1 and C_2, defined for the same set of variables V, are equivalent if and only if C(C_1) = C(C_2).
A set of clauses C implies a clause c if and only if c ∈ C(C); this is denoted as C ⊨ c. It is well known that this is equivalent to the fact that any true assignment t satisfying C also satisfies c. If C_1 and C_2 are two sets of clauses and for each c ∈ C_2 we have that C_1 ⊨ c, it is said that C_1 implies C_2, which is denoted by C_1 ⊨ C_2. This is equivalent to the fact that any true assignment satisfying C_1 also satisfies C_2. It is also well known that C_1 and C_2 are equivalent if and only if C_1 ⊨ C_2 and C_2 ⊨ C_1.
The satisfiability problem (SAT) consists of determining if for a given set of clauses C there is a satisfying true assignment t for it. It is clear that if C is empty, then the answer is positive, and if the empty clause belongs to C, the answer is negative. This is an NP-complete problem [10], and therefore, hard to solve.
The algorithms in this paper will be based on the Davis–Putnam deletion algorithm [1]. The basic step of this algorithm is the deletion of a variable v in a set of clauses C, denoted as C^{−v}. This operation is carried out with the following steps:
(1)
Compute
C_v^+ = {c ∈ C : v ∈ c},
C_v^− = {c ∈ C : ¬v ∈ c},
C_v^0 = C ∖ (C_v^+ ∪ C_v^−).
(2)
The result is
C^{−v} = {R(c, c′, v) : c ∈ C_v^+, c′ ∈ C_v^−, R(c, c′, v) non-trivial} ∪ C_v^0.
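As a concrete illustration, this deletion step can be sketched in Python. The clause encoding (DIMACS-style: a clause is a frozenset of nonzero integers, v for the positive literal of variable v and −v for the negated one) and the helper name `delete_variable` are our own illustrative choices, not fixed by the paper:

```python
# A sketch of the DP variable-deletion step C^{-v}, with clauses
# encoded DIMACS-style as frozensets of nonzero integers.

def delete_variable(clauses, v):
    """Return C^{-v}: all resolvents on v plus the clauses not mentioning v."""
    pos = [c for c in clauses if v in c]                        # C_v^+
    neg = [c for c in clauses if -v in c]                       # C_v^-
    rest = {c for c in clauses if v not in c and -v not in c}   # C_v^0
    for c1 in pos:
        for c2 in neg:
            resolvent = (c1 - {v}) | (c2 - {-v})
            # discard trivial (tautological) resolvents containing l and -l
            if not any(-lit in resolvent for lit in resolvent):
                rest.add(frozenset(resolvent))
    return rest
```

Subsumed clauses are not removed here; that cleanup is discussed separately below.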
An algorithm for SAT based on the deletion of variables is depicted in Algorithm 1. At the end of the loop, as all the variables have been deleted, one of the two conditions (C = ∅ or ∅ ∈ C) must be satisfied. Its efficiency depends on the order in which variables are removed; however, in general, this algorithm has the problem of producing too many clauses that are large in size (number of literals).
It is also advisable to perform some basic operations on C so that a simpler equivalent set is obtained. For example, it is always good to remove subsumed clauses, i.e., if c, c′ ∈ C and c ⊆ c′, then remove c′ from C. Unit propagation is another important step. For that, we need the concept of the restriction of a set of clauses C to a literal ℓ, which is the set of clauses that is equivalent to C under the condition that ℓ is true. It will be denoted as U(C, ℓ), and it is the set of clauses obtained from C by removing any clause containing ℓ and removing ¬ℓ from the clauses containing ¬ℓ. It is important to remark that the true assignments satisfying U(C, ℓ) are the same as those satisfying C with ℓ set to True.
Algorithm 1 Davis–Putnam deletion algorithm
Require: C, a set of clauses.
Ensure: sat, a logical value indicating whether C is satisfiable.
  1: procedure DP(C)
  2:     V ← variables of C
  3:     for v ∈ V do
  4:         C ← C^{−v}
  5:         if C = ∅ then
  6:             sat ← True
  7:             Break
  8:         end if
  9:         if ∅ ∈ C then
 10:             sat ← False
 11:             Break
 12:         end if
 13:     end for
 14:     return sat
 15: end procedure
Unit propagation consists of transforming a set C containing a unit clause with a single literal, {ℓ} ∈ C, into U(C, ℓ) ∪ {{ℓ}}. This operation is repeated for each literal appearing in a unit clause c = {ℓ} ∈ C.
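Restriction and unit propagation can be sketched with the same illustrative frozenset-of-integers clause encoding used above (helper names are our own):

```python
# A sketch of restriction U(C, l) and unit propagation, with a clause
# encoded as a frozenset of nonzero integers (-v is the negated literal).

def restrict(clauses, lit):
    """U(C, l): drop clauses containing lit, remove -lit elsewhere."""
    return {frozenset(c - {-lit}) for c in clauses if lit not in c}

def unit_propagate(clauses):
    """Repeatedly apply U(C, l) for every unit clause {l}, keeping the units."""
    units = set()
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return clauses | {frozenset({l}) for l in units}
        (lit,) = unit
        units.add(lit)
        clauses = restrict(clauses, lit)
```

If a contradiction is reached, the empty clause appears in the result, signalling unsatisfiability.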
If C is a set of clauses defined for variables V and V′ is a subset of V, then the marginalization of C to V′ is the set of clauses C(C) ∩ L(V′), i.e., the set of all the clauses that are defined for variables in V′ and are a consequence of the clauses in C. The marginalization to V′ will be denoted as C^{−(V∖V′)}, making reference to the set of removed variables. The Davis and Putnam algorithm has two main advantages over other satisfiability algorithms [5,6]. The first one is that the deletion algorithm is really an algorithm to compute the marginal information and has many other uses: if the loop starting in step 3 of the algorithm is applied only for the variables v ∈ V ∖ V′, then the final value of C will be equivalent to C^{−(V∖V′)}. This is a consequence of the fact that each step carries out the deletion of a variable v ∈ V ∖ V′ and that the set of clauses is a particular case of the Shenoy–Shafer axiomatic framework for local computation [2,6].

3. Table Representation of Sets of Clauses

A set of clauses C defined for variables V can be represented by a table with a dimension for each variable v ∈ V. If v ∈ V, we consider the set Ω_v = {¬v, v} and Ω_V = ∏_{v∈V} Ω_v, where ∏ stands for the Cartesian product. An element from Ω_V will be denoted in boldface, v. The component v of vector v will be denoted as v_v. If V′ ⊆ V, the subvector of V′ components will be denoted by v^{−(V∖V′)}, making reference to the removed components.
A table T for variables V will be a mapping T : Ω_V → {0, 1}. The set of variables of table T will be denoted by V(T). A table T needs 2^{|V(T)|} bits to be represented.
To simplify the notation, if a table T is defined for variables V and V ⊆ V′, v ∈ Ω_{V′}, we will assume that T(v) = T(v^{−(V′∖V)}), i.e., a table can be applied to a larger frame than the actual frame in which it is defined, simply by ignoring the components not in the set of variables for which it is defined.
The set of all tables defined for variables V will be denoted as T V .
A set of clauses C is represented by a table T if T(v) = 1 if and only if the true assignment given by t_v(v) = True when v_v = v and t_v(v) = False when v_v = ¬v satisfies C. This is a semantic representation of the set of clauses C, determining the true assignments satisfying all the clauses, i.e., a truth table.
To simplify the notation, we will consider that vector v and t_v are equivalent, so that v can also be called a true assignment, in fact making reference to t_v. Then, in the tables, a true assignment will be a vector v, and this true assignment satisfies table T when T(v) = 1. The set of true assignments satisfying a table T will be T(T) = {v ∈ Ω_{V(T)} : T(v) = 1}.
The trivial table is the table T_e with T_e(v) = 1 for any v ∈ Ω_V, and the contradictory table T_0 is the table with T_0(v) = 0 for any v ∈ Ω_V. A trivial table represents the empty set of clauses, and a contradictory table represents an unsatisfiable set of clauses.
Given C, a table T can be easily computed starting with a trivial table; then, for each c ∈ C, we make T(v) = 0 for each v such that, for each literal ℓ ∈ c associated with variable v, v_v falsifies ℓ, i.e., v_v = v if ℓ = ¬v and v_v = ¬v if ℓ = v.
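This construction translates directly to a NumPy Boolean array. In this sketch we choose one length-2 axis per variable, with index 1 standing for the variable being True (an encoding choice of ours, not fixed by the paper), and clauses encoded as sets of signed integers:

```python
import numpy as np

# Building the truth table of a clause set as a 0-1 NumPy array:
# start from the trivial table and zero out, for each clause, the
# region where every literal of the clause is falsified.

def clause_table(clauses, variables):
    """Return T with T[v] = 1 iff assignment v satisfies all clauses."""
    T = np.ones((2,) * len(variables), dtype=np.uint8)
    axis = {v: i for i, v in enumerate(variables)}
    for c in clauses:
        idx = [slice(None)] * len(variables)
        for lit in c:
            v = abs(lit)
            idx[axis[v]] = 0 if lit > 0 else 1   # the falsifying index
        T[tuple(idx)] = 0
    return T
```

For a single clause, exactly one sub-hyperrectangle of the array is set to 0.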
Given a set of clauses C defined for variables V, the direct table representation is unfeasible if the number of variables in V is not small, given that the table size is 2^{|V|}, where |V| is the cardinality of V. For this reason, to represent a set of clauses C, we will first partition the full set of clauses into small sets C_1, …, C_k, each of them defined for a small set of variables V_1, …, V_k. Then, the set C will be represented by the set of tables T_1, …, T_k, where T_i is the table representing C_i.
In our experiments, the partition has been computed by the following steps:
  • Carry out unit propagation in C;
  • Group together clauses defined for the same set of variables, i.e., if V(c) = V(c′), then c, c′ ∈ C_i;
  • Remove sets defined for non-maximal sets of variables, i.e., if C_i is such that there is another set of clauses C_j with V(C_i) ⊆ V(C_j), then C_j is updated to C_i ∪ C_j and C_i is removed.
This is a procedure that we have found reasonable, but it is not the only possible one. We consider that steps 1 and 2 are basic, but there can be other alternative ways of performing step 3: for example, joining C_i and C_j into a single set when V(C_i) ∩ V(C_j) is not small and V(C_i) ∪ V(C_j) is not too large, where large and small can be defined in terms of a pair of parameters n_1, n_2.
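Steps 2 and 3 of this partition (grouping by variable set, then folding non-maximal groups into a group with a superset of variables) can be sketched as follows, again over the illustrative signed-integer clause encoding and assuming unit propagation has already been applied:

```python
from collections import defaultdict

# A sketch of the partition used to build the initial tables.
# Clauses are frozensets of nonzero integers; the helper name and
# the folding order (smallest variable sets first) are our own.

def partition(clauses):
    groups = defaultdict(set)                  # step 2: group by variable set
    for c in clauses:
        groups[frozenset(abs(l) for l in c)].add(c)
    for vs in sorted(groups, key=len):         # step 3: fold into maximal sets
        sup = next((w for w in groups if w != vs and vs < w), None)
        if sup is not None:
            groups[sup] |= groups.pop(vs)
    return dict(groups)
```

Each resulting group is then turned into a single Boolean table over its variable set.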
There are three basic operations with tables:
  • Combination. If T_1 and T_2 are tables, then their combination is the table T_1 ⊗ T_2 defined for the set of variables V = V(T_1) ∪ V(T_2) and given by:
    (T_1 ⊗ T_2)(v) = min{T_1(v), T_2(v)}.
    When considering T_1(v), it is important to notice that this value is T_1(v^{−(V(T_2)∖V(T_1))}).
  • Variable deletion. If T is a table and V′ ⊆ V(T), then T^{−V′} is the table defined for variables V = V(T) ∖ V′ and given by:
    T^{−V′}(v) = max{T(v′) : v′ ∈ Ω_{V(T)}, v′^{−V′} = v}.
    T^{−V′} will also be called the marginalization of T to V(T) ∖ V′. If V′ = {v}, then T^{−V′} will also be denoted as T^{−v}.
  • Conditioning. If T is a table and ℓ is a literal associated with v ∈ V(T), then the conditioning of T to ℓ is the table U(T, ℓ) defined for variables V = V(T) ∖ {v} and given by:
    U(T, ℓ)(v) = T(v, ℓ).
    The conditioning operator can be extended to a partial true assignment: if T is a table, V′ ⊆ V(T), and v′ is a true assignment for variables V′, then U(T, v′) is the table defined for variables V = V(T) ∖ V′ and given by:
    U(T, v′)(v) = T(v″),
    where v″ = (v, v′) is the vector in Ω_{V(T)} with v″^{−V′} = v and v″^{−V} = v′.
    We will assume that conditioning can be applied even if the variable v associated with ℓ is not in V(T); in this case, U(T, ℓ) = T. Analogously, if V′ is not included in V(T) and v′ ∈ Ω_{V′}, we consider that U(T, v′) = U(T, v′^{−(V′∖V(T))}).
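The three operations can be sketched with NumPy, representing a table as a pair (variables, array) with one length-2 axis per variable and index 1 meaning True. This named-axis layout and the helper names are our own illustration, not the paper's implementation:

```python
import numpy as np

def expand(table, all_vars):
    """Reshape/transpose arr so it broadcasts over the axes in all_vars."""
    vs, arr = table
    shape = tuple(2 if v in vs else 1 for v in all_vars)
    order = sorted(range(len(vs)), key=lambda i: all_vars.index(vs[i]))
    return np.transpose(arr, order).reshape(shape)

def combine(t1, t2):
    """(T1 ⊗ T2)(v) = min(T1(v), T2(v)) on the union of the variables."""
    vs = t1[0] + tuple(v for v in t2[0] if v not in t1[0])
    return vs, np.minimum(expand(t1, vs), expand(t2, vs))

def delete(table, v):
    """T^{-v}: maximize over the axis of variable v."""
    vs, arr = table
    i = vs.index(v)
    return vs[:i] + vs[i + 1:], arr.max(axis=i)

def condition(table, v, value):
    """U(T, l): fix variable v to value (1 = True, 0 = False)."""
    vs, arr = table
    i = vs.index(v)
    return vs[:i] + vs[i + 1:], arr.take(value, axis=i)
```

Broadcasting lets `combine` avoid materializing anything larger than the array over the union of the two variable sets.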
The following facts are immediate:
  • If T_1 and T_2 are two tables associated with C_1 and C_2, respectively, then T_1 ⊗ T_2 is associated with C_1 ∪ C_2.
  • If table T is associated with C, then T^{−V′} is associated with C^{−V′} and U(T, ℓ) is associated with U(C, ℓ).
We will now give an example, comparing the computations with clauses and tables.
Example 1.
Assume that we have a set C of clauses given by:
{ p , q , ¬ r } , { p , ¬ q , r } , { ¬ p , q , r } , { q , r , ¬ s } , { q , ¬ r , s } , { ¬ q , r , s }
Assume that we want to delete variable q. Then, we have to compute all the resolutions of clauses containing q with clauses containing ¬q. After eliminating trivial clauses, the result is given by the set of clauses C^{−q}:
{ p , r , ¬ s } , { ¬ p , r , s }
Using a table representation, we can build a table T_1 for the first three clauses and a table T_2 for the last three with the following values:
T_1 (variables p, q, r):
(p, q, r) → 1    (p, q, ¬r) → 0    (p, ¬q, r) → 0    (p, ¬q, ¬r) → 1
(¬p, q, r) → 0    (¬p, q, ¬r) → 1    (¬p, ¬q, r) → 1    (¬p, ¬q, ¬r) → 1
T_2 (variables q, r, s):
(q, r, s) → 1    (q, r, ¬s) → 0    (q, ¬r, s) → 0    (q, ¬r, ¬s) → 1
(¬q, r, s) → 0    (¬q, r, ¬s) → 1    (¬q, ¬r, s) → 1    (¬q, ¬r, ¬s) → 1
The combination T_1 ⊗ T_2 will be the table defined for variables {p, q, r, s} and given by:
(p, q, r, s) → 1    (p, q, r, ¬s) → 0    (p, q, ¬r, s) → 0    (p, q, ¬r, ¬s) → 0
(p, ¬q, r, s) → 0    (p, ¬q, r, ¬s) → 0    (p, ¬q, ¬r, s) → 1    (p, ¬q, ¬r, ¬s) → 1
(¬p, q, r, s) → 0    (¬p, q, r, ¬s) → 0    (¬p, q, ¬r, s) → 0    (¬p, q, ¬r, ¬s) → 1
(¬p, ¬q, r, s) → 0    (¬p, ¬q, r, ¬s) → 1    (¬p, ¬q, ¬r, s) → 1    (¬p, ¬q, ¬r, ¬s) → 1
If in this combination, variable q is deleted by marginalization, table (T_1 ⊗ T_2)^{−q} is obtained:
(p, r, s) → 1    (p, r, ¬s) → 0    (p, ¬r, s) → 1    (p, ¬r, ¬s) → 1
(¬p, r, s) → 0    (¬p, r, ¬s) → 1    (¬p, ¬r, s) → 1    (¬p, ¬r, ¬s) → 1
which is exactly the table associated with the two clauses in C^{−q}.
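The marginalization in the example can be replayed with plain NumPy broadcasting, filling the two arrays with the values listed above (axes ordered as the rows are listed, the first index corresponding to the first value of each variable):

```python
import numpy as np

# T1 and T2 from the example, axes (p, q, r) and (q, r, s),
# values entered in the order the rows are listed above.
T1 = np.array([1, 0, 0, 1, 0, 1, 1, 1]).reshape(2, 2, 2)
T2 = np.array([1, 0, 0, 1, 0, 1, 1, 1]).reshape(2, 2, 2)

# combination: min over the common frame (p, q, r, s)
comb = np.minimum(T1[:, :, :, None], T2[None, :, :, :])

# deletion of q: max over the q axis
marg = comb.max(axis=1)        # axes (p, r, s)
print(marg.flatten())          # reproduces the marginal table above
```

Both `comb` and `marg` reproduce the tables listed in the example, entry by entry.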
The following properties show that tables satisfy the basic Shenoy–Shafer axioms, and therefore, local computation is possible:
  • Combination is commutative and associative: T_1 ⊗ T_2 = T_2 ⊗ T_1, T_1 ⊗ (T_2 ⊗ T_3) = (T_1 ⊗ T_2) ⊗ T_3;
  • If V(T) = V and V_1, V_2 are two disjoint subsets of V, then T^{−(V_1∪V_2)} = (T^{−V_1})^{−V_2};
  • If V(T_1) = V_1 and V(T_2) = V_2, then (T_1 ⊗ T_2)^{−(V_2∖V_1)} = T_1 ⊗ T_2^{−(V_2∖V_1)}.
The tables also satisfy the idempotency property: if T is a table and V′ ⊆ V(T), then T ⊗ T^{−V′} = T. As a consequence of these properties, if we consider a set of variables V and T = ⋃_{V′⊆V} T_{V′}, then T is an information algebra [3,8]. It also has a neutral element, T_e, and a null element, T_0.
In an information algebra, it is always possible to define a partial order, which in this case is the following: if T_1 and T_2 are tables defined for the sets of variables V_1 and V_2, respectively, then we say that T_1 ≤ T_2 if and only if for each v ∈ Ω_{V_1∪V_2} we have that T_1(v) ≥ T_2(v). The intuitive idea is that T_2 contains more information than or the same information as T_1 (any true assignment satisfying T_2 will also satisfy T_1).
The following properties of this relation are immediate:
  • If T is a table and V′ ⊆ V(T), then T^{−V′} ≤ T;
  • If T_1 and T_2 are tables, then T_1 ≤ (T_1 ⊗ T_2);
  • If T_1, T_2, T_3 are tables, then (T_1 ⊗ T_2) ≤ T_3 if and only if T_1 ≤ T_3 and T_2 ≤ T_3.
Two tables T_1 and T_2 are said to be equivalent, T_1 ≡ T_2, if and only if T_1 ≤ T_2 and T_2 ≤ T_1. If T_1 and T_2 are defined on V_1 and V_2, respectively, and T_e^1 and T_e^2 are the neutral tables in T_{V_2∖V_1} and T_{V_1∖V_2}, respectively, it is immediate to prove that T_1 and T_2 are equivalent if and only if T_1 ⊗ T_e^1 = T_2 ⊗ T_e^2. The combination with the neutral elements is necessary for the tables to be defined for the same set of variables. If T_1 and T_2 are equivalent and V(T_1) = V(T_2), then they are identical, i.e., T_1 = T_2. The quotient set of T under this equivalence relation is called the domain-free valuation algebra associated with the valuation algebra [3]. In the following, we will consider that we work with equivalence classes of tables, and that a table can be changed into any equivalent one. All the neutral tables T_e defined on different sets of variables are equivalent; likewise, all the contradictory tables T_0 are equivalent. As a consequence, we will not make reference to the set of variables on which they are defined.
There is a disjunction operation [3,8] which can be defined on the set of tables: if T_1 and T_2 are tables, then their disjunction is the table T_1 ∨ T_2 defined for the set of variables V = V(T_1) ∪ V(T_2) and given by:
(T_1 ∨ T_2)(v) = max{T_1(v), T_2(v)}.
It is immediate that disjunction is commutative and associative. Furthermore, combination is distributive with respect to disjunction, and disjunction is distributive with respect to combination. In fact, we have the Boolean information algebra from [3], the complement of a table T being the table T^c = 1 − T.
We have some interesting properties relating disjunction with the basic table operations.
Proposition 1.
If T is a table and v ∈ V(T), then T^{−v} = U(T, v) ∨ U(T, ¬v).
Proof. 
See Proposition A1 in Appendix A.    □
Proposition 2.
If T_1 and T_2 are tables and v ∈ V(T_1) ∩ V(T_2), then (T_1 ∨ T_2)^{−v} = T_1^{−v} ∨ T_2^{−v}.
Proof. 
See Proposition A2 in Appendix A.    □

4. Deletion Algorithm with Tables

We assume that we have some information represented by a set of tables: H = {T_1, …, T_k}. This set is intended to be a representation of the combination ⊗(H) = T_1 ⊗ ⋯ ⊗ T_k. As the size of a table increases exponentially with its number of dimensions and V(⊗(H)) = ⋃_{i=1}^{k} V(T_i), the representation by a set can use much less space than the representation by a single table (except when all the tables are defined for the same set of variables, but this is never the case according to our procedure to build the initial tables). V(⊗(H)) will be denoted as V(H). The total size of the tables in H will be ∑_{T∈H} 2^{|V(T)|}. For example, if we have three tables defined for variables {p_1, p_2}, {p_2, p_3, p_4}, {p_4, p_5}, the total size of the tables will be 4 + 8 + 4 = 16, but their combination will be defined for V(H) = {p_1, p_2, p_3, p_4, p_5}, which corresponds to a table of size 32.
If V = V(⊗(H)), a vector v ∈ Ω_V satisfies the set of tables H when T_i(v) = 1 for every T_i ∈ H.
We will say that two sets H_1, H_2 are equivalent if and only if ⊗(H_1) is equivalent to ⊗(H_2). We also say that H_1 ≤ H_2 when ⊗(H_1) ≤ ⊗(H_2).
The following properties are immediate:
  • If H_1 and H_2 are equivalent, then H ∪ H_1 will be equivalent to H ∪ H_2;
  • H_1 ≤ H_2 if and only if {T} ≤ H_2 for any T ∈ H_1;
  • If H′ ⊆ H, then (H ∖ H′) ∪ {⊗(H′)} is equivalent to H;
  • If H′ ≤ H, then H ∪ H′ is equivalent to H.
The set of true assignments of a set H will be T(H) = ⋂_{T∈H} T(T), where the intersection of two sets defined on different sets of indexes is defined as follows: if R_1 ⊆ Ω_{V_1} and R_2 ⊆ Ω_{V_2}, then R_1 ∩ R_2 = {v ∈ Ω_{V_1∪V_2} : v^{−(V_2∖V_1)} ∈ R_1, v^{−(V_1∖V_2)} ∈ R_2}.
The operations with tables can be translated to sets of tables, taking into account that an operation on a set H is really carried out on ⊗(H) and that equivalent tables represent the same information. Some operations can be carried out in a simple way:
  • The combination can be computed by a simple union:
    H_1 ⊗ H_2 = H_1 ∪ H_2.
  • The disjunction of sets of tables can be computed as follows:
    H_1 ∨ H_2 = {T_1 ∨ T_2 : T_1 ∈ H_1, T_2 ∈ H_2}.
  • The conditioning is carried out by conditioning its tables:
    U(H, ℓ) = {U(T, ℓ) : T ∈ H}.
The deletion of variables is the most complex operation for sets of tables. In the following, we describe several methods to carry out this operation.
The marginal problem is as follows: given a set of tables H and V′ ⊆ V(H), compute H′ such that ⊗(H′) is equivalent to ⊗(H)^{−V′}. This set will also be denoted as H^{−V′}, considering that we are computing an element of the equivalence class.
As tables satisfy the basic Shenoy–Shafer axioms, the marginalization can be computed with the basic Algorithm 2, which is very similar to Algorithm 1, but now expressed as a marginalization algorithm and with the information represented by tables. Procedure Marginalize0(H_v, v) is very simple and is depicted in Algorithm 3, where the test of whether T_2 is equivalent to T_0 may seem a bit artificial and unnecessary; it was included to show the similarity with other variants of this operation that we will introduce. When implemented, this test is also carried out, and when T_2 is equivalent to T_0, we return the contradictory table, which is defined for the empty set of variables and contains a single 0 value.
Algorithm 2 Deletion algorithm
Require: H, a set of tables.
Require: V′, a set of variables to remove.
Ensure: H, a set of tables representing H^{−V′}.
  1: procedure Deletion(H, V′)
  2:     for v ∈ V′ do
  3:         H_v ← tables T ∈ H such that v ∈ V(T)
  4:         (R_1, R_2) ← Marginalize0(H_v, v)
  5:         if R_1 contains T_0 then
  6:             return {T_0}
  7:         else
  8:             H ← (H ∖ H_v) ∪ R_1
  9:         end if
 10:     end for
 11:     return H
 12: end procedure
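A toy sketch of the Deletion/Marginalize0 pair over Boolean NumPy arrays: here every table keeps one axis per problem variable, with length-1 axes for the variables outside V(T), so that tables broadcast against each other while only 2^{|V(T)|} cells are actually stored. Variable numbering from 1 and this broadcasting layout are our own illustrative choices:

```python
import numpy as np

def clause_to_table(clause, n):
    """Truth table of one clause: length-2 axes for its variables,
    length-1 axes elsewhere (index 1 = variable True)."""
    cvars = {abs(l) for l in clause}
    shape = tuple(2 if v in cvars else 1 for v in range(1, n + 1))
    T = np.ones(shape, dtype=np.uint8)
    idx = [slice(None)] * n
    for lit in clause:
        idx[abs(lit) - 1] = 0 if lit > 0 else 1   # the falsifying index
    T[tuple(idx)] = 0
    return T

def deletion(tables, order):
    """Algorithm 2 with Marginalize0: None signals an unsatisfiable set."""
    for v in order:
        i = v - 1
        bucket = [T for T in tables if T.shape[i] == 2]      # H_v
        tables = [T for T in tables if T.shape[i] == 1]
        if bucket:
            comb = bucket[0]                                 # T_1 = combination of H_v
            for T in bucket[1:]:
                comb = np.minimum(comb, T)
            marg = comb.max(axis=i, keepdims=True)           # T_2 = deletion of v
            if not marg.any():                               # equivalent to T_0
                return None
            if not marg.all():                               # drop tables equivalent to T_e
                tables.append(marg)
    return tables
```

Deleting every variable answers satisfiability: an empty result means satisfiable, `None` means unsatisfiable.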
This algorithm can be used to solve the satisfiability problem: if for Deletion(H, V′) we take V′ = V(H), then all the variables are removed and H will only contain tables defined for the empty set of variables, which have a single value. Taking into account that trivial tables (T_e = 1) are not introduced, there are only two possibilities: H contains T_0, and then the problem is unsatisfiable, or H = ∅, and the problem is satisfiable. However, the algorithm can also be used to compute marginal information and to compile the information in the initial set H. This compilation is based on the following result.
Algorithm 3 Basic version of marginalize
Require: H_v, a set of tables containing variable v.
Require: v, the variable to remove.
Ensure: R_1, a set of tables representing H_v^{−v}.
Ensure: R_2, a set of tables containing v.
  1: procedure Marginalize0(H_v, v)
  2:     T_1 ← ⊗(H_v)
  3:     T_2 ← T_1^{−v}
  4:     if T_2 is equivalent to T_e then
  5:         R_1 ← ∅
  6:     else
  7:         if T_2 is equivalent to T_0 then
  8:             R_1 ← {T_0}
  9:         else
 10:             R_1 ← {T_2}
 11:         end if
 12:     end if
 13:     R_2 ← {T_1}
 14:     return (R_1, R_2)
 15: end procedure
Proposition 3.
In each application of Marginalize0(H_v, v), we have that H_v is equivalent to R_1 ∪ R_2. Furthermore, if H′ is the updated set H computed in Step 8 of Algorithm 2, then H is equivalent to H′ ∪ R_2.
Proof. 
See Proposition A3 in Appendix A.    □
As a consequence of this result, when applying the deletion algorithm, if we call R_2[v] the R_2 set obtained when removing variable v, we have that H is equivalent to (⋃_{v∈V(H)} R_2[v]) ∪ H^{−V(H)}. It is important to remark that H^{−V(H)} is the result of removing all the variables, and thus a table from this set is defined for the empty set of variables, i.e., it is a single number. There are two possibilities: first, if H is satisfiable, then all these tables are equal to 1 (T_e) and can be removed, representing H^{−V(H)} by the empty set. Second, if H is unsatisfiable, H^{−V(H)} will contain T_0, and the full set is equivalent to {T_0}. Let us call this set R_2[∅].
If V(H) = {v_1, …, v_k} and the variables are removed in the order (v_1, …, v_k), then we have that H^{−{v_1,…,v_i}} is equivalent to (⋃_{j=i+1}^{k} R_2[v_j]) ∪ R_2[∅]. Therefore, H^{−{v_1,…,v_i}} is equivalent to H^{−{v_1,…,v_{i+1}}} ∪ R_2[v_{i+1}]. In this way, we not only compute the marginal information, but with the sets R_2[v_i] we have the necessary information to recover the marginal sets in a backward way: from the marginal in which all the variables are removed, R_2[∅], to the marginal in which no variable is removed, H. This fact can be useful, among other things, to obtain the true assignments satisfying all the tables, in case the set is satisfiable. The following result provides a procedure to obtain the true assignments satisfying a set of tables H from the sets of tables {R_2[v_i]} obtained when applying a deletion algorithm.
Proposition 4. 
Assume a set of tables H and that Algorithm 2 is applied removing the variables in V(H) in the order (v_1, …, v_k). Assume also that R_2[∅] is equivalent to the empty set, i.e., the problem is satisfiable. If 𝒯_i is the set of true assignments satisfying the set of tables H^{-{v_1,…,v_i}} (so 𝒯_0 is the set of true assignments of H), then these sets can be computed in reverse order, from i = k − 1 down to i = 0, in the following way:
  • Start with i = k;
  • Make 𝒯_k equal to the set containing only the empty assignment v_0 ∈ Ω_∅;
  • For each assignment w ∈ 𝒯_{i+1}, compute T_{i+1}, the only table in U(R_2[v_{i+1}], w), which is a table defined only for variable v_{i+1}; this table is never equal to T_0. If v_i^1 and v_i^2 are the assignments obtained by extending w to the variables {v_{i+1}, …, v_k} and given by v_i^1 = (v_{i+1}, w) and v_i^2 = (¬v_{i+1}, w), i.e., by considering v_{i+1} true and false, respectively, then:
    – if T_{i+1} = T_e, add v_i^1 and v_i^2 to 𝒯_i;
    – if T_{i+1}(v_{i+1}) = 1 and T_{i+1}(¬v_{i+1}) = 0, add v_i^1 to 𝒯_i;
    – if T_{i+1}(v_{i+1}) = 0 and T_{i+1}(¬v_{i+1}) = 1, add v_i^2 to 𝒯_i.
Proof. 
See Proposition A4 in Appendix A.   □
This result is the basis for algorithms to compute one solution, all the solutions, or a random solution of a satisfiable set of clauses. The computation starts with 𝒯_k, the set containing only the empty assignment v_0 ∈ Ω_∅. The main difference between these algorithms arises when T_{i+1} = T_e: when computing one solution, we only pick v_i^1 or v_i^2; when computing all the solutions, both v_i^1 and v_i^2 are selected; and when computing a random solution, one of them is chosen at random. In the last case, an importance sampling algorithm on the set of solutions is obtained: starting with a weight of 1.0, each time a random selection is made, the weight is multiplied by 2.0.
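As a rough illustration, one backward step of this recovery can be sketched as follows. Representing the combined unit table as a plain pair of Booleans (value when the variable is true, value when it is false) is a simplification of this sketch, and all names are ours, not the paper's implementation:

```python
def extend_solutions(partials, unit):
    """One backward step of the recovery in Proposition 4.

    `unit` is a pair (value if the recovered variable is true,
    value if it is false); for a satisfiable problem it is never
    (False, False), i.e., never the contradiction.
    """
    out = []
    for tail in partials:
        if unit[0]:                      # extend with the variable true
            out.append((True,) + tail)
        if unit[1]:                      # extend with the variable false
            out.append((False,) + tail)
    return out

# Recover all solutions from hypothetical stored unit tables, starting
# from the empty assignment and moving backwards through the deleted
# variables (last deleted variable first).
units = [(True, False), (True, True)]    # hypothetical example data
solutions = [()]
for unit in reversed(units):
    solutions = extend_solutions(solutions, unit)
```

Picking only one branch (or a random one) in `extend_solutions` instead of both gives the one-solution and random-solution variants.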
In Algorithm 3, we have described the basic marginalization operation, which is the same as the one applied in general valuation-based systems [2]. However, Boolean tables allow other alternative forms of marginalization. The first one is depicted in Algorithm 4 and shows that it is not necessary to combine all the tables in order to compute the marginal: in fact, only pairwise combinations are necessary.
Algorithm 4 Pairwise combination version of marginalize
Require: H_v, a set of tables containing variable v.
Require: v, the variable to remove.
Ensure: R_1, a set of tables representing H_v^{-v}.
Ensure: R_2, a set of tables containing v.
 1: procedure Marginalize1(H_v, v)
 2:     R_1 ← {(T_i ⊗ T_j)^{-v} : T_i, T_j ∈ H_v}
 3:     R_2 ← H_v
 4:     if R_1 contains T_0 then
 5:         R_1 ← {T_0}
 6:     end if
 7:     Remove neutral tables T_e from R_1
 8:     return (R_1, R_2)
 9: end procedure
The following proposition shows that R_1 is also a set of tables representing H_v^{-v}.
Proposition 5.
If H_v is a set of tables containing variable v, then the set R_1 computed in Algorithm 4 represents H_v^{-v}.
Proof. 
See Proposition A5 in Appendix A.   □
Once this is proved, we can replace Marginalize0 by Marginalize1 in the deletion algorithm and everything works, including the method to compute the solutions given in Proposition 4. The only difference is that now R_2[v_{i+1}] contains, in general, more than one table, and when computing the conditioning U(R_2[v_{i+1}], w) to an assignment w, we have to condition every table in R_2[v_{i+1}], the result being a set of tables depending on variable v_{i+1}; these tables are combined to produce T_{i+1}.
The main difference between Marginalize0 and Marginalize1 is that the former produces a unique table in R_1 and in R_2, while the latter produces several tables in both sets, but of smaller size. As the size of a table T is 2^{|V(T)|}, Marginalize1 is, in general, more efficient, but we have to take into account that the number of tables in R_1 is quadratic in the number of tables in H_v.
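The pairwise step can be sketched with NumPy Boolean arrays. The table layout (a pair of a variable list and an array with one size-2 axis per variable, index 0 meaning "true") and all helper names are assumptions of this sketch, not necessarily the paper's implementation:

```python
import numpy as np

def combine(t1, t2):
    """Combination (conjunction) of two tables over the union of variables."""
    vs1, a1 = t1
    vs2, a2 = t2
    union = list(vs1) + [v for v in vs2 if v not in vs1]

    def align(vs, a):
        # append size-1 axes for the missing variables, then put the axes
        # in the order of `union` so that broadcasting lines them up
        missing = [v for v in union if v not in vs]
        a = a.reshape(a.shape + (1,) * len(missing))
        order = [vs.index(v) if v in vs else len(vs) + missing.index(v)
                 for v in union]
        return np.transpose(a, order)

    return union, align(list(vs1), a1) & align(list(vs2), a2)

def marginalize_out(t, v):
    """Delete variable v by disjunction over its axis (existential projection)."""
    vs, a = t
    return [u for u in vs if u != v], a.any(axis=vs.index(v))

def marginalize1(tables, v):
    """Pairwise step of Marginalize1: marginals of all pairwise combinations."""
    return [marginalize_out(combine(ti, tj), v)
            for i, ti in enumerate(tables) for tj in tables[i + 1:]]
```

The quadratic growth mentioned above is visible in `marginalize1`, which produces one table per pair of input tables.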
However, there is another very important alternative marginalization when we have a variable which is functionally dependent on the other variables [20]. This happens very often, especially in problems encoding circuits [21].
If T is a table and v ∈ V(T), we say that v is functionally determined in T if and only if U(T, v) ⊗ U(T, ¬v) is equivalent to T_0. If V = V(T) \ {v}, this implies that for any w ∈ Ω_V we have (U(T, v) ⊗ U(T, ¬v))(w) = 0, i.e., either U(T, v)(w) = T(w, v) = 0 or U(T, ¬v)(w) = T(w, ¬v) = 0. In other words, for each w there is at most one possible value of variable v (v or ¬v) for which table T has a value of 1 (at least one of the two values is impossible). This definition generalizes the definition given in terms of clauses in [20], which also requires that T^{-v} is equivalent to the neutral element.
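Under the same Boolean-array representation (one size-2 axis per variable, index 0 hypothetically encoding "true"), this check can be sketched as:

```python
import numpy as np

def is_determined(a, axis):
    """Check functional determination of the variable on `axis` in Boolean
    table `a`: the combination of U(T, v) and U(T, ¬v) must be the
    contradiction, i.e., no assignment of the remaining variables is
    compatible with both values of v."""
    u_true = np.take(a, 0, axis=axis)    # U(T, v)
    u_false = np.take(a, 1, axis=axis)   # U(T, ¬v)
    return not np.any(u_true & u_false)
```

For instance, in the table of r = p XOR q, both r and p are functionally determined by the remaining two variables.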
In the case that there is a table T ∈ H_v such that v is functionally determined in T, marginalization can be carried out as in Algorithm 5.
This marginalization is much more efficient, as the number of tables in R_1 is smaller than in Marginalize1. In fact, the number of tables in the problem does not increase: the number of tables in R_1 is always less than or equal to the number of tables in H_v. We give an example illustrating the benefits of using this marginalization.
Algorithm 5 Marginalize with functional dependence
Require: H_v, a set of tables containing variable v.
Require: v, the variable to remove.
Require: T, a table in which v is functionally determined.
Ensure: R_1, a set of tables representing H_v^{-v}.
Ensure: R_2, a set of tables containing v.
 1: procedure Marginalize2(H_v, v, T)
 2:     R_1 ← {(T ⊗ T′)^{-v} : T′ ∈ H_v}
 3:     R_2 ← {T}
 4:     if R_1 contains T_0 then
 5:         R_1 ← {T_0}
 6:     end if
 7:     Remove neutral tables T_e from R_1
 8:     return (R_1, R_2)
 9: end procedure
Example 2.
Assume that we are deleting variable p and that we have three tables:
T 1 ( p , q ) , T 2 ( p , r , s ) , T 3 ( p , u , v ) .
If Marginalize0 is applied, then we have to combine the three tables, producing a table T_1 ⊗ T_2 ⊗ T_3 defined for (p, q, r, s, u, v), which depends on six variables and has a size of 64. If Marginalize1 is applied, then we have to compute T_1 ⊗ T_2 (defined for p, q, r, s), T_1 ⊗ T_3 (defined for p, q, u, v), and T_2 ⊗ T_3 (defined for p, r, s, u, v); the combination of a table with itself is not necessary, as it produces the same table, which is included in one of these combinations. That is, we have more tables, but of smaller size (16 + 16 + 32). However, if it is known that p is determined by q in table T_1, then, for Marginalize2, only the combinations T_1 ⊗ T_2 and T_1 ⊗ T_3 are necessary, and the result is based on two tables of sizes 16 + 16, producing an important saving in space and computation. Please note that to finish the marginalization step, variable p should be removed from the computed tables by marginalization, but this step has been omitted in order to keep the notation simpler.
This marginalization is correct, as R_1 is equivalent to H_v^{-v} and H_v is equivalent to R_1 ∪ R_2, as the following result shows.
Proposition 6.
If (R_1, R_2) are the sets computed in Marginalize2 and the initial conditions required by the algorithm are satisfied, then H_v^{-v} is equivalent to R_1 and H_v is equivalent to R_1 ∪ R_2.
Proof. 
See Proposition A6 in Appendix A.   □

5. Additional Processing Steps and Marginalization Strategy

In this section, we introduce some additional steps which can be added to the basic deletion algorithm to improve efficiency or to enlarge the family of problems that can be solved.

5.1. Combining Tables

Many times, a problem contains two tables T_1 and T_2 such that V(T_1) ⊆ V(T_2). We can substitute them by their combination T = T_1 ⊗ T_2, obtaining an equivalent problem. Doing this has potential advantages: it reduces the size of the problem specification, as the size of T is the same as the size of T_2, and it also increases the chances of having a variable which is functionally dependent on the others. Remember that v is determined in T when U(T, v) ⊗ U(T, ¬v) = T_0; as the combination T = T_1 ⊗ T_2 is at least as informative as T_2, if v is determined in T_2, it will also be determined in T. We can consider a procedure CombineIncluded(H) which compares the sets of variables of each pair of tables T_1, T_2 ∈ H and substitutes the pair by its combination when one set is included in the other. One important point is the sets to which this procedure is applied. In Algorithm 2, CombineIncluded(H) could be applied to the set H after each deletion of a variable, to the set H_v, or to R_1. Applying it to H could be very costly, as the number of tables can be high and we have to compare all the pairs of tables. A similar effect can be obtained by applying it to the set H_v, which contains the tables effectively used in each deletion step but is of smaller size. It is also important to apply it to the set R_1, as this can significantly reduce the number of tables, which can be high when Marginalize1 is used to delete a variable.
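A possible sketch of CombineIncluded over (variables, array) pairs, under the same layout assumptions as before (one size-2 axis per variable, index 0 meaning "true"; the helper names are ours):

```python
import numpy as np

def combine_into(small, big):
    """Combine table `small` into `big` when V(small) ⊆ V(big);
    the result is defined for big's variables, in big's axis order."""
    vs_s, a_s = small
    vs_b, a_b = big
    # put small's axes in the order in which its variables appear in big
    order = sorted(range(len(vs_s)), key=lambda i: vs_b.index(vs_s[i]))
    a = np.transpose(a_s, order)
    # insert size-1 axes for big's variables missing from small
    shape = [2 if v in vs_s else 1 for v in vs_b]
    return vs_b, a_b & a.reshape(shape)

def combine_included(tables):
    """Repeatedly absorb any table whose variable set is contained
    in the variable set of another table."""
    tables = list(tables)
    changed = True
    while changed:
        changed = False
        for i in range(len(tables)):
            for j in range(len(tables)):
                if i != j and set(tables[i][0]) <= set(tables[j][0]):
                    tables[j] = combine_into(tables[i], tables[j])
                    del tables[i]
                    changed = True
                    break
            if changed:
                break
    return tables
```

The quadratic pair comparison above is exactly why the text recommends applying the procedure to the smaller sets H_v and R_1 rather than to the whole H.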
In the case of H_v, we have also implemented a more aggressive combination method, which combines two tables T_1 and T_2 when 2^{|V(T_1 ⊗ T_2)|} ≤ 2^{|V(T_1)|} + 2^{|V(T_2)|}. This always happens when V(T_1) ⊆ V(T_2), but also when |V(T_1)| = |V(T_2)| and V(T_1) = (V(T_2) \ {u}) ∪ {v}, i.e., when the tables are defined for the same variables, except for variables u ∈ V(T_2) and v ∈ V(T_1). We call this procedure GroupTables(H); it is applied to H_v in the case of Marginalize1 when the number of tables is greater than or equal to a given threshold, N.
Example 3.
Assume three tables, T_1 for variables p, q, T_2 for variables q, r, and T_3 for variables p, r, given by:
T_1: (p, q) → 1, (p, ¬q) → 0, (¬p, q) → 1, (¬p, ¬q) → 1
T_2: (q, r) → 0, (q, ¬r) → 1, (¬q, r) → 1, (¬q, ¬r) → 1
T_3: (p, r) → 1, (p, ¬r) → 1, (¬p, r) → 0, (¬p, ¬r) → 1
If GroupTables(H) is applied, then tables T_1 and T_2 are combined, as the result has a size of 8 while the combined tables have sizes 4 + 4. Then, table T_3 is also combined, as it is defined for a set of variables included in the variables of T_1 ⊗ T_2. The result is that the three tables are replaced by their combination, which is given by the following table:
(p, q, r) → 0, (p, q, ¬r) → 1, (p, ¬q, r) → 0, (p, ¬q, ¬r) → 0, (¬p, q, r) → 0, (¬p, q, ¬r) → 1, (¬p, ¬q, r) → 0, (¬p, ¬q, ¬r) → 1
Observe that this table is smaller than the sum of the sizes of the three combined tables. Furthermore, in the combination, we can observe that r is determined by (p, q), so Marginalize2 can be applied.

5.2. Unidimensional Arrays

Unidimensional arrays receive a special treatment. If we are going to introduce in H a table T with |V(T)| = 1, then there are three possibilities:
  • T is equal to T_e, the neutral element. In this case, the table is not introduced in H.
  • T is equal to T_0, the contradiction. In this case, the whole set is equivalent to {T_0} and all the other tables are removed from H. In fact, in our implementation, the equality to T_0 is checked for any table to be introduced in H, whatever its dimension is.
  • T has one entry with value 0 and the other with value 1. If l is the literal with value 1, this table is equivalent to the unit literal l, and we can carry out a unit propagation. This is done by transforming any other table T′ ∈ H into U(T′, l) and finally introducing table T. In our implementation, instead of introducing the table T, we keep a set of unit literals, and each time another table is introduced in H, the conditioning to the literals in this set is carried out.
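The conditioning and unit propagation steps described above can be sketched as follows; the (variables, array) representation with index 0 meaning "true" is an assumption of this sketch:

```python
import numpy as np

def condition(t, v, value):
    """U(T, l): condition table t = (vars, array) to literal v (value=True)
    or ¬v (value=False), removing v from the table."""
    vs, a = t
    i = vs.index(v)
    return [u for u in vs if u != v], np.take(a, 0 if value else 1, axis=i)

def unit_propagate(tables, units):
    """Condition every table on each known unit literal (variable, value)."""
    out = []
    for t in tables:
        for v, val in units:
            if v in t[0]:
                t = condition(t, v, val)
        out.append(t)
    return out
```

Any table that becomes unidimensional after this conditioning can in turn generate new unit literals, so the process can be iterated.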
Example 4.
Consider the table T for p, q given by:
(p, q) → 1, (p, ¬q) → 0, (¬p, q) → 1, (¬p, ¬q) → 0
If a unidimensional table T′ for p is introduced, with T′(¬p) = 1 and T′(p) = 0, then we can transform T into U(T, ¬p), which is again a unidimensional table, now defined for q, with U(T, ¬p)(q) = 1 and U(T, ¬p)(¬q) = 0. This table may produce further simplifications of tables containing variable q.

5.3. Splitting

The size of the tables in H_v is important when applying the deletion step: the smaller the size, the faster this step can be completed. For that reason, we have implemented a procedure to reduce this size. In order to do this, for each T ∈ H_v, Algorithm 6 is applied, where Minimize(T_1, T_2, T, V) is a procedure that tries to make T_1 as small as possible (by marginalization) under the conditions that T = T_1 ⊗ T_2 and that the variables in V cannot be removed. The details of this minimizing algorithm can be found in Algorithm 7.
When this split is applied to a table T in H_v, each time T is split into T_1, T_2 with |V(T_1)| < |V(T)|, T is replaced by T_1, T_2 in H and by T_1 in H_v. The general procedure performing this transformation is called SplitG(H_v, H).
Algorithm 6 Splitting a table before deleting a variable
Require: T, a table containing variable v.
Require: v, the variable to remove.
Ensure: T_1, a table containing v.
Ensure: T_2, a table not containing v.
 1: procedure Split(T, v)
 2:     T_2 ← T^{-v}
 3:     T_1 ← T
 4:     T_1 ← Minimize(T_1, T_2, T, {v})
 5:     return (T_1, T_2)
 6: end procedure
Algorithm 7 Minimizing the splitting table
Require: T_1, T_2, T, tables such that T_1 ⊗ T_2 = T.
Require: V, a set of variables that cannot be removed.
Ensure: M, a table in which T_1 is minimized.
 1: procedure Minimize(T_1, T_2, T, V)
 2:     if V(T_1) \ V = ∅ then
 3:         M ← T_1
 4:         return M
 5:     end if
 6:     Let v be an element from V(T_1) \ V
 7:     T_3 ← T_1^{-v}
 8:     if T_3 ⊗ T_2 = T then
 9:         return Minimize(T_3, T_2, T, V ∪ {v})
10:     else
11:         return Minimize(T_1, T_2, T, V ∪ {v})
12:     end if
13: end procedure
Example 5.
Assume that in our set of tables, we have a table T with variables p, q, r given by:
(p, q, r) → 1, (p, q, ¬r) → 0, (p, ¬q, r) → 0, (p, ¬q, ¬r) → 0, (¬p, q, r) → 1, (¬p, q, ¬r) → 0, (¬p, ¬q, r) → 1, (¬p, ¬q, ¬r) → 1
If we want to remove variable r, then instead of using this table, we can try to split it into two tables, one of them not depending on r. We first compute the marginal T_2 = T^{-r}, which is given by:
(p, q) → 1, (p, ¬q) → 0, (¬p, q) → 1, (¬p, ¬q) → 1
Next, we minimize T given T_2, obtaining in this case the following table T_1:
(q, r) → 1, (q, ¬r) → 0, (¬q, r) → 1, (¬q, ¬r) → 1
In this way, we obtain the decomposition T = T_1 ⊗ T_2; we can replace T in our set of tables by the two tables T_1, T_2, and then, as T_2 does not depend on r, only T_1 has to be considered when deleting this variable. T_1 has a lower dimension than the original table T, which simplifies the deletion step.
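A minimal sketch of the splitting idea, under the simplifying assumption that all tables keep the same axes (a removed variable corresponds to a constant, size-1 axis); the function names are ours:

```python
import numpy as np

def minimize(t1, t2, t, keep):
    """Greedy version of the Minimize step: drop variables from t1 by
    OR-marginalization while t1 AND t2 still equals t. For simplicity,
    all tables here share the same axes; `keepdims` keeps the axis
    numbering stable, so a size-1 axis marks a removed variable."""
    for ax in range(t1.ndim):
        if ax in keep:
            continue
        cand = t1.any(axis=ax, keepdims=True)     # marginalize axis ax
        if np.array_equal(cand & t2, t):          # decomposition still exact?
            t1 = np.broadcast_to(cand, t.shape).copy()
    return t1

def split(t, ax):
    """Split t into (t1, t2): t2 is the marginal removing axis ax,
    and t1 is minimized so that t1 AND t2 == t."""
    t2 = t.any(axis=ax, keepdims=True)
    t1 = minimize(t, t2, t, {ax})
    return t1, t2
```

Running this on the table of Example 5 reproduces the decomposition shown there: the minimized factor no longer depends on p, and the marginal factor no longer depends on r.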

5.4. Minimizing the Dependence Set

If we have the set of tables H_v and there is a table T ∈ H_v in which v is functionally determined, then the deletion is, in general, quite efficient, but its cost also depends on the size of T. If there is a table T_m that can be obtained from T by marginalization such that v is still determined in T_m, then the result is also correct if we call Marginalize2(H_v, v, T_m). The reason is very simple: as T_m is obtained by marginalization of a table in H_v, H_v ∪ {T_m} is equivalent to H_v, and Marginalize2(H_v ∪ {T_m}, v, T_m) produces a correct result. The only difference between Marginalize2(H_v ∪ {T_m}, v, T_m) and Marginalize2(H_v, v, T_m) is that in the former, (T_m ⊗ T_m)^{-v} is included. However, this table is less informative than (T ⊗ T_m)^{-v}, which is included in both, and therefore the two results are equivalent.
The algorithm we have applied to compute a smaller table T_m in which the functional dependence still holds is depicted in Algorithm 8, called initially with V = {v}. In it, we assume that we have a function CheckDeter(T, v) that determines whether v is functionally determined in T.
Algorithm 8 Minimizing the dependence of a variable in a table
Require: T, a table.
Require: v, a variable which is determined in T.
Require: V, a set of variables which cannot be deleted.
Ensure: T_m, a marginal table in which v continues being determined.
 1: procedure MinDep(T, v, V)
 2:     if V(T) \ V = ∅ then
 3:         return T
 4:     end if
 5:     Let v′ be an element from V(T) \ V
 6:     T′ ← T^{-v′}
 7:     if CheckDeter(T′, v) then
 8:         T_m ← MinDep(T′, v, V ∪ {v′})
 9:     else
10:         T_m ← MinDep(T, v, V ∪ {v′})
11:     end if
12:     return T_m
13: end procedure
Example 6.
Assume a table T for variables p, q, r given by:
(p, q, r) → 0, (p, q, ¬r) → 1, (p, ¬q, r) → 0, (p, ¬q, ¬r) → 1, (¬p, q, r) → 1, (¬p, q, ¬r) → 0, (¬p, ¬q, r) → 1, (¬p, ¬q, ¬r) → 0
In this table, r is determined by (p, q), but if T^{-q} is computed, the result is:
(p, r) → 0, (p, ¬r) → 1, (¬p, r) → 1, (¬p, ¬r) → 0
Furthermore, r is still determined in it, now by p alone. Marginalize2 is more efficient using this smaller table instead of the original one.
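A greedy sketch of MinDep under the same simplified representation used before (all tables keep their axes, a size-1 axis marks a removed variable, and index 0 hypothetically encodes "true"):

```python
import numpy as np

def is_determined(a, axis):
    """The variable on `axis` is determined iff no assignment of the
    remaining variables is compatible with both of its values."""
    return not np.any(np.take(a, 0, axis=axis) & np.take(a, 1, axis=axis))

def min_dep(a, v_axis):
    """Greedy sketch of MinDep: OR-marginalize the other axes of `a`
    while the variable on v_axis stays functionally determined.
    `keepdims` keeps the axis numbering stable."""
    for ax in range(a.ndim):
        if ax == v_axis:
            continue
        cand = a.any(axis=ax, keepdims=True)
        if is_determined(cand, v_axis):
            a = cand
    return a
```

Applied to the table of Example 6, this removes q and returns the smaller table in which r is determined by p alone.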

5.5. Alternative Deletion Procedures

When applying Marginalize0(H_v, v), the table T = (⊗H_v)^{-v} is computed. In some situations, this table may have a small size even though Marginalize1(H_v, v) or Marginalize2(H_v, v, T′) is being applied. In that case, instead of computing R_1, an alternative method is to compute the maximal sets of variables of the tables in R_1: M(R_1) = Maximal{V(T′) : T′ ∈ R_1, V(T′) ≠ ∅}, where Maximal removes from a family of sets those sets which are strictly included in another set of the family.
Observe that it is not necessary to actually compute the tables in R_1, but only the sets of variables associated with these tables.
Then, we can compute R′_1 = {T^{-(V(T) \ B)} : B ∈ M(R_1)}, i.e., the marginals of the global table T on the maximal sets. When comparing R_1 and R′_1, we can observe the following facts:
  • Each element of R_1 is equal to T′ = (T_i ⊗ T_j)^{-v}, with T_i, T_j ∈ H_v, or to a combination of several tables of this type when CombineIncluded has been applied. Then, there will be another table T″ ∈ R′_1 defined for the same set of variables and computed as T^{-(V(T) \ V(T′))}. As T″ is the result of marginalizing after combining all the tables, instead of combining only two of them, T″ is at least as informative as T′ (it can only contain additional 0 values), and in some cases the tables are not equivalent.
  • The whole sets of tables R_1 and R′_1 are equivalent; in fact, both are equivalent to {T}, with T = (⊗H_v)^{-v}. This is known for R_1. For R′_1, any element of R′_1 is a marginalization of T, so R′_1 is implied by {T}; on the other hand, as a consequence of the previous point, R′_1 is at least as informative as R_1, and as R_1 is equivalent to {T}, we have that R′_1 is also equivalent to {T}.
Computing R′_1 (by computing T and M(R_1) and performing the corresponding marginalizations) has, in general, a higher computational cost, but this cost may be low if T contains few variables; moreover, on the positive side, we have more informative tables in R′_1 and there are more opportunities to find functional dependencies of variables in tables, which can improve the efficiency of subsequent deletion steps.
In our implementation, this decision is taken by fixing a threshold K: if |V(T)| ≤ K, then R′_1 is computed; otherwise, R_1 is computed. The versions of Marginalize1 and Marginalize2 computing R′_1 are called Marginalize1b and Marginalize2b, respectively.
Example 7.
Assume that we are going to delete p and that we have three tables T_1, T_2, T_3, defined for variables (p, q), (p, r), and (p, s), given by:
T_1: (p, q) → 1, (p, ¬q) → 0, (¬p, q) → 1, (¬p, ¬q) → 1
T_2: (p, r) → 1, (p, ¬r) → 0, (¬p, r) → 1, (¬p, ¬r) → 1
T_3: (p, s) → 1, (p, ¬s) → 1, (¬p, s) → 0, (¬p, ¬s) → 0        (1)
Assume that Marginalize1 is applied; then we compute (T_1 ⊗ T_2)^{-p}, (T_1 ⊗ T_3)^{-p}, (T_2 ⊗ T_3)^{-p}, given by the following tables:
(q, r) → 1, (q, ¬r) → 1, (¬q, r) → 1, (¬q, ¬r) → 1
(q, s) → 1, (q, ¬s) → 1, (¬q, s) → 0, (¬q, ¬s) → 0
(r, s) → 1, (r, ¬s) → 1, (¬r, s) → 0, (¬r, ¬s) → 0
Alternatively, we can compute (T_1 ⊗ T_2 ⊗ T_3)^{-{p,s}}, (T_1 ⊗ T_2 ⊗ T_3)^{-{p,r}}, (T_1 ⊗ T_2 ⊗ T_3)^{-{p,q}}, which are defined for the same variables, obtaining:
(q, r) → 1, (q, ¬r) → 1, (¬q, r) → 1, (¬q, ¬r) → 0
(q, s) → 1, (q, ¬s) → 1, (¬q, s) → 0, (¬q, ¬s) → 0
(r, s) → 1, (r, ¬s) → 1, (¬r, s) → 0, (¬r, ¬s) → 0        (2)
The product of these three tables is equal to the product of the original tables in Equation (1). However, individually, each one of them can be more informative (have more 0s) than the corresponding pairwise table: in this example, the first table in Equation (2) has 0 assigned to (¬q, ¬r), while this value was 1 in the corresponding pairwise table. Marginalize1b will compute the arrays in Equation (2).

5.6. The Final Global Marginalization Procedure

Now, we put everything together and describe the global marginalization procedure. In Algorithm 9, we describe the algorithm MarginalizeG, which, given a set H and the variable v, computes a set H′ equivalent to H^{-v} and another set H″ such that H is equivalent to H′ ∪ H″. We assume that Determined(H_v, v) is a procedure that checks whether there is a table T ∈ H_v in which v is functionally determined. In that case, it returns MinDep(T, v, {v}). If such a table does not exist, it returns the neutral element T_e.
The first step is to test whether there is a functional dependence. In that case, Marginalize2 is applied, after minimizing the table with that dependence; if the size of the global combination is lower than a given threshold, then the b version of the marginalization is applied.
Otherwise, we have to choose between Marginalize0 and Marginalize1. For that, we first compute M(R_1), the maximal sets among the sets of variables of the tables of R_1 after applying Marginalize1, and compare the size of the tables defined for these sets with the size of the table obtained with the basic Marginalize0, selecting the method with the smaller final size of the tables. If Marginalize1 is selected, then GroupTables is applied when the number of tables in H_v is greater than a given threshold N.
Algorithm 9 The final global marginalization algorithm
Require: H, a set of tables.
Require: v, the variable to remove.
Require: W, a Boolean variable indicating whether the splitting procedure is applied.
Require: N, a threshold for grouping tables.
Require: K, a threshold for the alternative deletion procedure.
Ensure: H′, a set of tables representing H^{-v}.
Ensure: H″, a set of tables such that H is equivalent to H′ ∪ H″.
 1: procedure MarginalizeG(H, v)
 2:     H_v ← tables T ∈ H such that v ∈ V(T)
 3:     CombineIncluded(H_v)
 4:     if W then
 5:         SplitG(H_v, H)
 6:     end if
 7:     T ← Determined(H_v, v)
 8:     if T ≠ T_e then
 9:         if S(⊗H_v) ≤ K then
10:             (R_1, R_2) ← Marginalize2b(H_v, v, T)
11:         else
12:             (R_1, R_2) ← Marginalize2(H_v, v, T)
13:         end if
14:     else
15:         Q ← M(R_1)                                    ▹ Maximal sets of R_1 after Marginalize1
16:         if S((⊗H_v)^{-v}) ≤ S(Q) then
17:             (R_1, R_2) ← Marginalize0(H_v, v)
18:         else
19:             if |H_v| ≥ N then
20:                 GroupTables(H_v)
21:             end if
22:             if S(⊗H_v) ≤ K then
23:                 (R_1, R_2) ← Marginalize1b(H_v, v)
24:             else
25:                 (R_1, R_2) ← Marginalize1(H_v, v)
26:             end if
27:         end if
28:     end if
29:     CombineIncluded(R_1)
30:     H′ ← (H \ H_v) ∪ R_1
31:     H″ ← R_2
32:     return (H′, H″)
33: end procedure

6. Experiments

Computations with tables have been implemented using Python and the NumPy library [22]. For the basic operations with tables (combination, marginalization, and conditioning), we have taken as a basis the implementation of the same operations of probabilistic calculus in the pgmpy library [23]. An important fact about the deletion algorithm is the order in which variables are chosen to be deleted. We have followed the usual heuristic of selecting the variable v with a minimum number of variables in H_v, with one exception: when v is not functionally determined in any table of H_v, but there is another variable v′ that is determined in a table of H_{v′} and |V(H_{v′})| ≤ |V(H_v)| + 2, variable v′ is selected instead. This gives preference to variables to which we can apply Marginalize2, which is the most efficient deletion procedure. We have carried out three experiments.
In the first experiment, we tested the basic deletion algorithms on several SAT examples imported from repositories of benchmark problems for SAT, mainly from Hoos and Stützle [24], Junttila [25], Tsuji and Gelder [26], and Burkardt [27]. The main criterion for selecting the cases has been that the dimension of the arrays does not surpass the maximum of 32 dimensions allowed by the NumPy library. The characteristics of the examples can be found in Table 1. We also provide the maximum cluster size (number of variables) of a join tree built from the connectivity graph under the same ordering as used in our deletion algorithm. We have selected the problems with the restrictions that they are not too simple (a maximum cluster size of at least 15) and that the deletion algorithm can be applied, taking into account that the number of dimensions of a table in NumPy is limited to 32, i.e., it is required that |V(T)| ≤ 32 for each table T.
We have applied the deletion algorithm with the general deletion step of Algorithm 9, with a value of N = 30 for grouping tables when pairwise marginalization is applied. We have tested two variants of the algorithm, W = True and W = False, to assess whether splitting is a good idea, and a set of values K = 5, 10, 15, 20, 25, 30.
First, we can observe that, in general, the problems are solved fast, with an average of less than 1 min for all the parameter settings.
With a non-parametric Friedman test, the differences between the different combinations of K and W are significant (p-value = 0.001149). The average times as a function of K are depicted in Figure 1 for the case W = False (the case W = True is very similar). We can see that the time increases with K, remaining more or less constant for K = 10, 15, 20. This increasing pattern does not always occur. For example, the times for aes_32_1_keyfind_1.cnf with W = False can be seen in Figure 2; in that case, the optimal K is 15, as the extra work of computing the global table is compensated by the presence of more zeros in the resulting tables, which speeds up subsequent deletions.
We have carried out a post hoc Conover–Friedman test (without correction for multiple tests). The resulting p-values for pairwise comparisons can be seen in Table 2, where a 'Y' means that W = True and an 'N' means that W = False. First, we can observe that there are never significant differences due to the use of Split. There are significant differences between K = 25 and the cases K = 5, 10, especially if Split is not applied; in this case, the computation of large tables does not compensate for the presence of more zeros. The comparison of K = 20 is only significant against K = 5; again, the significance is higher with no Split. K = 15 with Split shows no significant difference with any smaller value of K, and there is only one small significance (at α = 0.05) if Split is not applied.
To compare with previous versions of the DP algorithm, we have also implemented the basic Algorithm 2, as in [5], including a step for unit propagation (each time a unit clause is found, unit propagation is carried out). As this algorithm is less efficient, we have given it a limit of 600 s to solve each problem, to avoid very long running times on some of the problems. For example, the case aes_32_1_keyfind_1.cnf was not solved even after 2 h of running time. A total of 12 problems could not be solved within this time limit (600 s), and the average time was 168.08 s. This average is much larger than the worst case of our algorithms, even taking into account that the running time was limited to 600 s. A non-parametric Wilcoxon test was not significant; this is due to the fact that this approach was usually faster on the simpler problems.
In the second experiment, we consider four sets of clauses with 504, 708, 912, and 1116 variables and 1840, 2664, 3488, and 4312 clauses, respectively. Our exact algorithms were not able to solve these cases. However, we have computed the number of variables that can be deleted without surpassing a maximum table size of 25, for each value of K, with W = True (the results were the same for W = False). We can observe that there is a tendency to delete, in an exact way, more variables when K increases, though this is not always true in each particular problem (Table 3).
As a summary of the first two experiments, we find that the use of K = 5 with Split is the best option, but we leave open the possibility of using higher values of K, especially in difficult problems.
The third experiment identifies a situation in which the marginalization algorithms can be applied. For that, we consider the Bayesian networks used in the UAI 2022 competition (see https://uaicompetition.github.io/uci-2022/ (accessed on 10 May 2023)). We discarded the networks in which all the values in the conditional probability tables are non-zero (five networks) and the networks in which there were non-binary variables (three networks). This makes a total of 96 networks, used in the partition function and the marginal probability competitions. The characteristics of these networks can be seen in Table 4, where n_v is the number of variables, n_e is the number of observed nodes, mcs is the maximum cluster size (in a deletion algorithm with the original probability tables), and p_0 is the proportion of 0 values in the original conditional probability tables. We can observe that the proportion of 0 values is very high in all the networks. We have to take into account that, these being conditional probability tables for binary variables, at least half of the values are different from 0 (for each 0 value, there is another value equal to 1). Thus, in some networks with 0.49999 of the values equal to 0, only 0.002% of the values are different from 0 and 1. With these networks, we have solved the associated propositional problem. This problem is defined by transforming each conditional probability table T into a logical table T′ in such a way that T′(w) = 0 if T(w) = 0.0 and T′(w) = 1 otherwise. If a variable v has been observed, then a unitary logical table is added, with value 1 for the observed value and 0 for the unobserved one.
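The transformation from probability tables to logical tables can be sketched as follows; the array layout (index 0 meaning "true") is an assumption of this sketch:

```python
import numpy as np

def cpt_to_table(cpt):
    """Logical table of the support of a (conditional) probability table:
    1 exactly where the probability is non-zero."""
    return np.asarray(cpt) != 0.0

def evidence_table(observed_true):
    """Unit logical table for an observed binary variable: 1 for the
    observed value and 0 for the unobserved one."""
    t = np.zeros(2, dtype=bool)
    t[0 if observed_true else 1] = True
    return t
```

The resulting Boolean tables can then be processed with the same deletion machinery as any other propositional problem.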
Then, we have applied our basic algorithm with K = 20 and Split. All the problems were solved, with an average time of 20.54 s (minimum of 0.035 s and maximum of 542.367 s). It is important to remark that most of the problems (59 out of 96) were solved in less than 1 s, and almost all of them (82 out of 96) were solved within a time limit of 10 s.
We have also applied the DP algorithm (Algorithm 2) to the equivalent set of clauses of the different problems. The average time was 200.736 s, but we need to take into account that there was a time limit of 2000 s and that nine problems were not solved within this time limit. The average, after taking into account the full resolution of these nine problems, would have been higher. This shows that our method was able of solving more problems and with less time, especially for difficult cases.
It is important to remark that the result of the deletion algorithms can be used to develop Monte Carlo algorithms that generate compatible configurations according to Proposition 4, without rejecting any configuration due to a 0 probability. This is important for the development of approximate algorithms, and it is not feasible with classical SAT algorithms, which decide the satisfiability of a case and provide a single satisfying assignment.

7. Conclusions and Future Work

In this paper, we have proposed a new procedure, based on the use of Boolean arrays, which can be applied to solve the marginalization problem in propositional logic. The experiments show that it is possible to solve moderate-sized problems exactly (even with thousands of variables and a tree width of more than 100). The method is based on a classical deletion algorithm, with improvements that exploit the special characteristics of Boolean arrays. We have provided a full set of tools for working and operating with these arrays, allowing us to apply the Shenoy–Shafer abstract framework [2]. We have studied different methods for carrying out the deletion of a variable. Of special interest is the deletion of a variable whose value is functionally determined by the values of the other variables. Previous work with deletion algorithms [4,5] reported experiments on general problems with up to 25 variables, as well as on so-called 'chain problems', randomly generated in such a way that the tree width was bounded by 5, while we have been able to solve problems with a tree width of 103. We have thus expanded the class of problems that can be solved with the deletion algorithm, as the previous versions were unable to solve the more difficult problems in our experiments: some problems which could be solved in seconds with our approach could not be solved in hours with the former basic deletion algorithm. This opens a new set of possibilities for algorithms to solve the marginal problem.
For the future, this framework opens a wide range of possible developments:
  • To optimize the deletion strategy by selecting the most appropriate method depending on the characteristics of the tables in H_v.
  • To improve the methods for decomposing a large table T as a product of smaller tables, which can be more efficient to work with. It is also of special interest to obtain tables of low dimension, even if T is not fully decomposed as their product. The extreme case is a table of dimension 1, which always simplifies the problem (if it is different from the neutral element).
  • To combine the clause representation and the table representation, as we have found that the basic clause representation solved the simpler problems faster.
  • To organize computations in a join tree with a local computation algorithm [2,6].
  • To develop approximate algorithms. For example, the mini-bucket elimination algorithm [28] is a general framework for approximate algorithms based on partitioning the set H_v before carrying out the combination of all the tables. In this case, we have more opportunities, based on the fact that potentials are idempotent and on the alternative marginalization procedures we have provided.
  • To combine approximate and exact computations. An approximate computation can be the basis to obtain small informative tables, which can be useful to speed up an exact algorithm.
  • To develop a backtracking algorithm, taking as a basis the array representation.
  • Approximate inference in Bayesian networks is NP-hard, especially when there are extreme probabilities [29]. Approximate algorithms, such as likelihood weighting [30] or penniless propagation [31], could take advantage of a previous and fast propagation of 0–1 values with the procedures proposed in this paper. Experiment 3 has already shown that our algorithms can propagate 0–1 values very fast in hard problems (from the UAI 2022 competition).
  • The special role of potentials representing functional dependence could also be studied in the general framework of Information Algebras [32] and applied to other problems which are particular cases, such as constraint satisfaction [33].
  • To consider the easy deletion of some variables as a preprocessing step in SAT solving [34], where it could be used as a simplification procedure.

Author Contributions

Conceptualization, E.D.-M. and S.M.; methodology, E.D.-M. and S.M.; software, E.D.-M. and S.M.; validation, E.D.-M. and S.M.; writing—original draft preparation, E.D.-M. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by project PID2019-106758GB-C31, funded by MCIN/AEI/10.13039/501100011033.

Data Availability Statement

The library of programs, together with the data to reproduce the experiments, is available as a GitHub repository at: https://github.com/serafinmoral/SAT-solver (accessed on 16 May 2023).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proposition A1.
If T is a table and v ∈ V(T), then T^{-v} = U(T, v) ∨ U(T, ¬v).
Proof.
The result is immediate, since if W = V(T) \ {v}, we have for every w ∈ Ω_W that:
T^{-v}(w) = max{T(w, v), T(w, ¬v)} = max{U(T, v)(w), U(T, ¬v)(w)} = (U(T, v) ∨ U(T, ¬v))(w). □
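Proposition A1 can be checked numerically on Boolean arrays. The sketch below is our own illustration, assuming the variable v is stored as axis 0 of the array (index 1 for v true): conditioning is array slicing, and deleting v is a maximum over its axis.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.integers(0, 2, size=(2, 2, 2))  # Boolean table over v and two other variables

U_v    = T[1]           # U(T, v): condition on v being true
U_notv = T[0]           # U(T, ¬v): condition on v being false
T_marg = T.max(axis=0)  # T^{-v}: delete v by maximizing over its axis

# Proposition A1: T^{-v} = U(T, v) ∨ U(T, ¬v)
assert np.array_equal(T_marg, np.maximum(U_v, U_notv))
```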
Proposition A2.
If T_1 and T_2 are tables and v ∈ V(T_1) ∩ V(T_2), then (T_1 ∨ T_2)^{-v} = T_1^{-v} ∨ T_2^{-v}.
Proof.
The proof is immediate, taking into account that, with W = (V(T_1) ∪ V(T_2)) \ {v}, we have for every w ∈ Ω_W:
(T_1^{-v} ∨ T_2^{-v})(w) = max{T_1^{-v}(w), T_2^{-v}(w)} =
max{max{T_1(w, v), T_1(w, ¬v)}, max{T_2(w, v), T_2(w, ¬v)}} =
max{max{T_1(w, v), T_2(w, v)}, max{T_1(w, ¬v), T_2(w, ¬v)}} =
max{(T_1 ∨ T_2)(w, v), (T_1 ∨ T_2)(w, ¬v)} = (T_1 ∨ T_2)^{-v}(w). □
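In the same array representation (our own illustration, with v as axis 0), Proposition A2 is the statement that the maximum over the v axis commutes with the element-wise maximum:

```python
import numpy as np

rng = np.random.default_rng(1)
T1 = rng.integers(0, 2, size=(2, 2, 2))  # two Boolean tables sharing the same variables
T2 = rng.integers(0, 2, size=(2, 2, 2))

lhs = np.maximum(T1, T2).max(axis=0)              # (T1 ∨ T2)^{-v}
rhs = np.maximum(T1.max(axis=0), T2.max(axis=0))  # T1^{-v} ∨ T2^{-v}
assert np.array_equal(lhs, rhs)
```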
Proposition A3.
In each application of Marginalize0(H_v), we have that H_v is equivalent to R_1 ∪ R_2. Furthermore, if we call H′ the updated set H computed in Step 8 of Algorithm 2, then H′ is equivalent to H ∪ R_2.
Proof.
R_2 is the combination of all the tables in H_v, and therefore it is equivalent to it. R_1 is the marginalization of T_1, which is the combination of the tables in H_v, and therefore it is less informative than H_v. The union of a set equivalent to H_v and a set that is less informative than H_v produces a set which is equivalent to H_v.
For the second part, H′ = (H \ H_v) ∪ R_1; therefore, H′ ∪ R_2 = (((H \ H_v) ∪ R_2) ∪ R_1). If we remove from H a set of tables (H_v) and replace them by their combination (R_2), then we obtain a set which is equivalent to the original set H. On the other hand, R_1 only contains a table which is less informative than H, and therefore, if we add it, we obtain a set which is equivalent to H. □
Proposition A4.
Assume a set of tables H and that Algorithm 2 is applied, removing the variables in V(H) in the order (v_1, …, v_k). Assume also that R_2[] is equivalent to the empty set, i.e., the problem is satisfiable. Then, if 𝒯_i is the set of true assignments satisfying the set of tables H^{-{v_1,…,v_i}} and 𝒯_0 is the set of true assignments of H, these sets can be computed in reverse order of i = 1, …, k in the following way:
  • Start with 𝒯_i = ∅.
  • Make 𝒯_k equal to the set containing an empty vector v^0 ∈ Ω_∅.
  • For each v^{i+1} ∈ 𝒯_{i+1}, compute T′_{i+1}, the only table in U(R_2[v_{i+1}], v^{i+1}), which is a table defined only for the variable v_{i+1}. This table is never equal to T_0, and if v^{i1} and v^{i2} are the true assignments obtained by extending v^{i+1} to the variables {v_{i+1}, …, v_k}, given by v^{i1} = (v_{i+1}, v^{i+1}) and v^{i2} = (¬v_{i+1}, v^{i+1}), i.e., by considering v_{i+1} true and false, respectively, then:
    if T′_{i+1} = T_e, add v^{i1} and v^{i2} to 𝒯_i;
    if T′_{i+1}(v_{i+1}) = 1 and T′_{i+1}(¬v_{i+1}) = 0, add v^{i1} to 𝒯_i;
    if T′_{i+1}(v_{i+1}) = 0 and T′_{i+1}(¬v_{i+1}) = 1, add v^{i2} to 𝒯_i.
Proof.
We have that H^{-{v_1,…,v_{i+1}}} is equivalent to (H^{-{v_1,…,v_i}})^{-v_{i+1}}, which is equivalent to U(H^{-{v_1,…,v_i}}, v_{i+1}) ∨ U(H^{-{v_1,…,v_i}}, ¬v_{i+1}); then, we have that v^{i+1} ∈ 𝒯_{i+1} if and only if
v^{i+1} ∈ 𝒯(U(H^{-{v_1,…,v_i}}, v_{i+1})) ∪ 𝒯(U(H^{-{v_1,…,v_i}}, ¬v_{i+1})).
Therefore, we have that:
𝒯_{i+1} = { v^{i↓-v_{i+1}} : v^i ∈ 𝒯_i },
i.e., each element of 𝒯_{i+1} is obtained by removing the v_{i+1} component from an element of 𝒯_i.
Now, we take into account that H^{-{v_1,…,v_i}} is equivalent to H^{-{v_1,…,v_{i+1}}} ∪ R_2[v_{i+1}], and then v^i ∈ 𝒯(H^{-{v_1,…,v_{i+1}}} ∪ R_2[v_{i+1}]) if and only if this assignment satisfies both H^{-{v_1,…,v_{i+1}}} and R_2[v_{i+1}].
To satisfy H^{-{v_1,…,v_{i+1}}}, the condition is that there is v^{i+1} ∈ 𝒯_{i+1} such that v^{i↓-v_{i+1}} = v^{i+1}, i.e., that v^i = v^{i1} or v^i = v^{i2} for some v^{i+1} ∈ 𝒯_{i+1}.
To satisfy R_2[v_{i+1}], if T is the only table in R_2[v_{i+1}], we have that T(v^{i1}) = 1 when U(T, v_{i+1})(v^{i+1}) = 1, i.e., T′_{i+1}(v_{i+1}) = 1, and T(v^{i2}) = 1 when U(T, ¬v_{i+1})(v^{i+1}) = 1, i.e., T′_{i+1}(¬v_{i+1}) = 1. Furthermore, these are precisely the conditions that are checked to introduce v^{i1} and/or v^{i2} in 𝒯_i.
Finally, T′_{i+1} can never be equal to T_0, since for any v^{i+1} ∈ 𝒯_{i+1}, we must have that either v^{i1} or v^{i2} is in 𝒯_i. □
Proposition A5.
If H_v is a set of tables containing the variable v, then the set R_1 computed in Algorithm 4 represents H_v^{-v}.
Proof.
We have to prove that ∧R_1 is equivalent to (∧H_v)^{-v}.
Each table (T_i ∧ T_j)^{-v} ∈ R_1 is such that T_i ∧ T_j ≥ ∧H_v, and therefore (T_i ∧ T_j)^{-v} ≥ (∧H_v)^{-v}. Then, ∧R_1 ≥ (∧H_v)^{-v}.
On the other hand:
(∧H_v)^{-v} = U(∧H_v, v) ∨ U(∧H_v, ¬v).
Thus, if w ∈ Ω_{V(H_v)\{v}}, we have that if (∧H_v)^{-v}(w) = 0, then for l = v and l = ¬v, U(∧H_v, l)(w) = 0 and (∧U(H_v, l))(w) = 0. Since:
(∧U(H_v, l))(w) = min_{T_i ∈ H_v} U(T_i, l)(w),
we have that for l = v, there is T_i ∈ H_v with U(T_i, v)(w) = 0, and for l = ¬v, there is T_j ∈ H_v with U(T_j, ¬v)(w) = 0. Since U(T_i ∧ T_j, v)(w) ≤ U(T_i, v)(w) and U(T_i ∧ T_j, ¬v)(w) ≤ U(T_j, ¬v)(w), then U(T_i ∧ T_j, v)(w) = 0 and U(T_i ∧ T_j, ¬v)(w) = 0.
Since (T_i ∧ T_j)^{-v} = U(T_i ∧ T_j, v) ∨ U(T_i ∧ T_j, ¬v), we also have (T_i ∧ T_j)^{-v}(w) = 0, and, taking into account that (T_i ∧ T_j)^{-v} ∈ R_1, we have that (∧R_1)(w) = 0.
We have proved that if (∧H_v)^{-v}(w) = 0, then (∧R_1)(w) = 0; as a consequence, (∧H_v)^{-v} ≥ ∧R_1, and we have the other inequality and the equivalence. □
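Algorithm 4 itself is not reproduced here, but the identity just proved can be tested numerically: combining the marginalizations of all pairwise combinations (a pair may repeat a table) gives the same result as marginalizing the combination of the whole set. The NumPy sketch below is our own check, for tables sharing one scope with v stored as axis 0; combination is the element-wise minimum, deletion of v the maximum over its axis.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(2)
# A set H_v of Boolean tables over v (axis 0) and three other variables.
tables = [rng.integers(0, 2, size=(2, 2, 2, 2)) for _ in range(4)]

# (∧H_v)^{-v}: combine all tables (minimum), then delete v (maximum over its axis).
full_marginal = np.minimum.reduce(tables).max(axis=0)

# R_1: marginalizations of all pairwise combinations (T_i ∧ T_j)^{-v}.
R1 = [np.minimum(Ti, Tj).max(axis=0)
      for Ti, Tj in combinations_with_replacement(tables, 2)]

# ∧R_1 is equivalent to (∧H_v)^{-v}.
assert np.array_equal(np.minimum.reduce(R1), full_marginal)
```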
Proposition A6.
If (R_1, R_2) are the sets computed in Marginalize2 and the initial conditions required by the algorithm are satisfied, then (∧H_v)^{-v} is equivalent to R_1 and H_v is equivalent to R_1 ∪ R_2.
Proof.
To prove that (∧H_v)^{-v} is equivalent to R_1, we only have to prove that for any T_i, T_j ∈ H_v, we have (T_i ∧ T_j)^{-v} ≥ (T ∧ T_i)^{-v} ∧ (T ∧ T_j)^{-v}. We have:
(T ∧ T_i)^{-v} ∧ (T ∧ T_j)^{-v} = (U(T ∧ T_i, v) ∨ U(T ∧ T_i, ¬v)) ∧ (U(T ∧ T_j, v) ∨ U(T ∧ T_j, ¬v)) = (U(T ∧ T_i, v) ∧ U(T ∧ T_j, v)) ∨ (U(T ∧ T_i, ¬v) ∧ U(T ∧ T_j, ¬v)),
where the last equality comes from the fact that U(T ∧ T_i, v) ∧ U(T ∧ T_j, ¬v) is more informative than U(T, v) ∧ U(T, ¬v), which is equivalent to T_0; then U(T ∧ T_i, v) ∧ U(T ∧ T_j, ¬v) is also equivalent to T_0 and can be removed from the disjunction. Each term of the remaining disjunction is more informative than the corresponding term of (T_i ∧ T_j)^{-v} = U(T_i ∧ T_j, v) ∨ U(T_i ∧ T_j, ¬v), which proves the inequality.
Thus, if R′_1 is the set of tables computed with Marginalize1, then ∧R′_1 is less informative than ∧R_1. On the other hand, since R_1 ⊆ R′_1, ∧R_1 is less informative than ∧R′_1, and the two sets are equivalent.
Now, we have to prove that R_1 ∪ R_2 is equivalent to H_v. R_1 ∪ R_2 is less informative than H_v, as R_1 and R_2 are contained in the sets R′_1, R′_2 computed with Marginalize1.
Consider V′ = V(H_v) \ {v} and w ∈ Ω_{V′}. If (∧H_v)(w, l) = 0, where l = v or l = ¬v, we can have the following situations:
  • If (∧H_v)(w, ¬l) is also 0, then (∧H_v)^{-v}(w) = 0, and then (∧R_1)(w) = (∧R_1)(w, l) = 0.
  • If (∧H_v)(w, ¬l) = 1, we have that T(w, ¬l) = 1 as T is an element of H_v, and given that T determines v, we must have T(w, l) = 0; as T is the only element of R_2, obviously (∧R_2)(w, l) = 0.
As a consequence, if (∧H_v)(w, l) = 0, we always have (∧(R_1 ∪ R_2))(w, l) = 0, and R_1 ∪ R_2 is also more informative than H_v; the two are thus finally shown to be equivalent. □
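When one table T in H_v determines v (for each configuration of the remaining variables, at most one value of v is allowed), Proposition A6 says that only the pairs containing T are needed. The following is our own numerical check, under the same array conventions as above (v as axis 0, combination as element-wise minimum):

```python
import numpy as np

rng = np.random.default_rng(3)
# A table T that determines v: its two slices over v are never both 1.
b = rng.integers(0, 2, size=(2, 2, 2))
T = np.stack([b, 1 - b])  # v is axis 0

others = [rng.integers(0, 2, size=(2, 2, 2, 2)) for _ in range(3)]
H_v = [T] + others

# (∧H_v)^{-v}: full combination (minimum), then deletion of v (maximum).
full_marginal = np.minimum.reduce(H_v).max(axis=0)

# R_1 in Marginalize2: only the pairs (T ∧ T_i)^{-v} with the determining table T.
R1 = [np.minimum(T, Ti).max(axis=0) for Ti in H_v]
assert np.array_equal(np.minimum.reduce(R1), full_marginal)
```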

References

  1. Davis, M.; Putnam, H. A Computing Procedure for Quantification Theory. J. ACM 1960, 7, 201–215.
  2. Shafer, G.; Shenoy, P. Local Computation in Hypertrees; Working Paper No. 201; School of Business, University of Kansas: Lawrence, KS, USA, 1988.
  3. Kohlas, J. Information Algebras: Generic Structures for Inference; Springer Science & Business Media: London, UK, 2012.
  4. Dechter, R.; Rish, I. Directional resolution: The Davis–Putnam procedure, revisited. In Principles of Knowledge Representation and Reasoning; Morgan Kaufmann: San Francisco, CA, USA, 1994; pp. 134–145.
  5. Rish, I.; Dechter, R. Resolution versus Search: Two Strategies for SAT. J. Autom. Reason. 2000, 24, 225–275.
  6. Kohlas, J.; Haenni, R.; Moral, S. Propositional information systems. J. Log. Comput. 1999, 9, 651–681.
  7. Lauritzen, S.L.; Spiegelhalter, D.J. Local computations with probabilities on graphical structures and their application to expert systems. J. R. Stat. Soc. Ser. B 1988, 50, 157–194.
  8. Hernández, L.D.; Moral, S. Inference with idempotent valuations. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI, USA, 1–3 August 1997; pp. 229–237.
  9. Biere, A.; Heule, M.; van Maaren, H.; Walsh, T. Handbook of Satisfiability, 2nd ed.; Parts I and II; IOS Press: Amsterdam, The Netherlands, 2021.
  10. Cook, S.A. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, Shaker Heights, OH, USA, 3–5 May 1971; pp. 151–158.
  11. Biere, A. Bounded Model Checking. In Handbook of Satisfiability, 2nd ed.; Part II; IOS Press: Amsterdam, The Netherlands, 2009; pp. 739–764.
  12. Rintanen, J. Planning and SAT. In Handbook of Satisfiability, 2nd ed.; Part II; IOS Press: Amsterdam, The Netherlands, 2021; pp. 765–789.
  13. Björk, M. Successful SAT encoding techniques. J. Satisf. Boolean Model. Comput. 2011, 7, 189–201.
  14. Davis, M.; Logemann, G.; Loveland, D. A Machine Program for Theorem-proving. Commun. ACM 1962, 5, 394–397.
  15. Pipatsrisawat, K.; Darwiche, A. On the power of clause-learning SAT solvers as resolution engines. Artif. Intell. 2011, 175, 512–525.
  16. Simon, L. Reasoning with propositional logic: From SAT solvers to knowledge compilation. In A Guided Tour of Artificial Intelligence Research: Volume II: AI Algorithms; Springer Nature: Cham, Switzerland, 2020; pp. 115–152.
  17. Knuth, D.E. The Art of Computer Programming. Volume 4, Fascicle 6: Satisfiability; Addison-Wesley: Boston, MA, USA, 2015.
  18. Gomes, C.P.; Sabharwal, A.; Selman, B. Model counting. In Handbook of Satisfiability; IOS Press: Amsterdam, The Netherlands, 2021; pp. 993–1014.
  19. Gogate, V.; Dechter, R. Approximate counting by sampling the backtrack-free search space. In Proceedings of the AAAI, Vancouver, BC, Canada, 22–26 July 2007; pp. 198–203.
  20. Eén, N.; Biere, A. Effective preprocessing in SAT through variable and clause elimination. In Proceedings of the International Conference on Theory and Applications of Satisfiability Testing, Scotland, UK, 19–23 June 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 61–75.
  21. Tseitin, G.S. On the complexity of derivation in propositional calculus. In Automation of Reasoning; Springer: Berlin/Heidelberg, Germany, 1983; pp. 466–483.
  22. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array programming with NumPy. Nature 2020, 585, 357–362.
  23. Ankan, A.; Panda, A. pgmpy: Probabilistic graphical models using Python. In Proceedings of the 14th Python in Science Conference (SciPy 2015), Austin, TX, USA, 6–12 July 2015; pp. 6–11.
  24. Hoos, H.H.; Stützle, T. SATLIB: An online resource for research on SAT. SAT 2000, 2000, 283–292.
  25. Junttila, T. Tools for Constrained Boolean Circuits. Available online: https://users.ics.aalto.fi/tjunttil/circuits/ (accessed on 22 May 2022).
  26. Tsuji, Y.; Gelder, A.V. Instances Selected for the Second DIMACS Challenge for Circuit Fault Analysis. Available online: https://www.cs.ubc.ca/~hoos/SATLIB/Benchmarks/SAT/DIMACS/BF/descr.html (accessed on 22 May 2022).
  27. Burkardt, J. CNF Files. Available online: https://people.sc.fsu.edu/~jburkardt/data/cnf/cnf.html (accessed on 22 May 2022).
  28. Dechter, R.; Rish, I. Mini-buckets: A general scheme for bounded inference. J. ACM 2003, 50, 107–153.
  29. Roth, D. On the hardness of approximate reasoning. Artif. Intell. 1996, 82, 273–302.
  30. Fung, R.; Chang, K.C. Weighing and integrating evidence for stochastic simulation in Bayesian networks. In Machine Intelligence and Pattern Recognition; Elsevier: Amsterdam, The Netherlands, 1990; Volume 10, pp. 209–219.
  31. Cano, A.; Moral, S.; Salmerón, A. Penniless propagation in join trees. Int. J. Intell. Syst. 2000, 15, 1027–1059.
  32. Kohlas, J. Algebras of information. A new and extended axiomatic foundation. arXiv 2017, arXiv:1701.02658.
  33. Dechter, R. Constraint Processing; Morgan Kaufmann: San Francisco, CA, USA, 2003.
  34. Biere, A.; Järvisalo, M.; Kiesl, B. Preprocessing in SAT Solving. In Handbook of Satisfiability; IOS Press: Amsterdam, The Netherlands, 2021; Volume 336, pp. 391–435.
Figure 1. Average time as a function of K.
Figure 2. Time as a function of K in aes_32_1_keyfind_1.cnf.
Table 1. Benchmark problems used in the experiments.

Benchmark                           Variables   Clauses   Max Cluster Size   Sol
SATHolV42C133.cnf                       42        133           28           SAT
SATPlanV48C261.cnf                      48        261           21           SAT
UNSATaimV50C80.cnf                      50         80           16           UNSAT
UNSATaimV50C100.cnf                     50        100           31           UNSAT
UNSATTmTbV112C245.cnf                  112        245           18           UNSAT
SATTmTbV140C301.cnf                    140        301           25           SAT
SATV300C1016.cnf                       300       1010           28           SAT
aes_32_1_keyfind_1.cnf                 300       1016           28           SAT
SATCircuitosV416C1136.cnf              416       1136           27           SAT
UNSATBFCircuitosV421C1000.cnf          421       1000           21           UNSAT
SATBFCircuitosV423C1010.cnf            423       1010           21           SAT
UNSATBFCircuitosV424C1031.cnf          424       1031           23           UNSAT
UNSATBFCircuitosV428C1037.cnf          428       1037           23           UNSAT
UNSATCircuitosV607C1808.cnf            607       1808           35           UNSAT
SATBFCircuitosV837C2169.cnf            837       2169           20           SAT
SATBFCircuitosV837C2169_2.cnf          837       2169           22           SAT
SATBFCircuitosV843C2286.cnf            843       2286           29           SAT
UNSATBFCircuitosV864C2790.cnf          864       2790           74           UNSAT
UNSATBFCircuitosV864C2790_2.cnf        864       2790           66           UNSAT
UNSATBFCircuitosV865C2783.cnf          865       2783           69           UNSAT
UNSATBFCircuitosV865C2783_2.cnf        865       2783           74           UNSAT
UNSATBFCircuitosV865C2784.cnf          865       2784           63           UNSAT
UNSATBFCircuitosV985C2324.cnf          985       2324           25           UNSAT
UNSATCircuitosV986C2315.cnf            986       2315           24           UNSAT
UNSATBFCircuitosV1040C3668_2.cnf      1040       3668           70           UNSAT
UNSATBFCircuitosV1040C3668.cnf        1040       3668          103           UNSAT
SATBFCircuitosV1207C2940.cnf          1207       2940           41           SAT
UNSATBFCircuitosV1339C3249.cnf        1339       3249           29           UNSAT
UNSATCircuitosV1355C3296.cnf          1355       3296           30           UNSAT
UNSATCircuitosV1359C3321.cnf          1359       3321           30           UNSAT
UNSATV1359C3321.cnf                   1359       3321           30           UNSAT
UNSATBFCircuitosV1363C3361.cnf        1363       3361           31           UNSAT
UNSATBFCircuitosV1363C3361_2.cnf      1363       3361           27           UNSAT
UNSATBFCircuitosV1365C3369.cnf        1365       3369           27           UNSAT
UNSATBFCircuitosV1371C3383.cnf        1371       3383           31           UNSAT
UNSATBFCircuitosV1371C3383_2.cnf      1371       3383           26           UNSAT
UNSATBFCircuitosV1371C3401.cnf        1371       3401           30           UNSAT
UNSATBFCircuitosV1373C3391.cnf        1373       3391           33           UNSAT
UNSATBFCircuitosV1379C3417.cnf        1379       3417           26           UNSAT
UNSATBFCircuitosV1379C3417_2.cnf      1379       3417           33           UNSAT
UNSATBFCircuitosV1379C3423.cnf        1379       3423           29           UNSAT
UNSATBFCircuitosV1387C3439.cnf        1387       3439           27           UNSAT
UNSATBFCircuitosV1387C3439_2.cnf      1387       3439           27           UNSAT
UNSATBFCircuitosV1389C3440.cnf        1389       3440           26           UNSAT
UNSATBFCircuitosV1389C3440_2.cnf      1389       3440           25           UNSAT
UNSATCircuitosV1393C3434.cnf          1393       3434           26           UNSAT
UNSATBFCircuitosV1407C3496.cnf        1407       3496           26           UNSAT
UNSATBFCircuitosV1423C3609.cnf        1423       3609           38           UNSAT
UNSATBFCircuitosV1488C3859.cnf        1488       3859           21           UNSAT
SATCircuitosV1501C3575.cnf            1501       3575           26           SAT
SATCircuitosV2013C4795.cnf            2013       4795           29           SAT
UNSATCircuitosV2177C6768.cnf          2177       6768           65           UNSAT
UNSATCircuitosV2180C6778.cnf          2180       6778           70           UNSAT
Table 2. Results of the post hoc Conover–Friedman test.

        K05N      K05Y      K10N      K10Y      K15N      K15Y      K20N      K20Y      K25N      K25Y
K05N  1.000000  1.000000  0.319196  0.974352  0.012449  0.403390  0.003590  0.007856  0.006494  0.067368
K05Y  1.000000  1.000000  0.319196  0.974352  0.012449  0.403390  0.003590  0.007856  0.006494  0.067368
K10N  0.319196  0.319196  1.000000  0.303853  0.131254  0.872292  0.054216  0.095066  0.083050  0.403390
K10Y  0.974352  0.974352  0.303853  1.000000  0.011374  0.385562  0.003242  0.007146  0.005897  0.062719
K15N  0.012449  0.012449  0.131254  0.011374  1.000000  0.095066  0.676013  0.872292  0.821945  0.499687
K15Y  0.403390  0.403390  0.872292  0.385562  0.095066  1.000000  0.037087  0.067368  0.058339  0.319196
K20N  0.003590  0.003590  0.054216  0.003242  0.676013  0.037087  1.000000  0.797032  0.847040  0.274662
K20Y  0.007856  0.007856  0.095066  0.007146  0.872292  0.067368  0.797032  1.000000  0.948731  0.403390
K25N  0.006494  0.006494  0.083050  0.005897  0.821945  0.058339  0.847040  0.948731  1.000000  0.368225
K25Y  0.067368  0.067368  0.403390  0.062719  0.499687  0.319196  0.274662  0.403390  0.368225  1.000000
Table 3. Number of deleted variables.

Benchmark                 K05    K10    K15    K20    K25
aes_32_2_keyfind_1.cnf    412    421    421    417    430
aes_32_3_keyfind_1.cnf    570    580    580    576    576
aes_32_4_keyfind_1.cnf    726    716    716    734    732
aes_32_5_keyfind_1.cnf    873    864    865    882    880
TOTAL                    2581   2581   2582   2609   2618
Table 4. Benchmarks used in Experiment 3.

Benchmark       n_v    n_e   mcs  p_0     | Benchmark                   n_v      n_e    mcs  p_0
Promedus_11     461      8    32  0.358   | mastermind_04_08_03-0014    1418      48     24  0.491
Promedus_12     534      3    26  0.366   | mastermind_04_08_03-0015    1418      48     24  0.491
Promedus_13     894      4    11  0.325   | mastermind_04_08_04-0001    2616      36     39  0.494
Promedus_14     414      9    37  0.363   | mastermind_04_08_04-0002    2616      36     39  0.494
Promedus_15     385      4    14  0.369   | mastermind_04_08_04-0003    2616      36     39  0.494
Promedus_16     715      5    19  0.302   | mastermind_04_08_04-0004    2616      36     39  0.494
Promedus_17     916      9    29  0.307   | mastermind_04_08_04-0005    2616      36     39  0.494
Promedus_18     374      8    39  0.369   | mastermind_05_08_03-0001    1616      27     28  0.490
Promedus_19     624      7    28  0.347   | mastermind_05_08_03-0002    1616      27     28  0.490
Promedus_20     546      6    23  0.350   | mastermind_05_08_03-0003    1616      27     28  0.490
Promedus_21     473      3    13  0.370   | mastermind_05_08_03-0004    1616      27     28  0.490
Promedus_22     400      3    15  0.3339  | mastermind_05_08_03-0009    1616      63     35  0.490
Promedus_23     674      9    28  0.320   | mastermind_06_08_03-0002    1814      27     33  0.489
Promedus_24     200      4     5  0.294   | mastermind_06_08_03-0003    1814      27     33  0.489
Promedus_25    1005      7    27  0.312   | mastermind_06_08_03-0005    1814      27     33  0.489
Promedus_26     614      6     4  0.298   | mastermind_06_08_03-0009    1814      83     41  0.489
Promedus_27     410      5    22  0.357   | mastermind_10_08_03-0008    2606     769     57  0.486
Promedus_28     463      7    19  0.350   | mastermind_10_08_03-0009    2606     404     56  0.486
Promedus_29     434      8     5  0.301   | or_chain_4.fg                700       9     44  0.348
Promedus_30     306      1    37  0.298   | or_chain_12.fg               468      11     42  0.361
Promedus_31     466      2    14  0.370   | or_chain_15.fg               656      11     39  0.347
Promedus_32     511      2    13  0.329   | or_chain_53.fg               741      13     45  0.348
Promedus_33     378      2     6  0.303   | or_chain_61.fg              1028       7     46  0.336
Promedus_34     415      3    22  0.361   | or_chain_64.fg               460       9     38  0.358
Promedus_35     467      2    14  0.371   | or_chain_90.fg               512       9     42  0.357
Promedus_36     467      2    14  0.371   | or_chain_102.fg              860      11     45  0.325
Promedus_37    1039      4    27  0.330   | or_chain_106.fg              695      10     41  0.344
Promedus_38     668      5    36  0.353   | or_chain_107.fg              631      11     50  0.349
BN_31          1156    120    66  0.448   | or_chain_128.fg              648      11     54  0.352
fs-07          1225   1120    36  0.440   | or_chain_132.fg              723      13     43  0.344
mastermind_03_08_04-0000  2288    0  31  0.495 | or_chain_138.fg         702      11     40  0.350
mastermind_03_08_04-0001  2288   36  31  0.495 | or_chain_140.fg        1268       8     43  0.331
mastermind_03_08_04-0002  2288   36  31  0.495 | or_chain_149.fg         629      10     38  0.352
mastermind_03_08_04-0003  2288   36  31  0.495 | or_chain_150.fg         928       6     38  0.347
mastermind_03_08_04-0007  2288  245  33  0.495 | or_chain_153.fg         710       9     35  0.347
mastermind_03_08_04-0008  2288  355  32  0.495 | or_chain_155.fg         542      11     42  0.356
mastermind_03_08_04-0010  2288  142  32  0.495 | or_chain_161.fg         794       8     46  0.350
mastermind_03_08_04-0012  2288   64  31  0.495 | or_chain_188.fg        1061      11     38  0.308
mastermind_03_08_04-0013  2288   64  31  0.495 | or_chain_209.fg         859      10     38  0.341
mastermind_03_08_04-0014  2288   64  31  0.495 | or_chain_242.fg         613      10     43  0.360
mastermind_03_08_04-0015  2288   64  31  0.495 | blockmap_10_01-0009    5650    1078     56  0.499
mastermind_03_08_05-0001  3692   45  41  0.496 | blockmap_10_02-0009    6252    1159     56  0.499
mastermind_03_08_05-0003  3692   45  39  0.496 | blockmap_10_03-0009    6848    1844     60  0.499
mastermind_03_08_05-0009  3692  785  42  0.496 | blockmap_10_03-0010    6848    1610     59  0.499
mastermind_04_08_03-0000  1418    0  24  0.491 | blockmap_15_01-0008  16,497    6193     69  0.499
mastermind_04_08_03-0011  1418   48  24  0.491 | blockmap_15_02-0008  17,649    5959     81  0.499
mastermind_04_08_03-0012  1418   48  24  0.491 | blockmap_15_03-0010  18,787    6273     73  0.499
mastermind_04_08_03-0013  1418   48  24  0.491 | blockmap_20_01-0009  39,297  15,222    105  0.499